Speech Recognition Don'ts Video



  • Sometimes knowing what to do isn't enough—knowing what not to do is also quite helpful. In this video, we'll teach you how to avoid some of the major pitfalls new speech recognition developers fall into when writing their applications. The video offers tips on what not to do when writing grammars, recording prompts, and designing applications.
  • RUNTIME 11:06


Video Transcription

Speech Recognition Don'ts

Until now we've been focusing on the things that should be done: the best practices for building speech recognition applications. Now we're going to move to the dark side and focus on speech recognition don'ts: the worst practices, if you will, the things you absolutely, positively should avoid when you can. We'll look at things you should not do when creating:

  • Grammars
  • Prompts
  • Call Flow
  • Application Design


Grammar design is one of the most challenging aspects of a speech application. It requires some programming skill, some human psychology, and some statistical evaluation, and if you're new to speech it's easy to get hung up on it.


  • Make grammars too large: Probably the biggest mistake new developers make is making grammars too large. Make your grammar as big as necessary but no bigger. Provide just enough coverage for the majority of your callers, so that callers who are using the system in a reasonable manner get recognized and callers who are abusing the system do not. As an example, there was a company that wanted to include swear words in its grammar, so that if a caller told the application to "go to hell," it was recognized. We're not entirely sure what the application was supposed to do with such a request, but there are people who call and curse at the machine for whatever reason. You don't want to try to understand why this happens or to deal with it in your application, because what happens if a caller is using the application correctly but the machine thinks the caller said "go to hell"? Remember, as you increase the size of your grammar, the chance of misrecognition also increases. The point is to leave the things that most people do not say out of the grammar. Make it just big enough to cover everything a reasonable caller says, and leave the other junk out.
  • Make a grammar too complicated: SRGS (the language used for writing grammars) is very powerful, as is SISR (the language for adding semantic interpretation), which can effectively turn grammars into small applications. You may be tempted to put all of your application logic into your grammar simply because it can be done. You could essentially write a JavaScript application inside a grammar if you really wanted to. However, application logic probably should go inside your application, not inside the grammar. You're better off keeping the logic within the grammar related to grammar matters, such as interpretation. You don't want to perform other types of operations there, because it will slow things down when the grammar handles functions that are better suited to the application, and it is more complicated to maintain. So keep your grammars as simple as you can, and only complicate them if you have a very compelling reason to do so.
  • Write grammars with infinite loops: This may sound like obvious advice, but it happens more often than you may think. You can write recursive grammars with rules that reference themselves, and at times there are reasons to do this. But you don't want these rules to recurse infinitely. If a caller says something that sends the parser into an infinite loop, the speech engine will take an infinite number of seconds to return anything, and that is a very long time for a caller to wait. So be very careful when using recursion within a grammar, and test your grammars in your grammar editor beforehand so you know when you have these types of problems.
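
One way a recursive grammar can fail to terminate is when a rule can reach itself again before any word has been consumed (often called left recursion). As a minimal sketch, assuming a toy grammar stored as a Python dictionary rather than real SRGS, the check below flags such rules. The `$name` reference convention, the `find_left_recursive_rules` helper, and the sample rules are all hypothetical, and a real grammar editor or speech engine may detect or handle this differently.

```python
def find_left_recursive_rules(grammar):
    """Return the sorted names of rules that can reach themselves
    before consuming any word.

    `grammar` maps a rule name to a list of alternatives; each alternative
    is a list of tokens. A token starting with "$" is a reference to another
    rule (a hypothetical convention for this sketch, not SRGS syntax).
    Simplification: every rule is assumed to consume at least one word,
    so only the first token of each alternative is inspected.
    """
    # For each rule, collect the rules that one of its alternatives starts with.
    first_refs = {
        name: {alt[0][1:] for alt in alts if alt and alt[0].startswith("$")}
        for name, alts in grammar.items()
    }

    def reaches_itself(start):
        # Depth-first search over the "starts with" links.
        stack = list(first_refs.get(start, ()))
        seen = set()
        while stack:
            rule = stack.pop()
            if rule == start:
                return True
            if rule in seen:
                continue
            seen.add(rule)
            stack.extend(first_refs.get(rule, ()))
        return False

    return sorted(name for name in grammar if reaches_itself(name))


# Safe: "digits" refers to itself only after a word has been consumed.
safe = {
    "digits": [["$digit"], ["$digit", "$digits"]],
    "digit": [["one"], ["two"], ["three"]],
}

# Risky: "digits" refers to itself before consuming anything.
risky = {
    "digits": [["$digits", "$digit"], ["$digit"]],
    "digit": [["one"], ["two"], ["three"]],
}

print(find_left_recursive_rules(safe))   # []
print(find_left_recursive_rules(risky))  # ['digits']
```

A check like this only covers one kind of runaway recursion, so it complements testing in a grammar editor and does not replace it.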


Prompts are closely related to grammars: they determine the types of responses you will receive from the user. Some bad prompts are seen frequently, and the most common of them is a holdover from the days of DTMF.


  • Use "Press or Say" prompts. Let's say you have a DTMF application and you add some speech recognition to it. Because the callers are used to it, you decide to have the prompts state "Press or say 1 for..." to mimic DTMF. This is actually a terrible prompt for speech. It's commonly used, so a lot of people are comfortable with it, but it's simply not good. Saying "one" or "two" instead of pressing it is little to no improvement from a caller's perspective. Callers would rather say "Please give me my balance" than try to remember that "one" is the equivalent of a balance inquiry. So try to avoid "press or say" prompts. They tend to become long and unwieldy, and they don't really help anyone. It doesn't make much sense to spend money on speech just to mimic a DTMF application.
  • Write really long prompts. One of the things the "press or say" prompt does is get very long very quickly. You don't need to spell out everything to the caller. Consider a prompt that simply says, "Please tell me what you'd like to do. For instance, for your checking account balance, say 'Checking account balance.'" You don't necessarily have to tell the caller that they have to say "savings account balance" to hear the balance for savings. You may just tell the caller, "Say checking account balance to hear your checking account balance, or you can hear your savings account balance, or you can make a withdrawal." In this prompt, notice you're not continuously saying "say this for this..." The prompt told the caller how the system works and moved on. Don't make prompts long, because people don't like to listen to very long prompts. Callers are smart enough to figure out how the IVR works if they're provided with one or two examples, and it probably works better. This keeps calls from getting long and keeps callers from getting frustrated and hanging up, which drives the success rate down and leaves no one happy. Make prompts short, sweet, and to the point.
  • Force your users to sit through the entirety of every prompt, every time. We have barge-in for a reason, and we have good start-of-speech and end-of-speech detection in our speech engine software. Don't force your callers to listen to a whole prompt. This is not generally useful or helpful. There may be occasions where you have an important prompt that should not be interrupted, but most of your prompts should allow barge-in. Also, while you're at it, leave short pauses in the prompts, which let callers know it's all right to speak and interrupt. For example, "You can access your checking account balance [pause], or you can hear your savings account balance [pause]." This lets callers feel free to interrupt, as some callers are hesitant to do so while someone is speaking.

Call Flow and Application Design

Once prompts and grammars are taken care of, take a look at the overall call flow and design.


  • Mimic DTMF call flows. Avoid the nested menu hell generally associated with DTMF. With speech you can have flat menus with numerous options that are natural and obvious, and callers don't have to try to remember what each number represents. Don't make callers sit through giant menus. You're not using DTMF, so don't just shoehorn speech into a DTMF application. Instead, build nice, natural menus. Rethink your whole call flow and really take advantage of the powers of speech recognition.
  • Forget about error handling. There may not be as many errors with DTMF as with speech, because it's easier to tell what a caller keyed in. In speech applications, however, you will need to put in confirmations, such as, "Did you want this, or this?" Then ask the question and the caller will respond with yes or no. Don't overdo it, though, and don't make the caller confirm every response. It becomes very frustrating for the caller to have to confirm everything. Check confidence scores to help decide when to confirm. If the confidence score is high, there is no need to confirm the caller's response. Use confirmations where appropriate and your users won't mind them.
  • Pretend the computer is a person. A major problem is that some people don't realize that speech recognition is not the same as having a conversation. Designers will sometimes try to build applications that mimic conversations. Don't do this, and especially do not try to trick your callers into believing they are speaking to a live person. You don't necessarily have to specify that they are speaking to a machine, but don't try to make the application appear human. These applications don't work well: if a caller is fooled, their responses become unpredictable and contribute to more errors. Then, once the caller realizes they've been tricked, they will most likely become angry. It's OK to build an application with personality, but go easy on it. If a caller knows that they are talking to a computer, they will use short commands and phrases. Your application will recognize more of what is being said, and your call success will increase.
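
The advice above about using confidence scores to decide when to confirm can be sketched as a simple threshold check. This is only an illustration: the `next_step` function, the 0.0 to 1.0 score scale, both threshold values, and the extra "reprompt" band for very low scores are assumptions made for this example, not settings from any particular speech engine, and real thresholds would need to be tuned against your own call data.

```python
def next_step(confidence, accept_at=0.80, reject_below=0.40):
    """Choose what to do with a recognition result.

    `confidence` is assumed to be a score between 0.0 and 1.0. Both
    thresholds are hypothetical and would need tuning per application.
    """
    if confidence >= accept_at:
        return "accept"    # high confidence: move on without confirming
    if confidence >= reject_below:
        return "confirm"   # middle band: ask a yes/no confirmation
    return "reprompt"      # low confidence: ask the original question again


for score in (0.93, 0.61, 0.22):
    print(score, next_step(score))
```

With these example thresholds, only the middle band of results triggers a confirmation, so callers who are recognized clearly are not asked to confirm everything.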

© 2018 LumenVox, LLC. All rights reserved.