Video Transcription

Speech Application Development - Grammars

In this section, we're going to talk about grammars and application best practices. I'm going to cover grammars in more depth in a couple of sessions, but for now I want to just go over some best practices about grammars.

Grammars contain the callers' answers to your prompts. They have to represent what the caller is going to say pretty much word for word. So the more open-ended the question in your prompt is, the more complex your grammars will become. There is a strong interplay between what your prompts say and what your grammars contain: the simpler or more direct the question you ask, the simpler your grammar can be.

Scope of a Grammar

There is the idea that you have the local grammar, and the local grammar is typically trying to take care of what the caller is going to say in response to your question. You also may have what's called a global grammar, and this global grammar will handle what callers may say as far as navigation or trying to escape a situation they didn't mean to go into. A caller might say "Main menu" or "operator," and that will push the caller to a different location or a different part of your application. That's the difference between local and global.
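As a rough illustration of the local versus global idea, here is a minimal Python sketch. It is not how any particular speech platform implements scope. The phrase lists, the meaning labels, and the `interpret` function are all hypothetical names made up for this example. It simply checks the local grammar for the current question first, then falls back to the global navigation grammar.

```python
# Hypothetical phrase lists. A real application would define these
# in whatever grammar format its speech platform expects.
LOCAL_GRAMMAR = {"yes": "YES", "no": "NO"}  # answers to the current prompt
GLOBAL_GRAMMAR = {                          # navigation available everywhere
    "main menu": "GOTO_MAIN_MENU",
    "operator": "TRANSFER_OPERATOR",
}


def interpret(utterance, local=LOCAL_GRAMMAR, global_=GLOBAL_GRAMMAR):
    """Return (scope, meaning) for an utterance, or (None, None) if no match."""
    text = utterance.strip().lower()
    if text in local:        # the answer to the question we just asked
        return ("local", local[text])
    if text in global_:      # the caller wants to go somewhere else
        return ("global", global_[text])
    return (None, None)      # nothing in scope matched
```

With these lists, `interpret("Yes")` is handled by the local grammar, while `interpret("Main menu")` is picked up by the global grammar and can send the caller to a different part of the application.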

Additionally, your grammars may contain phonetic variations, because a speech application has an idea of the way people say things, such as, say, "San Diego". But let's say San Diegans had a different way of saying it. They might just say "Diego". Well, I might need to add "Diego" on its own rather than having only "San Diego".
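One way to picture the "San Diego" example is as several spoken forms that all map to the same meaning. The Python sketch below assumes a simple dictionary layout. `CITY_GRAMMAR`, `build_lookup`, and `city_for` are hypothetical names for illustration, not part of any real grammar format.

```python
# Hypothetical grammar: each meaning lists the ways callers might say it.
CITY_GRAMMAR = {
    "San Diego": ["san diego", "diego"],
}


def build_lookup(grammar):
    """Flatten {meaning: [phrases]} into {phrase: meaning}."""
    lookup = {}
    for meaning, phrases in grammar.items():
        for phrase in phrases:
            lookup[phrase.lower()] = meaning
    return lookup


CITY_LOOKUP = build_lookup(CITY_GRAMMAR)


def city_for(utterance):
    """Return the city a caller meant, or None if the phrase is not covered."""
    return CITY_LOOKUP.get(utterance.strip().lower())
```

Both `city_for("San Diego")` and `city_for("Diego")` return the same value, so the rest of the application does not need to know which variation the caller used.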

Confusability of Grammar Items

Your grammars should cover what the average person would say in response to your question. Not what anyone could possibly say, but what most people would say. Your grammar represents a guide to your speech application, and the more words you have in your grammar, the more chance you have for confusability. What do I mean by "confusability"? Imagine I was designing an application that was all about getting and leaving voicemail messages. There are two commands. One is a command to get the voicemails, and the second one is to leave a voicemail. If I make those commands "Get Message" and "Leave Message", those are very similar words, and there's a lot of confusability in there. They both end in "Message", "Get" and "Leave" are a similar length, and if for some reason a part of that gets clipped or garbled, there is a very good chance that one may be swapped with the other. Imagine if I chose instead to say "Check voicemail" and "Leave Message". Those are completely different. It doesn't matter if a little bit gets garbled because they're so different that the confusability is very, very low.
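To get a feel for confusability, you could score how alike two commands look. The Python sketch below uses text similarity from the standard library's `difflib` as a crude stand-in. That is an assumption made for illustration: real confusability depends on how the phrases sound, not how they are spelled, and a speech platform would measure it differently. The function names `similarity` and `most_confusable` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Crude text-based similarity between two phrases, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def most_confusable(phrases):
    """Return (score, phrase_a, phrase_b) for the most similar pair."""
    return max((similarity(a, b), a, b) for a, b in combinations(phrases, 2))
```

Under this rough measure, "Get Message" and "Leave Message" score noticeably higher than "Check voicemail" and "Leave Message", which matches the intuition that the first pair is easier to swap when part of the audio is clipped or garbled.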

The same idea happens when I take my grammar and add everything that someone could say, whether in jest, or being a smart-aleck or whatever. Then I have a very high chance of confusability, and that in fact will mess up the 95% of people who are actually trying to use your application. Often, you'll have someone say, "I'll test your application". People who are using your application are trying to accomplish a goal. People who are trying to test your application are trying to break it. I could break any application I want. In fact, I could break a touch tone application if I simply mash the keypad. If it actually responded with a menu, I could say, "I just mashed the keypad, why did it advance along?" The same idea happens with speech, but for whatever reason, people feel that speech applications should be artificially intelligent, and in fact they're not. They're programmed with appropriate responses to your questions.

Best Practices for Testing

To a certain degree, this comes back to testing practices. First off, you may test your application and it works great. But the true test of whether it's a good speech application is to have real callers use it who are really trying to accomplish the goals that this application is there for. It's these people that are trying to interpret your prompts and say appropriate things so that they can get the job done. What ends up happening is, the callers will find problems that you didn't know existed. You as a developer did your best to predict what the caller will say. But what we'll find out is that we're a little bit off from what we expect the callers to do, and callers will find problems with your application.

The worst scenario is that you make a very complicated, very large speech application that no real caller has ever tried out. You roll it out to the public, at which point the public finds problems all over your application. Now you're faced with a tough decision. You know you have problems, and they're all solvable once you figure out what you need to do. But in the meantime, people are using your application, and because of those problems, they're losing confidence in it. In fact, you're losing buy-in. More and more of those people are just pressing "0" to get to a live operator without using your application. So you could roll it back, fix it to the best of your ability, and roll it back out. But all along, you could have been producing small versions of your application and having real people test them by using them in an appropriate fashion. That way, you find those problems ahead of time, and this is what we call tuning.

Tuning Speech Applications

The idea of tuning is that I expose my product, people try it, and then I study the results and find out how to make it a better application. You'll find in most cases that half your development happens after your initial deployment to the caller base. There's really no solution to this because it's part of the process, so be prepared for it.
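One simple way to study the results is to count what callers said that the grammar did not cover. The Python sketch below assumes you already have a list of transcribed caller utterances for a given prompt. That log, and the function name `out_of_grammar_report`, are hypothetical, since every platform records calls in its own way.

```python
from collections import Counter


def out_of_grammar_report(call_log, grammar_phrases, top_n=3):
    """Return the most common utterances that the grammar does not cover.

    call_log: transcribed caller utterances for one prompt (assumed input).
    grammar_phrases: the phrases currently in the grammar for that prompt.
    """
    known = {phrase.lower() for phrase in grammar_phrases}
    misses = Counter(
        utterance.strip().lower()
        for utterance in call_log
        if utterance.strip().lower() not in known
    )
    return misses.most_common(top_n)
```

A phrase that many real callers say but the grammar lacks is a good candidate to add, or a sign that the prompt should be reworded. A phrase that shows up once is usually better left out, because every extra entry raises the chance of confusability.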

A reasonable set of guidelines is that you want to keep your application simple. Make it more complex later, after you've tested your theories. Spending a lot of time on, say, a single dialogue to make it just right, only to find out that it doesn't really work for callers, is a waste of time. Keep it simple, and if you want to make it more complex, introduce that complexity in stages. You want to adapt the application to the caller. If, for example, you have some dialogue and the callers are having a really hard time with it, resist trying to give the caller more instruction. Look at adapting the application so you can make the question easier. You might split that question into two or three pieces, and so on, but you want to adapt the application to the caller. Don't try to force the caller to use your application when they find it awkward. I can't stress this enough: deploy early and tune.


  • Designing grammars can be tricky. This video explains some of the best practices when it comes to building grammars for a speech application, and how you can use real-world experiences to best adjust the lists of words you expect users to say. The video guides you through some common traps in basic grammar design and will introduce you to the process known as "tuning," where you allow real callers to use your speech application and adjust it based upon their experiences.


  • Video Playtime: 8:42




© 2016 LumenVox, LLC. All rights reserved.