Video Speech Application Development - Prompts



  • The most important pieces in any interactive voice response system are the prompts, as these are what your callers interact with. They'll represent your company, and in a speech recognition system, the prompts will determine the type of responses users give. This video will walk you through the basics of prompt design for speech-enabled IVRs, including picking the right voice for your application, prompt length, and how to create prompts that will lead your users to provide the sort of speech input your application is expecting.
  • RUNTIME 9:39


Video Transcription

Speech Application Development - Prompts

In this section we're going to talk about prompts. Prompts are a really important part of your speech application, because they're a front-facing part of your creation. This is what the caller is going to hear, and so it is the most important thing about your app. If your callers don't understand your prompts, your application is not going to do well. If your prompts misrepresent what your application is supposed to do, your application isn't going to do well.

The first thing I want to say is, don't worry about making them perfect right off the bat. Through the process of experimentation, you'll find that your prompts grow with your application, and represent it very well.

Along the way, there are a few things that will make your application great:

Prompts need to elicit a predictable response from the caller

This is because, with speech recognition, you have to be able to predict exactly what the caller is going to say. There is a little bit of fudge room, but in general, when a person says, "I want a cheeseburger," you need to know they're going to say "I want," or you need to change that prompt so that all they say is "cheeseburger." So when you're designing your prompts, even if they sound great, if your caller says something that you weren't expecting, that prompt is not going to work.
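To make the cheeseburger example concrete, here is a minimal Python sketch of the idea. It is not how any particular recognizer works: the menu items, the carrier phrases, and the `match_utterance` function are all hypothetical, and a real application would express them in the recognizer's own grammar format. It only shows that the application has to anticipate both "I want a cheeseburger" and plain "cheeseburger," and that anything outside those expectations fails.

```python
import re

# Hypothetical menu items and carrier phrases, invented for illustration.
# A real application would define these in its speech grammar instead.
MENU_ITEMS = {"cheeseburger", "hamburger", "fries"}
CARRIER_PHRASES = ("i want a", "i want", "i'd like a", "i'd like")


def match_utterance(utterance):
    """Return the menu item the caller asked for, or None if unexpected."""
    # Lowercase, drop punctuation (keeping apostrophes), collapse spaces.
    text = re.sub(r"[^a-z' ]", "", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()

    # Strip one anticipated carrier phrase such as "I want a".
    for phrase in CARRIER_PHRASES:
        if text.startswith(phrase + " "):
            text = text[len(phrase) + 1:]
            break

    # Anything we did not predict is treated as a failed match.
    return text if text in MENU_ITEMS else None
```

With these assumed lists, "I want a cheeseburger" and "Cheeseburger." both match, while "Give me a cheeseburger, please" does not, which is a sign that either the grammar or the prompt needs to change.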

The prompt needs to have an appropriate personality for the application

I don't want to have a bubbly person helping me with my banking transaction. You want to have a voice that people trust to help them reach their solution.

Don't make the prompt too long-winded

If the prompt drones on and on, the caller will get bored listening to it. Likewise, they're going to think it's going to take them too long to reach their destination. You want your prompts to be as concise as you can make them. I have had many applications where the stakeholder insisted that we had to play all of these prompts ahead of time, and so we had about 45 to 60 seconds of prompts saying various things to the caller. Finally, the application asked the question, "Would you like to try the voice ordering system?" and the caller answered, "No!" They went right to a live operator, which is the thing we really try to avoid.
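One way to catch a long-winded opening before callers do is to estimate how long the prompt text takes to speak. The Python sketch below is only a rough check under stated assumptions: the speaking rate of 150 words per minute and the 15-second budget are hypothetical numbers chosen for illustration, not figures from this video, and timing your actual recordings is more reliable.

```python
# Assumed values for illustration only; measure your own recordings.
WORDS_PER_MINUTE = 150   # assumed average speaking rate
MAX_SECONDS = 15         # hypothetical budget before the first question


def estimated_seconds(prompt_text, words_per_minute=WORDS_PER_MINUTE):
    """Estimate the spoken duration of a prompt from its word count."""
    words = len(prompt_text.split())
    return words * 60.0 / words_per_minute


def too_long(prompts, max_seconds=MAX_SECONDS):
    """Return True if the prompts played back to back exceed the budget."""
    total = sum(estimated_seconds(p) for p in prompts)
    return total > max_seconds
```

Running `too_long` over the list of prompts that play before the caller's first turn gives an early warning when an introduction is drifting toward the 45 to 60 seconds described here.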

Use dialog / conversational behaviors

The key idea is that we are building upon the normal way that people speak to one another. As we'll find out, most people are very polite, and so they will avoid interrupting the prompt. With speech applications, it's very important to give callers the ability to barge in. Let's say I have a call router and I want to give people the chance to say their first and last name or a department as soon as they can. I might have the prompt be like this: "Please tell me the first and last name of the person you wish to speak to, or you can tell me a department. The departments available are Sales, Marketing, and Technical Support." What I've done is given the caller a chance to take a turn and tell me what they want before I start going into a list of possible choices.

Consider Emotional Transference

There is a certain amount of emotional understanding between the caller and the way the prompts are talking to them. So if a prompt is talking very fast to the caller, the caller will try to respond in a fast manner. If the prompt is slower, the caller will feel like they have more time to answer it. It's very important if you're going to ask the caller a difficult question that the prompt slows down so the caller does not feel rushed. If the customer feels rushed, they have a tendency to stumble over their words, and that's terrible for speech recognition. It will decrease the confidence score to the point where the recognizer will throw the result out, or the application will have to do a confirmation, or it will just get it wrong.
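The three outcomes just described (throw it out, confirm, or act on it) usually come down to comparing the confidence score against thresholds. Here is a minimal Python sketch of that decision. The 0.0 to 1.0 scale, the threshold values, and the `next_action` function are assumptions made for illustration; real engines use their own scales and defaults.

```python
# Hypothetical thresholds on an assumed 0.0-1.0 confidence scale.
# Tune these against recordings of real calls.
REJECT_BELOW = 0.30
CONFIRM_BELOW = 0.70


def next_action(confidence):
    """Decide what the application does with a recognition result."""
    if confidence < REJECT_BELOW:
        return "reject"    # throw the result out and re-prompt
    if confidence < CONFIRM_BELOW:
        return "confirm"   # ask the caller to confirm before acting
    return "accept"        # act on the result directly
```

A rushed caller who stumbles over their words pushes scores down into the "confirm" or "reject" bands, which is why slowing the prompt down on difficult questions pays off.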

Another idea is that you're trying to build a connection between the caller and the prompts. To achieve this, we want to make sure that our prompt volume and personality are consistent. If prompts are louder and then quieter, it breaks the psychological effect. A change of personality, where it's serious at one moment and happy the next, also breaks that build-up that you've attained. To a certain degree, you want to make a connection to the caller. Let's take an example of a longer transaction, such as transferring money in a bank account from savings to checking. At the beginning of this transaction, I want to be serious. In the middle of the transaction I want to be encouraging: "You're almost there." At the end of the transaction, I'll be congratulatory: "Great job, we're all set." Then you'll move on to the next transaction for this caller.

Touch Tone vs. Speech (Menu vs. Question)

You never want to mimic a DTMF application with a speech application, for the reason that they're built on different paradigms. A DTMF application is a menu-based application. You do not want your speech application to mimic this, because a speech application is a question-based system. So let's say you have a menu of eight items. You're going to try to directly reproduce that into a speech application. Right off the bat, people are going to think, "For Sales, say 'Sales' or press 1. For Marketing, say 'Marketing' or press 2." This never works. This makes for a very complicated application, because normally when people use touch-tone applications, they'll think "Oh, 3 sounds about right, but let me listen to the rest just to make sure." And then once they've decided that 3 was in fact the one they wanted, they'll press 3. Now imagine they're trying to remember a word instead; they're trying to remember "Marketing." They've listened to all the other choices and now they've forgotten the word. I can put my finger on the 3 button, but I can't necessarily put my finger on the word "Marketing." So again, you'll want to avoid menus.


Confirmations

What you want to do with confirmations is confirm the intent of the caller, and not what they actually said. When a customer, say, asks for "Customer Support," and what you actually have is a Technical Support department, you don't want to ask the caller, "Did you say Technical Support?" because in fact what they said was Customer Support. Oftentimes in speech applications you'll have several words in your grammar that represent one thing. So in your grammar, you may have "Customer Support," "Technical Support," and "Customer Service" all going to the same department, but you don't typically have prompts for all of those choices. So what you do is you confirm the intent: "Do you want to speak to our Customer Service department?" and they'll say yes. I'm not going to ask what they said, I'm going to ask if they want to do something.
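In code, confirming the intent amounts to two lookups: the recognized phrase maps to an intent, and the intent maps to a single confirmation prompt. The Python sketch below illustrates this using the department example. The dictionaries, intent names, and `confirmation_prompt` function are hypothetical, and a real application would normally get the intent from the grammar's semantic result rather than from a table like this.

```python
# Hypothetical grammar: several phrases map to one intent.
PHRASE_TO_INTENT = {
    "customer support": "customer_service",
    "technical support": "customer_service",
    "customer service": "customer_service",
    "sales": "sales",
}

# One confirmation prompt per intent, not per phrase.
INTENT_PROMPTS = {
    "customer_service": "Do you want to speak to our Customer Service department?",
    "sales": "Do you want to speak to our Sales department?",
}


def confirmation_prompt(recognized_phrase):
    """Return the prompt confirming the caller's intent, or None if unknown."""
    intent = PHRASE_TO_INTENT.get(recognized_phrase.lower().strip())
    if intent is None:
        return None
    return INTENT_PROMPTS[intent]
```

Whether the caller said "Customer Support" or "Technical Support," they hear the same question about what they want to do, and the application never has to repeat back a phrase it may have misheard.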

In the next section we'll talk about grammars and best practices for developing your application.

© 2016 LumenVox, LLC. All rights reserved.