Video Introduction to Speech Application Design



  • This video provides an introduction to designing speech applications. It will give you an overview of the purpose of adding speech to interactive voice response systems, some of the differences between speech and touchtone/DTMF applications, and an idea of some of the pitfalls you can run into when designing speech software. The video also discusses some of the psychology of speech recognition, and how the ways people relate to talking computers can work both for and against the application developer.
  • RUNTIME 5:15


Video Transcription

Introduction to Speech Application Design

Hi, I'm Kyle Danielson and I work for LumenVox. This section is going to be an introduction to speech applications. I'm going to cover some ideas that we will be expanding upon in the next few sections about how to design a speech recognition application and the ins and outs of that process.

Now the first thing I want to talk about is what we want to do with speech applications. As it turns out, we do the same things with speech applications that we do with TouchTone or DTMF applications. For example, when a caller calls in, we try to route them to the appropriate place. Or I might provide information to them, such as getting the status of an order and delivering that to them. Additionally, I may try to complete a transaction, which is a little more complicated: I might move money, say, from a savings account over to a checking account.

Now how am I going to do this? Well I'll do it the same way I do it with DTMF applications or TouchTone applications. I'll have an audio prompt and that audio prompt's purpose is to tell the caller what they need to do to progress the call along to their goal. With DTMF applications, it would be a menu of choices and with a speech application it would be a question.

Once the caller responds to that, which is the caller input, I'll either go ahead and progress the call along or, with speech applications, I have a little extra step where I might actually confirm what they said. With speech, there is always some question as to how sure we are that what we report the caller saying is actually what they said, and that certainty is expressed as a number, a score. Based on that score we may do a confirmation, and the call at that point can move on.
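
The confirmation step described above can be sketched in a few lines of Python. This is a minimal illustration, not LumenVox's API: the threshold values, the `Action` names, and the `next_action` function are all assumptions made for the example, and the re-prompt branch for very low scores is an addition that the video does not mention.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"      # move the call along without confirming
    CONFIRM = "confirm"    # ask "Did you say ...?" before moving on
    REPROMPT = "reprompt"  # ask the original question again


# Hypothetical thresholds on a 0.0-1.0 confidence scale.
# Real values would be tuned per application and per question.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.45


def next_action(confidence: float) -> Action:
    """Decide what to do with a recognition result based on its score."""
    if confidence >= ACCEPT_THRESHOLD:
        return Action.ACCEPT
    if confidence >= CONFIRM_THRESHOLD:
        return Action.CONFIRM
    return Action.REPROMPT
```

With these assumed thresholds, a score of 0.95 is accepted outright, a score of 0.60 triggers a confirmation, and a score of 0.10 leads to asking the question again.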

The next thing is what we are going to add with speech: what's the reason we want to put speech in our app? It's because we have more options with speech. We have the ability to ask a city and state question such as "Please tell me the city and state you are traveling from," and the person could respond with "San Diego, California." I might be able to get that kind of information with a DTMF or TouchTone application, but I would have to ask for something like the ZIP code, and the caller may or may not know the ZIP code of where they're traveling.

As I said before, an audio prompt in a TouchTone application is a menu, but I did make the distinction that in a speech application it's a question. So what does that mean? It means I have to ask questions of the caller and the caller is going to respond back to me. This is distinctly different from the menu. If I had a menu of nine choices and I tried to ask that as a question, it becomes a very hard question to ask and a very hard question to answer, and that's really not appropriate.

What ends up happening is I need to design my application so I have easy questions to ask and the caller can respond in an easy fashion, and that's really what's going to make a good speech app. In this process of asking questions the voice can have more personality. It's very hard to put a personality into a menu, but when I'm asking someone a question I can be friendly, I can be serious, I can be concerned. I have a lot I can work with, so in effect you can make a personality that is branded towards your company. This is the way you want your company perceived, and that's a very powerful thing you can do.

Some of this does impose a greater responsibility on the application developer to do a good job, and the reason is that callers will answer questions inappropriately. People see talking computers all the time. They see them on TV, they see them in movies, and the thing about these talking computers is they always seem to be artificially intelligent, so you have an interaction between the caller and the computer that is very much like a human-to-human interaction.

This is really the point of reference for most people when it comes to a speech application. So if you ask a question like "Where do you want to go?" the person could quite legitimately say, "I want to go to my mother's house," or, "I'm going to upstate New York," but you weren't prepared for that. And even if you had a grammar that covered all the possibilities, you would still have to program what those possibilities mean.

What we end up doing is we try and narrow our questions down and we say things like, "Please tell me the city and state you would be traveling to." Now I can predict where that's going.
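
To make the idea of a narrow question concrete, here is a small Python sketch of what "programming what the possibilities mean" could look like for the city and state prompt. It is only an illustration under stated assumptions: the `CITIES` table, the `interpret` function, and the three entries are hypothetical, and a real application would use a proper speech grammar with far more entries rather than plain string matching.

```python
import re
from typing import Optional

# Hypothetical, tiny "grammar": each phrase the application expects,
# paired with the meaning the application should act on.
CITIES = {
    ("san diego", "california"): {"city": "San Diego", "state": "CA"},
    ("albany", "new york"): {"city": "Albany", "state": "NY"},
    ("austin", "texas"): {"city": "Austin", "state": "TX"},
}


def interpret(utterance: str) -> Optional[dict]:
    """Return the meaning of a city-and-state answer, or None if the
    answer falls outside what the narrow question was designed for."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = " ".join(re.sub(r"[^a-z ]", " ", utterance.lower()).split())
    for (city, state), meaning in CITIES.items():
        if text == f"{city} {state}":
            return meaning
    return None
```

Here `interpret("San Diego, California")` returns the San Diego entry, while open-ended answers such as "I want to go to my mother's house" return `None`, which is the case the narrower prompt is meant to make rare.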

These challenges will be covered in the following sections, and what you'll find is that there are tips and tricks for making this happen. We will talk about prompts, we will talk about grammars, and we'll talk about the best practices for creating and deploying your application.

© 2018 LumenVox, LLC. All rights reserved.