Designing a good speech application is often a balancing act between accuracy and flexibility. Every option a developer makes available to the user increases the number of words the speech engine must listen for, and with it the chance that the engine will misrecognize what users say. Nowhere is this balancing act more evident than in the two general ways of letting users respond to speech applications.
The first is called natural language. In this type of response, a user is asked a broad question such as "What would you like to do?" and is allowed to respond naturally. In a banking application, the responses might be diverse, including things like, "I need to know my balance," or, "Make a withdrawal."
Natural language response systems are, in theory, great for users because there is no learning curve. The interface is completely intuitive and reflects how people speak. The downside is that the developers of such a system need to be able to predict everything a user might say because they must build grammars in advance that contain each word and phrase to be recognized. As the size of the grammar goes up, the chance of misrecognition also increases, so these natural language systems tend to have accuracy problems.
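To make the coverage problem concrete, here is a minimal sketch in Python. It is not how any particular speech engine works: the phrase table `NATURAL_GRAMMAR`, the intent names, and the `match_intent` helper are hypothetical, and recognition is reduced to exact matching on text rather than audio.

```python
from typing import Dict, Optional

# Toy stand-in for a pre-built grammar: each accepted phrase maps to an intent.
# Every name here is an illustrative assumption, not a real speech engine API.
NATURAL_GRAMMAR: Dict[str, str] = {
    "i need to know my balance": "balance",
    "what is my balance": "balance",
    "how much money do i have": "balance",
    "make a withdrawal": "withdraw",
    "i want to take out some money": "withdraw",
}


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = [ch for ch in utterance.lower() if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


def match_intent(utterance: str, grammar: Dict[str, str]) -> Optional[str]:
    """Return the intent for an utterance, or None if it is out of grammar."""
    return grammar.get(normalize(utterance))


# A phrasing the developer predicted is matched to an intent.
print(match_intent("I need to know my balance.", NATURAL_GRAMMAR))
# A reasonable phrasing nobody predicted falls outside the grammar.
print(match_intent("Could you tell me what I've got in checking?", NATURAL_GRAMMAR))
```

The second utterance is a perfectly natural request, but it fails because no one added it to the table. Covering it means adding more entries, which is exactly the growth in grammar size that the paragraph above ties to lower accuracy.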
The other type of response is directed dialogue. A directed dialogue system presents users with a range of options and prompts them to pick one. Such a system might say to users, "Please say what you'd like to do: hear your balance, or make a withdrawal."
Because users are being directed to say a specific option, there are fewer likely responses to account for in the grammar. This makes directed dialogue systems more accurate at recognizing speech than natural language systems. It also means directed dialogue systems are easier to develop and require less time to troubleshoot and test. On the other hand, they are a little less flexible from a user's perspective and they are less natural to use.
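A directed dialogue turn can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `DIRECTED_GRAMMAR`, `recognize`, and `run_dialogue` are hypothetical names, caller speech is represented as text, and the re-prompt limit of three attempts is an arbitrary choice.

```python
from typing import Dict, List, Optional, Tuple

PROMPT = "Please say what you'd like to do: hear your balance, or make a withdrawal."

# The grammar only needs the options named in the prompt plus a few close variants.
# These entries and action names are illustrative assumptions.
DIRECTED_GRAMMAR: Dict[str, str] = {
    "hear my balance": "balance",
    "hear your balance": "balance",
    "balance": "balance",
    "make a withdrawal": "withdraw",
    "withdrawal": "withdraw",
}


def recognize(utterance: str) -> Optional[str]:
    """Return the action for an utterance, or None when it is not in the grammar."""
    cleaned = utterance.lower().replace(".", "").replace(",", "")
    return DIRECTED_GRAMMAR.get(" ".join(cleaned.split()))


def run_dialogue(utterances: List[str], max_attempts: int = 3) -> Tuple[Optional[str], List[str]]:
    """Play the prompt before each caller attempt and stop at the first match."""
    prompts_played: List[str] = []
    for utterance in utterances[:max_attempts]:
        prompts_played.append(PROMPT)
        action = recognize(utterance)
        if action is not None:
            return action, prompts_played
    # No attempt matched; a real application would typically hand off or fall back here.
    return None, prompts_played


action, prompts_played = run_dialogue(["um, money please", "Make a withdrawal."])
print(action, len(prompts_played))
```

Because the prompt tells callers what to say, the table stays small, and an out-of-grammar reply simply triggers the same prompt again instead of requiring a new grammar entry.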
Whether to use a natural language or directed dialogue approach for an application (or an individual component of an application) is one of the most important decisions in designing a speech application. Many developers begin planning their applications with unrealistic expectations of the technology and assume they can build effective and complex natural language applications.
In reality, the majority of speech applications in use today are of the directed dialogue variety. When combined with well-designed prompts and call flows, the higher accuracy of directed dialogue applications can offer a user experience that is superior to the natural language approach. Designers, especially those new to speech applications, need to be wary of planning systems that attempt to do too much.
© 2016 LumenVox, LLC. All rights reserved.