The third part of our introductory tuning video series takes a realistic, hands-on approach to tuning speech recognition systems. The video presents a basic speech tuning methodology and then demonstrates how to apply it to actual call data using the LumenVox Speech Tuner.
Video Playtime: 12:23
This document briefly describes the basics of speech tuning and presents a practical approach to tuning with the LumenVox Speech Tuner. Tuning, the process of altering a speech application based on how people use it, is effective before the initial deployment and vital after the application has been deployed.
The key to tuning is to search for trends: problems in the system that affect a significant number of users. This matters for two reasons. First, time is always limited, so it is important to address the biggest problems first. Second, any change to an application may have harmful side effects. For example, adding more words to a grammar to accommodate a small number of users may raise the misrecognition rate for a much larger number of users. Consider every change carefully and make only the most important ones.
There are two basic approaches to tuning, and both should be used. The first is quantitative: it involves careful statistical analysis of call data. This requires a large amount of transcribed data from the system; once the data has been transcribed, tests can be run to identify trends. While this approach is useful, it is often overemphasized. You can often get the most value from a different, more practical approach.
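As a minimal illustration of the quantitative approach, the following Python sketch ranks out-of-grammar phrases in transcribed data by how often callers said them, keeping only phrases above a share threshold so that fixes target problems affecting many users. The `Interaction` record, its fields, the `out_of_grammar_trends` helper, and the `min_share` cutoff are all hypothetical; this is not a LumenVox Speech Tuner API or export format.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record for one transcribed interaction (not a LumenVox format).
@dataclass
class Interaction:
    grammar: str      # name of the grammar active during the interaction
    transcript: str   # what the caller actually said, from transcription
    in_grammar: bool  # True if the grammar covers the transcript


def out_of_grammar_trends(interactions, min_share=0.05):
    """Return (phrase, count) pairs for out-of-grammar phrases, most frequent
    first, keeping only phrases that make up at least min_share of all
    interactions. min_share is a hypothetical tuning threshold."""
    total = len(interactions)
    if total == 0:
        return []
    # Count lowercased transcripts of interactions the grammar did not cover.
    counts = Counter(i.transcript.lower() for i in interactions if not i.in_grammar)
    return [(phrase, n) for phrase, n in counts.most_common() if n / total >= min_share]


# Illustrative data only: six callers said "David", three said "Dave",
# and one said "Davy".
data = (
    [Interaction("names", "David", True)] * 6
    + [Interaction("names", "Dave", False)] * 3
    + [Interaction("names", "Davy", False)]
)
print(out_of_grammar_trends(data, min_share=0.2))  # [('dave', 3)]
```

Raising `min_share` trades sensitivity for focus: a rare phrasing such as "Davy" drops out, in line with the advice not to widen a grammar for a small number of users.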
A qualitative, more holistic approach often provides more value. The key idea here is to get a sense of the quality of your calls by simply listening to them. This may involve light transcription and statistical analysis, but understanding how your users actually use the system should always be the first step in speech tuning.
To put this approach into practice with the LumenVox Speech Tuner, first use the Call Browser to listen to calls and get a feel for how callers use the system. This often lets you quickly find small problems, many of which are low-hanging fruit. These are easy problems, mostly variations of expected responses. For example, a caller says "Dave" instead of "David." This is a real problem with an easy fix: add "Dave" to the grammar as an alternate way of saying "David."
While listening to these calls, also look for signs of trends, which are the main focus because they point to the main problems. You may find calls that reveal design issues that are not easily solved and may require you to reword your prompts, edit your grammars, or change the entire call flow and application logic.
Once you have identified these issues, you can begin transcribing your interactions and applying a more statistical methodology. There is no reason to invest significant resources in transcription while the system still has glaring problems.
Once you have transcribed data, the LumenVox Speech Tuner can also help you zero in on problems. Its filter can isolate bad interactions and display only those in the Call Browser. Patterns in where bad interactions occur may then become obvious, for example, that they cluster on interactions that use a particular grammar.
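The filtering idea can be sketched in plain Python under similar assumptions: a hypothetical `Interaction` record holding the active grammar, the transcript, and the recognizer's result, where an interaction counts as bad when there is no result or the result differs from the transcript. Grouping bad interactions by grammar surfaces the kind of concentration described above. The helpers `is_bad` and `error_rate_by_grammar` are illustrative names; this mimics, but is not, the Speech Tuner's filter.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one transcribed interaction (not a LumenVox format).
@dataclass
class Interaction:
    grammar: str               # grammar active during the interaction
    transcript: str            # what the caller actually said
    recognized: Optional[str]  # recognizer result, or None on a no-match


def is_bad(i: Interaction) -> bool:
    # Assumption: "bad" means no result, or a result that differs from the
    # transcript (compared case-insensitively).
    return i.recognized is None or i.recognized.lower() != i.transcript.lower()


def error_rate_by_grammar(interactions):
    """Return (grammar, share of bad interactions) pairs, worst first."""
    totals = Counter(i.grammar for i in interactions)
    bad = Counter(i.grammar for i in interactions if is_bad(i))
    # A Counter returns 0 for grammars with no bad interactions.
    rates = {g: bad[g] / n for g, n in totals.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only.
calls = [
    Interaction("yes_no", "yes", "yes"),
    Interaction("yes_no", "no", "no"),
    Interaction("yes_no", "yes", "yes"),
    Interaction("yes_no", "yeah", None),
    Interaction("account", "1234", "1234"),
    Interaction("account", "5678", "5670"),
]
bad_calls = [i for i in calls if is_bad(i)]  # what a "bad only" filter would show
print(len(bad_calls))                        # 2
print(error_rate_by_grammar(calls))          # [('account', 0.5), ('yes_no', 0.25)]
```

Ranking by rate rather than raw count keeps a small but badly performing grammar visible; in practice you would also check the counts so that a handful of calls does not dominate the ranking.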
Using these tips, you should be able to significantly reduce the amount of time you spend in the initial phases of tuning a speech recognition application.
© 2017 LumenVox, LLC. All rights reserved.