Practical Guide to Tuning
Untuned speech applications do not survive contact with customers. Whether your company has live speech applications in
deployment today, plans to implement one within the next three to six months, or is only beginning to consider adding speech
applications, you should consider the importance of tuning. Tuning uses prompts, grammars, call flow, and caller data to
improve the speech application as a whole, and is critical to the success of your deployment.
There are three ideas to keep in mind when approaching a tuning task:
- Make Time for Tuning. Even the best of "best practices" build on assumptions that might not hold true after deployment. Once you
have callers, you must often adjust or remove these assumptions to provide the quality experience callers expect.
To give an idea of how much time tuning can take, the speech industry estimates that 40–50% of total development and
deployment time should be spent on the tuning process. Putting emphasis on tuning will help your application run more
smoothly, keeping both your callers and your customer happy.
- Adapt the System to the Caller. In general, you cannot make callers do anything in a
particular way. You can, and should, give callers as much guidance as possible, but ultimately the caller dictates
the conversation. The trick is to provide good cues and guidelines, so callers choose the pathway you designed for the
application. Remember that if the system fails to meet the caller's needs, it's not the caller who has failed; it's the
- Start with Small Changes. It's all too easy to get caught up in the moment and spend hours of effort on
a seemingly enormous problem that really only affects a few out of several hundred callers. First, identify
the issues that are the easiest to resolve and provide the biggest benefit. Making small changes that improve the experience
for most callers is preferable to making costly changes that benefit only a few.
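One way to make "easiest to resolve, biggest benefit" concrete is to rank each candidate issue by the share of callers it affects per hour of estimated effort. The sketch below is only an illustration of that idea: the Issue fields, the scoring formula, and the sample numbers are all assumptions made for this example, not part of any speech platform's tooling.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    callers_affected: int  # callers in the sample who hit this issue
    effort_hours: float    # rough estimate of the work to fix it (must be > 0)


def prioritize(issues, total_callers):
    """Order issues by share of callers helped per hour of effort, best first."""
    def benefit_per_hour(issue):
        share = issue.callers_affected / total_callers
        return share / issue.effort_hours
    return sorted(issues, key=benefit_per_hour, reverse=True)


# Made-up numbers, for illustration only.
issues = [
    Issue("rare accent misrecognition", callers_affected=3, effort_hours=40.0),
    Issue("missing synonym in main-menu grammar", callers_affected=120, effort_hours=2.0),
    Issue("confusing confirmation prompt", callers_affected=60, effort_hours=4.0),
]

for issue in prioritize(issues, total_callers=500):
    print(issue.name)
```

With these sample figures, the small grammar fix ranks first and the costly fix that helps only three callers ranks last. A real scoring scheme would likely weigh other factors too, such as how badly each issue hurts the callers it affects.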
What you shouldn't do when tuning a speech recognition application:
- Don't Make Changes Based on One Instance. This should be fairly obvious, but we still see it happen. Making
changes based on a single instance usually results in fixing a problem that doesn't really exist. There are numerous 'one-off'
errors in speech recognition, many of which are associated with noise or transient effects that are not generally reproducible.
Real issues will arise multiple times, in multiple places, with plenty of evidence to help you decide how to solve them.
- Don't Make Changes on Unanalyzed Reports. Treat the report with respect: analyze the call, compare it with other
calls, and see what really happened. Often, the system worked as designed, but the design was flawed. Research the problem
carefully so that you avoid unnecessary (and costly) changes.
Instead, try this process when tuning an application:
- Familiarize Yourself with the Caller's Experience. Do this by listening to the calls, from start to finish.
Compare the speech engine results against the audio prompts and the caller's speech. Transcribe the audio so you can
analyze the accuracy and performance.
Use your Speech Platform's reporting and analytical tools to get as much information as possible. Above all, identify the key issues
and prioritize them. Solve the easiest problems first, such as typical grammar problems. Then move to prompt and dialogue changes,
and finally proceed to acoustic model training and adaptation.
- Test Changes Rigorously. When you make a change, you must test it. You did the transcripts, and so you have the
grammar and audio data: as much as possible, test under 'real' conditions. Give yourself the assurance that any