Practical Guide to Tuning
To give an idea of how much time speech tuning can take, the speech recognition industry estimates
that 40-50% of total development and deployment time should be spent on the tuning process.
Untuned speech applications do not survive contact with customers. Whether your company has live
speech applications in deployment today, plans to implement one within the next three to six months,
or is only beginning to consider adding speech applications, you should consider the importance of
tuning. Tuning uses prompts, grammars, call flow, and caller data to improve the speech application as
a whole, and is critical to the success of your deployment.
There are three ideas to keep in mind when approaching a tuning task:
-
Even the best of "best practices" build on assumptions that might not hold true
after deployment. Once you have callers, you must often readjust or remove these assumptions
to provide the quality experience callers expect. Putting emphasis on tuning will help your application
run more smoothly, keeping both callers and your customer happy.
-
Adapt the System to the Caller. In general, you will not be able to make callers
do anything in any particular way. You can, and should, give callers as much guidance as
possible, but ultimately the caller dictates the conversation. The trick is to provide good cues
and guidelines, so callers choose the pathway you designed for the application. Remember that if
the system fails to meet the caller's needs, it's not the caller who has failed; it's the
speech application.
-
Start with Small Changes. It's all too easy to get caught up in the moment,
expending hours of effort on a seemingly enormous problem that really only affects
a few out of several hundred callers. First identify the issues that are the easiest to resolve
and provide the biggest benefit. Making small changes to improve the experience for most callers
is preferable to costly changes that only benefit a few.
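As a rough illustration of the "small changes first" idea, the sketch below ranks candidate issues by the share of calls they affect relative to the estimated effort to fix them. The TuningIssue fields, the scoring rule, and the sample figures are all assumptions made up for this example; they are not part of any speech platform or of an established tuning method.

```python
from dataclasses import dataclass


@dataclass
class TuningIssue:
    """One candidate tuning issue (hypothetical structure, for illustration only)."""
    name: str
    calls_affected: int   # calls in the reviewed sample where the issue appeared
    effort_hours: float   # rough estimate of the work needed to fix it


def prioritize(issues, total_calls):
    """Return issues sorted so cheap fixes that help many callers come first."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")

    def score(issue):
        share = issue.calls_affected / total_calls
        # Floor the effort at 15 minutes so a zero estimate cannot divide by zero.
        return share / max(issue.effort_hours, 0.25)

    return sorted(issues, key=score, reverse=True)


# Made-up sample values, not measurements from a real deployment.
TOTAL_CALLS = 600
issues = [
    TuningIssue("rare accent misrecognition", 3, 40.0),
    TuningIssue("confusing confirmation prompt", 60, 4.0),
    TuningIssue("missing synonym in main-menu grammar", 120, 2.0),
]

ranked = prioritize(issues, TOTAL_CALLS)
for issue in ranked:
    share = issue.calls_affected / TOTAL_CALLS
    print(f"{issue.name}: {share:.1%} of calls, {issue.effort_hours:g} h estimated")
```

With these sample values, the grammar fix ranks first and the rare misrecognition ranks last, even though the latter might look like the "enormous" problem. Any real scoring rule would need to reflect your own cost and benefit estimates.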
What you shouldn't do when tuning a speech recognition application:
-
Don't Make Changes Based on One Instance. This should be fairly obvious,
but we still see it happen. Making changes based on a single instance usually results in
fixing a problem that doesn't really exist. There are numerous 'one-off' errors in speech
recognition, many of which are associated with noise or transient effects that are not
generally reproducible. Real issues will arise multiple times, in multiple places, with
plenty of evidence to help you decide how to solve them.
-
Don't Make Changes on Unanalyzed Reports. Treat the report with respect:
analyze the call, compare it with other calls, and see what really happened. Often, the system
worked as designed, but the design was flawed. Research the problem carefully so that you
avoid unnecessary (and costly) changes.
Instead, try this process when tuning an application:
-
Familiarize Yourself with the Caller's Experience. Do this by listening to
the calls from start to finish. Compare the speech engine results with the audio
prompts and the caller's speech. Transcribe the audio so you can analyze accuracy and
performance.
Use your Speech Platform's reporting and analytical tools to maximize your information.
Above all, identify the key issues and prioritize them. Solve the easiest problems
first, such as typical grammar issues. Then move to prompt and dialogue changes, and finally
proceed to acoustic model training and adaptation.
-
Test Changes Rigorously. When you make a change, you must test it. Because you
transcribed the calls, you already have the grammar and audio data, so test as much as possible under
'real' conditions. Give yourself the assurance that any