Speech tuning is a vital part of building and maintaining any successful speech automation application (for more information, see our online article The Importance of Tuning). Speech tuning, the process of changing an application based on data gleaned from real-world use, improves recognition accuracy and provides a host of vital metrics such as call completion rates, containment rates, and user experience scores. Most importantly, these changes affect the bottom line of anyone who builds, deploys, hosts, or purchases any kind of interactive voice response (IVR) or other application that makes use of automatic speech recognition (ASR) or text-to-speech (TTS) technology.
Because tuning has a direct relationship with how well a voice application functions, tuning an application tends to increase its success rate, providing a faster return on investment (ROI). Tuning also affects the customer experience, providing benefits which are harder to measure directly compared to ROI. Since tuning is something that must be done periodically in order to ensure the application is still performing at optimal levels, the efficiency with which an organization is able to perform tuning will be a factor in understanding the total cost of ownership (TCO) of an application.
The LumenVox Speech Tuner, a unique tool developed by LumenVox, serves two key purposes. First, it makes tuning easier and faster, allowing applications to be tuned in less time, providing a faster ROI and lowering the TCO of voice applications. Second, the Speech Tuner allows organizations to find and fix issues they might not otherwise be aware of, which improves the experience of users and customers of the application, increasing loyalty and satisfaction.
Many users of early speech applications will remember experiences with applications that performed poorly due to insufficient tuning. "It didn't understand me," or, "It pronounced that name wrong," are common critiques users have of these applications. Speech tuning helps to improve that situation by making adjustments to the components of the speech application, including:
In all these cases, tuning requires a few things:
The costs and benefits of tuning vary significantly depending on the application and the work required to tune it, but we can look at a generic case to understand how, even though it is time consuming, tuning is almost always worth the effort. Consider a typical IVR application which replaces or supplements live agents in a call center.
Using industry-standard costs for agents, telephony charges, and other costs, we can estimate the cost per minute for an agent to handle a call at $1.045. By comparison, a typical IVR that uses ASR and TTS technology costs between $0.10 and $0.25 per minute to build, operate, and maintain over a 3-5 year lifespan. Taking the midpoint of that range, $0.175 (see note 1), each minute that the IVR handles a call instead of an agent saves about $0.87 on average.
Assume that a newly deployed IVR has a recognition accuracy rate of 80%. The IVR asks each caller 5 questions, and if 3 or more recognition errors occur, the call is transferred to an agent. This gives a call containment rate of 94.21% (see note 2). If we assume that each call that is not contained goes to an agent and takes 2.5 minutes of the agent's time (a conservative estimate if you factor in the setup/teardown times for agents), then the cost per call that is not contained is $2.175 (this is the difference between 2.5 minutes of an agent's time and 2.5 minutes of the IVR's time).
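The containment figure can be checked with a few lines of Python. This is a minimal sketch that treats each of the 5 questions as an independent trial, which is what note 2 implies; the function and constant names are illustrative, and the $0.175 IVR cost is the midpoint described in note 1.

```python
from math import comb

AGENT_COST_PER_MIN = 1.045   # dollars, from the article
IVR_COST_PER_MIN = 0.175     # dollars, midpoint of $0.10 and $0.25 (note 1)
AGENT_MINUTES = 2.5          # agent time per call that is not contained


def containment_rate(accuracy: float, questions: int = 5, max_errors: int = 2) -> float:
    """Probability of at most `max_errors` recognition errors in `questions` tries.

    Assumes each recognition is independent (a simple binomial model).
    """
    error = 1.0 - accuracy
    return sum(
        comb(questions, k) * error**k * accuracy ** (questions - k)
        for k in range(max_errors + 1)
    )


# Cost of one call that falls through to an agent instead of staying in the IVR.
cost_per_uncontained_call = AGENT_MINUTES * (AGENT_COST_PER_MIN - IVR_COST_PER_MIN)

print(f"Containment at 80% accuracy: {containment_rate(0.80):.2%}")
print(f"Cost per uncontained call: ${cost_per_uncontained_call:.3f}")
```

Real callers do not behave like independent coin flips (a caller who is misrecognized once is often misrecognized again), so the binomial model is best read as a rough planning estimate.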
Thus a medium-sized call center with 175 agents that handles 1.5 million calls per year would experience a cost of $188,898.75 per year due to ASR errors (1.5 million calls times a failure rate of 5.79% times $2.175 per non-contained call).
If the tuning exercise cut the ASR error rate in half, which is normal for the first tuning cycle, then the new call containment rate would be 99.14% — notice how a seemingly small gain in ASR accuracy greatly improves call containment. Using the same calculations as above, the call center would now only be spending $28,057.50 per year on uncontained calls.
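To see how sharply containment responds to accuracy, the same binomial model can be swept over several accuracy levels. The sketch below is illustrative only: the 85% and 95% rows are extra data points that do not appear in the article, the helper names are hypothetical, and the failure rate is rounded to two decimal places of a percent to match the rounding used in the figures above.

```python
from math import comb

CALLS_PER_YEAR = 1_500_000
COST_PER_UNCONTAINED_CALL = 2.175  # dollars, from the article


def containment_rate(accuracy, questions=5, max_errors=2):
    """Chance of at most `max_errors` errors in `questions` independent tries."""
    error = 1.0 - accuracy
    return sum(
        comb(questions, k) * error**k * accuracy ** (questions - k)
        for k in range(max_errors + 1)
    )


def annual_error_cost(accuracy):
    """Yearly cost of calls that fall through to agents because of ASR errors."""
    # Round the failure rate the same way the article does (e.g. 5.79%).
    failure_rate = round(1.0 - containment_rate(accuracy), 4)
    return CALLS_PER_YEAR * failure_rate * COST_PER_UNCONTAINED_CALL


for accuracy in (0.80, 0.85, 0.90, 0.95):
    print(
        f"accuracy {accuracy:.0%}: "
        f"containment {containment_rate(accuracy):.2%}, "
        f"annual cost ${annual_error_cost(accuracy):,.2f}"
    )
```

With 80% and 90% accuracy as inputs, this should reproduce the before and after costs quoted in the text.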
General industry guidelines suggest that half of an application's development time be spent on tuning. If we assume that a relatively simple IVR with just 5 dialogs represents approximately 100 hours of development time (20 hours per dialog is a reasonable approximation, though it will vary greatly depending on the complexity of a given dialog), then 50 hours would be appropriate to allocate for tuning. Because tuning is a highly skilled task, we might expect to spend roughly $300 per hour in fully-burdened costs for a speech expert to perform the tuning. The cost of tuning would thus be $15,000.
This gives us a net savings of $160,841.25 per year on an investment of just $15,000 — the tuning costs pay for themselves in less than two months of operations. Because speech automation is so much more cost effective than live agents to begin with, almost any improvement in performance translates to real savings. A similar pattern holds even for smaller applications with lower volumes.
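The payback claim is simple arithmetic and can be reproduced directly. This sketch takes the article's figures as inputs and assumes savings accrue evenly across the year; the variable names are illustrative.

```python
# Figures taken from the article.
cost_before_tuning = 188_898.75   # annual cost of uncontained calls at 80% accuracy
cost_after_tuning = 28_057.50     # annual cost after the ASR error rate is halved
tuning_hours = 50
hourly_rate = 300.0               # fully-burdened cost of a speech expert

tuning_cost = tuning_hours * hourly_rate
annual_savings = cost_before_tuning - cost_after_tuning

# Assumes savings are spread evenly over 12 months.
payback_months = tuning_cost / (annual_savings / 12)

print(f"Tuning cost: ${tuning_cost:,.2f}")
print(f"Annual savings: ${annual_savings:,.2f}")
print(f"Payback period: {payback_months:.2f} months")
```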
None of the above even factors in other improvements that come from tuning, such as a better call completion rate (fewer people hang up in frustration) and higher customer satisfaction from an improved user experience. It is clear that tuning provides a major return on investment.
The case for tuning becomes even stronger when paired with the LumenVox Speech Tuner. The cost of a Speech Tuner license is almost always paid for and then some by a single tuning effort. We estimate that the Speech Tuner cuts down tuning times by approximately 50% compared to attempting tuning "by hand" without a specialized tool. The Tuner provides very valuable features such as:
Let's look at the previous example, where 50 hours of tuning cost $15,000 but saved $160,841.25 per year. One seat of the Speech Tuner costs $1,000 per year, but can reduce that tuning cost to $7,500, a saving of $7,500. After subtracting the license cost, that is a net return of $6,500 from buying the Speech Tuner for just one tuning cycle. The Tuner paid for itself 6.5 times over. If the Tuner is used for multiple projects per year, the returns add up.
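The license arithmetic can be laid out the same way. This is a sketch under the article's own assumptions (a 50% reduction in tuning time and a $1,000 annual seat); the function name and the `cycles_per_year` parameter are hypothetical, added to show how the return scales when the Tuner is used on more than one tuning effort.

```python
def tuner_net_return(
    baseline_tuning_cost: float = 15_000.0,  # tuning cost "by hand", from the article
    time_reduction: float = 0.50,            # LumenVox's estimated time savings
    license_cost: float = 1_000.0,           # one Speech Tuner seat per year
    cycles_per_year: int = 1,                # hypothetical: tuning efforts per year
) -> float:
    """Net yearly return from one Speech Tuner seat, after the license cost."""
    savings_per_cycle = baseline_tuning_cost * time_reduction
    return savings_per_cycle * cycles_per_year - license_cost


license_cost = 1_000.0
net = tuner_net_return()
print(f"Net return for one cycle: ${net:,.2f}")
print(f"Multiple of license cost: {net / license_cost:.1f}x")
print(f"Net return for three cycles: ${tuner_net_return(cycles_per_year=3):,.2f}")
```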
Furthermore, an important part of application development is periodic tuning of running applications in order to verify that they are working correctly and to account for any new usage patterns. Thus tuning costs over the lifetime of an application must be factored into its total cost of ownership.
Several factors combine to make speech tuning an investment with a clearly positive return. Because speech automation enjoys such a cost advantage over handling tasks with live agents, minor improvements in application performance translate quickly into major savings. Though speech tuning often seems expensive in absolute terms, when compared to the money it saves through improvements in performance, it generally pays for itself well within a single year.
And the LumenVox Speech Tuner tool, which can cut tuning costs in half, easily enjoys a positive return on even a single tuning project of almost any size.
1 $0.175 is the average per-minute cost, taken as the midpoint of the industry-standard range of $0.10 to $0.25 per minute.
2 This is the chance that there are 2 or fewer errors in 5 trials with a probability of success of 0.8.
© 2016 LumenVox, LLC. All rights reserved.