I bought a fancy new car about five years ago; all the bells and whistles, every luxury upgrade that money could buy. At 3,000 miles, I got an oil and filter change and put air in the tires. I haven’t taken it to the mechanic since. It’s still running—maybe not as well as it used to—but as long as it gets me from point A to point B, why should I do anything else?
Okay, that’s not a true story. Yet I see this happen all the time with the Cadillac of self-service investments: speech recognition applications. Despite the initial time, effort, and dollars spent, companies often take a set-it-and-forget-it mentality with IVR. Tuning a speech application shouldn’t only happen after the pilot phase; it should occur at regular intervals to ensure the application is running just as smoothly as that vehicle in the driveway.
As we learned in the last blog post, there are many benefits to tuning, so I won’t reiterate them here. Rather, let’s talk about the process, how it works, and the type of improvements you can expect to see.
The first step is identifying when to initiate a tuning cycle. This can be at pre-defined intervals (annually, for example), following an application enhancement, or even timed in conjunction with an outside event that affects usage and traffic flow to your application. Perhaps you’re an insurance provider, and the government has just mandated coverage and benefit changes. Policyholders may call your application with questions and directives that are new and unexpected. Tuning helps reveal what callers are asking for, and how those needs change over time.
Once you’ve decided to engage in tuning, it’s time to enable utterance capture on your speech recognition server. I usually recommend a minimum two-week period for this, but the interval could be longer or shorter based on call volumes. Before utterance capture is enabled, be sure to play a “your call may be monitored or recorded” message up front to keep the legal department happy, and always double-check that utterance capture is working by listening to a WAV file or two. It’s not unusual for there to be permissions issues writing files to a directory on the server.
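The sanity check described above can be partly automated before you sit down to listen. What follows is a minimal sketch, not a LumenVox utility: the function name `check_capture_dir` and the assumption that captured audio lands as `*.wav` files in a single directory are both hypothetical. It reports whether the directory is writable by the current user and flags any WAV file that is empty or cannot be parsed.

```python
import os
import wave
from pathlib import Path


def check_capture_dir(capture_dir: str) -> dict:
    """Summarize the state of a (hypothetical) utterance capture directory."""
    path = Path(capture_dir)
    report = {
        "exists": path.is_dir(),
        # os.access checks permissions for the user running this script,
        # which may differ from the account the speech server runs under.
        "writable": path.is_dir() and os.access(path, os.W_OK),
        "wav_files": 0,
        "empty_or_unreadable": [],
    }
    if not report["exists"]:
        return report

    for wav_path in sorted(path.glob("*.wav")):
        report["wav_files"] += 1
        try:
            with wave.open(str(wav_path), "rb") as wav:
                if wav.getnframes() == 0:
                    report["empty_or_unreadable"].append(wav_path.name)
        except (wave.Error, EOFError):
            # Not a valid RIFF/WAVE file, or truncated mid-write.
            report["empty_or_unreadable"].append(wav_path.name)
    return report
```

A report showing zero WAV files after a day of traffic, or a directory that is not writable, points to the permissions problem mentioned above. It does not replace listening to a recording or two.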
While utterance capture is underway, you’ll want to make a list of the dialogs to tune. Prioritize those that get the most use, have high error-out rates, or represent critical “gates” in the call flow. If, for example, failure at a certain prompt prevents a caller from going down an important path (such as making a payment), that “gate” should be tuned for optimal performance. Call event reports will come in handy for identifying any problem areas in the application. In general, I suggest minimizing the number of yes/no or digit dialogs that receive tuning attention because they typically use shared grammars, and any findings at one prompt can be applied to the others.
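One way to turn call event reports into a tuning list is to rank dialogs by the criteria above. This is only a sketch: the `DialogStats` fields and the ordering rule (gates first, then the raw number of error-outs, then traffic) are illustrative assumptions, not a prescribed formula.

```python
from dataclasses import dataclass


@dataclass
class DialogStats:
    """Per-dialog figures, assumed to come from call event reports."""
    name: str
    calls: int             # how often callers reached this dialog
    error_outs: int        # how often callers failed out of it
    is_gate: bool = False  # True if failure blocks an important path

    @property
    def error_rate(self) -> float:
        return self.error_outs / self.calls if self.calls else 0.0


def rank_dialogs(stats):
    """Order dialogs for tuning: gates first, then most error-outs, then most calls."""
    return sorted(stats, key=lambda d: (not d.is_gate, -d.error_outs, -d.calls))


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    dialogs = [
        DialogStats("main_menu", calls=10000, error_outs=800),
        DialogStats("make_payment", calls=3000, error_outs=450, is_gate=True),
        DialogStats("store_hours", calls=500, error_outs=100),
    ]
    for d in rank_dialogs(dialogs):
        print(f"{d.name}: {d.error_outs} error-outs ({d.error_rate:.0%})")
```

Sorting on the count of error-outs rather than the rate keeps a rarely used dialog with a high failure percentage from outranking a busy one.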
Armed with utterances, logs, and a tuning plan, it’s now time to load up that data into the LumenVox Speech Tuner. Luckily, the tool provides a very user-friendly interface for transcription and analysis. But once you’re listening to caller recordings, what are you actually looking for? Well, tuning is a fairly subjective process that requires a skilled ear and critical thinking, and it’s sometimes difficult to distinguish trends from outliers. That said, I’m typically on the lookout for: (a) out-of-grammar utterances of a significant sample size, (b) red herring “sound-a-likes” that confuse the speech engine, (c) prompts that mislead the caller into giving unexpected responses, (d) rejection of valid utterances due to confidence scores, and (e) “talk-off” issues where only partial utterances are captured. Addressing such problems may require grammar updates, new phrase recordings, configuration changes, or a combination of all three. A trained voice user interface expert can assist in making and implementing the tuning recommendations based on the data revealed by the Speech Tuner tool.
So, with analysis complete and the application changes in place, what type of measurable improvements can you expect to see? The answer here is always: it varies. You may experience self-service task completion rates that jump from 50% to 75%, but you may not. The fruits of a tuning effort are typically weighed over time and over multiple iterations. False-Accepts (the phrases that the recognizer accepted, but shouldn’t have) and False-Rejects (the phrases that it rejected, but should have accepted) should decrease. Confidence scores for Correct-Accepts should increase. Follow-up tuning cycles will expose these trends, but it’s almost never possible to assign your expectations a hard-and-fast number. Continuing to analyze call event reports will help illuminate where the gains have been made and where to focus your efforts in the next tuning cycle.
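To make those measures concrete, here is a small sketch of how they might be computed from transcribed interactions. It assumes each interaction carries a confidence score and a transcriber's judgement of whether the result should have been accepted; the field names, the single threshold, and the 0 to 1000 score scale in the example are assumptions for illustration, not the Speech Tuner's data model.

```python
from collections import Counter
from typing import List, NamedTuple


class Interaction(NamedTuple):
    confidence: int      # recognizer confidence score (scale assumed)
    should_accept: bool  # transcriber's judgement of the result


def classify(interaction: Interaction, threshold: int) -> str:
    """Label one interaction using the definitions in the text above."""
    accepted = interaction.confidence >= threshold
    if accepted:
        return "correct_accept" if interaction.should_accept else "false_accept"
    return "false_reject" if interaction.should_accept else "correct_reject"


def summarize(interactions: List[Interaction], threshold: int) -> dict:
    """Compute False-Accept and False-Reject rates and mean Correct-Accept confidence."""
    outcomes = [classify(i, threshold) for i in interactions]
    counts = Counter(outcomes)
    ca_scores = [i.confidence for i, o in zip(interactions, outcomes)
                 if o == "correct_accept"]
    total = len(interactions)
    return {
        "false_accept_rate": counts["false_accept"] / total if total else 0.0,
        "false_reject_rate": counts["false_reject"] / total if total else 0.0,
        "mean_correct_accept_confidence":
            sum(ca_scores) / len(ca_scores) if ca_scores else None,
    }
```

Running `summarize` on the data from one tuning cycle and again on the next gives a like-for-like comparison of the three trends.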
Performing these steps at regular intervals will ensure that you’re getting the maximum mileage out of your investment, and protect against a user interface breakdown that could have been avoided with routine maintenance. Not to mention the fact that customer satisfaction will improve when the user experience does!
Please contact your account manager or LVSales@LumenVox.com for more information on the LumenVox Speech Tuner. For more information on the speech tuning process, or to engage Interactive Northwest (INI) in an application tuning cycle, visit the Contact INI page to speak with a qualified speech mechanic. Vroom vroom!
Interactive Northwest, Inc. (INI) develops innovative interactive voice response (IVR), computer telephony integration (CTI), and self-service applications for high-volume contact centers in markets such as government, healthcare, finance, utilities and service industries. A strong commitment to platform expertise, seamless systems integration, and project management excellence uniquely positions INI to provide value to its customers. As a long-standing partner in the Avaya DevConnect program and developer of call center speech applications, INI has a deep history in deploying applications on Avaya platforms, making it a reliable partner capable of delivering results that promote the success and profitability of its customers. www.interactivenw.com
Speech recognition technology has improved by leaps and bounds over the past decade. Today, we use speech technology in all aspects of our lives. Given the high rate of adoption across all markets and the introduction of new devices such as speech-enabled virtual assistants, companies that were once skeptical of using the technology have found themselves re-examining its possibilities. While this is an exciting time in the speech industry, companies considering speech recognition, whether directed dialog or natural language, continue to be challenged in justifying the investment. Executive leadership is often asked to prove the value of the investment through the impact of reduced costs, improved operations or enhanced customer experience. Most often, that value is defined by the rate of payback of the expenditure: the return on investment (ROI).
In theory, developing an ROI analysis is quite easy. Simply put, the payback period in months is the cost of the solution divided by the expected monthly benefit of the solution in dollars.
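As a quick worked example of that division, the sketch below computes a payback period in months. The function name and the dollar figures are made up for illustration.

```python
def payback_months(solution_cost: float, monthly_benefit: float) -> float:
    """Months needed for the monthly benefit to cover the solution cost."""
    if monthly_benefit <= 0:
        raise ValueError("monthly benefit must be positive for the investment to pay back")
    return solution_cost / monthly_benefit


if __name__ == "__main__":
    # Hypothetical figures: a $240,000 solution saving $20,000 per month.
    print(payback_months(240_000, 20_000))  # 12.0
```

The arithmetic is trivial. The difficulty lies in what goes into the monthly benefit figure.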
But what does that mean for a speech implementation?
Why is it so hard to identify an accurate payback period when converting to a speech-enabled solution?
The answers lie in the inputs that determine the expected benefits. For most executives, the focus is quite narrow. The mindset generally leans towards deriving the benefit from the replacement of a DTMF IVR with a speech-enabled solution. Projections around increases in the number of identified or authenticated callers and transactional containment tend to drive the analysis. This approach is somewhat valid, but it is highly limiting and, when used in isolation, often yields disappointing results. In the end, the ROI analysis tends to fall short in capturing the full impact, leaving executives and investment committees wondering if a speech solution is a wise investment given what they determine to be the long-range benefit.
To overcome these limitations and build a compelling ROI, traditional thinking must be thrown out the window. Instead of focusing on replacement, it is vital to expand the understanding of how a speech-enabled self-service solution will elevate the overall customer experience. With speech, doors open and the dynamic of the customer interaction changes forever. Whoever is responsible for conducting the ROI analysis must understand the complete customer experience and how all contact center technologies and processes will be impacted, and must have insight into the overall company vision from a branding and marketing perspective. In particular, executives conducting an ROI analysis must consider the following to truly capture the numerous benefits that speech recognition provides.
Expanded Scope of the Solution
DTMF systems are inherently limited. These systems are bound by a menu hierarchy that, by design, has limitations imposed to effectively guide the caller to their ultimate destination. Since these design limitations significantly impact the reach of the self-service solution, the financial benefits derived will also be constrained. When moving to a speech-enabled solution, those design limitations disappear, providing greater flexibility in expanding the solution’s footprint.
By expanding the scope of the solution footprint, the base of customers you are able to service will expand. Imagine moving from an environment where each of your lines of business has its own entry point or a dedicated IVR to a solution that provides a single point of entry with a consolidated customer experience for all your callers. The benefits of such an approach are easily identified in terms of higher containment, reduced costs of solution maintenance and expanded capabilities. Those benefits play a powerful role in building a strong business case for a speech-enabled customer experience solution.
Speech-enabled solutions change the face of the customer experience. In addition to expanding the overall service footprint, speech recognition enables companies to serve their customers in ways that were never possible in DTMF systems. As organizations consider speech technology investments, it is important to step back and take inventory of their self-service application portfolio. This assessment must consider current functionality as well as new functionality that can be transacted thanks to the improved capabilities of speech. Customer inputs that were once too complex to enter using DTMF can be accomplished readily with speech recognition, allowing your organization to provide greater service options to customers and thereby achieve higher rates of cost savings than the former DTMF system allowed. Additionally, a new speech solution can provide significant opportunities to partially automate transactions, such as claims initiation, shaving several minutes off the call once the customer is transferred to the Customer Service Representative.
Improved Intelligence of the Interaction
Since speech-enabled solutions provide a highly conversational interaction with customers, organizations are empowered to expand the level of intelligence their self-service solutions offer. When building the ROI analysis, executives should consider how the new speech solution will change the dynamics of the conversation with the customer. Questions to consider are:
- Can I engage customers in a manner that allows me to dynamically generate personalized treatment that results in higher rates of self-service or cross/up sell opportunities?
- When customers don’t want to play in the IVR, can I gather enough information to avoid costly misroutes?
- Can I take what I know about the customer and provide proactive information that might resolve their need before they move into the transactional path or transfer to an agent?
As your organization starts to build a business case to justify an investment in speech technology, looking at the areas discussed above will help to enhance the value of the solution. By broadening the reach of the new solution beyond the replacement mentality, you will quickly start to see that speech technology is not only vital to the overall customer experience, but also provides a solid return on investment, even for those organizations that think their annual call volume is too low to justify a move to speech recognition.
Think Tank Partners is a leader in developing customer experience transformation strategies and designs, focused on establishing world-class, speech-enabled, conversational interactions across all consumer touchpoints. Think Tank Partners combines deep expertise in human factors and human-computer interaction with experience in business intelligence and analytics, providing strategic consulting that integrates corporate branding, consumer segmentation and business and market strategy to align business and technology roadmaps with end users’ broader strategic vision. www.thinktank-partners.com
Increasingly, applications are finding their way from traditional premise-based installations into a variety of hosted, cloud-based, and even hybrid architectures. This is occurring for a number of reasons. One is to eliminate single points of failure in systems and data centers, so that if one machine or data center fails, total functionality is not lost. Another is the ability to load balance and consolidate system resources more cost-effectively. Some users choose the cloud because it eliminates the need to manage a stack of systems on premise, and others because it lets them scale quickly and easily.
The same thinking is being applied within the Speech industry, where more and more applications are migrating to a cloud-based infrastructure.
Since our beginnings in 2001, LumenVox has designed our products to have a completely modular and distributed architecture, allowing the various speech resources to be installed on a number of different machines. This enables users to seamlessly migrate their applications to a cloud-based environment with minimal changes to configuration. In addition, virtualized environments, including Virtual Machines (VMs) are now commonly being used in many applications, allowing users to create new instances of servers based on failure conditions, or in response to increased usage. The larger, more demanding applications we see deployed today often rely on this type of robust architecture to supply their production requirements. LumenVox software is ideally suited for these types of environments and has supported them for some time.
At the beginning of 2012, we embarked on a significant effort to migrate our legacy product licensing to a new mechanism, the Flexible Licensing System (Flex), which enables modern cloud-based and virtualized configurations. Earlier this year, LumenVox successfully completed the migration of all of our subscription-based customers to the new Flex mechanism, which has been widely heralded by our customers as fast, easy and flexible.
With the Flexible Licensing mechanism, customers can oversubscribe to their licensing and peak above what they have purchased. Just as cloud computing grows and contracts on demand, customers can scale up additional LumenVox ports instantly, when and only when they are needed. Ultimately, we believe that it is important to allow customers to mix and match a variety of licensing schemes to suit their needs at that moment.
As our relationships with our partners mature, we find ourselves influenced to create and develop for their needs. We do not build something in hopes that our partners will buy it. We listen carefully to those around us and develop to their needs after the proper qualification is secured. Recently there has been a flurry of activity in Central and South America, especially in Brazil. One of our larger partners, which has fully integrated our software into its platform, requested that we develop a Brazilian Portuguese acoustic model. Given our willingness to please and our understanding of the market potential of such a venture, we recently agreed and added this to our ever-growing list of languages. We used a novel approach to developing this model, one that we have been researching for quite some time, and we think it should satisfy most speech recognition applications.
The Brazilian economy is in a state that presents a favorable opportunity to increase automation and maximize efficiency. To answer this urgent call, LumenVox has expanded its ASR offering by creating a Brazilian Portuguese language model, which will bring the number of ASR languages to 8 and the number of TTS languages to 23. Today LumenVox covers all of the Americas, from Cape Horn to Cape Columbia and everywhere in between!
We will be doing a lot more with the Asian TTS languages in the future, once we figure out how to deal with some of the double-byte issues in our Media Server. We just entered QA with our new version, so we should be able to share some details with you on this in just a few weeks.
LumenVox version 12.2, scheduled for release on Tuesday, Sept. 2, has a large number of exciting new changes. In particular, the Tuner is getting a major series of improvements, and some cool new changes have been added throughout.
From almost top to bottom, we have looked at how we can improve the usefulness of the LumenVox Speech Tuner. One of the first things we realized is that many users have trouble figuring out what they need to tune the most.
Analyzing by Menu
Loading data into the Tuner can be overwhelming, so we added a new concept to the Tuner called a menu. A menu is designed to allow you to filter data so you can tune a specific menu in an IVR or speech application.
The way this works is the Tuner analyzes the grammar files that were in use for each speech interaction. A main menu in a banking application might use the following grammars:
And a “transfer funds” menu might use:
Because the Tuner knows which grammars are active for which speech interactions, it can make logical inferences about which interactions should be grouped together. That grouping is the menu system. A new dropdown allows you to select from the various menus the Tuner recognizes and just pick the one you’d like to focus on.
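The grouping idea can be illustrated with a short sketch. This is not the Tuner's actual algorithm, which the post does not describe in detail. It simply treats every distinct set of active grammars as one menu, and the interaction IDs and grammar names are invented for the example.

```python
from collections import defaultdict


def group_into_menus(interactions):
    """Group interaction IDs by the exact set of grammars active for each one.

    `interactions` is an iterable of (interaction_id, grammar_names) pairs.
    The order in which grammars are listed does not matter.
    """
    menus = defaultdict(list)
    for interaction_id, grammars in interactions:
        menus[frozenset(grammars)].append(interaction_id)
    return dict(menus)


if __name__ == "__main__":
    # Hypothetical data: IDs and grammar names are placeholders.
    sample = [
        ("call1-turn1", ["main_menu", "yes_no"]),
        ("call1-turn2", ["transfer_funds", "amounts"]),
        ("call2-turn1", ["yes_no", "main_menu"]),
    ]
    for grammar_set, ids in group_into_menus(sample).items():
        print(sorted(grammar_set), "->", ids)
```

Using a `frozenset` as the key means two interactions land in the same menu whenever the same grammars were active, regardless of load order.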
New to 12.2 are Tuner Wizards, a series of automated tools that guide you through the process of identifying problems and focusing on the relevant data. You can fire up the new Tuning Wizard, pick a menu (or all of the data), and choose from a list of options to focus on. That list includes:
- Confidence Threshold Tuning
- Accuracy Tuning
- Out-of-Grammar Tuning
- Out-of-Coverage Tuning
- No-Input Tuning
- Decode Speed Tuning
- Decode Failure Tuning
The Tuning Wizard will let you know whether your data exhibits any problems related to these issues and then will help you identify which interactions contribute to the particular type of issues you’re facing. It’s a great way to focus your time so that you only pay attention to the items most relevant to you.
Grammar Editor Changes
The Grammar Editor is a long-standing feature in the Tuner, giving developers an easy way to build, edit, and test their grammars. Several new features enhance the capabilities even further:
- Multiple grammar parses. Previously, the Grammar Editor could only parse a sentence against a single grammar at a time. A new option allows developers to parse any combination of loaded grammars, making it easier to test how combinations of grammars will affect grammar coverage.
- Pronunciation Checker. A new module called the Pronunciation Checker shows where pronunciations for grammar items come from: are they in our built-in dictionary? A user-defined lexicon? Or are they being produced by our statistical pronunciation rules? Words which don’t have good pronunciation definitions often lead to errors in recognition, so this is a useful module for troubleshooting performance.
- Random Sentence Generator. This module generates 10 random sentences at a time that are allowed by the grammar. Using it, you can check grammar coverage to make sure that the words and phrases you expect to be in grammar are, while simultaneously ensuring that phrases you don’t expect to be in grammar are not.
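To show the idea behind random sentence generation, here is a toy sketch that expands a small grammar at random. It is not the Grammar Editor's implementation and does not read real grammar files: the dictionary format, the `$rule` naming, and the banking phrases are all assumptions made for the example, and the grammar is assumed to be non-recursive.

```python
import random

# Toy grammar: each rule maps to a list of alternatives, and each
# alternative is a sequence of words or references to other rules.
GRAMMAR = {
    "$root": [["$action", "$account"], ["$action", "my", "$account"]],
    "$action": [["check"], ["transfer", "from"]],
    "$account": [["checking"], ["savings"]],
}


def expand(symbol, grammar, rng):
    """Expand a rule reference into words; plain words are returned as-is."""
    if symbol not in grammar:
        return [symbol]
    alternative = rng.choice(grammar[symbol])
    words = []
    for token in alternative:
        words.extend(expand(token, grammar, rng))
    return words


def random_sentences(grammar, count=10, seed=None, root="$root"):
    """Generate `count` random sentences that the grammar allows."""
    rng = random.Random(seed)
    return [" ".join(expand(root, grammar, rng)) for _ in range(count)]


if __name__ == "__main__":
    for sentence in random_sentences(GRAMMAR):
        print(sentence)
```

Reading through the output is a quick way to spot phrases the grammar permits that no caller would say, which is the same coverage check described above.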
We’ve written a lot about speech tuning (for instance: Calculating Speech Recognition Accuracy, Collecting Data for Speech Tuning, Practical Tuning Strategies, and The Importance of Tuning) because it’s such a vital part of building great speech applications.
But as much as we might like tuning because it makes applications better, there’s also a real bottom-line value associated with it. In a new whitepaper on LumenVox.com, we describe the ROI of speech tuning and show how to calculate it.
You can go read the full paper to understand how tuning can save you hundreds of thousands of dollars per year, but we’ve also put together this infographic to summarize the process: