Speech technology can truly bring the customer experience to life, but it takes a unique blend of creativity, technology, and hardware to do so. We recently interviewed LumenVox Software Engineer Shaun McThomas to gain his perspective on the art of integrating speech technology with IVRs to enhance the customer experience.
Hi, my name is Shaun McThomas. I’m a Software Engineer at LumenVox, and today I’m here to talk to you about creating next-generation conversational Interactive Voice Response systems, or IVR for short.
What are the biggest issues facing customers and IVRs today?
One of the biggest issues with IVRs today is that callers are forced to follow a rigid script. It’s not a conversation, it’s an interrogation. First the caller is asked for one bit of information, then another bit, and another. There is no flow like you would have with another person, just a series of “painful, tiny steps,” making the whole process rigid and uncomfortable for the caller.
Another issue is that callers often get trapped in IVR jail, with no escape route. You are forced to listen to the very end of the prompt before you can respond, “This isn’t what I want; let me go back to the main menu.”
What do you think the solution is to address these pain points?
Most of these issues are easily addressed with an artful blend of good design and use of modern speech recognition technologies, what LumenVox calls Speech Art. If you listen to the very best contact center agents within a business and model how they solve the same issues and how they question a caller, you’ll understand how callers really ask questions and can provide very lifelike IVR responses. By following this model, you can produce frictionless, intuitive (and personalized) interactions with callers, radically improving their experience.
The very first thing a good IVR should do is quickly identify who the caller is and confirm that assumption. Remember, a blend of technologies can make this easier. Look up the phone number they are calling from in your back-end systems and see if you can determine their identity from that; callers are more than likely calling from their cell phones, which provide a unique identifier. If needed, you can use both speech recognition and voice biometric authentication to make that process simple and easy.
Once you’ve identified the caller, use data available from your back-end systems to anticipate the reason for calling and personalize the next steps.
For example, if you’re a power company and a customer’s home is in the middle of a known power outage, assume that’s the reason for the call. Likewise, if you’re an airline and they have a flight booked on your airline that departs within the next 24 hours, assume that’s the reason for the call. Now that you’ve made an assumption, confirm that’s the reason they are calling with a simple yes/no prompt and if yes provide them with appropriate information. If they are not calling for that reason, ask them why they’ve called, but allow them to use natural language to answer. And always give them a way to correct themselves.
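The “assume and confirm” logic above can be sketched in a few lines. Everything here is illustrative: the field names (`in_outage_area`, `hours_to_departure`) and the routing strings are hypothetical placeholders, not part of any LumenVox API.

```python
# Sketch of the "assume and confirm" flow described above. All field
# names and routing strings are hypothetical, not a LumenVox API.

def guess_reason(caller):
    """Pick the most likely reason for the call from back-end data."""
    if caller.get("in_outage_area"):
        return "outage", "Are you calling about the power outage in your area?"
    if caller.get("hours_to_departure", 999) <= 24:
        return "flight", "Are you calling about your upcoming flight?"
    return None, None

def handle_call(caller, said_yes):
    """Confirm the guessed reason with a yes/no prompt; otherwise fall
    back to an open-ended natural-language question."""
    reason, prompt = guess_reason(caller)
    if reason is None:
        return "How can I help you today?"
    if said_yes:
        return "routing:" + reason
    return "No problem. In a few words, tell me why you're calling."
```

The point of the sketch is the ordering: make the cheapest, most likely guess first, confirm it with a single yes/no turn, and only then fall back to an open question.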
How does Conversational IVR work, exactly?
Conversational IVRs work by leveraging three key technologies: ASR, NLU, and TTS. These aren’t the only pieces of the puzzle, but they are important ones. Let’s talk a little about each.
First, there is Text-to-Speech (or TTS for short). TTS turns text into speech, which is key to asking questions quickly and easily. It is important to use TTS instead of recordings so that questions can be personalized. For example, when a caller first calls in and you want to verify them, you can use their name and directly ask if it’s them.
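As a small illustration of personalized prompting, here is a sketch that builds an SSML prompt for a generic TTS engine. The `<speak>` element is standard W3C SSML markup; the helper function itself is hypothetical.

```python
from xml.sax.saxutils import escape

def confirmation_prompt(name: str) -> str:
    """Build a personalized SSML confirmation prompt for a TTS engine.

    The <speak> element is standard W3C SSML; the caller's name is
    escaped so it cannot break the markup.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis">'
        "Hello, am I speaking with " + escape(name) + "?"
        "</speak>"
    )
```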
Next, there is the Automatic Speech Recognizer (or ASR for short). An ASR’s job is to take speech, recognize it as something meaningful, and then turn it into something useful like text. There are lots of types of ASRs. LumenVox’s new transcription ASR uses machine learning techniques such as deep neural networks for natural language processing, which is effective for transcribing human speech. Before this sort of technology existed, you had to constrain your recognizer to a limited set of words (called a grammar) that it could recognize. Modern NLP models have a large vocabulary, allowing you to speak naturally while the recognizer feeds back the raw transcribed text. Once the ASR has done its job, we have that text.
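The difference between the old grammar-constrained style and modern transcription can be sketched with toy stand-ins (no real audio decoding happens here):

```python
# Toy stand-ins contrasting the two recognition styles; no real audio
# decoding happens here.

CITY_GRAMMAR = {"memphis", "nashville", "seattle", "new york"}

def recognize_with_grammar(utterance: str):
    """Old style: the result is valid only if it matches the grammar;
    anything out-of-grammar yields no match."""
    text = utterance.lower().strip()
    return text if text in CITY_GRAMMAR else None

def transcribe(utterance: str) -> str:
    """New style: a large-vocabulary model returns raw text for any
    speech (trivially the input here; a real DNN ASR decodes audio)."""
    return utterance.lower().strip()
```

The grammar version rejects a natural sentence like “I want to fly to Memphis” outright, while the transcription version simply hands the full text on for downstream understanding.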
Finally, we need one more technology: Natural Language Understanding, or NLU for short. NLU takes this text and converts it into meaning: an intent and its slots. For example:
The caller says, “I want to fly from New York to LA,” and the NLU parses out the intent (“book a flight”) and the slots (origin: New York; destination: LA).
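A toy sketch of that slot extraction, using a simple pattern instead of a real NLU model (the intent name and slot names are illustrative):

```python
import re

# Toy NLU: derive an intent and slots from transcribed text using a
# simple pattern. The intent and slot names are illustrative.

FLIGHT = re.compile(r"fly from (?P<origin>[\w ]+?) to (?P<destination>[\w ]+)",
                    re.IGNORECASE)

def understand(text: str):
    """Return (intent, slots) for a booking utterance, or (None, {})."""
    m = FLIGHT.search(text)
    if m:
        return "book_flight", {"origin": m.group("origin"),
                               "destination": m.group("destination")}
    return None, {}
```

A production NLU engine would use a trained model rather than a pattern, but the output shape, one intent plus a dictionary of slots, is the same idea.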
Using these three technologies, we can create a conversation with a caller rather than a scripted interrogation. First, we use TTS to ask the caller a question, then the ASR to get text back from the caller’s response, then NLU to understand that response, and finally we use that understanding to figure out what additional information we need from the caller or to process the caller’s request.
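That TTS-to-ASR-to-NLU turn loop can be sketched abstractly. The three engines are injected as plain callables here; real deployments would wire in actual TTS, ASR, and NLU services.

```python
# One conversational turn, with the three engines injected as callables.

def run_turn(prompt, tts, asr, nlu):
    """One conversational turn: speak a prompt, transcribe the reply,
    and return the caller's (intent, slots)."""
    tts(prompt)       # 1. TTS speaks the question
    text = asr()      # 2. ASR transcribes the caller's answer
    return nlu(text)  # 3. NLU extracts the meaning

def missing_slots(slots, required):
    """Decide what to ask next: any required slot still unfilled."""
    return [name for name in required if name not in slots]
```

The `missing_slots` helper captures the last step in the paragraph above: the application checks what it still needs and asks only for that, rather than walking a fixed script.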
What sets LumenVox apart?
At LumenVox, we’re creating a Configurable AI Gateway that makes it easy to integrate many different NLU engines with our ASR. This approach opens the possibility to use widely available NLU platforms from IBM, Google, Microsoft, Amazon, and others with your existing IVR along with LumenVox ASR, TTS and Voice Biometrics.
Many technology vendors don’t offer choices in the combinations of ASR, NLP, and NLU that you can use to build a solution. Their entire suite of technology and tools is often proprietary and expensive, and because it’s proprietary, it involves the use of expensive, dedicated professional services teams. At LumenVox we want to easily integrate existing technologies with our speech recognition, text-to-speech, and voice biometrics software as part of the solution stack. We want to take the technology that’s already out there and make it easier for our customers to use.
Ready to take your contact center to the next level by implementing a conversational IVR? Contact us today!
LumenVox is excited to announce the release of Version 18.0.400 of the LumenVox Speech Solutions. These new and enhanced features focus on security and flexibility so that companies of all shapes and sizes can directly address business needs in today’s changing world.
Support for additional connections, testing, and enhanced security with encrypted audio packet transmission
With this version, we have added Media Server support for MRCPv2 over the TLS protocol. Transport Layer Security (TLS) provides communication security between client/server applications, enabling privacy, integrity, and protection for the data that’s transmitted. Our Simple MRCP Client has been updated to allow testing of TLS features. This version also includes support for secure SIP (SIPS) connections and encrypted audio packet transmission over the SRTP protocol. The Secure Real-time Transport Protocol (SRTP) provides encryption, message authentication and integrity, and replay attack protection for the RTP data.
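As a generic illustration (plain Python `ssl`, not a LumenVox or MRCP API), a client connecting over TLS would typically build a verifying context like this before opening its channel:

```python
import ssl

def make_tls_context(ca_file=None):
    """Build a TLS client context that verifies the server certificate,
    as an MRCPv2-over-TLS or SIPS client would before opening a channel.

    ca_file optionally points at the CA bundle that signed the server's
    certificate; omit it to use the system trust store.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols
    return ctx
```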
Additional Short Utterance Transcription Language Support
We have added support for loading alternate language models when using short utterance transcription. This provides greater convenience and ease, as it removes the need for a costly or complex statistical language model, and it increases accuracy.
Up-to-date operating system support for optimal security, performance, development, and streamlined installation
We have added support for the Windows Server 2019 operating system and deprecated support for Windows Server 2008 and Windows 7. The Windows Product Installer has been streamlined to allow easier product installation, and Windows development has been migrated to Visual Studio 2017 for an up-to-date toolchain.
Added flexibility regarding MRCP properties
All Media Resource Control Protocol (MRCP) headers can now be overridden as vendor-specific parameters. If a property appears both as a regular header and within the Vendor-Specific-Parameters header, the vendor-specific value overrides the other setting. This gives customers the ability to change MRCP properties that a given platform may not otherwise allow.
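The override rule can be sketched as a simple merge in which vendor-specific values take precedence. The header names below are examples (Confidence-Threshold and Speed-Vs-Accuracy are standard MRCPv2 headers); the merge function itself is an illustration, not the LumenVox implementation.

```python
def effective_properties(headers, vendor_params):
    """Merge regular MRCP headers with vendor-specific parameters,
    letting the vendor-specific values take precedence on conflicts."""
    merged = dict(headers)
    merged.update(vendor_params)  # vendor-specific wins on conflicts
    return merged
```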
Extended dashboard diagnostic capabilities
The dashboard’s diagnostic capabilities were extended for TLS protocol functionality and testing, and enhanced with the ability to back up and restore configuration settings from the diagnostic page of the dashboard.
These enhancements have been driven by partner feedback and requests. They showcase LumenVox’ continued commitment to making speech applications more secure and easier to incorporate into everyday business.
For a comprehensive list of improvements and features released with LumenVox Version 18.0.400, please click here.
The number of customers calling organizations to request a promise to pay is increasing significantly with these uncertain financial times. The result is a tsunami of phone calls that contact centers can’t handle. This leads to further customer frustration and stress during an already difficult period. It also drives up costs in the contact center with no associated increase in revenue, making it a losing proposition for all involved.
Imagine this scenario: A customer recently lost his job and can’t afford his utility bill. He’s already called the contact center once to defer the payment due date. Admitting to a contact center agent that he can’t afford to pay on time is already frustrating and embarrassing. The problem now is that he knows he can’t pay the full amount, so he calls again to request a payment arrangement.
Every time he calls, he’s asked the same series of questions: account number, street address, last four digits of his social. When he finally reaches the IVR menu, a set of submenus is presented. Then there’s a transfer to the contact center itself, with long wait times. Finally, the customer reaches an agent who can set up a payment plan. It’s laborious, time-consuming and ultimately leaves the customer more stressed than when he started.
The solution to this painful process is increased self-service in the IVR and secure, effortless authentication.
With a few tweaks to the existing IVR, this customer can hear a personalized greeting based on his ANI (caller ID). The IVR can also immediately identify a past due bill and ask him if he is calling to request a promise to pay (or set up a payment plan). If the answer is ‘yes’, the caller is authenticated with a simple passphrase, such as “My voice is my password.” The promise to pay or a payment plan is then completed and the call ends. The customer has a streamlined experience without requiring interaction with an agent. Additionally, IVR containment is achieved; a contact center agent handles one less call – saving money and improving wait times.
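The containment flow above can be sketched as follows. The account lookup and voice verification are stubbed, and all names are hypothetical, not a LumenVox API.

```python
# Sketch of the self-service promise-to-pay flow. Account lookup and
# voice verification are stubbed; all names are hypothetical.

def handle_payment_call(ani, accounts, verify_voice):
    """Route a promise-to-pay call end to end without an agent."""
    account = accounts.get(ani)
    if account is None:
        return "transfer_to_agent"   # unknown caller: fall back
    if not account["past_due"]:
        return "main_menu"           # nothing past due: normal flow
    if not verify_voice("My voice is my password"):
        return "transfer_to_agent"   # authentication failed
    return "offer_payment_plan"      # call contained in the IVR
```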
LumenVox enables organizations to exceed expectations, even during these stressful times, with state-of-the-art speech recognition and voice biometrics. The use of these technologies can provide material cost savings with a demonstrated return on investment. They can also help customers accomplish their goals quickly using the IVR, one of the least expensive channels available.
Contact LumenVox today to see how speech recognition and voice biometrics can work for you to help increase IVR containment.
LumenVox, a global leader in speech and authentication solutions, was named a top vendor of Biometrics, Speech Recognition and Text-to-Speech in Speech Technology Magazine’s 2nd Annual People’s Choice Awards. Being recognized among users’ top three favorite vendors reflects LumenVox’ reputation as a pioneer in the market and its commitment to providing innovative, natural and intelligent customer experiences.
Speech Technology Magazine’s People’s Choice Awards showcase voice-based technology companies that readers feel demonstrate excellence, innovation and industry influence with the best technology, products and services. LumenVox won the awards in major categories that speak to the current direction of the market. Voice biometrics provides a significant improvement in the security of remote channels, just as speech recognition and text-to-speech greatly enhance self-service capabilities.
“It’s an honor to once again be acknowledged by the people who use our technology. Since last year’s awards, LumenVox has continued to deploy and support superior speech and authentication solutions worldwide, including active/passive voice biometric authentication, automated password reset and fraud detection,” said Edward Miller, CEO of LumenVox. “These awards further demonstrate our role as leaders in the voice industry and drive our resolution to continually improve the customer experience.”
“This year marks the 25th anniversary of Speech Technology magazine’s founding, and the industry is showing no signs of slowing down when it comes to technological innovation,” said Leonard Klie, Speech Technology magazine’s editor. “From consumer electronics to business technology vendors, this year’s winners of the People’s Choice Awards demonstrate the industry’s best and show just how diverse, useful, and popular voice technology has become today.”
The Speech Technology Magazine People’s Choice Awards recognize key industry influencers working within the speech technology market. For the full list of winners, go here.
View the official press release here. Questions about our solutions? Contact us.
In response to COVID-19 and its burden upon our customers, we wanted to do our part. As of today, LumenVox Advanced Speech Recognition (ASR) customers who have a maintenance agreement can use an increase of up to 50% over their current ASR license volume, at no additional cost, for 60 days.
It’s imperative to stay connected to your customers right now. During this uncertain time, LumenVox remains flexible and committed to serving our partners and customers, making sure your service remains seamless, as demand and dependency upon remote channels increases.
“We hope that this flexibility to our licensing allows for increased automation and alleviates some of the contact center stress we are experiencing during COVID-19. This is just one way we can help come together and support one another.”
-Edward Miller, CEO of LumenVox.
This free bursting also demonstrates our commitment to both outstanding customer service and meaningful support, from speech application development to deployment and daily use.
LumenVox ASR is a software solution that converts spoken audio into text. Its ability to recognize naturally spoken language and its tuning flexibility set the technology apart as an industry standard. With LumenVox ASR, user experiences improve, and completion rates rise. To accompany this technology, we provide a wide selection of licensing options, including per-port, monthly subscription, use-based, bursting and Software as a Service (SaaS).
LumenVox is also directly addressing one of the major issues resulting from COVID-19, a spike in fraudulent attacks on contact centers, with Fraud Scanner, a voice-based fraud detection tool. Learn more about this fraud detection strategy here.
To further discuss your licensing needs, or to inquire about any of our solutions, contact a service representative here.
You can read a message from our CEO further detailing our response to COVID-19 here.
LumenVox Luminaries is a podcast that broadcasts thought leadership pieces on the subject of voice technology. This episode features Jeff Hopper, Vice President of Business Development with his perspective on LumenVox’ next generation of conversational IVR.
I want to tell you about some work that we’re doing in our engineering team right now that will begin to become available in 2020. We’ve taken a step back and looked at the existing state of the speech recognition market for the IVR space, the product that we used to have and deprecated, what our competitors do, and so on. And we’ve concluded that there’s a better way to go about this than the way the industry historically has.
When you look at our competition, their traditional tier-four speech recognition was speech recognition with natural language understanding. It was first and foremost 10-year-old technology and a proprietary black box. The only people who could develop an application for a customer with it were that speech vendor’s professional services team. With my 20 years of personal experience in the space, I can count on the fingers of one hand the people outside of that vendor who can actually build a tier-four application successfully for you.
So our first driver for this new idea was: let’s take advantage of some things that have changed in the state of the art technically, and let’s build a new platform that is more open, more accessible, and easier to use, not that proprietary black box for speech recognition. If you understand any of the history of natural language IVRs, the idea is essentially this: instead of asking specific questions, like “What city do you want to fly to?”, where you say “Memphis” or “Nashville” and the recognizer can only make a determination from a defined list of choices, you should be able to say things like, “I’d like to book a flight next Tuesday from Seattle to Memphis in the afternoon.” The recognizer should be able to parse out both the intent, “I want to book a flight,” and all of the necessary values in that statement: the departure city is Seattle, the arrival city is Memphis, and the travel date is next Tuesday, all from that conversational statement the caller makes. The traditional mechanism has been to build proprietary applications that use two parts under the hood, though most people don’t realize there are two parts. The first is the speech recognizer that takes what I said and converts it into raw text. The second part is something traditionally called, in the speech space, an NLU or an SLM, a statistical language model that will take those words, parse them apart, and try to infer the meaning based on machine learning.
It is not very different conceptually from modern machine learning and artificial intelligence, except that it’s built on a much older set of tools and a much more limited set of machine learning capabilities. So when you build an application like that today with our competition’s ASR offering, it is a sealed box. It’s difficult to make changes to over time, and such applications tend to be extremely expensive to deliver from a professional services perspective.
So what we’re proposing, and not just proposing but building the infrastructure for, is a new generation of conversational IVR. And we’re going to do it in a couple of ways. We’ve already done what I call part A of the three parts: we have built an entirely new speech recognition engine based on the latest in machine learning processes, specifically deep neural networks, so that the core recognizer in this stack is absolutely state of the art, has excellent recognition capabilities, and is easy to stand up, install, and configure to run in your application stack. More importantly, it’s designed to do transcription, not directed dialogue with grammars like that old style of IVR application. It’s intended to take raw speech from a caller and transform it into text. The second part, part B, of our application stack is going to be a new artificial intelligence platform that uses machine learning. It’s built on commercially available, state-of-the-art AI components that already exist today: components from companies like Google, or from firms that Google has purchased, that Google has put out into the open-source world. We’re going to build the machine learning AI piece that does the intent determination from the text and extracts those values or entities, like departure city or arrival city, whatever the particular conversation might be. We can then pass that back to an application in your IVR to do work. That second part is in engineering now, in the process of productization, and it will give you an excellent starting point to accomplish what is typically a very difficult process with tier-four applications today. And the tool set is one that is widely commercially adopted; lots of people already understand how to use it. We’re essentially just going to provide the plumbing to connect it to the rest of your IVR stack and our speech recognizer in a simple and easy way.
Coming on top of that, the third part of this process will be the addition of something that we’re calling an AI gateway. If you look at the slide in front of you right now, you can see the AI platform over on the right-hand side, with LumenVox listed below it as one possible AI platform; up above, you see a number of other names you’ll recognize: Amazon Lex, Microsoft LUIS, Google Dialogflow, IBM Watson, and others. Those are all widely used, commercially available AI engines today that use machine learning to help you parse out the answers you’re looking for from the text. What we’re going to do is provide a configurable gateway that will operate from the LumenVox media server, so that in your IVR applications you can take advantage of existing AI that you’ve already built with those commercial tools: things like FAQ chat bots that are on your website today, or other mobile applications you’ve built that use text and machine learning to respond to that text. You’ll be able to take those models and add them to your existing IVR stack, so you’re not starting from scratch with the learning process for the AI mechanisms. You can continue to reuse something you’ve already built and enhance it. That’s almost always less expensive than starting from scratch to build a new AI platform and a new AI model for your particular business situation.
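The gateway idea, routing transcribed text to whichever NLU backend is configured, can be sketched minimally. The interface below is illustrative, not the actual LumenVox gateway API.

```python
# Minimal sketch of a configurable AI gateway: dispatch transcribed
# text to whichever registered NLU backend the configuration names.

class AIGateway:
    def __init__(self):
        self._backends = {}

    def register(self, name, backend):
        """Register an NLU backend (any callable: text -> result)
        under a configuration name."""
        self._backends[name] = backend

    def understand(self, backend_name, text):
        """Dispatch transcribed text to the configured backend."""
        return self._backends[backend_name](text)
```

The design point is that the IVR code calls one stable interface, while each backend (a cloud NLU service, an on-premise model, an existing chat bot) is just a registered callable behind it.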
We have some customers who are already using this approach in an experimental stack, and by experimental I mean some early proof-of-concept applications. Today, rather than going through the LumenVox media server, they’re making the AI request from their voice application platform, which requires a little more work on their part. But we know that the new generation of recognizer we have in place, when combined with that kind of external AI approach, is actually working well. And then in 2020 we will add that third part, the AI gateway, to the LumenVox media server to make all of the integration work simpler, quicker, and easier for you.
Have questions about our next generation of conversational IVR? Contact us today!