LumenVox Luminaries is a podcast that broadcasts thought leadership pieces on the subject of voice technology. This episode features Jeff Hopper, Vice President of Business Development, with his perspective on LumenVox’s next generation of conversational IVR.
You can connect with Jeff on LinkedIn here.
Read the Transcript
I want to tell you about some work that we’re doing in our engineering team right now that will begin to become available in 2020. We’ve taken a step back and looked at the existing state of the speech recognition market for the IVR space: the product we used to have and have since deprecated, what our competitors do, and so on. And we’ve concluded that there’s a better way to go about this than the way the industry has historically approached it.
When you look at our competition, their traditional tier-four offering was speech recognition paired with natural language understanding. It was, first and foremost, 10-year-old technology and a proprietary black box. The only people who could develop an application for a customer with it were that speech vendor’s professional services team. With my 20 years of personal experience in the space, I can count on the fingers of one hand the people outside of that vendor who can actually build a tier-four application successfully for you.
So our first driver for this new idea was: let’s take advantage of some things that have changed in the state of the art technically, and let’s build a new platform that is more open, more accessible, and easier to use, not that proprietary black box, if you will, for speech recognition. If you understand any of the history of natural language IVRs, the idea is essentially this: in a directed dialogue, the application asks specific questions, like “What city do you want to fly to?” You answer “Memphis” or “Nashville,” and the recognizer can only make a determination from a defined list of choices. In a natural language IVR, you should instead be able to say things like, “I’d like to book a flight next Tuesday from Seattle to Memphis in the afternoon.” The recognizer should be able to parse out both the intent (“I want to book a flight”) and all of the values in that statement that are necessary: the departure city is Seattle, the arrival city is Memphis, and the travel date is next Tuesday, all from that conversational statement the caller makes.

The traditional mechanism has been to build proprietary applications that use two parts under the hood, though most people don’t realize there are two parts. The first is the speech recognizer, which takes what I said and converts it into raw text. The second is something called an NL or, traditionally in the speech space, an SLM, a statistical language model that takes those words, parses them apart, and tries to infer the meaning based on machine learning.
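To make that concrete, here is a hypothetical sketch (not a LumenVox API, and not how any particular vendor structures its output) of the kind of intent-plus-entities result the natural language stage produces from the flight-booking utterance above:

```python
# Hypothetical illustration only: the structured result an NLU/SLM stage
# might produce from a single conversational utterance.

utterance = "I'd like to book a flight next Tuesday from Seattle to Memphis in the afternoon."

# The recognizer's raw text is parsed into one intent plus its entities
# (sometimes called slots or slot values).
nlu_result = {
    "intent": "book_flight",
    "entities": {
        "departure_city": "Seattle",
        "arrival_city": "Memphis",
        "travel_date": "next Tuesday",
        "time_of_day": "afternoon",
    },
}
```

The application behind the IVR never sees the audio or even the raw text; it works from a structure like this.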
It is not very different conceptually from modern machine learning and artificial intelligence, except that it’s built on a much older set of tools and a much more limited set of machine learning capabilities. So when you build an application like that today with our competition’s ASR offering, it is a sealed box. It’s difficult to make changes to it over time, and those applications tend to be extremely expensive to deliver from a professional services perspective.
So what we’re proposing, and not just proposing but building the infrastructure for, is a new generation of conversational IVR. And we’re going to do it in a couple of ways. We’ve already done what I call part A of the three parts: we have built an entirely new speech recognition engine based on the latest in machine learning, specifically deep neural networks. The core recognizer that will work in this stack is absolutely state of the art, has excellent recognition capabilities, and is easy to stand up, install, and configure to run in your application stack. More importantly, it’s designed to do transcription, not directed dialogue with grammars like that old style of IVR application. It’s intended to take raw speech from a caller and transform that into text.

The second part, part B, of our application stack will be a new artificial intelligence platform that uses machine learning. It’s built on commercially available AI components that already exist today and are also state of the art: components from companies like Google, or from firms that Google has purchased, that Google has put out into the open source world. We’re going to build the machine learning piece that determines the intent from the text and extracts those values, or entities, like departure city, arrival city, or whatever the particular conversation might be. From that text, we can pass those results back to an application in your IVR to do work. That second part is in engineering now, in the process of productization, and it will give you an excellent starting point for what is typically a difficult process with tier-four applications today. And the tool set is one that is widely commercially adopted; lots of people already understand how to use it. We’re essentially just going to provide the plumbing to connect it to the rest of your IVR stack and our speech recognizer in a simple and easy way.
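The two-stage flow described here (part A transcribes the caller, part B interprets the text) can be sketched as follows. Both functions are hypothetical stand-ins rather than the LumenVox or Google APIs; the NLU step uses a trivial keyword rule purely to show the shape of the pipeline, not how a real statistical or neural model works.

```python
# A minimal sketch of the two-part pipeline, with stand-in implementations
# so the example is self-contained.

def asr_transcribe(audio_bytes: bytes) -> str:
    """Part A: the DNN-based recognizer turns raw caller speech into text.
    Stubbed with a fixed transcript for illustration."""
    return "I'd like to book a flight next Tuesday from Seattle to Memphis"

def nlu_parse(text: str) -> dict:
    """Part B: the AI/ML stage extracts an intent and entities from the text.
    A naive keyword rule stands in for the real machine-learning model."""
    intent = "book_flight" if "book a flight" in text else "unknown"
    words = text.split()
    departure = arrival = None
    if "from" in words:
        i = words.index("from")
        departure = words[i + 1]           # word after "from" = departure city
        if "to" in words[i:]:
            arrival = words[words.index("to", i) + 1]  # word after the later "to"
    return {"intent": intent, "departure_city": departure, "arrival_city": arrival}

def handle_call(audio_bytes: bytes) -> dict:
    text = asr_transcribe(audio_bytes)  # speech -> raw text
    return nlu_parse(text)              # text -> intent + entities

result = handle_call(b"\x00")  # placeholder audio payload
```

The point is the separation: the recognizer only produces text, and a distinct, swappable component turns that text into something an IVR application can act on.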
On top of that, the third part of this process will be the addition of something we’re calling an AI gateway. If you look at the slide in front of you right now, you can see the AI platform over on the right-hand side, with LumenVox listed below it as one possible AI platform. Above it you see a number of other names you’ll recognize: Amazon Lex, Microsoft LUIS, Google’s Dialogflow, IBM Watson, and others. Those are all widely used, commercially available AI engines today that use machine learning to help you parse out the answers you’re looking for from the text. What we’re going to do is provide a configurable gateway that will operate from the LumenVox media server, so that in your IVR applications you can take advantage of existing AI that you’ve already built with those commercial tools: things like FAQ chat bots that are on your website today, or other mobile applications you’ve built that use text and machine learning or AI to respond to that text. You’ll be able to take those models and add them to your existing IVR stack, so you’re not starting from scratch with the learning process for the AI mechanisms. You can continue to reuse something you’ve already built and enhance it. That’s almost always less expensive than starting from scratch to build a new AI platform and a new AI model for your particular business situation.
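The configurable-gateway idea can be sketched like this. The engine names are real products, but the configuration keys, application names, and dispatch function here are all hypothetical; a real gateway would call each vendor’s SDK and normalize the responses into one common intent/entity shape.

```python
# Hedged sketch of gateway routing: each IVR application is configured to
# use one commercial AI engine, and transcribed text is dispatched to it.

GATEWAY_CONFIG = {
    "faq_bot":    {"engine": "dialogflow", "project": "example-project"},
    "flight_ivr": {"engine": "lex",        "bot_name": "ExampleFlightBot"},
}

def route_to_ai(app_name: str, text: str) -> dict:
    """Dispatch transcribed caller text to the engine configured for this app.
    Vendor clients are stubbed out; only the routing shape is shown."""
    config = GATEWAY_CONFIG[app_name]
    # A real implementation would branch on config["engine"] and invoke that
    # vendor's API, then map the reply into a common result structure.
    return {"engine": config["engine"], "query": text}

response = route_to_ai("flight_ivr", "I want to change my seat")
```

Because the model behind each entry is one you already trained for a chat bot or mobile app, the same model answers callers in the IVR without being rebuilt.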
We have some customers who are already using this approach in an experimental stack, and when I say experimental, I mean some early proof-of-concept applications. Rather than going out of the LumenVox media server, they’re making the AI request out of their voice application platform today, which requires a little more work on their part. But we know that the new generation of recognizer we have in place, when combined with that kind of external AI approach, is actually working well. And then in 2020 we will add that third part, the AI gateway, to the LumenVox media server to make all of the integration work simpler, quicker, and easier for you.
Have questions about our next generation of conversational IVR? Contact us today!