The development of Automatic Speech Recognition techniques continues to accelerate. Already an established technology, Automatic Speech Recognition is growing by leaps and bounds each year, especially as artificial intelligence contributes to its evolution. A crucial building block of artificial intelligence is deep learning.
What is Deep Learning?
Deep learning refers to the process of a computer model learning how to do classification tasks by example, directly from audio, text, or images. These models are trained using very large sets of data and neural network topologies with many hidden layers, to which the word “deep” refers. Deep Neural Networks can achieve state-of-the-art performance in many different fields, even exceeding human-level performance on some of them.
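To make the idea of “many hidden layers” concrete, here is a minimal sketch of a forward pass through a small network in plain Python. The layer sizes, the random weights, and the input values are all hypothetical; a real model would learn its weights from very large data sets and would be far bigger.

```python
import math
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_in, n_out, rng):
    """Randomly initialised weights; training would replace these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Pass the features through every hidden layer, then a softmax output layer."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


rng = random.Random(0)
sizes = [4, 8, 8, 8, 3]  # 4 inputs, three hidden layers, 3 classes (hypothetical)
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probabilities = forward([0.2, -0.1, 0.7, 0.05], layers)
print(probabilities)
```

The output is one probability per class. Adding more entries to `sizes` is what makes the network “deeper.”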
What are Neural Networks?
More specifically, neural networks are a series of algorithms, whose job it is to identify relationships within a set of data, a process that simulates the way a human brain identifies underlying connections. When it comes to speech technology, .
Which Neural Network for Automatic Speech Recognition?
Deep Neural Networks are transforming the way humans interact, playing an important role in the technological revolution of artificial intelligence. At LumenVox, our Research and Development team is currently utilizing Time Depth Separable Convolutional Neural Networks (TDS CNN).
Convolutional Neural Networks are advantageous for a few reasons: They are computationally efficient, making them highly useful for mobile applications, and they have fewer knobs to toy with, fewer parameters to adjust. That means LumenVox customers get an ASR engine with greater speech recognition accuracy without requiring more compute performance, encouraging greater efficiency and performance.
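As a rough illustration of the “fewer parameters” point, the sketch below compares weight counts for a standard convolution over time with the time-only convolution used in a TDS-style block. It follows the published description of time-depth separable convolutions as we understand it, not the internals of the LumenVox engine, which are not described here. The kernel width, channel count, and feature width are hypothetical, and the fully connected part of a TDS block is left out.

```python
def conv1d_params(kernel_width, in_features, out_features):
    """Weights in a standard 1D convolution over time (biases ignored)."""
    return kernel_width * in_features * out_features


def tds_conv_params(kernel_width, channels):
    """Weights in the time-only convolution of a TDS-style block.

    The same k x 1 kernel is shared along the feature axis, so the count
    depends on the number of channels, not on channels * feature width.
    """
    return kernel_width * channels * channels


kernel_width = 21    # hypothetical values, chosen only for the arithmetic
channels = 10
feature_width = 80

full = conv1d_params(kernel_width, channels * feature_width, channels * feature_width)
tds = tds_conv_params(kernel_width, channels)
print(full, tds, full // tds)
```

Under these assumptions the standard layer needs `feature_width ** 2` times as many weights as the time-only convolution.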
LumenVox’ deep learning technology is applied to many of our technologies, including Automatic Speech Recognizer, Natural Language Processing, and Voice Biometrics. To learn more about our comprehensive stack, or to take an even deeper dive into deep learning, contact us today!
Right now, customers crave communication. LumenVox Call Progress Analysis (CPA) is a technology that delivers proactive outbound phone messages with incredible accuracy, enabling businesses to give their customers the information they need exactly when they need it. CPA accomplishes this through a unique combination of tone detection and speech recognition. At first blush, the world of voice technology may label this product as Answering Machine Detection. But it’s not. So what’s the difference?
Answering Machine Detection (AMD) is typically software that determines whether a machine (voicemail or answering machine) or a human has answered a call. AMD can be applied to all automated outbound calling use cases (outbound IVR, text-to-speech, pre-recorded automated calls, or click-to-call).
Traditional Answering Machine Detection products use some form of energy or voice detection that extracts the duration of the greeting from the person or device answering the call. Answering machine messages can be remarkably diverse, affecting the algorithm and its decision-making process.
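The energy-and-duration approach can be sketched in a few lines. This is an illustration of the general idea only, not any vendor’s algorithm: the frame length, the energy threshold, the 1800 ms cut-off, and the synthetic audio are all assumptions.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[start:start + frame_size]) / frame_size
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def greeting_duration_ms(samples, sample_rate, frame_ms=20, energy_threshold=0.01):
    """Time from the first to the last frame whose energy exceeds the threshold."""
    frame_size = sample_rate * frame_ms // 1000
    energies = frame_energies(samples, frame_size)
    active = [index for index, energy in enumerate(energies) if energy > energy_threshold]
    if not active:
        return 0
    return (active[-1] - active[0] + 1) * frame_ms


def classify_answer(duration_ms, max_human_ms=1800):
    """Short greetings suggest a person; long greetings suggest a machine."""
    if duration_ms == 0:
        return "no_speech"
    return "human" if duration_ms <= max_human_ms else "machine"


def synthetic_call(sample_rate, silence_ms, speech_ms):
    """Silence, then a loud stand-in for speech, then silence again."""
    silence = [0.0] * (sample_rate * silence_ms // 1000)
    speech = [0.5 if n % 2 == 0 else -0.5 for n in range(sample_rate * speech_ms // 1000)]
    return silence + speech + silence


short_greeting = synthetic_call(8000, 200, 600)    # stands in for "Hello?"
long_greeting = synthetic_call(8000, 200, 3000)    # stands in for a voicemail greeting
for call in (short_greeting, long_greeting):
    duration = greeting_duration_ms(call, 8000)
    print(duration, classify_answer(duration))
```

A fixed threshold like this is exactly what diverse greetings and background noise can fool, which is the weakness described above.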
LumenVox Call Progress Analysis (CPA) has two distinct advantages when compared to AMD software.
Typical AMD software only uses energy detection, while LumenVox CPA is built on the Voice Activity Detection (VAD) of LumenVox’ Speech Recognition Engine, which uses a more sophisticated machine learning process to distinguish speech from non-speech. The VAD processes separate speech from background noise and measure the length of speech with greater accuracy. We then use the statistical correlation to make a machine vs. human determination.
When the machine vs. human determination is made, traditional AMD software expects to find silence of some duration after the speech to indicate to the application that it is OK to leave a message. With LumenVox CPA, we take advantage of our Speech Recognition engine to detect the answering machine tone before indicating that it is OK to leave the message. This ensures that the full message is successfully delivered.
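How the LumenVox engine detects the tone is not described here. One textbook way to check a frame of audio for a single frequency is the Goertzel algorithm, sketched below. The 1000 Hz beep, the 50 ms frame, and the 0.5 energy fraction are hypothetical choices; real answering machine tones vary.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)          # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in samples:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def has_tone(samples, sample_rate, target_hz, min_fraction=0.5):
    """True when at least min_fraction of the frame's energy sits at target_hz."""
    total_energy = sum(s * s for s in samples)
    if total_energy == 0.0:
        return False
    # For a pure sine centred on the bin, this ratio is 1.0.
    fraction = 2.0 * goertzel_power(samples, sample_rate, target_hz) / (len(samples) * total_energy)
    return fraction >= min_fraction


def sine(frequency_hz, sample_rate, count):
    """A unit-amplitude test tone."""
    return [math.sin(2.0 * math.pi * frequency_hz * n / sample_rate) for n in range(count)]


beep = sine(1000, 8000, 400)       # 50 ms of a hypothetical 1000 Hz beep
other = sine(440, 8000, 400)       # a different tone
print(has_tone(beep, 8000, 1000), has_tone(other, 8000, 1000))
```

An application built this way would start its message only after `has_tone` fires, rather than after a stretch of silence.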
With LumenVox Call Progress Analysis, your outbound dialing application can exceed expectations, starting precisely where it should, right after the beep. Its flexibility makes it compatible with most voice platforms and PBX systems. To learn more about how CPA can work for you, click here or contact us today.
Speech technology can truly bring the customer experience to life, but it takes a unique blend of creativity, technology, and hardware to do so. We recently interviewed LumenVox Software Engineer Shaun McThomas to gain his perspective on the art of integrating speech technology with IVRs to enhance the customer experience.
Hi, my name is Shaun McThomas. I’m a Software Engineer at LumenVox, and today I’m here to talk to you about creating next-generation conversational Interactive Voice Response systems. IVR for short.
What are the biggest issues facing customers and IVRs today?
One of the biggest issues with IVRs today is that callers are forced to follow a rigid script. It’s not a conversation, it’s an interrogation. First, they are asked, “Give me this bit of information”; then they are asked for another bit and another bit. There is no flow like you would have with another person, just a series of “painful, tiny steps”, making the whole process rigid and uncomfortable for the caller.
Another issue is that you often get trapped in IVR jail, where there is no escape route. You are forced to listen to the very end of the prompt before you can respond, “This isn’t what I want; let me go back to the Main menu.”
What do you think the solution is to address these pain points?
Most of these issues are easily addressed with an artful blend of good design and use of modern speech recognition technologies, what LumenVox calls Speech Art. If you listen to the very best contact center agents within a business and model how they solve the same issues and how they question a caller, you’ll understand how callers really ask questions and can provide very lifelike IVR responses. By following this model, you can produce frictionless, intuitive (and personalized) interactions with callers, radically improving their experience.
The very first thing a good IVR should do is quickly identify who the caller is and confirm that assumption. Remember, a blend of technologies can make this easier. Look up the phone number they are calling from in your back-end systems and see if you can determine their identity from that. They are more than likely calling from their cell phone, which provides a unique identifier. You can use both speech recognition and voice biometric authentication to make that process simple and easy if needed.
Once you’ve identified the caller, use data available from your back-end systems to anticipate the reason for calling and personalize the next steps.
For example, if you’re a power company and a customer’s home is in the middle of a known power outage, assume that’s the reason for the call. Likewise, if you’re an airline and they have a flight booked on your airline that departs within the next 24 hours, assume that’s the reason for the call. Now that you’ve made an assumption, confirm that’s the reason they are calling with a simple yes/no prompt and if yes provide them with appropriate information. If they are not calling for that reason, ask them why they’ve called, but allow them to use natural language to answer. And always give them a way to correct themselves.
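Here is a minimal sketch of that “look up, assume, confirm” step. The customer table, the event names, and the prompt wording are all hypothetical stand-ins for whatever your back-end systems provide, and the identity confirmation step is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Customer:
    name: str
    # Facts the back end already knows, e.g. "power_outage" or "flight_within_24h".
    known_events: List[str] = field(default_factory=list)


# Hypothetical stand-in for a CRM or billing lookup keyed by caller ID.
CUSTOMERS_BY_PHONE: Dict[str, Customer] = {
    "+15550100": Customer("Alex", ["power_outage"]),
    "+15550101": Customer("Sam"),
}

# Yes/no confirmation prompt for each reason we are willing to assume.
CONFIRMATION_PROMPTS = {
    "power_outage": "are you calling about the power outage in your area?",
    "flight_within_24h": "are you calling about your upcoming flight?",
}

OPEN_QUESTION = "How can I help you today?"


def opening_prompt(phone_number: str) -> str:
    """Pick the first prompt: a yes/no confirmation if we can guess the reason."""
    customer = CUSTOMERS_BY_PHONE.get(phone_number)
    if customer is None:
        return OPEN_QUESTION
    for event in customer.known_events:
        if event in CONFIRMATION_PROMPTS:
            return f"Hi {customer.name}, {CONFIRMATION_PROMPTS[event]}"
    return f"Hi {customer.name}. {OPEN_QUESTION}"


print(opening_prompt("+15550100"))
print(opening_prompt("+15550199"))
```

If the caller answers “no” to the confirmation, the IVR falls back to the open question and lets them answer in natural language.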
How does Conversational IVR work, exactly?
Conversational IVRs work by leveraging three key technologies: ASR, NLU, and TTS. These aren’t the only pieces of the puzzle, but they are important ones. Let’s talk a little about each.
First, there is Text to Speech (or TTS for short). TTS is the method of turning text into speech. This is key because it lets you ask questions quickly and easily. It is important to use TTS instead of recordings so that questions can be personalized. For example, when a caller first calls in and you want to verify them, you can use their name and directly ask if it’s them.
Next, there is the Automatic Speech Recognizer (or ASR for short). An ASR’s job is to take speech, recognize it as something meaningful, and then turn it into something useful like text. There are lots of types of ASRs. LumenVox’s new transcription ASR uses machine learning techniques such as deep neural networks for natural language processing. This is effective for transcribing text from human speech. Before this sort of technology existed, you had to constrain your recognizer to a limited set of words (called a grammar), and it could recognize only those words. Modern NLP models have a large set of words they can recognize, allowing you to speak naturally, and the ASR will feed you back the raw transcribed text. Once the ASR has done its job, we have that text.
Finally, we need to use another technology, which is Natural Language Understanding or NLU for short. NLU takes this text and converts it into meaning, intents, and slots, for example:
The caller can say: “I want to fly from New York to LA.” And we parse out the intent “to fly” and the slots “origin, New York” and “destination, LA.”
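To show what an intent-and-slots result might look like, here is a toy parser for that one sentence pattern. A real NLU engine uses trained models rather than a single regular expression; the pattern, the intent name, and the slot names are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Toy pattern for "... fly from <origin> to <destination>".
FLIGHT_PATTERN = re.compile(
    r"\bfly from (?P<origin>.+?) to (?P<destination>.+?)[.?!]?$",
    re.IGNORECASE,
)


def parse_utterance(text: str) -> Optional[Dict[str, object]]:
    """Return an intent with its slots, or None if the text is not understood."""
    match = FLIGHT_PATTERN.search(text.strip())
    if match is None:
        return None
    return {
        "intent": "fly",
        "slots": {
            "origin": match.group("origin"),
            "destination": match.group("destination"),
        },
    }


print(parse_utterance("I want to fly from New York to LA."))
```

Here New York fills the origin slot and LA fills the destination slot.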
Using these three technologies we can create a conversation with a caller rather than a scripted interrogation. First, we would use TTS to ask the caller a question, then an ASR to get text back from the caller’s response, then NLU to understand that response, and then finally use that understanding to figure out what additional information we need from the caller or to process the caller’s request.
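The ask, listen, understand loop can be sketched with the three technologies passed in as plain functions. Nothing here is a LumenVox API: `speak`, `listen`, and `understand` are hypothetical stand-ins for TTS, ASR, and NLU, and the required slots (including a travel date) and follow-up prompts are made up for the example.

```python
from typing import Callable, Dict

REQUIRED_SLOTS = ["origin", "destination", "date"]

FOLLOW_UP_PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "date": "What day would you like to travel?",
}


def run_dialog(
    speak: Callable[[str], None],                  # TTS stand-in
    listen: Callable[[], str],                     # ASR stand-in
    understand: Callable[[str], Dict[str, str]],   # NLU stand-in
    max_turns: int = 5,
) -> Dict[str, str]:
    """Keep asking until every required slot is filled or we run out of turns."""
    slots: Dict[str, str] = {}
    prompt = "How can I help you today?"
    for _ in range(max_turns):
        speak(prompt)
        transcript = listen()
        slots.update(understand(transcript))
        missing = [name for name in REQUIRED_SLOTS if name not in slots]
        if not missing:
            break
        prompt = FOLLOW_UP_PROMPTS[missing[0]]
    return slots


# Scripted stand-ins so the sketch runs without any speech software.
spoken = []
caller_turns = iter(["I want to fly from New York to LA", "tomorrow"])


def toy_understand(transcript: str) -> Dict[str, str]:
    if " from " in transcript and " to " in transcript:
        origin, destination = transcript.split(" from ", 1)[1].split(" to ", 1)
        return {"origin": origin, "destination": destination}
    return {"date": transcript}


slots = run_dialog(spoken.append, lambda: next(caller_turns), toy_understand)
print(spoken)
print(slots)
```

Because the caller gave the origin and destination in one sentence, the loop only has to ask one follow-up question instead of walking through every field.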
What sets LumenVox apart?
At LumenVox, we’re creating a Configurable AI Gateway that makes it easy to integrate many different NLU engines with our ASR. This approach opens the possibility to use widely available NLU platforms from IBM, Google, Microsoft, Amazon, and others with your existing IVR along with LumenVox ASR, TTS and Voice Biometrics.
Many technology vendors don’t offer choices in the combinations of ASR, NLP, and NLU that you can use to build a solution. Their entire suite of technology and tools is often proprietary and expensive and, because it’s proprietary, involves the use of expensive, dedicated professional service teams. At LumenVox we want to be able to easily integrate existing technologies with our speech recognition, text-to-speech, and voice biometrics software as part of the solution stack. We want to take the technology that’s already out there and make it easier for our customers to use.
Ready to take your contact center to the next level by implementing a conversational IVR? Contact us today!
LumenVox is excited to announce the release of Version 18.0.400 of the LumenVox Speech Solutions. These new and enhanced features focus on security and flexibility so that companies of all shapes and sizes can directly address business needs in today’s changing world.
Support for additional connections, testing and enhanced security with encrypted audio packet transmission
With this version, we have added Media Server support for MRCPv2 over the TLS protocol. Transport Layer Security (TLS) provides communication security between client/server applications, enabling privacy, integrity, and protection for the data that’s transmitted. Our Simple MRCP Client has been updated to allow testing of TLS features. This version also includes support for secure SIP (SIPS) connections and encrypted audio packet transmission over the SRTP protocol. The Secure Real-time Transport Protocol (SRTP) provides encryption, message authentication and integrity, and replay attack protection to the RTP data.
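For readers who want to see what TLS asks of the client side, here is a generic sketch using Python’s standard `ssl` module. It is not LumenVox configuration or the Simple MRCP Client: the host, port, and certificate file are placeholders you would supply, and the TLS 1.2 floor is an assumption.

```python
import socket
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS context that verifies the server certificate and host name."""
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2   # assumed minimum
    return context


def open_secure_channel(host: str, port: int, context: ssl.SSLContext) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS (host and port are placeholders)."""
    raw = socket.create_connection((host, port), timeout=5)
    return context.wrap_socket(raw, server_hostname=host)


context = make_client_context()
print(context.verify_mode, context.check_hostname, context.minimum_version)
```

Calling `open_secure_channel` against a server with an untrusted or mismatched certificate raises an error instead of silently sending data in the clear.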
Additional Short Utterance Transcription Language Support
We have added support to load alternate language models when using short utterance transcription. This provides greater convenience and ease as it removes the need for a costly or complex statistical language model, and it increases accuracy.
Up-to-date operating system support for optimal security, performance, development and streamlined installation
We have added support for Windows Server 2019 Operating Systems, and deprecated support for Windows Server 2008 and Windows 7. The Windows Product Installer has been streamlined to allow easier product installation and Windows development has been migrated to Visual Studio 2017 for optimal coding algorithms.
Added flexibility regarding MRCP properties
All Media Resource Control Protocol (MRCP) headers can now be overwritten as a Vendor-specific-parameter. If a property appears both as a regular header and in the Vendor-specific-parameter header, the one in the Vendor-specific-parameter will override the other setting. This gives customers the ability to change MRCP properties that a given platform may not allow.
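The override rule can be sketched as a small merge function. This illustrates the precedence described above, not the LumenVox implementation: the exact header name `Vendor-Specific-Parameters` and the semicolon-separated `name=value` layout follow our reading of the MRCPv2 specification and should be treated as assumptions, as should the sample values. Header names are matched exactly here for simplicity.

```python
from typing import Dict

VENDOR_HEADER = "Vendor-Specific-Parameters"   # assumed header name


def parse_vendor_parameters(value: str) -> Dict[str, str]:
    """Split 'name=value;name=value' pairs into a dictionary."""
    pairs: Dict[str, str] = {}
    for item in value.split(";"):
        if "=" in item:
            name, _, setting = item.partition("=")
            pairs[name.strip()] = setting.strip()
    return pairs


def effective_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Regular headers first, then vendor-specific values override them."""
    resolved = {name: value for name, value in headers.items() if name != VENDOR_HEADER}
    resolved.update(parse_vendor_parameters(headers.get(VENDOR_HEADER, "")))
    return resolved


request_headers = {
    "Confidence-Threshold": "0.5",     # value set by the platform
    "No-Input-Timeout": "5000",
    VENDOR_HEADER: "Confidence-Threshold=0.7",
}
print(effective_headers(request_headers))
```

The platform’s 0.5 threshold is replaced by the 0.7 supplied in the vendor-specific header, while the timeout passes through unchanged.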
Extended dashboard diagnostic capabilities
The dashboard diagnostic capabilities were extended for TLS protocol functionality/testing and enhanced with the ability to back up and restore configuration settings from the diagnostic page of the dashboard.
These enhancements have been driven by partner feedback and requests. They showcase LumenVox’ continued commitment to making speech applications more secure and easier to incorporate into everyday business.
For a comprehensive list of improvements and features released with LumenVox Version 18.0.400, please click here.
The number of customers calling organizations to request a promise to pay is increasing significantly with these uncertain financial times. The result is a tsunami of phone calls that contact centers can’t handle. This leads to further customer frustration and stress during an already difficult period. It also drives up costs in the contact center with no associated increase in revenue, making it a losing proposition for all involved.
Imagine this scenario: A customer recently lost his job and can’t afford his utility bill. He’s already called the contact center once to defer the payment due date. Admitting to a contact center agent that he can’t afford to pay on time is already frustrating and embarrassing. The problem now is that he knows he can’t pay the full amount, so he calls again to request a payment arrangement.
Every time he calls, he’s asked the same series of questions: account number, street address, last four digits of his social. When he finally reaches the IVR menu, a set of submenus is presented. Then there’s a transfer to the contact center itself, with long wait times. Finally, the customer reaches an agent who can set up a payment plan. It’s laborious, time-consuming and ultimately leaves the customer more stressed than when he started.
The solution to this painful process is increased self-service in the IVR and secure, effortless authentication.
With a few tweaks to the existing IVR, this customer can hear a personalized greeting based on his ANI (caller ID). The IVR can also immediately identify a past due bill and ask him if he is calling to request a promise to pay (or set up a payment plan). If the answer is ‘yes’, the caller is authenticated with a simple passphrase, such as “My voice is my password.” The promise to pay or a payment plan is then completed and the call ends. The customer has a streamlined experience without requiring interaction with an agent. Additionally, IVR containment is achieved; a contact center agent handles one less call – saving money and improving wait times.
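The streamlined call can be written down as a short decision flow. This is a sketch of the steps described above, not a product feature list: the `Account` fields are hypothetical, and the fallbacks (main menu, open question, transfer to an agent) are assumptions about what a deployment might do when the happy path does not apply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    name: str         # found from the caller's ANI in a hypothetical back end
    past_due: bool


def promise_to_pay_flow(account: Account, wants_promise: bool, voice_verified: bool) -> List[str]:
    """Return the ordered steps the IVR takes for one call."""
    steps = [f"greet {account.name}"]
    if not account.past_due:
        return steps + ["offer main menu"]                         # assumed fallback
    steps.append("ask: calling to request a promise to pay?")
    if not wants_promise:
        return steps + ["ask for reason in caller's own words"]    # assumed fallback
    steps.append("prompt passphrase: My voice is my password")
    if not voice_verified:
        return steps + ["transfer to agent"]                       # assumed fallback
    return steps + ["record promise to pay", "end call (contained in IVR)"]


contained = promise_to_pay_flow(Account("Jordan", past_due=True), True, True)
escalated = promise_to_pay_flow(Account("Jordan", past_due=True), True, False)
print(contained)
print(escalated)
```

Only the second call reaches an agent; the first is fully contained in the IVR.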
LumenVox enables organizations to exceed expectations, even during these stressful times, with state-of-the-art speech recognition and voice biometrics. The use of these technologies can provide material cost savings with a demonstrated return on investment. They can also help customers accomplish their goals quickly using the IVR, one of the least expensive channels available.
Contact LumenVox today to see how speech recognition and voice biometrics can work for you to help increase IVR containment.
LumenVox, a global leader in speech and authentication solutions, was named a top vendor of Biometrics, Speech Recognition and Text-to-Speech in Speech Technology Magazine’s 2nd Annual People’s Choice Awards. Being recognized as one of users’ top three favorite vendors reflects LumenVox’ reputation as a pioneer in the market and its commitment to providing innovative, natural and intelligent customer experiences.
Speech Technology Magazine’s People’s Choice Awards showcases voice-based technology companies that their readers feel demonstrate excellence, innovation and industry influence with the best technology, products and services. LumenVox won the awards in major categories that speak to the current direction of the market. Voice biometrics provides significant improvement in the security of remote channels, just as Speech Recognition and Text to Speech greatly enhance self-service capabilities.
“It’s an honor to once again be acknowledged by the people who use our technology. Since last year’s awards, LumenVox has continued to deploy and support superior speech and authentication solutions worldwide, including active/passive voice biometric authentication, automated password reset and fraud detection,” said Edward Miller, CEO of LumenVox. “These awards further demonstrate our role as leaders in the voice industry and drive our resolution to continually improve the customer experience.”
“This year marks the 25th anniversary of Speech Technology magazine’s founding, and the industry is showing no signs of slowing down when it comes to technological innovation,” said Leonard Klie, Speech Technology magazine’s editor. “From consumer electronics to business technology vendors, this year’s winners of the People’s Choice Awards demonstrate the industry’s best and show just how diverse, useful, and popular voice technology has become today.”
The Speech Technology Magazine’s People’s Choice Awards recognizes key industry influencers working within the speech technology market. For the full list of winners, go here.
View the official press release here. Questions about our solutions? Contact us.