Voice remains pervasive in business communications, and its impact is especially visible in this Age of Digital Transformation. However, Contact Center and CX professionals face major challenges in simply keeping their voice-based resources up and running, not to mention keeping the technology fresh and adding new capabilities.
Too often they feel “damned if they do” buy into the view that customer care is moving to chat and text, and “damned if they don’t” keep the voice channel up to date with the latest and greatest AI-infused technologies.
Dramatic improvements in automatic speech recognition (ASR) and voice technologies have transformed the role of voice communication in the enterprise for customer and employee-facing applications.
Speech recognition has reached unprecedented levels of accuracy. Synthetic text-to-speech voices are often indistinguishable from human voices. Voice biometrics detects both real and synthesized imposters reliably and at scale.
We’re excited to join Dan Miller and Derek Top of Opus Research along with Joe Hagan, Chief Product Officer at LumenVox, on Tuesday, September 14th at 10am PT/1pm ET, for a lively discussion on how speech and voice technologies are shaping next-generation customer and employee experiences, including:
Accuracy – how accuracy and other performance gains instill the confidence businesses need to build new voice-first applications
Accessibility – guidance on choosing the right technology foundation and partner to meet current and future business needs
Affordability – the myth of “it’s expensive” and why it no longer applies – and options for businesses where the reverse is true
Flexibility – deploying speech applications in any environment, in any cloud: on-premises, multi-cloud, or a hybrid model
“New demands have redefined the very meaning of Automated Speech Recognition,” said Dan Miller, lead analyst at Opus Research. “LumenVox’s new ASR engine provides high levels of accuracy and intelligence required to capture, recognize, and react to each customer’s intent and define what’s possible for speech and voice recognition software.”
Register now to save your seat! Can’t make it? Register to receive a link to the webinar recording!
Opus Research is a diversified advisory and analysis firm providing critical insight on software and services that support multimodal customer care. Opus Research is focused on “Conversational Commerce,” the merging of intelligent assistant technologies, conversational intelligence, intelligent authentication, enterprise collaboration and digital commerce.
With companies rapidly evolving and seeking more voice-enabled applications to deliver powerful experiences, LumenVox was pleased to recently discuss the benefits organizations can see when utilizing an Automatic Speech Recognition (ASR) engine with extremely accurate transcription, flexibility, and high availability.
The Power of Speech
ASR’s everyday applications are vast, and it’s transforming how multiple industries do business. For example, media and entertainment companies can produce content faster when hours of audio or video files are converted into searchable transcripts.
Educational institutions can deliver accessible remote learning through real-time captioning in video conferencing software. In addition, researchers can begin analyzing qualitative data in a matter of minutes thanks to asynchronous, machine-generated transcription.
These are just a few examples of how speech-to-text technology is impacting society.
In addition to industry-leading accuracy and speed, LumenVox’s ASR engine uses an end-to-end Deep Neural Network (DNN) architecture that makes it faster to add new languages and to recognize non-native accents. This enables LumenVox customers to serve a more diverse base of users.
The Value of Artificial Intelligence
With typical Machine Learning (ML) models, there are two fundamental elements: (1) the language model and (2) the creator of the language model.
The language model can ‘learn’ based upon the data it’s given. With a DNN, creators are not required to augment the code base when building or adding data, which is helpful in eliminating inherent biases.
Ultimately, more robust data sets yield a more accurate, more broadly applicable language model.
Delivering Enhanced Customer Experiences with Speech
ASR is a programmatic way to turn voice into text. Voices come in different dialects, languages, and with various levels of background noise.
A good ASR can turn the spoken word from a variety of languages and accents into readable, understandable text. Businesses can then use the text to strengthen decision-making and enhance customer experiences by serving a more diverse user base.
Ready to learn more about automatic speech recognition? Join Dan Miller, lead analyst at Opus Research, and Joe Hagan, chief product officer at LumenVox, on September 14 at 11:00 a.m. PT / 2:00 p.m. ET as they discuss what is required to deliver meaningful employee and customer experiences through voice channels. Register now for the webinar.
If your organization interacts with customers through speech applications, the quality of your speech recognition technology can make or break your CX.
In an ideal world, communicating with technology via speech would be as easy and natural as conversing with a human. This would make it so simple to access information and services remotely. It would also offer more independence to those who have no other option but voice user interfaces, such as young children who aren’t literate yet and people living with visual, motor or mobility impairments.
While some speech recognition technologies have made great strides toward these ideals, others still fall far below expectations. This raises the question: why do some speech recognition technologies work well, while others fail?
The reality is: human speech is complex and constantly changing.
The challenges faced by modern speech recognition tools
An Automatic Speech Recognition (ASR) engine’s job is to take speech and identify it as something meaningful. Some ASRs have transcription capabilities, which allow them to turn that meaning into something useful, like text.
Getting this right is actually an incredibly challenging process. Firstly, ASRs must keep pace with the fact that language is constantly changing. In 2021, for instance, Merriam-Webster added 520 new words and definitions to its American English dictionary.
Also, ASRs must be able to separate speech from background and environmental noise. This could be the sound of traffic, a busy shopping mall, or even the interference that occurs due to the quality of the microphone used.
Unfortunately, many ASRs are simply not capable of handling these variables efficiently.
How to solve these problems
All this considered, companies need to choose their ASR engines carefully when building or modernizing speech-enabled customer experiences.
There are many different types of ASR engines on the market. Ideally, you want one that:
Supports all dialects within a given language
Offers advanced artificial intelligence and machine learning capabilities for maximum accuracy
Is able to continually learn from real-world usage and expand the language model to serve a more diverse base of users
LumenVox ASR with Transcription: Next-generation speech recognition
Status-quo speech recognition engines don’t have the machine learning capabilities to manage all the differentials in natural human speech—certainly not with the accuracy users expect. This is where LumenVox’s new ASR engine changes the game.
The technology that sets the LumenVox ASR engine apart is its end-to-end Deep Neural Network (DNN) architecture and state-of-the-art natural language processing and understanding capabilities. This creates an ASR engine that serves a much more diverse base of users.
Whereas other ASR engines treat different dialects as separate languages, LumenVox’s new ASR Engine with Transcription supports multiple dialects with one language model. The model accounts for many different pronunciations within a single language, rather than having to be trained for each individual user. The end-to-end recognizer matches audio to the written word, regardless of accent or other factors that affect pronunciation.
Additionally, no matter where the call or audio is coming from, the LumenVox Speech Recognizer separates speech from background noise using Voice Activity Detection (VAD). This takes a range of qualities into consideration, including energy level (volume), frequency (pitch) and changes in duration, to accurately detect the actual speech.
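As a rough illustration of the idea (not LumenVox’s actual VAD, whose internals are not described here), the sketch below flags audio frames as speech using three simple cues that loosely mirror the qualities named above: RMS energy for volume, zero-crossing rate as a crude stand-in for frequency content, and a minimum run length for duration. The function names, the thresholds, and the 8 kHz sampling with 20 ms frames are all assumptions chosen for the example.

```python
import math

def detect_speech(samples, frame_len=160, energy_thresh=0.1, max_zcr=0.5, min_frames=3):
    """Return (start_frame, end_frame) pairs, end exclusive, that look like speech.

    All thresholds are illustrative assumptions, not tuned values.
    """
    flags = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        # Energy cue: root-mean-square amplitude of the frame.
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Frequency cue: fraction of adjacent sample pairs that change sign.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcr = crossings / (frame_len - 1)
        flags.append(rms >= energy_thresh and zcr <= max_zcr)

    # Duration cue: keep only runs of at least min_frames consecutive speech frames.
    segments, start = [], None
    for idx, is_speech in enumerate(flags + [False]):
        if is_speech and start is None:
            start = idx
        elif not is_speech and start is not None:
            if idx - start >= min_frames:
                segments.append((start, idx))
            start = None
    return segments

def tone(n_samples, freq_hz=200.0, rate_hz=8000, amplitude=0.5):
    """Synthetic sine tone used as a stand-in for voiced audio in the demo."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

silence = [0.0] * 800
signal = silence + tone(800) + silence   # 5 quiet frames, 5 tone frames, 5 quiet frames
print(detect_speech(signal))             # [(5, 10)]
```

A production detector would typically adapt its thresholds to the noise floor and use richer spectral features; the fixed thresholds here only show how the three cues combine.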
All this means that your speech solution can cater for a more diverse user base, in a broader range of scenarios, with market-leading accuracy.
Improve your speech application success rate with tuning
To get maximum value from your speech applications, LumenVox also offers an advanced tuning tool that does all the heavy lifting for you, making it far easier for you to manage tuning in-house (and avoid expensive professional service fees).
LumenVox’s Speech Tuner performs transcriptions, instant parameter and grammar tuning, and version-upgrade testing of any speech application, in less time and with less effort. This way, you can continually enhance speech recognition accuracy and build competitive advantage.
While there is room for improvement in the speech recognition technology landscape, the demand for voice-enabled solutions continues to grow. A study by National Public Media found that 52% of voice-assistant users say they use voice tech several times a day or nearly every day, compared to 46% before the pandemic.
If your company gets speech recognition right, you will be in a strong position to capitalize on this market growth.
With smart speakers and virtual assistants like Amazon Alexa, Apple’s Siri and Google Assistant part of our everyday lives, most of us understand the concept of voice-enabled technology. But how does speech recognition fit into this landscape and, more importantly, what value can it offer your business?
What is Speech Recognition?
The goal of speech recognition is to let people operate applications and devices, and access services, in a more natural and convenient way—using voice. This reduces reliance on clicking, tapping and typing. These manual approaches are not only more laborious but also exclude certain customers, such as those with motor disabilities who can’t use keyboards or other tactile devices.
The brain behind the modern speech recognition system is called an automatic speech recognizer (ASR) engine. This intelligent software is able to interpret spoken audio and convert it from a verbal format into a text format. This text then acts as a command to drive the next steps of your speech-enabled solution.
Decades of Development
Speech recognition technology is by no means a new concept, but it has evolved substantially since the mid-20th century. While today you can carry voice-enabled technology in your pocket, the first documented speech recognizer, launched in 1952, involved an entire room of electronics. Made by Bell Labs, this ‘Automatic Digit Recognition Machine’ was dubbed Audrey, and it could recognize the sound of spoken digits (zero through nine) when it was ‘adapted’ to the speaker, a ground-breaking achievement at the time.
In 2021, there are a great many speech recognition applications and devices available on the market. The more advanced ASRs, built on the foundations of artificial intelligence and deep neural networks, are able to recognize a diverse range of natural languages and dialects, spoken by millions of customers, with great accuracy. All this translates into a high-quality, friction-free automated user experience.
But the journey is far from over. Speech recognition is an ever-advancing field and the market for this technology continues to expand. Looking forward, experts predict that the global voice and speech recognition market will grow at a CAGR of 19.5% during 2021-2026.
Looking at it from another angle: in 2020, there were over 4 billion digital voice assistants being used around the world. In just four years, that number is expected to double. That means there could be more voice assistants on our planet than humans in the near future.
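The arithmetic behind these two projections can be sanity-checked with the standard compound-growth formula. The sketch below assumes five compounding years for the 2021-2026 CAGR figure, and the helper names are purely illustrative.

```python
def growth_multiple(annual_rate, years):
    """Total growth factor after compounding annual_rate for the given number of years."""
    return (1 + annual_rate) ** years

def implied_cagr(multiple, years):
    """Annual growth rate that produces the given overall multiple over the given years."""
    return multiple ** (1 / years) - 1

market_multiple = growth_multiple(0.195, 5)   # 19.5% CAGR, assumed to compound over 5 years
assistant_cagr = implied_cagr(2, 4)           # doubling in four years

print(f"{market_multiple:.2f}x")   # 2.44x
print(f"{assistant_cagr:.1%}")     # 18.9%
```

Under those assumptions, a 19.5% CAGR compounds to roughly 2.4 times the starting market size, and a doubling in four years corresponds to about 19% annual growth. The two figures measure different things (market value versus number of assistants), but they point to a similar pace of expansion.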
Improve efficiency: Organizations can use speech recognition to step up productivity and performance through a wide range of services, such as voice-activated banking or apps that allow users to compose messages verbally.
Enhance your IVR: With a well-chosen ASR, you can boost accuracy and speed within your IVR, reducing agent handling times and routing calls more efficiently to improve the overall customer experience.
Support analytics: You can automatically transcribe all verbal conversations in your contact center. This makes these interactions easier to analyze, whether you’re using automated sentiment analysis tools to gauge customer satisfaction levels or flagging common call patterns and issues for swift resolution.
Enable multi-tasking: Speech-enabled applications are hands-free. This way, your users can do other tasks (such as driving) while accessing your service. This improves usability and customer satisfaction.
Scale your reach: As with any automated technology, you can scale speech recognition rapidly without increasing human headcount. This makes it easier for you to expand into new markets or manage seasonal spikes in demand.
When you think about it, there are so many ways for your organization to integrate speech recognition into your solutions and services, to boost usability, save time and enhance CX.
LumenVox Automated Speech Recognizer – Speech Recognition, But Better
To harness these advantages and meet customer expectations, it’s vital that you choose a high-performing speech recognition engine. LumenVox’s new AI-driven ASR engine is unique in its ability to accurately recognize naturally spoken language and learn from real-world use for maximum ROI.
To explore what LumenVox can do for your business, request a demo.
In this video, we explore the basic types of ASR, providing a technical overview and looking at the fundamental inputs. We also explain the difference between speaker-dependent and speaker-independent speech recognition software.
Speech Recognition 101 – Part 2
Part two takes an in-depth look at the grammar component of speech recognition. The number one problem developers have is building good grammars, or modeling how users speak to applications. Find out how to overcome these hurdles with LumenVox.
The automation provided by Speech Recognition can save your business significant time and resources, with a tangible impact on your profitability. It can also revolutionize your customer experience by enabling self-service, enhancing the value offered in your contact center, and augmenting the usability of your speech applications.
All these attributes drive revenue growth. But if yours is the kind of organization that views success as a process rather than an end state, why let the advantages end there? With Speech Tuning, you can continually optimize your Speech Recognition capabilities.
Born out of the belief that no matter how good technology is, it can always get better, Speech Tuning is the process of continually improving applications, including Automatic Speech Recognition-based systems, after they have been deployed. While this may sound like a chore, it is better viewed as an excellent opportunity to ensure the efficacy of your applications, maintain your competitive edge and amplify your Speech Recognition ROI.
The reality is: everything around us is evolving at a continuous pace. This includes your customers, the world they live in, their language and, most importantly, their expectations. You simply can’t stand still and expect to remain relevant.
If you’re not familiar with the term, Speech Tuning is a technology-driven approach to refining the performance of your speech-enabled applications, based on data gathered from real-world use. The goal is to perpetually enhance recognition accuracy, with a direct impact on call completion rates, containment rates, user experience scores and other metrics that matter to your business.
Fast, Accurate and Powerful Tuning with the LumenVox Speech Tuner
This tool offers value on multiple levels. First of all, it is designed to drive a swift and seamless tuning process. This allows applications to be tuned in less time with less effort, which lowers the total cost of ownership (TCO) of your speech applications.
There are also benefits for your users. The LumenVox Speech Tuner allows you to find and improve issues that you might otherwise have overlooked. This improves CX and strengthens your brand credibility.
How Speech Tuning for Automatic Speech Recognition Works
Speech Tuning assesses how users interact with the system and tests changes against that real-world usage. The process takes time, but when it comes to speech, every millisecond counts. Even small improvements in application performance can produce a meaningful ROI within a short period of time.
The LumenVox Speech Tuner performs transcriptions, instant parameter and grammar tuning, and version-upgrade testing of any speech application. This reduces your workload during post-deployment application revisions. It also allows you to bring tuning in-house and thus avoid costly professional service fees.
The LumenVox Speech Tuner is up and running, maximizing ROI, in just three easy steps:
1. Data Import
First, you import call log data into the Speech Tuner database. All information stored by the call log is available in the Speech Tuner. The Call Indexer service automatically scans remote speech applications for fresh logged calls, ensuring key data is just a click away.
2. Speech Transcription
Then, transcribers type the text of the caller’s speech directly into the Transcriber. Once the audio is transcribed, the Speech Tuner compares the audio transcripts with the Speech Engine results to determine accuracy, greatly reducing errors associated with manual evaluations. The transcripts are evaluated using the actual decode grammar, producing measurements such as word error rates (WER), in- and out-of-grammar rates and semantic error rates.
3. Immediate Testing
Selecting an interaction in the Call Log automatically loads the associated audio and grammar into the Tester. The grammar can be edited, Speech Engine parameters set, and individual recognition tests generated. The Speech Tuner natively supports industry standard SRGS grammars. Once a set of possible changes is identified, users can batch test audio to evaluate performance, using those changes.
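To make the measurements in steps 2 and 3 more concrete, here is a minimal sketch (not the Speech Tuner’s implementation) of two of the metrics mentioned: word error rate, computed as word-level edit distance divided by the number of reference words, and in-grammar rate, computed against a deliberately tiny SRGS XML grammar. The grammar contents, the function names, and the assumption that a rule is a flat `one-of` list of phrases are all illustrative; real SRGS grammars also use rule references, repeats and weights, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

SRGS_NS = {"g": "http://www.w3.org/2001/06/grammar"}

# Hypothetical example grammar: one public rule listing three allowed phrases.
GRAMMAR_XML = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="command">
  <rule id="command" scope="public">
    <one-of>
      <item>check balance</item>
      <item>pay bill</item>
      <item>speak to an agent</item>
    </one-of>
  </rule>
</grammar>"""

def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions) / reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    prev = list(range(len(hyp) + 1))          # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def phrases_in_rule(grammar_xml, rule_id):
    """Collect the phrases of a flat one-of rule (simplifying assumption, see above)."""
    root = ET.fromstring(grammar_xml)
    rule = root.find(f"g:rule[@id='{rule_id}']", SRGS_NS)
    if rule is None:
        raise KeyError(f"no rule with id {rule_id!r}")
    items = rule.findall("g:one-of/g:item", SRGS_NS)
    return {" ".join((item.text or "").split()).lower() for item in items}

def in_grammar_rate(transcripts, phrases):
    """Fraction of human transcripts that the grammar could have matched exactly."""
    if not transcripts:
        raise ValueError("need at least one transcript")
    hits = sum(1 for t in transcripts if " ".join(t.split()).lower() in phrases)
    return hits / len(transcripts)

phrases = phrases_in_rule(GRAMMAR_XML, "command")
calls = ["pay bill", "Check balance", "cancel my account", "speak to an agent"]
print(word_error_rate("pay my phone bill", "pay my own bill"))   # 0.25
print(in_grammar_rate(calls, phrases))                           # 0.75
```

In this toy data set, the out-of-grammar utterance “cancel my account” is the kind of finding that would prompt a grammar edit, followed by a batch re-test of the stored audio to confirm the change helps.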
Ready to Reduce the TCO of Your Speech Applications?
The LumenVox Speech Tuner accelerates ROI by decreasing the time spent in tuning cycles. The more efficient your tuning process is, the more you’ll be able to decrease the Total Cost of Ownership (TCO) of your speech applications. The numbers are significant, with LumenVox clients documenting hundreds of thousands of dollars in savings per year, all as a result of fast, accurate and powerful speech tuning.
Interested in migrating off your legacy speech applications? Contact us!