LumenVox Luminaries is a podcast that broadcasts thought leadership pieces on the subject of voice technology. This episode features Jason Kawakami, LumenVox Senior Sales Engineer, outlining the sophisticated speech science, functions, and benefits of LumenVox Transcription Engine. You can follow Jason on LinkedIn here.
Listen to the Intersection of Speech Science and Transcription Podcast below:
Read the Transcript
So this is a dive into one particular component of our speech products. LTE or LumenVox Transcription Engine is part of the ASR component of our speech suite.
Q: What is LumenVox Transcription Engine?
What the transcription engine does is deliver transcribed text that represents the decoded speech. So we take in an utterance. We process it against an unconstrained grammar called a Statistical Language Model, or SLM. We take that text and provide it to a downstream piece of technology, which might be an existing trained AI model, or it might be fed into any number of things.
Q: What’s the primary use case for LumenVox Transcription Engine?
The primary use case today, in our case, has been Natural Language IVRs. The application, the SLM, all of the bits of this are focused on that middle of the road. We’re providing a supporting technology to IVRs that are providing natural language applications.
A straight-out-of-central-casting use case: speech-enabling a chatbot.
Q: How does LumenVox Transcription Engine work?
We process the audio against a Statistical Language Model, an SLM instead of a traditional grammar. The traditional grammar is a constrained search space. The Statistical Language Model is a big giant search space that is focused on the spoken language in a particular unique language whether that’s English, Spanish, or other. Now that mathematical model predicts what is going to be spoken next, and that prediction is used to narrow the search space.
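As a rough illustration of how a statistical model can predict what is likely to be spoken next and so narrow the search space, here is a minimal sketch of a bigram model. It is a toy stand-in for the idea, not the LumenVox SLM: the tiny corpus and the function names `train_bigram_model` and `next_word_candidates` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" marks the start of an utterance.
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_candidates(counts, prev_word, top_k=3):
    """Return up to top_k likely next words with their estimated probabilities."""
    followers = counts.get(prev_word.lower())
    if not followers:
        return []
    total = sum(followers.values())
    return [(word, count / total) for word, count in followers.most_common(top_k)]


# Hypothetical mini-corpus of IVR-style utterances (illustrative only).
corpus = [
    "i want to check my balance",
    "i want to pay my bill",
    "i need to check my order",
    "i want to speak to an agent",
]

model = train_bigram_model(corpus)
candidates = next_word_candidates(model, "i")
print(candidates)
```

After hearing "i", the toy model only has to consider a couple of likely continuations instead of every word it knows. A production SLM works with far larger vocabularies and longer contexts, but the narrowing effect is the same in spirit.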
This is speech science. This is really high-end science that is done on how languages are spoken and what phonemes come out for what. This is very, very high-level, complex computational science.
Q: What’s the primary use case for LTE?
Our SLM is tuned for general, typical conversation. We’ve talked about how the primary use case for our LTE is to feed downstream AI-based processes. The straight-up middle of the road? We’re thinking about NLU IVR. This is some type of telephony solution that requires decoding a spoken utterance and providing that text to some type of AI to determine the meaning. Another example is taking that audio, either from the IVR or from the conversation between the agent and a customer, and feeding it into an AI engine that is specifically tuned and trained to detect sentiment.
Q: What are other mainstream use cases?
Another potential here, as we start getting into the shoulders of the road, is agent-assist applications, so listening to agent conversations in real-time and processing their audio, processing both their leg and the consumer’s leg–and maybe training a model to key in and be integrated with the company’s knowledgebase; that knowledgebase is providing or prompting the agent with particular articles out of the knowledgebase that will assist them with what the consumers are asking them for.
As we move farther outside of the mainstream, beyond that middle-of-the-road use case, there is using LTE to support speech-to-text applications: true transcription applications, as the word transcription is used by normal people, not necessarily speech industry people. So think note-taking. Note-taking is a big deal in lots of industries. A few that stick out are the medical industry and the legal industry. The transcription engine could be applied to an application that is doing verbal note-taking in medical cases, and in legal the same way.
The other one I was thinking about–dispatch apps. There’s lots and lots of mobile workforces these days and mobile workforces are becoming more prevalent with the world that we’re in. People are dispatching service vehicles to your home instead of you taking your car down to some type of central garage to get fixed. Every one of those activities has the standard “did I complete my thing?” and “how much time did I spend?” And there’s oftentimes notes that are associated with those trouble tickets or those service tickets. Our LTE could be used to support taking those notes spoken and pushing them into a text-based system to feed it to analytics applications. And we can use LTE to provide the words to an application that is providing the analysis and the real meanings of the words that are being spoken.
Providing a highly accurate speech attendant to enrich the customer experience and replace Nuance’s discontinued automated attendant solutions.
Contact center solutions developer Interactive Northwest, Inc. (INI) has partnered with LumenVox to create a powerful replacement for Nuance’s automated attendant solutions.
As the date draws near for the end of support for Nuance’s SpeechAttendant® and Open SpeechAttendant® products, organizations are searching for a new speech attendant solution partner. INI’s Interactive Speech Attendant (ISA) is a robust virtual voice attendant that provides callers with a convenient way to reach people within an organization by acting as a single point of access to the corporate directory.
Powered by LumenVox Automated Speech Recognizer (ASR), INI Interactive Speech Attendant (ISA) delivers a natural user experience that promotes higher customer satisfaction and better corporate branding. INI ISA’s full-featured administration dashboard puts organizations in control, allowing for highly customized handling of inbound calls. The integration of LumenVox’ Automatic Speech Recognizer enables greater recognition accuracy, resulting in a dramatically improved user experience. Speech cuts to the chase. It allows callers to bypass pressing numbers or routing to an operator by simply saying the name, department, or conference room. Upon recognition, the caller is transferred to the correct party. This integration of speech recognition transforms touchtone mazes into natural, intuitive call flows.
INI Interactive Speech Attendant provides a friendly and cost-effective solution to call routing. Its easy-to-use web-based administration interface simplifies the configuration of nicknames and aliases, department and location-based transfers, business hours logic, alert messaging, and more.
“INI’s Interactive Speech Attendant is proving itself to be a cornerstone of business operations. The INI ISA promotes operational efficiency, reduces management costs and creates the ideal, frictionless service experience that users need,” said Edward Miller, CEO of LumenVox.
“CX-focused organizations understand there is only one chance to make a first impression,” said Danette Craig, President of INI. “INI’s Interactive Speech Attendant enables companies to realize the best caller experience with a natural, user-friendly interface all while automating the front desk function.”
2020 has fundamentally transformed the way we live and work. Organizations, and the technology they use, are adapting at a record pace. The key to successful evolution is organizational agility. Here are four trends that will propel tech into 2021:
Shift to Total Experience (TX). You may have heard the term “multiexperience,” which refers to the shift from a singular screen and keyboard to a multimodal digital world, where technology surrounds you, going from your laptop to your phone to your tablet to your desktop and back again. TX takes that to another level; it keeps in mind each and every player in the experience game—employees, customers, users. Historically, these players have had immersive digital experiences, but they were separate. Imagine how much easier things would be if they were the same? That’s TX. At LumenVox, we are building our speech and authentication technology stack with multichannel and multimodal capabilities to address this demand in the remote workforce and contact center.
In a remote world, a TX strategy will create a strong competitive advantage, as organizations with TX will outperform competitors over the next three years.
Invest in Artificial Intelligence. Artificial intelligence is first and foremost a discipline, with any success stemming from an intense daily grind. However, that effort doesn’t always pay off. According to Gartner research, only 53% of projects make it to production, making it essential that companies put the extra time and engineering effort into AI projects. That’s why at LumenVox, Artificial Intelligence has always been a central component of our technology. Our speech recognition and voice biometric algorithms are built upon AI and Machine Learning principles so that our technology continuously evolves. LumenVox also has invested heavily in Research & Development, so we can deliver Artificial Intelligence technology which is robust, scalable, and easily deployed.
Cloud Enablement. At LumenVox, we have been ensuring our technology responds to the demands of multi-cloud environments so that organizations do not have to rely solely upon in-house IT infrastructure. Instead, they can provide their customers with LumenVox software via the cloud. In a post-COVID world, this is critical, as the world now runs remotely.
Prioritize Privacy. Privacy has always been a priority, but now that priority is marked urgent. Data protection legislation is maturing. In 2020 California finally implemented the California Consumer Privacy Act, signed into law in 2018. The CCPA is one of several state-mandated regulatory policies the US has implemented. Similar regulations appear in other countries whose model has been Europe’s GDPR rules protecting consumer privacy. This requires organizations to put more stringent controls in place to guard against threats and to protect users’ privacy. LumenVox provides advanced voice biometrics to add robust layers of security for organizations of all shapes and sizes, safeguarding sensitive business and consumer information.
In summary, it’s important for any business to always look ahead. The more a business looks toward the future in tech, the more it can not just adapt and survive, but also thrive. LumenVox’ chief aim for 2021 is to assist businesses in their advancement by directly addressing these trends in voice technology.
Learn more about our full suite of technology here.
LumenVox, a leading global provider of speech and authentication technologies, has officially partnered with the Austrian company fms/Austrosoft. fms/Austrosoft is the technology leader in elaborate dispatch systems for taxi and car rental services in Europe and has integrated LumenVox’ voice technology into fms Callbot for networked and digital telephone ordering.
fms/Austrosoft has been working for more than 35 years to continuously improve the dispatching service by optimizing the ordering experience for customers. In doing so, the company, just like the mobility industry, is constantly facing new challenges that are mastered with its own know-how and a technology partner like LumenVox.
The hardware and software solutions from fms/Austrosoft are aimed at all participants in the dispatching process. This includes the initial booking by the ride customer, the dispatching by the taxi center and the final order processing by the driver. To take the initial booking interaction via taxi call centers to the next level, the IT experts at fms/Austrosoft developed the fms Callbot with integrated voice solutions from LumenVox.
This is the digital agent for the 21st century, enabling networked and digital telephone ordering to provide the passenger with a completely new ordering experience. Whether it is an instant order or a pre-order, the fms Callbot seamlessly transitions the passenger from the analogue to the digital world.
As an official reseller of LumenVox, fms/Austrosoft offers its customers direct access to LumenVox’s entire voice and authentication portfolio with the fms Callbot. This includes:
LumenVox is pleased to announce that we have partnered with jtel Germany, a leading provider of advanced contact center solutions. This alliance leverages jtel’s solutions with LumenVox’ complementary, best-in-class speech technology.
LumenVox and jtel focus on delivering outstanding customer experiences through the contact center. Operating mainly in Europe and providing professional support in Germany, Austria and Switzerland, jtel specializes in contact center automation development and omni-channel interactions. With jtel’s long-standing commitment to streamlining customer interactions, increasing productivity, and creating deeper customer relationships for businesses around the world, this partnership will accelerate and expand the use of speech technology to improve operational efficiency, increase customer satisfaction and reduce fraud for organizations around the globe.
As a recognized partner of LumenVox, jtel offers its customers streamlined access to LumenVox’ entire speech and authentication portfolio, including:
Call Progress Analysis, which allows the recognizer to analyze speech and automatically discern appropriate timing for outbound telephony campaigns and notifications.
Voice Password, an active voice biometrics platform which authenticates customers using text-dependent voice biometrics.
Passive Voice Biometric Authentication, a passive voice biometrics platform which authenticates customers independent of a phrase, so it can occur naturally in conversation with a live agent.
Fraud Scanner, a cutting-edge fraud detection tool to identify fraudulent activity using state of the art voice biometrics.
As a jtel partner, LumenVox can now provide skills and technology to deliver outstanding customer service to any organization. The partnership ensures consistent integration and exceptional support to meet the unique and individual needs of jtel customers.
The development of Automatic Speech Recognition techniques continues to accelerate. Already an established technology, Automatic Speech Recognition is growing by leaps and bounds each year, especially as artificial intelligence contributes to its evolution. A crucial building block of artificial intelligence is deep learning.
What is Deep Learning?
Deep learning refers to the process of a computer model learning how to do classification tasks by example, directly from audio, text, or images. These models are trained using very large sets of data and neural network topologies with many hidden layers, to which the word “deep” refers. Deep Neural Networks can achieve state-of-the-art performance in many different fields, even exceeding human-level performance on some of them.
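To make "many hidden layers" a little more concrete, here is a minimal sketch of a forward pass through a small network with two hidden layers that turns input features into class probabilities. The weights are hand-picked for illustration rather than learned from data, and every name here (`dense`, `forward`, the layer sizes) is an assumption for this example, not LumenVox code.

```python
import math


def relu(values):
    """Keep positive activations, zero out negative ones."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(features, hidden_layers, output_layer):
    """Pass features through each hidden layer, then classify with softmax."""
    activations = features
    for weights, biases in hidden_layers:
        activations = relu(dense(activations, weights, biases))
    weights, biases = output_layer
    return softmax(dense(activations, weights, biases))


# Illustrative, untrained weights: 2 inputs -> 2 hidden -> 2 hidden -> 2 classes.
hidden_layers = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
]
output_layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

probabilities = forward([1.0, 0.5], hidden_layers, output_layer)
print(probabilities)
```

In a real deep learning system the weights are learned from very large sets of examples and there are many more layers and units, but each layer performs this same kind of transformation.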
What are Neural Networks?
More specifically, neural networks are a series of algorithms, whose job it is to identify relationships within a set of data, a process that simulates the way a human brain identifies underlying connections. When it comes to speech technology, neural networks enable us to push the limits of speech recognition.
Which Neural Network for Automatic Speech Recognition?
Deep Neural Networks are transforming the way humans interact, playing an important role in the technological revolution of artificial intelligence. At LumenVox, our Research and Development team is currently utilizing Time-Depth Separable Convolutional Neural Networks (TDS CNN).
Convolutional Neural Networks are advantageous for a few reasons: They are computationally efficient, making them highly useful for mobile applications, and they have fewer knobs to toy with, fewer parameters to adjust. That means LumenVox customers get an ASR engine with greater speech recognition accuracy without requiring more compute performance, encouraging greater efficiency and performance.
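The "fewer parameters" point can be illustrated with a little arithmetic. The sketch below compares the weight count of a standard 1D convolution over time with that of a time-depth separable convolution, following the publicly described TDS design, in which the input is viewed as time x feature width x channels and the kernel spans only time and channels. The sizes used (kernel width 21, 10 channels, feature width 80) are assumptions chosen for illustration, not LumenVox's actual model configuration, and biases are ignored.

```python
def standard_conv1d_params(kernel_width, channels, feature_width):
    """Weights in a 1D convolution over time that mixes all w*c input features."""
    flat = channels * feature_width
    return kernel_width * flat * flat


def tds_conv_params(kernel_width, channels):
    """Weights in a k x 1 time-depth separable convolution that mixes only the c channels."""
    return kernel_width * channels * channels


# Hypothetical layer sizes, chosen only to show the scale of the difference.
k, c, w = 21, 10, 80
standard = standard_conv1d_params(k, c, w)
separable = tds_conv_params(k, c)
print(standard, separable, standard // separable)
```

With these assumed sizes the separable convolution needs w squared (6,400) times fewer convolution weights. A full TDS block also contains fully connected layers with their own parameters, so the saving for a whole network is smaller than this single-layer comparison suggests.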
LumenVox’ deep learning technology is applied to many of our technologies, including Automatic Speech Recognizer, Natural Language Processing, and Voice Biometrics. To learn more about our comprehensive stack, or to take an even deeper dive into deep learning, contact us today!