The Intersection of Speech Science and Transcription – LumenVox Luminaries Podcast


LumenVox Luminaries is a podcast that broadcasts thought leadership pieces on the subject of voice technology. This episode features Jason Kawakami, LumenVox Senior Sales Engineer, outlining the sophisticated speech science, functions, and benefits of the LumenVox Transcription Engine. You can follow Jason on LinkedIn.

Read the Transcript

So this is a dive into one particular component of our speech products. LTE, or LumenVox Transcription Engine, is part of the ASR component of our speech suite.

Q: What is LumenVox Transcription Engine?

What the transcription engine does is deliver transcribed text that is representative of the decoded speech. So we take in an utterance. We process it against an unconstrained grammar called a Statistical Language Model, or SLM. We take that text and provide it to a downstream piece of technology, which might be an existing trained AI model, or it might be fed into any number of things.

Q: What’s the primary use case for LumenVox Transcription Engine?

The primary use case today, in our case, has been Natural Language IVRs. The application, the SLM, all of the bits of this are focused on that middle of the road. We’re providing a supporting technology to IVRs that are providing natural language applications.

A straight-out-of-central-casting use case is speech-enabling a chatbot.

Q: How does LumenVox Transcription Engine work?

We process the audio against a Statistical Language Model, an SLM, instead of a traditional grammar. The traditional grammar is a constrained search space. The Statistical Language Model is a much bigger search space that is focused on how one particular language is spoken, whether that’s English, Spanish, or another. Now, that mathematical model predicts what is going to be spoken next, and that prediction is used to narrow the search space.
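To make the idea of "predicting what is going to be spoken next" concrete, here is a minimal sketch of a bigram model, one of the simplest kinds of statistical language model. It is not the LumenVox engine or its API. The toy corpus, the function names `train_bigrams` and `predict_next`, and the `top_k` parameter are all assumptions made for illustration, and a production SLM would be trained on far more data with far richer modeling.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the large body of
# conversational text a real SLM would be trained on.
CORPUS = [
    "i want to check my balance",
    "i want to pay my bill",
    "i need to check my order",
    "i want to talk to an agent",
]

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_word, top_k=3):
    """Return up to top_k likely next words with estimated probabilities.

    Keeping only the most likely followers is a simple stand-in for
    narrowing the search space the recognizer has to consider.
    """
    followers = counts.get(prev_word)
    if not followers:
        return []
    total = sum(followers.values())
    return [(word, n / total) for word, n in followers.most_common(top_k)]

model = train_bigrams(CORPUS)
print(predict_next(model, "to"))
```

In this sketch, after hearing "to" the model ranks "check" ahead of the alternatives because it followed "to" most often in the corpus. A recognizer could use that kind of ranking to favor likely words when the audio alone is ambiguous.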

This is speech science. This is really high-end science that is done on how languages are spoken and what phonemes come out for what. This is very, very high, complex computational science.

Q: What’s the primary use case for LTE?

Our SLM is tuned for general, typical conversation. We’ve talked about how the primary use case for our LTE is to feed downstream AI-based processes. The straight-up middle of the road? We’re thinking about NLU IVR. This is some type of telephony solution that requires the decoding of a spoken utterance and that text being provided to some type of AI to determine the meaning. Another example is taking that audio, either from the IVR or from the conversation between the agent and a customer, and feeding it into an AI engine that is specifically tuned and trained to detect sentiment.

Q: What are other mainstream use cases?

Another potential here, as we start getting into the shoulders of the road, is agent-assist applications: listening to agent conversations in real time and processing their audio, both their leg and the consumer’s leg, and maybe training a model to key in on that text and be integrated with the company’s knowledge base. That knowledge base then prompts the agent with particular articles that will assist them with what the consumers are asking for.
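As a rough illustration of the agent-assist idea, the sketch below takes transcribed text from the customer’s leg of a call and suggests knowledge base articles by simple keyword overlap. Everything here is hypothetical: the `KNOWLEDGE_BASE` contents, the `suggest_articles` function, and the matching rule are assumptions for the example, and a real deployment would more likely use a trained model than keyword matching.

```python
import re

# Hypothetical knowledge base: article title -> keywords (illustrative only).
KNOWLEDGE_BASE = {
    "How to reset your password": {"password", "reset", "login", "locked"},
    "Understanding your monthly bill": {"bill", "charge", "invoice", "payment"},
    "Returning a product": {"return", "refund", "exchange", "product"},
}

def tokenize(text):
    """Lowercase the transcript and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def suggest_articles(transcript, knowledge_base, min_overlap=1):
    """Rank articles by how many of their keywords appear in the transcript."""
    words = tokenize(transcript)
    scored = []
    for title, keywords in knowledge_base.items():
        overlap = len(words & keywords)
        if overlap >= min_overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]

# Text as it might arrive from a transcription engine (made-up example).
customer_leg = "I was locked out and I need to reset my password"
print(suggest_articles(customer_leg, KNOWLEDGE_BASE))
```

The transcription step and the suggestion step stay separate in this sketch: the engine only supplies the words, and the downstream application decides what they mean and which article to surface.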

As we move farther outside of the mainstream, that middle-of-the-road use case, there is potentially using LTE to support speech-to-text applications: true transcription applications, as the word transcription is used by normal people, not necessarily speech industry people. So think note-taking. Note-taking is a big deal in lots of industries. A few that stick out are the medical industry and the legal industry. The transcription engine could be applied to an application that is doing verbal note-taking in medical cases, and legal the same way.

The other one I was thinking about is dispatch apps. There are lots and lots of mobile workforces these days, and mobile workforces are becoming more prevalent with the world that we’re in. People are dispatching service vehicles to your home instead of you taking your car down to some type of central garage to get fixed. Every one of those activities has the standard “did I complete my thing?” and “how much time did I spend?” And there are oftentimes notes that are associated with those trouble tickets or those service tickets. Our LTE could be used to support taking those spoken notes and pushing them into a text-based system to feed analytics applications. And we can use LTE to provide the words to an application that is providing the analysis and the real meanings of the words that are being spoken.
