There are two types of automatic speech recognition: Grammar ASR and Transcription ASR.
Grammar ASR uses a closed set of rules (a grammar) that defines every input the system will accept from the user. Think of the automated phone trees that offer several options to route your call to the right department.
Grammar ASR doesn’t need to recognize every possible word and number. The user is prompted with a closed set of options: “Say 1 for accounts receivable or 2 for all other inquiries.”
With grammar ASR, it’s possible to achieve extremely high accuracy because each utterance has only a few valid interpretations, which reduces the chance of error.
In grammar ASR, a good engine can exceed 96% accuracy, while a great one can reach 98 to 99%, which corresponds to a word error rate (WER) of 2% or less.
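As a rough illustration of why a closed grammar helps, here is a minimal Python sketch. It assumes a hypothetical upstream recognizer has already produced a text hypothesis, and it snaps that hypothesis to the closest phrase in a small, hypothetical menu grammar using `difflib` from the standard library. Real grammar ASR engines apply the grammar during decoding rather than matching text afterward, so treat this only as a model of the idea: a misheard word has just a handful of valid targets, and anything too far from all of them is rejected.

```python
from difflib import SequenceMatcher

# Hypothetical closed grammar for the two-option phone menu.
# Each accepted phrase maps to a menu choice.
MENU_GRAMMAR = {
    "one": "accounts_receivable",
    "1": "accounts_receivable",
    "accounts receivable": "accounts_receivable",
    "two": "other_inquiries",
    "2": "other_inquiries",
    "other": "other_inquiries",
}


def match_grammar(hypothesis, grammar, threshold=0.6):
    """Return the menu choice whose phrase best matches the hypothesis.

    Returns None when no phrase is similar enough, meaning the input is
    out of grammar and the caller should be re-prompted.
    The threshold value is an illustrative assumption, not a tuned setting.
    """
    text = hypothesis.lower().strip()
    best_phrase, best_score = None, 0.0
    for phrase in grammar:
        # Character-level similarity in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, text, phrase).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_score < threshold:
        return None
    return grammar[best_phrase]
```

For instance, a misrecognized `"too"` still lands on the second option because it is closest to `"two"`, while input that resembles none of the phrases returns `None`. The small search space is what keeps the error rate low.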
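WER is the usual way to express these figures. It counts the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to turn the engine’s output into a reference transcript, divided by the number of words in the reference (N): WER = (S + D + I) / N. The sketch below computes it in Python with a word-level edit distance; the function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Compute WER = (substitutions + deletions + insertions) / reference words.

    Uses word-level Levenshtein (edit) distance via dynamic programming.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

With one misrecognized word in a five-word reference, such as `word_error_rate("say one for accounts receivable", "say won for accounts receivable")`, the WER is 0.2 (20%). Because insertions can push WER above 100%, reading accuracy as 1 − WER is a common approximation rather than a strict identity.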
Transcription ASR is much more challenging. In transcription ASR, the engine has to recognize any word in the language, spoken in any dialect.
Achieving high levels of accuracy in Transcription ASR requires a language model that covers every regional dialect. It is a massive data problem.
Many leading Transcription ASR engines have been stuck below 90% accuracy for years. You can compare the accuracy of leading ASR engines in The Speech Recognition Buyer’s Guide.
Recently, deep neural networks (DNNs) have produced stunning breakthroughs in transcription ASR. DNN-based engines can achieve transcription accuracy above 90%, which corresponds to a WER below 10%.
The key to understanding the two types of speech recognition is that neither is inherently better than the other. The right choice depends on what you want to do with it.
Grammar ASR vs. Transcription ASR
Grammar ASR tends to be the best choice for interactive voice response (IVR) systems.
If you have a limited set of options you want to give a caller, you’ll want to use grammar ASR.
Many companies can help you design and implement an IVR. We list some of the best on the LumenVox Partners page.
If you want to use automatic speech recognition to transcribe live or recorded audio, you’ll want Transcription ASR. Voicebots also use Transcription ASR.
Transcription ASR is designed to recognize and transcribe words from the entire language, not just a predefined list.
Speech recognition and regional accents
If your system needs to understand a range of dialects and accents, you should use a Transcription ASR engine that uses deep neural networks trained on large, all-encompassing data. Such an engine should handle the variability without limiting the number of ways a word can be pronounced. This approach is more efficient for your business than deploying a separate language model for every dialect.
Whether you need grammar ASR, transcription ASR, or a hybrid of these two types of speech recognition, LumenVox can help. Book a free consultation to discuss which type of speech recognition is right for your business.