Utterance accuracy measures how often in-grammar utterances return the correct result from the ASR. It is defined as the number of correct answers (including semantic matches) divided by the total number of in-grammar results. High accuracy is desirable because it means the ASR returns the correct answer when users speak in-grammar utterances. Using the Utterance Accuracy Tuning Wizard, you can identify menus or grammar sets with lower accuracy measurements and pinpoint the exact utterances that are being incorrectly recognized.
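As a rough illustration of the definition above, the calculation can be sketched in a few lines of Python. This is not the Tuner's own code: the `Result` record and its field names are hypothetical, and the sketch assumes each result has already been labeled as in-grammar or not and, where relevant, as a semantic match.

```python
from dataclasses import dataclass

@dataclass
class Result:
    # Hypothetical record; the field names are assumptions for illustration.
    transcript: str               # what the user actually said (human transcription)
    recognized: str               # what the ASR returned
    in_grammar: bool              # True if the transcript is covered by the grammar
    semantic_match: bool = False  # True if the text differs but the meaning matches

def utterance_accuracy(results):
    """Correct answers (plus semantic matches) divided by in-grammar results."""
    in_grammar = [r for r in results if r.in_grammar]
    if not in_grammar:
        return None  # accuracy is undefined without in-grammar results
    correct = sum(
        1 for r in in_grammar
        if r.transcript == r.recognized or r.semantic_match
    )
    return correct / len(in_grammar)

results = [
    Result("yes", "yes", True),
    Result("yeah", "yes", True, semantic_match=True),
    Result("back", "fax", True),            # misrecognition
    Result("operator", "operator", True),
    Result("um hello", "help", False),      # out-of-grammar, excluded
]
print(utterance_accuracy(results))  # 3 correct out of 4 in-grammar results
```

Note that the out-of-grammar utterance does not count against accuracy, since the definition only considers in-grammar results.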
The Utterance Accuracy results page shows the most common utterance errors (where one utterance was recognized for another).
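To make the idea of "most common utterance errors" concrete, here is a minimal Python sketch that tallies how often one utterance was recognized as another. It assumes you have a list of (transcript, recognized) text pairs; the function name and the sample data are made up for illustration and do not reflect how the Tuner stores or computes its results.

```python
from collections import Counter

def most_common_utterance_errors(pairs, top=10):
    """Count (transcript, recognized) pairs that differ, most frequent first."""
    errors = Counter((said, heard) for said, heard in pairs if said != heard)
    return errors.most_common(top)

# Hypothetical sample data: (transcribed text, recognized text)
pairs = [
    ("back", "fax"),
    ("back", "fax"),
    ("yes", "yes"),
    ("oh", "four"),
    ("back", "back"),
]
for (said, heard), count in most_common_utterance_errors(pairs):
    print(f"{said} recognized as {heard}: {count}")
```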
The Word Errors tab lets you focus on the individual words within utterances that were most commonly misrecognized.
While you will usually want to focus on utterance errors, looking at word errors can be very useful in cases where utterances are made up of long strings of repeated words, such as credit card numbers.
Regardless of whether you are looking at word or utterance errors, it is vital to have transcribed text for utterances. To find errors, the Tuner compares the transcribed text to the recognized text, so without transcripts it cannot find errors.
The format for displaying errors depends on the type of error:
- A substitution error is when one word is recognized as another. It is indicated with an arrow symbol made of a hyphen and a greater-than sign: TRANSCRIPT TEXT -> RECOGNIZED TEXT. For instance, if the user said "Oh" and "Four" was recognized, it would be shown as OH->FOUR.
- A deletion error is when the ASR leaves out a word. It is indicated by the word deletion: TRANSCRIPT TEXT deletion. For instance, if the user said "One two three" and "One three" was recognized, it would be shown as ONE deletion THREE.
- An insertion error is when the ASR adds a word. It is indicated by the word insertion: TRANSCRIPT TEXT insertion. For instance, if the user said "One two" and "One two three" was recognized, it would be shown as ONE TWO insertion.
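The three error types above can be found by aligning the transcript with the recognized text word by word. The sketch below uses a standard minimum edit distance alignment; this is an assumption about how such a comparison could be done, not a description of the Tuner's actual algorithm, and the function name and tuple format are illustrative only.

```python
def align_words(transcript, recognized):
    """Return a list of (error_type, transcript_word, recognized_word) tuples."""
    ref = transcript.upper().split()
    hyp = recognized.upper().split()
    n, m = len(ref), len(hyp)

    # cost[i][j] = minimum number of edits to turn ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion (ASR left a word out)
                cost[i][j - 1] + 1,             # insertion (ASR added a word)
            )

    # Walk back from the end of both word lists to recover the errors.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                errors.append(("substitution", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            errors.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            errors.append(("insertion", None, hyp[j - 1]))
            j -= 1
    errors.reverse()
    return errors

print(align_words("oh", "four"))
print(align_words("one two three", "one three"))
print(align_words("one two", "one two three"))
```

Applied to the three examples in the list, the sketch reports a substitution of OH by FOUR, a deletion of TWO, and an insertion of THREE. When several alignments have the same cost, this version prefers substitutions over deletions and deletions over insertions; a different tie-breaking rule could label some errors differently.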
Clicking Apply Filter and Exit will exit the Wizard and filter to only utterances that are incorrect.
Improving Utterance Accuracy
Before you spend too much time worrying about utterance accuracy, consider that utterance accuracy by itself is often not the correct thing to focus on first, even though it is the most obvious. A much more sophisticated and useful exercise is to do a Confidence Threshold Analysis, which considers both accuracy and confidence scores. Please read our free whitepaper, Calculating Speech Recognition Accuracy, which discusses these concepts in great detail.
When you are ready to improve accuracy, there are a few strategies to employ:
- Focus on the most commonly misrecognized utterances first.
- Listen to the audio for these utterances. Is the audio clear? Many accuracy problems are due to bad audio.
- Is there "confusability" in the grammar? If two entries sound very similar, e.g. "back" and "fax," they are likely to be misrecognized for one another. Try picking grammar items that are phonetically distinct to improve accuracy.
- Use the Grammar Editor to review the grammars and ensure that they do not contain more items than necessary. Unneeded grammar choices can decrease accuracy and should be removed.
- Check that the misrecognized words are in the ASR's internal dictionary using the Pronunciation Checker tool. Open them in the Phonetic Speller and verify that their pronunciations are good; write custom ASR lexicons if they are not.
- Add weights to phrases that are commonly misrecognized.
- Consider changing the speed versus accuracy setting to favor higher accuracy and slower speed.
After you have made improvements to grammars, lexicons, or ASR settings, be sure to test your changes in the Tester. Run new decodes using your new grammars/settings and consider re-running the confidence threshold and utterance accuracy wizards to understand how your changes have affected your results.