Confidence threshold tuning is an often-overlooked part of speech tuning. While new speech developers may focus purely on accuracy, understanding how a speech application performs at a given confidence threshold is vital to understanding how the application works for users (see our free whitepaper Calculating Speech Recognition Accuracy for a detailed explanation of the relationship between accuracy and performance at a given threshold).
By using the Tuning Wizard and selecting Confidence Threshold Tuning as an issue to focus on, you will eventually come to the Confidence Threshold Results page.
Understanding Confidence Threshold Tuning
The goal of tuning confidence thresholds is to determine whether the application performance can be improved by selecting a different confidence threshold. If you have not read our Calculating Speech Recognition Accuracy whitepaper, here is a quick summary of the terms used in this tuning:
- A Correct Accept is when an answer was correctly recognized by the ASR and returned above the confidence threshold (meaning the application accepted the answer).
- A False Accept is when an answer was incorrectly recognized by the ASR (or was simply out of grammar) and returned above the confidence threshold (meaning the application accepted the answer).
- A False Reject is when an in-grammar answer (whether it was correctly or incorrectly recognized) was returned below the confidence threshold (meaning the application rejected the answer).
- A Correct Reject is when an answer that was out of grammar or out of coverage was returned below the confidence threshold (meaning the application rejected the answer).
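To make the four definitions concrete, here is a minimal Python sketch that classifies a single result. It is illustrative only: `Outcome` and `classify` are hypothetical names, not part of the Tuner, and the sketch assumes an answer counts as accepted when its confidence is at or above the threshold.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT_ACCEPT = "CA"
    FALSE_ACCEPT = "FA"
    FALSE_REJECT = "FR"
    CORRECT_REJECT = "CR"


def classify(in_grammar: bool, recognized_correctly: bool,
             confidence: int, threshold: int) -> Outcome:
    """Classify one result using the four definitions above.

    Assumption: the application accepts a result when confidence >= threshold.
    `recognized_correctly` is ignored for out-of-grammar/out-of-coverage answers.
    """
    if confidence >= threshold:
        # Only a correctly recognized in-grammar answer is a correct accept;
        # accepted misrecognitions and out-of-grammar answers are false accepts.
        if in_grammar and recognized_correctly:
            return Outcome.CORRECT_ACCEPT
        return Outcome.FALSE_ACCEPT
    # Rejected: an in-grammar answer is a false reject whether or not it was recognized correctly.
    if in_grammar:
        return Outcome.FALSE_REJECT
    return Outcome.CORRECT_REJECT


print(classify(in_grammar=True, recognized_correctly=False, confidence=800, threshold=500))
```

Note that an in-grammar answer the ASR got wrong still lands in FA or FR depending only on which side of the threshold it falls.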
We analyze in-grammar and out-of-grammar answers separately because the goal of confidence threshold tuning is to increase the number of in-grammar answers that are accepted and decrease the number of out-of-grammar answers that are accepted. For instance, if the false accept rate is very high, raising the confidence threshold can improve performance, because falsely accepted answers with low confidence scores would then be rejected. Doing so may also lower the correct accept rate, though, so tuning is a balance between the two.
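The trade-off is easy to see on a handful of results. The sketch below is hypothetical: the `Utterance` record, the six sample results, and the 0-1000 confidence scale are assumptions for illustration, and each rate is simply a fraction of all utterances (the Tuner may normalize rates differently, for example separately for in-grammar and out-of-grammar answers).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    in_grammar: bool            # hypothetical field: was the answer covered by the grammar?
    recognized_correctly: bool  # hypothetical field: did the ASR return the right answer?
    confidence: int             # assumed 0-1000 confidence scale


def outcome(u: Utterance, threshold: int) -> str:
    """Classify one result as "CA", "FA", "FR" or "CR" (accept if confidence >= threshold)."""
    if u.confidence >= threshold:
        return "CA" if u.in_grammar and u.recognized_correctly else "FA"
    return "FR" if u.in_grammar else "CR"


def rates(utterances: list[Utterance], threshold: int) -> dict[str, float]:
    """Fraction of all utterances in each outcome when the given threshold is applied."""
    counts = Counter(outcome(u, threshold) for u in utterances)
    total = len(utterances)
    return {k: counts[k] / total for k in ("CA", "FA", "FR", "CR")}


# Invented sample data, for illustration only.
sample = [
    Utterance(True, True, 900),
    Utterance(True, True, 650),
    Utterance(True, True, 400),
    Utterance(True, False, 350),   # in grammar, but misrecognized
    Utterance(False, False, 500),  # out of grammar
    Utterance(False, False, 200),  # out of grammar
]

for threshold in (0, 450):
    print(threshold, rates(sample, threshold))
```

On this made-up data, moving the threshold from 0 to 450 drops the false accept rate from 3/6 to 1/6, but the correct accept rate also falls from 3/6 to 2/6, because the correctly recognized answer with confidence 400 is now rejected.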
The Confidence Threshold Tuning Wizard lets you explore how different confidence thresholds affect your results.
Using the Confidence Threshold Tuning Wizard
From the main Confidence Threshold Results screen, the primary action is to use the dropdown to toggle between the baseline confidence thresholds (those present in the .callsre files you loaded; if none are present, a threshold of 0 is assumed) and the optimal threshold calculated by the Tuning Wizard.
By switching between these, you can see how your correct accept (CA), false accept (FA), false reject (FR), and correct reject (CR) rates are affected.
Selecting a value from the dropdown updates the performance charts to show the change between the currently selected value and the baseline and optimal values. This way you can see at a glance how your baseline differs from the optimal threshold and decide whether you want to implement a change.
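The comparison amounts to subtracting one set of rates from another. A minimal sketch, assuming each set of rates is a dictionary of fractions keyed by CA, FA, FR, and CR; `percentage_point_change` is a hypothetical helper, and the numbers below are invented for illustration, not Tuner output.

```python
RATE_KEYS = ("CA", "FA", "FR", "CR")


def percentage_point_change(selected: dict[str, float],
                            reference: dict[str, float]) -> dict[str, float]:
    """Change of each rate (selected minus reference) in percentage points, to 1 decimal place."""
    return {k: round((selected[k] - reference[k]) * 100, 1) for k in RATE_KEYS}


# Invented rates. Assuming results are accepted at or above the threshold,
# a baseline of 0 accepts everything, so its FR and CR rates are zero.
baseline = {"CA": 0.82, "FA": 0.18, "FR": 0.0, "CR": 0.0}
optimal = {"CA": 0.78, "FA": 0.06, "FR": 0.04, "CR": 0.12}

print(percentage_point_change(optimal, baseline))
```

In this made-up case, moving from the baseline to the optimal threshold gives up 4 points of correct accepts to remove 12 points of false accepts, which is exactly the kind of trade you would weigh before implementing a change.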
You can also click the Apply Filter and Exit button to leave the Tuning Wizard with a filter applied that limits the data to the menu/grammar sets you selected earlier in the wizard, if you want to review the data yourself.
The Confidence Threshold Tuning Wizard becomes especially powerful, however, in the expanded view.
Opening the expanded view shows a version of the Confidence Histogram (normally seen in the Tester view) that includes the baseline and optimal thresholds. You can click anywhere in the histogram to place a confidence marker.
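Conceptually, a confidence histogram counts results per confidence bucket, split by whether the answer was in grammar. A rough sketch, assuming a 0-1000 scale and 100-point buckets; the Tuner's actual bucketing and display are not described here, so `confidence_histogram` is a hypothetical helper and the sample data is invented.

```python
def confidence_histogram(results: list[tuple[bool, int]], bin_width: int = 100,
                         max_conf: int = 1000) -> tuple[list[int], list[int]]:
    """Count in-grammar and out-of-grammar results per confidence bin.

    `results` holds (in_grammar, confidence) pairs; confidence is assumed to be 0..max_conf.
    """
    n_bins = max_conf // bin_width
    in_grammar_counts = [0] * n_bins
    out_of_grammar_counts = [0] * n_bins
    for in_grammar, confidence in results:
        # Clamp so a score of exactly max_conf falls into the top bin.
        idx = min(confidence // bin_width, n_bins - 1)
        if in_grammar:
            in_grammar_counts[idx] += 1
        else:
            out_of_grammar_counts[idx] += 1
    return in_grammar_counts, out_of_grammar_counts


sample = [(True, 900), (True, 650), (True, 400), (True, 350),
          (False, 500), (False, 200), (True, 1000)]
ig, oog = confidence_histogram(sample)
for i, (a, b) in enumerate(zip(ig, oog)):
    print(f"{i * 100:>4}+  in-grammar {'#' * a:<3} out-of-grammar {'#' * b}")
```

Placing a marker then amounts to asking which bars fall at or above it (accepted) and which fall below it (rejected).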
Placing a marker tells the Tuner to simulate results with the confidence threshold set at that marker, and the performance comparison numbers update accordingly.
This lets you see how a threshold of, say, 750 compares to your baseline of 0 and the calculated optimal threshold of 430. Your application may have different priorities than the algorithm that calculates optimal thresholds (perhaps false accepts aren't a big deal to you, but correct accepts matter a lot), so the ideal threshold for your application may differ from the calculated one, and this view is a great way to fine-tune it.
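One way to encode your own priorities is to assign a cost to each kind of error and sweep candidate thresholds. The sketch below is not the Tuner's optimal-threshold algorithm, which this article does not describe; it is a simple weighted-cost search, and the `Utterance` record, the weights, the step size, the sample data, and the 0-1000 scale are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    in_grammar: bool            # hypothetical field: was the answer covered by the grammar?
    recognized_correctly: bool  # hypothetical field: did the ASR return the right answer?
    confidence: int             # assumed 0-1000 confidence scale


def outcome(u: Utterance, threshold: int) -> str:
    """Classify one result as "CA", "FA", "FR" or "CR" (accept if confidence >= threshold)."""
    if u.confidence >= threshold:
        return "CA" if u.in_grammar and u.recognized_correctly else "FA"
    return "FR" if u.in_grammar else "CR"


def best_threshold(utterances: list[Utterance], weights: dict[str, float],
                   step: int = 10, max_conf: int = 1000) -> int:
    """Return the candidate threshold with the lowest weighted error cost.

    `weights` maps "FA" and "FR" to the cost of each error; CA and CR cost nothing.
    """
    best_t, best_cost = 0, float("inf")
    for t in range(0, max_conf + 1, step):
        cost = sum(weights.get(outcome(u, t), 0.0) for u in utterances)
        if cost < best_cost:  # strict "<" keeps the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t


# Invented sample data, for illustration only.
sample = [
    Utterance(True, True, 900),
    Utterance(True, True, 650),
    Utterance(True, True, 400),
    Utterance(True, False, 350),
    Utterance(False, False, 500),
    Utterance(False, False, 200),
]

# Correct accepts matter most, so false rejects are expensive.
print(best_threshold(sample, {"FA": 1.0, "FR": 5.0}))  # -> 210
# False accepts are costly instead.
print(best_threshold(sample, {"FA": 5.0, "FR": 1.0}))  # -> 510
```

With the same data, penalizing false rejects keeps the threshold low, while penalizing false accepts pushes it much higher, which mirrors the kind of judgment the marker lets you make by hand.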
Tune Menus Individually
An important consideration is that confidence thresholds are best tuned on a per-menu basis. More than with any other Tuning Wizard (except perhaps Out-of-Grammar Frequency), confidence thresholds are very sensitive to differences between the menus that make up an application. For example, a simple yes/no menu may support a very high threshold, since it is easy for the ASR to discern between "yes" and "no."
However, a menu with a more complex, highly confusable grammar, such as a company directory with hundreds of names, may get better results with a lower confidence threshold (in general, larger grammars produce lower confidence scores). For this reason, we strongly recommend picking a menu to focus on before you reach the Confidence Threshold Results page.
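Tuning per menu amounts to running the same search separately on each menu's results. A minimal sketch that repeats a hypothetical weighted-cost search (equal weights for false accepts and false rejects, accept if confidence is at or above the threshold, 0-1000 scale); the menu names and scores are invented to mirror the yes/no versus directory example above and are not the Tuner's method.

```python
from collections import defaultdict


def outcome(in_grammar: bool, correct: bool, confidence: int, threshold: int) -> str:
    """Classify one result as "CA", "FA", "FR" or "CR" (accept if confidence >= threshold)."""
    if confidence >= threshold:
        return "CA" if in_grammar and correct else "FA"
    return "FR" if in_grammar else "CR"


def best_threshold(rows: list[tuple[bool, bool, int]], fa_weight: float = 1.0,
                   fr_weight: float = 1.0, step: int = 10, max_conf: int = 1000) -> int:
    """Lowest-cost threshold for one menu's (in_grammar, correct, confidence) rows."""
    weights = {"FA": fa_weight, "FR": fr_weight}
    best_t, best_cost = 0, float("inf")
    for t in range(0, max_conf + 1, step):
        cost = sum(weights.get(outcome(ig, ok, c, t), 0.0) for ig, ok, c in rows)
        if cost < best_cost:  # keep the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t


def per_menu_thresholds(utterances: list[tuple[str, bool, bool, int]]) -> dict[str, int]:
    """Group (menu, in_grammar, correct, confidence) results by menu and tune each separately."""
    by_menu: dict[str, list[tuple[bool, bool, int]]] = defaultdict(list)
    for menu, in_grammar, correct, confidence in utterances:
        by_menu[menu].append((in_grammar, correct, confidence))
    return {menu: best_threshold(rows) for menu, rows in by_menu.items()}


# Invented data: yes/no answers score high; directory names score lower.
data = [
    ("yes_no", True, True, 950), ("yes_no", True, True, 920),
    ("yes_no", False, False, 600), ("yes_no", False, False, 400),
    ("directory", True, True, 480), ("directory", True, True, 420),
    ("directory", True, False, 390), ("directory", False, False, 300),
]

print(per_menu_thresholds(data))
```

On this invented data the yes/no menu ends up with a threshold of 610 and the directory menu with 310; a single application-wide threshold could not suit both.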