
Confidence Scores and Thresholds

Reference Number: AA-01950

Each utterance file contains the confidence threshold that was used during recognition. (This value was added in LumenVox 12.2; see below for information on working with .callsre files that do not have a confidence threshold value.) The confidence threshold is the confidence value above which the ASR returns a result and below which it returns a "No-Match," indicating that the speech application should reject the utterance. For this reason it is sometimes called the accept/reject threshold: an utterance scoring above the threshold is accepted, and one scoring below it is rejected.

An important use of the confidence threshold is in the Confidence Threshold Tuning Wizard, which can calculate whether your application's performance could be improved by setting a new confidence threshold.

You can view the confidence threshold for any interaction by right-clicking on it, choosing Properties, and then looking at the Confidence Threshold value.

The Confidence Histogram

A feature of the Tester is the Confidence Histogram, which displays information about utterances and their confidence scores. In particular, it shows:

Correct utterances, displayed as green bars.
Incorrect utterances, displayed as red bars.
Out-of-grammar utterances, displayed as purple bars.
Out-of-coverage utterances, displayed as blue bars.

The height of each bar represents frequency. Along the horizontal axis (X-axis), results are plotted with lower confidence scores at the left and higher confidence scores at the right. An ideal confidence histogram would show all green bars at the right edge and all red, purple, and blue bars at the left.
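To make the idea of the histogram concrete, here is a minimal Python sketch of how scored results could be grouped into bars. It is not how the Tuner builds its histogram: the function name `bin_scores`, the category labels, and the bin width of 100 are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical category labels matching the four bar colors described above.
CATEGORIES = ("correct", "incorrect", "out-of-grammar", "out-of-coverage")

def bin_scores(results, bin_width=100):
    """Count utterances per (bin start, category).

    `results` is an iterable of (score, category) pairs, with scores on the
    0-1000 scale. The bin width is an assumption made for this sketch.
    """
    counts = Counter()
    for score, category in results:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if not 0 <= score <= 1000:
            raise ValueError(f"score out of range: {score}")
        # Put a score of exactly 1000 into the last bin instead of a new one.
        bin_start = min(score // bin_width * bin_width, 1000 - bin_width)
        counts[(bin_start, category)] += 1
    return counts

sample = [(950, "correct"), (910, "correct"), (1000, "correct"), (120, "incorrect")]
histogram = bin_scores(sample)
```

In this sample, the correct utterances cluster in the highest bin and the incorrect one falls in a low bin, which is the shape an ideal histogram would have.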

Confidence Ranges

The purpose of the histogram is to let you understand the distribution of confidence scores in order to set accept, confirm, and reject ranges:

The reject range is the range of confidence values for which an utterance will be rejected (no-match).
The accept range is the range of confidence values for which an utterance will be accepted by the application without further prompting of the user.
The confirmation range is the range of confidence values (between the reject and accept ranges) for which the user will be asked to confirm the result ("Did you say...?").

The Tuner automatically calculates suggested reject, accept, and confirmation ranges for transcribed data sets. These are shown as the red (reject), yellow (confirm), and green (accept) horizontal bars along the bottom of the histogram.
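An application might act on the three ranges along the lines of the following Python sketch. The function name `classify`, the parameter names, and the example boundaries of 300 and 700 are hypothetical. Whether a score that falls exactly on a boundary belongs to the lower or upper range is also an assumption here.

```python
def classify(score: int, reject_below: int, accept_from: int) -> str:
    """Map a confidence score (0-1000 scale) to reject, confirm, or accept.

    Assumption: scores below `reject_below` are rejected, scores at or above
    `accept_from` are accepted, and everything in between is confirmed.
    """
    if not 0 <= reject_below <= accept_from <= 1000:
        raise ValueError("expected 0 <= reject_below <= accept_from <= 1000")
    if score < reject_below:
        return "reject"   # treat as a no-match
    if score >= accept_from:
        return "accept"   # use the result without further prompting
    return "confirm"      # ask the user "Did you say...?"

# Example boundaries chosen purely for illustration.
decisions = [classify(s, reject_below=300, accept_from=700) for s in (120, 500, 850)]
```

Setting `reject_below` equal to `accept_from` removes the confirmation range, leaving a single accept/reject threshold.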

Differing Confidence Scales

Internally, LumenVox represents all confidence scores on a 0-1000 scale. This scale is used in our SpeechPort API for C and C++ users. The VoiceXML standard (along with MRCPv2) uses a scale of 0-1 and MRCPv1 uses a scale of 0-100, so LumenVox must translate between those scales and its internal scale.

So if a VoiceXML application sets a confidence threshold of 0.5, that would be represented by an MRCPv1 client as a confidence value of 50, and internally by LumenVox as 500. All confidence values saved in utterance files use the LumenVox 0-1000 scale, and that is what you will use in the Speech Tuner.
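The arithmetic behind these conversions can be sketched in Python as follows. The helper names are hypothetical and are not part of any LumenVox API. Rounding to the nearest integer is an assumption, since this article does not describe how fractional values are handled.

```python
def vxml_to_internal(value: float) -> int:
    """Convert a VoiceXML/MRCPv2 confidence (0-1) to the 0-1000 scale."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("VoiceXML/MRCPv2 confidence must be between 0 and 1")
    return round(value * 1000)  # rounding behavior is an assumption

def mrcpv1_to_internal(value: int) -> int:
    """Convert an MRCPv1 confidence (0-100) to the 0-1000 scale."""
    if not 0 <= value <= 100:
        raise ValueError("MRCPv1 confidence must be between 0 and 100")
    return value * 10

def internal_to_vxml(value: int) -> float:
    """Convert a 0-1000 confidence to the VoiceXML/MRCPv2 0-1 scale."""
    return value / 1000

def internal_to_mrcpv1(value: int) -> int:
    """Convert a 0-1000 confidence to the MRCPv1 0-100 scale."""
    return round(value / 10)  # rounding behavior is an assumption

# A VoiceXML threshold of 0.5 and an MRCPv1 threshold of 50 both map to 500.
from_vxml = vxml_to_internal(0.5)
from_mrcpv1 = mrcpv1_to_internal(50)
```

When comparing a threshold from an application's configuration with values shown in the Speech Tuner, convert the application's value to the 0-1000 scale first.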

When No Confidence Threshold is Present

If no confidence threshold is set, either because an API user did not set one or because the utterance file comes from a LumenVox ASR older than 12.2, the Tuner assumes a confidence threshold of -1. This is functionally equivalent to a threshold of 0, but it allows the Tuner to distinguish utterances with no value from utterances whose threshold was deliberately set to 0. Note that the Media Server has a default threshold of 5% (0.05 in VXML/MRCPv2, 5 in MRCPv1, and 50 on the LumenVox 0-1000 scale).
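If you process threshold values yourself, the -1 convention can be handled as in the following Python sketch. The constant and function names are hypothetical, and the sketch assumes you already have the stored threshold as an integer on the 0-1000 scale. It does not show how to read a .callsre file.

```python
NO_THRESHOLD = -1  # value the Tuner assumes when no threshold was recorded

def was_explicitly_set(stored_threshold: int) -> bool:
    """Return True if a threshold was recorded, even one set to 0."""
    return stored_threshold != NO_THRESHOLD

def effective_threshold(stored_threshold: int) -> int:
    """Return the threshold to compare scores against.

    -1 is functionally equivalent to 0, so nothing is rejected on
    confidence alone when no threshold was recorded.
    """
    return 0 if stored_threshold == NO_THRESHOLD else stored_threshold

# A deliberate 0 and a missing value behave the same but remain distinguishable.
missing = (was_explicitly_set(-1), effective_threshold(-1))
deliberate_zero = (was_explicitly_set(0), effective_threshold(0))
```

Keeping the two checks separate preserves the distinction that the -1 value exists to provide.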

When running tests through the Tester, the Tuner will use the confidence threshold set in the utterance file. If none was set in the utterance file, no confidence threshold will be specified. This behavior can be overridden in the Tuner's Application Settings by checking the Force confidence threshold for decodes box.

Exporting Tuner Data