
Because tuning is an absolute requirement for every speech recognition solution, we created the LumenVox Speech Tuner, a complete tuning and maintenance tool.
The Speech Tuner is designed to perform tuning and transcription, as well as instant parameter, grammar, and version upgrade testing of any speech recognition application. It reduces the work of your post-deployment application revisions, and allows you to bring tuning in-house, avoiding costly professional service fees.
Tuning uses prompts, grammars, call flow, and caller data to improve the speech application as a whole. The Speech Tuner v7.0 also provides the following new features:
The Call Browser grants users the ability to quickly choose and listen to a specific interaction, and export the audio file -- all from one window. It displays a list of all currently loaded and filtered calls, and displays all the interactions for a selected call.
The Call Browser window is divided into three distinct sections: the Calls List, the Interactions List in the middle, and an Audio Control at the bottom.
Within the Calls List, you can easily move between calls to highlight a specific interaction. The
data fields available provide key information such as call time, the number of interactions within a
call, the number of times the Speech Engine recognized speech, and the confidence scores for interactions the Speech
Engine interpreted correctly.
The Interactions List contains details of every interaction within a call. For each interaction, you
can click the View Details button for a specific list of details such as: the acoustic model, decode
time, NBest rank, and semantic interpretation.
The Audio control panel allows users to choose between hearing the decoded audio or the actual caller
utterance, with easy volume controls. An "Export as WAV" button lets users export
the call audio as a WAV file to their hard drive.
The Speech Tuner communicates with an open-source, freeware database called SQLite
(www.sqlite.org). The Speech Tuner manages call
log importing, searching, and exporting, so you can focus on the task of tuning, not log management.
The database is contained in a single file, is easy to back up and transport, and can be queried using
SQL-92 (see the SQLite web site for full details) from a variety of external tools.
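As a rough illustration of querying such a database from an external tool, the sketch below uses Python's standard sqlite3 module. The table name, column names, and the 0-1000 confidence scale are hypothetical, since the actual Speech Tuner schema is not described here, and an in-memory database stands in for the real database file.

```python
import sqlite3

# Hypothetical schema: the real Speech Tuner table and column names may differ.
# ":memory:" stands in for the path to the Tuner's single-file database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    "call_id TEXT, grammar TEXT, confidence INTEGER, transcript TEXT)"
)
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?, ?)",
    [
        ("call-1", "main_menu", 820, "billing"),
        ("call-1", "yes_no", 310, "yeah"),
        ("call-2", "main_menu", 450, "tech support"),
    ],
)


def low_confidence_by_grammar(connection, threshold):
    """Count interactions per grammar whose confidence is below threshold."""
    rows = connection.execute(
        "SELECT grammar, COUNT(*) FROM interactions "
        "WHERE confidence < ? GROUP BY grammar ORDER BY grammar",
        (threshold,),
    )
    return rows.fetchall()


low_counts = low_confidence_by_grammar(conn, 500)
```

Against a real database file, the same query pattern would apply once the table and column names are replaced with the ones the Speech Tuner actually uses.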
The database maintains all the information contained in the original call log. It includes not only the decode grammar and speech recognition software results, but also the decode platform, parameter settings, alternative results, prompt audio, and pre- and post-processed audio.
In addition, the Tuner stores all transcripts and evaluations within the call log. As transcripts are entered into the Speech Tuner, they are automatically evaluated against the decode grammar.
This transcript, and any notes or additional information, are stored directly in the database. Individual scores -- such as word error rate, semantic error rate, and in-grammar and out-of-grammar measurements -- are stored along with their alignments, as well as information about how the scores were reached. Users can generate a variety of reports from these results, including error rate by grammar or dialog, confusion matrices, transcription progress, and confidence thresholds for confirmation or rejection settings.
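For readers unfamiliar with word error rate, the sketch below shows the standard textbook definition: the minimum number of word substitutions, deletions, and insertions needed to turn the recognized text into the transcript, divided by the number of words in the transcript. It is only an illustration of the measure; how the Speech Tuner computes and aligns its scores internally is not documented here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# The caller said "pay my bill please"; the engine returned "pay the bill".
# One substitution plus one deletion over four reference words.
example_wer = word_error_rate("pay my bill please", "pay the bill")
```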
Make changes to grammars or parameters, secure in the knowledge that those changes will make the application better, faster, and more accurate. The Speech Tuner uses historical information to validate your changes, ensuring your success.
Most 'tuning' tools are passive log viewers, requiring that changes be made in the live speech recognition application and retested over a period of time with live callers. With the Speech Tuner, we send the changes to the Speech Engine, simulating the recognition process and evaluating changes instantly. Instead of slow, non-interactive, static tuning, the Speech Tuner enables on-the-fly, highly interactive, dynamic tuning.
Speech can be evaluated against grammar sets as they are sent to the Speech Engine. The grammar can be adjusted, re-tested, and re-scored to see if the changes improved performance. Therefore, you can determine instantly whether adding a new phrase to the grammar will improve your speech recognition accuracy.
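The before-and-after comparison described above can be pictured with a deliberately simplified sketch. It treats a grammar as a flat set of phrases and checks transcripts against it, which is an assumption made for illustration: real grammars are rule-based, and the Speech Tuner re-runs the audio through the Speech Engine rather than matching text. All names below are hypothetical.

```python
def in_grammar_rate(transcripts, grammar_phrases):
    """Fraction of transcripts that exactly match a phrase in the grammar."""
    if not transcripts:
        return 0.0
    normalized = {phrase.strip().lower() for phrase in grammar_phrases}
    matches = sum(1 for t in transcripts if t.strip().lower() in normalized)
    return matches / len(transcripts)


# Hypothetical transcripts collected at a main-menu prompt.
transcripts = ["billing", "tech support", "billing department", "operator"]
grammar = {"billing", "tech support"}

# Score the original grammar, then a candidate revision with one new phrase.
before = in_grammar_rate(transcripts, grammar)
after = in_grammar_rate(transcripts, grammar | {"operator"})
```

Comparing `before` and `after` on the same historical transcripts is the kind of check that shows whether a grammar change covers more of what callers actually say.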
The Speech Tuner rates performance against commonly accepted measures like WER (Word Error Rate). This helps to give an accurate picture of details such as average confidence scores, correct versus incorrect responses, and in-grammar versus out-of-grammar performance.
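A summary of this kind might be assembled as in the following sketch. The record fields and the 0-1000 confidence scale are assumptions chosen for the example, not the Speech Tuner's actual data model.

```python
from dataclasses import dataclass


@dataclass
class ScoredInteraction:
    confidence: int   # hypothetical 0-1000 confidence scale
    correct: bool     # engine result matched the transcript
    in_grammar: bool  # transcript was covered by the decode grammar


def summarize(interactions):
    """Break scored interactions down by correctness and grammar coverage."""

    def mean_confidence(items):
        if not items:
            return None
        return sum(item.confidence for item in items) / len(items)

    correct = [i for i in interactions if i.correct]
    incorrect = [i for i in interactions if not i.correct]
    in_grammar = [i for i in interactions if i.in_grammar]
    return {
        "total": len(interactions),
        "correct": len(correct),
        "incorrect": len(incorrect),
        "in_grammar": len(in_grammar),
        "out_of_grammar": len(interactions) - len(in_grammar),
        "avg_confidence_correct": mean_confidence(correct),
        "avg_confidence_incorrect": mean_confidence(incorrect),
    }


sample = [
    ScoredInteraction(900, True, True),
    ScoredInteraction(700, True, True),
    ScoredInteraction(400, False, True),
    ScoredInteraction(200, False, False),
]
summary = summarize(sample)
```

A wide gap between the average confidence of correct and incorrect results suggests that a confidence threshold can separate them cleanly; a narrow gap suggests the grammar or parameters need attention first.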
Setting parameters optimizes the Speech Engine performance, further improving the caller's experience. Traditionally, changing Engine parameters is a difficult and time-consuming task, often requiring long delays between changing a parameter and evaluating its effects on performance. Our Speech Tuner changes this.
The dynamic test capability of the Speech Tuner allows the user to shorten this delay. Now, Speech Engine parameters such as search optimizations, speech end-pointing, and NBest result processing can be easily adjusted, and immediately re-tested and re-scored from within the testing component.
Good transcripts can be an important part of properly tuning a speech recognition application. The Speech Tuner's Transcriber is designed to make this process as quick and seamless as possible.
In fact, in the latest version of the Speech Tuner, we have made the transcription process up to 10 times faster with improved statistics, a new control panel interface, and shortcuts.
Transcribing speech is an excellent way to become familiar with how callers interact with the system.
Transcriptions are used to calculate automatic performance measurements such as in-grammar or out-of-grammar rates and recognition accuracy. Good transcripts are a key component in using the Speech Tuner to adjust your speech recognition application as needed.
The Transcriber is used to write down every word in a call. The Grammar Tester uses these transcripts in evaluating how well a Speech Engine is interpreting what users are saying.
The Transcription Log allows for a detailed view of every single transcript during a transcription session. Interaction number, name and transcription description are all fields that the Log tracks.
The main window of the Speech Tuner is used to load call data, filter that data, and launch the other components of the Speech Tuner application.