LumenVox

Speech Tuner

Because tuning is an absolute requirement for every speech recognition solution, we created the LumenVox Speech Tuner, a complete tuning and maintenance tool.

The Speech Tuner is designed to perform tuning and transcription, as well as instant parameter, grammar, and version upgrade testing of any speech recognition application. It reduces the work of your post-deployment application revisions, and allows you to bring tuning in-house, avoiding costly professional service fees.

  • Analyze each stage in the call process.
  • Transcribe audio data, make pinpoint adjustments, and immediately measure the effects on the overall performance.
  • Test design and development decisions of new applications, using data from deployed applications.

Speech Tuner Functionality

Tuning uses prompts, grammars, call flow, and caller data to improve the speech application as a whole. The Speech Tuner v7.0 also provides the following new features:

  • Faster, more user-friendly database loading.
  • Sophisticated filtering tool lets users pinpoint specific calls or those matching user-defined criteria.
  • Updated Call Browser lets users quickly choose and listen to a specific interaction, and export the audio file -- all from one window.
  • Improved statistics, control panel interface, and shortcuts make the transcription process 5 to 10 times faster than in the previous version.
  • Revised Grammar Tester shows users instantly how proposed changes to grammars or Speech Engine configurations will affect the results of the speech application.

Tuning in 3 Easy Steps

  1. Import Data

    The basic process is simple. Converters import call log data into the Speech Tuner database. All information stored by the call log is available in the Speech Tuner.
  2. Transcribe Speech

    Transcribers can type the text of the caller's speech directly into the Transcriber. Once the audio is transcribed, the Speech Tuner compares the transcripts with the Speech Engine results to determine accuracy, greatly reducing the errors associated with manual evaluation. The transcripts are evaluated using the actual decode grammar, producing measurements such as word error rate, in-grammar and out-of-grammar rates, and semantic error rate.
  3. Test Immediately

    Selecting an interaction in the Call Log automatically loads the associated audio and grammar into the Tester. There, users can edit the grammar, set Speech Engine parameters, and run individual recognition tests. The Speech Tuner natively supports industry-standard SRGS grammars. Once a set of possible changes is identified, users can batch test audio to evaluate performance with those changes applied.

The Call Browser lets users quickly choose and listen to a specific interaction, and export the audio file -- all from one window. It lists all currently loaded and filtered calls, and shows all the interactions for a selected call.

The Call Browser window is divided into three distinct sections: the Calls List at the top, the Interactions List in the middle, and the Audio Controls at the bottom.

Calls List

Within the Calls List, you can easily move between calls to highlight a specific interaction. The available data fields provide key information such as call time, the number of interactions within a call, the number of times the Speech Engine recognized speech, and the confidence scores for phrases the Speech Engine interpreted correctly.

Interactions List

The Interactions List contains details of every interaction within a call. For each interaction, you can click the View Details button for specifics such as the acoustic model, decode time, NBest rank, and semantic interpretation.

Audio Controls

The Audio Controls panel lets users choose between hearing the decoded audio or the actual caller utterance, with easy volume controls. An "Export as WAV" button lets users export the call audio as a WAV file to their hard drive.

The Speech Tuner stores its data in SQLite (www.sqlite.org), a free, open-source database. The Speech Tuner manages call log importing, searching, and exporting, so you can focus on the task of tuning, not log management. The database is contained in a single file, is easy to back up and transport, and can be queried using SQL-92 (see the SQLite web site for full details) from a variety of external tools.
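Because the database is an ordinary SQLite file, it can be read from any tool with SQLite support. The sketch below uses Python's built-in sqlite3 module to show the general idea. The table and column names (interactions, call_id, grammar, confidence) and the confidence scale are hypothetical, chosen only for illustration; they are not the Speech Tuner's actual schema. An in-memory database stands in for the real file so the example runs on its own.

```python
import sqlite3

# Hypothetical schema for illustration only; the real Speech Tuner
# database layout is not described here and will differ.
conn = sqlite3.connect(":memory:")  # in practice, the path to the database file
conn.execute(
    "CREATE TABLE interactions (call_id INTEGER, grammar TEXT, confidence INTEGER)"
)
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    [(1, "yes_no", 82), (1, "main_menu", 41), (2, "yes_no", 64)],
)


def average_confidence_by_grammar(connection):
    """Return {grammar: mean confidence} using a plain SQL aggregate."""
    rows = connection.execute(
        "SELECT grammar, AVG(confidence) FROM interactions GROUP BY grammar"
    )
    return dict(rows.fetchall())


averages = average_confidence_by_grammar(conn)
```

The same SELECT statement could be issued from the sqlite3 command-line shell or any other SQLite client once the real table names are known.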

The database maintains all the information contained in the original call log. The Speech Tuner includes not only the decode grammar and speech recognition software results, but also the decode platform, parameter settings, alternative results, prompt audio, and pre- and post-processed audio.

In addition, the Speech Tuner stores all transcripts and evaluations alongside the call log data. As transcripts are entered into the Speech Tuner, they are automatically evaluated against the decode grammar.

Each transcript, and any notes or additional information, is stored directly in the database. Individual scores -- such as word error rate, semantic error rate, and in-grammar and out-of-grammar measurements -- are stored along with their alignments, as well as information about how the scores were reached. Users can generate a variety of reports from these results, including error rate by grammar or dialog, confusion matrices, transcription progress, and confidence thresholds for confirmation or rejection settings.
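To illustrate what a confidence threshold report involves, here is a minimal sketch. It assumes each transcribed interaction has been reduced to a confidence score and a correct/incorrect flag; the function name threshold_report and the sample scores are hypothetical, and this is not the Speech Tuner's own report logic.

```python
def threshold_report(results, thresholds):
    """Count false accepts and false rejects at each candidate threshold.

    results: list of (confidence, is_correct) pairs, one per interaction.
    A result is accepted when its confidence is at or above the threshold.
    """
    report = []
    for t in thresholds:
        accepted = [ok for conf, ok in results if conf >= t]
        rejected = [ok for conf, ok in results if conf < t]
        report.append({
            "threshold": t,
            "false_accepts": accepted.count(False),  # wrong results let through
            "false_rejects": rejected.count(True),   # right results thrown away
        })
    return report


# Made-up sample data: (confidence, was the recognition correct?)
sample = [(90, True), (75, True), (60, False), (55, True), (30, False)]
report = threshold_report(sample, [50, 70])
```

Raising the threshold trades false accepts for false rejects, which is why a report of this kind is useful when choosing confirmation and rejection settings.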

Taking Out The Guesswork

Make changes to grammars or parameters, secure in the knowledge that those changes will make the application better, faster, and more accurate. The Speech Tuner uses historical information to validate your changes, ensuring your success.

Most 'tuning' tools are passive log viewers, requiring that changes be made in the live speech recognition application and retested over a period of time with live callers. With the Speech Tuner, we send the changes to the Speech Engine, simulating the recognition process and evaluating changes instantly. Instead of slow, non-interactive, static tuning, the Speech Tuner enables on-the-fly, highly interactive, dynamic tuning.

Make a change, do the test, get the results.

Grammar Evaluation

Speech can be evaluated against grammar sets, as they are sent to the Speech Engine. The grammar can be adjusted and re-tested and re-scored to see if the changes improved performance. Therefore, you can determine instantly whether adding a new phrase to the grammar will improve your speech recognition accuracy.
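As a rough illustration of why transcripts make this kind of check immediate, the sketch below measures the in-grammar rate before and after adding a phrase. It is heavily simplified: the grammar is modeled as a flat set of phrases rather than a real SRGS grammar, the function name in_grammar_rate is hypothetical, and no recognition is actually run.

```python
def in_grammar_rate(transcripts, phrases):
    """Fraction of transcripts that exactly match a phrase in the grammar."""
    normalized = {p.lower().strip() for p in phrases}
    hits = sum(1 for t in transcripts if t.lower().strip() in normalized)
    return hits / len(transcripts) if transcripts else 0.0


# Made-up transcripts of what callers said at a yes/no prompt.
transcripts = ["yes", "yeah", "no", "yes", "nope"]

before = in_grammar_rate(transcripts, {"yes", "no"})
after = in_grammar_rate(transcripts, {"yes", "no", "yeah"})
```

Here adding "yeah" moves one more of the five utterances in-grammar. Whether recognition accuracy actually improves still has to be confirmed by re-testing the audio, since a larger grammar can also introduce new confusions.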

Performance Rate

The Speech Tuner rates performance against commonly accepted measures such as word error rate (WER). This gives an accurate picture of details such as average confidence scores, correct versus incorrect responses, and in-grammar versus out-of-grammar performance.
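For readers unfamiliar with the measure, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the transcript and the recognizer output, divided by the number of words in the transcript. The sketch below shows that standard calculation; it is not the Speech Tuner's own implementation, and the function name and sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("to") and one inserted word ("please"): 2 errors / 6 words.
wer = word_error_rate(
    "i want to check my balance",
    "i want check my balance please",
)
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer returns many more words than the caller actually said.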

Parameter Evaluation

Setting parameters optimizes Speech Engine performance, further improving the caller's experience. Traditionally, changing Engine parameters has been a difficult and time-consuming task, often involving long delays between changing a parameter and evaluating its effect on performance. The Speech Tuner changes this.

The dynamic test capability of the Speech Tuner shortens this delay. Speech Engine parameters such as search optimizations, speech end-pointing, and NBest result processing can be easily adjusted, then immediately re-tested and re-scored from within the testing component.

Good transcripts can be an important part of properly tuning a speech recognition application. The Speech Tuner's Transcriber is designed to make this process as quick and seamless as possible.

In fact, in the latest version of the Speech Tuner, improved statistics, a new control panel interface, and shortcuts make the transcription process 5 to 10 times faster.

Transcribing Audio

Transcribing speech is an excellent way to become familiar with how callers interact with the system.

Transcriptions are used to calculate automatic performance measurements such as in-grammar or out-of-grammar rates and recognition accuracy. Good transcripts are a key component in using the Speech Tuner to adjust your speech recognition application as needed.

Transcription Entry

The Transcriber is used to write down every word in a call. The Grammar Tester uses these transcripts in evaluating how well a Speech Engine is interpreting what users are saying.

Transcription Log

The Transcription Log provides a detailed view of every transcript entered during a transcription session. The Log tracks the interaction number, name, and transcription description.

Screens

The main window of the Speech Tuner is used to load call data, filter that data, and launch the other components of the Speech Tuner application.