(JULY 2007) — The July release of the Speech Engine, version 7.5.602, contains a major overhaul of its voice activity detection features.
This means that the Engine is much better at detecting the start and end of speech, filtering out background noise, and operating in noisy environments.
To test the new VAD, LumenVox deliberately chose calls where the previous VAD did not perform well. These were unusually noisy calls that included various types of loud background noise. The new VAD correctly ignored the noise 92 percent of the time — a marked improvement for this data set.
These improvements, however, have led us to change the way we handle several VAD settings. While all existing speech applications will benefit from the new VAD, the API changes may require changes in your applications to take full advantage of the new system.
We have simplified the VAD settings by introducing two new parameters: volume sensitivity and SNR sensitivity (in our API, these are the streaming parameters STREAM_PARM_VAD_VOLUME_SENSITIVITY and STREAM_PARM_VAD_SNR_SENSITIVITY).
Volume sensitivity controls the total energy level (volume) of the audio required to trigger barge–in. This is similar to the old barge–in level setting, but is now set on a different scale.
Volume sensitivity is particularly useful for dealing with environments with poor echo cancellation. By making the VAD less sensitive, prompts that bleed back in are less likely to falsely trigger barge–in.
SNR sensitivity is a similar parameter. It determines how much louder the voice signal must be compared to the background noise before barge–in is triggered. This is similar to the old noise floor threshold, but now you are specifying a relationship between voice and background noise.
Both settings are on a scale that ranges from 1 to 100. The lower the setting, the more sensitive the VAD is. In other words, setting these values lower makes it easier for a caller to barge in, and setting them higher makes it more difficult.
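As a rough illustration of the 1 to 100 scale, an application could validate a sensitivity value before handing it to the Engine. This is only a sketch: the two parameter names and the range come from this article, while the helper check_sensitivity is hypothetical and is not part of the LumenVox API.

```python
# Hypothetical helper, not part of the LumenVox API.
# Only the parameter names and the 1-100 range come from the article.
VAD_SENSITIVITY_PARAMS = (
    "STREAM_PARM_VAD_VOLUME_SENSITIVITY",
    "STREAM_PARM_VAD_SNR_SENSITIVITY",
)


def check_sensitivity(name, value):
    """Return value unchanged if it is valid for the named parameter."""
    if name not in VAD_SENSITIVITY_PARAMS:
        raise ValueError(f"not a VAD sensitivity parameter: {name}")
    if not 1 <= value <= 100:
        raise ValueError(f"{name} must be between 1 and 100, got {value}")
    return value


# Lower values make the VAD more sensitive (easier to barge in).
volume = check_sensitivity("STREAM_PARM_VAD_VOLUME_SENSITIVITY", 50)
```

The checked value would then be passed to whatever call your application already uses to set stream parameters.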
The BARGE_IN_DELAY stream parameter has had its scale changed. Previously it used increments of 1/8 second; it is now set in milliseconds. If your application accesses the LumenVox API directly, be sure to update the values it sets for this parameter.
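Since 1/8 of a second is 125 milliseconds, converting an existing BARGE_IN_DELAY value is a single multiplication. The function name below is hypothetical and shown only to make the arithmetic concrete.

```python
MS_PER_EIGHTH_SECOND = 125  # 1/8 second = 125 milliseconds


def eighths_to_ms(old_value):
    """Convert a delay in 1/8-second increments to milliseconds."""
    if old_value < 0:
        raise ValueError("delay cannot be negative")
    return old_value * MS_PER_EIGHTH_SECOND


# An old setting of 4 (half a second) becomes 500 milliseconds.
new_delay_ms = eighths_to_ms(4)
```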
The following stream parameters have been removed: VAD_BARGEIN_LVL, BARGE_IN_DYNAMIC_ADJUST, BARGE_IN_NOISE_COUNT_LOW_THRESHOLD, USE_FREQ_VAD, NOTIFY_OF_BEEPS, VAD_NOISE_FLOOR, and VAD_BURST_THLD.
Note: If your application attempts to set any of these removed parameters with Engine versions 7.5.603 and above, the call will not fail or return an error, but it will have no effect on the VAD settings.
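Because setting a removed parameter fails silently, it can help to scan your own configuration for them during the upgrade. The sketch below assumes your application keeps its stream settings in a dictionary keyed by the parameter names as listed in this article; that layout and the function name are assumptions, not part of the LumenVox API.

```python
# Parameter names exactly as listed in the article.
REMOVED_STREAM_PARAMS = frozenset({
    "VAD_BARGEIN_LVL",
    "BARGE_IN_DYNAMIC_ADJUST",
    "BARGE_IN_NOISE_COUNT_LOW_THRESHOLD",
    "USE_FREQ_VAD",
    "NOTIFY_OF_BEEPS",
    "VAD_NOISE_FLOOR",
    "VAD_BURST_THLD",
})


def find_removed_params(settings):
    """Return the sorted names in settings that the Engine now ignores."""
    return sorted(name for name in settings if name in REMOVED_STREAM_PARAMS)


# Hypothetical application settings, for illustration only.
app_settings = {"VAD_NOISE_FLOOR": 3, "BARGE_IN_DELAY": 500, "USE_FREQ_VAD": 1}
for name in find_removed_params(app_settings):
    print(f"warning: {name} has been removed and will have no effect")
```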
The August releases of the LumenVox software include the improved voice activity detection, so once you upgrade you can expect better performance in many calls.
For information on downloading the latest releases of the LumenVox software, please contact us. The download is free for users with current software maintenance packages.
The Speech Engine uses a technology called Voice Activity Detection to distinguish between actual speech and other sounds.
Human speech has unique qualities that make it distinguishable from other noise. VAD listens to incoming audio for these qualities.
These features include energy level (volume), frequency (pitch), changes in frequency, and duration.
As soon as the Speech Engine begins receiving audio from an application, it takes a sample of the audio. It uses that sample to determine the level of background noise in the audio, meaning that actual speech will have to be louder than that to trigger speech recognition.
This is particularly important for detecting background speech. Since speech in the background does have all the characteristics of human speech, the Engine distinguishes it from the actual caller by listening for audio louder than the background.
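The idea of sampling the start of the audio and then requiring speech to be louder than that sample can be sketched with a simple energy comparison. This is a toy illustration, not the Engine's actual algorithm, which also considers frequency and duration; the function names, the frame layout, and the default values for noise_frames and ratio are all assumptions.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of audio samples."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_frames(frames, noise_frames=3, ratio=2.0):
    """Return one bool per frame: True where the frame is louder than
    ratio times the background level estimated from the first frames."""
    if noise_frames < 1 or len(frames) < noise_frames:
        raise ValueError("need at least noise_frames frames of audio")
    # Estimate the background level from the opening frames.
    noise_floor = sum(rms(f) for f in frames[:noise_frames]) / noise_frames
    threshold = noise_floor * ratio
    # Frames used for the estimate are never flagged as speech.
    return [i >= noise_frames and rms(f) > threshold
            for i, f in enumerate(frames)]


quiet = [1, -1, 1, -1]            # background noise
murmur = [1.5, -1.5, 1.5, -1.5]   # background speech, only slightly louder
caller = [10, -10, 10, -10]       # the actual caller
flags = detect_speech_frames([quiet, quiet, quiet, caller, murmur, quiet])
```

In this sketch only the caller frame is flagged, because the background murmur does not exceed the threshold derived from the initial sample.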
In addition to determining when speech begins, VAD is also used to determine when speech has ended.
Finally, VAD helps prevent situations like callers being cut off before they are done speaking, a common problem in speech applications.
One of the best ways to find out if you need to make adjustments to your VAD settings is by tuning your speech applications.
These videos (in the Flash video format) introduce you to the basics of speech tuning and the Speech Tuner, and walk you through advanced tuning tips and strategies.
The Speech Resources section of our Web page also contains several useful articles on this and other subjects.