(DEC 2006) - To get the highest accuracy out of the LumenVox Speech Engine, it is useful to adjust a number of parameters that control how the Engine distinguishes speech from background noise.
These parameters are called voice activity detection (VAD) parameters. Adjusting them lets developers fine-tune their speech applications.
Many recognition problems are caused by the Engine mistaking background noise for speech, or by the Engine hearing a prompt playing and triggering barge-in prematurely. Changing the VAD parameters allows developers to combat these problems.
Probably the most useful VAD parameter to adjust is the end-of-speech delay. This is the amount of silence, specified in milliseconds, that the Engine must detect after speech before it begins processing the utterance.
Adjusting this to suit the sort of input you expect from the caller will make the system seem a lot more responsive. If the speech application asks a caller a yes/no question, this value can be very short, as the answer is only going to be a single word. If the caller is asked for information that takes longer to relay, such as an account number, setting the end-of-speech delay higher will ensure the caller can get through the entire account number without being cut off during a pause.
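To make the trade-off concrete, here is a minimal Python sketch of how an end-of-speech delay might behave. It is not LumenVox code: the function name, the fixed frame length, and the per-frame speech/silence flags are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def find_end_of_speech(frames_are_speech: Iterable[bool],
                       frame_ms: int,
                       end_of_speech_delay_ms: int) -> Optional[int]:
    """Return the time (ms) at which the utterance is treated as finished.

    Hypothetical helper: each item in frames_are_speech says whether one
    fixed-length audio frame was classified as speech. The utterance ends
    once silence has lasted end_of_speech_delay_ms after some speech.
    Returns None if that never happens.
    """
    heard_speech = False
    silence_ms = 0
    elapsed_ms = 0
    for is_speech in frames_are_speech:
        elapsed_ms += frame_ms
        if is_speech:
            heard_speech = True
            silence_ms = 0          # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms  # only count silence that follows speech
            if silence_ms >= end_of_speech_delay_ms:
                return elapsed_ms
    return None


# A caller reads an account number in two groups with a 600 ms pause
# between them (100 ms frames): speech, pause, speech, trailing silence.
account_number = [True] * 5 + [False] * 6 + [True] * 5 + [False] * 15

print(find_end_of_speech(account_number, 100, 500))   # 1000: cut off at the pause
print(find_end_of_speech(account_number, 100, 1000))  # 2600: whole number captured
```

With the shorter delay the sketch stops listening during the caller's pause. The longer delay captures both groups but waits a full second after the caller finishes, which is why a short value suits yes/no questions better.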
Another useful parameter to be aware of is the noise floor threshold. This is the level below which the Engine disregards audio as background noise, so any speech will have to be louder than this value.
The noise floor value, which does not correspond to any real-world units, starts at 100 and can be set much higher depending on how noisy your audio is. When giving demonstrations of our products on trade show floors, LumenVox often sets this value between 2,000 and 4,000.
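The idea of a noise floor can be sketched in a few lines of Python. This is only an illustration: the Engine's noise floor has no real-world units, so the level measure used here (mean absolute amplitude of 16-bit samples) and both function names are assumptions, not the Engine's actual method.

```python
from typing import List, Sequence


def frame_level(samples: Sequence[int]) -> float:
    """Mean absolute amplitude of one frame (an assumed stand-in for 'level')."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def frames_above_noise_floor(frames: Sequence[Sequence[int]],
                             noise_floor: float) -> List[bool]:
    """Flag each frame as possible speech only if it is louder than the floor."""
    return [frame_level(frame) > noise_floor for frame in frames]


quiet_line = [50, -60, 40, -50]           # level 50
show_floor_hum = [1500, -1400, 1600, -1500]  # level 1500
caller_speech = [6000, -5500, 5800, -6100]   # level 5850
frames = [quiet_line, show_floor_hum, caller_speech]

print(frames_above_noise_floor(frames, 100))   # [False, True, True]
print(frames_above_noise_floor(frames, 3000))  # [False, False, True]
```

At the low setting the background hum is treated as possible speech. Raising the floor screens it out while the louder caller still gets through.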
A final parameter to consider is the burst threshold. This is the amount of time, in milliseconds, that speech must be detected before barge-in will trigger.
Raising this value helps prevent brief, loud background noises from triggering barge-in, so the audio you send to the Engine is more likely to contain actual speech and to be recognized correctly.
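As a rough illustration of the burst threshold, the Python sketch below triggers barge-in only after speech has been detected continuously for the required time. The function name, frame length, and per-frame flags are hypothetical. The reset-on-silence rule is also an assumption about what counts as sustained speech.

```python
from typing import Iterable, Optional


def barge_in_time(frames_are_speech: Iterable[bool],
                  frame_ms: int,
                  burst_threshold_ms: int) -> Optional[int]:
    """Return the time (ms) at which barge-in would trigger, or None.

    Hypothetical helper: barge-in fires once consecutive speech frames
    add up to burst_threshold_ms. A silent frame resets the count.
    """
    run_ms = 0
    elapsed_ms = 0
    for is_speech in frames_are_speech:
        elapsed_ms += frame_ms
        if is_speech:
            run_ms += frame_ms
            if run_ms >= burst_threshold_ms:
                return elapsed_ms
        else:
            run_ms = 0  # the burst was too short; start over
    return None


# 10 ms frames: a 30 ms door slam, 200 ms of quiet, then 300 ms of speech.
audio = [True] * 3 + [False] * 20 + [True] * 30

print(barge_in_time(audio, 10, 20))   # 20: the door slam triggers barge-in
print(barge_in_time(audio, 10, 100))  # 330: only the caller's speech triggers it
```

The higher threshold ignores the door slam, at the cost of reacting about 100 ms later when the caller really does start speaking.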
Refer to our Speech Engine or Speech Platform documentation for more parameters.
As always, if you have any questions about these parameters, feel free to contact LumenVox technical support for assistance.
If you would like information on downloading the latest release of the LumenVox Speech Engine, please contact us. It is a free download for users with current software maintenance packages.
In addition to the digit models, LumenVox now has beta acoustic models for Spanish and Canadian French.
If you would like to participate in beta testing either of those languages, please contact us for more information.
We are also interested in acquiring more language data for those models. If you have high volumes of callers speaking those languages, contact us about potential partnerships in collecting and using that data.
How the VAD parameters are changed varies by platform.
If you are using the LumenVox Speech Platform, edit a file called stream.ini in the Platform installation directory.
For other platforms, refer to the documentation or ask your vendor how to adjust these parameters.
If you are using your own custom applications and our API, refer to the Speech Engine help documentation for the proper functions.
Our new and improved Speech Tuner is almost out of beta testing and is the perfect tool for evaluating your call data to see how adjusting the VAD parameters will help your accuracy. The Tuner can also help you identify problems before and after deployment, so you have an idea of which VAD parameters to target in the first place.
The Speech Tuner is free to use with the LumenVox Speech Engine. No special licenses are required, as it can use your existing Speech Engine licenses.
For information on downloading the Speech Tuner, contact LumenVox support.
You can also read our Speech Tuner Frequently Asked Questions for more information.
Work on adding foreign languages to the LumenVox Speech Engine is progressing at a rapid clip. We have two languages available for beta testers: Latin American Spanish and Canadian French (contact support if you're interested in trying them out).
In addition to those, we have acquired audio to build an Australian English model and we are working on acquiring UK English as well. We expect these languages to be available in 2007.
The January release of our speech products is looking to be a good one, with several improvements to accuracy and an increase in efficiency. Be sure to check your email next month for more information.
© 2016 LumenVox, LLC. All rights reserved.