Coming Soon: Even Better Noise Reduction

(MAY 2009) — One of the most interesting features of the upcoming LumenVox 9.0 is enhanced noise cancellation. In version 8.6 we improved the noise cancellation, but we've revamped it again because it's such an important part of a modern speech recognition system.

Noisy telephone conditions seem here to stay: the proliferation of speakerphones, wireless headsets, and mobile phones used in cars and crowded locations means that anyone with a telephony application has to deal with more noise than ever. Separating the signal (what a caller is saying) from the noise is vital.

How the LumenVox speech recognition software deals with noise affects its overall accuracy. Our recently posted General Motors case study provides a clear illustration of how fantastic accuracy gains can be made when speech recognition is tuned for a noisy environment.

Greater Flexibility

In previous releases, users could only toggle noise reduction on and off. In 9.0, the Speech Engine will have four distinct settings: an improved default mode, an alternative reduction technology, a brand-new adaptive mode, and noise reduction disabled. Any of these can be selected in the client_property.conf file.

The default noise reduction technology works similarly to the technology introduced in 8.6, but with higher accuracy in almost all tests. Previously, noise reduction caused a slight decrease in accuracy on very clean audio; this is no longer the case with the new noise reduction.

The alternative reduction technology is similar to the new default, but works better in some specific cases. LumenVox is still running tests to isolate the cases in which one is better than the other, but we have decided to include it as an option for customers who wish to experiment.

By allowing users to switch between these modes, we hope that they will be able to find the mode that works best for their application. In the future, we would like to develop a method for the Engine to automatically pick the best method for a given piece of audio, but for now we are happy to allow users to make the decision that works best for them.

Adapting to Different Environments

The default LumenVox noise reduction technology gives us fairly significant accuracy gains for noisy data over no noise reduction at all. But it is not ideal for all situations, as it works by sampling the noise at the start of an interaction and then performing reduction based on the properties of the noise in the sample.
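As a rough illustration of the sample-then-reduce idea (not LumenVox's actual algorithm, which this bulletin does not detail), the Python sketch below estimates the noise energy once from the first few frames and then attenuates every frame using that fixed estimate. The names `frame_energy`, `static_noise_gate`, `noise_frames`, and `floor_gain` are hypothetical, and a real system would typically work per frequency band rather than on whole-frame energy.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)


def static_noise_gate(frames, noise_frames=5, floor_gain=0.1):
    """Estimate noise once from the leading frames, then attenuate all frames.

    Assumption for this sketch: the first `noise_frames` frames contain only
    background noise. The gain 1 - noise/energy is a simple Wiener-style gain,
    clamped so that it never drops below `floor_gain`.
    """
    lead = frames[:noise_frames]
    noise_energy = sum(frame_energy(f) for f in lead) / len(lead)

    cleaned = []
    for frame in frames:
        energy = frame_energy(frame)
        if energy > 0:
            gain = max(floor_gain, 1.0 - noise_energy / energy)
        else:
            gain = floor_gain
        cleaned.append([s * gain for s in frame])
    return noise_energy, cleaned
```

Because the estimate is taken only once, frames that arrive after the background changes are still processed against the original noise level, which is the limitation the following paragraphs describe.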

This works well if the noise is fairly constant throughout the utterance, such as when a person is standing next to a highway. The sound of cars speeding past is likely to remain steady throughout, as are sounds like the white noise generated by a computer's cooling fans.

Other sounds, such as music from a car's stereo or people talking in the background, are more dynamic. This means they are likely to change in volume and intensity throughout an interaction.

This is why LumenVox is adding a new type of noise reduction that automatically adapts to the background sounds, constantly resampling the audio stream to find the current noise profile.
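One generic way to keep a noise profile current is an exponential moving average that is updated only on frames that look like background rather than speech. The Python sketch below shows that idea on per-frame energies; it is an assumption-laden illustration, not the method used in the Engine, and `track_noise_floor`, `alpha`, and `speech_ratio` are hypothetical names.

```python
def track_noise_floor(frame_energies, alpha=0.9, speech_ratio=4.0):
    """Follow a changing noise level with an exponential moving average.

    A frame updates the estimate only when its energy is below
    speech_ratio * (current estimate), i.e. when it looks like background
    rather than speech. Returns the estimate after each frame.
    """
    estimate = frame_energies[0]  # assume the first frame is background only
    history = []
    for energy in frame_energies:
        if energy < speech_ratio * estimate:
            # Blend the new observation into the running estimate.
            estimate = alpha * estimate + (1.0 - alpha) * energy
        history.append(estimate)
    return history
```

In this sketch a short loud burst is ignored, while a gradual rise in the background is absorbed over the following frames. A background that jumps by more than `speech_ratio` at once would be mistaken for speech, so a practical tracker needs a more robust update rule.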

On very clean audio, the new adaptive technology performs slightly worse than the new default technology, but it works better on certain sets of audio. For instance, it is several times more accurate than the default technology when dealing with loud music in the background.

Accuracy Comparison

The table below illustrates the accuracy of the noise reduction methods across a large sample of noisy data (including different noise types):

Accuracy Gains: New LumenVox Noise Reduction Methods
Reduction Method                  Clean    20 dB    15 dB    10 dB    5 dB     0 dB
v8.6 noise reduction              0.02%    2.36%    3.39%    2.55%    2.03%    3.87%
v9.0 noise reduction (default)    0.08%    2.81%    4.93%    5.40%    6.10%    6.41%
v9.0 alternative method           0.10%    2.82%    5.31%    5.57%    4.43%    7.15%
v9.0 adaptive method             -0.36%    1.99%    4.73%    5.71%    3.83%    6.37%

The table shows the accuracy gain versus no noise reduction at all. The decibel value is the signal-to-noise ratio: at "Clean" there is essentially no noise, at 20 dB the signal is 20 dB louder than the noise, and so on down to 0 dB, where the signal and the noise are equally loud.
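For readers less familiar with decibels, the signal-to-noise ratio is ten times the base-10 logarithm of the ratio of signal power to noise power. The short Python sketch below shows the conversion in both directions; the function names are illustrative only.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels, from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def noise_power_for_snr(signal_power, target_snr_db):
    """Noise power that yields the requested SNR for a given signal power."""
    return signal_power / (10.0 ** (target_snr_db / 10.0))
```

For example, `snr_db(100.0, 1.0)` is 20 dB (the signal has 100 times the power of the noise), and `snr_db(1.0, 1.0)` is 0 dB.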

You will note that in all cases, the new default performs better than the old default, especially in very noisy situations. In this table, the 9.0 alternative outperforms the 9.0 default at every noise level except 5 dB. The alternative is still experimental, however, so you should probably not enable it without doing some testing of your own.

The adaptive method is generally the weakest of the three new methods on this data, but its strength is in very specific environments, as mentioned above: if you see that a lot of your users are speaking in environments with constantly changing background sounds, you may want to try enabling it.
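To make the comparison easier to reproduce, the Python sketch below encodes the accuracy-gain figures from the table and reports which method has the largest gain at each noise level. The data values come from the table; the variable and function names are illustrative.

```python
LEVELS = ["Clean", "20dB", "15dB", "10dB", "5dB", "0dB"]

# Accuracy gain (percent) versus no noise reduction, copied from the table.
GAINS = {
    "v8.6 noise reduction": [0.02, 2.36, 3.39, 2.55, 2.03, 3.87],
    "v9.0 default": [0.08, 2.81, 4.93, 5.40, 6.10, 6.41],
    "v9.0 alternative": [0.10, 2.82, 5.31, 5.57, 4.43, 7.15],
    "v9.0 adaptive": [-0.36, 1.99, 4.73, 5.71, 3.83, 6.37],
}


def best_method_per_level(gains=GAINS, levels=LEVELS):
    """Map each noise level to the method with the highest accuracy gain."""
    best = {}
    for i, level in enumerate(levels):
        best[level] = max(gains, key=lambda name: gains[name][i])
    return best


def always_better(a, b, gains=GAINS):
    """True if method `a` beats method `b` at every noise level."""
    return all(x > y for x, y in zip(gains[a], gains[b]))
```

On these figures, the alternative method leads at Clean, 20 dB, 15 dB, and 0 dB, the adaptive method leads at 10 dB, and the default leads at 5 dB. These are aggregate results over mixed noise types, so they do not replace testing on your own audio.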

More Upcoming Features

In addition to the new noise reduction technology, a host of other great features are coming in 9.0, which is one of the most significant releases in LumenVox history.

In future tech bulletins, we will cover these in more detail, but for now be aware that 9.0 will have much better accuracy in many cases, a smaller memory footprint, better audio logging, improved out-of-vocabulary handling, and a streamlined API that will make it easier for new developers to start working with our Engine (total backwards compatibility will be retained for existing applications).

LumenVox Speech Engine 9.0 is currently in testing and will be released when it is ready, which we estimate will be sometime in the next several weeks.

