Working with Languages in the ASR

Reference Number: AA-00614

By setting the language identifier in a grammar, you can load different acoustic models and get recognitions in various languages.

In order to recognize sounds from different languages, we "train" the Speech Recognizer on large sets of transcribed audio from each language. The result of this process is an acoustic model, a large file that contains information about the way words in a language sound.

The LumenVox Speech Recognizer includes support for American English, Australian English, Indian English, U.K. English, Mexican Spanish, South American Spanish, Canadian French and Brazilian Portuguese.

Configuring Acoustic Models

10.0 and later

Starting with version 10.0, the LumenVox Speech Recognizer no longer includes all of its acoustic models with the standard installation. Instead, the acoustic models are broken out into individual packages for selective installation and configuration. They are available with the rest of the installation packages at

By default, the en-US (American English) model is installed with the ASR and loaded automatically. To use another acoustic model, you only need to install its package. The installer places the required components in the directories the ASR expects, so you do not need to move the models between folders yourself.

As in every version, you must declare which acoustic model the audio should be decoded against in the language declaration of the grammar header. Audio can be decoded against only one acoustic model at a time.
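As an illustration of the language declaration, here is a minimal Python sketch that builds a grammar in the standard W3C SRGS ABNF form, where the header carries a `language` line. The helper name `abnf_grammar`, the rule name, and the rule contents are hypothetical and chosen only for this example; the language identifier you pass must match an acoustic model you have installed.

```python
def abnf_grammar(language: str, root: str, rule_body: str) -> str:
    """Return an SRGS ABNF grammar whose header declares `language`.

    `root` and `rule_body` are illustrative placeholders, not LumenVox names.
    """
    return "\n".join([
        "#ABNF 1.0 UTF-8;",        # ABNF self-identifying header
        f"language {language};",   # selects the acoustic model, e.g. en-US
        "mode voice;",
        f"root ${root};",
        "",
        f"${root} = {rule_body};",
        "",
    ])


# Example: a yes/no grammar decoded against the Australian English model.
print(abnf_grammar("en-AU", "yesno", "yes | no"))
```

Changing only the first argument (for example from `en-US` to `en-AU`) is what switches the grammar to a different acoustic model in this sketch.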

To remove an acoustic model, use the Windows Add/Remove Programs function or, on Linux, run rpm -e packagename.

9.5 and prior

The Speech Engine comes complete with all of the acoustic models you need in order to use the supported languages. By default, the Speech Engine will only recognize American English. To use other languages, you must first copy their acoustic models into the proper folder.

On Windows, the Engine's installation directory contains a Lang folder, and inside that is a directory called OtherLanguages which contains the various acoustic models. On Linux, these models are in /etc/lumenvox/Lang/ by default. Copy the models you wish to use into the Dict directory in the Lang folder. You must stop and restart the Speech Engine service before the new models become available.
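The copy step can be scripted. The following is a rough Python sketch, assuming the layout described above (an OtherLanguages directory and a Dict directory under the Lang folder). The function name `copy_model` and the model name you pass in are hypothetical, and whether a model is stored as a single file or as a directory is also an assumption, so the sketch handles both.

```python
import shutil
from pathlib import Path


def copy_model(lang_dir: Path, model_name: str) -> Path:
    """Copy one acoustic model from Lang/OtherLanguages into Lang/Dict.

    `model_name` is whatever the model is called on disk (not specified here).
    """
    source = lang_dir / "OtherLanguages" / model_name
    dict_dir = lang_dir / "Dict"
    if not source.exists():
        raise FileNotFoundError(f"model not found: {source}")
    if not dict_dir.is_dir():
        # Refuse to create Dict ourselves: a missing Dict suggests a wrong path.
        raise FileNotFoundError(f"Dict directory not found: {dict_dir}")

    target = dict_dir / model_name
    if source.is_dir():
        shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
    else:
        shutil.copy2(source, target)
    return target


# Example call (path and model name are placeholders for your installation):
# copy_model(Path("/etc/lumenvox/Lang"), "SomeModelName")
```

The script only copies files; the Speech Engine service still has to be stopped and restarted afterwards.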

It is very important to note that you cannot have two acoustic models active at the same time. Within an application, you could switch between English and Spanish, but you could not decode an utterance against both models at once. This means you cannot have two grammars with different languages active simultaneously; doing so will cause an error.
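One way to catch this mistake before the grammars reach the engine is to compare their language declarations in your own application code. The sketch below is a hypothetical pre-check, not part of the LumenVox API: it assumes the grammars are in SRGS ABNF form with a `language xx-XX;` header line, and the function names are invented for this example.

```python
import re
from typing import Dict, Optional

# Matches an ABNF header line such as "language en-US;"
LANGUAGE_DECL = re.compile(r"^\s*language\s+([A-Za-z0-9-]+)\s*;", re.MULTILINE)


def declared_language(grammar_text: str) -> Optional[str]:
    """Return the declared language, lower-cased, or None if absent."""
    match = LANGUAGE_DECL.search(grammar_text)
    return match.group(1).lower() if match else None


def check_single_language(grammars: Dict[str, str]) -> Optional[str]:
    """Return the one language shared by all grammars.

    Raises ValueError if the grammars declare more than one language.
    Grammars without a declaration are ignored by this simple check.
    """
    languages = {name: declared_language(text) for name, text in grammars.items()}
    distinct = {lang for lang in languages.values() if lang is not None}
    if len(distinct) > 1:
        raise ValueError(f"grammars declare different languages: {languages}")
    return next(iter(distinct), None)


# Example: two grammars that should not be active together.
menu = "#ABNF 1.0 UTF-8;\nlanguage en-US;\nmode voice;\nroot $menu;\n$menu = sales | support;\n"
confirm = "#ABNF 1.0 UTF-8;\nlanguage en-AU;\nmode voice;\nroot $yn;\n$yn = yes | no;\n"
try:
    check_single_language({"menu": menu, "confirm": confirm})
except ValueError as err:
    print(err)
```

Language tags are compared case-insensitively here, so `en-US` and `en-us` count as the same language.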

Each acoustic model uses a significant amount of memory, so you should not load models you will not be using. If you need models that use less memory (at the cost of some accuracy), see our instructions for less memory-intensive models.

Continuous vs. Semi-Continuous Models

Prior to version 9.5, the LumenVox Speech Recognizer used one of two types of acoustic model for recognizing patterns in speech: continuous or semi-continuous.

As of version 9.5, all acoustic models have been rebuilt using the improved continuous model. If you are using a semi-continuous model on version 9.5 or later, we advise switching to the continuous version.

In most cases, more accuracy can be gained from the continuous model, but at the expense of processing time. The continuous model typically uses 15-20% more processing time than the semi-continuous.

If you’re using an acoustic model that does not have continuous mode support, then you must switch to the semi-continuous mode. Otherwise the Speech Recognizer will not function correctly.

If you are looking to use a continuous mode-supported acoustic model (en-AU Australian English, for example), and you’ve already declared the continuous mode in your sre_server.conf, then you only need to load the acoustic model as normal.

All languages are supported with the Semi-Continuous decoder. Please see our Continuous vs. Semi-Continuous Model page for help in switching between the model types.

Which Language to Use?

The acoustic model to use for your application will usually be obvious. A call router in America is best served by American English, and one in Australia by Australian English.

But there are cases where it's not so clear. What if you were developing an application for South African speakers, who speak an English dialect for which there is currently no acoustic model?

In that case you may want to try UK English first, as South African English is heavily influenced by UK English, and then the other models. The same logic applies to other English or Spanish dialects. For instance, if you want to recognize Spanish speakers from Spain, you would try both our Mexican and South American Spanish models to see which works best.

You may need to experiment a little bit to see which model works best for your speakers. Try different models and add phonetic spellings for words that are commonly misrecognized.
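To make such an experiment repeatable, you can run the same set of transcribed test utterances through each candidate model and compare how often the recognized text matches the transcript. The Python sketch below only does the scoring; how you obtain the recognition results is up to your application. The function name `rank_models` and the sample entries are hypothetical placeholders, not measured results.

```python
from typing import Dict, List, Tuple


def rank_models(results: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, float]]:
    """Rank models by the fraction of utterances recognized exactly.

    `results` maps a model label to (expected, recognized) text pairs.
    Comparison ignores case and surrounding whitespace.
    """
    scores = []
    for model, pairs in results.items():
        if not pairs:
            continue  # no test utterances for this model, nothing to score
        correct = sum(
            1
            for expected, recognized in pairs
            if expected.strip().lower() == recognized.strip().lower()
        )
        scores.append((model, correct / len(pairs)))
    # Best-scoring model first
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Placeholder input, invented purely to show the expected data shape.
sample = {
    "UK English": [("account balance", "account balance"), ("transfer", "transfer")],
    "American English": [("account balance", "account balance"), ("transfer", "transit")],
}
for model, accuracy in rank_models(sample):
    print(f"{model}: {accuracy:.0%}")
```

Words that keep appearing in the mismatched pairs are good candidates for added phonetic spellings.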