
Working with Languages in the ASR

Reference Number: AA-00614

By setting the language identifier in a grammar, you can load different acoustic models and get recognitions in various languages.

In order to recognize sounds from different languages, we "train" the Speech Recognizer on large sets of transcribed audio from each language. The result of this process is an acoustic model, a large file that contains information about the way words in a language sound.

The LumenVox Speech Recognizer includes support for the following languages:

    • American English
    • Australian English
    • Indian English
    • U.K. English
    • Mexican Spanish
    • South American Spanish
    • Canadian French
    • Brazilian Portuguese
    • German
    • Italian


Configuring Acoustic Models

10.0 and later

Starting with version 10.0, the LumenVox Speech Recognizer no longer includes all of its acoustic models with the standard installation. Instead, the acoustic models are broken out into individual packages for selective installation and configuration. They are available with the rest of the installation packages at www.LumenVox.com/packages.

By default, the en-US (American English) model is installed with the ASR and loaded automatically. To use another acoustic model, simply install its package. The installer places the appropriate components in the directories the ASR expects, so you do not need to move models between folders.

As in earlier versions, you must use the language declaration in the grammar header to specify which acoustic model the audio should be decoded against. Audio can be decoded against only one acoustic model at a time.
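As an illustration of where the language declaration lives, the following is a minimal sketch that builds a grammar in the W3C SRGS ABNF form, whose header carries a `language` line. The function name `build_abnf_grammar` is hypothetical, and the tag format check (two lowercase letters, a hyphen, two uppercase letters) is an assumption; only en-US is confirmed by this article, so check the tag each installed acoustic model package expects.

```python
import re

# Assumed tag shape, e.g. "en-US"; confirm the exact tag for each installed model.
_TAG_PATTERN = re.compile(r"^[a-z]{2}-[A-Z]{2}$")


def build_abnf_grammar(language: str, root: str, phrases: list[str]) -> str:
    """Return an SRGS ABNF grammar whose header declares `language`.

    The header's `language` line selects the acoustic model used to decode
    the audio. Each phrase becomes one alternative of the root rule.
    """
    if not _TAG_PATTERN.match(language):
        raise ValueError(f"unexpected language tag: {language!r}")
    if not phrases:
        raise ValueError("at least one phrase is required")
    alternatives = " | ".join(phrases)
    lines = [
        "#ABNF 1.0 UTF-8;",         # self-identifying ABNF header
        f"language {language};",    # acoustic model selection
        "mode voice;",
        f"root ${root};",
        f"${root} = {alternatives};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_abnf_grammar("en-US", "yesno", ["yes", "no"]))
```

Because every grammar built this way names exactly one language, switching languages means loading a grammar with a different header rather than editing the acoustic model files.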

To remove an acoustic model, use the Windows Add/Remove Programs function, or on Linux run rpm -e <packagename>. Please refer to the Linux Installation article for more details if needed.

9.5 and prior

The Speech Engine comes complete with all of the acoustic models you need in order to use the supported languages. By default, the Speech Engine will only recognize American English. To use other languages, you must first copy their acoustic models into the proper folder.

On Windows, the Engine's installation directory contains a Lang folder, and inside it is a directory called Other Languages that holds the various acoustic models. On Linux, these models are in /etc/lumenvox/Lang/ by default. Copy the models you wish to use into the Dict directory in the Lang folder, then stop and restart the Speech Engine service so the new models become available.
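The copy step can be scripted. The following is a minimal sketch, assuming you pass in the folder holding the extra models (for example, Other Languages on Windows) and the Dict folder explicitly; the function name `enable_models` and any model folder names are hypothetical. The sketch only copies files: stopping and restarting the Speech Engine service is still a separate step.

```python
import shutil
from pathlib import Path


def enable_models(source_dir: Path, dict_dir: Path, model_names: list[str]) -> list[Path]:
    """Copy each named model from `source_dir` into `dict_dir`.

    `source_dir` and `dict_dir` are supplied by the caller because the exact
    layout differs between Windows and Linux installations. Returns the
    destination paths. Restarting the Speech Engine service is NOT done here.
    """
    dict_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in model_names:
        src = source_dir / name
        if not src.exists():
            raise FileNotFoundError(f"model not found: {src}")
        dst = dict_dir / name
        if src.is_dir():
            # Models may be directories; merge into any existing copy.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Copy only the models you actually plan to use, since each loaded model consumes a significant amount of memory.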

It is very important to note that you cannot have two acoustic models active at the same time. Within an application you could switch between English and Spanish, but you could not decode an utterance against both models at once. This means you cannot have two grammars with different languages active simultaneously; doing so will cause an error.
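An application that activates several grammars can check this rule before starting a decode. The following is a minimal sketch, assuming the application tracks each active grammar's declared language itself; the `ActiveGrammar` type and `common_language` function are hypothetical and not part of any LumenVox API. Tags are compared case-insensitively because language tags are case-insensitive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveGrammar:
    name: str
    language: str  # value of the grammar header's language declaration


def common_language(grammars: list[ActiveGrammar]) -> str:
    """Return the single language shared by all active grammars.

    Raises ValueError if the grammars declare more than one language,
    since only one acoustic model can decode an utterance.
    """
    if not grammars:
        raise ValueError("no active grammars")
    languages = {g.language.lower() for g in grammars}
    if len(languages) > 1:
        raise ValueError(f"active grammars mix languages: {sorted(languages)}")
    return grammars[0].language
```

Failing early in the application gives a clearer message than waiting for the recognizer to report the conflict.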

Each acoustic model uses a significant amount of memory, so you should not load models you will not be using. If you need models that use less memory (at the cost of some accuracy), see our instructions for less memory-intensive models.


Continuous vs. Semi-Continuous Models

Prior to version 9.5, the LumenVox Speech Recognizer used one of two algorithms for recognizing patterns in speech: continuous or semi-continuous acoustic models.

As of version 9.5, all acoustic models have been rebuilt using the improved continuous model. All recent versions of LumenVox implement the continuous version, which is what you should be using.

For legacy information about semi-continuous models, including how to switch between the model types, please see our Continuous vs. Semi-Continuous Model article.


Which Language to Use?

Which acoustic model to use for your application will usually be obvious. A call router in America is best served by American English, and one in Australia by Australian English.

But there are cases where it's not so clear. What if you were developing an application for South African speakers, who speak an English dialect for which there is currently no acoustic model?

In that case you may want to try UK English first, since South African English is heavily influenced by UK English, and then the other models. The same logic applies to other English or Spanish dialects. For instance, to recognize Spanish speakers from Spain, you would try both the Mexican and South American Spanish models to see which works best.

You may need to experiment a little bit to see which model works best for your speakers. Try different models and add phonetic spellings for words that are commonly misrecognized.
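One way to structure that experiment is to decode the same set of recorded test utterances against each candidate model and compare the results. The following is a minimal sketch, assuming you have already collected `(expected, recognized)` pairs per model from your own test calls; the function names are hypothetical, and the scoring is a simple exact-match accuracy rather than word error rate, which is the more common metric. It also counts which expected phrases are most often misrecognized, as candidates for added phonetic spellings.

```python
from collections import Counter


def rank_models(results: dict[str, list[tuple[str, str]]]) -> list[tuple[str, float]]:
    """Rank models by exact-match accuracy on the same test utterances.

    `results` maps a model label to (expected, recognized) pairs.
    Models with no results are skipped. Highest accuracy comes first.
    """
    ranking = []
    for model, pairs in results.items():
        if not pairs:
            continue
        correct = sum(1 for expected, recognized in pairs if expected == recognized)
        ranking.append((model, correct / len(pairs)))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


def frequent_misrecognitions(pairs: list[tuple[str, str]], top: int = 5) -> list[tuple[str, int]]:
    """Return the expected phrases most often misrecognized, with counts."""
    misses = Counter(expected for expected, recognized in pairs if expected != recognized)
    return misses.most_common(top)
```

Use the same utterances for every model so the comparison is fair, and rerun it after adding phonetic spellings to see whether they help.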