By setting the language identifier in a grammar, you can load different acoustic models and
recognize speech in various languages.
In order to recognize sounds from different languages, we "train" the Speech Recognizer
on large sets of transcribed audio from each language. The result of this process is an acoustic
model, a large file that contains information about the way words in a language sound.
The LumenVox Speech Recognizer includes support for American English, Australian English, Indian English, U.K. English,
Mexican Spanish, South American Spanish, Canadian French and Brazilian Portuguese.
Configuring Acoustic Models
10.0 and later
Starting with version 10.0, the LumenVox Speech Recognizer no longer includes all of its acoustic models with the standard installation. Instead, the acoustic models are broken out into individual packages for selective installation and configuration. They are available with the rest of the installation packages at www.LumenVox.com/packages.
By default, the en-US (American English) model is installed with the ASR and loaded automatically. To use another acoustic model, you only need to install its package. During installation, the appropriate components are placed in the directories the ASR needs. You do not need to move the models between folders.
As in earlier versions, you must declare in the grammar header's language declaration which acoustic model the audio should be decoded against. Audio can only be decoded against a single acoustic model.
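As an illustration of the language declaration, here is a minimal Python sketch that builds a small grammar with a language header. It assumes the grammar is written in the W3C SRGS ABNF format, where the header carries a line such as "language en-AU;". The helper names abnf_header and yes_no_grammar are hypothetical and are not part of any LumenVox API.

```python
def abnf_header(language: str, root: str) -> str:
    """Return an SRGS ABNF header that declares the grammar language.

    Assumption: the grammar uses the W3C SRGS ABNF form. The language
    tag (for example "en-US" or "en-AU") selects the acoustic model.
    """
    return (
        "#ABNF 1.0 UTF-8;\n"
        f"language {language};\n"
        "mode voice;\n"
        f"root ${root};\n"
    )


def yes_no_grammar(language: str = "en-US") -> str:
    """Build a tiny yes/no grammar for the given language tag."""
    body = "$yesno = yes | no;\n"
    return abnf_header(language, "yesno") + body


if __name__ == "__main__":
    # Same rule body, decoded against the Australian English model.
    print(yes_no_grammar("en-AU"))
```

Only the language line changes between the American English and Australian English versions of this grammar; the acoustic model package for the declared language must already be installed.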
If you wish to remove an acoustic model, you can use the Windows Add/Remove Programs function or, on Linux, run rpm -e packagename.
9.5 and prior
The Speech Engine comes complete with all of the acoustic models you need in order to use the
supported languages. By default, the Speech Engine will only recognize American English. To use
other languages, you must first copy their acoustic models into the proper folder.
On Windows, inside the Engine's installation directory is a Lang folder, and inside of that is a directory called
OtherLanguages which contains the various acoustic models. On Linux, these models are in /etc/lumenvox/Lang/ by default.
Simply copy the models you wish to use into the Dict directory in the Lang folder. You must stop and restart the Speech
Engine service for these new models to be available.
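The copy step for version 9.5 and prior could be scripted along these lines. This is only a sketch: the function name install_model is hypothetical, the directory paths are supplied by the caller, and the model name is a placeholder because the actual file names of the acoustic models are not listed here. It does not restart the Speech Engine service, which still has to be done separately.

```python
import shutil
from pathlib import Path


def install_model(other_languages: Path, dict_dir: Path, model_name: str) -> Path:
    """Copy one acoustic model from OtherLanguages into the Dict directory.

    Assumptions: `model_name` is whatever file or folder name the model
    has on disk, and `dict_dir` already exists (it does in a real install).
    """
    source = other_languages / model_name
    if not source.exists():
        raise FileNotFoundError(f"No acoustic model named {model_name!r} in {other_languages}")

    target = dict_dir / model_name
    if source.is_dir():
        # A model stored as a folder is copied recursively.
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        # A model stored as a single file is copied with its metadata.
        shutil.copy2(source, target)
    return target
```

After copying, stop and restart the Speech Engine service so the new model becomes available.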
It is very important to note that you cannot have two acoustic models active at the same time. Within an
application, you could switch between English and Spanish, but you could not decode an utterance against
both models at the same time. This means you cannot have two grammars with different languages active at
once -- doing so will cause an error.
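One way to catch this mistake before grammars are activated is to check that they all declare the same language. The sketch below assumes the grammars are SRGS ABNF text with a "language xx-XX;" header line; the function names declared_language and check_single_language are hypothetical helpers, not LumenVox calls.

```python
import re

# Matches a header line such as "language en-US;" (SRGS ABNF form, assumed).
_LANGUAGE_LINE = re.compile(r"^\s*language\s+([A-Za-z0-9-]+)\s*;", re.MULTILINE)


def declared_language(grammar_text: str) -> str:
    """Return the language tag declared in a grammar header."""
    match = _LANGUAGE_LINE.search(grammar_text)
    if match is None:
        raise ValueError("grammar has no language declaration")
    return match.group(1)


def check_single_language(grammars: list[str]) -> str:
    """Return the shared language tag, or raise if the grammars disagree."""
    languages = {declared_language(text) for text in grammars}
    if not languages:
        raise ValueError("no grammars supplied")
    if len(languages) > 1:
        raise ValueError(
            "grammars declare different languages: " + ", ".join(sorted(languages))
        )
    return languages.pop()
```

Running this check on the set of grammars you intend to activate together reports a language mismatch in your own code rather than as a recognizer error.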
Each acoustic model uses a significant amount of memory, so you should not load models you will not be
using. If you need models that use less memory (at the cost of some accuracy), you should see our
instructions for less memory intensive models.
Continuous vs. Semi-Continuous Models
Prior to version 9.5, the LumenVox Speech Recognizer used one of two algorithms for recognizing patterns in speech: continuous or semi-continuous acoustic models.
As of version 9.5, all acoustic models have been rebuilt using the improved continuous model. If you are using a semi-continuous model on a version after 9.5, it is advised that you switch to the continuous version.
In most cases, more accuracy can be gained from the continuous model, but at the expense of processing time. The continuous model typically uses 15-20% more processing time than the semi-continuous.
If you’re using an acoustic model that does not have continuous mode support, then you must switch to the semi-continuous mode. Otherwise the Speech Recognizer will not function correctly.
If you want to use an acoustic model that supports continuous mode (en-AU Australian English, for example), and you have already declared continuous mode in your sre_server.conf, then you only need to load the acoustic model as normal.
All languages are supported with the Semi-Continuous decoder. Please see our Continuous vs. Semi-Continuous Model page for help in switching between the model types.
Which Language to Use?
Which acoustic model you should use for your application will usually be obvious. A call router in America is best
served by American English, and one in Australia by Australian English.
But there are cases where it's not so clear. What if you were developing an application for South African
speakers, who speak an English dialect for which there is currently no acoustic model?
In that case you may want to try UK English first, as South African English is heavily influenced by UK
English, and then the other models. The same sort of logic would go for other English or Spanish dialects.
For instance, if you want to recognize Spanish speakers from Spain, you would try both our Mexican and South
American Spanish models to see which works best.
You may need to experiment a little bit to see which model works best for your speakers. Try different
models and add phonetic spellings for words that are commonly misrecognized.
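A simple way to run such an experiment is to decode the same test utterances against each candidate model and compare how often the result matches the expected transcript. The sketch below only does the scoring. It assumes you have already collected (expected, recognized) text pairs for each model by some means, and the names accuracy and rank_models are hypothetical.

```python
def accuracy(results: list[tuple[str, str]]) -> float:
    """Fraction of utterances whose recognized text matches the expected text.

    Comparison ignores case and surrounding whitespace.
    """
    if not results:
        return 0.0
    correct = sum(
        1
        for expected, recognized in results
        if expected.strip().lower() == recognized.strip().lower()
    )
    return correct / len(results)


def rank_models(results_by_model: dict[str, list[tuple[str, str]]]) -> list[tuple[str, float]]:
    """Return (model name, accuracy) pairs, best model first."""
    scored = [(model, accuracy(results)) for model, results in results_by_model.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up illustrative data, not real recognition results.
    sample = {
        "UK English": [("billing", "billing"), ("sales", "sales"), ("support", "sport")],
        "American English": [("billing", "building"), ("sales", "sales"), ("support", "sport")],
    }
    for model, score in rank_models(sample):
        print(f"{model}: {score:.0%}")
```

Words that are misrecognized under every model, like "support" in the made-up data above, are the ones most likely to benefit from added phonetic spellings.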