Improving Recognition with Phonemes

(MAR 2007) — When the LumenVox Speech Engine misrecognizes words that are in a grammar, it's often because speakers pronounce them differently than the Engine expects. Speech recognition developers can help the Engine better understand callers by adding phonetic spellings to grammars.

This is especially useful for names of people or products, and can even be used to build limited support for words that are foreign to the language you are using.

Phonetic spellings are built from phonemes, the smallest units of sound in speech. Phonemes correspond to individual sounds, not to letters. The word "bill" has four letters, but only three phonemes. Using our English phonemes, it would be spelled B IH L.
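To make the difference between letters and phonemes concrete, here is a minimal Python sketch that compares the two counts. The helper name count_phonemes is hypothetical, and the only assumption is that the phonemes in a spelling are separated by spaces, as in B IH L.

```python
def count_phonemes(spelling: str) -> int:
    """Count the phonemes in a space-separated phonetic spelling."""
    return len(spelling.split())


word = "bill"
spelling = "B IH L"  # phonetic spelling quoted in this article

print(len(word), "letters")                   # 4 letters
print(count_phonemes(spelling), "phonemes")   # 3 phonemes
```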

English has more than 40 phonemes but only 26 letters, so many letters can be pronounced in different ways depending on their context. We have complete phoneme lists available in our Speech Engine help documentation, in the SRGS Grammars section of the Programmer's Guide.

When to Spell Words Phonetically

There is no reason to supply phonetic spellings for each and every word in a grammar. The Engine will have good pronunciations for most words.

One way to find out when to spell words phonetically is to use the Speech Tuner to listen to how users actually pronounce the words in your grammar.

Phonetic spellings are particularly helpful for words that callers may be unfamiliar with, such as proper names and place names. You can add several phonetic spellings for a single word in a grammar, so that the Engine accepts any of them.

Phonetic spellings are also useful for covering local dialects. If an application is deployed in a region where speakers pronounce a word differently, adding that pronunciation to the grammar can improve recognition.

If the Engine already has the correct pronunciation (see How the Engine Pronounces Words), there is no reason to add a phonetic spelling. Phonetic spellings should only be used when you expect speakers to use pronunciations the Engine does not know.

Syntax for Phonetic Spellings

In an ABNF grammar, phonetic spellings are entered by enclosing the phonemes in curly braces inside a quoted string.

A good example is the word "either," which is commonly pronounced in two ways. One has a long I sound at the start (eye–ther) while the other starts with a long E sound (ee–ther). Broken into their phonemes, these two variants would be spelled as AY DH AXR and IY DH AXR.

A grammar rule that contained the two variants of the word "either" might look like this:

$either = "{AY DH AXR}" | "{IY DH AXR}";

The rule would be matched if the Engine recognized either pronunciation.

When the Engine matches a word based on a phonetic spelling, the raw text it returns is the string of phonemes. If you would prefer that it return the actual word you're looking for, you can specify the word by placing a colon and the word after the phonemes. Here is our rule above, this time with the word "either" returned as raw text:

$either = "{AY DH AXR:either}" | "{IY DH AXR:either}";

Note that "either" is just an example. Our Engine supplies multiple pronunciations for ambiguous words it has in its dictionary, so by default it knows about both pronunciations of "either."
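If a grammar needs many rules of this kind, they can be generated rather than typed by hand. The following Python sketch assembles a rule in the syntax shown above. The function name phonetic_rule and its parameters are illustrative only and are not part of any LumenVox API; the function simply formats text.

```python
from typing import Optional, Sequence


def phonetic_rule(name: str,
                  pronunciations: Sequence[str],
                  word: Optional[str] = None) -> str:
    """Build an ABNF rule whose alternatives are phonetic spellings.

    name           -- rule name without the leading '$'
    pronunciations -- space-separated phoneme strings, e.g. "AY DH AXR"
    word           -- optional raw text to return instead of the phonemes
    """
    if not pronunciations:
        raise ValueError("at least one pronunciation is required")
    suffix = f":{word}" if word else ""
    # Normalize whitespace, then wrap each spelling in quotes and curly braces.
    alternatives = ['"{' + " ".join(p.split()) + suffix + '}"'
                    for p in pronunciations]
    return f"${name} = " + " | ".join(alternatives) + ";"


print(phonetic_rule("either", ["AY DH AXR", "IY DH AXR"], word="either"))
# $either = "{AY DH AXR:either}" | "{IY DH AXR:either}";
```

Leaving out the word argument produces the first form of the rule, in which the phonemes themselves are returned as raw text.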

The "Using Phonetic Spellings" and "Adding Foreign Words" topics in our Speech Engine help documentation provide more details about working with phonetic spellings.

How to Download the Latest Release

If you would like information on downloading the latest release of the LumenVox Speech Engine, please contact us. It is a free download for users with current software maintenance packages.

How the Engine Pronounces Words

To understand when to use phonetic spellings, it helps to understand how the Engine determines the pronunciation of the words in a grammar. Each acoustic model has its own dictionary of words and an automated phonetic speller. (We have a different acoustic model for each language we support, as well as a special digits-only model in each language.)

The dictionary contains a large number of words for the given language and their various phonetic spellings. For English, the dictionary contains more than 120,000 words (more than half of which are proper names). If a grammar word is in the dictionary, the Engine is able to get phonetic spellings from the dictionary entry.

There is always a chance that a word you wish to use will not be in the dictionary. This is especially true of things like company and product names, uncommon first or last names, and foreign words.

In that case, the Engine turns to its automated phonetic speller. This component uses phonetic spelling rules for the language to correlate the spelling of the word with its pronunciation.

The effectiveness of the phonetic speller varies depending on the word. If the word is not native to that language, e.g. a Spanish name while using the American English acoustic model, the phonetic speller may provide an incorrect pronunciation because the word does not follow normal pronunciation rules for the active language.
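The lookup order described above (dictionary first, automated speller as a fallback) can be sketched in Python. This is a toy illustration, not the Engine's implementation: the two dictionary entries come from the examples in this article, and the one-letter-per-sound table is an invented stand-in for real spelling rules, chosen to show how a rule-based guess can go wrong. All names here are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy dictionary: word -> phonetic spellings (entries taken from this article).
DICTIONARY: Dict[str, List[str]] = {
    "either": ["AY DH AXR", "IY DH AXR"],
    "bill": ["B IH L"],
}

# Invented letter-to-sound table standing in for real spelling rules.
LETTER_RULES: Dict[str, str] = {"b": "B", "i": "IH", "l": "L"}


def naive_speller(word: str) -> str:
    """Guess a spelling one letter at a time; unknown letters become '?'."""
    return " ".join(LETTER_RULES.get(ch, "?") for ch in word.lower())


def lookup(word: str) -> Tuple[str, List[str]]:
    """Return (source, spellings), preferring the dictionary over the speller."""
    key = word.lower()
    if key in DICTIONARY:
        return "dictionary", DICTIONARY[key]
    return "speller", [naive_speller(key)]


print(lookup("either"))       # ('dictionary', ['AY DH AXR', 'IY DH AXR'])
print(lookup("Lib"))          # ('speller', ['L IH B'])
print(naive_speller("bill"))  # B IH L L
```

The last line shows why the dictionary is consulted first: the letter-by-letter guess for "bill" yields four phonemes, while the dictionary entry has the correct three.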

You can test the phonetic speller's accuracy yourself by running the GUI version of the tool and entering words to see the pronunciations it produces. Run C:\Program Files\LumenVox\Engine\PhoneticSpeller.exe (there is not currently a Linux version of the GUI tool).

Available Languages

In addition to American English (and American English digits), we have beta acoustic models available for the following languages:

  • Mexican Spanish
  • South American Spanish
  • Canadian French
  • Australian English

We hope to have UK English available soon.

Note that each language has its own set of phonemes, so please refer to the help documentation for the list of phonemes for a given language.

These models are free for all users with current software maintenance contracts.

If you are interested in using any of these languages, please contact us.

Meet Us at VON

You can meet with LumenVox at VON, a conference aimed at voice over network technologies.

It will be held in San Jose, CA, March 19–22. LumenVox will be at the Digium Partner Pavilion.

© 2018 LumenVox, LLC. All rights reserved.