This article does not apply when using the DNN ASR engine.
The DNN ASR engine uses an "end-to-end" architecture, meaning "phonemes" are not used in the traditional sense. However, the DNN ASR engine is very good at recognizing foreign or unknown words, and generally does not need special handling to accommodate such words.
Grammars that use only in-line phonemes may not work correctly with the DNN ASR engine.
Within a LumenVox SRGS grammar, you can specify the pronunciations of words using phonemes. Phonemes are the basic sounds that make up words. Specifying pronunciations this way is useful for adding alternative pronunciations to help deal with some proper names and to help support dialects or even foreign words.
To add a phonetic spelling to a rule, place the phonemes inside curly braces { } and enclose the braces in double quotation marks " ", for example "{AY DH AXR}".
For instance, the word "either" is commonly pronounced in two ways. One has a long I sound at the start (eye-ther) while the other starts with a long E sound (ee-ther). Broken into their phonemes, these two variants would be spelled as AY DH AXR and IY DH AXR.
A rule that contained the two variants of the word "either" might look like this:
ABNF Format:
$either = "{AY DH AXR}" | "{IY DH AXR}";
GrXML Format:
<rule id="either">
<one-of>
<item>"{AY DH AXR}"</item>
<item>"{IY DH AXR}"</item>
</one-of>
</rule>
When a rule written this way is matched, however, the Engine returns only the phoneme string. To have the Engine return the actual word as raw text, put the word inside the curly braces after the phonemes, separated from them by a colon.
ABNF Format:
$either = "{AY DH AXR:either}" | "{IY DH AXR:either}";
GrXML Format:
<rule id="either">
<one-of>
<item>"{AY DH AXR:either}"</item>
<item>"{IY DH AXR:either}"</item>
</one-of>
</rule>
Note that adding phonetic spellings is distinct from semantic interpretation, even though both use curly braces in ABNF (the curly braces for a phonetic spelling are always inside of double quotation marks).
Here is a Spanish grammar that uses these principles:
ABNF Format:
#ABNF 1.0 ISO-8859-1;
language es-MX;
mode voice;
root $main;
tag-format <semantics/1.0>;
public $main = "{K R EY D IY T OW:crédito}" {out="crédito"} //custom pronunciation
| Sí {out="Sí"}
| ("el baño" | "{EY L BV AA GN OW}") {out="el baño"}
| hola {out="hola"};
GrXML Format:
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar http://www.w3.org/TR/speech-grammar/grammar.xsd"
xml:lang="es-MX" version="1.0"
root="main"
mode="voice"
tag-format="semantics/1.0">
<rule id="main">
<one-of>
<item>"{K R EY D IY T OW:crédito}"<tag>out="crédito";</tag></item>
<item>Sí<tag>out="Sí";</tag></item>
<item>
<one-of>
<item>el baño</item>
<item>"{EY L BV AA GN OW}"</item>
</one-of>
<tag>out="el baño";</tag>
</item>
<item>hola<tag>out="hola";</tag></item>
</one-of>
</rule>
</grammar>
For a common application of phonetic spellings, see Adding Foreign Words.
Multiple Pronunciations - Alternate Form
As was shown above, a rule that contains the two variants of the word "either" might look like this:
ABNF Format:
$either = "{AY DH AXR}" | "{IY DH AXR}";
GrXML Format:
<rule id="either">
<one-of>
<item>"{AY DH AXR}"</item>
<item>"{IY DH AXR}"</item>
</one-of>
</rule>
However, LumenVox 13.0 introduced an alternate form in which multiple pronunciations are placed within a single pair of curly braces, separated by a pipe | character.
ABNF Format:
$either = "{AY DH AXR | IY DH AXR}";
GrXML Format:
<rule id="either">
<one-of>
<item>"{AY DH AXR | IY DH AXR}"</item>
</one-of>
</rule>
The result returned for this alternate form is the phoneme string contained within the curly braces with the pipe | character stripped out, since the pipe is not valid in a result. Both the ABNF and GrXML rules above therefore return "AY DH AXR IY DH AXR".
SISR results associated with this combined phoneme string are the same as for the separate forms shown earlier.
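A semantic tag can be attached to the combined form in the same way as to the separate alternatives. The fragment below is a minimal sketch, assuming LumenVox 13.0 or later; the rule name and the out value are illustrative choices rather than part of a LumenVox sample.

```abnf
#ABNF 1.0 UTF-8;
language en-US;
mode voice;
root $either;
tag-format <semantics/1.0>;

// Both pronunciations sit inside one pair of curly braces and share one tag,
// so the interpretation is the same whichever pronunciation is spoken.
public $either = "{AY DH AXR | IY DH AXR}" {out="either"};
```

Because the raw text returned for this form is the combined phoneme string, a tag like the one above is a convenient way to get a readable value back.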
Combining Phonetic Spellings And Lexicons
As you can see from this article and our article on ASR Lexicons, there are two ways in which users can introduce custom pronunciations for words.
Because a deployment may end up using both of these, it helps to understand how the two mechanisms interact with each other.
The ASR first determines a word's pronunciation using its internal methods, which combine an extensive lexicon (or dictionary) for each supported language with statistical and rules-based algorithms.
Following this, users can optionally specify custom Lexicons, which themselves can be configured to replace word definitions (type=primary) or to add alternate pronunciations (type=backup).
Starting with LumenVox version 13.0, inline phonetic spellings are always applied after any custom lexicons have been processed. An inline phonetic spelling within a grammar is therefore always available as an alternative pronunciation, regardless of how the other pronunciations of that word were obtained.
Prior to LumenVox version 13.0, whether an inline phonetic spelling was included depended on the order in which combined grammars were processed, and on whether a custom lexicon containing a conflicting pronunciation for the same word was defined as primary or backup type.
Word Label
Earlier, we described the word inside the curly braces as optional. That is true, but the presence of this Word Label also affects the results that are returned.
As previously mentioned, the following definition would result in the phoneme string itself being returned as a result:
"{AY DH AXR}" will return AY DH AXR as a decoded result
However, when a Word Label is specified after a colon, the Word Label is returned instead:
"{AY DH AXR:either}" will return either as a decoded result
This subtle difference is handled quite differently inside the ASR during processing. In the form without the Word Label, any semantic interpretation (SI) is associated with that phoneme string only, and is not affected by any SI associated with the word either as defined elsewhere in the grammar.
The second form, "{AY DH AXR:either}", which includes the Word Label, is handled within the ASR simply as an alternate pronunciation of the word either, and its SI rules are shared with the other definitions of either within the grammar.
How this difference affects your grammars depends on how they are written. One practical consequence is that you can define a rule for a word without a Word Label and associate an SI with that unique phonetic sequence if desired.
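The two forms can be put side by side as rule fragments. This is only a sketch of the behavior described above: the rule names and out values are hypothetical, and the comments restate the article's description rather than documenting engine internals.

```abnf
// Without a Word Label: the decoded text is "AY DH AXR", and the SI below
// belongs to this phoneme string only.
$either_phonemes = "{AY DH AXR}" {out="either-long-i"};

// With a Word Label: the decoded text is "either". The inline spelling is
// treated as one more pronunciation of the word, so it shares SI with the
// plain word defined in the same grammar.
$either_word = either {out="either"}
             | "{IY DH AXR:either}" {out="either"};
```

The unlabeled form is the one to reach for when a particular pronunciation needs its own interpretation; the labeled form suits the more common case of simply widening what counts as the word.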
Duplicate Pronunciations
As already mentioned, there are a few ways in which the pronunciation of words can be specified to the ASR during grammar processing.
It is therefore possible that one of these definitions matches a pronunciation that already exists for the word. In such cases, the later definition is ignored, and no duplicate pronunciation is added. If you are trying to add duplicates in order to influence the weighting of a particular word or phrase, see our Applying Grammar Weights article instead.
Combining Inline Phonemes and Other Words
Starting with LumenVox 13.0, a rule can contain a mixture of inline phonemes and regular words when that combination is needed.
"{S UW P ER} cool" will return S UW P ER cool as a decoded result, but only when the phonemes S UW P ER followed by the current pronunciation of cool are matched.
With a Word Label of super applied to S UW P ER, the definition becomes "{S UW P ER:super} cool", which returns super cool as a decoded result when matched.
SI processing for this sort of combination follows the same principle: without a Word Label in the inline phonetic string, the rule is processed with its own unique SI; with the Word Label, it is processed as if "super cool" had been decoded, and so shares any SI with other "super cool" definitions in the grammar.
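Written as rules, the two mixed definitions might look like the fragment below. It is a sketch assuming LumenVox 13.0 or later; the rule names and out values are made up for illustration, and the quoting follows the inline examples in this section.

```abnf
// Unlabeled inline phonemes followed by a regular word:
// decoded text is "S UW P ER cool", with SI unique to this sequence.
$super_cool_phonemes = "{S UW P ER} cool" {out="super cool"};

// Labeled inline phonemes followed by a regular word:
// decoded text is "super cool", sharing SI with other "super cool" definitions.
$super_cool = "{S UW P ER:super} cool" {out="super cool"};
```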