This article does not apply when using the DNN ASR engine.
The DNN ASR engine uses an "end-to-end" architecture, meaning "phonemes" are not used in the traditional sense. However, the DNN ASR engine is very good at recognizing foreign or unknown words, and generally does not need special handling to accommodate such words.
Using phonetic rules within an SRGS grammar, developers can build limited functionality for words in languages not supported by the Speech Engine, or grammars supporting occasional words from multiple languages.
Within a grammar you may create rules that are matched to phonetic spellings by using phonemes. By breaking foreign words into their constituent phonemes, grammars can effectively contain those words.
If a word contains a phoneme that does not occur in English, use the English phoneme that is closest to the foreign phoneme.
For instance, to build phonetic support for the German digits zero through nine, you would begin by assembling a list of the digits along with their phonetic pronunciations:
Number | Phonemes (from the American English list) |
Null | N UW L |
Eins | AY N S |
Zwei | S V AY | T S V AY |
Zwo | T S OW |
Drei | D R AY |
Vier | F IY R |
Fünf | F UW N F |
Sechs | Z EH K S |
Sieben | Z IY B EH N |
Acht | AA K T |
Neun | N OY N |
Note that several of the phonemes are approximations. The "ch" sound in "acht" is different from the English phoneme represented by K, but since English does not have the phoneme from "acht" we pick the closest sound.
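The substitution step can be sketched in Python as a small lookup table. This is only an illustration: the table name, the function name, and the sound labels "ch" and "ü" are hypothetical, and the two entries are simply the approximations used for "acht" and "fünf" in the list above.

```python
# Hypothetical substitution table: foreign sound label -> closest American English phoneme.
# Only these two entries come from the German examples in this article.
CLOSEST_ENGLISH = {
    "ch": "K",   # "acht" is spelled AA K T
    "ü": "UW",   # "fünf" is spelled F UW N F
}

def approximate(sounds):
    """Return a space-separated phonetic spelling, replacing any sound listed
    in CLOSEST_ENGLISH and passing every other symbol through unchanged."""
    return " ".join(CLOSEST_ENGLISH.get(s, s) for s in sounds)

print(approximate(["AA", "ch", "T"]))      # AA K T
print(approximate(["F", "ü", "N", "F"]))   # F UW N F
```

Keeping the substitutions in one table makes it easier to apply the same approximation consistently across every word in a grammar.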
To enter a phonetic spelling as part of a rule within an SRGS grammar, place the phonemes and the word they represent, separated by a colon, inside curly braces, and wrap the whole token in double quotes, for example "{AY N S: eins}". A basic ABNF grammar to return a spoken German digit (using the American English acoustic model) might look like the following:
ABNF Example
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $Digits;
$Digit = (
    "{AY N S: eins}" {out="1"} |
    ("{S V AY: zwei}" | "{T S V AY: zwei}") {out="2"} |
    "{D R AY: drei}" {out="3"} |
    "{F IY R: vier}" {out="4"} |
    "{F UW N F: funf}" {out="5"} |
    "{Z EH K S: sechs}" {out="6"} |
    "{Z IY B EH N: sieben}" {out="7"} |
    "{AA K T: acht}" {out="8"} |
    "{N OY N: neun}" {out="9"} |
    "{N UW L: null}" {out="0"}
);

$Digits = {out=''} ($Digit {out+=rules.latest()})<1->;
GrXML Example
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="Digits" mode="voice" tag-format="semantics/1.0">
  <rule id="Digit">
    <one-of>
      <item>"{AY N S: eins}"<tag>out="1"</tag></item>
      <item>
        <one-of>
          <item>"{S V AY: zwei}"</item>
          <item>"{T S V AY: zwei}"</item>
        </one-of>
        <tag>out="2"</tag>
      </item>
      <item>"{D R AY: drei}"<tag>out="3"</tag></item>
      <item>"{F IY R: vier}"<tag>out="4"</tag></item>
      <item>"{F UW N F: funf}"<tag>out="5"</tag></item>
      <item>"{Z EH K S: sechs}"<tag>out="6"</tag></item>
      <item>"{Z IY B EH N: sieben}"<tag>out="7"</tag></item>
      <item>"{AA K T: acht}"<tag>out="8"</tag></item>
      <item>"{N OY N: neun}"<tag>out="9"</tag></item>
      <item>"{N UW L: null}"<tag>out="0"</tag></item>
    </one-of>
  </rule>
  <rule id="Digits">
    <tag>out=""</tag>
    <item repeat="1-">
      <ruleref uri="#Digit"/>
      <tag>out+=rules.latest();</tag>
    </item>
  </rule>
</grammar>
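When the pronunciation list grows, writing each alternative by hand becomes error-prone. As a rough sketch, the $Digit rule of the ABNF grammar could be generated from a table with a few lines of Python. The function names and the GERMAN dictionary are hypothetical helpers introduced here for illustration and are not part of any LumenVox tool; the sketch assumes Python 3.7 or later so that dictionary order is preserved.

```python
def phonetic_token(phonemes, word):
    """Format one phonetic spelling as a quoted, brace-wrapped grammar token."""
    return '"{%s: %s}"' % (phonemes, word)

def build_digit_rule(table):
    """Build the ABNF text of a $Digit rule.

    table maps the digit string to return -> (word label, list of phonetic spellings).
    """
    alternatives = []
    for value, (word, spellings) in table.items():
        tokens = [phonetic_token(p, word) for p in spellings]
        # Several pronunciations of the same word are grouped in parentheses.
        body = tokens[0] if len(tokens) == 1 else "(" + " | ".join(tokens) + ")"
        alternatives.append('    %s {out="%s"}' % (body, value))
    return "$Digit = (\n" + " |\n".join(alternatives) + "\n);"

# A few rows from the German table, used only to demonstrate the helper.
GERMAN = {
    "1": ("eins", ["AY N S"]),
    "2": ("zwei", ["S V AY", "T S V AY"]),
    "3": ("drei", ["D R AY"]),
}

print(build_digit_rule(GERMAN))
```

The printed rule has the same shape as the hand-written one, so it can be pasted under the grammar header shown in the ABNF example.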
An Example in Spanish
Although LumenVox already offers a couple of Spanish language acoustic models, the following example using Spanish digits is provided to give developers a better idea of how to phonetically spell words.
Number | Phonemes (from the American English list) |
Cero | S EH R OW | S EY R OW |
Uno | AX N OW | UW N OW |
Dos | D AO S | D OW S |
Tres | T R EH S | T R EY S |
Cuatro | K W AH T R OW |
Cinco | S IH NG K OW |
Seis | S EY Z |
Siete | S IY Y EH T EH |
Ocho | OW CH OW |
Nueve | N UW EH V EH | N UW W EH V EH | N UW EH V EY |
As in the German example, each phonetic spelling is entered inside quotes and curly braces. A basic ABNF grammar to return a spoken Spanish digit (using the American English acoustic model) might look like the following:
ABNF Example
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $Digits;
$Digit = (
    ("{AX N OW: uno}" | "{UW N OW: uno}") {out="1"} |
    ("{D AO S: dos}" | "{D OW S: dos}") {out="2"} |
    ("{T R EH S: tres}" | "{T R EY S: tres}") {out="3"} |
    "{K W AH T R OW: cuatro}" {out="4"} |
    "{S IH NG K OW: cinco}" {out="5"} |
    "{S EY Z: seis}" {out="6"} |
    "{S IY Y EH T EH: siete}" {out="7"} |
    "{OW CH OW: ocho}" {out="8"} |
    ("{N UW EH V EH: nueve}" | "{N UW W EH V EH: nueve}" | "{N UW EH V EY: nueve}") {out="9"} |
    ("{S EH R OW: cero}" | "{S EY R OW: cero}") {out="0"}
);

$Digits = {out=''} ($Digit {out+=rules.latest()})<1->;
GrXML Example
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="Digits" mode="voice" tag-format="semantics/1.0">
  <rule id="Digit">
    <one-of>
      <item>
        <one-of>
          <item>"{AX N OW: uno}"</item>
          <item>"{UW N OW: uno}"</item>
        </one-of>
        <tag>out="1"</tag>
      </item>
      <item>
        <one-of>
          <item>"{D AO S: dos}"</item>
          <item>"{D OW S: dos}"</item>
        </one-of>
        <tag>out="2"</tag>
      </item>
      <item>
        <one-of>
          <item>"{T R EH S: tres}"</item>
          <item>"{T R EY S: tres}"</item>
        </one-of>
        <tag>out="3"</tag>
      </item>
      <item>"{K W AH T R OW: cuatro}"<tag>out="4"</tag></item>
      <item>"{S IH NG K OW: cinco}"<tag>out="5"</tag></item>
      <item>"{S EY Z: seis}"<tag>out="6"</tag></item>
      <item>"{S IY Y EH T EH: siete}"<tag>out="7"</tag></item>
      <item>"{OW CH OW: ocho}"<tag>out="8"</tag></item>
      <item>
        <one-of>
          <item>"{N UW EH V EH: nueve}"</item>
          <item>"{N UW W EH V EH: nueve}"</item>
          <item>"{N UW EH V EY: nueve}"</item>
        </one-of>
        <tag>out="9"</tag>
      </item>
      <item>
        <one-of>
          <item>"{S EH R OW: cero}"</item>
          <item>"{S EY R OW: cero}"</item>
        </one-of>
        <tag>out="0"</tag>
      </item>
    </one-of>
  </rule>
  <rule id="Digits">
    <tag>out=""</tag>
    <item repeat="1-">
      <ruleref uri="#Digit"/>
      <tag>out+=rules.latest();</tag>
    </item>
  </rule>
</grammar>
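Because the GrXML form nests a one-of element for every word with more than one pronunciation, it is easy to misplace a closing tag when editing by hand. The following Python sketch builds the Digit rule with the standard library's xml.etree.ElementTree so the nesting is always well-formed. The function name and the SPANISH dictionary are hypothetical, the surrounding grammar element is omitted for brevity, and Python 3.7 or later is assumed.

```python
import xml.etree.ElementTree as ET

def build_digit_rule_xml(table):
    """Return a <rule id="Digit"> element built from a table that maps
    digit string -> (word label, list of phonetic spellings)."""
    rule = ET.Element("rule", {"id": "Digit"})
    one_of = ET.SubElement(rule, "one-of")
    for value, (word, spellings) in table.items():
        item = ET.SubElement(one_of, "item")
        tokens = ['"{%s: %s}"' % (p, word) for p in spellings]
        if len(tokens) == 1:
            # A single pronunciation goes directly into the item text.
            item.text = tokens[0]
        else:
            # Alternative pronunciations are wrapped in a nested <one-of>.
            inner = ET.SubElement(item, "one-of")
            for token in tokens:
                ET.SubElement(inner, "item").text = token
        ET.SubElement(item, "tag").text = 'out="%s"' % value
    return rule

# A few rows from the Spanish table, used only to demonstrate the helper.
SPANISH = {
    "1": ("uno", ["AX N OW", "UW N OW"]),
    "5": ("cinco", ["S IH NG K OW"]),
    "8": ("ocho", ["OW CH OW"]),
}

print(ET.tostring(build_digit_rule_xml(SPANISH), encoding="unicode"))
```

The serialized rule comes out on a single line; it would still need to be placed inside a grammar element with the namespace and attributes shown in the GrXML example.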
An Example using English, French, Spanish, Portuguese, German and Italian
In this example, the grammar allows the user to say "English", "Français", "Español", "Português", "Deutsche" or "Italiano" to select their desired language, all while using the American English acoustic model.
ABNF Example
#ABNF 1.0 UTF-8;
language en-US;
mode voice;
tag-format <semantics/1.0>;
root $PickLang;
$PickLang = (
    (English {out="English"}) |
    ("{F R OW N S EY: Français}" {out="French"}) |
    ("{EH S P AE N Y OW L: Español}" {out="Spanish"}) |
    ("{P AO R T UW JH EY S: Português}" {out="Portuguese"}) |
    ("{D OY T CH AX: Deutsche}" {out="German"}) |
    ("{IH T AA L IY AE N OW: Italiano}" {out="Italian"})
);
GrXML Example
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US" root="PickLang" mode="voice" tag-format="semantics/1.0">
  <rule id="PickLang">
    <one-of>
      <item>English<tag>out="English"</tag></item>
      <item>"{F R OW N S EY: Français}"<tag>out="French"</tag></item>
      <item>"{EH S P AE N Y OW L: Español}"<tag>out="Spanish"</tag></item>
      <item>"{P AO R T UW JH EY S: Português}"<tag>out="Portuguese"</tag></item>
      <item>"{D OY T CH AX: Deutsche}"<tag>out="German"</tag></item>
      <item>"{IH T AA L IY AE N OW: Italiano}"<tag>out="Italian"</tag></item>
    </one-of>
  </rule>
</grammar>
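A mistyped phoneme symbol is an easy mistake to make in grammars like these. As a hedged sketch, a short Python script can pull every phonetic token out of a grammar file and flag symbols it does not recognize. The KNOWN_PHONEMES set below contains only the symbols that appear in this article's examples, not the complete American English phoneme list, and the function name is hypothetical; a real check should load the full list from the engine documentation.

```python
import re

# Symbols used in this article's examples; NOT the complete American English list.
KNOWN_PHONEMES = set(
    "AA AE AH AO AX AY B CH D EH EY F IH IY JH K L N NG "
    "OW OY P R S T UH UW V W Y Z".split()
)

# Matches "{PHONEMES: word}" and captures the phonemes and the word label.
TOKEN = re.compile(r'"\{([^:{}]+):\s*([^{}]+?)\s*\}"')

def check_grammar(text):
    """Return (word, unknown symbols) for every phonetic token that uses
    a symbol outside KNOWN_PHONEMES."""
    problems = []
    for phonemes, word in TOKEN.findall(text):
        unknown = [p for p in phonemes.split() if p not in KNOWN_PHONEMES]
        if unknown:
            problems.append((word, unknown))
    return problems

# The last line deliberately contains a typo: AL instead of AA L.
sample = '''
("{F R OW N S EY: Français}" {out="French"}) |
("{D OY T CH AX:Deutsche}" {out="German"}) |
("{IH T AL IY AE N OW:Italiano}" {out="Italian"})
'''
print(check_grammar(sample))  # [('Italiano', ['AL'])]
```

The same pattern works on both the ABNF and GrXML forms, since the quoted token syntax is identical in each.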