Lexicons

Reference Number: AA-01065

This article does not apply when using the DNN ASR engine.

The DNN ASR engine uses an "end-to-end" architecture, meaning "phonemes" are not used in the traditional sense. However, the DNN ASR engine is very good at recognizing foreign or unknown words, and generally does not need special handling to accommodate such words.
Currently, the DNN ASR engine ignores lexicons.

In addition to the phonetic spellings that can be placed directly in the grammar, multiple custom pronunciations can be grouped into a single file and referenced from an SRGS grammar. A collection of pronunciations like this is known as a lexicon.

Lexicons introduce a degree of modularity into your grammars, allowing separation between the specification of pronunciation and the rest of the grammar, including the rules and tags. You may reuse a single lexicon across multiple grammars, and fix errors or add words to the lexicon in a single place without modifying the grammars that reference it.

For example, suppose you are working in American English and want to include the word "franc" in your grammar. The default pronunciation for this word is "F R AE NG K" (lumenvox) or "fr{Nk" (sampa), which sounds like "Frank". If you want the French-sounding pronunciation instead (with the "a" sounding more like the "o" in "on"), you can add a lexicon defining franc as "F R AO NG K" (lumenvox) or "frONk" (sampa). You can have both the default and the lexicon pronunciations considered by declaring the lexicon as "backup", or force the lexicon's pronunciation to be the only one considered by declaring the lexicon as "primary".
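
As a rough sketch of the "franc" case, a lexicon using the LumenVox alphabet might look like the following. The file name franc-lexicon.xml is hypothetical; the structure follows the lexicon file format described later in this article.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- franc-lexicon.xml (hypothetical file name, used for illustration only) -->
<lexicon xml:lang="en-US"
         alphabet="application/lumenvox;localization=lumenvox">

  <!-- French-sounding pronunciation of "franc" -->
  <entry key="franc">
    <definition value="F R AO NG K" />
  </entry>

</lexicon>
```

Referencing this file with 'type=backup' would keep the default "Frank"-like pronunciation as an alternate, while leaving the type unspecified (or using 'type=primary') would make the lexicon pronunciation the only one considered.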

Grammars may reference more than one lexicon. See the Using Lexicons section below for a description of how to use multiple lexicons in a single grammar.

In an ABNF grammar, a lexicon is declared in the header using the 'lexicon' keyword followed by an ABNF URI:

#ABNF 1.0;
language en-US;
mode voice;
lexicon <lexicon.xml>;
root $yesorno;

$yes = yes;
$no = no;
$yesorno = $yes | $no;

In a GrXML grammar, 'lexicon' elements are declared as immediate children of the 'grammar' element. Each 'lexicon' element must have a 'uri' attribute.

<grammar version="1.0"
xml:lang="en-US" mode="voice" root="yesorno"
xmlns="http://www.w3.org/2001/06/grammar"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                    http://www.w3.org/TR/speech-grammar/grammar.xsd">

<lexicon uri="lexicon.xml"/>
<rule id="yesorno">
<one-of>
<item> yes </item>
<item> no </item>
</one-of>
</rule>
</grammar>

Note that, as with other references used by LumenVox, the URI specified for a lexicon file can refer to a file on the file system or to a remote (HTTP) resource. These files are fetched automatically as needed, provided sufficient permissions exist.
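
To illustrate the two kinds of reference, the sketch below declares one lexicon by relative file reference and one by HTTP URL. The host example.com and the path /lexicons/names.xml are placeholders, not real LumenVox resources.

```xml
<grammar version="1.0"
         xml:lang="en-US" mode="voice" root="yesorno"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Lexicon referenced by a relative file reference -->
  <lexicon uri="lexicon.xml"/>

  <!-- Lexicon fetched over HTTP; host and path are placeholders -->
  <lexicon uri="http://example.com/lexicons/names.xml"/>

  <rule id="yesorno">
    <one-of>
      <item> yes </item>
      <item> no </item>
    </one-of>
  </rule>
</grammar>
```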

Lexicon file

The lexicon file is an XML document with a single <lexicon> element. Within the <lexicon> element are one or more <entry> elements, which include one or more <definition> elements.

<lexicon> Element

The <lexicon> element declares that the file is a lexicon and it must include both an 'xml:lang' attribute and an 'alphabet' attribute.

The 'xml:lang' specifies the language of the words in the lexicon, and is given as a language code (for example en-US). Note that if there is a mismatch between the language specified for a lexicon and its parent grammar, the parent grammar language will be used.

The 'alphabet' attribute specifies the format of the pronunciations in the 'definition' elements. LumenVox currently supports two formats of alphabet within lexicon files.

SAMPA alphabet format is supported, and is declared using the alphabet="application/sampa" attribute setting in the <lexicon> element. To see a list of supported SAMPA phonemes that can be used, please refer to the phonemes articles for each supported language, for example American English Phonemes.

LumenVox alphabet is also supported, which is the LumenVox internal alphabet, as described in the phonemes articles specific to each supported language, for example American English Phonemes. The LumenVox alphabet format is specified using the alphabet="application/lumenvox" attribute setting in the <lexicon> element.

The alphabet attribute can be extended with an optional localization modifier. For the most reliable performance across multiple engine versions, we suggest appending ';localization=lumenvox' to the end of the alphabet attribute value.

<entry> Elements

The <entry> elements define the words for which custom pronunciations are provided. The required 'key' attribute specifies the spelling of the word. There can be one or many <entry> elements within the <lexicon> element.

<definition> Elements

Within the <entry> elements are one or more <definition> elements. The required 'value' attribute specifies the pronunciation of the word in the parent entry element. This pronunciation is given in the alphabet specified in the 'alphabet' attribute of the lexicon element; currently the SAMPA and LumenVox phonetic alphabets are supported.

Example Lexicon (sampa)

<?xml version="1.0" encoding="UTF-8" ?>
<lexicon xml:lang="en-US"
alphabet="application/sampa;localization=lumenvox">

<entry key="no">
  <definition value="noU" />
</entry>

<entry key="yes">
  <definition value="jEs" />
  <definition value="jeI" />
</entry>

</lexicon>

Example Lexicon (lumenvox)

<?xml version="1.0" encoding="UTF-8" ?>
<lexicon xml:lang="en-US"
alphabet="application/lumenvox;localization=lumenvox">

<entry key="no">
  <definition value="N OW" />
</entry>

<entry key="yes">
  <definition value="Y EH S" />
  <definition value="Y AE" />
</entry>

</lexicon>

Using Lexicons

Referencing multiple lexicons within a single grammar is allowed, and giving multiple pronunciations for the same word is defined behavior. Resolution of priority between multiple lexicons is controlled by the 'type' query property in the lexicon URI. By default, all pronunciations are added as "primary", meaning they override any existing internal definitions.

Alternatively, the lexicon can be declared as "backup", where entries are added to the internal alternates and no existing pronunciations are removed. This is explicitly invoked by adding 'type=backup' to the query string.

ABNF:

lexicon <lexicon.xml?type=backup>;

GrXML:

<lexicon uri="lexicon.xml?type=backup" />

As mentioned above, a lexicon is automatically designated as 'primary' if no type is explicitly specified. If a primary lexicon contains a pronunciation for a word, all existing pronunciations for that word are ignored in favor of the new pronunciation. This is explicitly invoked by adding 'type=primary' to the query string.

ABNF:

lexicon <lexicon.xml?type=primary>;

GrXML:

<lexicon uri="lexicon.xml?type=primary" />

SpeechWorks format types are also supported, so the type may be written with an 'SWI' prefix.

ABNF:

lexicon <lexicon.xml?SWI.type=primary>;

GrXML:

<lexicon uri="lexicon.xml?SWI.type=primary" />
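
Putting these options together, a single GrXML grammar that references two lexicons with different types might look roughly like this. The file names currency-lexicon.xml and names-lexicon.xml and the rule name are hypothetical, chosen only for illustration.

```xml
<grammar version="1.0"
         xml:lang="en-US" mode="voice" root="currency"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Hypothetical lexicon: its entries replace the internal pronunciations -->
  <lexicon uri="currency-lexicon.xml?type=primary"/>

  <!-- Hypothetical lexicon: its entries are added as alternates -->
  <lexicon uri="names-lexicon.xml?type=backup"/>

  <rule id="currency">
    <one-of>
      <item> franc </item>
      <item> dollar </item>
    </one-of>
  </rule>
</grammar>
```

Words found in the primary lexicon use only the lexicon pronunciations, while words found in the backup lexicon keep their internal pronunciations and gain the lexicon entries as additional alternates.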

Combining Lexicons with Phonetic Spellings

Our Using Phonetic Spellings article describes an alternate, inline method of specifying the pronunciation of words and phrases.

As of LumenVox version 13.0, a change was implemented to better prioritize how these two methods are processed within the ASR when both a custom lexicon and an inline phonetic spelling define a pronunciation for the same word. Please read the Using Phonetic Spellings article to understand how the two methods now interact with each other.

Using Phonetic Spellings