If you have not already done so, please see our SRGS Introduction for more information about our SRGS tutorial.
We will begin our look at writing SRGS grammars with a simple grammar that lets the Engine recognize the words "yes" or "no." Yes or no grammars are the "hello world" of grammar writing.
Example
ABNF Format:
#ABNF 1.0 UTF-8;
language en-US; //use the American English pronunciation dictionary.
mode voice; //the input for this grammar will be spoken words.
root $yesorno;
$yes = yes;
$no = no;
$yesorno = $yes | $no;
Equivalent grammar in GrXML Format:
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="yesorno"
         mode="voice"
         tag-format="semantics/1.0.2006">
<rule id="yes">
<item>yes</item>
</rule>
<rule id="no">
<item>no</item>
</rule>
<rule id="yesorno">
<one-of>
<item><ruleref uri="#yes"/></item>
<item><ruleref uri="#no"/></item>
</one-of>
</rule>
</grammar>
The Grammar Identifier
Any SRGS grammar written in ABNF notation must begin with a line like:
#ABNF 1.0 UTF-8;
This tells the LumenVox grammar compiler that the file being read is an ABNF grammar. Immediately following the version number is an optional declaration that indicates the character encoding, e.g. UTF-8, UTF-16, or ISO-8859-1. The line ends with a semicolon, as does every statement in an ABNF grammar.
By contrast, an SRGS GrXML grammar must begin with an XML prolog element followed by a grammar element like:
<?xml version="1.0" encoding="UTF-8" ?>
<grammar ...>
...
</grammar>
Immediately following the XML version is an optional declaration that indicates the character encoding, e.g. UTF-8, UTF-16, or ISO-8859-1.
The Grammar Header
Following the identifier, a well-formed grammar contains information about the language the grammar is written in, the expected interaction mode (voice or DTMF), and the name of the rule where the Engine will begin its search (the root rule). In addition, the header may contain one or more tags and an identifier describing the tag format for this grammar. Tags will be discussed later in this tutorial.
The contents of the grammar header may appear in any order, but no header data may occur in the file after the first rule is written. The Engine requires only the identifier line; if the interaction mode, language, or tag format is omitted, the Engine uses a default value: voice for the interaction mode, en-US for the language, and semantics/1.0 for the tag format. If no root rule is specified in the header, every rule is treated as a root, meaning the grammar is matched if any rule is matched.
It is good practice to always explicitly assign the header information instead of relying on the default values.
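To make the default behaviour concrete, here is a small Python sketch that reads the header of an ABNF grammar and fills in the defaults this tutorial lists. It is only an illustration: read_header and DEFAULTS are hypothetical names, not part of any LumenVox API, and the comment stripping is deliberately naive.

```python
# Defaults as described in this tutorial; not read from any real Engine.
DEFAULTS = {"mode": "voice", "language": "en-US", "tag-format": "semantics/1.0"}


def read_header(abnf_text):
    """Collect header declarations and fill in defaults for missing ones."""
    # Naively drop // comments. This would break on values that contain //
    # (for example a URI), which is acceptable for this sketch.
    lines = [line.split("//", 1)[0] for line in abnf_text.splitlines()]
    statements = [s.strip() for s in " ".join(lines).split(";")]

    header = dict(DEFAULTS)
    header["root"] = None  # None stands for "every rule acts as a root"

    for statement in statements:
        if not statement or statement.startswith("#ABNF"):
            continue  # skip blanks and the grammar identifier
        if statement.startswith("$"):
            break  # first rule definition: the header is over
        keyword, _, value = statement.partition(" ")
        if keyword in header:
            header[keyword] = value.strip().strip("<>")
    return header


# A grammar with nothing but the identifier and one rule relies on every default.
print(read_header("#ABNF 1.0;\n$yes = yes;"))
```

Running the same function on the yes/no grammar above should report $yesorno as the root, which is one reason to declare the header explicitly: the grammar then says what it means without the reader having to know the defaults.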
Comments
ABNF grammars may contain comments anywhere in their body, except on the first line, which holds the grammar identifier. The comment format is the same one used by the C, C++, and Java programming languages: // for a comment that runs to the end of the line, and /* ... */ for a block comment.
GrXML grammars, being XML, use XML comments: <!-- comment -->
Rules
A grammar's rules specify which words and combinations of words the ASR Engine can recognize. They are the heart of the grammar.
In ABNF format, each rule has a name, appearing on the left side of an = sign, and a rule expansion, appearing on the right side. A rule name is written as a $ character followed immediately by an identifier, which must start with a letter and may be followed by additional letters, numbers, or underscore characters. The first rule in the ABNF grammar above is:
$yes = yes;
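As a quick way to check a candidate rule name against the naming pattern described here, the following Python sketch may help. It assumes that "letter" means an ASCII letter, which is narrower than what a real grammar compiler may accept; RULE_NAME and is_valid_rule_name are hypothetical names used only for illustration.

```python
import re

# Assumption: "letter" means an ASCII letter; a real compiler may accept more.
# Pattern: a $ sign, one letter, then any number of letters, digits, or underscores.
RULE_NAME = re.compile(r"\$[A-Za-z][A-Za-z0-9_]*")


def is_valid_rule_name(name):
    """Return True if the whole string looks like an ABNF rule name."""
    return RULE_NAME.fullmatch(name) is not None


for candidate in ["$yes", "$yes_or_no2", "yes", "$1yes"]:
    print(candidate, is_valid_rule_name(candidate))
```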
In GrXML format, a rule is defined within a rule element, which contains one or more item child elements (one per word or phrase). The first rule in the GrXML grammar above is:
<rule id="yes">
<item>yes</item>
</rule>
The rule expansion describes to the Engine what sequences of words will allow a rule to be matched. In the above rule, the expansion consists entirely of the word "yes," and thus the rule is matched if the word "yes" is spoken. An entire grammar is matched only if its root rule is matched.
The second rule is matched if the word "no" is detected.
In ABNF format, the third rule (yesorno) contains a pipe symbol (the | character), which is a logical "or" operator. So the third rule is matched if either the $yes rule or the $no rule is matched.
In GrXML format, the third rule (yesorno) contains a <one-of> element whose items hold <ruleref> references to the "yes" and "no" rules, so the rule is matched if either referenced rule is matched.
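The relationship between the root attribute, the <one-of> element, and the <ruleref> items can be inspected with any XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree on a trimmed copy of the grammar (the schema attributes are omitted for brevity); root_alternatives is a hypothetical helper written for this illustration, not something the Engine provides.

```python
import xml.etree.ElementTree as ET

# SRGS namespace, as declared on the <grammar> element.
NS = {"srgs": "http://www.w3.org/2001/06/grammar"}

GRXML = """<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0"
         root="yesorno" mode="voice">
  <rule id="yes"><item>yes</item></rule>
  <rule id="no"><item>no</item></rule>
  <rule id="yesorno">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
</grammar>"""


def root_alternatives(grxml_text):
    """Return the root rule id and the rule ids its <one-of> refers to."""
    grammar = ET.fromstring(grxml_text.encode("utf-8"))
    root_id = grammar.get("root")
    for rule in grammar.findall("srgs:rule", NS):
        if rule.get("id") == root_id:
            refs = rule.findall("srgs:one-of/srgs:item/srgs:ruleref", NS)
            # A local reference looks like "#yes"; drop the leading "#".
            return root_id, [ref.get("uri").lstrip("#") for ref in refs]
    return root_id, []


print(root_alternatives(GRXML))
```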
How the Speech Engine Uses a Grammar
When the ASR Engine begins decoding your audio, it starts at the root rule of the grammar (in this case the rule $yesorno). It then steps through all legal expansions. It moves into the rules $yes and $no, since it can match against either rule. Since the first words in the rules $yes and $no are "yes" and "no," the Engine knows that it is allowed to recognize either word.
If the Engine detects "yes" as a possibility, it then looks for the next word it can recognize in the $yes rule. Since there are no more words in the $yes rule, the rule is matched. And since the $yes rule is matched, the $yesorno root rule is matched, so the entire grammar is matched.
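The walk described above can be imitated in a few lines of Python. This is a toy model working on already-recognized text, not how the Engine decodes audio: GRAMMAR, end_positions, and grammar_matches are hypothetical names, and the grammar is hand-translated into a dictionary in which each rule maps to a list of alternatives.

```python
# Each rule maps to its alternatives; each alternative is a sequence of
# plain words or references to other rules (names starting with "$").
GRAMMAR = {
    "$yes": [["yes"]],
    "$no": [["no"]],
    "$yesorno": [["$yes"], ["$no"]],
}
ROOT = "$yesorno"


def end_positions(symbol, words, start):
    """Return every index where `symbol` can finish if it starts at `start`."""
    if not symbol.startswith("$"):
        # A plain word matches only if it is the next word heard.
        if start < len(words) and words[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {start}
        for part in alternative:
            # Advance through the alternative one part at a time.
            positions = {e for p in positions for e in end_positions(part, words, p)}
        ends |= positions
    return ends


def grammar_matches(utterance):
    """The grammar is matched only if the root rule consumes every word."""
    words = utterance.lower().split()
    return len(words) in end_positions(ROOT, words, 0)


for phrase in ["yes", "no", "maybe", "yes no"]:
    print(phrase, grammar_matches(phrase))
```

Starting from the root, the function steps into $yes and $no exactly as described: once a rule has no more words to consume, it is matched, and that result propagates back up to $yesorno.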
Building more complex rules via rule expansions is the next step in our SRGS tutorial.