Rule expansions are built by combining small phrases using a number of grammar operations. The operations are:
Operation: Alternatives
GrXML Example: <one-of> <item><ruleref uri="#A"/></item> <item><ruleref uri="#B"/></item> </one-of>
ABNF Example: $A | $B;
Description: match either rule A or rule B

Operation: Optional Expansion
GrXML Example: <ruleref uri="#A"/> <item repeat="0-1"><ruleref uri="#B"/></item>
ABNF Example: $A [$B];
Description: match rule A, optionally followed by rule B

Operation: Repetition
GrXML Example: <item repeat="7"><ruleref uri="#A"/></item>
ABNF Example: $A <7>;
Description: match rule A seven times
Any two rule references not separated by an operator are treated as a sequence (a logical "and"). So $rule = $A $B; is matched when rule $A is matched and is then followed by rule $B.
Rule Alternatives
As we saw in the previous yes/no grammar, the ASR can be told to accept any one of several possibilities by using the rule alternative operator. In ABNF this is the pipe symbol (the | character).
$toppings = pepperoni | sausage | green peppers;
In GrXML, this is done by using the <one-of> element to choose between one or more <item> elements.
<rule id="toppings">
<one-of>
<item>pepperoni</item>
<item>sausage</item>
<item>green peppers</item>
</one-of>
</rule>
The above rule is matched by one of the phrases "pepperoni," "sausage," or "green peppers."
Note that the rule alternative operator in ABNF binds more loosely than a sequence of words, so each alternative extends as far as it can. It collects "peppers" with "green" to form the single alternative "green peppers." If you wish to limit the scope of the rule alternative operator, you can use parentheses:
$pizza = (pepperoni | sausage) pizza;
This rule matches "pepperoni pizza" or "sausage pizza." Without the parentheses, it would match "pepperoni" or "sausage pizza." In GrXML, the <item> element acts similarly to the parentheses.
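The scoping difference can be tried out away from a speech engine. The sketch below is only an analogy: it models the two ABNF rules with Python regular expressions (the names unscoped, scoped, and matches are made up for illustration), and it is not how an ASR engine evaluates a grammar.

```python
import re

# Regex stand-ins for the two ABNF rules (an analogy, not an SRGS parser).
# $pizza = pepperoni | sausage pizza;    "pizza" belongs to the second alternative only
unscoped = re.compile(r"pepperoni|sausage pizza")
# $pizza = (pepperoni | sausage) pizza;  parentheses limit the alternatives
scoped = re.compile(r"(?:pepperoni|sausage) pizza")


def matches(pattern, phrase):
    """Return True when the whole phrase is accepted by the pattern."""
    return pattern.fullmatch(phrase) is not None


for phrase in ["pepperoni", "pepperoni pizza", "sausage pizza"]:
    print(phrase, "| unscoped:", matches(unscoped, phrase),
          "| scoped:", matches(scoped, phrase))
```

In this model, "pepperoni" is accepted only by the unscoped rule, and "pepperoni pizza" only by the scoped one.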
Optional Expansion
If you wish to make a portion of a rule expansion optional, use the optional operator. In ABNF, this is a pair of square brackets (the [ ] characters) around the optional portion:
$yes = yes [please];
This rule matches "yes" or "yes please."
In GrXML, you can simply add an attribute called repeat to <item> elements and set it equal to 0-1 to denote something is optional (if a thing repeats 0 times, it does not occur; repeating once means it occurs one time):
<rule id="yes">
yes <item repeat="0-1">please</item>
</rule>
Any of the ABNF operators may be wrapped inside each other, or used in sequence, to create more and more expressive sentences.
$yes = yes [please | thank you];
The same can be done with nested <item> and <one-of> elements in GrXML:
<rule id="yes">
yes
<item repeat="0-1">
<one-of>
<item>please</item>
<item>thank you</item>
</one-of>
</item>
</rule>
This rule matches "yes," "yes please," or "yes thank you."
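To check which phrases a nested optional expansion accepts, the rule can be modeled with a regular expression. This is a rough sketch under the assumption that a regex is a fair stand-in for the grammar; yes_rule and accepts are illustrative names only.

```python
import re

# Regex stand-in for: $yes = yes [please | thank you];
# The outer "(?: ...)?" plays the role of the square brackets,
# the inner "(?:a|b)" plays the role of the alternatives.
yes_rule = re.compile(r"yes(?: (?:please|thank you))?")


def accepts(phrase):
    """Return True when the whole phrase is accepted by the modeled rule."""
    return yes_rule.fullmatch(phrase) is not None


for phrase in ["yes", "yes please", "yes thank you", "yes thank", "please"]:
    print(repr(phrase), accepts(phrase))
```

The last two phrases are rejected: "thank you" is a single two-word alternative, and the leading "yes" is not optional.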
Repetition
If you wish to allow a portion of a rule expansion to be repeated a number of times, you can use the repeat operator via angle brackets (the < > characters) in ABNF. Inside the angle brackets, enter the number of times you would like to repeat the preceding portion of a rule. The repeat operator can be used to specify a fixed number of repetitions, or a range of repetitions.
ABNF Example
$digit = one | two | three | four | five | six | seven | eight | nine | zero;
$seven_digits = $digit <7>;
$seven_to_ten_digits = $digit <7-10>;
$one_or_more_digit = $digit <1->;
The $seven_digits rule allows any seven digit combination to be recognized. The $seven_to_ten_digits rule allows any seven to ten digit combination to be recognized. By specifying the range 1- (the equivalent of writing 1-n times), the $one_or_more_digit rule allows one or more digits to be recognized.
In GrXML, the repeat attribute on an <item> element is used (as it was used to indicate optionality). The same basic logic applies:
GrXML Example
<rule id="digit">
<one-of>
<item>one</item>
<item>two</item>
<item>three</item>
<item>four</item>
<item>five</item>
<item>six</item>
<item>seven</item>
<item>eight</item>
<item>nine</item>
<item>zero</item>
</one-of>
</rule>
<rule id="seven_digits">
<item repeat="7"><ruleref uri="#digit"/></item>
</rule>
<rule id="seven_to_ten_digits">
<item repeat="7-10"><ruleref uri="#digit"/></item>
</rule>
<rule id="one_or_more_digit">
<item repeat="1-"><ruleref uri="#digit"/></item>
</rule>
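The three repeat forms above (a fixed count, a closed range, and an open-ended range) can be sketched in Python to see which word counts each one accepts. The helper repeat_rule is hypothetical and models the rules with regular expressions; it only handles ranges whose lower bound is at least 1.

```python
import re

DIGITS = ["one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "zero"]


def repeat_rule(words, low, high=None):
    """Compile a regex accepting low..high space-separated words.

    high=None models an open-ended range such as <1->.
    """
    if low < 1:
        raise ValueError("this sketch only models ranges that start at 1 or more")
    word = "(?:" + "|".join(words) + ")"
    upper = "" if high is None else str(high - 1)
    # One word, then (low-1)..(high-1) further words, each preceded by a space.
    return re.compile(f"{word}(?: {word}){{{low - 1},{upper}}}")


seven_digits = repeat_rule(DIGITS, 7, 7)          # $digit <7>
seven_to_ten_digits = repeat_rule(DIGITS, 7, 10)  # $digit <7-10>
one_or_more_digit = repeat_rule(DIGITS, 1)        # $digit <1->


def accepts(rule, phrase):
    """Return True when the whole phrase is accepted by the modeled rule."""
    return rule.fullmatch(phrase) is not None


print(accepts(seven_digits, "five five five one two one two"))
print(accepts(seven_to_ten_digits, "five five five"))
print(accepts(one_or_more_digit, "nine"))
```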
The ABNF repeat operator is tightly binding; it only applies to whatever immediately precedes it. Use parentheses to control how much of a rule expansion it applies to.
ABNF Example
$oh_boy1 = oh boy <3>;
$oh_boy2 = (oh boy)<3>;
The rule $oh_boy1 matches "oh boy boy boy." $oh_boy2 matches "oh boy oh boy oh boy."
In GrXML this behavior is more obvious, since repeat only applies to the <item> it's attached to.
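The binding of the repeat operator can be checked with the same kind of regex analogy. The patterns below are hand-written stand-ins for $oh_boy1 and $oh_boy2, not output from any grammar compiler.

```python
import re

# $oh_boy1 = oh boy <3>;   the repeat applies to "boy" only
oh_boy1 = re.compile(r"oh(?: boy){3}")
# $oh_boy2 = (oh boy)<3>;  the repeat applies to the whole group
oh_boy2 = re.compile(r"oh boy(?: oh boy){2}")

for phrase in ["oh boy boy boy", "oh boy oh boy oh boy"]:
    print(phrase,
          "| oh_boy1:", oh_boy1.fullmatch(phrase) is not None,
          "| oh_boy2:", oh_boy2.fullmatch(phrase) is not None)
```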
You should be very careful when using the repeat operator, especially when allowing for an infinite number of repetitions in combination with rule references. It is possible to build recursive grammars that have very large numbers of valid parses, and when the Engine attempts to decode utterances against these kinds of grammars, the cost in time and memory can seriously disrupt your applications.
Unbound Repeat Operators
It is very important to understand the implications of using unbound (unlimited) repeat operators such as <0-> or <1->, where no upper bound on the number of repeats is specified, because they can cause serious problems within grammars.
Relatively simple grammars become significantly more complex than they appear once they use unbound repeat operators, especially when other rules or items follow the unbound repeat.
We strongly encourage users to always specify an upper bound value whenever using repeat operators. This upper bound value should be based on the maximum number of repeats that your application expects for the current operation.
Using unbound repeat operators is almost always a bad idea and is generally poor practice when writing grammars: it can cause very slow grammar compilation, significantly increased memory use, and problems during recognition.
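One way to get a feel for why items that follow an unbound repeat add complexity is to count how many ways an utterance can be divided among consecutive repeats. The sketch below assumes a hypothetical rule such as $numbers = $digit<1-> $digit<1->; (not taken from any real grammar) and counts the possible splits of a digit string. It illustrates ambiguity only; it says nothing about how a particular engine compiles or decodes grammars.

```python
from functools import lru_cache


def count_parses(total_words, repeat_groups):
    """Count the ways total_words words can be divided among
    repeat_groups consecutive <1-> repeats over the same vocabulary.

    Each repeat must consume at least one word.
    """
    @lru_cache(maxsize=None)
    def ways(words_left, groups_left):
        if groups_left == 0:
            # A parse is valid only if every word has been consumed.
            return 1 if words_left == 0 else 0
        # The current repeat may consume anywhere from 1 to words_left words.
        return sum(ways(words_left - taken, groups_left - 1)
                   for taken in range(1, words_left + 1))

    return ways(total_words, repeat_groups)


# A ten-digit utterance against 1, 2, 3, and 4 consecutive unbound repeats.
for groups in (1, 2, 3, 4):
    print(groups, "repeat(s):", count_parses(10, groups), "parse(s)")
```

For ten digits, the count grows from 1 parse with a single repeat to 9, 36, and 84 parses with two, three, and four consecutive repeats. An upper bound on each repeat reduces the number of splits that have to be considered.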
Once you are comfortable with rule expansions, move on to Rule References to continue the tutorial.