Improving Grammars

Write Better Grammars

(DEC 2007) — At LumenVox, we're fond of saying that grammars — the lists of words and phrases that determine what the Speech Engine can recognize at a given time — are the heart of any speech recognition application.

Grammars constrain what your callers can say, so it is important to be careful about which words are in a grammar. In our grammar white paper, we recommend keeping grammars small enough to provide "just enough" coverage for the majority of responses from users.

But writing good grammars goes beyond picking appropriate vocabularies. The Speech Recognition Grammar Specification (SRGS) provides powerful features that make writing grammars similar to writing in a computer programming language. This means that care must also be taken with the actual structure of your grammar file to ensure that your grammars function as intended, and that they are quickly parsed by the Speech Engine.

Using the following ABNF grammar, we'll explore some of the traps new grammar writers can fall into. If you're unfamiliar with writing ABNF grammars, you may want to read through our SRGS tutorial first. The grammar is designed to capture two-digit numbers:

root $TwoDigits;

$TwoDigits = {out=0} ([$TensDigit {out+=rules.latest()}] [$OnesDigit {out+=rules.latest()}] | $TeensDigit{out+=rules.latest()} | [$OnesDigit {out+=rules.latest()*10}] [$OnesDigit {out+=rules.latest()}]) {out=out.toString()};

$TensDigit = ten {out=10} | twenty {out=20} | thirty {out=30} | forty {out=40} | fifty {out=50} | sixty {out=60} | seventy {out=70} | eighty {out=80} | ninety {out=90};

$TeensDigit = eleven {out=11} | twelve {out=12} | thirteen {out=13} | fourteen {out=14} | fifteen {out=15} | sixteen {out=16} | seventeen {out=17} | eighteen {out=18} | nineteen {out=19};

$OnesDigit = zero {out=0} | one {out=1} | two {out=2} | three {out=3} | four {out=4} | five {out=5} | six {out=6} | seven {out=7} | eight {out=8} | nine {out=9};

Avoid Ambiguity

Any input to a grammar should have only one valid parse. (The Grammar Editor tool, included with the LumenVox Speech Tuner, can show how many parses a grammar returns for a given input.)

The more parses a grammar has, the longer it takes the Engine to decode an utterance against it. It also decreases accuracy. As the number of valid parses increases, decode time can increase dramatically.

The grammar above correctly handles inputs such as "two one" or "twenty one." But if a caller says just "one," the last alternative of the root rule alone allows two valid parses, because both of its $OnesDigit references are optional and either one can match. Each of these parses has a different interpretation: when the first $OnesDigit matches, the value is multiplied by 10 and the result is 10; when the second one matches, the result is 1.

This sort of ambiguity not only increases decode time while decreasing accuracy, it also makes it harder for your application to correctly handle results. You would probably not expect a caller saying "one" to return a result of "10," but that is precisely one thing this grammar allows for.
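One way to remove this particular ambiguity, sketched here under the assumption that a lone digit such as "one" should still be accepted, is to make both $OnesDigit references in the digit-pair alternative mandatory. The fragment below replaces only the root rule; the $TensDigit, $TeensDigit and $OnesDigit rules are assumed to be unchanged from the listing above.

```abnf
root $TwoDigits;

// Sketch only: the same three alternatives as the original root rule,
// but the digit-pair alternative now requires both digits, so a lone
// "one" can no longer be read as the tens digit.
$TwoDigits = {out=0} (
      [$TensDigit {out+=rules.latest()}] [$OnesDigit {out+=rules.latest()}]
    | $TeensDigit {out+=rules.latest()}
    | $OnesDigit {out+=rules.latest()*10} $OnesDigit {out+=rules.latest()}
) {out=out.toString()};
```

With this change, "one" can only match the first alternative, and "two one" can only match the last. The first alternative still accepts empty input and inputs such as "ten two," so this fixes the ambiguity described here without making the grammar strict.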

Keep Rules Compact

The larger and more complex rules are, the longer it takes to compile a grammar or decode against it. One good trick for keeping rules short is to combine alternatives that share common words, where possible. For instance, the following rule:

$name = James Anderson | Jim Anderson | Jimmy Anderson | James | Jim | Jimmy;

can be combined into:

$name = (James | Jim | Jimmy) [Anderson];

While it is a relatively small savings for one rule, across large grammars this sort of compactness can add up, decreasing load and decode times.

Prune Unwanted Parses

You would obviously not want to allow the example described above, where "one" has a valid parse that returns "ten." But even allowing "one" to be a valid parse is quite possibly a bad idea if all you want to capture are two-digit strings.

The grammar allows for other parses such as "twenty zero" (it returns with an interpretation of "20"), or "ten two" (returning with an interpretation of "12"). Even a null input is a valid parse ("" returns with an interpretation of "0").

Unwanted parses slow down decodes and reduce accuracy. It's pretty unlikely that a caller would ever say "twenty zero" or that a developer would want to allow for that sort of input. Accounting for these sorts of unlikely cases increases the probability that a caller behaving appropriately will be misrecognized — e.g. somebody who says "twenty two" might get mistaken for the unreasonable "twenty zero."
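As a rough sketch of what a pruned version might look like, the grammar below accepts only spoken two-digit numbers ("twenty," "twenty one," "thirteen") and digit pairs ("two one"). The rule names $Tens, $TenToNineteen, $NonZeroDigit and $Digit are made up for this illustration, and the header declarations follow the W3C SRGS and SISR specifications rather than anything shown in the original listing, so check them against what your Engine version expects.

```abnf
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0>;

root $TwoDigits;

// Nothing here is optional except the digit after a tens word, and the
// three alternatives start with different words, so any accepted input
// has exactly one parse.
$TwoDigits = {out=0} (
      $Tens {out+=rules.latest()} [$NonZeroDigit {out+=rules.latest()}]
    | $TenToNineteen {out+=rules.latest()}
    | $Digit {out+=rules.latest()*10} $Digit {out+=rules.latest()}
) {out=out.toString()};

// "ten" is grouped with the teens so it cannot take a trailing digit
// ("ten two" is rejected).
$TenToNineteen = ten {out=10} | eleven {out=11} | twelve {out=12} | thirteen {out=13} | fourteen {out=14} | fifteen {out=15} | sixteen {out=16} | seventeen {out=17} | eighteen {out=18} | nineteen {out=19};

$Tens = twenty {out=20} | thirty {out=30} | forty {out=40} | fifty {out=50} | sixty {out=60} | seventy {out=70} | eighty {out=80} | ninety {out=90};

// No "zero" here, so "twenty zero" is rejected.
$NonZeroDigit = one {out=1} | two {out=2} | three {out=3} | four {out=4} | five {out=5} | six {out=6} | seven {out=7} | eight {out=8} | nine {out=9};

// Used only for digit pairs, where "zero" is a legitimate digit.
$Digit = zero {out=0} | $NonZeroDigit {out=rules.latest()};
```

Under these rules a lone "one" and empty input are no longer accepted. Because the interpretation is built up as a number, a digit pair with a leading zero, such as "zero five," comes back as "5"; whether to allow that input at all is an application decision.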

Be Careful with Recursion

SRGS allows for recursive rules, that is, rules that contain references to themselves. Any time you work with recursion, you must be careful to avoid infinite loops. Since the LumenVox grammar parser parses from left to right, you should always avoid left-hand recursion, where the self-reference is the first thing the rule tries to match.

For instance, the following rule will match the word "foo" any number of times:

$rule = foo ($rule | $NULL);

If the input is "foo foo," the Engine parses the rule, expanding the reference $rule each time until it matches $NULL and terminates. On the other hand, if your rule is written:

$rule = ($rule | $NULL) foo;

the parser will get caught in an infinite loop. The first thing it will attempt to do is expand the $rule reference, only to expand it again, and again, ad infinitum.
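When a rule only needs to repeat something, the repeat operators defined in SRGS can often express that without any self-reference at all. The sketch below assumes your Engine version supports the standard ABNF repeat syntax; the rule names $foos and $fewFoos are made up for illustration.

```abnf
// Matches "foo" one or more times, like the right-recursive rule above,
// but with a repeat operator instead of a reference back to the rule.
$foos = foo <1->;

// A bounded repeat: between one and five occurrences of "foo".
$fewFoos = foo <1-5>;
```

A bounded repeat also fits the earlier advice about pruning: if callers will never say more than a handful of repetitions, there is little reason to let the grammar accept an unlimited number.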

© 2016 LumenVox, LLC. All rights reserved.