
Intro to Semantic Interpretation

Reference Number: AA-01068


When building an application using speech recognition, it is often not enough to know what the user said; we need the meaning of the utterance. One caller may ask for "Customer Support" while another asks for "Customer Service," but as far as a call router is concerned, these are the same thing.

Assigning meaning to raw text is a process called semantic interpretation.
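At its very simplest, semantic interpretation can be pictured as a lookup from recognized text to a meaning. The Python sketch below is purely illustrative: the MEANINGS table, the CUSTOMER_SERVICE label and the interpret() function are hypothetical names, not part of the LumenVox API. It shows two different phrasings collapsing to one value a call router could act on.

```python
# Hypothetical table: several phrasings map to a single meaning.
# Neither the labels nor interpret() come from the LumenVox API.
MEANINGS = {
    "customer support": "CUSTOMER_SERVICE",
    "customer service": "CUSTOMER_SERVICE",
}


def interpret(utterance):
    """Return the meaning of a recognized utterance, or None if it is unknown."""
    return MEANINGS.get(utterance.strip().lower())


print(interpret("Customer Support"))  # CUSTOMER_SERVICE
print(interpret("Customer Service"))  # CUSTOMER_SERVICE
```

A flat table like this breaks down quickly for open-ended input such as numbers, which is where grammar-driven interpretation comes in.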

Creating a grammar and examining the parse tree that the Engine generates from a user's speech is the first step toward semantic interpretation. But it is often not enough to just read off the values of the tree; extracting meaning usually requires significant post-processing of the tree.

As an example, here is an SRGS/ABNF grammar that matches spoken numbers from zero to nine hundred ninety-nine (it is by no means complete; for instance, it cannot recognize "two forty six" for 246):

If the Engine recognizes "two hundred twelve," the parse tree looks like this:

$small_number:
    $hundred:
        $base:
            "TWO"
        "HUNDRED"
    $tens:
        $teen:
            "TWELVE"

But if your application needs to determine whether a speaker said a number larger than 500, the parse tree alone is not enough; all you have is a structure of words. You need to write code that transforms the tree into the number 212.
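To make the point concrete, here is a minimal Python sketch, assuming a hypothetical representation of the parse tree above as nested (rule_name, children) tuples, with matched words as plain strings. This is not the Engine's actual result format. Walking the tree yields only the matched words, in order.

```python
# Hypothetical representation of the parse tree for "two hundred twelve":
# a rule node is a (rule_name, [children]) tuple; a matched word is a str.
TREE = ("$small_number", [
    ("$hundred", [
        ("$base", ["TWO"]),
        "HUNDRED",
    ]),
    ("$tens", [
        ("$teen", ["TWELVE"]),
    ]),
])


def leaf_words(node):
    """Return the matched words of a parse tree, left to right."""
    if isinstance(node, str):
        return [node]
    _rule_name, children = node
    words = []
    for child in children:
        words.extend(leaf_words(child))
    return words


print(leaf_words(TREE))  # ['TWO', 'HUNDRED', 'TWELVE']
```

Nothing in that list says the value is 212, or whether it exceeds 500; that knowledge has to come from code that understands the grammar's rules.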

The logic for this transformation is tied closely to the grammar's rules. For instance, within the $hundred rule, you have to know that there is an optional $base rule whose value must be multiplied by 100. But in the $twenty_to_ninetynine rule, the optional $base value has to be added to the total of the number you are building.
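The following Python sketch shows what such rule-aware post-processing might look like. It assumes the same hypothetical (rule_name, children) tree representation, and because the grammar text is not reproduced here, the word tables and the exact contents of each rule are assumptions based on the rule names mentioned above. Note how $hundred multiplies its optional $base while $twenty_to_ninetynine adds it.

```python
# Assumed word tables; the real grammar may list words differently.
BASE = {"ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4, "FIVE": 5,
        "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9}
TEENS = {"TEN": 10, "ELEVEN": 11, "TWELVE": 12, "THIRTEEN": 13,
         "FOURTEEN": 14, "FIFTEEN": 15, "SIXTEEN": 16, "SEVENTEEN": 17,
         "EIGHTEEN": 18, "NINETEEN": 19}
DECADES = {"TWENTY": 20, "THIRTY": 30, "FORTY": 40, "FIFTY": 50,
           "SIXTY": 60, "SEVENTY": 70, "EIGHTY": 80, "NINETY": 90}


def evaluate(node):
    """Turn a (rule_name, children) parse-tree node into an integer."""
    rule, children = node
    subrules = [c for c in children if not isinstance(c, str)]
    words = [c for c in children if isinstance(c, str)]
    if rule == "$base":
        return BASE[words[0]]
    if rule == "$teen":
        return TEENS[words[0]]
    if rule == "$hundred":
        # The optional $base is MULTIPLIED by 100; "hundred" alone means 100.
        multiplier = evaluate(subrules[0]) if subrules else 1
        return multiplier * 100
    if rule == "$twenty_to_ninetynine":
        # The optional $base is ADDED to the decade word's value.
        return DECADES[words[0]] + sum(evaluate(s) for s in subrules)
    # Container rules such as $small_number and $tens: add up their sub-rules.
    return sum(evaluate(s) for s in subrules)


two_hundred_twelve = ("$small_number", [
    ("$hundred", [("$base", ["TWO"]), "HUNDRED"]),
    ("$tens", [("$teen", ["TWELVE"])]),
])
three_hundred_forty_two = ("$small_number", [
    ("$hundred", [("$base", ["THREE"]), "HUNDRED"]),
    ("$tens", [("$twenty_to_ninetynine", ["FORTY", ("$base", ["TWO"])])]),
])

print(evaluate(two_hundred_twelve))        # 212
print(evaluate(three_hundred_forty_two))   # 342
print(evaluate(two_hundred_twelve) > 500)  # False
```

Every branch mirrors one grammar rule, so any change to the grammar needs a matching change in this code.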

Because a grammar's rules and the semantic interpretation process are so closely related, it can be convenient to put the semantic interpretation directly into the grammar.

Using a standard called Semantic Interpretation for Speech Recognition (SISR), SRGS grammars can contain logic that aids semantic interpretation. With SISR, you place "tags" into grammars. These tags contain ECMAScript code (also known as JavaScript) that is executed when a rule, or a portion of a rule, is matched.

The LumenVox Speech Engine supports the W3C's first approved recommendation for SISR, version 1.0 (adopted April 2007). You may refer to the complete specification for more details. The Engine also supports older drafts of the specification for backwards compatibility with older grammars.

The basic idea behind LumenVox's implementation of SISR is this:

Each tag contains snippets of ECMAScript code.
Each grammar rule can be thought of as a function that executes the ECMAScript code in its tags from left to right, and returns a value based on that executed code.
Any other rules that are referenced in a grammar rule are also executed left to right, and any tag that appears after a rule reference may use that rule's return value.
Grammar rules are only executed if the recognizer detects something to match the rule.
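The execution model described in the list above can be sketched in Python. This is only a model for illustration, not ECMAScript and not how the Engine is implemented: the item tuples, run_rule() and set_out() are hypothetical, and scope["out"] and scope["rules"] only loosely mirror the out and rules variables of SISR 1.0. Each matched rule runs its items left to right, a referenced rule runs to completion before later tags, and only rules that actually matched appear at all.

```python
# Each matched rule is a list of items, in spoken order:
#   ("word", text)               a matched word; carries no code
#   ("ref", rule_name, items)    a matched reference to another rule
#   ("tag", action)              a tag; action(scope) may read/write scope


def run_rule(items):
    """Execute a matched rule left to right and return its value."""
    scope = {"out": None, "rules": {}}
    for item in items:
        kind = item[0]
        if kind == "ref":
            _, rule_name, sub_items = item
            # The referenced rule runs first; tags that come after this
            # reference can read its value from scope["rules"].
            scope["rules"][rule_name] = run_rule(sub_items)
        elif kind == "tag":
            item[1](scope)
        # "word" items are skipped: they only record what was matched.
    return scope["out"]


def set_out(fn):
    """Build a tag action that stores fn(rules) as the rule's value."""
    def action(scope):
        scope["out"] = fn(scope["rules"])
    return action


# The matched structure for "two hundred twelve" (only matched rules appear).
base = [("word", "TWO"), ("tag", set_out(lambda rules: 2))]
hundred = [("ref", "base", base), ("word", "HUNDRED"),
           ("tag", set_out(lambda rules: rules["base"] * 100))]
teen = [("word", "TWELVE"), ("tag", set_out(lambda rules: 12))]
tens = [("ref", "teen", teen), ("tag", set_out(lambda rules: rules["teen"]))]
small_number = [("ref", "hundred", hundred), ("ref", "tens", tens),
                ("tag", set_out(lambda rules: rules["hundred"] + rules["tens"]))]

print(run_rule(small_number))  # 212
```

In an actual SISR 1.0 grammar, the multiplication step would live in a tag inside the $hundred rule, written in ECMAScript along the lines of "out = rules.base * 100;", so the grammar returns 212 directly and no separate tree-walking code is needed.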

To get started learning about SISR, see SISR Basics. For more information, see Rule Variables and SI Script by Example. Our documentation on Getting The Return Value describes how to access SISR results in your application.

If you are familiar with older versions of the SISR standard, see Converting From Older SISR.

Please note that, in conjunction with the STRICT_SISR_COMPLIANCE config setting, the LumenVox implementation of SISR 1.0 supports more tag formats than the official SISR 1.0 standard specifies. This is to support backwards compatibility with older versions of SISR. Be sure to check Tag Formats for a complete list of supported tag formats, and read SISR Basics for more information on the differences.