Getting Started

Developing speech applications in Asterisk with LumenVox is a straightforward process. Because the LumenVox-Asterisk integration is written and maintained by Digium, it uses the Asterisk generic speech API (res_speech.so).

When you load the generic speech module in Asterisk, it makes a number of speech applications and functions (provided by app_speech_utils) available via the dialplan and the Asterisk Gateway Interface. Once the LumenVox connector bridge is loaded (res_speech_lumenvox.so), Asterisk can translate between its generic speech API and LumenVox's own internal API, making the integration seamless.
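
For reference, these are the module files involved; a minimal modules.conf sketch is shown below. The module names are the standard ones shipped with Asterisk and the LumenVox connector, and with autoload enabled (the default) the explicit load lines are not strictly required.

    ; modules.conf sketch (assumes a standard Asterisk installation)
    [modules]
    autoload=yes                       ; with autoload, these modules load automatically if present
    load => res_speech.so              ; Asterisk generic speech API
    load => res_speech_lumenvox.so     ; LumenVox connector bridge
    load => app_speech_utils.so        ; Speech* dialplan applications and functions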

Full documentation for the speech API is available at Asterisk.org. View the reference for all of the Dialplan Applications that start with the word "Speech."

Basics of Asterisk Speech Recognition

Speech recognition on Asterisk consists of several steps (a complete dialplan sketch follows this list):

  1. Initialize a connection between Asterisk and the Speech Engine using SpeechCreate(). This creates a speech resource with which grammars and recognitions will be associated. For each such resource you initialize, you need one LumenVox license.
  2. Load a grammar, either using the SpeechLoadGrammar() application or by specifying the grammar to be preloaded in lumenvox.conf. SpeechLoadGrammar takes two parameters: a label and a path to the grammar. For instance, to load our built-in boolean grammar using the label yesno, you would use SpeechLoadGrammar(yesno|/opt/lumenvox/engine/Lang/BuiltinGrammars/ABNFBoolean.gram).
  3. Activate grammars as needed. Active grammars control what words the Engine will recognize. At any time, the Engine can only recognize words specified in the active grammars. To activate a grammar, call SpeechActivateGrammar() using the grammar's label as a parameter, e.g. SpeechActivateGrammar(yesno).
  4. Perform a recognition by calling SpeechBackground() or SpeechStart(). Both applications tell Asterisk to start listening for speech; SpeechBackground() additionally takes the name of a prompt to play while it listens and allows barge-in, much like the Background() application.
  5. Examine the results of a recognition using several variables: ${SPEECH_TEXT(n)} contains the text (or semantic interpretation, if applicable) of what the caller said, where n is the index of the result in case the Engine returns multiple results; the first result is number 0. ${SPEECH(results)} contains the number of results, and ${SPEECH_SCORE(n)} contains the confidence score for result n.
  6. Deactivate and unload grammars using SpeechDeactivateGrammar(label) and SpeechUnloadGrammar(label). You may then repeat the process, or destroy the speech resource using SpeechDestroy(). Destroying the resource frees up the license.
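
To tie the steps together, here is a minimal dialplan sketch (extensions.conf). The [speech-demo] context, the prompt names, and the no-result handling are illustrative assumptions rather than part of the LumenVox documentation; the grammar path is the built-in boolean grammar shown above, and depending on your Asterisk version the argument separator may be a comma rather than the pipe used here.

    ; Minimal yes/no recognition sketch; context, prompts, and error handling are illustrative
    [speech-demo]
    exten => s,1,Answer()
    exten => s,n,SpeechCreate()                           ; step 1: allocate a speech resource (uses one license)
    exten => s,n,SpeechLoadGrammar(yesno|/opt/lumenvox/engine/Lang/BuiltinGrammars/ABNFBoolean.gram)  ; step 2
    exten => s,n,SpeechActivateGrammar(yesno)             ; step 3: only active grammars are recognized
    exten => s,n,SpeechBackground(please-say-yes-or-no)   ; step 4: play a prompt and listen, with barge-in
    exten => s,n,GotoIf($["${SPEECH(results)}" = "0"]?noresult)
    exten => s,n,NoOp(Heard ${SPEECH_TEXT(0)} with score ${SPEECH_SCORE(0)})  ; step 5: best result is index 0
    exten => s,n,Goto(cleanup)
    exten => s,n(noresult),Playback(please-say-yes-or-no) ; hypothetical re-prompt on no result
    exten => s,n(cleanup),SpeechDeactivateGrammar(yesno)  ; step 6: deactivate and unload
    exten => s,n,SpeechUnloadGrammar(yesno)
    exten => s,n,SpeechDestroy()                          ; frees the license
    exten => s,n,Hangup()

In a real application you would typically check ${SPEECH_SCORE(0)} against a confidence threshold and re-prompt the caller before cleaning up.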

More Resources
