Getting Started

Developing speech applications in Asterisk with LumenVox is a straightforward process. Because the LumenVox-Asterisk integration is written and maintained by Digium, it uses the Asterisk generic speech API (res_speech.so).

When you load the generic speech module in Asterisk, it makes a number of speech applications and functions (provided by app_speech_utils) available via the dialplan and the Asterisk Gateway Interface. Once the LumenVox connector bridge is loaded (res_speech_lumenvox.so), Asterisk can translate between its generic speech API and LumenVox's own internal API, making the integration seamless.
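
For reference, these are the module files involved; a minimal modules.conf sketch is shown below. The module names are the standard ones shipped with Asterisk and the LumenVox connector, and with autoload enabled (the default) the explicit load lines are not strictly required.

    ; modules.conf sketch (assumes a standard Asterisk installation)
    [modules]
    autoload=yes                       ; with autoload, these modules load automatically if present
    load => res_speech.so              ; Asterisk generic speech API
    load => res_speech_lumenvox.so     ; LumenVox connector bridge
    load => app_speech_utils.so        ; Speech* dialplan applications and functions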

Full documentation for the speech API is available at Asterisk.org. View the reference for all of the Dialplan Applications that start with the word "Speech."

Basics of Asterisk Speech Recognition

Speech recognition on Asterisk consists of several steps (a complete dialplan sketch follows this list):

  1. Initialize a connection between Asterisk and the Speech Engine using SpeechCreate(). This creates a speech resource with which grammars and recognitions will be associated. For each such resource you initialize, you need one LumenVox license.
  2. Load a grammar, either using the SpeechLoadGrammar() application or by specifying the grammar to be preloaded in lumenvox.conf. SpeechLoadGrammar takes two parameters: a label and a path to the grammar. For instance, to load our built-in boolean grammar using the label yesno, you would use SpeechLoadGrammar(yesno|/opt/lumenvox/engine/Lang/BuiltinGrammars/ABNFBoolean.gram).
  3. Activate grammars as needed. Active grammars control what words the Engine will recognize. At any time, the Engine can only recognize words specified in the active grammars. To activate a grammar, call SpeechActivateGrammar() using the grammar's label as a parameter, e.g. SpeechActivateGrammar(yesno).
  4. Perform a recognition by calling SpeechBackground() or SpeechStart(). Both applications tell Asterisk to start listening for speech; SpeechBackground() additionally takes the name of a prompt to play while it listens and allows barge-in, much like the Background() application.
  5. Examine the results of a recognition using several variables: ${SPEECH_TEXT(n)} contains the text (or semantic interpretation, if applicable) of what the caller said, where n is the index of the result in case the Engine returns multiple results; the first result is number 0. ${SPEECH(results)} contains the number of results, and ${SPEECH_SCORE(n)} contains the confidence score for result n.
  6. Deactivate and unload grammars using SpeechDeactivateGrammar(label) and SpeechUnloadGrammar(label). You may then repeat the process, or destroy the speech resource using SpeechDestroy(). Destroying the resource frees up the license.
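
To tie the steps together, here is a minimal dialplan sketch (extensions.conf). The [speech-demo] context, the prompt names, and the no-result handling are illustrative assumptions rather than part of the LumenVox documentation; the grammar path is the built-in boolean grammar shown above, and depending on your Asterisk version the argument separator may be a comma rather than the pipe used here.

    ; Minimal yes/no recognition sketch; context, prompts, and error handling are illustrative
    [speech-demo]
    exten => s,1,Answer()
    exten => s,n,SpeechCreate()                           ; step 1: allocate a speech resource (uses one license)
    exten => s,n,SpeechLoadGrammar(yesno|/opt/lumenvox/engine/Lang/BuiltinGrammars/ABNFBoolean.gram)  ; step 2
    exten => s,n,SpeechActivateGrammar(yesno)             ; step 3: only active grammars are recognized
    exten => s,n,SpeechBackground(please-say-yes-or-no)   ; step 4: play a prompt and listen, with barge-in
    exten => s,n,GotoIf($["${SPEECH(results)}" = "0"]?noresult)
    exten => s,n,NoOp(Heard ${SPEECH_TEXT(0)} with score ${SPEECH_SCORE(0)})  ; step 5: best result is index 0
    exten => s,n,Goto(cleanup)
    exten => s,n(noresult),Playback(please-say-yes-or-no) ; hypothetical re-prompt on no result
    exten => s,n(cleanup),SpeechDeactivateGrammar(yesno)  ; step 6: deactivate and unload
    exten => s,n,SpeechUnloadGrammar(yesno)
    exten => s,n,SpeechDestroy()                          ; frees the license
    exten => s,n,Hangup()

In a real application you would typically check ${SPEECH_SCORE(0)} against a confidence threshold and re-prompt the caller before cleaning up.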

More Resources
