
MRCPRecog()

Reference Number: AA-01801

MRCPRecog() is a dialplan application provided by the res_unimrcp.so module (see Developing Speech Applications on Asterisk for more information) that performs basic automatic speech recognition (ASR). It can play an audio file, let the caller interrupt it (barge-in), and return the result of what the caller said.

Application

     
  • MRCPRecog(grammar,options)

Parameters

grammar

The grammar to use for recognition. Grammars can be specified inline as text/XML or by reference to an external file or URI. To specify multiple grammars, surround them in quotes and separate them with commas (or with the delimiter set by the gd option), e.g. "mygrammar1,mygrammar2".

The builtin:grammar/gramname syntax is allowed for built-in grammars.

See our documentation on writing grammars if you have no experience building grammars.
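When grammar lists are generated by a script rather than typed by hand, the quoting rule above is easy to get wrong. The following is a minimal Python sketch, assuming you build dialplan lines programmatically; `grammar_argument` is a hypothetical helper, not part of res_unimrcp. It quotes multiple grammars and joins them with the delimiter (a comma unless the gd option changes it).

```python
def grammar_argument(grammars, delimiter=","):
    """Format one or more grammar references as a single MRCPRecog() grammar argument.

    Hypothetical helper: a single grammar is returned unchanged; several are
    joined with the delimiter and wrapped in double quotes, as the text requires.
    """
    if not grammars:
        raise ValueError("at least one grammar is required")
    for g in grammars:
        # A reference containing the delimiter would be split into two grammars.
        if delimiter in g:
            raise ValueError(f"grammar {g!r} contains the delimiter {delimiter!r}")
    if len(grammars) == 1:
        return grammars[0]
    return '"' + delimiter.join(grammars) + '"'


if __name__ == "__main__":
    # Mirrors the voice + DTMF example at the end of this article.
    print(grammar_argument([
        "http://myServer/myGrammar-voicemode.grxml",
        "http://myServer/myGrammar-dtmfmode.grxml",
    ]))
```

The resulting string goes in the first argument position, e.g. MRCPRecog(<result>,p=default&f=beep).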

options

Options control details of the recognition. Valid options are:

  • p - Profile to use in mrcp.conf
  • i - Digits that may interrupt recognition. Set this to "none" to let LumenVox process the DTMF using a DTMF grammar. If it is set to "any" or to specific digits, a matching keypress interrupts recognition and the digit is returned to the dialplan.
  • f - Filename to play while recognition occurs (if empty or not specified, no file is played)
  • t - The recognition timeout (in milliseconds). This is the total amount of time a caller has to speak.
  • b - Barge-in value (no barge-in=0, ASR engine barge-in=1, Asterisk barge-in=2). LumenVox strongly recommends allowing the ASR to perform barge-in instead of Asterisk.
  • gd - The grammar delimiter. Defaults to a comma.
  • ct - The confidence threshold (0.0 - 1.0). If a recognition result has a confidence score below this value, it will be returned as "no match." Defaults to 0.5.
  • sl - The barge-in sensitivity level (0.0 - 1.0). The higher this number, the easier it is to barge-in. Defaults to 0.5.
  • sva - Speed vs. accuracy, set on a scale of 0.0 - 1.0. The lower this number, the faster (and less accurate) recognitions will be. Defaults to 0.5.
  • nb - N-best list length. Defaults to 1; increase this value if you wish to get more answers back from the recognizer.
  • nit - No input timeout (in milliseconds). This is the amount of time the caller has to start speaking before the recognizer returns a no-input result.
  • sct - Speech Complete Timeout. This is the amount of time, in milliseconds, LumenVox must detect silence after a user stops speaking before the recognizer begins processing the utterance. Set this lower for single word utterances and higher for longer utterances. In most cases, a value of 800 is correct.
  • dit - DTMF interdigit timeout (in milliseconds)
  • dtt - DTMF terminate timeout (in milliseconds)
  • dttc - DTMF terminate characters
  • sw - Save Waveform (true/false)
  • nac - new audio channel (true/false)
  • spl - speech language (en-US, en-GB, etc.). If the grammar declares a language (as it should), this option is ignored.
  • cdb - clear DTMF buffer (true/false)
  • mt - media type
  • iwu - input waveform URI (only applies to MRCPv2). Not supported by LumenVox.
  • sint - Speech Incomplete Timeout. Not supported by LumenVox.
  • rm - Recognition Mode. Not supported by LumenVox.
  • hmaxd - hotword max duration. Not supported by LumenVox.
  • hmind - hotword min duration. Not supported by LumenVox.
  • enm - early no match (true/false). Not supported by LumenVox.
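Because a mistyped option can be easy to miss, it can help to sanity-check options before they reach the dialplan. The following Python sketch encodes only what the list above states: the option names, the 0.0 - 1.0 range of ct, sl, and sva, the three barge-in values, the true/false options, and the options LumenVox does not support. `check_options` is a hypothetical helper. It does not reproduce how res_unimrcp itself parses options.

```python
KNOWN_OPTIONS = {
    "p", "i", "f", "t", "b", "gd", "ct", "sl", "sva", "nb", "nit", "sct",
    "dit", "dtt", "dttc", "sw", "nac", "spl", "cdb", "mt",
    "iwu", "sint", "rm", "hmaxd", "hmind", "enm",
}
# Documented above as "Not supported by LumenVox".
UNSUPPORTED_BY_LUMENVOX = {"iwu", "sint", "rm", "hmaxd", "hmind", "enm"}
# Documented above as taking a value between 0.0 and 1.0.
UNIT_RANGE = {"ct", "sl", "sva"}
# Documented above as true/false.
BOOLEAN = {"sw", "nac", "cdb", "enm"}


def _in_unit_range(value):
    try:
        return 0.0 <= float(value) <= 1.0
    except ValueError:
        return False  # not a number at all


def check_options(options):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for key, value in options.items():
        if key not in KNOWN_OPTIONS:
            problems.append(f"{key}: unknown option")
            continue
        if key in UNSUPPORTED_BY_LUMENVOX:
            problems.append(f"{key}: not supported by LumenVox")
        if key in UNIT_RANGE and not _in_unit_range(value):
            problems.append(f"{key}: must be between 0.0 and 1.0")
        if key == "b" and str(value) not in {"0", "1", "2"}:
            problems.append("b: must be 0, 1 or 2")
        if key in BOOLEAN and str(value).lower() not in {"true", "false"}:
            problems.append(f"{key}: must be true or false")
    return problems


if __name__ == "__main__":
    print(check_options({"p": "default", "ct": "1.5", "rm": "normal"}))
```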

You are not required to supply any options. To supply several, join them with an ampersand, e.g. f=sayHelloWorld&t=5000
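If options are assembled in code, the following Python sketch shows the key=value&key=value form described above. `build_options` is a hypothetical helper. Because the article does not describe an escaping rule for option values, the sketch rejects values containing an ampersand (which would start a new option) or a comma (which would end the dialplan argument) instead of guessing one.

```python
def build_options(options):
    """Join a dict of MRCPRecog() options into the key=value&key=value form."""
    parts = []
    for key, value in options.items():
        # The options list writes booleans as true/false; Python would print True/False.
        if isinstance(value, bool):
            value = "true" if value else "false"
        text = str(value)
        # '&' separates options and ',' separates dialplan arguments, so refuse both.
        if any(ch in text for ch in "&,"):
            raise ValueError(f"option {key!r} has a value containing '&' or ','")
        parts.append(f"{key}={text}")
    return "&".join(parts)


if __name__ == "__main__":
    # Same options as the "lots of results" example in this article.
    print(build_options({"p": "default", "nb": 5, "f": "beep", "ct": 0.1}))
```

Python dicts keep insertion order, so options appear in the order they were written.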

Return Values

RECOGSTATUS

The channel variable ${RECOGSTATUS} is set to "OK" if the recognition started, otherwise it will be set to "ERROR".

RECOG_RESULT

The channel variable ${RECOG_RESULT} stores the result of the recognition when recognition succeeds. This is an NLSML-formatted XML string that contains the speech input, the confidence score, and the semantic interpretation from the recognizer.
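To act on ${RECOG_RESULT} outside the dialplan (for example, after passing it to an external script), the NLSML has to be parsed. The following Python sketch uses only the standard library. It assumes each result is an `interpretation` element whose `confidence` attribute, `input` child (what was said), and `instance` child (the semantic interpretation) sit directly underneath it, and it ignores XML namespaces so that namespaced MRCPv2-style results also parse. The sample string is made up for illustration and is not real LumenVox output. Confidence is converted to a float as-is, so check which scale your server reports.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; real results carry more attributes.
SAMPLE = """<?xml version="1.0"?>
<result>
  <interpretation grammar="builtin:grammar/boolean" confidence="0.87">
    <instance>true</instance>
    <input mode="speech">yes</input>
  </interpretation>
</result>"""


def _local(tag):
    # Drop any "{namespace}" prefix so namespaced and plain NLSML look the same.
    return tag.rsplit("}", 1)[-1]


def parse_nlsml(result):
    """Return (input, instance, confidence) tuples in document order (n-best order)."""
    root = ET.fromstring(result)
    hits = []
    for interp in root.iter():
        if _local(interp.tag) != "interpretation":
            continue
        spoken, meaning = None, None
        for child in interp:
            name = _local(child.tag)
            if name == "input":
                spoken = "".join(child.itertext()).strip()
            elif name == "instance":
                meaning = "".join(child.itertext()).strip()
        conf = interp.get("confidence")
        hits.append((spoken, meaning, float(conf) if conf is not None else None))
    return hits


if __name__ == "__main__":
    print(parse_nlsml(SAMPLE))
```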

Remarks

Because dialplan applications cannot take more than 1024 characters as arguments, large grammars should be specified by external reference (see the examples below).

If you supply XML grammars inline, be sure to escape all quotation marks and commas with backslashes.
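The following Python sketch applies the two remarks above to an inline grammar: it backslash-escapes quotation marks and commas, then refuses the result if the full argument string exceeds 1024 characters. Both helper names are hypothetical. As an additional assumption, the sketch collapses whitespace and newlines so the grammar fits on one dialplan line, and it counts the escaped text against the limit as a conservative choice.

```python
MAX_ARG_CHARS = 1024  # argument limit for dialplan applications, per the remark above


def escape_inline_grammar(xml_text):
    """Backslash-escape every double quote and comma in an inline grammar."""
    return xml_text.replace('"', '\\"').replace(",", "\\,")


def inline_grammar_argument(xml_text, options=""):
    """Build 'grammar[,options]' for MRCPRecog() from an inline XML grammar."""
    # Assumption: collapsing whitespace runs to single spaces is acceptable
    # for the grammar, and lets it fit on one dialplan line.
    one_line = " ".join(xml_text.split())
    args = escape_inline_grammar(one_line)
    if options:
        args += "," + options
    # Conservative check: count the escaped text, since that is what the line holds.
    if len(args) > MAX_ARG_CHARS:
        raise ValueError(
            f"{len(args)} characters exceeds {MAX_ARG_CHARS}; "
            "use an external grammar reference instead"
        )
    return args


if __name__ == "__main__":
    grammar = '<rule id="yn">yes, please</rule>'
    print(inline_grammar_argument(grammar, "p=default&f=beep"))
```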

Parameters specified in a grammar, such as language, take precedence over any options set when invoking the MRCPRecog() application. We recommend using grammars for this kind of control.
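For the language case, the precedence rule can be modeled in a few lines. The following Python sketch assumes an inline SRGS XML grammar that may declare its language with the standard `xml:lang` attribute on the root `grammar` element. `effective_language` is a hypothetical helper that models the documented behavior and is not the recognizer's own logic. It returns the grammar's language if one is declared and otherwise falls back to the spl option.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_language(grammar_xml, spl=None):
    """Return the language the grammar declares, else the spl option value (or None)."""
    root = ET.fromstring(grammar_xml)
    return root.get(XML_LANG) or spl


if __name__ == "__main__":
    g = ('<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" '
         'xml:lang="en-US" root="r"><rule id="r">yes</rule></grammar>')
    print(effective_language(g, spl="es-MX"))  # the grammar's en-US wins over spl
```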

Example Uses

Say Yes or No (builtin grammar)

exten => 1,1,MRCPRecog(builtin:grammar/boolean,p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Say Yes or No (builtin grammar, specify Mexican Spanish language)

exten => 1,1,MRCPRecog(builtin:grammar/boolean,p=default&f=beep&spl=es-MX)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Use External Grammar (http)

exten => 1,1,MRCPRecog(http://myServer/myGrammar.grxml,p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Set N-Best & Confidence Threshold (lots of results)

exten => 1,1,MRCPRecog(http://myServer/myGrammar.grxml,p=default&nb=5&f=beep&ct=0.1)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Voice and DTMF grammars at the same time

exten => 1,1,MRCPRecog("http://myServer/myGrammar-voicemode.grxml,http://myServer/myGrammar-dtmfmode.grxml",p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})