
SynthAndRecog()

Reference Number: AA-01802

SynthAndRecog() is a dialplan application provided by the res_unimrcp.so module (see Developing Speech Applications on Asterisk for more information) that performs basic automatic speech recognition (ASR) while also playing out synthesized audio (TTS). Callers can interrupt (barge-in) the synthesized audio, and the application returns the result of what a caller said.

Application

     
  • SynthAndRecog(text,grammar,options)

Parameters

text

Text for the TTS to read to the caller. Valid options are plaintext specified inline, SSML specified inline, or a path/URI to an SSML document. See our SSML developer's guide for more information about working with SSML.

grammar

The grammar that should be used for the recognition. Grammars can be specified as text/XML inline or by using a reference to an external file/URI. Multiple grammars can be specified by surrounding them in quotes and separating them with commas, e.g. "mygrammar1,mygrammar2".

The builtin:grammar/gramname syntax is allowed for built-in grammars.

See our documentation on writing grammars if you have no experience building grammars.
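As a minimal sketch of the multiple-grammar syntax described above, the extension below loads two external grammars in one call. The context name, prompt text, server name, and grammar file names are hypothetical placeholders, not files shipped with the product.

```
[synth-and-recog-multi]
exten => s,1,Answer()
; Two grammars, wrapped in quotes and separated by the default comma delimiter
exten => s,n,SynthAndRecog(Say a command,"http://myServer/grammarOne.grxml,http://myServer/grammarTwo.grxml")
exten => s,n,Verbose(Result is: ${RECOG_RESULT})
```

The quotes keep the comma between the two grammars from being read as the separator between the grammar and options arguments.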

options

Options control details about the synthesis and recognition. Valid options are:

  • p - Profile to use in mrcp.conf
  • i - Digits that may interrupt recognition. Set this to "none" to let LumenVox process DTMF using a DTMF grammar. If it is set to "any" or to specific digits, a matching key press interrupts recognition and the digit is returned to the dialplan.
  • t - The recognition timeout (in milliseconds). This is the total amount of time a caller has to speak.
  • b - Barge-in value (no barge-in=0, ASR engine barge-in=1, Asterisk barge-in=2). LumenVox strongly recommends allowing the ASR to perform barge-in instead of Asterisk.
  • gd - The grammar delimiter. Defaults to a comma.
  • ct - The confidence threshold (0.0 - 1.0). If a recognition result has a confidence score below this value, it will be returned as "no match." Defaults to 0.5.
  • sl - The barge-in sensitivity level (0.0 - 1.0). The higher this number, the easier it is to barge-in. Defaults to 0.5.
  • sva - Speed vs. accuracy, set on a scale of 0.0 - 1.0. The higher this number, the faster (and less accurate) recognitions will be. Defaults to 0.5.
  • nb - N-best list length. Defaults to 1; increase this value if you wish to get more answers back from the recognizer.
  • nit - No input timeout. This is the amount of time the caller has to start speaking before the recognizer returns a no-input result.
  • sct - Speech Complete Timeout. This is the amount of time, in milliseconds, LumenVox must detect silence after a user stops speaking before the recognizer begins processing the utterance. Set this lower for single word utterances and higher for longer utterances. In most cases, a value of 800 is correct.
  • dit - DTMF interdigit timeout
  • dtt - DTMF terminate timeout
  • dttc - DTMF terminate characters
  • pv - Prosody volume (silent/x-soft/soft/medium/loud/x-loud/default)
  • pr - Prosody rate (x-slow/slow/medium/fast/x-fast/default)
  • vn - Voice name to use (e.g. "Lindsey", "Chris", etc.)
  • vg - Voice gender to use (e.g. "male", "female")
  • sw - Save Waveform (true/false)
  • nac - new audio channel (true/false)
  • spl - speech language (en-US/en-GB/etc.). If a language is declared in a grammar (it should be) this will be ignored.
  • cdb - clear DTMF buffer (true/false)
  • mt - media type
  • iwu - input waveform URI (only applies to MRCPv2). Not supported by LumenVox.
  • sint - Speech Incomplete Timeout. Not supported by LumenVox.
  • rm - Recognition Mode. Not supported by LumenVox.
  • hmaxd - hotword max duration. Not supported by LumenVox.
  • hmind - hotword min duration. Not supported by LumenVox.
  • enm - early no match (true/false). Not supported by LumenVox.
  • vv - voice variant. Not supported by LumenVox.

You are not required to supply any options. Multiple options can be provided by joining options with an ampersand, e.g. vn=Chris&t=5000
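For illustration, the sketch below combines several of the options listed above into one call: engine barge-in (b=1), a 5-second recognition timeout (t=5000), a confidence threshold of 0.6 (ct=0.6), and a two-entry N-best list (nb=2). The context name and the chosen values are assumptions made for this example, not recommended settings.

```
[synth-and-recog-options]
exten => s,1,Answer()
; Options are joined with ampersands and passed as the third argument
exten => s,n,SynthAndRecog(Please say yes or no,builtin:grammar/boolean,b=1&t=5000&ct=0.6&nb=2)
exten => s,n,Verbose(Result is: ${RECOG_RESULT})
```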

Return Values

RECOGSTATUS

The channel variable ${RECOGSTATUS} is set to "OK" if the recognition started, otherwise it will be set to "ERROR".

RECOG_COMPLETION_CAUSE

The channel variable ${RECOG_COMPLETION_CAUSE} indicates how the recognition completed. The possible values are "000" for a successful match, "001" for no match, and "002" for no input.

RECOG_RESULT

The channel variable ${RECOG_RESULT} stores the result of the recognition, assuming there was a successful recognition. Note that this is an NLSML-formatted XML string that will contain the speech input, the confidence score, and the semantic interpretation from the recognizer.
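One way to act on these variables is to branch on them with GotoIf, as in the sketch below. The context name and the priority labels (matched, failed) are hypothetical, and the handling shown (logging and hanging up) is only a placeholder for real call flow.

```
[synth-and-recog-check]
exten => s,1,Answer()
exten => s,n,SynthAndRecog(Say yes or no,builtin:grammar/boolean,t=5000)
; RECOGSTATUS is "ERROR" when the recognition could not be started
exten => s,n,GotoIf($["${RECOGSTATUS}" != "OK"]?failed)
; "000" is a successful match. "001" and "002" fall through to the next line
exten => s,n,GotoIf($["${RECOG_COMPLETION_CAUSE}" = "000"]?matched)
exten => s,n,Verbose(No match or no input: cause ${RECOG_COMPLETION_CAUSE})
exten => s,n,Hangup()
exten => s,n(matched),Verbose(Result is: ${RECOG_RESULT})
exten => s,n,Hangup()
exten => s,n(failed),Verbose(Recognition did not start)
exten => s,n,Hangup()
```

Checking ${RECOGSTATUS} first avoids reading a completion cause or result that was never set.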

Remarks

Because dialplan applications cannot take more than 1024 characters as arguments, any large grammars and/or SSML should be specified via external reference (see below for examples).

If you supply SSML or grammars inline, be sure to escape all quotation marks and commas with backslashes.
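As a sketch of that escaping rule, the extension below passes a short SSML document inline. Every quotation mark and the comma inside the prompt are preceded by a backslash. The context name and prompt wording are made up for this example, and the SSML is kept short so the whole argument list stays under the length limit.

```
[synth-and-recog-inline]
exten => s,1,Answer()
; Inline SSML: quotation marks and commas are escaped with backslashes
exten => s,n,SynthAndRecog(<?xml version=\"1.0\"?><speak version=\"1.0\" xml:lang=\"en-US\" xmlns=\"http://www.w3.org/2001/10/synthesis\">Please say yes\, or say no.</speak>,builtin:grammar/boolean)
exten => s,n,Verbose(Result is: ${RECOG_RESULT})
```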

Parameters such as language specified in a grammar will take precedence over any options set when invoking the SynthAndRecog() application. Our recommendation is to use grammars or SSML for this kind of control.

Example Uses

Say Yes or No (builtin grammar)

[synth-and-recog]
exten => s,1,Answer()
exten => s,n,SynthAndRecog(Say yes or no,builtin:grammar/boolean,&f=beep)
exten => s,n,Verbose(Status is: ${RECOGSTATUS} completion cause is:  ${RECOG_COMPLETION_CAUSE} and result is: ${RECOG_RESULT})

Use External Grammars and SSML (http)

[synth-and-recog-ext]
exten => s,1,Answer()
exten => s,n,SynthAndRecog(http://myServer/mySSML.ssml,http://myServer/myGrammar.grxml,&f=beep)
exten => s,n,Verbose(Status is: ${RECOGSTATUS} completion cause is:  ${RECOG_COMPLETION_CAUSE} and result is: ${RECOG_RESULT})