API Integration with CPA

Reference Number: AA-02274

LumenVox CPA functionality was designed to be compatible with our generic LumenVox C/C++ APIs, so if you have previously built an application using one of our APIs, it should be relatively simple to add support for CPA (if you have never used our API, please refer to our Introduction to ASR article).

Important: 

This article is intended only for voice platform developers who are integrating their platform with the LumenVox C or C++ API. All others should refer to the more commonly used Using CPA on a Voice Platform article.


The following steps are needed when integrating and using CPA in a C / C++ API-based application: 

Create session

First, initialize your session as normal using LV_SRE_CreateClient (C API) or CreateClient (C++ API).

Configure Parameters

Set any configuration parameters to override the defaults, if needed, as you normally would for any other LumenVox API application. 

The parameter STREAM_PARM_AUTO_DECODE should always be set to 0 (disabled) when using CPA or AMD. Your application should call the appropriate "Decode" function to obtain results when ready, as described below.


Loading Grammars

Before loading any grammars, you must first register them with the client using LV_SRE_RegisterGrammarForPendingStream (C API) or RegisterGrammarForPendingStream (C++ API).

This is done before listening to the stream. This does not actually load the grammars, but instead tells the speech port to expect to see these grammars get loaded (typically very shortly afterwards). This has a number of benefits (discussed in the help pages for the RegisterGrammar functions), but most importantly, it tells the LumenVox code not to assign a voice activity detection (VAD) mode until after all of the grammars have had their meta tags evaluated, ensuring that the stream is processed using the correct voice activity mode. 

Assign the Stream State Callback

Next, set the stream state change callback handler using either LV_SRE_StreamSetStateChangeCallBack (C API) or StreamSetStateChangeCallBack (C++ API).

Create a Callback Handler

Next, you should create a handler to process callbacks specifying the current stream state of the speech port. The following stream states can be received:

STREAM_STATUS_NOT_READY 

This is the state the port is in until StreamStart or StreamStartListening is called. No special handling is required by the handler for this state.

STREAM_STATUS_READY 

This is the state the port is in once StreamStart or StreamStartListening is called. No special handling is required by the handler for this state.

STREAM_STATUS_BARGE_IN  

This state is set when the port detects that human speech has begun during callee prediction mode or ASR mode. In tone detection mode, a synthetic STREAM_STATUS_BARGE_IN state is set just before the STREAM_STATUS_END_SPEECH state, so that the implementation stays compatible with existing API integrations that always expect a barge-in before end of speech. Receiving this state callback can be used as a signal to stop playing an active TTS prompt.

STREAM_STATUS_END_SPEECH

In callee prediction or tone detection mode, this state callback is called once an event is detected. The LV_SRE_Decode function then needs to be called, followed by either WaitForDecode or WaitForEngineToIdle to wait for the results to be ready.

These API calls should be made outside the callback function, in a separate thread. Doing a significant amount of processing within the callback handler blocks any other calls to the streaming API and may cause undesirable behavior.

STREAM_STATUS_STOPPED 

This state is received if StreamStop is called. For CPA, no special handling is required in this scenario.

STREAM_STATUS_BARGE_IN_TIMEOUT

This state may be received in tone detection mode if no tone is detected by the time the timeout is hit.

This state is not received in callee prediction mode. If no speech is heard in callee prediction mode, the callback is called with the STREAM_STATUS_END_SPEECH state and the result from the decode is “UNKNOWN SILENCE”.

For this event to be logged in the callsre file as a no-input event, and to allow future debugging via the SpeechTuner, StreamCancel needs to be called, followed by AddEvent(EVENT_NOINPUT). These calls should be made in a different thread from the callback function.

STREAM_STATUS_END_SPEECH_TIMEOUT 

This state is received when the maximum length of speech data allowed after barge-in has been reached. It is used only in ASR and will never be received in Call Progress Analysis.
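The handling described for each state can be summarized as a small dispatch function. The following is a minimal sketch, not LumenVox code: the SketchState and Action enums and the actionFor function are hypothetical names introduced here for illustration, and the enum values are placeholders rather than the real STREAM_STATUS_* constants from the LumenVox headers. It only decides what should happen; the actual API calls are assumed to run on a separate thread.

```cpp
#include <cassert>

// Placeholder states mirroring the STREAM_STATUS_* names in this article.
// The numeric values are NOT taken from the LumenVox headers.
enum class SketchState {
    NotReady,          // STREAM_STATUS_NOT_READY
    Ready,             // STREAM_STATUS_READY
    BargeIn,           // STREAM_STATUS_BARGE_IN
    EndSpeech,         // STREAM_STATUS_END_SPEECH
    Stopped,           // STREAM_STATUS_STOPPED
    BargeInTimeout,    // STREAM_STATUS_BARGE_IN_TIMEOUT
    EndSpeechTimeout   // STREAM_STATUS_END_SPEECH_TIMEOUT
};

// Hypothetical actions the application takes in response to a state.
enum class Action {
    None,              // no special handling needed for CPA
    StopPrompt,        // stop any TTS prompt that is still playing
    QueueDecode,       // decode and wait for results on another thread
    QueueNoInputLog    // cancel the stream and log a no-input event on another thread
};

// Maps a stream state to the action the article describes for it.
// The callback itself should only record the action and return quickly.
Action actionFor(SketchState state) {
    switch (state) {
        case SketchState::BargeIn:        return Action::StopPrompt;
        case SketchState::EndSpeech:      return Action::QueueDecode;
        case SketchState::BargeInTimeout: return Action::QueueNoInputLog;
        case SketchState::NotReady:
        case SketchState::Ready:
        case SketchState::Stopped:
        case SketchState::EndSpeechTimeout:
            return Action::None;
    }
    return Action::None;
}
```

Keeping the decision separate from the work makes it easier to keep the callback short, which is what the threading advice above asks for.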

Listen to the audio stream

Next, your application should start listening for a stream by calling either LV_SRE_StreamStartListening (C API) or StreamStartListening (C++ API).

This function is used in place of the older StreamStart API call. After StreamStartListening is called, audio data can be streamed; however, the audio will not be processed until the meta-tags for all the grammars have been evaluated. This functionality assumes that all the grammars were correctly registered (see Loading Grammars above).
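The interaction between registration, streaming, and grammar loading can be pictured with a small model. This is a conceptual sketch only, not how LumenVox implements it: PendingStreamGate and all of its methods are hypothetical names, and "processing" is reduced to counting samples. It shows audio being held while registered grammars are still outstanding and released once the last one has loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Conceptual model only: this is NOT LumenVox code. It mimics the documented
// behavior that audio is held until every registered grammar has been loaded.
class PendingStreamGate {
public:
    // Stand-in for registering a grammar before listening starts.
    void registerGrammar(const std::string& name) { pending_.insert(name); }

    // Stand-in for a grammar load completing (its meta-tags are evaluated).
    void grammarLoaded(const std::string& name) {
        pending_.erase(name);
        if (pending_.empty()) flush();
    }

    // Stand-in for streaming one audio chunk to the port.
    void sendAudio(const std::vector<short>& chunk) {
        if (pending_.empty()) {
            processedSamples_ += chunk.size();  // processed immediately
        } else {
            held_.push_back(chunk);             // held until grammars are ready
        }
    }

    std::size_t processedSamples() const { return processedSamples_; }
    std::size_t heldChunks() const { return held_.size(); }

private:
    // Processes everything that was held while grammars were outstanding.
    void flush() {
        for (const auto& chunk : held_) processedSamples_ += chunk.size();
        held_.clear();
    }

    std::set<std::string> pending_;
    std::vector<std::vector<short>> held_;
    std::size_t processedSamples_ = 0;
};
```

In this model, a grammar that was never registered would not hold anything back, which is why registration has to happen before listening begins.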

Start streaming audio 

Next, audio can be sent into the API using either LV_SRE_StreamSendData (C API) or StreamSendData (C++ API).

This function is used to stream audio to the port.
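Audio is usually sent in small, fixed-size pieces rather than as one large buffer. The sketch below shows one way to do that, assuming 16-bit samples. The streamInChunks helper and its send parameter are hypothetical: send is a caller-supplied stand-in that would wrap the real StreamSendData call in an actual integration, and its signature here is an assumption, not the LumenVox one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Splits a buffer of 16-bit samples into fixed-size chunks and hands each one
// to `send`. `send` is a stand-in for the real streaming call and returns
// true on success. Returns the number of chunks that were sent successfully.
std::size_t streamInChunks(
        const std::vector<short>& samples,
        std::size_t chunkSamples,
        const std::function<bool(const short*, std::size_t)>& send) {
    if (chunkSamples == 0) return 0;  // nothing sensible to send
    std::size_t sent = 0;
    for (std::size_t offset = 0; offset < samples.size(); offset += chunkSamples) {
        // The last chunk may be shorter than chunkSamples.
        const std::size_t count = std::min(chunkSamples, samples.size() - offset);
        if (!send(samples.data() + offset, count)) break;  // stop on first failure
        ++sent;
    }
    return sent;
}
```

A real application would typically also stop sending once the callback handler reports a final stream state.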

Load grammars on separate thread(s) 

Next, load the previously registered grammars on one or more separate threads.

Once all the registered grammars are loaded, the API automatically begins processing the streamed audio data.
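Loading grammars off the streaming thread can be sketched with standard C++ futures. This is an illustration under stated assumptions: loadGrammarsInParallel is a hypothetical helper, and loadOne is a caller-supplied stand-in for whichever LumenVox grammar-loading call your application uses, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Loads every grammar on its own thread and waits for all of them to finish.
// `loadOne` is a stand-in for the real grammar-loading call and returns true
// on success. Returns the number of grammars that loaded successfully.
std::size_t loadGrammarsInParallel(
        const std::vector<std::string>& grammars,
        const std::function<bool(const std::string&)>& loadOne) {
    std::vector<std::future<bool>> pending;
    pending.reserve(grammars.size());
    for (const auto& name : grammars) {
        // std::launch::async forces each load onto a separate thread.
        pending.push_back(std::async(std::launch::async, loadOne, name));
    }
    std::size_t loaded = 0;
    for (auto& result : pending) {
        if (result.get()) ++loaded;  // get() blocks until that load finishes
    }
    return loaded;
}
```

The thread that streams audio should not be the one that calls this helper, since it blocks until every load has completed.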

Listen to Stream Callbacks

Next, while streaming audio data is being received, expect the callback handler that was set up earlier to be called. Wait until the handler receives one of the valid final states.

The final states are: 

  • STREAM_STATUS_END_SPEECH
  • STREAM_STATUS_STOPPED 
  • STREAM_STATUS_BARGE_IN_TIMEOUT 
  • STREAM_STATUS_END_SPEECH_TIMEOUT 
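One way to wait for a final state without doing work inside the callback is to have the callback record the state and wake an application thread. The following is a minimal sketch using a condition variable. SketchState, isFinal, and FinalStateWaiter are hypothetical names, and the enum values are placeholders rather than the real STREAM_STATUS_* constants.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Placeholder states mirroring the names in this article; the numeric values
// are NOT taken from the LumenVox headers.
enum class SketchState {
    NotReady, Ready, BargeIn, EndSpeech, Stopped, BargeInTimeout, EndSpeechTimeout
};

// True for the four final states listed above.
inline bool isFinal(SketchState s) {
    return s == SketchState::EndSpeech || s == SketchState::Stopped ||
           s == SketchState::BargeInTimeout || s == SketchState::EndSpeechTimeout;
}

class FinalStateWaiter {
public:
    // Call from the stream state callback: record the state and return at
    // once. The first final state is latched and later states are ignored.
    void onState(SketchState s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!isFinal(last_)) last_ = s;
        }
        changed_.notify_all();
    }

    // Call from the application thread. Returns true and stores the final
    // state in `out` if one arrives before the timeout; false otherwise.
    bool waitForFinal(std::chrono::milliseconds timeout, SketchState& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!changed_.wait_for(lock, timeout, [this] { return isFinal(last_); })) {
            return false;
        }
        out = last_;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    SketchState last_ = SketchState::NotReady;
};
```

When waitForFinal reports the end-of-speech state, the application thread is then the natural place to call the decode and wait functions, since it is already outside the callback.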

NOTE: If STREAM_STATUS_END_SPEECH was received, proceed as follows:

Get Results 

If WaitForDecode or WaitForEngineToIdle did not time out, the results are ready. They can be obtained using the same functions used to get ASR results such as