
API Integration with CPA

Reference Number: AA-02274

LumenVox CPA functionality was designed to be compatible with our generic LumenVox C/C++ APIs, so if you have previously built an application using one of our APIs, adding support for CPA should be relatively simple. (If you have never used our API, please refer to our Introduction to ASR article.)


This article section is intended only for voice platform developers who are integrating their platform with the LumenVox C or C++ API. All others should refer to the Using CPA on a Voice Platform article, which covers the more common case.

The following steps are needed when integrating and using CPA in a C / C++ API-based application: 

Create session

First, you should initialize your session as normal, using LV_SRE_CreateClient (C API) or CreateClient (C++ API).

Configure Parameters

Set any configuration parameters to override the defaults, if needed, as you normally would for any other LumenVox API application. 

The parameter STREAM_PARM_AUTO_DECODE should always be set to 0 (disabled) when using CPA or AMD. Your application should call the appropriate "Decode" function to obtain results when ready, as described below.

Loading Grammars

Before loading any grammars, you must first register them with the client using LV_SRE_RegisterGrammarForPendingStream (C API) or RegisterGrammarForPendingStream (C++ API).

This is done before listening to the stream. It does not actually load the grammars; instead, it tells the speech port to expect these grammars to be loaded (typically very shortly afterwards). This has a number of benefits (discussed in the help pages for the RegisterGrammar functions), but most importantly it tells the LumenVox code not to assign a voice activity detection (VAD) mode until all of the grammars have had their meta tags evaluated, ensuring that the stream is processed using the correct voice activity mode.

Assign the Stream State Callback

Next, you should set the stream state change callback handler using either LV_SRE_StreamSetStateChangeCallBack (C API) or StreamSetStateChangeCallBack (C++ API).

Create a Callback Handler

Next, you should create a handler to process callbacks that report the current stream state of the speech port. The following stream states can be received:


This is the state the port is in until StreamStart or StreamStartListening is called. No special handling is required by the handler for this state.


This is the state the port is in once StreamStart or StreamStartListening is called. No special handling is required by the handler for this state.


This is the state received when the port detects that human speech has begun during callee prediction mode or ASR mode. In tone detection mode, a fake STREAM_STATUS_BARGE_IN state is set just before the STREAM_STATUS_END_SPEECH state to keep the implementation compatible with existing API integrations, where a barge-in is always expected before end of speech. Receiving this state callback can be used as the signal to stop playing an active TTS prompt.


In callee prediction or tone detection mode, this state callback is invoked once an event is detected. The LV_SRE_Decode function then needs to be called, followed by either WaitForDecode or WaitForEngineToIdle to wait for the results to be ready.

These API calls should be made outside of the callback function, in a separate thread. If a significant amount of processing is done within the callback handler, it will block all other calls to the streaming API, and undesirable behavior may result.


This state is received if StreamStop is called. For CPA, no special handling is required in this scenario.


This state may be received in tone detection mode if no tone is detected by the time the timeout is reached.

This state is not received in callee prediction. If no speech is heard in callee prediction, the callback is called with the STREAM_STATUS_END_SPEECH state, and the result from the decode will be “UNKNOWN SILENCE”.

For this event to be logged appropriately in the callsre file as a no-input event (for future debugging via the Speech Tuner), StreamCancel needs to be called, followed by AddEvent(EVENT_NOINPUT). These calls should be made in a different thread from the callback function.


This state indicates that the maximum length of speech data that can be received after barge-in has been reached. It is used only in ASR and will never be received in Call Progress Analysis.

Listen to the audio stream

Next, your application should start listening to the stream by calling either LV_SRE_StreamStartListening (C API) or StreamStartListening (C++ API).

This function is used in lieu of the older StreamStart API call. After StreamStartListening is called, audio data can be streamed; however, the audio will not be processed until the meta tags for all the grammars have been evaluated. This functionality assumes that all the grammars were correctly registered (see Loading Grammars above).

Start streaming audio 

Next, audio can be sent into the API using either LV_SRE_StreamSendData (C API) or StreamSendData (C++ API).

This function is used to stream audio to the port.

Load grammars on separate thread(s) 

Next, you should load the previously registered grammars on their own thread(s).

Once all the registered grammars are loaded, the port will automatically begin processing the streamed audio data internally.

Listen to Stream Call Backs 

Next, while audio data is being streamed, expect the callback handler that was set up earlier to be called. Wait until the handler receives one of the valid final states.

The final states are: 


NOTE: If STREAM_STATUS_END_SPEECH was received, process as follows:

Get Results 

If WaitForDecode or WaitForEngineToIdle did not time out, the results will be ready to be obtained. They may be retrieved using the same functions used to get ASR results, such as