
Using CPA in an Application

Reference Number: AA-02275

This article describes how to use LumenVox Call Progress Analysis (CPA) in an application.


Detection Modes


For any given recognition, the LumenVox ASR can be in any or all of the following detection modes: 

  • Standard VAD: used for standard speech recognition. Depending on which license tier you are using, different restrictions apply to the type and size of grammars that can be used for a recognition.
  • Call Progress Analysis: used for analyzing the length of inbound speech from an initial greeting to determine whether the called party is a live human.
  • Tone detection: used for detecting fax, voicemail, and SIT tones. 

By default, the standard VAD mode will be used, automatically applying any grammar restrictions based on the license type it acquires (e.g. if you have Tier 2 ASR licenses, it will restrict you to a total vocabulary size of 500 words across all active grammars). 

To switch between different ASR detection modes, you can add a <meta> tag to the grammar whose name is STREAM|DETECTION_MODE and whose content is one of the following modes: 

  • VoxLite (Tier 1 or 2 ASR)
  • SpeechPort (Tier 3 ASR)
  • SLM (Tier 4 ASR)
  • CPA
  • Tone 

The first three settings enable standard VAD for ASR, and the last two settings enable CPA or tone detection respectively. These can be mixed and matched, but any grammar may only have a single STREAM|DETECTION_MODE meta tag, so to mix detection types you must load multiple grammars.
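
As a concrete illustration, a grammar that switches the recognizer into CPA mode might begin as follows. This skeleton is only a sketch: the placeholder root rule and its contents are illustrative, and the actual CPA rules and timing thresholds are defined in the sample cpa.grxml attached to this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" mode="voice" root="root">

  <!-- Put the ASR into Call Progress Analysis mode for this grammar -->
  <meta name="STREAM|DETECTION_MODE" content="CPA"/>

  <!-- Placeholder root rule: the real CPA rule definitions come from
       the sample cpa.grxml, not from this sketch -->
  <rule id="root" scope="public">
    <item>placeholder</item>
  </rule>
</grammar>
```

A separate grammar with content="Tone" would enable tone detection; since each grammar may carry only one STREAM|DETECTION_MODE tag, the two are loaded as separate grammars.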


Note:

It is usually a bad idea to have CPA and ASR active at the same time, since the ASR will likely return before the CPA has made a determination. Our recommended model is to use CPA initially, and then activate ASR only after a human has been detected. Using tone detection in parallel with either CPA or ASR is fine.


If you do not provide an explicit STREAM|DETECTION_MODE in the grammar, the mode will be set to ASR (using the LICENSE_TYPE specified in the client_property.conf config file). Thus once you are done using CPA or tone detection, you can simply load a standard speech grammar to get back into ASR-only mode, assuming the LICENSE_TYPE value (in client_property.conf) is set to "AUTO." 

LumenVox provides sample grammars for CPA and AMD as described in our Grammars in CPA and AMD article. Most users will only need to load one or both of the CPA / tone detection grammars.


Checking the Result


The result from a successful CPA interaction is similar to a speech recognition result, meaning you will have raw text (what the user said) and an interpretation (what the user meant), and in most cases you will want to check the interpretation. 

Checking the interpretation is platform-dependent, so you may want to look at the instructions for specific voice platforms, below. 

By default, interpretations from CPA are simple strings. The following interpretations can be returned: 

  • HUMAN RESIDENCE: The called party was likely a live human with a short greeting. 
  • HUMAN BUSINESS: The called party was likely a live human with a medium-length greeting. 
  • UNKNOWN SPEECH: The called party was likely an answering machine (a long greeting or recorded message was detected). 
  • UNKNOWN SILENCE: No greeting or speech at all was detected. 

Similarly, for AMD, the following interpretations can be returned:

  • AMD: An answering machine or voicemail tone was detected.
  • FAX: A fax machine was detected.
  • SIT: A Special Information Tone (SIT) was detected.
  • BUSY: A busy tone was detected.

Note that for AMD there is one other possible response: when none of the above classifications is made before a timeout occurs, a NO INPUT event is returned. In versions before 19.1, this case returned the Raw Text "SPEECH", which was technically incorrect; beginning with version 19.1, it is returned as the more appropriate "NO INPUT".
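
On a VoiceXML platform, for example, the interpretation arrives as the field's value, and branching on it might look like the following sketch. The form structure, field name, and target form names here are hypothetical; cpa.grxml refers to the sample grammar attached to this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="cpa_check">
    <field name="greeting">
      <!-- cpa.grxml is the sample CPA grammar attached to this article -->
      <grammar src="cpa.grxml" type="application/srgs+xml"/>
      <filled>
        <if cond="greeting == 'HUMAN RESIDENCE' || greeting == 'HUMAN BUSINESS'">
          <!-- A live human probably answered -->
          <goto next="#talk_to_human"/>
        <elseif cond="greeting == 'UNKNOWN SPEECH'"/>
          <!-- Probably an answering machine -->
          <goto next="#leave_message"/>
        <else/>
          <!-- UNKNOWN SILENCE: no speech was heard -->
          <goto next="#verify_caller"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```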



Implementation Example


The following is meant to provide a simple example of how CPA can be added to an application. 

  1. Place the outbound call. 
  2. Once the call is connected, perform an ASR recognition using the CPA and tone detection grammars. 
  3. Check the result: 
    • If it is HUMAN RESIDENCE or HUMAN BUSINESS, a human probably answered the phone. The outbound application can now ask a question, start playing a message, or transfer the called party to an agent. 
    • If the result was UNKNOWN SILENCE, you can ask the called party a question to verify if they are there, or else just move to step 4 if you want to assume they're a machine. 
    • If the result was UNKNOWN SPEECH, a machine probably answered, and you can either hang up at this point, or continue to step 4. 
  4. Activate tone detection: 
    • Simply start a new ASR recognition with the tone detection grammar.
    • If you are also doing outbound IVR and detected a human, you may add the tone detection grammar alongside your existing speech grammars. If you ever get back a result of AMD, you know you have reached voicemail and should leave a message. This can be helpful in cases where CPA misclassified a machine as a human.
    • If you detected a machine in step 3, you may choose to play nothing and just wait for the tone. This helps improve tone detection accuracy.
  5. Continue your application as normal. 
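
For step 4 on a VoiceXML platform, playing a message while tone detection stays active might be sketched as follows. The form id, prompt wording, and restart flow are illustrative; tone_detection.grxml is the sample grammar attached to this article.

```xml
<form id="leave_message">
  <field name="tone">
    <!-- tone_detection.grxml is the sample grammar attached to this article -->
    <grammar src="tone_detection.grxml" type="application/srgs+xml"/>
    <prompt bargein="true">
      <!-- Hypothetical outbound message -->
      Hello, this is an automated call from Example Corp.
    </prompt>
    <filled>
      <if cond="tone == 'AMD'">
        <!-- A voicemail beep interrupted the prompt: the machine is now
             recording, so restart the message from the top -->
        <goto next="#leave_message"/>
      </if>
    </filled>
  </field>
  <!-- A noinput handler would continue the call if no tone is ever heard -->
</form>
```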


Multiple Voice Sessions at Once


If your application platform supports multiple voice sessions at once (for example, some CCXML+VXML platforms support launching multiple VXML sessions in parallel), it can be useful to launch two separate voice sessions at the start of the call: in one session you perform CPA with a relatively short timeout (e.g. 5-10 seconds), and in the other you enable only tone detection with a long no-input timeout (e.g. 120 seconds).  

If the CPA session detects a human, it can launch your normal voice application. However, if the tone detection session ever detects a tone, it can exit and inform the call control application to kill any other running voice sessions and start a new voice session whose only purpose is to play a message. 

While this is more complex (and not supported by every voice platform), it can improve tone detection accuracy as it means that a single input element is collecting the tone detection audio. If you load tone detection simultaneously with other speech grammars and traverse through an application, as the application moves in and out of input elements, it may cut off the audio stream to the ASR temporarily. In other words, the voice platform performs a recognition, turns off the audio stream, moves to the next input element, and then turns the stream back on. 

While this does not pose a problem for ASR, where live callers are presumably waiting for the next prompt to play before speaking, it does pose problems for tone detection. If all or part of the voicemail tone occurs while the platform is transitioning between recognition states, LumenVox may not be able to detect the tone. For this reason, we recommend launching two sessions whenever possible. 
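
The two-session model might be sketched in CCXML roughly as follows. This is an illustrative sketch only, not LumenVox's sample code: the dialog document URLs and the overall structure are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <var name="cpaDialog"/>
  <var name="toneDialog"/>
  <eventprocessor>
    <!-- When the outbound call connects, launch both voice sessions:
         one for CPA, one dedicated to tone detection -->
    <transition event="connection.connected">
      <dialogstart src="'cpa.vxml'" dialogid="cpaDialog"/>
      <dialogstart src="'tone.vxml'" dialogid="toneDialog"/>
    </transition>
    <!-- If the tone session exits after hearing a tone, kill the other
         session and start a dialog whose only job is to play a message -->
    <transition event="dialog.exit" cond="event$.dialogid == toneDialog">
      <dialogterminate dialogid="cpaDialog"/>
      <dialogstart src="'message.vxml'"/>
    </transition>
  </eventprocessor>
</ccxml>
```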

LumenVox has some sample VoiceXML and CCXML code available for developers interested in this approach. Please contact LumenVox Support for more information.

Attachments
cpa.grxml 2.1 Kb Download File
tone_detection.grxml 3.4 Kb Download File