Introduction to CPA

Reference Number: AA-01605

Call Progress Analysis (CPA) is a module for the LumenVox Automatic Speech Recognizer (ASR) that detects whether the called party in an outbound call is a live person or a machine (e.g. voicemail or fax). It can be integrated into any application that can use the LumenVox ASR, including applications that use the LumenVox API or the Media Resource Control Protocol (MRCP). It is specifically designed to work well with VoiceXML applications that use Call Control XML (CCXML).

How does CPA work?

LumenVox CPA incorporates two separate technologies to deliver reliable information about the state of an outbound call:

VAD-based Call Progress Analysis

CPA leverages the advanced Voice Activity Detection (VAD) used by our ASR to detect the presence and length of human speech. This allows CPA to determine that the called party is speaking and, by measuring the length of the initial speech, to return a hypothesis or prediction about whether the called party is a person or a voicemail greeting. This combination of observations generates one of the four prediction results described below.

Human Residence

Most humans on a residential or personal mobile phone will answer with a relatively short greeting, such as:

"Hello? <silence>"

This can be seen in the following representation, which shows that these short Human Residence responses are statistically most likely to last less than 1800 ms (CPA_HUMAN_RESIDENCE_TIME).

The length of speech here is fairly brief, only a few hundred milliseconds, which is less than the specified CPA_HUMAN_RESIDENCE_TIME (1.8 seconds), so HUMAN RESIDENCE is the classification returned.

Human Business

Most commercial or business calls are answered with slightly longer utterances, such as:

"Thanks for calling XYZ company, how may I help you? <pause>."

As can be seen in this representation:

Since the amount of speech shown here is around 2 seconds, it is greater than CPA_HUMAN_RESIDENCE_TIME (1.8 seconds) and less than the maximum CPA_HUMAN_BUSINESS_TIME (3 seconds), so this is classified as HUMAN BUSINESS in the result.

Unknown Speech

Contrast the Human Residence and Human Business classifications above with the outgoing messages on voicemail systems, which tend to last for several seconds:

"Hi, this is John Smith with XYZ Company. I'm away from my desk right now, so please leave me a message after the beep. <pause>."

As you can see here, the amount of speech is well over 3 seconds, which is greater than the default CPA_HUMAN_BUSINESS_TIME maximum, so this is classified as UNKNOWN SPEECH. This may be the start of an answering machine message (possibly followed by a beep) or some other recorded announcement. However, the response is returned as soon as the CPA algorithm has detected enough speech to determine the UNKNOWN SPEECH classification. To be clear, CPA does not wait for the whole message to complete; it returns the answer as soon as it can so that the application can decide whether and how to proceed with the call.

Unknown Silence

There is one more possibility when working with CPA, which is when no human speech is detected.

The UNKNOWN SILENCE response is returned when no human speech is detected before the CPA_UNKNOWN_SILENCE_TIMEOUT is reached, which in this case is the default of 5 seconds (5000 ms).

If any speech is detected before this timeout occurs, one of the other classifications will be returned. In other words, this classification is only returned when no human speech is detected at all in the audio stream.

It is left up to the application to determine how to handle this situation, which is often done by prompting to see if anyone is on the line, or simply hanging up the call.
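The four VAD-based outcomes above can be summarized as a simple threshold check. The following is a minimal sketch for illustration only, not LumenVox's implementation: the function name `classify_initial_speech` is hypothetical, the thresholds are the default values quoted above, and the handling of values exactly equal to a threshold is an assumption. It also works on a final measured length, whereas CPA itself returns UNKNOWN SPEECH as soon as enough speech has been heard.

```python
from typing import Optional

# Default values quoted in the text above, in milliseconds.
CPA_HUMAN_RESIDENCE_TIME = 1800
CPA_HUMAN_BUSINESS_TIME = 3000
CPA_UNKNOWN_SILENCE_TIMEOUT = 5000


def classify_initial_speech(speech_ms: Optional[int]) -> str:
    """Hypothetical helper: map the length of initial speech to a label.

    speech_ms is None when no speech was detected before
    CPA_UNKNOWN_SILENCE_TIMEOUT elapsed.
    """
    if speech_ms is None:
        return "UNKNOWN SILENCE"
    # Assumption: a length exactly equal to a threshold falls into the
    # longer category; the text only describes "less than" and "greater than".
    if speech_ms < CPA_HUMAN_RESIDENCE_TIME:
        return "HUMAN RESIDENCE"
    if speech_ms < CPA_HUMAN_BUSINESS_TIME:
        return "HUMAN BUSINESS"
    return "UNKNOWN SPEECH"
```

For example, a 400 ms "Hello?" maps to HUMAN RESIDENCE, a 2 second business greeting to HUMAN BUSINESS, and a 6 second voicemail greeting to UNKNOWN SPEECH.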

Tone Detection

Tone detection algorithms are capable of detecting a number of telephony events, including Special Information Tones (SIT), fax machines, and the tones played by voicemails / answering machines. This serves dual purposes: fax machines or SITs can be detected quickly, and if an answering machine tone is detected, the outbound application can leave a message immediately following the beep. This leads to natural, professional-sounding messages.

To be clear, there are essentially two situations within a call where tones or beeps typically occur, both of which are handled by LumenVox Tone Detection algorithms.

At the beginning of a call there may be, for example, a Special Information Tone, a Fax Tone or a Busy Tone. All of these indicate some issue connecting the call to a human, so a response is returned to the application so that it can deal with the situation (likely by hanging up). The other situation is a tone or beep that follows an answering machine message, which may occur much later in the call. Applications should be aware of both cases and react accordingly.
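One way an application might react to these two situations is sketched below. This is an illustration under stated assumptions: the label strings ("SIT", "FAX", "BUSY", "AMD"), the `Action` enum and the `react_to_tone` function are hypothetical names, not the values or API that LumenVox returns, and the chosen actions simply follow the suggestions in the text.

```python
from enum import Enum


class Action(Enum):
    HANG_UP = "hang up"
    LEAVE_MESSAGE = "leave a message after the beep"
    CONTINUE = "let the application decide how to continue"


# Hypothetical labels for the two situations described above.
EARLY_TONES = {"SIT", "FAX", "BUSY"}  # tones at the beginning of a call
LATE_TONES = {"AMD"}                  # beep after an answering machine message


def react_to_tone(label: str) -> Action:
    """Pick an action for a tone result (illustrative policy only)."""
    if label in EARLY_TONES:
        # The call cannot reach a human, so hanging up is the likely choice.
        return Action.HANG_UP
    if label in LATE_TONES:
        # The beep marks the point at which a message can be left.
        return Action.LEAVE_MESSAGE
    return Action.CONTINUE
```

Keeping the policy in one small function makes it easy to change per application, for example to log busy numbers for a later retry instead of only hanging up.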

Various tone detection algorithms can be selectively enabled or disabled within the grammar as needed. These tones and options are described below.

Special Information Tones (SIT)

Telephony systems sometimes generate a series of three rising tones of differing frequencies, followed by a pre-recorded message, to indicate some issue with the call, such as:

"<three rising tones> We're sorry, your call cannot be completed as dialed. Please check the number and try again."

There are seven of these standardized Special Information Tones (SIT) that may be used, as described in our Grammars in CPA and AMD article. Generally, the recorded message is customized to the telephony provider. LumenVox Tone Detection can classify all of these tones and return the appropriate SIT response to the application.

To enable SIT tone detection, set SIT_CUSTOM_ENABLE to "true" in your grammar:

<meta name="SIT_CUSTOM_ENABLE" content="true"/>

Figure: Special Information Tone

Fax Tones

Some numbers are associated with fax machines, and clearly can't interact with voice processing systems, so the ability to detect a fax tone early in the call is important to save time.

To enable FAX tone detection, set FAX_CUSTOM_ENABLE to "true" in your grammar:

<meta name="FAX_CUSTOM_ENABLE" content="true"/>

Figure: Fax Tone

Busy Tones

Version 19.1.100 introduced an option to detect busy tone signals, allowing applications with access to pre-connect audio to quickly detect that a called number is currently busy, thus saving resources.

To enable BUSY tone detection, set BUSY_CUSTOM_ENABLE to "true" in your grammar:

<meta name="BUSY_CUSTOM_ENABLE" content="true"/>

Figure: Busy Tone

Busy tones vary from country to country, but they fall in the 400 to 680 Hz range and are seen within the first couple of seconds of the audio stream (again, assuming the system has access to the pre-connect audio).

Note that some answering machine tones are in the same frequency range as busy tones, so the two may be confused if Answering Machine Detection (AMD) and Busy detection are enabled at the same time. However, the Busy detection algorithm is only active within the first 7.5 seconds of the audio stream, and a Busy result will not be returned after this, which should help discriminate between the two tone types.
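The time window described above can be pictured with a small sketch. This is not the LumenVox detection algorithm; it only shows which results remain plausible for a tone, given the 400 to 680 Hz range and the 7.5 second window quoted in the text. The function name, the label strings and the treatment of a tone at exactly 7.5 seconds are assumptions.

```python
# Figures quoted in the text above.
BUSY_MIN_HZ = 400.0
BUSY_MAX_HZ = 680.0
BUSY_WINDOW_S = 7.5


def plausible_tone_labels(freq_hz: float, offset_s: float) -> list:
    """Hypothetical helper: labels a detected tone could plausibly carry.

    freq_hz is the tone frequency and offset_s is how far into the audio
    stream the tone was heard.
    """
    labels = []
    in_busy_band = BUSY_MIN_HZ <= freq_hz <= BUSY_MAX_HZ
    # Busy is only a candidate inside the early detection window.
    if in_busy_band and offset_s <= BUSY_WINDOW_S:
        labels.append("BUSY")
    # An answering machine beep can occur anywhere in the stream.
    labels.append("AMD")
    return labels
```

A 480 Hz tone heard 2 seconds into the stream is ambiguous between the two labels, while the same tone heard 12 seconds in can only be an answering machine beep.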

Answering Machine Beep

Answering machine tones or beeps generally come after some prerecorded message. These usually appear much later in the call than the other tones described above. Their location is dependent on the length of the message, but our detection algorithm is designed to detect any beep within the audio stream if the option is enabled.

To enable AMD tone detection, set AMD_CUSTOM_ENABLE to "true" in your grammar:

<meta name="AMD_CUSTOM_ENABLE" content="true"/>

Figure: Answering Machine

Implementing CPA

To start using CPA, install it by following our Installing CPA guide.

CPA does require its own license, which is separate from any ASR licenses. It is also possible to use CPA without any ASR licenses, or to mix them and even switch licenses within an application.

If you are new to LumenVox, review our Licensing overview to understand how to install and use licenses.

See our Using CPA in an Application or Using CPA on a Voice Platform article for different methods of using and deploying CPA, depending on how you will be interacting with it.

We recommend using grammar-based settings when working with CPA. This gives the most flexibility: settings can change from one session (or call) to the next, and different applications or clients in a multi-tenant environment can use different settings. Managing these settings through grammars also keeps them in one unified place. See our Grammars in CPA and AMD article for more details.
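As an illustration of per-session or per-tenant settings, the sketch below renders the four tone-detection `<meta>` lines shown earlier from a set of enabled detectors. The helper name `tone_meta_tags` is hypothetical, and writing "false" to switch a detector off is an assumption; the article only shows the "true" form, so check the Grammars in CPA and AMD article for the accepted values.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Setting names taken from the grammar snippets in this article.
TONE_SETTINGS = (
    "SIT_CUSTOM_ENABLE",
    "FAX_CUSTOM_ENABLE",
    "BUSY_CUSTOM_ENABLE",
    "AMD_CUSTOM_ENABLE",
)


def tone_meta_tags(enabled: Iterable[str]) -> List[str]:
    """Hypothetical helper: render one <meta> element per tone setting."""
    enabled_set = set(enabled)
    tags = []
    for name in TONE_SETTINGS:
        # Assumption: "false" disables a detector; only "true" is shown above.
        content = "true" if name in enabled_set else "false"
        meta = ET.Element("meta", {"name": name, "content": content})
        tags.append(ET.tostring(meta, encoding="unicode"))
    return tags


# Example: a tenant that only wants SIT and answering machine detection.
for line in tone_meta_tags({"SIT_CUSTOM_ENABLE", "AMD_CUSTOM_ENABLE"}):
    print(line)
```

The generated lines would then be placed in the header of the grammar used for that tenant's calls, so each client gets its own combination without changing any global configuration.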