CPA Technical Specifications

Reference Number: AA-01606

LumenVox' Call Progress Analysis software

CPA Prediction Algorithm

The CPA prediction mechanism differentiates between a residential customer, a business customer, and an answering machine message by using LumenVox' ASR capabilities together with traditional time-based CPA predictors.

The prediction algorithm listens to the incoming audio for human speech using LumenVox' advanced Voice Activity Detection (VAD) scheme. Once the audio has been endpointed (that is, once the end of speech has been detected), the algorithm attempts to classify the called party based on the length of time it takes the called party to stop speaking.

Consider the following ways a called party might answer the phone:

"Hello? <silence>"
"Hi, this is John. <silence>"
"Thank you for calling XYZ corporation, how may I help you? <silence>"
"This is John Smith at XYZ corporation. I'm away from my desk right now, but if you leave me a message with your name and number I'll get back to you as soon as possible. <silence>"

By measuring the amount of time between the start of speech and the silence, the system can return a hypothesis about whether the called party is a human at a residence, a human at a business, or a machine. The other possible case is that the called party does not say anything at all, in which case CPA reports that no speech was detected.
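The measurement described above can be sketched in a few lines of Python. This is only an illustration, not LumenVox' implementation: the per-frame speech flags, the 20 ms frame size, the 500 ms trailing-silence endpoint, and the function name `measure_greeting` are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def measure_greeting(
    vad_flags: Sequence[bool],
    frame_ms: int = 20,          # assumed frame size, not a LumenVox default
    end_silence_ms: int = 500,   # assumed trailing silence that ends the greeting
) -> Tuple[Optional[int], int]:
    """Return (speech_ms, leading_silence_ms).

    speech_ms is the time from the first speech frame to the last speech
    frame before the endpoint, or None if no speech was ever detected.
    """
    start = None        # index of the first speech frame
    last_speech = None  # index of the most recent speech frame
    silence_run = 0     # consecutive non-speech frames after speech began

    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run * frame_ms >= end_silence_ms:
                break  # greeting is considered finished (endpointed)

    if start is None:
        # The called party never spoke; report only how long we listened.
        return None, len(vad_flags) * frame_ms
    return (last_speech - start + 1) * frame_ms, start * frame_ms
```

Short pauses inside a greeting are counted as part of the greeting, since the timer runs from the start of speech until the endpointing silence.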

The following table lists how long CPA must detect speech or silence before returning each result. These are default values and can be modified:

Result	Audio Length	Description
Human Residence	< 1800 ms of speech	Very short greeting indicates a strong likelihood of having reached a human residence.
Human Business	> 1800 ms but < 3000 ms of speech	Moderate-length greeting indicates likelihood of having reached a human business.
Answering Machine Prediction	> 3000 ms of speech	Likely an answering machine or other automated system. This may be confirmed using a confirmation prompt.
Unknown Silence	> 5000 ms of silence	No speech was detected for a prolonged period of time; it is likely there is a human who is not speaking on the line.

If the classifier cannot reliably predict the callee from the ASR result, it falls back to a heuristic based on length of speech. Table 1 (the table above) shows the default values for predicting the callee based on length of speech. The Description column describes the circumstances under which each of these states would be predicted.
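As a rough sketch, the length-of-speech fallback amounts to a threshold lookup over the default values in the table. The code below is an illustration, not the product's implementation. The constant names, the function name `classify_by_length`, and the "Undetermined" label are hypothetical. The table does not say how a greeting of exactly 1800 ms or 3000 ms is classified, so assigning those boundary values to the longer category is also an assumption.

```python
from typing import Optional

# Default thresholds from the table (configurable in the real product).
RESIDENCE_MAX_MS = 1800
BUSINESS_MAX_MS = 3000
SILENCE_TIMEOUT_MS = 5000


def classify_by_length(speech_ms: Optional[int], silence_ms: int = 0) -> str:
    """Classify the called party from greeting length alone.

    speech_ms  -- length of the greeting, or None if no speech was detected
    silence_ms -- how long the line has been silent (used only when
                  speech_ms is None)
    """
    if speech_ms is None:
        if silence_ms > SILENCE_TIMEOUT_MS:
            return "Unknown Silence"
        return "Undetermined"  # hypothetical label: not enough audio yet
    if speech_ms < RESIDENCE_MAX_MS:
        return "Human Residence"
    if speech_ms < BUSINESS_MAX_MS:  # assumption: exactly 1800 ms lands here
        return "Human Business"
    return "Answering Machine Prediction"  # assumption: includes exactly 3000 ms
```

For example, a 900 ms "Hello?" falls under Human Residence, while a 7-second voicemail greeting falls under Answering Machine Prediction.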

Note that CPA is often used in parallel with Tone Detection, sometimes called Answering Machine Detection or AMD. This is possible when the platform can process the audio stream with two independent methods: one listening for speech as described above (the CPA method), and an additional process listening for tones, such as SIT, fax, busy, or answering machine tones or beeps. When used together, these methods provide highly accurate detection of whether a human or a machine answered the call, allowing the IVR or speech application to decide how to respond in each case and to use resources more effectively.
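How an application combines the two results is up to the application. The sketch below shows one possible policy, purely as an illustration: the tone labels, the action names, the function name `decide_action`, and the rule that a detected tone takes precedence over the CPA result are all assumptions, not documented LumenVox behavior.

```python
from typing import Optional


def decide_action(cpa_result: str, tone: Optional[str]) -> str:
    """Pick a next step from the CPA result and any detected tone.

    tone is one of "SIT", "FAX", "BUSY", "BEEP", or None (hypothetical labels).
    """
    # Assumption: an explicit tone is stronger evidence than greeting length.
    if tone in ("SIT", "FAX", "BUSY"):
        return "hang_up"
    if tone == "BEEP":
        return "leave_message"

    if cpa_result in ("Human Residence", "Human Business"):
        return "connect_agent"
    if cpa_result == "Answering Machine Prediction":
        return "wait_for_beep"
    # Unknown Silence or anything else: prompt to see if someone is there.
    return "play_prompt"
```

With this policy, a short greeting with no tone is routed to an agent, while a beep leads to leaving a message regardless of the speech-based prediction.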