CPA - Functional Overview

Reference Number: AA-02272

LumenVox CPA combines two discrete technologies into a single software module:

Call Progress Analysis: using the advanced voice activity detection (VAD) algorithms employed by the LumenVox Automatic Speech Recognizer (ASR), LumenVox CPA can accurately determine whether a person is speaking. People who answer the phone tend to give relatively short greetings ("Hello?" or "Hello, this is John Smith speaking"), whereas answering machines tend to play long greetings ("Hi, this is John Smith and I'm unable to come to the phone right now…"). LumenVox VAD lets the CPA algorithm determine when a greeting has ended and, by measuring the length of that greeting, decide whether the called party is a live person or a machine.
Tone detection: by detecting tones, LumenVox CPA can also provide more information about the status of an outbound call. This can be used as soon as a call is connected, e.g. to detect a fax machine or a Special Information Tone (SIT). It can also be used later in a call, for example to detect a voicemail or answering machine tone so that a message can be left for the called party.
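The greeting-length idea described above can be illustrated with a minimal sketch. It assumes VAD has already reported when speech started and ended. The `Greeting` type, the `classify_greeting` function and the 1800 ms cut-off are hypothetical names and values chosen for illustration only; they are not LumenVox settings or the actual CPA algorithm.

```python
from dataclasses import dataclass

# Hypothetical cut-off: greetings at or below this length are treated as a
# live person. Illustrative assumption only, not a LumenVox default.
HUMAN_MAX_GREETING_MS = 1800


@dataclass
class Greeting:
    speech_start_ms: int  # time at which VAD first reported speech
    speech_end_ms: int    # time at which VAD reported that speech had ended

    @property
    def duration_ms(self) -> int:
        return self.speech_end_ms - self.speech_start_ms


def classify_greeting(greeting: Greeting,
                      human_max_ms: int = HUMAN_MAX_GREETING_MS) -> str:
    """Label the called party from greeting length alone."""
    if greeting.duration_ms <= 0:
        return "no input"   # no speech was measured
    if greeting.duration_ms <= human_max_ms:
        return "human"      # short greeting, e.g. "Hello?"
    return "machine"        # long greeting, typical of an answering machine


if __name__ == "__main__":
    print(classify_greeting(Greeting(200, 900)))
    print(classify_greeting(Greeting(200, 6200)))
```

A real deployment would likely combine this duration test with the tone detection described above rather than rely on length alone.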

The primary use cases for CPA are outbound systems of various sorts. Here are some examples:

Outbound Interactive Voice Response (IVR): Applications may place outbound calls and, if a live person is detected, immediately put the called party into an IVR to perform a task, such as paying an overdue bill. If a machine is detected, the application may wait until the voicemail tone is detected and then leave a message at the appropriate time.
Outbound Messaging: Any application that needs to provide outbound messaging, such as emergency notification systems, can place outbound calls and use CPA to determine whether a human has answered. If so, the message can be delivered to the called party. If not, the system can wait for the tone to be detected and leave the notification as a voicemail message.
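The decision logic shared by both use cases can be sketched as a small function. The `Party` enum, the `next_action` function and the action names are hypothetical and exist only for this illustration. Hanging up when the party cannot be determined is also an assumption; the text above does not prescribe what to do in that case.

```python
from enum import Enum


class Party(Enum):
    HUMAN = "human"
    MACHINE = "machine"
    UNKNOWN = "unknown"


def next_action(party: Party, tone_detected: bool) -> str:
    """Choose what an outbound application does after CPA reports a result."""
    if party is Party.HUMAN:
        # Live person: start the IVR or play the message immediately.
        return "deliver_now"
    if party is Party.MACHINE:
        # Machine: leave the message only once the voicemail tone is heard.
        return "leave_voicemail" if tone_detected else "wait_for_tone"
    # Assumption: with no usable result, end the call.
    return "hang_up"


if __name__ == "__main__":
    print(next_action(Party.MACHINE, tone_detected=False))
```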

LumenVox CPA is intended to be used only once a call is connected. It therefore does not perform the following functions:

Detection of different ring types.
Detection of busy signals (including "fast busy").
Detection of ringback music or color ring tones.
Detection of DTMF key-presses.

Technical Overview

LumenVox has engineered CPA to function similarly to a standard ASR interaction. This allows CPA to be used seamlessly with most voice platforms that can interoperate with the LumenVox ASR, even if that platform was not designed to support CPA. The general pattern in using CPA is as follows:

A connection between the voice platform and the LumenVox ASR is established:

This may be over the Media Resource Control Protocol (MRCP) by making an RTSP SETUP request (MRCPv1) or by sending a SIP INVITE (MRCPv2) with the appropriate session information.
It may also be by an application making a CreateClient call into the LumenVox API.
VoiceXML developers can generally just allow their voice platform to handle this piece.

Grammars are loaded and activated, as is normally the case with speech recognition. However, special meta tags in the grammar indicate that this session should use CPA (CPA can be used in conjunction with ASR).

Over MRCP, this would generally take the form of a DEFINE-GRAMMAR message.
Using our API, the grammar must first be registered using RegisterGrammarForPendingStream, then loaded with LoadGrammar, then activated with ActivateGrammar.
VXML developers simply load the grammar as they would any speech grammar, generally as part of a form that performs CPA.

The LumenVox ASR is directed to begin processing audio:

Over MRCP, this is a RECOGNIZE message.
Using our API, StreamStartListening must be called.
In VXML this is implied by the existence of forms and/or fields that use ASR.

Once the ASR/CPA has finished analyzing the audio, it will make a determination (e.g. "human residence" or "fax machine" or "No input") and return that result to the voice platform as a normal speech result.

The possible return values can be customized in the grammar; for more on that see below.

The application evaluates the result and acts accordingly.
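As a quick reference, the first three steps of the pattern above can be summarized per interface in a lookup table. This sketch only restates the message and function names mentioned in the text. The step labels, the interface keys and the `sequence_for` helper are hypothetical, and no function signatures or parameters are implied.

```python
from typing import Dict, List, Tuple

# Each entry: (hypothetical step label, operation named in the text per interface).
CPA_STEPS: List[Tuple[str, Dict[str, str]]] = [
    ("connect", {
        "mrcp_v1": "RTSP SETUP",
        "mrcp_v2": "SIP INVITE",
        "api": "CreateClient",
        "vxml": "(handled by the voice platform)",
    }),
    ("load_grammar", {
        "mrcp_v1": "DEFINE-GRAMMAR",
        "mrcp_v2": "DEFINE-GRAMMAR",
        "api": "RegisterGrammarForPendingStream, LoadGrammar, ActivateGrammar",
        "vxml": "(grammar loaded as part of a form)",
    }),
    ("start_processing", {
        "mrcp_v1": "RECOGNIZE",
        "mrcp_v2": "RECOGNIZE",
        "api": "StreamStartListening",
        "vxml": "(implied by forms and fields that use ASR)",
    }),
]


def sequence_for(interface: str) -> List[str]:
    """Return the ordered operations for one interface, as listed in the text."""
    return [operations[interface] for _, operations in CPA_STEPS]


if __name__ == "__main__":
    for operation in sequence_for("mrcp_v2"):
        print(operation)
```

After these three steps, the result arrives as a normal speech result and the application decides what to do with it.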