Using CPA in an Application

Reference Number: AA-02275

This article describes how to use LumenVox Call Progress Analysis (CPA) in an application.

Detection Modes 

For any given recognition, the LumenVox ASR can be in any or all of the following detection modes: 

  • Standard VAD: used for standard speech recognition. Depending on which license tier you are using, there will be different restrictions on the type and size of grammars that can be used for a recognition.
  • Call Progress Analysis: used for analyzing the length of inbound speech from an initial greeting to make a determination about whether the called party is a live human or not.
  • Tone detection: used for detecting fax, voicemail and SIT tones. 

By default, LumenVox will use the standard VAD mode, automatically applying any grammar restrictions based on the license type it acquires (e.g. if you have Tier 2 ASR licenses, it will restrict you to a total vocabulary size of 500 words across all active grammars). 

To switch between different ASR detection modes, add a <meta> tag to the grammar whose name is STREAM|DETECTION_MODE and whose content is one of the following modes: 

  • VoxLite (Tier 1 or 2 ASR)
  • SpeechPort (Tier 3 ASR)
  • SLM (Tier 4 ASR)
  • CPA
  • Tone 

The first three settings enable standard VAD for ASR, and the last two settings enable CPA or tone detection respectively. These can be mixed and matched, but any grammar may only have a single STREAM|DETECTION_MODE meta tag, so to mix detection types you must load multiple grammars. 

Note:

It is usually a bad idea to have CPA and ASR active at the same time, since the ASR will likely return before the CPA has made a determination. Our recommended model is to use CPA initially and activate ASR only after a human has been detected. Using tone detection with either CPA or ASR is fine.

If you do not provide a STREAM|DETECTION_MODE, LumenVox sets the mode to ASR (using the LICENSE_TYPE specified in the client_property.conf config file). Thus, once you are done using CPA or tone detection, you can simply load a standard speech grammar to get back into ASR-only mode, assuming the LICENSE_TYPE value is "AUTO." 

LumenVox provides sample grammars (see below), so most users will only need to load one or both of the CPA/tone detection grammars.


Sample CPA Grammar

The following grammar will enable CPA mode when using the default settings:

<?xml version='1.0'?> 
<grammar xml:lang="en-US" version="1.0" root="root" mode="voice"
          xmlns="http://www.w3.org/2001/06/grammar"
          tag-format="semantics/1.0">
  
<meta name="STREAM|DETECTION_MODE" content="CPA"/> 
<meta name="STREAM|CPA_HUMAN_RESIDENCE_TIME"  content="1800"/>
<meta name="STREAM|CPA_HUMAN_BUSINESS_TIME"   content="3000"/>
<meta name="STREAM|CPA_UNKNOWN_SILENCE_TIMEOUT"  content="5000"/>
<meta name="HUMAN_RESIDENCE_CUSTOM_INPUT_TEXT"  content="HUMAN RESIDENCE"/>
<meta name="HUMAN_BUSINESS_CUSTOM_INPUT_TEXT"  content="HUMAN BUSINESS"/>
<meta name="UNKNOWN_SPEECH_CUSTOM_INPUT_TEXT"  content="UNKNOWN SPEECH"/>
<meta name="UNKNOWN_SILENCE_CUSTOM_INPUT_TEXT"  content="UNKNOWN SILENCE"/>

<rule id="root" scope="public">
    <one-of>
        <item>HUMAN RESIDENCE<tag>out="HUMAN RESIDENCE"</tag>
        </item>
        <item>HUMAN BUSINESS<tag>out="HUMAN BUSINESS"</tag>
        </item>
        <item>UNKNOWN SPEECH<tag>out="UNKNOWN SPEECH"</tag>
        </item>
        <item>UNKNOWN SILENCE<tag>out="UNKNOWN SILENCE"</tag>
        </item>
    </one-of>
</rule>
</grammar>


Sample Tone Detection Grammar

The following grammar will enable tone detection mode using the default settings:

<?xml version='1.0'?> 
<grammar xml:lang="en-US" version="1.0" root="root" mode="voice"
          xmlns="http://www.w3.org/2001/06/grammar"
          tag-format="semantics/1.0.2006">  

<meta name="STREAM|DETECTION_MODE"  content="Tone"/> 

<meta name="AMD_CUSTOM_ENABLE" content="true"/>
<meta name="FAX_CUSTOM_ENABLE" content="true"/>
<meta name="SIT_CUSTOM_ENABLE" content="true"/> 

<meta name="AMD_CUSTOM_INPUT_TEXT" content="AMD"/>
<meta name="FAX_CUSTOM_INPUT_TEXT" content="FAX"/> 

<meta name="SIT_REORDER_LOCAL_CUSTOM_INPUT_TEXT" content="SIT REORDER LOCAL"/>
<meta name="SIT_VACANT_CODE_CUSTOM_INPUT_TEXT" content="SIT VACANT CODE"/>
<meta name="SIT_NO_CIRCUIT_LOCAL_CUSTOM_INPUT_TEXT" content="SIT NO CIRCUIT LOCAL"/>
<meta name="SIT_INTERCEPT_CUSTOM_INPUT_TEXT" content="SIT INTERCEPT"/>
<meta name="SIT_REORDER_DISTANT_CUSTOM_INPUT_TEXT" content="SIT REORDER DISTANT"/>
<meta name="SIT_NO_CIRCUIT_DISTANT_CUSTOM_INPUT_TEXT" content="SIT NO CIRCUIT DISTANT"/>
<meta name="SIT_OTHER_CUSTOM_INPUT_TEXT" content="SIT OTHER"/> 

<rule id="root" scope="public">
    <one-of>
        <item>AMD<tag>out="AMD"</tag>
        </item>
        <item>FAX<tag>out="FAX"</tag>
        </item>
        <item>SIT REORDER LOCAL<tag>out="SIT"</tag>
        </item>
        <item>SIT VACANT CODE<tag>out="SIT"</tag>
        </item>
        <item>SIT NO CIRCUIT LOCAL<tag>out="SIT"</tag>
        </item>
        <item>SIT INTERCEPT<tag>out="SIT"</tag>
        </item>
        <item>SIT REORDER DISTANT<tag>out="SIT"</tag>
        </item>
        <item>SIT NO CIRCUIT DISTANT<tag>out="SIT"</tag>
        </item>
        <item>SIT OTHER<tag>out="SIT"</tag>
        </item>
    </one-of>
</rule>
</grammar>


Checking the Result

The result from a successful CPA interaction is similar to a speech recognition result, meaning you will have raw text (what the user said) and an interpretation (what the user meant), and in most cases you will want to check the interpretation. 

Checking the interpretation is platform-dependent, so you may want to look at the instructions for specific voice platforms, below. 

By default, interpretations from CPA are simple strings. The following interpretations can be returned: 

  • HUMAN RESIDENCE: The called party was likely a live human with a short greeting. 
  • HUMAN BUSINESS: The called party was likely a live human with a medium-length greeting. 
  • UNKNOWN SPEECH: The called party was likely an answering machine (a long greeting was detected). 
  • UNKNOWN SILENCE: No greeting at all was detected. 

Similarly, for tone detection, the following interpretations can be returned:

  • AMD: An answering machine or voicemail tone was detected.
  • FAX: A fax machine was detected.
  • SIT: A Special Information Tone (SIT) was detected.
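
Since interpretations are simple strings, result handling usually reduces to a dispatch on those values. The following sketch shows one way to do that; the action names returned here are hypothetical placeholders for your platform-specific call-control logic, not part of the LumenVox API.

```python
def handle_cpa_result(interpretation):
    """Map a CPA or tone detection interpretation string to a next action.

    The action names are illustrative placeholders; substitute whatever
    your call-control platform uses.
    """
    if interpretation in ("HUMAN RESIDENCE", "HUMAN BUSINESS"):
        return "transfer_to_agent"   # a live human probably answered
    if interpretation == "UNKNOWN SPEECH":
        return "wait_for_tone"       # likely an answering machine
    if interpretation == "UNKNOWN SILENCE":
        return "verify_caller"       # ask whether anyone is there
    if interpretation == "AMD":
        return "leave_message"       # voicemail tone detected
    if interpretation in ("FAX", "SIT"):
        return "hang_up"             # no point continuing the call
    return "continue"                # unrecognized result
```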


Implementation Example

The following example is meant to give some idea of how CPA can be added to an application. 

  1. Place the outbound call. 
  2. Once the call is connected, perform an ASR recognition using the CPA and tone detection grammars. 
  3. Check the result: 
    • If it is HUMAN RESIDENCE or HUMAN BUSINESS, a human probably answered the phone. The outbound application can now ask a question, start playing a message, or transfer the called party to an agent. 
    • If the result was UNKNOWN SILENCE, you can ask the called party a question to verify if they are there, or else just move to step 4 if you want to assume they're a machine. 
    • If the result was UNKNOWN SPEECH, a machine probably answered, and you can either hang up at this point, or continue to step 4. 
  4. Activate tone detection: 
    • Simply start a new ASR recognition with the tone detection grammar.
    • If you are also doing outbound IVR and detected a human, you may add the tone detection grammar along with your existing speech grammars. If you ever get back a result of AMD, you know you have reached a voicemail and that you should leave a message. This can be helpful in case the CPA misclassified a machine as a human.
    • If you detected a machine in step 3, you may choose to play nothing and just wait for the tone. This helps improve tone detection accuracy.
  5. Continue your application as normal. 
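
The steps above can be sketched in code. This is a conceptual outline only: place_call, recognize, and play_message are hypothetical stand-ins for your telephony platform's API, and only the control flow mirrors the article.

```python
def outbound_call_flow(number, place_call, recognize, play_message):
    """Conceptual CPA call flow; the callables are platform-specific stubs."""
    call = place_call(number)  # step 1: place the outbound call

    # step 2: recognize against the CPA and tone detection grammars
    result = recognize(call, grammars=["cpa.grxml", "tone_detection.grxml"])

    if result in ("HUMAN RESIDENCE", "HUMAN BUSINESS"):
        # step 3: a human probably answered; proceed with the application
        # (keeping the tone grammar loaded guards against misclassification)
        play_message(call, "greeting")
    else:
        # UNKNOWN SPEECH / UNKNOWN SILENCE: assume a machine and
        # step 4: start a new recognition with only the tone grammar,
        # playing nothing while waiting for the tone
        tone = recognize(call, grammars=["tone_detection.grxml"])
        if tone == "AMD":
            play_message(call, "voicemail_message")

    return result  # step 5: continue the application as normal
```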


Multiple Voice Sessions at Once

If your application platform supports running multiple voice sessions at once (for example, some CCXML+VXML platforms support launching multiple VXML sessions in parallel), it can be useful to launch two separate voice sessions at the start of the call. In one session you perform CPA with a relatively short no-input timeout (e.g. 5-10 seconds). In the other, you perform only tone detection with a long no-input timeout (e.g. 120 seconds).  

If the CPA session detects a human, it can launch your normal voice application. However, if the tone detection session ever detects a tone, it can exit and inform the call control application to kill any other running voice sessions and start a new voice session whose only purpose is to play a message. 

While this is more complex (and not supported by every voice platform), it can improve tone detection accuracy as it means that a single input element is collecting the tone detection audio. If you load tone detection simultaneously with other speech grammars and traverse through an application, as the application moves in and out of input elements, it may cut off the audio stream to the ASR temporarily. In other words, the voice platform performs a recognition, turns off the audio stream, moves to the next input element, and then turns the stream back on. 

While this does not pose a problem for ASR, where live callers are presumably waiting for the next prompt to play before speaking, it does pose problems for tone detection. If all or part of the voicemail tone occurs while the platform is transitioning between recognition states, LumenVox may not be able to detect the tone. For this reason, we recommend launching two sessions whenever possible. 
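
The two-session pattern can be sketched as follows. This is a minimal illustration using threads to stand in for parallel dialog sessions; on a real CCXML/VXML platform you would launch separate sessions instead, and run_cpa and run_tone_detection are hypothetical stand-ins for those sessions.

```python
import threading

def run_parallel_detection(run_cpa, run_tone_detection, on_human, on_tone):
    """Run a short CPA session and a long tone detection session in parallel.

    run_cpa / run_tone_detection are hypothetical session functions that
    block until they produce an interpretation string.
    """
    tone_seen = threading.Event()

    def cpa_session():
        result = run_cpa(timeout=10)  # short no-input timeout (seconds)
        if result.startswith("HUMAN") and not tone_seen.is_set():
            on_human(result)          # launch the normal voice application

    def tone_session():
        result = run_tone_detection(timeout=120)  # long no-input timeout
        if result in ("AMD", "FAX", "SIT"):
            tone_seen.set()           # signal other sessions to stop...
            on_tone(result)           # ...and just play the message

    threads = [threading.Thread(target=cpa_session),
               threading.Thread(target=tone_session)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```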

LumenVox has some sample VoiceXML and CCXML code available for developers interested in this approach. Please contact LumenVox Support for more information.

Attachments
cpa.grxml 2.1 Kb Download File
tone_detection.grxml 3.4 Kb Download File