

Reference Number: AA-02456
Added in 19.2.100

As part of a range of improvements and changes to CPA introduced in LumenVox version 19.1, a new setting CPA_MAX_TIME_FROM_CONNECT was added to make CPA more easily understood and utilized within applications requiring stricter timing limits.

Some of the related changes in this release make settings more robust and functionality clearer. This new option, however, lets users specify a single setting that controls all the other CPA settings, auto-scaling them as needed based on the maximum length of time before an application receives a response from Call Progress Analysis (CPA) prediction.

Feedback from some high-volume users helped us understand that many implementations require a predictive response from CPA within a certain (sometimes very short) time period. Users were sometimes confused about how to adjust settings to accommodate these requirements, which were imposed by business logic elsewhere or even by legislation governing automated dialing systems.

Whatever the reason, we decided to come up with a unified method of controlling CPA with this new auto-scaling option, which provides the best classification possible based on the length of audio it has available.

Normally, when the length of audio is not a factor, the predictive capabilities of the LumenVox CPA functionality are as described in our CPA Technical Specifications article. For this predictive classification to work as originally designed, it was envisioned that at least 5 seconds (5000 ms) of audio would be available. This amount of time is necessary to accurately classify HUMAN RESIDENCE and HUMAN BUSINESS, as well as UNKNOWN SPEECH or UNKNOWN SILENCE.

To provide predictive results within strict time limits, possibly with shorter amounts of audio, some of these classifications may be limited or not possible at all. For instance, it is not possible to distinguish between HUMAN RESIDENCE and HUMAN BUSINESS when less than 1.8 seconds (1800 ms) of audio is available.

<meta name="STREAM|CPA_MAX_TIME_FROM_CONNECT" content="5000"/>

When the new CPA_MAX_TIME_FROM_CONNECT setting is specified in a grammar, as shown above, it overrides the following CPA settings: CPA_HUMAN_BUSINESS_TIME, CPA_HUMAN_RESIDENCE_TIME and CPA_UNKNOWN_SILENCE_TIMEOUT.

Instead, internal auto-scaling is applied to these settings, based on the value specified for CPA_MAX_TIME_FROM_CONNECT, as described below.


CPA_HUMAN_BUSINESS_TIME is set to the lower of CPA_MAX_TIME_FROM_CONNECT and 3000 ms (the optimal default value for CPA_HUMAN_BUSINESS_TIME). For values of CPA_MAX_TIME_FROM_CONNECT greater than 3000 ms, CPA therefore returns UNKNOWN SPEECH if the detected human speech duration is greater than 3000 ms, and HUMAN BUSINESS if it is less than this (unless the speech is otherwise classified as HUMAN RESIDENCE).

Where CPA_MAX_TIME_FROM_CONNECT is less than 3000 ms, CPA_HUMAN_BUSINESS_TIME takes that same value, which is the maximum amount of speech that can be detected, so any speech lasting longer than this is considered UNKNOWN SPEECH. It is also important to note that, because CPA_MAX_TIME_FROM_CONNECT returns a CPA response no later than the specified time, speech may still be ongoing when this time is reached. In such cases, it is impossible to determine whether the speech is human or machine, so the only appropriate response is UNKNOWN SPEECH (which applications often treat as a machine).

If it is possible to detect end of speech (including the required VAD_EOS_DELAY), then a human classification (Residence or Business) is possible before the CPA_MAX_TIME_FROM_CONNECT time is reached.


CPA_HUMAN_RESIDENCE_TIME is set to the lower of CPA_HUMAN_BUSINESS_TIME and 1800 ms (the optimal CPA_HUMAN_RESIDENCE_TIME).

When CPA_MAX_TIME_FROM_CONNECT is set to a low value (below 1800 ms), CPA_HUMAN_BUSINESS_TIME and CPA_HUMAN_RESIDENCE_TIME are, by definition, both set to that same value. A HUMAN BUSINESS classification is then not possible; only HUMAN RESIDENCE is possible, and only if end of speech is detected (including the required VAD_EOS_DELAY) before CPA_MAX_TIME_FROM_CONNECT is reached. Since timing is limited, applications operating under such restrictions may do better to think of the classification as either HUMAN (speech shorter than CPA_HUMAN_RESIDENCE_TIME) or MACHINE (speech equal to or longer than this time).

Please also note that we neither recommend nor discourage operating CPA with lower timing values; we are simply describing the effects of doing so. CPA accuracy will always be better when at least 5000 ms is allowed for classification, but we realize that not all applications want to process that amount of audio before getting a classification prediction from CPA.


CPA_UNKNOWN_SILENCE_TIMEOUT is set to the same value as CPA_MAX_TIME_FROM_CONNECT. If any human speech is detected before this timeout is reached, then one of the other classifications will be returned. If no human speech has been detected when this timeout is reached, then UNKNOWN SILENCE will be returned.
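The scaling rules described above can be summarized in a few lines of Python. This is only an illustrative sketch of the arithmetic stated in this article, not LumenVox code; the function and constant names are assumptions made for the example.

```python
# Illustrative sketch only: names are hypothetical, not part of any LumenVox API.
OPTIMAL_HUMAN_BUSINESS_MS = 3000   # optimal default CPA_HUMAN_BUSINESS_TIME (per this article)
OPTIMAL_HUMAN_RESIDENCE_MS = 1800  # optimal CPA_HUMAN_RESIDENCE_TIME (per this article)


def auto_scale_cpa_settings(max_time_from_connect_ms: int) -> dict:
    """Derive the three overridden settings from CPA_MAX_TIME_FROM_CONNECT."""
    # Lower of CPA_MAX_TIME_FROM_CONNECT and 3000 ms.
    business = min(max_time_from_connect_ms, OPTIMAL_HUMAN_BUSINESS_MS)
    # Lower of CPA_HUMAN_BUSINESS_TIME and 1800 ms.
    residence = min(business, OPTIMAL_HUMAN_RESIDENCE_MS)
    return {
        "CPA_HUMAN_BUSINESS_TIME": business,
        "CPA_HUMAN_RESIDENCE_TIME": residence,
        # Same value as CPA_MAX_TIME_FROM_CONNECT.
        "CPA_UNKNOWN_SILENCE_TIMEOUT": max_time_from_connect_ms,
    }


if __name__ == "__main__":
    for value in (5000, 2500, 1500):
        print(value, auto_scale_cpa_settings(value))
```

With a value of 1500, both human thresholds collapse to 1500 ms, which is why HUMAN BUSINESS can no longer be returned at that setting.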

This simplified set of responses is what some applications want, since certain implementations need a response within a limited number of milliseconds, regardless of what the prediction algorithm can achieve in that time. This new setting provides predictable responses regardless of the value specified.
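To make the resulting set of responses concrete, the following Python sketch maps a measured speech duration to one of the four classifications discussed in this article. It is a simplified model built from the descriptions here, not the actual CPA algorithm. The function name and parameters are hypothetical, and treating a duration exactly equal to CPA_HUMAN_BUSINESS_TIME as UNKNOWN SPEECH is an assumption, since the article only describes the "less than" and "greater than" cases for that threshold.

```python
# Simplified model of the classification outcomes; not the real CPA algorithm.
def classify(speech_ms: int, end_of_speech_detected: bool,
             residence_ms: int, business_ms: int) -> str:
    """Return the classification once end of speech or the time limit is reached."""
    if speech_ms == 0:
        # No human speech detected before the silence timeout.
        return "UNKNOWN SILENCE"
    if not end_of_speech_detected:
        # Speech was still ongoing when the time limit was reached.
        return "UNKNOWN SPEECH"
    if speech_ms < residence_ms:
        return "HUMAN RESIDENCE"
    if speech_ms < business_ms:
        return "HUMAN BUSINESS"
    # Assumption: a duration equal to the business threshold also lands here.
    return "UNKNOWN SPEECH"


if __name__ == "__main__":
    # Thresholds as auto-scaled for CPA_MAX_TIME_FROM_CONNECT = 5000.
    print(classify(900, True, 1800, 3000))
    print(classify(2400, True, 1800, 3000))
    # Thresholds as auto-scaled for CPA_MAX_TIME_FROM_CONNECT = 1500.
    print(classify(1200, False, 1500, 1500))
```

When both thresholds are equal, as in the last call, the HUMAN BUSINESS branch can never be taken, matching the HUMAN or MACHINE view described above.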

Note that any value lower than 1000 ms for CPA_MAX_TIME_FROM_CONNECT is automatically raised to the minimum value of 1000 ms, which is the shortest period in which the algorithm can reliably make a determination.


It is important to note that although the minimum value of 1000 ms mentioned above is supported and works well, from a practical perspective, this may be below the reasonable limit for many applications, so please choose your settings wisely.
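An application that wants to know which value will actually take effect can mirror the minimum rule on its own side. The snippet below is a minimal sketch of that check; the helper name is hypothetical and the clamping itself is performed by CPA, not by this code.

```python
# Hypothetical helper mirroring the documented 1000 ms minimum.
MIN_MAX_TIME_FROM_CONNECT_MS = 1000


def effective_max_time_from_connect(requested_ms: int) -> int:
    """Return the value CPA is described as using for a requested setting."""
    return max(requested_ms, MIN_MAX_TIME_FROM_CONNECT_MS)


if __name__ == "__main__":
    for requested in (400, 1000, 2500):
        print(requested, "->", effective_max_time_from_connect(requested))
```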


When working with CPA, whether or not the new CPA_MAX_TIME_FROM_CONNECT is used, VAD_EOS_DELAY has a significant influence on accuracy.

This setting defines how much silence after someone stops talking is allowed before triggering an end-of-speech event, which in turn generates a CPA result. Having some amount of silence after someone stops talking is necessary to allow for the natural gaps and pauses between words or sentences. Not allowing enough time for these silence gaps can lead to premature triggering of the end-of-speech event. The duration of speech between start-of-speech and end-of-speech is critical to determining how much human speech there actually was, and this is used to determine whether there was a short utterance (Residence), a medium utterance (Business) or a lengthier utterance (Unknown Speech or Machine message).

As can be seen in this diagram, the purple periods of silence are normal and expected within natural speech. The End of Speech (Silence) Delay (VAD_EOS_DELAY) is used to check whether silence is one of these gaps between words, or if the person has finished speaking, at which point a result can be generated.

Specifying a longer delay here to accommodate longer silence "gaps" helps the CPA algorithm considerably, since it can better categorize the length of the utterance. However, this comes at the cost of a longer delay for the human who answered the call.
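The effect of VAD_EOS_DELAY on the measured utterance length can be illustrated with a small Python sketch. It assumes the audio has already been split into alternating speech and silence segments, which is a simplification of what a real voice activity detector does. The segment format, the function name, and the choice to count short gaps as part of the utterance are all assumptions for this example.

```python
from typing import List, Optional, Tuple

# Each segment is (kind, duration_ms), where kind is "speech" or "silence".
Segment = Tuple[str, int]


def speech_duration_ms(segments: List[Segment], vad_eos_delay_ms: int) -> Optional[int]:
    """Time from start-of-speech to end-of-speech, or None if speech never ends.

    A silence shorter than vad_eos_delay_ms is treated as a natural gap and
    counted as part of the utterance; a longer one triggers end-of-speech.
    """
    elapsed = 0
    started = False
    for kind, duration in segments:
        if kind == "speech":
            started = True
            elapsed += duration
        elif started:
            if duration >= vad_eos_delay_ms:
                return elapsed  # the trailing silence is not counted
            elapsed += duration  # short gap between words
        # Silence before the first speech segment is ignored.
    return None


if __name__ == "__main__":
    audio = [("silence", 300), ("speech", 600), ("silence", 700),
             ("speech", 900), ("silence", 1500)]
    print(speech_duration_ms(audio, 1200))  # the 700 ms gap is bridged
    print(speech_duration_ms(audio, 500))   # the 700 ms gap ends the utterance early
```

In this made-up example, the 1200 ms delay measures a 2200 ms utterance, while a 500 ms delay stops at 600 ms, which would change the classification from a medium utterance to a short one.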

To specify the default 1200ms (1.2 seconds) of VAD_EOS_DELAY, add the following to your CPA grammar.

<meta name="STREAM|VAD_EOS_DELAY" content="1200"/>

We have found that this value works well for most situations, so we recommend using this if you are unsure of which value to use.

This default value (1200ms) will automatically be used if the above setting is not explicitly specified.

For advanced users, or those more sensitive to machine-to-human misclassifications, such as call centers, raising this to a higher value (1500 ms, for example) can yield significantly better CPA accuracy, at the expense of an additional delay observed by the person answering the call. In this example, moving from the default 1200 ms to 1500 ms adds 300 ms of delay, for a total of 1500 ms (1.5 seconds) after the person stops speaking before they are routed to an agent or get some other response from the system.

Application developers should consider how much delay should be acceptable for humans being called by their applications, noting that machines don't care about these delays. There is a balance and trade-off between CPA accuracy and this delay.


Prior to version 19.1, it was important for applications to ensure these two timeout settings were set to values that would never be reached, since reaching them would cause unwanted responses from CPA.

From version 19.1 onward, when working with CPA, these two settings are assigned values 10,000 ms (10 seconds) greater than the largest CPA setting, which effectively disables them. This prevents those particular timeouts from affecting the CPA algorithm and instead allows the CPA timing settings to dominate the results. These values are applied whether or not the new CPA_MAX_TIME_FROM_CONNECT is used.
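As a rough illustration of that rule, the value assigned to the two timeouts can be computed as shown below. This is a sketch of the stated arithmetic only; the function name is hypothetical, and which settings count toward "the largest CPA setting" is assumed here to be the three timing settings discussed in this article.

```python
# Sketch of the "largest CPA setting + 10,000 ms" rule; names are hypothetical.
TIMEOUT_MARGIN_MS = 10_000


def neutralized_timeout_ms(cpa_settings: dict) -> int:
    """Value placed on each of the two timeouts so they can never fire first."""
    return max(cpa_settings.values()) + TIMEOUT_MARGIN_MS


if __name__ == "__main__":
    settings = {
        "CPA_HUMAN_RESIDENCE_TIME": 1800,
        "CPA_HUMAN_BUSINESS_TIME": 3000,
        "CPA_UNKNOWN_SILENCE_TIMEOUT": 5000,
    }
    print(neutralized_timeout_ms(settings))
```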

Using this setting

This new CPA_MAX_TIME_FROM_CONNECT setting is designed to be used for all types of CPA applications, from those requiring very short responses from the system, to more conventional requirements.

The idea behind this setting was to make CPA less complex to access, and therefore easier to implement, for all users and all use cases where a maximum time limit must be imposed before a CPA response is generated.

It is important to note that imposing a maximum time limit on CPA with this setting limits the algorithm's ability to perform certain classifications accurately. As such, this stricter approach is less accurate than the default timing approach (when NOT using CPA_MAX_TIME_FROM_CONNECT). We recommend using this new setting to configure CPA only when a time limit must be imposed on getting results from the CPA detection algorithm.

INPUT TEXT and Semantic Interpretation

When working with the new CPA_MAX_TIME_FROM_CONNECT setting, users are still free to customize their INPUT TEXT and Semantic Interpretation values as described in our Grammars in CPA and AMD article.