Dealing With Ambiguous Responses

(APRIL 2007) — Ambiguity arises in speech applications when callers say something that could be interpreted multiple ways.

Ambiguity may arise if a speaker says something that sounds very similar to two or more vocabulary items. It may also arise if callers give only partial information, e.g., saying only the first name of the person they'd like to speak to.

Using features in the LumenVox Speech Engine along with careful application design, you can deal effectively with ambiguous situations to ensure your users successfully complete their calls.

When Users Give Partial Information

If you have a call router that allows callers to be routed to a person by saying that person's name, it often happens that a caller will only say the first name. This can be fine if only one person in the directory has that first name, but what about a situation where you have three people named Jim? You might have a simple ABNF grammar that includes these rules:

$JimA = Jim [Abrams];
$JimB = Jim [Brown];
$JimC = Jim [Charles];

If a caller said "Jim," the Engine would return all three matches as separate results, and as the developer you would have to decide how to handle them.

The simplest approach is to build a fallback into your application: ask callers for the last name of the person they want to reach, and route them based on that.

Alternatively, your application could read each possibility back to the caller and let them pick, e.g., "Did you want to speak to Jim Abrams?" Be aware that this can get repetitive, especially if the list of possibilities is long. If the caller was misrecognized and actually meant "Tim," having to sit through three prompts asking about Jims is tedious.

To help alleviate this problem, you can apply some logic to the ordering. If Jim Brown is in sales and thus far more likely to receive incoming calls than the others, you might build special logic into your application to always offer his name first.
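
What that logic looks like depends entirely on your platform's API, but the ordering itself is straightforward. Here is a minimal Python sketch; the match list, the prompt() helper, and the listen_yes_no() helper are hypothetical placeholders rather than part of any LumenVox interface.

# Hypothetical sketch: confirm multiple "Jim" matches in priority order.
# prompt() and listen_yes_no() stand in for your platform's prompt-and-collect calls.

CALL_PRIORITY = {"Jim Brown": 0, "Jim Abrams": 1, "Jim Charles": 2}

def disambiguate(matches, prompt, listen_yes_no):
    """Read each candidate back to the caller, most likely person first."""
    ordered = sorted(matches, key=lambda name: CALL_PRIORITY.get(name, 99))
    for name in ordered:
        prompt("Did you want to speak to %s?" % name)
        if listen_yes_no():   # True when the caller answers "yes"
            return name
    return None               # nothing confirmed; fall back to asking for a last name

The same structure works for any priority scheme, e.g. ordering candidates by how often each extension receives calls.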

More on Disambiguation

For more information, please refer to our white paper on Disambiguating Speech Recognition Applications.

To learn more about speech application development, watch one of our training videos.

Asterisk News

Good news for users of LumenVox on Asterisk: the latest version of the LumenVox-Asterisk connector bridge contains n-best support. The latest connector bridge is always available from our FTP site. If you have misplaced your link, contact us and we'll be happy to help you out.

We've also added a new section to our Web site, the Asterisk Speech Application Zone. This section contains free speech applications for Asterisk, and information about our regular Asterisk contest for speech developers.

How to Download the Latest Release

The most recent build of our Engine contains new acoustic models that perform better in certain domains, including names, large lists of words, and noisy audio. These changes have produced improvements of up to 3 percent in some domains.

If you would like information on downloading the latest release of the LumenVox Speech Engine, please contact us. It is a free download for users with current software maintenance packages.

When Pronunciation Causes Ambiguity

Ambiguity often arises because two vocabulary entries sound similar. Imagine you had a store-locator application that prompted users for the city they were in.

If you had both "Austin" and "Boston" as possible answers, they might receive very close confidence scores when a caller asked for either one. Because they sound so similar, the Speech Engine might judge each to be equally likely to be correct. A technology called n-best can help with this situation.

Normally, the Speech Engine returns only one result: the top match (the exception is when the utterance matches more than one grammar entry, as in the call router example above). N-best gives you access to the top result plus any other close matches.

In the Austin/Boston example, using n-best would allow you to see that your caller said something that has two good matches. You could then ask the caller to disambiguate with a confirmation prompt.

Relied on too heavily, however, n-best can cause user frustration. If there are multiple n-best answers but none of them match what the caller said, it is tiring for the caller to have to reject each possibility in turn. As a rule, keep this sort of querying to a minimum.
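
One way to strike that balance is to confirm only when the top two n-best results are genuinely close, and to keep the confirmation list short. The Python sketch below assumes the n-best results arrive as (text, score) pairs and uses an illustrative score margin; both are assumptions, since the exact result format depends on the API you are calling.

# Hypothetical sketch: confirm only when the top n-best scores are close.
CLOSE_MARGIN = 50   # illustrative threshold; tune it for your grammar and audio

def pick_or_confirm(nbest):
    """Return (accepted_text, candidates_to_confirm)."""
    if not nbest:
        return None, []                          # no recognition at all
    results = sorted(nbest, key=lambda r: r[1], reverse=True)
    if len(results) < 2 or results[0][1] - results[1][1] > CLOSE_MARGIN:
        return results[0][0], []                 # clear winner: accept it silently
    # Scores are close (e.g. "Austin" vs. "Boston"): confirm, but offer at most
    # two choices so the caller never has to reject a long series of guesses.
    return None, [text for text, score in results[:2]]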

One way to avoid this situation is to weight grammar items. Because Boston is a much bigger city than Austin, your callers are probably more likely to ask for Boston. By weighting Boston more heavily than Austin in an SRGS grammar, the Engine becomes more likely to return Boston with the higher confidence score in the case of a close match.
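
In the ABNF form of SRGS, weights are written between forward slashes in front of each alternative. The values below are purely illustrative; choose weights based on your own call patterns.

#ABNF 1.0;
language en-US;
root $city;

// Boston is weighted twice as heavily as Austin for this example.
$city = /2.0/ Boston | /1.0/ Austin;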
