Converting DTMF to Speech Part 1
DTMF (Dual Tone Multi Frequency), also called TouchTone, is a very common method for interacting
with Interactive Voice Response (IVR) systems: callers press keys on the telephone's keypad to
make choices. Speech recognition offers an alternative method for interacting with IVR
systems. We're going to get into why you would want to convert a DTMF application to use
speech recognition. We'll also take a look at an example DTMF application and how the call
flow might be a little different using speech recognition. Finally, we'll look at some of the
prerequisites, the things to consider prior to getting started.
- Speech recognition makes it possible to be more engaging. Even if a good voice actor is
used with a DTMF application, users are still simply pressing keys. It will not feel like the
user is actually interacting with the voice persona. With speech recognition, users are able
to talk back to the voice persona and therefore feel more engaged. More personality and
friendliness can come across.
- Speech recognition also provides an opportunity to extend and enhance your company's
brand by making your customers feel more attached because they can talk to the application.
- Speech recognition gives you the ability to do certain things that would be difficult
to do with DTMF, such as providing more options and more global options. Global options can
be used from anywhere within the system. For example, in DTMF a global option would
be pressing zero for the operator, pressing pound for the main menu, or pressing nine to
hear the list of options again. DTMF has certain inherent limitations, such as
the number of keys on the telephone keypad. Speech recognition is not bound by the
keypad. You may allow the user to simply say "go back" to back up one
menu, or "main menu" to get to the main menu at any time. More
options may include "operator" or "repeat instructions" to hear the
list of instructions again. DTMF also tends to use a lot of hierarchical nested menus,
where you have to hear one group of menu choices in order to get to the next menu.
With speech recognition you can allow the user to jump to any menu in the application by
simply stating the name of the menu they would like to use.
- Speech also handles locations very well, which DTMF does not. With DTMF, the caller
has to key in a ZIP code or spell the first couple of letters of a city name. It's
much easier to provide a prompt that asks, "Tell me the city and state you're looking
for." The caller simply says the location and the information is provided.
- DTMF also does not do call routers particularly well. With DTMF the caller is asked
to key in the letters of a person's first or last name. This may present a problem if
the caller doesn't know how to spell the name. Another issue may occur if the person has
a common name, perhaps like John Smith. The caller keys in "Smith" and
receives a list of many people with the last name of Smith. It would be much easier to
simply say "John Smith" and receive the information for John Smith only.
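The global options and "jump to any menu" behavior described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any IVR platform's API: the names `GLOBAL_COMMANDS`, `MENUS`, and `route_utterance` are hypothetical, the menu names are invented, and the caller's speech is assumed to arrive as already-recognized text.

```python
# Hypothetical sketch: recognized text in, next action out.
# Global commands are checked first, so they work from any menu.
GLOBAL_COMMANDS = {
    "go back": "back",
    "main menu": "main_menu",
    "operator": "operator",
    "repeat instructions": "repeat",
}

# Invented menu names; a caller may say any of them at any time.
MENUS = {"billing", "store hours", "order status"}


def normalize(utterance):
    """Lowercase the utterance and collapse runs of whitespace."""
    return " ".join(utterance.lower().split())


def route_utterance(utterance, history):
    """Return an (action, target) pair for a recognized utterance.

    history is the list of menus visited so far, most recent last.
    """
    text = normalize(utterance)
    command = GLOBAL_COMMANDS.get(text)
    if command == "back":
        # Back up one menu; fall back to the main menu if there is no history.
        previous = history[-2] if len(history) >= 2 else "main menu"
        return ("goto", previous)
    if command == "main_menu":
        return ("goto", "main menu")
    if command == "operator":
        return ("transfer", "operator")
    if command == "repeat":
        return ("repeat", history[-1] if history else "main menu")
    if text in MENUS:
        # Flat navigation: jump straight to the named menu.
        return ("goto", text)
    return ("reprompt", None)
```

Because the lookup is by phrase rather than by key, adding another global option is one more dictionary entry; there is no keypad to run out of.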
To illustrate the difference between DTMF and speech, let's say that we have an umbrella
store. The user calls into our IVR and gets the DTMF application.
Application: "Thank you for calling Umbrellas-R-Us. To get store
locations, press 1, to get hours of operation, press 2, to order umbrellas over the phone,
press 3."
Caller: Inputs 3, to order an umbrella by phone.
Application: "We have 5 choices of colors for umbrellas. Press 1
for red, 2 for blue, 3 for green, 4 for yellow, and 5 for purple."
Caller: Inputs 5 to order a purple umbrella.
Application: "We have 2 sizes of umbrellas, press 1 for compact
and 2 for full-size."
Caller: Inputs 2 for full size.
At this point the task is not particularly painful, but let's perform the same task
using speech recognition.
The speech application answers the call:
Application: "Thank you for calling Umbrellas-R-Us. If you would
tell me what you wish to do, we can respond immediately. Would you like store locations,
store hours or would you like to order an umbrella?"
Caller: "Order an umbrella."
Application: "Simply tell me the color you would like. Would you
like a red, blue, yellow, green, or purple umbrella?"
Caller: "Purple."
Application: "Compact or full size?"
Caller: "Full size."
The first thing we notice is that we have eliminated the need for the "press one for,
press two for" type of prompt, which takes up time and can become repetitive and
annoying to a caller. Also, with DTMF the caller ends up trying to memorize the
listed options until they hear the choice they are looking for. With speech, you're
prompted for a choice and you simply speak your selection.
So overall, speech lets the caller get through the call and achieve their goal more
quickly and effectively.
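The two Umbrellas-R-Us call flows above can be modeled side by side to show that they collect the same order and differ only in how each choice is keyed: by digit for DTMF, by spoken phrase for speech. The Python below is a minimal sketch, not a real IVR implementation. The names `DTMF_STEPS`, `SPEECH_STEPS`, and `run_flow` are hypothetical, speech input is assumed to arrive as already-recognized text, and only the ordering path of the dialog is walked.

```python
# Each step is (slot name, {caller input: value}).
# DTMF steps are keyed by the digit the caller presses.
DTMF_STEPS = [
    ("task", {"1": "store locations", "2": "hours of operation", "3": "order umbrella"}),
    ("color", {"1": "red", "2": "blue", "3": "green", "4": "yellow", "5": "purple"}),
    ("size", {"1": "compact", "2": "full-size"}),
]

# Speech steps are keyed by the phrase the caller says.
SPEECH_STEPS = [
    ("task", {
        "store locations": "store locations",
        "store hours": "hours of operation",
        "order an umbrella": "order umbrella",
    }),
    ("color", {c: c for c in ("red", "blue", "yellow", "green", "purple")}),
    ("size", {"compact": "compact", "full size": "full-size"}),
]


def run_flow(steps, inputs):
    """Walk the steps in order, mapping each caller input to a slot value."""
    result = {}
    for (slot, choices), raw in zip(steps, inputs):
        # Normalize case and drop a trailing period from transcribed speech.
        key = raw.strip().lower().rstrip(".")
        if key not in choices:
            raise ValueError(f"unrecognized input for {slot}: {raw!r}")
        result[slot] = choices[key]
    return result


dtmf_order = run_flow(DTMF_STEPS, ["3", "5", "2"])
speech_order = run_flow(SPEECH_STEPS, ["Order an umbrella.", "Purple", "Full size."])
```

Both calls produce the same order for a full-size purple umbrella. The speech version needs no digit-to-choice mapping in its prompts, which is where the time savings in the dialog come from.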
Before you start
- Understand how speech will help your users. First and foremost, before making changes
to an application you'll want to ask the question: how will this help my users? You don't
want to add speech just for the sake of saying you now have speech. You want to add speech
to actually help your callers. With the benefits we have covered, such as more global
options, quicker call completions, and flatter, less hierarchical menus, speech can
bring a measurable benefit to the table.
- What you don't want to do is to just retrofit your existing application. You don't want
to copy DTMF with speech. For example: "Press or say one for red, press or say two for
blue." This doesn't actually help the user; a speech application built this way offers
no benefit over DTMF. What you want to do instead is restructure the application,
and not give in to pressure to duplicate DTMF.
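The difference between retrofitting and restructuring shows up directly in the prompt text. As a rough sketch (the helper names `retrofit_prompt`, `speech_prompt`, and `join_with_or` are hypothetical and not part of any speech toolkit), the Python below builds both styles of prompt from the same list of options:

```python
NUMBER_WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def retrofit_prompt(options):
    """Anti-pattern: a DTMF menu with 'or say' bolted on."""
    parts = [f"press or say {NUMBER_WORDS[i]} for {opt}" for i, opt in enumerate(options)]
    text = ", ".join(parts) + "."
    return text[0].upper() + text[1:]


def join_with_or(items):
    """Join items as 'a', 'a or b', or 'a, b, or c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + ", or " + items[-1]


def speech_prompt(options, noun):
    """Restructured prompt: the caller answers with the option itself."""
    return f"Would you like a {join_with_or(options)} {noun}?"


colors = ["red", "blue", "yellow", "green", "purple"]
retrofit = retrofit_prompt(colors)
restructured = speech_prompt(colors, "umbrella")
```

The retrofitted prompt still makes the caller map a number to each color, while the restructured prompt lets the caller answer with the color name, so the recognition vocabulary is the list of options itself.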
Another thing you'll want to do is to start simple if at all possible. Don't take a very
complex application and try to turn it into a speech application your first time out. Some
aspects of speech design are different from working with DTMF, and a speech application may
take a little longer to develop, tune, and test. If you have a smaller application it may be
best to start with it. You may also want to consider speech-enabling only a portion of your
application at first. This will give you some idea of how speech development software works,
how to troubleshoot it, and how users interact with it. From there you can start considering
revamping the entire application.