Converting DTMF to Speech Part 1 Video



  • The first video in this series goes over the reasons for converting your DTMF application to speech recognition. It includes an in-depth sample application to show the difference in the call flow for your callers, as well as things to consider before you begin your conversion-to-speech project.
  • RUNTIME 10:08


Video Transcription

Converting DTMF to Speech Part 1


DTMF (Dual Tone Multi Frequency), also called TouchTone, is a very common method for interacting with Interactive Voice Response (IVR) systems. It's basically when users press keys on the telephone's keypad to make choices. Speech recognition offers an alternative method for interacting with IVR systems. We're going to get into why you would want to convert a DTMF application to use speech recognition. We'll also take a look at an example application using DTMF and how the call flow might be a little different using speech recognition. Finally, we'll look at some of the prerequisites, some of the things to consider prior to getting started.

Why Speech?

  • It makes it possible to be more engaging. Even if a good voice actor is used with DTMF applications, users are still simply dialing keys. It will not feel like the user is actually interacting with the voice persona. With speech recognition, the users are able to talk back to the voice persona and therefore feel more engaged. More personality and friendliness can come across.
  • Speech recognition also provides an opportunity to extend and enhance your company's brand by making your customers feel more attached because they can talk to the application.
  • Speech recognition gives you the ability to do certain things that would be difficult to do with DTMF, like providing more options and more global options. Global options can be performed from anywhere within the system. For example, in DTMF a global option would be pressing zero for the operator, pressing pound for the main menu, or pressing nine to hear the list of options again. With DTMF there are certain inherent limitations, such as the number of keys on the telephone keypad. With speech recognition there are no such limitations. You may allow the user to simply say "go back" to back up one menu, or "main menu" to get to the main menu at any time. More options may include "operator," or "repeat instructions" to hear a list of all the instructions again. DTMF tends to use a lot of hierarchical nested menus, where you have to hear one group of menu choices in order to get to the next menu. With speech recognition you can allow the user to jump to any menu in the application by simply stating the name of the menu they would like to use.
  • Speech also does locations very well, which DTMF does not do well. With DTMF, you would have to type in a ZIP code or spell the first couple of letters in a city name. It's much easier to provide a prompt that asks, "Tell me the city and state you're looking for." The caller would simply say the location and the information would be provided.
  • DTMF also does not do call routers particularly well. With DTMF the caller is asked to type in the letters of a person's first or last name. This may present a problem if the caller doesn't know how to spell the name. Another issue may occur if the person has a common name, perhaps like John Smith. The caller types in "Smith" and receives a list of many people with the last name of Smith. It would be much easier to simply say "John Smith" and receive the information for John Smith only.
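
As a rough illustration of the global options and menu jumping described above, here is a minimal Python sketch that routes text a recognizer has already returned. The menu names, the `route` function, and the phrases it accepts are all hypothetical; this is not taken from any LumenVox API, and a real application would define its phrases in a recognition grammar.

```python
# Hypothetical menu names for illustration only.
MENUS = {"main menu", "store locations", "store hours", "order an umbrella"}

def route(utterance, current, history):
    """Return the menu to go to next, given recognized text.

    current -- name of the menu the caller is in now
    history -- list of previously visited menus (most recent last); modified in place
    """
    said = utterance.strip().lower()

    # Global options: valid from anywhere in the application.
    if said == "main menu":
        history.clear()
        return "main menu"
    if said == "go back":
        # Back up one menu; fall back to the main menu if there is no history.
        return history.pop() if history else "main menu"
    if said == "operator":
        return "operator"
    if said == "repeat instructions":
        return current  # stay put so the current prompt plays again

    # Flat navigation: jump straight to any menu by saying its name.
    if said in MENUS:
        if said != current:
            history.append(current)
        return said

    # Anything else is left to the current menu's own handling.
    return current
```

With DTMF, each of these commands would need its own reserved key; here the set of global phrases can grow without running out of keypad.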

To illustrate the difference between DTMF and speech, let's say that we have an umbrella store. The user calls into our IVR and gets the DTMF application.

Application: "Thank you for calling Umbrellas-R-Us. To get store locations, press 1, to get hours of operation, press 2, to order umbrellas over the phone, press 3."

Caller: Inputs 3, to order an umbrella by phone.

Application: "We have 5 choices of colors for umbrellas. Press 1 for red, 2 for blue, 3 for green, 4 for yellow, and 5 for purple."

Caller: Inputs 5 to order a purple umbrella.

Application: "We have 2 sizes of umbrellas, press 1 for compact and 2 for full-size."

Caller: Inputs 2 for full size.

At this point the task is not particularly painful, but let's perform the same task using speech.

The speech application answers the call:

Application: "Thank you for calling Umbrellas-R-Us. If you would tell me what you wish to do, we can respond immediately. Would you like store locations, store hours or would you like to order an umbrella?"

Caller: "Order an umbrella."

Application: "Simply tell me the color you would like. Would you like a red, blue, yellow, green, or purple umbrella?"

Caller: "Purple."

Application: "Compact or full size?"

Caller: "Full size."


The first thing we notice is that we have eliminated the need for the "press one for, press two for" types of prompts, which take up time and can become repetitive and annoying to a caller. Also, with DTMF, as a caller you find yourself attempting to memorize the listed options until you hear the choice you are looking for. With speech, you're prompted for a choice and you simply speak your selection.

So overall, speech allows you to get through the call process and achieve your goal more quickly and effectively.
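
The two umbrella call flows above can be sketched as data, which makes the difference visible: the DTMF flow has to map keys to choices, while the speech flow only lists the words a caller may say. This is a minimal Python sketch covering the steps after the caller has chosen to order an umbrella. The names `DTMF_FLOW`, `SPEECH_FLOW`, `run_dtmf`, and `run_speech` are hypothetical, and the speech side assumes a recognizer has already turned audio into text.

```python
# DTMF: each step maps keypad digits to choices, so prompts must announce the keys.
DTMF_FLOW = [
    ("color", {"1": "red", "2": "blue", "3": "green", "4": "yellow", "5": "purple"}),
    ("size", {"1": "compact", "2": "full size"}),
]

# Speech: each step only lists the words the caller may say.
SPEECH_FLOW = [
    ("color", {"red", "blue", "yellow", "green", "purple"}),
    ("size", {"compact", "full size"}),
]

def run_dtmf(keys):
    """Build an order from a sequence of key presses, one per step."""
    order = {}
    for (slot, keymap), key in zip(DTMF_FLOW, keys):
        order[slot] = keymap[key]  # raises KeyError for a key that is not offered
    return order

def run_speech(utterances):
    """Build an order from a sequence of recognized utterances, one per step."""
    order = {}
    for (slot, vocabulary), said in zip(SPEECH_FLOW, utterances):
        word = said.strip().lower()
        if word not in vocabulary:
            raise ValueError(f"unrecognized {slot}: {said!r}")
        order[slot] = word
    return order

print(run_dtmf(["5", "2"]))
print(run_speech(["Purple", "Full size"]))
```

Both calls produce the same order, a full-size purple umbrella. The caller's effort differs: the DTMF caller has to translate each choice into a key, while the speech caller just names it.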

Before you start

  • Understand how speech will help your users. First and foremost, before making changes to an application you'll want to ask the question: how will this help my users? You don't want to add speech just for the sake of saying you now have speech. You want to add speech to actually help your callers. With the benefits we have covered, such as more global options, quicker call completions, and flatter, less hierarchical menus, speech actually brings a measurable benefit to the table.
  • What you don't want to do is to just retrofit your existing application. You don't want to copy DTMF with speech. For example: "Press or say one for red, press or say two for blue." This doesn't actually help the user; there is no benefit over DTMF with this type of speech application. What you'd want to do is to restructure the application, and not give in to pressure to duplicate DTMF.

Another thing you'll want to do is to start simple if at all possible. Don't take a very complex application and try to turn it into a speech application your first time out. There are aspects of speech design that are different from working with DTMF, and it may take a little longer to develop, tune, and test. If you have a smaller application, it may be best to start with it. You may also want to consider speech-enabling only a portion of your application at first. This will give you some idea of how speech development software works, how to troubleshoot it, and how users interact with it. From there you can start considering revamping the entire application.

© 2018 LumenVox, LLC. All rights reserved.