Introduction to Semantic Interpretation Video



  • Semantic Interpretation, what is it and why use it? In this series, Stephen Keller covers detailed examples of the process of determining what callers mean, not what they say. Also included is a discussion of Semantic Interpretation for Grammars and DTMF.
  • RUNTIME 6:52


Video Transcription

Introduction to Semantic Interpretation

In this segment we're going to introduce you to Semantic Interpretation. If you haven't already watched our videos on SRGS grammars and you don't really know anything about them, we suggest that you take a look at those videos first. This discussion will assume that you have at least a little knowledge of working with grammars. We're going to be discussing:

  • What is Semantic Interpretation?
  • Why do you use it?
  • Semantic Interpretation in Grammar vs. in the Application

What is Semantic Interpretation?

Basically, semantic interpretation is the process of determining what a user meant, as opposed to what they literally said. What the user meant is the semantic interpretation. It's a subtle distinction, but it's very important and often misunderstood by developers who are new to working with speech recognition.

Let's say you have a call router. A customer calls in and asks to speak with technical service. A second customer calls in and asks to speak with technical support. A third customer simply asks for support. All three callers meant exactly the same thing: they are trying to get to the exact same place. They all wanted technical support, but they used three different phrases to get across the same meaning. As humans we can do semantic interpretation very well, but computers require more explicit input. Semantic interpretation allows you to take all three of those utterances and return a single result that is predictable and usable by computer code.
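As a rough illustration of that idea (not how a speech engine actually works), the Python sketch below maps the three phrases from this example to one result. The names INTERPRETATIONS and interpret, and the value "technical_support", are made up for this sketch.

```python
# Hypothetical mapping from what callers say to what they mean.
INTERPRETATIONS = {
    "technical service": "technical_support",
    "technical support": "technical_support",
    "support": "technical_support",
}


def interpret(utterance):
    """Return the single meaning for an utterance, or None if it is unknown."""
    # Normalize case and spacing so "Technical  Support" still matches.
    key = " ".join(utterance.lower().split())
    return INTERPRETATIONS.get(key)


for phrase in ("technical service", "Technical Support", "support"):
    print(phrase, "->", interpret(phrase))
```

All three phrases come back as the same value, so the code that routes the call only has to check for one result.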

Example 1:

Let's take a look at a more in depth example using a call router once again. In our SRGS grammar we have the following line:

$robertsmith = (Robert | bob) [smith]; 

This basically says that a caller can ask to talk to Robert Smith by saying Robert Smith, or Bob Smith, or just Robert or Bob without a last name. So there are several different ways to ask for the same person. In your application you won't want to account for all the different ways, because that quickly becomes tricky. However, by using semantic interpretation tags inside the grammar, you can have all of those separate utterances return Robert Smith, or perhaps a phone extension. Then, if your application gets back an extension, it can transfer the caller to that extension; if it receives Robert Smith, it can transfer the call to Robert Smith. This makes things easier for you, because you do not have to account for every variation within the application, and callers can still use a wide range of responses.
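To picture the effect of that rule, here is a small Python sketch that accepts the same four utterances and returns one value. It is only an illustration: in practice the mapping lives in the grammar as semantic interpretation tags, and the extension "1234" and the names ROBERT_SMITH and interpret_name are assumptions made for this sketch.

```python
import re

# Mirrors the grammar rule: (Robert | bob) [smith]
ROBERT_SMITH = re.compile(r"(robert|bob)( smith)?")

# Hypothetical extension; a real value would come from your own directory.
ROBERT_SMITH_EXTENSION = "1234"


def interpret_name(utterance):
    """Return the extension if the utterance matches the rule, else None."""
    text = " ".join(utterance.lower().split())
    if ROBERT_SMITH.fullmatch(text):
        return ROBERT_SMITH_EXTENSION
    return None


for said in ("Robert Smith", "Bob Smith", "Robert", "Bob"):
    print(said, "->", interpret_name(said))
```

Whichever of the four phrases the caller uses, the application only ever sees the one extension.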

Example 2:

Another way to use semantic interpretation is to control and format the kind of output from the speech engine to the application. Let's say you have a prompt that asks the caller for a PIN (a series of digits), so the caller responds, "1, 2, 3, 4."

Well, the speech engine understands words and not really numbers. It sees numbers as words and not digits. So it will return to you the words "one, two, three, four." You could perform transformations within the application to replace the word "one" with the digit 1, and so on. However, using semantic interpretation you can turn those words into digits before the result even reaches the application, which is quite handy because your application is going to want to work with just digits. Likewise, suppose a caller says a longer number, such as "one thousand ninety four." What you want back is not the words one thousand ninety four but the digits 1094, and you can do that with semantic interpretation.
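The two conversions described here can be sketched in Python as follows. This is a simplified illustration of the transformation, not the engine's implementation; the function names digits_from_words and number_from_words are invented for this sketch, and it only handles plain English numbers up to the thousands.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def _words(utterance):
    """Lowercase the utterance and split it into words, ignoring punctuation."""
    cleaned = utterance.lower().replace(",", " ").replace("-", " ").replace(".", " ")
    return cleaned.split()


def digits_from_words(utterance):
    """Turn a spoken digit string such as "one, two, three, four" into "1234"."""
    result = []
    for word in _words(utterance):
        if word not in UNITS or UNITS[word] > 9:
            raise ValueError("not a single digit: " + word)
        result.append(str(UNITS[word]))
    return "".join(result)


def number_from_words(utterance):
    """Turn a spoken number such as "one thousand ninety four" into 1094."""
    total = 0    # value of the completed thousands group
    current = 0  # value of the group still being built
    for word in _words(utterance):
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError("not a number word: " + word)
    return total + current


print(digits_from_words("one, two, three, four"))
print(number_from_words("one thousand ninety four"))
```

The first function treats each word as its own digit, which suits a PIN; the second adds the words up into a single value, which suits a quantity.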

So hopefully you're starting to get an idea as to why semantic interpretation is so important and why you need to always do it. One thing new developers are often unsure about is why they should put it in the grammar when they can put it in the application.

Technically you could. Within an application you can account for all the different utterances and phrases your callers are likely to say and translate them into a standard output. However, if you really think about it, what are you doing? You're recreating the grammar, because the grammar already contains all of the words and phrases that you expect the caller to say and that you want to have recognized. Putting that grammar knowledge into the application code creates some issues.

First of all, it's simply not elegant: you'll want to keep this kind of data separate from your application logic. The other big problem is that it's difficult to keep the two in sync. You're going to be making changes to your grammar as you tune your application: adding entries, changing entries, those sorts of things. If you have to make these updates in two places, that makes it difficult to keep things in sync. If an utterance in your grammar is not interpreted properly in your code because you forgot to update the code, you get strange results and you've got a mess on your hands. You simply want to stay away from that if at all possible. By using semantic interpretation within a grammar, you get nice, clean, predictable output to your application and much smoother functionality.


There is another interesting case, and that's working with DTMF, which stands for Dual Tone Multi Frequency and is sometimes called TouchTone. It's the sound that telephone keys make when you press them. Our speech engine does not handle DTMF tones by itself; your telephony platform has to decode those tones. But once the telephony platform has decoded the tones, you can pass the results of what numbers were pressed to our engine. Our engine can then run them through a grammar and perform semantic interpretation. So if you have a prompt that allows a user to press or say a number, you can have the same semantic interpretation returned regardless of whether they said one or pressed one. This is good because in your application you do not have to write separate code for handling DTMF input and speech input when they perform the same function. You simply check the semantic interpretation from the engine and go on your way. So that covers the introduction to Semantic Interpretation for now.
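As a rough picture of the DTMF case described above, the Python sketch below returns the same result whether the caller said a number or pressed it. Everything in it is hypothetical: the three-option menu, the mode labels "voice" and "dtmf", and the names interpret_menu_choice and route are assumptions for illustration, not part of any engine API.

```python
# Hypothetical three-option menu.
SPOKEN_OPTIONS = {"one": "1", "two": "2", "three": "3"}
DTMF_OPTIONS = {"1", "2", "3"}


def interpret_menu_choice(raw, mode):
    """Return the same value for a spoken option and a pressed key."""
    if mode == "dtmf":
        # The telephony platform has already decoded the tone into a key.
        return raw if raw in DTMF_OPTIONS else None
    if mode == "voice":
        return SPOKEN_OPTIONS.get(raw.strip().lower())
    raise ValueError("unknown input mode: " + mode)


def route(choice):
    """Application logic sees only the interpretation, not how it was entered."""
    destinations = {"1": "sales", "2": "support", "3": "billing"}
    return destinations.get(choice, "operator")


print(route(interpret_menu_choice("one", "voice")))
print(route(interpret_menu_choice("1", "dtmf")))
```

Because both input paths produce the same value, the routing function needs no branch for how the caller answered.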

Next Section

In the next video we're going to really roll up our sleeves and get into the nitty gritty of Semantic Interpretation for Speech Recognition. That is the W3C standard that you can use to put semantic interpretation tags within your grammar. We'll cover the syntax, introduce you to working with the tags, and give you tips on how best to use them in Semantic Interpretation part 2.

© 2016 LumenVox, LLC. All rights reserved.