Like any industry, the speech recognition industry has plenty of jargon you may not know before you enter it, so we would like to define the most commonly used terms.
In this first section we'll focus on general telephony speech recognition jargon. In the next paper we'll get into more specific terms.
IVR: Stands for Interactive Voice Response. It's a way for a caller to interact with a computer over the telephone or VoIP, so instead of talking to a person, the caller speaks with a computer.
DTMF: Stands for Dual-Tone Multi-Frequency. These are the tones that your telephone keypad makes. It's also commonly referred to as Touch-Tone; however, DTMF is the technical telephony term, and you'll hear it a lot when converting a DTMF application into a speech application.
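
To make the "dual tone" part concrete, here is a minimal Python sketch of the standard keypad layout, in which each key sends one row tone and one column tone at the same time. The function name `dtmf_frequencies` and the table names are illustrative only and are not taken from any particular telephony library.

```python
# Standard DTMF layout: every row and every column has its own tone (in Hz).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)

# Keys arranged as on a telephone keypad; A-D form the rarely seen fourth column.
KEYPAD = (
    "123A",
    "456B",
    "789C",
    "*0#D",
)


def dtmf_frequencies(key):
    """Return the (row tone, column tone) pair in Hz for a single keypad key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


low, high = dtmf_frequencies("5")
print(f"Key 5 sends {low} Hz and {high} Hz together")
```

A DTMF application listens for these tone pairs to work out which key was pressed, whereas a speech application listens to what the caller says.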
ASR: Stands for Automatic Speech Recognition. You'll also hear this called a Speech Recognition Engine, or SRE. This is software that converts spoken words into text. It doesn't mean that the application will understand the meaning of those words; the engine simply takes the spoken words and turns them into text for the application.
Voice Recognition: Commonly confused with speech recognition. The difference between the two is that speech recognition is more generic: all speakers are recognized the same way. Voice recognition focuses on a single speaker's voice. It's most commonly used for speaker verification in biometric or security applications, in order to verify that you are who you say you are. The two terms are often used interchangeably, but their meanings are quite different, so it's important to clarify when unsure.
TTS: Stands for Text-to-Speech. This is the opposite of ASR: it's the process of changing text into the spoken word. It's commonly used to read out information from a real-time database, for example in an application that provides callers with current weather information. It's not very convenient to record every possible weather condition, so letting a TTS application convert ever-changing weather updates from text to speech is a more viable solution.
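
The weather example can be sketched in a few lines of Python. This only builds the text that would be handed to a TTS engine; it does not call a real engine, and the function `weather_prompt`, the field names, and the sample record are all hypothetical stand-ins for a live weather database.

```python
def weather_prompt(city, condition, temp_f):
    """Build the sentence that a TTS engine would be asked to speak."""
    return (
        f"The current weather in {city} is {condition}, "
        f"with a temperature of {temp_f} degrees."
    )


# Hypothetical record standing in for a row from a real-time weather database.
latest = {"city": "Austin", "condition": "partly cloudy", "temp_f": 78}

text = weather_prompt(**latest)
print(text)
```

Because the sentence is assembled from data at call time, nothing has to be re-recorded when the weather changes; only the text passed to the TTS engine differs.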
VUI: Stands for Voice User Interface. Although this may seem a little confusing, it is simply how you navigate through a speech recognition system using verbal commands and responses. It can be compared to a GUI, which is a graphical user interface. A GUI allows you to navigate your computer visually, using pictures and buttons that you operate with your mouse. A VUI allows you to navigate verbally only.
MRCP: Stands for Media Resource Control Protocol. As the name implies, this is a way of controlling media resources such as your speech engine. Your telephony platform uses MRCP to communicate with the speech engine on behalf of the speech application. MRCP is a standard set of rules for communication and comes in two versions, MRCP version 1 and version 2. These two versions are very different and not compatible with one another.
VXML: Stands for VoiceXML, where XML is eXtensible Markup Language. VXML is a standard language used to write voice applications. It's used to control your telephone platform and speech engine. Since it is a standard, there are certain rules associated with VXML, so if you're VXML compliant, it is easier to move to new speech software as long as everything stays within the standard.
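
To give a feel for what VXML looks like, the Python sketch below holds a very small VoiceXML 2.0 document in a string and parses it with the standard library to confirm it is well-formed XML. The form id and prompt wording are made up for illustration, and a real platform would fetch and interpret the document itself rather than going through Python.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"

# A minimal voice application: one form that speaks a single prompt.
DOCUMENT = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <block>
      <prompt>Welcome to the weather line.</prompt>
    </block>
  </form>
</vxml>
"""

# Parsing fails with an error if the markup is not well-formed.
root = ET.fromstring(DOCUMENT)

# Collect the text of every prompt the caller would hear.
prompts = [p.text for p in root.iter(f"{{{VXML_NS}}}prompt")]
print(prompts)
```

Parsing only checks that the XML is well-formed; whether a document is truly VXML compliant depends on the rules in the VoiceXML standard and on the platform that runs it.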