Video: Understanding Speech Recognition Technologies



  • When selling speech recognition for Asterisk, it is important to understand the technologies that drive it. This video introduces a number of the fundamental technologies and standards behind speech recognition and Asterisk. It covers how the LumenVox Speech Engine works, how it integrates with the Asterisk dialplan, and how you can use standards such as VoiceXML to build speech-enabled applications for Asterisk. The video also covers how LumenVox licenses its Speech Engine and offers many practical tips for deploying speech applications.
  • Licensing options are continuously being updated, so please contact LumenVox for current information.
  • Runtime: 16:10


Video Transcription

Understanding Speech Recognition Technologies

Today we will talk about the main technical areas of speech recognition.

LumenVox Speech Engine

  • Speaker independent
  • Dynamic language and grammar loading
  • Standards-based
  • Windows and Linux support
  • Built-in noise reduction models


Grammars tell the Speech Engine what words or phrases to listen for. They need to cover all expected responses.

Example grammar entries for a call router: Operator, Main Menu, Sales

Dynamically loaded vs. static grammars: Static grammars are loaded once when the application starts and remain available for every call. An example would be a City / State grammar for a store locator: the caller says the city and state where they want to find a store, and the system returns information for the store in that city. A dynamic grammar, by contrast, is loaded while the application is running. For example, when a customer calls in, the system recognizes the caller ID and loads a contact list specific to that caller. This application is known as voice dialing.
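To make the dynamic case concrete, here is a minimal sketch that builds a grammar for one call from the caller's contact list. It assumes the grammar is written in the W3C SRGS XML format; the contact table, the caller ID, and the function name `build_contact_grammar` are hypothetical and not part of any LumenVox or Asterisk API. How the generated grammar is handed to the Engine is left out, since that depends on your integration.

```python
from typing import Dict, List, Optional
from xml.sax.saxutils import escape

# Hypothetical stand-in for a real contact database, keyed by caller ID.
CONTACTS_BY_CALLER_ID: Dict[str, List[str]] = {
    "6195550100": ["Mom", "Office", "Doctor Smith"],
}


def build_contact_grammar(caller_id: str) -> Optional[str]:
    """Return an SRGS XML grammar of this caller's contacts, or None if unknown."""
    contacts = CONTACTS_BY_CALLER_ID.get(caller_id)
    if not contacts:
        return None  # unknown caller: the application falls back to a static grammar
    # One <item> per contact; escape() protects names containing &, <, or >.
    items = "\n".join(f"      <item>{escape(name)}</item>" for name in contacts)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"\n'
        '         xml:lang="en-US" mode="voice" root="contacts">\n'
        '  <rule id="contacts" scope="public">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>\n"
    )


print(build_contact_grammar("6195550100"))
```

A static grammar such as City / State would be built or loaded once at startup instead; only the per-caller grammar needs to be generated on each call.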

The LumenVox Speech Engine can process up to 12,000 grammars. If you are building a directory for a company, enter each person's first name, last name, and, where applicable, nickname; each of these entries would be considered a grammar.

Application Tuning

Tuning is one of the most important aspects of speech recognition because it optimizes application accuracy. Speech development is an iterative process: you create an initial version, deploy it, analyze the results, fix problems, re-deploy, and repeat. Fixes can include changes or improvements to the application, such as adding alternate pronunciations. Without a tuning method, up to 40% of development time may be spent after the initial deployment; the Speech Tuner can cut that time by half or more.
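The "analyze" step of that cycle usually starts by comparing what callers actually said with what the engine recognized. The sketch below computes per-grammar accuracy from such a comparison. It is only an illustration of the idea, not how the Speech Tuner works internally; the `Decode` record and the log contents are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Decode:
    grammar: str     # grammar active for this interaction
    transcript: str  # what the caller actually said (human transcription)
    result: str      # what the engine recognized


def accuracy_by_grammar(decodes: Iterable[Decode]) -> Dict[str, float]:
    """Fraction of decodes per grammar whose result matches the transcript."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for d in decodes:
        totals[d.grammar] += 1
        # Case- and whitespace-insensitive exact match; a fuller analysis
        # might also score semantic results and out-of-grammar speech.
        if d.result.strip().lower() == d.transcript.strip().lower():
            correct[d.grammar] += 1
    return {g: correct[g] / totals[g] for g in totals}


# Hypothetical log of recognitions gathered after a deployment.
log = [
    Decode("call_router", "sales", "sales"),
    Decode("call_router", "operator", "operator"),
    Decode("call_router", "main menu", "sales"),
    Decode("city_state", "San Diego California", "San Diego California"),
]
print(accuracy_by_grammar(log))
```

A grammar with low accuracy is a candidate for a fix, such as an alternate pronunciation; after re-deploying, the same measurement shows whether the fix helped.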

Deployment Options

There are two ways to deploy the Speech Engine with an Asterisk system.

On the Asterisk server: If you need to handle a small number of simultaneous calls, you can install the Engine directly on the Asterisk server, so a single box contains everything Asterisk needs to process the calls. If your environment must handle more simultaneous calls, you can later scale out from the Asterisk server, or start with a distributed architecture from the outset.

Distributed Architecture with Load Balancing: The Engine has built-in support for a distributed architecture, with load balancing included. In this setup, call handling stays on your Asterisk system, while speech servers are distributed across your network. The load-balancing module automatically determines which engine is currently available to run a decode, so you can handle more calls by adding speech servers.
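The presentation does not say which policy the load-balancing module uses to pick an engine. As one common possibility, the sketch below assumes a least-loaded policy: send each decode to the online server with the lowest utilization. The `SpeechServer` fields, host names, and per-server capacities are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechServer:
    host: str
    capacity: int       # hypothetical: max simultaneous decodes on this server
    active: int = 0     # decodes currently running on this server
    online: bool = True


def pick_server(servers: List[SpeechServer]) -> Optional[SpeechServer]:
    """Least-loaded selection: the online server with the lowest utilization."""
    candidates = [s for s in servers if s.online and s.active < s.capacity]
    if not candidates:
        return None  # every server is busy or down; the caller must wait or retry
    return min(candidates, key=lambda s: s.active / s.capacity)


pool = [
    SpeechServer("speech1.example.local", capacity=10, active=8),
    SpeechServer("speech2.example.local", capacity=10, active=3),
    SpeechServer("speech3.example.local", capacity=20, active=0, online=False),
]
chosen = pick_server(pool)
if chosen is not None:
    chosen.active += 1  # the decode now occupies a slot on that server
    print("decode sent to", chosen.host)
```

Using utilization rather than the raw count of active decodes keeps larger servers from being underused when capacities differ.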

Port licensing: Every concurrent call handled by the Engine requires a speech port. This resource-based licensing is economical for you, your partners, and end users, because a port only needs to be active while speech is being recognized.
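If a port is only held while a recognition is in progress, the number of ports you need is the peak number of overlapping recognitions, which can be lower than the number of connected calls. The sketch below computes that peak with a simple sweep over start and end times. It assumes exactly that per-recognition usage model; the timings are invented, and actual licensing terms should be confirmed with LumenVox.

```python
from typing import Iterable, Tuple


def peak_ports(recognitions: Iterable[Tuple[float, float]]) -> int:
    """Maximum number of recognitions active at the same instant.

    Each recognition is a (start, end) time in seconds, with end exclusive.
    """
    events = []
    for start, end in recognitions:
        events.append((start, 1))   # a recognition begins: a port is taken
        events.append((end, -1))    # it ends: the port is released
    # Sorting (time, delta) puts -1 before +1 at equal times, so a recognition
    # that starts exactly when another ends can reuse the same port.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical timings: three calls stay connected for a full minute,
# but each one only recognizes speech for a few seconds at a time.
call_a = [(0, 3), (30, 33)]
call_b = [(5, 8)]
call_c = [(2, 4)]
print(peak_ports(call_a + call_b + call_c))  # prints 2
```

Here three calls are connected at once, but at most two recognitions ever overlap (call A and call C between seconds 2 and 3), so two ports would cover this traffic under the stated assumption.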

Port density (the number of ports a single server can support) depends on:

  • Single or dual processor
  • RAM
  • Grammar size
  • Single interaction length

© 2018 LumenVox, LLC. All rights reserved.