Distributed Architecture for Speech Applications

Speech recognition applications in telephony environments place a heavy load on processors and are often mission critical, so downtime is not an option. A distributed architecture is recommended to address both concerns. Although "distributed architecture" simply means that the computing demand is split across multiple servers, it brings a variety of benefits to the deploying enterprise.

Why Distributed Architecture?

So what are the main advantages of setting up your speech solution in a distributed way?

  • Redundancy
    If all of your speech processing, licensing, and incoming or outgoing trunks run through the same server, you have all of your eggs in one basket. A distributed architecture avoids a single point of failure and the risk of your application going offline.
  • Performance
    You can achieve a more robust application with improved performance by dedicating different servers to different tasks. For example, your speech recognition server is separate from your application server, which is separate from your licensing server. This removes competition for processing power and lets each server focus on a single task.
  • Ability to Scale
    As demand for your speech solution grows, whether through call volume or speech recognition use, your architecture can scale with it. Handling the growth can be as simple as adding another application server. Upgrading is also more efficient if your application is set up in a distributed way from the start.

Basic Setup

First off, there are applications and environments where creating a distributed architecture is not necessary. The diagram below shows a typical setup of a telephony application using speech recognition in a non–distributed way. The Application Server communicates with a single Speech Server that contains all of the necessary components that manage licensing and speech resources. This architecture can be appropriate for smaller port densities.

Diagram 1: Basic Setup of Speech Application.

Distributed Architecture Case Example

To illustrate these benefits and explain how this architecture works in a real-world environment with the LumenVox speech recognition software, let's discuss a hosted IP–PBX solution from Ontelnet.

Ontelnet, a full service provider of communication services for residential and business customers, decided to add speech recognition functionality to its offerings. Because of high call volume, they knew they would need to deploy a distributed architecture.

Diagram 2: Ontelnet's Distributed Architecture.

Ontelnet needed not only multiple Speech Servers but also multiple servers to accept incoming SIP trunks and host the company's applications. When speech recognition resources are required, each of the Ontelnet servers can communicate with the Speech Recognition Servers to enlist speech resources.

Diagram 3: Multiple Ontelnet Servers Communicate with Multiple Speech and License Servers.

Architectural Components

In this example, the architecture has four components:

  • Speech Server
    The Speech Engine server is a program that performs the actual speech recognition. It processes the incoming audio, compares a speaker's utterance to the phrases in the active grammars, and returns the results of the audio decode to the speech client.
  • Speech Client
    The LumenVox client is a piece of software that sits between the speech–enabled Ontelnet application and the speech server. It passes the audio from the application to the engine, and returns the decode information from the server back to the application.
  • Server Monitor
    The server monitor is a component of the client process that coordinates requests between the client and the servers. Each time a client has audio it needs decoded, it asks the server monitor which speech server to use for that decoding request. The monitor tells the client which server to use, giving it the server that will be able to complete the decoding process in the shortest time. This allows a mix of fast and slow servers to be deployed in a cluster. The faster servers will handle a larger share of the requests, but all servers will be used as efficiently as possible to keep the time needed to handle each request about the same. Each client's server monitor acts independently of other clients, so there is no single failure point, but the algorithm used ensures all clients share resources fairly.

    The monitor continuously watches the speech servers, so it can remove a server from its list of valid servers if that server goes down, or reduce the load on impacted servers. If a speech server later comes back online, the monitor recognizes this and begins sending client requests to that server once more. If a client request fails in midstream (the server fails in the middle of a request for any reason), the entire request can be sent to another server instead. This gives a very high degree of uptime, even for requests in progress at the exact moment a speech server fails or is taken offline.

  • License Server
    The License Server manages the pool of Speech Engine licenses. When the Ontelnet application opens a new speech port, the speech client requests a license from the License Server. If a license is available, the License Server assigns it to that client until the speech port is closed and the client releases the license. Because licensing is handled on the Speech Client, there is no limit on the number of speech servers deployed. If the current configuration is overloading the speech servers, a new speech server can be added to the cluster without worrying about licensing issues. The speech cluster configuration (the list of speech servers a specific client is using) is also independent of licensing, so a single license server can be used for multiple speech clusters, and the cluster configuration can be altered at any time without regard to licensing.
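
The Server Monitor behavior described above (choosing the server expected to finish soonest, then resending the whole request elsewhere if a server fails) can be sketched roughly as follows. This is only an illustration, not the LumenVox implementation: the class names, the `ms_per_request` speed estimate, and the `send` callback are all hypothetical, and the periodic health check that would bring a recovered server back online is omitted.

```python
from dataclasses import dataclass


@dataclass
class SpeechServer:
    name: str
    ms_per_request: int        # assumed average decode time on this host
    active_requests: int = 0   # requests this client currently has in flight
    online: bool = True

    def estimated_finish(self) -> int:
        # A new request waits behind the in-flight ones, then runs itself.
        return (self.active_requests + 1) * self.ms_per_request


class ServerMonitor:
    """Per-client monitor; each client holds its own independent instance."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick_server(self) -> SpeechServer:
        candidates = [s for s in self.servers if s.online]
        if not candidates:
            raise RuntimeError("no speech servers available")
        # Faster servers are picked more often, but slow ones still get work.
        return min(candidates, key=SpeechServer.estimated_finish)

    def decode(self, audio, send):
        # `send(server, audio)` is a hypothetical transport callback that
        # raises ConnectionError if the server fails mid-request.
        while True:
            server = self.pick_server()
            server.active_requests += 1
            try:
                return send(server, audio)
            except ConnectionError:
                # Drop the failed server and resend the entire request.
                server.online = False
            finally:
                server.active_requests -= 1
```

With one fast and one slow server in the list, repeated calls to `pick_server()` favor the fast host while still giving the slow one a share of the work, which matches the mixed-speed cluster behavior described for the monitor.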

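The License Server's role can be pictured as a simple counted pool that is keyed by speech port and knows nothing about which speech servers exist. The sketch below is a minimal illustration under that assumption; the `LicenseServer` class and its `acquire` and `release` methods are hypothetical names, not the LumenVox API.

```python
class LicenseServer:
    """Hypothetical license pool: one license per open speech port."""

    def __init__(self, total_licenses: int):
        self.total_licenses = total_licenses
        self.in_use = {}  # port id -> name of the client holding the license

    def acquire(self, client: str, port_id: str) -> bool:
        # Called when a client opens a speech port.
        if port_id in self.in_use:
            return True   # this port already holds a license
        if len(self.in_use) >= self.total_licenses:
            return False  # pool exhausted; the port cannot be opened
        self.in_use[port_id] = client
        return True

    def release(self, port_id: str) -> None:
        # Called when the speech port is closed.
        self.in_use.pop(port_id, None)


pool = LicenseServer(total_licenses=2)
pool.acquire("client-a", "port-1")
pool.acquire("client-b", "port-2")
```

Because the pool tracks only ports and clients, adding or removing speech servers never touches it, which is why one license server can back several speech clusters.
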
Conclusion

Using a distributed model for speech–based applications is feasible, and it is required for effective load distribution in high call volume applications. It also addresses the needs of redundancy, performance, scalability and efficient upgrading.

If you have any further questions about how to set up a distributed architecture for your speech solution, please contact the LumenVox support department and we'll be happy to help.

For more information on the Speech Engine, see the LumenVox Speech Engine page.

KEY TAKEAWAYS

  • Implementing a Distributed Architecture for Speech Recognition applications is feasible and allows for mission–critical redundancy.
  • A Distributed Architecture addresses performance and scalability issues in your deployment by dedicating servers to specific tasks and letting you add servers as demand grows.