Distributed Architecture for Speech Applications
Speech recognition applications in telephony environments not only place a heavy load on processors
but are also mission critical, so downtime is not an option. To address these requirements,
a distributed architecture is recommended. Although a distributed architecture simply means that the
computing demand is split across multiple servers, it delivers a variety of benefits to the
deploying enterprise.
Why Distributed Architecture?
So what are the main advantages of setting up your speech solution in a distributed way?
-
Redundancy
If all of your speech processing, licensing, and incoming or outgoing trunks run through the same
server, you truly have all of your eggs in one basket. A distributed architecture avoids a single point
of failure and reduces the risk of your application going offline.
-
Performance
You can achieve a more robust application with improved performance by dedicating different servers
to different tasks. For example, your speech recognition server is separate from your application
server, which is separate from your licensing server. This removes competition for processing power
and lets each server concentrate on a single task.
-
Ability to Scale
As demand for your speech solution grows, through either call volume or speech recognition use, your
deployment can grow with it. Scaling can be as simple as adding another application server to your
architecture to handle the additional load. Upgrades are also more efficient if your application
is set up in a distributed way from the start.
Basic Setup
First off, there are applications and environments where creating a distributed architecture is not
necessary. The diagram below shows a typical setup of a telephony application using speech recognition
in a non-distributed way. The Application Server communicates with a single Speech Server that contains
all of the necessary components that manage licensing and speech resources. This architecture can be
appropriate for smaller port densities.
Distributed Architecture Case Example
To illustrate these benefits, and to explain specifically how this architecture works in a real-world
environment with the LumenVox speech recognition software, let's discuss a hosted IP-PBX solution from
Ontelnet.
Ontelnet, a full-service provider of communication services for residential and business customers,
decided to add speech recognition functionality to its offerings. Because of its high call volume, the
company knew it would need to deploy a distributed architecture.
In addition, Ontelnet needed not only multiple Speech Servers but also multiple servers to accept
incoming SIP trunks and host the company's applications. When speech recognition resources are
required, each of these Ontelnet servers communicates with the Speech Servers to enlist speech
resources.
Architectural Components
In this example, the architecture has four components:
-
Speech Server
The Speech Engine server is a program that performs the actual speech recognition. It processes
the incoming audio, compares a speaker's utterance to the phrases in the active grammars, and
returns the results of the audio decode to the speech client.
-
Speech Client
The LumenVox client is a piece of software that sits between the speech-enabled Ontelnet application
and the speech server. It passes the audio from the application to the engine, and returns the decode
information from the server back to the application.
-
Server Monitor
The server monitor is a component of the client process that coordinates requests between the client
and the servers. Each time a client has audio it needs decoded, it asks the server monitor which
speech server to use for that decoding request. The monitor tells the client which server to use,
giving it the server that will be able to complete the decoding process in the shortest time. This
allows a mix of fast and slow servers to be deployed in a cluster. The faster servers will handle
a larger share of the requests, but all servers will be used as efficiently as possible to keep
the time needed to handle each request about the same. Each client's server monitor acts
independently of other clients, so there is no single point of failure, but the algorithm used ensures
all clients share resources fairly.
The monitor continuously watches speech servers, so it can remove one from its list of valid
servers if a server goes down, or reduce load on impacted servers. If a speech server later
goes back online, the monitor will recognize this and begin sending client requests to that
server once more. If a client request fails midstream (the server fails in the middle of a
request for any reason), the entire request can be sent to another server instead. This provides
a very high degree of uptime, even for requests in progress at the exact moment a speech server
fails or is taken offline.
-
License Server
The License Server manages the pool of Speech Engine licenses. When the Ontelnet application opens
a new speech port, the speech client requests a license from the License Server. If there is an
available license, the License Server assigns it to that client until the speech port is closed and
the client releases the license. Because licenses are assigned to the speech client rather than to
individual speech servers, there is no limit on the number of speech servers deployed. If the current
configuration is overloading the speech servers, a new speech server can be added to the cluster without
worrying about licensing. Also, the speech cluster configuration (the list of speech servers a specific
client is using) is independent of licensing, so a single License Server can serve multiple speech
clusters, and, again, the cluster configuration can be altered at any time without regard to licensing.
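To make the Server Monitor's behavior concrete, here is a minimal Python sketch of the selection and failover logic described above. It is a simplified, hypothetical model, not the LumenVox client API: the `SpeechServer` and `ServerMonitor` classes, the relative `speed` factor, the `queued_work` estimate, and the `crashes` flag used to simulate a midstream failure are all assumptions made for illustration. The monitor estimates when each online server would finish a new request, sends the request to the earliest finisher, and resends the whole request to another server if the chosen one fails.

```python
from dataclasses import dataclass


class ServerDown(Exception):
    """Raised when a simulated speech server fails in the middle of a request."""


@dataclass(eq=False)
class SpeechServer:
    name: str
    speed: float               # assumed relative decode speed; higher is faster
    queued_work: float = 0.0   # assumed seconds of audio already assigned
    online: bool = True        # the monitor's view of whether the server is up
    crashes: bool = False      # simulates a server that fails mid-request

    def estimated_finish(self, audio_seconds: float) -> float:
        # Time to work through the existing queue plus this request.
        return (self.queued_work + audio_seconds) / self.speed

    def decode(self, audio_seconds: float) -> str:
        if self.crashes:
            raise ServerDown(self.name)
        # Simplification: work is only ever added here, never drained.
        self.queued_work += audio_seconds
        return f"decoded on {self.name}"


class ServerMonitor:
    """Hypothetical per-client monitor that picks a server for each request."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, audio_seconds: float) -> SpeechServer:
        # Only servers currently marked online are candidates; setting
        # online back to True returns a recovered server to rotation.
        candidates = [s for s in self.servers if s.online]
        if not candidates:
            raise RuntimeError("no speech server available")
        return min(candidates, key=lambda s: s.estimated_finish(audio_seconds))

    def decode(self, audio_seconds: float) -> str:
        while True:
            server = self.pick(audio_seconds)
            try:
                return server.decode(audio_seconds)
            except ServerDown:
                # The server failed midstream: take it out of rotation and
                # resend the entire request to the next-best server.
                server.online = False


fast = SpeechServer("fast", speed=2.0)
slow = SpeechServer("slow", speed=1.0)
monitor = ServerMonitor([fast, slow])
results = [monitor.decode(3.0) for _ in range(6)]
print(results.count("decoded on fast"), results.count("decoded on slow"))  # prints: 4 2
```

With one server twice as fast as the other and six equal requests, the fast server handles four and the slow server two, mirroring the "larger share" behavior described for mixed clusters. The sketch models a single client only; it does not attempt to reproduce the algorithm that keeps independent clients sharing resources fairly.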
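The license flow can be sketched the same way. Again, this is a hypothetical model rather than the LumenVox License Server interface: `LicenseServer`, `SpeechClient`, the integer license count, and the string port IDs are assumptions for illustration. It shows a license being checked out when a speech port opens and returned when the port closes, while each client's list of speech servers is kept separately, so adding a server to a cluster neither consumes nor frees a license.

```python
import threading


class LicenseServer:
    """Hypothetical pool of speech-port licenses shared by many clients."""

    def __init__(self, total_licenses: int):
        self.available = total_licenses
        self.holders = set()           # port IDs currently holding a license
        self._lock = threading.Lock()  # clients may request licenses concurrently

    def acquire(self, port_id: str) -> bool:
        # Called when the application opens a new speech port.
        with self._lock:
            if port_id in self.holders:
                return True            # this port already holds a license
            if self.available == 0:
                return False           # pool exhausted; port cannot open
            self.available -= 1
            self.holders.add(port_id)
            return True

    def release(self, port_id: str) -> None:
        # Called when the speech port closes; the license returns to the pool.
        with self._lock:
            if port_id in self.holders:
                self.holders.remove(port_id)
                self.available += 1


class SpeechClient:
    """Hypothetical client: licensing and cluster membership are independent."""

    def __init__(self, license_server: LicenseServer, cluster):
        self.license_server = license_server
        self.cluster = list(cluster)   # speech servers this client sends audio to

    def open_port(self, port_id: str) -> bool:
        return self.license_server.acquire(port_id)

    def close_port(self, port_id: str) -> None:
        self.license_server.release(port_id)

    def add_speech_server(self, name: str) -> None:
        # Growing the cluster never touches the license pool.
        self.cluster.append(name)


# Two clusters share one license server; port IDs are assumed to be unique.
pool = LicenseServer(total_licenses=2)
client_a = SpeechClient(pool, cluster=["sr1"])
client_b = SpeechClient(pool, cluster=["sr2", "sr3"])

opened = [client_a.open_port("a-1"), client_b.open_port("b-1"), client_a.open_port("a-2")]
print(opened)                    # prints: [True, True, False]
client_a.close_port("a-1")       # license returns to the pool
print(client_a.open_port("a-2")) # prints: True
client_b.add_speech_server("sr4")
print(pool.available)            # prints: 0
```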
Conclusion
Using a distributed model for speech-based applications is a feasible architecture, and it is truly
required for effective load distribution in high-call-volume applications. It also addresses the needs
of redundancy, performance, scalability, and efficient upgrading.
If you have any further questions about how to set up a distributed architecture for your speech solution,
please contact the LumenVox support department at support@lumenvox.com and we'll be happy to help.
For more information on the Speech Engine, see the LumenVox Speech Engine page.
Key Takeaways
-
Implementing a distributed architecture for speech recognition applications is feasible and
provides mission-critical redundancy.
-
A distributed architecture lets you address performance and scalability issues
in your deployment.