LumenVox Distributed Architecture

Reference Number: AA-00617

The LumenVox Speech Engine is designed to run in a distributed manner, with multiple clients and servers working together. This allows it to function well in high-volume applications, as the computing demand can be split across multiple machines (this is known as load balancing, since the workload is balanced across several computers). It also enables users to set up redundancy: if one Speech Engine server goes down in a multi-server environment, clients can switch servers seamlessly.

Overview

The architecture is broken into four components:

Speech Servers: The Speech Engine server is a program that performs the actual speech recognition. It processes the incoming audio, compares a speaker's utterance to the phrases in the active grammars, and returns the results of the audio decode to the speech client.
Speech Clients: The LumenVox client is a piece of software that sits between the speech-enabled application (such as a speech-enabled IVR) and the speech server. It passes the audio from the application to the engine, and returns the decode information from the server back to the application.
Server Monitor: The server monitor is a component of the client process that coordinates the servers and the clients. When a client has audio it needs decoded, it asks the server monitor which speech server to use, and the monitor answers with the server that is currently the least busy. The monitor continuously watches the speech servers, so it can remove a server from its list of valid servers if that server goes down. If the server later comes back online, the monitor detects this and begins sending clients to it once more.
License Server: The License Server manages the pool of Speech Engine licenses. When a speech-enabled application opens a new speech port, the speech client requests a license from the License Server. If there is an available license, the License Server assigns it to that client until the speech port is closed and the client releases the license.
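
The server monitor's behavior described above (pick the least busy server, skip servers that are down, resume using them when they return) can be illustrated with a small sketch. This is not LumenVox code: the class `ServerMonitor`, its method names, and the idea of counting active decodes as the measure of "busy" are all assumptions made for illustration.

```python
class ServerMonitor:
    """Toy model of a server monitor (hypothetical, not the LumenVox implementation)."""

    def __init__(self, servers):
        # Assumption: "busy" is measured as the number of active decodes per server.
        self.active = {server: 0 for server in servers}
        # Servers currently considered reachable.
        self.online = set(servers)

    def mark_down(self, server):
        # Remove a server from the list of valid servers.
        self.online.discard(server)

    def mark_up(self, server):
        # Only servers that were configured originally can come back online.
        if server in self.active:
            self.online.add(server)

    def pick_server(self):
        """Return the online server with the fewest active decodes."""
        candidates = [s for s in self.active if s in self.online]
        if not candidates:
            raise RuntimeError("no speech servers available")
        chosen = min(candidates, key=lambda s: self.active[s])
        self.active[chosen] += 1
        return chosen

    def finish(self, server):
        # Called when a decode completes, so the server counts as less busy.
        self.active[server] = max(0, self.active[server] - 1)


monitor = ServerMonitor(["10.0.0.1:5730", "10.0.0.2:5730"])
first = monitor.pick_server()    # both idle: the first listed server wins the tie
second = monitor.pick_server()   # the other server is now the least busy
monitor.mark_down("10.0.0.1:5730")
third = monitor.pick_server()    # only 10.0.0.2 is online
monitor.mark_up("10.0.0.1:5730")
fourth = monitor.pick_server()   # 10.0.0.1 has 1 active decode, 10.0.0.2 has 2
```

In this sketch a server that goes down simply stops being a candidate, and clients are directed to it again as soon as it is marked online.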

By working together in the manner described above, the pieces of the LumenVox distributed architecture allow speech-enabled applications to handle high volumes of speech recognition requests in a robust and fault-tolerant way.
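
The license handling described in the overview (a license is assigned when a speech port opens and returned to the pool when the port closes) can be sketched in the same spirit. The `LicensePool` class and the port identifiers below are hypothetical; the real License Server is a separate service whose protocol is not shown here.

```python
class LicensePool:
    """Toy license pool (hypothetical): one license per open speech port."""

    def __init__(self, total):
        self.total = total
        self.holders = set()  # identifiers of ports currently holding a license

    def acquire(self, port_id):
        """Assign a license to port_id; return False if none is available."""
        if port_id in self.holders:
            return True  # this port already holds a license
        if len(self.holders) >= self.total:
            return False  # pool exhausted
        self.holders.add(port_id)
        return True

    def release(self, port_id):
        # Closing the speech port returns its license to the pool.
        self.holders.discard(port_id)

    @property
    def available(self):
        return self.total - len(self.holders)


pool = LicensePool(total=2)
got_a = pool.acquire("port-a")
got_b = pool.acquire("port-b")
got_c = pool.acquire("port-c")        # denied: both licenses are in use
pool.release("port-a")                # port-a closes and frees its license
got_c_retry = pool.acquire("port-c")  # succeeds now
```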

Implementation

Taking advantage of LumenVox's distributed architecture technology is easy. When your application performs a decode, you can specify the list of speech servers to use by calling LV_SRE_SetPropertyEx (C API) or SetPropertyEx (C++ API).

One of the parameters SetPropertyEx takes is PROP_EX_SRE_SERVERS. Its value is a string containing the IP addresses of the speech servers you wish to use, separated by semicolons. To use a port other than the default, append it to the address after a colon. For instance, "127.0.0.1;10.0.0.1:5721" specifies a server at 127.0.0.1 using the default port of 5730 and a server at 10.0.0.1 using port 5721.
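
To make the string format concrete, here is a minimal sketch of how such a value could be split into address and port pairs. The function `parse_servers` is hypothetical and is not part of the LumenVox API; it only mirrors the format described above (semicolon-separated IPv4 addresses, optional port after a colon, default port 5730).

```python
DEFAULT_PORT = 5730  # default port named in the text above


def parse_servers(value, default_port=DEFAULT_PORT):
    """Split 'host;host:port;...' into a list of (host, port) tuples."""
    servers = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing semicolon
        host, sep, port_text = entry.partition(":")
        # Use the explicit port if a colon was present, else the default.
        port = int(port_text) if sep else default_port
        servers.append((host, port))
    return servers


servers = parse_servers("127.0.0.1;10.0.0.1:5721")
print(servers)
```

For the example string from the text, this yields 127.0.0.1 on port 5730 and 10.0.0.1 on port 5721.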

Once this value is set, the LumenVox speech client will automatically distribute decodes among the specified servers, as described above.

Example

To illustrate how the process works, here is a step-by-step example of how the pieces fit together.

A caller calls into a speech-enabled call router and asks to speak with technical support.
The call router application opens a speech port with the speech client.
The client asks the License Server for a Speech Engine license.
The License Server checks its license pool, sees an available license, and assigns it to the speech client.
The client confirms that the speech port is open, and the call router application passes the call audio and parameters to the speech client.
The speech client asks the server monitor which server it should use.
The server monitor has been monitoring the status of the speech servers, and gives the speech client the IP address and port of an available speech server.
The client sends the audio and parameters to the speech server.
The speech server runs the audio and parameters through the Speech Engine and gets the results of the decoded audio, which it passes back to the client.
The client returns the results to the call router application, which is then able to transfer the caller to the technical support department.
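
The walk-through above can also be read as a short simulation. The sketch below is illustrative only: `route_call`, the `state` dictionary, and the stand-in `fake_decode` recognizer are all hypothetical, audio is represented by a plain string, and none of this reflects the actual LumenVox client API.

```python
def route_call(audio, state, decode):
    """Walk one call through the steps above; return (result, trace)."""
    trace = []
    # Opening a speech port requires a free license from the pool.
    if state["free_licenses"] == 0:
        return None, ["license denied"]
    state["free_licenses"] -= 1
    trace.append("license assigned")
    try:
        # The monitor offers the least busy server that is not down.
        online = {s: n for s, n in state["load"].items() if s not in state["down"]}
        if not online:
            trace.append("no server available")
            return None, trace
        server = min(online, key=online.get)
        trace.append("server " + server)
        # The client sends audio to the server and waits for the result.
        state["load"][server] += 1
        try:
            result = decode(server, audio)
        finally:
            state["load"][server] -= 1
        trace.append("decoded")
        return result, trace
    finally:
        # Closing the speech port releases the license.
        state["free_licenses"] += 1


def fake_decode(server, audio):
    # Stand-in for the Speech Engine: "recognizes" one phrase in a text string.
    return "technical support" if "support" in audio else "unknown"


state = {
    "free_licenses": 1,
    "load": {"10.0.0.1:5730": 3, "10.0.0.2:5730": 1},
    "down": set(),
}
result, trace = route_call("caller asks for technical support", state, fake_decode)
```

Here the call is sent to 10.0.0.2 because it has the lower load, and the license is back in the pool once the call has been routed.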