Handling High Volume Applications

(APR 2008) — Building any telephony application that experiences a high volume of calls can be a difficult task, which is why LumenVox has taken steps to ensure that speech recognition can be easily added to high volume, high availability systems.

In this month's Tech Bulletin, we'll discuss some of the questions and technologies that come up when people start planning for complex and robust speech applications.

Much of this information is also useful to those building smaller solutions who would like to scale as they grow, so keep reading regardless of whether you're planning on deploying four ports or four thousand.

Hardware Requirements

The most basic thing to consider when building high density systems is the kind of computing hardware you'll need to deploy. First, you'll want to consider the load on your systems without speech. This means understanding the CPU, memory, and disk usage required by your telephony layer, application server, and any supporting services.

Once you know that, you can factor speech recognition into the equation. Very high volume applications will want to offload speech to a dedicated machine or cluster, but smaller applications can get away with hosting everything on a single server.

Speech recognition is a fairly demanding task, using a lot of processing power and memory. The exact amount required depends mainly on two factors: the number of people using the Speech Engine simultaneously, and the complexity of your grammars.

A simple yes/no grammar, for instance, requires only one-quarter of the CPU time of a decode using a 500-word grammar. Very large grammars, with thousands of words or lines, may also require significantly more memory.

For more information on CPU use, please see the hardware questions section of our help file.

In general, you should expect that the LumenVox Speech Engine will require 500 MB to 1 GB of memory above and beyond what your other applications are using (very large grammars may require significantly more).
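As a rough, hypothetical sketch, you might budget memory along these lines. The baseline figure and the 25 percent headroom are illustrative assumptions, and the 1 GB Engine figure is simply the upper end of the range above:

```python
def memory_budget_mb(baseline_mb, engine_mb=1024, headroom=0.25):
    """Estimate total RAM for a combined server: memory already used by
    the telephony layer, application server, and supporting services,
    plus the Speech Engine's 500 MB - 1 GB (budgeted at the upper end
    here), plus some headroom for spikes (an illustrative 25 percent).
    """
    return int((baseline_mb + engine_mb) * (1 + headroom))

# e.g. 2 GB of existing telephony/application services plus the Engine:
print(memory_budget_mb(2048))  # 3840 MB
```

Very large grammars can push the Engine well past 1 GB, so treat this as a floor to validate with your own measurements, not a ceiling.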

Adding More CPUs

The LumenVox Speech Engine is multi-threaded, so it is able to take advantage of multiple processor cores or chips. Simultaneous decodes run in parallel across multiple processors, and decodes are very processor intensive.

Because of this, increasing the number of CPUs tends to have a very positive effect on the number of decodes you can run at once. In general, we find that doubling the number of chips or cores increases capacity by about 70 to 80 percent. Doubling CPU capacity doesn't quite double throughput because of other limiting factors: there is some overhead, and bottlenecks such as the speed of memory, disks, and the motherboard's bus come into play.

This means that if you had a single-CPU system that was able to do 100 simultaneous decodes, adding a second CPU of the same variety would usually allow you to do between 170 and 180 simultaneous decodes.

Please note that the above example is just hypothetical; the other bottlenecks and overhead may restrict actual performance gains. Also note that different processor architectures provide different benefits, so simply increasing the number of CPUs may offer less benefit than moving to a better processor architecture.
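With those caveats in mind, the rule of thumb above can be sketched as a quick back-of-the-envelope calculation. The 75 percent factor simply splits the 70 to 80 percent range; treat the result as a starting point for your own testing, not a guarantee:

```python
def estimated_capacity(base_decodes, cpu_doublings, scaling_factor=0.75):
    """Estimate simultaneous decode capacity after doubling CPUs.

    Each doubling of chips or cores adds roughly 70-80 percent capacity
    (75 percent used here), not a full 100 percent, because memory,
    disk, and bus bandwidth become the limiting factors.
    """
    capacity = base_decodes
    for _ in range(cpu_doublings):
        capacity *= 1 + scaling_factor
    return int(capacity)

# A 1-CPU system handling 100 decodes, upgraded to 2 CPUs:
print(estimated_capacity(100, 1))  # 175
```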

LumenVox does not make any guarantees about the number of decodes you will be able to run in practice. Your own applications and system architecture will likely affect performance, so anyone looking to build high-density systems needs to run these tests for themselves, using a real-world environment to correctly gauge how the entire solution will run in production.

Distributed Architecture

As we mentioned in the August 2007 Tech Bulletin, the LumenVox Speech Engine is designed to work in a distributed environment.

A speech client — generally your telephony platform or application server — is able to use multiple speech servers, intelligently balancing load between them and monitoring the list of servers to perform automatic failover should any server go down.

This is useful for high demand applications in several ways. First of all, it will allow you to move the speech recognition functionality off your application servers and onto dedicated machines.

This helps improve performance and make optimal use of hardware, and you can be sure that a spike in demand for speech recognition won't harm the performance of the rest of your application (or vice versa).
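To illustrate the idea, here is a minimal, hypothetical sketch of that client-side behavior: a pool of speech servers with simple load balancing and failover. The class and method names are invented for illustration and are not the actual LumenVox client API.

```python
import random

class SpeechServerPool:
    """Illustrative sketch (not the LumenVox API) of a client spreading
    load across several speech servers and failing over when one goes
    down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()

    def mark_down(self, server):
        # Called when a health check or request to this server fails.
        self.down.add(server)

    def mark_up(self, server):
        # Called when monitoring sees the server respond again.
        self.down.discard(server)

    def pick_server(self):
        # Balance load by choosing randomly among healthy servers.
        healthy = [s for s in self.servers if s not in self.down]
        if not healthy:
            raise RuntimeError("no speech servers available")
        return random.choice(healthy)

# The application server keeps a pool of dedicated speech machines;
# if one fails, requests transparently go to the survivors.
pool = SpeechServerPool(["speech-a", "speech-b"])
pool.mark_down("speech-a")
print(pool.pick_server())  # "speech-b"
```

In practice the telephony platform handles this monitoring for you; the point is simply that no single speech server is a single point of failure.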


One technology that's on the rise in many high volume installations is the virtualization of servers, often in conjunction with what is called utility computing, computing on demand, grid computing, or cloud computing.

Virtualization technology, which is similar to emulation, allows you to build a virtual machine that runs inside an existing machine and operating system.

The virtual machine has an operating system and environment that shares hardware with the host system, but is otherwise largely independent (there is a layer of abstraction between the virtualized machine and the host operating system by means of special virtualization software).

This is useful to people who need to run multiple operating systems, or multiple environments, simultaneously on a single machine. The virtual machine looks and behaves like a perfectly normal computer running a dedicated operating system.

The technology is also useful for people building large computing installations. It makes it easy to segment a single server into multiple virtual servers, ensuring that no programs running on one virtual machine can harm programs running on another: a fatal application error can't take down the entire server.

Because virtual machines do not interface directly with hardware, service providers can actually distribute computational load across many different machines in a way that is seamless to users.

This allows service providers to grow or shrink the hardware allocated to a given virtual machine depending on what is running in it at any given time. Instead of buying dedicated hardware that might sit unused 90 percent of the time, users can pay only for the CPU cycles or memory they use, allowing for on-demand computing resources.

LumenVox is able to work in these environments with one important exception: the License Server must run on a dedicated physical machine. Because licenses are tied to the hardware the License Server runs on, it requires access to actual hardware in order to run correctly.

That said, our distributed architecture allows you to easily put the Speech Engine on a different machine from the License Server, so for the most part the LumenVox products should fit nicely into a heavily virtualized or on-demand environment.

© 2016 LumenVox, LLC. All rights reserved.