Configuring TTS Sampling Rates

Reference Number: AA-01937

Many of the LumenVox Text-to-Speech voices are offered in two sampling rates: 8KHz and 22KHz. The sampling rate of an audio signal is one of the main factors in how "good" the signal sounds; a higher sampling rate generally corresponds to higher-quality sound (see the Wikipedia entry on sampling for more details). 8KHz is the sampling rate used by the G.711 (u-Law and A-Law) codec, the standard codec for telephony, so for most telephony uses an 8KHz sampling rate is all that is supported. For this reason, the default sampling rate used by LumenVox is 8KHz.

However, non-telephony users may benefit from a higher-quality output and can take advantage of the higher 22KHz sampling rate. Note that 22KHz output produces larger file sizes and will take more bandwidth.
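To give a rough sense of the size and bandwidth difference, the sketch below computes uncompressed audio data rates. The 64 kbit/s figure for G.711 follows from its 8-bit samples at 8KHz; the 16-bit mono PCM format used for the 22KHz case is an assumption made for illustration, not something this article specifies, and the function name is hypothetical.

```python
def data_rate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the raw (uncompressed) audio data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# G.711 carries 8-bit samples at 8000 Hz.
g711_bps = data_rate_bps(8000, 8)

# ASSUMPTION: 22KHz output stored as 16-bit mono linear PCM.
pcm22_bps = data_rate_bps(22050, 16)

# With an identical sample format, the cost scales with the sampling rate alone.
rate_ratio = 22050 / 8000

print(f"G.711 at 8KHz:       {g711_bps} bit/s")
print(f"16-bit PCM at 22KHz: {pcm22_bps} bit/s")
print(f"Ratio at equal sample width: {rate_ratio:.2f}x")
```

Whatever the actual sample format, the same audio at 22050 Hz needs roughly 2.76 times the data it needs at 8000 Hz when the sample width is held equal.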

Prerequisites

In order to use a 22KHz voice, you must have the following:

A valid license for the 22KHz voice, either installed on your license server or available in your LumenVox account if you are using our hosted licensing. 8KHz voices are licensed separately from 22KHz voices, so please contact LumenVox Support or your account representative if you need a 22KHz license. (If your license does not indicate a sampling rate, it is an 8KHz license.)
An installed 22KHz voice package. See our Linux Installation or Windows Installation instructions for information on how to download and install a voice. Be sure that the package has 22 in its name; a package that does not indicate a sampling rate is 8KHz.

Specifying Sampling Rate

Once your voices are installed and licensed, you can specify the sampling rate in one of several ways:

In client_property.conf, SYNTHESIS_SAMPLING_RATE can be set to 8000 for 8KHz (the default) or 22050 for 22KHz. Note that while we commonly say 22KHz, the rate is technically 22.05KHz, so you must state it as 22050 and not as a rounded value such as 22000. The value in client_property.conf is used whenever the sampling rate is not specified anywhere else.

If you are using the Speech Tuner's SSML editor and want 22KHz syntheses, you must edit the Tuner's client_property.conf file.

If you are using the SimpleTTSClient, the sampling rate can be set by specifying -r rate on the command line. For example, SimpleTTSClient -r 22050 -t "Hello World" synthesizes "Hello World" at 22KHz.
When using the API, you can set PROP_EX_SYNTH_PROSODY_RATE to either 8000 or 22050 using SetPropertyEx.
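As a small illustration of the command-line route, the following sketch validates a rate and builds the SimpleTTSClient argument list described above. The helper and constant names are hypothetical, the sketch assumes SimpleTTSClient is available on the PATH, and it relies only on the -r and -t options mentioned in this article. It builds the command but does not run the client.

```python
import shlex

# The two rates this article describes: 8000 (8KHz, default) and 22050 (22KHz).
SUPPORTED_RATES = (8000, 22050)


def build_tts_command(text: str, rate: int = 8000) -> list[str]:
    """Build an argument list for SimpleTTSClient (hypothetical helper)."""
    if rate not in SUPPORTED_RATES:
        raise ValueError(
            f"Unsupported sampling rate {rate}; expected one of {SUPPORTED_RATES}"
        )
    return ["SimpleTTSClient", "-r", str(rate), "-t", text]


command = build_tts_command("Hello World", rate=22050)

# Print a shell-quoted form of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to invoke the client. Rejecting values such as 22000 up front catches the common rounding mistake before the client is ever called.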

Note that, as a telephony-focused application, the LumenVox Media Server currently supports only 8KHz (8000 Hz) synthesis output.