Adding Audio

Reference Number: AA-00606

Because the LumenVox Speech Engine is hardware independent, the client application has great flexibility in how it collects audio data. Once the audio is acquired, your client application should ensure the data is in a supported audio format, either by collecting it in that format in the first place or by converting it before it is sent to the Engine.

The audio sent to the Speech Engine must be header-less, otherwise known as "raw" audio. For example, standard Windows .wav files begin with a header, which must be removed before the audio is sent.
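As an illustration of removing a .wav header, the sketch below walks the chunks of a RIFF/WAVE buffer and returns only the payload of its "data" chunk. ExtractRawAudio and ReadLE32 are hypothetical helpers written for this article, not part of the LumenVox API. The sketch assumes the samples inside the file are already in an encoding the Engine supports; it does not convert them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a little-endian 32-bit value starting at p.
static std::uint32_t ReadLE32(const unsigned char* p)
{
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: returns the payload of the "data" chunk of a
// RIFF/WAVE buffer. Returns an empty vector if the buffer is not a WAVE
// file or contains no "data" chunk.
std::vector<unsigned char> ExtractRawAudio(const std::vector<unsigned char>& wav)
{
    std::vector<unsigned char> raw;
    if (wav.size() < 12 ||
        std::memcmp(wav.data(), "RIFF", 4) != 0 ||
        std::memcmp(wav.data() + 8, "WAVE", 4) != 0)
    {
        return raw;
    }

    std::size_t pos = 12;  // first chunk follows the 12-byte RIFF preamble
    while (pos + 8 <= wav.size())
    {
        const unsigned char* chunk = wav.data() + pos;
        std::size_t size = ReadLE32(chunk + 4);
        const std::size_t body = pos + 8;
        const std::size_t avail = wav.size() - body;

        if (std::memcmp(chunk, "data", 4) == 0)
        {
            if (size > avail)
                size = avail;  // tolerate a truncated file
            raw.assign(wav.data() + body, wav.data() + body + size);
            return raw;
        }
        if (size > avail)
            break;  // malformed chunk size: give up
        pos = body + size + (size & 1);  // chunks are padded to even length
    }
    return raw;
}
```

Walking the chunks, rather than skipping a fixed number of bytes, keeps the sketch working for files that carry extra chunks before the audio data.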

Audio data is stored in a voice channel. Each speech port has 64 voice channels, so up to 64 separate audio samples can be stored in a speech port at once. Most applications need only two: one for the main answer and one for the result of a yes/no confirmation question.

Audio may be loaded all at once, as a batch decode, or it may be streamed in. Batch decodes are generally used when you have audio files saved to disk, while streaming decodes are used when your application is collecting audio and sending it directly to the Engine (most telephony applications use streaming).

Batched Audio

To get your audio into the port, collect it into a buffer and call LoadVoiceChannel, supplying the handle of the port, the number of the voice channel you wish to use, a pointer to the audio data, the length of the audio, and the sound format of the data.

C Code

void LoadAudio(HPORT hport, void* audio, int audiolength)
{
    LV_SRE_LoadVoiceChannel(hport, 1, audio, audiolength, PCM_16KHZ);
}

C++ Code

void LoadAudio(LVSpeechPort &myport, void* audio, int audiolength)
{
    myport.LoadVoiceChannel(1, audio, audiolength, PCM_16KHZ);
}
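For the common batch case of audio saved to disk, the buffer has to be filled from a file first. The sketch below shows one way to do that with the standard library. ReadRawAudioFile and AudioLength are hypothetical helpers, not LumenVox functions, and the file is assumed to already contain header-less audio in the format you pass to LoadVoiceChannel.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire header-less audio file into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<char> ReadRawAudioFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return std::vector<char>();
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: converts the buffer size to the int length that the
// LoadAudio examples take, checking that it fits.
int AudioLength(const std::vector<char>& audio)
{
    assert(audio.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(audio.size());
}
```

With a buffer named audio returned by ReadRawAudioFile, the C++ LoadAudio function above could then be called as LoadAudio(myport, audio.data(), AudioLength(audio)).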

Streaming

To stream audio into the speech server, you must set several parameters, because the Speech Engine has to perform voice activity detection (VAD) and correctly identify the beginning and end of speech. See Recommended Engine Settings.

If you are having problems with barge-in, or with the Engine chopping off words at the end of utterances, the cause is most likely the way the streaming parameters are set. Please review the Recommended Engine Settings, Sensitivity Settings, and SetStreamParameter pages.

The code below will set up streaming and set the stream parameters to the most commonly used settings.

C Code

LV_SRE_StreamSetParameter(hport, STREAM_PARM_DETECT_BARGE_IN, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_DETECT_END_OF_SPEECH, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_AUTO_DECODE, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_VOICE_CHANNEL, 1);
LV_SRE_StreamSetParameter(hport, STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);

C++ Code

// The port gets opened and initialized.
LVSpeechPort port;
port.CreateClient();
// ...

// Let the port detect beginning and end of speech,
// and handle the speech decoding automatically
port.StreamSetParameter(STREAM_PARM_DETECT_BARGE_IN, 1);
port.StreamSetParameter(STREAM_PARM_DETECT_END_OF_SPEECH, 1);
port.StreamSetParameter(STREAM_PARM_AUTO_DECODE, 1);

// Pick a voice channel to record audio and send responses to.
port.StreamSetParameter(STREAM_PARM_VOICE_CHANNEL, 1);

// If you wish to use your activated SRGS grammars, the grammar set
// must be LV_ACTIVE_GRAMMAR_SET
port.StreamSetParameter(STREAM_PARM_GRAMMAR_SET, LV_ACTIVE_GRAMMAR_SET);

The rest of this example is in C++. Suppose we have an interface that intermittently provides audio to us. For simplicity, assume it always delivers 8 kHz u-law audio:

Example

typedef bool (*AudioStreamCallback)(char* audio_chunk,
                                    int audio_length,
                                    void* user_data);

class AudioStreamer
{
public:
    // Non-blocking function. Sends audio through the callback function
    // at regular intervals on a separate thread. It will stop sending
    // audio if the callback returns "false".
    void StartStream(AudioStreamCallback cb, void* user_data);

    // The audio thread will stop sending audio through the callback if
    // StopStream is called. When StopStream returns, the audio thread
    // is no longer sending.
    void StopStream();

    // constructors, destructors, hardware hooks, etc.
    // ...
};
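The callback contract described in the comments above (deliver audio in pieces, stop as soon as the callback returns false) can be sketched without any hardware. FeedInChunks below is a hypothetical, synchronous stand-in for the work the streamer's audio thread would do. It is an assumption made for illustration and omits the threading and timing a real AudioStreamer would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef bool (*AudioStreamCallback)(char* audio_chunk,
                                    int audio_length,
                                    void* user_data);

// Hypothetical helper: hands `audio` to `cb` in pieces of at most
// `chunk_bytes` bytes. Stops early if the callback returns false.
// Returns the number of bytes handed to the callback.
std::size_t FeedInChunks(std::vector<char>& audio,
                         std::size_t chunk_bytes,
                         AudioStreamCallback cb,
                         void* user_data)
{
    assert(chunk_bytes > 0 && cb != nullptr);

    std::size_t sent = 0;
    while (sent < audio.size())
    {
        std::size_t n = audio.size() - sent;
        if (n > chunk_bytes)
            n = chunk_bytes;

        const bool keep_going =
            cb(audio.data() + sent, static_cast<int>(n), user_data);
        sent += n;
        if (!keep_going)
            break;  // the receiver asked us to stop sending
    }
    return sent;
}
```

For 8 kHz u-law audio, which uses one byte per sample, a chunk size of 160 bytes corresponds to 20 milliseconds of audio.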

The speech port also has a callback mechanism that tells the application which processing state the port is in.

Example

typedef void (*StreamStateChangeFn)(int new_state,
                                    unsigned int total_bytes,
                                    unsigned int recorded_bytes,
                                    void* user_data);

We can connect our speech port and the audio streamer together by way of their callbacks.

Example

struct SimpleRecognizer
{
    LVSpeechPort port;
    AudioStreamer audio;
};

bool AudioCB(char* audio_chunk, int audio_length, void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    self->port.StreamSendData(audio_chunk, audio_length);
    return true;
}

static void PortCB(int new_state, unsigned int total_bytes,
                   unsigned int recorded_bytes,
                   void* user_data)
{
    SimpleRecognizer* self = (SimpleRecognizer*)user_data;
    switch (new_state)
    {
    case STREAM_STATUS_READY:
        self->audio.StartStream(AudioCB, self);
        break;
    case STREAM_STATUS_STOPPED:
    case STREAM_STATUS_END_SPEECH:
        self->audio.StopStream();
        // Retrieve answers: we will define this later
        break;
    case STREAM_STATUS_BARGE_IN:
        // Stop playing prompt
        break;
    }
}

All that remains is to register the PortCB function with the port.

Example

SimpleRecognizer reco;

// Initialize the speech port and the audio streamer
// ...
// Register the state-change callback and tell the port which
// sound format the audio streamer delivers.
reco.port.StreamSetStateChangeCallBack(PortCB, &reco);
reco.port.StreamSetParameter(STREAM_PARM_SOUND_FORMAT, ULAW_8KHZ);

// StreamStart will put the port into the STREAM_STATUS_READY state, which
// will trigger the audio streamer to start sending audio to the port.
reco.port.StreamStart();