The LumenVox API returns the offset of SSML markers and viseme markers as a buffer offset, that is, the number of bytes from the start of the audio buffer (byte 0). That number is not meaningful on its own without the sampling rate and audio format, so it is often more useful to track the offset in milliseconds.
This page provides pseudo-code for converting a buffer offset to milliseconds for each of the audio formats we accept. The pseudo-code below uses the C interface; the same conversion formula works for the C++ interface.
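As a rough illustration of the arithmetic, the conversion can be written as a single formula, where $b$ is the number of bytes per sample (1 for u-law and A-law, 2 for PCM, as in the code on this page). The symbol $b$ is introduced here for illustration only and is not part of the LumenVox API.

```latex
\text{offset\_ms} = \frac{\text{buffer\_offset}}{b \times \text{sampling\_rate}} \times 1000
```

For example, assuming 8000 Hz audio, a buffer offset of 4000 bytes corresponds to 500 ms in u-law or A-law and to 250 ms in PCM.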
C Code
- int TTS_handle; // Handle from LV_TTS_CreateClient in the C interface
- int audio_format, sampling_rate; // Obtained from the LumenVox API
- int buffer_offset; // Offset returned for SSML or viseme markers
- float offset_ms; // Output offset in milliseconds
- ////////////////////////////////////////////////////////////////////////////
- // Insert code here that performs the synthesis and retrieves the SSML or
- // viseme marker offsets using the LumenVox API
- ////////////////////////////////////////////////////////////////////////////
- // Get audio format and sample rate from TTS Server
- // Output arguments are passed by address so the calls can fill them in
- LV_TTS_GetSynthesizedAudioFormat(TTS_handle, &audio_format);
- LV_TTS_GetSynthesizedAudioSampleRate(TTS_handle, &sampling_rate);
- // Calculate milliseconds offset based on audio format
- // The formulas below apply to both the C and C++ interfaces
- switch (audio_format)
- {
- case SFMT_ULAW:
- case SFMT_ALAW:
- // u-law and A-law use 1 byte per sample, so bytes equal samples
- // We multiply by 1000 to convert seconds to milliseconds
- offset_ms = (float)buffer_offset * 1000 / sampling_rate;
- break;
- case SFMT_PCM:
- // We divide by 2 because there are 2 bytes per PCM sample
- offset_ms = (float)buffer_offset / 2 * 1000 / sampling_rate;
- break;
- default:
- // Unrecognized format: flag the result instead of leaving it unset
- offset_ms = -1.0f;
- break;
- }
// offset_ms now contains buffer_offset converted to milliseconds
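The conversion above can also be wrapped in a small helper so it can be checked without a TTS server. The following is only a minimal sketch under stated assumptions: the `example_format` enum, `bytes_per_sample`, and `offset_bytes_to_ms` are hypothetical names invented for this illustration and are not part of the LumenVox API. In real code, the `SFMT_*` constants from the LumenVox headers would take the place of the enum.

```c
#include <assert.h>

/* Hypothetical stand-ins for the SFMT_* constants (assumption: real code
   would use the values defined in the LumenVox headers instead). */
typedef enum { EX_FMT_ULAW, EX_FMT_ALAW, EX_FMT_PCM } example_format;

/* Bytes occupied by one audio sample; 0 means the format is not recognized. */
int bytes_per_sample(example_format fmt)
{
    switch (fmt) {
    case EX_FMT_ULAW:
    case EX_FMT_ALAW:
        return 1;   /* 8-bit companded audio */
    case EX_FMT_PCM:
        return 2;   /* 2 bytes per PCM sample, as in the pseudo-code */
    default:
        return 0;
    }
}

/* Convert a byte offset to milliseconds.
   Returns -1.0 when the format is not recognized. */
double offset_bytes_to_ms(long buffer_offset, example_format fmt,
                          int sampling_rate)
{
    int width = bytes_per_sample(fmt);

    /* Preconditions: a positive sampling rate and a non-negative offset */
    assert(sampling_rate > 0);
    assert(buffer_offset >= 0);

    if (width == 0) {
        return -1.0;
    }
    /* bytes / (bytes per sample * samples per second) = seconds;
       multiplying by 1000 gives milliseconds */
    return (double)buffer_offset * 1000.0 / ((double)width * sampling_rate);
}
```

With 8000 Hz audio, `offset_bytes_to_ms(4000, EX_FMT_ULAW, 8000)` gives 500 ms, while the same byte offset in PCM gives 250 ms, because each PCM sample takes twice as many bytes.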