Using save_waveform to save audio over MRCP

Reference Number: AA-01935

As per both the MRCP version 1 and version 2 specifications, the LumenVox Media Server supports the save-waveform MRCP header. This allows those using MRCP and the LumenVox Media Server to save audio files for each recognition. Typically, this will be handled by the voice platform, and LumenVox will return a URI as part of each RECOGNITION-COMPLETE message specifying the location of the audio file. However, the Media Server can also be configured to save audio files for all interactions, which may be useful when debugging or when verifying that audio is reaching the Media Server.

Please note that these audio files are not .wav files. Instead, they are headerless, raw audio files encoded in either u-law or A-law (depending on which codec was used when streaming audio to the Media Server). If you wish to play them in a standard desktop audio player, you will need one that can import raw audio files. One such option is the free, open-source utility Audacity. (LumenVox does not make or support Audacity; we simply suggest it as a well-known option.)
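If you would rather convert a saved file than import it as raw audio, the following is a minimal sketch in Python that decodes the standard G.711 u-law or A-law bytes and writes a regular 16-bit PCM .wav file. It is not a LumenVox tool. The function names are hypothetical, and the 8000 Hz mono default is an assumption based on typical G.711 telephony audio; adjust it if your platform streams at a different rate.

```python
import struct
import wave


def ulaw_to_pcm16(byte):
    """Decode one G.711 u-law byte to a signed 16-bit PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def alaw_to_pcm16(byte):
    """Decode one G.711 A-law byte to a signed 16-bit PCM sample."""
    byte ^= 0x55
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (mantissa << 4) + (8 if exponent == 0 else 0x108)
    if exponent > 1:
        sample <<= exponent - 1
    return sample if sign else -sample


def raw_to_wav(raw_path, wav_path, codec, sample_rate=8000):
    """Convert a headerless u-law/A-law file to a mono 16-bit PCM WAV file.

    codec is "ulaw" or "alaw"; sample_rate is an assumed default.
    """
    decode = {"ulaw": ulaw_to_pcm16, "alaw": alaw_to_pcm16}[codec]
    with open(raw_path, "rb") as raw_file:
        samples = [decode(b) for b in raw_file.read()]
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

For example, raw_to_wav("saved_audio.ulaw", "saved_audio.wav", "ulaw") would produce a file that most desktop players can open (both filenames here are placeholders).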

Configuration Options

The following options in media_server.conf control how the Media Server saves audio files:

save_waveform

Takes a value of true or false (default).
If this is set to false, no audio files will be saved unless a save-waveform: true header is explicitly sent over MRCP.
If this is set to true, then all audio files will be saved regardless of the MRCP messaging. There will be one audio file created per speech recognition request, meaning there may be multiple files saved per MRCP session (if that session includes multiple RECOGNIZE requests).

waveform_url_location

Takes a path on disk to save audio files (default is null). This might be something like c:\audio\ on Windows or /var/www/html/audio/ on Linux.
Specifies the location on disk where the Media Server will store all audio files.
NOTE: this must be set in order to enable the saving of audio files. If the location cannot be written to, no audio files will be saved. Be sure that the user running the Media Server has write permissions to this location.
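One quick way to confirm the write-permission requirement is to try creating a file in the directory. The sketch below is only an illustration (the function name is hypothetical), and it is only meaningful if it is run as the same user account that runs the Media Server.

```python
import os
import tempfile


def can_write_waveforms(location):
    """Return True if the current user can create files in location."""
    if not os.path.isdir(location):
        return False
    try:
        # The temporary file is removed automatically when closed.
        with tempfile.NamedTemporaryFile(dir=location):
            pass
    except OSError:
        return False
    return True
```

Calling can_write_waveforms("/var/www/html/audio/") with the path you configured should return True before you expect any audio files to appear.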

remove_waveform_files

Takes a value of 0 (disabled) or 1 (enabled, default).
If this is set to 1, the Media Server will delete any audio files it saved during an MRCP session at the end of that session.
If this is set to 0, the Media Server will never delete any audio files it saved. In that case, you will need to periodically clean up saved audio files yourself, or they may eventually fill your hard disk.
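If you disable automatic removal, a scheduled cleanup job is one way to keep disk usage in check. The following Python sketch deletes saved audio files older than a given number of days. It is an illustration rather than a LumenVox utility: the function name is hypothetical, and it assumes saved files end in .ulaw or .alaw, as in the filename examples in this article.

```python
import time
from pathlib import Path


def remove_old_waveforms(location, max_age_days, now=None):
    """Delete .ulaw/.alaw files in location older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in Path(location).iterdir():
        is_waveform = path.is_file() and path.suffix in (".ulaw", ".alaw")
        if is_waveform and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For example, remove_old_waveforms("/var/www/html/audio/", 7) could be run daily from cron or Task Scheduler to keep one week of audio.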

waveform_url_prefix

Takes a string value, typically in the form of a URI (default is null).
If this is set, the value replaces the waveform_url_location value at the start of the URI that the Media Server returns to the voice platform over MRCP. As an example, you may be storing your audio in the /var/www/html/audio/ directory. If the Media Server saved a file called 1234, it would by default send the URI as /var/www/html/audio/1234. However, you may want your voice platform to access the file using http://audio-server/audio/ as the base URL. Setting waveform_url_prefix to http://audio-server/audio/ would cause the Media Server to send http://audio-server/audio/1234 as the URI.
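The substitution described above can be sketched in a few lines of Python. This only models the documented behavior and is not the Media Server's actual implementation. The function name is hypothetical, and it assumes both settings end with a trailing separator, as they do in the example values.

```python
def returned_waveform_uri(filename, location, prefix=None):
    """Model the URI reported over MRCP for a saved audio file.

    When prefix is set it takes the place of location; otherwise the
    on-disk location is used as the base.
    """
    base = prefix if prefix else location
    return base + filename
```

Calling returned_waveform_uri("1234", "/var/www/html/audio/", "http://audio-server/audio/") shows what the voice platform would receive with the prefix configured; omitting the third argument shows the default.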

Saving Everything

A common use case for developers is to force the Media Server to save all interactions as audio files. This is useful especially when debugging a new deployment. To save everything, you must do the following:

Set save_waveform = true
Set waveform_url_location to a valid path on disk.
Set remove_waveform_files = 0

This will cause the Media Server to save all interactions to audio files and never delete them, allowing you to access them at will.

Saving Data for Tuning

Users looking to do speech tuning (reviewing audio to improve the performance of their application) should not rely on the save_waveform functionality. Instead, they should enable response file generation and use the LumenVox Speech Tuner. The response files contain a great deal of information that is vital to tuning and is not present in raw audio files:

End-pointed audio and audio after noise reduction has been applied.
Information about the grammars that were used during a recognition.
Recognition results including confidence score, interpretations, etc.
Other metadata about the recognition (ASR settings, speed, etc.).

The primary use of saving raw audio with save_waveform should be just to verify that audio is making it to the Media Server.

Waveform Filenames

The name of the generated waveform file is made up of the session identifier, plus a sequence number identifying the individual recognition within the session (since a single session can have multiple decodes, and therefore multiple waveforms associated with it), followed by a suffix based on the audio format selected.

The session identifier used is based on whether you are using RTSP or SIP:

In the case of RTSP, it is the Session: value associated with the session. This is followed by an underscore, then the sequence number associated with the RECOGNIZE request. So, if the Session value was “5891c16e-1d6f-4e51-a”, the RECOGNIZE request sequence number was 5, and the audio format was ulaw, the waveform filename would be:

5891c16e-1d6f-4e51-a_5.ulaw

In the case of SIP, the filename is made up of the Channel-Identifier, followed by an underscore and the corresponding sequence number, with the audio format suffix. So, if your Channel-Identifier value was “32AECB23433801@speechrecog”, the sequence number associated with the recognition was 11, and the audio format was alaw, the name of the generated waveform file would be:

32AECB23433801@speechrecog_11.alaw
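If you need to map saved files back to sessions, the naming scheme above can be split apart programmatically. The following Python sketch is one way to do it; the function and type names are hypothetical. It accepts either a bare filename or a full URI and uses only the last path segment.

```python
from typing import NamedTuple


class WaveformName(NamedTuple):
    session_id: str    # RTSP Session value or SIP Channel-Identifier
    sequence: int      # sequence number of the RECOGNIZE request
    audio_format: str  # suffix such as "ulaw" or "alaw"


def parse_waveform_name(name):
    """Split '<session id>_<sequence>.<format>' into its three parts."""
    filename = name.rsplit("/", 1)[-1]
    stem, _, audio_format = filename.rpartition(".")
    session_id, _, sequence = stem.rpartition("_")
    if not session_id or not sequence.isdigit() or not audio_format:
        raise ValueError("unexpected waveform name: %r" % name)
    return WaveformName(session_id, int(sequence), audio_format)
```

Splitting from the right keeps the parse correct even if the session identifier itself contains an underscore or a period.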

When the current recognition request is completed, the name and location of this waveform file are sent back to the MRCP client as part of the RECOGNITION-COMPLETE message. In RTSP sessions, this is returned in the Waveform-URL header, and in SIP sessions, it is returned in the Waveform-URI header. When trying to determine the name of any generated waveform file, this should be the primary source used, since it gives the actual name that was generated, along with any configured prefix. Typically, users specify a prefix (waveform_url_prefix) and a location where these files are saved (waveform_url_location) in their configuration settings, as described above.

For UniMRCP Users

Note that users of the UniMRCP client (often used in conjunction with Asterisk or some derivative platform) can access the Waveform-URI or Waveform-URL using the ${RECOG_WAVEFORM_URI} variable within their applications.

Please refer to the UniMRCP documentation for more information relating to this.

You can use this variable within your dialplan. For example, this will log the value to the Asterisk console:

exten=>s,n,Verbose(1,The recognition waveform URI is ${RECOG_WAVEFORM_URI})

Other Notes

Only audio packets received while the MRCP session is in a RECOGNIZING state will be saved, so audio sent between RECOGNIZE requests will be discarded and not saved (this is per the MRCP specifications). Consider a full packet capture if you need to get the entire RTP stream between a voice platform and the Media Server.

DTMF audio will also generally not be present in saved audio files. This is because LumenVox neither supports nor expects to receive in-band DTMF audio tones; instead, DTMF must be converted to RTP Events per RFC 4733 before being sent to the Media Server.
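If you are inspecting a packet capture to confirm that DTMF is arriving as RTP Events, the following Python sketch decodes the 4-byte telephone-event payload defined in RFC 4733 (event code, end bit, volume, duration). It is an aid for reading captures, not part of any LumenVox API, and the function name is hypothetical.

```python
import struct

# RFC 4733 event codes 0-15 map to these DTMF keys.
DTMF_KEYS = "0123456789*#ABCD"


def parse_telephone_event(payload):
    """Decode the first 4 bytes of an RFC 4733 telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        "event": event,
        "key": DTMF_KEYS[event] if event < len(DTMF_KEYS) else None,
        "end": bool(flags & 0x80),   # E bit: last packet for this event
        "volume": flags & 0x3F,      # power level in -dBm0
        "duration": duration,        # in RTP timestamp units
    }
```

The payload passed in is the RTP payload only, with the RTP header already stripped.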