Using save_waveform to save audio over MRCP

Reference Number: AA-01935

As per both the MRCP version 1 and version 2 specifications, the LumenVox Media Server supports the save-waveform MRCP header. This allows those using MRCP and the LumenVox Media Server to save an audio file for each recognition. Typically, the voice platform requests this, and LumenVox returns a URI specifying the location of the audio file as part of each RECOGNITION-COMPLETE message. However, the Media Server can also be configured to save audio files for all interactions, which is useful when debugging a new deployment.
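On the voice platform side, the returned location can be pulled out of the RECOGNITION-COMPLETE message with a few lines of Python. This is a rough sketch rather than LumenVox code: the sample message is hand-written for illustration, the function name extract_waveform_uri is hypothetical, and the header layout is assumed to follow the specifications (Waveform-URL in MRCP version 1, Waveform-URI in version 2, where the URI appears in angle brackets and may be followed by size and duration parameters).

```python
import re

# Hand-written sample for illustration only; not captured from a real server.
SAMPLE_EVENT = (
    "Completion-Cause: 000 success\r\n"
    "Waveform-URI: <http://audio-server/audio/1234>;size=11422;duration=1200\r\n"
)

# Accept both the MRCPv1 (Waveform-URL) and MRCPv2 (Waveform-URI) header names.
_WAVEFORM_RE = re.compile(
    r"^Waveform-UR[IL]:[ \t]*<?([^>;\s]+)>?(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_waveform_uri(message):
    """Return (uri, params) from an MRCP message, or None if the header is absent."""
    match = _WAVEFORM_RE.search(message)
    if match is None:
        return None
    uri = match.group(1)
    # Optional ";name=value" parameters that may follow the URI.
    params = {}
    for part in match.group(2).split(";"):
        if "=" in part:
            name, value = part.strip().split("=", 1)
            params[name] = value
    return uri, params


if __name__ == "__main__":
    print(extract_waveform_uri(SAMPLE_EVENT))
```

Whether the Media Server includes the optional parameters is not stated here, so the sketch treats them as optional.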

Please note that these audio files are not .wav files. They are headerless, raw audio files encoded in either u-law or A-law (depending on which codec was used when streaming audio to the Media Server). To play them in a standard desktop audio player, you will need one that can import raw audio files. One such option is the free, open-source utility Audacity. (LumenVox does not make or support Audacity; we simply suggest it as a well-known option.)
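If you would rather convert a saved file than import it by hand, the raw bytes can be decoded to 16-bit PCM and written out with a WAV header. The following is a minimal sketch, not a LumenVox tool: it uses the standard G.711 expansion tables, assumes mono audio at 8000 Hz (the usual rate for G.711, but check your own deployment), and the function names are made up for this example.

```python
import struct
import wave


def ulaw_to_pcm16(byte):
    """Expand one G.711 u-law byte to a signed 16-bit sample."""
    u = ~byte & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if u & 0x80 else sample


def alaw_to_pcm16(byte):
    """Expand one G.711 A-law byte to a signed 16-bit sample."""
    a = byte ^ 0x55
    exponent = (a >> 4) & 0x07
    mantissa = a & 0x0F
    if exponent == 0:
        sample = (mantissa << 4) + 8
    else:
        sample = ((mantissa << 4) + 0x108) << (exponent - 1)
    return sample if a & 0x80 else -sample


def raw_to_wav(raw_path, wav_path, codec="ulaw", sample_rate=8000):
    """Convert a headerless u-law or A-law file to a 16-bit PCM WAV file.

    The codec is not recorded in the raw file, so the caller must supply it.
    The 8000 Hz mono default is an assumption, not something read from the file.
    """
    decoders = {"ulaw": ulaw_to_pcm16, "alaw": alaw_to_pcm16}
    if codec not in decoders:
        raise ValueError("codec must be 'ulaw' or 'alaw'")
    decode = decoders[codec]
    with open(raw_path, "rb") as src:
        samples = [decode(b) for b in src.read()]
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)  # 16-bit samples
        dst.setframerate(sample_rate)
        dst.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

For example, raw_to_wav("1234", "1234.wav", codec="alaw") would convert a saved file named 1234. If the result sounds like loud static, the wrong codec was most likely chosen.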

Configuration Options

The following options in media_server.conf can be used to control the behavior of the Media Server with regards to saving audio files:

  • save_waveform
    • Takes a value of true or false (default).
    • If this is set to false, no audio files will be saved unless the save-waveform: true header is explicitly sent over MRCP.
    • If this is set to true, then all audio files will be saved regardless of the MRCP messaging. There will be one audio file created per speech recognition request, meaning there may be multiple files saved per MRCP session (if that session includes multiple RECOGNIZE requests).
  • waveform_url_location
    • Takes a path on disk to save audio files (default is null). This might be something like c:\audio\ on Windows or /var/www/html/audio/ on Linux.
    • Specifies the location on disk where the Media Server will store all audio files.
    • NOTE: this must be set in order to enable the saving of audio files. If the location cannot be written to, no audio files will be saved. Be sure that the user running the Media Server has write permissions to this location.
  • remove_waveform_files
    • Takes a value of 0 (disabled) or 1 (enabled, default).
    • If this is set to 1, the Media Server will delete any audio files it saved during an MRCP session at the end of that session.
    • If this is set to 0, the Media Server will never delete any audio files it saved. In that case you will need to clean up saved audio files periodically, or they may eventually fill your hard disk.
  • waveform_url_prefix
    • Takes a string value, typically in the form of a URI (default is null).
    • If this is set, the value will be prepended to the file name in the URI that the Media Server returns to the voice platform over MRCP, replacing the waveform_url_location value. As an example, you may be storing your audio in the /var/www/html/audio/ directory. If the Media Server saved a file called 1234, it would by default send the URI as /var/www/html/audio/1234. However, you would like your voice platform to access it using http://audio-server/audio/ as the base URL. Setting the waveform_url_prefix to http://audio-server/audio/ would cause the Media Server to send http://audio-server/audio/1234 as the URI.
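The relationship between waveform_url_location and waveform_url_prefix can be modeled in a few lines. This is only an illustration of the behavior described above, not the Media Server's actual implementation: the function name is hypothetical, and it assumes both values end with a path separator, as in the examples.

```python
def reported_waveform_uri(filename, waveform_url_location, waveform_url_prefix=None):
    """Model the URI reported over MRCP for a saved audio file.

    When a prefix is configured it replaces the on-disk location in the
    reported URI; the file itself is still written under the location.
    """
    base = waveform_url_prefix if waveform_url_prefix else waveform_url_location
    return base + filename


if __name__ == "__main__":
    # No prefix configured: the on-disk path is reported.
    print(reported_waveform_uri("1234", "/var/www/html/audio/"))
    # Prefix configured: the prefix replaces the on-disk location.
    print(reported_waveform_uri("1234", "/var/www/html/audio/", "http://audio-server/audio/"))
```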

Saving Everything

A common use case for developers is to force the Media Server to save all interactions as audio files. This is useful especially when debugging a new deployment. To save everything, you must do the following:

  1. Set save_waveform = true
  2. Set waveform_url_location to a valid path on disk.
  3. Set remove_waveform_files = 0

This will cause the Media Server to save all interactions to audio files and never delete them, allowing you to access them at will.
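Because nothing is deleted in this configuration, two small housekeeping checks are worth scripting: confirming that the save location is writable, and purging old files on a schedule. The sketch below is one possible approach, run as the same user as the Media Server. The function names and the seven-day default are arbitrary choices for this example, and it assumes the directory is dedicated to saved audio, since it removes every sufficiently old regular file it finds there.

```python
import os
import tempfile
import time


def is_writable_dir(path):
    """Return True if the current user can create a file in path."""
    try:
        # The temporary file is removed automatically when closed.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


def purge_old_audio(directory, max_age_days=7.0, now=None):
    """Delete regular files in directory older than max_age_days.

    Returns the sorted names of the files that were removed.
    Subdirectories and symbolic links are left alone.
    """
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400
    removed = []
    with os.scandir(directory) as entries:
        candidates = list(entries)
    for entry in candidates:
        if entry.is_file(follow_symlinks=False) and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A scheduled job (cron on Linux, Task Scheduler on Windows) could call purge_old_audio on the waveform_url_location directory once a day.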

Saving Data for Tuning

Users looking to do speech tuning (reviewing audio to improve the performance of their application) should not rely on the save_waveform functionality. Instead, they should enable response file generation and use the LumenVox Speech Tuner. The response files contain information that is vital to tuning and is not present in raw audio files:

  • End-pointed audio and audio after noise reduction has been applied.
  • Information about the grammars that were used during a recognition.
  • Recognition results including confidence score, interpretations, etc.
  • Other metadata about the recognition (ASR settings, speed, etc.).

The primary use of saving raw audio with save_waveform should be just to verify that audio is making it to the Media Server.

Other Notes

Only audio packets received while the MRCP session is in a RECOGNIZING state will be saved; audio sent between RECOGNIZE requests is discarded (this is per the MRCP specifications). Consider a full packet capture if you need the entire RTP stream between a voice platform and the Media Server.

DTMF audio will also generally not be present in saved audio files. This is because LumenVox neither supports nor expects to receive in-band DTMF audio tones; instead, DTMF must be converted to RTP events per RFC 4733 before being sent to the Media Server.
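For readers inspecting a packet capture, it may help to see what an RFC 4733 event looks like on the wire. Each telephone-event RTP payload is four bytes: an event code, an end bit, a reserved bit, a six-bit volume, and a 16-bit duration in timestamp units. The sketch below decodes such a payload; it is for illustration only, is not part of any LumenVox software, and the names in it are invented for this example.

```python
import struct
from typing import NamedTuple

# RFC 4733 DTMF event codes 0-15 map to these keys, in order.
EVENT_TO_KEY = "0123456789*#ABCD"


class TelephoneEvent(NamedTuple):
    key: str        # the DTMF key that was pressed
    end: bool       # True on the packet(s) marking the end of the event
    volume: int     # tone power level, 0-63 (expressed in -dBm0)
    duration: int   # duration so far, in RTP timestamp units


def parse_telephone_event(payload):
    """Decode the 4-byte payload of an RFC 4733 telephone-event packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError("event code %d is not a DTMF key" % event)
    return TelephoneEvent(
        key=EVENT_TO_KEY[event],
        end=bool(flags & 0x80),   # E bit
        volume=flags & 0x3F,      # low six bits; the R bit (0x40) is ignored
        duration=duration,
    )
```

For example, the payload bytes 05 8A 03 20 decode to key "5" with the end bit set, a volume of 10, and a duration of 800 timestamp units (100 ms at an 8000 Hz clock).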