Audio Collector

Reference Number: AA-02243

The Audio Collector Server collects audio from incoming telephony traffic carried over the Real-time Transport Protocol (RTP). This audio can either be stored to disk as part of a stand-alone audio collection mechanism, or be sent from calls into other services for processing, such as Voice Authentication.

When implementing passive voice authentication using a telephony platform, the Audio Collector processes inbound audio and allows it to be used for Passive Enrollment or Authentication purposes, as shown in the following diagram.

Diagram 1: Audio Collector flow diagram

The three main components within the Audio Collector are described below:

  • Audio Acquisition Server (AASVR) - this is responsible for tracking active call sessions (when SIP is being used) as well as receiving and decoding audio data from RTP traffic prior to processing.
  • Audio Acquisition Recorder (AAREC) - this is responsible for buffering and recording audio data from AASVR, either saving it to disk (if this functionality is enabled) or sending it on to the CTI Service for further processing.
  • CTI Service - this component is responsible for coordinating between the Telephony Adapter (TA), VTAssure (Authentication Platform) and AASRV/AAREC components. This component controls which audio streams are targeted for processing, as well as informing the TA of which calls are actively in progress, so that they may be selected for processing for passive enrollment and authentication.
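As a rough illustration of the decoding step performed by AASVR, the Python sketch below parses the fixed RTP header defined in RFC 3550 and decodes a G.711 mu-law (PCMU) payload into 16-bit linear PCM. It is not the Audio Collector's implementation: the function names are hypothetical, and little-endian sample order in the output is an assumption.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse the RTP fixed header (RFC 3550, section 5.1) and return the payload."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    offset = 12 + 4 * (b0 & 0x0F)  # skip CC contributing-source identifiers
    if b0 & 0x10:  # X bit: a header extension follows the CSRC list
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", data[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words  # 4-byte extension header + body
    end = len(data)
    if b0 & 0x20:  # P bit: the last byte counts the padding bytes
        end -= data[-1]
    if offset > end:
        raise ValueError("truncated RTP packet")
    return RtpPacket(b1 & 0x7F, seq, ts, ssrc, data[offset:end])


def ulaw_to_pcm16(payload: bytes) -> bytes:
    """Decode G.711 mu-law (PCMU) bytes to 16-bit little-endian linear PCM."""
    samples = []
    for byte in payload:
        u = ~byte & 0xFF  # mu-law bytes are transmitted bit-inverted
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if u & 0x80 else magnitude)
    return struct.pack(f"<{len(samples)}h", *samples)
```

Payload type 0 is PCMU in the standard RTP/AVP profile, which is why the mu-law decoder is the natural companion to the header parser here; other codecs would need their own decoders.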

Network Port Mirroring

To obtain the audio and telephony traffic used for speech authentication, the Audio Collector Server must receive a copy of the telephony network traffic (SIP, RTP, etc.). This is done using Network Port Mirroring (sometimes called SPAN, for Switched Port Analyzer) to direct the replicated traffic to the Audio Collector Server, as shown in the diagram below.

Diagram 2: Connection details for port mirroring to route traffic to Audio Collector

As an alternative to using a managed switch with port mirroring, you may also be able to use a network TAP device to provide similar functionality. Either way, the Audio Collector machine must receive all of the telephony traffic, including the associated audio traffic (typically RTP).

Differences Between SIP and TDM Calls

When working with SIP calls, the Audio Collector can use additional information from the SIP session headers and negotiation. The SIP headers are parsed to determine the Call-ID and the To/From information, and the SDP session description is parsed to determine the RTP audio stream information and the ports in use. This call metadata is used by the CTI Service when selecting calls for inclusion in enrollment or authentication, and when recording call audio to disk.

Specifically, when recording audio to disk, the Call-ID is used to generate a sub-folder that contains call audio files along with a metadata file (named Metadata.csv) containing the various metadata elements, so that the data can be parsed later.
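The following Python sketch shows the kind of parsing described above: it extracts the Call-ID, To and From headers from a SIP message (including the RFC 3261 compact forms i, t and f) and reads the RTP address, audio port and rtpmap entries from the SDP body. It is a simplified illustration, not the Audio Collector's parser; it assumes unfolded headers and a single plain port on each m= line, and the names parse_sip_message and SipCallInfo are hypothetical.

```python
from dataclasses import dataclass, field

# RFC 3261 compact header names for the headers of interest.
COMPACT_NAMES = {"i": "call-id", "t": "to", "f": "from"}


@dataclass
class SipCallInfo:
    call_id: str = ""
    to: str = ""
    from_: str = ""
    rtp_address: str = ""
    rtp_ports: list[int] = field(default_factory=list)
    rtpmap: dict[int, str] = field(default_factory=dict)


def parse_sip_message(raw: str) -> SipCallInfo:
    """Extract call metadata from a SIP message and its SDP body (simplified)."""
    info = SipCallInfo()
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    for line in head.split("\n")[1:]:  # line 0 is the request/status line
        name, sep, value = line.partition(":")
        if not sep:
            continue
        name = name.strip().lower()
        name = COMPACT_NAMES.get(name, name)
        if name == "call-id":
            info.call_id = value.strip()
        elif name == "to":
            info.to = value.strip()
        elif name == "from":
            info.from_ = value.strip()
    for line in body.split("\n"):
        line = line.strip()
        if line.startswith("c=IN "):  # c=IN IP4 <address>
            info.rtp_address = line.split()[2]
        elif line.startswith("m=audio "):  # m=audio <port> RTP/AVP <formats>
            info.rtp_ports.append(int(line.split()[1]))
        elif line.startswith("a=rtpmap:"):  # a=rtpmap:<pt> <encoding>/<rate>
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            info.rtpmap[int(pt)] = encoding
    return info
```

An rtpmap entry such as PCMU/8000 appears to be the same kind of value that is recorded in the Media-Format field of Metadata.csv.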

When working with TDM (for example T1) calls, there are no SIP headers to describe the calls, so metadata about each call must be assigned using the CTI Service interface. Typically the CTI Service connects to a Telephony Adapter (TA), which knows about active calls and can inform the CTI Service (which in turn informs AASRV) about each call of interest, whether it is to be recorded or used for enrollment or verification. The information presented to the CTI Service for each call of interest includes a means of identifying the call (Call-ID) and the RTP port information to monitor for audio; without this information, the Audio Collector cannot tell which ports to listen to.
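To illustrate why the RTP port information matters, here is a minimal, hypothetical Python sketch of a registry that maps the RTP ports announced for each call of interest to its Call-ID, so that mirrored packets can be attributed to a call and packets on unregistered ports ignored. The class names, the purpose values and the choice to match on the UDP destination port are assumptions for illustration; this is not the CTI Service interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallOfInterest:
    call_id: str
    rtp_ports: set[int]
    purpose: str  # illustrative values: "record", "enroll", "verify"


@dataclass
class PortRegistry:
    """Maps the RTP ports announced for each call of interest to its Call-ID."""
    by_port: dict[int, str] = field(default_factory=dict)

    def register(self, call: CallOfInterest) -> None:
        # Check every port first so a conflict leaves the registry unchanged.
        conflicts = {p: self.by_port[p] for p in call.rtp_ports
                     if self.by_port.get(p, call.call_id) != call.call_id}
        if conflicts:
            raise ValueError(f"ports already in use by other calls: {conflicts}")
        for port in call.rtp_ports:
            self.by_port[port] = call.call_id

    def unregister(self, call_id: str) -> None:
        for port in [p for p, c in self.by_port.items() if c == call_id]:
            del self.by_port[port]

    def call_for_packet(self, udp_dst_port: int) -> Optional[str]:
        """Return the Call-ID for a mirrored packet, or None to ignore it."""
        return self.by_port.get(udp_dst_port)
```

Without a registration, call_for_packet returns None, which mirrors the point above: the collector has no way to know which of the many mirrored UDP flows belong to a TDM call.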

The Metadata.csv file, containing various attributes for each call, is stored in a consistent comma-delimited format.

The first line of this file is a header row that names each column (field) of data in the line that follows.

The second line contains the actual metadata for the call associated with the recorded audio, for example:

1,qy979vpe-5,"560" sip:560@,560,"333" sip:333@;tag=fIISJ0cw3,333,1,2018-09-19T08:56:25,PCMU/8000

Here are some additional notes for the various fields:

  • Version: Version number of this field layout, so that the format can be changed in a controlled way in future.
  • Call-ID: Unique call identifier, either taken from the SIP traffic headers or supplied by the CTI Service (in Passive / TDM mode).
  • To: Identity of the callee (the destination of the call).
  • To-Name: Optional display name extracted from the To header.
  • From: Identity of the caller (the source of the call).
  • From-Name: Optional display name extracted from the From header.
  • Inbound: 1 if the call is inbound, or 0 if it is outbound.
  • Timestamp: Date and time in YYYY-MM-DDTHH:MM:SS format.
  • Media-Format: Format and sample rate of the original RTP stream, for example PCMU/8000 (note that all audio is converted to 16-bit linear PCM before being saved).
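A minimal Python sketch for reading a Metadata.csv file follows. It assumes the columns appear in the order of the field notes above, skips the first (header) line, and reads with csv.QUOTE_NONE so that display-name quotes such as "560" are kept as literal text; it also assumes no field contains a comma. The names parse_metadata_csv and CallMetadata are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime

# Assumed column order, taken from the field notes; check it against the
# header row of your own Metadata.csv files.
COLUMNS = ["Version", "Call-ID", "To", "To-Name", "From", "From-Name",
           "Inbound", "Timestamp", "Media-Format"]


@dataclass
class CallMetadata:
    version: int
    call_id: str
    to: str
    to_name: str
    from_: str
    from_name: str
    inbound: bool
    timestamp: datetime
    codec: str
    sample_rate: int


def parse_metadata_csv(text: str) -> list[CallMetadata]:
    # QUOTE_NONE keeps quote characters as ordinary text instead of parsing them.
    rows = [r for r in csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE) if r]
    records = []
    for row in rows[1:]:  # rows[0] is the header line
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row}")
        fields = dict(zip(COLUMNS, row))
        codec, _, rate = fields["Media-Format"].partition("/")
        records.append(CallMetadata(
            version=int(fields["Version"]),
            call_id=fields["Call-ID"],
            to=fields["To"],
            to_name=fields["To-Name"],
            from_=fields["From"],
            from_name=fields["From-Name"],
            inbound=fields["Inbound"] == "1",
            timestamp=datetime.strptime(fields["Timestamp"], "%Y-%m-%dT%H:%M:%S"),
            codec=codec,
            sample_rate=int(rate),
        ))
    return records
```

For the example line above, the parsed record has call_id "qy979vpe-5", inbound True, codec "PCMU" and sample_rate 8000.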

Saved Audio Files

When writing audio files to disk is enabled, the audio files are stored under the root folder specified in the configuration settings. Each recorded audio stream gets its own folder, named after the Call-ID, which is either parsed from the call's SIP headers or provided by a TA.

Within each generated folder, the audio for that call is recorded. The audio is either split into buffered chunk files, as determined by internal buffering (which allows VT Assure to consume the audio more readily), or, if concatenation is enabled, stored as a single concatenated file per audio stream.

These files always contain headerless 16-bit linear PCM audio at 8 kHz, mono. If you save the audio in buffered chunks and want to assemble a complete call, concatenating the headerless chunk files in the correct order produces the same output as the concatenated version. In other words, saving in chunks and saving as a single concatenated file store the same audio on disk.

When audio files are split into chunks, the first chunk is named 000001.dat and the number increments for each new chunk. When concatenation is enabled, the audio is stored in a single file named CallData.dat.
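The reassembly described above can be scripted. This Python sketch, using only the standard library, sorts the numbered chunk files numerically, concatenates them into one raw file, and can then wrap the result in a WAV header so that ordinary audio tools can play it. The helper names are hypothetical, and little-endian sample byte order is an assumption (WAV requires little-endian 16-bit samples).

```python
import wave
from pathlib import Path


def chunk_files(call_folder: Path) -> list[Path]:
    """Return numbered chunk files (000001.dat, 000002.dat, ...) in numeric order."""
    chunks = [p for p in call_folder.glob("*.dat") if p.stem.isdigit()]
    return sorted(chunks, key=lambda p: int(p.stem))  # skips CallData.dat


def concatenate_chunks(call_folder: Path, output: Path) -> int:
    """Join headerless chunks into one raw file; returns the number of bytes written."""
    total = 0
    with output.open("wb") as out:
        for chunk in chunk_files(call_folder):
            data = chunk.read_bytes()
            out.write(data)
            total += len(data)
    return total


def raw_to_wav(raw_file: Path, wav_file: Path, sample_rate: int = 8000) -> None:
    """Wrap headerless 16-bit mono PCM in a WAV header."""
    with wave.open(str(wav_file), "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(raw_file.read_bytes())
```

Sorting by the integer value of the file name, rather than alphabetically, keeps the order correct even if the numbering ever outgrows six digits. Write the output to a name that is not a numbered .dat file so it is not picked up as a chunk.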

In addition to the audio files, each folder (essentially one per call) contains an information file, Metadata.csv, holding the metadata associated with the call. This metadata includes the Call-ID along with other relevant information such as a date/timestamp for the call, which can be useful later for matching the audio with call detail records (CDRs), or for looking up the audio associated with a flagged call.

Note that the Audio Collector does not provide any management or housekeeping functionality to remove these files and folders after they are generated, so this is the responsibility of system administration. Typically, files are moved to date-based locations or deleted once they have been processed, so that the system hard drive does not fill up unexpectedly, which can happen quickly on systems processing many calls.

We recommend scheduling routine cleanup or processing of these files. Also keep the number of folders under any single parent folder manageable, because most operating systems (including Windows) perform poorly when a single folder contains a very large number of files or subfolders.
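One way to implement such a routine is sketched below in Python: it moves per-call folders older than a chosen age into date-based archive folders (archive_root/YYYY-MM-DD/<Call-ID>), or deletes them. Judging age by the folder's last-modified time is an assumption; the call timestamp in Metadata.csv could be used instead. The function name and parameters are hypothetical, and archive_root should lie outside the recording root. The script could be run on a schedule, for example from Windows Task Scheduler.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_old_calls(root: Path, archive_root: Path, max_age_days: float,
                      delete: bool = False, now: Optional[float] = None) -> list[Path]:
    """Move (or delete) per-call folders under root older than max_age_days.

    Age is judged by each folder's last-modified time. Moved folders land in
    archive_root/YYYY-MM-DD/<folder name>, keeping each parent folder small.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    handled = []
    for call_dir in sorted(root.iterdir()):
        if not call_dir.is_dir() or call_dir.resolve() == archive_root.resolve():
            continue
        mtime = call_dir.stat().st_mtime
        if mtime >= cutoff:
            continue  # recent folder; the call may still be in progress
        if delete:
            shutil.rmtree(call_dir)
        else:
            day = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d")
            target = archive_root / day / call_dir.name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(call_dir), str(target))
        handled.append(call_dir)
    return handled
```

Grouping archived calls by day keeps each archive folder to roughly one day's worth of calls, which addresses the large-folder problem described above.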

Installation

The Audio Collector is installed using the provided Windows installation package, which lets you choose where on your server to install it.

Once installed, the Audio Collector components run as Windows services and can be controlled from the Services control panel, like any other Windows service.

Configuration Settings

All three components of the Audio Collector need to be configured. The necessary settings are entered during the GUI-based installation process and usually do not need to be changed manually; however, each configuration file can also be edited by hand. For reference, the configuration files are described below:

After editing any configuration file, you need to restart the service chain. Start the services in this order:

  1. AASRV
  2. AAREC
  3. CTI
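If you script the restart, a Python sketch such as the following could be used. It stops the services in reverse order and then starts them in the documented order; stopping in reverse order is an assumption, as is using AASRV, AAREC and CTI as the Windows service names, so check the actual names in the Services control panel. The net start and net stop commands wait for each service to change state, so each step completes before the next begins. By default the sketch only prints the commands (dry run).

```python
import subprocess

# Documented start order. These Windows service names are assumptions:
# replace them with the names shown in your Services control panel.
START_ORDER = ["AASRV", "AAREC", "CTI"]


def restart_commands(start_order: list[str]) -> list[list[str]]:
    """Stop services in reverse start order, then start them in start order."""
    stops = [["net", "stop", name] for name in reversed(start_order)]
    starts = [["net", "start", name] for name in start_order]
    return stops + starts


def restart_chain(start_order: list[str] = START_ORDER, dry_run: bool = True) -> None:
    for cmd in restart_commands(start_order):
        if dry_run:
            print(" ".join(cmd))
            continue
        # A failed "net stop" (e.g. service already stopped) is tolerated,
        # but a failed "net start" raises CalledProcessError.
        subprocess.run(cmd, check=(cmd[1] == "start"))


if __name__ == "__main__":
    restart_chain(dry_run=True)
```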

Licensing

The Audio Collector must be licensed. Licensing is configured using the standard LumenVox Flexible Licensing, which should have been provided to you.

The License Server can be installed on the same machine as the Audio Collector or accessed remotely, as needed. In either case, set the IP address of your License Server in the configuration settings described above.