Vendor Specific Parameters

Reference Number: AA-01100

As defined in the MRCP specifications, there is a set of headers that allows the client to adjust vendor-specific parameters. These headers may be sent in the SET-PARAMS and GET-PARAMS methods.

The following parameters are LumenVox-specific extensions to the MRCP specification. They can be controlled via the media_server.conf file, located in the config directory of the Windows LumenVox installation folder. By default, this location is C:\Program Files\Lumenvox\config\.

In Linux, edit the media_server.conf file in /etc/lumenvox/.

They may also be set with the appropriate header as part of a RECOGNIZE or SET-PARAMS method; see Specifying Vendor-Specific Properties via MRCP Headers below.

See Configuration Parameters for more information about changing various MRCP parameters.

wind-back-time

The length of audio, in milliseconds, that is wound back at the beginning of voice.

This helps when the onset of speech is weak. The resolution of this parameter is 40 ms, and the value is rounded to the closest multiple of 40 ms. Setting this value to 139 ms is therefore the same as setting it to 120 ms, and setting it to 141 ms is the same as setting it to 160 ms.

Range: >0

See STREAM_PARM_VAD_WIND_BACK in the LumenVox API documentation for more details.

Default: 480
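
The rounding described above can be illustrated with a small Python sketch. This is only an illustration of the arithmetic, not LumenVox code: the function name is hypothetical, and the handling of an exact midpoint (such as 140) and of very small values is an assumption, because the text does not specify it.

```python
RESOLUTION_MS = 40  # resolution stated in the parameter description


def effective_wind_back_ms(requested_ms: int) -> int:
    """Round a requested wind-back-time to the closest multiple of 40 ms.

    Assumption: an exact midpoint (for example 140) rounds up; the
    documentation does not say how ties are handled. In this sketch,
    values below 20 round down to 0.
    """
    if requested_ms <= 0:
        raise ValueError("wind-back-time must be greater than 0")
    half = RESOLUTION_MS // 2
    return ((requested_ms + half) // RESOLUTION_MS) * RESOLUTION_MS


print(effective_wind_back_ms(139))  # 120
print(effective_wind_back_ms(141))  # 160
```

The two printed values match the 139 ms and 141 ms examples given in the description.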

snr-sensitivity-lvl

This setting controls the minimum SNR that streamed audio data must have before it is processed to determine whether it is speech. Data below this threshold is automatically assumed to be silence or noise. The noise estimate for the calculation is obtained from the initial silence specified by STREAM_PARM_VAD_STREAM_INIT_DELAY. The higher the value, the harder it is to barge in. The default value of 50 equals 5 dB SNR. The parameter range is mapped between 3.5 dB and 20 dB. If the application is expected to be in a very noisy environment and speech is not expected to be much louder than the background, this setting may need to be lowered. If speech is expected to be much louder than the surrounding noise, then raising this value allows the VAD to ignore lower-volume background speech or babble noise that may otherwise cause barge-in.

Note that this parameter can be set in the range 0-100, with higher values (closer to 100) being more sensitive to barge-in in noisy situations with low SNR (where speech and background noise are similar).

Range: 0-100

See STREAM_PARM_VAD_SNR_SENSITIVITY in the LumenVox Core API reference documentation for more details. Note that the LumenVox API setting (0 is most sensitive) runs in the opposite direction to the snr-sensitivity-lvl setting (100 is most sensitive). This vendor-specific setting should not be confused with the similar MRCP Sensitivity-Level header, which affects the STREAM_PARM_VAD_VOLUME_SENSITIVITY setting in the API.

Default: 50

vad-stream-init-delay

The length of audio (in milliseconds) that the VAD module uses to estimate the acoustic environment.

Accurate VAD depends on a good estimate of the acoustic environment. The VAD module uses the first few frames of audio to estimate characteristics of the acoustic environment, such as the noise level. The length of this period is defined by this parameter.

Range: >0

See STREAM_PARM_VAD_STREAM_INIT_DELAY in the LumenVox API documentation for more details.

Default: 100

vad-bargein-threshold

VAD speech sensitivity setting.

A higher value makes the VAD more sensitive to speech, which means that the VAD must be very sure the data is speech before it triggers barge-in. Raising the value will reject more false positives and noise. However, it may also mean that some borderline speech is rejected. This value should not be changed from the default without significant tuning and verification.

Range: 0 - 100 (MRCP v1 and MRCP v2)

See STREAM_PARM_VAD_BARGEIN_THRESHOLD in the LumenVox API documentation for more details.

Default: 50
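
As a rough illustration, the ranges listed for the four VAD-related parameters above could be checked on the client side before a value is sent. The following Python sketch assumes integer values; the table and function names are hypothetical and are not part of any LumenVox API.

```python
# Ranges copied from the parameter descriptions above.
# Each entry is (lower bound, upper bound, lower bound is exclusive).
# An upper bound of None means no upper limit is documented.
DOCUMENTED_RANGES = {
    "wind-back-time": (0, None, True),         # Range: >0
    "snr-sensitivity-lvl": (0, 100, False),    # Range: 0-100
    "vad-stream-init-delay": (0, None, True),  # Range: >0
    "vad-bargein-threshold": (0, 100, False),  # Range: 0-100
}


def check_vad_param(name: str, value: int) -> None:
    """Raise ValueError if value is outside the documented range for name."""
    low, high, exclusive_low = DOCUMENTED_RANGES[name]
    too_low = value <= low if exclusive_low else value < low
    too_high = high is not None and value > high
    if too_low or too_high:
        raise ValueError(f"{name}={value} is outside the documented range")


check_vad_param("wind-back-time", 480)        # passes silently
check_vad_param("vad-bargein-threshold", 50)  # passes silently
```

A check like this only catches values the server would presumably reject or clamp; it does not replace tuning against real audio.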

compatibility_mode

Enables compatibility encoding of results.

This option may need to be enabled to match the output of LumenVox decodes with those of other vendors.

Please contact LumenVox support for more specific details.

Default: 0

end-of-speech-timeout

Controls the end-of-speech timeout setting.

This value affects the underlying STREAM_STATUS_END_SPEECH_TIMEOUT of the speech port, which is used in an MRCP ASR recognition session.

After barge-in, the streaming interface will flag STREAM_STATUS_END_SPEECH_TIMEOUT if it did not detect end-of-speech within the time specified by this property. This is different from the end-of-speech delay; STREAM_PARM_END_OF_SPEECH_TIMEOUT represents the total amount of time a caller has to speak after barge-in is detected.

See STREAM_STATUS_END_SPEECH_TIMEOUT in the LumenVox Stream Properties documentation for more details.

Default: -1 (infinite)

Added in 11.0.300

secure_context

Enables suppression of potentially sensitive ASR data.

When enabled, this option will prevent logging of any potentially sensitive data to either log files or callsre data files, which includes any associated audio segments. Where potentially sensitive data would have appeared, the word _SUPPRESSED will replace the potentially sensitive data to indicate that suppression occurred.

Possible Values:

0 - Disabled. Normal logging will be performed
1 - Secure Context mode enabled. Sensitive data will be suppressed

Default: 0

Added in 11.0.300

tts.secure_context

Enables suppression of potentially sensitive TTS data.

When enabled, this option will prevent logging of any potentially sensitive data to either log files or callsre data files, which includes any associated audio segments. Where potentially sensitive data would have appeared, the word _SUPPRESSED will replace the potentially sensitive data to indicate that suppression occurred.

Possible Values:

0 - Disabled. Normal logging will be performed
1 - Secure Context mode enabled. Sensitive data will be suppressed

Default: 0

Added in 15.1.100

enable-sre-logging

Controls the generation of Response Files within a session.

When used, this option allows users to override the default enable_sre_logging setting in media_server.conf, which is used to control the generation of Response Files. This flexible option therefore allows users to independently control whether Response Files are generated for individual sessions. It can even be used to control interactions and requests within a single session so that, for example, only certain recognition requests are recorded, if this is needed.

Note that increasing logging verbosity (using this method, or any other) causes an increase in CPU and disk I/O usage, and should therefore be avoided wherever possible in a production environment where optimal performance is critical.

Possible Values (same as for enable_sre_logging setting):

0 - SAVE_SOUND_FILES_NONE
Deactivates saving of .callsre files (no interactions recorded)
1 - SAVE_SOUND_FILES_BASIC
Saves basic information in .callsre files. This includes the audio from when BARGE_IN occurred up to the END_OF_SPEECH.
2 - SAVE_SOUND_FILES_ADVANCED
In addition to the information stored with the SAVE_SOUND_FILES_BASIC setting, when speech is streamed into the Voice Activity Detection module, all of the collected data prior to a StreamCancel command is saved. This usually happens when a NO_INPUT or TIMEOUT event occurs. This option collects data only when there is something to debug, which offers a balance between disk usage and retaining important information for debugging, and is particularly useful when diagnosing NO-INPUT problems.
3 - SAVE_SOUND_FILE_ALL
Used to collect all streamed data in all cases. This option collects all streamed data, whether there was a NO_INPUT or TIMEOUT event or not, including untrimmed audio from 'good' decodes. This option can be useful in diagnosing barge-in problems as well as other potential issues.

Default: (as specified by enable_sre_logging)

Added in 15.1.100
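
To illustrate the per-request control described above, the following Python sketch names the four documented values and formats a header for a single request. The enum and function names are hypothetical; the com.lumenvox.enable-sre-logging field name and the header format are taken from the section on MRCP headers later in this article.

```python
from enum import IntEnum


class SreLogging(IntEnum):
    """Values documented for enable-sre-logging (member names are illustrative)."""
    NONE = 0      # SAVE_SOUND_FILES_NONE
    BASIC = 1     # SAVE_SOUND_FILES_BASIC
    ADVANCED = 2  # SAVE_SOUND_FILES_ADVANCED
    ALL = 3       # SAVE_SOUND_FILE_ALL


def sre_logging_header(level: SreLogging) -> str:
    """Format a Vendor-Specific header line that overrides the logging level."""
    return f"Vendor-Specific: com.lumenvox.enable-sre-logging={int(level)}"


# Example: record full audio only for a request that is being diagnosed.
print(sre_logging_header(SreLogging.ALL))
```

A client could attach the resulting line to one RECOGNIZE request and send a lower level on the others, so that only the request under investigation produces a Response File.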

callsre-prefix

Allows the addition of a custom string prefix to the beginning of the Response File filename for the current session.

When specified, this option will add the specified prefix to Response Files generated for the current session. This may be useful when identifying certain specific calls, such as those belonging to a certain application or customer-controlled category.

Note that the callsre-prefix and callsre-suffix options are both independent, so can be used individually, together, or not at all, as needed.

Possible Values:

A string containing valid filename characters (avoid reserved characters)

Default:

Added in 15.1.100

callsre-suffix

Allows the addition of a custom string suffix to the end of the Response File filename for the current session.

Similar to callsre-prefix, when specified, this option will add the specified suffix to Response Files generated for the current session. This may be useful when identifying certain specific calls, such as those belonging to a certain application or customer-controlled category.

Note that the callsre-prefix and callsre-suffix options are both independent, so can be used individually, together, or not at all, as needed.

Possible Values:

A string containing valid filename characters (avoid reserved characters)

Default:

Added in 15.1.100
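
The advice to avoid reserved characters can be sketched as a simple client-side check. This is a conservative illustration only: the set of rejected characters is the set reserved in Windows filenames, and rejecting ";" is an assumption based on its use as the delimiter in the Vendor-Specific header. The function name is hypothetical, and the characters LumenVox actually accepts may differ.

```python
# Characters reserved in Windows filenames ("/" is also the Linux path separator).
RESERVED_FILENAME_CHARS = set('<>:"/\\|?*')

# Assumption: ";" separates parameters in the Vendor-Specific header,
# so it is rejected here to avoid an ambiguous header value.
HEADER_DELIMITERS = {";"}


def is_safe_callsre_affix(text: str) -> bool:
    """Return True if text looks safe to use as a callsre prefix or suffix."""
    if not text:
        return False
    for ch in text:
        if ch in RESERVED_FILENAME_CHARS or ch in HEADER_DELIMITERS:
            return False
        if ord(ch) < 32:  # control characters, including CR and LF
            return False
    return True


print(is_safe_callsre_affix("billing_app_"))  # True
print(is_safe_callsre_affix("billing/app"))   # False
```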

logging-verbosity

Controls the logging verbosity within the current session.

When used, this option allows users to override the default LOGGING_VERBOSITY setting in client_property.conf, which is used to control the verbosity of log messages. This flexible option therefore allows users to independently control the amount of logging generated for individual sessions. It can even be used to control interactions and requests within a single session so that, for example, only certain recognition requests are logged with specified verbosity, if this is needed.

Possible Values (same as LOGGING_VERBOSITY setting):

1 - Minimal Logging - only errors and critical issues
2 - Medium Logging - all non-debug information as events occur
3 - Debug Logging - all types of events, including information and debugging activity
4 and higher - Higher levels of logging verbosity, which are typically useful only to LumenVox.

Default: (as specified by LOGGING_VERBOSITY setting)

Added in 18.0.400

sticky-save-waveform

Allows the platform's default SAVE-WAVEFORM setting to be overridden.

Possible Values:

True - Regardless of the save-waveform header value, the save-waveform option will be set to true for the remainder of the MRCP session.

False - Regardless of the save-waveform header value, the save-waveform option will be set to false for the remainder of the MRCP session.

Default:

Specifying Vendor-Specific Properties via MRCP Headers

As mentioned previously, you may specify the above parameters in an MRCP header. You must use the following format. Note that a semicolon (";") is used as the delimiter:

Vendor-Specific: com.lumenvox.wind-back-time=300;com.lumenvox.vad-stream-init-delay=200

This header field may be specified in a RECOGNIZE, recognizer SET-PARAMS, or synthesizer SET-PARAMS method during an MRCP session. The following header field names may be used:

com.lumenvox.wind-back-time
com.lumenvox.snr-sensitivity-lvl
com.lumenvox.vad-stream-init-delay
com.lumenvox.vad-bargein-threshold
com.lumenvox.compatibility-mode
com.lumenvox.end-of-speech-timeout
com.lumenvox.secure_context
com.lumenvox.tts.secure_context
com.lumenvox.enable-sre-logging
com.lumenvox.callsre-prefix
com.lumenvox.callsre-suffix
com.lumenvox.logging-verbosity
com.lumenvox.sticky-save-waveform
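
The header format described above can be assembled and read back with a few lines of Python. This is a minimal sketch with hypothetical helper names; it only formats and splits the header line and does not send anything over MRCP.

```python
PREFIX = "com.lumenvox."


def build_vendor_specific(params: dict) -> str:
    """Join name/value pairs into one Vendor-Specific header line.

    Names are given without the com.lumenvox. prefix, for example
    "wind-back-time". Pairs are separated by a semicolon.
    """
    pairs = [f"{PREFIX}{name}={value}" for name, value in params.items()]
    return "Vendor-Specific: " + ";".join(pairs)


def parse_vendor_specific(header_line: str) -> dict:
    """Split a Vendor-Specific header line back into a name-to-value dict."""
    _, _, value = header_line.partition(":")
    result = {}
    for pair in value.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        name, _, val = pair.partition("=")
        name = name.strip()
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        result[name] = val.strip()
    return result


line = build_vendor_specific({"wind-back-time": 300, "vad-stream-init-delay": 200})
print(line)
print(parse_vendor_specific(line))
```

The first printed line reproduces the example header shown earlier in this section; the parsed values come back as strings, so a caller would convert them to integers where needed.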
