Reference Number: AA-01461
14.2.100 (Apr 11, 2016):
Improvements and New Features:
- Added new TTS1 Polish voices Jacek (Male), Agnieszka (Female), Ewa (Female) and US English voice Justin (Male Child).
- Support for these new TTS voices was added to the Speech Tuner auto-complete functionality within the TTS Editor.
- Added new client_property.conf settings options to the Media Server configuration page in the Dashboard. This now allows users to modify all Media Server and underlying Client Property configuration settings from within the Dashboard. These new settings are conveniently shown below the existing Media Server options, in the "Client Property Configuration" section on the page.
- The configuration settings view within the Dashboard now also has additional setting information available as expandable dropdowns beside certain settings where needed. This provides users more verbose information about settings, offering further assistance when making configuration changes.
- Further improved the Dashboard configuration page by compressing the underlying code, making it significantly smaller and therefore faster when using this page.
- Changed the Dashboard to allow the Manager Service to be restarted if necessary. This restarting option is now available on the Summary and Configuration pages.
- Added a new option to display the last successful License Synchronization date when viewing the Licensing Page within the Dashboard. This option is only visible when using the Flexible Licensing mechanism, and may be helpful when determining whether successful synchronization with the LumenVox Licensing Nodes in the cloud is occurring.
- Added a new 'Class' column to the Interaction List displayed within the Speech Tuner. This contains the Accuracy Classification of each interaction, based on the currently selected Confidence Threshold. The currently selected Confidence Threshold can be changed by clicking on one of the Confidence Histograms. Once a Confidence Threshold is selected, these classifications show how each interaction would be classified if the selected Threshold was used, in terms of In Grammar (IG), Out Of Grammar (OOG) as well as Correct-Accept (CA), False-Accept (FA) and so on, from an accuracy classification perspective. This information will now also be exported if users select the "Save List To File..." option for the Interaction List, and may therefore be helpful when compiling external accuracy reports.
- Added new builtin:grammar/boolean support for the Brazilian Portuguese ASR language.
- The Speech Tuner now allows users to edit the settings for their selected License Server within the application configuration dialog instead of needing to edit the corresponding client_property.conf file on disk as was previously needed. Any changes made within the Speech Tuner dialog are permanently saved to the corresponding LICENSE_SERVERS setting of the conf file on disk.
- Modified the Speech Tuner export audio functionality to add a new option of utilizing any Logging-Tag entries derived from imported Response Files when naming the saved audio files to disk. If selected, the corresponding Logging-Tag (if present) will be used as the filename prefix where appropriate. If no appropriate Logging-Tag is detected in this mode, the specified default prefix string will instead be used.
- Modified some Error Code messages to return ASR in place of the previously used SRE references as part of our ongoing branding updates to standardize on the use of the more familiar ASR term.
- Changed the wording of the MRCP diagnostic report results on the Diagnostic Page of the Dashboard. These changes correct minor typos and also clarify the meaning of certain test failure conditions.
- Modified the License Server's Configuration Page within the Dashboard to expose the FLEX_NODE_LIST configuration option that was previously not present.
- Fixed incorrect behavior of special NULL rule handling when used in certain optional parameter conditions due to incorrect weighting being applied. This is unlikely to have affected many customers and was only discovered during internal aggressive testing.
- Fixed some minor typos in the Dashboard's Diagnostic page within the diagnostic report.
- Fixed a bug in the Dashboard's Diagnostic testing algorithm that prevented the correct TTS language from being selected when running certain tests. Previously a default value of en-US language was selected when this may not have been appropriate and caused some confusion when no American English voices/licenses were available.
- Fixed an unusual bug related to Flexible Licensing Tier selection, which appears to have affected a single customer, where transitions from Tier2 to Tier3 licenses were prevented in very specific conditions based on the grammar vocabulary size.
- Fix for SSML processing of say-as / interpret-as "telephone", "phone", "vxml:phone" and "vxml:time" elements, where carriage-return and tab characters within the element were being unexpectedly read aloud.
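
The 'Class' column added to the Speech Tuner Interaction List in this release can be illustrated with a minimal sketch. The function below is hypothetical and is not the Speech Tuner's implementation. It assumes an interaction is accepted when its confidence meets or exceeds the selected Confidence Threshold. It also assumes the labels Correct-Reject (CR) and False-Reject (FR) cover the cases the notes refer to as "and so on".

```python
def classify_interaction(in_grammar: bool, correct: bool,
                         confidence: float, threshold: float) -> str:
    """Return a label such as 'IG/CA' for one interaction.

    Assumption: acceptance means confidence >= threshold. The CR and FR
    labels are assumed names for the rejected cases.
    """
    grammar_class = "IG" if in_grammar else "OOG"
    accepted = confidence >= threshold
    if accepted:
        # An accepted out-of-grammar utterance can never be a correct accept.
        decision = "CA" if (in_grammar and correct) else "FA"
    else:
        # Rejecting an out-of-grammar utterance is the desired outcome.
        decision = "FR" if in_grammar else "CR"
    return f"{grammar_class}/{decision}"


if __name__ == "__main__":
    # Hypothetical interactions: (in_grammar, correct, confidence)
    for row in [(True, True, 800), (True, False, 650), (False, False, 300)]:
        print(row, classify_interaction(*row, threshold=500))
```

Raising or lowering the threshold argument shows how the same interactions move between accepted and rejected classes, which is the effect of clicking a different point on a Confidence Histogram.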
14.1.300 (Feb 16, 2016):
Improvements and New Features:
- Updated the LumenVox End User License Agreement (EULA) documentation accompanying all products to the latest version.
- Changed the Speech Tuner Matching Files Update Dialog to be less sensitive. Now the list of possible grammars presented excludes any grammars that do not match the prior text of the grammar being modified.
- Removed a minor security loophole within the Dashboard that could allow system monitoring data to be exposed to non-authenticated users via the Dashboard's Monitoring tab.
- Small cosmetic change to the Licensing page of the Dashboard to correctly show the license tree expansion after certain operations such as "Resync Info".
- Fixed a minor bug in the Speech Tuner, introduced as part of the 14.1.100 changes relating to the modification of a root grammar within a callsre file, where the root status was incorrectly removed. Speech Tuner users should upgrade to this version from 14.1.100 to take advantage of this change and avoid the issue. This issue only occurs when editing root grammars in the version 14.1.100 Speech Tuner.
- Fixed a Media Server issue where Content-ID is specified with angle brackets during DEFINE-GRAMMAR (i.e. 'session: <email@example.com>') and later used during a RECOGNIZE request without the brackets (i.e. 'session: firstname.lastname@example.org'). This appears to have affected a small number of users, so modifications have been made to accept this formatting.
- Fixed a small typo with the recently introduced Diagnostics page of the Dashboard when an MRCPv2 synthesis failure is reported. Previously, the RTSP port was being shown in the error string instead of the SIP port that was actually used.
- Fixed a reported issue relating to occasional Media Server exceptions occurring while streaming TTS audio. This appears to have been due to an unusual buffering issue that this change addresses.
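
The Content-ID fix noted in this release amounts to treating the bracketed and unbracketed forms of a session reference as the same grammar. The sketch below shows one way such normalization could work. It is an illustration only, not the Media Server's code, and the function name and sample identifier are hypothetical.

```python
def normalize_session_uri(uri: str) -> str:
    """Return a session: reference with optional angle brackets removed.

    Hypothetical helper: anything that is not a session: reference is
    returned unchanged apart from surrounding whitespace.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.strip().lower() != "session":
        return uri.strip()
    content_id = rest.strip()
    # Strip one matching pair of angle brackets, if present.
    if content_id.startswith("<") and content_id.endswith(">"):
        content_id = content_id[1:-1].strip()
    return f"session:{content_id}"


if __name__ == "__main__":
    defined = normalize_session_uri("session: <grammar-1@example.com>")
    used = normalize_session_uri("session: grammar-1@example.com")
    print(defined == used)
```

Comparing normalized values on both the DEFINE-GRAMMAR and RECOGNIZE paths means either formatting resolves to the same stored grammar.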
14.1.100 (Jan 11, 2016):
Improvements and New Features:
- A new diagnostics tab has been added to the Dashboard utility. This can be used to check the system configuration and test functionality. Options are also available to globally change logging verbosity to debug or default levels as needed, or clear all LumenVox logs. An option to send a diagnostic report to LumenVox, or save it to disk is also included.
- Dashboard now includes options to disable monitoring certain LumenVox services on the local machine, as needed. This allows users to prevent indicating issues with services they do not intend to use. One example of this is that the seldom-used Call Indexer Service is now unmonitored by default. These new options coincide with the new diagnostic tests that are now available.
- Dashboard configuration screen now includes Basic, Intermediate and Advanced filtering options, allowing new or occasional users better clarity as to which settings should be altered. Now that there are a lot of possible options for some of the LumenVox services, this should make it easier for users to manage.
- The Summary page on the Dashboard now optionally shows the IP addresses of remote LumenVox services that may be configured.
- Many improvements added to the lv_show_config utility, including an option that displays the current values as well as the default values for all configuration settings reported. Previously only the current setting was shown, which did not clearly indicate which settings were at their defaults.
- Additional improvements to the lv_show_config utility were added to better support mixed language installations. Now users can specify the ASR and TTS languages they wish to test (asr_lang and tts_lang options respectively), instead of the utility defaulting to American English.
- Along with other significant changes to the lv_show_config utility, results being returned were updated to be more verbose and clearer to users whenever an issue is detected. For example, where a generic failure message was previously shown, the output now indicates the specific cause, such as a missing license or language pack.
- The lv_show_config utility was also updated to automatically skip over SIP or RTSP testing if those ports are marked as disabled (set to 0 in media_server.conf). This change removes user confusion while running this utility.
- Added new builtin:grammar/digits grammars for Australian English, British English, Indian English, Colombian Spanish, Canadian French and Brazilian Portuguese. These optional grammars were previously not provided.
- Added a new process status checking WebAPI to the manager server allowing users to check the status of LumenVox processes by calling /apt/status/asr etc. and receiving a corresponding HTML page response. This may be of particular interest to those users that have requested something that they could use to integrate to their monitoring systems, such as SNMP or other management tools, as used by many datacenters.
- Added a new LOGGING_VERBOSITY configuration option to manager.conf, allowing the manager service to adjust logging verbosity, in a similar way to our other services. This, in conjunction with some logging cleanup results in easier to read manager_app.txt files.
- The Manager Service will now automatically skip attempting to clean up log files that are marked as read-only. This only affects logging reports, since previous versions would report attempts to clean up such files.
- Configuration files for License Server, TTS Server and ASR Server are now correctly located and reported by lv_show_config. Previously on Windows-only installations this was omitted since these files could be installed anywhere on the system as determined by the user. Linux machines are unaffected - the configuration files are always located in a fixed location.
- Manager now reports if unable to control services due to permission issues. Previously, such controls were disabled with no indication as to the reason.
- The "Phonetic Speller" and "Random Sentence Generator" dialogs in the Speech Tuner now automatically scale correctly if Windows users configure non-default screen resolution font settings.
- The Media Server now processes the Speech-Language header for TTS SPEAK requests, which allows text/plain requests to specify and use the user's desired language. This is only needed if not using SSML markup - if SSML markup is used (the most common usage), this header value will be ignored. Previously when using text/plain TTS requests, these would only use the default (SYNTHESIS_LANGUAGE) specified in client_property.conf.
- Changed the Recognition-Timeout characteristics within the Media Server. Previously whenever this timer expired, the recognition was terminated with a Recognition-Complete and a completion cause of Recognition-Timeout. We found that it is possible to improve performance by attempting a decode at this time if sufficient audio has been received and a valid result is achievable. If insufficient audio is available, or the decode yields no result, the response will be the same as before, however if a valid response is available, it will be returned instead. This may dramatically help users encountering barge-out issues related to certain noisy conditions.
- Implemented a minor change to unify our third-party library notifications. These are now all contained in a single ThirdPartyLegalNotices.txt file distributed with all of our packages.
- Added option to configure LICENSE_SERVERS setting (within client_property.conf) for Media Server in the Dashboard.
- Fixed a number of minor bugs relating to MRCP connectivity problems when running the SimpleMRCPClient utility.
- Removed a bug in the manager configuration settings related to Access Control List (ACL), which prevented the use of the + and - modifiers. This is now corrected, allowing the correct ACL attributes to be specified per our documentation.
- Fixed a significant issue in the speech tuner relating to incorrectly resolved grammar references when a particular combination of parent-child-grandchild grammars are used and multiple slightly different instances of these relatives exist, with or without references to custom lexicons. This problem only affected a very small number of Speech Tuner users, but the impact was significant, where grammar references were not correct. The internal structure and processing of these grammar references was completely redone to address this issue. Now if sibling grammars are edited within the Speech Tuner, users will be presented a dialog, indicating the affected references, and given a choice of whether to modify those references, or only the current one.
- Fixed a client-side grammar caching issue, where duplicate entries could be spawned, causing unnecessary duplicate copies to be saved into response files. This issue is unlikely to have been noticed by users, however this behavior used more disk space and memory than was needed, and is therefore less optimal. We encourage users to upgrade to this version to make use of this change.
- Fixed a minor manager startup issue, which may have resulted in incorrect log reports of problems accessing the ADMIN_PORT.
- Fixed an issue in the Media Server when processing a BARGE-IN-DETECTED message from an RTSP client, when no active SPEAK request was being processed. Previously, the acknowledgement response to this notification did not have the required Content-Type header and may have caused some platforms to incorrectly process this response. This incorrect behavior has been present since version 10.0 (February 2011).
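
The revised Recognition-Timeout behavior described in this release can be summarized as a small decision function. This is a simplified sketch, not the Media Server's implementation. The names `audio_ms`, `min_audio_ms` and `decode` are hypothetical, and the amount of audio the Media Server considers "sufficient" is not stated in these notes.

```python
from typing import Callable, Optional, Tuple


def on_recognition_timeout(audio_ms: int, min_audio_ms: int,
                           decode: Callable[[], Optional[str]]
                           ) -> Tuple[str, Optional[str]]:
    """Decide what to return when the Recognition-Timeout timer expires.

    Returns (completion_cause, result). The cause strings are labels for
    this sketch only.
    """
    if audio_ms < min_audio_ms:
        # Insufficient audio: same outcome as before the change.
        return ("recognition-timeout", None)
    result = decode()  # attempt a decode on the audio received so far
    if result is None:
        # The decode yielded no result: same outcome as before the change.
        return ("recognition-timeout", None)
    # A valid result is available, so it is returned instead.
    return ("success", result)


if __name__ == "__main__":
    print(on_recognition_timeout(3000, 500, lambda: "yes"))
    print(on_recognition_timeout(100, 500, lambda: "yes"))
```

Only the last branch is new behavior. Both early returns reproduce the earlier response of a Recognition-Complete with a Recognition-Timeout cause.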
14.0.100 (Sept 25, 2015):
Improvements and New Features:
- Added support for Brazilian Portuguese (pt-BR) ASR language.
- Added a new mechanism to report warnings detected during grammar loading. These are issues that are insufficient to trigger a grammar loading error, so may have previously gone unnoticed (despite appearing in the log files). This new mechanism introduces three new API functions exposing this functionality: LV_SRE_ReturnGrammarNumWarnings, LV_SRE_ReturnGrammarFirstWarningStringLength and LV_SRE_ReturnGrammarFirstWarningString.
- Added support for new Brazilian Portuguese ASR language to the Speech Tuner as well as the new grammar warning mechanism. Now any syntax or similar warnings generated when loading grammars will be displayed to the Speech Tuner users.
- Modified the Phonetic Speller in the Speech Tuner to correctly process multi-word phrases. Previously these multi-word phrases were being processed, however the method used was not identical to that used by the ASR during grammar compilation, so may have yielded slightly different results.
- Modified the way the Speech Tuner lists grammar references. Previously, the complete grammar URIs were displayed, which often obscured the filename portion of the reference. Now, users can choose between displaying the entire URIs or the filename portions. Both can always be displayed by viewing the properties of a selected grammar, but displaying only the shortened version may be easier for many users. This setting can be toggled as desired from a new "Show Grammar Paths" option in either the View menu, or the grammar list context menu.
- Changed the Speech Tuner license checking behavior. Now if the license date has expired, a message to that effect will be displayed prior to exiting. Note that using the Speech Tuner requires an annual subscription license that must be renewed each year.
- Minor enhancement to the Licensing Tab in the Dashboard to include Expand-All and Collapse-All options to allow users more convenient display options for the license tree.
- Fixed a Media Server issue that caused an exception when processing SIP UPDATE requests from unknown or previously closed sessions.
- Changed the Media Server to re-implement SIP responses when there is no message-body in the message (such as a response from SET-PARAMS). A change in 13.1.100 affected the MRCPv2 processing for Avaya (and possibly other platforms), preventing correct operation. This change now ensures compatible responses are returned.
13.1.100 (July 23, 2015):
Improvements and New Features:
- Added support for Linux RHEL 7 / CentOS 7 Operating Systems, and deprecated support for Windows 2003.
- Added sysvstatus option to Linux init.d scripts to be consistent across supported Linux platforms with the introduction of RHEL 7/CentOS 7 support. This small change was needed since the semantics of status changed due to the use of systemd.
- Added new Portuguese (pt-PT) Female 'Catina' TTS voice and new Russian (ru-RU) Male 'Vasili' voice.
- Migrated Windows development to Visual Studio 2013 from VS2008 in previous versions. This means that the runtime libraries we are built against are now using the VS2013 versions, and our sample projects for Windows have also been updated to reflect this change. Along with our deprecated support for Windows XP in 13.0 and now Windows 2003 in 13.1, we can use more optimal coding algorithms, some of which are being introduced in this release. Most customers should not notice any significant difference in this migration.
- Added support for seldom used 'OPTIONS' request (for ASR and TTS) in RTSP protocol within the Media Server.
- Rebuilt TTS1 acoustic models, introducing a number of overall enhancements, improved currency interpretation and pronunciation of numerous words, including proper names and expansion of abbreviations as well as road numbers. Improved apostrophe handling and unit expansion and time/date ranges. Improved American address expression, state names and contractions.
- Added TTS1 processing for radio stations and non-native characters (such as ê). Improved pronunciation of hyphenated compounds (e.g. pay-as-you-go) and improved homograph disambiguation. Improved reading of serial numbers, chemical elements and NCAA team names among other pronunciation improvements.
- LVShowConfig -tts_test will now succeed even if the default TTS voice is not installed or licensed. As long as at least one voice can be used, the test will complete successfully. This addresses unusual issues where users had not updated the default voice setting, typically when using a non en-US voice or language.
- Rebuilt the ASR acoustic models to provide a slight improvement in overall accuracy.
- Modified the internal architecture of the Media Server to run more efficiently using fewer threads, which should allow it to scale to even higher port densities than before. These changes also include a number of memory management optimizations to reduce memory fragmentation of extremely heavily loaded installations.
- Modified the Windows installation to set the Media Server service to start automatically whenever the system reboots. Previously this was assigned as a manual start, since not all users utilize this service, however we have changed this setting to be more accommodating to the MRCP / Media Server users. Any users that do not utilize the Media Server can reduce the overhead of having that service running by disabling it in the Windows Services Manager. See our knowledge base article on Windows Installation for more details on how to do this. Note that on Linux systems, this automatic startup option was already in place for the Media Server.
- Changed the response to requests in RTSP protocol within the Media Server to return a simpler "404 Not Found" instead of the previous "486 Busy Here" when processing DESCRIBE or SETUP requests. This new response is more optimal and more consistent with the MRCP standard. If any user wrote code to expect the "486 Busy Here", we recommend changing to anticipate either, or use the normalized "404 Not Found" now being returned. In the interim, users can add and enable the hidden 'send_486_replies' flag in the MRCPv1 section of media_server.conf to continue receiving the "486 Busy Here" responses (this is for backward compatibility only).
- Changed the behavior of the Media Server when RECOGNITION-START-TIMERS requests are received whenever the recognizer is not in a RECOGNIZING state. Such requests will now receive a "402 Method not valid in this state" error response, where previously the request could have been accepted and the timers started. This new functionality is more consistent with the MRCP specifications. Other timers also now check the current recognizer state prior to being processed (such as DTMF activity), which is a more robust overall implementation.
- The ECMAScript engine used for processing Semantic Interpretation data was updated to use the latest Spider Monkey version 31 library. This was also modified to reduce dependence on system installed libraries which have caused occasional problems in the past.
- Various optimizations and improvements were made to SimpleMRCPClient to minimize stalling or errant packet processing.
- Added 'Content-Type' for legacy RTSP 486 replies from Media Server. This header was previously missing.
- Changed MultiThreadedStreamingExample (C and C++) examples to print out the parameters used to the console, giving users a little more detail about which audio and grammar files and number of threads are being used.
- Removed the RTSP idle timeout by default from the Media Server. This was previously added to address an issue with certain client applications opening many ports and then abandoning them, which would use up unnecessary resources. This new default, although a change in behavior, is unlikely to affect any systems. If someone wishes to re-enable this non-standard functionality, setting the enable_rtsp_idle_timeout to 1 in the media_server.conf settings will enable the old behavior. This option is retained for backwards compatibility only, and its use should be avoided due to the (small) additional overhead it introduces. When enabled, this timer will trigger an RTSP connection shutdown if no requests are received within 10 seconds of the connection being established.
- Add support for file://C:/blah URIs (with two slashes following the file:). These are now being treated as if they were file:///C:/blah URIs (three slashes following the file:). The URI with two slashes is technically invalid, but to remain consistent with other vendors which support these non-standard URI formats, we have added this support. Please see our URIs and LumenVox knowledge base article for more information relating to supported URI formats, or https://en.wikipedia.org/wiki/File_URI_scheme for more explicit details and examples of valid file URI schemes.
- Fixed a problem with TTS synthesis where an ambiguous language 'en' specification was failing to utilize the correct language per our published Voice Language Selection Priority list. Now any partially specified language identifiers will follow the correct order of precedence.
- Fixed the TTS language/voice ordering for Gwendolyn. As a female, she should precede any male voice in that language. Previously this voice preceded Gavin, which was inconsistent with our order of precedence rules, so this has been changed. Again, these rules only apply if an ambiguous language, gender or voice-name selection is made, so this change is unlikely to impact many users.
- Fixed the TTS language/voice ordering for European Portuguese to have precedence over Brazilian Portuguese.
- Fixed an unusual problem with the ASR that was throwing an exception when a certain custom lexicon was being used. The problem was isolated to the 13.0 version of LumenVox and also specific to RHEL6 Linux.
- Fixed a Speech Tuner issue where Call data was being incorrectly shown on the Call Volume graph in the Summary view. The number of Calls shown was significantly lower than the correct values. This issue was a regression introduced in the version 13.0 release.
- Fixed an issue with SSML <say-as> telephone number processing for TTS1 voices. Previously the cadence of the numbers spoken was not correct. Now numbers are correctly grouped as expected and any following extension number or abbreviation is correctly pronounced.
- Fixed a TTS stalling issue where a .wav file with a "LIST" chunk is specified as the reference. Additional (unsupported) chunk types are now ignored as needed to prevent such stalls in future.
- Fixed a logging bug in LVShowConfig when logging an MRCPv2 failure. It would incorrectly report the RTSP port instead of the SIP port in the failure log which could lead to some user confusion.
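
The file:// URI change in this release (two slashes treated as three when a Windows drive letter follows) can be sketched as a small normalization step. This is an illustration of the described behavior, not LumenVox code, and the function name is hypothetical.

```python
import re

# Matches "file://" immediately followed by a drive letter, e.g. file://C:/
_TWO_SLASH_DRIVE = re.compile(r"^file://([A-Za-z]:[/\\])")


def normalize_file_uri(uri: str) -> str:
    """Rewrite file://C:/... as file:///C:/... and leave other URIs alone."""
    return _TWO_SLASH_DRIVE.sub(r"file:///\1", uri, count=1)


if __name__ == "__main__":
    for uri in ("file://C:/blah", "file:///C:/blah", "http://example.com/a"):
        print(uri, "->", normalize_file_uri(uri))
```

Already valid three-slash URIs and non-file URIs pass through unchanged, so the step is safe to apply to every incoming reference.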
13.0.400 (May 22, 2015):
Improvements and New Features:
This is a maintenance release which addresses the issues listed below, and should be considered a recommended update for users who may encounter these specific problems. No other functionality was changed, so there is no need to update to this version if you are unaffected by these changes.
- Correction for a regression introduced in 13.0.100 for TTS playback artifacts affecting second and subsequent SPEAK requests within active sessions. Due to a recent optimization, utilizing an audio buffer, up to 2 seconds of audio artifact from an earlier canceled synthesis stream could be played prior to the subsequent (expected) stream being played. This affects MRCPv1 and MRCPv2 sessions where TTS playback was canceled due to a STOP request or a BARGE-IN-OCCURRED request.
- Corrected grammar URI parsing for MRCP grammar URIs that contain more than one parameter in the URI string. Previously only parameters separated using ampersands were being parsed correctly, and parameters separated using semicolons were being pruned and ignored. The correct behavior of allowing either parameter separator is now implemented. This issue was isolated to the Media Server MRCP processing, and did not affect C/C++ API users.
- A correction was made to ASR lexicon URI processing, which was a regression introduced with version 13.0.100. Specifically, lexicon URIs that were pointing to files on the local file system and had optional parameters, e.g. type=backup, were not being parsed correctly, causing this optional parameter to be misinterpreted and cause a grammar load failure to be emitted. This affected both absolute URIs using the file:// prefix and relative URIs from grammars loaded from the local file system. Web-based URIs, e.g. the ones using the http:// prefix, were not affected. These issues have now been resolved.
- Corrected a Speech Tuner licensing issue when using the new Flexible licensing mechanism. Specifically, if the License Server port number was present in the configuration IP address string, this was causing connectivity issues, which prevented the Speech Tuner from starting correctly.
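
The grammar URI parameter correction in this release, which accepts either ampersands or semicolons as separators, can be illustrated with a short sketch. It assumes the parameters are carried after a `?` in the URI. The parameter names in the example are hypothetical, and this is not the Media Server's parser.

```python
import re
from urllib.parse import unquote


def parse_uri_parameters(uri: str) -> dict:
    """Return URI parameters, accepting '&' or ';' as the separator."""
    _, found, query = uri.partition("?")
    params = {}
    if not found:
        return params
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # tolerate doubled or trailing separators
        name, _, value = pair.partition("=")
        params[unquote(name)] = unquote(value)
    return params


if __name__ == "__main__":
    print(parse_uri_parameters("http://example.com/menu.grxml?alpha=1&beta=2"))
    print(parse_uri_parameters("http://example.com/menu.grxml?alpha=1;beta=2"))
```

Both calls yield the same dictionary, which is the behavior the correction describes: neither separator style causes parameters to be dropped.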
13.0.300 (April 9, 2015):
Improvements and New Features:
This is a maintenance release which addresses the two issues listed below, and should be considered a recommended update for users who may encounter these specific Media Server connectivity problems. No other functionality was changed, so there is no need to update to this version if you are unaffected by these changes.
- Changed the ordering of media lines (m=xxx) in SDP responses to SIP INVITE packets to match the order represented in the original requests. Previously the LumenVox Media Server responded with the application and audio media lines in a fixed order, which caused issues on a specific IVR platform, so this change means the Media Server will now respond with media lines listed in the same order as they appear in the original request. This change makes the LumenVox Media Server more compliant with the IETF RFC3388 specification describing Grouping of Media Lines.
- Added support for MRCP sequence number 0, which was previously being treated as an erroneous / invalid value. This change allows platforms to make requests using Request-ID 0 in both MRCPv1 and MRCPv2, and resolves a related problem seen with a specific IVR platform during initial integration testing. This addresses an ambiguous definition in the IETF MRCPv2 draft, and the Media Server now supports the "0" MRCP request-id value for both ASR and TTS requests.
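
The media line ordering change in this release can be sketched as follows: read the order of the m= lines from the SDP offer, then emit the answer's media sections in that same order. This is a simplified illustration, not the Media Server's code. It assumes each media type appears at most once in the offer, and the port numbers and attributes shown are hypothetical.

```python
def media_types(sdp: str) -> list:
    """Return the media types of the m= lines in the order they appear."""
    return [line[2:].split()[0]
            for line in sdp.splitlines() if line.startswith("m=")]


def order_answer_sections(offer_sdp: str, answer_sections: dict) -> list:
    """Arrange answer sections (media type -> SDP lines) in offer order."""
    ordered = []
    for media in media_types(offer_sdp):
        ordered.extend(answer_sections[media])
    return ordered


if __name__ == "__main__":
    offer = ("v=0\r\n"
             "m=audio 4000 RTP/AVP 0\r\n"
             "m=application 9 TCP/MRCPv2 1\r\n")
    answer = {
        "application": ["m=application 6075 TCP/MRCPv2 1", "a=setup:passive"],
        "audio": ["m=audio 5000 RTP/AVP 0", "a=recvonly"],
    }
    print("\r\n".join(order_answer_sections(offer, answer)))
```

With this offer the audio section is emitted first. Swapping the two m= lines in the offer swaps them in the answer, rather than always using a fixed order.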
13.0.100 (January 6, 2015):
Improvements and New Features:
- Added support for Windows 8 and Windows Server 2012 Operating Systems, and deprecated support for Windows XP.
- Added new Norwegian (nb-NO) Female 'Mathilde' TTS voice. Also included support for this new voice in the Speech Tuner.
- New SSML processing warning reports are implemented. Previously whenever SSML issues were encountered during TTS synthesis, if some output was possible, the synthesis result would simply be returned with no indication of possible syntax errors in the request. Now additional diagnostic warnings can be used to determine whether a referenced file was not found, or some minor syntax error was skipped over. Increasing the logging verbosity level at the client and server will show an increasing amount of detail when processing these warnings.
- Added new C API functions to access the new SSML warnings that are now available following a TTS synthesis:
- Added new C++ API functions to access the new SSML warnings that are now available following a TTS synthesis:
- Added a new warning tab to the Speech Tuner SSML Editor view to indicate the presence of any SSML warnings that were detected during synthesis.
- Added new acoustic models complete with advanced pronunciation generation algorithms for all supported ASR languages. Users of grammars that contain words already in our comprehensive lexicons will not really notice any difference, but grammars containing out-of-lexicon words and phrases should benefit from these changes. Such grammars and applications may include people and place names, foreign or unusual words.
- A new and improved Canadian French acoustic model was produced that should perform slightly better than previous versions.
- As part of our pronunciation enhancements in this version, we have extended our inline phonetic support and also clarified and improved our ASR custom lexicon support.
- In addition to our new pronunciation enhancements, this version also includes a significant improvement to confidence scoring performance for all supported ASR languages. Significantly lower confidence scores will be seen for Out-of-Grammar and Incorrect utterances. There is also better separation between the correct and incorrectly recognized utterances making it easier for the application developer to pick good confidence thresholds or ranges for confirmation prompts. Our internal testing shows an average relative improvement of over 10% in application performance.
- Added new command line argument -ip to SimpleTTSClient, allowing a specific TTS server's IP address to be specified.
- Improved how the Speech Tuner processes and stores grammars and references within its sandbox area. This resolves odd referenced grammar and lexicon issues where those references were n-levels deep. This issue only affected the Speech Tuner's sandbox references.
- Added new 'Close Tuner Database' menu option in the Speech Tuner to make it clearer to new users how to close the current database. This performs the same operation as 'New Tuner Database' but may be more intuitive.
- Speech Tuner now allows the use of middle-mouse button to close selected grammar and SSML tabs when using those editors.
- Speech Tuner now allows double-clicking selected grammar and SSML documents to open them in the appropriate editor.
- When saving Speech Tuner files in situations where filters are active, users are now prompted to choose whether to save all interactions and remember the active filters, or to save only the unfiltered interactions.
- Speech Tuner now allows bulk assignment / removal of the OOC marker for selected interactions.
- Improved Speech Tuner transcription to reduce latency when moving from one interaction to another. This should be most noticeable to users working extended periods with the Transcriber.
- Added new performance metrics to the Speech Tuner Confidence Histogram to provide additional in-grammar and out-of-grammar statistics.
- Improved the Speech Tuner Transcriber View to allow users to configure how function keys can be mapped to shortcut markers on the Marker Toolbar, which should make the transcription process easier. Now users can assign any marker to any of the available function keys, rather than use the previously predefined assignments.
- Added new Decoded SI list to the Speech Tuner's Summary View, enabling a different way to visualize the statistics of the data.
- Modified the way in which our SimpleMRCPClient and lv_show_config/LvShowConfig utilities detect the localhost IP address on Linux operating systems. Previously, the first interface returned that was not localhost was used, but now a reverse lookup of the hostname is performed, which is a more reliable method. The corresponding articles for SimpleMRCPClient and lv_show_config have both been updated to include a new "Network related warning", which explains this.
- Modified LVLicenseManager to reload / re-synchronize licenses in the LumenVox License Server when running in Flexible licensing mode. This extends the '-r [ip-address]' functionality that was previously available in non-Flexible modes.
- Media Server TTS performance was improved significantly by increasing the amount of cached synthesized audio prior to being streamed out. The amount of benefit users will encounter depends largely on the length of audio being synthesized, as well as the overall system load; however our testing shows noticeable improvement in most cases.
- Improved overall TTS synthesis performance. Fewer system resources are now used during TTS synthesis, which means higher port densities should be achievable on production servers. Internal testing indicates in excess of 10% overall improvement in synthesis performance in most cases, which should be reflected in the corresponding TTS port densities achievable.
- Minor TTS pronunciation improvements to German, English, Polish, Swedish, Danish, Russian and Spanish language voices
- Modified en-US builtin grammars for date, number, time and phone to improve in-grammar coverage and overall performance when using any of these.
- Improved SSML 1.1 and 'lookup' support for TTS1 synthesis. Now the 1.1 version will be correctly passed along to the synthesizer.
- Modified the Speech Tuner to permit OOC utterances to have transcription text associated with them. This is done by using the ++OOC++ marker at the beginning of the transcript string, which can now be toggled.
- Modified the Linux versions of SimpleMRCPClient, SimpleASRClient and SimpleTTSClient to optionally look for sample audio, grammar and SSML files in /usr/share/lumenvox/client/data/ if not present in the current directory.
- Modified the Speech Tuner to indicate the confidence threshold seen on screen when saving the histogram image to file. Previously the threshold indicated may not have been what was selected.
- Modified the Speech Tuner to make the Phonetic Speller tool accessible from all views, instead of being limited to the Grammar Editor View.
- Modified the Speech Tuner to remember user selections when working with the Phonetic Speller and Advanced Filtering dialogs. User selections will now be remembered the next time these dialogs are used.
- Speech Tuner Summary View now shows utterance list filtering options as unavailable when multiple selections are made. Previously the operation was disabled in this situation, but the option still appeared to be available.
- Added a new return code (LV_ACQUIRING_LICENSE_FAILED) as a possible return value to LV_TTS_WaitForSynthesis. This will be triggered if a necessary voice license was not available.
- Removed the deprecated END_OF_SPEECH_DETECTION option from client_property.conf - this setting should no longer be used. Use the VAD_EOS_DELAY setting instead.
- Added logging of diagnostic issues when attempting to resolve external URI references. Increased logging verbosity will show more detail. This information can now provide more specific detail when URI issues occur, such as when an HTTPS handshake failure occurs with the remote server, which will now indicate the nature of the URI failure, rather than simply a generic URI failure message.
- Improved Speech Tuner auto-indent handling within the Grammar Editor and SSML Editor views. Now the correct combination of whitespace characters is inserted, where previously only spaces would be inserted when adding an open-tag marker.
- Changed the way in which OpenSSL is initialized within our implementation to provide better thread-safety. This change is not associated with any reported errors from the field, but is a preventative measure against possible issues occurring in the future.
- Added a new warning message to alert Speech Tuner users when their license is due to expire within 30 days. This will be shown at startup of the Speech Tuner, but otherwise not affect any functionality. As always, Speech Tuner license expiration dates can be seen using the Dashboard utility's Licensing page.
- Fixed a problem with Media Server to allow more case-insensitive support for the 'channel-identifier' and other headers when using MRCPv2.
- Fixed a problem with Media Server when processing MRCPv2, where recognition would be incorrectly aborted if START-INPUT-TIMERS was received after barge-in was already detected. This change brings MRCPv2 into alignment with our existing MRCPv1 implementation when this situation is encountered.
- Added missing example files in Linux: 1234.ulaw, 8587070707.pcm, SimpleTTSClient.ssml and ABNFDigits.gram are now located in the /usr/share/lumenvox/client/data/ folder.
- Added missing example file to the Windows Engine package to install a sample SSML document named SimpleTTSClient.ssml
- Fixed a minor issue when accessing shortcut menus in Windows installations, which would always run the batch file after changing to %LVBIN% folder, which was not always the desired or expected behavior. Now the batch files will operate in the expected folder (whichever was selected).
- Fixed a bug when loading external referenced grammars via HTTPS URI. The parameters of SSL_VERIFYPEER and CERTIFICATE_AUTHORITY_FILE were not being passed on from the parent grammar to the externally referenced child grammars, leading to grammar load failures since default values were being used when resolving those child references, which would not be appropriate in non-default configurations.
- Fixed a minor bug to remove an unwanted quote mark when using auto-complete in the SSML Editor of the Speech Tuner, when selecting 'spell-out'
- Fixed a minor bug in the Speech Tuner Call Browser View where 'Clear Selection' and 'Re-evaluate Selection' options did not operate as intended for the selected interactions.
- Fixed a minor bug in the Speech Tuner Tuning Wizard metrics for Confidence tuning to use the correct statistics and colors in comparisons against the baseline and optimal thresholds.
- Fixed ASR handling of optional parameters in URIs that appear after the question mark for referenced grammars and lexicons.
- Fixed a TTS SSML parsing issue where & characters appearing within quotes were being incorrectly processed (within URI references, for example).
- Fixed a minor issue in the Speech Tuner when displaying decoded results overlaid on the audio control's waveform image. If decodes were run within the tuner (using the Tester View), only the resulting decode results would be shown, and not the original. Now switching between the original and output versions is correctly shown.
- Made a minor css style change to correct a word-wrapping issue on the Licensing page in the Dashboard
- Made a minor css style change to alter the color of expired maintenance dates from red to orange to reduce the perceived severity on Licensing page in the Dashboard
- Fixed suppression of HTTPS URIs in the Media Server logs
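As an illustration of the case-insensitive header handling mentioned in the MRCPv2 'channel-identifier' fix above, the following sketch compares two header names while ignoring ASCII case. It is not the Media Server's code; the function name header_name_equals is hypothetical and is used here only for illustration.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns true when two header names match,
// ignoring ASCII case (e.g. "Channel-Identifier" vs "channel-identifier").
bool header_name_equals(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::string::size_type i = 0; i < a.size(); ++i) {
        // Cast to unsigned char first: passing a negative char value
        // to std::tolower is undefined behavior.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}
```

A parser built this way would treat "Channel-Identifier", "channel-identifier" and "CHANNEL-IDENTIFIER" as the same header, while header values would still be compared according to their own rules.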
12.2.100 (September 2, 2014):
Improvements and New Features:
Added many internal cosmetic and functional changes to the Speech Tuner relating to speed, stability and performance in addition to specific features and changes described below.
- Added new Random Sentence Generator to the Speech Tuner.
- Added new Pronunciation Checker functionality to the Speech Tuner
- Added new filtering options to Speech Tuner for audio length, signal-to-noise ratio, menu index, RTF and grammar set index
- Added new status filtering options to Speech Tuner, allowing for OOG, OOC, Transcribed, No-Input and No-Match
- Added new Out-Of-Coverage (OOC) option to Speech Tuner, along with option to treat OOC as OOG for backward compatibility.
- Added new Tuning Wizard functionality to Speech Tuner, which analyzes data and reports any issues detected
- Added new Speech Tuner option to load callsre files recursively from a selected folder, significantly increasing loading speed, when lots of files are being used
- Added new option to Speech Tuner allowing users to specify the number of threads (and corresponding speech ports/licenses) that will be used. This can dramatically increase Speech Tuner performance over previous single-threaded behavior (while using more licenses).
- Added new API option to specify PROP_EX_CONFIDENCE_THRESHOLD for an application. This is automatically called when using the Media Server, but for API customers wishing to track and use confidence threshold values during tuning, they should use this API functionality.
- Added new client_property.conf setting for MENU_ID_STRING_MODE for client applications. This setting defines which information is used to determine the uniqueness of an active grammar set, when used in conjunction with the Speech Tuner to automatically discover menus and grammar sets so that data can be organized more naturally using these constructs.
- Added new options in Speech Tuner to allow specification of strategy to use when determining menu/grammar set uniqueness. Also added option to force this method to be used when loading data (essentially overriding whatever setting was in effect when the callsre files were generated)
- Added new automatic log cleanup mechanism to the LVManager service. Now log files/folders will be automatically cleaned up after a number of days (specified in manager.conf). Separate configuration settings are available for resource logging files (used by the Dashboard charts) and regular log files. This mechanism also caters for the extended logging mode, which rolls log files over to new sub-folders based on the current date, and will also clean up any vacated sub-directories as part of this processing.
- Added new options to 'Manager' dashboard configuration screen to specify settings for log_file_max_age (default = 0/disabled) and resource_log_file_max_age (default = 30 days)
- Added new Dashboard options to allow configuration of log cleanup mechanism via Manager configuration screen
- Added new Dashboard functionality to display machine uptime in main Summary view
- Added option to specify audio Sample Rate in SimpleTTSClient_c example code. This already existed in the c++ sample. Also changed the samples to use the default values specified in client_property.conf file unless explicitly specified in the command line. Previously the audio format would always default to ULAW when not specified.
- Added new Speech Tuner option to right-click TTS interaction and edit the corresponding SSML via context menu
- Added new menu and grammar set processing functionality to the Speech Tuner, offering a more intuitive organization of data, based on the combinations of grammars used for different interactions
- Added new menu and grammar set context toolbar to the Speech Tuner to allow easy switching between these new contexts
- Added SSML interaction processing support to CallIndexer
- Added CPA and AMD specific testing to LVShowConfig
- Added new Grammar Editor option allowing users to select either the currently active 'editor' grammar, or all selected grammars when parsing using the grammar editor. This new option allows multiple grammars to be parsed at once, as opposed to a single grammar as implemented previously. A separate parse result will be displayed for each active grammar.
- Added new Speech Tuner statistics to track "False No-Input" and "Correct No-Input", allowing better analysis of No-Input interactions
- Modified the behavior of MIN_FREE_DISK_SPACE in logs.conf. Now, when this is set to a value of 0, the disk free space checking algorithm will be disabled.
- Modified customer example code to demonstrate the correct use of IGNORE_LV_DEPRICATED when defined.
- Modified comments in customer example code relating to disambiguation
- Modified SSML parser for TTS1 to support both "spell-out" and "spell" as options for 'say-as'
- Updated SSML parser to improve support for decimal and hexadecimal numeric escape sequences
- Improved Speech Tuner auto-complete code to auto-close XML tags when editing grammars, lexicons and SSML documents
- Improved reporting of grammar load failures both within API logging and also when using the Speech Tuner. Often obscure grammar load failures (such as failure to load referenced grammar or lexicon files) are now more clearly reported.
- Updated Windows builds to use the latest version (1.0.1g) of OpenSSL to fix any potential issues relating to the Heartbleed bug. Linux versions use the currently installed version on the host machine, so can be updated independently.
- Increased the severity of messages logged when the Media Server runs out of resources.
- Changed Dashboard process charts to not set the 'smoothed' plot option by default
- Modified Media Server to better handle the situation where multiple INVITE requests are being handled in parallel for the same session. Better detection is now in place to avoid responding if a previous thread is already processing the first request and starting the corresponding session. This fixes a potential issue where the Media Server is being overwhelmed by a large volume of session requests (this situation is not typically encountered).
- Modified callsre logging for No-Input type events to now record which grammars were active, since this can be useful when later tuning. Also recording more clearly which voice and dtmf grammars were active when performing DTMF decodes (again, for better subsequent tuning).
- Modified Speech Tuner confidence histogram display to better utilize display window height. Also displaying of confidence threshold ranges was improved.
- Modified Speech Tuner to display optional messages if insufficient transcribed data is available, along with helpful tips
- Modified the Speech Tuner to better represent data from callsre files before (input) and after (output) when performing decodes within Tester View
- Modified Speech Tuner transcriber view to disable transcription options for non-transcribable interactions (such as TTS interactions).
- Modified Speech Tuner SSML and Grammar Editors to automatically adjust scrollbars to match contents when the lines being edited are particularly long. Now the scrollbars should be correct when editing all documents.
- Modified Grammar Editor in the Speech Tuner to allow multiple grammars and lexicons to be opened in different tabs until unloaded by the user. This allows easier switching between grammars
- Modified the grammar lists shown in Summary, Grammar Editor and Tester Views to be filtered by the new menu / grammar set context.
- Changed Speech Tuner NO_INPUT/"~Barge in Timeout" interactions so that these are no longer considered as DECODE FAILURES
- Modified Speech Tuner behavior when evaluating Decode and Transcript text matches, to now strip out any noise tags. Utterances that previously had noise tags would always show up as a "semantic match" even if the raw text matched except for the noise tags. After this change, the same utterance would now show up as a "correct".
- Improved Speech Tuner handling of loading incompatible versions of tuner zips. Previous versions would fail without specifically identifying the cause of the failure as loading a tuner zip created by a newer speech tuner than the current one. An appropriate message box is now displayed when there is a failure due to incompatible tuner zip versions.
- Added support for custom ASR lexicons to Tier1 and Tier2 licenses. Previously this functionality was restricted to Tier3 and Tier4
- Fixed a logging bug, which miscalculated the amount of free disk space when using certain block sizes in Linux.
- Fixed a small memory leak in the Media Server when AMD and CPA are active at the same time in a session
- Fixed minor quantization issue in ASR statistics reporting
- Fixed a minor ASR statistics issue where, when statistics were reset (via the Dashboard), the maximum queue size was being set to 0 instead of the currently active value.
- Fixed some incorrect statistical logging for ASR and TTS. This issue led to some very large numbers being generated, due to a wrap-around issue, which would throw off the average timing calculations.
- Suppressed logging of Content-ID for Media Server when secure-context is active, since this may be used to expose potentially sensitive information
- Suppressed logging of grammar labels in LVSpeechPortAPILog.txt when secure context was active.
- Fixed a minor bug where grammars were not being automatically deactivated when they were unloaded from the port. Now UnloadGrammar will also deactivate the grammar.
- Fixed a bug in GetPropertyEx when using the PROP_EX_LOGGING_ENCRYPTION_LEVEL option. Previously this would have returned 0, regardless of the actual setting
- Fixed a complex bug when dealing with multiple ASR servers that were being used by API or Media Server clients in a load-balanced configuration. Previously, it was possible for a grammar load failure to occasionally occur depending on how the grammar was being cached.
- Fixed incorrect accents within builtin Mexican Spanish date and currency grammars.
- Fixed an issue in the Speech Tuner where some TTS interactions were not correctly displaying the corresponding 'platform' information for an interaction.
- Fixed a grammar related issue in the Speech Tuner when working with grammars having the same URI but with different raw text
- Fixed file encoding display issues within Speech Tuner. Previously conversion to UTF-8 or ISO-8859-1 was performed internally before the grammar was displayed in the editor. Now the selection is more faithful and consistent with the actual format of the file (and can be changed on the fly as needed). This applies to grammar, lexicon and SSML files being edited
- Fixed a Speech Tuner file permission installation issue which resolves the inability to write log files when run without administrative privileges
- Fixed issues in Dashboard when viewing log files of services with or without the extended path mechanism (logs created in subdirectory based on current date). Also corrected an issue when viewing logs containing less than 350 characters
- Fixed issue with ASR going into an unrecoverable state when attempting to process a decode with multiple conflicting or invalid languages
- Fixed a bug in the client (Media Server or LVSpeechPort) which threw an exception in a specific situation, indicating "LicenseClient::ValidateTimedLicense" as the culprit in stack trace.
- Fixed Speech Tuner bug when extracting a TunerZip file where file handles were not closed as soon as extraction was complete, but were instead closed when the application was closed. This prevented the user from deleting the extracted folder until the Speech Tuner application was closed which also prevented re-extraction to the same folder.
- Fixed Speech Tuner "Send To LumenVox" functionality to include all grammars instead of just the last loaded grammar.
- Fixed a minor cosmetic issue when scrolling up or down one line at a time when viewing logs within the Dashboard.
- Fixed a problem when viewing logs within the Dashboard for services using extended logging, with log files in different sub-folders
- Fixed a Grammar Editor bug in the Speech Tuner where escaped quotations (&quot;) in SSML XML attributes were being mishandled. This was noticeable when specifying a <phoneme> element in SSML with the "x-sampa" alphabet, where a quote (") as part of the sampa phoneme syntax indicates a primary stress. Even though the quote was correctly escaped using &quot; in the input SSML, on preparsing the SSML it was being unescaped to " by the internal handling, thus breaking the XML.
- Fixed bug when loading referenced custom lexicons from callsre files via the Speech Tuner.
- Fixed a small leak in LVSpeechPort when using custom lexicons.
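The decimal and hexadecimal numeric escape sequences mentioned in the SSML parser update above follow the standard XML forms &#NNN; and &#xHHH;. The sketch below shows one way such a reference could be decoded to a code point. It is a minimal illustration under that assumption, not the LumenVox parser; the function name parse_numeric_escape is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: parses one XML numeric character reference such as
// "&#233;" (decimal) or "&#xE9;" (hexadecimal). On success, stores the
// code point in `out` and returns true; otherwise returns false.
bool parse_numeric_escape(const std::string& ref, std::uint32_t& out) {
    // The shortest valid form is "&#N;" (4 characters).
    if (ref.size() < 4 || ref[0] != '&' || ref[1] != '#' || ref.back() != ';') {
        return false;
    }
    std::string::size_type pos = 2;
    std::uint32_t base = 10;
    if (ref[pos] == 'x') {  // XML only permits a lowercase 'x' here
        base = 16;
        ++pos;
    }
    if (pos >= ref.size() - 1) return false;  // no digits before ';'

    std::uint32_t value = 0;
    for (; pos < ref.size() - 1; ++pos) {
        const char c = ref[pos];
        std::uint32_t digit;
        if (c >= '0' && c <= '9') digit = static_cast<std::uint32_t>(c - '0');
        else if (base == 16 && c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (base == 16 && c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return false;  // not a digit in this base
        value = value * base + digit;
        if (value > 0x10FFFF) return false;  // beyond the Unicode range
    }
    out = value;
    return true;
}
```

Both "&#233;" and "&#xE9;" decode to the same code point (U+00E9), which is why a parser that supports only one of the two forms can mis-handle otherwise valid documents.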
12.1.100 (February 10, 2014):
Improvements and New Features:
- Added feature in Speech Tuner to allow changing the TTS Server ip-address via the Options dialog.
- Added feature in Speech Tuner to allow text-encoding for SSML to be specified. The encoding is auto detected when a file is loaded from disk but can also be modified via a drop down box to either UTF-8 or ISO-8859-1
- Added feature in Speech Tuner to allow editing of grammars as either UTF-8 or ISO-8859-1 encoding type.
- Improved handling of grammar load failures in MRCP to return more appropriate completion causes: 005 gram-compilation-failure, 009 uri-failure, and 010 language-unsupported. The optional MRCP headers "Failed-URI" and "Failed-URI-Cause" are also now populated upon a URI failure
- Improved handling of SSML load failures in MRCP to return more appropriate completion causes in case of URI failure. The optional MRCP headers "Failed-URI" and "Failed-URI-Cause" are also now populated upon a URI failure
- New Dashboard option allowing statistics to be reset as needed
- Improved rules for es-MX builtin:grammar/date
- Changed secure context functionality to be more secure by suppressing grammar label, URI and grammar text when secure_context=1 (active)
- Changed Media Server default ports for mrcp_server_port_base and rtp_server_port_base to 20000 and 25000 respectively to avoid overlap with the CentOS6 ephemeral port range.
- Added new check at Media Server startup to detect overlap between ephemeral port range and RTP/MRCP port ranges. A warning message is logged to the media_server_app.txt log as well as lumenvox_critical.txt if any overlap is detected
- Added new checks to Media Server startup to detect and report any overlapping of MRCP and RTP port ranges. This is similar to the new ephemeral port range checks.
- Added checks in LVShowConfig/lv_show_config to log a warning if there are any overlaps between the RTP Port, MRCP Port and Ephemeral Port ranges.
- Added new estimate grammar complexity feature, viewable in Speech Tuner in the grammar property page. This value represents a relative complexity for the specified grammar when used with the LumenVox ASR engine. This may be particularly useful when debugging or determining scalability when using this grammar.
- Modified order of interpretations returned when multiple grammars match the input to be sorted by grammar activation order rather than alphabetical order of label
- Modified Dashboard, adding the ability to detect and correctly locate and log files when extended logging mode is enabled (log files stored in date-based sub-folders)
- New dashboard monitoring functionality to allow many more days to be displayed (previously limited to one day). Also allows for scaling of Y (vertical) axis, and persistence when hiding series. Another new option allows only currently running processes to be shown, or all tracked processes (not necessarily still running). Scaling in the x (horizontal) direction can now be done using a slider with min/max options or by using an optional settings dialog. Requires updated Manager from 12.1 in order to work correctly (this functionality is not backwards compatible with previous versions).
- After adding the extended Dashboard monitoring functionality, a more compact and efficient method of data transfer was implemented. When viewing more than 10 hours of data, it is displayed in a summarized format, based on hourly min/max range. When viewing less than 10 hours of data, more detailed minute-based information is displayed. These changes allow for a much more responsive user experience and much less unnecessary data traversing the network.
- Added TTS API to return Error String for a specified Return Code. These new calls are LV_TTS_ReturnErrorString and LVTTSClient::ReturnErrorString, and make the TTS interfaces more consistent with the ASR interfaces.
- New feature to stop logging to centralized logs if there is less than a specified amount of free disk space. This amount can be specified in logs.conf, with the default being 100MB.
- New Media Server NUM_CHANNELS setting that should be set to the anticipated maximum number of channels (ASR + TTS + CPA ports) being handled by the server. This single setting will automatically assign appropriately scaled settings to num_spawning_threads, num_graveyard_threads, num_mrcp_threads, num_rtp_threads and listening_socket_size to optimize memory use and performance. Note that num_spawning_threads, num_graveyard_threads, num_mrcp_threads, num_rtp_threads and listening_socket_size settings should be assigned as either 'auto' or 'default' values to enable this mechanism. For RTP event threads, MRCP event threads and spawning threads, the value automatically assigned will become 2 for up to 500 channels and then scale up to 4 threads for 1000 channels or greater.
- For graveyard threads, we set the number of threads as 4 up to 400 lines and then scale up to 10 threads for 1000 lines or greater.
- Deprecated support for Upgrade Analyzer tool, which is no longer required.
- Deprecated the following Global Grammars API calls:
LVSpeechPort::LoadGlobalGrammar(const char* label, const char* uri);
LVSpeechPort::LoadGlobalGrammar(const char* label_is_uri);
LVSpeechPort::LoadGlobalGrammarFromBuffer(const char* label, const char* buffer_string);
LVSpeechPort::ActivateGlobalGrammar(const char* name);
LVSpeechPort::UnloadGlobalGrammar(const char* uri);
LVSpeechPort::LoadGlobalGrammarFromObject(const char* label, LVGrammar& Grammar);
NOTE: These API calls can still be used by including LV_SRE_Deprecated.h
- Updated confidence scoring algorithms that show modest overall improvement to confidence scores. This change may not be noticeable to most customers, but allows for future planned improvements
- Improved customer examples to clean up some inconsistencies and make the code more consistent with our recommended coding practices.
- Improved Speech Tuner grammar and SSML editors to modify the auto-indent feature when a new-line is entered. There is now more consistent white-space formatting to match the previous line, which gives a better and more consistent user experience.
- Improved logging to add more detailed messages when encountering a failure to load from an external reference in a grammar.
- Fixed grammar bug in Speech Tuner when loading up tunerzip files which led to the wrong grammar being used in decodes
- Modified Media Server configuration from within Dashboard to permit 0 (disabled) as a valid entry for SIP_port and RTSP_port.
- Fix for minor leak when setting custom js footer in compatibility mode
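The NUM_CHANNELS thread scaling described in the notes above can be sketched as follows. This is a minimal illustration, not the Media Server's implementation: the notes give only the end points (2 threads up to 500 channels and 4 at 1000 or more for RTP, MRCP and spawning threads; 4 graveyard threads up to 400 lines and 10 at 1000 or more), so the linear ramp between those points is an assumption, as are the function names scaled_threads, event_threads and graveyard_threads.

```cpp
#include <cassert>

// Hypothetical helper: clamps to lo_threads at or below lo_ch, to hi_threads
// at or above hi_ch, and ramps linearly in between (integer math, rounding
// down). The linear ramp is an ASSUMPTION; the release notes only state the
// values at the two end points.
int scaled_threads(int channels, int lo_ch, int lo_threads, int hi_ch, int hi_threads) {
    assert(lo_ch < hi_ch);
    if (channels <= lo_ch) return lo_threads;
    if (channels >= hi_ch) return hi_threads;
    return lo_threads + (channels - lo_ch) * (hi_threads - lo_threads) / (hi_ch - lo_ch);
}

// RTP event, MRCP event and spawning threads: 2 up to 500 channels, 4 at 1000+.
int event_threads(int channels) {
    return scaled_threads(channels, 500, 2, 1000, 4);
}

// Graveyard threads: 4 up to 400 lines, 10 at 1000+.
int graveyard_threads(int channels) {
    return scaled_threads(channels, 400, 4, 1000, 10);
}
```

Under these assumptions, a server configured for 750 channels would get 3 event threads of each kind, and one configured for 700 channels would get 7 graveyard threads. The actual intermediate values chosen by the Media Server may differ.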
12.0.100 (November 18, 2013):
Improvements and New Features:
- Various changes to logging mechanism to allow settings to be modified by LVCONFIG/logs.conf file. This file will be created if it doesn't exist.
- Multiple streamlining and performance improvements across all LumenVox products, resulting in more responsive handling of requests, and more efficient use of memory and resources. These changes may be most evident for users running a very large number of simultaneous sessions, which should now perform noticeably better at higher loads.
- Added 4 new TTS languages and voices
- European Portuguese Male: Adriano
- Indian English Female: Rani
- Turkish Female: Sevi
- Swedish Female: Janna
- Added SSML support for <sentence> as a substitute for <s> and <paragraph> as a substitute to <p>. While this is not required by the SSML specification, customers have frequently used these tokens interchangeably.
- Improved TTS logging of SSML parsing to specify the missing mandatory property that caused an element to be skipped.
- Modified Tuner filters to allow string matching to be case insensitive. For example, searching "Transcript Text" using the "Contains" option will now perform case-insensitive matching. Previously this was case sensitive.
- Improved error handling for loading W3C n-grams. Malformed n-grams will now be handled in a more robust way, and any errors detected will be reported in a clearer manner.
- Removed unwanted constraints on logging verbosity settings. These were previously being capped at 3. Note that using higher verbosity settings on high-bandwidth systems may have an impact on performance due to the amount of data written to the logs.
- Various internal changes to licensing mechanism that will be deployed in future versions to assist users when deploying or updating licenses. These changes will have no effect on new or existing users for the moment.
- Added support for optional quotes around vendor-specific-parameters. Customers have asked us to support things like secure_context="1" as well as secure_context=1. This was added for all vendor-specific-parameters. Note that when querying vendor-specific-parameters after setting them, the returned values will reflect what the client set (with or without quotes, as appropriate). Our default remains without quotes.
- Added new Media Server configuration option (max_rtp_packet_size) that allows the maximum size of received RTP packets to be specified. These changes also lower the default value of DEFAULT_RTP_PACKET_LENGTH to 200 (from 260). Permitted RTP packet sizes are now in the range 180-3000 bytes.
- Added new options to LVManager service that allow tracking of system and process resources over time. By default, statistics from LumenVox processes will be tracked, however users can easily configure settings to monitor additional processes instead, or in addition to these. Samples are taken by default once every 60 seconds. This sample frequency can also be changed by configuration setting.
- New Dashboard 'Monitoring' page that graphically shows the performance history of the system and processes that are being tracked. Real-time CPU, memory, thread, handle and disk use can be displayed, including up to 24 hours of history.
- Modified Dashboard to allow selection of verbosity settings up to and including 5 (Highest). Note that verbosity settings this high can have an impact on throughput if a very large number of simultaneous ports are being used.
- Added Dashboard option to display installed acoustic model languages when viewing the ASR configuration page.
- Added Dashboard options to display TTS voice names that are installed when viewing the TTS Server configuration page.
- Added several new Dashboard options allowing configuration of ASR 'ENABLE_APP_STAT_LOGGING', Media Server 'USE_SPEECH_COMPLETE', 'FORCE_INCREMENT_RTSP_CSEQ' and 'MAX_RTP_PACKET_SIZE'
- Many changes to accommodate suppression of grammars being logged when running in secure_context mode.
- Added more statistical tracking of resource use within ASR, which can optionally be sent to the asr_server_status.txt
- Replaced timing mechanism used by Media Server when tracking various session based timers. The new mechanism is significantly more accurate, and allows timing precision down to a small number of milliseconds in most cases.
- Implemented more reasonable floor values for Speed vs Accuracy settings, resulting in fewer recognizer-errors being returned when using edge-case settings.
- Added a new 'force_increment_rtsp_cseq' configuration setting to Media Server for RTSP message processing. The default is 0, in which case multiple replies for the same request will have the same RTSP CSeq as the original request. Setting this to 1 uses stricter adherence to the specification, where outgoing (server originated) RTSP messages will have their own CSeq numbers, starting from a value of 1. The default setting of 0 will behave as previous versions.
- Added a new 'use_speech_incomplete' Media Server configuration option. If enabled, the greater of speech-complete-timeout or speech-incomplete-timeout will be used for EOS delay. If disabled, only speech-complete-timeout will be used (as in previous versions).
- Changes were made to builtin DTMF grammar processing where the specified language parameter will be used when selecting the appropriate grammar.
- Added ISO-8859-1 encoding declaration to NLSML output to better support accented characters being returned to other platforms via MRCP. This resolves potential ambiguity where the other platform assumes UTF-8 encoding.
- Removed the 'Save-Waveform: false' header from SimpleASRClient requests, thus allowing users to control save-waveform from their configuration files as needed.
- Minor styling change on Dashboard create_server_id.html page to be clearer for users.
- Deprecated API functions LV_SRE_DecodePitch, LV_SRE_DecodeEnergy and LV_SRE_GetVoiceChannelData.
- Deprecated the following properties that were not being used and have not been supported for some time:
- Improved TTS SSML handling for languages including handling of Gwendolyn and Gavin when the language is not specified. We now do a better job of picking the language in a smart fashion if both cy-GB (Welsh) and en-GB (British English with a Welsh accent) voices are available. If there was a language specified in any parent property that is valid for the current element in SSML, we use that language in picking the language for these voices.
- Several TTS enhancements, including:
- Rebuilt voice models
- Improved pronunciation accuracy in Russian, Romanian, English, Brazilian Portuguese, Spanish, Italian, Dutch and German voices
- Improved pronunciation of URLs in Italian, Dutch, Danish and Brazilian Portuguese voices
- Improved support for VXML phone number and time formats in SSML <say-as>
- Improved splitting of paragraphs ending with a single EOL character
- Fixed a small memory leak detected across all LumenVox products when connection between client and server is lost under heavy load conditions.
- Fixed Speech Tuner problem which mishandled external references when loading grammars from tuner zip files
- Fixed Speech Tuner issues with dragging/dropping callsre files, including encryption callbacks. Also added Welsh and Russian voices and languages to SSML editor auto-complete options.
- Fixed bug in TTS where prosody rate changes applied as a relative value were ignored, e.g. "+20%" or "-15%" would not work in previous versions. Absolute values such as 1.20 or 0.85 have always worked, and continue to work.
- Fixed an issue in the Media Server where a received MRCP message may not trigger an immediate response when under high load in Linux
- Fix for forward-slash characters used in GrXML phrases. These are now surrounded by quotes when processed.
- Removed fractional dollars from builtin:currency. This fixes an issue where invalid currency utterances could be accepted, also leading to an invalid semantic interpretation.
- Fix for exception in SimpleMRCPClient if an audio file with no suffix/extension was specified.
- Fixed a port/license leak in the Speech Tuner when saving a pre-compiled grammar
- Fixed issues with secure_context log suppression
- Fixes to better handle the situation where users specify a NULL or empty dtmf-term-char, if specified at the request level. Previously a null string would result in the default # from the session being used.
- Several changes to correctly implement dtmf-term-timeout, and slightly adjusting dtmf-interdigit-timeout. These changes affect MRCPv1 and MRCPv2.
- Fixed a TTS bug where, if no specific voice or gender is specified, the original priority list will be used in deciding which voice to use as opposed to being affected by the previous synthesis voice or gender.
- Fixed a problem where grammars over a certain size could not be saved after compilation using GrammarLoader due to a timeout problem. Previously, the timeout was only associated with the grammar compilation, but if the compiled grammar was large, fetching it could fail because a small fixed timeout was used. Fetching is now allowed the remainder of the specified PROP_EX_LOAD_GRAMMAR_TIMEOUT.
- Fixed a bug where nested NULL rules at the end of a sequence of symbols were not being handled correctly
- Fixed an issue where square brackets in the middle of grammar URIs were causing the URIs to be incorrectly terminated at that point. These '[' and ']' characters are now classed as permitted within URI strings.
- Fix for grammar issue where rules with similar constructs could cause unwanted additional parses. In particular this issue was being seen for the builtin:dtmf/currency grammar. Now these types of grammars should return the correct number of parses and in a shorter period of time
- Fix to better handle newlines within SISR literals, which previously had problems when evaluating newlines within strings and would incorrectly return a syntax error. This fixes an issue with the interpretations returned for grammars using a tag-format sisr/1.0-literals with a newline in the tag.
- Fix for problems seen during grammar load when performing a server-initiated grammar fetch from client while also performing a client-initiated load request using the same grammar. This issue may have been seen when using multiple ASR servers from a single client and under some significant load.
- Added missing root rule to builtin es-MX date grammar
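The optional quoting of vendor-specific-parameters noted in this release (secure_context="1" being accepted as well as secure_context=1) can be illustrated with a minimal sketch. This is not LumenVox code: the function name parse_vendor_params is hypothetical, and the semicolon-separated name=value layout is assumed from the general MRCP Vendor-Specific-Parameters header syntax. The naive split on ';' also does not handle a semicolon inside a quoted value.

```python
def parse_vendor_params(header_value):
    """Parse 'name=value;name="value"' pairs, accepting optional double quotes.

    Hypothetical helper for illustration only; not part of any LumenVox API.
    """
    params = {}
    for pair in header_value.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        if not sep:
            continue  # skip malformed entries that have no '='
        value = value.strip()
        # Remove one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        params[name.strip()] = value
    return params


quoted = parse_vendor_params('com.lumenvox.secure_context="1"')
unquoted = parse_vendor_params("com.lumenvox.secure_context=1")
print(quoted == unquoted)
```

A parser written this way treats both spellings as the same setting, which is the client-visible effect the release note describes.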
11.3.100 (August 27, 2013):
Improvements and New Features:
- Added new TTS API functions LV_TTS_GetLastSynthesisError and LV_TTS_GetLastSynthesisErrorCode to provide more verbosity when TTS synthesis errors are detected.
- Added Response (callsre) file encryption to core ASR and TTS functionality as well as adding necessary decryption functionality to Speech Tuner. Please see our new Securing Sensitive Data article for more information.
- Added support for the new Russian TTS voice Nikita
- Improved ASR acoustic model performance for all languages and all models. This is a major change to the ASR and should provide more accurate results in most cases.
- Added support for 22 kHz TTS1 voices for API customers (no Media Server transport of 22 kHz is supported). Users can now choose between 8 kHz (telephony) and 22 kHz (Web / Mobile / Desktop / Embedded / Other) options.
- Modified SimpleTTSClient to allow a new -rate option so that users can specify either 8000 (default) or 22050 Hz when performing synthesis.
- Added support for TTS1 viseme generation with new PROP_EX_VISEME_GENERATION option and matching VISEME_GENERATION client_property.conf setting. These new API functions are used to provide access to visemes:
- Minor updates to the TTS API comments. Also added the C++ LVTTSClient versions of the recently added GetLastSynthesisError and GetLastSynthesisErrorCode functions. LVTTSClient member functions now correctly return LV_INVALID_TTS_HANDLE instead of LV_SYSTEM_ERROR if called when m_client is NULL (before the class was initialized, or after it is destroyed)
- Improved user feedback when saving TTS audio to file
- Improved loading of grammars to reduce time taken processing cached grammars. This should reduce the amount of string processing that occurs on repeated load grammar requests and fix a possible threading issue when reading URI fragments.
- Added new internal caching mechanism to Speech Tuner to improve performance and user experience. This is especially important when dealing with encrypted response files, but has benefits for all response file handling.
- Fixed a minor issue where Media Server would add unwanted telephone-event and fmtp headers to the SDP for MRCPv1 SETUP replies when running in compatibility mode 1.
- Minor corrections to API log label decoration in calls to UnloadGrammar
- Fix for a problem with Avaya AEP 6 when NOT using new RTSP session per call. Previously the media server could crash when processing two calls in the same thread after one call terminates. Note that users should select the "Use New RTSP Session Per Call" setting, so this situation should never occur.
- Minor change to resolve Speech Tuner exceptions when launching from Windows Explorer shell by double-clicking interactions files.
- Fixed issue that was introduced in 11.0 when using SSML for TTS1 when only the "gender" attribute was specified for a "voice" element without an "xml:lang" attribute. The "gender" attribute was previously being ignored in this scenario.
- Fixed a bug in the Speech Tuner which prevented platform information from being displayed for Response Files.
- Fixed some existing return values that were previously incorrect when calling LV_TTS_SetPropertyEx
- Fixed typo in SimpleMRCPClient usage information. Previously an argument of 1 for -secure_context was indicated. However, there should be no argument for a -secure_context specifier
- Fix for grammar compilation handling of recursive rules. This fixes cases where out-of-grammar recognition may be returned from the decoder when using complex grammars that contain recursion or an unterminated repeat operation.
11.2.200 (May 22, 2013):
Improvements and New Features:
- Added missing Completion-Cause header from DEFINE-GRAMMAR responses for both good and bad MRCP results. This header was missing from both MRCPv1 and MRCPv2. This change makes the Media Server more consistent with both specifications
- Fixed an issue when encountering grammars with more than 10,000 characters within a single line without any line feeds. This caused the LumenVox Client, Media Server and Speech Tuner to issue a segmentation fault
- Fixed SSML processing for the following XML escaped punctuation characters within text to be spoken as part of a TTS synthesis request: " ' < >. Previously these caused errors when processing the XML syntax of the SSML, causing the requested synthesis to fail.
- Minor fix to processing of special VOID rule handling. This would only affect grammars using the VOID rule together with a repeat operator
- Fix for internal GUID being returned in results in place of grammar label when running in CPA/AMD modes
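For clients that build SSML from arbitrary text, the escaped punctuation characters mentioned in this release are typically produced by escaping the text before embedding it in the document. The following is a minimal client-side sketch using only the Python standard library; the helper names ssml_text and build_ssml are hypothetical and nothing here reflects how the TTS server itself parses SSML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def ssml_text(text):
    """Escape &, <, > and both quote characters for use inside SSML text."""
    return escape(text, {'"': "&quot;", "'": "&apos;"})


def build_ssml(text, lang="en-US"):
    """Wrap escaped text in a minimal <speak> element (illustrative only)."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">{ssml_text(text)}</speak>'
    )


original = 'He said "5 < 6" & it\'s true'
document = build_ssml(original)
print(document)

# Round-trip check: a standard XML parser recovers the original text.
print(ET.fromstring(document).text == original)
```

The round-trip through a standard XML parser is a quick way to confirm that a generated document is well-formed before it is sent for synthesis.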
11.2.100 (May 6, 2013):
Improvements and New Features:
- Added new functionality to support LumenVox/CMU and SAMPA format phoneme strings, significantly extending the previous custom lexicon functionality. Please refer to our knowledgebase article for more details on using custom lexicons, including the recent enhancements.
- Added new LV_SRE_GetSampaToLumenVoxConversion, LV_SRE_GetLumenVoxToSampaConversion, LV_SRE_IsValidLumenVoxPhonemeString and LV_SRE_IsValidSampaPhonemeString API functions as part of other custom lexicon enhancements
- Many changes to add support for custom ASR lexicons in Speech Tuner, allowing them to be created and/or edited within the grammar editor section. Lexicons can also now be edited and tested within the tuner, like grammars.
- Updated the Speech Tuner Phonetic Speller dialog (with optional SAMPA support) to support new custom lexicon functionality.
- Added HTTPS support for fetching grammar and SSML documents. Two new configuration options were added to the global section of client_property.conf to give users control over certificate verification options: SSL_VERIFYPEER and CERTIFICATE_AUTHORITY_FILE.
- SSL_VERIFYPEER defaults to 1, but may be set to 0 to skip certificate verification for trusted sites.
- CERTIFICATE_AUTHORITY_FILE may be used to specify the path to a CA cert file to be used to verify peer certificates upon HTTPS requests.
- Moved AuthorizationLog.txt log file from LVBIN to LVLOGS folder to be consistent with other logs. This is only used when the license server is run in authentication mode, which is fairly uncommon.
- Minor optimization to the Speech Port to reduce the number of threads used. This should improve resource utilization performance when running many ports simultaneously.
- The EULA document shipped with LumenVox products was updated.
- Minor cosmetic changes to improve refreshing of Revert button in Speech Tuner's SSML editor
- Added auto-indent and improved auto-complete and syntax-highlighting functionality in Speech Tuner grammar and SSML editors.
- Updated example ABNF/GrXML/SSML and new lexicon files when creating new files in the Speech Tuner
- Minor changes to move logging files LVLicenseReport.logdata to LVBIN in Windows and /var/lumenvox/license_reports in Linux, and also LVLicense_A.logdata and LVLicense_B.logdata to LVBIN in Windows and /etc/lumenvox in Linux.
- Minor change to formatting of NLSML to remove a carriage return when used with Avaya Orchestration Designer, which did not handle this correctly in previous versions. This issue does not affect the Avaya Aura Experience interface, or other Avaya products.
- Added new higher verbosity logging level (LOGGING_VERBOSITY_HIGHER) to reduce logging load on systems. LOGGING_VERBOSITY_HIGHER should only be used when fully debugging, and will likely affect performance under medium to high load.
- Improved responsiveness of Media Server when under significant load to prevent unwanted delays in MRCP message processing
- Due to improvements made in the LumenVox ASR’s handling of grammars internally, the ASR server has changed the way it responds to speech clients (e.g. the Media Server). This means that users with pre-11.2 speech clients must upgrade those clients if they wish to use the 11.2 ASR server. Attempting to use a pre-11.2 speech client with an 11.2 or newer ASR server will result in grammar labels being replaced with hash codes.
- Fix for issue relating to multiple grammars sharing the same session: label in the Media Server. Previously if the same grammar was used a second time in the same session, but with a different label (Content-Id), the original label may be returned in the result string.
- Fixed a grammar processing bug, related to grammars defining rule paths that contained a NULL (special) rule in addition to a semantic tag. For example:
<rule id='example' scope='public'>
- Minor fix to Media Server SDP parsing to accommodate optional encoding parameters at the end of rtpmap lines, which were previously causing 486 Busy replies to session requests if present.
- Minor fix to NLSML formatting to remove an unwanted quote mark at the end of the <result> element when preparing a nomatch reply
- Fixed LV_SRE_WaitForEngineToIdle, which in versions since 10.0 had a bug that caused the call to block for up to the decode timeout if the decode did not complete.
- Fixed a memory leak that could occur when writing to disk fails or when handling a very large number of simultaneous logging messages under extreme system load
- Fixed a problem when attempting to load a malformed XML grammar into the Speech Tuner grammar editor. Previously the grammar would fail to load, now it can be loaded (indicating error) and can be corrected.
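The SSL_VERIFYPEER and CERTIFICATE_AUTHORITY_FILE settings introduced in this release resemble the peer-verification switch and CA-bundle path found in most TLS clients. As a rough analogy only, the sketch below shows the same two choices expressed with Python's standard ssl module. It is not LumenVox code, the function name make_tls_context is hypothetical, and the mapping of the two settings onto ssl options is an assumption made for illustration.

```python
import ssl


def make_tls_context(ssl_verifypeer=1, certificate_authority_file=None):
    """Build a TLS context mirroring the two settings (analogy only)."""
    if ssl_verifypeer:
        # Verify the peer; use the given CA file, or the system store if None.
        return ssl.create_default_context(cafile=certificate_authority_file)
    # Verification disabled: only reasonable for hosts already trusted.
    context = ssl.create_default_context()
    context.check_hostname = False  # must be cleared before using CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    return context


print(make_tls_context(1).verify_mode)
print(make_tls_context(0).verify_mode)
```

Leaving verification enabled and pointing at a CA file is the safer combination; disabling verification removes protection against an impostor server, which is why the release note limits it to trusted sites.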
11.1.100 (March 13, 2013):
Improvements and New Features:
- Added support for SIP over TCP connectivity. Now, either TCP or UDP SIP connections can be established with the Media Server. These connections share the same sip_port number as defined within the configuration file (with the default value for sip_port being 5060).
- Added SIP/TCP connectivity support to SimpleMRCPClient utility, consistent with new Media Server functionality. See the new "-transport TCP" option in the Using the SimpleMRCPClient article for details.
- Added SIP/TCP support to lv_show_config (Linux) / LVShowConfig (Windows), consistent with new Media Server functionality. See the Running lv_show_config article for more details.
- Added support for text/grammar-ref-list media type to DEFINE-GRAMMAR requests within the Media Server, as described in RFC6787. Although this media type is not explicitly supported by MRCPv1, the LumenVox implementation supports this type for both MRCPv1 and MRCPv2 protocols. See the Recognizer DEFINE-GRAMMAR article for details on how different grammar references can be used with the LumenVox Media Server.
- Added Completion-Reason header to indicate the reason behind grammar load failures where possible. This is an MRCPv2 only feature. MRCPv1 users should continue to consult the client and grammar logs for additional information relating to grammar load failures. Additional logging changes now also include more verbosity when reporting such errors (for both MRCPv1 and MRCPv2).
- Added new configurable APP_STAT_LOGGING option in tts_server.conf, which enables logging of useful statistical information about the TTS Server to the tts_server_status.txt file. This change is consistent with other status logging for other LumenVox services.
- Added support for non-compliant GrXML grammar files that are missing their XML prolog at the top of the file. This is consistent with non-compliant behavior in some other vendors. See Section 4.3 of the SRGS specification for compliance details. We now consider the <?xml version="1.0"?> to be an optional requirement in GrXML grammars when specified inline, or by URI reference. This change in support applies to both Media Server and C/C++ API level integration.
- Added new Media Server option to suppress sending 'TRYING' messages in response to SIP INVITEs. This change modifies earlier behavior, which would automatically send these optional messages. Now the default is to NOT send these messages, however this can be overridden by setting the send_sip_trying configuration to 1 if needed. Not sending these messages is more efficient for both client and server resources when connecting.
- Added size and duration parameters when reporting waveform-uri in MRCPv2 RECOGNITION-COMPLETE messages. These two parameters are now appended to the end of the Waveform-URI (when enabled). Size is reported in bytes and duration is in milliseconds. Note that these parameters are only defined in MRCPv2. This functionality is now consistent with the latest MRCPv2 specification document. See section 9.4.8 of the MRCPv2 specification for details on using Waveform-URI
- Improved layout and formatting of Media Server MRCPv1 and MRCPv2 logging messages to provide clearer reporting.
- Significant set of changes to Media Server to support request-based parameter settings as well as session-based. Now the scope of the header settings is correctly applied to either session (when using SET-PARAMS/GET-PARAMS) or request (SPEAK/RECOGNIZE/etc.) as needed. This set of changes also optimizes the handling of all settings so that they are now only applied to the ASR/TTS port when needed. Previously, settings may have been applied multiple times unnecessarily, utilizing more CPU than was strictly necessary.
- Added new HTTP-based Dashboard to replace the previous (and now deprecated) LVDashboard GUI application, which was Windows-only. The new Dashboard application provides remote web access to LumenVox services installed on a Windows or Linux machine. The web server is included within the lv_manager codebase and can therefore coexist with IIS and/or Apache, or any other web servers installed on the system. Please see the LumenVox Dashboard Overview article for more information.
- Added new Welsh voices: Gavin (Male), Gwendolyn (Female). Note that these voices can be configured to speak the Welsh language (cy-GB) or British English with a Welsh accent (en-GB). The language code can be used to specify which language to use when utilizing these voices.
- Added the option to use non en-US builtin grammars within the Media Server. The "Speech-Language" is now used to determine the appropriate builtin grammar. See the Built-in Grammars article for details on adding support for additional languages as needed, where you can provide your own versions of builtin grammars for various languages.
- Added new shortcuts and hyperlinks to Windows installation packages, making the products and documentation more accessible to new users
- Added new Tools package for Windows, which includes the new Dashboard and LVManager utilities
- Modified the default Media Server values for "Start-Input-Timers" (MRCPv2) and "Recognizer-Start-Timers" (MRCPv1) from "false" to "true" to comply with the MRCP specifications. Users may override this default by adding a configuration value (recognizer_start_timers) into the [MRCP] section of media_server.conf. For example, adding the line 'recognizer_start_timers= false' forces this default to false instead of the true value required by the MRCP specifications. Note that this is a change from the previous implementation, and may result in unexpected No-Input-Timeout events if applications were not adequately controlling these timers, for example if this value was not being set to either true or false for RECOGNIZE tasks. See respective sections in the MRCPv2 Specification or MRCPv1 Specification for details.
- Optimized processing of RTP streaming data, reducing CPU and memory load slightly. Also added handling of processing an incorrect/invalid IP address from the client, falling back to using the client IP address if necessary.
- Minor wording change in the Media Server logs - "Error accessing RTP..." is now used instead of "Error while listening to RTP..." since this may be emitted whenever sending data, not necessarily only when listening to the port (for example if unable to use the UDP port for TTS streaming)
- Changed the order of the SDP headers in SIP messages to be more consistent with other vendors. This is in response to some customers that (incorrectly) expect these to be in a certain order.
- Reduced logging of OPTIONS and DESCRIBE requests to prevent log contamination. Messages are still shown to indicate these requests were processed, but the contents of the inbound and outbound packets are no longer logged. The number of these requests can still be seen in media_server_app.txt and also media_server_status.txt
- The EULA document shipped with LumenVox products was updated.
- Documented reported CentOS6 Media Server high idle CPU utilization issue in a new knowledgebase article.
- Changed LV_SRE_SetCustomCallGuid to return LV_INVALID_HPORT when an invalid port was detected. Previously LV_FAILURE would result from this situation, which is less clear.
- Added new error code LV_FUNCTION_DISABLED (-69) defined as "The selected function is currently disabled." This is reported whenever an API call is made that cannot succeed due to functionality being disabled, such as attempting to call AddEvent when callsre logging is disabled. This is technically not an error, however this return code indicates that the specified action was not carried out. This was added to improve logging clarity, and this change is also reflected in the LVErrorCodes/lv_error_codes utility.
- Minor changes to logging for LV_SRE_SetPropertyEx, LV_SRE_SetClientPropertyExPermanent, LV_SRE_CreateClient, LV_SRE_GetStringProperty, LV_SRE_GetAvailableLanguageIndex and LV_SRE_GetCallGuid to improve readability and reduce possible confusion.
- Media Server was modified to remove the now-defunct Monitoring port functionality, which is now replaced by the new Dashboard functionality described above.
- Improved shutdown sequence for core connectivity between modules, allowing faster and smoother shutdown whenever stopping or restarting services
- Removed no longer used configuration settings 'Log' and 'MAINTENANCE_MESSAGING_PORT' from CallIndexer configuration.
- Modified Media Server grammar loading to pass in session parameters so that MRCP parameters such as "Speech-Language" and "Fetch-Timeout" specified in a SET-PARAMS request are applied to the grammar load from a subsequent DEFINE-GRAMMAR, RECOGNIZE, or INTERPRET request.
- Modified Media Server to improve performance, reduce footprint and perform better under extreme load conditions. These changes included optimizing code, cleanup and also addressing a limitation encountered in Linux Operating Systems when running a very large number of simultaneous channels, using more than 1024 sockets. This value equals around 300 channels, depending on a number of factors such as protocol being used and number of active logs, open connections to ASR, License and TTS servers, etc.
- Added clearer logging and return values when returning from a grammar load failure due to licensing in delayed licensing mode and in some grammar load time out situations. Also addressed minor issues when performing many load grammar requests of the same grammar in a large number of simultaneous threads.
- Fixed a problem associated with MRCPv1 TTS and RTP audio streams when using proxy servers. Specifically, whenever a client requested TTS RTP audio to be sent to a different IP address than the client address using the c=IN IP4 specifier in the SDP of the SETUP request, this was not being honored. MRCPv2 continues to work correctly in this situation.
- Fixed a minor leak in Speech Tuner when a Call Indexer IP address was specified, but the Call Indexer was not available.
- Fixed a minor leak in Media Server when processing DTMF packets when not in recognition mode, or when the session is closed/closing while the packet is being processed.
- Fixed a minor issue when streaming the last packet of synthesized audio, which if less than the full packet length would result in a STREAM_STATUS_END_SPEECH message for both the last (partial) message, as well as the following zero-length packet.
- Fixed a problem related to HTTP caching (introduced in version 10.5.110) when fetching remote grammar files. There was a problem with superfluous carriage-returns in the request packets, which caused the HTTP caching mechanism at the server to respond inappropriately. With Apache, the file was re-sent with each request (negating the effect of caching) and with IIS, these requests were rejected, causing grammar load failures.
- Fixed incorrect or skipped reporting of <mark> tags in SSML when used before a <break> tag or at the end of the document, which occurred in TTS2. SSML marks should now be correctly reported across both TTS1 and TTS2 voices.
- Fixed a bug in the API call to GetAvailableLicensesCount, which previously would only work with "Engine" products. Now this will correctly work with all license types.
- Fixed a Speech Tuner bug parsing GrXML grammar external references. Now both apostrophe and quote marks can be used. Previously the code was expecting only quotes, so would stall whenever presented with apostrophes. This problem was isolated to when using the Speech Tuner only, and does not affect core ASR functionality.
- Fixed a bug in Media Server MRCPv2, affecting BARGE-IN-OCCURRED request processing. Previously the reply to this request incorrectly had the RequestId of the active SPEAK request instead of the RequestId of the BARGE-IN-OCCURRED as was required by the specification.
- Fixed a bug introduced in 11.0, which caused an unusual situation if a grammar load was requested, then immediately canceled, leaving it in a pending state. This could affect the following grammar load and recognition task.
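Clients that consume the size and duration parameters added to Waveform-URI in this release need to separate them from the URI itself. The sketch below assumes the header value follows the layout shown in the MRCPv2 specification, with the URI in angle brackets followed by ;size= and ;duration= parameters. The function name parse_waveform_uri and the sample host and path are hypothetical, and the exact text emitted by the Media Server should be confirmed against its logs.

```python
import re

# Angle-bracketed URI followed by optional ';name=value' parameters.
_WAVEFORM_RE = re.compile(r"^\s*<(?P<uri>[^>]*)>(?P<params>.*)$")


def parse_waveform_uri(value):
    """Split a Waveform-URI header value into (uri, size_bytes, duration_ms)."""
    match = _WAVEFORM_RE.match(value)
    if not match:
        raise ValueError(f"unrecognized Waveform-URI value: {value!r}")
    params = {}
    for item in match.group("params").split(";"):
        name, sep, val = item.strip().partition("=")
        if sep:
            params[name.strip().lower()] = val.strip()
    # Either parameter may be absent, for example with older servers.
    size = int(params["size"]) if "size" in params else None
    duration = int(params["duration"]) if "duration" in params else None
    return match.group("uri"), size, duration


# Hypothetical header value; the host and path are made up for illustration.
uri, size, duration = parse_waveform_uri(
    "<http://mediaserver.example/recordings/utt1.wav>;size=11434;duration=2000"
)
print(uri, size, duration)
```

Returning None for a missing parameter lets the same code handle replies from servers that do not append size and duration.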
11.0.300 (November 20, 2012):
Improvements and New Features:
- Added support for several new TTS languages and voices, as described in our knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01577
- Here is a complete list of TTS voices included in 11.0, bringing the total voice count to 39 [incl. 1 deprecated]. That's 22 new voices! New voices are shown with an asterisk beside the name.
=== TTS1 ===
- American English
- Chris (Male)
- * Andrew (Male)
- * Alvin (Male)
- * Jackie (Female)
- * Amanda (Female)
- * Kim (Female)
- * Leah (Female Child)
- British English
- Ben (Male)
- * Megan (Female)
- Standard German
- Heidi (Female)
- European French
- Margot (Female)
- Castilian Spanish
- Antonio (Male)
- Martina (Female)
- North American Spanish
- Lorena (Female)
- Australian English
- * Ian (Male)
- * Mikkel (Male)
- * Helsa (Female)
- * Henrick (Male)
- * Anneka (Female)
- * Angelo (Male)
- * Emilia (Female)
- Canadian French
- * Elodie (Female)
- Brazilian Portuguese
- * Gustavo (Male)
- * Giovanna (Female)
- * Jacub (Male)
- * Karolina (Female)
- * Isak (Male)
- * Birta (Female)
=== TTS2 ===
- American English
- Rebecca (Female)
- Stacey (Female, deprecated)
- British English
- Sophie (Female)
- Latin American Spanish
- Changed the default voice selection behavior for TTS. Previously if no voice name or voice gender was specified, the voice gender defaulted to male under certain complex conditions, based on the (OS dependent) voice loading sequence. However, this behavior was not consistent and was simplified to allow for a predictable default priority list. This change may affect customers if they were not specifying a particular voice and relying on the defaults since a different voice may be getting synthesized now by default, however the new behavior should be much more predictable for users using several languages and voices. Please review the knowledgebase article at http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01616 for more details.
- Added support for precompiled grammars. Grammars may now be pre-compiled, stored to disk or HTTP grammar server and loaded/used in their precompiled form as needed. This can be used to optimize performance by removing the need to compile these grammars at run time on production machines. The command line tool GrammarLoader (lv_grammar_loader in Linux) now includes optional parameters to support precompiled grammar generation. See knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01618 for more details on using precompiled grammars and http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01089 for details of the updated GrammarLoader utility.
- Added support for precompiled grammars within the Speech Tuner, so that these are indicated as such, and can be used during tuning and testing. Note that precompiled grammars cannot be modified in the grammar editor. Should precompiled grammars ever need to be changed, the original grammar that produced them will need to be modified and recompiled as needed. The Speech Tuner can be used to save precompiled grammars from within the Grammar Editor.
- In conjunction with other grammar loading changes related to precompiled grammars, the Speech Tuner has been modified to allow loading of grammars from specified URI references, and also builtin:grammar/ specifiers. Regular SRGS grammars as well as precompiled grammars can now be loaded in this way in addition to previous file-based and callsre-based references. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01117 for more details on loading grammars into the Speech Tuner.
- Added support for suppressed logging as part of our PCI compliance initiative. Users may now specify the com.lumenvox.secure_context Vendor-Specific-Parameter header to suppress logging for ASR type events when using the Media Server, and may specify the com.lumenvox.tts.secure_context Vendor-Specific-Parameter header to suppress logging for TTS events when using the Media Server. When this header value is set to 1, logging will be suppressed and any potentially sensitive data that would normally be recorded in the logs will be replaced with a value of _SUPPRESSED instead. Note that when this mode is active for a session, DTMF digit logging will also be suppressed. When secure_context mode is active, this suppression will be passed to all aspects of the product logging so that not only the Media Server logs are affected, but all logs will have sensitive information suppressed. Note that this can be enabled on a per port basis (or per session basis in the case of Media Server). Callsre log files will also have potentially sensitive data suppressed, and will not record audio for affected interactions. MRCP users can specify the (tts.)secure_context header as part of a SET-PARAMS/GET-PARAMS request or as part of RECOGNIZE/INTERPRET/SPEAK requests as needed. The default value for secure_context and tts.secure_context can be specified in media_server.conf, or also within the client_property.conf file associated with the underlying speech/TTS port(s). If there is a conflict between these two configurations, the more secure option will always be used. These default values can be overridden using the headers specified above on a per-interaction basis within sessions as needed.
- Added new PROP_EX_SECURE_CONTEXT property value that can be used with LVSpeechPort::SetPropertyEx (or corresponding LV_SRE_SetPropertyEx) as well as LVTTSClient::SetPropertyEx (or corresponding LV_TTS_SetPropertyEx). When this property is set to a value of 1, logging for the specified port will be suppressed to avoid writing out any potentially sensitive data to log files. This property can be changed between ASR/TTS interactions within the port to selectively enable (1) or disable (0) the suppression of this potentially sensitive data. These changes are part of our ongoing PCI compliance initiative.
- Added new functionality to LVShowConfig (Windows)/lv_show_config (Linux). Added MRCP v1 and v2 testing of ASR and/or TTS functionality using this utility. Previously MRCP support was excluded and this utility only implemented API-level testing.
- Added LumenVox manager connectivity test to LVShowConfig/lv_show_config to allow connectivity testing and configuration reporting to be included in this utility.
- Changed the utility's license check to test for availability of any kind of license, including SLM, SpeechPort, VoxLite, CPA and/or AMD to be more verbose for all customers. Also, improved verbosity to clarify whether no licenses were available, or no license server could be reached, whereas previously this was shown as no licenses being available when either situation occurred.
- In addition to adding new functionality to LVShowConfig (Windows)/lv_show_config (Linux) mentioned above, the specified parameters to this utility have been improved to be clearer and more consistent for users. The usage information now shows only the following list of parameters -a, -all_config, -all_test, -license_test, -asr_test, -mrcp_test, -tts_test, -h, -o. The -a parameter flag now runs -all_config as well as -all_test. Note that the -o (output) parameter will now save all output in the specified output file, whereas previously only the configuration values were reported in the output file. Now test results and configuration values will be stored in the output file, which improves usefulness. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01635
- Added support for SSML documents housed on HTTP servers, or using a file system path, to the Media Server so that such documents will be fetched and used from the specified HTTP URI or file location. MRCP users will now be able to use the SPEAK request along with Content-Type: text/uri-list as an alternative to the previously supported Content-Type: application/synthesis+ssml.
- Added new LV_TTS_SynthesizeURL to C API and corresponding SynthesizeURL method to C++ LVTTSClient. These functions now allow API users to specify SSML documents located on HTTP servers when performing TTS synthesis.
- Improved logging to LVSpeechPortAPILog.txt so that in addition to logging parameter values passed into API functions, results from these API calls are now also logged (assuming logging verbosity is configured to report these).
- Added new SimpleMRCPClient utility application to allow users to exercise basic functionality of MRCPv1 and MRCPv2 when configuring or testing Media Server. This utility accepts grammar and audio file parameters, then connects to the specified Media Server to verify basic ASR and/or TTS functionality. This can be useful for customers to validate whether all LumenVox components are correctly licensed and configured. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01633
- Added new max_num_rtp_packets_buffered configuration setting to media_server.conf, allowing users to specify the maximum number of unprocessed RTP packets to be held on to when the media server is not in recognition mode i.e. it buffers the unprocessed audio that is usually discarded between recognitions and spools it in when the next recognition starts. Do not use if the media server session is being shared with different calls since the noise baseline may end up getting calculated with noise from a different caller. This setting is useful when there are large delays between prompts during which there is no active recognition and the user may have said something that should have been captured for the next prompt.
- Added support for configurable PUBLIC_RULE_ACTIVATION_MODE option in sre_server.conf file. In version 10.5, the behavior of public rules in SRGS grammars was modified so that all top-level public rules would be activated along with the root rule in accordance with the SRGS conformance test (http://www.w3.org/Voice/2003/srgs-ir/test/conformance-4.grxml). This caused an unexpected change in behavior for some customers, and this option is now configurable. Default behavior is now backwards compatible with versions prior to 10.5.110, i.e. rules that are not referenced in an SRGS grammar are unreachable. To enable SRGS compliant behavior, this value can be set to 1.
- Improvements were made to MRCP compatibility mode 1 to provide more compliant behavior. This change includes implementing a template-based method of applying custom changes, allowing users to modify output of NLSML to emulate other speech vendors, for example, or to support alternate result container formats as needed. Also, when specifying Compatibility mode 1, GrXML "tag" values supplied as a property of <item> within grammars are now permitted. Previously only explicit child tag elements were supported.
- Previously, a RECOGNIZE request using a session: identifier from a previously defined grammar could only return results from the first grammar in the URI list. Now all of the grammars in the uri-list from the DEFINE-GRAMMAR request will be activated when using the corresponding session: identifier
- Added a new command line utility application LVErrorCodes (Windows) / lv_errorcodes (Linux) to help users interpret the meaning of error codes emitted from the various LumenVox function calls. See http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01636 for usage information and more details
- Added new API function LV_SRE_LoadGrammarWithParameters to accommodate loading of grammars from URI with an optional list of MRCP-style headers. This can be useful when loading parameters in an MRCP or HTTP environment, or when loading grammars where parameters cannot be stored within the grammar itself, such as Fetch-Timeout and Cache-Control parameters. See the knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01624 for more details.
- Improved user feedback in upgrade analyzer utility to provide better clarity to users wanting to check their configuration before upgrading to newer versions of LumenVox products. This useful utility was previously not well documented or understood by users. See the knowledgebase article http://www.lumenvox.com/knowledgebase/index.php?/article/AA-01637 for more details.
- Improved sample ASR applications to automatically detect the presence of WAV headers for audio files being passed in. These utilities were designed to accept only headerless audio files, but many users complained of performance issues when incorrectly using WAV files. This change is designed to avoid such frustration; however, the desired input files for sample ASR applications remain headerless (ulaw) audio files, which will give the best performance. Previously, these unwanted WAV headers at the beginning of audio files would manifest themselves as clicks or noise at the beginning of recognition, and would adversely affect noise reduction settings and voice activity detection, which would ultimately interfere with recognition results.
- Added auto detection of wave headers in LoadVoiceChannel to strip out the wave header. This only affects direct LoadVoiceChannel and does not affect Streaming interface in any way. If a Wave header is detected, the audio format contained within it will automatically be used. The Audio Header and Audio Footer are detected in a robust manner if the data is determined to have a Wave header. Most customers will not notice this change, since LoadVoiceChannel is designed to accept headerless audio (non-wave-files); however, this change can help in situations where wave files are incorrectly used. If there is any ambiguity in detecting the audio format or the header/footer, the old behavior will persist. Previously, these unwanted WAV headers at the beginning of audio files would manifest themselves as clicks or noise at the beginning of recognition, and would adversely affect noise reduction settings and voice activity detection, which would ultimately interfere with recognition results.
- Modified TTS Server to add check for XML syntax errors before parsing in TTS1 engine so that behavior is consistent with TTS2. This change may affect customers who were attempting to perform SSML synthesis on a malformed XML document using TTS1. Previous behavior would result in a successful synthesis with empty audio. This behavior has been changed to return a synthesis error and log details of the XML syntax error (e.g. line number, error description) to tts_server_app.txt. These details are also available via LV_TTS_GetLastSSMLError() and/or LVTTSClient::GetLastSSMLError().
- Added warnings in MultiThreadedStreamingExamples to notify the user that an audio file with a WAVE header was loaded. No attempt is made to compensate for the wave headers since it would detract from the example of our API.
- Modified Media Server to correct a problem in the Linux implementation where any Recognition-Mode that was specified was causing RECOGNIZE requests to fail. Now only unsupported values (such as hotword) specified in Recognition-Mode header will be rejected. To clarify, LumenVox only supports Recognition-Mode: normal at this time, and any other values specified will be rejected as unsupported when specified for this header. As part of this change, any specified Recognition-Mode header value will not persist within the session across subsequent RECOGNIZE requests. This minor change should not affect any users, but should prevent any unwanted/unsupported values from persisting.
- Modified asr_server_grammar.txt logging to reduce the severity of log messages related to cached grammars not being located. These common and benign warnings were being incorrectly reported with ERROR severity. These are now correctly reported as INFO severity.
- Modified logging of LV_SRE_SetPropertyEx to correctly appear in the LVSpeechPortAPILog instead of the client_asr log to be more consistent with other API logging activity.
- Modified the documentation used when describing the LICENSE_TYPE setting within client_property.conf to be more verbose and clear when users are implementing SLM, CPA or AMD
- Modified the comments associated with TIMEOUT_INFINITE to clarify that this definition is only applicable when calling one of the WaitForEngineToIdle functions, and should not be used with any SetPropertyEx function calls.
- Minor change to PROP_EX_LOAD_GRAMMAR_TIMEOUT handling to prevent values outside of permitted range of 1000 to 2147483647. Previously invalid values of <= 0 or > 2147483647 caused undesirable/unpredictable results (such as those that affected the Digium connector bridge). Now any value outside the permitted range will be ignored in favor of the current value, or default (200000) as appropriate.
- Added minor comment to LV_SRE_Defines.h to clarify that SPX_8KHZ and SPX_16KHZ are deprecated audio formats.
- Minor edits and code cleanup were performed on the sample ASR and TTS code.
- Modified TTS Server to improve ability to switch voices in TTS1 between languages by just specifying the voice name in the ssml <voice> element. Previously for TTS1 the voice language would also have to be specified along with the voice name if a voice switch in a different language was desired. This change makes TTS1 more consistent with TTS2 behavior.
- SimpleTTSClient applications have been modified to accept SSML via specified URI in addition to previous options. Examples were also added to the usage information.
- Modified the way in which results are handled whenever com.lumenvox.end-of-speech-timeout expires during a decode (following barge-in). Previously, any result in this situation resulted in success-maxtime along with the returned result. Now, if the confidence score is below the threshold, a no-match-maxtime will be returned with no result.
- Modified client shutdown procedure to avoid assertions whenever License Server could not be contacted.
- Modified the following sample application projects to include _CRT_SECURE_NO_WARNINGS and _CRT_SECURE_NO_DEPRECATE precompiler directives to remove benign compile-time warnings:
- Sample TTS applications have been modified to create WAV headers when producing output files. In version 10.5, the files generated were headerless ulaw, which was an undocumented and undesired change from previous versions, so this is now corrected and behaves as it did prior to 10.5, thus producing correctly formed wav files.
- Modified Speech Tuner to improve shutdown whenever non-existent CallIndexer machine references are being used. This problem was relatively benign, however this change reduces the apparent delay and exception logging side-effects of the problem.
- Fixed a bug affecting TTS2 SSML parsing where an invalid or non-existent audio URI reference would cause synthesis failure instead of playing the alternate text contained within the <audio> tag in such situations. Note that an invalid audio URI reference in this context may mean an audio file that is not in the supported format (16-bit, 16 KHz PCM with wav header). TTS1 was not affected by this issue.
- Fixed referencing builtin:grammar from a grammar document with parameters. This resolves a problem introduced in 10.5 where parameters were being incorrectly stripped from the URL when builtin grammars were being specified from within other SRGS grammars. This problem manifested itself indicating the following message in asr_server_grammar.txt: "Referencing an external root rule. But the root was not defined in target grammar."
- Modified handling of PROP_EX_DECODE_TIMEOUT to ignore values of <= 0 if specified for this value. Now if values of <= 0 are specified, the currently active value will be retained (if set) or the default (typically 20000ms) will be used. This is the change that prompted the 10.5.300 maintenance build, so this behavior has not changed since that version, but this change now affects both Windows and Linux builds.
- Fixed a bug introduced in 10.4.500 where multiple parses for a single interpretation were not correctly added to the NLSML recognition result. This only affected decodes with >1 SRGS grammar parse paths that resulted in the same Semantic Interpretation. This gave a single result instead of the correct number of results based on the actual number of parses.
- Removed unwanted .o and .d files that were incorrectly shipped with sample Linux projects
- Fixed a small leak in TTS Server which was introduced in 10.5.110, where significant repeated load would eventually lead to synthesis failure due to depletion of available handles. The number of synthesis iterations needed to encounter this problem is > 120,000. Restarting the TTS Server clears the handles, however users are encouraged to upgrade to 11.0 to avoid this defect.
- Fixed a bug when processing additional URI parameters when passed in with grammar request. These were incorrectly being stripped from the subsequent HTTP fetch request. This problem was introduced in 10.5.110, where a query string present in an HTTP URI would not be passed along to the fetching of the HTTP document.
- Fixed a bug that was introduced in 10.5.110 where old voice names were not correctly supported in a backward compatible way for TTS1 users, effectively disabling voice switching within single SSML requests when using the older voice names.
- Fixed a problem when processing malformed n-gram documents. Previously it was possible to cause a fatal exception in the ASR server when such a malformed SLM was used.
- Fixed a bug in TTS Server where accented characters for TTS1 did not work previously when specified within an SSML document. It is likely that this bug has always existed since the introduction of TTS in version 10.0. It however did work previously with TTS1 when non-SSML document plain text was used. This bug did not affect TTS2, in which accented characters work as expected for SSML and plain text.
- Minor change to LVSpeechPort destructor to check whether HPort was NULL prior to performing further cleanup. This was previously harmless, but produced some undesired log messages whenever the port had already been released normally.
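The WAV header auto-detection described in the notes above (for LoadVoiceChannel and the sample ASR applications) can be illustrated with a minimal sketch. This is not LumenVox code: the function names find_wav_payload and strip_wav_header are hypothetical, and the logic assumes a standard RIFF/WAVE layout in which any chunks following the data chunk act as the footer. When no header is found, or no data chunk can be located, the buffer is passed through unchanged, loosely mirroring the fall-back to the old behavior.

```python
import struct


def find_wav_payload(data: bytes):
    """Return (offset, length) of the audio payload if `data` starts with a
    RIFF/WAVE header, or None if it looks like headerless audio."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None  # no header: treat the whole buffer as raw audio
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        if chunk_id == b"data":
            start = pos + 8
            # Clamp in case the declared size overruns the buffer.
            return start, min(chunk_size, len(data) - start)
        # Chunks are word-aligned: an odd size is followed by one pad byte.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None  # ambiguous: header present but no data chunk found


def strip_wav_header(data: bytes) -> bytes:
    """Drop the header (and any trailing chunks) when a WAV file is detected."""
    found = find_wav_payload(data)
    if found is None:
        return data
    start, length = found
    return data[start:start + length]
```

A caller would pass the result of strip_wav_header to whatever routine expects headerless audio. A real implementation would also read the fmt chunk to pick up the audio format, which this sketch skips.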
10.5.300 (September 21, 2012):
- Maintenance fix for Linux only. This corrects a change in behavior between 10.4 and 10.5 if users specified a timeout value of 0 when calling SetPropertyEx with a PropertyName value of PROP_EX_DECODE_TIMEOUT. In previous versions, an invalid value of 0 was ignored but in version 10.5 changes were introduced to utilize this value. The fix is to once again ignore such invalid values.
Note that this change affected the Asterisk Connector Bridge, so Asterisk users should avoid 10.5.100 and 10.5.200 and use 10.5.300 instead.
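The rule restored by this fix can be sketched in a few lines. This is an illustration only, not the product's implementation: the function name apply_decode_timeout is hypothetical, and the 20000 ms default is taken from the "typically 20000ms" wording used for PROP_EX_DECODE_TIMEOUT elsewhere in these notes.

```python
DEFAULT_DECODE_TIMEOUT_MS = 20000  # assumed default, "typically 20000ms"


def apply_decode_timeout(current_ms, requested_ms):
    """Return the timeout that stays in effect after a set request.

    Invalid requests (<= 0) are ignored: the current value is retained if
    one is set, otherwise the default is used.
    """
    if requested_ms <= 0:
        if current_ms is not None:
            return current_ms
        return DEFAULT_DECODE_TIMEOUT_MS
    return requested_ms
```

Under this sketch, a caller that passes 0 leaves the active timeout untouched.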
10.5.200 (August 29, 2012):
- Changed internal logging behavior in message routing subsystem to avoid reporting unwanted late/ignored messages, since these could be interpreted as problems by customers, when they were, in fact, benign
- lv_show_config output was modified to display default values where appropriate, thus removing ambiguity from the values previously shown
- Corrected minor typos in SimpleASRClient_c and SimpleASRClient_cpp customer examples
- Modified Speech Tuner to correct a minor bug causing exceptions when accessing Call Indexer running in 32-bit CentOS5/Red Hat 5 build
- Corrected a problem affecting the single stream mode of CPA. This would only affect users attempting to use the CPA/AMD features in version 10.5.110 release who opted for the unrecommended single stream method
- Fixed a problem with customer examples Visual Studio solutions, where the 64-bit option had not been defined, requiring users to define it.
- Fixed a problem introduced in 10.5.110 where specifying DTMF and AMD together would result in undesired speech barge-in if detected
- Fixed bug introduced in 10.5.110 when performing LoadGrammar with multiple optional parameters. These erroneous duplicates had the effect of creating new cached grammar entries, when encountered. These are now normalized.
- Improved memory management in statistical pronunciation modeling, which corrects a very small, slow increase in memory use over time.
- Fixed SSML preparser which previously ignored optional emphasis child elements for TTS2 engine only. This was contrary to the documentation.
- Fixed a minor typo in lv_show_config, where GRAMMAR settings were incorrectly listed under a STREAM heading.
- Fixed a bug where, when defining a global grammar using the same label as a previous global grammar, the former grammar would persist. Now the new grammar will replace any former definition.
- Corrected a bug introduced in 10.5.110, which reported incorrect 0-value vocab size for global grammars. This problem only affected reported values shown on screen in the Speech Tuner application.
- Fixed a build time error, introduced in 10.5.110, which caused lv_show_config to fail when running on certain platforms, announcing lumenvox.conf could not be found
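One way to picture the grammar parameter normalization mentioned in the LoadGrammar fix above is a cache key that does not depend on parameter order or repetition. The sketch below is an assumption about what such normalization could look like, not a description of the product's internals: the function name normalized_cache_key and the example URIs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalized_cache_key(uri: str) -> str:
    """Build a cache key in which optional parameters are de-duplicated
    and sorted, so equivalent grammar requests share one cache entry."""
    parts = urlsplit(uri)
    params = {}
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        params[name] = value  # a repeated parameter keeps its last value
    query = urlencode(sorted(params.items()))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )
```

With this approach, two requests that differ only in the order or repetition of their parameters resolve to the same key instead of creating separate cached entries.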
10.5.110 (August 14, 2012):
Improvements and New Features:
- Added new Call Progress Analysis functionality. This is a significant enhancement over the previously available Answering Machine Detection mechanism. Please refer to the Knowledge Base for full details of these advanced features and capabilities.
- Grammar processing has been improved to cater for more optional parameters specified with grammars, and also meta-data within grammars has been greatly enhanced.
- TTS Voice names have been changed, along with all revised TTS documentation to offer more clarity to users. See LumenVox TTS Voices page and section for more details.
- New functionality has been added to the streaming interface. See API documentation for DELAYED_LICENSE_ACQUISITION, LV_SRE_StreamStartListening, LV_SRE_RegisterGrammar or the equivalent C++ methods LVSpeechPort::StreamStartListening, LVSpeechPort::RegisterGrammar.
- Implemented HTTP caching when referencing remote grammars. This change is in addition to the internal grammar caching mechanisms that were already in place, and allows some degree of additional control over the caching process.
- An optimized builtin:grammar/date grammar is now distributed with the product, offering better speed and memory performance when this grammar is mixed with others. The old version will also continue to be distributed for now, but this may be phased out in the future. See builtin:grammar/date_with_month_checks to reference the old version.
- Results returned from AMD will now be BEEP instead of the earlier ++BEEP++, which was not a valid token in a grammar. The built-in grammar was updated to accommodate this change.
- Improved Media Server to better handle extreme load conditions
- Revised and improved example code being distributed to users to be more consistent with current best practices and more clearly demonstrate the preferred streaming method.
- Added better Virtualization support, allowing smoother installation onto VM instances
- Added configurable behavior for unknown language code (for VXML compliance)
- Improved SSML parser to be more consistent when working with multiple voices. Also added more flexible support in say-as processing to be compatible with other vendors' custom implementations. See the Introduction to SSML section in the Knowledge Base for more details.
- Modified Media Server recognition timer behavior to be based upon barge-in rather than recognition-start-timers
- Modified the help link within the Speech Tuner to reference the new help in our Knowledge Base rather than the older help webpages.
- The SimpleSREClient has been renamed to SimpleASRClient to be more consistent with naming conventions. See the Using the SimpleASR Client article for details on using this tool.
- StreamSetParameter functions in our C and C++ APIs had internally changed parameter types. This requires no changes in customer code, but may require a recompile to work correctly. It is anticipated that this change breaks backward compatibility with previous versions, including Asterisk connector-bridge. See the LV_SRE_StreamSetParameter and LVSpeechPort::StreamSetParameter articles for details.
- Modified Media Server NLSML results to correctly report ambiguous interpretations that match different specified active grammars, so that they correctly appear in their own <interpretation> element.
- Fix for Media Server where zero length synthesis would not produce SPEAK-COMPLETE
- Fixed a bug associated with emailing critical error messages in some situations. This likely ignored certain critical errors being reported by the ASR