With the introduction of the Media Resource Control Protocol (MRCP), speech solution and platform developers now have a choice in how they integrate the LumenVox Speech Engine and other speech engines into their applications: they can use MRCP, or write directly to the Application Programming Interface (API).
This paper discusses the pros and cons of both development methods, so that you can choose the proper path for your organization.
MRCP was proposed in April 2006 to the Internet Engineering Task Force (IETF), and is now in version 2. MRCP controls media resources such as speech synthesizers, speech recognizers, signal generators, signal detectors, fax servers, and voice biometrics servers over a network. Until the protocol was defined, these components had to be provided by a single vendor with a proprietary interface. In essence, MRCP gives the developer a common language for managing all of these diverse media resources. Version 2 of the protocol is designed to work with the Session Initiation Protocol (SIP), which helps establish control connections to external media streaming devices, and with media delivery mechanisms such as the Real-time Transport Protocol (RTP).
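To make the "common language" idea more concrete, the following is a minimal sketch of how an MRCP version 2 request could be framed as text before it is sent over the control channel. It assumes the start-line layout described in the MRCPv2 specification (protocol version, total message length, method name, request ID). The function name build_mrcp_request, the channel identifier, and the request ID are hypothetical values chosen for illustration; they are not part of any LumenVox API. SIP session setup and RTP audio delivery are not shown.

```python
CRLF = "\r\n"


def build_mrcp_request(method, request_id, headers, body=""):
    """Frame an MRCPv2-style request as bytes (illustrative sketch only).

    The start-line carries the length of the whole message, including the
    start-line itself, so the length is found by iterating until the
    declared value matches the actual size.
    """
    body_bytes = body.encode("utf-8")
    header_fields = dict(headers)
    if body_bytes:
        # Content-Length describes the body only, not the whole message.
        header_fields.setdefault("Content-Length", str(len(body_bytes)))
    header_block = "".join(
        f"{name}:{value}{CRLF}" for name, value in header_fields.items()
    ).encode("utf-8")

    def assemble(declared_length):
        start_line = f"MRCP/2.0 {declared_length} {method} {request_id}{CRLF}"
        return start_line.encode("utf-8") + header_block + CRLF.encode() + body_bytes

    # The number of digits in the length affects the length itself,
    # so repeat until the value stops changing.
    length = len(assemble(0))
    while True:
        message = assemble(length)
        if len(message) == length:
            return message
        length = len(message)


if __name__ == "__main__":
    # Hypothetical channel identifier and request ID, for illustration only.
    request = build_mrcp_request(
        "SPEAK",
        543257,
        {
            "Channel-Identifier": "32AECB23433802@speechsynth",
            "Content-Type": "text/plain",
        },
        "Welcome to the speech application.",
    )
    print(request.decode("utf-8"))
```

Because every compliant server is expected to understand this same framing, the client code that builds a request does not need to change when a different vendor's engine sits behind the MRCP server, although vendor-specific headers and behavior may still differ.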
MRCP's strength is that it enables your application or platform to integrate with the speech and text-to-speech engines of your choice. MRCP addresses the need for client control of media processing resources. There are some caveats, however, when implementing your speech application using MRCP: there is no backwards compatibility between protocol versions, and your control of core engine features is limited to what the MRCP standard defines.
In addition, there may be subtle differences in each speech vendor's implementation of the standard. This is usually because some vendors adopt the standard at different stages of the draft revisions, before the draft is complete. Also, vendors sometimes make special adaptations of the standard to suit their needs.
Despite these issues, if you want to have the ability to support multiple speech products through a single interface, MRCP would be the correct choice for your speech project.
The other option for developers is to simply write their application directly to the API. The API is the optimized interface to the product and its features. This option usually takes less time and provides the developer with greater control. Most importantly, it gives access to all of the features specific to a product, rather than the subset of features that is common among all applications, as provided through MRCP.
Generally, this option gives the developer more flexibility, with less reliance on the speech vendor. Additionally, the speech vendor can respond to feature requests more quickly, exposing new features to the developer sooner (often by weeks) through the API than through MRCP.
Choosing an integration path for your speech application project boils down to what is right for your organization. If you are looking for the flexibility to mix and match speech technologies, then using MRCP may be the best choice for you. If you need greater functionality and an easier integration process, then writing to the API might be the best solution for your company.
If you need further information on this topic, or would like to discuss your particular speech project, please contact LumenVox at 1-877-977-0707 or support@lumenvox.com.
In addition, the MRCP specification is available at the Internet Engineering Task Force (IETF) site: www.ietf.org.
For more information on LumenVox products, such as the Speech Engine, please go to www.lumenvox.com.