Video Transcription

MRCP vs API, Part 2

The second part of our two-part look at the differences between integrating a telephony platform with the LumenVox Speech Engine through MRCP and through our API will:

  • Discuss weaknesses that both approaches have
  • Compare the pros from our last discussion with the cons of each method
  • Analyze the pros and cons to find out which method will work best for you

MRCP Weaknesses

  • One of the strengths of an API that we mentioned in the last discussion is that APIs tend to be backwards compatible. With MRCP, different revisions of the standard are not very compatible with one another. Think of it like this: if it takes 2 or 3 years to adopt a standard and another 2 or 3 years to adopt a revision of that standard, significant changes will have been made in the meantime. For example, MRCP version 1 is not compatible with MRCP version 2. They use completely different control protocols (version 1 is carried over RTSP, while version 2 uses SIP for session setup and its own dedicated control channel), the message format is different, and the way the client and server communicate has changed. With an API-level integration, vendors tend to try not to break things when moving from one version of the product to the next. With MRCP, things do tend to break, and you end up having to rewrite the interface.
  • Another downside is that MRCP is not nearly as vendor neutral as it is supposed to be; then again, no standard is. The great thing about standards is that everyone implements them differently. In theory, you could unplug one speech engine and plug in another without changing anything. In reality, every vendor implements the standard a little differently, so you need to do conformance testing before you switch vendors. You may find that a vendor handles certain things a little differently and that you have to change your client to accept a response it did not expect before. This is common to all Internet and computer standards: things don't work exactly as they are supposed to. For example, if you open the same web page in two different browsers, you will often notice that it looks different, even though both browsers implement the same standards. It's the same with MRCP.
  • Getting started with MRCP is a lot of work. An MRCP client has to be in charge of all of the network communication, including lower-level functions such as monitoring traffic. With an API-level integration, you are really just calling functions that we expose to you, so there is less work. There are some open source MRCP clients that can give you a head start, but at the end of the day you'll have to do a significant amount of programming to get your first client ready, and then you'll have to account for how each vendor implements the standard. MRCP has a fairly high startup time, especially if you are starting from scratch: if you don't already have an MRCP client, getting one up and running will take considerable development time.
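To make the version gap and the networking burden a little more concrete, here is a minimal Python sketch based on our reading of the published specifications (RFC 4463 for MRCPv1, RFC 6787 for MRCPv2), not on any LumenVox code. It shows that even the first line of a request changes shape between versions, and that an MRCPv2 client has to cut messages out of a raw TCP byte stream itself using the message-length field. The function names and the example Channel-Identifier value are hypothetical, and a real client would also handle SIP session setup, responses, events, and errors.

```python
import re

CRLF = "\r\n"


def v1_request_line(method: str, request_id: int) -> str:
    # MRCPv1 (RFC 4463): method-name SP request-id SP mrcp-version
    return f"{method} {request_id} MRCP/1.0"


def build_v2_request(method: str, request_id: int, headers: dict[str, str]) -> bytes:
    # MRCPv2 (RFC 6787): mrcp-version SP message-length SP method-name SP request-id.
    # message-length counts the whole message, start line included, so the
    # number of digits in the length affects the length itself.
    header_block = "".join(f"{name}: {value}{CRLF}" for name, value in headers.items())
    rest = f" {method} {request_id}{CRLF}{header_block}{CRLF}".encode("utf-8")
    prefix = b"MRCP/2.0 "
    length = len(prefix) + len(rest)
    # Grow the estimate until it accounts for its own digits.
    while len(prefix) + len(str(length)) + len(rest) != length:
        length = len(prefix) + len(str(length)) + len(rest)
    return prefix + str(length).encode("ascii") + rest


def split_v2_messages(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Cut complete MRCPv2 messages off the front of a TCP receive buffer.

    Returns the complete messages plus any leftover partial bytes, which the
    caller should keep and prepend to the next chunk read from the socket.
    """
    messages = []
    while True:
        # The first two tokens of every MRCPv2 start line are the version and message-length.
        match = re.match(rb"MRCP/2\.0 (\d+) ", buffer)
        if match is None:
            break  # start line not fully received yet (a real client must also reject non-v2 data)
        length = int(match.group(1))
        if len(buffer) < length:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[:length])
        buffer = buffer[length:]
    return messages, buffer


if __name__ == "__main__":
    speak = build_v2_request("SPEAK", 543257, {"Channel-Identifier": "32AECB23433802@speechsynth"})
    print(v1_request_line("SPEAK", 543257))  # SPEAK 543257 MRCP/1.0
    print(speak.split(b"\r\n")[0])
    # Simulate one full message followed by the first few bytes of the next.
    complete, leftover = split_v2_messages(speak + speak[:10])
    print(len(complete), leftover)
```

A version 1 client that expects `SPEAK 543257 MRCP/1.0` cannot parse a line that starts with `MRCP/2.0` and a length, which is one small example of why moving between revisions tends to mean rewriting the interface.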

API Weaknesses

  • The API has its own weaknesses. The big one is that it's not very flexible: it's powerful, but you're locked in to one vendor. Each vendor's API is unique, with its own functions, and is not written to be compatible with anyone else's, so when you move to a different speech solution you'll have to scrap your interface and rewrite it entirely. Moving from vendor to vendor with MRCP will take some time, but at least you won't have to completely rework your interface the way you will with an API.
  • Another aspect of working with an API, which can be seen as either a weakness or a benefit, is that LumenVox has a speech client and a speech server. With an API-level integration, the speech client has to run on the same machine as your application. Our client does all of the front-end audio processing, such as voice activity detection, detecting barge-in, and noise cancellation. All of this takes processing time, so the front-end audio processing, and therefore our client, has to run on the same machine as your application. The client then sends the speech to our speech server, which does the actual recognition. The decodes are where the bulk of the speech recognition processing happens, but the front-end piece still has to run alongside your application if you're using an API-level integration. With MRCP, these functions can be split up, because MRCP is built entirely around network traffic. Your client application can talk to our MRCP server, which is where our speech client resides, and that speech client can in turn communicate with a separate speech server if desired. MRCP is more distributed than an API integration.
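As a rough illustration of the kind of front-end work the speech client does before audio ever reaches the speech server, here is a minimal energy-threshold voice activity detector in Python. It is a generic textbook technique chosen only for illustration, not the LumenVox algorithm; the function names, the -40 dB threshold, and the hangover count are hypothetical, and production detectors are considerably more sophisticated (adaptive noise floors, spectral features, noise cancellation).

```python
import math
from typing import Iterable, Iterator, Sequence


def frame_energy_db(frame: Sequence[int]) -> float:
    """Mean power of a frame of 16-bit PCM samples, in dB relative to full scale."""
    if not frame:
        return -math.inf
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return -math.inf
    return 10 * math.log10(mean_square / (32768.0 ** 2))


def detect_speech(
    frames: Iterable[Sequence[int]], threshold_db: float = -40.0, hangover: int = 3
) -> Iterator[bool]:
    """Yield True for each frame treated as speech.

    A frame at or above the threshold starts or continues speech. After the
    energy drops below the threshold, `hangover` more frames are still marked
    as speech so that short pauses between words do not end the utterance.
    """
    remaining = 0
    for frame in frames:
        if frame_energy_db(frame) >= threshold_db:
            remaining = hangover
            yield True
        elif remaining > 0:
            remaining -= 1
            yield True
        else:
            yield False


if __name__ == "__main__":
    silence = [0] * 160            # 20 ms of silence at 8 kHz
    loud = [10000, -10000] * 80    # 20 ms of a loud square wave
    frames = [silence, loud, silence, silence, silence, silence]
    print(list(detect_speech(frames, hangover=3)))
```

In a telephony application, the first frame flagged as speech while a prompt is still playing is roughly the moment a client would treat as a barge-in. Work like this runs on every frame of every call, which is why it matters where the speech client lives.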

Which is Best for You

  • MRCP: How do you put all of this together and decide which approach works for you? MRCP is the solution if you require flexibility. If you need to offer a range of solutions to different customers who may require different speech engines, MRCP is the way to go. It is also a good choice if you already have an MRCP client, because you won't have to build one from scratch.
  • API: However, if you do not already have an MRCP client and you decide that there will be one clear vendor choice for you, then the API is your solution. You'll get more functionality and a faster startup time in terms of development and testing. The API is also easier to maintain in the long run, as long as you don't try to switch from one engine to another. In many cases, if you're just getting into speech, the API is the way to go because you won't have to spend much time getting up and running. If you don't have much experience and you're not certain whether you'll want to move from one vendor to another, we recommend that you do an API-level integration, get some experience, and later, if needed, invest the greater time in MRCP.

If you have any questions, please contact us.


  • In this second part of our two-part series on MRCP v. the API, we delve into the weaknesses of both approaches to developing speech applications and end with a discussion of which approach might be best for your organization.


  • Video Playtime: 8:04



  • Contact Us
  • +1-858-707-7700
  • Toll Free: (877) 977-0707,
    say "Sales"

© 2016 LumenVox, LLC. All rights reserved.