This presentation will go over the following topics:
Incoming calls generally come through in one of two ways: over IP via SIP (VoIP), or over the traditional telephone network (POTS).
A platform is a software system that does two things: it provides call control, and it hosts the applications that handle those calls.
The following graphic shows how this all works together. On the left of the illustration you'll see SIP (this could also be labeled VoIP); on the bottom right, notice the POTS (plain old telephone service) connection.
Above that is the hardware layer, and above that is the platform. Within the platform sit the call control and the applications, which communicate with each other when necessary.
In the past, applications and call control were written directly to the platform's proprietary API.
Modern platforms generally use open standards instead: CCXML for call control and VoiceXML (VXML) for the application.
The illustration for this is almost the same as before, except that instead of call control you'll see a CCXML browser, and instead of an application you'll see a VXML browser. Otherwise the graphic is unchanged.
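To make the call-control side concrete, here is a minimal CCXML 1.0 sketch that accepts an inbound call and hands it off to a VoiceXML dialog. The `hello.vxml` filename is just an illustrative placeholder, not something from this presentation:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- An inbound call is ringing: accept it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Once connected, hand the caller to a VoiceXML dialog -->
    <transition event="connection.connected">
      <dialogstart src="'hello.vxml'"/>
    </transition>
  </eventprocessor>
</ccxml>
```

The division of labor here mirrors the graphic: CCXML decides what happens to the call itself (accept, transfer, disconnect), while the VoiceXML document it launches handles the conversation with the caller.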
Most speech applications make use of ASR (automatic speech recognition), also referred to as a speech engine, or TTS (text-to-speech), or both. These are controlled at the application level: the application can be written directly to the API of the ASR or TTS engine, or it can use MRCP (Media Resource Control Protocol).
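As a sketch of what application-level control of ASR and TTS looks like in VoiceXML, the fragment below uses a prompt (rendered by TTS) and a field with the built-in boolean grammar (recognized by the ASR). The form is illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <field name="answer" type="boolean">
      <!-- The prompt text is rendered to the caller by the TTS engine -->
      <prompt>Would you like to continue?</prompt>
      <!-- The caller's yes/no reply is matched by the ASR engine -->
      <filled>
        <if cond="answer">
          <prompt>Great, continuing.</prompt>
        <else/>
          <prompt>Goodbye.</prompt>
          <disconnect/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Note that the application author never talks to the speech engines directly; the VXML browser decides whether to reach them through a native API or through MRCP.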
Since speech resources are controlled by the platform through an API or MRCP, they don't care about the underlying application code. This means that speech applications behave much the same whether they are written in VXML or written directly to the platform's API.
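This independence is easiest to see in the content the platform sends to a TTS resource: in MRCPv2, a SPEAK request typically carries SSML as its message body, so the same markup works regardless of how the application itself was written. A minimal SSML sketch (illustrative text only):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Thank you for calling.
  <break time="300ms"/>
  Please hold while we connect you.
</speak>
```

An engine addressed directly through its native API could be handed the same markup; only the transport differs.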
The following graphics illustrate this. The graphic on the left shows an application written directly to the platform's API; the graphic on the right shows the use of VXML and CCXML. Note that the ASR and TTS sit in a layer on top; it doesn't matter whether they are driven via a native API or via MRCP.
Our graphics show that the call comes in and reaches the hardware layer, then passes through the call control (or CCXML browser). The application (or VXML browser) comes next, and the speech engine (ASR or TTS) sits on top.