Recognizer RECOGNIZE

Reference Number: AA-01653

The RECOGNIZE method, sent from the client to the server, tells the recognizer to start recognition and specifies the grammar or grammars to match against.  The RECOGNIZE method can carry parameters that control the sensitivity, the confidence level, and the level of detail in the results provided by the recognizer.  These parameters override the current defaults set by a previous SET-PARAMS method.

If the resource is already in the recognition state, the RECOGNIZE request fails and the server responds with a failure status.  If the recognizer resource is in the Idle state and successfully starts the recognition, the server returns a success code and a request-state of IN-PROGRESS.  This indicates that the recognizer is active and that the client should expect further events carrying this request-id.

If the resource cannot start a recognition, the server returns a failure status code of 407, and the response contains a completion-cause header field describing the cause of the failure.
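The two outcomes described above can be told apart from the first line of the server's response. The following is a minimal sketch, assuming the response line layouts shown in the examples later in this article; the names ResponseLine, parse_response_line and recognition_started are hypothetical helpers, not part of any MRCP library.

```python
from typing import NamedTuple


class ResponseLine(NamedTuple):
    version: str
    request_id: int
    status_code: int
    request_state: str


def parse_response_line(line: str) -> ResponseLine:
    """Parse the first line of an MRCP response (v1 or v2 layout)."""
    parts = line.split()
    version = parts[0] if parts else ""
    if version == "MRCP/1.0" and len(parts) == 4:
        # MRCPv1: version request-id status-code request-state
        _, request_id, status, state = parts
    elif version == "MRCP/2.0" and len(parts) == 5:
        # MRCPv2: version message-length request-id status-code request-state
        _, _length, request_id, status, state = parts
    else:
        raise ValueError(f"not an MRCP response line: {line!r}")
    return ResponseLine(version, int(request_id), int(status), state)


def recognition_started(resp: ResponseLine) -> bool:
    """True when RECOGNIZE succeeded and the recognizer is now active."""
    return 200 <= resp.status_code < 300 and resp.request_state == "IN-PROGRESS"
```

For example, recognition_started(parse_response_line("MRCP/1.0 543257 200 IN-PROGRESS")) is true, so the client should wait for events with request-id 543257. A response carrying status 407 is not treated as started, and the client should read its completion-cause header field instead.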

For the recognizer resource, this is the only request that can return a request-state of IN-PROGRESS, meaning that recognition is in progress.  When the recognition completes, whether by matching one of the grammar alternatives, by timing out without a match, or for some other reason, the recognizer resource sends the client a RECOGNITION-COMPLETE event with the result of the recognition (in NLSML format) and a request-state of COMPLETE.
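Once the RECOGNITION-COMPLETE event arrives, the client usually wants the recognized text and the interpreted values out of the NLSML body. The sketch below uses Python's standard xml.etree.ElementTree and assumes a body shaped like the results in the examples later in this article; summarize_nlsml and local_name are hypothetical helper names, and real NLSML results can be richer than this handles.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_nlsml(body: str) -> dict:
    """Collect the grammar URI, the recognized input, and leaf values."""
    root = ET.fromstring(body.strip())
    summary = {"grammar": root.get("grammar"), "input": None, "values": {}}
    for elem in root.iter():
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if name == "input":
            summary["input"] = text
        elif len(elem) == 0 and text:
            # A leaf element with text is treated as an interpreted value.
            summary["values"][name] = text
    return summary


EXAMPLE_BODY = """<?xml version="1.0"?>
<result grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <Person><Name>Andre Roy</Name></Person>
    </instance>
    <input>may I speak to Andre Roy</input>
  </interpretation>
</result>"""

summary = summarize_nlsml(EXAMPLE_BODY)
```

Matching on local names lets the same helper read both the un-namespaced MRCPv1 result and a namespaced MRCPv2 result, at the cost of ignoring which namespace an element belongs to.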

For large grammars that can take a long time to compile, and for grammars that are used repeatedly, the client can issue a DEFINE-GRAMMAR request with the grammar ahead of time.  The client can then issue the RECOGNIZE request and reference the grammar through the "session:" special URI.  The same approach applies whenever the client wants to restart recognition with a previously supplied inline grammar.
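A RECOGNIZE request that refers to a previously defined grammar carries only the "session:" URI instead of the grammar text. The following is a minimal sketch of building such a request in the MRCPv1 layout. It assumes the URI is sent as a text/uri-list body, reuses the Content-Id from the examples in this article as the grammar name, and uses a made-up request-id of 543258. build_recognize_v1 is a hypothetical helper, not a library function.

```python
from typing import Iterable


def build_recognize_v1(request_id: int, grammar_uris: Iterable[str]) -> bytes:
    """Build an MRCPv1 RECOGNIZE message that references grammars by URI."""
    # One URI per line; Content-Length counts the encoded body bytes.
    body = "".join(uri + "\r\n" for uri in grammar_uris).encode("utf-8")
    head = (
        f"RECOGNIZE {request_id} MRCP/1.0\r\n"
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body


# Hypothetical follow-up request reusing the grammar defined earlier.
message = build_recognize_v1(543258, ["session:request1@form-level.store"])
```

Computing Content-Length from the encoded body, rather than hard-coding it, keeps the header correct when the list of grammar URIs changes.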

Note that because the audio and the messages are carried over separate communication paths, there may be a race condition between the start of the audio flow and the receipt of the RECOGNIZE method.  For example, if the client starts the audio flow at the same time as it sends the RECOGNIZE method, either the audio or the RECOGNIZE may arrive at the recognizer first.  As another example, the client may choose to send audio to the Media Server continuously and use the RECOGNIZE method to signal the Media Server when to recognize.


MRCPv1 RECOGNIZE Example:

C->S:RECOGNIZE 543257 MRCP/1.0
     Confidence-Threshold:90
     Content-Type:application/grammar+xml
     Content-Id:request1@form-level.store
     Content-Length:104

     <?xml version="1.0"?>
     <!-- the default grammar language is US English -->
     <grammar xml:lang="en-US" version="1.0">

     <!-- single language attachment to tokens -->
     <rule id="yes">
          <one-of>
               <item xml:lang="fr-CA">oui</item>
               <item xml:lang="en-US">yes</item>
          </one-of>
     </rule>

     <!-- single language attachment to a rule expansion -->
     <rule id="request">
          may I speak to
          <one-of xml:lang="fr-CA">
               <item>Michel Tremblay</item>
               <item>Andre Roy</item>
          </one-of>
     </rule>

     </grammar>

S->C:MRCP/1.0 543257 200 IN-PROGRESS

S->C:START-OF-SPEECH 543257 IN-PROGRESS MRCP/1.0

S->C:RECOGNITION-COMPLETE 543257 COMPLETE MRCP/1.0
     Completion-Cause:000 success
     Waveform-URL:http://web.media.com/session123/audio.wav
     Content-Type:application/x-nlsml
     Content-Length:276

     <?xml version="1.0"?>
     <result grammar="session:request1@form-level.store">
         <interpretation>
             <instance name="Person">
                 <Person>
                     <Name>Andre Roy</Name>
                 </Person>
             </instance>
             <input>may I speak to Andre Roy</input>
         </interpretation>
     </result>
  


MRCPv2 RECOGNIZE Example:

C->S: MRCP/2.0 ... RECOGNIZE 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...

      <?xml version="1.0"?>

      <!-- the default grammar language is US English -->
      <grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" root="request">

      <!-- single language attachment to tokens -->
      <rule id="yes">
            <one-of>
                  <item xml:lang="fr-CA">oui</item>
                  <item xml:lang="en-US">yes</item>
            </one-of>
      </rule>

      <!-- single language attachment to a rule expansion -->
      <rule id="request">
            may I speak to
            <one-of xml:lang="fr-CA">
                  <item>Michel Tremblay</item>
                  <item>Andre Roy</item>
            </one-of>
      </rule>

      </grammar>

S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Waveform-URI:<http://web.media.com/session123/audio.wav>;
              size=424252;duration=2543
      Content-Type:application/nlsml+xml
      Content-Length:...

      <?xml version="1.0"?>
      <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
              xmlns:ex="http://www.example.com/example"
              grammar="session:request1@form-level.store">
        <interpretation>
          <instance name="Person">
            <ex:Person>
                <ex:Name>Andre Roy</ex:Name>
            </ex:Person>
          </instance>
          <input>may I speak to Andre Roy</input>
        </interpretation>
      </result>