Recognizer RECOGNIZE

The RECOGNIZE method, sent from the client to the server, tells the recognizer to start recognition against one or more specified grammars. It can also carry parameters that control sensitivity, the confidence level required for a match, and the level of detail returned in the result; these override the current defaults set by an earlier SET-PARAMS request.
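As a rough illustration of how a client might assemble such a request, here is a minimal Python sketch that builds an MRCPv1 RECOGNIZE message with an inline grammar. The function name `build_recognize_v1` and its parameters are hypothetical, not part of any MRCP library, and the sketch assumes the header layout used in the MRCPv1 example later in this article.

```python
def build_recognize_v1(request_id, grammar_xml, content_id, confidence_threshold=None):
    """Assemble an MRCPv1 RECOGNIZE message with an inline grammar (illustrative sketch)."""
    body = grammar_xml.encode("utf-8")
    headers = []
    if confidence_threshold is not None:
        # Per-request override of the default set by an earlier SET-PARAMS.
        # The MRCPv1 example in this article uses an integer value (90).
        headers.append(f"Confidence-Threshold:{confidence_threshold}")
    headers += [
        "Content-Type:application/grammar+xml",
        f"Content-Id:{content_id}",
        # Content-Length counts the bytes of the body, not the characters.
        f"Content-Length:{len(body)}",
    ]
    start_line = f"RECOGNIZE {request_id} MRCP/1.0"
    # Headers are separated by CRLF; an empty line separates headers from the body.
    head = "\r\n".join([start_line, *headers]) + "\r\n\r\n"
    return head.encode("ascii") + body
```

Omitting `confidence_threshold` leaves the header out entirely, so the recognizer falls back to whatever default is currently in effect.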

If the resource is already in the recognition state, the server answers RECOGNIZE with a failure status. If the recognizer is idle and recognition starts successfully, the server returns a success status code and a request-state of IN-PROGRESS, indicating that the recognizer is active and that the client should expect further events carrying the same request-id.

If the resource cannot start recognition, it returns a 407 failure status code with a Completion-Cause header field describing the cause.
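The outcomes described above can be told apart from the response start line alone. The following Python sketch parses an MRCPv1 response line of the form shown in the example further down and reports whether recognition actually started. The names `RecognizeResponse`, `parse_response_line_v1`, and `recognition_started` are hypothetical helpers, and treating any 2xx code as success is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RecognizeResponse:
    request_id: int
    status_code: int
    request_state: str


def parse_response_line_v1(line):
    """Parse an MRCPv1 response start line such as 'MRCP/1.0 543257 200 IN-PROGRESS'."""
    version, request_id, status_code, request_state = line.split()
    if version != "MRCP/1.0":
        raise ValueError(f"not an MRCPv1 response line: {line!r}")
    return RecognizeResponse(int(request_id), int(status_code), request_state)


def recognition_started(resp):
    """True when the recognizer is active and further events should be expected."""
    # Assumption: 2xx means success. A failure such as 407 means recognition
    # did not start and no events will follow for this request-id.
    return 200 <= resp.status_code < 300 and resp.request_state == "IN-PROGRESS"
```

When `recognition_started` returns false, the client would typically inspect the Completion-Cause header of the same response to find out why.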

RECOGNIZE is the only recognizer request that can return a request-state of IN-PROGRESS. When recognition finishes, whether by matching one of the grammar alternatives, by timing out without a match, or for some other reason, the recognizer sends the client a RECOGNITION-COMPLETE event carrying the result in NLSML format and a request-state of COMPLETE.
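To show what a client might do with the NLSML body of a RECOGNITION-COMPLETE event, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The function name `summarize_nlsml` is hypothetical, and the sketch assumes an un-namespaced result document like the MRCPv1 example in this article; a namespaced document would need qualified element names in the lookups.

```python
import xml.etree.ElementTree as ET


def summarize_nlsml(nlsml):
    """Return the matched grammar and recognized input text for each interpretation."""
    # Strip surrounding whitespace: the XML declaration must start the document.
    root = ET.fromstring(nlsml.strip())
    summaries = []
    for interpretation in root.findall("interpretation"):
        input_el = interpretation.find("input")
        summaries.append({
            # The grammar attribute on <result> names the grammar that matched.
            "grammar": root.get("grammar"),
            "input": input_el.text if input_el is not None else None,
        })
    return summaries
```

Applied to the MRCPv1 result shown below, this yields one entry whose grammar is `session:request1@form-level.store` and whose input is `may I speak to Andre Roy`.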

For large grammars that are slow to compile, or grammars used repeatedly, issue a DEFINE-GRAMMAR request ahead of time and then reference the prepared grammar from RECOGNIZE through the session: special URI. The same approach lets a client restart recognition against a previously supplied inline grammar.
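A request that reuses a prepared grammar carries a list of URIs instead of the grammar itself. The Python sketch below builds such an MRCPv1 RECOGNIZE message. It assumes the body is sent as `text/uri-list` with one `session:` URI per line, formed from the Content-Id under which the grammar was supplied earlier; the function name `build_recognize_by_reference_v1` is hypothetical.

```python
def build_recognize_by_reference_v1(request_id, content_ids):
    """Build an MRCPv1 RECOGNIZE that references previously supplied grammars (sketch)."""
    if not content_ids:
        raise ValueError("at least one grammar reference is required")
    # One session: URI per line, each formed from the Content-Id given
    # when the grammar was defined or first sent inline.
    body = "".join(f"session:{cid}\r\n" for cid in content_ids).encode("ascii")
    head = (
        f"RECOGNIZE {request_id} MRCP/1.0\r\n"
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

For instance, passing `["request1@form-level.store"]` produces a body containing the single line `session:request1@form-level.store`, the same identifier that appears in the `grammar` attribute of the results in the examples below.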

Because audio and control messages travel over separate paths, there can be a race between the start of audio flow and the arrival of the RECOGNIZE method. If the client starts audio at the same moment it sends RECOGNIZE, either may reach the recognizer first. A client may also stream audio to the Media Server continuously and use RECOGNIZE to signal when recognition should begin.


MRCPv1 example

A RECOGNIZE request carrying an inline grammar, followed by the IN-PROGRESS acknowledgement, a START-OF-SPEECH event, and the final RECOGNITION-COMPLETE event with its NLSML result:

C->S:RECOGNIZE 543257 MRCP/1.0
     Confidence-Threshold:90
     Content-Type:application/grammar+xml
     Content-Id:request1@form-level.store
     Content-Length:104

     <?xml version="1.0"?>
     <!-- the default grammar language is US English -->
     <grammar xml:lang="en-US" version="1.0">

     <!-- single language attachment to tokens -->
     <rule id="yes">
         <one-of>
             <item xml:lang="fr-CA">oui</item>
             <item xml:lang="en-US">yes</item>
         </one-of>
     </rule>

     <!-- single language attachment to a rule expansion -->
     <rule id="request">
         may I speak to
         <one-of xml:lang="fr-CA">
             <item>Michel Tremblay</item>
             <item>Andre Roy</item>
         </one-of>
     </rule>

     </grammar>


S->C:MRCP/1.0 543257 200 IN-PROGRESS

S->C:START-OF-SPEECH 543257 IN-PROGRESS MRCP/1.0

S->C:RECOGNITION-COMPLETE 543257 COMPLETE MRCP/1.0
     Completion-Cause:000 success
     Waveform-URL:http://web.media.com/session123/audio.wav
     Content-Type:application/x-nlsml
     Content-Length:276

     <?xml version="1.0"?>
     <result grammar="session:request1@form-level.store">
         <interpretation>
             <instance name="Person">
                 <Person>
                     <Name>Andre Roy</Name>
                 </Person>
             </instance>
             <input>may I speak to Andre Roy</input>
         </interpretation>
     </result>

MRCPv2 example

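The equivalent exchange in MRCPv2. Note the start line, which now begins with MRCP/2.0 and a message length (elided here as ...), the Channel-Identifier header on every message, the Confidence-Threshold given as 0.9 rather than 90, the application/srgs+xml grammar and application/nlsml+xml result content types, the START-OF-INPUT event in place of START-OF-SPEECH, and the structured Waveform-URI: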

C->S: MRCP/2.0 ... RECOGNIZE 543257
      Channel-Identifier:32AECB23433801@speechrecog
      Confidence-Threshold:0.9
      Content-Type:application/srgs+xml
      Content-ID:<request1@form-level.store>
      Content-Length:...

      <?xml version="1.0"?>

      <!-- the default grammar language is US English -->
      <grammar xmlns="http://www.w3.org/2001/06/grammar"
               xml:lang="en-US" version="1.0" root="request">

      <!-- single language attachment to tokens -->
      <rule id="yes">
          <one-of>
              <item xml:lang="fr-CA">oui</item>
              <item xml:lang="en-US">yes</item>
          </one-of>
      </rule>

      <!-- single language attachment to a rule expansion -->
      <rule id="request">
          may I speak to
          <one-of xml:lang="fr-CA">
              <item>Michel Tremblay</item>
              <item>Andre Roy</item>
          </one-of>
      </rule>

      </grammar>


S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog

S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Waveform-URI:<http://web.media.com/session123/audio.wav>;
                   size=424252;duration=2543
      Content-Type:application/nlsml+xml
      Content-Length:...

      <?xml version="1.0"?>
      <result grammar="session:request1@form-level.store">
          <interpretation>
              <instance name="Person">
                  <ex:Person>
                      <ex:Name>Andre Roy</ex:Name>
                  </ex:Person>
              </instance>
              <input>may I speak to Andre Roy</input>
          </interpretation>
      </result>
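In MRCPv2 every start line begins with the protocol version followed by a message length, so a client has to look at the remaining tokens to tell requests, responses, and events apart. The Python sketch below shows one way to do that. The function name `classify_start_line_v2` and the dictionary it returns are hypothetical, and the message length must be an actual number on the wire, not the `...` placeholder used in the example above.

```python
def classify_start_line_v2(line):
    """Classify an MRCPv2 start line as a request, response, or event (sketch)."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "MRCP/2.0":
        raise ValueError(f"not an MRCPv2 start line: {line!r}")
    length = int(tokens[1])  # total message length in bytes
    rest = tokens[2:]
    if rest[0].isdigit():
        # Response: request-id, status-code, request-state
        if len(rest) != 3:
            raise ValueError(f"malformed response line: {line!r}")
        return {"kind": "response", "length": length, "request_id": int(rest[0]),
                "status_code": int(rest[1]), "request_state": rest[2]}
    if len(rest) == 3:
        # Event: event-name, request-id, request-state
        return {"kind": "event", "length": length, "name": rest[0],
                "request_id": int(rest[1]), "request_state": rest[2]}
    if len(rest) == 2:
        # Request: method-name, request-id
        return {"kind": "request", "length": length, "name": rest[0],
                "request_id": int(rest[1])}
    raise ValueError(f"malformed start line: {line!r}")
```

A client would pair the returned `request_id` with the Channel-Identifier header to route START-OF-INPUT and RECOGNITION-COMPLETE events back to the RECOGNIZE request that triggered them.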
