MRCPRecog()

MRCPRecog() is a dialplan application provided by the app_unimrcp.so module (see Developing Speech Applications on Asterisk for more information) that performs basic automatic speech recognition (ASR). It can play an audio file while listening, let the caller interrupt playback by speaking (barge-in), and return what the caller said as a recognition result.

Application

MRCPRecog(grammar,options)

Parameters

grammar

The grammar that should be used for the recognition. Grammars can be specified as text/XML inline or by using a reference to an external file/URI. Multiple grammars can be specified by surrounding them in quotes and separating them with commas, e.g. "mygrammar1,mygrammar2".

The builtin:grammar/gramname syntax is allowed for built-in grammars.

See the SRGS Introduction article for guidance on building grammars.

options

Options control details about the recognition. Valid options are:

p — Profile to use in mrcp.conf.
i — DTMF digits that can interrupt recognition. Set to "none" to let the platform process DTMF using a DTMF grammar. If "any" or specific digits are given, a matching keypress interrupts recognition and the digit is returned to the dialplan.
f — Filename to play while recognition occurs. If empty or not specified, no file is played.
t — Recognition timeout in milliseconds. This is the total amount of time a caller has to speak.
b — Barge-in value (0 = no barge-in, 1 = ASR engine barge-in, 2 = Asterisk barge-in). It is strongly recommended to allow the ASR engine to perform barge-in instead of Asterisk.
gd — Grammar delimiter. Defaults to a comma.
ct — Confidence threshold (0.0–1.0). If a recognition result has a confidence score below this value, it will be returned as "no match." Defaults to 0.5.
sl — Barge-in sensitivity level (0.0–1.0). The higher this number, the easier it is to barge-in. Defaults to 0.5.
sva — Speed vs. accuracy (0.0–1.0). The lower this number, the faster (and less accurate) recognitions will be. Defaults to 0.5.
nb — N-best list length. Defaults to 1; increase this value to get more results back from the recognizer.
nit — No input timeout in milliseconds. The amount of time the caller has to start speaking before the recognizer returns a no-input result.
sct — Speech complete timeout in milliseconds. The amount of time the recognizer must detect silence after a user stops speaking before it begins processing the utterance. Set this lower for single-word utterances and higher for longer utterances. In most cases, a value of 800 is correct.
dit — DTMF interdigit timeout in milliseconds.
dtt — DTMF terminate timeout in milliseconds.
dttc — DTMF terminate characters.
sw — Save waveform (true/false).
nac — New audio channel (true/false).
spl — Speech language (en-US, en-GB, etc.). If a language is declared in the grammar, this will be ignored.
cdb — Clear DTMF buffer (true/false).
mt — Media type.
iwu — Input waveform URI (MRCPv2 only). Not currently supported.
sint — Speech incomplete timeout. Not currently supported.
rm — Recognition mode. Not currently supported.
hmaxd — Hotword max duration. Not currently supported.
hmind — Hotword min duration. Not currently supported.
enm — Early no match (true/false). Not currently supported.

You are not required to supply any options. Multiple options can be provided by joining them with an ampersand, e.g. f=sayHelloWorld&t=5000
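
As a sketch of how several options combine, the lines below reuse the profile name (default) and sound file (beep) from the examples in this article. The timeout values are illustrative assumptions, not recommendations.

```
; Illustrative values only: ASR-engine barge-in (b=1), 10 s total
; recognition time, 5 s to start speaking, 800 ms of trailing silence.
exten => 1,1,MRCPRecog(builtin:grammar/boolean,p=default&f=beep&b=1&t=10000&nit=5000&sct=800)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})
```

Option order does not matter. Any option that is left out keeps its default.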

Return Values

RECOGSTATUS

The channel variable ${RECOGSTATUS} is set to "OK" if recognition started; otherwise it is set to "ERROR".

RECOG_RESULT

The channel variable ${RECOG_RESULT} stores the result of the recognition, assuming there was a successful recognition. This is an NLSML-formatted XML string containing the speech input, the confidence score, and the semantic interpretation from the recognizer.
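
One way to act on these variables is to branch on ${RECOGSTATUS} before reading the result. This is a minimal sketch; the extension number and the labels started and failed are hypothetical names chosen for illustration.

```
; Hypothetical extension: branch on whether recognition started.
exten => 1,1,MRCPRecog(builtin:grammar/boolean,p=default&f=beep)
exten => 1,n,GotoIf($["${RECOGSTATUS}" = "OK"]?started:failed)
; Recognition started: ${RECOG_RESULT} holds the NLSML string, if any.
exten => 1,n(started),Verbose(Recognition result: ${RECOG_RESULT})
exten => 1,n,Hangup()
; Recognition did not start: ${RECOGSTATUS} is "ERROR".
exten => 1,n(failed),Verbose(Recognition did not start)
exten => 1,n,Hangup()
```

As described above, "OK" only indicates that recognition started, so the dialplan should still inspect ${RECOG_RESULT} before relying on it.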

Remarks

Because dialplan applications cannot take more than 1024 characters as arguments, any large grammars should be specified via external reference (see examples below).

If you supply XML grammars inline, be sure to escape all quotation marks and commas with backslashes.
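
To illustrate the escaping rule, here is a sketch of a small inline SRGS grammar that accepts "yes" or "no". The rule name answer is a hypothetical choice, and whether a given recognizer accepts this exact grammar depends on the engine.

```
; Every quotation mark inside the grammar is escaped with a backslash.
; This grammar contains no commas; any that appeared would need escaping too.
exten => 1,1,MRCPRecog(<?xml version=\"1.0\"?><grammar xmlns=\"http://www.w3.org/2001/06/grammar\" xml:lang=\"en-US\" version=\"1.0\" mode=\"voice\" root=\"answer\"><rule id=\"answer\"><one-of><item>yes</item><item>no</item></one-of></rule></grammar>,p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})
```

Inline grammars only suit small cases like this one. Larger grammars are easier to maintain, and stay under the argument length limit, when referenced by URI.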

Parameters specified in a grammar, such as the language, take precedence over any options set when invoking MRCPRecog(). Where a setting can be expressed in the grammar, prefer setting it there.

Example Uses

Say Yes or No (built-in grammar)

exten => 1,1,MRCPRecog(builtin:grammar/boolean,p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Say Yes or No (built-in grammar, Mexican Spanish)

exten => 1,1,MRCPRecog(builtin:grammar/boolean,p=default&f=beep&spl=es-MX)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Use External Grammar (HTTP)

exten => 1,1,MRCPRecog(http://myServer/myGrammar.grxml,p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Set N-Best and Confidence Threshold

exten => 1,1,MRCPRecog(http://myServer/myGrammar.grxml,p=default&nb=5&f=beep&ct=0.1)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})

Voice and DTMF Grammar at Same Time

exten => 1,1,MRCPRecog("http://myServer/myGrammar-voicemode.grxml,http://myServer/myGrammar-dtmfmode.grxml",p=default&f=beep)
exten => 1,n,Verbose(Status is: ${RECOGSTATUS} and result is: ${RECOG_RESULT})
