SynthAndRecog()

SynthAndRecog() is a dialplan application provided by the app_unimrcp.so module (see Developing Speech Applications on Asterisk for more information). It plays synthesized audio (TTS) to the caller while performing basic automatic speech recognition (ASR). Callers can interrupt the synthesized audio by speaking (barge-in), and the application returns the recognition result for what the caller said.

Application

SynthAndRecog(text,grammar,options)

Parameters

text

Text for the TTS engine to read to the caller. Valid inputs are plaintext specified inline, SSML specified inline, or a path/URI to an SSML document. See the Introduction to SSML for more information about working with SSML.

grammar

The grammar that should be used for the recognition. Grammars can be specified as text/XML inline or by using a reference to an external file/URI. Multiple grammars can be specified by surrounding them in quotes and separating them with commas, e.g. "mygrammar1,mygrammar2".

The builtin:grammar/gramname syntax is allowed for built-in grammars.

See the documentation on writing grammars for guidance on building grammars.
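As a rough sketch of the multiple-grammar syntax described above, the following step loads the built-in boolean grammar together with an external grammar. The context name, the prompt text, and the myServer URL are placeholders chosen for illustration. The quotes keep the comma between the two grammars from being read as the end of the grammar argument.

```
[synth-and-recog-multi]
exten => s,1,Answer()
; Two grammars in one quoted, comma-separated argument (placeholder URL)
exten => s,n,SynthAndRecog(Say yes or no or say an option,"builtin:grammar/boolean,http://myServer/myGrammar.grxml",t=5000)
exten => s,n,Verbose(Completion cause is: ${RECOG_COMPLETION_CAUSE})
```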

options

Options control details about the synthesis and recognition. Valid options are:

p — Profile to use in mrcp.conf.
i — Digits to allow recognition to be interrupted with. Set to "none" to allow the platform to process DTMF using a DTMF grammar. Otherwise, if "any" or other digits are specified, recognition will be interrupted and the digit will be returned to the dialplan.
t — Recognition timeout in milliseconds. This is the total amount of time a caller has to speak.
b — Barge-in value (0 = no barge-in, 1 = ASR engine barge-in, 2 = Asterisk barge-in). It is strongly recommended to allow the ASR engine to perform barge-in instead of Asterisk.
gd — Grammar delimiter. Defaults to a comma.
ct — Confidence threshold (0.0–1.0). If a recognition result has a confidence score below this value, it will be returned as "no match." Defaults to 0.5.
sl — Barge-in sensitivity level (0.0–1.0). The higher this number, the easier it is to barge-in. Defaults to 0.5.
sva — Speed vs. accuracy (0.0–1.0). The lower this number, the faster (and less accurate) recognitions will be. Defaults to 0.5.
nb — N-best list length. Defaults to 1; increase this value to get more results back from the recognizer.
nit — No input timeout. The amount of time the caller has to start speaking before the recognizer returns a no-input result.
sct — Speech complete timeout in milliseconds. The amount of time the recognizer must detect silence after a user stops speaking before it begins processing the utterance. Set this lower for single-word utterances and higher for longer utterances. In most cases, a value of 800 is correct.
dit — DTMF interdigit timeout.
dtt — DTMF terminate timeout.
dttc — DTMF terminate characters.
pv — Prosody volume (silent/x-soft/soft/medium/loud/x-loud/default).
pr — Prosody rate (x-slow/slow/medium/fast/x-fast/default).
vn — Voice name to use (e.g. "Lindsey", "Chris").
vg — Voice gender to use (e.g. "male", "female").
sw — Save waveform (true/false).
nac — New audio channel (true/false).
spl — Speech language (en-US, en-GB, etc.). If a language is declared in the grammar, this will be ignored.
cdb — Clear DTMF buffer (true/false).
mt — Media type.
iwu — Input waveform URI (MRCPv2 only). Not currently supported.
sint — Speech incomplete timeout. Not currently supported.
rm — Recognition mode. Not currently supported.
hmaxd — Hotword max duration. Not currently supported.
hmind — Hotword min duration. Not currently supported.
enm — Early no match (true/false). Not currently supported.
vv — Voice variant. Not currently supported.

All options are optional. Each option is given as a name=value pair, and multiple options are joined with ampersands, e.g. vn=Chris&t=5000.
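The sketch below shows one way several of the options listed above might be combined in a single call. The context name and the specific values are illustrative assumptions, not recommended settings.

```
[synth-and-recog-options]
exten => s,1,Answer()
; t=5000     recognition timeout of 5000 milliseconds
; b=1        let the ASR engine perform barge-in
; ct=0.7     results with confidence below 0.7 are returned as no-match
; spl=en-US  speech language (ignored if the grammar declares a language)
exten => s,n,SynthAndRecog(Say yes or no,builtin:grammar/boolean,t=5000&b=1&ct=0.7&spl=en-US)
```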

Return Values

RECOGSTATUS

The channel variable ${RECOGSTATUS} is set to "OK" if recognition started; otherwise it is set to "ERROR".

RECOG_COMPLETION_CAUSE

The channel variable ${RECOG_COMPLETION_CAUSE} indicates whether recognition completed with a match or ended without one. The possible values are "000" for success, "001" for no-match, and "002" for no-input.

RECOG_RESULT

The channel variable ${RECOG_RESULT} stores the result of the recognition, assuming there was a successful recognition. This is an NLSML-formatted XML string containing the speech input, the confidence score, and the semantic interpretation from the recognizer.
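A dialplan might branch on these variables roughly as follows. This is a minimal sketch: the context name, priority labels, and log messages are hypothetical, and only the three completion cause values listed above are handled explicitly.

```
[synth-and-recog-branch]
exten => s,1,Answer()
exten => s,n,SynthAndRecog(Say yes or no,builtin:grammar/boolean,t=5000&b=1)
; Stop early if recognition never started
exten => s,n,GotoIf($["${RECOGSTATUS}" != "OK"]?failed)
; Branch on the completion cause
exten => s,n,GotoIf($["${RECOG_COMPLETION_CAUSE}" = "000"]?match)
exten => s,n,GotoIf($["${RECOG_COMPLETION_CAUSE}" = "001"]?nomatch)
exten => s,n,GotoIf($["${RECOG_COMPLETION_CAUSE}" = "002"]?noinput)
; Any other value falls through to the failure handling
exten => s,n(failed),Verbose(1,Recognition failed or returned an unexpected cause)
exten => s,n,Hangup()
exten => s,n(match),Verbose(1,Matched with result: ${RECOG_RESULT})
exten => s,n,Hangup()
exten => s,n(nomatch),Verbose(1,Caller was not understood)
exten => s,n,Hangup()
exten => s,n(noinput),Verbose(1,Caller did not speak in time)
exten => s,n,Hangup()
```

In a real application the nomatch and noinput branches would more likely re-prompt the caller than hang up.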

Remarks

Because dialplan applications cannot take more than 1024 characters as arguments, any large grammars and/or SSML should be specified via external reference (see examples below).

If you supply SSML or grammars inline, be sure to escape all quotation marks and commas with backslashes.
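For illustration, an inline SSML prompt escaped this way might look like the sketch below. The context name and prompt wording are assumptions; the point is that each quotation mark and the comma inside the SSML carry a backslash.

```
[synth-and-recog-inline]
exten => s,1,Answer()
; Quotation marks and the comma inside the inline SSML are backslash-escaped
exten => s,n,SynthAndRecog(<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">Please say yes\, or no.</speak>,builtin:grammar/boolean,t=5000)
```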

Parameters specified in a grammar, such as the language, take precedence over any options set when invoking SynthAndRecog(). Where possible, use grammars or SSML for this kind of control.

Example Uses

Say Yes or No (built-in grammar)

[synth-and-recog]
exten => s,1,Answer()
exten => s,n,SynthAndRecog(Say yes or no,builtin:grammar/boolean,t=5000&b=1)
exten => s,n,Verbose(Status is: ${RECOGSTATUS} completion cause is: ${RECOG_COMPLETION_CAUSE} and result is: ${RECOG_RESULT})

Use External Grammars and SSML (HTTP)

[synth-and-recog-ext]
exten => s,1,Answer()
exten => s,n,SynthAndRecog(http://myServer/mySSML.ssml,http://myServer/myGrammar.grxml,t=5000&b=1)
exten => s,n,Verbose(Status is: ${RECOGSTATUS} completion cause is: ${RECOG_COMPLETION_CAUSE} and result is: ${RECOG_RESULT})
