NLSML

Natural Language Semantics Markup Language (NLSML) is an XML format that represents the semantic results of speech recognition within MRCP implementations. Its defined structure lets software parse results consistently, whether a result carries a single value or many.

The NLSML format was defined in a W3C working draft, which should be used as a more in-depth reference when working with this format.

Although there are a number of defined uses for NLSML, it is primarily used to format ASR output before sending it in MRCP replies when using the Media Server to connect to the ASR. The NLSML format is used in both MRCPv1 and MRCPv2, with the only major difference being the scaling used when reporting confidence scores: 0 to 100 for MRCPv1 and 0.0 to 1.0 for MRCPv2, as defined by their respective specifications.
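Because the two protocol versions report confidence on different scales, client code that handles both often converts scores to a single scale first. The following is a minimal sketch of that idea. The function name normalize_confidence and the choice of 0.0 to 1.0 as the common scale are assumptions made for illustration and are not part of the platform.

```python
def normalize_confidence(raw: str, mrcp_version: int) -> float:
    """Convert a confidence attribute value to the 0.0-1.0 scale.

    Hypothetical helper: MRCPv1 results use 0-100, MRCPv2 results use 0.0-1.0.
    """
    value = float(raw)
    if mrcp_version == 1:
        return value / 100.0
    if mrcp_version == 2:
        return value
    raise ValueError(f"unsupported MRCP version: {mrcp_version}")
```

The caller has to know which MRCP version produced the result, since the NLSML itself does not state it.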

Since this format is based on a working draft that was designed to be used in a number of different ways, there is often confusion surrounding the way it is actually used in practice. This article describes the specific implementation used by Capacity Private Cloud, which works well with a large number of platforms and technologies that the Media Server has been certified against. Note that other vendors may implement NLSML differently.

The specific formatting of the NLSML output from the ASR can be customized when using the Media Server by specifying different compatibility_mode options in the media_server.conf configuration file. This is an advanced option, recommended only with Client Services assistance.

Note that the examples given here have carriage returns and indentation added to improve readability. Actual NLSML results would not normally have these and would be considered inline XML without unnecessary whitespace.
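To compare a readable example with the inline form, the added whitespace can be stripped programmatically. The sketch below uses Python's standard xml.etree.ElementTree module. The function name to_inline is hypothetical, and the output is only an approximation of what the platform emits (for example, it omits the XML header).

```python
import xml.etree.ElementTree as ET


def to_inline(nlsml: str) -> str:
    """Return the NLSML with indentation and line breaks removed."""
    root = ET.fromstring(nlsml)
    for elem in root.iter():
        # Strip whitespace inside each element and between sibling elements.
        elem.text = (elem.text or "").strip() or None
        elem.tail = (elem.tail or "").strip() or None
    return ET.tostring(root, encoding="unicode")
```

The same stripping applies when reading values: text taken from a pretty-printed example should be trimmed before it is compared with an inline result.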

Output Encoding Format

NLSML results are encoded in UTF-8, as declared in the XML header. UTF-8 provides full support for non-ASCII characters across all languages.

Users wishing to change the declared encoding format in the XML header can modify the "ResultHeader.txt" template files in the Lang/ResultTemplates/1 and/or Lang/ResultTemplates/2 folders. Note that this only changes the encoding declared in the header. The actual character encoding of the output will always be UTF-8.
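Since the bytes are always UTF-8 even when the header has been edited to declare something else, a client may prefer to decode the payload itself rather than trust the declaration. The following is a sketch of one way to do that in Python. The function name parse_nlsml_bytes is hypothetical, and dropping the header before parsing is a design choice of this example, not platform behavior.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0" encoding="..."?>
_XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>")


def parse_nlsml_bytes(payload: bytes) -> ET.Element:
    """Parse an NLSML payload as UTF-8, ignoring the declared encoding."""
    text = payload.decode("utf-8")
    # Remove the declaration so a mismatched encoding name cannot interfere.
    text = _XML_DECLARATION.sub("", text, count=1).lstrip()
    return ET.fromstring(text)
```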

"result" Root Element

Description

The root element is <result> and includes one or more <interpretation> elements. Multiple interpretations result from ambiguities in the input or in its semantic interpretation.

Attributes

Attribute | Description
grammar | The grammar or recognition rule matched by this result. The grammar can be (and generally is) overridden by a grammar attribute in the "interpretation" element, so this attribute may not be present in the result element.
xmlns | The XML namespace for MRCP. This attribute is not used or populated by the platform.

Parent

None

Children

<interpretation>

Example

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <interpretation grammar="http://192.168.0.55/grammars/test.grxml" confidence="90">
    <input mode="speech">
      San Diego
    </input>
    <instance>
      Destination
    </instance>
  </interpretation>
</result>
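Because the grammar attribute may appear on the "result" element, on the "interpretation" element, or on both, a client usually needs a fallback when reading it. The sketch below shows one way to resolve the effective grammar for each interpretation. The function name effective_grammars is hypothetical, and the code assumes no xmlns is present, as described above.

```python
import xml.etree.ElementTree as ET


def effective_grammars(nlsml: str) -> list:
    """Return the grammar that applies to each interpretation, in order."""
    root = ET.fromstring(nlsml)
    default = root.get("grammar")  # may be None when only interpretations carry it
    return [
        interp.get("grammar", default)  # the interpretation's own value wins
        for interp in root.findall("interpretation")
    ]
```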

"interpretation" Element

Description

Encapsulates the input hypothesis. There should be one or more "interpretation" elements in each result. When multiple "interpretation" elements are present, each represents an n-best alternative result and carries the confidence score of its own hypothesis.

Attributes

Attribute | Description
grammar | The grammar or recognition rule matched by this result. The format of the grammar attribute matches the rule reference semantics defined in the grammar specification.
confidence | An integer from 0 to 100 (MRCPv1) or a decimal value from 0.0 to 1.0 (MRCPv2) indicating the semantic analyzer's confidence in this interpretation.

Parent

<result>

Children

<input> <instance>

Example

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <interpretation grammar="builtin:grammar/digits" confidence="100">
    <input mode="dtmf">
      1 2 3 4
    </input>
    <instance>
      John's PIN code
    </instance>
  </interpretation>
</result>

"instance" Element

Description

Contains the SISR Basics semantic interpretation from the ASR for the detected utterance. See the Intro to Semantic Interpretation article for more details on how to define and use semantic interpretation, along with the corresponding tag-format that the ASR grammars can use to process ECMAScript code contained within tags of matched grammar rules.

Attributes

None

Parent

<interpretation>

Children

None

Example

<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:test_grammar.grxml">
  <interpretation confidence="94">
    <input mode="speech">
      San Diego
    </input>
    <instance>
      Destination
    </instance>
  </interpretation>
</result>

"input" Element

Description

Text representation of the user's input.

Attributes

Attribute | Description
mode | The modality of the input, which will be either speech or dtmf.
confidence | An optional integer from 0 to 100 (MRCPv1) or decimal value from 0.0 to 1.0 (MRCPv2) indicating the confidence in this input.
timestamp-start | This attribute is not used or populated by the platform.
timestamp-end | This attribute is not used or populated by the platform.

Parent

<interpretation>

Children

<noinput> <nomatch>

Example

<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:test_grammar.grxml">
  <interpretation confidence="87">
    <input mode="speech">
      San Diego
    </input>
    <instance>
      Destination
    </instance>
  </interpretation>
</result>
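Reading the "input" element amounts to taking its mode attribute and its text. The following is a minimal sketch in Python. The function name read_inputs is hypothetical, and collapsing runs of whitespace is a choice of this example so that pretty-printed and inline results compare equal.

```python
import xml.etree.ElementTree as ET


def read_inputs(nlsml: str) -> list:
    """Return a (mode, text) pair for the <input> of each interpretation."""
    root = ET.fromstring(nlsml)
    pairs = []
    for inp in root.findall("interpretation/input"):
        # Collapse line breaks and indentation into single spaces.
        text = " ".join((inp.text or "").split())
        pairs.append((inp.get("mode"), text))
    return pairs
```

The mode is returned as None when the attribute is absent, as in the "noinput" and "nomatch" examples in this article.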

"noinput" Element

Description

The "noinput" element under "input" indicates that the ASR interpreter did not detect any speech. This occurs when the speech recognizer's timeout expires while it is still waiting for the start of speech (Voice Activity Detection).

Attributes

None

Parent

<input>

Children

None

Example

<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:test_grammar.grxml">
  <interpretation confidence="0">
    <instance/>
    <input>
      <noinput/>
    </input>
  </interpretation>
</result>

"nomatch" Element

Description

The "nomatch" element under "input" indicates that the ASR interpreter was unable to successfully match any input. This can occur if the confidence score for a result is lower than the Confidence-Threshold for a recognition, in which case the ASR will force the result to become a nomatch. This type of response generally indicates that an ASR result was obtained, but it was not a good match for any of the constraints of the loaded grammars. If no speech was detected, this would typically have resulted in a noinput response.

Attributes

None

Parent

<input>

Children

None

Example

<?xml version="1.0" encoding="UTF-8"?>
<result grammar="session:test_grammar.grxml">
  <interpretation confidence="0">
    <instance/>
    <input>
      <nomatch/>
    </input>
  </interpretation>
</result>
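A client can tell the three outcomes apart (a usable result, "noinput", or "nomatch") by checking for these child elements under "input". The sketch below illustrates the check. The function name classify and the returned labels are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def classify(nlsml: str) -> str:
    """Return "noinput", "nomatch", or "match" for an NLSML result."""
    root = ET.fromstring(nlsml)
    # Compare with None explicitly: an element without children is falsy.
    if root.find("interpretation/input/noinput") is not None:
        return "noinput"
    if root.find("interpretation/input/nomatch") is not None:
        return "nomatch"
    return "match"
```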

More Complex Results

There is often confusion regarding the format of NLSML when results are more complex than the simple examples shown above. Listed below are some examples of how NLSML appears when more complex results are encountered. The overall format and content of the NLSML depend heavily on the grammars used and on the semantic interpretation information they contain.

Multiple N-Best Results

Each n-best alternative occurs within its own <interpretation> element, with its own <input> child element. Within the <interpretation> element, there may be one or more <instance> elements.

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <interpretation grammar="builtin:grammar/boolean" confidence="80">
    <input mode="speech">
      no
    </input>
    <instance>
      false
    </instance>
  </interpretation>
  <interpretation grammar="builtin:grammar/boolean" confidence="20">
    <input mode="speech">
      nope
    </input>
    <instance>
      false
    </instance>
  </interpretation>
</result>
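Consuming an n-best list typically means collecting each hypothesis with its score and ranking them. The following is a sketch under stated assumptions: the function name n_best is hypothetical, a missing confidence attribute is treated as 0, and the result is sorted explicitly rather than relying on the order of the elements.

```python
import xml.etree.ElementTree as ET


def n_best(nlsml: str) -> list:
    """Return (confidence, input text) pairs, highest confidence first."""
    root = ET.fromstring(nlsml)
    hypotheses = []
    for interp in root.findall("interpretation"):
        inp = interp.find("input")
        text = (inp.text or "").strip() if inp is not None else ""
        hypotheses.append((float(interp.get("confidence", "0")), text))
    hypotheses.sort(key=lambda pair: pair[0], reverse=True)
    return hypotheses
```

The scores are returned on whatever scale the result used, so results from MRCPv1 and MRCPv2 should not be mixed in one list without conversion.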

Multiple Parses

When representing multiple parses in NLSML, only those parses for an input that return different semantic interpretations will be represented. In other words, if multiple parses return the same semantic interpretation, they will be combined.

Multiple Interpretations, Single Grammar

When there are multiple interpretations for the same input using one grammar, each semantic interpretation is contained in its own <instance> element tag within the single <interpretation> tag.

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <interpretation grammar="session:FourNumbers" confidence="80">
    <input mode="speech">
      one twenty three forty five
    </input>
    <instance>
      1,23,40,5
    </instance>
    <instance>
      1,20,3,45
    </instance>
  </interpretation>
</result>
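Code that reads only the first "instance" would miss the second interpretation in a result like this one. The sketch below collects every instance under an interpretation. The function name instance_values is hypothetical, and it assumes each instance holds plain text rather than slot elements.

```python
import xml.etree.ElementTree as ET


def instance_values(interpretation: ET.Element) -> list:
    """Return the text of every <instance> under one <interpretation>."""
    return [
        (inst.text or "").strip()
        for inst in interpretation.findall("instance")
    ]
```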

Multiple Interpretations, Multiple Grammars

For interpretations from the same input using multiple grammars, each semantic interpretation is contained in separate <interpretation> element tags.

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <interpretation grammar="session:my_date_grammar" confidence="80">
    <input mode="speech">
      two thirty
    </input>
    <instance>
      02/30/????
    </instance>
  </interpretation>
  <interpretation grammar="session:my_time_grammar" confidence="80">
    <input mode="speech">
      two thirty
    </input>
    <instance>
      2:30
    </instance>
  </interpretation>
</result>
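When several grammars match the same input, it can help to index the interpretations by grammar so the application can pick the one it expects. The following is a minimal sketch. The function name by_grammar is hypothetical, and the fallback to the grammar attribute of the "result" element follows the override behavior described earlier in this article.

```python
import xml.etree.ElementTree as ET


def by_grammar(nlsml: str) -> dict:
    """Map each grammar to the list of instance values it produced."""
    root = ET.fromstring(nlsml)
    grouped = {}
    for interp in root.findall("interpretation"):
        grammar = interp.get("grammar", root.get("grammar"))
        values = [(inst.text or "").strip() for inst in interp.findall("instance")]
        grouped.setdefault(grammar, []).extend(values)
    return grouped
```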

Multiple Slots, Single Interpretation

If a single interpretation contains multiple slots, each slot value is contained in an XML tag with the slot name under a single instance.

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <interpretation grammar="session:my_city_state" confidence="80">
    <input mode="speech">
      San Diego California
    </input>
    <instance>
      <city>
        San Diego
      </city>
      <state>
        California
      </state>
    </instance>
  </interpretation>
</result>
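An "instance" can therefore hold either plain text or a set of slot elements, and a client needs to handle both shapes. The sketch below returns a string in the first case and a dictionary keyed by slot name in the second. The function name instance_slots is hypothetical, and nested slots are not handled.

```python
import xml.etree.ElementTree as ET


def instance_slots(instance: ET.Element):
    """Return the instance text, or a {slot name: value} dict if it has slots."""
    slots = list(instance)
    if not slots:
        return (instance.text or "").strip()
    return {slot.tag: (slot.text or "").strip() for slot in slots}
```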
