NLSML

Reference Number: AA-01803

Natural Language Semantics Markup Language (NLSML) is an XML format used to represent speech semantic result information within MRCP implementations. The format gives software a well-defined structure to parse, while allowing results of varying size and complexity to be represented.

The NLSML format was defined by a W3C working group in a working draft, which should be consulted for a more in-depth reference when using this format.

Although NLSML has a number of defined uses, LumenVox uses it primarily to format the output of the ASR before it is sent in MRCP replies when the LumenVox Media Server is used to connect to the ASR. The NLSML format is used in both MRCPv1 and MRCPv2. The only major difference is the scale used when reporting confidence scores: 0 to 100 for MRCPv1 and 0.0 to 1.0 for MRCPv2, as defined by their respective specifications.
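
Client code that handles both protocol versions may find it convenient to convert confidence scores to a single range before comparing them. The following Python sketch shows one way to do that. The function name normalize_confidence and its mrcp_version parameter are illustrative assumptions and are not part of any LumenVox API.

```python
def normalize_confidence(raw: str, mrcp_version: int) -> float:
    """Convert an NLSML confidence attribute to the 0.0 to 1.0 range."""
    value = float(raw)
    if mrcp_version == 1:
        # MRCPv1 reports confidence on a 0 to 100 scale.
        return value / 100.0
    if mrcp_version == 2:
        # MRCPv2 already reports confidence on a 0.0 to 1.0 scale.
        return value
    raise ValueError(f"unsupported MRCP version: {mrcp_version}")


print(normalize_confidence("90", 1))   # MRCPv1-style score
print(normalize_confidence("0.9", 2))  # MRCPv2-style score
```

Normalizing once, at the point where the result is parsed, keeps any later threshold checks independent of the protocol version in use.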

Because this format is based on a working draft that was designed to be used in a number of different ways, there is often confusion about how it is actually used in practice. This article describes the specific way in which LumenVox has implemented its version of NLSML, which works reasonably well with the large number of platforms and technologies that our Media Server has been certified against. Note that other vendors may implement NLSML differently.

Note that the specific formatting of the NLSML output from the ASR can be customized when using the LumenVox Media Server by specifying different compatibility_mode options in the media_server.conf configuration file. This is an advanced option, which we recommend using only with assistance from LumenVox Client Services.

Please also note that examples given here have carriage returns and indentation added to improve readability. Actual NLSML results would not normally have these, so would be considered inline XML, without unnecessary whitespace.

Output Encoding Format

Starting with LumenVox version 12.0, the output NLSML also declares the "ISO-8859-1" encoding in the XML header, to avoid ambiguity when third-party applications parse non-ASCII characters that may be contained in the results. Please refer to the Wikipedia article describing ISO-8859-1 encoding for more details.
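
To illustrate why the declared encoding matters, the following Python sketch parses a result containing a non-ASCII character. The payload is a made-up example rather than captured ASR output, and the standard-library xml.etree.ElementTree parser is just one possible choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical NLSML text; "S\u00e3o Paulo" contains one non-ASCII character.
text = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<result><interpretation confidence="90">'
    '<input mode="speech">S\u00e3o Paulo</input>'
    '<instance>S\u00e3o Paulo</instance>'
    '</interpretation></result>'
)

# Encode to raw bytes, as the result would appear in a message body.
payload = text.encode("iso-8859-1")

# Passing bytes (not str) lets the parser honor the declared encoding.
root = ET.fromstring(payload)
city = root.find("interpretation/input").text
print(city)
```

If the same bytes were decoded as UTF-8 instead, the single byte used for the accented character would be invalid, which is the kind of ambiguity the explicit declaration is meant to avoid.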

Users wanting to revert to the older format (without the encoding format being specified) can modify the "ResultHeader.txt" template files in Lang/ResultTemplates/1 and/or Lang/ResultTemplates/2 folders.



"result" Root Element


Description

The root element is <result> and includes one or more <interpretation> elements. Multiple interpretations can result from ambiguities in the input or in the semantic interpretation.

Attributes

                 

  

Attribute | Description
grammar | The grammar or recognition rule matched by this result. The grammar can be (and generally is) overridden by a grammar attribute in the "interpretation" element, so this attribute is often not present in the "result" element.
xmlns | The XML namespace for MRCP. This attribute is not used or populated by LumenVox.

  

Parent

None

Children

<interpretation>

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="http://192.168.0.55/grammars/test.grxml" confidence="90">
        <input mode="speech">
            San Diego
        </input>
        <instance>
            Destination
        </instance>
    </interpretation>
</result>
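
As a rough illustration of how a client might read this structure, the Python sketch below walks the <interpretation> children of a <result> and applies the grammar override rule described above. The parse_result function is a hypothetical helper, and the code assumes un-namespaced element names, since the xmlns attribute is not populated.

```python
import xml.etree.ElementTree as ET

NLSML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="http://192.168.0.55/grammars/test.grxml" confidence="90">
        <input mode="speech">San Diego</input>
        <instance>Destination</instance>
    </interpretation>
</result>"""


def parse_result(payload: bytes) -> list:
    """Return one dict per <interpretation> in an NLSML <result>."""
    root = ET.fromstring(payload)
    if root.tag != "result":
        raise ValueError("expected <result> as the root element")
    parsed = []
    for interp in root.findall("interpretation"):
        parsed.append({
            # An interpretation-level grammar overrides the result-level one.
            "grammar": interp.get("grammar", root.get("grammar")),
            "confidence": interp.get("confidence"),
            # strip() tolerates payloads that were pretty-printed.
            "input": (interp.findtext("input") or "").strip(),
        })
    return parsed


interpretations = parse_result(NLSML)
print(interpretations)
```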


"interpretation" Element


Description

Encapsulates the input hypothesis. There should be one or more "interpretation" elements in each result. When multiple "interpretation" elements are present, each represents an n-best alternative result and indicates the confidence score of its respective hypothesis.

Attributes

                 

  

Attribute | Description
grammar | The grammar or recognition rule matched by this result. The format of the grammar attribute matches the rule reference semantics defined in the grammar specification.
confidence | An integer from 0 to 100 (MRCPv1) or a decimal value from 0.0 to 1.0 (MRCPv2) indicating the semantic analyzer's confidence in this interpretation.

  

Parent

<result>

Children

<input> <instance>

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="builtin:grammar/digits" confidence="100">
        <input mode="dtmf">
            1 2 3 4
        </input>
        <instance>
            John's PIN code
        </instance>
    </interpretation>
</result>


"instance" Element


Description

Contains the SISR semantic interpretation from the ASR for the detected utterance. See our Intro to Semantic Interpretation article for more details on how to define and use semantic interpretation, along with the corresponding tag-format that the ASR grammars can use to process ECMAScript code contained within tags of matched grammar rules.

Attributes

None

Parent

<interpretation>

Children

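None defined by the format. When a single interpretation contains multiple slots, each slot value appears in a child element named after the slot (see Multiple Slots, Single Interpretation).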

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
    <interpretation confidence="94">
        <input mode="speech">
            San Diego
        </input>
        <instance>
            Destination
        </instance>
    </interpretation>
</result>


"input" Element


Description

Text representation of the user's input.

Attributes

                                               

  

Attribute | Description
mode | The modality of the input, which will be either speech or dtmf.
confidence | An optional integer from 0 to 100 (MRCPv1) or decimal value from 0.0 to 1.0 (MRCPv2) indicating the semantic analyzer's confidence in this interpretation.
timestamp-start | This attribute is not used or populated by LumenVox.
timestamp-end | This attribute is not used or populated by LumenVox.

  

Parent

<interpretation>

Children

<noinput> <nomatch>

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
    <interpretation confidence="87">
        <input mode="speech">
            San Diego
        </input>
        <instance>
            Destination
        </instance>
    </interpretation>
</result>
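
The attributes of the "input" element can be read as in the following Python sketch, which treats confidence as optional. The describe_input helper and the sample payload (a DTMF entry) are illustrative assumptions only.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<result>
    <interpretation grammar="builtin:grammar/digits" confidence="100">
        <input mode="dtmf">1 2 3 4</input>
        <instance>John's PIN code</instance>
    </interpretation>
</result>"""


def describe_input(interp: ET.Element) -> dict:
    """Summarize the <input> child of one <interpretation> element."""
    inp = interp.find("input")
    if inp is None:
        raise ValueError("interpretation has no <input> element")
    return {
        "mode": inp.get("mode"),              # "speech" or "dtmf"
        "text": (inp.text or "").strip(),     # what the user said or keyed
        "confidence": inp.get("confidence"),  # None when the attribute is absent
    }


root = ET.fromstring(SAMPLE)
details = describe_input(root.find("interpretation"))
print(details)
```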


"noinput" Element


Description

The "noinput" element under "input" is used to indicate that the ASR interpreter did not detect any speech. A timeout expired in the speech recognizer while it was waiting for the start of speech to be identified by Voice Activity Detection (VAD), and none was detected.

Attributes

None

Parent

<input>

Children

None

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
    <interpretation confidence="0">
        <instance/>
        <input>
            <noinput/>
        </input>
    </interpretation>
</result>


"nomatch" Element


Description

The "nomatch" element under "input" is used to indicate that the ASR interpreter was unable to successfully match any input. This can occur if the confidence score for a result is lower than the Confidence-Threshold for a recognition, in which case the ASR will force the result to become a nomatch. This type of response generally indicates that an ASR result was obtained, but was not a good match for any of the constraints of the loaded grammars. If no speech had been detected, the result would typically have been a "noinput" response instead.

Attributes

None

Parent

<input>

Children

None

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
    <interpretation confidence="0">
        <instance/>
        <input>
            <nomatch/>
        </input>
    </interpretation>
</result>
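
An application usually needs to distinguish the "noinput" and "nomatch" cases from a normal match before reading the interpretation. A minimal Python sketch of that check follows. The classify function and the three inline payloads are hypothetical and only mirror the structure of the examples in this article.

```python
import xml.etree.ElementTree as ET


def classify(payload: str) -> str:
    """Return "noinput", "nomatch", or "match" for an NLSML result."""
    root = ET.fromstring(payload)
    for interp in root.findall("interpretation"):
        inp = interp.find("input")
        if inp is None:
            continue
        # <noinput/> and <nomatch/> appear as children of <input>.
        if inp.find("noinput") is not None:
            return "noinput"
        if inp.find("nomatch") is not None:
            return "nomatch"
    return "match"


NOINPUT = ('<result><interpretation confidence="0"><instance/>'
           '<input><noinput/></input></interpretation></result>')
NOMATCH = ('<result><interpretation confidence="0"><instance/>'
           '<input><nomatch/></input></interpretation></result>')
MATCH = ('<result><interpretation confidence="87">'
         '<input mode="speech">San Diego</input>'
         '<instance>Destination</instance></interpretation></result>')

for sample in (NOINPUT, NOMATCH, MATCH):
    print(classify(sample))
```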


More Complex Results

There is often confusion regarding the format of NLSML when results are more complex than the simple examples shown above. The overall format and content of NLSML are highly dependent on the grammars specified and the semantic interpretation information within them.

Here are some examples of how LumenVox NLSML appears when more complex results are encountered:

Multiple N-Best Results

Each N-Best alternative appears in its own <interpretation> element, with its own <input> child element. Each <interpretation> element may contain one or more <instance> elements.

<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="builtin:grammar/boolean" confidence="80">
        <input mode="speech">
            no
        </input>
        <instance>
            false
        </instance>
    </interpretation>
    <interpretation grammar="builtin:grammar/boolean" confidence="20">
        <input mode="speech">
            nope
        </input>
        <instance>
            false
        </instance>
    </interpretation>
</result>
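
A client that wants the most likely alternative can rank the <interpretation> elements by confidence, as in the Python sketch below. Sorting explicitly is a defensive assumption here, since this article does not state that alternatives are always listed in descending order of confidence.

```python
import xml.etree.ElementTree as ET

NBEST = """<result>
    <interpretation grammar="builtin:grammar/boolean" confidence="80">
        <input mode="speech">no</input>
        <instance>false</instance>
    </interpretation>
    <interpretation grammar="builtin:grammar/boolean" confidence="20">
        <input mode="speech">nope</input>
        <instance>false</instance>
    </interpretation>
</result>"""

root = ET.fromstring(NBEST)

# Rank alternatives from highest to lowest confidence.
ranked = sorted(
    root.findall("interpretation"),
    key=lambda interp: float(interp.get("confidence", "0")),
    reverse=True,
)

# Pair each alternative's recognized text with its numeric confidence.
alternatives = [
    (interp.findtext("input").strip(), float(interp.get("confidence")))
    for interp in ranked
]
best_text, best_confidence = alternatives[0]
print(best_text, best_confidence)
```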

Multiple Parses

Note that when multiple parses are represented in NLSML, only those parses of an input that return different semantic interpretations are represented. In other words, if multiple parses return the same semantic interpretation, they are combined.

Multiple Interpretations, Single Grammar

When there are multiple interpretations for the same input using one grammar, each semantic interpretation is contained in its own <instance> element tag within the single <interpretation> tag.

<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="session:FourNumbers" confidence="80">
        <input mode="speech">
            one twenty three forty five
        </input>
        <instance>
            1,23,40,5
        </instance>
        <instance>
            1,20,3,45
        </instance>
    </interpretation>
</result>
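
Because a single <interpretation> may carry several <instance> elements, code that reads only the first one can silently drop an ambiguity. The Python sketch below collects all of them; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

AMBIGUOUS = """<result>
    <interpretation grammar="session:FourNumbers" confidence="80">
        <input mode="speech">one twenty three forty five</input>
        <instance>1,23,40,5</instance>
        <instance>1,20,3,45</instance>
    </interpretation>
</result>"""

root = ET.fromstring(AMBIGUOUS)
interp = root.find("interpretation")

# findall() returns every <instance> child, in document order.
instances = [(inst.text or "").strip() for inst in interp.findall("instance")]

if len(instances) > 1:
    # More than one semantic reading exists for the same input.
    print("ambiguous:", instances)
```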

Multiple Interpretations, Multiple Grammars

When the same input produces interpretations from multiple grammars, each semantic interpretation is contained in its own <interpretation> element.

<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="session:my_date_grammar" confidence="80">
        <input mode="speech">
            two thirty
        </input>
        <instance>
            02/30/????
        </instance>
    </interpretation>
    <interpretation grammar="session:my_time_grammar" confidence="80">
        <input mode="speech">
            two thirty
        </input>
        <instance>
            2:30
        </instance>
    </interpretation>
</result>
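
When several grammars are active, the grammar attribute is what tells the application which reading it is looking at. The following Python sketch indexes the interpretations by grammar. It assumes each grammar appears at most once in the result, which holds for this example but is not guaranteed in general.

```python
import xml.etree.ElementTree as ET

MULTI_GRAMMAR = """<result>
    <interpretation grammar="session:my_date_grammar" confidence="80">
        <input mode="speech">two thirty</input>
        <instance>02/30/????</instance>
    </interpretation>
    <interpretation grammar="session:my_time_grammar" confidence="80">
        <input mode="speech">two thirty</input>
        <instance>2:30</instance>
    </interpretation>
</result>"""

root = ET.fromstring(MULTI_GRAMMAR)

# Map each grammar name to the semantic interpretation it produced.
by_grammar = {
    interp.get("grammar"): (interp.findtext("instance") or "").strip()
    for interp in root.findall("interpretation")
}
print(by_grammar)
```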

Multiple Slots, Single Interpretation

If a single interpretation contains multiple slots, each slot value is contained in an XML element named after the slot, under a single <instance> element.

<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
    <interpretation grammar="session:my_city_state" confidence="80">
        <input mode="speech">
            San Diego California
        </input>
        <instance>
            <city>
                San Diego
            </city>
            <state>
                California
            </state>
        </instance>
    </interpretation>
</result>
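
Slot values can be gathered into a dictionary by iterating over the children of the <instance> element, as in this Python sketch. It assumes a flat, single-level set of slots like the one above; nested slot structures would need a recursive approach.

```python
import xml.etree.ElementTree as ET

SLOTS = """<result>
    <interpretation grammar="session:my_city_state" confidence="80">
        <input mode="speech">San Diego California</input>
        <instance>
            <city>San Diego</city>
            <state>California</state>
        </instance>
    </interpretation>
</result>"""

root = ET.fromstring(SLOTS)
instance = root.find("interpretation/instance")

# Each child element's tag is the slot name; its text is the slot value.
slots = {child.tag: (child.text or "").strip() for child in instance}
print(slots)
```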