Natural Language Semantics Markup Language (NLSML) is an XML format used to represent speech semantic result information within MRCP implementations. The format gives software a defined structure to parse, while allowing varying amounts of data to be represented.
The NLSML format was defined by the W3C working group in their draft documentation, which should be used to provide a more in-depth reference when using this format.
Although NLSML has a number of defined uses, LumenVox uses it primarily to format ASR output: when the LumenVox Media Server is used to connect to the ASR, recognition results are formatted as NLSML before being sent in MRCP replies. NLSML is used in both MRCPv1 and MRCPv2. The only major difference is the scale used when reporting confidence scores: 0 to 100 for MRCPv1 and 0.0 to 1.0 for MRCPv2, as defined by their respective specifications.
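Because the two protocol versions report confidence on different scales, client code that handles both often converts to one scale first. The following is a minimal sketch of that conversion; the function name and the mrcp_version parameter are hypothetical and not part of any LumenVox API.

```python
def normalize_confidence(raw: str, mrcp_version: int) -> float:
    """Convert a confidence attribute value to a 0.0 to 1.0 scale.

    Assumes MRCPv1 values are on a 0 to 100 scale and MRCPv2 values are
    already on a 0.0 to 1.0 scale, as described in the text above.
    """
    value = float(raw)
    if mrcp_version == 1:
        return value / 100.0
    if mrcp_version == 2:
        return value
    raise ValueError(f"unsupported MRCP version: {mrcp_version}")


print(normalize_confidence("90", 1))   # MRCPv1 style value
print(normalize_confidence("0.9", 2))  # MRCPv2 style value
```

Normalizing early means thresholds elsewhere in an application can be written once, regardless of which MRCP version delivered the result.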
Since this format is based on a working draft, and that draft was designed to be used in a number of different ways, there is often confusion surrounding the way in which this format is actually used in practice. This article describes the specific way in which LumenVox has implemented its version of NLSML, which seems to work reasonably well in conjunction with a large number of platforms and technologies that our Media Server has been certified against. It should be noted that other vendors may implement this in a number of different ways.
Note that the specific formatting of the NLSML output from the ASR can be customized when using the LumenVox Media Server by specifying different compatibility_mode options in the media_server.conf configuration file. This is an advanced option, which we recommend only using in conjunction with LumenVox Client Services assistance.
Please also note that examples given here have carriage returns and indentation added to improve readability. Actual NLSML results would not normally have these, so would be considered inline XML, without unnecessary whitespace.
Output Encoding Format
Starting with LumenVox version 12.0 (of our legacy product-line), the output NLSML will also have the "ISO-8859-1" encoding format declared in the XML header to avoid ambiguity when third party applications parse non-ASCII characters that may be contained in the results. Please refer to the Wikipedia article describing ISO-8859-1 encoding for more details.
Users wanting to revert to the older format (without the encoding format being specified) can modify the "ResultHeader.txt" template files in Lang/ResultTemplates/1 and/or Lang/ResultTemplates/2 folders.
Note:
Users who need these NLSML results declared as UTF-8 (for example), because their platform does not accept the default ISO-8859-1 declaration, can modify the same "ResultHeader.txt" template files in a similar way. Note that this modification changes only the declared encoding. The character encoding of the result itself does not change, so accommodation for non-ASCII characters may be needed.
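To illustrate why the declared encoding matters to a consuming application, here is a small sketch using Python's standard xml.etree.ElementTree module. The result content and the grammar name are made up for illustration. The point is that handing the parser the raw bytes lets it apply the encoding named in the XML header.

```python
import xml.etree.ElementTree as ET

# Hypothetical result; \xe1 is the single ISO-8859-1 byte for "a" with an acute accent.
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<result><interpretation grammar="session:cities" confidence="90">'
    b'<input mode="speech">M\xe1laga</input>'
    b'<instance>M\xe1laga</instance>'
    b'</interpretation></result>'
)

# Passing bytes (not a pre-decoded string) lets the parser honour the
# encoding named in the XML header.
root = ET.fromstring(raw)
spoken = root.findtext("interpretation/input")
print(spoken)
```

If the header were changed to declare UTF-8 while the bytes stayed ISO-8859-1, a parser would be expected to reject or misread the accented character, which is the situation the note above warns about.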
"result" Root Element
Description
The root element is <result> and includes one or more <interpretation> elements. Multiple interpretations result from ambiguities in the input of the semantic interpretation.
Attributes
Attribute | Description
grammar | The grammar or recognition rule matched by this result. The grammar can be (and generally is) overridden by a grammar attribute in the "interpretation" element, so this attribute may generally not be present in the result element.
xmlns | The XML namespace for MRCP. This attribute is not used or populated by LumenVox.
Parent
None
Children
<interpretation>
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="http://192.168.0.55/grammars/test.grxml" confidence="90">
<input mode="speech">
San Diego
</input>
<instance>
Destination
</instance>
</interpretation>
</result>
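Since the grammar attribute may appear on either the "result" element or the "interpretation" element, a consumer typically checks both. The sketch below shows one way to do that with Python's standard xml.etree.ElementTree module. The helper name, the "session:fallback" grammar and the second interpretation are hypothetical. It also assumes no xmlns attribute is present, as stated above for LumenVox output; with a namespace, element names would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET


def grammar_for(result: ET.Element, interpretation: ET.Element):
    """Prefer the interpretation-level grammar, else the result-level one."""
    return interpretation.get("grammar") or result.get("grammar")


# Hypothetical NLSML with the grammar given at both levels.
nlsml = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<result grammar="session:fallback">'
    b'<interpretation grammar="http://192.168.0.55/grammars/test.grxml" confidence="90">'
    b'<input mode="speech">San Diego</input><instance>Destination</instance>'
    b'</interpretation>'
    b'<interpretation confidence="40">'
    b'<input mode="speech">Santiago</input><instance>Destination</instance>'
    b'</interpretation>'
    b'</result>'
)

result = ET.fromstring(nlsml)
grammars = [grammar_for(result, interp) for interp in result.findall("interpretation")]
print(grammars)
```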
"interpretation" Element
Description
Encapsulates the input hypothesis. There should be one or more "interpretation" elements in each result. When multiple "interpretation" elements are present, each represents an n-best alternative result and indicates the confidence score of its respective hypothesis.
Attributes
Attribute | Description
grammar | The grammar or recognition rule matched by this result. (The format of the grammar attribute will match the rule reference semantics defined in the grammar specification.)
confidence | An integer from 0 to 100 (MRCPv1) or a decimal value from 0.0 to 1.0 (MRCPv2) indicating the semantic analyzer's confidence in this interpretation.
Parent
<result>
Children
<input> <instance>
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="builtin:grammar/digits" confidence="100">
<input mode="dtmf">
1 2 3 4
</input>
<instance>
John's PIN code
</instance>
</interpretation>
</result>
"instance" Element
Description
Contains the SISR semantic interpretation from the ASR for the detected utterance. See our Intro to Semantic Interpretation article for more details on how to define and use semantic interpretation, along with the corresponding tag-format that the ASR grammars can use to process ECMAScript code contained within tags of matched grammar rules.
Attributes
None
Parent
<interpretation>
Children
None
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
<interpretation confidence="94">
<input mode="speech">
San Diego
</input>
<instance>
Destination
</instance>
</interpretation>
</result>
"input" Element
Description
Text representation of the user's input.
Attributes
Attribute | Description
mode | The modality of the input, which will be either speech or dtmf.
confidence | An optional integer from 0 to 100 (MRCPv1) or decimal value from 0.0 to 1.0 (MRCPv2) indicating the semantic analyzer's confidence in this interpretation.
timestamp-start | This attribute is not used or populated by LumenVox.
timestamp-end | This attribute is not used or populated by LumenVox.
Parent
<interpretation>
Children
<noinput> <nomatch>
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
<interpretation confidence="87">
<input mode="speech">
San Diego
</input>
<instance>
Destination
</instance>
</interpretation>
</result>
"noinput" Element
Description
The "noinput" element under "input" is used to indicate that the ASR interpreter did not detect any speech. A timeout expired in the speech recognizer while it was waiting for start of speech (Voice Activity Detection, or VAD), and none was detected.
Attributes
None
Parent
<input>
Children
None
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
<interpretation confidence="0">
<instance/>
<input>
<noinput/>
</input>
</interpretation>
</result>
"nomatch" Element
Description
The "nomatch" element under "input" is used to indicate that the ASR interpreter was unable to successfully match any input. This can occur if the confidence score for a result is lower than the Confidence-Threshold for a recognition, in which case the ASR will force the result to become a nomatch. This type of response generally indicates that an ASR result was obtained, but was not a good match for any of the constraints of the loaded grammars. If no speech was detected, this would typically have resulted in a no-input response.
Attributes
None
Parent
<input>
Children
None
Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<result grammar="session:test_grammar.grxml">
<interpretation confidence="0">
<instance/>
<input>
<nomatch/>
</input>
</interpretation>
</result>
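An application receiving these results usually needs to tell the "noinput" and "nomatch" cases apart from a normal match. The following sketch, using Python's standard xml.etree.ElementTree module, checks for those child elements under "input". The function name and the returned labels are hypothetical, and only the first interpretation is inspected.

```python
import xml.etree.ElementTree as ET


def classify(nlsml: bytes) -> str:
    """Return "noinput", "nomatch" or "match" for the first interpretation."""
    root = ET.fromstring(nlsml)
    inp = root.find("interpretation/input")
    # Compare against None explicitly: an Element without children is falsy.
    if inp is not None:
        if inp.find("noinput") is not None:
            return "noinput"
        if inp.find("nomatch") is not None:
            return "nomatch"
    return "match"


HEADER = b'<?xml version="1.0" encoding="ISO-8859-1"?>'
OPEN = b'<result grammar="session:test_grammar.grxml">'

no_input = (HEADER + OPEN + b'<interpretation confidence="0"><instance/>'
            b'<input><noinput/></input></interpretation></result>')
no_match = (HEADER + OPEN + b'<interpretation confidence="0"><instance/>'
            b'<input><nomatch/></input></interpretation></result>')
matched = (HEADER + OPEN + b'<interpretation confidence="87">'
           b'<input mode="speech">San Diego</input>'
           b'<instance>Destination</instance></interpretation></result>')

for sample in (no_input, no_match, matched):
    print(classify(sample))
```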
More Complex Results
There is often confusion regarding the format of NLSML when results are more complex than the simple examples shown above. Listed below are some examples of how LumenVox NLSML appears when more complex results are encountered. The overall format and content of NLSML is highly dependent on the grammar format specified and the semantic interpretation information within them.
Here are some examples of more complex results that may be encountered:
Multiple N-Best Results
Each N-Best alternative occurs within its own <interpretation> element tag, with a unique <input> child element. Within the <interpretation> tag, there may be one or multiple <instance> tags.
<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="builtin:grammar/boolean" confidence="80">
<input mode="speech">
no
</input>
<instance>
false
</instance>
</interpretation>
<interpretation grammar="builtin:grammar/boolean" confidence="20">
<input mode="speech">
nope
</input>
<instance>
false
</instance>
</interpretation>
</result>
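As a rough illustration of consuming an N-Best result like the one above, the sketch below collects each hypothesis with its confidence and sorts them from highest to lowest. It uses Python's standard xml.etree.ElementTree module. The function name is hypothetical, and the explicit sort is an assumption made for safety, since nothing here states the order in which alternatives are emitted.

```python
import xml.etree.ElementTree as ET


def n_best(nlsml: bytes):
    """Return (confidence, text) pairs, highest confidence first."""
    root = ET.fromstring(nlsml)
    hypotheses = []
    for interp in root.findall("interpretation"):
        # strip() removes whitespace that may surround the text.
        text = (interp.findtext("input") or "").strip()
        confidence = float(interp.get("confidence", "0"))
        hypotheses.append((confidence, text))
    # sorted() is stable, so equal scores keep their document order.
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="builtin:grammar/boolean" confidence="20">
<input mode="speech">
nope
</input>
<instance>
false
</instance>
</interpretation>
<interpretation grammar="builtin:grammar/boolean" confidence="80">
<input mode="speech">
no
</input>
<instance>
false
</instance>
</interpretation>
</result>"""

print(n_best(sample))
```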
Multiple Parses
Note that when representing multiple parses in NLSML, only those parses for an input that return different semantic interpretations will be represented. In other words, if multiple parses return the same semantic interpretation, they will be combined.
Multiple Interpretations, Single Grammar
When there are multiple interpretations for the same input using one grammar, each semantic interpretation is contained in its own <instance> element tag within the single <interpretation> tag.
<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="session:FourNumbers" confidence="80">
<input mode="speech">
one twenty three forty five
</input>
<instance>
1,23,40,5
</instance>
<instance>
1,20,3,45
</instance>
</interpretation>
</result>
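When a single interpretation carries several "instance" elements, a consumer needs to read all of them rather than only the first. This is a minimal sketch using Python's standard xml.etree.ElementTree module, with a hypothetical function name.

```python
import xml.etree.ElementTree as ET


def instance_values(nlsml: bytes):
    """Return the text of every <instance> under the first interpretation."""
    root = ET.fromstring(nlsml)
    interp = root.find("interpretation")
    if interp is None:
        return []
    # findall() returns every matching child, not just the first one.
    return [(inst.text or "").strip() for inst in interp.findall("instance")]


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="session:FourNumbers" confidence="80">
<input mode="speech">
one twenty three forty five
</input>
<instance>
1,23,40,5
</instance>
<instance>
1,20,3,45
</instance>
</interpretation>
</result>"""

print(instance_values(sample))
```

Code that calls find("instance") instead of findall("instance") would silently drop the second reading, so ambiguous inputs are worth testing explicitly.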
Multiple Interpretations, Multiple Grammars
For interpretations from the same input using multiple grammars, each semantic interpretation is contained in a separate <interpretation> element tag.
<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="session:my_date_grammar" confidence="80">
<input mode="speech">
two thirty
</input>
<instance>
02/30/????
</instance>
</interpretation>
<interpretation grammar="session:my_time_grammar" confidence="80">
<input mode="speech">
two thirty
</input>
<instance>
2:30
</instance>
</interpretation>
</result>
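For results like this, an application may want to look up the semantic value by the grammar that produced it. The sketch below builds such a mapping with Python's standard xml.etree.ElementTree module. The function name is hypothetical, and it assumes each grammar appears at most once in the result; a repeated grammar would overwrite the earlier entry.

```python
import xml.etree.ElementTree as ET


def values_by_grammar(nlsml: bytes) -> dict:
    """Map each grammar to the text of its first <instance> element."""
    root = ET.fromstring(nlsml)
    values = {}
    for interp in root.findall("interpretation"):
        # Fall back to the result-level grammar if the interpretation has none.
        grammar = interp.get("grammar") or root.get("grammar")
        values[grammar] = (interp.findtext("instance") or "").strip()
    return values


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="session:my_date_grammar" confidence="80">
<input mode="speech">
two thirty
</input>
<instance>
02/30/????
</instance>
</interpretation>
<interpretation grammar="session:my_time_grammar" confidence="80">
<input mode="speech">
two thirty
</input>
<instance>
2:30
</instance>
</interpretation>
</result>"""

print(values_by_grammar(sample))
```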
Multiple Slots, Single Interpretation
If a single interpretation contains multiple slots, each slot value is contained in an XML tag with the slot name under a single instance.
<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="session:my_city_state" confidence="80">
<input mode="speech">
San Diego California
</input>
<instance>
<city>
San Diego
</city>
<state>
California
</state>
</instance>
</interpretation>
</result>
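A multi-slot instance maps naturally onto a dictionary keyed by slot name. The following sketch shows one way to read it with Python's standard xml.etree.ElementTree module. The function name is hypothetical, and it assumes slots are flat (one level of child elements), as in the example above.

```python
import xml.etree.ElementTree as ET


def slot_values(nlsml: bytes) -> dict:
    """Return {slot name: slot text} for the first interpretation's instance."""
    root = ET.fromstring(nlsml)
    instance = root.find("interpretation/instance")
    if instance is None:
        return {}
    # Iterating an Element yields its direct children, one per slot here.
    return {slot.tag: (slot.text or "").strip() for slot in instance}


sample = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<result>
<interpretation grammar="session:my_city_state" confidence="80">
<input mode="speech">
San Diego California
</input>
<instance>
<city>
San Diego
</city>
<state>
California
</state>
</instance>
</interpretation>
</result>"""

print(slot_values(sample))
```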