Redmond Software

Turning to Another Speech Provider

Redmond enlisted the help of LumenVox, an automatic speech recognition (ASR) provider experienced at working in noisy environments. Redmond knew that only a truly collaborative effort would produce a solution that met General Motors' needs.

"In order to get any Technical Support with the previous vendor, you have to go through their process, and pay their prices," Berdinas said. "And even so, they were not really giving us any solutions."

Gustavo Berdinas

To compare the existing speech engine with the new candidate, Redmond sent the exact same sets of recorded audio through both engines. This audio consisted of proper names, words, phrases and numbers. For the most part, the two engines produced similar results, with almost equal accuracy. Where LumenVox excelled was in recognizing numbers.
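A side-by-side test of this kind can be sketched in Python as follows. This is only an illustration, not Redmond's actual test harness: the function name, the category labels, and the sample transcripts are all invented for the example.

```python
from collections import defaultdict

def accuracy_by_category(results):
    """Compute accuracy per category.

    results: iterable of (category, expected_text, recognized_text).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, expected, recognized in results:
        totals[category] += 1
        # Case-insensitive exact match counts as a correct recognition.
        if expected.strip().lower() == recognized.strip().lower():
            correct[category] += 1
    return {c: correct[c] / totals[c] for c in totals}

# Hypothetical results for the same audio sent through two engines.
# These values are invented for illustration, not measured data.
engine_a = [
    ("names", "maria", "maria"),
    ("digits", "seis", "tres"),
    ("digits", "ocho", "ocho"),
]
engine_b = [
    ("names", "maria", "maria"),
    ("digits", "seis", "seis"),
    ("digits", "ocho", "ocho"),
]

acc_a = accuracy_by_category(engine_a)
acc_b = accuracy_by_category(engine_b)
for category in sorted(acc_a):
    print(f"{category}: A={acc_a[category]:.0%} B={acc_b[category]:.0%}")
```

Breaking the score down by category is what lets a difference in one area, such as digits, show up even when overall accuracy is nearly equal.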

When tested on digit recognition with the same audio, LumenVox easily bested the incumbent. Redmond immediately began using LumenVox for digits-only recognitions, and the findings prompted Redmond to take a closer look at LumenVox as a whole.

What Redmond found was a partner willing to help with its problem. "We thought that working with a smaller company would make it much easier to give LumenVox our feedback, and really work together in solving the problem," said Berdinas. "With LumenVox, it was also the right timing."

Languages

LumenVox had been looking for a partner to help increase the number of foreign languages it supported. It already had a Spanish language acoustic model, but it was not accurate enough.

In order to create or improve an acoustic model, software developers and speech scientists take hours of recorded, transcribed audio, and extract the mathematical representations of these spoken words. Getting this audio data is usually a lengthy and expensive process.

To help in improving LumenVox's acoustic model, Redmond gave LumenVox developers recorded audio from actual ChevyStar users. The shared data allowed the LumenVox team to rebuild their South American Spanish model. This greatly improved the model's accuracy, particularly for the ChevyStar application.

Tuning the ChevyStar Grammar

The initial improvement in digits recognition was significant, but both teams saw more room for improvement. A speech recognition application needs fine-tuning after its deployment.

One of the first areas LumenVox wanted to address was the design of the grammars that Redmond was using.

A grammar is the list of the words that the Speech Engine is expected to recognize. The Engine converts audio to text, and then that text is compared to the grammar. In order for a word to be recognized, it must be listed in the grammar.
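The simplified description above can be sketched in a few lines of Python. This toy follows the article's wording only. It is not how the LumenVox Engine is implemented, and the function name and the digit list are assumptions made for the example.

```python
def recognize(decoded_text, grammar):
    """Return the decoded word if the grammar lists it, otherwise None."""
    word = decoded_text.strip().lower()
    return word if word in grammar else None

# A hypothetical digits grammar: the Spanish words for 0 through 9.
digits_grammar = {
    "cero", "uno", "dos", "tres", "cuatro",
    "cinco", "seis", "siete", "ocho", "nueve",
}

print(recognize("Seis", digits_grammar))  # listed in the grammar
print(recognize("hola", digits_grammar))  # not listed, so not recognized
```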

ChevyStar's system for writing the car owners' grammars did not format them to meet new industry standards. LumenVox saw this as low-hanging fruit, and worked with Redmond's application developers to overhaul the output of their grammar creation system.

LumenVox had adopted the Speech Recognition Grammar Specification (SRGS), a W3C standard, as the format in which its engine accepts grammars. This allowed the LumenVox team to apply more advanced grammar-tuning techniques than had previously been available.

Weights

For example, one of the more common misrecognitions they discovered involved the word seis, the Spanish word for "six." It rhymes with, and therefore sounds like, tres, the Spanish word for "three." The engine was mistaking seis for tres. The LumenVox development team applied weights to counter this.

Weights are a way of changing the likelihood that the Engine will pick one item over another. So, they gave seis a greater weight, and found that after several iterations, the misrecognitions began to decrease.
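The effect of a weight can be illustrated with a toy scorer. The sketch below assumes that each candidate word's score is multiplied by its weight and that an unweighted word defaults to 1.0. How a real engine combines weights with its acoustic scores is engine-specific, and the scores and the weight of 1.2 here are invented for the example.

```python
def pick(scores, weights=None):
    """Pick the candidate with the highest weighted score.

    scores:  dict mapping candidate word -> raw score from the recognizer
    weights: dict mapping candidate word -> weight (defaults to 1.0)
    """
    weights = weights or {}
    return max(scores, key=lambda word: scores[word] * weights.get(word, 1.0))

# Hypothetical raw scores for an utterance where the speaker said "seis".
scores = {"tres": 0.52, "seis": 0.48}

print(pick(scores))                 # without weights, "tres" wins
print(pick(scores, {"seis": 1.2}))  # with a higher weight, "seis" wins
```

Because a weight shifts every close call toward the favored word, it has to be tuned over several iterations, as the teams did, to avoid simply trading one misrecognition for the opposite one.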

Another technique LumenVox used was rule expansions. A grammar can declare several words to mean the same thing and produce the same result. In this case, the Spanish word for "three" can be pronounced either tres or tre, depending on the person speaking. Because the grammars were now being built with SRGS rules, the developers were able to add both tres and tre as acceptable options to convey "three."

"We were actually quite impressed by the accuracy gained from this simple change," said Axel An, one of the LumenVox developers working on the project.
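As a rough sketch of what such a grammar might look like, the Python below builds a small SRGS XML rule in which tres and tre both return "3" and seis carries a weight. The rule name, the language code "es", the weight of 1.2, and the tag contents are assumptions for illustration. The actual ChevyStar grammars are not shown in the source.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace

def build_digit_grammar(entries):
    """Build an SRGS grammar element.

    entries: list of (spoken_form, meaning, weight_or_None).
    """
    grammar = ET.Element(f"{{{SRGS_NS}}}grammar", {
        "version": "1.0",
        "mode": "voice",
        "root": "digit",
        "tag-format": "semantics/1.0",
        f"{{{XML_NS}}}lang": "es",  # assumed language code
    })
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule",
                         {"id": "digit", "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for spoken, meaning, weight in entries:
        attrs = {} if weight is None else {"weight": str(weight)}
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item", attrs)
        item.text = spoken
        # The tag sets the result returned when this alternative matches.
        tag = ET.SubElement(item, f"{{{SRGS_NS}}}tag")
        tag.text = f'out = "{meaning}";'
    return grammar

# Two spoken forms for "three", plus a weighted "six" (weight is illustrative).
entries = [("tres", "3", None), ("tre", "3", None), ("seis", "6", 1.2)]
grammar = build_digit_grammar(entries)
srgs_xml = ET.tostring(grammar, encoding="unicode")
print(srgs_xml)
```

Mapping both spoken forms to the same tag value means the application receives "3" either way and does not need to know which pronunciation the caller used.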

The result was that LumenVox and Redmond could now fine-tune the grammars in greater detail, increasing accuracy one point at a time.

© 2016 LumenVox, LLC. All rights reserved.