Turning to Another Speech Provider

Redmond enlisted the help of LumenVox, an ASR provider experienced at working in noisy environments. Redmond knew that a truly collaborative effort was the only way it could deliver a solution that met General Motors' needs.

"In order to get any Technical Support with the previous vendor, you have to go through their process, and pay their prices," Berdinas said. "And even so, they were not really giving us any solutions."

Gustavo Berdinas

To gauge performance between the existing speech engine and the new candidate, Redmond sent the exact same sets of recorded audio through both engines. This audio consisted of proper names, words, phrases and numbers. For the most part, the two engines produced similar results, with almost equal accuracy. Where LumenVox excelled was in recognizing numbers.
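A side-by-side test of this kind can be sketched in a few lines of Python. This is only an illustration: the function name, the category labels, and every sample result below are invented for the example and are not Redmond's data or tooling.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """Compute per-category accuracy.

    results: iterable of (category, expected_text, recognized_text) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, expected, recognized in results:
        totals[category] += 1
        if expected.strip().lower() == recognized.strip().lower():
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Made-up outputs from two hypothetical engines fed the same recordings.
engine_a = [
    ("names", "maria", "maria"),
    ("digits", "seis", "tres"),
    ("digits", "tres", "tres"),
]
engine_b = [
    ("names", "maria", "maria"),
    ("digits", "seis", "seis"),
    ("digits", "tres", "tres"),
]

scores_a = accuracy_by_category(engine_a)
scores_b = accuracy_by_category(engine_b)
for category in sorted(scores_a):
    print(category, scores_a[category], scores_b[category])
```

Scoring by category rather than overall is what lets a difference in one area, such as digits, stand out even when the totals look similar.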

When the recognition of digits was tested using the same audio, LumenVox easily bested the incumbent. Redmond immediately began using LumenVox solely for digits-only recognition. These findings also prompted Redmond to take a closer look at LumenVox.

What Redmond found was a partner willing to help with its problem. "We thought that working with a smaller company would make it much easier to give LumenVox our feedback, and really work together in solving the problem," said Berdinas. "With LumenVox, it was also the right timing."

Languages

LumenVox had been looking for a partner to help increase the number of foreign languages it supported. It already had a Spanish language acoustic model, but it was not accurate enough.

In order to create or improve an acoustic model, software developers and speech scientists take hours of recorded, transcribed audio, and extract the mathematical representations of these spoken words. Getting this audio data is usually a lengthy and expensive process.

To help in improving LumenVox's acoustic model, Redmond gave LumenVox developers recorded audio from actual ChevyStar users. The shared data allowed the LumenVox team to rebuild their South American Spanish model. This greatly improved the model's accuracy, particularly for the ChevyStar application.

Tuning the ChevyStar Grammar

The initial improvement in digits recognition was significant, but both teams saw more room for improvement. A speech recognition application needs fine-tuning after its deployment.

One of the areas that LumenVox wanted to start on was the design of the grammars that Redmond was using.

A grammar is the list of the words that the Speech Engine is expected to recognize. The Engine converts audio to text, and then that text is compared to the grammar. In order for a word to be recognized, it must be listed in the grammar.
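As a rough sketch of that last point, the Python below treats a grammar as a plain set of words and rejects anything outside it. The function name and the digit list are assumptions made for illustration. A real engine scores audio against the grammar rather than comparing finished strings.

```python
def match_against_grammar(hypothesis, grammar):
    """Return the grammar entry that matches the hypothesis, or None."""
    normalized = hypothesis.strip().lower()
    return normalized if normalized in grammar else None


# Illustrative grammar: the Spanish digits zero through nine.
digits_grammar = {
    "cero", "uno", "dos", "tres", "cuatro",
    "cinco", "seis", "siete", "ocho", "nueve",
}

in_grammar = match_against_grammar(" Seis ", digits_grammar)
out_of_grammar = match_against_grammar("diez", digits_grammar)
print(in_grammar, out_of_grammar)
```

Here "diez" is rejected simply because it is not listed, no matter how clearly it was spoken.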

ChevyStar's system for writing the car owners' grammars did not format them to meet new industry standards. LumenVox saw this as low-hanging fruit, and worked with Redmond's application developers to overhaul the output of their grammar creation system.

LumenVox had adopted the Speech Recognition Grammar Specification (SRGS) as the standard by which its engine should accept grammars. This allowed the LumenVox team to apply more advanced techniques in tuning the grammars than had been previously available.

Weights

For example, one of the more common misrecognitions they discovered involved the word seis, the Spanish word for "six." The problem was that it rhymes with, and therefore sounds like, the Spanish word for "three," tres. The engine was mistaking people saying seis for tres. The LumenVox development team applied weights to counter this.

Weights are a way of changing the likelihood that the Engine will pick one item over another. So, they gave seis a greater weight, and found that after several iterations, the misrecognitions began to decrease.
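The effect of a weight can be illustrated with a toy scorer. Everything here is an assumption for the sake of the example: the scores, the weight of 1.2, and the simple multiplication are invented, and the source does not describe how the LumenVox engine combines weights with its own scores.

```python
def pick_word(acoustic_scores, weights):
    """Pick the candidate with the highest weighted score.

    Candidates without an explicit weight default to 1.0.
    """
    def weighted(word):
        return acoustic_scores[word] * weights.get(word, 1.0)

    return max(acoustic_scores, key=weighted)


# Invented scores for one utterance of "seis" that sounds a little like "tres".
scores = {"tres": 0.52, "seis": 0.48}

without_weights = pick_word(scores, {})
with_weights = pick_word(scores, {"seis": 1.2})
print(without_weights, with_weights)
```

In this sketch the weight tips a close call toward seis. Set too high, it would start turning genuine utterances of tres into seis, which is why this kind of tuning takes several iterations.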

Another technique LumenVox used was to employ rule expansions. A grammar can declare several words to mean the same thing and have the same result. In this case the Spanish word for "three" can either be pronounced tres or tre, depending on the person speaking. Because the grammars were now being built with SRGS rules, the developers were able to add both tres and tre as acceptable options to convey "three."
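An SRGS rule with two alternatives might look like the fragment that the Python sketch below generates. The `rule`, `one-of` and `item` elements come from the SRGS XML form. The rule id "three" and the helper name are hypothetical, and the actual ChevyStar grammars are not shown in the source. The output is only a rule fragment, not a complete grammar document.

```python
import xml.etree.ElementTree as ET


def build_rule(rule_id, alternatives):
    """Build an SRGS-style <rule> whose <one-of> lists each alternative."""
    rule = ET.Element("rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in alternatives:
        item = ET.SubElement(one_of, "item")
        item.text = phrase
    return rule


# Both pronunciations are accepted as the same rule.
three = build_rule("three", ["tres", "tre"])
xml_text = ET.tostring(three, encoding="unicode")
print(xml_text)
```

Because both items sit under one rule, the application gets the same result whichever pronunciation the caller uses.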

"We were actually quite impressed by the accuracy gained from this simple change," said Axel An, one of the LumenVox developers working on the project.

The result was that LumenVox and Redmond were now able to fine-tune the grammars in greater detail, increasing accuracy one point at a time.

HOW A SPEECH ENGINE WORKS

  • A speech engine loads a list of words to be recognized. This list of words is called a grammar. Audio from a speaker is captured by a microphone or telephone. This audio is turned into a waveform, a mathematical representation of sound. The engine then determines which words in the grammar the audio most closely matches and returns them as the result.

COMPANY OVERVIEW

  • ChevyStar is a service that uses a GM vehicle's built-in cellular phone, GPS, microphone and speakers. It allows for hands-free phone calling and GPS navigation, security features and remote vehicle diagnostics. Computers with sensors on the car's hardware (brakes, airbag, engine, etc.) communicate via the cellular network to the ChevyStar support centers in the event that maintenance is needed, there's been an accident, or for personal assistance like remote keyless entry. They've recently deployed a stolen vehicle shutdown system, which disables the vehicle if it's reported stolen.
