Creating Accurate Transcripts

Transcripts entered with the Speech Tuner are helpful for tuning a speech recognition application. They are used to run tests on a speech application in order to pinpoint problems.

While transcribing, you should be mainly concerned with audio that is appropriate for the speech application and that is recognizable by humans. Transcription is a subjective process, and transcribers should strive to maximize their efficiency.

For instance, if the application records a caller talking to another person in the background, that speech can simply be marked as garbage and discarded while doing tests. Likewise, if a caller says something that is unintelligible to a human, there is no way the Engine can be expected to understand it and thus it can be marked as garbage as well.

What should be transcribed are utterances that are appropriate for the application. This includes intelligible out-of-grammar responses when those responses make sense as valid responses to the prompt.

For perfect transcripts, transcribe what a speaker said verbatim, without correcting grammatical errors or mispronunciations. If you need perfect, very detailed transcripts, the following rules are useful.

Guidelines for Detailed Transcripts

  • Go slowly and do a minimum of 2–4 listens.
  • Get a feel for what the speaker is saying.
  • Transcribe the words.
  • Transcribe the noises (if creating acoustic models).
  • Check the words and noises.
  • If any changes are made to the transcription, listen again.

Try to get the words, noises, and their placements correct so that the Tuner knows which sounds correspond to which word or noise in the transcription.

Transcription Rules

Grammatical errors and mispronunciations: For transcription purposes there are no such things as grammatical or mispronunciation errors. Transcribe precisely what the caller said. If the caller says "I seen him," then transcribe "I SEEN HIM."

Standard reductions, alternate pronunciations and contractions: Transcribe as spoken.

Caller       Transcription
naw (no)     NAW
nah (no)     NAH
gonna        GONNA
wanna        WANNA
y'all        Y'ALL

Hyphenating: Never hyphenate.

Compound words: Unless there is an obvious pause between two words, all compound words should be transcribed as one word when such a word exists in the dictionary. "Everyday" should not be transcribed as "EVERY DAY" for instance.

Abbreviations: Never abbreviate, except when the speaker says the abbreviation. If the caller says "Doctor" then transcribe "DOCTOR" and not "Dr." However, if the caller says "Ave" instead of "Avenue" then transcribe "AVE."

Punctuation: No punctuation should be used in transcriptions. Do not put in periods, commas, question marks, etc. However, if the word is a possessive or a contraction, you may use the apostrophe. Never use double quotes or the "+", "<", or ">" symbols. These symbols are used in the underlying code to analyze the gathered data.
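The punctuation rule lends itself to an automated check. The following is a minimal Python sketch, not part of the Speech Tuner: the function name `find_punctuation_problems` is hypothetical, and it assumes that noise tags always take the `++WORD++` form, which is the only tag form shown in this guide. It treats anything other than letters and apostrophes as a problem, so digits are flagged as well.

```python
import re

# Assumption: noise tags look like ++UM++, the only tag form shown in this guide.
NOISE_TAG = re.compile(r"\+\+[A-Za-z]+\+\+")
# A plain word may contain only letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def find_punctuation_problems(transcript):
    """Return the tokens that contain anything other than letters and apostrophes."""
    problems = []
    for token in transcript.split():
        if NOISE_TAG.fullmatch(token):
            continue  # noise tags are the one place "+" is expected
        if not WORD.fullmatch(token):
            problems.append(token)
    return problems
```

For example, `find_punctuation_problems("I SEEN HIM.")` reports the token `HIM.` because of the trailing period, while `SUSAN'S BOOK` passes because the apostrophe is allowed.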

Common Misspellings: Watch for common spelling confusions. For instance, "they're," "there," and "their" all sound the same, so choose the spelling that fits the context.

Numbers: Numbers should be transcribed as words. If the caller says "Four hundred and fifty five" then the transcription should read "FOUR HUNDRED AND FIFTY FIVE" and not "455."

Letter sequences: Spell out letter sequences.

  • Transcribe a spoken spelling by separating the letters with spaces. For example, if a caller says "My name is spelled S–U–S–A–N," then transcribe "MY NAME IS S U S A N."
  • When a letter sequence is used as part of an inflected word, add the inflection to the end of the sequence with an apostrophe. If a caller says, "The witness ID'ed him," then transcribe "THE WITNESS ID'ED HIM."

Acronyms: Transcribe acronyms as they are said. "NATO" is transcribed as "NATO" with no spaces or periods.

Initialisms: Transcribe initialisms as they are said. "CIA" is transcribed as "C I A" with spaces to denote that each letter is pronounced individually.
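The spacing rule for spoken spellings and initialisms can be illustrated with a small helper. This is a sketch only: `spell_out` is a hypothetical name, and the function assumes the transcriber has already decided that the letters were pronounced one at a time.

```python
def spell_out(letters):
    """Separate each letter of a spoken spelling or initialism with a space.

    Non-letter characters (hyphens, dashes, periods) are dropped.
    """
    return " ".join(ch.upper() for ch in letters if ch.isalpha())
```

With this helper, `spell_out("CIA")` gives `C I A` and `spell_out("S-U-S-A-N")` gives `S U S A N`. It should not be applied to acronyms such as "NATO", which are pronounced as a single word and stay unspaced.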

Possessives: Use standard punctuation rules to denote possession. "Susan's book" is transcribed simply as "SUSAN'S BOOK" and "The drivers' cars" is transcribed "THE DRIVERS' CARS."

Filler noise: Depending on the type of filler noise, it should be transcribed as either a noise tag or a word.

Caller        Transcription
uh, ah, um    ++UM++
huh           HUH
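The filler table can be read as a lookup: some fillers become a noise tag, and everything else is transcribed as a word. A minimal sketch in Python, where `FILLER_TAGS` and `transcribe_token` are hypothetical names and the mapping contains only the fillers listed in the table:

```python
# Assumption: only the fillers listed in the table map to the ++UM++ noise tag.
FILLER_TAGS = {"uh": "++UM++", "ah": "++UM++", "um": "++UM++"}


def transcribe_token(heard):
    """Return the noise tag for a known filler, otherwise the word in upper case."""
    return FILLER_TAGS.get(heard.lower(), heard.upper())
```

Here `transcribe_token("um")` yields `++UM++`, while `transcribe_token("huh")` yields the word `HUH`.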

Yes/no sounds: For anything resembling sounds of assent or denial, transcribe them as they sound.

Caller    Transcription
hum um    UHHUH
huh       HUM UM
yeah      YEAH
yep       YEP

Gender: Pick the appropriate gender for what the speaker sounds like (if creating acoustic models).

Out-Of-Coverage: Any utterance that you would never expect a speech application to be able to handle should be marked as Out-Of-Coverage (OOC). More information about OOC items can be found in our knowledge base article.

© 2018 LumenVox, LLC. All rights reserved.