A Blog by Jonathan Low

 

Feb 26, 2019

Google AI Technique Reduces Speech Recognition Errors by 29 Percent

As expanded communication and ecommerce become more important to the global economy, accurate interpretation becomes a more critical feature. JL

Kyle Wiggers reports in VentureBeat:

Most automatic speech recognition systems jointly train three components: an acoustic model that learns the relationship between audio signals and linguistic units, a language model that assigns probabilities to sequences of words, and a mechanism that aligns acoustic frames and recognized symbols. All three use a single neural network and transcribed audio-text pairs. As a result, the language model suffers degraded performance when it encounters words that occur infrequently. Incorporating a spelling correction model generates an expanded output with “significantly” lower word error rate.
Speech recognition is pretty darn good these days. State-of-the-art models like EdgeSpeechNet, which was detailed in a research paper late last year, are capable of achieving about 97 percent accuracy. But even the best systems sometimes stumble on uncommon and rare words.
To narrow the gap, scientists at Google and the University of California propose an approach that taps a spelling correction model trained on text-only data. In a paper published on the preprint server arXiv.org (“A Spelling Correction Model for End-to-End Speech Recognition”), they report that in experiments with the LibriSpeech dataset, which pairs 960 hours of audio with an 800 million-word language modeling corpus, their technique shows an 18.6 percent relative improvement in word error rate (WER) over the baseline. In some cases, it even managed a 29 percent error reduction.
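For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words, and a "relative improvement" compares two such rates. The sketch below shows that arithmetic in Python. The function names, the example sentences, and the 10.0 and 8.14 figures are illustrative assumptions, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error rate that the new system removes."""
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))

# Made-up rates chosen so the relative improvement comes out near 18.6 percent.
print(relative_improvement(10.0, 8.14))
```

Note that a relative improvement of 18.6 percent is not an 18.6-point drop in WER: in the made-up example, the absolute rate falls by only 1.86 points.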
“The goal … is to incorporate a module trained on [text] data into the end-to-end framework, with the objective of correcting errors made by the system,” they wrote. “Specifically, we investigate using unpaired … data to [generate] audio signals using a text-to-speech (TTS) system, a process similar to backtranslation in machine translation.”
As the paper’s authors explain, most automatic speech recognition (ASR) systems jointly train three components: an acoustic model that learns the relationship between audio signals and the linguistic units that make up speech, a language model that assigns probabilities to sequences of words, and a mechanism that aligns the acoustic frames with the recognized symbols. All three use a single neural network (layered mathematical functions modeled after biological neurons) and transcribed audio-text pairs, and as a result, the language model typically suffers degraded performance when it encounters words that occur infrequently in the corpus.
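To see why rare words hurt, it helps to look at what "assigning probabilities to sequences of words" means in the simplest possible case. The sketch below is a toy count-based bigram model with add-one smoothing, not the neural language model used in the paper. The three-sentence corpus and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams, using <s> and </s> as sentence boundaries."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return contexts, bigrams, len(vocab)


def sentence_log_prob(sentence, contexts, bigrams, vocab_size):
    """Log probability of a sentence under add-one smoothed bigram estimates."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        # Counter returns 0 for unseen pairs, so smoothing keeps this finite.
        prob = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(prob)
    return total


corpus = ["the cat sat", "the cat ran", "the dog sat"]
contexts, bigrams, vocab_size = train_bigram_counts(corpus)

common = sentence_log_prob("the cat sat", contexts, bigrams, vocab_size)
rare = sentence_log_prob("the lynx sat", contexts, bigrams, vocab_size)
print(common > rare)  # the sentence with the unseen word scores lower
```

A recognizer that leans on scores like these will tend to prefer a familiar word over an unfamiliar one that sounds similar, which is the kind of error a corrector trained on far more text is meant to catch.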
The researchers, then, set out to incorporate the aforementioned spelling correction model into the ASR framework. The model decodes input and output sentences as sub-word units called “wordpieces,” and takes the word embeddings (i.e., features mapped to vectors of real numbers) and maps them to higher-level representations. They used text-only data and corresponding synthetic audio signals generated using a text-to-speech (TTS) system (parallel WaveNet) to train a Listen, Attend and Spell (LAS) speech recognizer, an end-to-end model first described by Google Brain researchers in 2015, and subsequently to create a set of TTS pairs. Then, they “taught” the spelling corrector to correct potential errors made by the recognizer by feeding it those pairs.
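The data flow described above (text in, synthetic audio out, recognizer hypotheses paired with the original text) can be sketched in a few lines of Python. This is only an outline under stated assumptions: `build_correction_pairs`, `toy_synthesize`, `toy_recognize`, and the `CONFUSIONS` table are hypothetical stand-ins invented here, not the paper's code, and the toy "recognizer" simply swaps in a known misspelling rather than processing real audio.

```python
from typing import Callable, List, Tuple


def build_correction_pairs(
    texts: List[str],
    synthesize: Callable[[str], bytes],
    recognize: Callable[[bytes], List[str]],
) -> List[Tuple[str, str]]:
    """Return (recognizer hypothesis, original text) training pairs."""
    pairs = []
    for text in texts:
        audio = synthesize(text)            # text-only data -> synthetic audio
        for hypothesis in recognize(audio):  # possibly erroneous transcripts
            pairs.append((hypothesis, text))
    return pairs


# --- Hypothetical stand-ins so the sketch runs end to end ---

def toy_synthesize(text: str) -> bytes:
    """Pretend TTS: the 'audio' is just the text encoded as bytes."""
    return text.encode("utf-8")


# Assumed confusions a recognizer might make on an uncommon word.
CONFUSIONS = {"wiggers": "wickers"}


def toy_recognize(audio: bytes) -> List[str]:
    """Pretend recognizer: decodes the bytes and injects known confusions."""
    words = audio.decode("utf-8").split()
    return [" ".join(CONFUSIONS.get(word, word) for word in words)]


pairs = build_correction_pairs(
    ["kyle wiggers reports"], toy_synthesize, toy_recognize
)
print(pairs)
```

Each pair gives the corrector an input containing a plausible recognition error and the target text it should have produced, which is why the researchers liken the process to backtranslation.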
To validate the model, the researchers trained a language model, generated a TTS dataset to train the LAS model, and produced error hypotheses to train the spelling correction model. They drew on 40 million text sequences from the LibriSpeech dataset, after filtering out 500,000 sequences that contained only single-letter words and those that were shorter than 90 words. They found that, by correcting entries from the LAS, the spelling correction model could generate an expanded output with “significantly” lower word error rate.
