A Blog by Jonathan Low


Nov 11, 2012

Microsoft Turns Spoken English into Mandarin - in the Same Voice Tone

Instantaneous translation in the same voice is here.

OK, it is not widely and immediately available. But its theoretical possibility has been confirmed and tested. Now? Microsoft and others simply have to figure out how to make it commercially viable. And given the size of the market in China, and of its trade with the rest of the world, how long should that take?

Technology typically works most effectively - and enjoys the greatest success - when it increases the convenience with which common interactions can be conducted. Many people in China are learning to speak English, but it is not simple and it can be expensive. As trade grows with other regions where Portuguese (Brazil) or Spanish is more widely spoken, easier communication will expand the opportunities. Waiting for someone else to translate and overcoming the resulting misunderstandings are kludgy, time-consuming and fraught.

While still a novelty, this advance is important as much for what it indicates can be accomplished in related realms as for the astonishing fact of its own development. JL

Alex Wilhelm reports in The Next Web:
Microsoft has posted a demonstration, with an accompanying explanation, of language translation that goes far beyond what we thought was currently possible.

The speaker explains and demonstrates improvements made to the machine understanding of his English words, which are automatically transcribed as he speaks. He then demonstrates having those words translated directly into Mandarin text. This is when the fun begins. Microsoft, he says, has taken in oodles of data, and can thus have that translated Mandarin spoken. And the final kicker: he has fed the system an hour’s worth of his voice, and thus the software will speak in Mandarin, using his own tones.
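As described, the demo is a cascade of three stages: speech recognition, machine translation, and text-to-speech conditioned on the speaker's voice. Below is a minimal Python sketch of that kind of pipeline; every function in it is a hypothetical stand-in, since Microsoft's actual components and APIs are not public.

# Illustrative sketch of the three-stage cascade described above.
# All functions are hypothetical placeholders, not Microsoft's system.

def recognize_speech(audio_en: bytes) -> str:
    """Stage 1: English audio -> English text (the article says this
    stage uses a deep-neural-network recognizer)."""
    return "how are you"  # placeholder transcription

def translate(text_en: str) -> str:
    """Stage 2: English text -> Mandarin text."""
    return "你好吗"  # placeholder translation

def synthesize(text_zh: str, voice_profile: str) -> bytes:
    """Stage 3: Mandarin text -> audio, conditioned on a voice profile
    built from roughly an hour of the speaker's recorded English."""
    return f"<{voice_profile} speaking: {text_zh}>".encode("utf-8")

def speech_to_speech(audio_en: bytes, voice_profile: str) -> bytes:
    text_en = recognize_speech(audio_en)
    text_zh = translate(text_en)
    return synthesize(text_zh, voice_profile)

print(speech_to_speech(b"...", "rick-rashid").decode("utf-8"))

The notable piece is the third stage: the synthesizer is conditioned on a profile built from recordings of the speaker, which is how the Mandarin output keeps his own voice.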

It’s mind-bending. What is the core technology that powers the tool? According to Rick Rashid, the head of Microsoft Research and the man who gave the presentation:

“Just over two years ago, researchers at Microsoft Research and the University of Toronto made another breakthrough. By using a technique called Deep Neural Networks, which is patterned after human brain behavior, researchers were able to train more discriminative and better speech recognizers than previous methods.

[...] We have been able to reduce the word error rate for speech by over 30% compared to previous methods. This means that rather than having one word in 4 or 5 incorrect, now the error rate is one word in 7 or 8.”
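A quick sanity check of those numbers (my arithmetic, not from the article): a 30% relative reduction takes a word error rate of one in five (20%) down to 14%, roughly one word in seven; landing at one in eight implies a cut nearer 40%, which is consistent with "over 30%".

# Sanity check (not from the article): apply a 30% relative reduction
# to word error rates of 1-in-4 and 1-in-5 and see where they land.
for words_per_error in (4, 5):
    old_wer = 1 / words_per_error       # e.g. 1 in 5 -> 20%
    new_wer = old_wer * (1 - 0.30)      # 30% relative reduction
    print(f"1 in {words_per_error}: {old_wer:.1%} -> {new_wer:.1%} "
          f"(about 1 word in {1 / new_wer:.0f})")
# Output:
# 1 in 4: 25.0% -> 17.5% (about 1 word in 6)
# 1 in 5: 20.0% -> 14.0% (about 1 word in 7)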

The future, it’s coming.
