A Blog by Jonathan Low

 

Aug 3, 2012

Hate Siri All You Want: It's Still the Future

OK, haters, we get it. Siri isn't perfect.

But before anyone declares it a failed experiment, exhale. Voice is here. And in terms of what's next, you're probably looking at it. Social? Check. Mobile? Check. Voice? Comin' up.

The issue, as the following piece explains, is that Apple relied on two different companies to provide the voice technology for the iPhone. One was Siri (uh-huh), which Apple bought and which provided the 'assistant' feature based on a military artificial intelligence project. But the dictation, the part that turns speech into text, came from an entirely separate entity called Nuance. Indeed.

And the problem may not be so much with voice right now as with integrating voice with mobile. Which is part of a much larger mobile/mobility issue. Just ask any merchant or software developer. As an economy, we are playing catch-up on the whole mobile functionality thing. Not catch-up as in, 'who's ahead of us,' but as in meeting consumers' extraordinary expectations. And who can blame them? The developments to date have been mind-boggling. But we have come to expect perfection and we get cranky when we don't get what we want.

But the real driver of consumer technology is convenience. We have sacrificed almost everything of value at its altar. And voice is simply more convenient than the currently available alternatives. So laugh, disparage, criticize and dismiss poor Siri all you want. But she is the functional equivalent of those first hominids who walked off the African savannah. And her successors are moving very, very fast. JL

David Pogue reports in Scientific American:
Siri, as everyone knows by now, is a software assistant that takes spoken orders. No training necessary: just hold down the “Home” button and speak casually.

Siri lit the cultural world on fire. There were YouTube parodies, how-to guides and copycat apps for Android phones. Pundits have proposed new rules of etiquette for using phones in public now that people are speaking to them even when they're not on a call. Speech recognition became all the rage; suddenly, it popped up in television sets and, of course, rival phones. At the crest of the hype, it looked like the way we interact with our gadgets had changed forever. And then—the backlash.
“Siri Is Apple's Broken Promise” was the headline at gadget site Gizmodo. People griped that sometimes you'd dictate a whole paragraph, the phone would think and then type—nothing at all. Now there has been a class-action lawsuit asserting that Apple made false claims. (According to Apple, Siri is still in beta testing.)

What happened? How could Siri, the savior of electronics, turn out to be such a bust?

What everybody's missing is the difference between Siri, the virtual assistant, and Siri, the speech-recognition engine. As it turns out, these two different functions have wildly different track records for success.

The assistant half of Siri comes from a company called Siri, which Apple bought. (It was a spin-off from a military artificial intelligence project that wound up at the research firm SRI. Get it?)

But the dictation feature (the speech-to-text part) is provided by Nuance, the company that brought us software such as Dragon NaturallySpeaking.

When you dictate, you generate an audio file that is transmitted to Nuance's servers; they analyze your speech and send the text back to your phone. That is why, when your Internet signal isn't great or when the cell network is congested, Siri may come up short. (When you're on Wi-Fi, dictation works far better.)

That requirement to shuttle data to and from remote servers is at the heart of Siri's frustratingly inaccurate dictation talents.

There are other challenges to the dictation feature, too. Irregular background noise, wind and variable distance from mouth to microphone all make transcription perfection on a cell phone a towering task—and the results are much less accurate than what you would get using PC dictation software, which faces none of those difficulties. Using Siri (and the even less polished dictation feature on Android phones), you might have to correct two or three errors per paragraph.

Desktop dictation software fares much better—close to 100 percent accuracy—because it doesn't have any of those particular challenges. And on your PC, you train the software to recognize only one voice: yours. There's no training on the phone. The computational task is ridiculously hard.

The backlashers have a point. We're used to consumer technology that works every time: e-mail, GPS, digital cameras. Dictation technology that relies on cellular Internet, though, only sort of works. And that can be jarring to encounter in this day and age.

But let's not throw the Siri out with the bathwater. The “virtual assistant” portion of Siri—all those commands to set an alarm, call someone, text someone, record an appointment—works solidly. Even if all you use are basic commands such as “Wake me at,” “Call,” “Text” and “Remind me,” you save time and fumbling.

Free-form cellular dictation is a not-there-yet technology. But as an interface for controlling our electronics, it makes the future of speech every bit as bright as Siri promised a year ago.

Just wait till she comes out of beta.
