A Blog by Jonathan Low

 

Oct 12, 2016

Conversational Computing: The Growing Importance of Voice

The growth of conversational computing means that voice tone, accent and timbre take on increasing importance for marketing and interactive purposes.

But as the following article suggests, it also raises questions about social, cultural and interpersonal communications. JL

Quentin Hardy reports in the New York Times:

Listening and talking are the new input and output devices of computers. But they have social and emotional dimensions never seen with keyboards and screens.
Jason Mars is an African-American professor of computer science who also runs a tech start-up. When his company’s artificially intelligent smartphone app talks, he said, it sounds “like a helpful, young Caucasian female.”
“There’s a kind of pressure to conform to the prejudices of the world” when you are trying to make a consumer hit, he said. “It would be interesting to have a black guy talk, but we don’t want to create friction, either. First we need to sell products.”
Mr. Mars’s start-up is part of a growing high-tech field called conversational computing. This technology is being popularized by programs like the Siri system in Apple’s iPhone, and Alexa, which is built into the Echo, Amazon’s artificially intelligent home computing device.
Conversational computing is holding a mirror to many of society’s biggest preconceptions around race and gender.
Do we, for example, associate the stereotypical voice of an English butler — think of Jarvis the computer in “Iron Man” — with a helpful and intelligent person? And why do so many people want to hear a voice that sounds like it came from a younger woman with no discernible accent?
Choosing a voice has implications for design, branding or interacting with machines. A voice can change or harden how we see each other. Where commerce is concerned, that creates a problem: Is it better to succeed by complying with a stereotype, or risk failure in the market by going against type?
For many, the answer is initially clear. Microsoft’s artificially intelligent voice system is Cortana, for example, and it was originally the voice of a female character in the video game “Halo.”
“In our research for Cortana, both men and women prefer a woman, younger, for their personal assistant, by a country mile,” said Derek Connell, senior vice president for search at Microsoft. In other words, a secretary — a job that is traditionally seen as female.
Last week, Google introduced a number of voice-based products, including Google Home, its version of Echo. All of them use Google Assistant, which also speaks in tones associated with a young, educated woman.
Google Assistant “is a millennial librarian who understands cultural cues, and can wink at things,” said Ryan Germick, who leads the personality efforts in building Google Assistant. “Products aren’t about rational design decisions. They are about psychology and how people feel.”
The company has had internal debates about whether to respond differently on questions to the computer about suicide, Mr. Connell said. “We’ve leaned to providing information about suicide prevention everywhere,” he said, as opposed to offering no advice at all.
But sometimes, if you want people to figure out quickly that they are talking to a machine, it can be better to have a man’s voice. For example, IBM’s Watson, when it talks to Bob Dylan in television commercials, has a male voice. When Ashok Goel, a professor at the Georgia Institute of Technology, adapted Watson to have a female voice as an informal experiment in how people relate to conversational machines, his students couldn’t tell it was a computer.
But Watson’s maleness is the exception. Amazon’s A.I. technology is another in the comforting female voice camp.
“Alexa was always an assistant, and female,” said Peng Shao, who worked at Amazon on the Echo and is now at a Seattle start-up, building another speech-based A.I. system. Amazon would not comment on its product.
Gender is just the starting point. Can your A.I. technology understand accents? And can it respond in a way that feels less robotic and at least mimics some sort of human empathy?
“You need a persona,” Mr. Shao said. “It’s a very emotional thing — people would get red, even get violent, if it didn’t understand them. When it did understand them, it felt like magic. They sleep next to them. This is heading for hospitals, senior care, a lot of sensitive places.”
Capital One developed a banking app on Alexa, and found it had to dial down the computer’s formality to make people comfortable talking about their finances with a computer.
“Money is inextricably linked to emotion, enabling and preventing things in your life,” said Stephanie Hay, the head of content strategy, culture and A.I. design at Capital One. At first the app said, “Hello,” but that seemed too tense. “‘Hi, there’ worked better,” she said. “She’s my friend, hanging out with me in the kitchen. I need her to be reliable and approachable, but not invasive.”
We don’t just need that computerized voice to meet our expectations, said Justine Cassell, a professor at Carnegie Mellon’s Human-Computer Interaction Institute. We need computers to relate to us and put us at ease when performing a task. “We have to know that the other is enough like us that it will run our program correctly,” she said.
That need seems to start young. Ms. Cassell has designed an avatar of indeterminate race and gender for 5-year-olds. “The girls think it’s a girl, and the boys think it’s a boy,” she said. “Children of color think it’s of color, Caucasians think it’s Caucasian.”
Another system she built spoke in what she termed “vernacular” to African-American children, achieving better results in teaching scientific concepts than when the computer spoke in standard English.
When tutoring the children in a class presentation, however, “we wanted it to practice with them in ‘proper English.’ Standard American English is still the code of power, so we needed to develop an agent that would train them in code switching,” she said.
And, of course, there are regional issues to consider when creating a robotic voice. For Cortana, Microsoft has had to tweak things like accents, as well as languages, and the jokes Cortana tells for different countries.
If a French driver goes into Germany using driving directions voiced by Nuance Communications, the computer will mispronounce the name of a German town with a French accent. The idea is to keep the driver confident by sustaining the illusion that the computer is French.
Local accents can be found in various versions of Apple’s Siri. It’s possible to localize the accent on an iPhone for the United States (“Samantha,” according to the phone’s settings), Australia (“Karen”), Ireland (“Moira”), South Africa (“Tessa”), and Britain (“Daniel”). Apple could not say whether the English tradition of male butlers influenced its British choice.
Mr. Mars’s company, called Clinc, makes personal financial smartphone software that answers questions like “how much can I spend on a computer?” It relies on a similar Google-created female voice.
He is hoping for enough success that he can eventually test and counter stereotypes with unexpected A.I. voices. “You need to be at a certain size before you can address these questions,” said Mr. Mars, who teaches at the University of Michigan.
But maybe not too big. “I think consumers will eventually be open to exploring different voices and types,” he said. “Companies, they’ll probably stay conservative about it.”
