The Low-Down: The Reason Voice Is the Future of Robot Control

We know how to use voice and generally agree on what the words mean, which makes it a useful platform from which to program intelligently. JL

Evan Ackerman reports in IEEE Spectrum:

Voice control is fast, intuitive, and as long as there's a reliable mapping between what you tell the robot to do and what the robot actually does, it could be very successful, if you're fine with having Alexa as a mediator. It's better to understand more prepositions and have them converge into a designed behavior which the vast majority of people would be okay with, while not requiring a magic incantation where you need your manual to look up what to tell Roomba to get it to do the right thing. Voice is the great point of integration for breaking this ceiling of control complexity because the smart home isn't smart today.
I am not a fan of Alexa. Or Google Assistant. Or, really, any Internet-connected camera or microphone which has a functionality based around being in my house and active all of the time. I don't use voice-activated systems, and while having a webcam is necessary, I make sure to physically unplug it from my computer when I'm not using it. Am I being overly paranoid? Probably. But I feel like having a little bit of concern is reasonable, and having that concern constantly at the back of my mind is just not worth what these assistants have so far had to offer.
iRobot CEO Colin Angle disagrees. And last week, iRobot announced that it has "teamed with Amazon to further advance voice-enabled intelligence for home robots." Being skeptical about this whole thing, I asked Angle to talk me into it, and I have to say, he kinda maybe almost did.
Using Alexa, iRobot customers can automate routines, personalize cleaning jobs and control how their home is cleaned. Thanks to interactive Alexa conversations and predictive and proactive recommendations, smart home users can experience a new level of personalization and control for their unique homes, schedules, preferences and devices.
Here are the kinds of things that are new to the Roomba Alexa partnership:
"Roomba, Clean Around the [Object]" – Use Alexa to send your robot to clean a mess right where it happens with precision Clean Zones. Roomba can clean around specific objects that attract the most common messes, like couches, tables and kitchen counters. Simply ask Alexa to "tell Roomba, clean around the couch," and Roomba knows right where to go.

iRobot Scheduling with Alexa voice service – Thanks to Alexa's rich language understanding, customers can have a more natural interaction directing their robot using their voice to schedule cleaning Routines. For example, "Alexa, tell Roomba to clean the kitchen every weeknight at 7 pm," or "Alexa, tell Braava to mop the kitchen every Sunday afternoon."

Alexa Announcements – Alexa can let customers know about their robot's status, like when it needs help or when it has finished a cleaning job, even if your phone isn't nearby.

Alexa Hunches – The best time to clean is when no one is home. If Alexa has a 'hunch' that you're away, Alexa can begin a cleaning job.
This kind of voice control is important because Roombas are getting very, very sophisticated. The latest models know more about our homes than ever before, with maps and object recognition and all kinds of complex and intelligent behaviors and scheduling options. iRobot has an app that does its best to simplify the process of getting your Roomba to do exactly what you want it to do, but you still have to be comfortable poking around in the app on a regular basis. This poses a bit of a problem for iRobot, which now has to square all these really cool new capabilities with its original concept for the robot, which I still remember as being best encapsulated by one single button that you could push, labeled "Clean" in nice big letters.
iRobot believes that voice control is the answer to this. It's fast, it's intuitive, and as long as there's a reliable mapping between what you tell the robot to do and what the robot actually does, it seems like it could be very successful—if, of course, you're fine with having Alexa as a mediator, which I'm not sure I am. But after talking with iRobot CEO Colin Angle, I'm starting to come around.
IEEE Spectrum: I know you've been working on this for a while, but can you talk about how the whole Alexa and Roomba integration thing came about?
Colin Angle: This started back when Alexa first came out. Amazon told us that they asked people, "what should we do with this speaker?" And one of the first things that came up was, "I want to tell my Roomba to clean." It was within the original testing as to what Alexa should do. It certainly took them a while to get there, and took us a while to get there. But it's a very substantial and intuitive thing that we're supposed to be able to do with our robots—use our voice and talk to them. I think almost every robot in film and literature can be talked to. They may not all talk back in any logical way, but they all can listen and respond to voice.
IEEE Spectrum: Alexa's "hunches" are a good example of the kind of thing that I don't like about Alexa. Like, what is a hunch, and what does the fact that Alexa can have hunches imply about what it knows about my life that I didn't explicitly tell it?
Colin Angle: That's the problem with the term "hunch." It attributes intelligence when what they're trying to do is attribute uncertainty. Amazon is really trying to do the right thing, but naming something "hunch" just invites speculation as to whether there's an AI there that's listening to everything I do and tracking me, when in some way it's tragically simpler than all that: depending on what it's connected to, it can infer periods of inactivity.
There's a question of what should you do and what shouldn't you do with an omnipresent ear, and that requires trust. But in general, Alexa is less creepy the more you understand how it works. And so the term "hunch" is meant to convey uncertainty, but that doesn't help people's confidence.
IEEE Spectrum: One of the voice commands you can give is having Alexa ask Roomba to clean around the couch. The word "around" can have different meanings for different people, so how do you know what a user actually wants when they use a term like "around?"
Colin Angle: We've had to build these skills using words like around, underneath, beneath, near… All of these different words which convey approximate location. If we clean a little more than you want us to clean, but not a ton more, you're probably not going to be upset. So taking a little bit of superset liberties around how Roomba cleans still yields a satisfying result. There's a certain pragmatism that's required, and it's better to understand more prepositions and have them converge into a carefully designed behavior which the vast majority of people would be okay with, while not requiring a magic incantation where you'd need to go grab your manual so that you can look up what to tell Roomba in order to get it to do the right thing.
This is one of the fascinating challenges—we're trying to build robots into partners, but in general, the full functionality has largely been in the iRobot app. And yet the metaphor of having a partner usually is not passing notes, it's delivering utterances that convey enough meaning that your partner does what they're supposed to do. If you make a mess, and say, "Alexa, tell Roomba to clean up around the kitchen table" without having to use the app, that's actually a pretty rewarding interaction. It's a very natural thing, and you can say many things close to that and have it just work.
Our measure of success is whether I could say, "Evan, suck it up, plug in that Alexa, and then, without reading the instructions, convey your will to Roomba to clean your office every Sunday after noon or something," and have it work when you say something like that.
Clearly communicating intent using voice is radically more complicated with each additional level of complexity that you're trying to convey. —Colin Angle
IEEE Spectrum: Roomba can now recognize commands that use the word "and," like "clean under the couch and coffee table." I'm wondering how much potential there is to make more sophisticated commands. Things like, "Roomba, clean between the couch and the coffee table," or "Roomba, clean the living room for 10 minutes."
Colin Angle: Of the things you said, I would say that we can do the ones that are pragmatic. You couldn't say "clean between these two places;" I suppose we might know enough to try to figure that out because we know where those two areas are and we could craft the location, but that's not a normal everyday use case because people make messes under or near things rather than between things. With precise and approximate scheduling, we should be able to handle that, because that's something people are likely to say. From a design perspective, it has to do with listening intently to how customers like to talk about tasking Roomba, and making sure that our skill is sufficiently literate to reasonably precisely do the right thing.
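The two ideas in this exchange, several prepositions converging on one designed behavior and "and" joining more than one object, can be illustrated with a short sketch. This is not iRobot's or Amazon's implementation: the function name, the preposition list, and the half-meter margin are all hypothetical, and the sketch only handles the part of the utterance that follows "Alexa, tell Roomba to".

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary: every preposition here collapses to the same
# "clean a zone around the object" behavior, as described in the interview.
APPROXIMATE_PREPOSITIONS = {"around", "under", "underneath", "beneath", "near"}

# Hypothetical padding (in meters) around the object, so the robot cleans
# slightly more than was asked rather than slightly less.
ZONE_MARGIN_M = 0.5


@dataclass(frozen=True)
class CleanZone:
    target: str
    margin_m: float


def parse_clean_command(utterance: str) -> list[CleanZone]:
    """Turn 'clean <preposition> the X and Y' into one zone per object."""
    text = utterance.lower().strip().rstrip(".!?")
    match = re.match(r"clean(?: up)? (\w+) (.+)$", text)
    if match is None or match.group(1) not in APPROXIMATE_PREPOSITIONS:
        return []  # unsupported phrasing (e.g. "between"): do nothing
    zones = []
    # Split the object list on commas and the word "and".
    for part in re.split(r"\s*(?:,|\band\b)\s*", match.group(2)):
        target = re.sub(r"^(?:the|my)\s+", "", part).strip()
        if target:
            zones.append(CleanZone(target, ZONE_MARGIN_M))
    return zones


for zone in parse_clean_command("clean under the couch and coffee table"):
    print(zone.target, zone.margin_m)
```

In this sketch "around", "under", and "near" all produce the same padded zone, which mirrors the "superset liberties" Angle describes, while "between" is deliberately left unrecognized because he calls it an uncommon request.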
IEEE Spectrum: Do these voice commands really feel like talking to Roomba, or does it feel more like talking to Alexa, and how important is that distinction?
Colin Angle: Unfortunately, the metaphor is that you're talking to Alexa who is talking to Roomba. We like the fact that people personify Roomba. If you don't yet own a Roomba, it's kind of a creepy thing to go around saying, because it's a vacuum cleaner, not a friend. But the experience of owning a Roomba is supposed to feel like you have a partner. And this idea that you have to talk to your helper through an intermediary is the price that we pay, which in my mind diminishes that partnership a little bit in pursuit of iRobot not having to build and maintain our own speakers and voice system. I think both Amazon and Google played around with the idea of a direct connection, and decided that enforcing that metaphor of having the speaker as an intermediary simplifies how people interact with it. And so that's a business decision on their side. For us, if it was an option, I would say direct connection every time, because I think it elevates the feeling of partnership between the person and the robot.
IEEE Spectrum: From a human-robot interaction (HRI) perspective, do you think it would be risky to allow users to talk directly to their Roomba, in case their expectations for how their robot should sound or what it might say don't match the reality that's constrained by practical voice interaction decisions that iRobot will have to make?
Colin Angle: I think the benefits outweigh the risks. For example, if you don't like the voice, you should be able to change the voice, and hopefully you can find something that is close enough to your mental model that you can learn to live with it. If the question is whether talking directly to Roomba creates a higher expectation of intelligence than talking through a third party, I would say it does, but is it night and day? With this announcement we're making the strong statement that we think that for most of the things that you're going to want Roomba to do, we have enabled them broadly with voice. Your Roomba is not going to know the score of the baseball game, but if you ask it about what it's supposed to be able to do, you're going to have a good experience.
IEEE Spectrum: Coming from the background that you have and being involved in developing Roomba from the very beginning, now that you're having to work through voice interactions and HRI and things like that, do you miss the days where the problems were power cords and deep carpet and basic navigation?
Colin Angle: Honestly, I've been waiting to tackle the problems that we're currently tackling. If I have to tackle another hair entrainment problem, I would scream! I mean, to some extent, here we are, 31 years in, and I'm getting to the good stuff, because I think that the promise of robots is as much about the interaction as it is about the physical hardware. In fact, ever since I was in college I was playing around with hardware because the software sucked and was insanely hard and not going to do what I wanted it to do. All of my early attempts at voice interaction were spectacular failures. And yet, I kept going back to voice because, well, you're supposed to be able to talk to your robot.
Voice is kind of the great point of integration if it can be done well enough. And if you can leave your phone in your pocket and get up from your meal, look down, see you made a mess and just say, "hey Roomba, the kitchen table looks messy," which you can, that's progress. That's one way of breaking this ceiling of control complexity that must be shattered because the smart home isn't smart today and only does a tiny percentage of what it needs to do.

A Blog by Jonathan Low

Nov 21, 2021
