A Blog by Jonathan Low

 

Sep 7, 2018

The Way To Tell If A Bot Is Actually A Human

Asking them usually doesn't work. JL

Janelle Shane reports in Slate:

Some tech companies that boast about their artificial intelligence have found that humans are a cheaper, easier, and more competent alternative to A.I. Still, bot-written text has telltale signs. One is a limited scope of expertise: a convincing bot can only talk about one subject at a time. Another is a poor memory: outside of totally formulaic text, algorithms have difficulty producing stories or even paragraphs that make sense. Any conversation or script that remains coherent for more than a few sentences is likely human-written or heavily human-curated.
Every day, people are paid to pretend to be bots.
In a strange twist on “robots are coming for my job,” some tech companies that boast about their artificial intelligence have found that at small scales, humans are a cheaper, easier, and more competent alternative to building an A.I. that can do the task.
Sometimes there is no A.I. at all. The “A.I.” is a mockup powered entirely by humans, in a “fake it till you make it” approach used to gauge investor interest or customer behavior. Other times, a real A.I. is combined with human employees ready to step in if the bot shows signs of struggling. These approaches are called “pseudo-A.I.” or sometimes, more optimistically, “hybrid A.I.”
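Mechanically, the hybrid approach often comes down to a confidence threshold. The Python sketch below is a generic, hypothetical illustration of that routing, not any particular company's system; the cutoff value and the stand-in functions are invented for the example. When the model's confidence in its answer falls below the cutoff, the request is quietly handed to a human.

```python
# A minimal, hypothetical sketch of "hybrid A.I." routing; not any
# particular company's system. The bot answers when it is confident,
# and a human quietly takes over when it is not.

CONFIDENCE_CUTOFF = 0.80  # invented threshold for illustration

def model_answer(request: str) -> tuple[str, float]:
    # Stand-in for a real model: a trivial keyword check
    # with a made-up confidence score.
    if "recipe" in request.lower():
        return "Here is a recipe for guacamole.", 0.95
    return "I'm not sure I understood that.", 0.30

def human_answer(request: str) -> str:
    # Stand-in for the queue of human operators.
    return "(a human employee types a reply)"

def respond(request: str) -> str:
    answer, confidence = model_answer(request)
    if confidence >= CONFIDENCE_CUTOFF:
        return answer
    # Below the cutoff, escalate invisibly: the customer
    # sees one seamless "A.I.", never the handoff.
    return human_answer(request)

print(respond("Do you have a recipe for guacamole?"))  # the bot replies
print(respond("Why was my order refunded twice?"))     # a human replies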
It’s not uncommon for companies to blur the line between human and bot. Cloudsight.ai, for example, advertises an image-recognition service that “leverages the best of human and machine intelligence.” As Cloudsight explains in an AP interview, what this means is that the trickier images are sent to human employees. This human-A.I. partnership makes Cloudsight’s technology stronger—it’s one of the only image-recognition APIs that isn’t fooled by sheep in unusual places. But thanks to a built-in delay of several seconds, it’s tough to know whether a given photo was labeled by a computer or a human.
Cloudsight.ai is relatively open about its use of humans, but companies that had promised an A.I. solution have repeatedly come under fire for sending sensitive customer voicemails, expense reports, and personal schedules to human employees instead. Even major companies like Facebook have used pseudo-A.I. Facebook’s now-defunct “M” personal assistant owed its competence—and eventually its demise—to its heavy reliance on human help.
Although some companies see the use of humans for “A.I.” tasks as a temporary bridge, others are embracing pseudo-A.I. as a customer service strategy that combines A.I. scalability with human competence. They’re advertising these as “hybrid A.I.” chatbots, and if they work as planned, you will never know if you were talking to a computer or a human. Every remote interaction could turn into a form of the Turing test. So how can you tell if you’re dealing with a bot pretending to be a human or a human pretending to be a bot?
Any conversation or script that remains coherent for more than a few sentences is likely human-written or heavily human-curated.
One of the ways you can’t tell anymore is by looking for human imperfections like grammar mistakes or hesitations. In the past, chatbots had prewritten bits of dialogue that they could mix and match according to built-in rules. Bot speech was synonymous with precise formality. In early Turing tests, spelling mistakes were often a giveaway that the hidden speaker was a human. Today, however, many chatbots are powered by machine learning. Instead of using a programmer’s rules, these algorithms learn by example. And many training data sets come from services like Amazon’s Mechanical Turk, which lets programmers hire humans from around the world to generate examples of tasks like asking and answering questions. These data sets are usually full of casual speech, regionalisms, or other irregularities, so that’s what the algorithms learn. It’s not uncommon these days to get algorithmically generated image captions that read like text messages. And sometimes programmers deliberately add these things in, since most people don’t expect imperfections of an algorithm. In May, Google’s A.I. assistant made headlines for its ability to convincingly imitate the “ums” and “uhs” of a human speaker.
So it’s not difficult for algorithms to imitate human linguistic quirks. But because A.I.s are nowhere near as capable with language as humans, there are certain other telltale signs that something was written by a bot.
One major sign is a very limited scope of expertise—a convincing bot can only talk about one subject at a time. I chatted recently with Whole Foods Market on Facebook Messenger. The conversation went like this:
Whole Foods: Hi Janelle! We’re here to help you find recipes as simple as their ingredients.
Me: Do you have a recipe for guacamole?
Whole Foods: [posts a recipe for guacamole with cilantro]
Me: Is it okay to make guacamole from green peas?
Whole Foods: [posts a recipe for green pea guacamole]
Me: Give me a recipe that uses avocado. Anything but guacamole.
Whole Foods: [posts a recipe for avocado salsa with olives and cilantro]
So far, this conversation is somewhat ambiguous. I’m chatting with something that has not explicitly identified itself as a bot and is competently handling questions that can be answered by posting a recipe. But when I stray from this narrow format, the bot quickly reveals itself.
Me: How can I tell if an avocado is ripe?
Whole Foods: [posts a recipe for edamame guacamole]
Me: Do you have a favorite Star Wars character?
Whole Foods: [posts a recipe for sautéed shrimp with polenta and manchego]
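To see why every question gets a recipe in reply, it helps to sketch how a narrow recipe bot might be wired. The Python snippet below is a hypothetical caricature (Whole Foods' actual implementation isn't public): the bot scans the message for food keywords, and because posting recipes is its only action, even a Star Wars question gets a recipe.

```python
# Hypothetical caricature of a single-purpose recipe chatbot.
# Not Whole Foods' actual code; the recipe names are placeholders.

RECIPES = {
    "green pea": "green pea guacamole",
    "guacamole": "guacamole with cilantro",
    "avocado": "avocado salsa with olives and cilantro",
}

def reply(message: str) -> str:
    text = message.lower()
    # Match the first known food keyword in the message.
    for keyword, recipe in RECIPES.items():
        if keyword in text:
            return f"[posts a recipe for {recipe}]"
    # No food keyword found. The bot has no other action available,
    # so "Do you have a favorite Star Wars character?" still lands here.
    return "[posts a recipe picked by a fuzzy text-similarity fallback]"

print(reply("Do you have a recipe for guacamole?"))
print(reply("Do you have a favorite Star Wars character?"))
```

A bot like this looks competent exactly as long as the conversation stays inside its one topic; the moment a question calls for any other kind of answer, the single-action design shows through.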
Any algorithm that appears to have human-level performance is probably successful because it has specialized. Algorithms like Quill, Wordsmith, and the Washington Post’s Heliograf can turn tables of data into written documents, writing up local sports results, election news, and more. But these algorithms succeed because their tasks are formulaic, plugging data from spreadsheets into a few stock phrases. “The St. Mary’s Ryken Knights defeated Mount St. Joseph in a close game on Saturday, 30-28,” writes Heliograf. “The Knights outscored Mount St. Joseph in the first quarter, 13-0. In the second quarter, Mount St. Joseph scored 14 points and allowed seven from the Knights.” It’s not exciting stuff, but it does a competent job of describing the game. An A.I. like Heliograf would utterly fail, however, when faced with information that doesn’t fit neatly into the prescribed boxes. Did a horse run onto the field midgame? Did someone make an unbelievable catch that turned the game around? Heliograf only knows how to report its spreadsheet.
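The formula behind this kind of writing is easy to sketch. The snippet below is a hypothetical illustration in the spirit of data-to-text systems like Heliograf, not its actual code: it slots fields from a structured game record into stock phrases, and it can only ever say what the spreadsheet has a column for.

```python
# Hypothetical sketch of template-based data-to-text generation,
# in the spirit of Heliograf-style systems (not actual Heliograf code).

game = {
    "winner": "St. Mary's Ryken Knights",
    "loser": "Mount St. Joseph",
    "winner_score": 30,
    "loser_score": 28,
    "day": "Saturday",
}

def recap(g: dict) -> str:
    margin = g["winner_score"] - g["loser_score"]
    # A simple rule over the data picks the stock phrase.
    closeness = "a close game" if margin <= 3 else "a decisive win"
    return (
        f"The {g['winner']} defeated {g['loser']} in {closeness} on "
        f"{g['day']}, {g['winner_score']}-{g['loser_score']}."
    )

print(recap(game))
# There is no column for a horse running onto the field,
# so no template will ever mention one.
```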
Limited computing power is the main reason that bots are usually good at just one thing at a time. Whenever programmers try to train machine learning algorithms to handle additional tasks, they usually get algorithms that can do many tasks rather badly. In other words, today’s algorithms are artificial narrow intelligence, or A.N.I., rather than artificial general intelligence, or A.G.I. For now, and for many years in the future, any algorithm or chatbot that claims A.G.I.-level performance—the ability to deal sensibly with a wide range of topics—is likely to have humans behind the curtain.
Another bot giveaway is a very poor memory. Outside of totally formulaic text, algorithms have great difficulty producing stories or even paragraphs that make sense. Characters change, plots meander, and conversations become repetitive, because the algorithm simply can’t keep track of what’s going on. As with the algorithms that fail to generalize to multiple tasks, the limitation here is computing power.
There is a trend now for people to claim they produced a script when they “forced a bot to watch 10,000 hours of X,” but one easy way to identify these parodies as human-written imitations is to look for signs of a memory that spans more than a sentence or two. By contrast, an actual algorithm-generated text will usually look something like the recipe below, which was generated by a neural network I trained on 30MB of cookbook recipes.
Shake Stew Market
appetizers, peeled and poultry
1 large potatoes, optional
1 single whole pork chops
4 cloves garlic, minced
3 slice shrimp
¼ cup chili powder
1 lb bayular
Brown in a buttered pan. Cook and stir until the lemon open. Let stand until mixture thickens; add the almonds. Stir until the meat begins to boil. Add the eggs and eggs and stir until well blended. Do not overbean; bring to a microwave-size, then add the garlic. Cook for a few minutes. Add the vinegar and chili flakes. Cover and cook on low heat for 4 to 5 hours (wor Rice).
Yield: 4 servings.
By the time it gets to the recipe directions, the algorithm has completely forgotten the ingredients list. It’s possible to help it by abstracting the problem into larger building blocks, as IBM Chef Watson has done with recipes and as some researchers are doing with stories, but this kind of “data-set engineering” is a lot of work and the results are tightly constrained. Aside from the heavily formulaic, any conversation or script that remains coherent for more than a few sentences is likely human-written or heavily human-curated.
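For a concrete sense of where that forgetfulness comes from, consider the crudest possible text generator. Shane's network was presumably a character-level neural network, which is far more capable than this, but a character-level Markov chain makes the memory limit vivid: the model conditions on only the last few characters, so an ingredient named two paragraphs earlier might as well never have existed. The corpus filename below is a stand-in; her 30MB of cookbook recipes isn't published here.

```python
import random
from collections import defaultdict

# Hypothetical illustration: a character-level Markov chain, a cruder
# cousin of the neural network described above. The model sees only the
# last ORDER characters, so the ingredients list is long forgotten by
# the time the directions begin.

ORDER = 5

def train(text: str) -> dict:
    model = defaultdict(list)
    for i in range(len(text) - ORDER):
        # Record which character follows each ORDER-length context.
        model[text[i : i + ORDER]].append(text[i + ORDER])
    return model

def generate(model: dict, seed: str, length: int = 300) -> str:
    out = seed
    for _ in range(length):
        continuations = model.get(out[-ORDER:])
        if not continuations:
            break
        out += random.choice(continuations)
    return out

# "cookbook_recipes.txt" is a stand-in for a recipe corpus.
corpus = open("cookbook_recipes.txt", encoding="utf-8").read()
model = train(corpus)
print(generate(model, "1 lb b"))
```

Neural networks stretch that context window considerably, but as of this writing they still lose track of details over paragraph-length spans, which is exactly the failure the generated recipe shows.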
So, can I reliably tell whether I’m chatting with a customer service bot or an employee? The answer is actually no. Because pseudo-A.I. blends bots and humans, it’s hard to experiment. If you begin a conversation with a bot, somewhere along the line you may trigger something that invisibly causes a human to take over the interaction. That itself presents a bit of a dilemma: Bots don’t mind if you mess with them, but it’s rude and potentially harmful to experiment on a human who’s just trying to do their job.
That’s just another reason why companies should identify their bots. Sure, everyone could treat suspected bots as human, just in case. But embracing the bot’s botness can be a lot of fun, as anyone who has ever tried to make the Eliza chatbot swear knows.
Me: Are you a bot?
Whole Foods: I couldn’t find that, but let’s try again. Remember, you can also use emojis like [strawberry] to search!
Me: If you were stuck on a desert island, what food would you want to have?
Whole Foods: [posts a recipe for smoked salmon reuben wraps]
Me: Are tacos sandwiches?
Whole Foods: [posts a recipe for the New Englandah Burgah]
Me: What’s better, Coke or Pepsi?
Whole Foods: [posts a recipe for fresh strawberry ice cream]
