A Blog by Jonathan Low


Apr 6, 2023

The Reason Robotics Is Significantly Lagging Behind AI

There are several reasons why AI appears to have leapfrogged robotics in terms of capability: there is a funding disparity, because investors have come to believe that the returns on AI are bigger and arrive faster; roboticists tend to hoard knowledge, while AI research is largely open source; and the physical world is far more complicated than language. As a result, fewer people are working on robotics than on AI.  JL

Jacob Stern reports in The Atlantic:

Large language models are drafting screenplays, writing code and cracking jokes. Image generators are winning art prizes and producing dangerously convincing fabrications. Meanwhile, the world’s most advanced robots are still struggling to open actual, physical doors. Fewer people work on robotics than on AI. There’s also a funding disparity: "There is this perception among investors that the payoff of robotics investments is not very high.” And when private companies put money into robots, they hoard their knowledge. In AI, by contrast, open sourcing is the norm. But the biggest obstacle for roboticists is that the physical world is extremely complicated, far more so than language.

When people imagine the AI apocalypse, they generally imagine robots. The android assassins of the Terminator franchise. The humanoid helpers of I, Robot. The Cylon armies of Battlestar Galactica. But the robot-takeover scenario most often envisioned by science fiction is not exactly looming. Recent and explosive progress in AI—along with recent and explosive hype surrounding it—has made the existential risks posed by the technology a topic of mainstream conversation. Yet progress in robotics—which is to say, machines capable of interacting with the physical world through motion and perception—has been lagging way behind. “I can’t help but feel a little envious,” said Eric Jang, the vice president of AI at the humanoid-robotics company 1X, in a talk at a robotics conference last year. And that was before the arrival of ChatGPT.

Large language models are drafting screenplays and writing code and cracking jokes. Image generators, such as Midjourney and DALL-E 2, are winning art prizes and democratizing interior design and producing dangerously convincing fabrications. They feel like magic. Meanwhile, the world’s most advanced robots are still struggling to open different kinds of doors. As in actual, physical doors. Chatbots, in the proper context, can be—and have been—mistaken for actual human beings; the most advanced robots still look more like mechanical arms appended to rolling tables. For now, at least, our dystopian near future looks a lot more like Her than M3GAN.

The counterintuitive notion that it’s harder to build artificial bodies than artificial minds is not a new one. In 1988, the computer scientist Hans Moravec observed that computers already excelled at tasks that humans tended to think of as complicated or difficult (math, chess, IQ tests) but were unable to match “the skills of a one-year-old when it comes to perception and mobility.” Six years later, the cognitive psychologist Steven Pinker offered a pithier formulation: “The main lesson of thirty-five years of AI research,” he wrote, “is that the hard problems are easy and the easy problems are hard.” This lesson is now known as “Moravec’s paradox.”

The paradox has grown only more apparent in the past few years: AI research races forward; robotics research stumbles. In part that’s because the two disciplines are not equally resourced. Fewer people work on robotics than on AI. There’s also a funding disparity: “The flywheel of capitalism isn’t spinning fast enough yet in robotics,” Jang told me. “There is this perception among investors based mostly on historical data that the payoff of robotics investments is not very high.” And when private companies do put money into building robots, they tend to hoard their knowledge.


In AI circles, by contrast, open sourcing is—or at least was—the norm. There’s also the issue of accidental breakage. When an AI experiment goes wrong, you can just reboot and start again. A mistake with a robot could cost you thousands of dollars in damaged hardware. The difficulty of obtaining sufficient data creates an even bigger problem for robotics, though. Training an AI requires vast amounts of raw material. For a large language model, that means text—a resource that is present in abundance (for now). AI’s recent progress has been fueled to a significant extent by training larger models with greater computational power on larger data sets.

Roboticists inclined toward this approach—hoping to apply the same machine-learning techniques that have proved so fruitful for large language models—run into problems. Humans generate an immense amount of text in the course of our regular affairs: We write books, we write articles, we write emails, we text. The sort of data that might be useful for training a robot, though—taken from, say, the natural movements of a person’s muscles and joints—are rarely recorded. Outfitting masses of people with cameras and sensors is probably not a viable option, which means researchers must gather data via robots, either by controlling them manually or by having them gather data autonomously. Both alternatives present problems: The former is labor-intensive, and the latter gets stuck in a kind of circular logic. To collect good data, a robot must be fairly advanced (because if it just runs into a wall over and over again, it won’t learn much), but to make a fairly advanced robot, you need good data.

In theory, a robot could be trained on data drawn from computer-simulated movements, but there, too, you must make trade-offs. A simple simulation saves time but generates data that are less likely to translate to the real world; a complicated one generates more reliable data but takes longer to run. Another approach would have robots learn from watching thousands of hours of videos of people moving, pulled from YouTube or elsewhere. But even these would not provide that much data on, for example, the workings of fine-motor control, Chelsea Finn, an AI researcher at Stanford University and Google, told me. In his talk, Jang compared computation to a tidal wave lifting technologies up with it: AI is surfing atop the crest; robotics is still standing at the water’s edge.

Some members of the robotics community are not particularly concerned about catching the wave. Boston Dynamics, whose videos of canine and humanoid robots have been going viral for more than a decade, “uses basically no machine learning, and a lot of it is kind of manually tuned,” Finn said (although this apparently is soon to change). Its robots generally are not very adaptable. They excel at performing a specific task in a specific environment. As impressive as they look, they are in this sense far less advanced than some of the more modest robots that are capable of opening various kinds of drawers. (Boston Dynamics did not respond to a request for comment.)

But the biggest obstacle for roboticists—the factor at the core of Moravec’s paradox—is that the physical world is extremely complicated, far more so than language. Running and jumping and grasping objects may come naturally to people, whereas writing essays and playing chess and taking math tests generally do not. “But in reality, motor control is actually in some ways a much more complex problem intrinsically,” Finn told me. “It’s just that we’ve evolved for many, many years to be good at motor control.” A language model must respond to queries made from an unimaginable number of possible word combinations. And yet the number of possible states of the world that a robot might encounter is still much, much larger. Just think about the gulf between the informational content of a sentence, or even a few paragraphs, and the informational content of an image, let alone a video. Imagine how many sentences would be required to fully describe the video, to convey at each moment the exact appearance and size and position and weight and texture of every object that it shows.

Whatever its causes, the lag in robotics could become a problem for AI. The two are deeply intertwined. Some researchers are skeptical that a model trained on language alone, or even language and images, could ever achieve humanlike intelligence. “There’s too much that’s left implicit in language,” Ernest Davis, a computer scientist at NYU, told me. “There’s too much basic understanding of the world that is not specified.” The solution, he thinks, is having AI interact directly with the world via robotic bodies. But unless robotics makes some serious progress, that is unlikely to be possible anytime soon.

Improvements in AI could boost progress in robotics. For years already, engineers have used AI to help build robots. In a more extreme, far-off vision, super-intelligent AIs could simply design their own robotic bodies. But for now, Finn told me, embodied AI is still a ways off. No android assassins. No humanoid helpers. Maybe even no HAL 9000, the greatest of science fiction’s AI antagonists. Set in the context of our current technological abilities, HAL’s murderous exchange with Dave from 2001: A Space Odyssey would read very differently. The machine does not refuse to help its human master. It simply isn’t capable of doing so.

“Open the pod bay doors, HAL.”

“I’m sorry, Dave. I’m afraid I can’t do that.”
