A Blog by Jonathan Low


Jun 19, 2012

Inside Google's Plan to Build a Catalog of Every Single Thing, Ever

What is all the stuff your computer should know?

Or maybe more to the point, how do you even begin to conceptualize what all the knowledge in the world should be and how it should be organized?

Yeah, heavy.

So Google bought a company called Metaweb, which was co-founded by, among others, Danny Hillis, one of the more notable contemporary thinkers, who also co-founded Thinking Machines and Applied Minds and was one of the original Disney Imagineers.

Their objective is to catalog everything, which sounds just as difficult as it must be in practice. They currently have about 500 million objects classified and cross-referenced. Their work is showing up in the way Google now manages your searches and produces results. This has already produced controversy that makes Wikipedia's definitions look tame, because on the subject of religion, for instance, there are people who simply do not acknowledge the value - and sometimes the existence - of others and who have been known to get rather heated about it.

But you have to start somewhere. When your objective is everything, which is a kind of expansive concept, measuring success is probably going to be a work in progress. But we suspect that this is another of those enterprises in which the journey is the reward. JL

Alexis Madrigal reports in The Atlantic:
There's a lot more to Google's Knowledge Graph than might be apparent from what you see in a casual search.

The ugly truth is that computers don't know anything. They have no common sense. This idea had been circulating in Metaweb co-founder John Giannandrea's head since 1997 when he was working at Netscape and thinking about how to reveal what you did not know you didn't know on the web.
If you were looking at search results for a hiking trail, say, what other hiking trails might you look at? Giannandrea called it "going sideways through the web," and he loved the idea, even if he couldn't execute it back then.

Years later, in 2005, Giannandrea teamed up with Danny Hillis and Robert Cook to cofound Metaweb, which had a simple premise: "What if we could make a catalog of all the stuff our computer should know?" Giannandrea told me in a recent interview. "We were interested in building a model of the world. Our computers are remarkably dumb about the stuff that we take for granted. You learn about stuff. You have some context for understanding. Our computers don't work that way because we don't have any loaded context."

With remarkable confidence (hubris?), he and the other founders said to themselves, "Teaching computers all the discrete stuff in the world seems like it should be doable," so they set out to make a machine-readable catalog of everything in the world.

Last month, their project was finally let loose into the wild as the Google Knowledge Graph, which you now see showing up in your search results on the right of your screen. But there's a lot more to the creation of the Knowledge Graph than might be apparent from using it in casual searches.

This is one of those human knowledge projects that is ridiculous in scope and possibly in impact. And yet when it gets turned into a consumer product, all we see is a useful module for figuring out Tom Cruise's height more quickly. In principle, this is both good and bad. It's good because technology should serve human needs and we shouldn't worship the technology itself. It's bad because it's easy to miss out on the importance of the infrastructure and ideology that are going to increasingly inform the way Google responds to search requests. And given that Google is many people's default portal to the world of information, even a subtle change in the company's toolset is worth considering.

And that's how I found myself on the phone with John Giannandrea discussing mojitos and semantic graphs. "Take the drink called the mojito," he said. "Mojito has ingredients: mint, rum, ice. We'll create a catalog entry for that entity, for that human concept 'mojito,' and then we'll create a connection between the mojito and its ingredients." The key difference between their catalog and a standard database is that the connection between the mojito and mint is itself an entity, an entity that says, "This thing is an ingredient in this other thing." The edge between the two nouns contains meaning and that makes all the difference. "We can talk about the representation of knowledge with the knowledge itself," Giannandrea said. Whoa, Meta! I thought. Hence, Metaweb.
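To make the "edge is itself an entity" idea concrete, here is a minimal Python sketch. It is not Metaweb's or Google's actual data model; the Graph class, its method names, and the "was stated by" relation are all hypothetical, invented for illustration. The point it shows is that once a connection has its own catalog entry, you can make further statements about the connection itself.

```python
class Graph:
    """Toy catalog in which every connection is also a catalog entry (hypothetical design)."""

    def __init__(self):
        self.entities = {}  # entity id -> human-readable name
        self.edges = []     # (edge id, source id, relation id, target id)
        self._next_id = 1

    def add_entity(self, name):
        entity_id = self._next_id
        self._next_id += 1
        self.entities[entity_id] = name
        return entity_id

    def connect(self, source, relation, target):
        # The connection gets its own entry, so it can be referred to later.
        label = f"{self.entities[source]} {self.entities[relation]} {self.entities[target]}"
        edge_id = self.add_entity(label)
        self.edges.append((edge_id, source, relation, target))
        return edge_id

    def targets(self, source, relation):
        # Names of everything reachable from `source` through `relation`.
        return [self.entities[t] for (_, s, r, t) in self.edges
                if s == source and r == relation]


graph = Graph()
mojito = graph.add_entity("mojito")
has_ingredient = graph.add_entity("has ingredient")

ingredient_edges = []
for name in ["mint", "rum", "ice"]:
    ingredient = graph.add_entity(name)
    ingredient_edges.append(graph.connect(mojito, has_ingredient, ingredient))

# Because an edge is an entity, it can be the subject of another statement.
# "was stated by" and "interview" are made-up examples of such metadata.
stated_by = graph.add_entity("was stated by")
interview = graph.add_entity("interview")
graph.connect(ingredient_edges[0], stated_by, interview)

print(graph.targets(mojito, has_ingredient))
print(graph.targets(ingredient_edges[0], stated_by))
```

In an ordinary table of recipes, the mojito-to-mint link would just be a row; here it has an identity of its own, which is what lets the catalog describe its own knowledge.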

But there's at least one problem. If you're going to build a catalog of all common sense things in the world, where do you start? The answer was simply, "Somewhere." They added bodies of water and bridges, which go over bodies of water, and highways, which the bridges are a part of, and the length of those highways and the states through which the highways run, and the capitals of those states, and the populations of those capitals, and the population of the United States, and the population of every country in the world, and the dates on which those countries were founded, and so on and so forth and so on and so forth.
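The chain described above (bridge, highway, state, capital) amounts to following one connection after another. A rough Python sketch, using placeholder names rather than real places and a hypothetical follow helper, might look like this:

```python
# Placeholder facts as (subject, relation, object); none of these are real places.
facts = [
    ("Bridge B", "crosses", "River R"),
    ("Bridge B", "part of", "Highway H"),
    ("Highway H", "runs through", "State S"),
    ("State S", "has capital", "City C"),
]


def follow(facts, start, relations):
    """Walk from `start` along each relation in turn; return None if the chain breaks."""
    current = start
    for relation in relations:
        matches = [o for (s, r, o) in facts if s == current and r == relation]
        if not matches:
            return None
        current = matches[0]  # a sketch: take the first match only
    return current


capital = follow(facts, "Bridge B", ["part of", "runs through", "has capital"])
print(capital)
```

Starting "somewhere" works because each new kind of thing only needs a connection to something already in the catalog.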

They built tools to import data from other sources, so that if they got a database from the French cheese association, they could crank out the sodium levels in those cheeses and also tell you a bit about the regions they came from.
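The article does not say how those import tools worked. As a guess at the general shape, the Python sketch below turns rows of a table into catalog statements. The column names, the import_rows function, and the sample rows are all invented placeholders, not real cheese data.

```python
import csv
import io

# Made-up sample in the shape such a database might take; values are placeholders.
SAMPLE = """name,region,sodium_mg_per_100g
Cheese A,Region X,500
Cheese B,Region Y,800
"""


def import_rows(text, subject_column, column_to_relation):
    """Turn each non-empty cell into a (subject, relation, value) statement."""
    statements = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = row[subject_column]
        for column, relation in column_to_relation.items():
            value = row[column].strip()
            if value:
                statements.append((subject, relation, value))
    return statements


statements = import_rows(
    SAMPLE,
    "name",
    {"region": "comes from", "sodium_mg_per_100g": "sodium (mg per 100 g)"},
)
for statement in statements:
    print(statement)
```

The mapping from column names to relations is the part a person has to supply for each source; everything after that is mechanical.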

After five long years, they had 12 million objects in the database. And they were purchased by Google. In the first year after the acquisition, they had 25 million things. What did Google bring to the acquisition, aside from money? Data, of course, of a very specific kind. Before, they were just guessing at what people might want to know (cheese, rivers, highways, etc.). With Google's search data, they *know* what users are after, so they can go about finding and making that information available.

With Google's help, their database has grown rapidly to over 500 million objects. That's orders of magnitude larger than previous attempts to educate artificial intelligences, such as the Cyc project out of Austin, Texas. (Though it should be noted that Cyc has some capabilities that the Knowledge Graph does not.)

In the end, what is most significant to Giannandrea is that "we're taking a baby step in teaching all our computers at Google something about our human world." As for what comes next, he can't say, but the idea is that it will become a resource that all Google developers can call on, the core of common sense at the center of Google's vast web.

