A Blog by Jonathan Low

 

Aug 22, 2014

When Data Becomes Dangerous


Data is not inherently dangerous, any more than, say, fertilizer is until it is mixed with gasoline to create a bomb.

It is the person doing the mixing and the purpose for which it is being applied that create the risk.

The growing availability of ever more granular data is not, in itself, a threat. Much of it is useless without context, experience and accurate projection. Even then, much of it remains incomprehensible.

The risk comes in misinterpreting the information which, in turn, leads to misuse. The harm is to the user and to those whom the user convinces of its accuracy and utility.

Being a convenience-oriented society and economy, we tend to put a premium on speed and ease of use, which sometimes means we take the shortest and easiest route possible. This is not necessarily the method that helps us achieve whatever goals may have been set, but we have become addicted to trading effort for information in hopes of reducing risk and increasing reward. The ultimate danger is believing it is that simple. JL

Derrick Harris comments in GigaOm:

More, better data means better models and, in theory, better profiling. That’s great for fighting crime, but potentially not so great for the guy who just happens to keep strange hours, make weird web searches, and take a lot of trips to the union office or the Halal butcher.
Elon Musk is apparently worried about humans becoming subservient to artificially intelligent computers. I think the notion is a bit absurd. I’d argue the sci-fi nightmare more likely to become reality thanks to AI has to do with big-brother states and corporate manipulation of consumers’ lives. I’d also argue that the likelihood of any of these scenarios coming true — mine or Musk’s — has everything to do with the laws governing our personal data.
To recap, Musk made the following comment Saturday on Twitter, referring to a forthcoming book called Superintelligence. “Worth reading Superintelligence by Bostrom. We need to be super careful with AI. Potentially more dangerous than nukes.” He followed up on Sunday with a tweet reading, “Hope we’re not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable”.
Artificial intelligence and, more broadly, machine learning, really boil down to data — how much of it computers can ingest and what they can learn from it. When we start talking about applied AI — like in robots, or even just in Google image search — we add in the additional step of what these systems can do because of what they’ve learned. As it turns out, it looks like the answer to all three questions is plenty.
If you’re into dire predictions about the future of American society, or even of mankind as a whole, the technologies currently under development today — from wearable computers to deep learning systems — should provide plenty of fodder for dystopian scenarios about how our data might be used to control us. Probably ones that seem much more imminent than they did decades ago.
However, it need not be that way. If the world’s governments and societies can effectively regulate the flow of data among citizens, corporations, governments and computers, it’s entirely possible we’ll be able to experience the benefits of AI without too many of the cons. Life might change, and we’ll probably have to accept ever-evolving relationships with the technology around us, but it doesn’t have to control us.

Digital superintelligence. But, first, computers that recognize dogs

I think the fear of supremely intelligent computers that Musk espoused is rather unlikely in the foreseeable future. Mostly, this is because I’ve spent a lot of time speaking with machine learning experts — the people actually building the models and writing the algorithms that control these systems — and reading about their research to get a sense of where we’re at and where we’re headed.
Building an AI system that excels at a particular task — even a mundane one such as recognizing breeds of dogs — is hard, manual work. Even so-called “self-learning” systems need lots of human tuning at every other step in the process. Making disparate systems work together to provide any sort of human-like concept of reality would be harder still.
Researchers present on the challenge of building a robot that can communicate effectively.
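
To see how much human judgment goes into even the dog-breed case, here is a minimal sketch, assuming PyTorch, with random tensors standing in for real labeled photos that someone would still have to collect and label. The architecture and every constant in it are choices a person makes and revises by hand; the model only "learns" within the box a human built for it.

# Minimal sketch (PyTorch assumed) of how many human decisions go into
# even a "simple" task like classifying dog breeds. Synthetic tensors
# stand in for real labeled photos, which a person would still have to
# collect, clean and label.
import torch
import torch.nn as nn

NUM_BREEDS = 10          # a human decides which breeds count
IMAGE_SIZE = 64          # a human picks the input resolution
LEARNING_RATE = 1e-3     # a human tunes this by trial and error
EPOCHS = 2               # a human decides when training has run long enough

# A human chooses the architecture: how many layers, how many filters, etc.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * (IMAGE_SIZE // 4) ** 2, NUM_BREEDS),
)

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
loss_fn = nn.CrossEntropyLoss()

# Placeholder "dataset": random images and random breed labels.
images = torch.randn(128, 3, IMAGE_SIZE, IMAGE_SIZE)
labels = torch.randint(0, NUM_BREEDS, (128,))

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
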
If humans want to create super-intelligent beings that outsmart us, that are smart enough to turn on us even, we’ll probably have to specifically set out to build them. It’s not outside the realm of possibility — we’ve created nuclear and biological weapons — but it seems entirely preventable.
But build up enough of those disparate systems that are really good at certain tasks, and you have the makings of some big problems. Some potential scenarios are already obvious because of the companies that are leading the charge in AI research — Google, Facebook, Microsoft and even Yahoo.
Their work in fields such as computer vision, speech recognition and language understanding is sometimes amazing and already resulting in better user experiences. Applied to areas such as physics, medicine, search and rescue, or law enforcement, it could change lives.

Building better models to better infer who you are

But you can cue up the consumer privacy backlash once they turn these technologies toward advertising. These companies don’t need massive, all-knowing, self-learning systems because they already know who we are. You were worried when Google was just scanning your email to serve up targeted ads, or when Facebook manipulated some users’ feeds in the name of research?
Imagine when web companies are targeting ads based on what they see in your photos, the implied interests in your messages or even what they hear you say through your phone’s microphone. Already, some are predicting a cycle of behavioral reinforcement techniques, whereby consumers are manipulated into certain moods and then shown ads for products they’re now more likely to buy.
This is without considering the things companies will learn as we move into an era of connected devices, where everything from our thermostats to our cars is generating data and sending it back to corporate servers. Viewed in this light, it turns out Musk might not be the best messenger for concerns over an AI apocalypse. He’s CEO of a company, Tesla Motors, whose cars generate incredible amounts of data about every aspect of their existence, and which keeps all that valuable information to itself.
That definitely could and should be used to predict when cars will fail or which part caused a failure. Perhaps it could be used to align R&D efforts with the real driving habits of consumers. Of course, it could also be really valuable to advertisers should Tesla ever decide to sell it (not that I’m suggesting it would).
A diagram of DARPA’s proposed deep learning system.
Anyone inclined to worry about government surveillance, however, might pray that consumer data stays within the relatively friendly confines of corporations that just want our money. Because this type of data could also be immensely valuable to government agencies that might want to track citizens or analyze their behavior. And governments — including but not limited to that of the United States — have little trouble gathering it and continue to show interest in new ways of analyzing it using AI techniques.
More, better data means better models and, in theory, better profiling. That’s great for fighting crime, but potentially not so great for the guy who just happens to keep strange hours, use burner phones, make weird web searches, and take a lot of trips to the union office and the Halal butcher. You wouldn’t want to utter the wrong phrase or do the wrong thing in an airport wired to monitor speech or other environmental inputs.
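
It is worth noticing how little machinery that kind of profiling actually requires. The following is a minimal sketch, assuming scikit-learn and entirely synthetic, hypothetical behavioral features: an off-the-shelf anomaly detector turns whatever signals it is given into a score, and the person whose hours, searches and trips look unusual comes out the outlier, which is exactly the promise and the problem.

# Minimal sketch (scikit-learn assumed, all features synthetic and
# hypothetical) of behavioral profiling with an off-the-shelf model.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Columns: late-night hours per week, unusual-search rate, trips per month.
population = rng.normal(loc=[1.0, 0.05, 2.0], scale=[0.5, 0.02, 1.0], size=(1000, 3))
typical_person = population.mean(axis=0, keepdims=True)
odd_guy = np.array([[6.0, 0.4, 12.0]])   # strange hours, weird searches, many trips

model = IsolationForest(random_state=0).fit(population)

# Lower score_samples values mean "more anomalous" to the model.
print("typical person:", model.score_samples(typical_person)[0])
print("odd guy:       ", model.score_samples(odd_guy)[0])
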
As the Google child-porn case that broke on Monday demonstrates, companies and governments also work together from time to time. This practice, too, is rife with promise, perils and constitutional questions. Google vows it’s not scanning for any other criminal behavior, and I believe it, but the more data companies collect and the better they get at analyzing content, the more tempted agencies might be to expand the scope.

Keeping AI in check by keeping data in check

Many of these types of predictions are straight out of television scripts or sci-fi stories, but they’re increasingly realistic. The way we guard against them is to start regulating data in a manner that respects its power and puts that power back into the hands of the citizens who generate it.
Better, clearer, and more-specific terms of service on web privacy policies would be a good start. As would proposed rules around companies acting within their users’ expectations (ideally as expressed in those policies), and perhaps tagging data with accepted uses so auditors, or even prosecutors and plaintiffs’ attorneys, could readily identify privacy violations. Stricter rules on the types of data governments can access from service providers without a warrant, and the means by which they can access it, would also be helpful.
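
The tagging idea is the most concrete of these proposals, so here is a minimal sketch of what it might look like in practice, in plain Python with hypothetical field names and uses rather than any real standard: each record carries the uses its owner accepted, and an auditor checks any requested use against that list.

# Minimal sketch of tagging data with accepted uses, so an auditor (or a
# prosecutor, or a plaintiff's attorney) could check a requested use
# against what the user actually agreed to. Field names and uses are
# hypothetical illustrations, not an existing standard.
from dataclasses import dataclass, field

@dataclass
class TaggedRecord:
    owner: str
    payload: dict
    accepted_uses: set = field(default_factory=set)   # uses the owner agreed to

def check_use(record: TaggedRecord, requested_use: str) -> bool:
    """An auditor's test: was this use accepted by the data's owner?"""
    return requested_use in record.accepted_uses

record = TaggedRecord(
    owner="user-123",
    payload={"email": "user@example.com", "location_history": []},
    accepted_uses={"billing", "product_support"},
)

for use in ("billing", "ad_targeting"):
    verdict = "allowed" if check_use(record, use) else "VIOLATION: not an accepted use"
    print(use, verdict)
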
When violations occur, we should expect some sort of symmetrical response rather than an employee reprimand or a bogus class-action settlement where lawyers get rich and consumers get nothing. Essentially, citizens and consumers need a way to protect themselves in what’s presently a one-sided fight where the other side has all the data and, more importantly, all the algorithms.
AI research is only going to pick up its pace over the next decade, and we’re going to start seeing some really big breakthroughs. Who knows, maybe we’ll even start seeing signs that the future Elon Musk and his ilk predict is actually plausible. If we want to take advantage of the good parts and keep the bad parts in check, I think the key will be keeping tabs on the data that makes it all possible.
