The Low-Down: How Democracy Can Survive Big Data

The information age business model is built on gathering as much free data as possible and then re-selling it at monstrous margins. The question is whether a regulatory framework can now be imposed that, in essence, reduces those margins but preserves the model.

There are many who will bitterly fight the reduction of profits - but the alternative model could be far worse. JL

Colin Koopman reports in the New York Times:

Tech systems are built with a minimal concern for compliance and a total disregard for the downstream consequences of data being collected. The best minds have devoted themselves to delivering on the data science promises of intelligence testing, building the computer, cracking the genetic code, creating the internet, and now this. We have in the course of a single century built an entire society, economy and culture that runs on information. Yet our approach to the ethics of data is wholly reactive.

Only a few years ago, the idea that for-profit companies and foreign agents could use powerful data technologies to disrupt American democracy would have seemed laughable to most, a plotline from a Cold War espionage movie. And the idea that the American system would be compromised enough to allow outside meddling with the most basic of its democratic functions — the election of its leaders — would have seemed even more absurd.

Today we know that this is not fiction but fact. It is a secret so open that even its perpetrators seem halfhearted about hiding it.

“Data drives all that we do.” That is the motto emblazoned on the website of Cambridge Analytica, the consulting firm that was employed by the Trump campaign to influence voters and that is now under scrutiny for its unauthorized harvesting of data from at least 50 million social media users.

The heart of Cambridge Analytica’s power is an enormous information warehouse — as many as 5,000 data points on each of more than 230 million Americans, according to recent reporting, a fact the company proudly confirms on its website. Its promise of elections driven by data ultimately implies a vision of government steered not by people but by algorithms, and by an expanding data-mining culture operating without restrictions.

That such threats to democracy are now possible is due in part to the fact that our society lacks an information ethics adequate to its deepening dependence on data. Where politics is driven by data, we need a set of ethics to guide that data. But in our rush to deliver on the promises of Big Data, we have not sought one.

An adequate ethics of data for today would include not only regulatory policy and statutory law governing matters like personal data privacy and implicit bias in algorithms. It would also establish cultural expectations, fortified by extensive education in high schools and colleges, requiring us to think about data technologies as we build them, not after they have already profiled, categorized and otherwise informationalized millions of people. Students who will later turn their talents to the great challenges of data science would also be trained to consider the ethical design and use of the technologies they will someday unleash.

Clearly, we are not there. High schoolers today may aspire to be the next Mark Zuckerberg, but how many dream of designing ethical data technologies? Who would their role models even be? Executives at Facebook, Twitter and Amazon are among our celebrities today. But how many data ethics advocates can the typical social media user name?

Our approach to the ethics of data is wholly reactive. Investigations are conducted and apologies are extracted only after damage has been done (and only in some instances). Mr. Zuckerberg seemed to take a positive step on Wednesday, when he vowed to take action to better protect Facebook’s user data. “We also made mistakes, there’s more to do, and we need to step up and do it,” he said in, unsurprisingly, a Facebook post.

This is like lashing a rope around the cracking foundation of a building. What we need is for an ethics of data to be engineered right into the information skyscrapers being built today. We need data ethics by design. Any good building must comply with a complex array of codes, standards and detailed studies of patterns of use by its eventual inhabitants. But technical systems are today being built with a minimal concern for compliance and a total disregard for the downstream consequences of decades of identifiable data being collected on the babies being born into the most complicated information ecology that has ever existed.

Mr. Zuckerberg and other Silicon Valley chiefs admitted in the wake of the election that their platforms needed fixing to help mitigate the bad actors who had exploited social media for political gain. It is not Mr. Zuckerberg’s fault that our society has given him a free pass (and a net worth of $67 billion) for inventing his platform first and asking only later what its social consequences might be. It is all of our faults. Thus, however successful Mr. Zuckerberg will be in making amends, he will assuredly do almost nothing to prevent the next wunderkind from coming along and building the next killer app that will unleash who knows what before anybody even has a chance to notice.

The challenge of designing ethics into data technologies is formidable. This is in part because it requires overcoming a century-long ethos of data science: Develop first, question later. Datafication first, regulation afterward. A glimpse at the history of data science shows as much.

The techniques that Cambridge Analytica uses to produce its psychometric profiles are the cutting edge of data-driven methodologies first devised 100 years ago. The science of personality research was born in 1917. That year, in the midst of America’s fevered entry into war, Robert Sessions Woodworth of Columbia University created the Personal Data Sheet, a questionnaire that promised to assess the personalities of Army recruits. The war ended before Woodworth’s psychological instrument was ready for deployment, but the Army had envisioned its use according to the precedent set by the intelligence tests it had been administering to new recruits under the direction of Robert Yerkes, a professor of psychology at Harvard at the time. The data these tests could produce would help decide who should go to the fronts, who was fit to lead and who should stay well behind the lines.

The stakes of those wartime decisions were particularly stark, but the aftermath of those psychometric instruments is even more unsettling. As the century progressed, such tests — I.Q. tests, college placement exams, predictive behavioral assessments — would affect the lives of millions of Americans. Schoolchildren who may have once or twice acted out in such a way as to prompt a psychometric evaluation could find themselves labeled, setting them on an inescapable track through the education system.

Researchers like Woodworth and Yerkes (or their Stanford colleague Lewis Terman, who developed the Stanford-Binet I.Q. test) did not anticipate the deep consequences of their work; they were too busy pursuing the great intellectual challenges of their day, much like Mr. Zuckerberg in his pursuit of the next great social media platform. Or like Cambridge Analytica’s Christopher Wylie, the twentysomething data scientist who helped build psychometric profiles of two-thirds of all Americans by leveraging personal information gained through uninformed consent. All of these researchers were, quite understandably, obsessed with the great data science challenges of their generation. Their failure to consider the consequences of their pursuits, however, is not so much their fault as it is our collective failing.

For the past 100 years we have been chasing visions of data with a singular passion. Many of the best minds of each new generation have devoted themselves to delivering on the inspired data science promises of their day: intelligence testing, building the computer, cracking the genetic code, creating the internet, and now this. We have in the course of a single century built an entire society, economy and culture that runs on information. Yet we have hardly begun to engineer data ethics appropriate for our extraordinary information carnival. If we do not do so soon, data will drive democracy, and we may well lose our chance to do anything about it.