A Blog by Jonathan Low


Jan 23, 2013

Good Info, Poor Judgement: When Big Data Goes Bad

I once had a finance professor who started a case study by saying, 'Assume, for this discussion, there are no taxes.' He lost us right there.

How about, assume there are no employees? Or customers. Or suppliers. Or computer glitches. Or unexpected events of any kind.

Right.

That is the problem we face with Big Data. To be fair, it is not just Big Data's issue. It is a challenge in any facet of human endeavor. But we are picking on Big Data because its adherents would have us believe either that mere access to data will answer all questions or, as the following article asserts, all humans assigned to interpret and apply the lessons such data offers will put aside their biases, predilections, experiences, political or ideological beliefs and influences to provide pure analysis unencumbered with any personal distortions.

The reality is that it is almost impossible to separate the personal from any notion of analytical purity. In fact, Big Data has little effective meaning without the application of experienced human screens. What is required, rather than some unattainable standard of excellence, is a forthright declaration that every expression of opinion contains inclinations that may flavor its meaning.

What this means is that transparency is the key to interpreting Big Data: the background, experience and biases of those presenting their take on the data's meaning will, in turn, be refracted through the prism of experience held by those on the receiving end of this wisdom. In other words, data will always be subject to interpretation. And not everyone will agree. But some of the most important lessons learned will come from the discussion, debate and disagreement about precisely those differences of opinion.

We cannot legislate, mandate or authenticate meaning. What we can do is recognize that humans have points of view colored by a variety of influences. As long as we are open about those differences, we can make informed judgements about sources and uses that will probably be better than decisions made without access either to the data or to those who attempt to interpret it. JL

Jamal Khawaja comments in Enterprise CIO Forum:
The pitfalls of bastardizing accurate information with human judgment. Big Data is amazingly accurate when it comes to hurricane predictions, pretty bad when it comes to economic forecasting, and ultimately miserable at earthquake prediction. When it comes to understanding the failure of predictive analytics, it’s not just the flawed approaches used by pundits to maximize (or mitigate) risk; it’s also the failure to understand how various interrelated systems of data interact to produce an inflection point that can be measured.

Big Data is something that I am very passionate about. In the next five to ten years, the capture, correlation, and analysis of data will fundamentally reshape every aspect of our society. From analyzing a medical condition to predicting the next economic bubble, Big Data offers us an opportunity unparalleled in the annals of human history: to leverage every conceivable data point to predict the future, rather than ascribing causation to such data points, is an ability we have never been able to claim. Is rape a function of the readership of Playboy? Can we reduce gun violence by increasing the price of milk? What do math scores, race, and IQ have to do with the efficiency of a bartender? These are questions that Big Data could one day answer.

The break between the viability of Big Data and the efficacy of analysis lies not in the actual analysis, but rather in perspective. Ultimately, what we are talking about is not the value of a single data point, but how multiple data points interrelate. How does one determine causation, as opposed to allowing a presumed causation to validate an analysis?

Oddly enough, the answer goes back to the analytic engine itself. The engine is a construct of human engineering, and thus is subject to human errors of judgment and human prejudices. If the data scientist responsible for relating disparate data points is operating under a political agenda, then the resultant derivation will be flawed. Well, perhaps ‘flawed’ is too strong a term; it will be a representation of a reality that is specific to the desired end point. In this world of restricted data and incomplete information, I can find relationships between religious cosmologies and the murder rate. In Louisiana, the state with the highest per-capita church attendance in the United States, the murder rate is twice the national average. Does that mean that you are twice as likely to commit a murder if you attend church? Of course not; the causal relationship between murder and church attendance is fragile at best, and incoherent at worst. Just because two measurements move together does not imply that there is a causal relationship between the two data points.
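The trap described above is easy to demonstrate. The sketch below is purely illustrative and uses made-up numbers (the variable names echo the Louisiana example, but the data is synthetic): two independent series that each happen to trend upward over time produce a Pearson correlation near 1, despite there being no causal link between them.

```python
# Illustrative only: two unrelated series that both drift upward over
# time will show a high Pearson correlation with no causal connection.
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
years = range(20)
# Hypothetical data: both series rise over time for unrelated reasons.
attendance = [50 + 2.0 * t + random.gauss(0, 1.0) for t in years]
murder_rate = [10 + 0.5 * t + random.gauss(0, 0.5) for t in years]

r = pearson(attendance, murder_rate)
print(round(r, 2))  # high correlation, yet neither causes the other
```

The shared confounder here is simply time: because each series has its own upward trend, the covariance term is dominated by the trends and the coefficient lands near 1 regardless of any causal story.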

As a society, this is where we fail. It is not just with Big Data, but is actually a function of the flawed impetus behind the analysis and expression of any set of data points. As a society, we do not look for truth; rather, we look for a validation of notions that are already internalized. We do not care to legitimize the perspective of an opponent, irrespective of how sincere or rational that perspective may be. Our embrace of Big Data is designed to corroborate our perspectives, not invalidate them before a perceived enemy. In this manner the menace of climate change becomes the hazard of global warming. Islamic terrorism is an existential scourge whereas regional, generational conflicts take second stage to cultural imperatives. Violence and incarceration rates of minorities justify extreme judicial measures, and the possibility that the system (rather than the individual) is being gamed occurs only to a quiet few.

Much like a sword, Big Data is only as sharp and effective as the warrior who wields it. Political and cultural imperatives are as stones that dull the keen edge of the blade of Big Data; without cogent, nuanced, and balanced perspectives around data, predictive analytics will be as successful and legitimate as the labors exerted by Dick Morris and Peggy Noonan. It is not enough to cobble together disparate pieces of data in order to create a narrative; the narrative that the data suggests must be rooted in sound data science and empirical, unbiased analysis.

The other real and substantive critique of Big Data analytics is expressed by pundits that rightly question the quantity and quality of data points available. After all, why is it that meteorologists can only provide generalities with respect to the weather? Why are geophysicists so hesitant in providing predictions when it comes to volcanism and earthquakes? Why is it that climatologists are unable to provide a clear and unequivocal analysis of rising sea levels and temperature fluctuations across the globe? Are these scientists hiding something? Or are they simply unable to piece together the avalanche of information into a reasonable, coherent analysis that will describe what will happen tomorrow?

Perhaps it is all of the above. It is tempting to ascribe this lack of granularity to political intentions, but the reality is that Big Data is a burgeoning field with only a few real success stories. Target and Walmart have both leveraged Big Data to achieve an understanding of their customers that can only be described as creepy. More importantly, they have leveraged Big Data to normalize their supply chains and increase the value of their inventory. Outside of mega-corporations and governments with billions to pour into research, the field of Big Data is relatively scant. Instrumentation is still the exception rather than the rule, and the amount of data collected by most corporations can be qualified as “big” only in the most generous of circumstances. Most organizations have recognized this limitation and are instead mining “sparse data” in order to drive as much value from their records as they can. As sources of data streams continue to multiply, and as the ability of corporations to capture, manage, and monetize these data streams increases, the only question that comes to mind is how this retained data will affect the efficacy of an organization’s manufacturing, delivery, and sales processes.

Ultimately, the goal of Big Data is not to provide information for information’s sake; rather, it is the promise of leveraging disparate data sources in order to gain a better understanding of how to create more value for shareholders. Big Data is a tool that needs to be studied, understood, and finally leveraged appropriately in the value chain of corporate America’s goods and services. The challenges we face are not the prosaic trials of data retention and data mining; in this new world, we must instead understand the value of data independent of personal, corporate, or national prejudices, and leverage the analysis of data in a manner that provides real economic value rather than political advantage. Our data scientists must be free from the burdens of an aspirational elitist class, and must instead focus their attention upon the understanding, analysis, and monetization of collected data.
