A Blog by Jonathan Low

 

Aug 15, 2019

Without Better Data Quality AI Is No Silver Bullet

There is little agreement, let alone assurance, about what constitutes 'statistical significance,' to say nothing of what constitutes good data. JL


Hetan Shah comments in the Financial Times:

We are living through a period in which (many) want easy answers to difficult questions. There is interest from policymakers and scientists around the world in how artificial intelligence is going to transform their work. In their haste to jump on the AI bandwagon, everybody is forgetting we have not solved some deeper problems about data. AI evangelists see everything as a technical matter (but) many applications are little more than complex algorithms based on historical data. There may be many areas where it would be both more accurate and more transparent to use old-fashioned statistics in place of opaque machine learning.
There is considerable interest from policymakers and scientists around the world in how artificial intelligence is going to transform their work. In their haste to jump on the AI bandwagon, however, everybody is forgetting we have not solved some older, deeper problems about data that will stymie attempts to get the technology off the ground.
The UK National Audit Office recently noted that data are often not seen as a priority in government, and that this hampers policymaking. Although so many policy challenges, from obesity to climate change, are cross-departmental, the government does not think about data in a joined-up way. With the 2021 census fast approaching, the Office for National Statistics (ONS) is going to rely more than ever on piecing together data from other departments. The biggest concern the Statistics Authority, the regulator of official statistics, has about the census is that the ONS may not be able to obtain data from departments when it is needed. This seems a failure of leadership: we have the data; we just cannot get the incentives and controls right to share it properly. And yet, while there are many excited conversations about AI in the civil service, few seem keen to discuss the difficulties posed by data sharing, without which AI applications will not have the raw materials to learn from.
Conversations about data ethics also lack a practical focus. Take healthcare data: there are critical ethical discussions to be had with the public about how their data are used. Treatments could be improved if all patient data were used. So do we want a data collection system that allows individuals to opt out? Or a “social contract” model where everyone’s data are used for the benefit of all? These are old data problems, not new AI issues. And yet the new tech will not get off the ground if the debates are not had. The UK is on course for a considerable expansion of investment in research.
But science’s dirty secret is that nobody knows how much research is not statistically robust. Earlier this year, some of the world’s top statisticians argued for the notion of “statistical significance” to be dropped, as it was leading to erroneous or overstated claims. The British government has set up a new council, UK Research and Innovation, which is investing significant amounts of money in AI. It also has an opportunity to help scientists strengthen the way they use data, especially those who may not have had much statistical training. AI enthusiasts believe that new technologies will give us better evidence to inform policy. But improving the evidence is only one part of the job: we need greater demand for it from both policymakers and the public.
We are living through a period in which leading politicians want easy answers to difficult questions.
There is no technical fix to this, although AI evangelists often see everything as a technical matter. So far, use of AI in government has been fairly simplistic. Many applications are little more than complex algorithms based on historical data. Unaware of the challenges of design, analysis and modelling, naive users may jump in where experienced statisticians fear to tread. In fact there may be many areas where it would be both more accurate and more transparent to use old-fashioned statistics in place of opaque machine learning. For example, as a predictor of US recidivism, a complex machine-learning algorithm was no better than a simple linear-regression model. As things stand, AI in government is like teenage sex: everyone is talking about it but no one is really doing it. That will continue unless we can solve some old data problems. Unfortunately, these tend to be perceived as too boring and difficult for anyone to have a go at tackling them.
