A Blog by Jonathan Low

 

Feb 19, 2019

Is Machine Learning Causing A Reproducibility Crisis In Science?

The problem is that machine learning algorithms are programmed to find a specific or an 'interesting' outcome, which, given their power, they can almost always do, whether or not the results can be reproduced by someone else using different methods. In addition, the algorithms themselves are frequently considered proprietary intellectual property, making examination of their assumptions all but impossible.

This challenges the veracity of the initial findings, rendering them suspect. JL


Pallab Ghosh reports in the BBC:

A growing amount of scientific research uses machine learning to analyse data already collected. This happens across subjects from biomedical research to astronomy. The data sets are large and expensive. But the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world. The “reproducibility crisis” refers to the number of research results that are not repeated when other scientists try the same experiment. Machine learning and big data have accelerated the crisis because algorithms have been developed specifically to find interesting things, so they inevitably find a pattern.
Machine-learning techniques used by thousands of scientists to analyse data are producing results that are misleading and often completely wrong.
Dr Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a “crisis in science”.
She warned scientists that if they didn’t improve their techniques they would be wasting both time and money. Her research was presented at the American Association for the Advancement of Science in Washington.
A growing amount of scientific research involves using machine learning software to analyse data that has already been collected. This happens across many subject areas ranging from biomedical research to astronomy. The data sets are very large and expensive.

'Reproducibility crisis'

But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world.



“Often these studies are not found out to be inaccurate until there's another real big dataset that someone applies these techniques to and says ‘oh my goodness, the results of these two studies don't overlap’," she said.
“There is general recognition of a reproducibility crisis in science right now. I would venture to argue that a huge part of that does come from the use of machine learning techniques in science.”
The “reproducibility crisis” in science refers to the alarming number of research results that are not repeated when another group of scientists tries the same experiment. It means that the initial results were wrong. One analysis suggested that up to 85% of all biomedical research carried out in the world is wasted effort.


It is a crisis that has been growing for two decades and has come about because experiments are not designed well enough to ensure that the scientists don’t fool themselves and see what they want to see in the results.

Flawed patterns

Machine learning systems and the use of big data sets have accelerated the crisis, according to Dr Allen. That is because machine learning algorithms have been developed specifically to find interesting things in datasets, so when they search through huge amounts of data they will inevitably find a pattern.
“The challenge is can we really trust those findings?” she told BBC News.
“Are those really true discoveries that really represent science? Are they reproducible? If we had an additional dataset would we see the same scientific discovery or principle on the same dataset? And unfortunately the answer is often probably not.”
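To make the point concrete, here is a minimal sketch, not taken from the article or from Dr Allen's work: a clustering algorithm asked to find three groups in pure random noise will report three groups, and the "discovery" does not carry over to a second dataset drawn the same way. The dataset sizes, the choice of k-means and the agreement score are all assumptions made purely for illustration.

```python
# Illustration only (not the researchers' code): a pattern-finder applied to
# pure noise still "discovers" structure, and that structure does not
# reproduce on a second dataset drawn from the same distribution.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Two independent, structureless datasets: 500 samples, 20 features each.
data_a = rng.normal(size=(500, 20))
data_b = rng.normal(size=(500, 20))

# K-means is asked for 3 clusters, so it returns 3 clusters regardless.
model_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data_a)
model_b = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data_b)

# Label the same points (dataset B) under both fitted models and compare.
# Agreement far below 1 suggests the clusters reflect the particular sample
# they were found in rather than anything about the real world.
labels_from_a = model_a.predict(data_b)
labels_from_b = model_b.predict(data_b)
print("cross-dataset agreement:", adjusted_rand_score(labels_from_a, labels_from_b))
```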



Dr Allen is working with a group of biomedical researchers at Baylor College of Medicine in Houston to improve the reliability of their results. She is developing the next generation of machine learning and statistical techniques that can not only sift through large amounts of data to make discoveries, but also report how uncertain their results are and their likely reproducibility.
“Collecting these huge data sets is incredibly expensive. And I tell the scientists that I work with that it might take you longer to get published, but in the end your results are going to stand the test of time.
“It will save scientists money and it's also important to advance science by not going down all of these wrong possible directions.”
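The article does not describe Dr Allen's actual methods, but one hedged sketch of the general idea is to repeat an analysis on random halves of the data and report, alongside the result, how often the same answer comes back. Everything in the example below, including the noise-only dataset, the correlation-based feature selection and the stability score, is a hypothetical stand-in chosen for illustration.

```python
# Hedged sketch of the general idea only (not Dr Allen's method): estimate the
# reproducibility of a "discovery" by repeating it on random halves of the
# data and reporting how stable the answer is, alongside the result itself.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 200 samples, 1000 candidate features, no real signal.
X = rng.normal(size=(200, 1000))
y = rng.normal(size=200)

def top_features(X, y, k=10):
    """Indices of the k features most strongly correlated (in magnitude) with y."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return set(np.argsort(corr)[-k:])

# Repeat the selection on independent random halves and measure the overlap.
stability = []
for _ in range(20):
    idx = rng.permutation(len(y))
    half_a, half_b = idx[:100], idx[100:]
    shared = top_features(X[half_a], y[half_a]) & top_features(X[half_b], y[half_b])
    stability.append(len(shared) / 10)

# A stability score near zero is a warning that the selected features are
# unlikely to hold up when someone collects a second dataset.
print("mean stability across splits:", np.mean(stability))
```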
