A Blog by Jonathan Low


Jan 29, 2018

Is Pharma's Data Problem Technical or Cultural?

Cultural and organizational. As in many, if not most, organizations, the issue is sharing data and respecting the opinions of colleagues from different disciplines rather than the quality of the information itself. JL

David Shaywitz reports in Forbes:

“Biomedical informaticians and clinical investigators view each other as intellectual peasants providing rote/mechanical services.” In most pharmas, drug development is driven by bench scientists and clinical investigators. Analyses are performed, but in predefined ways, trying to answer specific questions. “Every biomedical organization has data spread across multiple (sites). The real work is curating it into a form that can be cross-analyzed.” Ideally, those doing analysis work in partnership with those leading drug development.
Despite (some might say, because of) a raft of new biological methods, pharma R&D has struggled with its EROOM problem, the fact that the cost of successfully developing a new drug, including the cost of failures, has been relentlessly increasing, rather than decreasing, over time (EROOM is Moore spelled backwards, as in Moore’s Law, describing the rapid pace of technology improvement over time).
Given the impact of technology in so many other areas, the question many are now asking is whether technology could do its thing in pharma, and make drug development faster, cheaper, and better.
Many major pharmas believe the answer has to be yes, and have invested in some version of a by-now familiar data initiative aimed at aggregating and organizing internal data, supplementing this with available public data, and overlaying this with a set of analytical tools that will help the many data scientists these pharmas are urgently hiring to extract insights and accelerate research.
A bevy of established companies and consultancies, including Deloitte, Accenture, BCG, and McKinsey, are championing some version of this vision, along with a number of younger companies, including Silicon Valley powerhouse Palantir, and startups like Datavant.
Two significant challenges associated with this vision are:
(1) How do you achieve it?
(2) Will it actually work?
These are each enormous unknowns with which the entire industry is currently wrestling.
Organizing Pharma Data
“Every biomedical organization has their data spread across multiple data stores, and the real work to be done is curating it into a form that can be cross-analyzed,” Anthony Philippakis, Chief Data Officer of the Broad Institute, explained to me. “The challenge that life science organizations face is not so much analyzing their data, but rather organizing it.”
He adds, “As we think about making precision medicine a reality, I think it is much more likely that we will fail because of the challenges of data sharing and data curation, rather than the challenges of scalability or analytics.”
Philippakis has championed a concept he and his colleagues call a “data biosphere,” an ecosystem that “contains modular and interoperable components that can be assembled into diverse data environments.” Philippakis argues, “It is crazy to think that any one group can create a data platform that will satisfy the needs of all groups across all geographies. We need modular and interoperable services that result in an ecosystem of activity.”
Leveraging Organized Data
As many biopharma companies invest significant treasure and time in collecting and organizing their data (an inordinately heavy lift), the question is: will it be worth it? Will having a huge amount of organized data radically change how pharma companies discover and develop drugs?
That’s the idea, certainly, if not the expectation, but we should take such predictions with a grain of salt. As Economist reporter Natasha Loder reminded us recently on Twitter, analysts in 1997 predicted that “things like ‘bioinformatics’ and ‘rational drug design’ are likely to have a huge effect on the drug industry,” radically accelerating development time and doubling the success rate of late-phase trials. More than twenty years later, we’ve seen that these techniques are certainly powerful; unfortunately, they’ve not (yet?) cracked the nut of drug development.
Even so, there seems to be a pervasive sense that once the required internal and external data are cobbled together, R&D-altering insights will follow. How?
This is the pharma data version of the famous South Park underpants gnome story, where the business plan is roughly:
Step 1: Collect lots of data
Step 2:
Step 3: Insight and efficiency!
At least, this seems to be how the investment is viewed from the R&D trenches (as I recently discussed), where drug developers are vaguely aware of institutional data efforts, but those efforts have, for the most part, not yet changed how most drug development teams go about their jobs.
From the C-suite, though, Step 2 is “data science,” and the plan of many pharmas, to paraphrase Matt Damon in The Martian, is to collect a ton of information and then data science the heck out of it.
Aren’t Pharmas Already All About Data and Science?
What exactly is this much-celebrated “data science,” and how is it different from what pharmas are already doing? After all, quantitatively analyzing data is already a central part of pharma R&D, and has been for quite some time.
One of the best answers I’ve heard is from UCSF’s Atul Butte, who told me:
“In general, most computational/informatics folks are viewed as service providers, and they are really not shown all the data available within a company. To be blunt, they [i.e. pharmas] should be empowering computational folks to come up with ideas and hypotheses using their own internal data (and outside public data), and running experiments to test those ideas.”
Harvard professor Zak Kohane agrees. “Biomedical informaticians and clinical investigators often view each other as intellectual peasants providing rote/mechanical services.”
The problem Butte and Kohane are pointing out is that in most pharmas, drug development is driven by bench scientists (in preclinical development) and by clinical investigators (once a product enters human studies). A lot of analyses are performed, but in relatively stereotyped or predefined ways, trying to answer specific questions posed by bench or clinical scientists. Ideally, those doing this analysis work in close partnership with those leading drug development, but I suspect very few of these statisticians and analysts believe they’re driving the bus.
What Butte and Kohane are arguing for is at some level a radical change; it’s the suggestion that if you let savvy data scientists loose on a reasonable amount of data, and let them figure out what to ask, they will come up with insights by asking questions others in the organization might not have thought of. (There are obviously parallels here to the “research parasite” debate of 2016, another tussle between clinical investigators and data scientists.)
Complicating matters is the deep disconnect between those expert in traditional domains of drug discovery and those expert in data science. As Kohane observes, “What is best is one brain (multidisciplinary team second best) capable of a nimble back and forth between questions, hypothesis testing and analyses,” echoing a very similar assertion Calico Chief Computing Officer Daphne Koller recently made on our Tech Tonics podcast.
The fascinating question is how all this gets resolved. At many biopharmas today, drug development teams toil pretty much as they always have, while data groups collect and assemble both data and talent, in relative isolation. It is a tale of two cultures worthy of C.P. Snow.
It’s not clear to me, or to anyone, how this story ends, but pragmatically, a key step has got to be meaningful contributions from data scientists: novel insights from select, aggregated datasets that are too important for a pharma R&D organization to ignore. Inevitably, these insights will come from the integration of some datasets, but not all possible datasets, and it’s interesting to contemplate which datasets will be most informative. Which do you (or should you) start with to try to generate a quick, meaningful win?
There are two categories of positive outcomes I can foresee. First, a small company will figure out how to effectively leverage a group of datasets, and use this insight to successfully generate either a molecule or an approach (to clinical trial recruiting, say) that pharmas would be keen to access. Second, a large company might figure out how to solve the culture problem, and find a way to help traditional drug developers and eager data scientists work together effectively, and in particular learn the sorts of questions and approaches relevant to each domain. This seems incredibly difficult to pull off in pharma companies, which tend to be extremely territorial by nature, but imagine the great outcome achievable both for the company that figures this out and for the patients who would benefit from the novel insights such a collaborative team might generate.

