It may not sound sexy, but data accessibility is a hot-button issue in the biomedical field right now (it even became a brief flashpoint at Techonomy Health in our cancer session last month). We’ve covered this before as it relates to cancer research, but the rise of big data in many other areas of medicine is bringing a central problem to a head—people need to be able to share data, and they can’t. Fortunately, scientists and institutions are making new advances in ways to access data, and they could help usher in an era of open data. More on that in a moment, but first let’s take a look at some of the main reasons for today’s accessibility challenges.
HIPAA. The 21-year-old Health Insurance Portability and Accountability Act was enacted to protect the privacy of patients’ medical data—a laudable goal. The unfortunate side effect of the law is that it makes it all but impossible for scientists to share data with each other, even to help those very patients. The fear of violating HIPAA, or even of being perceived as possibly violating it, is a powerful force preventing the adoption of openly accessible data in medicine.
Standardization and interoperability. If you surveyed 100 different academic medical institutions in this country, you’d probably find at least 100 different data formats or analysis protocols. Limited funding means that scientists will always opt for free solutions when they can, so they tend to write their own software and other data management tools. With each lab coming up with its own solutions, it’s no wonder that there are loads of different formats and structures for storing, processing, and releasing data. There’s also a well-known lack of interoperability across pipelines that were never intended to talk to each other. Even if every scientist and hospital released every byte of data ever generated, it would take a massive effort to sort them out and pull them into a coherent, useful structure.
Consent forms. In the little-data days of biology and medicine, it was hard to imagine that some other institution would ever want your data. So it made sense that consent forms for clinical trials or research studies never asked participants if they’d be willing to let the university share their data with other organizations. But today, anyone with an internet connection and a medical theory is looking to hoover up as much data as he or she can possibly find. At many institutes, consent forms are catching up by adding sections that enable broader data sharing, but there’s still a ton of historical data stuck in silos because of limited consent policies.
Competition. It would be nice to think that data-sharing hurdles are all about outdated policies and technical standards, but the ugly truth is that competitiveness—both at the institutional and individual level—is a real factor. Having a killer data set that nobody else can access gives a scientist or university a leg up when it comes to winning new grant funding and publishing high-profile papers. Scientists who live on the razor’s edge of an upcoming tenure or funding decision often believe that they can’t afford to make their data public and risk letting someone else find an important gem in it before they do.
Together, these challenges make it that much harder for clinicians to learn a patient’s full medical history and make better treatment or diagnostic decisions, and for scientists to amass data sets large enough to elucidate the causes of common diseases. But now for the good news: this problem is getting more attention, and that means more public outcry. The scientific community is pulling together to develop new approaches, large and small, to grease the skids of data sharing.
One recent effort brought together a broad group of stakeholders from publishing, government agencies, academia, and industry to establish new guidelines that would make it easier for scientists to access and use each other’s data. The FAIR Data Principles (the acronym stands for findability, accessibility, interoperability, and reusability) were the foundation of a hackathon last month that encouraged participants to evaluate and improve their own data for broader use. The Global Alliance for Genomics and Health, launched in 2013 by dozens of institutions, focuses on improving data standards and interoperability to make information more useful when it is shared. And just this month, scientists from the University of Washington and Microsoft released a text-mining tool that looks through big repositories for data sets that should have been released publicly but weren’t (often because scientists forget to change settings when it’s time for a private data set to be shared). In the paper describing this work, the team reports that using the approach “spurred administrators to respond,” sharing 400 data sets in a single week.
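To make that last idea concrete, here is a minimal sketch of how such a check might work; this is not the University of Washington and Microsoft team’s actual code, and the sample text, regular expression, and helper names are illustrative assumptions. The sketch scans a snippet of article text for GEO series accession numbers and asks NCBI’s public E-utilities service whether each one resolves to a released record, flagging accessions that appear in print but are not yet public.

```python
import re
import requests

# Hypothetical snippet, e.g. from a paper's data-availability statement.
SAMPLE_TEXT = """
RNA-seq data have been deposited in the Gene Expression Omnibus
under accession GSE68849; proteomics data are available on request.
"""

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def find_geo_accessions(text):
    """Pull candidate GEO series accessions (GSE#####) out of free text."""
    return sorted(set(re.findall(r"\bGSE\d+\b", text)))


def is_publicly_listed(accession):
    """Ask NCBI E-utilities whether the accession is indexed in GEO DataSets.

    A count of zero suggests the record is still private (or mistyped).
    """
    params = {"db": "gds", "term": f"{accession}[Accession]", "retmode": "json"}
    resp = requests.get(EUTILS_ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"]) > 0


if __name__ == "__main__":
    for acc in find_geo_accessions(SAMPLE_TEXT):
        status = "public" if is_publicly_listed(acc) else "not yet public"
        print(f"{acc}: {status}")
```

Run over thousands of papers instead of one snippet, a report like this is essentially a to-do list that repository administrators can act on, which is the kind of nudge the published tool appears to have provided.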
None of these programs or tools can solve the data accessibility problem alone, but together, they’re improving things bit by bit. As more people realize that data blockades are impeding medical advances, we may see enough momentum to finally make open data the standard.
Meredith Salisbury is editorial director for the life science communications firm Bioscribe.