Challenges for Genomics in the Age of Big Data

Scientists don’t dabble much in predictions. They’re comfortable with data and with facts they can observe. So when they do speculate, it’s worth paying attention.
Last week, a group of respected researchers published a commentary about the coming data challenges in genomics. Comparing the projected growth of genomic data to three other sources considered among the most prolific data producers in the world—astronomy, Twitter, and YouTube—these scientists predict that by 2025, genomics could well represent the biggest of big data fields. With the raw data for each human genome taking up about 100 GB, we’re well on our way.
Genomics only recently entered the big data realm, and we have major issues to address before it leapfrogs every other data-generating group. Here are the top four areas Techonomy believes must be improved in the next decade to get the genomics house in order.
Informed consent
Consent forms are a cornerstone of biomedical research. Any research project collecting genomic data begins with informed consent, a pile of paperwork that human subjects have to sign to be admitted to a study. Currently, these consent forms vary from project to project and from institution to institution, with a wide range of permissions and access guidelines not only for how data will be used in the current study, but also for how that data might be used in the future. This landscape makes it far more challenging than it should be for scientists to share data later, or to delve into existing data to draw new conclusions.
Scientists routinely praise an initiative called the Personal Genome Project, run through Harvard Medical School, for having the most broadly useful informed consent policies. Study participants have the option to be contacted for future studies, for example, and they agree to make their data and samples openly available to other labs. Because of that, PGP data has been accessed by researchers around the world who have used it to make important new discoveries. Before data generation ramps up to the billion-plus human genomes that scientists predict could be sequenced by 2025, it’s imperative that institutions embrace informed consent policies like PGP’s, allowing for massive data sharing and maximizing utility of this data.
The authors of the commentary write: “If we do not commit as a scientific community to sharing now, we run the risk of establishing thousands of isolated, private data collections, each too underpowered to allow subtle signals to be extracted.” Successful data sharing in the future depends on significant improvement in informed consent guidelines.
Data security and storage
How many letters have you gotten in the last few years letting you know that your personal data—credit cards, bank accounts, health, insurance—may have been accessed by someone who violated the security of the organization you trusted to keep your information safe?
Those letters could become even more alarming if they’re reporting the theft of your genomic data. We have no idea how this type of data could be used by criminals, but no doubt there will be a market for it. We must invest now in superior data-protection tools for our genomic data if we hope to keep it safer than our financial data is with today’s leaky systems.
Storage methods must improve as well. Too much scientific data is stored in purpose-built databases, each with its own formats and access rules. Cloud computing is often considered a way to improve the situation: store genomic data in one place (with lots of redundancy) where scientists, clinicians, and even consumers could access it and run queries on it. But this will not improve access to the vast amount of existing data that we could capitalize on now if only it were connected and easily queried. We need the tech world to help with better options to store standardized data securely, and to add hooks to existing public data repositories to make them more useful.
Analysis tools
Virtually all genomic analysis tools were created by and for the research community. If you’ve never seen a data-crunching program designed by a scientist, let’s just say you’ve avoided a serious headache. With the projected explosion in genomic data, it’s critical to have tools that can be used as easily by consumers and physicians as by experts in genetics.
The myriad potential applications of genomics—from choosing the prescription least likely to cause side effects to smart toilets that analyze microbiome health and disease biomarkers each time you flush—all demand a foundation of rapid, reliable analytical tools that don’t require an expert.
To tackle this challenge, we’ll need to harness the best analytical and coding minds, from those quants doing number crunching on Wall Street to the bright minds creating sleek online games and mobile apps.
Clinicians
Most human genomes sequenced so far have been for research use. Ten years from now, it’s likely that the center of gravity will have shifted to medical diagnosis and treatment, or even directly to consumers themselves, depending on how clinical guidelines evolve. The medical world needs to get ready, and fast.
Today, an average consumer can’t just go out and get her genome sequenced. Most opportunities available to consumers require a physician to prescribe a genome sequence—something doctors frequently refuse to do on the grounds that clinical benefit hasn’t been demonstrated. With demand for genome data continually increasing, physicians must be educated not only on potential uses of this data, but also on the concept that such data isn’t something most patients should be shielded from.
In the meantime, regulators must think hard about whether there’s really a need to always maintain a medical gatekeeper between a person and his genomic data. It seems clear that consumers will eventually have access to their own data. If we make that impossible in the United States, it’s likely that people will get the work done elsewhere. Would it really be in their best interest to rely on countries with lax guidelines? Couldn’t we support citizens better with safe, straightforward policies for genome sequencing right here at home?
Genetic counselors probably have the best background to meet this demand, so they need to be empowered in the medical community. These counselors should be able to order whole genome sequences or gene tests without having a medical doctor sign off. There’s a staggering shortage of genetic counselors, so we also need to make an investment in this field to attract more people to the career.
If we can meet these challenges, we’ll be in good shape when genomics outpaces every other source of data. But we have to get cracking right now.

About the Author

By Meredith Salisbury
