Attacks on genetic privacy via uploads to genealogical databases
Abstract
Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, who are identified as people whose genomes share identical-by-state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
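Why unphased IBS detection is exploitable can be seen with a toy model. The sketch below assumes one simple rule for calling IBS from unphased genotypes: genotypes are coded as alternate-allele counts (0, 1, 2), and two people are treated as IBS at a site unless they are opposite homozygotes (0 vs 2). This rule, the function name `ibs_segments`, and the `min_snps` threshold are illustrative assumptions, not the matching algorithm of GEDmatch or any other service (real services typically also impose minimum segment lengths in centimorgans).

```python
"""Toy sketch: IBS segment calling on unphased genotypes.

Assumption (hypothetical): genotypes are alternate-allele counts (0, 1, 2),
and a site is IBS unless the two people are opposite homozygotes (0 vs 2).
"""
from typing import List, Tuple


def ibs_segments(a: List[int], b: List[int], min_snps: int) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges of runs of at least `min_snps`
    consecutive sites with no opposite-homozygote mismatch."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must have equal length")
    segments = []
    start = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) == 2:  # opposite homozygotes break the IBS run
            if i - start >= min_snps:
                segments.append((start, i))
            start = i + 1
    if len(a) - start >= min_snps:
        segments.append((start, len(a)))
    return segments


target = [0, 2, 1, 0, 2, 2, 0, 1]

# A falsified query that is heterozygous (1) everywhere can never be an
# opposite homozygote, so under this rule it "matches" any target end to end.
all_het = [1] * len(target)
print(ibs_segments(all_het, target, min_snps=5))  # [(0, 8)]

# Making the query homozygous at one site turns the match into a probe:
# the reported segment splits there only if the target carries the
# opposite homozygote (here target[4] == 2).
probe = all_het.copy()
probe[4] = 0
print(ibs_segments(probe, target, min_snps=3))  # [(0, 4), (5, 8)]
```

In this toy model, where a reported segment ends leaks the target's genotype at the boundary site. That is the kind of information a set of falsified uploads against an unphased matcher could accumulate.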
Data availability
The dataset used here was assembled from publicly available datasets. The combined dataset has been deposited in Dryad at https://doi.org/10.25338/B8X619, and scripts for assembling and analyzing the data are available at https://github.com/mdedge/IBS_privacy.
- 1000 Genomes Project. 1000 Genomes Phase 3 data.
Article and author information
Funding
National Institutes of Health (GM108779)
- Graham Coop
National Institutes of Health (GM130050)
- Michael D Edge
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Magnus Nordborg, Austrian Academy of Sciences, Austria
Publication history
- Received: September 12, 2019
- Accepted: December 23, 2019
- Accepted Manuscript published: January 7, 2020 (version 1)
- Version of Record published: January 30, 2020 (version 2)
Copyright
© 2020, Edge & Coop
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.