Abstract
Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people whose genomes share identical-by-state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
Article and author information
Funding
National Institutes of Health (GM108779)
- Graham Coop
National Institutes of Health (GM130050)
- Michael D Edge
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Magnus Nordborg, Austrian Academy of Sciences, Austria
Publication history
- Received: September 12, 2019
- Accepted: December 23, 2019
- Accepted Manuscript published: January 7, 2020 (version 1)
- Version of Record published: January 30, 2020 (version 2)
Copyright
© 2020, Edge & Coop
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.