Attacks on genetic privacy via uploads to genealogical databases

Abstract

Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
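The abstract's contrast between phased and unphased matching rests on a simple rule. At a biallelic SNP, two unphased genotypes are compatible with sharing an allele identical by state unless they are opposite homozygotes (one person homozygous for each allele). The sketch below illustrates that rule. It assumes genotypes coded as 0/1/2 counts of the ALT allele and uses a hypothetical minimum run length `min_snps`; real services apply their own thresholds, often stated in centimorgans. It is not GEDmatch's matching code or the authors' attack code. The helper `implied_alleles` (also hypothetical) shows what an uploader could deduce from a reported match: at every site inside a reported segment where the upload is homozygous, the matched person must carry at least one copy of that allele.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed coding: number of ALT alleles at a biallelic SNP.
# 0 = homozygous REF, 1 = heterozygous, 2 = homozygous ALT.
VALID = {0, 1, 2}


def unphased_ibs_segments(
    g1: Sequence[int], g2: Sequence[int], min_snps: int = 3
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges covering runs of at least
    `min_snps` consecutive SNPs that contain no opposite homozygote.

    `min_snps` is a hypothetical threshold chosen for illustration only.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have the same length")
    if not (set(g1) | set(g2)) <= VALID:
        raise ValueError("genotypes must be coded 0, 1, or 2")
    segments: List[Tuple[int, int]] = []
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        # 0 vs 2: the two people share no allele here, so no IBS run spans it.
        if abs(a - b) == 2:
            if i - start >= min_snps:
                segments.append((start, i))
            start = i + 1
    # Close the final run at the end of the vector.
    if len(g1) - start >= min_snps:
        segments.append((start, len(g1)))
    return segments


def implied_alleles(
    upload: Sequence[int], segments: List[Tuple[int, int]]
) -> Dict[int, str]:
    """Hypothetical helper: map each site inside a reported segment where the
    upload is homozygous to the allele the matched person must carry."""
    implied: Dict[int, str] = {}
    for start, end in segments:
        for i in range(start, end):
            if upload[i] == 0:
                implied[i] = "REF"  # a match rules out a homozygous ALT (2)
            elif upload[i] == 2:
                implied[i] = "ALT"  # a match rules out a homozygous REF (0)
    return implied


if __name__ == "__main__":
    upload = [1, 1, 0, 1, 2, 1]
    target = [0, 2, 1, 2, 2, 0]
    segs = unphased_ibs_segments(upload, target, min_snps=3)
    print(segs)                           # [(0, 6)]
    print(implied_alleles(upload, segs))  # {2: 'REF', 4: 'ALT'}
```

Because a heterozygous site can never form an opposite homozygote, heterozygous sites in an upload never break a segment under this rule. This helps explain why the abstract singles out databases that compare unphased genotypes. Counting run length in SNPs rather than genetic distance keeps the sketch short. A closer model would convert positions to centimorgans with a genetic map.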

Data availability

The dataset used here was assembled from publicly available datasets. The combined dataset has been deposited in Dryad at https://doi.org/10.25338/B8X619, and scripts for assembling and analyzing the data are available at https://github.com/mdedge/IBS_privacy.

The following previously published data sets were used:

1. 1000 Genomes Project Consortium (2012). 1000 Genomes Phase 3 data. 1000 Genomes Project. ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

2. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D, et al. (2019). Downloadable genotypes of present-day and ancient DNA data (compiled from published papers). Reich Lab, Harvard Medical School. https://reich.hms.harvard.edu/downloadable-genotypes-present-day-and-ancient-dna-data-compiled-published-papers

Article and author information

Author details

Michael D Edge

Center for Population Biology, University of California, Davis, Davis, United States

For correspondence
mdedge@ucdavis.edu

Competing interests
No competing interests declared.

ORCID iD: 0000-0001-8773-2906

Graham Coop

Center for Population Biology, University of California, Davis, Davis, United States

For correspondence
gmcoop@ucdavis.edu

Competing interests
Graham Coop, Reviewing editor, eLife.

ORCID iD: 0000-0001-8431-0302

Funding

National Institutes of Health (GM108779): Graham Coop

National Institutes of Health (GM130050): Michael D Edge

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.