Attacks on genetic privacy via uploads to genealogical databases

  1. Michael D Edge (corresponding author)
  2. Graham Coop (corresponding author)

  University of California, Davis, United States

Abstract

Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
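
To make the idea of genome coverage concrete, here is a minimal sketch of how one might pool the matching segments reported for several uploads and compute the fraction of a chromosome that at least one segment covers. It is an illustration under stated assumptions, not the authors' method or code: the function names, the (start, end) interval representation, and the toy coordinates are all hypothetical and are not data from the study.

```python
from typing import List, Tuple

# Hypothetical representation: one reported matching segment on a single
# chromosome, as (start, end) in arbitrary position units.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(segments: List[Segment], chrom_length: float) -> float:
    """Fraction of the chromosome covered by at least one segment."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / chrom_length


# Toy, made-up segments pooled across several uploads (assumption for
# illustration only).
toy_segments = [(0.0, 10.0), (5.0, 20.0), (40.0, 55.0), (90.0, 100.0)]
fraction = covered_fraction(toy_segments, chrom_length=100.0)
print(f"covered fraction: {fraction:.2f}")
```

Merging before summing avoids double-counting regions matched by more than one upload, so the result reflects the part of the chromosome covered at least once.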

Data availability

The dataset used here was assembled from publicly available datasets. The combined dataset has been deposited in Dryad at https://doi.org/10.25338/B8X619, and scripts for assembling and analyzing the data are available at https://github.com/mdedge/IBS_privacy.

The following previously published data sets were used

Article and author information

Author details

  1. Michael D Edge

    Center for Population Biology, University of California, Davis, Davis, United States
    For correspondence
    mdedge@ucdavis.edu
    Competing interests
    No competing interests declared.
    ORCID: 0000-0001-8773-2906
  2. Graham Coop

    Center for Population Biology, University of California, Davis, Davis, United States
    For correspondence
    gmcoop@ucdavis.edu
    Competing interests
    Graham Coop, Reviewing editor, eLife.
    ORCID: 0000-0001-8431-0302

Funding

National Institutes of Health (GM108779)

  • Graham Coop

National Institutes of Health (GM130050)

  • Michael D Edge

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2020, Edge & Coop

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Michael D Edge, Graham Coop (2020) Attacks on genetic privacy via uploads to genealogical databases. eLife 9:e51810. https://doi.org/10.7554/eLife.51810
