
Attacks on genetic privacy via uploads to genealogical databases

  1. Michael D Edge (corresponding author)
  2. Graham Coop (corresponding author)
  1. University of California, Davis, United States
Research Article
  • Cited 5
  • Views 3,633
Cite this article as: eLife 2020;9:e51810 doi: 10.7554/eLife.51810

Abstract

Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
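The abstract notes that some databases detect IBS segments from unphased genotypes. The sketch below assumes a common rule for this, which the abstract does not attribute to any particular database. Under that rule, two people can share an allele IBS at a SNP unless they are opposite homozygotes (one carries 0 copies of the alternate allele, the other 2), and any run of compatible SNPs that is long enough counts as a match. The names `unphased_ibs_segments`, `bait_probe`, and `min_snps` are hypothetical, and segment length is measured in SNP counts rather than genetic-map units. The example shows why falsified uploads are powerful. A genome that is heterozygous everywhere can never be an opposite homozygote, so it "matches" anyone. Inserting a single homozygous site then turns the match result into a yes/no readout of the target's genotype at that site.

```python
from typing import List, Tuple

# Genotypes are coded as the count of the alternate allele at each SNP: 0, 1, or 2.
Genotypes = List[int]


def unphased_ibs_segments(a: Genotypes, b: Genotypes, min_snps: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges of runs with no opposite homozygotes.

    Assumed rule (a simplified sketch, not any specific database's algorithm):
    two unphased genotypes are compatible with sharing one allele IBS unless one
    is 0 and the other is 2. A run of at least `min_snps` compatible SNPs is
    reported as a segment.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have equal length")
    segments = []
    start = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) == 2:  # opposite homozygotes break the run
            if i - start >= min_snps:
                segments.append((start, i))
            start = i + 1
    if len(a) - start >= min_snps:
        segments.append((start, len(a)))
    return segments


def bait_probe(target: Genotypes, site: int, allele_hom: int) -> bool:
    """Hypothetical probe: an all-heterozygous bait except one homozygous site.

    Returns True if the bait matches the target over the whole array. Under the
    assumed rule this happens unless the target is the opposite homozygote at `site`.
    """
    bait = [1] * len(target)
    bait[site] = allele_hom
    return unphased_ibs_segments(bait, target, min_snps=len(target)) == [(0, len(target))]


if __name__ == "__main__":
    target = [0, 2, 1, 0, 2, 1, 1, 0]
    # An all-heterozygous upload is compatible with every target at every site.
    print(unphased_ibs_segments([1] * len(target), target, min_snps=5))  # [(0, 8)]
    # Probing site 1 with a homozygous-0 bait: the target is 2 there, so no match.
    print(bait_probe(target, site=1, allele_hom=0))  # False
    # Probing site 1 with a homozygous-2 bait: compatible, so the match persists.
    print(bait_probe(target, site=1, allele_hom=2))  # True
```

Real services measure segments in centimorgans and may tolerate genotyping errors. Under those conditions, a single probe would not read out a genotype this cleanly, but the toy rule captures why unphased matching leaks information.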

Article and author information

Author details

  1. Michael D Edge

    Center for Population Biology, University of California, Davis, Davis, United States
    For correspondence
    mdedge@ucdavis.edu
    Competing interests
    No competing interests declared.
    ORCID: 0000-0001-8773-2906
  2. Graham Coop

    Center for Population Biology, University of California, Davis, Davis, United States
    For correspondence
    gmcoop@ucdavis.edu
    Competing interests
    Graham Coop, Reviewing editor, eLife.
    ORCID: 0000-0001-8431-0302

Funding

National Institutes of Health (GM108779)

  • Graham Coop

National Institutes of Health (GM130050)

  • Michael D Edge

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Magnus Nordborg, Austrian Academy of Sciences, Austria

Publication history

  1. Received: September 12, 2019
  2. Accepted: December 23, 2019
  3. Accepted Manuscript published: January 7, 2020 (version 1)
  4. Version of Record published: January 30, 2020 (version 2)

Copyright

© 2020, Edge & Coop

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.


