Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical-by-state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
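To illustrate how coverage can accumulate across multiple uploads, the following is a minimal sketch and not the authors' implementation. It assumes that each uploaded genome yields a list of segments reported as IBS with the target, and it computes the fraction of the genome covered by the union of those segments. The function names, the segment coordinates, and the genome length of 100 units are all hypothetical and chosen only for illustration.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(segments, genome_length):
    """Fraction of a genome of the given length covered by the union of segments."""
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / genome_length


# Hypothetical IBS segments (arbitrary position units) reported between
# the target and each of three uploaded genomes.
matches_per_upload = [
    [(0, 30), (80, 100)],
    [(20, 50)],
    [(45, 60), (90, 100)],
]

# Pool the segments from every upload, then measure the combined coverage.
all_segments = [seg for upload in matches_per_upload for seg in upload]
fraction = covered_fraction(all_segments, genome_length=100)
print(f"Covered fraction: {fraction:.2f}")
```

In this toy example no single upload covers more than half of the genome, but the pooled segments cover most of it, which is the intuition behind combining many uploads.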
The dataset used here was assembled from publicly available datasets. The combined dataset has been deposited in Dryad at https://doi.org/10.25338/B8X619, and scripts for assembling and analyzing the data are available at https://github.com/mdedge/IBS_privacy.
1000 Genomes Phase 3 data (1000 Genomes Project).
Downloadable genotypes of present-day and ancient DNA data (compiled from published papers), Reich Lab, Harvard Medical School.
- Graham Coop
- Michael D Edge
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- Magnus Nordborg, Austrian Academy of Sciences, Austria
© 2020, Edge & Coop
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.