Attacks on genetic privacy via uploads to genealogical databases

Authors: Michael D Edge (corresponding author); Graham Coop (corresponding author)
Affiliations: University of California, Davis, United States; University of Southern California, United States
5 figures, 3 tables and 1 additional file

Figures

Figure 1
Schematics of the IBS tiling and IBS probing procedures.

(A) In IBS tiling, multiple genotypes are uploaded (green lines) and the positions at which they are IBS with the target (represented by blue lines) are recorded. Once enough datasets have been …
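
The bookkeeping behind IBS tiling amounts to taking the union of the intervals at which the uploads are reported IBS with one target haplotype, then measuring its total length. The Python sketch below is illustrative only. The (start_bp, end_bp, length_cM) tuples, the function name tiled_length, and the toy values are assumptions rather than the authors' code, and the min_cm filter stands in for a database's minimum reported segment length.

```python
from typing import List, Tuple

# Hypothetical representation: (start_bp, end_bp, length_cM) for one IBS segment
# between an upload and one haplotype of the target, on a single chromosome.
Segment = Tuple[int, int, float]


def tiled_length(segments: List[Segment], min_cm: float = 0.0) -> int:
    """Base pairs covered by the union of reported IBS segments of at least min_cm."""
    # Keep only segments long enough to be reported, then sort by start position.
    kept = sorted((start, end) for start, end, cm in segments if cm >= min_cm)
    total = 0
    cur_start, cur_end = None, None
    for start, end in kept:
        if cur_end is None or start > cur_end:
            # Disjoint from the current tile: bank it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping tiles are merged so shared stretches count once.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Toy segments from three uploads IBS with the same target haplotype.
segments = [(0, 10_000, 1.0), (5_000, 20_000, 5.0), (30_000, 40_000, 8.0)]
print(tiled_length(segments))              # 30000
print(tiled_length(segments, min_cm=4.0))  # 25000
```

Merging overlaps before summing is what keeps redundant uploads from inflating the covered length.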

Figure 2 with 5 supplements
Length of genome, in gigabase pairs (Gbp), covered by IBS tiling as a function of the minimum required length of IBS segments in centiMorgans (cM) and the size of a randomly selected comparison sample, for the median person in our dataset.

The top-left panel shows the average coverage across the person’s two haplotypes. The top-right panel shows IBS2 coverage, the length of genome where both haplotypes are covered by IBS tiles. The …
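
IBS2 coverage as defined in this caption is the overlap between the two haplotypes' tiled regions. A minimal sketch, assuming each haplotype's tiles have already been merged into a sorted list of non-overlapping half-open [start, end) intervals in bp; the helper names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp


def total_length(tiles: List[Interval]) -> int:
    return sum(end - start for start, end in tiles)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlap of two sorted, non-overlapping tile lists (two-pointer sweep)."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever tile ends first; it cannot overlap anything further on.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def coverage_summary(hap1: List[Interval], hap2: List[Interval]) -> Tuple[float, int]:
    """(average per-haplotype coverage, IBS2 coverage), both in bp."""
    average = (total_length(hap1) + total_length(hap2)) / 2
    ibs2 = total_length(intersect(hap1, hap2))
    return average, ibs2


hap1 = [(0, 10_000), (20_000, 30_000)]
hap2 = [(5_000, 25_000)]
print(coverage_summary(hap1, hap2))  # (20000.0, 10000)
```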

Figure 2—figure supplement 1
Tiling performance with IBS segments that are unlikely to be IBD filtered out.

Conventions are the same as in Figure 2; the difference is that now only IBS segments that represent likely IBD (LOD score >3) are included. As expected, the amount of tiling possible is reduced …

Figure 2—figure supplement 1—source data 1

Tiling performance with IBS segments that are unlikely to be IBD filtered out.

https://cdn.elifesciences.org/articles/51810/elife-51810-fig2-figsupp1-data1-v2.csv

Figure 2—figure supplement 2
IBS tiling performance, limiting to comparison samples who share at least 1 IBS segment of 8 cM or more with the target.

Conventions are the same as in Figure 2. Some direct-to-consumer (DTC) genetics companies use a two-step approach for reporting IBS information to users. For example, at this writing, MyHeritage identifies people who …
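
One way to model a two-step reporting rule is to gate on a single long segment and then show shorter segments only for the individuals who pass. The default thresholds below come from the MyHeritage row of Table 1 (at least one 8 cM block to be listed; segments reported down to 6 cM). That the two steps combine exactly this way is an assumption, and the function name and input format are hypothetical.

```python
from typing import Dict, List


def two_step_report(segments_by_sample: Dict[str, List[float]],
                    gate_cm: float = 8.0,
                    report_cm: float = 6.0) -> Dict[str, List[float]]:
    """Segment lengths (cM) a user would see, per comparison sample (hypothetical rule)."""
    report: Dict[str, List[float]] = {}
    for sample, lengths in segments_by_sample.items():
        # Step 1: a sample is listed only if it shares one segment of at least gate_cm.
        if any(cm >= gate_cm for cm in lengths):
            # Step 2: for listed samples, every segment of at least report_cm is shown.
            report[sample] = [cm for cm in lengths if cm >= report_cm]
    return report


shared = {"s1": [9.0, 6.5, 3.0], "s2": [7.0, 6.0], "s3": [8.0]}
print(two_step_report(shared))  # {'s1': [9.0, 6.5], 's3': [8.0]}
```

Under this reading, segments between 6 and 8 cM (as for s2) stay hidden unless the same sample also shares a segment of at least 8 cM.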

Figure 2—figure supplement 2—source data 1

IBS tiling performance, limiting to comparison samples who share at least 1 IBS segment of 8 cM or more with the target.

https://cdn.elifesciences.org/articles/51810/elife-51810-fig2-figsupp2-data1-v2.csv

Figure 2—figure supplement 3
IBS tiling performance when genotype phasing switches are disallowed.

Conventions are the same as in Figure 2. We called IBS segments with Germline (Gusev et al., 2009), using the haploid flag to find IBS segments within the phased chromosomes produced by Beagle. …
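
The difference between unphased IBS detection and matching within phased haplotypes can be illustrated with two matching rules. Without phase, a stretch counts as IBS as long as the two people are never opposite homozygotes, which lets a single run switch between the target's two haplotypes. Along phased haplotypes, alleles must match exactly. This brute-force sketch is not Germline's actual algorithm, and the 0/1/2 genotype coding and function names are assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def compatible_runs(n: int, ok: Callable[[int], bool], min_sites: int = 1) -> List[Tuple[int, int]]:
    """Maximal half-open runs [start, end) of sites where ok(i) holds, of at least min_sites."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i in range(n + 1):
        if i < n and ok(i):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_sites:
                runs.append((start, i))
            start = None
    return runs


def unphased_ibs(g1: Sequence[int], g2: Sequence[int], min_sites: int = 1) -> List[Tuple[int, int]]:
    # Genotypes are allele counts (0, 1, 2); only opposite homozygotes (0 vs 2) break a run.
    return compatible_runs(len(g1), lambda i: abs(g1[i] - g2[i]) < 2, min_sites)


def haploid_ibs(h1: Sequence[int], h2: Sequence[int], min_sites: int = 1) -> List[Tuple[int, int]]:
    # Phased haplotypes carry one allele (0 or 1) per site; any mismatch breaks a run.
    return compatible_runs(len(h1), lambda i: h1[i] == h2[i], min_sites)


target_a = [0, 0, 0, 0, 0, 0]
target_b = [1, 1, 1, 1, 1, 1]
target_genotype = [a + b for a, b in zip(target_a, target_b)]  # heterozygous everywhere
upload_hap = [0, 0, 0, 1, 1, 1]  # follows target_a, then switches to target_b
upload_genotype = [2 * x for x in upload_hap]  # homozygous for that haplotype

print(unphased_ibs(target_genotype, upload_genotype))  # [(0, 6)]
print(haploid_ibs(target_a, upload_hap))  # [(0, 3)]
print(haploid_ibs(target_b, upload_hap))  # [(3, 6)]
```

In the toy example the unphased rule reports one run spanning a switch between the target's haplotypes, while the haploid rule splits it in two.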

Figure 2—figure supplement 3—source data 1

IBS tiling performance when genotype phasing switches are disallowed.

https://cdn.elifesciences.org/articles/51810/elife-51810-fig2-figsupp3-data1-v2.csv

Figure 2—figure supplement 4
IBS tiling performance in selected populations.

We examined IBS tiling performance in four European populations from the 1000 Genomes data: Finnish in Finland (FIN, top left, 99 people), British in England and Scotland (GBR, top right, 91 people), …

Figure 2—figure supplement 4—source data 1

IBS tiling performance in selected populations.

https://cdn.elifesciences.org/articles/51810/elife-51810-fig2-figsupp4-data1-v2.csv

Figure 2—figure supplement 5
IBS tiling performance in terms of the total number of alleles covered (left panel) and the number of minor alleles covered (right panel; 18.6% of all alleles were minor alleles).

We used Germline in haploid mode (as in Figure 2—figure supplement 3) because it makes it easier to identify which allele is covered by a given IBS segment. Dashed lines show the results in terms of …

Figure 2—figure supplement 5—source data 1

IBS tiling performance in terms of the total number of alleles covered (left panel) and the number of minor alleles covered.

https://cdn.elifesciences.org/articles/51810/elife-51810-fig2-figsupp5-data1-v2.csv

Figure 3 with 2 supplements
A demonstration of the IBS probing method around position 45411941 on chromosome 19 (GRCh37 coordinates), in the APOE locus.

We show the proportion of haplotypes among the 872 Europeans in our sample that are covered IBS by probes constructed from the sample, as a function of the chromosomal location in a 10-Mb region around the …
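
The quantity plotted here can be computed position by position: the fraction of sample haplotypes covered by an IBS segment shared with at least one probe. A sketch under stated assumptions: each haplotype's probe-derived segments are half-open [start, end) intervals in bp, and the function name and toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def probe_coverage(haplotype_segments: Sequence[List[Tuple[int, int]]],
                   positions: Sequence[int]) -> List[float]:
    """Fraction of haplotypes covered, at each position, by some probe-derived IBS segment."""
    n = len(haplotype_segments)
    fractions = []
    for pos in positions:
        hits = sum(1 for segs in haplotype_segments
                   if any(start <= pos < end for start, end in segs))
        fractions.append(hits / n)
    return fractions


# Three toy haplotypes: two share IBS with some probe, one with none.
haps = [[(0, 100)], [(50, 150)], []]
print(probe_coverage(haps, [10, 75, 120]))
```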

Figure 3—figure supplement 1
IBS probing including only segments with LOD > 3.

A demonstration of the IBS probing method around position 45411941 on chromosome 19 (GRCh37 coordinates), in the APOE locus. Conventions are the same as in Figure 3; the difference is that only IBS …

Figure 3—figure supplement 2
IBS probing using Germline (Gusev et al., 2009) in haploid mode.

A demonstration of the IBS probing method around position 45411941 on chromosome 19 (GRCh37 coordinates), in the APOE locus. Conventions are the same as in Figure 3; the difference is that IBS …

Figure 4
Schematics of the IBS baiting procedure.

(A) To perform IBS baiting at a single site, two uploads are required, each with runs of heterozygous genotypes flanking the key site. At the key site, the two uploaded datasets are homozygous for …
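
The logic of IBS baiting at one key site can be simulated directly. Because the caption is cut off before naming the key-site alleles, the sketch assumes the two baits are homozygous for opposite alleles there, and that the database uses unphased IBS detection in which a segment is broken only by opposite homozygotes (Table 2 lists phase-aware IBS detection as a countermeasure to baiting, which fits this reading). Heterozygous flanks never conflict with any target, so whichever bait(s) a target matches across the region reveal its genotype at the key site. Names and the 0/1/2 coding are hypothetical, and in practice the flanks must be long enough for the match to exceed the database's reporting threshold.

```python
from typing import List, Optional, Tuple

# Genotypes coded as counts of one allele at each SNP: 0, 1 (heterozygous), or 2.


def make_baits(n_sites: int, key: int) -> Tuple[List[int], List[int]]:
    """Two hypothetical bait uploads: heterozygous everywhere except the key SNP."""
    bait_a = [1] * n_sites
    bait_b = [1] * n_sites
    # Assumption: the baits are homozygous for opposite alleles at the key SNP.
    bait_a[key] = 0
    bait_b[key] = 2
    return bait_a, bait_b


def unphased_match(target: List[int], bait: List[int]) -> bool:
    """True if no SNP has opposite homozygotes, i.e. one unbroken IBS run over the region."""
    return all(abs(t - b) < 2 for t, b in zip(target, bait))


def decode_key_genotype(target: List[int], key: int) -> Optional[int]:
    bait_a, bait_b = make_baits(len(target), key)
    hit_a = unphased_match(target, bait_a)
    hit_b = unphased_match(target, bait_b)
    if hit_a and hit_b:
        return 1  # a heterozygous target is compatible with both baits
    if hit_a:
        return 0
    if hit_b:
        return 2
    return None  # not reachable for valid 0/1/2 genotypes


print(decode_key_genotype([0, 2, 2, 0, 1], key=2))  # 2
print(decode_key_genotype([0, 2, 1, 0, 1], key=2))  # 1
print(decode_key_genotype([2, 2, 0, 0, 1], key=2))  # 0
```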

Figure 5
Visualization of IBS baiting using GEDmatch’s 1-to-1 chromosome browser.

Left: Zoomed-in view of the region containing key SNP 1, showing the three target kits (T1–T3) matched to the two bait kits (B1 and B2). Right: Zoomed-out views of regions containing all four key …

Tables

Table 1
Key parameters for several genetic genealogy services that allow user uploads as of July 26th, 2019.
Service | Database size (millions) | Individuals shown | IBS/IBD segments reported
--- | --- | --- | ---
GEDmatch | 1.2 | 3000 closest matches shown free; unlimited with $10/month license; any two kits can be searched against each other | Yes, if longer than user-set threshold. Min. threshold 0.1 cM, default 7 cM
FamilyTreeDNA | 1* | All that share at least one 9 cM block or one 7.69 cM block and 20 total cM | Yes, down to 1 cM, for $19 per kit
MyHeritage | 3 | All that share at least one 8 cM block | Yes, down to 6 cM, for $29 per kit or unlimited for $129/year. Customers may opt out
LivingDNA | Unknown | Putative relatives out to about 4th-cousin range | Only sum length of matching segments reported
DNA.LAND** | 0.159 | Top 50 matches shown with minimum 3 cM segment | Yes

*Although Regalado (2019) reports that FamilyTreeDNA has two million users, he also suggests that only about half of these are genotyped at genome-wide autosomal SNPs, which is in line with other estimates (Larkin, 2018).
**DNA.LAND has discontinued genealogical matching for uploaded samples as of July 26th, 2019.
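
The display rules in Table 1 can be written as predicates on the lengths of the segments a pair shares. The sketch encodes the FamilyTreeDNA, MyHeritage, and GEDmatch thresholds as listed. Reading the FamilyTreeDNA rule as "(longest >= 9 cM) or (longest >= 7.69 cM and total >= 20 cM)", treating thresholds as inclusive, and summing all listed segments for the total are assumptions; the function names are hypothetical.

```python
from typing import List


def shown_by_familytreedna(seg_cm: List[float]) -> bool:
    # Assumed reading: one block >= 9 cM, or one block >= 7.69 cM plus >= 20 cM in total.
    longest = max(seg_cm, default=0.0)
    return longest >= 9.0 or (longest >= 7.69 and sum(seg_cm) >= 20.0)


def shown_by_myheritage(seg_cm: List[float]) -> bool:
    # At least one block of 8 cM or more.
    return max(seg_cm, default=0.0) >= 8.0


def reported_by_gedmatch(seg_cm: List[float], threshold_cm: float = 7.0) -> List[float]:
    # User-set threshold: default 7 cM, minimum 0.1 cM.
    if threshold_cm < 0.1:
        raise ValueError("GEDmatch's minimum threshold is 0.1 cM")
    return [cm for cm in seg_cm if cm >= threshold_cm]


print(shown_by_familytreedna([8.0, 7.0, 6.0]))  # True: 8 >= 7.69 and 21 cM in total
print(shown_by_familytreedna([8.0, 5.0]))       # False
print(reported_by_gedmatch([10.0, 6.9, 7.0], threshold_cm=7.0))  # [10.0, 7.0]
```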

Table 2
Potential countermeasures against the methods of attack outlined here, with their likely effectiveness against IBS tiling, IBS probing, and IBS baiting.
Strategy | Prevents IBS tiling | Prevents IBS probing | Prevents IBS baiting
--- | --- | --- | ---
Require cryptographic signature from genotyping service | Yes | Yes | Yes
Restrict reporting of IBS to long segments (e.g. >8 cM) | Partially | Partially | Weakly
Report number and lengths of IBS segments but not locations | Yes | No | Partially
Block homozygous uploads | Partially | No | No
Report small number of matching individuals per kit | Partially | Partially | Partially
Disallow matching between arbitrary kits | Partially | Partially | Partially
Block uploads of publicly available genomes | Partially | No | No
Block uploads with evidence of IBS-inert segments | No | Yes | No
Block uploads with long runs of heterozygosity | No | No | Partially
Use phase-aware methods for IBS detection | No | No | Yes

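
Two of the screening countermeasures in Table 2, blocking homozygous uploads and blocking uploads with long runs of heterozygosity, can be sketched as checks on the uploaded genotype vector. The thresholds below are hypothetical placeholders, not tuned or validated values, and the 0/1/2 coding and function names are assumptions; a real filter would need calibrating against the heterozygous run lengths seen in genuine genotype data.

```python
from typing import List, Sequence


def longest_het_run(genotypes: Sequence[int]) -> int:
    """Length of the longest run of consecutive heterozygous (1) genotypes."""
    best = current = 0
    for g in genotypes:
        current = current + 1 if g == 1 else 0
        best = max(best, current)
    return best


def screen_upload(genotypes: Sequence[int],
                  max_het_run: int = 50,
                  max_hom_fraction: float = 0.99) -> List[str]:
    """Reasons to reject an upload; both thresholds are hypothetical placeholders."""
    if not genotypes:
        raise ValueError("empty upload")
    reasons = []
    if longest_het_run(genotypes) > max_het_run:
        reasons.append("long run of heterozygous genotypes")
    hom_fraction = sum(1 for g in genotypes if g != 1) / len(genotypes)
    if hom_fraction > max_hom_fraction:
        reasons.append("nearly all genotypes homozygous")
    return reasons


print(screen_upload([1] * 60 + [0] * 40))  # ['long run of heterozygous genotypes']
print(screen_upload([0, 2] * 50))          # ['nearly all genotypes homozygous']
print(screen_upload([0, 1, 2, 1] * 25))    # []
```
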
Table 3
Summary of the SNPs targeted by baiting and the IBS returned by GEDmatch.

For each region, we give the position of the key SNP (target bp). Because by design our bait kits are genetically identical outside of the target SNPs, the IBS regions returned by GEDmatch’s 1-to-1 …

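Matching pairs | Target 1 | Target 2 | Target 3 | Target 4
--- | --- | --- | --- | ---
target bp | 27613130 | 34024097 | 37673781 | 42008068
T1-(B1 Bmiss) | | | |
IBS L bp | 27427698 | 33771672 | 37519864 | 40054428
IBS R bp | 27680780 | 34328741 | 37827711 | 43112674
IBS cM | 1.3 | 0.8 | 1.1 | 1.2
No. of SNPs | 47 | 45 | 42 | 40
No. of SNPs Bmiss | 46 | 44 | 41 | 39
T2-(B1 B2 Bmiss) | | | |
IBS L bp | 27433179 | 33771672 | 37508507 | 40357667
IBS R bp | 27680780 | 34328741 | 37827711 | 43112674
IBS cM | 1.3 | 0.8 | 1.2 | 0.9
No. of SNPs | 45 | 45 | 45 | 32
No. of SNPs Bmiss | 44 | 44 | 44 | 31
T3-(B3 Bmiss) | | | |
IBS L bp | 27433179 | 33771672 | 37519864 | 40357667
IBS R bp | 27680780 | 34328741 | 37827711 | 43112674
IBS cM | 1.3 | 0.8 | 1.1 | 0.9
No. of SNPs | 45 | 45 | 45 | 32
No. of SNPs Bmiss | 44 | 44 | 41 | 31
Tmiss-(All Baits) | | | |
IBS L bp | 27433179 | 33771672 | 37519864 | 40357667
IBS R bp | 27680780 | 34328741 | 37827711 | 43112674
IBS cM | 1.3 | 0.8 | 1.1 | 0.9
No. of SNPs | 44 | 44 | 44 | 31
No. of SNPs Bmiss | 44 | 44 | 44 | 31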
