Rapid re-identification of human samples using portable DNA sequencing

  1. Sophie Zaaijer (corresponding author)
  2. Assaf Gordon
  3. Daniel Speyer
  4. Robert Piccone
  5. Simon Cornelis Groen
  6. Yaniv Erlich (corresponding author)
  1. New York Genome Center, United States
  2. Columbia University, United States
  3. New York University, United States
6 figures, 1 table and 2 additional files

Figures

Figure 1
Schematic overview of MinION sketching.

A DNA sample is prepared for shotgun sequencing. Libraries are prepared either for 1D or 2D MinION sequencing (without and with hairpin, respectively). Variants observed in aligned MinION reads are …

https://doi.org/10.7554/eLife.27798.003
Figure 2 with 1 supplement
Re-identification of three DNA samples against a database with 31,000 individuals.

(A) A Frappe plot showing the population structure of the database with a collection of 31,000 genome-wide SNP arrays. (B–D) The match probability is inferred by comparing a MinION sketch to its …

https://doi.org/10.7554/eLife.27798.004
Figure 2—figure supplement 1
A prior representing a database larger than the world population still allows for identification power.

The match probability is inferred by comparing a MinION sketch of YE001 to its reference file as a function of the MinION sketching time. The prior probability for a match was modified as indicated.

https://doi.org/10.7554/eLife.27798.005
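
The effect of the prior on the match probability can be illustrated with a minimal Bayesian sketch. This is not the authors' model: the function name, the per-SNP error rate, and the chance that an unrelated individual agrees at a SNP are hypothetical values chosen for illustration only.

```python
import math

def match_posterior(prior, n_concordant, n_discordant,
                    error_rate=0.1, chance_match=0.6):
    """Posterior probability that a sketch and a reference come from the same person.

    Assumed toy model (hypothetical parameters, not taken from the paper):
    - if the samples match, each SNP agrees with probability 1 - error_rate;
    - if they do not match, each SNP agrees by chance with probability chance_match.
    """
    # Work with log odds so that many SNPs do not cause underflow.
    log_odds = math.log(prior) - math.log1p(-prior)
    log_odds += n_concordant * (math.log(1 - error_rate) - math.log(chance_match))
    log_odds += n_discordant * (math.log(error_rate) - math.log(1 - chance_match))
    # Convert log odds back to a probability in a numerically stable way.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Compare two hypothetical priors given the same evidence.
for prior in (1e-5, 1e-12):
    print(prior, match_posterior(prior, n_concordant=100, n_discordant=10))
```

Under these assumptions a smaller prior only delays the point at which the posterior approaches 1. More observed SNPs, which correspond to a longer sketching time, compensate for it.
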
Figure 3
Re-identification of HapMap sample NA12890.

The match probability is inferred by comparing a MinION sketch of NA12890 to the reference files of her own genome (red), her son’s genome (black), and her granddaughter’s genome (purple), as a …

https://doi.org/10.7554/eLife.27798.007
Figure 4 with 1 supplement
Cell line authentication.

Barcoded DNA from the THP1 cell line is mixed 1:1 with a random, barcoded sample. Analysis of only the THP1 reads was used to infer ‘pure’ matches, while analyses of the mixture were used to …

https://doi.org/10.7554/eLife.27798.008
Figure 4—figure supplement 1
Cell line authentication.

(A) 10,000 simulated runs of sketching SZ001 were matched against its reference file. The number of SNPs used to reach a 99.9% match is depicted in a histogram. (B) The number of mismatches …

https://doi.org/10.7554/eLife.27798.009
Figure 5 with 1 supplement
Contamination simulations.

Random reads from a run with DNA from THP1 cells and a random, barcoded sample (the contaminant) are mixed in the indicated proportions and shuffled. This simulated MinION sketch is matched against …

https://doi.org/10.7554/eLife.27798.010
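
The mixing step described in this caption can be sketched as follows. The function name, the read representation, and the fixed random seed are assumptions made for illustration, not the authors' pipeline.

```python
import random

def mix_reads(target_reads, contaminant_reads, contaminant_fraction, n_reads, seed=0):
    """Build a simulated sketch of n_reads with the given contaminant proportion."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    n_contaminant = round(n_reads * contaminant_fraction)
    n_target = n_reads - n_contaminant
    # Draw reads at random, without replacement, from each source.
    mixed = rng.sample(target_reads, n_target) + rng.sample(contaminant_reads, n_contaminant)
    # Shuffle so that the read order carries no information about the source.
    rng.shuffle(mixed)
    return mixed

# Hypothetical reads, labelled by source so the proportions can be checked.
thp1 = [("THP1", i) for i in range(1000)]
contaminant = [("contaminant", i) for i in range(1000)]
sketch = mix_reads(thp1, contaminant, contaminant_fraction=0.25, n_reads=100)
print(sum(1 for source, _ in sketch if source == "contaminant"))
```
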
Figure 5—figure supplement 1
Theoretical effect of differences in doubling time of contaminants in a cell culture.

We set the doubling time of our cell line of interest to 24 hr. We hypothesized that our culture (with a starting number of 10⁶ cells) would be contaminated with 10 foreign cells. We considered a …

https://doi.org/10.7554/eLife.27798.011
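
The arithmetic behind this scenario can be sketched under the assumption of unrestrained exponential growth. The 24 hr doubling time, the 10⁶ starting cells, and the 10 foreign cells come from the caption. The contaminant doubling time and the function name are hypothetical.

```python
def contaminant_fraction(hours, host_cells=1e6, contaminant_cells=10,
                         host_doubling_hr=24.0, contaminant_doubling_hr=18.0):
    """Fraction of the culture made up of contaminant cells after `hours`.

    Assumes both populations grow exponentially without limit;
    contaminant_doubling_hr=18.0 is an illustrative value only.
    """
    host = host_cells * 2 ** (hours / host_doubling_hr)
    contaminant = contaminant_cells * 2 ** (hours / contaminant_doubling_hr)
    return contaminant / (host + contaminant)

# Track the contaminant share over several weeks of culture.
for days in (0, 7, 14, 28):
    print(days, contaminant_fraction(days * 24))
```

With equal doubling times the contaminant share stays constant. It grows only when the contaminant divides faster than the cell line of interest.
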
Figure 6 with 1 supplement
Rapid library preparation.

(A) Schematic of the steps from sample to MinION sketch. The current method requires ~55 min until the MinION starts to generate reads. (B) The match probability is inferred by comparing a MinION …

https://doi.org/10.7554/eLife.27798.012
Figure 6—video 1
The movie depicts the rapid, on-site library preparation protocol using the Bento Lab (www.bento.bio) for DNA extraction and library preparation, prior to starting DNA sequencing as described in Figure 6.
https://doi.org/10.7554/eLife.27798.013

Tables

Table 1
List of databases consulted and restrictions to access.
https://doi.org/10.7554/eLife.27798.006
Database | Restrictions to access | Dataset URL
Opensnp.org | No | https://opensnp.org/
HapMap* | No. The HapMap dataset has been discontinued (https://www.ncbi.nlm.nih.gov/variation/news/NCBI_retiring_HapMap/), and the archived HapMap data are available via FTP from ftp://ftp.ncbi.nlm.nih.gov/hapmap/. The relevant files used for this study were downloaded from the latter in 2015. | http://www.completegenomics.com/documents/PublicGenomes.pdf and ftp://ftp.ncbi.nlm.nih.gov/hapmap/
DNA.land | Yes. The 29,554 genomes provided by DNA.land are not available for distribution, to ensure the genomic privacy of the individuals who donated their genomes to DNA.land. | https://dna.land/
CCLE | Yes. Public access is available by registration. The data made available on the Encyclopedia are for internal research purposes, as specified in the CCLE Terms of Access (https://portals.broadinstitute.org/ccle/about). The SNP and expression data from the Cancer Cell Line Encyclopedia (CCLE) are available on GEO under accession number GSE36139. | https://portals.broadinstitute.org/ccle/ and https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36139

Additional files

Supplementary file 1

Supplementary Tables.

Run statistics for the MinION sketch experiments.

https://doi.org/10.7554/eLife.27798.014
Transparent reporting form
https://doi.org/10.7554/eLife.27798.015