A highly accurate platform for clone-specific mutation discovery enables the study of active mutational processes

  1. Mohammad KaramiNejadRanjbar
  2. Sahand Sharifzadeh
  3. Nina C Wietek
  4. Mara Artibani
  5. Salma El-Sahhar
  6. Tatjana Sauka-Spengler
  7. Christopher Yau
  8. Volker Tresp
  9. Ahmed A Ahmed (corresponding author)
  1. Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  2. Nuffield Department of Women's & Reproductive Health, University of Oxford, United Kingdom
  3. Ludwig Maximilian University of Munich, Germany
  4. Radcliffe Department of Medicine, University of Oxford, United Kingdom
  5. Institute of Cancer and Genomic Sciences, University of Birmingham, United Kingdom
  6. Siemens AG, Corporate Technology, Germany
3 figures, 1 video, 2 tables and 1 additional file

Figures

Figure 1 with 3 supplements
DigiPico sequencing rationale, workflow, and performance.

(A) WGS approaches can only identify early mutational processes (EM) in dominant expanded clones in a tumor (red and blue). Currently active mutational processes (CM) result in a diverse set of …

Figure 1—source data 1

Raw values for the whole genome amplification monitoring of runs D1110 and D1111.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig1-data1-v2.xlsx
Figure 1—source data 2

Raw relative fluorescent values for miniaturized whole genome amplification reactions over time.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig1-data2-v2.xlsx
Figure 1—source data 3

Per well mapping rate, breadth and depth of coverage for runs D1110 and D1111.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig1-data3-v2.xlsx
Figure 1—source data 4

Sequence of oligonucleotides used in DigiPico library preparation.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig1-data4-v2.xlsx
Figure 1—figure supplement 1
Challenges in identifying recent mutations.

While old mutations can easily be studied from bulk sequencing data of the tumor, the study of recent mutations from such data is hampered because of the low variant allele fraction (VAF) of the …

Figure 1—figure supplement 2
DigiPico library preparation workflow.

The exact composition of all master mixes is explained in the methods section. All liquid handling steps were performed using a Mosquito liquid handler (Video 1).

Figure 1—figure supplement 3
Analysis workflow for DigiPico data.

(Turajlic et al., 2019) Next generation sequencing reads from normal tissue, bulk of the tumor, and DigiPico library are first mapped to the human genome to generate BAM files. DigiPico reads are …

Figure 2 with 5 supplements
MutLX algorithm, design and results.

(A) Comparing the number of wells supporting various mutation types in run D1110 confirms that, as hypothesized, the majority of UTDs are present in only a few wells. Horizontal lines represent …

Figure 2—source data 1

Comparison of MutLX analysis with Platypus and SCcaller.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig2-data1-v2.xlsx
Figure 2—source data 2

Targeted sequencing of some of the clone-specific variants identified in run D1111.

Amplicon sequencing of the target sites was performed on the MiSeq platform. Three of the 14 targets appeared to have high noise levels in the blood sample (highlighted in orange) and were therefore deemed inconclusive. Of the remaining 11 mutations, only one had no evidence in the bulk DNA sample of the PT2R tumor (highlighted in blue). VAWF: Variant Allele Well Fraction.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig2-data2-v2.docx
Figure 2—source data 3

Targeted sequencing of some of the artefactual variants identified in run DE111.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig2-data3-v2.docx
Figure 2—source data 4

Number of different mutation types in various sequencing runs.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig2-data4-v2.xlsx
Figure 2—source data 5

Primer pairs used for targeted validation of UTD variants.

https://cdn.elifesciences.org/articles/55207/elife-55207-fig2-data5-v2.xlsx
Figure 2—figure supplement 1
MutLX analysis algorithm.

(Turajlic et al., 2019) UTD variants are identified by subtracting WGS data from DigiPico data. (Zhang et al., 2018) UTDs and SNPs are used as training sets to train a primary binary classification …
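
The subtraction step named in this caption can be pictured as a set difference. The following is a minimal sketch and not the MutLX implementation: the `Variant` tuple, the function name `find_utds`, and the toy coordinates are all hypothetical, and it assumes variants are matched by exact chromosome, position, reference allele, and alternate allele.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key (not a MutLX data structure)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def find_utds(digipico_calls, wgs_calls):
    """Return variants called in DigiPico data but absent from WGS data."""
    return set(digipico_calls) - set(wgs_calls)


# Toy, made-up calls for illustration only.
digipico = {
    Variant("chr1", 1000, "C", "T"),
    Variant("chr2", 2000, "G", "A"),
    Variant("chr3", 3000, "C", "G"),
}
wgs = {
    Variant("chr1", 1000, "C", "T"),  # also seen in bulk WGS, so not a UTD
}

utds = find_utds(digipico, wgs)
print(sorted(utds))
```

Real pipelines typically also normalize variant representation before comparing call sets; that step is omitted here for brevity.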

Figure 2—figure supplement 2
Analysis of the synthetic DigiPico datasets.

Various numbers of high-confidence somatic mutations were artificially mislabeled as UTDs (UTD*s) to synthetically inflate the number of true UTDs at various UTD*/UTD ratios in runs (A) D1110 and (B)…

Figure 2—figure supplement 3
Frequency of various mutation types among FP calls identified by MutLX in DigiPico data.

Green dots (on the left) are obtained from the analysis of somatic variants in standard WGS data from patients #11152, #11513, and OP1036. Red dots (on the right) are obtained from all of …

Figure 2—figure supplement 4
Probability score values for runs D1110, D1111, DE011, and GM12885.

A cut-off value of 0.2 removes the majority of FP variant calls from UTDs while retaining nearly all germline SNPs in all samples.
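
As a rough illustration of how such a cut-off acts, the sketch below filters variants by a probability score. It is not the MutLX code: the function name `passes_cutoff`, the toy labels and scores, and the convention that variants scoring at or above the cut-off are retained are all assumptions made for illustration.

```python
CUTOFF = 0.2  # cut-off value quoted in the caption


def passes_cutoff(score, cutoff=CUTOFF):
    """Assumed convention: keep a variant when its score is >= cutoff."""
    return score >= cutoff


# Made-up (variant label, probability score) pairs for illustration only.
scored_variants = [
    ("utd_1", 0.03),
    ("utd_2", 0.95),
    ("snp_1", 0.88),
    ("utd_3", 0.19),
    ("snp_2", 0.20),
]

passed = [name for name, score in scored_variants if passes_cutoff(score)]
print(passed)
```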

Figure 2—figure supplement 5
Data simulation confirmed that the AUC negatively correlates with the number of true UTD variants.

Various numbers of somatic mutations were artificially mislabeled as UTDs (UTD*s) to achieve 1%, 5%, and 10% UTD*/UTD ratios in runs D1110 and DE111. Since both of these runs had been performed on …
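
The mislabeling idea in this caption can be sketched as follows. This is a hedged illustration, not the authors' simulation code: the function name `mislabel_as_utd`, the random sampling strategy, the toy counts, and the reading of the UTD*/UTD ratio as relative to the original UTD count are all assumptions.

```python
import random


def mislabel_as_utd(somatic, n_utds, ratio, seed=0):
    """Pick round(ratio * n_utds) somatic variants to relabel as UTD*.

    Assumes the UTD*/UTD ratio is taken relative to the original UTD count.
    """
    n_fake = round(ratio * n_utds)
    if n_fake > len(somatic):
        raise ValueError("not enough somatic variants to reach the ratio")
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    return rng.sample(list(somatic), n_fake)


# Made-up identifiers; the counts are illustrative, not taken from any run.
somatic_variants = [f"som_{i}" for i in range(500)]
n_utds = 2000

for ratio in (0.01, 0.05, 0.10):
    utd_star = mislabel_as_utd(somatic_variants, n_utds, ratio)
    print(f"ratio {ratio:.0%}: {len(utd_star)} variants relabeled as UTD*")
```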

Figure 3 with 1 supplement
Identification of an active mutational process using DigiPico/MutLX.

(A) Schematic representation of tumor evolution in HGSOC patient #11152. Standard bulk WGS of various tumor samples identified ∼11000 shared somatic mutations among all sites. Dotted purple line …

Figure 3—figure supplement 1
IGV images of SNVs that were identified in the sub-clonal kataegis in PT2R sample.

Note that nearly all mutations are in the form of C > T or C > G mutations on the forward strand of the genome.

Videos

Video 1
Automated pipetting procedure of DigiPico library preparation.

Tables

Table 1
Application of DigiPico sequencing and MutLX analysis to a diverse set of clinical samples.
| Run ID | Patient ID | Sample type | Collection site | Sequencing platform | Total UTD | Passed UTDs | Validation rate* | AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| D1110 | #11152 | Recurrence | PT2R | NextSeq | 1634 | 4 (b) | NA | 0.95 |
| D1111 | #11152 | Recurrence | PT2R | NextSeq | 1325 | 266 | 240/266 (c) | 0.85 |
| D1112* | #11152 | Recurrence | PT2R | HiSeq 4000 | 1219 | 210 | 189/210 (c) | 0.85 |
| D1511 | #11152 | Recurrence | PALNR | HiSeq X | 1786 | 9 | NA | 0.94 |
| D1210 | #11152 | Pre-chemo | OM | NextSeq | 3139 | 28 | 16/28 (c) | 0.94 |
| D1211 | #11152 | Pre-chemo | OM | HiSeq 4000 | 5521 | 69 | 17/69 (c) | 0.91 |
| D1212 | #11152 | Pre-chemo | OM | HiSeq 4000 | 5015 | 24 | 16/24 (c) | 0.94 |
| D1213 | #11152 | Pre-chemo | OM | HiSeq 4000 | 5090 | 46 | 25/46 (c) | 0.93 |
| D1214 | #11152 | Pre-chemo | OM | HiSeq 4000 | 3415 | 37 | 27/37 (c) | 0.93 |
| DE011 | #11513 | Normal | Blood | HiSeq X | 1759 | 7 (b) | NA | 0.95 |
| DE111 | #11513 | Pre-chemo | Ascites | HiSeq X | 3685 | 4 (b) | NA | 0.97 |
| D6311 | OP1036 | Pre-chemo | RPCG | HiSeq X | 3185 | 12 | NA | 0.96 |
| DA111 | #11502 | Pre-chemo | LPrt | NextSeq | 1251 | 110 | NA | 0.97 |
| GM12885 | - | Cell line | - | NextSeq | 2970 | 3 (b) | NA | 0.96 |

* Run D1112 is a technical replicate of run D1111.
(b) Runs where true UTDs are not expected to be present.
(c) Validation through comparison with independent high-depth WGS data from the bulk of the tumor.
* Validation rate is an underestimate of the positive predictive value of clone-specific variants.
PT2R: Pelvic Tumor Recurrence; PALNR: Para-Aortic Lymph Node Recurrence; OM: Omental Mass; RPCG: Right Paracolic Gutter; LPrt: Left Peritoneum; NA: Not Available; AUC: Area Under the Curve of receiver operating characteristic plot.
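
For readers who want the validation rates of Table 1 as percentages, the short sketch below converts the reported fractions. The counts are copied from the "Validation rate" column; the variable names are hypothetical and the code is only a convenience calculation, not part of the published analysis.

```python
from fractions import Fraction

# Validated / passed UTD counts copied from the "Validation rate" column of Table 1.
validation_counts = {
    "D1111": (240, 266),
    "D1112": (189, 210),
    "D1210": (16, 28),
    "D1211": (17, 69),
    "D1212": (16, 24),
    "D1213": (25, 46),
    "D1214": (27, 37),
}

# Exact fractions avoid rounding until the values are printed.
validation_rates = {
    run: Fraction(validated, passed)
    for run, (validated, passed) in validation_counts.items()
}

for run, rate in validation_rates.items():
    print(f"{run}: {float(rate):.1%}")
```

As the table footnote states, these rates underestimate the positive predictive value, so they are best read as lower bounds.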

    The study of active mutational processes using DigiPico/MutLX.

Table 2
Summary of patient samples and associated sequencing experiments.
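| Patient ID | Sample type | Collection site | Analysis performed: Bulk WGS | Analysis performed: DigiPico sequencing |
| --- | --- | --- | --- | --- |
| #11152 | Normal | Blood | x | |
| #11152 | Pre-chemo | Omentum | x | x |
| #11152 | Recurrence | Pelvic tumor | x | x |
| #11152 | Recurrence | Lymph node | | x |
| #11513 | Normal | Blood | x | x |
| #11513 | Pre-chemo | Ascites | x | x |
| OP1036 | Normal | Blood | x | |
| OP1036 | Pre-chemo | Paracolic gutter | x | x |
| #11502 | Normal | Blood | x | |
| #11502 | Pre-chemo | Peritoneum | x | x |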

Additional files
