Summary of clinical features of 738 cancer patients and 1,550 healthy controls in the discovery and validation cohorts.

Workflow of the SPOT-MAS assay for multi-cancer detection and localization.

The SPOT-MAS assay has three main steps. First, cfDNA is isolated from peripheral blood, then subjected to bisulfite conversion and adapter ligation to make a whole-genome bisulfite cfDNA library. Second, the whole-genome bisulfite cfDNA library is hybridized with probes specific for 450 target regions to collect the target capture fraction. The whole-genome fraction is retrieved by collecting the ‘flow-through’ and hybridizing it with probes specific for the adapter sequences of the DNA library. Both the target capture and whole-genome fractions are subjected to massively parallel sequencing, and the resulting data are pre-processed into five different features of cfDNA: target methylation (TM), genome-wide methylation (GWM), fragment length profile (Flen), DNA copy number (CNA) and end motif (EM). Finally, machine learning models and graph convolutional neural networks are adopted for classification of cancer status and identification of the tissue of origin.

Analysis of targeted methylation in cfDNA.

(A) Volcano plot showing the log2 fold change (logFC) and significance (-log10 Benjamini-Hochberg adjusted p-value from the Wilcoxon rank-sum test) of 450 target regions when comparing 499 cancer patients and 1,076 healthy controls in the discovery cohort. There are 402 differentially methylated regions (DMRs; p-value < 0.05), color-coded by genomic location. (B) Number of DMRs in the four genomic locations. (C) KEGG and WikiPathways (WP) pathway enrichment analysis using g:Profiler for genes associated with the DMRs. A total of 36 pathways are enriched, suggesting a link between the differentially methylated regions and tumorigenesis.
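
As an illustration of how the two volcano-plot axes in panel (A) can be derived, the sketch below applies the standard Benjamini-Hochberg step-up adjustment to a list of raw p-values and converts one region's group means into (log2 fold change, -log10 adjusted p-value) coordinates. The function names and the toy p-values are hypothetical and are not taken from the study; the raw p-values are assumed to come from a Wilcoxon rank-sum test computed elsewhere.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def volcano_coordinates(mean_cancer, mean_healthy, adjusted_p):
    """Return (log2 fold change, -log10 adjusted p-value) for one region."""
    return math.log2(mean_cancer / mean_healthy), -math.log10(adjusted_p)

# Hypothetical raw p-values for four regions (not data from the study).
raw_p = [0.001, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
x, y = volcano_coordinates(2.0, 1.0, 0.01)
```

In practice a statistics library would normally supply both the rank-sum test and the multiple-testing correction; the hand-written version is shown only to make the adjustment explicit.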

Genome-wide methylation changes in cfDNA of cancer patients.

(A) Density plot showing the distribution of the genome-wide methylation ratio for all cancer patients (red curve, n=499) and healthy participants (blue curve, n=1,076). The leftward shift in the cancer samples indicates global hypomethylation in the cancer genome (p < 0.0001, two-sample Kolmogorov-Smirnov test). (B) Log2 fold change of the methylation ratio between cancer patients and healthy participants in each bin across 22 chromosomes. Each dot indicates a bin, identified as hypermethylated (red), hypomethylated (blue), or showing no significant change in methylation (grey).
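
The two-sample Kolmogorov-Smirnov comparison in panel (A) rests on the largest vertical gap between the two empirical cumulative distribution functions. A minimal sketch of that statistic is given below; the function name and the toy methylation ratios are hypothetical, and the p-value calculation is deliberately left out (a statistics library would normally provide it).

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical genome-wide methylation ratios (not data from the study).
cancer = [0.70, 0.72, 0.74, 0.75]
healthy = [0.76, 0.78, 0.79, 0.80]
d_stat = ks_statistic(cancer, healthy)  # fully separated samples give 1.0
```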

Analysis of copy number aberration (CNA) in cfDNA.

(A) Log2 fold change of DNA copy number in each bin across 22 autosomes between 499 cancer patients and 1,076 healthy participants in the discovery cohort. Each dot represents a bin identified as gain (red), loss (blue) or no change (grey) in copy number. (B) Proportions of the different CNA bins in each autosome.

Analysis of fragment length patterns of ctDNA in plasma.

(A) Density plot of fragment length for cancer patients (red, n=499) and healthy participants (blue, n=1,076) in the discovery cohort. The inset shows an x-axis expansion of the short fragments (<150 bp). (B) Ratio of short to long fragments across 22 autosomes. Each dot indicates the mean ratio for a bin in cancer patients (red) and healthy participants (blue).
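
The per-bin ratio of short to long fragments in panel (B) could be computed along the following lines. This is only a sketch under stated assumptions: the 150 bp boundary follows the inset in panel (A), whereas the upper limit for long fragments, the bin size, the input tuple layout and the function name are all hypothetical choices that the legend does not specify.

```python
from collections import defaultdict

SHORT_MAX = 150  # fragments shorter than this are counted as short
LONG_MAX = 250   # hypothetical upper bound for long fragments

def short_long_ratio_per_bin(fragments, bin_size=5_000_000):
    """Map (chrom, start, length) tuples to {(chrom, bin_index): ratio}."""
    counts = defaultdict(lambda: [0, 0])  # [short, long] per bin
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        if length < SHORT_MAX:
            counts[key][0] += 1
        elif length <= LONG_MAX:
            counts[key][1] += 1
    # Bins without long fragments are skipped to avoid division by zero.
    return {k: s / l for k, (s, l) in counts.items() if l > 0}

# Hypothetical fragments (not data from the study).
fragments = [
    ("chr1", 100, 120),
    ("chr1", 2_000, 167),
    ("chr1", 3_000, 140),
    ("chr1", 6_000_000, 166),
    ("chr1", 7_000_000, 170),
]
ratios = short_long_ratio_per_bin(fragments)
```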

Differences in 4-mer end motif between cancer and healthy cfDNA.

(A) Heatmap showing the log2 fold change of 256 4-mer end motifs in cancer patients (n=499) compared to healthy controls (n=1,076). (B) Box plots showing the top ten motifs with significant differences in frequency between cancer patients (red) and healthy controls (blue) (Wilcoxon rank-sum test, Bonferroni-adjusted p-value < 0.0001).
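
The 256 motifs correspond to every possible 4-base sequence over A, C, G and T (4^4 = 256). The sketch below shows one way to tabulate their frequencies for a sample, assuming the motif is read from the first four bases of each fragment sequence; that convention, the function name and the toy sequences are assumptions for illustration, not details given in the legend.

```python
from collections import Counter
from itertools import product

# Every 4-mer over the four bases: 4**4 = 256 motifs.
ALL_MOTIFS = ["".join(p) for p in product("ACGT", repeat=4)]

def end_motif_frequencies(fragment_sequences):
    """Return the frequency of each 4-mer at the start of the fragments."""
    counts = Counter()
    for seq in fragment_sequences:
        motif = seq[:4].upper()
        # Skip fragments that are too short or contain ambiguous bases.
        if len(motif) == 4 and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in ALL_MOTIFS}

# Hypothetical fragment sequences (not data from the study).
freqs = end_motif_frequencies(["CCCAGT", "CCCATT", "ACGTAA", "NNNNAC", "AC"])
```

Reporting all 256 motifs, including those with zero counts, keeps the feature vector the same length for every sample.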

Model construction and performance validation for SPOT-MAS.

(A) Two model construction strategies for cancer detection. (B) and (C) ROC curves comparing the performance of single-feature models and two combination models (concatenate and ensemble stacking) in the discovery (B) and validation (C) cohorts. (D) and (E) Bar charts showing the specificity and sensitivity of single-feature models and two combination models (concatenate and ensemble stacking) in the discovery (D) and validation (E) cohorts. (F) and (G) Dot plots showing the sensitivity of the SPOT-MAS assay in detecting five different cancer types in the discovery (F) and validation (G) cohorts. The points and error bars represent the average sensitivity over 20 runs and the 95% confidence intervals. Feature abbreviations are as follows: TM – target methylation density, GWM – genome-wide methylation density, CNA – copy number aberration, EM – 4-mer end motif, FLEN – fragment length distribution, LONG – long fragment count, SHORT – short fragment count, TOTAL – total fragment count, RATIO – ratio of short to long fragments.
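
For readers who want to reproduce the kind of summary shown in panels (D) to (G), the sketch below computes sensitivity and specificity from binary labels and then summarizes repeated runs as a mean with a 95% interval. The legend does not state how the intervals were obtained, so the normal approximation used here is an assumption, as are the function names and the toy values.

```python
import math
from statistics import mean, stdev

def sensitivity_specificity(y_true, y_pred):
    """Labels use 1 for cancer and 0 for healthy; both classes must occur."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def mean_with_ci95(values):
    """Mean and normal-approximation 95% interval over repeated runs."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

# Hypothetical labels and per-run sensitivities (not data from the study).
sens, spec = sensitivity_specificity(
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1],
)
avg, low, high = mean_with_ci95([0.70, 0.72, 0.74])
```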

Performance of the SPOT-MAS assay in predicting the tissue of origin.

(A) Model construction strategy to predict the tissue of origin by combining nine sets of cfDNA features using graph convolutional neural networks. (B) Heatmap showing the feature importance scores for five cancer types. (C) Bar chart indicating the contribution of important features to classifying five different cancers. (D) Three-dimensional graph representing the classification of five cancer types. (E) and (F) Cross-tables showing the agreement between the prediction (x-axis) and the reference (y-axis) for tissue of origin in the discovery cohort (E) and validation cohort (F).
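
A cross-table such as those in panels (E) and (F) can be built by counting (reference, prediction) pairs, with rows following the reference label and columns following the predicted label. The sketch below uses placeholder class labels "A", "B" and "C" rather than the cancer types in the study; the function names and the toy calls are hypothetical.

```python
from collections import Counter

def cross_table(reference, predicted, labels):
    """Rows follow the reference label, columns the predicted label."""
    pairs = Counter(zip(reference, predicted))
    return [[pairs[(r, p)] for p in labels] for r in labels]

def overall_accuracy(table):
    """Fraction of samples on the diagonal (prediction matches reference)."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total

# Placeholder labels and calls (not data from the study).
labels = ["A", "B", "C"]
reference = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]
table = cross_table(reference, predicted, labels)
accuracy = overall_accuracy(table)
```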