Validation of CorEx genes.
(a) Cross-validation experiments validating the predictive value of CorEx genes. Precision denotes the fraction of samples predicted as ecDNA(+) that were truly ecDNA(+). Recall denotes the fraction of ecDNA(+) samples that were predicted correctly. (b) For precision windows of width 0.1 with precision of at least 0.5, recall values are shown as boxplots. The interquartile ranges for CorEx and Core genes overlap, suggesting similar predictive power. CorEx genes have higher predictive rates than the top 643 differentially expressed genes ranked by absolute logarithmic fold change from a DESeq2 analysis (Top-|LFC| genes), 3,012 significant genes selected from a generalized linear model (GLM), and 643 randomly selected genes. (c) CorEx genes were consistently up- or down-regulated in ecDNA(+) samples across tumor types, with the exception of SARC. AU p-values from multiscale bootstrap resampling are shown at the dendrogram branches. (d) Of the 643 Top-|LFC| genes, 240 were up-regulated and 403 were down-regulated in ecDNA(+) samples. Of the 643 CorEx genes, 325 were up-regulated and 318 were down-regulated. The absolute LFC values of the Top-|LFC| gene set were significantly greater than those of the CorEx genes (p-value = 1.83e-158). (e) The normalized gene expression values of the CorEx genes were significantly higher than those of the Top-|LFC| gene set (p-value < 2e-308). ***p-value < 0.001.
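The definitions of precision and recall used in panel (a) can be illustrated with a minimal sketch. The function name `precision_recall` and the toy labels are hypothetical and are not taken from the analysis itself; the sketch only assumes that each sample carries a true and a predicted ecDNA(+) status.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for binary ecDNA(+) predictions.

    Hypothetical helper: True means ecDNA(+), False means ecDNA(-).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if (not t) and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and (not p))

    # Precision: fraction of predicted ecDNA(+) samples that are truly ecDNA(+).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: fraction of truly ecDNA(+) samples that were predicted ecDNA(+).
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Toy example (made-up labels): 5 true ecDNA(+) samples and 2 ecDNA(-) samples.
toy_true = [True, True, True, True, True, False, False]
toy_pred = [True, True, True, False, False, True, False]
toy_precision, toy_recall = precision_recall(toy_true, toy_pred)
```

In this toy case, 3 of the 4 predicted ecDNA(+) samples are truly ecDNA(+), and 3 of the 5 truly ecDNA(+) samples are recovered.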