Mutations in organismal evolution vs. cancer evolution. (A, B) A hypothetical example of DNA sequence evolution in organisms vs. in cancer, with the same number of mutations. (C) Mutation distribution in the two species in the organismal evolution of (A). (D, E) Among 10 sequences in cancer evolution, mutations may be distributed as in pattern D or pattern E. (F) Another pattern of mutation distribution in cancer evolution, which has a recurrent site but too few total mutations. Mutations like those in (F) are CDNs missed in conventional screens.

Mutation recurrences (Ai’s and Si’s) in 12 cancer types.

Excess of Ai’s of each i class.

Analysis across 6 cancer types. The index, ranging between 0 and 1 (Tang et al. 2004; Chen, He, et al. 2019), is a measure of physicochemical differences among the 20 amino acids (see the text). The most similar amino acids have values near 0 and the most dissimilar ones have values near 1. Each panel corresponds to one cancer type, with each horizontal bar representing the distribution for one recurrence group. The numbers on the left of each panel are i values and those on the right are the numbers of sites. Note that the proportion of dark red segments increases as i increases. This suggests that mutations at high-recurrence sites (larger i’s) code for amino acids that are chemically very different from the wild type.

Distribution of CDNs among genes. (A) Out of 119 CDN-carrying genes (red bars), 87 have only one CDN. Among the rest, TP53 possesses the most CDNs, and three others have more than 10 CDNs. (B) CDN number in TP53 among patients. The dark bars represent the observed number of patients carrying the number of TP53 CDNs shown on the X-axis. The grey bars show the expected patient distribution. Clearly, TP53 only needs to contribute one CDN to drive tumorigenesis. Hence, TP53 (and other canonical driver genes; see text), while prevalent, does not contribute disproportionately to the tumorigenesis of each patient.

Sharing of CDNs across cancer types. The X-axis shows imax, which is the largest i a CDN reaches among the 12 cancer types. The Y-axis shows the number of cancer types where the mutation also occurs. Each dot is a CDN, and the number of dots in each cloud is given. The blue and red dots denote mutations classified as a CDN in one cancer type and in multiple cancer types, respectively. Grey dots are non-CDNs. The table in the lower panel summarizes the number of sites and the number of genes harboring these sites.
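The two axes and the dot colors described in this caption can be sketched as below. This is only an illustration under stated assumptions: the function name `summarize_site`, the per-site dictionary of recurrence counts, and the `cdn_cutoff` parameter are hypothetical, and the cutoff on i that defines a CDN is taken as an input rather than from this caption.

```python
def summarize_site(counts_by_type, cdn_cutoff):
    """Summarize one mutated site across cancer types.

    counts_by_type: non-empty dict mapping cancer type -> recurrence i at this site
                    (hypothetical input layout).
    cdn_cutoff:     assumed minimum i for a site to count as a CDN in a cancer type.
    Returns (i_max, n_types, color).
    """
    values = list(counts_by_type.values())
    # X-axis: the largest i the site reaches in any cancer type.
    i_max = max(values)
    # Y-axis: number of cancer types in which the mutation occurs at all.
    n_types = sum(1 for i in values if i > 0)
    # Dot color: CDN in no type (grey), exactly one type (blue), or several (red).
    n_cdn_types = sum(1 for i in values if i >= cdn_cutoff)
    if n_cdn_types == 0:
        color = "grey"
    elif n_cdn_types == 1:
        color = "blue"
    else:
        color = "red"
    return i_max, n_types, color


# Made-up counts for a single site, with an assumed cutoff of 3.
example = summarize_site({"lung": 5, "breast": 1, "colon": 0}, cdn_cutoff=3)
```

Here `example` describes a site that is a CDN in one cancer type but is also seen, at lower recurrence, in a second type.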

Survival analysis of non-small cell lung cancer (NSCLC) patients based on EGFR mutation status. Patient data were retrieved from the GENIE database (https://genie.cbioportal.org/) and stratified into three groups based on EGFR mutation profiles: Group1 comprises patients with EGFR CDN mutations; Group2 includes patients with nonsynonymous mutations in EGFR that are not CDNs; the EGFRWT group consists of patients with no EGFR mutations (see Methods). Patients in Group1 and Group2 received EGFR-targeted therapies in accordance with the guidelines for managing EGFR-mutant NSCLC (Passaro et al. 2022; Choudhury et al. 2023). Survival analysis using the Kaplan-Meier method revealed a significantly higher survival rate for Group1 patients than for Group2 and the EGFRWT group (p < 0.001).
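For readers unfamiliar with the Kaplan-Meier method named in this caption, a minimal sketch of the product-limit estimator follows. It is not the analysis pipeline used for the figure: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and the group comparison that yields the p-value is not covered here.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: follow-up time for each patient.
    events:    1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while idx < len(data) and data[idx][0] == t:
            deaths += data[idx][1]
            leaving += 1
            idx += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Toy data (made up): five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In practice, one curve would be estimated per group (Group1, Group2, EGFRWT) and the curves compared with a separate significance test.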

Number of patients with CDNs vs. number of patients with any nonsynonymous mutations in the same genes.

Gene numbers for different cancer hallmarks.