Introduction

Cancer is one of the leading causes of death worldwide, with over 200 types identified. Despite major advances in the field, accurately predicting cancer patient prognosis remains a significant challenge1,2. Previous studies have highlighted the complexity of this task, which is influenced by interacting factors including cancer type, clinical stage, therapeutic interventions, nursing care, unexpected comorbidities, and other non-cancer-related illnesses3-6. Furthermore, the expression of genes such as HER2, VEGF, and Ki67 plays a crucial role in predicting cancer patient prognosis7-12. Nevertheless, prognostic accuracy still needs to improve to better inform treatment decisions and patient outcomes.

Survival analysis is commonly used to assess the correlation between genes and cancer patient prognosis13. However, inconsistent findings are frequently observed, even within the same type of cancer. For instance, studies of the correlation between CCND1 and prognosis in non-small cell lung cancer (NSCLC) have reported contrasting results: positive correlation, inverse correlation, or negligible influence14-16. These discrepancies can be attributed to factors such as differences in sample size, cohort characteristics (including cancer subtypes and tumor staging), and variations in therapy17. Such observations raise an important question: do Genes Steadily Associated with Prognosis (GEARs) exist that consistently correlate with patient outcomes across different conditions, particularly given the varying sample sizes used in different studies? An affirmative answer would have significant implications not only for the development of more accurate prognostic models but also for our understanding of cancer biology.

The Cancer Genome Atlas (TCGA) is a comprehensive cancer genomics project initiated in the United States. It comprises transcriptome data, genomic data, and clinical information for 33 cancer types, making it the largest cancer clinical sample database currently available. In this study, we developed a novel method called “Multi-gradient Permutation Survival Analysis” (MEMORY) using TCGA RNA-seq data, and assessed the potential existence of GEARs across 15 cancer types, each comprising a cohort of over 200 patients. We also evaluated the prognostic predictive power of these GEARs and explored their potential biological functions in driving cancer progression.

Results

Multi-gradient permutation survival analysis identifies GEARs associated with mitosis and immune across multiple cancer types

GEARs are genes that consistently and significantly correlate with patient survival, independent of sample size. To identify them, we developed a method called “Multi-gradient Permutation Survival Analysis” (MEMORY), which assesses the correlation between a specific gene and cancer patient prognosis using an available transcriptomic dataset (Figure 1A, Table 1). We started with a sampling number of 10% of the cohort size and increased it in 10% increments up to the full cohort; at each gradient, we performed 1000 permutations of random sampling followed by survival analysis. By calculating the probability that each gene is significantly associated with patient survival at each gradient, we can identify a group of GEARs for further analyses.

MEMORY uncovers the enrichment of mitosis and immune signatures in multiple cancers. (A) Sample sizes ranging from 10% to 100% of the cohort at 10% intervals (“gradients”) were each used for 1000 permutations of survival analysis in all 15 cancer types. Each sampled matrix was divided into high- and low-expression groups based on median gene expression values, and survival analyses were performed. Log-rank test results were coded as 1 for significant (P < 0.05) and 0 for non-significant (P > 0.05) outcomes, forming the survival analysis matrices and the summarized significant-probability matrix, which allowed the identification of GEARs. (B) The maximum significant probability at each gradient sample size for every cancer type. Sample number rate refers to the percentage of samples in each sampling gradient relative to the total number of samples. (C) Pathways enriched by GOEA based on the GEARs of each cancer type. The displayed pathways are the top 5 most significant pathways for each cancer type. Mitosis-related pathways are marked in red and immune-related pathways in blue.

In this study, we used the TCGA datasets for several reasons, including the diversity of cancer types and the availability of gene expression profiles and patient prognosis information. We set a minimum cohort size of 200 and included 15 eligible cancer types for analysis: bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), colon adenocarcinoma (COAD), kidney renal clear cell carcinoma (KIRC), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), prostate adenocarcinoma (PRAD), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), and uterine corpus endometrial carcinoma (UCEC). As the number of samples increases, the significant probability of certain genes gradually approaches 1. Once this probability reaches 0.8 and remains stable as the sample gradient increases further, the gene is considered a GEAR (Figure 1B). We identified a set of GEARs in all 15 cancer types (Supplementary Figure 1-2, Table 2). The GEAR counts in most cancer types ranged from 100 to 1000, with the exceptions of CESC, KIRC, LGG, and PAAD, which had over 1000 GEARs, and THCA, which had only 22 GEARs (Table 2). In LUAD, the top 10 GEARs with the highest significant probabilities were TLE1, GNG7, ERO1A, ANLN, DKK1, TMEM125, S100A16, KNL1, STEAP1, and BEX4. Most of these genes, with the exception of S100A16, are known to promote LUAD malignant progression18-26. Beyond LUAD, BEX4 was identified as a common GEAR in KIRC, LGG, PAAD, and STAD (Table 2). BEX4 is reported to be a proto-oncogene that promotes cancer onset and malignant progression in multiple cancers, including LUAD, glioblastoma multiforme, and oral squamous cell carcinoma25,27,28.
We also identified the most significant GEAR in each cancer type: TLL1 (BLCA), PGK1 (BRCA), RFXANK (CESC), DPP7 (COAD), VWA8 (KIRC), SCMH1 (LGG), HILPDA (LIHC), TLE1 (LUAD), CD151 (LUSC), ANKRD13A (OV), SOCS2 (PAAD), DRG2 (PRAD), ADAMTS6 (STAD), PSMB8 (THCA), and ASS1 (UCEC) (Table 3). Many of these genes were previously reported to be associated with tumorigenesis18,29-40. For example, TLE1 is a transcriptional repressor that promotes cell proliferation and migration and inhibits apoptosis in LUAD41,42. Additionally, PGK1, a key enzyme in glycolysis, has been shown to promote cell proliferation, migration, and invasion in multiple cancers43,44. These findings point to a close link between the functions of GEARs and the initiation and progression of cancer.

To gain deeper insight into the biological functions of the identified GEARs, we conducted Gene Ontology Enrichment Analysis (GOEA) (Figure 1C). We found that mitosis-related pathways were enriched in LIHC, LUAD, LGG, and PAAD, and that immune-related pathways were enriched in BRCA and UCEC45. Other cancer types exhibited enrichment in various pathways, such as the organic acid metabolism pathway in KIRC, the oxidative phosphorylation pathway in CESC, organ development-related pathways in LUSC, neurogenesis-related pathways in BLCA, and sustenance metabolism pathways in THCA. Given the crucial role of mitosis in cancer progression and the significance of the immune system in cancer-host interactions, we focused on the mitosis- and immune-related GEARs, which characterized approximately 40% of the analyzed cancer types.

Identification of hub genes in mitosis and immune-related cancers

To further identify crucial genes within the GEARs, we calculated the survival analysis similarity (SAS) of the GEARs and extracted the higher-ranked edges to construct the core survival network (CSN) (Figure 2A, Table 4). The top 10 GEARs with the highest degree in these networks were defined as hub genes46. We then classified the samples using the hub genes derived from these networks and evaluated their clinical relevance (Supplementary Figure 3A-O, Supplementary Figure 4A-O). A certain degree of correlation with cancer stage (TNM stage) was observed in most cancer types, except for COAD, LUSC, and PRAD (Supplementary Figure 5A-K). Furthermore, we conducted GOEA on the hub genes selected from the CSNs and found that the results were consistent with the GEAR analysis (Table 5). Specifically, the hub genes in LIHC, LGG, and LUAD were enriched in mitosis-related pathways, whereas the hub genes in BRCA and UCEC were enriched in immune-related pathways (Figure 2B). For instance, 9 hub genes in LUAD were associated with mitosis-related functions, whereas 8 hub genes in BRCA were associated with immune-related functions (Figure 2C-D).

Identification of hub genes in mitosis or immune-related cancers. (A) The SAS of all GEARs was calculated, and CSN was constructed. (B) The left nodes represent five different cancers, while the right nodes detail the functional types of hub-gene pathways. The height of the edges in the middle represents the proportion of hub-gene pathways corresponding to a specified functional type. (C-D) The CSN of LUAD and BRCA. The enlarged section represents the hub gene. (E) LUAD clustering based on hub gene expression, compared with inhibitor-based classification52. (F-G) LUAD clustering based on hub gene expression in comparison with the classic classification.

Furthermore, we conducted survival-dependent analyses on the hub genes of LGG, LIHC, and LUAD. These analyses revealed that mitosis-related hub genes are closely associated with cancer cell viability, especially hub genes shared by multiple cancer types such as CDC20, TOP2A, BIRC5, and TPX2 (Supplementary Figure 6A-C)47-50. The importance of these genes is well established. For instance, TPX2, a hub gene in all three cancers, plays a crucial role in normal spindle assembly during mitosis and is essential for cell proliferation51. Because these hub genes matter for cancer cell survival, their expression levels can be used to screen for inhibitors of tumor cell growth. By integrating the Genomics of Drug Sensitivity in Cancer (GDSC) and Connectivity Map (cMAP) databases, we identified 76 compounds that effectively suppress the expression of the 10 hub genes in LUAD cell lines (Figure 2E, Supplementary Figure 6D-E). These compounds also significantly inhibit LUAD cell growth and may serve as potential therapeutic agents for LUAD.

We then analyzed the LUAD dataset, which had a larger sample size than LGG and LIHC. Based on the expression of the hub genes, we classified the LUAD samples into three subgroups: mitosis low (ML), mitosis medium (MM), and mitosis high (MH) (Figure 2F, Table 6). A previous study reported three classic subgroups of LUAD, known as the terminal respiratory unit (TRU), proximal-proliferative (PP), and proximal-inflammatory (PI) subgroups52. Our analysis revealed that the ML subgroup consisted primarily of TRU samples and a small number of PP samples, but no PI samples. The MH subgroup showed high expression of mitosis-related genes and predominantly comprised PI and PP samples, whereas the MM subgroup exhibited intermediate levels of mitosis-related gene expression (Figure 2F-G). This suggests that the new categorization can help identify additional factors that influence patient prognosis (Supplementary Figure 6F-G).

Distinct genetic mutation landscapes characterize different clusters of LUAD

The expression of hub genes provided crucial indicators for distinguishing subgroups of LUAD. However, the mechanisms driving these differences in expression patterns remain unknown. We therefore explored the genomic differences, particularly in gene mutations, among the LUAD subgroups. We first compared the mutation landscape of the subgroups by assessing the tumor mutation burden (TMB) and found significant differences in TMB among them (Figure 3A). Analysis of common driver genes further revealed distinct proportions of tumors with ALK and ROS1 fusions in the favorable-prognosis ML and MM subgroups, compared to tumors with mutations in KRAS, EGFR, BRAF, and ERBB2 (Figure 3B). Moreover, low tumor mitotic activity was associated with better prognosis in LUAD samples with EGFR mutations or pan-negative status (no oncogenic alteration in KRAS, EGFR, BRAF, ERBB2, PIK3CA, ALK, or ROS1), but not in those with KRAS or BRAF mutations (Figure 3C-D, Supplementary Figure 6H-J). We also observed unique mutation characteristics in each of the 3 subgroups (Figure 3E-G). For example, mutation of TP53, a tumor suppressor gene that is frequently mutated in cancer, was observed in 73% of the MH subgroup, compared to 14% of the ML subgroup and 39% of the MM subgroup. Genes such as CSMD3, RP1L1, and ZFHX4 exhibited a similar trend in mutation frequency across the three subgroups. These findings indicate substantial genomic differences among the three hub gene-based LUAD subgroups.

Different LUAD subgroups were characterized by unique genetic mutation profiles. (A) TMB analysis in three groups of LUAD samples. ***P < 0.0005. (B) Proportion of different oncogenic drivers, including KRAS, EGFR, BRAF, and ERBB2 mutations and ALK and ROS1 fusions, in three LUAD subgroups. (C) Kaplan-Meier overall survival curves for EGFR-mutant LUAD samples by hub gene subgroup. (D) Kaplan-Meier overall survival curves for pan-negative LUAD samples by hub gene subgroup. (E-G) Comparison of top gene mutations in three LUAD clusters, including ML vs. MM (E), MM vs. MH (F), and ML vs. MH (G).

We identified gene mutations that showed significant changes through gene dependence analysis (Figure 4A). To further explore the functional implications of these mutations, we enriched them using a pathway system called Nested Systems in Tumors (NeST)53. The results revealed notable differences in multiple functional pathways, including androgen receptor, cell cycle, TP53, EGFR, IL-6, and PIK3CA-related pathways (Figure 4B). To gain further insight into the functional implications of these mutations, we assessed the gene dependency of cells carrying them using the DepMap database. Importantly, the pathway differences between the MM and MH clusters were primarily associated with PIK3CA-related pathways. Previous studies have reported that PIK3CA mutation confers drug resistance in colorectal cancer, lung cancer, and breast cancer54-56. We therefore aimed to validate the role of PIK3CA mutations in drug resistance using public databases. We analyzed A549 cells (a lung adenocarcinoma cell line with a KRAS mutation) and SW1573 cells (a lung adenocarcinoma cell line with both KRAS and PIK3CA mutations) using their respective GDSC data and identified five potentially effective inhibitors (BMS-345541, Dactinomycin, Epirubicin, Irinotecan, and Topotecan) among the previously screened set of 76 LUAD cell growth inhibitors. Strikingly, SW1573 cells were more resistant than A549 cells to all 5 inhibitors (Figure 4D). In line with this, clinical data also support the notion that a concurrent PIK3CA mutation is a poor prognostic factor for LUAD patients57. These findings suggest that PIK3CA mutation may contribute to drug resistance in LUAD.

PIK3CA mutation associates with mitosis and drug resistance in LUAD. (A) Cancer-related gene mutations were annotated based on gene dependency scores from the DepMap database. Functional mutations are indicated in green, functional (subtype-associated) mutations in red, and non-functional mutations in gray. (B) NeST differential pathway analysis across three comparisons: ML vs. MM, MM vs. MH, and ML vs. MH. (C) Heat map of hub gene expression in A549 cells after compound treatment, from the cMAP database98. Blue annotations mark the 76 compounds that inhibit hub gene expression. (D) Comparison of the IC50 z-scores of five tumor cell growth inhibitors for A549 (KRAS mutation) and SW1573 (KRAS and PIK3CA mutations).

Distinct clusters of BRCA exhibit different immune infiltration landscapes

We next focused on the immune-related cancer types. BRCA stood out as a representative immune-related cancer, which prompted us to analyze it in more detail (Figure 1C). We hierarchically grouped the BRCA samples into three immune subgroups based on the expression of hub genes: immune low (IL), immune medium (IM), and immune high (IH) (Figure 5A, Table 7). Every subtype of the widely used PAM50 classification58 contained samples from the IL, IM, and IH subgroups, with minor differences in proportions (Figure 5B, Supplementary Figure 7A-B). Next, we investigated the relationship among PAM50 classification, immune subgroups, and patient prognosis. Our data revealed significant prognostic differences among the immune subgroups of the LumB subtype but not of the LumA, Basal, or Her2 subtypes (Supplementary Figure 7C-F). Integrating our method with traditional classification may therefore enable a more detailed stratification of breast cancer samples. Mutation analysis revealed distinct patterns of gene mutations among the three subgroups, with significant differences in the mutation frequencies of genes such as TP53, CDH1, and LRP2 (Figure 5C-E). We compared the proportions of immune cells across the three subgroups and observed a strong association between the overall immune cell proportion and the hub gene-based clustering (Figure 5F, Supplementary Figure 8A-J). Specifically, the IH subgroup exhibited significantly higher proportions of CD8+ T cells and Treg cells, with mean percentages of 5.0% and 6.0%, respectively, compared to 1.4% and 3.2% in the IM subgroup and 0.3% and 1.6% in the IL subgroup. These findings suggest distinct immune responses in the three subgroups and indicate a potential association between the genetic mutation pattern and the immune microenvironment landscape.

The hub gene classification of BRCA revealed a significant association between EMT and immune infiltration. (A) The hub gene expression heatmap containing the result of hub gene classification and PAM50 classification of BRCA58. (B) Comparison of molecular classification based on hub genes with PAM50 classification. (C-E) Comparison of gene mutations in three groups of BRCA samples, including IL vs. IM (C), IM vs. IH (D), IL vs. IH (E). (F) Comparison of the total immune cell rate in three BRCA clusters. ***P < 0.0005. (G) Comparison of CDH1 expression in BRCA samples with wildtype or mutant CDH1. ***P < 0.0005. (H) A schematic diagram illustrating the correlation between the EMT score and immune cell rate. The red line represents the fitting curve. The correlation analysis was performed using Pearson correlation coefficient.

Next, we explored the specific genomic factors influencing immune infiltration in BRCA. Mutation analysis revealed that CDH1 mutation was the second most prevalent genetic alteration in BRCA samples, after TP53 mutation (Figure 5C-E). A previous study indicated that CDH1 is involved in mechanisms regulating cell-cell adhesion, mobility, and proliferation of epithelial cells59. We found that CDH1-mutant samples exhibited significantly lower CDH1 expression than CDH1-wildtype samples, and that CDH1 expression correlated closely with immune cell infiltration (Figure 5G, Supplementary Figure 8K). These results are consistent with clinical observations of CDH1 mutation and high immune infiltration in invasive lobular carcinoma of the breast60.

We further investigated how CDH1 mutation influences immune infiltration. In BRCA, we observed a significant association between the expression of CDH1 and the expression of the EMT marker genes VIM and TWIST2 (Supplementary Figure 8L-M)61,62. This suggests that the regulation of CDH1 is closely linked with the EMT process. After calculating an EMT score for each sample, we observed a positive correlation between the EMT score and the proportion of immune infiltration (Figure 5H). CDH1 might therefore influence immune infiltration through the EMT process.

Mitotic and immune signatures predict patient prognosis at pan-cancer level

We further analyzed the association of CDH1 and PIK3CA with specific biological processes at pan-cancer level. Our data showed that PIK3CA level was positively correlated with mitotic scores in most cancer types, whereas the correlation between CDH1 expression and the proportion of immune cell infiltration was positive in PRAD, LGG, OV, Uveal Melanoma (UVM), THCA, LIHC and LUAD but negative in BRCA, Testicular Germ Cell Tumors (TGCT), Thymoma (THYM), LUSC, BLCA, Head and Neck squamous cell carcinoma (HNSC), Sarcoma (SARC), Esophageal carcinoma (ESCA), PAAD and STAD (Figure 6A-B). Finally, we sought to explore the prognostic relevance of mitosis and immune by calculating their scores and assessing their correlation with patient outcomes at the pan-cancer level. The scores for these biological processes were computed using RNA-seq data from the TCGA database, and the median score was used as a cut-off to categorize patients into different groups. Out of the 33 cancer types analyzed, the prognosis of 19 cancer types showed a significant correlation with at least one of these two pathways (Figure 6C). Specifically, 10 cancer types were exclusively associated with the mitosis score, 4 cancer types were exclusively associated with the immune score, and 5 cancer types exhibited a correlation with both mitosis and immune scores simultaneously. Overall, the identification of mitosis and immune-related biological processes as significant prognostic factors at the pan-cancer level suggests their potential utility as valuable biomarkers for predicting patient prognosis.

Mitosis and immune signatures predict patient prognosis at the pan-cancer level. (A) Correlation of the mitosis scores of 33 TCGA cancer types with PIK3CA expression. (B) Correlation of the immune cell infiltration rate of 33 TCGA cancer types with CDH1 expression. (C) The mitosis- and immune-related pathway scores of 33 cancer types were analyzed, and the median score was used as a threshold to categorize patients for survival analysis. Among the 33 cancer types examined, 10 were exclusively associated with the mitosis score, 4 were exclusively associated with the immune score, and 5 showed a correlation with both the mitosis and immune scores.

Discussion

Because many variables affect cancer patient prognosis, it has remained uncertain whether there exists a set of genes steadily associated with cancer prognosis regardless of sample size and other factors. Here, we used the MEMORY method to address this question and found that all the cancer types analyzed have GEARs. We observed significant variation in the number of GEARs among cancer types, indicative of cancer type-specific patterns. The substantial heterogeneity in driver genes and mortality rates among cancer types could potentially explain this phenomenon. For example, THCA, known for its low malignancy, displays the fewest genetic expression alterations among all the studied cancer types. This observation could be related to the relatively high five-year survival rate in THCA, which exceeds 50% even at an advanced stage63. In contrast, PRAD, despite a favorable prognosis similar to that of THCA, exhibits a significantly higher number of genetic expression alterations. We hypothesize that the favorable prognosis in PRAD might be largely attributed to early diagnosis, which enables timely treatment for the majority of PRAD patients64. This discrepancy between the number of genetic alterations and prognosis in THCA and PRAD highlights the intricate nature of cancer genetics and the importance of personalized considerations in cancer treatment and prognosis evaluation. Furthermore, certain genes are commonly found in the GEARs of several cancer types, indicating their potential significance in cancer development. For instance, BEX4 is present in the GEARs of LUAD, KIRC, LGG, PAAD, and STAD, and is known to play an oncogenic role by inducing carcinogenic aneuploid transformation through modulation of α-tubulin acetylation65,66. Aneuploidy is a hallmark of cancer and can alter the dosage of oncogenes or tumor suppressor genes, thereby influencing tumor initiation and progression67-70.
The regulation of aneuploid transformation by BEX4 may represent a common mechanism through which this gene affects prognosis across different cancer types. We also found that GEARs across different cancers displayed distinct functional characteristics, with mitosis-related and immune-related features as a recurrent theme. Specifically, the GEARs of LGG, LIHC, and LUAD were enriched in mitosis-related pathways, whereas those of BRCA and UCEC were enriched in immune-related pathways. Mitosis and immune processes are widely recognized as having a significant impact on patient prognosis71-75. However, current clinical guidelines do not yet recommend using transcriptomic data to score mitotic and immune pathways for predicting patient outcomes, despite the well-established association between these pathways and prognosis in cancer76-81. Cancers enriched in mitosis pathways often exhibit heterogeneous tumor growth kinetics across individuals, with tumor size being one of the crucial factors influencing patient survival82-84. For instance, in LGG, the growth rate of the tumor directly affects the patient’s prognosis because of its primary location in the brain. Consequently, alterations in the expression levels of mitosis-related genes are often indicative of the prognosis of LGG85. Cancer types that are enriched in immune-related pathways, such as BRCA and UCEC, are closely associated with deficiencies in DNA mismatch repair (MMR)86,87. Cancers enriched in immune-related pathways often exhibit heterogeneous tumor growth kinetics across individuals, where tumor size is closely correlated with patient survival.

To explore the underlying differences between samples associated with distinct mitotic and immune-related pathways, we used GEAR analysis to identify hub genes for further molecular classification and mutation analysis. We focused on LUAD and BRCA, two representative cancer types whose GEARs were enriched in mitosis and immune signatures, respectively. Based on hub gene expression, we divided LUAD into three subgroups, ML, MM, and MH, which displayed significant differences in survival outcomes. We then used NeST to analyze the mutations within these subgroups and observed significant differences in cell cycle and signal transduction-related pathways among them. In particular, the PIK3CA-related pathway emerged as a key differentiating pathway between the MH and MM subgroups. Our analyses suggest that a LUAD cell line carrying a PIK3CA mutation may exhibit increased drug resistance. This aligns with previous studies demonstrating that targeting the PI3K pathway can overcome drug resistance88,89. Future research is warranted to elucidate the exact functional role of PIK3CA mutations in LUAD.

To investigate the factors influencing immune infiltration in BRCA, we classified the samples based on hub genes and analyzed the differentially mutated genes. Interestingly, we found that CDH1 mutation occurred at a high frequency in the IH subgroups. This led us to speculate that CDH1 plays a crucial role in the regulation of immune infiltration in BRCA. Furthermore, previous studies have reported that CDH1 inactivation promotes immune infiltration in breast cancer60. CDH1 is a vital gene associated with EMT, and various studies have demonstrated the close relationship between EMT and the tumor immune microenvironment in different cancers90. Consistently, our results also supported the correlation between EMT score and immune infiltration in BRCA. These findings suggest that the mutations in PIK3CA and CDH1, identified through GEAR analysis, have significant impacts on cancer development and hold potential value in improving clinical therapies. This further emphasizes the importance of GEARs in understanding cancer biology and guiding treatment strategies.

Lastly, we investigated the prognostic predictive capabilities of the mitosis score and immune score at the pan-cancer level. Surprisingly, we found that approximately half of the cancer types exhibited significant correlations between these two scores and patient prognosis. Interestingly, even for cancers originating from the same primary location, their correlations with these scores could differ, indicating the potential diversity of mechanisms underlying cancer-related mortality. For instance, both LGG and GBM are brain tumors, but the primary risk factor for patient prognosis in LGG is closely linked to tumor diameter, whereas the main risk factor for GBM patients lies in its high invasiveness and challenge of surgical resection91,92.

Despite the strong association between GEARs and patient prognosis, the edges of the CSN constructed from GEARs are undirected. As a result, the hub genes identified in the CSN may primarily serve as stable and effective biological markers. Additionally, through multi-omics analysis, we obtained some functional mutations, but the therapeutic significance of these mutations remains to be elucidated. For example, further study is needed to understand the exact role of PIK3CA mutations in promoting tumor cell proliferation and drug resistance. Similarly, the association of CDH1 mutations with the infiltration of multiple immune cell types warrants additional experimental investigation. In future studies, we plan to use protein-protein interaction networks and pathway databases to construct a new network based on the CSN. This approach will allow us to directly screen GEARs that could potentially serve as therapeutic targets.

In conclusion, our study utilized the MEMORY algorithm to identify GEARs in 15 cancer types, highlighting the significance of mitosis and immunity in cancer prognosis. Our findings demonstrate that GEARs possess substantial biological significance beyond their role as prognostic biomarkers. This study provides valuable guidance for establishing standards for survival analysis evaluation and holds potential for the development of novel therapeutic strategies.

Methods

Datasets

The gene expression profiles and clinical information of 33 cancer types were obtained from the TCGA database via the GDC Data Portal (https://portal.gdc.cancer.gov/). All gene expression data and survival data were integrated and normalized.

Multi-gradient permutation survival analysis

Gene expression data of cancer patients were obtained from the TPM-normalized RNA-seq expression matrix of TCGA. We randomly sampled the gene expression data along a gradient of sample sizes. Ten sample sizes were pre-set, increasing from approximately 10% to approximately 100% of the cohort in steps of 10%. For each pre-set sample size, random samples were drawn from all samples of each cancer type 1000 times, yielding a set of sampling matrices for each gradient. Survival analysis was performed on every sampling matrix with the R packages “survival” and “survminer”: for each gene, the samples in a sampling matrix were divided into high- and low-expression groups at the median expression value of that gene, and the two groups were compared. Significant results (P < 0.05) were recorded as 1 and non-significant results (P ≥ 0.05) as 0. These steps produced the survival analysis matrices.
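The sampling scheme described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R implementation: a two-group log-rank test stands in for the test run through “survival” and “survminer” (the text does not name the test), sampling is assumed to be without replacement, samples with expression above the median are assumed to form the high group, and the names logrank_p and memory_indicator are hypothetical.

```python
import math
import random
from statistics import median


def logrank_p(times, events, groups):
    """Two-group log-rank p-value; events and groups hold 0/1 values."""
    observed = expected = variance = 0.0
    # Walk through the distinct times at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if e and ti == t and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def memory_indicator(expr, times, events, fraction, n_perm, rng):
    """Return one 0/1 significance flag per random subsample for one gene."""
    n = len(expr)
    k = max(2, round(fraction * n))
    flags = []
    for _ in range(n_perm):
        idx = rng.sample(range(n), k)          # assumed: without replacement
        cut = median(expr[i] for i in idx)     # median split within the subsample
        groups = [1 if expr[i] > cut else 0 for i in idx]
        p = logrank_p([times[i] for i in idx], [events[i] for i in idx], groups)
        flags.append(1 if p < 0.05 else 0)
    return flags
```

Calling memory_indicator once per gene and per sampling fraction (0.1, 0.2, and so on up to 1.0) with n_perm set to 1000 would produce one row of each survival analysis matrix.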

Construction of core survival network

The survival analysis matrices were integrated into a significant-probability matrix by the following formula:

where Aij is the value in row i and column j of the significant-probability matrix. We defined the sampling size kj as having reached saturation when the maximum value of column j in the significant-probability matrix was equal to 1. The smallest such kj was selected, and the genes whose corresponding Aij was greater than 0.8 were extracted as GEARs.
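The selection rule above can be illustrated with a short Python sketch. It assumes that Aij is the fraction of significant results (1s) among the permutations for gene i at sampling size kj; this is one reading consistent with the definitions in the text, not a formula confirmed by it. The function names and the dictionary layout are hypothetical.

```python
def significant_probability(indicators):
    """indicators[gene][j] is the 0/1 list over permutations at gradient j.

    Returns, per gene, the assumed A_ij: the share of significant results.
    """
    return {
        gene: [sum(runs) / len(runs) for runs in per_gradient]
        for gene, per_gradient in indicators.items()
    }


def select_gears(prob, threshold=0.8):
    """Find the smallest saturated gradient and the genes above threshold there."""
    n_gradients = len(next(iter(prob.values())))
    for j in range(n_gradients):
        # Saturation: some gene is significant in every permutation at gradient j.
        if max(row[j] for row in prob.values()) == 1.0:
            gears = sorted(g for g, row in prob.items() if row[j] > threshold)
            return j, gears
    return None, []
```

In this sketch the returned index j identifies the smallest saturated sampling size, and the returned list holds the genes that pass the 0.8 threshold at that size.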

We also defined survival analysis similarity (SAS) as the similarity between two genes in their effect on patient prognosis. SAS is computed as follows:

where A and B are the results of the 1000 survival analyses at the first sampling gradient for two genes, each a sequence of 0s and 1s of length 1000. |A ∩ B| is the number of analyses in which A and B are both 1, and |A ∪ B| is the size of the union of A and B. SAS was calculated between each significant gene and all genes, and the significant survival network was constructed. The top-ranked SAS values were then extracted and visualized as a network with Cytoscape93. This network was named the core survival network (CSN). The degree of each node was calculated, and the nodes with the top 10 degrees were defined as hub genes. Molecular classification by hub genes performed best when between 1000 and 2000 SAS values were used; therefore, the top 1000 SAS values were selected to construct the significant survival network.
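A minimal Python sketch of the SAS and hub-gene steps follows. It assumes SAS is the ratio of the two quantities defined above (a Jaccard index), which the definitions suggest but the text does not state explicitly. For brevity it scores all gene pairs in the input rather than significant genes against all genes, and the function names and parameters n_edges and n_hubs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sas(a, b):
    """Assumed SAS: analyses where both are 1, divided by analyses where either is 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def hub_genes(results, n_edges, n_hubs):
    """Keep the n_edges highest-SAS pairs as edges, then rank genes by degree."""
    pairs = sorted(
        ((sas(results[g], results[h]), g, h)
         for g, h in combinations(sorted(results), 2)),
        reverse=True,
    )
    degree = Counter()
    for _, g, h in pairs[:n_edges]:
        degree[g] += 1
        degree[h] += 1
    return [gene for gene, _ in degree.most_common(n_hubs)]
```

With the settings described in the text, n_edges would be 1000 and n_hubs would be 10.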

Gene ontology enrichment analysis

Gene ontology is used to classify genes based on their functions. Gene functions are divided into three categories: molecular function (MF), biological process (BP), and cellular component (CC). ClusterProfiler is an R package for gene set enrichment analysis94. Gene ontology enrichment analysis (GOEA) was performed on significant gene sets and hub genes with ClusterProfiler.

Hub gene classification

Tumor samples were genotyped using RNA-seq data, as follows. The expression matrix of hub genes for each cancer type was extracted from the RNA-seq data by filtering the data to retain only the expression levels of the identified hub genes. To normalize the expression data and reduce the impact of extreme values, a pseudocount of 1 was added to each expression value, and the resulting matrix M was log-transformed using the formula: M = log2(M + 1). The log-transformed matrix M was then subjected to hierarchical clustering using Ward's method, which minimizes the variance within clusters, with Euclidean distance as the distance metric. The number of clusters was determined based on the dendrogram cut-off.
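The transformation and clustering steps above can be sketched in plain Python. This is a simplified illustration, not the authors' pipeline: it applies the Ward merging criterion greedily and stops at a requested number of clusters k, whereas the text determines the number of clusters from a dendrogram cut-off. The function names and the parameter k are hypothetical.

```python
import math


def log_transform(matrix):
    """Apply M = log2(M + 1) to every expression value."""
    return [[math.log2(v + 1) for v in row] for row in matrix]


def ward_clusters(rows, k):
    """Greedy Ward agglomeration of samples (rows) down to k clusters."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(r)) for i, r in enumerate(rows)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward criterion: increase in within-cluster sum of squares.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        merged = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(merged)
                    for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((merged, centroid))
    return sorted(sorted(members) for members, _ in clusters)
```

Each row passed to ward_clusters represents one tumor sample described by its log-transformed hub gene expression values.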

Calculation of tumor mutation burden (TMB) and differential mutation

TMB and differential mutation gene analyses were carried out to explore the differences between genotyped sample groups at the genomic level. The data were the gene mutation data of TCGA tumor samples, and the grouping information came from the hub gene hierarchical clustering results. Maftools is an R package for the analysis of somatic variant data that can export results in the form of charts and graphs95. The TMB calculation and differential mutation analysis were performed with maftools.

Quantifying the effect of gene mutations for tumor cell viability

Gene dependence refers to the extent to which genes are essential for cell proliferation and survival. The Cancer Dependency Map (DepMap, https://depmap.org/portal/) database provides genome-wide gene dependence data for a large number of tumor cell lines96. Gene mutation data of cell lines were obtained from the CCLE database (https://sites.broadinstitute.org/ccle). The genes appearing in the gene mutation data of cell lines were extracted and compiled into a mutation list. For each gene in the mutation list, we compared the gene dependence score (S) between cell lines with and without a mutation in that gene. Specifically, cell lines from the CCLE mutation dataset harboring a mutation in a given gene were classified into one group, while cell lines without the mutation constituted the control group. The scores S for the mutant (Sm) and wild-type (Swt) cell lines were then extracted from the DepMap database.

A two-sample t-test was used to assess the statistical significance of the difference in means between Sm and Swt, with an alpha level of 0.05. Dm was defined as gene mutation dependence and was computed as follows:

Subsequently, we conducted a standardization assessment of Dm, employing the following formula:

In this study, we categorized mutations identified in cell lines into three groups based on their impact on gene dependency. Functional mutations were those that exhibited significant changes in gene dependency scores, with a P-value < 0.05 and Sdm > 0.1. Non-functional mutations were those without significant changes in gene dependency scores compared with wild type. Subtype-associated functional mutations were the subset of functional mutations that showed statistically significant differences in occurrence frequency across different LUAD subtypes.
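The group comparison behind this classification can be sketched as follows. The sketch only covers splitting cell lines into mutant and wild-type groups and computing a two-sample t statistic. It assumes Welch's unequal-variance form, which the text does not specify, and it does not attempt Dm or Sdm because their formulas are not reproduced here. The P-value would be read from a t distribution with the returned degrees of freedom using a statistics library. All function and variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_scores(dependency, mutated_lines):
    """Split per-cell-line dependence scores for one gene into Sm and Swt."""
    sm = [s for line, s in dependency.items() if line in mutated_lines]
    swt = [s for line, s in dependency.items() if line not in mutated_lines]
    return sm, swt


def welch_t(sm, swt):
    """Welch t statistic and degrees of freedom (each group needs >= 2 values)."""
    va = variance(sm) / len(sm)     # squared standard error, mutant group
    vb = variance(swt) / len(swt)   # squared standard error, wild-type group
    t = (mean(sm) - mean(swt)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(sm) - 1) + vb ** 2 / (len(swt) - 1))
    return t, df
```

A negative t statistic here means the mutant cell lines have lower dependence scores on average than the wild-type cell lines.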

Identification of drugs downregulating hub gene expression

The IC50 data for 76 compounds in LUAD cell lines were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) database97. The hub gene expression data of the A549 cell line after treatment with the 76 compounds were obtained from the Connectivity Map (cMAP, https://clue.io/) database, and the data were grouped by hierarchical clustering98. Drugs that were present only in the hub gene expression suppression group were considered to be effective against LUAD cells.

Immune infiltration analysis

Immune infiltration was estimated with the quantiseq method99. Immunedeconv is an R package that provides unified access to computational methods for estimating immune cell fractions from bulk RNA-seq data100. The input data were the TPM-normalized gene expression data of TCGA tumor samples. The immune cell fractions of tumor samples were calculated with the quantiseq method as invoked through immunedeconv.

Biological function score calculation

GSVA is an R package for calculating biological function scores for single samples101. Scores were calculated from TCGA data with the GSVA package. The immune score was the T-cell infiltration rate calculated by quantiseq99. The gene set used to calculate the mitosis score was obtained from Gene Ontology (GO:0140014).

Acknowledgements

This work was supported by the National Key Research and Development Program of China (grants 2022YFA1103900 to H.J., F.L.; 2020YFA0803300 to H.J.); the National Natural Science Foundation of China (grants 82341002 to H.J., 32293192 to H.J., 82030083 to H.J., 82173340 to L.H., 82273400 to Y.J., 32100593 to X.T., 82203306 to W.H., 82372763 to X.W., 82303916 to C.G., 82303039 to Z.Q., 82141101 to F.L., 2022hwyq16 to F.L., 82372794 to F.L.); the Basic Frontier Scientific Research Program of Chinese Academy of Science (ZDBS-LY-SM006 to H.J.); the Innovative research team of high-level local universities in Shanghai (SSMU-ZLCX20180500 to H.J.); Science and Technology Commission of Shanghai Municipality (21ZR1470300 to L.H.).

Author Contributions

H.J. and F.L. conceived the idea and designed the analysis. X.C. performed all bioinformatics analyses. Y.Y., X.L. and L.C. provided technical assistance and helpful comments. H.J., F.L. and X.C. wrote the manuscript. All authors approved the final version.

Disclosure of conflicts of interest

The authors declare that they have no conflicts of interest.