A multi-gene predictive model for the radiation sensitivity of nasopharyngeal carcinoma based on machine learning

  1. Kailai Li
  2. Junyi Liang
  3. Nan Li
  4. Jianbo Fang
  5. Xinyi Zhou
  6. Jian Zhang (corresponding author)
  7. Anqi Lin (corresponding author)
  8. Peng Luo (corresponding author)
  9. Hui Meng (corresponding author)
  1. Department of Oncology, Zhujiang Hospital, Southern Medical University, China
  2. Department of Radiation Oncology, Nanfang Hospital, Southern Medical University, China
5 figures and 5 additional files

Figures

Figure 1
Flowchart for constructing a predictive model for NPC radiotherapy sensitivity.

Differentially expressed genes between the radiosensitive and radioresistant groups in local nasopharyngeal carcinoma (NPC) transcriptome data were used to predict the radiotherapy sensitivity score (NPC-RSS) of NPC patients with 12 machine learning algorithms: Lasso, Ridge, Enet, Stepglm, SVM, glmBoost, LDA, plsRglm, RandomForest, GBM, XGBoost, and NaiveBayes. Additionally, 48 combined validation frameworks were constructed to predict the NPC-RSS. The final NPC-RSS was built on the glmBoost + NaiveBayes combination, which yielded the best AUC. The role and biological significance of NPC-RSS in NPC radiotherapy sensitivity were then explored through tumor immune microenvironment analysis, pathway enrichment analysis, and single-cell transcriptomic analysis.

Figure 2 with 1 supplement
Consensus NPC-RSS construction and validation using an integrated machine learning approach.

(A) Based on 48 combined validation frameworks, including 12 machine learning algorithms (Lasso, Ridge, Enet, Stepglm, SVM, glmBoost, LDA, plsRglm, RandomForest, GBM, XGBoost, and NaiveBayes), the area under the curve (AUC) of each model was calculated for both the training set (n=34) and the validation set (n=20). Heatmaps were generated based on the sample-weighted AUC values. (B) NPC-RSS genes with absolute weight coefficients greater than 10, based on the glmBoost and NaiveBayes combination. (C) Receiver operating characteristic (ROC) curves for the training and validation sets. (D) Volcano plot depicting differentially expressed genes between the sensitive and resistant groups of the CNE-2 cell line. (E) Timeline of resistance strain induction in the CNE-2 cell line. (F) Expression of the top 5 weighted genes in the NPC-RSS signature for the CNE-2 cell line. Data are presented as mean ± SD (n=3 biological replicates). Statistical significance was determined by Student's t-test (*p < 0.05, **p < 0.01, ***p < 0.001). (G) Heatmap of differentially expressed genes between the sensitive and resistant groups of the CNE-2 cell line. (H) Heatmap of z-scores for NPC-RSS genes in the CNE-2 cell line. (I) Analysis of differences in sensitivity scores among CNE-2 cell lines. Data are presented as mean ± SD (n=3 biological replicates). *p < 0.05, **p < 0.01, ***p < 0.001.
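
The caption for panel (A) does not spell out how the sample-weighted AUC is computed. The sketch below shows one plausible reading, assuming each cohort's AUC is weighted by its sample size, so that for the two sets above the value would be (34 × AUC_train + 20 × AUC_validation) / 54. The function names `auc_from_scores` and `sample_weighted_auc` are hypothetical, and the weighting scheme is an assumption rather than the authors' documented method.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sample_weighted_auc(cohorts):
    """Average per-cohort AUCs, weighting each cohort by its sample count.

    cohorts: list of (labels, scores) pairs, one pair per cohort.
    Weighting by cohort size is an assumption made for this sketch.
    """
    total = sum(len(labels) for labels, _ in cohorts)
    weighted = sum(
        len(labels) * auc_from_scores(labels, scores)
        for labels, scores in cohorts
    )
    return weighted / total
```

Under this reading, a model that does well only on the smaller validation set cannot outrank one that does well on the larger training set by the same margin, which is one reason a size-weighted summary might be preferred over a plain mean.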

Figure 2—figure supplement 1
Expression of 18 model genes grouped according to NPC-RSS in our center's transcriptome data.

Data are presented as mean ± SD (n=34). Statistical significance was determined by Student's t-test (*p < 0.05, **p < 0.01, ***p < 0.001).
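
The asterisk notation used throughout these captions maps a p-value to a significance level. As a minimal sketch of that convention, assuming strict thresholds exactly as written in the captions, the hypothetical helper `significance_stars` below returns the matching marker. The empty string for non-significant results is an assumption, since the captions do not define a label for that case.

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk notation used in the captions:
    * for p < 0.05, ** for p < 0.01, *** for p < 0.001."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    # Assumption: no marker is shown when the result is not significant.
    return ""
```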

Figure 3
Annotation analysis of in-house NPC cohorts based on NPC-RSS predictive grouping with immune-related features.

(A) Comparison of immune cell infiltration between NPC-sensitive and resistant tissue groups (n=34). *p<0.05, **p<0.01, ***p<0.001. (B) Bubble plot depicting the correlation between the top 5 weight-ranked gene features in the NPC-RSS (SMARCA2, DMC1, CD9, PSG4, KNG1) and tumor-immune-infiltrating cells in the radiotherapy-sensitive group. Bubble size represents the proximity of the p-value to zero, with orange and blue colors indicating the strength of positive and negative correlations, respectively. (C) Analysis of interactions among 22 different immune cell types in patients from the NPC-sensitive group. *p<0.05, **p<0.01, ***p<0.001. (D) Correlation analysis of the top 5 weight-ranked genes in the NPC-RSS (SMARCA2, DMC1, CD9, PSG4, KNG1) with functionally diverse immune genes in the radiotherapy-sensitive group. *p<0.05, **p<0.01, ***p<0.001.

Figure 4
Biological characterization of key NPC-RSS genes at the single-cell level.

(A) Clustered UMAP plot of three radiotherapy-sensitive and one radiotherapy-resistant sample with a total of 28,957 cells (n=4). Each color represents a cellular subpopulation (see cellular subpopulation annotations on the right). (B) Myeloid cells, epithelial cells, fibroblasts, and mast cells were significantly more abundant in samples from the radiotherapy-sensitive group compared to the radiotherapy-resistant group. (C) NPC-RSS scores displayed on all cells, with redder colors indicating higher scores. (D) NPC-RSS model gene expression in all cell subpopulations. Redder colors indicate higher expression, while bluer colors indicate lower expression. (E) Histogram showing the percentage of seven cell subpopulations in the radiotherapy-sensitive and radiotherapy-resistant groups. Different colors indicate different cell subpopulations. (F) Expression of marker genes used to annotate various cell subpopulations.

Figure 5 with 1 supplement
GSVA and GSEA analyses of NPC-RSS key genes and their correlation with radiosensitization genes.

(A) GSVA of SMARCA2. (B) GSVA of DMC1. (C) GSEA of SMARCA2. (D) GSEA of DMC1. (E) Pearson correlation bubble plot of the top five NPC-RSS genes (SMARCA2, DMC1, CD9, PSG4, and KNG1) with radiosensitization-related genes. The larger the bubble, the closer the p-value is to zero; the more orange the color, the stronger the positive correlation; the more green the color, the stronger the negative correlation.
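
For readers who want to see what panel (E) encodes, the following is a minimal sketch of a Pearson correlation coefficient, which drives the bubble color, together with one possible way to turn a p-value into a bubble size. The names `pearson_r` and `bubble_size` are hypothetical, and the -log10(p) scaling is an assumption: the caption says only that larger bubbles mean p-values closer to zero.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def bubble_size(p_value, floor=1e-300):
    """Assumed scaling: smaller p-values give larger bubbles via -log10(p).
    The floor avoids taking the logarithm of zero."""
    return -math.log10(max(p_value, floor))
```

A positive `pearson_r` would correspond to the orange end of the color scale and a negative value to the green end.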

Figure 5—figure supplement 1
Differences in the expression of radiotherapy sensitivity-related genes in NPC radiotherapy-sensitive and resistant groups.

Data are presented as mean ± SD (n=34). Statistical significance was determined by Student's t-test (*p < 0.05, **p < 0.01, ***p < 0.001).

Cite this article

Kailai Li, Junyi Liang, Nan Li, Jianbo Fang, Xinyi Zhou, Jian Zhang, Anqi Lin, Peng Luo, Hui Meng (2025)
A multi-gene predictive model for the radiation sensitivity of nasopharyngeal carcinoma based on machine learning
eLife 13:RP99849.
https://doi.org/10.7554/eLife.99849.3