Flowchart for constructing a predictive model for NPC radiotherapy sensitivity

Differentially expressed genes between the radiosensitive and radioresistant groups were obtained from local NPC transcriptome data and used to build a radiotherapy sensitivity score (NPC-RSS) for NPC patients with 12 machine learning algorithms: Lasso, Ridge, Enet, Stepglm, SVM, glmBoost, LDA, plsRglm, RandomForest, GBM, XGBoost, and NaiveBayes. From these algorithms, 48 combined validation frameworks were constructed, each yielding a candidate NPC-RSS. The final NPC-RSS was built on the glmBoost+NaiveBayes combination, which yielded the best AUC. The role and biological significance of NPC-RSS in NPC radiotherapy sensitivity were then explored through tumor immune microenvironment analysis, pathway enrichment analysis, and single-cell transcriptomic analysis.

Consensus NPC-RSS construction and validation using an integrated machine learning approach

(A) For 48 combined validation frameworks built from 12 machine learning algorithms (Lasso, Ridge, Enet, Stepglm, SVM, glmBoost, LDA, plsRglm, RandomForest, GBM, XGBoost, and NaiveBayes), the area under the curve (AUC) of each model was calculated for both the training and validation sets. Heatmaps were generated based on the sample-weighted AUC values. (B) NPC-RSS genes with absolute weight coefficients greater than 10 in the glmBoost+NaiveBayes combination. (C) Receiver operating characteristic (ROC) curves for the training and validation sets. (D) Volcano plot of differentially expressed genes between the sensitive and resistant groups of the CNE-2 cell line. (E) Timeline of radioresistant strain induction in the CNE-2 cell line. (F) Expression of the top 5 weighted genes of the NPC-RSS signature in the CNE-2 cell line. (G) Heatmap of differentially expressed genes between the sensitive and resistant groups of the CNE-2 cell line. (H) Heatmap of z-scores for NPC-RSS genes in the CNE-2 cell line. (I) Analysis of differences in sensitivity scores among CNE-2 cell lines.
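As an illustration of the sample-weighted AUC in panel (A), the following is a minimal sketch that assumes "sample-weighted" means each cohort's AUC is weighted by its number of samples; the legend does not state the exact weighting. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sample_weighted_auc(cohorts):
    """Average per-cohort AUCs, weighting each cohort by its sample count.

    `cohorts` is a list of (labels, scores) pairs, e.g. training and validation.
    Assumption: weight = number of samples in the cohort.
    """
    total = sum(len(labels) for labels, _ in cohorts)
    return sum(len(labels) * auc(labels, scores) for labels, scores in cohorts) / total


# Toy data (hypothetical): 1 = radiosensitive, 0 = radioresistant.
training = ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
validation = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

print(auc(*training))
print(auc(*validation))
print(sample_weighted_auc([training, validation]))
```

In this reading, one such weighted value per algorithm combination would fill one cell of the heatmap, and the combination with the highest value would be selected.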

Annotation analysis of in-house NPC cohorts based on NPC-RSS predictive grouping with immune-related features. (A) Comparison of immune cell infiltration between NPC-sensitive and resistant tissue groups. *P < 0.05, **P < 0.01. (B) Bubble plot depicting the correlation between the top 5 weight-ranked gene features in the NPC-RSS (SMARCA2, DMC1, CD9, PSG4, KNG1) and tumor-immune-infiltrating cells in the radiotherapy-sensitive group. Bubble size represents the proximity of the P-value to zero, with orange and blue colors indicating the strength of positive and negative correlations, respectively. (C) Analysis of interactions among 22 different immune cell types in patients from the NPC-sensitive group. *P < 0.05, **P < 0.01, ***P < 0.001. (D) Correlation analysis of the top 5 weight-ranked genes in the NPC-RSS (SMARCA2, DMC1, CD9, PSG4, KNG1) with functionally diverse immune genes in the radiotherapy-sensitive group. *P < 0.05, **P < 0.01, ***P < 0.001.

Biological characterization of key NPC-RSS genes at the single-cell level. (A) Clustered UMAP plot of three radiotherapy-sensitive and one radiotherapy-resistant sample with a total of 28,957 cells. Each color represents a cellular subpopulation (see cellular subpopulation annotations on the right). (B) Myeloid cells, epithelial cells, fibroblasts, and mast cells were significantly more abundant in samples from the radiotherapy-sensitive group compared to the radiotherapy-resistant group. (C) NPC-RSS scores displayed on all cells, with redder colors indicating higher scores. (D) NPC-RSS model gene expression in all cell subpopulations. Redder colors indicate higher expression, while bluer colors indicate lower expression. (E) Histogram showing the percentage of seven cell subpopulations in the radiotherapy-sensitive and radiotherapy-resistant groups. Different colors indicate different cell subpopulations. (F) Expression of marker genes used to annotate various cell subpopulations.

GSVA and GSEA analyses of NPC-RSS key genes and their correlation with radiosensitization genes. (A) GSVA of SMARCA2. (B) GSVA of DMC1. (C) GSEA of SMARCA2. (D) GSEA of DMC1. (E) Pearson correlation bubble plot of the top five NPC-RSS genes (SMARCA2, DMC1, CD9, PSG4, and KNG1) with radiosensitization-related genes. The larger the bubble, the closer the p-value is to zero; the more orange the color, the stronger the positive correlation; the more green the color, the stronger the negative correlation.
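The Pearson correlation behind each bubble in panel (E) can be sketched as follows. This is a minimal illustration with hypothetical expression values, not the study's data or code. It computes the coefficient r and the associated t statistic; the p-value that sets the bubble size would then follow from a t distribution with n - 2 degrees of freedom, which a statistics library can supply.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n paired observations (n - 2 d.f.)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical expression values for one NPC-RSS gene and one
# radiosensitization-related gene across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 3.0, 2.0, 5.0, 4.0]

r = pearson_r(gene_a, gene_b)
t = t_statistic(r, len(gene_a))
print(r, t)
```

In the plot, the sign and magnitude of r would map to bubble color and the p-value derived from t would map to bubble size.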