TPGs were identified using the GSE53786 dataset.

(A) The range of tumor purity (17.2%–67.4%) across samples in GSE53786. (B) The heatmap of genes defined as TPGs. (C) The K–M curve showed that high tumor purity was associated with poor prognosis in DLBCL patients in the GSE53786 dataset (patients were divided into two groups at the optimal cutoff determined with the R package “survminer”). (D) GO analysis of TPGs. (E) KEGG analysis of TPGs. TPGs, tumor purity-related genes; K–M, Kaplan-Meier; DLBCL, diffuse large B cell lymphoma; GO, gene ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes.
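
The optimal cutoff in panel (C) comes from survminer, whose `surv_cutpoint()` function relies on maximally selected rank statistics from the maxstat package. The Python sketch below only approximates that idea under stated assumptions: it scans candidate cutoffs of a continuous variable (here, tumor purity), computes the two-group log-rank chi-square at each one, and keeps the cutoff with the largest statistic, skipping splits where either group holds less than `minprop` of the samples (mirroring survminer's default of 0.1). The function names and toy data are hypothetical, and unlike maxstat no adjustment for testing many cutoffs is applied.

```python
"""Sketch: pick a survival cutoff by maximizing the two-group log-rank statistic.

Approximates the idea behind survminer::surv_cutpoint (maxstat); it is not the
same standardized statistic and applies no p-value adjustment.
"""


def logrank_chi2(times, events, in_group1):
    """Two-group log-rank chi-square (1 df) for right-censored data."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, overall and in group 1.
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, in_group1) if ti >= t and g)
        # Events occurring exactly at time t, overall and in group 1.
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, in_group1) if ti == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the current risk set.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(values, times, events, minprop=0.1):
    """Return (cutoff, chi2) for the split with the largest log-rank statistic.

    Samples with value > cutoff form the 'high' group (an assumption).
    """
    n = len(values)
    best = None
    for cutoff in sorted(set(values)):
        high = [v > cutoff for v in values]
        n_high = sum(high)
        # Skip splits that leave either group smaller than minprop * n.
        if min(n_high, n - n_high) < minprop * n:
            continue
        stat = logrank_chi2(times, events, high)
        if best is None or stat > best[1]:
            best = (cutoff, stat)
    return best


# Toy data (invented for illustration): higher purity, earlier events.
purity = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
time_months = [100, 100, 100, 100, 100, 5, 6, 7, 8, 9]
event = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(best_cutoff(purity, time_months, event))
```

Because the cutoff is chosen to maximize group separation, a plain log-rank p-value at that cutoff is optimistic; maxstat offers adjusted p-values for this reason.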

Key candidate genes for constructing the prognostic model were selected.

(A) The PPI network of TPGs (orange nodes represent genes positively correlated with tumor purity; green nodes represent genes negatively correlated with tumor purity). (B) The bar plot of hub genes with five or more interacting genes. (C) The forest plot of prognostic TPGs in DLBCL patients in GSE53786. (D) The Venn diagram showing the intersection of PPI hub genes and prognostic TPGs. PPI, protein-protein interaction.
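
Panel (B) keeps genes with five or more interaction partners, and panel (D) intersects them with the prognostic TPGs. A minimal Python sketch of those two filters follows, assuming the PPI network is available as an undirected edge list; the function names and gene names are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict


def hub_genes(edges, min_degree=5):
    """Genes with at least `min_degree` distinct interaction partners."""
    partners = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            partners[a].add(b)
            partners[b].add(a)
    return {gene for gene, nb in partners.items() if len(nb) >= min_degree}


def candidate_genes(edges, prognostic_tpgs, min_degree=5):
    """Intersection of PPI hub genes and prognostic TPGs (the Venn overlap)."""
    return hub_genes(edges, min_degree) & set(prognostic_tpgs)


# Placeholder network: HUB1 has five partners, GENE_X only two.
ppi_edges = [("HUB1", f"GENE_{i}") for i in range(1, 6)]
ppi_edges += [("GENE_X", "GENE_1"), ("GENE_X", "GENE_2")]
prognostic = ["HUB1", "GENE_X", "GENE_Y"]
print(candidate_genes(ppi_edges, prognostic))
```

Storing partners in sets means a duplicated or reversed edge is counted only once toward a gene's degree.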

The TPG signature prognostic model was constructed.

(A) The three genes included in the prognostic model. (B) Patients in the GSE53786 dataset were divided into high- and low-risk groups according to the median riskScore calculated from the prognostic model. (C) The high-risk group had a worse prognosis than the low-risk group. (D) The heatmap showing expression differences of the three genes. (E) Survival analysis revealed that the high-risk group had a poor prognosis in the GSE53786 dataset. (F) The ROC curves showed that the prognostic model performed well in predicting 1-, 3- and 5-year prognosis in the GSE53786 dataset. (G) Tumor purity was positively correlated with riskScore in the GSE53786 dataset. ROC, receiver operating characteristic.
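
The caption states only that patients were split at the median riskScore. Prognostic signatures of this kind usually define the score as a linear predictor, riskScore = Σ βi × xi, where xi is the expression of gene i and βi its Cox regression coefficient; the Python sketch below assumes that form, uses made-up gene names and coefficients, and assigns samples that tie with the median to the low-risk group (also an assumption).

```python
from statistics import median


def risk_scores(samples, coefs):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return [sum(beta * sample[gene] for gene, beta in coefs.items()) for sample in samples]


def median_split(scores):
    """Label scores above the median 'high' and the rest (including ties) 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical coefficients and expression values, for illustration only.
coefs = {"GENE_A": 0.5, "GENE_B": -0.2, "GENE_C": 0.3}
samples = [
    {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 0.0},
    {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 0.0},
    {"GENE_A": 0.0, "GENE_B": 0.0, "GENE_C": 1.0},
    {"GENE_A": 2.0, "GENE_B": 0.0, "GENE_C": 1.0},
]
scores = risk_scores(samples, coefs)
print(scores, median_split(scores))
```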

The riskScore of the three-TPG signature prognostic model was an independent prognostic factor in DLBCL patients.

(A) Survival analysis results in the GSE32918 dataset were consistent with those in the GSE53786 dataset. (B) The prognostic model also performed well in the GSE32918 dataset. (C) The high-risk group in the GSE53786 dataset contained more ABC-type DLBCL, whereas the low-risk group contained more GCB-type DLBCL. (D) ECOG performance status did not differ significantly between the two groups (GSE53786 dataset). (E) Compared with the low-risk group, the high-risk group included more patients at stage III or IV and fewer patients at stage I, although the difference was not statistically significant (GSE53786 dataset). (F) The high-risk group had a higher LDH ratio (GSE53786 dataset). (G) The riskScore was associated with poor prognosis of DLBCL patients in the GSE53786 dataset. (H) The riskScore was an independent prognostic factor for DLBCL patients. * p < 0.05, ** p < 0.01; ns, not significant. ABC, activated B cell; GCB, germinal center B cell; ECOG, Eastern Cooperative Oncology Group; LDH, lactate dehydrogenase.
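
Time-dependent ROC analysis at a horizon t, such as the 1-, 3- and 5-year assessments of this model, treats patients with an event by t as cases and patients still event-free after t as controls. The Python sketch below computes a naive version of that AUC: it simply drops patients censored before t instead of applying the inverse-probability-of-censoring weights used by packages such as the R package timeROC. Which package the study used is not stated, and the function name and toy data are hypothetical.

```python
def naive_time_auc(scores, times, events, horizon):
    """Cumulative/dynamic AUC at `horizon` without censoring weights.

    Cases: event at or before the horizon. Controls: still under follow-up
    after the horizon. Patients censored before the horizon are dropped.
    """
    cases = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0  # higher risk for the patient who failed earlier
            elif case_score == control_score:
                concordant += 0.5  # ties count half
    return concordant / (len(cases) * len(controls))


# Toy example (invented): times in years, status 1 = event, 0 = censored.
risk = [3.0, 2.0, 1.0, 0.5, 2.5]
years = [1.0, 2.0, 5.0, 6.0, 0.5]
status = [1, 1, 0, 1, 0]
print(naive_time_auc(risk, years, status, horizon=3.0))
```

Dropping early-censored patients is simple but can bias the estimate when censoring is heavy, which is why weighted estimators are preferred in practice.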

Analysis of the CHCAMS cohort.

(A) The representative images of VCAN staining in the high- and low-expression groups, and survival analysis based on VCAN expression. (B) The representative images of CD3G staining in the groups with high and low CD3G+ T cell ratios, and survival analysis based on the CD3G+ T cell ratio. (C) The representative images of C1QB staining in the high- and low-expression groups, and survival analysis based on C1QB expression. (D) The representative images of CD68 staining in the groups with high and low CD68+ macrophage ratios, and survival analysis based on the CD68+ macrophage ratio. (E) The representative images of CD8 staining in the groups with high and low CD8+ T cell ratios, and survival analysis based on the CD8+ T cell ratio. (F) The representative images of CD4 staining in the groups with high and low CD4+ T cell ratios, and survival analysis based on the CD4+ T cell ratio. (G) Correlations among VCAN expression, the CD3G+ T cell ratio, C1QB expression, the CD68+ macrophage ratio, the CD8+ T cell ratio and the CD4+ T cell ratio. “×” indicates no statistical significance. (H–I) GSEA based on the differentially expressed genes between the high-risk and low-risk groups in GSE53786 and GSE32918. (J) CD3G+ T cell infiltration varied among DLBCLs originating from different sites, from colon to testis, in male patients. (K) VCAN expression differed between nodal and extranodal DLBCL. GSEA, gene set enrichment analysis.
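
Panel (G) reports pairwise correlations among the immunohistochemistry measurements. The caption does not state which coefficient was used; the Python sketch below assumes Spearman's rank correlation, computed as the Pearson correlation of average ranks so that tied values (common with semi-quantitative staining scores) are handled. Function names and example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Hypothetical staining score and cell ratio for five cases.
vcan_score = [1, 3, 2, 4, 4]
cd3g_ratio = [0.40, 0.20, 0.35, 0.10, 0.15]
print(spearman(vcan_score, cd3g_ratio))
```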

Drug sensitivity prediction revealed therapeutic candidates for the high-risk group.

(A) Drug sensitivity prediction results with statistical significance in the GSE53786 dataset. (B) Drug sensitivity prediction results with statistical significance in the GSE53786 dataset. According to the “oncoPredict” algorithm, the sensitivity score represents the predicted IC50 of each drug; a higher sensitivity score indicates lower sensitivity.
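
The statistical test behind "with statistical significance" is not named. A common choice for comparing predicted IC50 values between two groups is the Wilcoxon rank-sum (Mann–Whitney U) test; the Python sketch below assumes that test, uses the normal approximation without tie or continuity correction, and flags a drug as a candidate for the high-risk group when its predicted score is lower there (lower score meaning higher sensitivity) and p < 0.05. The function names, threshold and example scores are assumptions.

```python
import math
from statistics import median


def mann_whitney(a, b):
    """U statistic for sample `a` and a two-sided normal-approximation p-value.

    No tie correction or continuity correction is applied.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(a), len(b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


def is_high_risk_candidate(high_scores, low_scores, alpha=0.05):
    """Lower predicted IC50 (higher sensitivity) in the high-risk group, and p < alpha."""
    _, p = mann_whitney(high_scores, low_scores)
    return median(high_scores) < median(low_scores) and p < alpha


# Hypothetical predicted sensitivity scores (IC50-like) for one drug.
high_risk = [1.0, 2.0, 3.0, 4.0, 5.0]
low_risk = [6.0, 7.0, 8.0, 9.0, 10.0]
print(mann_whitney(high_risk, low_risk), is_high_risk_candidate(high_risk, low_risk))
```

With small groups or many ties, an exact or tie-corrected test is more appropriate than this normal approximation.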