Proteogenomic profiling and mutational signatures in XWLC

a. Four cohort datasets used in this study: XWLC (lung adenocarcinoma from non-smoking females in the Xuanwei area), CNLC (subset of lung adenocarcinoma from non-smoking patients in the Chinese Human Proteome Project), TSLC (subset of lung adenocarcinoma from smoking females in the TCGA-LUAD project), TNLC (subset of lung adenocarcinoma from non-smoking females in the TCGA-LUAD project); b. Age distribution of patients at the time of operation in the four cohorts; c. Distribution of tumor stages across the cohorts; d. Data availability for the XWLC datasets. Each bar represents a sample, with orange bars indicating data availability and gray bars indicating data unavailability; e. Summary of data generated from the XWLC cohort; f-i. Mutational signatures identified in the XWLC (f), TSLC (g), TNLC (h), and CNLC (i) cohorts. Cosine similarity of each signature to well-established COSMIC signatures (in green) and Kucab et al. signatures (in red) is shown. The contribution of signatures in each cohort is provided on the right; j. Protein abundance of CYP1A1 in tumor and normal samples within the XWLC cohort; k. Expression levels of the AhR gene in tumor and normal samples within the XWLC cohort; l. Comparison of serum BPDE content in individuals from the Xuanwei area, grouped into young and older cases; m. Western blotting of EGFR-Y1173, EGFR-Y1068 and EGFR abundance in hiPSC (Control), BaP+S9-treated hiPSC (BaP+S9), and S9-treated hiPSC (S9). Liver S9, available from a variety of biological sources, is the post-mitochondrial supernatant fraction of homogenized liver and is known to be a rich source of drug-metabolizing enzymes, including P-450. The two-tailed Wilcoxon rank sum test was used to calculate p-values in j-l.
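The cosine similarity comparison in panels f-i can be illustrated with a minimal sketch. It assumes each signature is a vector of mutation-type proportions; the 6-channel toy profiles and the names `ref_A`, `ref_B`, and `best_match` are hypothetical and are not taken from the study, whose signatures would normally span 96 trinucleotide channels.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_a * norm_b)


def best_match(signature, references):
    """Return (name, similarity) of the closest reference signature."""
    scores = {
        name: cosine_similarity(signature, profile)
        for name, profile in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy profiles (6 channels instead of the usual 96).
toy_references = {
    "ref_A": [0.5, 0.1, 0.1, 0.1, 0.1, 0.1],
    "ref_B": [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],
}
toy_signature = [0.45, 0.15, 0.1, 0.1, 0.1, 0.1]

name, score = best_match(toy_signature, toy_references)
print(name, round(score, 3))
```

A value near 1 indicates that the extracted signature and the reference have nearly the same shape, regardless of the total mutation count.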

Genomic and genetic features in XWLC cohort

a. Correlation of genomic mutations among cohorts, determined using the Pearson correlation coefficient; b. Comparison of oncogenic pathways affected by mutations in each cohort; c. Comparison of mutation frequency of four key genes across cohorts; d. Lollipop plot illustrating differences in mutational sites within EGFR (left) and TP53 (right) across XWLC/CNLC, XWLC/TSLC, and XWLC/TNLC pairs; e. Analysis of the percentage of samples with actionable alterations, with a focus on significant variations between XWLC and TSLC cohorts, highlighted by black boxes.
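As an illustration of the Pearson correlation coefficient used in panel a, the sketch below compares two cohorts by their per-gene mutation frequencies. The choice of per-gene frequencies as the input vector and the toy values are assumptions made for illustration; the legend does not state which mutation features were correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mutation frequencies for the same four genes in two cohorts.
cohort_1 = [0.60, 0.30, 0.10, 0.05]
cohort_2 = [0.50, 0.35, 0.12, 0.04]

r = pearson(cohort_1, cohort_2)
print(round(r, 3))
```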

EGFR-G719X in the XWLC cohort

a. Distribution of different EGFR mutation statuses across the four cohorts; b. Comparison of the fraction of G719X mutations across the four cohorts; c. Detailed information on p.G719X (p.G719A/D/C/S) mutations in the XWLC cohort. The number of each mutation type is labeled; d. Distribution of nucleotide pairs surrounding the most common G>T transversion site in the XWLC cohort. The x-axis represents the immediate bases surrounding the mutated base. For BaP, the tallest G>T peak occurs at GpGpG; e-h. Comparison of activation levels of key components in the MAPK pathway across different EGFR mutation statuses in the XWLC cohort; i. Comparison of patient ages across different EGFR mutation statuses in the XWLC cohort; j-k. Overall survival (OS, j) and progression-free interval (PFI, k) analysis across different EGFR mutation statuses in the TCGA-LUAD cohort; l. Evaluation of kinase activities by KSEA in tumors across different EGFR mutation statuses in the XWLC cohort.

Subtyping of XWLC

a. Integrative classification of tumor samples into four ConsensusClusterPlus-derived clusters (MC-I to MC-IV). The heatmap displays the top 50 features, including mRNA transcripts, proteins, and phosphoproteins, for each multi-omic cluster. The features are annotated with representative pathways or genes. If a cluster has fewer than 50 features, all features are shown. If no significant GO biological processes are associated with cluster features, all features are displayed; b. Comparison of overall survival between MC-IV and the other three subtypes; c-d. Protein abundance comparison of CYP1A1 (c) and AHR (d) across subtypes; e. Protein-level correlation between CYP1A1 and EGFR; f. Protein-level comparison of EGFR across subtypes; g. Evaluation of kinase activities by KSEA in tumors across subtypes in the XWLC cohort.

Biological and immune features across MC subtypes

a. Relative expression of epithelial-mesenchymal transition (EMT) markers across subtypes; b. EMT scores across subtypes using a gene set derived from MSigDB (systematic name M5930); c-d. Protein abundance comparison of FN1 (c) and β-Catenin (d) across subtypes; e. Protein and phosphoprotein levels of key cell cycle kinases across subtypes; f. mRNA and protein expression of glycolysis-associated enzymes; g. mRNA expression of VEGFA across subtypes; h. Phosphoprotein abundance of VEGFR1 across subtypes; i. Angiogenesis score across subtypes using a gene set derived from MSigDB (systematic name M5944); j. Expression comparison of key regulators of the Notch pathway across subtypes; k. Metabolism-associated hallmarks across subtypes. Gene sets for oxidative phosphorylation, peroxisome, adipogenesis, fatty acid metabolism, and xenobiotic metabolism were derived from MSigDB hallmark gene sets; l. Expression of PD-1 signaling-associated genes across subtypes. PD-1 signaling-associated genes were derived from MSigDB (systematic name M18810); m. Immune cell infiltration across subtypes. Gene sets for each immune cell type were derived from a previous study[77]; n. Expression of anti-tumor/pro-tumor lymphocyte receptors and ligands across subtypes. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels b, c, d, g, h, i, and m.

Microbiota composition and radiomic features across subtypes

a. Abundance of bacterial sequencing reads per million (RPM) across tumor molecular subtypes and adjacent normal tissues. Significance was determined using a paired two-sided Wilcoxon rank sum test; b. Principal coordinates analysis (PCoA) plot of mRNA data for XWLC samples, showing significant variation between tumor and adjacent normal samples along the first axis of variation, and variation between the MC-IV subtype and the other three tumor subtypes along the second axis; c. Heatmap of bacterial species that are more abundant in the MC-IV subtype than in the other three subtypes in the XWLC cohort. Bacterial species labeled in green prefer an acidic living environment; d. Eight radiomic features showing significant differences between MC-II and the other three subtypes. The Wilcoxon rank sum test was used to calculate the p-values; e. Receiver operating characteristic (ROC) curve evaluating the performance of the radiomic signature in distinguishing MC-II from the other three subtypes; f. Confusion matrix showing the performance of the algorithm in separating MC-II from the other subtypes.
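The evaluation shown in panels e and f can be sketched as follows. This is a minimal illustration under stated assumptions: the labels (1 for MC-II, 0 for the other subtypes), the classifier scores, and the 0.5 decision threshold are hypothetical and do not come from the study. The area under the ROC curve is computed here through its pairwise-ranking interpretation rather than by tracing the curve.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN when predicting 1 for scores >= threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1
        elif predicted == 0 and y == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical data: 1 = MC-II, 0 = any other subtype.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

auc = roc_auc(toy_labels, toy_scores)
cm = confusion_matrix(toy_labels, toy_scores)
print(round(auc, 3), cm)
```

The ROC curve summarizes performance over all thresholds, whereas the confusion matrix reports it at a single chosen threshold, which is why the two panels complement each other.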

Identification of novel targets in XWLC

a. Flow chart showing the integration of mutation-informed PPI analysis, molecular dynamics simulation and experimental validation to identify novel targets; b. Network visualization of XWLC_oncoPPIs. Edge thickness represents the number of missense mutations at the protein-protein interaction (PPI) interface, while node size indicates connectivity; c. MAD1-MAD2 interaction model and the p.Arg558His mutation at the interface (left). The complex model was generated using ZDOCK protein docking simulation. The distribution on the right shows the root-mean-square deviation (RMSD) during a 20 ns molecular dynamics simulation of MAD1 wild type vs. MAD1 p.Arg558His in the complex; d. Model showing the p.His550Gln alteration within the TPRN-PPP1CA complex (left). The distribution on the right shows the RMSD during a 20 ns molecular dynamics simulation of TPRN wild type vs. TPRN p.His550Gln (H550Q) in the complex; e. Survival analysis of the TPRN mutation group and the unaltered group in the TCGA-LUAD cohort, derived from cBioPortal (https://www.cbioportal.org/); f. CCK8 assay of A549 cells transfected with mutant TPRN (TPRN-MT), wild-type TPRN (TPRN-WT), or empty vector (CK); g. Transwell assay for TPRN-MT, TPRN-WT, and CK after 24h and 36h in A549 cells. Magnification was set to 40x; h. Bar chart showing the statistical results of the transwell assay; i. Cell colony formation assay for TPRN-MT, TPRN-WT and CK in the A549 and H1299 cell lines. The two-tailed Wilcoxon rank sum test was used to calculate p-values in f and h. *, p<0.05; **, p<0.01; ***, p<0.001.
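The RMSD plotted in panels c and d can be illustrated with a short sketch. It assumes that every trajectory frame has already been superimposed on the reference structure and that atoms are listed in the same order; the two-atom toy coordinates and the function names are hypothetical, and a real analysis would use a molecular dynamics toolkit rather than this simplified calculation.

```python
import math


def rmsd(coords, reference):
    """Root-mean-square deviation between two pre-aligned coordinate sets.

    Each argument is a list of (x, y, z) tuples with matching atom order.
    No superposition is performed here (assumption: frames are aligned).
    """
    if not coords or len(coords) != len(reference):
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = 0.0
    for (x, y, z), (rx, ry, rz) in zip(coords, reference):
        squared += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(squared / len(coords))


def rmsd_trajectory(frames, reference=None):
    """RMSD of every frame relative to a reference (default: first frame)."""
    ref = frames[0] if reference is None else reference
    return [rmsd(frame, ref) for frame in frames]


# Hypothetical two-atom trajectory with three frames.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # reference frame
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # both atoms shifted by 1 along z
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],  # one atom moved by 2 along x
]

values = rmsd_trajectory(toy_frames)
print([round(v, 3) for v in values])
```

Comparing the distribution of these per-frame values between the wild-type and mutant complexes is one way to judge whether a mutation changes the stability of the interface.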