| Proteogenomic profiling and mutational signatures in XWLC

a. Four cohort datasets used in this study: XWLC (lung adenocarcinoma from non-smoking females in the Xuanwei area), CNLC (subset of lung adenocarcinoma from non-smoking patients in the Chinese Human Proteome Project), TSLC (subset of lung adenocarcinoma from smoking females in the TCGA-LUAD project), TNLC (subset of lung adenocarcinoma from non-smoking females in the TCGA-LUAD project); b. Age distribution of patients at the time of operation in the four cohorts; c. Distribution of tumor stages across the cohorts; d. Data availability for the XWLC datasets. Each bar represents a sample, with orange bars indicating data availability and gray bars indicating data unavailability. T, tumor sample; N, normal tissue; e. Summary of data generated from the XWLC cohort; f-i. Mutational signatures identified in the XWLC (f), TSLC (g), TNLC (h), and CNLC (i) cohorts. Cosine similarity was used to compare each signature with well-established COSMIC signatures (in green) and Kucab et al. signatures (in red). The contribution of each signature in each cohort is shown on the right; j-k. Protein abundance of CYP1A1 (j) and AhR (k) in tumor and normal samples within the XWLC cohort. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels j-k.
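
For readers unfamiliar with the comparison in panels f-i, the following is a minimal sketch of cosine similarity between two mutational spectra, assuming each signature is represented as a vector of per-channel weights in the same channel order. The function name and the toy four-channel vectors are hypothetical and are not taken from the study.

```python
import math

def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two signature vectors of equal length."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same number of channels")
    dot = sum(x * y for x, y in zip(sig_a, sig_b))
    norm_a = math.sqrt(sum(x * x for x in sig_a))
    norm_b = math.sqrt(sum(y * y for y in sig_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("signatures must not be all-zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical toy spectra (real signatures typically use 96 trinucleotide channels).
extracted = [0.50, 0.30, 0.15, 0.05]
reference = [0.45, 0.35, 0.10, 0.10]
print(round(cosine_similarity(extracted, reference), 3))
```

A value close to 1 indicates that the two spectra have nearly the same shape regardless of their overall scale.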

| Genomic and genetic features in XWLC cohort

a. Correlation of genomic mutations among cohorts, determined using the Pearson correlation coefficient; b. Comparison of oncogenic pathways affected by mutations in each cohort; c. Comparison of mutation frequency of four key genes across cohorts; d. Lollipop plot illustrating differences in mutational sites within EGFR (left) and TP53 (right) across XWLC/CNLC, XWLC/TSLC, and XWLC/TNLC pairs; e. Analysis of the percentage of samples with actionable alterations, with a focus on significant variations between XWLC and TSLC cohorts, highlighted by black boxes.

| EGFR-G719X in the XWLC cohort

a. Distribution of different EGFR mutation statuses across the four cohorts; b. Comparison of the fraction of G719X mutations across the four cohorts. The two-sided Fisher’s exact test was used to calculate p-values; c. Detailed information on p.G719X (p.G719A/D/C/S) mutations in the XWLC cohort. The number of each mutation type is labeled; d. Distribution of nucleotide pairs surrounding the most common G>T transversion site in the XWLC cohort. The x-axis represents the immediate bases surrounding the mutated base. For BaP, the tallest G>T peak occurs at GpGpG; e-h. Comparison of activation levels of key components in the MAPK pathway across different EGFR mutation statuses in the XWLC cohort. N, number of tumor samples containing the corresponding EGFR mutation; i. Comparison of patient ages across different EGFR mutation statuses in the XWLC cohort. N, number of tumor samples containing the corresponding EGFR mutation; j-k. Overall survival (OS, j) and progression-free interval (PFI, k) analysis across different EGFR mutation statuses in the TCGA-LUAD cohort. The log-rank test was used to calculate p-values; l. Evaluation of kinase activities by KSEA in tumors across different EGFR mutation statuses in the XWLC cohort. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels e-i.
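
The cohort comparison in panel b can be illustrated with a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (mutated vs. non-mutated samples in two cohorts). The two-sided rule used here, summing all tables no more probable than the observed one, is a common convention and is an assumption about the exact variant used in the study. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)  # number of ways to fill the first column

    def weight(k):
        # Unnormalised probability of seeing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / total

# Hypothetical counts: rows are cohorts, columns are G719X-mutated / not mutated.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 0.035
```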

| Subtyping of XWLC

a. Integrative classification of tumor samples into four ConsensusClusterPlus-derived clusters (MC-I to MC-IV). The heatmap displays the top 50 features, including mRNA transcripts, proteins, and phosphoproteins, for each multi-omic cluster. The features are annotated with representative pathways or genes. If a cluster has fewer than 50 features, all features are shown. If no significant GO biological processes are associated with cluster features, all features are displayed; b. Comparison of overall survival between MC-IV and the other three subtypes; c. Protein abundance comparison of CYP1A1 across subtypes; d. Protein-level correlation between CYP1A1 and EGFR; e. Protein-level comparison of EGFR across subtypes; f. Evaluation of kinase activities by KSEA in tumors across subtypes in the XWLC cohort. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels c and e.

| Biological and immune features across MC subtypes

a. Relative expression of epithelial-mesenchymal transition (EMT) markers across subtypes; b. EMT scores across subtypes using a gene set derived from MSigDB (systematic name M5930); c-d. Protein abundance comparison of FN1 (c) and β-Catenin (d) across subtypes; e. Protein and phosphoprotein levels of key cell cycle kinases across subtypes; f. mRNA and protein levels of glycolysis-associated enzymes; g. mRNA expression of VEGFA across subtypes; h. Phosphoprotein abundance of VEGFR1 across subtypes; i. Angiogenesis score across subtypes using a gene set derived from MSigDB (systematic name M5944); j. Expression comparison of key regulators of the Notch pathway across subtypes; k. Metabolism-associated hallmarks across subtypes. Gene sets for oxidative phosphorylation, peroxisome, adipogenesis, fatty acid metabolism, and xenobiotic metabolism were derived from MSigDB hallmark gene sets; l. Expression of PD-1 signaling-associated genes across subtypes. PD-1 signaling-associated genes were derived from MSigDB (systematic name M18810); m. Immune cell infiltration across subtypes. Gene sets for each immune cell type were derived from a previous study[73]; n. Expression of anti-tumor/pro-tumor lymphocyte receptors and ligands across subtypes. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels b, c, d, g, h, i, and m.

| Radiomic features across subtypes

a. Eight features showing significant differences between MC-II and the other three subtypes. The Wilcoxon rank sum test was used to calculate p-values; b. Receiver operating characteristic (ROC) curve evaluating the performance of the radiomic signature in distinguishing MC-II from the other three subtypes; c. Confusion matrix summarizing the performance of the algorithm in separating MC-II from the other subtypes.
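
As a rough illustration of the evaluation in panels b and c, the sketch below computes the area under the ROC curve and a confusion matrix for a binary classifier (MC-II vs. other subtypes). The labels, scores, the 0.5 threshold, and both function names are hypothetical; the classifier used in the study is not specified in this legend.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def confusion_matrix(labels, scores, threshold=0.5):
    """Counts of TP, FP, TN, FN when predicting positive for score >= threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts

# Hypothetical data: 1 = MC-II, 0 = any other subtype.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))           # 0.75
print(confusion_matrix(labels, scores))  # {'TP': 1, 'FP': 0, 'TN': 2, 'FN': 1}
```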

| Identification of novel targets in XWLC

a. Flow chart showing the integration of mutation-informed PPI analysis, molecular dynamics simulation, and experimental validation to identify novel targets; b. Network visualization of XWLC_oncoPPIs. Edge thickness represents the number of missense mutations at the protein-protein interaction (PPI) interface, while node size indicates connectivity; c. MAD1-MAD2 interaction model and the p.Arg558His mutation at the interface (left). The complex model was generated using ZDOCK protein docking simulation. The distribution on the right shows the root-mean-squared deviation (RMSD) during a 20 ns molecular dynamics simulation of MAD1 wild type vs. MAD1 p.Arg558His in the complex; d. Model showing the p.His550Gln alteration within the TPRN-PPP1CA complex (left). The distribution on the right shows the RMSD during a 20 ns molecular dynamics simulation of TPRN wild type vs. TPRN p.His550Gln (H550Q) in the complex; e. Survival analysis of the TPRN mutation group and the unaltered group in the TCGA-LUAD cohort, derived from cBioPortal (https://www.cbioportal.org/); f. CCK8 assay of A549 cells transfected with empty vector (EV), TPRN-WT, or TPRN-MT; g. Transwell assay for EV, TPRN-WT, and TPRN-MT after 24 h and 36 h in A549 cells. Magnification was set to 40x; h. Bar chart showing the statistical results of the transwell assay; i. Cell colony assay for EV, TPRN-WT, and TPRN-MT in A549 and H1299 cell lines. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels f and h. *, p<0.05; **, p<0.01; ***, p<0.001.
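
The RMSD reported in panels c and d can be sketched as follows, assuming each simulation frame has already been superposed onto the reference structure and that atoms are listed in the same order. The function name and the toy coordinates are hypothetical; molecular dynamics packages normally provide their own RMSD tools, which perform the alignment as well.

```python
import math

def rmsd(reference, frame):
    """Root-mean-squared deviation between two equally sized sets of 3D coordinates.

    Assumes the frame is already aligned to the reference (no superposition is done).
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = 0.0
    for (xr, yr, zr), (xf, yf, zf) in zip(reference, frame):
        squared += (xr - xf) ** 2 + (yr - yf) ** 2 + (zr - zf) ** 2
    return math.sqrt(squared / len(reference))

# Hypothetical three-atom example: every atom is shifted by 1 unit along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(reference, frame))  # 1.0
```

Computing this value for every frame of a trajectory gives the distribution that is compared between the wild-type and mutant complexes.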