Proteogenomic analysis of air-pollution-associated lung cancer reveals prevention and therapeutic opportunities

  1. Center for scientific research, Yunnan University of Chinese Medicine, Kunming, Yunnan, 650500, China;
  2. Department of Thoracic Surgery II, Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Kunming 650106, China;
  3. Department of Nuclear Medicine, Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Kunming 650106, China;
  4. Department of Oncology, Qujing First People’s Hospital, Qujing, 202150, China;
  5. Department of Plastic Surgery, First Affiliated Hospital of Kunming Medical University, Kunming 650032, China;
  6. Department of Nephrology, Institutes for Systems Genetics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu 610041, China

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Read more about eLife’s peer review process.

Editors

  • Reviewing Editor
    Robert Sobol
    Brown University, Providence, United States of America
  • Senior Editor
    Wafik El-Deiry
    Brown University, Providence, United States of America

Reviewer #1 (Public Review):

Summary:

This is a well-written and detailed manuscript showing important results on the molecular profile of 4 different cohorts of female patients with lung cancer.

The authors conducted comprehensive multi-omic profiling of air-pollution-associated LUAD to study the roles of the air pollutant BaP. Utilizing multi-omic clustering and mutation-informed interface analysis, potential novel therapeutic strategies were identified.

Strengths:

The authors used several different methods to identify potential novel targets for therapeutic interventions.

Weaknesses:

Statistical test results need to be provided in comparisons between cohorts.

Reviewer #2 (Public Review):

Summary:

Zhang et al. performed a proteogenomic analysis of lung adenocarcinoma (LUAD) in 169 female never-smokers from the Xuanwei area (XWLC) in China. These analyses reveal that XWLC is a distinct subtype of LUAD and that BaP is a major risk factor associated with EGFR G719X mutations found in the XWLC cohort. Four subtypes of XWLC were classified with unique features based on multi-omics data clustering.

Strengths:

The authors made great efforts in performing several large-scale proteogenomic analyses and characterizing molecular features of XWLCs. Datasets from this study will be a valuable resource to further explore the etiology and therapeutic strategies of air-pollution-associated lung cancers, particularly for XWLC.

Weaknesses:

(1) While analyzing and interpreting the datasets, however, this reviewer thinks that authors should provide more detailed procedures of (i) data processing, (ii) justification for choosing methods of various analyses, and (iii) justification of focusing on a few target gene/proteins in the datasets for further validation in the main text.

(2) Importantly, while providing the large datasets, validating key findings is minimally performed, and surprisingly there is no interrogation of XWLC drug response/efficacy based on their findings, which makes this manuscript descriptive and incomplete rather than conclusive. For example, testing the efficacy of XWLC response to afatinib combined with other drugs targeting activated kinases in EGFR G719X mutated XWLC tumors would be one way to validate their datasets and new therapeutic options.

(3) The authors found MAD1 and TPRN are novel therapeutic targets in XWLC. Are these two genes more frequently mutated in one subtype than the other 3 XWLC subtypes? How these mutations could be targeted in patients?

(4) In Figures 2a and b: while Figure 2a shows distinct genomic mutations among each LC cohort, Figure 2b shows similarity in affected oncogenic pathways (cell cycle, Hippo, NOTCH, PI3K, RTK-RAS, and WNT) between XWLC and TNLC/CNLC. Considering that different genomic mutations could converge into common pathways and biological processes, wouldn't these results indicate commonalities among XWLC, TNLC, and CNLC? How about other oncogenic pathways not shown in Figure 2b?

(5) In Figure 2c, how and why were the four genes (EGFR, TP53, RBM10, KRAS) selected? What about other genes? In this regard, given tumor genome sequencing was done, it would be more informative to provide the oncoprints of XWLC, TSLC, TNLC, and CNLC for complete genomic alteration comparison.

(6) Supplementary Table 11 shows a number of mutations at the interface and length of interface between a given protein-protein interaction pair. Such that, it does not provide what mutation(s) in a given PPI interface is found in each LC cohort. For example, it fails to provide whether MAD1 R558H and TPRN H550Q mutations are found significantly in each LC cohort.

(7) Figure 7c and d are simulation data not from an actual binding assay. The authors should perform a biochemical binding assay with proteins or show that the mutation significantly alters the interaction to support the conclusion.

Reviewer #3 (Public Review):

Summary:

The manuscript from Zhang et al. utilizes a multi-omics approach to analyze lung adenocarcinoma cases in female never smokers from the Xuanwei area (XWLC cohort) compared with cases associated with smoking or other endogenous factors to identify mutational signatures and proteome changes in lung cancers associated with air pollution. Mutational signature analysis revealed a mutation hotspot, EGFR-G719X, potentially associated with BaP exposure, in 20% of the XWLC cohort. This correlated with predicted MAPK pathway activations and worse outcomes relative to other EGFR mutations. Multi-omics clustering, including RNA-seq, proteomics, and phosphoproteomics identified 4 clusters with the XWLC cohort, with additional feature analysis pathway activation, genetic differences, and radiomic features to investigate clinical diagnostic and therapeutic strategy potential for each subgroup. The study, which nicely combines multi-modal omics, presents potentially important findings, that could inform clinicians with enhanced diagnosis and therapeutic strategies for more personalized or targeted treatments in lung adenocarcinoma associated with air pollution. The authors successfully identify four distinct clusters with the XWLC cohort, with distinct diagnostic characteristics and potential targets. However, many validating experiments must be performed, and data supporting BaP exposure linkage to XWLC subtypes is suggestive but incomplete to conclusively support this claim. Thus, while the manuscript presents important findings with the potential for significant clinical impact, the data presented are incomplete in supporting some of the claims and would benefit from validation experiments.

Strengths:

Integration of omics data from multimodalities is a tremendous strength of the manuscript, allowing for cross-modal comparison/validation of results, functional pathway analysis, and a wealth of data to identify clinically relevant case clusters at the transcriptomic, translational, and post-translational levels. The inclusion of phosphoproteomics is an additional strength, as many pathways are functional and therefore biologically relevant actions center around activation of proteins and effectors via kinase and phosphatase activity without necessarily altering the expression of the genes or proteins.

Clustering analysis provides clinically relevant information with strong therapeutic potential both from a diagnostic and treatment perspective. This is bolstered by the individual microbiota, radiographic, wound healing, outcomes, and other functional analyses to further characterize these distinct subtypes.

Visually the figures are well-designed and presented and for the most part easy to follow. Summary figures/histograms of proteogenomic data, and specifically highlighted genes/proteins are well presented.

Molecular dynamics simulations and 3D binding analysis are nice additions.

While I don't necessarily agree with the authors' interpretation of the microbiota data, the experiment and results are very interesting, and clustering information can be gleaned from this data.

Weaknesses:

Statistical methods for assessing significance may not always be appropriate.

Necessary validating experiments are lacking for some of the major conclusions of the paper.

Many of the conclusions are based on correlative or suggestive results, and the data is not always substantive to support them.

Experimental design is not always appropriate, sometimes lacking necessary controls or large disparity in sample sizes.

Conclusions are sometimes overstated without validating measures, such as in BaP exposure association with the identified hotspot, kinase activation analysis, or the EMT function.

  1. Howard Hughes Medical Institute
  2. Wellcome Trust
  3. Max-Planck-Gesellschaft
  4. Knut and Alice Wallenberg Foundation