Introduction

Recent studies have revealed that somatic mutations detected in hematopoietic malignancies are also commonly observed with clonal expansion in the individuals with no hematological conditions (1). Among such individuals, asymptomatic cases with a variant allele frequency (VAF) of ≥ 2% are defined as clonal hematopoiesis of indeterminate potential (CHIP) (2). In general, CHIP is strongly associated with aging; and mutagenic anticancer therapies and smoking can also induce clonal expansion in specific genes (1,3). Longitudinal monitoring of CHIP mutations offers important clues into the clonal dynamics of blood (4).

Hematopoietic stem and progenitor cells (HSPCs) carrying CHIP mutations serve as a major source of expanded clones, and even a few cells are sufficient to influence clonal homeostasis (5). Representative CHIP mutation-carrying genes, such as DNMT3A, ASXL1, and TET2, impact the myeloid population, including monocytes and granulocytes, and have been linked to hematologic malignancies such as myelodysplastic syndrome and acute myeloid leukemia through inflammation mediated by the NF-kB and NLRP3 inflammasome (68). The prevalence of CH in cancer patients is also significantly higher than that in healthy populations, and high CH burden has been implicated in a negative effect on overall survival in multiple cancer types (9,10).

CHIP is also associated with diseases beyond hematologic malignancies, such as atherosclerosis, type 2 diabetes, and chronic obstructive pulmonary disease (1113). Multiple studies, including both population and pan-cancer cohorts, have explored the correlation of CHIP with solid tumors (9,10,14,15). However, establishing a causal relationship has proven challenging due to confounding factors such as history of smoking and anticancer therapy, both of which act as risk factors for CHIP (10). Non-small cell lung cancer (NSCLC) stands out globally as the solid tumor with the highest mortality burden (16). The difficulty in early detection of NSCLC significantly contributes to its mortality rate being high compared to its incidence rate. Recent studies examining CHIP within lung cancer cohorts and general population indicate a possible association of lung cancer risk with CH (15,17,18). As mentioned above, factors such as a patient’s history of anticancer treatments and smoking serve as common risk factors for CH; however, the interplay between lung cancer and CHIP is not yet fully understood. Immune checkpoint inhibitors (ICI) targeting PD-(L)1 and CTLA-4 have revolutionized treatment of advanced NSCLC, both as monotherapies and in combination with chemotherapy (19). However, response rates to ICI treatments remain relatively low. The predictive ability of markers like PD-1 expression and tumor mutational burden (TMB) is insufficient, requiring the identification of additional markers. Recent studies suggest a marginal effect of ICI on the clonal dynamics of CHIP (20,21), but the current understanding of the role of CHIP in ICI-treated patients is not comprehensive. Also, only limited analysis has been performed concerning the impact of CHIP on cancer patients’ blood. To understand the interplay between ICI and CHIP in NSCLC patients, we conducted CHIP-targeted panel sequencing and single-cell RNA sequencing (scRNA-seq) on blood samples collected before and 1-3 weeks after ICI administration. Our results indicate a marginal interaction between ICI and CHIP, but also validate that high CHIP burden manifests an inflammatory phenotype in metastatic NSCLC patients’ myeloid cells.

Materials and methods

Patients

Description of the discovery cohort is previously reported (22). Briefly, 100 patients (50 LUAD, 43 LUSC and 7 others) with metastatic NSCLC stage IV who were treated with anti-PD(L)1 (e.g., Atezolizumab, Nivolumab, and Pembrolizumab) were recruited at Samsung Medical Center (SMC) under the permission of SMC Institutional Review Board (No. 2018-04-048, 2022-01-094). Whole blood sample was acquired before and one to three weeks after the treatment and used for genome sequencing and scRNA-seq. For replication, blood samples from additional 180 lung cancer patients (125 LUAD and 55 LUSC) were used for genome analysis (23).

Targeted sequencing and whole exome sequencing (WES)

We designed a targeted sequencing panel comprising 167 selected genes associated with putative drivers of CHIP at Twist Bioscience (South San Francisco, CA). For WES, we utilized the SureSelect V5 panel from Agilent (Santa Clara, CA). CHIP-targeted sequencing and WES were both conducted at Samsung Genome Institute, respectively employing 50 ng and 200 ng of genomic DNA extracted from whole blood. The targeted sequencing panel was captured and sequenced using NextSeq 2000, with paired-end sequencing performed at a read length of 150 base pairs (bps).

Variant filtering and calling of CHIP variants

For somatic variant calling, the Sarek pipeline (version 3.2.1) implemented in Nextflow was used (24). In brief, the pipeline employed GATK’s standard data preprocessing procedure and Mutect2 for somatic variant calling based on a BED file (25). All passed calls were collected and annotated using Snpeff (version 4.3 and db version 105) and VEP (version 108) (26,27). To filter CHIP variants, a stepwise approach based on the following criteria was applied. First, variants should meet the following conditions: a) variant allele frequency (VAF) 2-35%, b) variant depth >500x for the discovery set and >40x for the WES replication set, c) 4 or more (2 or more for the WES replication set) variant-covering read pairs in each forward and reverse direction, d) global allele frequency from gnomAD <1e-5, or <1e-3 if a variant is in the COSMIC database (28)) allele count <5% in all processed samples, and f) no homopolymer signature found in 6 bp upstream or downstream of the candidate variant. Next, we called CHIP variants based on a) protein sequence alterations, including exonic splicing variants, and b) manual assessment of variant calls using IGV (29). Finally, we curated data from previous studies and the COSMIC database, assigning variants reported more than 10 times or matched within a predefined list as CHIP with putative drivers (called as clonal hematopoiesis of indeterminate potential with putative drivers, or CHIP-PD) (30).

Power estimation

The power of the Poisson binomial test was calculated as the probability that a variant would be detected exclusively in one of two arbitrary groups. In brief, the most prevalent CHIP variant, DNMT3A p.Arg882, constitutes approximately 10% of the DNMT3A CHIP cases. When this variant is totally absent in the one group, a minimum of 16 DNMT3A CHIP patients in the other group would be required to achieve a power of 0.8. Considering the current prevalence of DNMT3A CHIP in our cohort (8/50 CHIP samples), this suggests that a minimum of 100 CHIP samples per group is required for determining the prevalence of any specific variant.

scRNA-seq pre-processing and analysis

The scRNA-seq dataset was generated using the 10x Genomics 5’ single-cell kit and was partially derived from a previous study (22). Reads were aligned to the hg38 reference using 10X CellRanger (ver. 7.0). Contaminant RNA barcodes were removed with cellbender and sample demultiplexing was performed by demuxlet (31,32). After filtering the cell expression matrix, we used Seurat (v 4.3.0) for downstream analysis (33). At the sample level, we filtered out cells with mitochondrial RNA content >15%, UMI counts <200 or >6,000, and calculated library-size corrected counts using SCTransform (34). Then, we computed PCA based on the normalized SCTransform values and integrated all samples using Harmony (35). Using the corrected PCA, we performed unsupervised clustering via Louvain algorithms and UMAP. For each cluster, we identified marker genes using the Wilcoxon rank-sum test with presto and validated annotations using Azimuth (33).

Gene set enrichment analysis (GSEA)

GSEA was employed to compare the expression patterns according to CHIP status based on their concordance to the hallmark pathways in the Molecular Signatures Database (MSigDB) (36). For GSEA of scRNA-seq data, we used fgsea (37). Briefly, the AUC rank of DEGs obtained with and without CHIP for each cluster was used as input. Significant gene set enrichment was determined only for adjusted P < 0.05.

scRNA-seq weighted correlation network analysis (WGCNA)

We utilized WGCNA to construct a gene co-expression network from the gene expression profiles of myeloid cells utilizing the R package hdWGCNA (38). In brief, the co-expression network was constructed using a soft power of 12 for SCT-transformed cell counts. Within the constructed co-expression network, we explored the type and specificity of genes in each module, excluding unassigned genes. The functionality of the black module was validated through GSEA and by examining its overlap with DEGs identified between CHIP groups (38).

Gene regulatory network (GRN) inference

To further understand the altered gene modulation in major cell clusters, including B cells, NK cells, CD4 T cells, CD8 T cells, and monocytes, we constructed a GRN using pySCENIC (39). We created meta-cells for each sample using SEACells and employed these meta-cells for GRN calculation to address the sparse nature of scRNA expression and enhance resolution (40,41). The full transcriptome matrix and the GRN derived from meta-cells were then used to analyze regulon signatures for each cell. Finally, we mapped the regulon AUC values of each cell onto Seurat’s UMAP and visualized the results using Seurat.

Cell to cell communication analysis

To understand the interactions between myeloid and other cell types, we utilized the Python package CellphoneDB (42), which curates human cell-cell interactions. The counts matrix, processed with SCTransform, was divided according to CHIP status and used as input for CellphoneDB. Significant interactions were then counted to determine the number of cell-cell interactions, and myeloid-related interactions were compared between samples of different CHIP status.

Statistical analysis

We present raw data if applicable. For scRNA-seq data, scaled and log-normalized raw counts from cells are displayed in violin plots. To determine statistical significance between high-burden, CHIP, and negative groups, the Wilcoxon test or Kruskal-Wallis test was used for continuous variables and the Poisson binomial test or logistic regression for categorical variables. Statistical metrics from GESA were calculated using the fgsea package in R. All statistical analyses were conducted in R 4.2.1.

Data availability statement

Raw data for this study were generated at Samsung Genome Institute. Derived data supporting the findings of this study are available from the corresponding author upon request.

Results

CHIP profiles in metastatic NSCLC patients treated with ICI

To investigate the interplay between ICI treatment and CHIP in NSCLC, we collected blood samples before and after ICI treatment from 100 metastatic NSCLC patients (Fig. 1, Supplementary Table S1). These samples were utilized for a CHIP targeted sequencing panel (median depth ∼600x) and scRNA-seq (Fig. 1A and Methods). At the baseline, we identified 67 CHIP mutations in 100 patients, 26 of them in known putative driver genes of hematopoietic cancer (CHIP-PD). After treatment, 68 CHIP and 32 CHIP-PD mutations were found in 91 patients who passed quality control assessment (Fig. 1A, Supplementary Fig. S1-S3 and Supplementary Table S3).

Profile of CH in NSCLC patients treated with ICI.

A, Overall study design. The numbers indicate sample counts in each group. B, Frequency of CH variant detection in each gene in NSCLC samples before (light blue) and after (blue) ICI treatment. C, Comparison of variant allele frequency before and after ICI treatment, divided by pathology type (LUAD, LUSC, other). D, Effect of ICI treatment on the clonal landscape of all genes (left) and frequently detected CH genes.

Among the CHIP genes, DNMT3A, PPM1D and TET2 were the most frequently mutated (Fig. 1B). PPM1D truncating mutations were also common, and have previously demonstrated association with patients’ smoking habits and possible history with chemotherapy (3,9). Variants involving the DNMT3A p.Arg882 residue and TP53, which are frequent in CHIP, were not prevalent in our dataset, although our panel extensively covered these regions and these loci were manually inspected (data not shown).

Overall, we observed ICI treatment to effect minimal changes in CHIP burden (Fig. 1B). Also, VAF was highly correlated across all traced mutations and not dependent on clinical responses (Fig. 1C and D). However, we noted increased number of CHIP mutations with VAF >10% in patients with lung squamous cell carcinoma (LUSC). While the specific mutated genes differed between patients with different histology or responses, mutated gene burden did not significantly differ between groups.

Prevalence of high CHIP burden showed bias to LUSC patients

Next, we investigated the association of CHIP status with clinical parameters in the cohort (Fig. 2). CHIP prevalence was significantly higher in the lung cancer cohort compared to the control group (controls 5/42 vs. patients 44/100, age-adjusted logistic regression P = 0.01; Fig. 2A, Supplementary Table S1 and S2). However, the number of CHIP-positive patients did not differ before and after ICI treatment, nor in relation to treatment response (Fig. 2B and C). As previously reported, CHIP burden is heavily dependent on age (Fig. 2D). Smoking history may also impact CHIP prevalence, but we could not find a significant contribution of smoking, as almost all NSCLC patients in our study have a smoking history (CHIP prevalence of non-smoker 2/9 vs. smoker 42/91; Supplementary Fig. S4).

Prevalence of CH in relation to various parameters.

A-D, discovery cohort (n = 100), E-G, replication cohort (n = 180; see Methods for description). A, E, CH prevalence by pathology (LUSC and LUAD). B, Effect of ICI treatment on CH. C, F, Effect of post-ICI prognosis to ICI on CH. D, G, Age distribution of the cohort, stratified by CHIP allele frequency and pathology status.

Subsequently, we examined clonal size in the cohort, as that has been reported as a risk factor of clinical outcome (43). Taking CHIP VAF as an indicator of clonal size, we observed that high CHIP burden was prevalent in LUSC, while multiple concurrent CHIP mutations were common in lung adenocarcinoma (LUAD) patients (Fig. 2C and Supplementary Fig. S5). To determine if this finding would reproduce in an independent cohort, we performed CHIP mutation calling using WES data of median ∼80x depth from 180 additional peripheral blood mononuclear cell (PBMC) samples from an independent cohort of NSCLC patients (Fig. 2E-G, Supplementary Fig. S6, Supplementary Table S1, S3-S4 and Methods). The results showed similar patterns between the two cohorts, but the replication cohort alone was not powerful enough to achieve significance for difference in clonal sizes based on pathology (Fig. 2E-G).

High CHIP burden causes single-cell transcriptomic changes

Next, to test whether the presence of CHIP and ICI prognosis affects blood cell transcriptomes, we analyzed PBMC scRNA-seq data from 63 samples of the discovery cohort. After quality control processes, we retrieved 468,596 cells in a total of 26 clusters defined by systemic integration and clustering using the Louvain algorithm, then validated using Azimuth (Fig. 3A, Supplementary Fig. S7A-B, Supplementary Table S5, and Methods) (33). Cell composition analysis showed an increased proportion of NK cells and a decreased CD4 TEM population after ICI administration, but high variability between samples precluded systematic differences by CHIP status (Supplementary Fig. S7C). Subsequently, DEG analysis using Wilcoxon’s rank-sum test according to CHIP VAF bins in each cluster (see Methods) highlighted genes important in immune response and transcriptional activation (Fig. 3C, Supplementary Fig. S8). To further implicate these DEGs functionally, a gene set enrichment test was performed using the AUC statistics from DEGs and the hallmark pathway gene sets in msigDB (36). The results indicated activation of the TNF signaling pathway via NFKB in the high CH burden group for most cell lineages, including both classical dendritic cells (DC) and NK cells (Fig. 3D, 3E, Supplementary Fig. S9, and Supplementary Table S6; see Methods). To understand possible interactions between ICI treatment response and CHIP burden, we grouped samples by CHIP burden and ICI response (Supplementary Fig. S10A-B). However, the interactions mostly depended on CH status only, and not on response to ICI.

scRNA-seq analysis detected myeloid-specific inflammatory signatures in patients with high CH burden.

A, UMAP plot of scRNA-seq data from the discovery cohort (n = 63). B, Effect of pathology (left) and VAF (right) on cell composition. C, Expression of selected genes in the NF-ĸB pathway from the scRNA-seq data. D, GSEA of DEGs from myeloid populations. Color represents normalized effect score (NES), and dot size represents adjusted P-values. E, GSEA plots of selected pathways from C.

Implication of high CHIP burden in inflammatory signatures of myeloid cells

We performed a gene network analysis to comprehensively analyze the increased expression of inflammatory signaling genes in patients with higher CHIP burden. First, we used hdWGCNA to construct co-expression networks for genes in myeloid cells, which yielded a black module that exhibited myeloid specificity and represented most of the signaling-related DEGs identified via GSEA (Supplementary Fig. S6 and S11 and Supplementary Table S7) (38). We determined gene ontology (GO) term enrichment in this module and found the inflammatory and transcriptional activation pathways observed in DEGs to be clearly identified (Supplementary Fig. S11C). Additionally, further GSEA on the myeloid-specific module showed its function to resemble that of the overall set of DEGs (Supplementary Fig. S11D).

In addition to WGCNA, GRN analysis using SCENIC (39) (Supplementary Fig. S12) indicated that the regulatory activity of transcription factors in the AP-1 and NF-kB pathways was particularly notable in the myeloid lineage of high-burden patients (Supplementary Fig. S12A-B). Both the NF-kB regulon and the ATF3 regulon, which represents the AP-1 pathway, were activated in the myeloid cluster of high-burden patients (Supplementary Fig. S12C-E).

Recent studies have shown that expansion of HSPC-derived monocytes is the cause of CHIP, and representative CHIP genes, including DNMT3A and TET2, exert immune-regulatory effects through the NLRP3 inflammasome (5,8,44). Our network analysis also points to this pathway (Supplementary Fig. S12). Notably, a VAF of 10% in heterozygote mutations corresponds to approximately 20% of cells; thus, the presence of such mutations in HSPC-derived monocytes of high-burden patients would have resulted in an overall transcriptional profile change for blood monocytes.

In addition to conducting gene network analysis, we explored the effect of the inflammatory signature on cell-cell interaction using CellPhoneDB (Supplementary Fig. S13) (42). The number of significant cell-cell interactions did not differ significantly between high-burden CHIP samples and negative samples, although the number of cells in each group seemed likely to have an impact (Supplementary Fig. S13A-B).

However, when the interactions in the TNF and IL-1B pathways were analyzed separately, monocytes and DCs in high-burden cases were activated as receptors for TNF and Lymphotoxin A (Supplementary Fig. S13C). Collectively, our results suggest a sensitized response of monocytes to inflammation in lung cancer patients with high CHIP burden.

Effect of CHIP on clinical outcome

Finally, to understand whether clonal hematopoiesis status can affect the clinical outcomes of lung cancer patients undergoing ICI treatment, we conducted a survival analysis (Supplementary Fig. S14). Our results showed the progression-free survival (PFS) of patients after ICI treatment to be reduced in the high-burden CHIP group (825 vs 404, P = 0.056), with no difference in ICI response rates between the negative and high-burden CHIP groups. Previous studies have already observed a negative effect on life expectancy in CHIP patients with solid tumors. Therefore, it is possible that these results will be validated by analyzing additional samples.

Discussion

To address the influence of CH in lung cancer patients with differential prognosis following ICI, we sought to determine the following: (i) CH profiles from metastatic NSCLC patients, (ii) changes in PBMC gene expression by scRNA-seq, (iii) relationship of CH burden and transcriptome profile, and (iv) potential implication for the clinical outcomes. Here we report that CH prevalence was significantly higher in NSCLC patients and higher burden of mutations with VAF of greater than 10% was frequently observed in LUSC. Using scRNA-seq analysis, we found myeloid cells in NSCLC patients with high CH burden to exhibit an altered inflammatory milieu via increased NF-kB pathway expression. Our results provide possible explanations for heterogeneity in response to immunotherapy and suggest immune response to be altered due to CHIP in cancer patients.

Recent studies have employed mouse models and single-cell transcriptomics to understand the roles of specific mutations in CH and provided evidence of their implications for inflammatory response in the myeloid lineage (45,46). In particular, a study linking COVID-19 and CH showed an increased inflammatory response from monocytes in severe COVID-19 patients with CH, mediated via IFN-γ and TNF-α signaling signatures (47).Likewise, we observed that myeloid cells from NSCLC patients with CH tend to harbor increased inflammatory response, and this response is mediated by NF-kB pathway signaling (Fig. 3).

Our GSEA results specifically indicated the enrichment of TNF signaling and hypoxia pathways in most clusters of patients with severe CH (Supplementary Fig. S9). The leading-edge genes from GSEA results showed core genes such as FOS, DUSP1, JUN, and PPP1R15A, which are known to play critical roles in the inflammatory phenotypes of immune cells, were shared between the TNF signaling and hypoxia pathways in all significant clusters (Supplementary Fig. S15). Furthermore, gene regulatory network analysis using SCENIC indicated a higher enrichment of inflammatory signatures in myeloid lineages (Supplementary Fig. S9), highlighting the pronounced inflammatory phenotype of CH clones in these cell lineages. To gain insight into the functional implications of myeloid-driven inflammatory gene pathways within CH in NSCLC patients, we conducted gene network analyses focusing on the myeloid lineage. The “black module” from the WGCNA analysis (Supplementary Fig. S9) and the NFKB1 and ATF3 GRN from SCENIC (Supplementary Fig. S11) revealed activation and adaptation of myeloid-specific inflammatory pathways (38,39). Additionally, cell-cell interaction analysis using cellphoneDB unveiled notable interactions between myeloid cells and other lineages mediated through the TNF pathway (Supplementary Fig. S12) (42).

As ICIs modulate pathways involved in immune response rather than cell proliferation, it is reasonable that ICIs do not exert a direct influence on CH development (21). Also, our results did not support significant difference of ICI outcome based on CH burden (Fig. 2). According to our power analysis, a minimum of 100 samples per group with CHIP is required to achieve a statistical power of 0.8 from pair-wise comparisons between each genes and variants (Methods for details). However, our findings do suggest that elevated inflammatory activity in the bloodstream may impact long-term patient survival pre-and post-ICI administration (Supplementary Fig. S14). Therefore, a future study with a larger sample size may yet clarify a causal relationship of CH with ICI outcome. Notably, involvement of the IL-1 pathway in the aggressiveness and metastasis of lung cancer has been reported in experimental studies (48,49), and inhibition of IL-1β has proven effective in reducing lung cancer-related mortality, as shown in the CANTOS trial (50). Indeed, our analysis indicated a modest decrease in PFS in the ICI-responder group with high-CH patients (Supplementary Fig. S14, P = 0.056). Our study design is potentially limited by the relatively short observation period for defining ICI response Indeed, it is worth noting that more potent factors, such as smoking and anticancer therapies, may have a more substantial impact than immunotherapy (10). Consistent with this postulation, our analyses indicated minimal changes in in CH-related signatures before and after ICI treatment (Fig. 3, Supplementary S7, S9, and S10).

Also, the distinct characteristics of our cohort can be confounders for our results. Compared to control patients, our cohort was biased to slightly older ages, a higher prevalence of smoking habits, and with a higher proportion of males (mean age: 64.1 vs. 58.9; current smokers: 37/100 vs. 11/42; male/female: 91/9 vs. 18/24 Supplementary Figures S1 and S3). However, previous studies have reported similar prevalence rates of clonal hematopoiesis in NSCLC patients, aligned with our findings (9,51). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in both healthy populations and NSCLC patients (10,51,52).

Here we have illustrated increased inflammatory activation in myeloid cells of NSCLC patients with high CH burden. Our scRNA-seq analysis has revealed that NF-kB signaling mediated by the IL-1β and the TNF superfamilies may enhance interactions of myeloid cells with other cell lineages. These findings propose the potential utility of CH as a predictive marker for classifying NSCLC patients. Future investigations involving ICI-treated patients and their clinical outcomes may shed light on the impact of CH, offering a basis for the development of adjuvant anticancer strategies aimed at modulating CH.

Acknowledgements

We thank George Vassiliou for critical comments. The authors also acknowledge the Korea Research Environment Open Network (KREONET) service and the usage of the Global Science Experimental Data Hub Center (GSDC) provided by Korea Institute of Science and Technology Information (KISTI). This work was supported in part by the grants from the Korean Research Foundation (NRF-2021R1A2C3014067, NRF-RS-2023-00207857 to M. Choi, and NRF-2020R1A2C3006535 to S.-H. Lee), the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) (HR20C0025 to S.-H. Lee), and Future Medicine 20*30 Project of the Samsung Medical Center (SMX1230041 to S.-H. Lee).

Conflict of interest

Authors declare no conflict of interest.

Authors’ contributions

H. Sim: Data curation, formal analysis, validation, investigation, writing-original draft. H.-J. Park: Conceptualization, data curation, formal analysis, methodology, investigation. G.-H. Park: Data curation, formal analysis, methodology, investigation. Y. J. Kim: Data curation, methodology, investigation. W.-Y. Park: Conceptualization, investigation, methodology. S. H. Lee: Conceptualization, investigation, writing-original draft. M. Choi: Conceptualization, investigation, writing-original draft.