Introduction

Neurodegenerative diseases are a frequent cause of progressive dementia. Alzheimer’s disease (AD) is diagnosed in ∼90% of cases, with an estimated prevalence of ∼10% in the population over 65 years of age 1,2. The role of germline genetic variation in neurodegenerative diseases and AD has been studied intensely. Although autosomal dominant forms of AD due to rare germline variants with high penetrance account for only an estimated ∼1% of cases 3–8, a number of common variants have also been shown to contribute to disease risk. Carriers of one germline copy of the epsilon4 (E4) allele of the apolipoprotein E gene (APOE4), present in ∼15 to 20% of the population, have a three-fold higher risk of AD, while two copies (∼2 to 3% of the population) increase the risk by ∼10-fold 9–12. Genome-wide association studies (GWAS) have identified an additional ∼50 common germline variants that more moderately increase the risk of AD, including TREM2, CD33, and MS4A6A variants 13–15. Interestingly, the APOE4 allele is responsible for an increased inflammatory and neurotoxic response of microglia and astrocytes in the brain of carriers 16–18, and it was noted that the majority of the other germline AD-risk variants are located within or near genes expressed in microglia 15 and in particular at microglia-specific enhancers 19. These data, together with transcriptional studies 20–22, support the hypothesis that genetic variation in microglia may contribute to the pathogenesis of neurodegeneration and AD.

Somatic genetic heterogeneity (mosaicism), resulting from post-zygotic DNA mutations, is widespread in human tissues, and a cause of tumoral, developmental, and immune diseases 23–26. Additionally, a role of somatic variants in neuropsychiatric disorders is suspected 27. Mosaicism has been documented in the brain tissue of AD patients in several deep-sequencing studies 28–30, which showed an enrichment of putative pathogenic somatic mutations in the PI3K-AKT, MAPK, and AMPK pathways in the brain of patients in comparison to controls 30. However, these studies, performed in whole brain tissue, lacked cellular resolution and mechanistic insights, and the role of somatic mutants in neurodegenerative diseases remains poorly understood 23. Somatic variants that activate the PI3K-AKT-mTOR or MAPK pathways in neural progenitors are a cause of cortical dysplasia and epilepsy 31–34 and developmental brain malformations 35, while somatic variants that activate the MAPK pathway in brain endothelial cells are associated with arteriovenous malformations 36. Interestingly, we reported that expression of a somatic variant activating the MAPK pathway in microglia causes neurodegeneration in mice 37, but the presence and contribution of microglial somatic clones in neurodegenerative diseases and AD remain unknown.

Here, we investigated the presence and nature of somatic variants in brain cells from control and AD patients. In an attempt to examine all brain cells at the same resolution, nuclei from neurons, glial cells, and microglia (the latter representing only ∼5% of brain cells) were pre-sorted. In addition, although human microglia are reported to develop in the embryo and renew by local proliferation within the brain 38–40, bone marrow-derived myeloid cells can enter the brain, in particular during pathological processes. Therefore, we also analyzed matched peripheral blood in order to identify somatic mutants shared between microglia and blood. Finally, in order to achieve high sensitivity in the detection of variants that confer a proliferative or activation advantage (driver mutations) and support the emergence or pathogenicity of mosaic clones 41, and/or that have been previously associated with neurological diseases, we initially performed targeted deep sequencing of a panel of 716 genes covering somatic variants reported in clonal proliferative disorders and neurological diseases. We found that microglia from AD patients were enriched for pathogenic variants in comparison to age-matched controls. Furthermore, we found that these microglia-specific AD-associated variants preferentially target the MAPK pathway, including recurrent CBL RING-domain mutations. In addition, we showed that these variants drive a microglia transcriptional program characterized by a strong neuro-inflammatory response previously associated with neurotoxicity, including the production of IL1 and TNF, both in in vitro microglia models and in patients. The natural history of the AD-associated microglia clonal inflammatory disorder we describe here is difficult to establish. Specifically, we do not know whether it contributes to the onset of the neuro-inflammatory process at an early stage of the disease, or if microglia carrying driver mutations preferentially expand later during the course of the disease in response to tissue inflammation. Under both hypotheses, however, the presence of neuro-inflammatory microglial clones may contribute to the neurodegenerative process in a subset of AD patients. This report reveals a previously unrecognized presence of AD-associated microglia harboring pathogenic somatic variants in humans and provides mechanistic insight for neurodegenerative diseases by delineating cell-type specific variant recurrence.

Results

Clonal diversity among brain cells and blood from controls and AD patients

We examined post-mortem frozen brain samples and matching blood from 45 patients with intermediate-onset sporadic AD and 44 control individuals who died of other causes, including 27 donors age-matched with the AD cohort (Fig. 1A; Table S1). APOE risk allele frequency for patients and controls was comparable to published series 10–12 (Fig. S1A), and analysis of germline mutations did not identify deleterious variants in the 140 genes associated with neurological diseases. Myeloid/microglia, neurons, and glia/stromal cells were purified by flow cytometry using antibodies against PU.1 and NeuN 42 (Fig. 1B, Fig. S1B and S1C). Single nuclei (sn)RNAseq was performed on PU.1+ nuclei from one control and 3 AD patients to evaluate microglia enrichment following PU.1+ purification, and a cell-type annotation analysis indicated that ∼94% of PU.1+ nuclei correspond to microglia (Fig. 1C and Fig. S1D-S1H). An average of 2.5 brain samples were analyzed for each donor, including cortex samples obtained from all donors, and hippocampus samples, mostly obtained from AD patients (Fig. 1A; Table S1). A total of 744 DNA samples from blood, PU.1+ nuclei, NeuN+ nuclei, and Double Negative nuclei (glia/stromal cells) from patients and controls (Fig. 1A) were submitted to hybridization capture and targeted deep sequencing (TDS, Fig. 1D, see Methods), at a mean coverage of ∼1,100x (Fig. S1I), for a panel of 716 genes (3.43 Mb, referred to below as BRAIN-PACT) which included genes reported to carry somatic variants in clonal proliferative disorders (n=576 genes) 43,44 or implicated in neurological diseases (n=140) 45–53 (Table S2, see Methods).

Detection of mutations in brain cell types and blood

(A) Table with patient and sample information. (B) Schematic represents the isolation and labelling of nuclei from post-mortem frozen brain samples from controls and Alzheimer’s disease patients with DAPI and antibodies against PU.1 (myeloid/microglia) and NeuN (neurons). Representative flow cytometry dot-plot of nuclei separation. Double negative nuclei are labeled ‘DN’. (C) Percentage of cell types obtained in sorted PU.1+ nuclei determined by single-nuclei RNAseq in 5 brain samples from 4 individuals. (D) Schematic represents the sequencing strategy. Two algorithms (ShearwaterML and Mutect1) were used for variant calling. After annotation, pathogenicity was determined using OncoKB and ClinVar. (E) Venn diagram represents the number of variants and the overlap between ShearwaterML and Mutect1. Numbers in red indicate pathogenic variants (P-SNV). Validation of variants was performed by droplet digital (dd)PCR on pre-amplified DNA when available. (F) Venn diagrams represent the distribution per cell type of the 826 single-nucleotide variations (SNVs) identified in NeuN+: neurons, PU.1+: microglia, DN: glia, and matching blood. Numbers in red indicate pathogenic variants (P-SNV).

After QC and filtering of germline variants, variant calling using ShearwaterML and a curated Mutect1 analysis identified 826 somatic synonymous and non-synonymous single-nucleotide variations (SNVs), at an allelic frequency > 0.3% (mean 1.3%) in the 744 samples, corresponding to an overall variant burden of 0.3 mut/Mb (Fig. 1E). Sixty-six SNV were present in more than one sample (Table S3). Droplet digital-PCR performed on pre-amplification DNA for ∼10% of the 760 unique SNV was positive in 90% of cases (Fig. 1E; Table S3). After annotation using the OncoKB 54 and ClinVar 55 databases for disease-associated or causative variants (Fig. 2D-2F; Table S3), 96 unique SNV were classified as Pathogenic (P)-SNV. 40% of these P-SNV were tested by droplet digital-PCR and confirmed in 95% of cases (Fig. 1E; Table S3). Positive and negative results in matching brain samples from individual donors were confirmed in 100% of samples at a mean depth of ∼5,000x (range 648–23,000x) (Table S3). A Venn diagram analysis of SNVs detected in PU.1+, NeuN+, DN, and blood samples indicated that most (>90%) SNV and P-SNV were cell-type or tissue specific, with ∼5% of SNV and ∼8% of P-SNV shared between the blood and brain of individual donors (Fig. 1F; Table S3). These data indicate that targeted deep sequencing of purified nuclei allows the detection of clonal mosaic variants with high sensitivity and specificity. In addition, ‘bar-coding’ of clonal variants across tissues suggests that blood clones make a minor contribution to microglia somatic diversity, consistent with the local maintenance and proliferation of microglia 38,39.
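The overall burden quoted above can be checked with simple arithmetic. The sketch below assumes (this is our reading, not a formula stated in the text) that the burden is the total number of SNVs divided by the product of the number of samples and the panel size; the function name `burden_per_mb` is hypothetical.

```python
# Figures quoted in the text; the formula itself is an assumption.
N_SNV = 826        # somatic SNVs identified across all samples
N_SAMPLES = 744    # DNA samples sequenced
PANEL_MB = 3.43    # size of the BRAIN-PACT panel, in megabases


def burden_per_mb(n_variants: int, n_samples: int, panel_mb: float) -> float:
    """Mean number of variants per megabase sequenced, per sample."""
    if n_samples <= 0 or panel_mb <= 0:
        raise ValueError("n_samples and panel_mb must be positive")
    return n_variants / (n_samples * panel_mb)


overall = burden_per_mb(N_SNV, N_SAMPLES, PANEL_MB)
print(f"{overall:.2f} mut/Mb")  # rounds to the ~0.3 mut/Mb reported
```

Under this assumption the result is consistent with the reported 0.3 mut/Mb; per-cell-type burdens would require the per-sample counts in Table S3.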

Pathogenic variants are enriched in microglia from AD patients.

(A) Correlation plot represents the mean number of variants per cell type and donor (n=89) (Y axis), as a function of age (X axis). Each dot represents mean value for a donor. Statistics: fitted lines, the correlation coefficients (rs) and associated p values were obtained by linear regression (Spearman’s correlation). (B) Number of SNV per Mb and cell types per donor, of age-matched controls (n=27) and AD patients (n=45). Each dot represents mean value for a donor. Statistics: p-values are calculated with unpaired two-tailed Mann-Whitney U test. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (C) Number of SNV per Mb in PU.1 samples across brain regions, of age-matched controls (n=27) and AD patients (n=45). Each dot represents a sample. Statistics: p-values are calculated with Kruskal–Wallis test, multiple comparisons. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (D) Correlation plot represents the mean number of pathogenic variants (P-SNV) as determined by ClinVar and/or OncoKB, per cell type and donor (n=89) (Y axis), as a function of age (X axis). Each dot represents mean value for a donor. Statistics: fitted lines, the correlation coefficients (rs) and associated p values were obtained by linear regression (Spearman’s correlation). (E) Number of P-SNV per Mb and cell types per sample, of age-matched controls (n=27) and AD patients (n=45). Each dot represents a sample. Statistics: p-values are calculated with unpaired two-tailed Mann-Whitney U test. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (F) Number of P-SNV per Mb in PU.1 samples across brain regions, of age-matched controls (n=27) and AD patients (n=45). Each dot represents a sample. Statistics: p-values are calculated with Kruskal–Wallis test and Dunn’s test for multiple comparisons. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (G) Number of P-SNV per Mb and cell types per donor for age-matched controls (n=27) and AD patients (n=45). Each dot represents mean value for a donor. Statistics: p-values are calculated with unpaired two-tailed Mann-Whitney U test. Odds ratio (95% CI, 2.049 to 29.02) and p values for the association between AD and the presence of driver variants are calculated by multivariate logistic regression, with age and sex as covariates.

Somatic clonal diversity of the different cell types, as evaluated by the SNV/Mb burden, was higher in blood (1 mut/Mb) and PU.1+ nuclei (0.5 mut/Mb) than in DN and neurons (0.18 mut/Mb) (Fig. S2A). The SNV/Mb burden of blood and PU.1+ nuclei increased as a function of age (Fig. 2A; Table S3) as previously reported for proliferating cells 24,56–58. Interestingly, the SNV/Mb burden of blood cells from age-matched controls was higher than for AD patients (Fig. 2B). In contrast, there was no difference in SNV/Mb burden between PU.1, NeuN and DN samples from AD patients and age-matched controls (Fig. 2B), or between PU.1+ nuclei from the cortex, hippocampus, and brainstem/cerebellum samples (Fig. 2C). These data altogether indicate that the clonal diversity of microglia and blood both increase with age, and that the clonal diversity of blood cells is lower in AD than in age-matched controls who died of other causes including cancer and cardiovascular diseases (see Methods). This is consistent with recent studies showing that clonal hematopoiesis is associated with a higher risk of several diseases related to ageing such as cardiovascular diseases, but inversely associated with the risk of AD 59,60.

Microglia clones carrying pathogenic variants are enriched in AD patients

P-SNV/Mb burden also increased with age (Fig. 2D); however, we found that the P-SNV/Mb burden was selectively and highly enriched in PU.1+ samples from AD patients in comparison to age-matched controls (p=0.0003, Fig. 2E). Analysis of PU.1+ P-SNV/Mb burden per brain region indicated that the P-SNV/Mb burden was similar between brain regions within each group (Fig. 2F), and therefore attributable to AD status rather than sampling bias. Moreover, analysis of mutational load per donor confirmed that microglial clones carrying P-SNV were enriched in the brain of AD patients in comparison to age-matched controls (Fig. 2G). Despite the relatively modest cohort size, a logistic regression analysis confirmed the association between the presence of P-SNVs in PU.1+ nuclei and AD after adjusting for sex and age (OR = 7; p=0.0035, Fig. 2G and Fig. S2B). In addition, genes targeted by P-SNV were all expressed in microglia (Fig. S2C; Table S3) and the analysis of P-SNV/Mb mutational load restricted to genes that are not expressed in microglia did not show an enrichment of candidate pathogenic variants in AD patients (Fig. S2D; Table S3). Altogether, these results show a cell-specific association between microglia clones carrying P-SNV and AD in this series.

AD patients carry microglial clones with MAP-Kinase pathway variants including recurrent CBL variants

Pathway analysis of genes carrying P-SNV in microglia from AD patients, against the background of the 716 genes sequenced, showed that the most significantly enriched pathways were the receptor tyrosine kinase/MAP-Kinase pathways (Reactome, GO, and canonical pathways, Fig. 3A; Table S4), corresponding to driver/oncogenic variants in 6 of the 15 genes of the classical MAPK pathway 61 (CBL, BRAF, RIT1, NF1, PTPN11, KRAS), TEK, and the KEGG Chronic Myeloid Leukemia (CML) pathway, which includes the former plus SMAD5 and TP53 (Fig. 3B, 3C and Fig. S3). Mutational load for MAPK genes was significantly higher in AD patients in comparison to age-matched controls (Fig. 3C). Other enriched pathways, albeit less significant, included genes involved in DNA repair and chromatin binding/methyltransferase activity (Fig. 3B; Table S4). No pathway was enriched in age-matched controls. P-SNV targeting genes of the classical RTK/MAPK pathway (Fig. 3C) were detected in the PU.1 samples from ∼25% of the AD patients tested (p=0.0145 vs age-matched controls, Fig. 3D and Fig. S3). Strikingly, half of these patients (6 patients, 13% of AD patients in this series) carried recurrent P-SNV in the RING domain of CBL 62–72 (Fig. 3B-3E). Two additional patients presented with P-SNV in the Switch II domain of RIT1 73 (Fig. 3B-3F). Microglia from the 3 other patients carried activating KRAS (p.A59G), PTPN11 (p.T73I) and TEK (p.R1099*) oncogenic variants previously described in cancer and sporadic venous malformations 74–76 (Fig. 3B and 3D). In addition, a 12th patient carried a gain of function (GOF) U2AF1 (p.S34F) variant 77, which is not a ‘classical MAPK gene’ but activates the MAPK pathway in myeloid malignancies 78 (Fig. 3B and Fig. S3). Two patients carried 2 different MAPK activating variants: microglia from 1 patient carried an activating BRAF (p.L505H) variant 79 in addition to the loss of function (LOF) variant CBL (p.C416S), and another patient carried the NF1 (p.L2442*) LOF variant 80,81 in addition to the activating RIT1 (p.M90I) variant (Fig. 3D and Fig. S3). Five patients also carried additional P-SNV targeting genes involved in DNA repair with tumor suppressor function 82,83, including the loss of function variants in ATR (c.6318A>G) 84 and SMC1A (p.X285_splice) (Fig. 3D and Fig. S3), and in DNA/histone methylation including TET2 (p.Q1627*) 62,85, IDH2 (p.R140Q) 86, and PBRM1 (c.996-7T>A) 87 (Fig. 3D and Fig. S3). Finally, two patients carried oncogenic variants in genes from the KEGG Chronic Myeloid Leukemia (CML) pathway, SMAD3 (p.R373C) 88 and TP53 (p.X261_splice) 89 (Fig. 3D and Fig. S3). The detection of multiple oncogenic variants in the same patients is reminiscent of the features observed in myeloproliferative disorders described outside the brain 72,85.

Somatic microglial clones with multiple and recurrent CBL and MAP-Kinase pathway activating variants.

(A) Pathway enrichment analysis for the genes targeted by D-SNVs using the panel of 716 genes as background set. Graph shows the most enriched pathways by: Reactome Gene Sets, GO Molecular Functions, Canonical Pathways and KEGG Pathway (see complete list in Table S4). (B) Bar plot indicates the genes carrying D-SNV (y-axis) and the % of AD patients carrying D-SNV for each gene (x-axis). Genes are color-coded by pathway. (C) Representation of the classical MAPK pathway; the 6 genes mutated in AD patients are labeled in red, TEK is labeled in blue, and larger font size indicates recurrence of variants in a given gene. Violin plot shows enrichment in AD patients as compared to age-matched controls, p-value: unpaired two-tailed Mann-Whitney U test. (D) Summary Table showing patients carrying D-SNV in the classical RTK/MAPK pathway and CML associated genes (see Table S3) and indicating the detection of variants in blood, and their association with other variants in microglia. (E) Recurrent variants in the RING domain of CBL are indicated in red on the structure diagram of the gene (top), and representative western blot of lysates from HEK293T cells expressing WT or mutant CBL alleles and stimulated with EGF or control, probed with antibodies against Phospho-p44/42 MAPK (Erk1/2, Thr202/Tyr204), total p44/42 MAPK (Erk1/2), and HA-tag (bottom). Data are representative of 5 independent experiments. (F) RIT1 M90I and F82L are represented on the 3D structure of the gene product (pdb code: 4klz, F82 is within a segment whose structure was not resolved) and representative western blot from HEK293T cells expressing Flag-RIT1 (WT and mutants) and treated -/+ 20% FBS before harvesting. Lysates were probed with antibodies against Phospho-p44/42 MAPK (Erk1/2, Thr202/Tyr204), total p44/42 MAPK (Erk1/2), and Flag. Data are representative of 4 independent experiments. (G) Percentage of D-SNVs detected by targeted deep sequencing (TDS) which were also detected by Whole-Exome-Sequencing (WES). (H) Variant allelic frequency (VAF, %) for the BRAFV600E allele in PU.1+ nuclei from brain samples from histiocytosis patients (each dot represents a sample) and for D-SNVs in PU.1+ nuclei from the brain of AD patients (each dot represents a variant). Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test).

Recurrent CBL and RIT1 variants activate the MAPK pathway

CBL is an E3 ubiquitin-protein ligase that negatively regulates RTK signaling via MAPK 90. CBL somatic and germ-line LOF variants such as R420Q have been previously associated with tumoral diseases including clonal myeloproliferative disorders 62–72 and RASopathies 91, respectively. We confirmed that CBL RING-domain variants found in AD patients increased MAPK phosphorylation in response to EGF upon expression of HA-tagged WT or mutant alleles in HEK293T cells (Fig. 3E and Fig. S4A). RIT1 is a RAS GTPase, and somatic or germ-line GOF variants such as RIT1 F82L and RIT1 M90I also enhance MAPK signaling in malignancies 73 and RASopathies 92,93, respectively. As in the case of CBL variants, the 2 RIT1 variants found in AD patients increased MAPK phosphorylation in response to FBS in HEK293T cells expressing these mutant alleles (Fig. 3F and Fig. S4B, S4C). These data altogether indicate that a subset of AD patients (12/45, ∼27% of this series) present with microglial clones carrying one or several oncogenic variants that activate the RTK/MAPK pathway, and are characterized by recurrent oncogenic variants in CBL and RIT1.

Allelic frequency of the patients’ MAPK activating variants

The allelic frequencies at which MAPK activating variants are detected in microglia from brain samples of AD patients range from ∼1 to 6% (Fig. 3G), which corresponds to mutant clones representing 2 to 12% of microglia in these samples, assuming heterozygosity. This allelic frequency is in the range of the allelic frequency of the MAPK activating BRAFV600E variant in microglia from 6 patients diagnosed with BRAFV600E+ histiocytosis, a rare clonal myeloid disorder associated with neurodegeneration 37,94–97 (Fig. 3G; Table S5). These data suggest that the size of the mutant microglial clones in AD patients is compatible with a role in a neuro-inflammatory/neurodegeneration process.
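The conversion from allelic frequency to clone size under the heterozygosity assumption is a simple doubling. The following is a minimal sketch of that arithmetic, assuming diploid cells with one mutant allele per mutant cell and no copy-number change at the locus; the function name `clone_fraction` and its parameters are hypothetical.

```python
def clone_fraction(vaf: float, mutant_alleles_per_cell: int = 1,
                   ploidy: int = 2) -> float:
    """Fraction of cells carrying a variant, given its allelic frequency.

    vaf is a fraction between 0 and 1. With the defaults (heterozygous
    variant in a diploid cell), the clone fraction is 2 * vaf.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    if mutant_alleles_per_cell < 1 or ploidy < mutant_alleles_per_cell:
        raise ValueError("invalid allele counts")
    # Cap at 1.0: a clone cannot exceed the whole cell population.
    return min(vaf * ploidy / mutant_alleles_per_cell, 1.0)


# VAF range quoted in the text (~1% to 6%)
for vaf_pct in (1, 6):
    pct_cells = 100 * clone_fraction(vaf_pct / 100)
    print(f"VAF {vaf_pct}% -> ~{pct_cells:.0f}% of microglia")
```

If a variant were homozygous (`mutant_alleles_per_cell=2`), the same VAF would correspond to a clone half that size, which is why the estimate is stated as conditional on heterozygosity.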

Other variants found in microglia from AD patients

Driver variants that did not involve the MAPK pathway included LOF variants in the DNA repair gene CHEK2, including CHEK2 c.319+1G>A 98 and CHEK2 R346H (Fig. S3 and Fig. S4D), the Mediator Complex gene MED12 99, the histone methyltransferases SETD2 100 and KMT2C/MLL3, the DNA methyltransferase DNMT3A 85,101, the DNA demethylating enzyme TET2 and the Polycomb protein ASXL1 85. Of note, TET2, DNMT3A and KMT2C variants, when present, were frequently detectable in the patients’ matching blood at low allelic frequency (Fig. S3). TET2, DNMT3A and KMT2C are frequently mutated in clonal hematopoiesis 57,58, suggesting that, in contrast to other variants, the presence of TET2, DNMT3A and KMT2C/MLL3 variants in the brain of patients may reflect the entry of blood clones in the brain.

In half of the AD patients, no microglia driver variants were identified. Targeted deep sequencing (TDS) cannot identify variants located outside of the BRAIN-PACT panel, such as other potential additional variants that would activate the MAPK pathway. Therefore, we performed whole exome sequencing (WES) of PU.1 nuclei at an average depth of ∼400x, in selected samples from 48 donors, including samples from most of the patients negative for driver variants by TDS (n=17 out of 22), a selection of patients with variants identified by TDS (n=16 out of 23), and 15 controls, followed by a curated Mutect analysis. Only 6/15 (40%) of the driver SNVs previously identified by TDS and confirmed by ddPCR were detectable by WES in these samples (Fig. 3H), indicating a lower sensitivity of WES. Nevertheless, after annotation by 4 modeling predictors (Polyphen, SIFT, CADD/MSC and FATHMM-XF) 102–107, additional SNVs predicted to be deleterious with high confidence were identified in 8/22 patients without driver variants identified by TDS (Fig. S3; Table S6). Interestingly, 4 of the predicted deleterious variants identified by WES targeted genes that regulate the MAPK pathway (ARHGAP9, ARHGEF26, CHD8, and DIXDC1) (Fig. S3; Table S6).

The patients’ MAPK activating variants increase ERK phosphorylation, proliferation, inflammatory and mTOR pathways in murine microglia and macrophages

CBL variants increased ERK phosphorylation upon lentiviral transduction in BV2 murine microglial cells 108,109 (Fig. S4E). However, as this line was immortalized by v-Raf, which might interfere with the study of the MAPK pathway, we also stably expressed WT and variant CBL, RIT1, KRAS, and PTPN11 alleles in SV-U19–5 transformed mouse ‘MAC’ lines 110,111 (see Methods and Fig. S5A,B). MAC lines expressing CBL, RIT1, KRAS and PTPN11 variants presented with increased ERK phosphorylation and/or increased proliferation in comparison to their WT controls, as measured by Western immunoblotting and EdU incorporation (Fig. 4A and Fig. S5A,B). In addition, Hallmark and KEGG pathway analysis of RNAseq data from control and mutant lines showed increased RAS, TNF, IL6 and JAK STAT signaling, complement, inflammatory responses, and mTOR pathway activation signatures in mutants (Fig. 4B; Table S7). These data indicated that the patients’ microglial variants activate murine microglial cells and growth factor-dependent macrophages, with proliferative and inflammatory responses in vitro. However, overexpression of mutant alleles in mouse cell lines does not necessarily recapitulate or predict the effects of a heterozygous genetic variant in physiological conditions. Thus, we investigated the role of the CBLC404Y allele in heterozygous human microglia-like cells.

MAPK pathway activating variants in mouse macrophages and human iPSC-derived microglia-like cells.

(A) Representative western-blot analysis (top panels) and quantification (middle panels) of phospho- and total-ERK in lysates from a murine CSF-1 dependent macrophage cell line expressing CBLWT, CBLI383M, CBLC384Y, CBLC404Y, CBLC416S (n=3-6), and RIT1WT, RIT1F82L and RIT1M90I (n=3), KRASWT and KRASA59G (n=3). Bottom panels depict flow cytometry analysis of EdU incorporation in the same lines. Statistics: unpaired t-test. (B) HALLMARK and KEGG pathways (FDR/adj. p-value <0.25, selected from Table S7) enriched in gene set enrichment analysis (GSEA) of RNAseq from mutant CSF-1 dependent macrophage lines CBLI383M, CBLC384Y, CBLC404Y, CBLC416S, CBLR420Q, RIT1F82L, RIT1M90I, KRASA59G, and PTPN11T73I (n=3-6) in comparison with their WT controls. NES: normalized enrichment score. (C) Sanger sequencing of 2 independent hiPSC clones (#93 and #91) of the CBL404C/Y heterozygous mutant carrying the c.1211G/A transition on one allele and 2 independent isogenic control CBL404C/C clones (#71 and #89), all obtained by prime editing. (D) Photomicrographs of CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. (E) Quantification of leading edge and lateral lamellipodia in CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. n=3-7, statistics: p-values are obtained by nested one-way ANOVA. (F) Flow cytometry analysis of cell size for the same lines (n>3), statistics: p-values are obtained with nested one-way ANOVA. (G) Flow cytometry analysis of EdU incorporation in CBL404C/C and CBL404C/Y microglia-like cells after a 2-hour EdU pulse (n=3, unpaired t-test). (H) Western-blot analysis (left) and quantification (right) of phospho- and total-ERK proteins in lysates from CBL404C/C and CBL404C/Y microglia-like cells starved of CSF-1 for 4 h and stimulated with CSF-1 (5 min, 100 ng/mL) (n=4), statistics: p-values are obtained with two-way ANOVA.

Heterozygosity for a CBL variant allele activates human microglia-like cells

We used prime editing 112 of human induced pluripotent stem cells (hiPSCs, see Methods) to generate isogenic hiPSC clones heterozygous for the patients’ variants (Fig. 3C and Fig. S5). We focused our analysis on CBL404C/Y mutant lines because CBL was mutated in 6 patients and 2 of them carried the same CBL c.1211G>A p.C404Y variant (Fig. 3D). Microglia-like cells were differentiated from two independent hiPSC-derived CBL404C/Y lines and their isogenic CBL404C/C controls (Fig. 4C and Fig. S5C). CBL404C/Y and isogenic CBL404C/C microglia-like cells expressed similar amounts of CBL total mRNA and protein, and CBL404C/Y cells expressed WT and mutant mRNA in similar amounts, as expected assuming bi-allelic expression of CBL (Fig. S5D-S5F). CBL404C/Y cells presented with a phenotype comparable to isogenic CBL404C/C microglia-like cells for expression of IBA1, CSF1R, NGFR, EGFR, CD11b, MRC1, CD36, CD11c, Tim4, CD45, and MHC Class II (Fig. S5G). Their viability was also comparable to control (Fig. S5H). However, CBL404C/Y cells were larger and presented with more lamellipodia, resulting in an amoeboid morphology less frequently observed in isogenic controls (Fig. 4D,4E), and their proliferation rate was slightly increased, as measured by EdU incorporation (Fig. 4F). Moreover, CBL404C/Y cells cultured in CSF1-supplemented medium also presented with a higher basal pERK level than control when restimulated with CSF1 (Fig. S5I), and ERK phosphorylation after stimulation of starved microglia-like cells with CSF-1 was increased by ∼2-fold in comparison to isogenic WT (Fig. 4G). Altogether, these results showed that heterozygosity for a CBLC404Y allele is sufficient to activate human microglia-like cells, increasing their proliferation and ERK activation.

Heterozygosity for a CBLC404Y allele drives a microglial neuroinflammatory/AD associated signature

Gene Set Enrichment Analyses (GSEA) of RNAseq comparing CBL404C/Y and isogenic CBL404C/C microglia-like cells showed upregulation of Glycolysis, Oxidative Phosphorylation, and mTORC1 signatures, indicating increased metabolism and energy consumption by the mutant cells (Fig. 5A; Table S8). In addition, as observed in MAC lines, CBL404C/Y cells upregulated complement, TNF, and JAK STAT signaling and inflammatory signatures 113 (Fig. 5A; Table S8). Increased production of TNF, IL-6, IFN-γ, IL-1β, C3 and complement Factor H (CFH) by CBL404C/Y cells was confirmed by ELISA (Fig. 5B). In addition, CBL404C/Y microglia-like cells were also enriched for signatures from the KEGG database associated with neurodegenerative disorders (Fig. 5A; Table S8), and for the recently published human microglia AD snRNAseq signature, obtained by analysis of 24 sporadic AD patients and 24 controls 21 (Fig. 5C). These data indicated that heterozygosity for the CBLC404Y allele is sufficient to drive expression of a neuroinflammatory/AD signature in a human microglia-like cell type, characterized by increased metabolism and the production of neurotoxic cytokines known to interfere with normal brain homeostasis.

CBL404C/Y microglia signature.

(A) HALLMARK and KEGG pathways (FDR/adj. p-value <0.25, selected from Table S8) enriched in gene set enrichment analysis (GSEA) of RNAseq from CBL404C/Y iPSC-derived macrophages and isogenic controls. NES: normalized enrichment score. (B) ELISA for pro-inflammatory cytokines (n=3) and complement proteins (n=2) in the supernatant from CBL404C/Y iPSC-derived microglia-like cells and isogenic controls. Statistics: p-values are obtained by nonparametric Mann-Whitney U test, * 0.05, ** 0.01, *** 0.001, **** 0.0001. (C) GSEA analysis for enrichment of the human AD-microglia snRNAseq signature (MIC1) 21 in differentially expressed genes between CBL404C/Y microglia-like cells and isogenic controls. (D) Unsupervised Louvain clustering of snRNAseq data from 5 samples of FACS-purified PU.1+ microglia nuclei from 4 donors (see Fig. S1C): control C14, AD without driver variant (AD34) and AD with driver variants (AD52 and AD53). (E) Dot plot represents the GSEA analysis of HALLMARK and KEGG pathways enriched in snRNAseq microglia clusters (samples from all donors). Genes are pre-ranked per cluster using differential expression analysis with SCANPY and the Wilcoxon rank-sum method. Statistical analyses were performed using the fgseaMultilevel function in the fgsea R package for HALLMARK and KEGG pathways. Selected gene sets with p-value < 0.05 and adjusted p-value < 0.25 are visualized using the ggpubr and ggplot2 R packages (gene sets/pathways are selected from Fig. S6B, Table S9). (F) Dot plot represents the GSEA analysis (as in (E)) of HALLMARK and KEGG pathways enriched in cluster 2/2B and deconvoluted by donor samples (selected from Fig. S6A).

The MAPK variant neuroinflammatory microglial signature is detectable in patients

Unsupervised Louvain clustering of the snRNAseq data from 5 samples of purified microglia nuclei from 4 donors (control, AD without and with driver variants, see Fig. S1C; Table S9) outlined 17 microglia clusters (Fig. 5D, Fig. S1A-E and S7A). GSEA analysis of the snRNAseq data showed that microglia samples from mutant patients were enriched for the signatures observed in the MAC lines and CBL404C/Y cells (Fig. S6). In particular microglia cluster 2/2B, where KRAS A59G variant reads from patient AD52 are detected despite the small size of the mutant clones and the low sensitivity of scRNAseq to detect low allelic frequency variants, was most enriched for the inflammatory, TNF, mTOR and oxidative phosphorylation and glycolysis signatures (Fig. 5E and Fig. S6). Deconvolution of cluster 2/2B by samples confirmed that the samples from patients carrying variants (AD52, AD53) and not the controls (C11, AD34) were responsible for this neuroinflammatory and metabolic signature (Fig. 5F and Fig. S6). Altogether, the above results strongly support the hypothesis that microglial clones with driver/oncogenic variants that activate the MAPK pathway are responsible for a metabolic and neuroinflammatory signature in vitro which includes the production of neurotoxic cytokines that are also detected in vivo in patients.

Discussion

We report here that microglia from a cohort of 45 patients with intermediate-onset sporadic AD (mean age 65 years) are enriched for clones carrying driver/oncogenic variants, predominantly in the MAPK pathway. These variants are absent from blood, glia or neurons in most cases. They include recurrent MAPK pathway variants (CBL RING domain variants in 6 patients), which promote microglial proliferation, activation, and expression of a neuroinflammatory/neurodegeneration-associated transcriptional programme in vitro and in vivo, and the production of neurotoxic cytokines IL1b, TNF, and IFNg 114-117. Heterozygous expression of a pathogenic CBL variant in human microglia-like cells was sufficient to drive a transcriptional program associated with increased metabolic activity and a neurotoxic inflammatory response, also observed in microglia from patients with MAPK-activating variants.

The association between AD and MAPK pathway variants is consistent with a previous study where WES performed on unseparated brain tissue from AD patients showed that putative pathogenic somatic variants were enriched for the MAPK pathway, despite the lower sensitivity of the approach and the lack of cellular specificity 30. The pathogenic role of the somatic variants in the MAPK pathway found in the microglia of AD patients is supported by several lines of evidence. We show here that they promote a neuroinflammatory/neurodegeneration-associated transcriptional programme in microglia-like cells. In addition, somatic variants that activate the MAPK pathway in tissue macrophages cause a clonal proliferative and inflammatory disease called Histiocytosis, strongly associated with neurodegeneration 37,94-96, and introduction in mouse microglia of the variant allele most frequently associated with histiocytosis (BRAFV600E) causes neurodegeneration in mice 37. The allelic frequencies of pathogenic variants found in AD patients are lower than values classically observed in solid tumors or leukemia, but within the range of the clonal frequency of pathogenic T cells observed in auto-immune diseases 118, and we found that they were in the range of the allelic frequencies observed for the BRAFV600E variant in microglia in the brain of Histiocytosis patients. Moreover, the RAS/MAPK signaling pathway is involved in microglia proliferation, activation and inflammatory response 119-121, neuronal death, neurodegeneration, and AD pathogenesis 15,19,37,122, and its activation has been proposed to be an early event in the pathophysiology of AD in humans 123. Neuroinflammation is an early event in AD pathogenesis, increasingly considered as critical in pathogenesis initiation and progression 16,124,125. 
This is underscored by the observation that the main known genetic risk factor for sporadic AD is the APOE4 allele, responsible for an increased inflammatory response in the brain of APOE4 carriers 16. In this regard, the contributing role of MAPK-activating variants could be comparable to that of the APOE4 allele, and we noted that the allelic frequency of the APOE4 allele is lower in patients with pathogenic variants (16/46 alleles, 34%) than in patients without a detected variant (23/44 alleles, 53%), although the difference did not reach significance in this series.

Variants targeting the DNA-repair and DNA/histone methylation pathways are also enriched among AD patients, sometimes together in the same patient, although their functional significance was not investigated here. Of note, however, germline variants of the DNA-repair transcription factor TP53 and of the DNA damage sensors ATR and CHEK2 were shown to promote accelerated neurodegeneration in humans 82,83.

Microglia variants are frequently absent from blood, and our DNA sequencing barcoding approach does not support a model where blood cells massively infiltrate the brain or replace the microglia pool in patients from our series, but is instead consistent with the local maintenance and proliferation of microglia 38,39. In addition, our results are consistent with a recent study showing that clonal hematopoiesis was inversely associated with the risk of AD 60.

The association of microglia clones carrying pathogenic variants with AD in a subset of patients, together with evidence that they drive neuroinflammation, suggests that these clones could contribute to AD pathogenesis, together with other genetic and environmental factors. Lewy bodies, amyloid angiopathy, tauopathy, and alpha-synucleinopathy were equally distributed among AD patients with or without microglia clones carrying MAPK-activating variants. The natural history of the microglial clones is difficult to study in humans. It is possible that microglial clones with a proliferative and activation advantage and a neuroinflammatory and neurotoxic profile are present at the onset and contribute to the early stages of the disease. Alternatively, it is also possible that the microglial clones carrying the driver mutations appear or are selected later during the course of the disease in the inflammatory milieu of the AD brain. In the latter case, microglial clones would not contribute to disease initiation, but may contribute to the progression of neuroinflammation and neurodegeneration.

Competing interests

FG was a paid consultant (no equity) to Third Rock Ventures from 2018 to 2020. Sequencing costs and analysis in this study were covered in part by an SRA between Third Rock Ventures and MSKCC. This work led to patents PCT/US2022/037893/WO2023004054A1 ‘Methods and compositions for the treatment of alzheimer’s disease’ by MSKCC and PCT/US2018/047964 ‘Kinase mutation-associated neurodegenerative disorders’ by MSKCC.

Acknowledgements

This study was supported by grants from NIH: P30 CA008748 MSKCC core grant, 1R01NS115715-01, 1 R01 HL138090-01, and 1 R01 AI130345-01 to FG, and Basic and Translational Immunology Grants from Ludwig Center for Cancer Immunotherapy and from Cycle for Survival to FG. RV was supported by the 2018 AACR-Bristol-Myers Squibb Fellowship for Young Investigators in Translational Immuno-oncology, Grant Number 18-40-15-VICA. LW was supported by NYSTEM training award C32559GG and a Charles H Revson fellowship. Sequencing costs and analysis were covered in part by an SRA between Third Rock Ventures and MSKCC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. MAC mouse cell lines were kindly provided by Dr Richard E Stanley. Code for ShearwaterML and for single cell mRNA genotyping were provided by Dr Inigo Martincorena and Dr Noor Sohail, respectively.

Author contributions

RV and FG designed the study and wrote the draft of the manuscript. RV designed, performed and supervised the collection of brain samples, brain nuclei separation and preparation of samples for DNA and snRNA sequencing with help from AA, OA and AB. DNA, bulk RNA and snRNA sequencing were performed in the MSKCC genomics core under supervision of AV and NS. DNA sequencing data were analyzed, validated, and interpreted by RV and SF, with support from NS, BB, MO, JLC, and FG. Transfected/transduced cell lines were generated by BPC, SYH, WTM (HEK cells), RV (BV-2 lines) and LW (MAC lines). hiPSC clones heterozygous for the patients’ variants and isogenic controls were generated and validated with the SKI Stem Cell Research Core by TZ, LW and TL. Biochemical analysis of HEK and BV2 lines was performed by BPC, SYH, WTM. Protocols for culture and analysis of hiPSC-derived cells were designed by TL. Biochemical and phenotypic analysis and RNA preparation for MAC lines and hiPSC-derived cells were performed by LW. Analysis of bulk RNAseq data was performed by SF (hiPSC-derived cells) and NS (MAC lines). Analysis of snRNAseq was performed by YH, LW, and OE. The Netherlands Brain Bank and MSKCC LWP (CAI-D and RK) provided patient and control brain and blood samples. RMR, RC, and OA contributed to the discussion of the results and edited the manuscript. FG supervised all aspects of the work. RV and FG prepared the initial and revised manuscripts. All authors contributed to the manuscript.

Methods

Tissue samples

The study was conducted according to the Declaration of Helsinki. Human tissues were obtained with patient-informed consent and used under approval by the Institutional Review Boards from Memorial Sloan Kettering Cancer Center (IRB protocols #X19-027). Snap-frozen human brain and matched blood were provided by the Netherlands Brain Bank (NBB), the Human Brain Collection Core (HBCC, NIH), Hospital Sant Joan de Déu and the Rapid Autopsy Program (MSKCC, IRB #15-021). Samples were neuropathologically evaluated and classified by the collaborating institutions as Alzheimer’s disease (AD) (1-5) or non-dementia controls. The mean age of AD patients was 65 years (55.5% female, 44.5% male). The mean age of all controls was 54 years (60% female, 40% male), and the mean age of AD age-matched controls was 70 years (60% female, 40% male). The overall mean post-mortem delay interval was 9.8 hours. Patients did not present with germline pathogenic PSEN1/2/3 or APP AD-associated variants. For additional information on donors’ brain regions, sex, age, cause of death, APOE status and Braak status, see Supplementary table S2. To avoid possible contamination of sequencing data with variants from a donor’s tumoral disease, we refrained from selecting cases with blood malignancies or brain tumors for the group of non-dementia controls. Samples from histiocytosis patients were collected under the GENE HISTIO study (approved by CNIL and CPP Ile-de France) from Pitié-Salpêtrière Hospital and Hospital Trousseau and from Memorial Sloan Kettering Cancer Center.

Nuclei isolation from frozen brain samples, FACS-sorting and DNA extraction

All samples were handled and processed under an Air Clean PCR Workstation. ∼250-400 mg of frozen brain tissue was homogenized with a sterile Dounce tissue grinder using a sterile non-ionic surfactant-based buffer to isolate cell nuclei, the ‘homogenization buffer’ (250 mM Sucrose, 25 mM KCl, 5 mM MgCl2, 10 mM Tris buffer pH 8.0, 0.1% (v/v) Triton X-100, 3 μM DAPI, Nuclease Free Water). The homogenate was filtered through a 40-μm cell strainer and centrifuged at 800g for 8 min at 4°C. To clean up the homogenate, we performed an iodixanol density gradient centrifugation as follows: the pellet was gently mixed 1:1 with iodixanol medium at 50% (50% Iodixanol, 250 mM Sucrose, 150 mM KCl, 30 mM MgCl2, 60 mM Tris buffer pH 8.0, Nuclease Free Water) and homogenization buffer. This solution was layered onto a new tube containing an equal volume of iodixanol medium at 29% and centrifuged at 13,500g for 20 min at 4°C. The nuclei pellet was gently resuspended in 200 μl of FACS buffer (0.5% BSA, 2mM EDTA) and incubated on ice for 10 min. After centrifugation at 800g for 5 min at 4°C, the sample was incubated with anti-NeuN (neuronal marker, 1:500, Anti-NeuN-PE, clone A60 Milli-Mark™) for 40 min. After centrifugation at 800g for 5 min at 4°C, the sample was washed with 1X Permeabilization buffer (Foxp3/Transcription Factor Staining Buffer Set, eBioscience™) and centrifuged at 1300g for 5 min, without brakes to improve nuclei recovery. Staining with anti-Pu.1 antibody in 1X Permeabilization buffer (microglia marker 1:50, Pu.1-AlexaFluor 647, 9G7 Cell Signaling) was performed for 40 min. After a wash with FACS buffer, samples were ready for sorting. Nuclei were FACS-sorted in a BD FACS Aria with a 100-μm nozzle and a sheath pressure of 20 psi, operating at ∼1000 events per second. Nuclei were sorted into 1.5 ml tubes certified free of RNase, DNase, DNA, ATP and endotoxins, containing 100μl of sterile PBS. For each population we sorted >10^5 nuclei. Sorting purity was >95%. Sorting strategy is depicted in Fig. S1. 
Of note, the Double-negative gate is restricted to prevent cross-contamination between cell types. Nuclei suspensions were centrifuged for 20 min at 6000g and processed immediately for gDNA extraction with the QIAamp DNA Micro Kit (Qiagen) following the manufacturer’s instructions. DNA from whole-blood samples was extracted with the QIAamp DNA Micro Kit (Qiagen) following the manufacturer’s instructions. Flow cytometry data were collected using DiVa 8.0.1 Software. Subsequent analysis was performed with FlowJo_10.6.2.

DNA library preparation and sequencing

DNA samples were submitted to the Integrated Genomics Operation (IGO) at MSKCC for quality and quantity analysis, library preparation and sequencing. DNA quality was measured with Tapestation 2200. All samples had a DNA Integrity Number (DIN) >6. After PicoGreen quantification, ∼200ng of genomic DNA was used for library construction using the KAPA Hyper Prep Kit (Kapa Biosystems KK8504) with 8 cycles of PCR. After sample barcoding, 2.5ng-1µg of each library was pooled and captured by hybridization with baits specific to either the HEME-PACT (Integrated Mutation Profiling of Actionable Cancer Targets related to Hematological Malignancies) assay, designed to capture all protein-coding exons and select introns of 576 genes (2.88 Mb) comprising commonly implicated oncogenes and tumor suppressor genes (6), and/or HEME/BRAIN-PACT (716 genes, 3.44 Mb, table S1), an expanded panel that included additional custom targets related to neurological diseases including Alzheimer’s Disease, Parkinson’s Disease, Amyotrophic Lateral Sclerosis (ALS) and others (table S1) (7-15). To simplify, in the manuscript the combined panel is referred to as ‘BRAIN-PACT’. In table S3, ‘Heme-only’ or ‘Brain-only’ is indicated in the cases for which only one or the other panel was used. Capture pools were sequenced on the HiSeq 4000, using the HiSeq 3000/4000 SBS Kit (Illumina) for PE100 reads. Samples were sequenced to a mean depth of coverage of 1106x (Control samples: 1071x, AD samples: 1100x). For detailed information on the sample quality control checks used to avoid potential sample and/or barcode mix-ups and contamination from external DNA, see (6).

Mutation data analysis

The data processing pipeline for detecting variants in Illumina HiSeq data is as follows. First the FASTQ files are processed to remove any adapter sequences at the end of the reads using cutadapt (v1.6). The files are then mapped using the BWA mapper (bwa mem v0.7.12). After mapping the SAM files are sorted and read group tags are added using the PICARD tools. After sorting in coordinate order the BAM’s are processed with PICARD MarkDuplicates. The marked BAM files are then processed using the GATK toolkit (v 3.2) according to best practices for tumor normal pairs. They are first realigned using ABRA (v 0.92) and then the base quality values are recalibrated with the BaseQRecalibrator. Somatic variants are then called in the processed BAMs using MuTect (v1.1.7) for SNV and ShearwaterML(16-18).

muTect (v1.1.7)

To identify somatic variants and eliminate germline variants, we ran the pipeline as follows: PU.1, DN and Blood samples against matching NeuN samples, and NeuN samples against matching PU.1 samples. In addition, we ran all samples against a Frozen-Pool of 10 random genomes. We selected Single Nucleotide Variations (SNVs) [Missense, Nonsense, Splice Site, Splice Regions] that were supported by at least 4 mutant reads and with coverage of 50x or more. Fill-out files for each project (∼27 samples per sequencing pool) were used to exclude, by manual curation, variants with high background noise. This resulted in 428 variants (Missense, Nonsense, Splice_site, Splice_Region).
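The read-support criteria above can be written as a simple predicate. The following Python sketch is illustrative only and is not part of the MuTect pipeline: the `Call` record, the consequence labels and the gene names are hypothetical, and only the two thresholds (at least 4 mutant reads, coverage of 50x or more) and the four retained consequence classes are taken from the text. Manual curation against fill-out files is not modelled.

```python
from dataclasses import dataclass

# Consequence classes retained in the text; the string labels are illustrative.
KEPT_CONSEQUENCES = {"Missense", "Nonsense", "Splice_Site", "Splice_Region"}


@dataclass
class Call:
    """Hypothetical record for one candidate SNV in one sample."""
    gene: str
    consequence: str
    alt_reads: int  # reads supporting the mutant allele
    depth: int      # total coverage at the site


def passes_read_filters(call, min_alt_reads=4, min_depth=50):
    """Keep coding/splice SNVs with enough mutant reads and coverage."""
    return (
        call.consequence in KEPT_CONSEQUENCES
        and call.alt_reads >= min_alt_reads
        and call.depth >= min_depth
    )


calls = [
    Call("GENE_A", "Missense", 6, 800),
    Call("GENE_B", "Missense", 3, 900),    # too few mutant reads
    Call("GENE_C", "Nonsense", 5, 40),     # coverage below 50x
    Call("GENE_D", "Intronic", 10, 1000),  # consequence class not retained
]
kept = [c.gene for c in calls if passes_read_filters(c)]
```

In this toy input only `GENE_A` satisfies all three conditions.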

ShearwaterML was used to look for low allelic frequency somatic variants, as it has been shown to efficiently call variants present in a small fraction of cells, with true positives being ∼90%. Briefly, the algorithm uses a collection of deep-sequenced samples to learn a base-specific error model for each site, by fitting a beta-binomial distribution that captures both the mean error rate at the site and its variation across normal samples, and then compares the observed variant rate in the sample of interest against this background model using a likelihood-ratio test. For a detailed description of this algorithm please refer to (16, 17). In our data set, for each cell type (NeuN, DN, PU.1) we used as “normal” a combination of the other cell types, i.e. PU.1 vs NeuN+DN, DN vs NeuN+PU.1, NeuN vs PU.1+DN, Blood vs NeuN+DN. Since all samples were processed and sequenced using the same protocols, we expect the background error to be even across samples. More than 400 samples were used as background, leading to an average background coverage >400,000x. Resulting variants for each cell type were filtered out as germline if they were present in more than 20% of all reads across samples. Additionally, variants with coverage of less than 50x or more than 35% variant allelic frequency (VAF) were removed from downstream analysis. P-values were corrected for multiple testing using Benjamini & Hochberg’s False Discovery Rate (FDR) (19) and a q-value cutoff of 0.01 was used to call somatic variants. Variants were required to have at least one supporting read on each strand. Somatic variants within 10bp of an indel were filtered out as they typically reflect mapping errors. We selected Single Nucleotide Variations (SNVs) [Intronic, Intergenic, Missense, Nonsense, Splice Site, Splice Regions] that were supported by at least 4 mutant reads and annotated them using VEP. 
Finally, to reduce the risk of SNP contamination, we excluded variants with a MAF (minor allelic frequency) cutoff of 0.01 using the gnomAD database. This resulted in 509 SNVs.
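The multiple-testing correction and the per-variant filters described above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the ShearwaterML implementation: the beta-binomial likelihood-ratio test itself is not reproduced, the p-values are invented, and the function names are hypothetical. The thresholds (q-value cutoff of 0.01, coverage of at least 50x, VAF of at most 35%, at least one supporting read per strand, at least 4 mutant reads) are those given in the text; whether each cutoff is inclusive is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def keep_candidate(q, depth, vaf, fwd_alt, rev_alt):
    """Apply the q-value, coverage, VAF, strand and read-support filters."""
    return (
        q < 0.01                       # FDR cutoff
        and depth >= 50                # minimum coverage
        and vaf <= 0.35                # remove likely germline-range VAFs
        and fwd_alt >= 1 and rev_alt >= 1  # support on both strands
        and fwd_alt + rev_alt >= 4     # at least 4 mutant reads
    )


pvals = [1e-6, 0.004, 0.03, 0.5]  # invented p-values for four sites
qvals = benjamini_hochberg(pvals)
```

With these four invented p-values the second site has an adjusted q-value of 0.008 and would pass the cutoff, whereas the third (q = 0.04) would not.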

We compared the final mutant calls from MuTect1 and ShearwaterML and found that 30% of the events (91 variants) that were called by MuTect1 were also called by ShearwaterML. Overall, a total of 826 variants (table S3) were found, with a mean coverage at the mutant site of 668.3X (10th percentile: 276X, 90th percentile: 1181X) and a mean of 29.1 mutant reads (10th percentile: 4, 90th percentile: 52), with 84% of variants supported by at least 5 mutant reads (table S3). The median allelic frequency was ∼1.34% (table S3). Negative results for matching brain negative samples were confirmed in 100% of samples at a mean depth of ∼5000x (range 648-23,000x) (table S3), confirming nuclei sorting purity of >95% for PU.1+, DN, and NeuN+ populations.

Validation of variants by droplet-digital-PCR (ddPCR)

We performed validation of ∼12% of variants (81/643) by droplet-digital PCR (ddPCR) on pre-amplified DNA or on libraries (in the cases where DNA was available). Around 15% (12/81) of the variants analyzed by ddPCR were called by ShearwaterML, ∼44% (34/81) were called by MuTect1 and 40% (33/81) by both ShearwaterML and MuTect1. Altogether we confirmed 73/81 of variants tested (∼91%). In addition, 61 assays (from variants detected in PU.1+ nuclei) were tested in the other cell types isolated from the same brain region, which helps estimate the sorting purity. We found that the mean sorting purity was >96%, even in the cases where the clone in PU.1 was >3% (Fig. S1). Assays were also run in matching blood when available. The mean depth of ddPCR was ∼5000x and mutant counts of 3 or more were considered positive. VAF obtained by ddPCR correlated with the original VAF by sequencing (R2 = 0.93, p<0.0001, see Fig. S1). For KRAS_G12D: Bio-Rad validated assay (Unique Assay ID: dHsaMDV2510596) and MTOR_Arg1616His_c.4847G>A: Bio-Rad validated assay (Unique Assay ID: dHsaMDV2510596) were used. The remaining assays were designed and ordered through Bio-Rad. To set up the right conditions for newly designed assays, cycling conditions were tested to ensure optimal annealing/extension temperature as well as optimal separation of positive from empty droplets. All reactions were performed on a QX200 ddPCR system (Bio-Rad catalog # 1864001). When possible, each sample was evaluated in technical duplicates or quadruplicates. Reactions contained 10ng gDNA, primers and probes, and digital PCR Supermix for probes (no dUTP). Reactions were partitioned into a median of ∼31,000 droplets per well using the QX200 droplet generator. Emulsified PCRs were run on a 96-well thermal cycler using cycling conditions identified during the optimization step (95°C 10’; 40-50 cycles of 94°C 30’ and 52-56°C 1’; 98°C 10’; 4°C hold). 
Plates were read and analyzed with the QuantaSoft software to assess the number of droplets positive for mutant DNA, wild-type DNA, both, or neither. ddPCR results are listed in table S3.
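For orientation, droplet counts of this kind are commonly converted into an allelic fraction with a Poisson correction, which accounts for droplets that contain more than one template copy. The Python sketch below shows that standard calculation; it is an assumption about how the counts map to a VAF and is not taken from the QuantaSoft software or from the authors' analysis. The function names and the droplet counts in the example are hypothetical; only the positivity threshold (3 or more mutant counts) comes from the text.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of target copies per droplet.

    `positive` counts every droplet containing the target, including
    droplets that are positive for both mutant and wild-type DNA.
    """
    return -math.log(1.0 - positive / total)


def ddpcr_vaf(mutant_positive, wildtype_positive, total_droplets):
    """Fraction of mutant copies among all (mutant + wild-type) copies."""
    lam_mut = copies_per_droplet(mutant_positive, total_droplets)
    lam_wt = copies_per_droplet(wildtype_positive, total_droplets)
    return lam_mut / (lam_mut + lam_wt)


def is_positive(mutant_positive, min_mutant_droplets=3):
    """Positivity rule from the text: 3 or more mutant counts."""
    return mutant_positive >= min_mutant_droplets


# Invented well: 31,000 droplets, 30 mutant-positive, 3,000 wild-type-positive.
vaf = ddpcr_vaf(30, 3000, 31000)
```

For this invented well the estimated VAF is slightly below 1%. The correction is undefined when every droplet is positive, a case the sketch does not handle.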

Classification of variants

Somatic variants were classified according to their pathogenicity as follows: variants were classified as ‘driver’ if reported as pathogenic/likely pathogenic by ClinVar (20) and/or oncogenic/predicted oncogenic/likely oncogenic by OncoKb (21) (table S3). These two databases report pathogenicity in cancer and other diseases, based on supporting evidence from curated literature. We considered as classical MAPK-pathway genes those reported to be mutated in RASopathies: BRAF, CBL, KRAS, MAP2K1, NF1, PTPN11, SOS1, RIT1, SHOC2, NRAS, RAF1, RASA1, HRAS, MAP2K2, SPRED1 (22, 23) (see Table S3).

Quantification of mutational load and statistics

We defined mutational load or mutational burden as the number of synonymous and non-synonymous somatic single-nucleotide variants (SNVs) per megabase of genome examined (24). Overall, a total of 643 SNVs were detected, resulting in 0.3 mutations/Mb sequenced. As detailed in the manuscript, the mutational load varies considerably across cell types and patients. To quantify mutational load we took into consideration the panel used for sequencing each sample: HEME-PACT (2.88 Mb) or the extended panel BRAIN-PACT (3.44 Mb) (see table S1, S3). Therefore, the number of mutations was normalized by the number of Mb sequenced for that specific sample. In the cases where we calculated mutational load per patient, we averaged the mutational load of each sample from that patient for a given cell type (i.e., if for one patient 2 PU.1 samples were sequenced with BRAIN-PACT, one from hippocampus and one from superior parietal gyrus, then the mutational load for PU.1 for that patient is the mean of the mutational load of the 2 PU.1 samples analyzed). For the quantification of pathogenic variants, the same analysis was performed, quantifying only variants that are reported as pathogenic/likely pathogenic by ClinVar and/or OncoKb. Statistical significance was analyzed with GraphPad Prism (v9) and R (3.6.3). Non-parametric tests were used when data did not follow a normal distribution (normality tests: D’Agostino-Pearson and Shapiro-Wilk). For normally distributed data, an unpaired t-test was used to compare two groups, and one-way, nested one-way or two-way analyses of variance (ANOVA) were used for comparing more than two groups, as indicated in the figures. For data that did not have a normal distribution, the tests performed were the unpaired two-tailed Mann-Whitney U test and the Kruskal–Wallis test with Dunn’s test for multiple comparisons. Pearson and Spearman tests were used for correlation analysis. In Fig. 2G, we used multivariate logistic regression analysis to test if there was an association between Alzheimer’s disease and the presence of driver variants in PU.1+ nuclei. We used Alzheimer’s disease as the dependent variable, and age, sex, and the presence of pathogenic variant/s (Yes/No) as covariates. In all the statistical tests, significance was considered at P < 0.05. For Venn Diagram plots, we used (25).
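The panel-aware normalization and the per-patient averaging of mutational load can be illustrated with a short Python sketch. It is a minimal example assuming a hypothetical tuple layout for samples and invented variant counts; only the two panel sizes (2.88 Mb and 3.44 Mb) and the averaging rule are taken from the text.

```python
from collections import defaultdict
from statistics import mean

# Panel sizes in megabases, as stated in the text.
PANEL_MB = {"HEME-PACT": 2.88, "BRAIN-PACT": 3.44}


def sample_load(n_variants, panel):
    """Somatic SNVs per Mb sequenced for one sample."""
    return n_variants / PANEL_MB[panel]


def patient_load(samples):
    """Average per-sample load for each (patient, cell type) pair.

    `samples` is an iterable of (patient, cell_type, panel, n_variants);
    this layout is an assumption made for the example.
    """
    per_group = defaultdict(list)
    for patient, cell_type, panel, n_variants in samples:
        per_group[(patient, cell_type)].append(sample_load(n_variants, panel))
    return {key: mean(loads) for key, loads in per_group.items()}


samples = [
    ("P1", "PU.1", "BRAIN-PACT", 4),  # e.g. hippocampus
    ("P1", "PU.1", "BRAIN-PACT", 2),  # e.g. superior parietal gyrus
    ("P1", "NeuN", "HEME-PACT", 1),
]
loads = patient_load(samples)
```

Here the PU.1 load for patient P1 is the mean of 4/3.44 and 2/3.44 mutations/Mb, matching the two-region example given in the text.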

Pathway enrichment analysis of genes targeted by variants was performed using Metascape (26) and the following ontology sources: KEGG Pathway (27, 28), GO Molecular function (29, 30), Reactome Gene Sets (29, 30) and Canonical Pathways (31). The list of 716 genes from the targeted panel was used as the enrichment background. Terms with a p-value < 0.05, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) are shown. p-values are calculated based on the cumulative hypergeometric distribution (32).
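The two quantities used to select terms, the cumulative hypergeometric p-value and the enrichment factor, can be computed directly. The Python sketch below is illustrative and does not reproduce Metascape internals: the function names are hypothetical and the term size, hit-list size and overlap in the example are invented. The background of 716 genes and the three selection thresholds are those stated in the text.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) when n genes are drawn from a background of N genes,
    K of which belong to the term (upper tail of the hypergeometric)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment_factor(k, N, K, n):
    """Observed overlap divided by the overlap expected by chance."""
    return k / (n * K / N)


def term_is_shown(k, N, K, n):
    """Selection rule from the text: p < 0.05, count >= 3, factor > 1.5."""
    return (
        hypergeom_pvalue(k, N, K, n) < 0.05
        and k >= 3
        and enrichment_factor(k, N, K, n) > 1.5
    )


# Invented example: 716-gene background, a 40-gene term,
# 50 mutated genes, 8 of which fall in the term.
N, K, n, k = 716, 40, 50, 8
p = hypergeom_pvalue(k, N, K, n)
ef = enrichment_factor(k, N, K, n)
```

Using the sequenced panel rather than the whole genome as background matters here, because the panel is itself enriched for cancer- and neurodegeneration-related genes.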

Expression of target genes in microglia

Expression in the brain and in microglia of the genes targeted by somatic variants found in AD patients and in healthy individuals (Fig. S3) was confirmed using publicly available datasets (https://www.proteinatlas.org/ and (33, 34)). For data from Galatro et al. (GSE99074) (33), normalized gene expression data and associated clinical information of isolated human microglia (N = 39) and whole brain (N = 16) from healthy controls were downloaded from GEO. For data from Gosselin et al. (34), raw gene expression data and associated clinical information of isolated microglia (N = 3) and whole brain (N = 1) from healthy controls were extracted from the original dataset. Raw counts were normalized using the DESeq2 package in R (35).

Nuclei isolation from frozen brain samples for sn-RNAseq

For sn-RNAseq studies we only selected samples with a RIN score in whole tissue of 6 or more. All samples were handled and processed under an Air Clean PCR Workstation. About 250-400 mg of frozen brain tissue was homogenized with a sterile Dounce tissue grinder using a sterile homogenization buffer to isolate cell nuclei (250 mM Sucrose, 25 mM KCl, 5 mM MgCl2, 10 mM Tris buffer pH 8.0, 0.1% (v/v) Triton X-100, 3 μM DAPI, Nuclease Free Water, 20 U/ml of Superase-In RNase inhibitor and 40 U/ml RNasin ribonuclease inhibitor). The homogenate was filtered through a 40-μm cell strainer and centrifuged at 800g for 8 min at 4°C. To clean up the homogenate, we performed an iodixanol density gradient centrifugation as follows: the pellet was gently mixed 1:1 with iodixanol medium at 50% (50% Iodixanol, 250 mM Sucrose, 150 mM KCl, 30 mM MgCl2, 60 mM Tris buffer pH 8.0, Nuclease Free Water) and homogenization buffer. This solution was layered onto a new tube containing an equal volume of iodixanol medium at 29% and centrifuged at 13,500g for 20 min at 4°C. The nuclei pellet was resuspended in FACS buffer with RNase inhibitors (0.5% BSA, 2mM EDTA, Superase-In RNase inhibitor and 40 U/ml RNasin ribonuclease inhibitor) and centrifuged at 800g for 5 min at 4°C. The nuclei pellet was fixed with 90% ice-cold methanol and incubated for 10 min on ice, followed by a centrifugation at 1300g (without brakes, which improves nuclei recovery after fixation). The pellet was resuspended in permeabilization buffer (6% BSA, Superase-In RNase inhibitor 20 U/mL, RNasin ribonuclease inhibitor 40 U/mL and 0.05% Triton) followed by a centrifugation at 1300g. The sample was incubated with anti-Pu.1 antibody (microglia marker 1:50, Pu.1-AlexaFluor 647, 9G7 Cell Signaling) in permeabilization buffer. After a wash with FACS buffer, samples were ready for sorting. Nuclei were FACS-sorted in a BD FACS Aria with a 100-μm nozzle and a sheath pressure of 20 psi, operating at ∼1000 events per second. 
Nuclei were sorted into 1.5 ml tubes certified free of RNase, DNase, DNA, ATP and endotoxins, containing 100μl of sterile PBS. For each population we sorted >10^5 nuclei into FACS buffer.

Sn-RNAseq library preparation and sequencing

Single-nuclei RNA-Seq of FACS-sorted nuclei suspensions was performed on a Chromium instrument (10X Genomics) following the user guide manual (Reagent Kit 3’ v3.1). Each sample, containing approximately 10,000 nuclei at a final dilution of ∼1,000 cells/µl, was loaded onto the cartridge following the manual. The individual transcriptomes of encapsulated cells were barcoded during the RT step and the resulting cDNA was purified with DynaBeads, followed by amplification per manual guidelines. Next, the PCR-amplified product was fragmented, A-tailed, purified with 1.2X SPRI beads, ligated to the sequencing adapters and indexed by PCR. The indexed DNA libraries were double-size purified (0.6–0.8X) with SPRI beads and sequenced on the Illumina NovaSeq S4 platform (R1 – 26 cycles, i7 – 8 cycles, R2 – 70 cycles or higher). Sequencing depth was ∼200 million reads per sample on average. FASTQ files were processed using the SEQC pipeline (36) for quality control, mapping to the GRCh38 reference genome, and log2 transformation of the data with the default SEQC parameters to obtain the gene-cell count matrix.

Sn-RNAseq analysis

Seurat v4.0.3 with default parameters was used to perform sctransform (SCT) normalization, integration and Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction. The FindClusters function was used for cell clustering. To improve clustering, all samples were analyzed in an integrated analysis, based on canonical correlation analysis (CCA). Cell types were annotated using the top 500 DEGs of each cell type in a human cortex database. Data can be accessed at https://weillcornellmed.shinyapps.io/Human_brain/. The removal of doublets using DoubletFinder and of cells with high mitochondrial content (>10% mitochondrial RNA) yielded between 6,437 and 9,241 nuclei per patient and sample. Microglia represented 94 ± 3% of total cells. Unique Molecular Identifiers (UMIs) per nucleus and gene count per nucleus were comparable between donors. Integrated_snn at resolution 0.2 outlined 17 microglia clusters. Except for cluster 13, in which 97% of cells came from the healthy control C11_AG, all donors and samples were represented in every cluster. Cluster 19 contained few cells (0.84% of total microglia, for an average of 6.20 ± 1.60% for other clusters), was marked by a low number of cluster-enriched genes, and was excluded from further analyses. For pathway enrichment analysis, genes were pre-ranked using differential expression analysis in SCANPY (37) with the Wilcoxon rank-sum method. Statistical analyses were performed using the fgseaMultilevel function in the fgsea R package (38) for HALLMARK and KEGG pathways. Gene sets with p-value < 0.05 and adjusted p-value < 0.25 were selected and visualized using the ggpubr and ggplot2 (39) R packages. For the variant analysis of AD52_HIP harboring a KRASA59G (c.176C>G) clone, Integrative Genomics Viewer (IGV) software was used to display sequencing reads at KRAS c.176C (exon 3; GRCh38 chr12:25,227,348). Cells within each cluster were identified based on the 16-digit barcodes from SEQC-aligned reads. 
Barcodes were converted to the 10X Genomics format and used to sample reads from each cluster within the original BAM file. BAM subsets for each cluster were read with IGV and reads with identical UMIs were filtered out to account for amplification bias.
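The per-cluster read sampling and UMI filtering described above amounts to counting each captured molecule once. The Python sketch below illustrates that logic on plain tuples rather than BAM records. It is a simplified stand-in for the IGV-based inspection, not the authors' code: the function name, barcodes, UMIs and cluster labels are hypothetical, and treating a (cell barcode, UMI) pair as one molecule is an assumption.

```python
from collections import defaultdict


def count_alleles_per_cluster(reads, barcode_to_cluster):
    """Count bases observed at the variant site, one count per molecule.

    `reads` is an iterable of (cell_barcode, umi, base_at_site).
    Reads sharing a barcode and UMI are treated as PCR duplicates and
    only the first one encountered is counted.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, umi, base in reads:
        cluster = barcode_to_cluster.get(barcode)
        if cluster is None:   # barcode not assigned to any cluster
            continue
        key = (barcode, umi)
        if key in seen:       # duplicate of a molecule already counted
            continue
        seen.add(key)
        counts[cluster][base] += 1
    return {cluster: dict(bases) for cluster, bases in counts.items()}


# Invented reads covering the site; "C" is reference and "G" is variant.
reads = [
    ("AAAC", "U1", "C"),
    ("AAAC", "U1", "C"),  # same molecule, ignored
    ("AAAC", "U2", "G"),
    ("TTTG", "U1", "C"),
    ("GGGA", "U9", "G"),  # barcode absent from the cluster table
]
barcode_to_cluster = {"AAAC": "2", "TTTG": "5"}
allele_counts = count_alleles_per_cluster(reads, barcode_to_cluster)
```

Counting molecules rather than reads avoids inflating variant support when one transcript is amplified many times.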

Whole-Exome-Sequencing and analysis

Remaining libraries from a selected group of PU.1 and NEUN samples sequenced with BRAIN-PACT (see above) were sequenced by Whole-Exome-Sequencing (WES). Matching NEUN samples were sequenced to extract the germline variants. Around 100 ng of library were captured by hybridization using the xGen Exome Research Panel v2.0 (IDT) according to the manufacturer’s protocol. PCR amplification of the post-capture libraries was carried out for 12 cycles. Samples were run on a NovaSeq 6000 in a PE100 run, using the NovaSeq 6000 S4 Reagent Kit (200 Cycles) (Illumina). Samples were covered to an average of 419X. The data processing pipeline for detecting variants in Novaseq data is as follows. First the FASTQ files are processed to remove any adapter sequences at the end of the reads using cutadapt (v1.6). The files are then mapped using the BWA mapper (bwa mem v0.7.12). After mapping the SAM files are sorted and read group tags are added using the PICARD tools. After sorting in coordinate order the BAM’s are processed with PICARD MarkDuplicates. The marked BAM files are then processed using the GATK toolkit (v 3.2) according to the best practices for tumor normal pairs. They are first realigned using ABRA (v 0.92) and then the base quality values are recalibrated with the BaseQRecalibrator. Somatic variants are then called in the processed BAMs using muTect (v1.1.7) for SNV and the Haplotype caller from GATK with a custom post-processing script to call somatic indels. The full pipeline is available here https://github.com/soccin/BIC-variants_pipeline and the post processing code is at https://github.com/soccin/Variant-PostProcess. We selected Single Nucleotide Variants (SNVs) [Missense, Nonsense, Splice Site, Splice Regions] that were supported by at least 8 or more mutant reads, variant allelic frequency above 5% and with coverage of 50x. Annotation was performed using VEP. 
Finally, to reduce the risk of SNP contamination, we excluded variants with a minor allele frequency (MAF) above 0.01 in the gnomAD database. Variants were classified as ‘candidate pathogenic’ when the SNV was predicted to affect the protein as determined by PolyPhen-2 (possibly and probably damaging), SIFT (deleterious), CADD-MSC (high) and FATHMM-XF (Functional Analysis through Hidden Markov Models; pathogenic) (40) (table S5).
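The read-support and population-frequency filters described above can be sketched as a single predicate. This is an illustration only and not the post-processing code linked above: the `Variant` record, its field names and the consequence labels are hypothetical, and the handling of values exactly at a threshold is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the text; boundary handling is an assumption.
MIN_ALT_READS = 8     # at least 8 mutant reads
MIN_VAF = 0.05        # variant allelic frequency above 5%
MIN_DEPTH = 50        # coverage of 50x
MAX_POP_MAF = 0.01    # population minor allele frequency cutoff
KEPT_CONSEQUENCES = {"missense", "nonsense", "splice_site", "splice_region"}


@dataclass
class Variant:
    """Hypothetical minimal variant record."""
    consequence: str
    alt_reads: int   # reads supporting the mutant allele
    depth: int       # total reads covering the position
    pop_maf: float   # population MAF, 0.0 if the variant is not reported


def vaf(v: Variant) -> float:
    """Variant allelic frequency as a fraction of total depth."""
    return v.alt_reads / v.depth if v.depth else 0.0


def passes_filters(v: Variant) -> bool:
    """Return True if the variant meets every selection criterion."""
    return (
        v.consequence in KEPT_CONSEQUENCES
        and v.alt_reads >= MIN_ALT_READS
        and vaf(v) > MIN_VAF
        and v.depth >= MIN_DEPTH
        and v.pop_maf <= MAX_POP_MAF
    )
```

Note that the read-count and frequency criteria interact: at 419X average coverage, 8 mutant reads correspond to a frequency of about 2%, so the 5% threshold is the stricter of the two at typical depth.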

Cell lines

HEK293T cell culture and transfection

HEK 293T cells (ATCC) were maintained in Dulbecco’s modified Eagle’s medium (Mediatech, Inc.) supplemented with 10% fetal bovine serum (Sigma), 1000 IU/ml penicillin and 1000 IU/ml streptomycin.

BV2 microglial cell line

BV2 murine microglial cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) High Glucose medium (Gibco), Glutamax (Gibco), sodium pyruvate, 1% non-essential amino acids (Invitrogen) and 10 % heat-inactivated fetal bovine serum (FBS, EMD Millipore). For MAPK activation experiments, cells were treated with M-CSF1 100 ng/ml for 5 min.

MAC cell lines

Mouse primary CSF-1 dependent macrophages immortalized with the SV-U19-5 retrovirus (41) were a gift of Dr. E. R. Stanley (Albert Einstein College of Medicine, Bronx, NY). They were cultured in RPMI 1640 medium with Glutamax (Gibco), 10 % heat-inactivated fetal bovine serum (FBS, EMD Millipore) and 100 ng/mL recombinant CSF-1 (gift from Dr. E. R. Stanley). Growth medium was renewed every second day. When confluency reached 80%, cells were passaged by cell scraping and plated at 5 × 10^4 cells/cm2 in tissue culture treated plates. For signaling pathway analyses, cell proliferation assays or collection for RNA sequencing, cells were plated one day prior at 5 × 10^4 cells/cm2 in medium containing 10 ng/mL CSF-1 for lines expressing wild-type (WT) and mutant CBL, RIT1 and KRAS proteins and 100 ng/mL CSF-1 for lines expressing WT and mutant PTPN11 proteins. Cells were grown at 37°C and 5% CO2.

Human induced pluripotent stem cell (hiPSC) culture

Human induced Pluripotent Stem Cell (hiPSC) lines were derived from peripheral blood mononuclear cells (PBMCs) of a healthy donor. Written informed consent was obtained according to the Helsinki convention. The study was approved by the Institutional Review Board of St Thomas’ Hospital; Guy’s hospital; the King’s College London University; the Memorial Sloan Kettering Cancer Center and by the Tri-institutional (MSKCC, Weill-Cornell, Rockefeller University) Embryonic Stem Cell Research Oversight (ESCRO) Committee. hiPSCs were derived using Sendai viral vectors (ThermoFisher Scientific; A16517). Newly derived hiPSC clones were maintained in culture for 10 passages (2-3 months) to remove any traces of Sendai viral particles. Over 90% of hiPSCs in the derived lines expressed high levels of the pluripotency markers NANOG and OCT4 by flow cytometry. The C12 hiPSC WT line was engineered to carry a CBL p.C404Y, c.1211G>A heterozygous variant at the endogenous CBL locus. hiPSCs of passage 25-35 were cultured on confluent irradiated CF1 mouse embryo fibroblasts (MEFs, Gibco) in hiPSC medium consisting of knock-out DMEM (Invitrogen), 10% knock-out-Serum Replacement (Invitrogen), 2 mM L-glutamine (Gibco), 100 U/mL penicillin-streptomycin (Invitrogen), 1% non-essential amino acids (Invitrogen) and 0.1 mM β-mercaptoethanol (R&D Systems). hiPSC medium was supplemented with 10 ng/mL bFGF (PeproTech) and changed every second day. Two days before culture with hiPSCs, MEFs were plated at 20,000 cells/cm2 in DMEM supplemented with 10 % heat-inactivated fetal bovine serum (FBS, EMD Millipore), 100 U/mL penicillin-streptomycin (Invitrogen), 1% non-essential amino acids (Invitrogen) and 0.1 mM β-mercaptoethanol (R&D Systems) on 150 mm tissue culture plates coated with 0.1% gelatin (Sigma). hiPSCs were passaged weekly with 250 U/mL collagenase type IV (ThermoFisher Scientific) at a 1:4 to 1:6 ratio onto MEF cells in hiPSC medium supplemented with 10 µM Rock inhibitor (Y-27632 dihydrochloride, Sigma).
Cells were maintained at 37°C and 5% CO2, routinely tested for mycoplasma and periodically assessed for genomic integrity by karyotyping. Microglia-like cells were obtained from hiPSCs using an embryoid body (EB)-based protocol as previously described (42). Briefly, hiPSCs were loosened with 250 U/mL collagenase type IV (ThermoFisher Scientific) and lifted by cell scraping. For EB formation, hiPSC colonies were transferred to suspension plates on an orbital shaker in hiPSC medium supplemented with 10 µM Rock inhibitor (Y-27632 dihydrochloride, Sigma). After 6 days, EBs were transferred to 6-well tissue culture treated plates in STEMdiff APEL 2 medium (Stem Cell Technology) with 5% Protein Free Hybridoma Media (Gibco), 100 U/mL penicillin-streptomycin (Invitrogen), 25 ng/mL IL-3 (Peprotech) and 50 ng/mL CSF-1 (Peprotech). Microglia-like cells were harvested every week from the supernatant of EB cultures. Collected microglia-like cells were used immediately for signaling pathway analyses or plated for 6-7 days in RPMI 1640 medium with Glutamax supplement (Gibco), 10 % heat-inactivated fetal bovine serum (FBS, EMD Millipore) and 100 ng/mL human recombinant CSF-1 (Peprotech) in tissue culture plates for cytology, flow cytometry, RNA sequencing and supernatant analyses of cytokine release. Differentiation of microglia-like cells was monitored by May-Grunwald Giemsa staining and flow cytometry analyses of myeloid markers.

Plasmids used in in-vitro studies (HEK, BV2, and MAC lines)

The expression vectors for Flag-tagged CHK2 kinase and RIT1 were from Sino Biological and Origene, respectively. The vector encoding pcDNA3-HA-tagged c-Cbl was a kind gift from Dr. Nicholas Carpino (Stony Brook). RIT1M90I, RIT1F82L, CBLI383M, CBLC404Y, CBLC416S, CBLC384Y, CBLY371H were generated by site-directed mutagenesis using the QuikChange Kit (Agilent). pHAGE_puro was a gift from Christopher Vakoc (Addgene plasmid # 118692; http://n2t.net/addgene:118692; RRID:Addgene_118692) (43). pHAGE-KRAS was a gift from Gordon Mills & Kenneth Scott (Addgene plasmid # 116755; http://n2t.net/addgene:116755; RRID: Addgene_116755) (44). pHAGE-PTPN11 was a gift from Gordon Mills & Kenneth Scott (Addgene plasmid # 116782; http://n2t.net/addgene:116782; RRID: Addgene_116782) (44). pHAGE-PTPN11-T73I was a gift from Gordon Mills & Kenneth Scott (Addgene plasmid # 116647; http://n2t.net/addgene:116647; RRID: Addgene_116647) (44). pDONR223_KRAS_p.A59G was a gift from Jesse Boehm & William Hahn & David Root (Addgene plasmid # 81662; http://n2t.net/addgene:81662; RRID:Addgene_81662) (45). Phage-CBL, Phage-CBLI383M, Phage-CBLC404Y, Phage-CBLC416S, Phage-RIT1, Phage-RIT1M90I, Phage-RIT1F82L and Phage-KRASA59G were generated by Azenta Life Sciences via a PCR cloning approach. The pHAGE-CBLC384Y plasmid was generated at Azenta Life Sciences by targeted mutagenesis of pHAGE-CBL.

Generation of mutant lines

HEK cells were transfected 24 hours after plating with 2.5 µL of Mirus Transit LT1 per µg of DNA. Cells were harvested and lysed 48 hours after transfection using a buffer containing 25 mM Tris, pH 7.5, 1 mM EDTA, 100 mM NaCl, 1% NP-40, 10 µg/ml leupeptin, 10 µg/ml aprotinin, 200 µM PMSF, and 0.2 mM Na3VO4. For EGF stimulation, the media was replaced 24 hours after transfection with DMEM containing 1% FBS and antibiotics. After a further 24 hours in this starvation media, the cells were stimulated with 50 ng/ml EGF for 5 minutes at 37°C.

Lentiviral production and transduction of BV2 and MAC cell lines

For the BV2 cell line, cells were transduced for 24 hours in the absence of Vpx VLPs and selected with 2.5 μg/mL puromycin (Fisher Scientific). For ‘MAC’ lines, Vpx-containing virus-like particles (Vpx VLPs) were produced by transfection of HEK293T cells with 4.8 µg VSV-g plasmid and 31.2 µg pSIV3/Vpx plasmid, a gift from Dr. M. Menager (Imagine Institute, Paris, France), using TransIT-293 Transfection Reagent (Mirus Bio, Fisher Scientific). Forty-eight hours after transfection, the supernatant containing Vpx VLPs was collected and used immediately for lentiviral transduction of macrophages. Viral supernatants were obtained by transfection of HEK293T cells using X-tremeGENE HP DNA Transfection Reagent (Sigma). Packaging vectors used were psPAX2 (gift from Didier Trono, Addgene plasmid # 12260; http://n2t.net/addgene:12260; RRID: Addgene_12260) and pMD2.G (gift from Didier Trono, Addgene plasmid # 12259; http://n2t.net/addgene:12259; RRID:Addgene_12259). Cells were transduced for 24 hours in the presence of Vpx VLPs. Transduced macrophages were selected with 5 μg/mL puromycin (Fisher Scientific).

Generation of the CBL+/C404Y and isogenic WT hiPSC lines

The CBLC404Y (c.1211 G>A) variant was inserted at the endogenous locus in the C12 WT hiPSC using cytidine base editing (CBE) with the CBE enzyme BE3-FNLS (46). Briefly, the sgRNA for CBE was designed to target the non-coding strand and introduce a “C-to-T” conversion at position 6, to create the G-to-A conversion on the coding strand. The sgRNA target sequence was cloned into the pSPgRNA (Addgene plasmid # 47108) (47) to make the gene targeting construct. To introduce the CBL C404Y variant, the WT hiPSC (C12) were dissociated using Accutase (Innovative Cell Technologies) and electroporated (1 × 10^6 cells per reaction) with 4 µg sgRNA-construct plasmid and 4 µg CBE enzyme coding vector BE3-FNLS (Addgene plasmid # 112671) (46) using Lonza 4D-Nucleofector and the Nucleofector solution (Lonza V4XP-3034) following our previously reported protocol (48). The cells were then seeded, and 4 days later, the hiPSC were dissociated into single cells by Accutase and re-plated at a low density (4 per well in 96-well plates) to obtain single-cell clones. Ten days later, individual colonies were picked, expanded and analyzed by PCR and DNA sequencing to identify the clones that carried the desired CBLC404Y heterozygous variant and the isogenic WT control clones. The sgRNA target, PCR and sequencing primers are listed below.

Western Blotting

For HEK cells, lysates were resolved by SDS-PAGE, transferred to PVDF membranes, and probed with the appropriate antibodies. Horseradish peroxidase-conjugated secondary antibodies (GE Healthcare) and Western blotting substrate (Thermo) were used for detection. For anti-Cdc42 immunoprecipitation experiments, cell lysates (1 mg total protein) were incubated overnight with 1 µg of anti-Cdc42 antibody (Santa Cruz) and 25 µL of protein A agarose (Roche) at 4°C. Anti-Flag immunoprecipitations were done with anti-Flag M2 affinity resin (Sigma). The beads were washed three times with lysis buffer, then eluted with SDS-PAGE buffer and resolved by SDS-PAGE. The proteins were transferred to PVDF membrane for Western blot analysis. Antibodies used were: Phospho-p44/42 MAPK (pErk 1/2) (Thr202/Tyr204) (Cell Signaling #4370), total p44/42 MAPK (Erk1/2) (Cell Signaling #9102), HA tag (Millipore # 05-904), Flag (Sigma #A8592), pCHEK2 (T383) (Abcam #ab59408) and Cdc42 (Santa Cruz #sc87). For the immunoprecipitation kinase assay in HEK293T cells, cell lysates (1 mg protein) were incubated overnight with 30 µL of anti-Flag M2 affinity resin on a rotator at 4°C, then washed three times with Tris-buffered saline (TBS). A portion of each sample was eluted with SDS-PAGE sample buffer and analyzed by anti-Flag Western blotting. The remaining sample was used for a radioactive kinase assay. The immunoprecipitated proteins were incubated with 25 µL of reaction buffer (30 mM Tris, pH 7.5, 20 mM MgCl2, 1 mg/mL BSA, 400 µM ATP), 650 µM CHKtide peptide (KKKVRSGLYRSPSMPENLNRPR, SignalChem), and 50 – 100 cpm/pmol of [γ-32P]ATP at 30°C for 15 minutes. The reactions were quenched using 45 µL of 10% trichloroacetic acid. The samples were centrifuged and 30 µL of the reaction was spotted onto Whatman P81 cellulose phosphate paper.
After washing with 0.5% phosphoric acid, incorporation of radioactive phosphate into the peptide was measured by scintillation counting. For MAC lines and hiPSC-derived cells, cell lysates obtained with RIPA buffer + 1:1000 Halt Protease and Phosphatase Inhibitor Cocktail (ThermoFisher Scientific) were sonicated 3 times for 30 s at 4°C (Bioruptor, Diagenode). Protein quantification of supernatant was done with Precision Red Advanced Protein Assay (Cytoskeleton). Proteins were boiled for 5 min at 95°C in NuPAGE LDS sample buffer (Invitrogen) and separated in NuPAGE 4%–12% Bis-Tris Protein Gel (Invitrogen) in NuPAGE MES SDS Running Buffer (Invitrogen). Electrophoretic transfer to a nitrocellulose membrane (ThermoFisher Scientific) was done in NuPAGE Transfer Buffer (Invitrogen). Membranes were blocked for 60 min in TBS-T + 5% nonfat milk (Cell Signaling) and incubated with primary antibodies at 4°C: rabbit anti-p44/42 MAPK (ERK1/2) (Cell Signaling; 1:1000); rabbit anti-P-p44/42 MAPK (Cell Signaling, 1:1000); rabbit anti-c-CBL (Cell Signaling, 1:1000); rabbit anti-RIT1 (Abcam, 1:1000); mouse anti-KRAS (clone 3B10-2F2, Sigma, 1 μg/mL); mouse anti-Actin (clone MAB1501, Sigma, 1:10,000). Primary antibodies were detected using the secondary anti-rabbit IgG HRP-linked (Cell Signaling, 1:1000) or anti-mouse IgG HRP-linked (Cell Signaling, 1:1000) antibodies, with SuperSignal™ West Femto Chemiluminescent Substrate (ThermoFisher Scientific) on a ChemiDoc MP Imaging System (Bio-Rad). pERK/ERK ratios were measured with ImageJ software.

DNA/RNA isolation, ddPCR and RT-qPCR in MAC lines and hiPSC-derived cells

Genomic DNA was extracted using QIAamp DNA Micro Kit (50) (Qiagen), following the manufacturer’s instructions. Total RNA was extracted using RNeasy Mini kit (Qiagen), following the manufacturer’s instructions. cDNA was generated by reverse transcription using Invitrogen SuperScript IV Reverse Transcriptase (Invitrogen) with oligo(dT) primers. The TaqMan gene expression assays used were c-CBL FAM (Hs01011446_m1), CBLb FAM (Hs00180288_m1) and GAPDH VIC (Hs02786624_g1) (ThermoFisher Scientific). RT-qPCR was performed using Applied Biosystems TaqMan Fast Advanced Master Mix (ThermoFisher Scientific) and a QuantStudio 6 Flex Real-Time PCR System (ThermoFisher Scientific). The results were normalized to GAPDH. For droplet digital PCR (ddPCR) analyses, assays specific for the detection of I383M, C384Y, C404Y and C416S in CBL, F82L and M90I in RIT1, A59G in KRAS and the corresponding WT sequences (listed below) were obtained from Bio-Rad. Cycling conditions were tested to ensure optimal annealing/extension temperature as well as optimal separation of positive from empty droplets. Optimization was done with a known positive control. After PicoGreen quantification, 2.6-9 ng gDNA or cDNA were combined with locus-specific primers, FAM- and HEX-labeled probes, HaeIII, and digital PCR Supermix for probes (no dUTP). All reactions were performed on a QX200 ddPCR system (Bio-Rad catalog # 1864001) and each sample was evaluated in technical duplicates. Reactions were partitioned into a median of ∼19,000 droplets per well using the QX200 droplet generator. Emulsified PCRs were run on a 96-well thermal cycler using cycling conditions identified during the optimization step (95°C for 10 min; 40 cycles of 94°C for 30 s and 52-55°C for 1 min; 98°C for 10 min; 4°C hold). Plates were read and analyzed with the QuantaSoft software to assess the number of droplets positive for mutant or wild-type DNA.
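Droplet counts of this kind are commonly converted to a mutant allele fraction with a Poisson correction, because a positive droplet may contain more than one template copy. The sketch below illustrates that standard calculation under stated assumptions: the function names are hypothetical, mutant and wild-type templates are assumed to partition independently, and the exact computation performed by the QuantaSoft software is not described in the text.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean template copies per droplet from the fraction of negative droplets.

    Poisson model: P(droplet is negative) = exp(-lambda),
    so lambda = -ln(negative / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log((total - positive) / total)


def mutant_fraction(mut_positive: int, wt_positive: int, total: int) -> float:
    """Fraction of mutant copies among all (mutant + wild-type) copies."""
    lam_mut = copies_per_droplet(mut_positive, total)
    lam_wt = copies_per_droplet(wt_positive, total)
    if lam_mut + lam_wt == 0:
        return 0.0  # no template detected in either channel
    return lam_mut / (lam_mut + lam_wt)
```

For a heterozygous line, equal numbers of mutant-positive and wild-type-positive droplets give a fraction of 0.5 regardless of the total droplet count.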

Flow cytometry analyses for surface antigens CSF1-R, CD11b, MRC1, α5β3, CD11c, Tim4, HLA-DR, CD45, CD14, NGFR, EGFR, CD36 and SIRPα were performed using PE-conjugated anti-CD115 (CSF1-R) (clone 9-4D2, BD Pharmingen), PE/Cy7-conjugated anti-CD11b (clone ICRF44, Biolegend), Alexa Fluor 488-conjugated anti-CD206 (MRC1) (clone 19.2, ThermoFisher Scientific), PE-conjugated anti-integrin α5β3 (clone 23C6, R&D systems), PE/Cy5-conjugated anti-CD11c (clone B-ly6, BD Pharmingen), APC-conjugated anti-Tim4 (clone 9F4, BioLegend), PE/Cy7-conjugated anti-HLA-DR (clone G46-6, BD Pharmingen), BV650-conjugated anti-CD45 (clone HI30, BD Horizon), APC/Cy7-conjugated anti-CD14 (clone M5E2, Biolegend), PE-conjugated anti-NGFR (clone ME20.4, eBioscience), Alexa Fluor 647-conjugated anti-EGFR (clone EGFR.1, BD Pharmingen), APC/Cy7-conjugated anti-CD36 (clone 5-271, BioLegend) and APC-conjugated anti-CD172a (SIRPα) (clone 15-414, ThermoFisher Scientific) antibodies. Iba1 expression was detected following fixation and permeabilization of macrophages using BD Cytofix/Cytoperm solution (BD Pharmingen). Cells were marked with Zombie Violet Viability (Biolegend). After incubation with FcR Blocking Reagent (Miltenyi Biotec), cells were stained with Alexa Fluor 555-conjugated anti-Iba1 antibody (clone E4O4W, Cell Signaling). Flow cytometry was performed using a BD Biosciences LSR Fortessa flow cytometer with Diva software. Data were analyzed using FlowJo (BD Biosciences LLC).

Cell proliferation analyses

For hiPSC-derived cells, the cell suspension was filtered through a 100 µm nylon mesh (Corning) and marked with Zombie Violet Viability (Biolegend). After incubation with FcR Blocking Reagent (Miltenyi Biotec), surface receptors were labelled with PE/Cy7-conjugated anti-CD11b (clone ICRF44, Biolegend), Alexa Fluor 488-conjugated anti-CD206 (MRC1) (clone 19.2, ThermoFisher Scientific), BV650-conjugated anti-CD45 (clone HI30, BD Horizon), APC/Cy7-conjugated anti-CD14 (clone M5E2, Biolegend) prior to EdU detection. For proliferation studies in the mouse macrophage cell lines, macrophages were incubated with 10 µM EdU (ThermoFisher Scientific) for 2 hours at 37°C, collected by cell scraping and marked with Zombie Violet Viability (Biolegend) prior to EdU detection. EdU detection was performed using the Click-iT Plus EdU Alexa Fluor 647 Flow Cytometry Assay Kit (ThermoFisher Scientific), following the manufacturer’s instructions. hiPSC-derived macrophages were analyzed using a BD Biosciences Aria III cell sorter and macrophages were identified as CD11b+CD45+CD14+MRC1+. The macrophage cell lines were analyzed using a BD Biosciences LSR Fortessa flow cytometer. Data were analyzed using FlowJo 10.6 (BD Biosciences LLC).

Enzyme-linked immunosorbent assay

Supernatants of iPSC-derived microglia-like cells were analyzed for human inflammatory cytokines IL-6, TNFα, IL-1β, IFNγ and for the complement C3 and complement Factor H by Enzyme-linked immunosorbent assay (ELISA) at Eve Technologies (Calgary, AB).

Bulk RNA sequencing (RNAseq)

Three biological replicates were processed for each condition/cell line. For RNA sequencing, phase separation in cells lysed in 1 mL TRIzol Reagent (ThermoFisher Scientific) was induced with 200 µL chloroform and RNA was extracted from the aqueous phase using the miRNeasy Mini Kit (Qiagen) on the QIAcube Connect (Qiagen) according to the manufacturer’s protocol with 350 µL input, or using the MagMAX mirVana Total RNA Isolation Kit (ThermoFisher Scientific) on the KingFisher Flex Magnetic Particle Processor (ThermoFisher Scientific) according to the manufacturer’s protocol with 350 µL input. Samples were eluted in 30 µL RNase-free water. After RiboGreen quantification and quality control by Agilent BioAnalyzer, 231-500 ng of total RNA with RIN values of 9.4-10 underwent polyA selection and TruSeq library preparation according to instructions provided by Illumina (TruSeq Stranded mRNA LT Kit, Illumina), with 8 cycles of PCR. Samples were barcoded and run on a NovaSeq 6000 in a PE100 run, using the NovaSeq 6000 S4 Reagent Kit (200 Cycles) (Illumina). An average of 90 million paired reads was generated per sample. Ribosomal reads represented 0-1.6% of the total reads generated and the percent of mRNA bases averaged 79%.

Bulk RNAseq analysis

FastQ files of 2×100bp paired-end reads were quality checked using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, 2012). Samples with high quality reads (Phred score >= 30) were aligned to the Mus musculus genome (GRCm38.80) for the MAC lines or Homo sapiens (assembly GRCh38.p14) for the hiPSC lines using the STAR aligner. For the MAC lines, we computed the expression count matrix from the mapped reads using HTSeq (www-huber.embl.de/users/anders/HTSeq) and one of several possible gene model databases. The raw count matrix generated by HTSeq was then processed using the R/Bioconductor package DESeq (www-huber.embl.de/users/anders/DESeq), which was used both to normalize the full dataset and to analyze differential expression between sample groups. For the hiPSC line dataset, gene quantification was performed using featureCounts from the Subread package in R. Gene expression levels were normalized and log2 transformed using the Trimmed Mean of M-values (TMM) method and differential expression analysis was performed using the edgeR package in R. For hiPSC-derived cells, gene-set enrichment analyses (GSEA) (Hallmark, KEGG, GO, REACTOME) were performed using the fgsea package in R on a pre-ranked list (formula: sign(logFC) * -log10(PValue)) of all expressed genes in the dataset. For the MAC cell lines dataset, GSEA was performed using GSEA 4.3.2 for KEGG and HALLMARK canonical pathways in MSigDB v 7.5.1. Significant gene sets were selected based on an FDR <= 0.25. For lists of differentially expressed genes, genes were selected with the false discovery rate controlled (Benjamini-Hochberg method) at 5% (FDR <= 0.05). Genes were considered upregulated/downregulated for log2 fold change > 1.5 or < -1.5.
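The pre-ranking score and the differential expression cutoffs described above can be written out as a short sketch. It is given in Python for illustration only (the analysis itself was run in R), and the function names and the floor applied to very small p-values are assumptions rather than part of the published workflow.

```python
import math


def rank_metric(log_fc: float, p_value: float, min_p: float = 1e-300) -> float:
    """Pre-ranking score: sign of the log fold change times -log10(p-value).

    min_p is a hypothetical floor that avoids log10(0) for p-values
    reported as exactly zero.
    """
    p = max(p_value, min_p)
    sign = (log_fc > 0) - (log_fc < 0)  # evaluates to -1, 0 or 1
    return sign * -math.log10(p)


def classify_gene(log2_fc: float, fdr: float,
                  fc_cutoff: float = 1.5, fdr_cutoff: float = 0.05) -> str:
    """Label a gene 'up', 'down' or 'ns' (not selected) using the stated cutoffs."""
    if fdr > fdr_cutoff:
        return "ns"
    if log2_fc > fc_cutoff:
        return "up"
    if log2_fc < -fc_cutoff:
        return "down"
    return "ns"
```

Sorting genes by `rank_metric` in decreasing order places strongly upregulated genes with small p-values at the top of the list and strongly downregulated ones at the bottom, which is the ordering a pre-ranked enrichment analysis expects.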

Statistical analysis

Statistical methods are detailed in the corresponding sections above (Quantification of mutational load and statistics, Bulk RNAseq analysis, Sn-RNAseq analysis) and in the Fig. legends. P values below 0.05 and adj. P values (FDR) below 0.25 were considered significant unless otherwise specified.

Data availability

DNA sequencing data processed for selection of somatic variants are available for all patients and samples in Table S3. Raw DNA sequencing data (FASTQ files) from targeted-deep sequencing are deposited in dbGaP under project accession number phs002213.v1.p1, for samples where patient-informed consent for public deposition of DNA sequencing data was obtained. Sn-RNAseq raw data are deposited in GEO (number pending) and as an interactive analysis web tool accessible at https://weillcornellmed.shinyapps.io/Human_brain/.

Code availability

All code used in this study has been previously published as referenced in the method section above.

Extended Data Figures S1-S6

Quality control for DNA analysis and snRNAseq

(A) Distribution of APOE genotype in a historical cohort of controls and AD patients (49) (Left) and the present series (Right) of Control, AD and AD without and with pathogenic (P-SNV) microglia variants. Numbers on top of the bars show patient number in each group. (B) Sorting strategy to separate PU.1+, NEUN+ and DN nuclei from post-mortem brain samples. (C) Boxplot represents relative frequencies, median, mean, 25-75th quartiles (boxes) and minimum/maximum (whiskers) of nuclei for each cell type in controls (n=63 brain samples) and AD patients (n=99 brain samples). (D) SnRNA-seq analysis of FACS-sorted PU.1+ nuclei from 4 donors. Table indicates donor characteristics, number of nuclei analyzed after quality control (see methods) and cell types as determined by unsupervised clustering of normalized and integrated gene expression of nuclei from 5 PU.1+ samples. (E) UMAP representation of cell types from (C). (F) Cell proportion plot of the 5 PU.1 samples from (C). (G) Expression of microglia markers by sn-RNAseq across samples and clusters. (H) Number (TOP) and proportion (BOTTOM) of cells from each sample, per cluster. (I) Boxplot showing the coverage of targeted DNA deep sequencing per cell type in AD and control samples. Box plots show median (+ mean) and 25th and 75th percentiles; whiskers extend to the largest and smallest values. Dots show outliers.

Analysis of driver variants.

(A) Number of SNVs per Mb, per donor and cell type. Each dot represents the mean of a donor (NeuN n=226, DN n=229, PU.1 n=225, Blood n=66). Values (color, italics) indicate the mean number of variants/Mb per cell type. Statistics: p-values are calculated by Kruskal–Wallis test and Dunn’s test for multiple comparisons. (B) Receiver operating characteristic (ROC) curve showing the accuracy of the multivariate logistic regression model in predicting the association of AD and the presence or not of driver variants in PU.1+ nuclei. Note: non-parametric tests were used as data did not follow a normal distribution (D’Agostino-Pearson normality test). (C) Expression of driver genes in microglia and whole brain tissue, reported in (33) (TOP, sorted microglia n= 39 and whole brain n=16) and (34) (BOTTOM, sorted microglia n= 3 and whole brain n=1). (D) Graph depicts mean number of driver variants in a group of control genes not expressed by the brain or by microglia (see table S3), per Mb, and samples (LEFT) and donor (RIGHT), in NEUN, DN, PU.1 nuclei and matching blood from all controls and AD patients. Each dot represents the mean for each donor. Statistics: p-values are calculated with unpaired two-tailed Mann-Whitney U test comparing AD to controls.

Summary of AD patients characteristics and driver variants.

Table shows, for all AD patients studied, the detection of driver variants by TDS, candidates identified by WES, categories of gene functions (MAPK pathway, DNA repair, DNA/Histone methylation), expression in microglia, and patient information (age/sex/APOE genotype/Braak status/CERAD score/presence of Lewy bodies/presence of amyloid angiopathy). #Brain regions: number of brain regions where the variant was detected. GOF (G, Gain of Function)/LOF (L, Loss of Function) as reported in the literature (see manuscript for references). gnomAD shows the minor allele frequency of each variant in the population. VAF: variant allelic frequency (%) by BRAIN-PACT in brain cell types and matching blood when available. CADD score (Combined Annotation Dependent Depletion) of each variant. Notes: (1) Trisomy 21, Down syndrome. (2) familial history of AD, no variant in AD associated genes. (3) MAPK docking protein. (4) cooperative interaction with ELK1 on chromatin. (5) inhibits JNK activation, murine KO has a neurological phenotype (50). (6) microtubule binding, involved in β-amyloid aggregation. (7) DNA repair gene. (8) Mosaic trisomy 21.

Functional analysis of variants in HEK293 and BV2 cell lines

(A) Quantification of Western blots of lysates from HEK293T cells expressing WT or mutant CBL alleles and stimulated with EGF or control, probed with antibodies against Phospho-p44/42 MAPK (Erk 1/2, Thr202/Tyr204, (pMAPK)), total MAPK (p44/42 MAPK, Erk1/2, (MAPK)), and HA-tag (BOTTOM). N= 4 independent experiments. Statistics: Student t-test. (B) HEK293T cells expressing Flag-RIT1 (WT and mutants) were treated -/+ 20% FBS before harvesting and lysates were probed with antibodies against Phospho-p44/42 MAPK (Erk 1/2, Thr202/Tyr204, (pMAPK)), total MAPK (p44/42 MAPK, Erk1/2, (MAPK)), and Flag. N= 5 independent experiments. Statistics: Student t-test. (C) Flag-tagged RIT1 constructs were expressed in HEK293T cells. Lysates were used in pulldown reactions with immobilized GST-PAK1-CRIB domain and in immunoprecipitation reactions with Cdc42 antibody. Bound RIT1 was measured by anti-Flag Western blotting. Lysates were also analyzed by anti-Flag and anti-MAPK Western blotting. (D) CHEK2 R346H is a loss-of-function mutant. The R346H variant is located within the catalytic loop of the protein kinase domain and shown in red on the 3D structure of the CHEK2 kinase domain (pdb code: 2cn5) (LEFT). Lysates from HEK293T cells expressing Flag-tagged WT or R346H CHEK2 were probed with antibodies that recognize the autophosphorylated and activated form of CHK2 and Flag (MIDDLE). Flag-tagged WT and R346H CHK2 were expressed in HEK293T cells and proteins were isolated by immunoaffinity capture using anti-Flag resin. CHK2 activity was measured with [32P]-labeled ATP and a synthetic CHK2 substrate peptide. Wild-type CHK2 showed robust activity, while the R346H mutant was inactive (RIGHT). (E) Western-blot analysis of CBL expression (TOP), pMAPK and total MAPK (MIDDLE) and respective quantification (BOTTOM) in BV2 cell lines transduced with empty vector, CBLWT, CBLY371H, CBLI383M, CBLC384Y, CBLC404Y and CBLC416S. For MIDDLE panel, cells were treated with M-CSF1 100 ng/ml for 5 min.
Statistics: p-values are calculated with t-test. N=3.

Analysis of mouse and human microglia-like cells.

(A) Western-blot analysis of CBL, RIT1, and KRAS expression in lysates from a growth factor-dependent macrophage cell line expressing CBLWT, CBLI383M, CBLC384Y, CBLC404Y, CBLC416S, CBLR420Q, RIT1WT, RIT1F99C, RIT1M107V, KRASWT and KRASA59G alleles (TOP), and ddPCR analysis of WT and mutant alleles in DNA from the same cell lines (BOTTOM). (B) Western-blot analysis of PTPN11 expression and phospho- and total-ERK in lysates from a growth factor-dependent macrophage cell line expressing PTPN11WT or PTPN11T73I alleles, and ddPCR analysis of WT and variant alleles in DNA from the same lines. (C) Genomic DNA ddPCR of 2 independent hiPSC clones (#1 and #2) of the CBL404C/Y heterozygous mutant carrying the c.1211G/A transition on one allele and 2 independent isogenic control CBL404C/C clones, all obtained by base editing. (D) CBL and CBL-B mRNA expression assessed by Taqman assay in CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. Unpaired t-test. (E) RT-ddPCR of CBL reference allele (CBL c.1211G) and CBL variant (CBL c.1211A) transcripts in CBL404C/C and CBL404C/Y iPSC-derived macrophages. n=4-6 independent experiments. (F) Western-blot analysis of CBL expression in lysates from CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. (G) Representative flow cytometry analysis of the expression of surface receptors and Iba1 in CBL404C/C and CBL404C/Y cells (n=3). (H) Viability of CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells estimated by flow cytometry analysis after DAPI staining. Unpaired t-test. n=6. (I) Western-blot analysis and quantification of phospho- and total-ERK proteins in lysates from CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells untreated or re-stimulated with CSF-1 (5 min, 100 ng/mL). (Two-way ANOVA, n=6-7).

snRNAseq analysis of microglia.

(A) Dot plot represents the significant pathways by GSEA analysis of HALLMARK and KEGG pathways of snRNAseq analysis of microglia, by samples and clusters. Genes from all samples are pre-ranked per cluster using differential expression analysis with SCANPY (37) and the Wilcoxon rank-sum method. Statistical analyses were performed using the fgseaMultilevel function in the fgsea R package (38) for HALLMARK and KEGG pathways. Only HALLMARK and KEGG gene sets with p-value < 0.05 and adjusted p-value < 0.25 are visualized, using the ggpubr and ggplot2 (39) R packages. (B) Dot plot represents the same GSEA analysis of HALLMARK and KEGG pathways enriched in snRNAseq microglia clusters as in A, but samples from all donors are grouped by microglia clusters.

Supplemental Tables S1-S9

Supplementary Table S1: Characteristics of AD patients and control donors

Supplementary Table S2: Targeted-Sequencing gene panel

Supplementary Table S3: Variants identified in Alzheimer’s disease and control brain samples.

Supplementary Table S4: Pathway enrichment analysis for genes target of driver variants in PU.1 samples.

Supplementary Table S5: BRAFV600E in brain PU.1+ cells from Histiocytosis patients

Supplementary Table S6: Predicted deleterious variants by WES

Supplementary Table S7: RNAseq analysis of mouse cell lines: Differentially expressed genes and GSEA analysis.

Supplementary Table S8: RNAseq analysis of hiPSC-derived microglia-like cells: Differentially expressed genes and GSEA analysis.

Supplementary Table S9: Single nuclei RNAseq analysis of control and AD microglia: Differentially expressed genes per cluster and GSEA analysis.