Detection of mutations in brain cell types and blood

(A) Table with patient and sample information. (B) Schematic of the isolation and labelling of nuclei from post-mortem frozen brain samples from controls and Alzheimer’s disease patients with DAPI and antibodies against PU.1 (myeloid/microglia) and NeuN (neurons). Representative flow cytometry dot plot of nuclei separation. Double-negative nuclei are labeled ‘DN’. (C) Percentage of cell types obtained in sorted PU.1+ nuclei, determined by single-nucleus RNAseq in 5 brain samples from 4 individuals. (D) Schematic of the sequencing strategy. Two algorithms (ShearwaterML and Mutect1) were used for variant calling. After annotation, pathogenicity was determined using OncoKB and ClinVar. (E) Venn diagram showing the number of variants called by ShearwaterML and Mutect1 and their overlap. Numbers in red indicate pathogenic variants (P-SNV). Validation of variants was performed by droplet digital PCR (ddPCR) on pre-amplified DNA when available. (F) Venn diagrams showing the distribution per cell type of the 826 single-nucleotide variations (SNVs) identified in NeuN+ (neurons), PU.1+ (microglia), and DN (glia) nuclei and in matching blood. Numbers in red indicate pathogenic variants (P-SNV).

Pathogenic variants are enriched in microglia from AD patients.

(A) Correlation plot of the mean number of variants per cell type and donor (n=89) (Y axis) as a function of age (X axis). Each dot represents the mean value for a donor. Statistics: lines were fitted by linear regression; correlation coefficients (rs) and associated p values were obtained by Spearman’s correlation. (B) Number of SNVs per Mb per cell type and donor in age-matched controls (n=27) and AD patients (n=45). Each dot represents the mean value for a donor. Statistics: p-values were calculated with an unpaired two-tailed Mann-Whitney U test. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (C) Number of SNVs per Mb in PU.1 samples across brain regions in age-matched controls (n=27) and AD patients (n=45). Each dot represents a sample. Statistics: p-values were calculated with the Kruskal–Wallis test with multiple comparisons. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (D) Correlation plot of the mean number of pathogenic variants (P-SNV), as determined by ClinVar and/or OncoKB, per cell type and donor (n=89) (Y axis) as a function of age (X axis). Each dot represents the mean value for a donor. Statistics: lines were fitted by linear regression; correlation coefficients (rs) and associated p values were obtained by Spearman’s correlation. (E) Number of P-SNVs per Mb per cell type and sample in age-matched controls (n=27) and AD patients (n=45). Each dot represents a sample. Statistics: p-values were calculated with an unpaired two-tailed Mann-Whitney U test. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (F) Number of P-SNVs per Mb in PU.1 samples across brain regions in age-matched controls (n=27) and AD patients (n=45). Each dot represents a sample. Statistics: p-values were calculated with the Kruskal–Wallis test and Dunn’s test for multiple comparisons. Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test). (G) Number of P-SNVs per Mb per cell type and donor in age-matched controls (n=27) and AD patients (n=45). Each dot represents the mean value for a donor. Statistics: p-values were calculated with an unpaired two-tailed Mann-Whitney U test. The odds ratio (95% CI, 2.049 to 29.02) and p values for the association between AD and the presence of driver variants were calculated by multivariate logistic regression, with age and sex as covariates.
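
As an illustration of the Spearman’s correlation coefficient (rs) reported in panels A and D, the following minimal Python sketch computes rs as the Pearson correlation of average ranks. It is not the authors’ analysis code; the function names (`average_ranks`, `spearman_rs`) and the donor values are hypothetical and chosen only for the example, and no p value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical donors (made-up numbers): age in years, mean variants per Mb.
ages = [62, 70, 75, 81, 88]
variants_per_mb = [0.4, 0.9, 0.7, 1.5, 1.6]
rs = spearman_rs(ages, variants_per_mb)
```

Because rs depends only on ranks, it is insensitive to the non-normal distributions that motivated the non-parametric tests noted in this legend.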

Somatic microglial clones with multiple and recurrent CBL and MAP-Kinase pathway activating variants.

(A) Pathway enrichment analysis for the genes targeted by D-SNVs, using the panel of 716 genes as background set. Graph shows the most enriched pathways by Reactome Gene Sets, GO Molecular Functions, Canonical Pathways and KEGG Pathway (see complete list in Table S4). (B) Bar plot indicates the genes carrying D-SNVs (y-axis) and the % of AD patients carrying D-SNVs for each gene (x-axis). Genes are color-coded by pathway. (C) Representation of the classical MAPK pathway; the 6 genes mutated in AD patients are labeled in red, TEK is labeled in blue, and larger font size indicates recurrence of variants in a given gene. Violin plot shows enrichment in AD patients as compared to age-matched controls; p-value: unpaired two-tailed Mann-Whitney U test. (D) Summary table showing patients carrying D-SNVs in the classical RTK/MAPK pathway and CML-associated genes (see Table S3), indicating the detection of variants in blood and their association with other variants in microglia. (E) Recurrent variants in the ring-like domain of CBL are indicated in red on the structure diagram of the gene, and representative western blot of lysates from HEK293T cells expressing WT or mutant CBL alleles and stimulated with EGF or control, probed with antibodies against Phospho-p44/42 MAPK (Erk1/2, Thr202/Tyr204), total p44/42 MAPK (Erk1/2), and HA-tag (BOTTOM). Data are representative of 5 independent experiments. (F) RIT1 M90I and F82L are represented on the 3D structure of the protein (pdb code: 4klz; F82 is within a segment whose structure was not resolved), and representative western blot from HEK293T cells expressing Flag-RIT1 (WT and mutants) and treated -/+ 20% FBS before harvesting. Lysates were probed with antibodies against Phospho-p44/42 MAPK (Erk1/2, Thr202/Tyr204), total p44/42 MAPK (Erk1/2), and Flag. Data are representative of 4 independent experiments.
(G) Percentage of D-SNVs detected by targeted deep sequencing (TDS) that were also detected by whole-exome sequencing (WES). (H) Variant allelic frequency (VAF, %) for the BRAFV600E allele in PU.1+ nuclei from brain samples from histiocytosis patients (each dot represents a sample) and for D-SNVs in PU.1+ nuclei from the brains of AD patients (each dot represents a variant). Note: non-parametric tests were used when data did not follow a normal distribution (D’Agostino-Pearson normality test).

MAPK pathway activating variants in mouse macrophages and human iPSC-derived microglia-like cells.

(A) Representative western blot analysis (top panels) and quantification (middle panels) of phospho- and total-ERK in lysates from a murine CSF-1-dependent macrophage cell line expressing CBLWT, CBLI383M, CBLC384Y, CBLC404Y, CBLC416S (n=3-6), RIT1WT, RIT1F82L and RIT1M90I (n=3), and KRASWT and KRASA59G (n=3). Bottom panels depict flow cytometry analysis of EdU incorporation in the same lines. Statistics: unpaired t-test. (B) HALLMARK and KEGG pathways (FDR/adj. p value <0.25, selected from Table S7) enriched in gene set enrichment analysis (GSEA) of RNAseq from mutant CSF-1-dependent macrophage lines CBLI383M, CBLC384Y, CBLC404Y, CBLC416S, CBLR420Q, RIT1F82L, RIT1M90I, KRASA59G, and PTPN11T73I (n=3-6) in comparison with their WT controls. NES: normalized enrichment score. (C) Sanger sequencing of 2 independent hiPSC clones (#93 and #91) of the CBL404C/Y heterozygous mutant carrying the c.1211G/A transition on one allele and 2 independent isogenic control CBL404C/C clones (#71 and #89), all obtained by prime editing. (D) Photomicrographs of CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. (E) Quantification of leading-edge and lateral lamellipodia in CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. n=3-7; statistics: p-values were obtained by nested one-way ANOVA. (F) Flow cytometry analysis of cell size for the same lines (n>3); statistics: p-values were obtained by nested one-way ANOVA. (G) Flow cytometry analysis of EdU incorporation in CBL404C/C and CBL404C/Y microglia-like cells after a 2-hour EdU pulse (n=3, unpaired t-test). (H) Western blot analysis (left) and quantification (right) of phospho- and total-ERK proteins in lysates from CBL404C/C and CBL404C/Y microglia-like cells starved of CSF-1 for 4 h and stimulated with CSF-1 (5 min, 100 ng/mL) (n=4); statistics: p-values were obtained by two-way ANOVA.

CBL404C/Y microglia signature.

(A) HALLMARK and KEGG pathways (FDR/adj. p value <0.25, selected from Table S8) enriched in gene set enrichment analysis (GSEA) of RNAseq from CBL404C/Y iPSC-derived macrophages and isogenic controls. NES: normalized enrichment score. (B) ELISA for pro-inflammatory cytokines (n=3) and complement proteins (n=2) in the supernatant from CBL404C/Y iPSC-derived microglia-like cells and isogenic controls. Statistics: p-values were obtained by the nonparametric Mann-Whitney U test; * p<0.05, ** p<0.01, *** p<0.001, **** p<0.0001. (C) GSEA for enrichment of the human AD-microglia snRNAseq signature (MIC1) (21) in genes differentially expressed between CBL404C/Y microglia-like cells and isogenic controls. (D) Unsupervised Louvain clustering of snRNAseq data from 5 samples of FACS-purified PU.1+ microglia nuclei from 4 donors (see Fig. S1C): control (C14), AD without driver variant (AD34), and AD with driver variants (AD52 and AD53). (E) Dot plot of the GSEA of HALLMARK and KEGG pathways enriched in snRNAseq microglia clusters (samples from all donors). Genes were pre-ranked per cluster using differential expression analysis with SCANPY and the Wilcoxon rank-sum method. Statistical analyses were performed using the fgseaMultilevel function in the fgsea R package for HALLMARK and KEGG pathways. Selected gene sets with p-value < 0.05 and adjusted p-value < 0.25 are visualized using the ggpubr and ggplot2 R packages (gene sets/pathways are selected from Fig. S6B, Table S9). (F) Dot plot of the GSEA (as in (E)) of HALLMARK and KEGG pathways enriched in cluster 2/2B and deconvoluted by donor samples (selected from Fig. S6A).

Quality control for DNA analysis and snRNAseq

(A) Distribution of APOE genotype in a historical cohort of controls and AD patients (49) (Left) and in the present series (Right) of controls, AD patients, and AD patients without and with pathogenic (P-SNV) microglia variants. Numbers on top of the bars show the number of patients in each group. (B) Sorting strategy to separate PU.1+, NeuN+ and DN nuclei from post-mortem brain samples. (C) Boxplot represents relative frequencies, median, mean, 25th-75th percentiles (boxes) and minimum/maximum (whiskers) of nuclei for each cell type in controls (n=63 brain samples) and AD patients (n=99 brain samples). (D) snRNAseq analysis of FACS-sorted PU.1+ nuclei from 4 donors. Table indicates donor characteristics, number of nuclei analyzed after quality control (see Methods) and cell types as determined by unsupervised clustering of normalized and integrated gene expression of nuclei from 5 PU.1+ samples. (E) UMAP representation of cell types from (C). (E) Cell proportion plot of the 5 PU.1 samples from (C). (F) Boxplot showing the coverage of targeted DNA deep sequencing per cell type in AD and control samples. Box plots show median (+ mean) and 25th and 75th percentiles; whiskers extend to the largest and smallest values. Dots show outliers. (G) Expression of microglia markers by snRNAseq across samples and clusters. (H) Number (TOP) and proportion (BOTTOM) of cells from each sample, per cluster. (I) Boxplot showing the coverage of targeted DNA deep sequencing per cell type in AD and control samples. Box plots show median (+ mean) and 25th and 75th percentiles; whiskers extend to the largest and smallest values. Dots show outliers.

Analysis of driver variants.

(A) Number of SNVs per Mb, per donor and cell type. Each dot represents the mean for a donor (NeuN n=226, DN n=229, PU.1 n=225, Blood n=66). Values (color, italics) indicate the mean number of variants/Mb per cell type. Statistics: p-values were calculated by the Kruskal–Wallis test and Dunn’s test for multiple comparisons. (B) Receiver operating characteristic (ROC) curve showing the accuracy of the multivariate logistic regression model in predicting the association between AD and the presence or absence of driver variants in PU.1+ nuclei. Note: non-parametric tests were used as data did not follow a normal distribution (D’Agostino-Pearson normality test). (C) Expression of driver genes in microglia and whole brain tissue, as reported in (33) (TOP, sorted microglia n=39 and whole brain n=16) and (34) (BOTTOM, sorted microglia n=3 and whole brain n=1). (D) Graph depicts the mean number of driver variants per Mb in a group of control genes not expressed by the brain or by microglia (see Table S3), per sample (LEFT) and per donor (RIGHT), in NeuN, DN, and PU.1 nuclei and matching blood from all controls and AD patients. Each dot represents the mean for each donor. Statistics: p-values were calculated with an unpaired two-tailed Mann-Whitney U test comparing AD to controls.

Summary of AD patient characteristics and driver variants.

For all AD patients studied, the table shows the detection of driver variants by TDS, candidates identified by WES, categories of gene function (MAPK pathway, DNA repair, DNA/histone methylation), expression in microglia, and patient information (age/sex/APOE genotype/Braak stage/CERAD score/presence of Lewy bodies/presence of amyloid angiopathy). #Brain regions: number of brain regions where the variant was detected. GOF (G, gain of function)/LOF (L, loss of function) as reported in the literature (see manuscript for references). gnomAD shows the minor allele frequency of each variant in the population. VAF: variant allelic frequency (%) by BRAIN-PACT in brain cell types and matching blood when available. CADD score (Combined Annotation Dependent Depletion) of each variant. Notes: (1) Trisomy 21, Down syndrome. (2) Family history of AD, no variant in AD-associated genes. (3) MAPK docking protein. (4) Cooperative interaction with ELK1 on chromatin. (5) Inhibits JNK activation; murine KO has a neurological phenotype (50). (6) Microtubule binding, involved in β-amyloid aggregation. (7) DNA repair gene. (8) Mosaic trisomy 21.

Functional analysis of variants in HEK293 and BV2 cell lines

(A) Quantification of western blots of lysates from HEK293T cells expressing WT or mutant CBL alleles and stimulated with EGF or control, probed with antibodies against Phospho-p44/42 MAPK (Erk1/2, Thr202/Tyr204) (pMAPK), total MAPK (p44/42 MAPK, Erk1/2) (MAPK), and HA-tag (BOTTOM). N=4 independent experiments. Statistics: Student’s t-test. (B) HEK293T cells expressing Flag-RIT1 (WT and mutants) were treated -/+ 20% FBS before harvesting, and lysates were probed with antibodies against Phospho-p44/42 MAPK (Erk1/2, Thr202/Tyr204) (pMAPK), total MAPK (p44/42 MAPK, Erk1/2) (MAPK), and Flag. N=5 independent experiments. Statistics: Student’s t-test. (C) Flag-tagged RIT1 constructs were expressed in HEK293T cells. Lysates were used in pulldown reactions with immobilized GST-PAK1-CRIB domain and in immunoprecipitation reactions with Cdc42 antibody. Bound RIT1 was measured by anti-Flag western blotting. Lysates were also analyzed by anti-Flag and anti-MAPK western blotting. (D) CHEK2 R346H is a loss-of-function mutant. The R346H variant is located within the catalytic loop of the protein kinase domain and is shown in red on the 3D structure of the CHEK2 kinase domain (pdb code: 2cn5) (LEFT). Lysates from HEK293T cells expressing Flag-tagged WT or R346H CHEK2 were probed with antibodies that recognize the autophosphorylated, activated form of CHK2 and Flag (MIDDLE). Flag-tagged WT and R346H CHK2 were expressed in HEK293T cells, and proteins were isolated by immunoaffinity capture using anti-Flag resin. CHK2 activity was measured with [32P]-labeled ATP and a synthetic CHK2 substrate peptide. Wild-type CHK2 showed robust activity, while the R346H mutant was inactive (RIGHT). (E) Western blot analysis of CBL expression (TOP), pMAPK and total MAPK (MIDDLE) and respective quantification (BOTTOM) in BV2 cell lines transduced with empty vector, CBLWT, CBLY371H, CBLI383M, CBLC384Y, CBLC404Y and CBLC416S. For the MIDDLE panel, cells were treated with M-CSF1 100 ng/ml for 5 min. Statistics: p-values were calculated with a t-test. N=3.

Analysis of mouse and human microglia-like cells.

(A) Western blot analysis of CBL, RIT1, and KRAS expression in lysates from a growth factor-dependent macrophage cell line expressing CBLWT, CBLI383M, CBLC384Y, CBLC404Y, CBLC416S, CBLR420Q, RIT1WT, RIT1F99C, RIT1M107V, KRASWT and KRASA59G alleles (TOP), and ddPCR analysis of WT and mutant alleles in DNA from the same cell lines (BOTTOM). (B) Western blot analysis of PTPN11 expression and phospho- and total-ERK in lysates from a growth factor-dependent macrophage cell line expressing PTPN11WT or PTPN11T73I alleles, and ddPCR analysis of WT and variant alleles in DNA from the same lines. (C) Genomic DNA ddPCR of 2 independent hiPSC clones (#1 and #2) of the CBL404C/Y heterozygous mutant carrying the c.1211G/A transition on one allele and 2 independent isogenic control CBL404C/C clones, all obtained by prime editing. (D) CBL and CBL-B mRNA expression assessed by TaqMan assay in CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. Unpaired t-test. (E) RT-ddPCR of CBL reference allele (CBL c.1211G) and CBL variant allele (CBL c.1211A) transcripts in CBL404C/C and CBL404C/Y iPSC-derived macrophages. n=4-6 independent experiments. (F) Western blot analysis of CBL expression in lysates from CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells. (G) Representative flow cytometry analysis of the expression of surface receptors and Iba1 in CBL404C/C and CBL404C/Y cells (n=3). (H) Viability of CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells estimated by flow cytometry analysis after DAPI staining. Unpaired t-test, n=6. (I) Western blot analysis and quantification of phospho- and total-ERK proteins in lysates from CBL404C/C and CBL404C/Y iPSC-derived microglia-like cells untreated or re-stimulated with CSF-1 (5 min, 100 ng/mL) (two-way ANOVA, n=6-7).

snRNAseq analysis of microglia.

(A) Dot plot of the significant pathways from GSEA of HALLMARK and KEGG pathways in the snRNAseq analysis of microglia, by samples and clusters. Genes from all samples were pre-ranked per cluster using differential expression analysis with SCANPY (37) and the Wilcoxon rank-sum method. Statistical analyses were performed using the fgseaMultilevel function in the fgsea R package (38) for HALLMARK and KEGG pathways. Only HALLMARK and KEGG gene sets with p-value < 0.05 and adjusted p-value < 0.25 are visualized, using the ggpubr and ggplot2 (39) R packages. (B) Dot plot of the same GSEA of HALLMARK and KEGG pathways enriched in snRNAseq microglia clusters as in (A), but with samples from all donors grouped by microglia cluster.
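
The gene-set selection rule stated in this legend (p-value < 0.05 and adjusted p-value < 0.25) can be sketched in Python as below. This is an illustrative sketch only, not the authors’ R code: it assumes the adjusted p-values are Benjamini-Hochberg corrected, and the function names (`benjamini_hochberg`, `select_gene_sets`) and the gene-set names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_gene_sets(results, p_cutoff=0.05, padj_cutoff=0.25):
    """Keep gene sets passing both the nominal and the adjusted cutoff."""
    names = list(results)
    padj = benjamini_hochberg([results[name] for name in names])
    return [
        name
        for name, q in zip(names, padj)
        if results[name] < p_cutoff and q < padj_cutoff
    ]


# Hypothetical gene sets with made-up nominal p-values.
example = {
    "GENE_SET_A": 0.001,
    "GENE_SET_B": 0.03,
    "GENE_SET_C": 0.04,
    "GENE_SET_D": 0.6,
}
selected = select_gene_sets(example)
```

Applying both cutoffs means a gene set must be nominally significant and also survive the more permissive false-discovery threshold before it is plotted.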