Leveraging the Mendelian disorders of the epigenetic machinery to systematically map functional epigenetic variation

  1. Teresa Romeo Luperchio
  2. Leandros Boukas
  3. Li Zhang
  4. Genay Pilarowski
  5. Jenny Jiang
  6. Allison Kalinousky
  7. Kasper D Hansen (corresponding author)
  8. Hans T Bjornsson (corresponding author)

  1. Department of Genetic Medicine, Johns Hopkins University School of Medicine, United States
  2. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, United States
  3. Faculty of Medicine, School of Health Sciences, University of Iceland, Iceland
  4. Landspitali University Hospital, Iceland
5 figures, 2 tables and 1 additional file

Figures

Figure 1 with 1 supplement
The conceptual framework of the present study.

(A) The causal chain of Mendelian Disorder of the Epigenetic Machinery (MDEM) pathogenesis: the genetic disruption of an epigenetic regulator leads to epigenetic and transcriptomic alterations, which ultimately determine the phenotype. (B) We hypothesize that the shared phenotypic features between MDEMs occur because of shared epigenetic and transcriptomic alterations downstream of the genetic disruption of distinct genes. The Venn diagram depicts two MDEMs for convenience, but our approach can be applied to an arbitrary number of MDEMs with shared phenotypes. (C) Our approach is designed to derive a list of abnormalities with high probability of causal relevance, by jointly comparing multiple MDEMs. Shown for two MDEMs for convenience. (D) Experimental design and workflow for sample generation in our present study. Created with BioRender.com. (E) The sample size of our study (number of mice). The ATAC- and RNA-seq samples were generated in parallel (see Materials and methods for details).

Figure 1—figure supplement 1
Simulation study comparing the ability of the standard approach to detect significant hits shared between experiments to that of our new approach.

See Materials and methods for details. Panel (A) corresponds to two experiments, and panel (B) corresponds to three experiments. The distributions were derived after 1000 simulations.

Figure 2 with 2 supplements
Evaluating the overlap between the differentially accessible promoter peaks in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes.

(A) The distribution of p-values from the KS2 vs. wild-type differential accessibility analysis for promoter peaks, stratified according to whether the same promoter peaks are significantly differentially accessible in the KS1 vs. wild-type analysis (FDR < 0.1; red curve), or not (FDR ≥ 0.1; blue curve). (B) Scatterplot of log2(fold changes) from the KS1 vs. wild-type (x-axis) promoter peak differential accessibility analysis against the corresponding log2(fold changes) from the KS2 vs. wild-type analysis (y-axis). Each point corresponds to a peak. Shown are only peaks that are differentially accessible in KS1 (FDR < 0.1). (C) The distribution of p-values from the RT1 vs. wild-type differential accessibility analysis for promoter peaks, stratified according to whether the same promoter peaks are shared differentially accessible between KS1 and KS2 (FDR < 0.1, see Materials and methods; red curve), or not (blue curve). (D) Scatterplot of log2(fold changes) from the RT1 vs. wild-type (x-axis) differential accessibility analysis, against the mean log2(fold change) from the KS1 vs. wild-type and KS2 vs. wild-type analyses. Each point corresponds to a peak. Shown are only shared differentially accessible promoter peaks between KS1 and KS2 (FDR < 0.1). (E) Principal component analysis plot using only the 420 promoter peaks identified as shared differentially accessible between the three Mendelian Disorders of the Epigenetic Machinery (MDEMs). Each point corresponds to a mouse. (F) The proportion of differentially accessible promoter peaks that show increased accessibility in the mutant vs. the wild-type mice.
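
The stratification used in panels (A) and (C) can be illustrated with a small sketch. It is not the authors' pipeline: it assumes Benjamini-Hochberg adjustment as the FDR procedure, and the function names (`bh_adjust`, `stratify_pvalues`) and the toy p-values are invented for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stratify_pvalues(p_first, p_second, fdr=0.1):
    """Split p_second by whether the same peak passes the FDR cutoff in p_first."""
    q_first = bh_adjust(p_first)
    hit = [p for p, q in zip(p_second, q_first) if q < fdr]
    rest = [p for p, q in zip(p_second, q_first) if q >= fdr]
    return hit, rest


# Toy p-values for five peaks (hypothetical, for illustration only).
ks1 = [0.001, 0.004, 0.30, 0.60, 0.90]
ks2 = [0.002, 0.040, 0.50, 0.20, 0.70]
hit, rest = stratify_pvalues(ks1, ks2)
```

Plotting the densities of `hit` and `rest` would give the analogue of the red and blue curves; an excess of small p-values in `hit` is what suggests shared signal between the two disorders.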

Figure 2—source data 1

Coordinates of shared differentially accessible promoter peaks in Kabuki type 1 (KS1) and Kabuki type 2 (KS2) syndromes, along with the corresponding logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig2-data1-v2.csv
Figure 2—source data 2

Coordinates of shared differentially accessible promoter peaks in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes, along with the corresponding logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig2-data2-v2.xlsx
Figure 2—source data 3

Coordinates of shared differentially accessible distal regulatory element peaks in Kabuki type 1 (KS1) and Kabuki type 2 (KS2) syndromes, along with the corresponding logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig2-data3-v2.xlsx
Figure 2—source data 4

Coordinates of shared differentially accessible distal regulatory element peaks in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes, along with the corresponding logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig2-data4-v2.xlsx
Figure 2—source data 5

Estimated surrogate variables for the differential accessibility and differential expression analyses.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig2-data5-v2.xlsx
Figure 2—figure supplement 1
Principal component analysis plots using only the 420 promoter peaks identified as shared differentially accessible between the three Mendelian Disorders of the Epigenetic Machinery (MDEMs).

Each point corresponds to a mouse. The black points correspond to wild-type mice from the Kabuki cohorts, and the gray points correspond to wild-type mice from the Rubinstein-Taybi cohort.

Figure 2—figure supplement 2
Evaluating the overlap between the differentially accessible distal regulatory elements in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes.

(A) The distribution of p-values from the KS2 vs. wild-type differential accessibility analysis for peaks at distal regulatory elements (defined as peaks not within ±2 kb from the TSS), stratified according to whether the same elements are significantly differentially accessible in KS1 (FDR < 0.1; red curve), or not (FDR ≥ 0.1; blue curve). (B) Scatterplot of log2(fold changes) from the KS1 vs. wild-type (x-axis) distal regulatory element differential accessibility analysis against the corresponding log2(fold changes) from the KS2 vs. wild-type analysis (y-axis). Each point corresponds to a peak. Shown are only peaks that are differentially accessible in KS1 (FDR < 0.1). (C) The distribution of p-values from the RT1 vs. wild-type differential accessibility analysis for distal regulatory element peaks, stratified according to whether the same peaks are shared differentially accessible between KS1 and KS2 (FDR < 0.1, see Materials and methods; red curve), or not (blue curve). (D) Scatterplot of log2(fold changes) from the RT1 vs. wild-type (x-axis) differential accessibility analysis, against the mean log2(fold change) from the KS1 vs. wild-type and KS2 vs. wild-type analyses. Each point corresponds to a peak. Shown are only peaks that are shared differentially accessible between KS1 and KS2 (FDR < 0.1). (E) The pairwise overlap between the differentially accessible peaks (promoters or distal regulatory elements) in the three Mendelian Disorders of the Epigenetic Machinery (MDEMs). (F) The proportion of differentially accessible distal regulatory elements that show increased accessibility in the mutant vs. the wild-type mice.

Figure 3 with 1 supplement
The relationship between differential accessibility of promoter peaks and differential expression of downstream genes in the three Mendelian Disorders of the Epigenetic Machinery (MDEMs).

(A) The proportion of promoters with differentially expressed downstream genes in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes, estimated for the top ranked differentially accessible promoter peaks. The estimation was repeated for different thresholds for determining the top ranked list. For each MDEM, each point corresponds to a different threshold. Thresholds were varied from 1000 to 5000, in steps of 250. (B) Scatterplot of the accessibility log2(fold changes) of differentially accessible promoter peaks, against the expression log2(fold changes) of differentially expressed downstream genes, for each of the three MDEMs. Shown are only pairs where the promoter peak was within the top 1000 differentially accessible promoter peaks (ranked based on p-value), and the downstream gene was differentially expressed (10% FDR; Materials and methods). Each point corresponds to a gene-promoter pair. In cases where more than one peak in the same promoter was within the top 1000 differentially accessible peaks, the median log2(fold change) across all such peaks was calculated. (C, D) An example locus (Pard3b) with concordant changes in promoter peak accessibility and downstream gene expression in all three MDEMs. (E) The proportion of promoters with differentially expressed downstream genes in KS1, KS2, and RT1, estimated separately for the top uniquely differentially accessible promoters in each MDEM (see Materials and methods), vs. the same proportion estimated for the genes downstream of the 420 shared differentially accessible promoter peaks.
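
The threshold sweep described for panel (A) can be sketched as below. This is a simplified illustration under stated assumptions: each peak carries a hypothetical yes/no flag for whether its downstream gene is differentially expressed, and the proportion is a plain fraction, whereas the study estimates this proportion from p-value distributions (see Materials and methods). The function name and toy data are invented.

```python
def proportion_by_threshold(peaks, start=1000, stop=5000, step=250):
    """Fraction of top-ranked peaks whose downstream gene is flagged as DE.

    peaks: list of (accessibility_pvalue, downstream_gene_is_de) pairs.
    Returns a dict mapping each list size k to the fraction among the top k.
    """
    ranked = sorted(peaks, key=lambda peak: peak[0])
    results = {}
    for k in range(start, stop + 1, step):
        top = ranked[:k]
        results[k] = sum(1 for _, is_de in top if is_de) / len(top)
    return results


# Toy data (hypothetical): 6000 peaks, the 500 best-ranked ones flagged as DE.
peaks = [((i + 1) / 6000, i < 500) for i in range(6000)]
results = proportion_by_threshold(peaks)
```

With these toy inputs the fraction falls as the list grows, which is the qualitative pattern one would expect if differential accessibility and differential expression are associated.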

Figure 3—source data 1

Differentially expressed genes downstream of differentially accessible promoter peaks, along with the corresponding p-values and logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig3-data1-v2.xlsx
Figure 3—figure supplement 1
The distributions of p-values (from the differential expression analyses) for genes downstream of promoters with differentially accessible peaks shared across the disorders, or unique to the particular disorder.

The horizontal lines correspond to the estimated proportion of non-differential genes (see Materials and methods). For Kabuki type 1 (KS1) syndrome, both the proportion estimated with the bootstrap (orange line) and the smoother (pink dashed line) procedure are depicted.
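
For orientation, the proportion of non-differential genes is commonly estimated from the flat right-hand part of a p-value histogram. The sketch below shows Storey's fixed-lambda estimator, which is an assumption on our part: the caption names bootstrap and smoother procedures, which choose or extrapolate over lambda rather than fixing it. The function name and toy p-values are invented for illustration.

```python
def pi0_fixed_lambda(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of null (non-differential) tests.

    Null p-values are uniform, so the count above lam, rescaled by the
    width of (lam, 1], estimates the number of nulls.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Toy data (hypothetical): 20 strong signals plus 80 evenly spread null p-values.
pvalues = [0.001] * 20 + [(i + 0.5) / 80 for i in range(80)]
pi0 = pi0_fixed_lambda(pvalues)
```

One minus the estimate gives the proportion of differential genes, which corresponds to the height at which a horizontal line would be drawn on a density-scaled histogram.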

Figure 4 with 1 supplement
Evaluating the overlap between the differentially expressed genes in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes.

(A) The distribution of p-values from the KS2 vs. wild-type differential expression analysis, stratified according to whether the same genes are significantly differentially expressed in KS1 (FDR < 0.1; red curve), or not (FDR ≥ 0.1; blue curve). (B) Scatterplot of log2(fold changes) from the KS1 vs. wild-type differential expression analysis (x-axis), against the corresponding log2(fold changes) from the KS2 vs. wild-type analysis (y-axis). Each point corresponds to a gene. Shown are only genes that are differentially expressed in KS1 (FDR < 0.1). (C) The distribution of p-values from the RT1 vs. wild-type differential expression analysis, stratified according to whether the same genes are shared differentially expressed between KS1 and KS2 (FDR < 0.1, see Materials and methods; red curve), or not (blue curve). (D) Scatterplot of log2(fold changes) from the RT1 vs. WT (x-axis) differential expression analysis, against the mean log2(fold change) from the KS1 vs. wild-type and KS2 vs. wild-type analyses. Each point corresponds to a gene. Shown are only genes that are shared differentially expressed between KS1 and KS2 (FDR < 0.1). (E) Principal component analysis plots using only the 264 genes identified as shared differentially expressed between the three Mendelian Disorders of the Epigenetic Machinery (MDEMs). Each point corresponds to a mouse. (F) The proportion of differentially expressed genes that show increased expression in the mutant vs. the wild-type mice. (G) The proportion of genes with differentially accessible promoter peaks in KS1, KS2, and RT1, estimated for the top ranked differentially expressed genes. The estimation was repeated for different thresholds for inclusion into the top ranked list. For each MDEM, each point corresponds to a different threshold. Thresholds were varied from 1000 to 5000, in steps of 250.

Figure 4—source data 1

Shared differentially expressed genes in Kabuki type 1 (KS1) and Kabuki type 2 (KS2) syndromes, along with the corresponding logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig4-data1-v2.xlsx
Figure 4—source data 2

Shared differentially expressed genes in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1), along with the corresponding logFC changes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig4-data2-v2.xlsx
Figure 4—source data 3

Transcription factor motifs enriched in peaks found in promoters of differentially expressed genes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig4-data3-v2.xlsx
Figure 4—figure supplement 1
Principal component analysis plots using only the 264 genes identified as shared differentially expressed between the three Mendelian Disorders of the Epigenetic Machinery (MDEMs).

Each point corresponds to a mouse. The black points correspond to wild-type mice from the Kabuki cohorts, and the gray points correspond to wild-type mice from the Rubinstein-Taybi cohort.

Figure 5
Evaluating genes known to encode transcription factors (TFs), or to individually contribute to IgA deficiency, for collective expression dysregulation.

(A) The Wilcoxon rank-sum test statistic (red vertical line) computed after assembling a list of genes encoding TFs expressed in B cells (Materials and methods), and comparing the distribution of their differential expression p-values to the p-value distribution of the rest of the genes included in the differential expression analysis. The blue distribution corresponds to the same statistic computed after randomly sampling gene sets of the same size as TFs, and comparing their p-value distribution to the p-values for the rest of the genes. The resampling was performed 10,000 times. (B) Same as (A), but for genes known to individually contribute to IgA deficiency (Materials and methods). (C) The percentage of TF genes that belong to the top 25% differentially expressed TFs in Kabuki type 1 syndrome (KS1) (orange dots), and Kabuki type 2 syndrome (KS2) (green dots), stratified according to their p-value quartile in Rubinstein-Taybi type 1 syndrome (RT1). (D) Same as (C), but for IgA deficiency genes compared in KS1 and RT1. (E) Serum IgA levels in KS1, KS2, and wild-type mice.
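
The resampling comparison in panels (A) and (B) can be sketched as follows. This is a minimal illustration, not the authors' code: the rank-sum statistic is computed directly from midranks, the one-sided direction (small rank sums count as extreme) is an assumption, and the function names and toy p-values are invented.

```python
import random


def midranks(values):
    """Ranks starting at 1, with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


def rank_sum_resampling(pvalues, set_indices, n_resamples=10000, seed=0):
    """Observed rank sum of a gene set versus same-size random gene sets."""
    ranks = midranks(pvalues)
    observed = sum(ranks[i] for i in set_indices)
    rng = random.Random(seed)
    all_indices = range(len(pvalues))
    null = [
        sum(ranks[i] for i in rng.sample(all_indices, len(set_indices)))
        for _ in range(n_resamples)
    ]
    # Assumed direction: a small rank sum means the set is enriched for small p-values.
    extreme = sum(1 for w in null if w <= observed)
    return observed, (extreme + 1) / (n_resamples + 1)


# Toy data (hypothetical): 200 genes, the gene set holds the 20 smallest p-values.
pvalues = [(i + 1) / 200 for i in range(200)]
observed, empirical_p = rank_sum_resampling(pvalues, list(range(20)), n_resamples=1000)
```

Here `observed` plays the role of the red vertical line and the resampled rank sums play the role of the blue distribution.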

Figure 5—source data 1

Top 20 Reactome enriched pathways, using shared differentially expressed genes in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig5-data1-v2.xlsx
Figure 5—source data 2

Top 20 Reactome enriched pathways, using genes with shared differentially accessible promoters in Kabuki type 1 (KS1), Kabuki type 2 (KS2), and Rubinstein-Taybi type 1 (RT1) syndromes.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig5-data2-v2.xlsx
Figure 5—source data 3

Serum IgA levels measured in the two types of Kabuki syndrome and wild-type littermates.

https://cdn.elifesciences.org/articles/65884/elife-65884-fig5-data3-v2.xlsx

Tables

Table 1
Shared differentially expressed genes with shared differentially accessible promoters in Kabuki type 1, Kabuki type 2, and Rubinstein-Taybi syndromes.
Gene name | Gene expression | Promoter accessibility | Gene function
Pard3b | Up | Up | Cell division and cell polarization processes
Pbx1 | Up | Up | Transcription factor
Epm2a | Up | Up | Serine/threonine/tyrosine phosphatase
Zfp365 | Up | Up | Transcription factor
Ccdc88a | Up | Up | Actin binding protein
Tanc2 | Up | Up | Synaptic scaffolding protein
Dip2c | Up | Up | Protein interacting with transcription factors
Kif13a | Up | Up | Microtubule-based motor protein
Spry2 | Up | Up | Inhibitory activity on receptor tyrosine kinase signaling proteins
Ndrg1 | Up | Up | N-myc downregulated gene family member
Ebi3 | Down | Down | Interleukin subunit
Ppdpf | Down | Down | Regulator of exocrine pancreas development
Golim4 | Up | Up | Golgi protein
Reln | Up | Up | Secreted extracellular matrix protein
Amz1 | Down | Down | Zinc metalloproteinase
Slc29a4 | Up | Up | Monoamine transporter
Bicd1 | Up | Up | Role in intracellular cargo transport
Slc25a4 | Up | Up | Member of the mitochondrial carrier subfamily
Nr3c2 | Up | Up | Mineralocorticoid receptor
Zfp827 | Up | Up | Transcription factor
Slc36a4 | Up | Up | Amino acid transporter
Arhgef12 | Up | Up | Guanine exchange factor
Tbc1d2b | Up | Up | GTP-ase activating protein
Cask | Up | Up | Calcium-calmodulin-dependent serine protein kinase
Dmd | Up | Up | Connects cytoskeleton and the extracellular matrix
Maged1 | Up | Up | p75 neurotrophin receptor mediated program
Chic1 | Up | Up | Cysteine-rich hydrophobic (CHIC) domain containing protein
Gprasp1 | Up | Up | G protein-coupled receptor interacting protein
Col4a5 | Up | Up | Major collagen of basement membrane
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Mus musculus, both sexes) | Kmt2d+/βGeo mice (fully backcrossed to C57BL/6J) | Originally from Bay Genomics and described in PMID:25273096. | Kmt2d+/βGeo, Mll2Gt(RRt024)Byg, RRID:MGI:5829565 | A previously characterized mouse model of Kabuki syndrome (type 1).
Strain, strain background (Mus musculus, females only) | Kdm6a± mice (fully backcrossed to C57BL/6J, observed male lethality) | Ordered from EMMA (European Mouse Mutant Archive) | Kdm6a+/, Kdm6atm1d(EUCOMM)Wtsi, MGI:4434460 | A previously characterized mouse model of Kabuki syndrome (type 2). Transition from Kdm6atm1a(EUCOMM)Wtsi to Kdm6atm1d(EUCOMM)Wtsi performed in Bjornsson laboratory.
Strain, strain background (Mus musculus, both sexes) | Crebbp± mice (fully backcrossed to C57BL/6J) | Ordered from Jackson laboratory and described in PMID:10673499 | Crebbp+/-, Crebbptm1Dli, RRID:MGI:2175793 | A previously characterized mouse model of Rubinstein-Taybi syndrome (type 1).
Sequence-based reagent | βGeo F | This paper | PCR primers | CAAATGGCGATTACCGTTGA
Sequence-based reagent | βGeo R | This paper | PCR primers | TGCCCAGTCATAGCCGAATA
Sequence-based reagent | Tcrd (control) F | This paper | PCR primers | CAAATGTTGCTTGTCTGGTG
Sequence-based reagent | Tcrd (control) R | This paper | PCR primers | GTCAGTCGAGTGCACAGTTT
Sequence-based reagent | Kdm6aTm1c F | This paper | PCR primers | AAGGCGCATAACGATACCAC
Sequence-based reagent | Kdm6aTm1c, Floxed LR | This paper | PCR primers | ACTGATGGCGAGCTCAGACC
Sequence-based reagent | Tcrd (control) F | This paper | PCR primers | CAAATGTTGCTTGTCTGGTG
Sequence-based reagent | Tcrd (control) R | This paper | PCR primers | GTCAGTCGAGTGCACAGTTT
Sequence-based reagent | CrebbpR-T F | This paper | PCR primers | TAAGCAGCAGCATCCTTTGG
Sequence-based reagent | CrebbpR-T_WT | This paper | PCR primers | CCTGACAATGTGTCATGTGAT
Sequence-based reagent | CrebbpR_T_MUT R | This paper | PCR primers | ATGCTCCAGACTGCCTTGGGA
Commercial assay or kit | IgA ELISA kit | Thermo | Catalog # EMIGA |
Commercial assay or kit | CD19 positive selection | Miltenyi | 130-052-201 |
Commercial assay or kit | Tagmentation | Illumina | Nextera, FC-121–1030 |
Commercial assay or kit | Digitonin | Promega | G9441 |
Commercial assay or kit | DNA clean and concentration kit | Zymo | D4013 |
Commercial assay or kit | Select A size purification | Zymo | D4080 |
Commercial assay or kit | DNA high sensitivity | Agilent | 5067–4626 |
Commercial assay or kit | Qubit dsDNA HS | Thermo | Q32851 |
Commercial assay or kit | Direct-zol RNA microprep | Zymo | R2060 |
Commercial assay or kit | Quant-iT RiboGreen | Thermo | R11490 |
Commercial assay or kit | RNA HS Assay kit | Thermo | Q32852 |
Commercial assay or kit | RNA 6000 Pico | Agilent | 5067–1513 |
Commercial assay or kit | NEBNext Poly(A) mRNA isolation module | New England Biolabs | E7490 |
Commercial assay or kit | NEBNext Ultra II Directional RNA Library Prep kit | New England Biolabs | E7760/E7765 |
Commercial assay or kit | KAPA library Quantification kit | KAPA | KK4824 |
Software, algorithm | BowTie2 | PMID:22388286 | RRID:SCR_016368 | Default parameters
Software, algorithm | Samtools | PMID:19505943 | RRID:SCR_002105 |
Software, algorithm | MACS2 | PMID:18798982 | RRID:SCR_013291 | Keep-dup = all
Software, algorithm | DESeq2 | PMID:25516281 | RRID:SCR_015687 |
Software, algorithm | Surrogate Variable Analysis | PMID:17907809 | RRID:SCR_012836 |
Software, algorithm | Salmon | PMID:28263959 | V0.10, RRID:SCR_017036 |
Other | GEO submission of all data | Accession GSE162181 | RRID:SCR_005012 |

Cite this article

Teresa Romeo Luperchio, Leandros Boukas, Li Zhang, Genay Pilarowski, Jenny Jiang, Allison Kalinousky, Kasper D Hansen, Hans T Bjornsson (2021) Leveraging the Mendelian disorders of the epigenetic machinery to systematically map functional epigenetic variation. eLife 10:e65884. https://doi.org/10.7554/eLife.65884