Mapping cell type-specific transcriptional enhancers using high affinity, lineage-specific Ep300 bioChIP-seq

Abstract

Understanding the mechanisms that regulate cell type-specific transcriptional programs requires developing a lexicon of their genomic regulatory elements. We developed a lineage-selective method to map transcriptional enhancers, regulatory genomic regions that activate transcription, in mice. Since most tissue-specific enhancers are bound by the transcriptional co-activator Ep300, we used Cre-directed, lineage-specific Ep300 biotinylation and pulldown on immobilized streptavidin followed by next generation sequencing of co-precipitated DNA to identify lineage-specific enhancers. By driving this system with lineage-specific Cre transgenes, we mapped enhancers active in embryonic endothelial cells/blood or skeletal muscle. Analysis of these enhancers identified new transcription factor heterodimer motifs that likely regulate transcription in these lineages. Furthermore, we identified candidate enhancers that regulate adult heart- or lung-specific endothelial cell specialization. Our strategy for tissue-specific protein biotinylation opens new avenues for studying lineage-specific protein-DNA and protein-protein interactions.

https://doi.org/10.7554/eLife.22039.001

Introduction

The diverse cell types of a multicellular organism share the same genome but express distinct gene expression programs. In mammals, precise cell type-specific regulation of gene expression depends on transcriptional enhancers, non-coding regions of the genome required to activate expression of their target genes (Visel et al., 2009a; Bulger and Groudine, 2011). Enhancers are bound by transcription factors and transcriptional co-activators, which then contact RNA polymerase II engaged at the promoter, stimulating gene transcription.

Enhancers are nodal points of transcriptional networks, integrating multiple upstream signals to regulate gene expression. Because enhancers do not have a defined sequence or location with respect to their target genes, mapping enhancers is a major bottleneck for delineating transcriptional networks. Recently, chromatin immunoprecipitation of enhancer features followed by sequencing (ChIP-seq) has been used to map potential enhancers. DNase hypersensitivity (Crawford et al., 2006; Thurman et al., 2012), H3K27ac (histone H3 acetylated on lysine 27) occupancy (Creyghton et al., 2010; Nord et al., 2013), and H3K4me1 (histone H3 mono-methylated on lysine 4) occupancy (Heintzman et al., 2007) are chromatin features that have been used to identify cell type-specific enhancers. While most enhancers are DNase hypersensitive, DNase hypersensitive regions are often not active enhancers (Crawford et al., 2006; Thurman et al., 2012). H3K27ac is enriched on cell type-specific enhancers (Creyghton et al., 2010; Nord et al., 2013), but may be a less accurate predictor of enhancers than other transcriptional regulators (Dogan et al., 2015). Chromatin occupancy of Ep300, a transcriptional co-activator that catalyzes H3K27ac deposition, has been found to accurately predict active enhancers (Visel et al., 2009b). However, antibodies for Ep300 are marginal for robust ChIP-seq, particularly from tissues, leading to low reproducibility, variation between antibody lots, and inefficient enhancer identification (Gasper et al., 2014).

Mammalian tissues are composed of multiple cell types, each with its own lineage-specific transcriptional enhancers. Thus, defining lineage-specific enhancers from mammalian tissues requires strategies that overcome this cellular heterogeneity, particularly when the lineage of interest comprises a small fraction of the cells in the tissue. Past efforts to surmount this challenge have taken the strategy of purifying nuclei from the cell type of interest using a lineage-specific tag. For instance, nuclei labeled by lineage-specific expression of a fluorescent protein have been purified by FACS (Bonn et al., 2012). This method is limited by the need to dissociate tissues and recover intact nuclei, and by the relatively slow rate of FACS and the need to collect millions of labeled nuclei. To circumvent the FACS bottleneck, cell type-specific overexpression of tagged SUN1, a nuclear envelope protein, has been used to permit affinity purification of nuclei (Deal and Henikoff, 2010; Mo et al., 2015). Although this mouse line was reported to be normal, SUN1 overexpression potentially could affect cell phenotype and gene regulation (Chen et al., 2012). Chromatin from isolated nuclei is then subjected to ChIP-seq to identify histone signatures of enhancer activity. However, as noted above, histone signatures may predict enhancer activity less accurately than occupancy by key transcriptional regulators (Dogan et al., 2015).

Here, we report an approach to identify murine enhancers active in a specific lineage within a tissue. We developed a knock-in allele of Ep300 in which the protein is labeled by the bio peptide sequence (de Boer et al., 2003; He et al., 2011). Cre recombinase-directed, cell type specific expression of BirA, an E. coli enzyme that biotinylates the bio epitope tag (de Boer et al., 2003), allows selective Ep300 ChIP-seq, thereby identifying enhancers active in the cell type of interest. Using this strategy, we identified thousands of endothelial cell (EC) and skeletal muscle lineage enhancers active during embryonic development. Extending the approach to adult organs, we defined adult EC enhancers, including enhancers associated with distinct EC gene expression programs in heart compared to lung. Analysis of motifs enriched in EC or skeletal muscle lineage enhancers predicted novel transcription factor motif signatures that govern EC gene expression.

Results

Efficient identification of enhancers using Ep300^fb bioChIP-seq

We developed an epitope-tagged Ep300 allele, Ep300^fb, in which FLAG and bio epitopes (de Boer et al., 2003; He et al., 2011) were knocked into the C-terminus of endogenous Ep300 (Figure 1A–B and Figure 1—figure supplement 1A). Transgenically expressed BirA (Driegen et al., 2005) biotinylates the bio epitope, permitting quantitative Ep300 pull down on streptavidin beads (Figure 1C). We have not noted abnormal phenotypes. Heart development and function are sensitive to Ep300 gene dosage (Shikama et al., 2003; Wei et al., 2008), yet Ep300^fb/fb homozygous mice survived normally (Figure 1D) and Ep300^fb/fb hearts expressed normal levels of Ep300 and had normal size and function (Figure 1—figure supplement 1B–E). These data indicate that Ep300^fb is not overtly hypomorphic.

Figure 1

Generation and characterization of Ep300^flbio allele.

(A) Experimental strategy for high affinity Ep300 pull down. A flag-bio epitope was knocked into the C-terminus of the endogenous Ep300 gene. The *bio* peptide sequence is biotinylated by BirA, widely expressed from the Rosa26 locus. (B) Targeting strategy to knock the flag-bio epitope into the C-terminus of Ep300. A targeting vector containing homology arms, flag-bio epitope, and Frt-neo-Frt cassette was used to insert the epitope tag into embryonic stem cells by homologous recombination. Chimeric mice were mated with Act:Flpe mice to excise the Frt-neo-Frt cassette in the germline, yielding the *Ep300^fb* allele. (C) Biotinylated Ep300 is quantitatively pulled down by streptavidin beads. Protein extract from *Ep300^flbio/flbio; Rosa26^BirA* embryos was incubated with immobilized streptavidin. Input, bound, and unbound fractions were analyzed for Ep300 by immunoblotting. GAPDH was used as an internal control. (D) Ep300^flbio/flbio mice from heterozygous intercrosses survived normally to weaning.

https://doi.org/10.7554/eLife.22039.002

To evaluate Ep300^fb-based mapping of Ep300 chromatin occupancy, we isolated embryonic stem cells (ESCs) from Ep300^fb/fb; Rosa26^BirA/BirA mice. We then performed Ep300^fb biotin-mediated chromatin precipitation followed by sequencing (bioChIP-seq), in which high affinity biotin-streptavidin interaction is used to pull down Ep300 and its associated chromatin (He et al., 2011). Biological duplicate sample signals and peak calls correlated well (93.6% overlap; Spearman r = 0.96; Figure 2A–B). We compared the results to publicly available Ep300 antibody ChIP-seq data generated by ENCODE (overlap between duplicate peaks 77.8%; r = 0.91; Figure 2A–B). Ep300 bioChIP-seq identified 48963 Ep300-bound regions (‘Ep300 regions’) shared by both replicates, compared to 15281 for Ep300 antibody ChIP-seq (Figure 2A,C). The large majority (89.6%) of Ep300 regions detected by antibody were also found by Ep300 bioChIP-seq, and Ep300 signal was substantially stronger using bioChIP-seq (Figure 2A,C,D). These data indicate that Ep300^fb bioChIP-seq has improved sensitivity compared to Ep300 antibody ChIP-seq for mapping Ep300 chromatin occupancy in cultured cells.

Figure 2

Comparison of Ep300 bioChiP-seq to antibody ChIP-seq for mapping Ep300 chromatin occupancy.

(A–B) Comparison of biological duplicate antibody Ep300 ChIP-seq (ENCODE) and Ep300 bioChIP-seq (flbio). The Ep300 bioChIP-seq data had greater overlap between replicates and greater intragroup correlation. Most antibody peaks were covered by the bioChIP-seq data. There were 3.3 times more Ep300 regions detected by Ep300 bioChIP-seq. 89.6% of Ep300 regions detected by antibody ChIP-seq were recovered by Ep300 bioChIP-seq. Panel B shows Spearman correlation between samples over the peak regions. (C) Tag heatmap shows input-subtracted Ep300 antibody or bioChIP signal in the union of the Ep300-bound regions detected by each method. (D) Correlation plots show greater Ep300 bioChIP-seq signal compared to antibody ChIP-seq.

https://doi.org/10.7554/eLife.22039.004

Identification of tissue-specific enhancers using Ep300^fb bioChIP-seq

We used Ep300^fb/+; Rosa26^BirA/+ mice to analyze Ep300^fb genome-wide occupancy in embryonic heart and forebrain. We performed bioChIP-seq on heart and forebrain from embryonic day 12.5 (E12.5) embryos in biological duplicate (Figure 3A–B). There was high reproducibility (83% and 93%, respectively) between biological duplicates (Figure 3B and Figure 3—figure supplement 1A). In comparison, published Ep300 antibody ChIP-seq from E11.5 heart and forebrain (Visel et al., 2009b) had lower signal-to-noise and yielded few peaks when analyzed using the same peak detection algorithm (MACS2 [Zhang et al., 2008]). Using the originally published peaks, antibody-based Ep300 ChIP-seq yielded 9.5-fold or 3.0-fold fewer Ep300 regions in heart and forebrain, respectively (Figure 3B). These regions overlapped 58.7% and 64.7% of the Ep300^fb bioChIP-seq regions, suggesting that the epitope-tagged allele has superior sensitivity and specificity for mapping Ep300-bound regions in tissues, as it does in cultured cells.

Figure 3

Tissue specific Ep300 bioChiP-seq.

(A) Tag heatmap showing Ep300^fb signal in heart and forebrain. Each row represents a region that was bound by Ep300 in heart, forebrain, or both. A minority of Ep300-enriched regions were shared between tissues. (B) Ep300^fb pull down from E12.5 heart or forebrain. Ep300-bound regions identified by bioChIP-seq of *Ep300^fb/fb; Rosa26^BirA* tissues in biological duplicates are compared to regions identified by antibody-mediated Ep300 ChIP-seq (Visel et al., 2009b). (C) GO biological process terms enriched for genes that neighbor the tissue-specific heart or forebrain peaks. Bars indicate statistical significance.

https://doi.org/10.7554/eLife.22039.005

We compared Ep300^fb regions from forebrain and heart (Supplementary file 1). Only a minority of Ep300^fb regions (8.9% for heart and 31.3% for brain) were common between tissues (Figure 3A). Viewing Ep300^fb bioChiP-seq signal at genes selectively expressed in heart or brain confirmed robust tissue-specific differences that overlapped enhancers with known tissue-specific activity (Figure 3—figure supplement 1B). Genes neighboring the Ep300^fb occupied regions specific to heart or forebrain were enriched for gene ontology (GO) functional terms relevant to the respective tissue (Figure 3C). These results reinforce the conclusion that Ep300 occupies tissue-specific enhancers and indicate that this conclusion was not a consequence of insensitive detection of Ep300-occupied regions in earlier studies (Visel et al., 2009b).

Ep300 is a histone acetyltransferase, and one of its enzymatic products is histone H3 acetylated on lysine 27 (H3K27ac). We compared the genome-wide signal of Ep300^fb and H3K27ac in E12.5 heart and forebrain (Figure 3—figure supplement 1C and data not shown). There was a high correlation between biological replicates (r = 0.98). Ep300 was also well correlated with H3K27ac (r = 0.64), independently validating the Ep300^fb bioChIP-seq data. The previously published Ep300 antibody ChIP-seq data (Visel et al., 2009b) was less well correlated to H3K27ac (r = 0.37), although the correlation was highly statistically significant (p<0.0001). Interestingly, 26.4% and 52.9% of heart and brain H3K27ac regions were shared between tissues (Figure 3—figure supplement 1D) compared to 8.9% and 31.3% for Ep300^fb heart and brain regions, respectively (Figure 3A), suggesting that Ep300^fb occupancy is more tissue-specific.

We analyzed the prediction of active enhancers by our Ep300^fb bioChiP-seq data. The VISTA Enhancer database (Visel et al., 2007) contains thousands of genomic regions that have been tested for tissue-specific enhancer activity using an in vivo transient transgenic assay. 185 tested regions had heart activity and 130 (70%) of these overlapped Ep300^fb regions that were reproduced in both biological duplicates. In comparison, only 105 (57%) of these regions overlapped the regions previously reported to be bound by Ep300 using antibody ChIP-seq.
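
The overlap counts in this comparison reduce to an interval-intersection tally. The following is a minimal Python sketch of that tally, not the authors' pipeline: it assumes regions are (chromosome, start, end) tuples with half-open coordinates and a 1-bp minimum overlap, and the function names and toy coordinates are hypothetical.

```python
from collections import defaultdict


def overlaps_any(region, peaks_by_chrom):
    """Return True if the region shares at least 1 bp with any peak on its chromosome."""
    chrom, start, end = region
    return any(p_start < end and start < p_end
               for p_start, p_end in peaks_by_chrom.get(chrom, ()))


def fraction_overlapping(regions, peaks):
    """Return (count, fraction) of regions that overlap at least one peak."""
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    hits = sum(overlaps_any(r, peaks_by_chrom) for r in regions)
    return hits, hits / len(regions)


# Toy data (hypothetical coordinates, half-open intervals).
validated = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
peaks = [("chr1", 150, 300), ("chr2", 80, 120)]
hits, frac = fraction_overlapping(validated, peaks)
print(hits, round(frac, 3))
```

Applied to the counts reported in the text, 130/185 corresponds to 70.3% and 105/185 to 56.8%. For genome-scale peak sets, a sorted or tree-based interval index would be preferable to this linear scan.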

Recently human and mouse Ep300 and H3K27ac ChIP-seq data from fetal and adult heart were combined to yield a ‘compendium’ of heart enhancers, with the strength of ChIP-seq signal used to provide an ‘enhancer score’ ranging from 0 to 1 that correlated with the likelihood of regions covered in the VISTA database to show heart activity (Dickel et al., 2016). We compared our heart Ep300 regions to this compendium. Overall, 9438/72508 (13%) regions in the prenatal compendium overlapped with the Ep300 heart regions (Figure 3—figure supplement 2A). However, the overlap frequency increased markedly for regions with higher enhancer scores (Figure 3—figure supplement 2B). For example, if one considers the 3571 compendium regions with an enhancer score of at least 0.4 (corresponding to a validation rate in the VISTA database of ~25%), 2647 (74.1%) were contained within the heart Ep300 regions, and 63/68 (92.6%) regions with a score of at least 0.8 (validation rate ~43%) overlapped a heart Ep300 region. Thus, heart compendium regions that are more likely to have in vivo heart activity are largely covered by heart Ep300 regions. On the other hand, 10752 (53%) heart Ep300 regions were not covered by the compendium, suggesting that this database is incomplete, potentially as a result of its use of incomplete antibody-based Ep300 ChIP-seq data.

Ep300 antibody ChIP-seq was one of the criteria used to select some of the test regions in the VISTA Enhancer database; as an independent test free of this potentially confounding effect, we searched the literature for other heart enhancers that were confirmed using the transient transgenic assay. We identified 40 additional heart enhancers. Of these, 24 (60%) intersected the Ep300^fb regions common to both replicates. In comparison, only 6/40 (15%) intersected the regions detected previously by Ep300 antibody ChIP-seq. Few heart enhancers were found in the regions unique to Ep300 antibody ChIP-seq (11/185 VISTA and 2/40 non-VISTA), compared to regions unique to Ep300^fb bioChIP-seq (36/185 VISTA and 20/40 non-VISTA). We conclude that Ep300^fb bioChIP-seq predicts heart enhancers with sensitivity that is superior to antibody-mediated Ep300 ChIP-seq.

Other chromatin features have been used to predict transcriptional enhancers. We compared the accuracy of Ep300 bioChIP-seq to other chromatin features for heart enhancer prediction. To map accessible chromatin, we performed ATAC-seq (assay for transposase-accessible chromatin followed by sequencing) on E12.5 cardiomyocytes. E12.5 heart ChIP-seq data for modified histones (H3K27ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27me3, H3K9me3, H3K36me3) were obtained from publicly available datasets (see Materials and methods, Data Sources). Using a machine learning approach and the VISTA enhancer database as the gold standard, we evaluated the accuracy of each of these chromatin features, compared to Ep300 bioChIP-seq, for predicting heart enhancers (Figure 3—figure supplement 3). This analysis showed that Ep300 bioChIP-seq was the single most predictive chromatin feature (area under the receiver operating characteristic curve (AUC) = 0.805). ATAC-seq and H3K27ac also performed well (AUC = 0.749 and 0.747, respectively), whereas H3K4me1 was poorly predictive (AUC = 0.589). Combining Ep300 bioChIP-seq with ATAC-seq improved predictive accuracy (AUC = 0.866), equivalent to the value obtained by performing predictions with all of the chromatin features (AUC = 0.862). These analyses indicate that, of the features tested, Ep300 is the best single factor for enhancer prediction.
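
For readers unfamiliar with the AUC metric, the sketch below shows how it can be computed for a single feature. It is an illustration only and does not reproduce the machine learning approach used here: it assumes each tested region has one signal score and a binary label (1 for a validated heart enhancer, 0 otherwise), and the function name and toy values are hypothetical. The AUC is computed as the probability that a randomly chosen positive region scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise positive/negative comparisons."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): signal per tested region and enhancer labels.
signal = [9.1, 7.4, 6.8, 3.2, 2.9, 1.0]
is_enhancer = [1, 1, 0, 1, 0, 0]
auc = roc_auc(signal, is_enhancer)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a feature that ranks enhancers no better than chance, and 1.0 to a feature that ranks every enhancer above every non-enhancer.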

Cre-activated, lineage-specific Ep300^fb bioChIP-seq

In vivo biotinylation of Ep300^fb requires co-expression of the biotinylating enzyme BirA. We reasoned that Ep300^fb bioChIP-seq could be targeted to a Cre-labeled lineage by making BirA expression Cre-dependent. Therefore, we established Rosa26^fsBirA, in which BirA expression is contingent upon Cre excision of a floxed-stop (fs) cassette (Figure 4A). In preliminary experiments, we showed that Rosa26^fsBirA expression of BirA was Cre-dependent (Figure 4—figure supplement 1A), as was Ep300^fb biotinylation (Figure 4B). When activated by Cre driven from Tek regulatory elements (Tg(Tek-cre)1Ywa/J; also known as Tie2Cre), BirA was expressed in endothelial and blood lineages (Figure 4C), consistent with this Cre transgene's labeling pattern. Thus, Rosa26^fsBirA expresses BirA in a Cre-dependent manner.

Figure 4

Cre-directed, lineage-selective Ep300 bioChiP-seq.

(A) Experimental strategy. Lineage-specific Cre recombinase activates expression of BirA (HA-tagged) from Rosa26-flox-stop-BirA (*Rosa26^fsBirA*). This results in Ep300 biotinylation in the progeny of Cre-expressing cells. (B) Cre-dependent Ep300 biotinylation using *Rosa26^fsBirA*. Protein extracts were prepared from E11.5 embryos with the indicated genotypes. R26^BirA/+ (ca) and R26^fsBirA/+ indicate the constitutively active and Cre-activated alleles, respectively. (C) Immunostaining demonstrating selective *Tie2Cre*-mediated expression of HA-tagged BirA in ECs of R26^fsBirA; *Tie2Cre* embryos. Arrows and arrowheads indicate endothelial and hematopoietic lineages, respectively. (D) Tissue-selective Ep300 bioChIP-seq. *Tie2Cre* (T; endothelial and hematopoietic lineages), *Myf5Cre* (M; skeletal muscle lineages), and *EIIaCre* (E; germline activation) were used to drive tissue-selective Ep300 bioChIP-seq. Correlation between ChIP-seq signals within peak regions is shown for triplicate biological repeats. Samples within groups were the most closely correlated. (E) Ep300^fb bioChIP-seq signal at *Gata2* (EC/blood specific) and *Myod* (muscle specific). Enhancers validated by transient transgenic assay are indicated along with the citation’s PubMed identifier (PMID). (F) Biological process GO terms illustrate distinct functional groups of genes that neighbor Ep300 bioChIP-seq regions driven by lineage-specific Cre alleles. The heatmap contains the top 10 terms enriched for genes neighboring each of the three lineage-selective Ep300 regions.

https://doi.org/10.7554/eLife.22039.009

Next, we compared Ep300^fb bioChIP-seq from embryos when driven by Tie2Cre (endothelial and blood lineages) (Kisanuki et al., 2001), Myf5^tm3(cre)Sor/J (referred to as Myf5Cre; skeletal muscle lineage) (Tallquist et al., 2000), or Tg(EIIa-cre)C5379Lmgd/J (also known as EIIaCre; ubiquitous) (Williams-Simons and Westphal, 1999). For the Tie2Cre and EIIaCre samples, we used E11.5 embryos, a stage with robust angiogenesis. For Myf5Cre, we used E13.5 embryos, when muscle lineage cells are in a range of stages in the muscle differentiation program, spanning muscle progenitors to differentiated muscle fibers. BioChiP-seq from biological triplicates showed high within-group correlation, and lower between-group correlation, demonstrating the strong effect of different Cre transgenes in directing Ep300^fb bioChIP-seq (Figure 4D). Viewing the bioChiP-seq signals in a genome browser confirmed lineage-selective signal enrichment. For example, Tie2Cre drove high Ep300^fb bioChIP-seq signal at a Gata2 intronic enhancer with known activity in endothelial and blood lineages (Figure 4E, top panel). There was less signal at this region in Myf5Cre and EIIaCre samples. At the skeletal muscle specific gene Myod, Myf5Cre drove strong Ep300^fb bioChiP-seq signal at a known distal enhancer (Goldhamer et al., 1992), as well as a second Ep300 bound region about 12 kb upstream from the transcriptional start site.

To identify lineage-selective regions genome-wide, we filtered for regions with called peaks in which the lineage-specific Cre (Tie2Cre or Myf5Cre) Ep300^fb signal was at least 1.5 times the ubiquitous Cre (EIIaCre) Ep300^fb signal (Figure 4—figure supplement 1B–C). This led to the identification of 2411 regions with enriched signal in Tie2Cre (Ep300-T-fb) and 1292 regions with enriched signal in Myf5Cre (Ep300-M-fb), compared to 17382 regions with Ep300^fb occupancy detected with ubiquitous biotinylation (Ep300-E-fb; Supplementary file 1). We analyzed the biological process gene ontology terms enriched for genes neighboring these three sets of Ep300 regions (Figure 4F). Ep300-T-fb regions were highly enriched for functional terms related to angiogenesis and hematopoiesis, whereas the Ep300-M-fb regions were highly enriched for functional terms related to skeletal muscle. Together, these results indicate that our strategy for Cre-driven, lineage-specific, Ep300^fb bioChiP-seq successfully identifies regulatory regions that are associated with lineage relevant biological processes.
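
The filtering rule described above (a called peak plus lineage-specific signal at least 1.5 times the ubiquitous signal) can be sketched as follows. This is a minimal illustration under stated assumptions: signals are taken to be already normalized and comparable between samples, and the dictionary keys, function name, and toy values are hypothetical rather than taken from the authors' code.

```python
def lineage_enriched(regions, min_ratio=1.5):
    """Keep regions with a called peak whose lineage signal is >= min_ratio x ubiquitous signal."""
    selected = []
    for region in regions:
        if not region["has_peak"]:
            continue  # only regions with a called peak are considered
        # Multiplicative form avoids dividing by a zero ubiquitous signal.
        if region["lineage_signal"] >= min_ratio * region["ubiquitous_signal"]:
            selected.append(region["name"])
    return selected


# Toy data (hypothetical): normalized signal under a lineage Cre and under EIIaCre.
regions = [
    {"name": "a", "has_peak": True,  "lineage_signal": 30.0, "ubiquitous_signal": 10.0},
    {"name": "b", "has_peak": True,  "lineage_signal": 12.0, "ubiquitous_signal": 10.0},
    {"name": "c", "has_peak": False, "lineage_signal": 50.0, "ubiquitous_signal": 5.0},
    {"name": "d", "has_peak": True,  "lineage_signal": 15.0, "ubiquitous_signal": 10.0},
]
enriched = lineage_enriched(regions)
print(enriched)
```

In this toy example, region c is excluded despite its high ratio because it lacks a called peak, which mirrors the two-part criterion in the text.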

Functional validation of enhancer activity of lineage-specific Ep300^fb regions

We next set out to validate the in vivo enhancer activity of the regions with Cre-driven lineage-enriched Ep300^fb occupancy. If a substantial fraction of the Ep300-T-fb regions have transcriptional enhancer activity, then genes neighboring these enhancers should be expressed at higher levels in Tie2Cre lineage cells. To test this hypothesis, we used Tie2Cre-activated translating ribosome affinity purification (Heiman et al., 2008; Zhou et al., 2013) (T2-TRAP) to obtain the gene expression profile of Tie2Cre-marked cell lineages. Using this lineage-specific expression profile, we then compared the expression of genes neighboring Ep300-T-fb, Ep300-M-fb, and Ep300-E-fb regions. Ep300-T-fb neighboring genes were more highly expressed compared to Ep300-E-fb neighboring genes (p<10⁻³⁸, Mann-Whitney U-test; Figure 5A). In contrast, there was no significant difference between Ep300-M-fb and Ep300-E-fb neighboring genes (Figure 5A). This result held regardless of the maximal distance threshold used to find the gene nearest to an Ep300 region (Figure 5—figure supplement 1A). We also compared the expression of genes with and without an associated Ep300-T-fb region. Ep300-T-fb-associated genes were more highly expressed than non-associated genes (Figure 5—figure supplement 1B). Some Ep300-T-fb-associated genes were not detected within actively translating transcripts (Figure 5—figure supplement 1C). This suggests that in some cases Ep300 enhancer binding is not sufficient to drive gene expression; other contributing factors likely include imprecision of the enhancer-to-gene mapping rule, and regulation at the level of ribosome binding to transcripts. Together, our data are consistent with Ep300-T-fb regions being enriched for enhancers that are active in the Tie2Cre-labeled lineage.
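
The enhancer-to-gene mapping rule referred to above (nearest gene within a maximal distance threshold) can be illustrated with a short sketch. It assumes a single chromosome, a list of transcriptional start sites (TSSs) sorted by position, and distance measured from the region midpoint; these choices, the function name, and the gene labels are hypothetical and may differ from the rule actually used.

```python
import bisect


def nearest_gene(region_mid, tss_sorted, max_dist):
    """Return the gene whose TSS is closest to region_mid, or None if farther than max_dist.

    tss_sorted: list of (position, gene) tuples for one chromosome, sorted by position.
    """
    positions = [pos for pos, _ in tss_sorted]
    i = bisect.bisect_left(positions, region_mid)
    # Only the TSSs immediately left and right of the midpoint can be the nearest.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    pos, gene = min(candidates, key=lambda t: abs(t[0] - region_mid))
    return gene if abs(pos - region_mid) <= max_dist else None


# Toy data (hypothetical TSS positions and gene labels).
tss = [(1000, "GeneA"), (50000, "GeneB"), (200000, "GeneC")]
print(nearest_gene(48000, tss, 100000))   # within range of GeneB
print(nearest_gene(120000, tss, 50000))   # nearest TSS is 70 kb away, beyond threshold
```

Varying max_dist in a sketch like this is one way to check that a result does not depend on the distance threshold. Expression values of the resulting gene sets could then be compared with a rank-based test such as the Mann-Whitney U-test, for example via scipy.stats.mannwhitneyu.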

Figure 5

Functional validation of enhancer activity of Ep300-T-fb-bound regions.

(A) Expression of genes neighboring regions bound by Ep300^fb in different Cre-marked lineages. Translating ribosome affinity purification (TRAP) was used to enrich for RNAs from the *Tie2Cre* lineage. Input or *Tie2Cre*-enriched RNAs were profiled by RNA sequencing. The expression of the nearest gene neighboring Ep300 regions in ECs (Ep300-T-fb regions), but not skeletal muscle (Ep300-M-fb regions), was higher than that of genes neighboring regions bound by Ep300 across the whole embryo (*EIIaCre*). Box and whiskers show quartiles and 1.5 times the interquartile range. Groups were compared to *EIIaCre* using the Mann-Whitney U-test. (B) Transient transgenesis assay to measure in vivo activity of Ep300-T-fb regions. Test regions were positioned upstream of an *hsp68* minimal promoter and lacZ. Embryos were assayed at E11.5. (C) Summary of transient transgenic assay results. Out of 20 regions tested, nine showed activity in ECs or blood in three or more embryos, and two more showed activity in two embryos. See also Table 2. (D) Representative whole mount Xgal-stained embryos. Enhancers that directed LacZ expression in an EC or blood pattern in two or more embryos are shown. Numbers indicate embryos with LacZ distribution similar to shown image, compared to the total number of PCR positive embryos. (E) Sections of Xgal-stained embryos showing examples of enhancers active in arteries, veins, and endocardium, or selectively active in arteries or veins. AS: aortic sac; CV: cardinal vein; DA: dorsal aorta; EC: endocardial cushion; HV: head vein; LA: left atrium; LV: left ventricle; RV: right ventricle. Scale bars, 100 µm. See also Figure 5—figure supplements 1 and 2 and Table 2.

https://doi.org/10.7554/eLife.22039.011

To further evaluate the enhancer activity of Ep300-T-fb regions, we searched the literature and the VISTA Enhancer database (Visel et al., 2007) for genomic regions with endothelial cell activity validated by transient transgenesis (Table 1). Of 40 positive regions identified, 19 (47.5%) overlapped with Ep300-T-fb regions. Next, we used the transient transgenic assay to test the lineage-selective enhancer activity of 20 additional Ep300-T-fb regions. We selected regions that neighbored genes with EC-selective expression (T2-TRAP more than 10-fold enriched over input RNA) and with known or potential relevance to angiogenesis. Of the 20 tested Ep300-T-fb regions, eight drove reporter gene activity in at least three embryos in a vascular pattern, in both whole mounts and histological sections, and two additional regions drove reporter gene activity in a vascular pattern in two embryos (Figure 5B–D; Table 2; Figure 5—figure supplements 2–3). In retrospect, two of the positive enhancers (Eng; Mef2c) had been described previously (Table 2) (Pimanda et al., 2006; De Val et al., 2008). In some of these cases, there was also activity in blood cells, consistent with Tie2Cre activity in both blood and endothelial lineages. Additionally, one enhancer of Lmo2 was active in blood cells but not ECs. Thus, a substantial fraction (9/20; 45%) of regions identified by lineage-selective, Cre-directed bioChIP-seq have appropriate and reproducible in vivo activity.

Table 1

mm9 genome coordinates of regions with EC activity as determined by transient transgenic assay. Vista_XXX indicates that the region was obtained from the VISTA enhancer database. Liftover indicates that the region was inferred by liftover from the human genome. For enhancers obtained from the literature, PubMed was searched for ‘endothelial cell enhancer’. The resulting references were manually curated for transient transgenic testing of candidate endothelial cell enhancer regions.

https://doi.org/10.7554/eLife.22039.015

Chr	Start	End	Note
chr9	37206631	37209631	Robo4;PMID17495228;bloodVessels
chr4	94412022	94413640	Tek;PMID9096345;bloodVessels
chr8	106625634	106626053	Cdh5;PMID15746076;bloodVessels
chr11	49445663	49446520	Flt4;posEC;liftoverFromPMID19070576;FoxETS
chr6	99338223	99339713	FoxP1;posEC;liftoverFromPMID19070576;FoxETS
chr8	130910720	130911740	Nrp1;posEC;liftoverFromPMID19070576;FoxETS
chr18	61219017	61219690	Pdgfrb;posEC;liftoverFromPMID19070576;FoxETS
chr4	137475841	137476446	Ece1;posEC;liftoverFromPMID19070576;FoxETS;artery
chr4	114743698	114748936	Tal1;PMID14966269;endocardium;bloodVessels
chr2	119152861	119153661	Dll4;PMID23830865;arterial
chr13	83721919	83721962	Mef2c;PMID19070576;FoxETS
chr13	83711086	83711527	Mef2c;PMID15501228;panEC
chr2	32493213	32493467	Eng;liftoverFromPMID16484587;bloodVessels
chr2	32517844	32518197	Eng;liftoverFromPMID18805961;bloodVessels;blood
chr6	88152598	88153791	Gata2;PMID17395646;PMID17347142;bloodVessels
chr9	32337302	32337549	Fli1;PMID15649946;bloodVessels
chr5	76370571	76372841	KDR;PMID10361126;bloodVessels
chr6	125502138	125502981	VWF;PMID20980682;smallbloodVessels
chr5	148537311	148538294	Flt1;liftoverFromPMID19822898
chr17	34701455	34702277	Notch4;liftoverFromPMID15684396
chr2	155568649	155569132	Procr;liftoverfromPMID16627757;bloodVessels[7/17]
chr9	63916890	63917347	Smad6;liftoverfromPMID17213321;bloodVessels
chr19	37510161	37510394	Hhex;liftoverfromPMID15649946;bloodVessels;blood
chr5	76357892	76358715	Flk1;PMID27079877;bloodVessels;artery
chr11	32145270	32146411	vista_101;heart[9/12];bloodVessels[7/12]
chr13	28809515	28811310	vista_265;neural tube[7/8];bloodVessels[3/8]
chr17	12982583	12985936	vista_89;heart[6/10];bloodVessels[8/10]
chr4	131631431	131635142	vista_80;bloodVessels[10/10]
chr4	57858433	57860639	vista_261;bloodVessels[5/8]
chr5	93426293	93427320	vista_397;limb[12/13];bloodVessels[12/13]
chr8	28216799	28218903	vista_136;heart[7/10];other[6/10];bloodVessels[4/10]
chr3	87786601	87790798	liftoverFromvista_1891;somite[7/7];midbrain;(mesencephalon)[6/7];limb[7/7];branchial;arch[7/7];eye[7/7];heart[7/7];ear[7/7];bloodVessels[6/7]
chr6	116309006	116309768	liftoverFromvista_2065;bloodVessels[9/9]
chr19	37566512	37571839	liftoverFromvista_1866;bloodVessels[5/5]
chr7	116256647	116260817	liftoverFromvista_1859;neuraltube[8/8];hindbrain;(rhombencephalon)[5/8];midbrain;(mesencephalon)[8/8];forebrain[8/8];heart[7/8];bloodVessels[5/8];liver[4/8]
chr18	14006285	14007917	liftoverFromvista_1653;bloodVessels[5/8]
chr2	152613921	152616703	liftoverFromvista_2050;bloodVessels[5/5]
chr14	32045520	32047900	liftoverFromvista_2179;bloodVessels[5/7]
chr15	73570041	73577576	liftoverFromvista_1882;bloodVessels[8/8]
chr8	28216817	28218896	liftoverFromvista_1665;heart[5/7];bloodVessels[7/7]

Table 2

Summary of transient transgenic validation of candidate EC enhancers.

https://doi.org/10.7554/eLife.22039.016

Neighboring gene	Region (mm9)	Size (bp)	Location relative to gene	Distance to TSS (bp)	Whole mount: # PCR pos	Whole mount: # LacZ pos	Whole mount: # EC or blood pos	Sections: Endo	Sections: Art	Sections: Vein	Sections: Blood cells	Ref. (PMID)
Apln	chrX:45358891–45359918	1028	3'_Distal	28,624	10	3	3	+	+	+	−	−
Dab2	chr15:6009504–6010497	994	5'_Distal	−239,788	24	10	9	+	++	+	+++	−
Egfl7_enh1	chr2:26427040–26428029	990	5'_Distal	−9,041	9	3	3	+++	+++	+++	+	−
Eng	chr2:32493216–32494019	804	5'_Distal	−8,497	12	4	3	++	++	++	−	16484587
Ephb4	chr5:137789649–137790412	764	5'_Proximal	−1,306	7	5	4	++	−	+++	−	−
Lmo2	chr2:103733621–103734378	758	5'_Distal	−64,152	29	14	10	−	−	−	+++	−
Mef2c	chr13:83721522–83722451	930	Intragenic	78,954	10	6	5	+	−	+++	+	19070576
Notch1_enh1	chr2:26330255–26331184	930	Intragenic	−28,622	22	3	3	+++	++	++	−	−
Sema6d	chr2:124380522–124381285	764	5'_Distal	−55,128	17	10	7	++	+++	−	+	−
Egfl7_enh3	chr2:26433680–26434642	963	5'_Distal	−2,415	12	2	2	+	++	+++	−	−
Sox7	chr14:64576382–64577118	737	3'_Distal	14,207	7	4	2	+	++	−	−	−
Aplnr	chr2:85003436–85004412	977	3'_Distal	27,407	12	3	0	NA	NA	NA	NA	−
Egfl7_enh2	chr2:26431273–26431995	723	5'_Proximal	−4,942	9	3	1	+	−	−	−	−
Emcn	chr3:136984933–136986103	1171	5'_Distal	−18,524	15	1	1	−	−	−	+	−
Ets1	chr9:32481485–32482133	649	Intragenic	−21,818	11	1	0	NA	NA	NA	NA	−
Foxc1	chr13:31921976–31922827	852	3'_Distal	23,887	13	3	1*	NA	NA	NA	NA	−
Gata2	chr6:88101907–88102696	790	5'_Distal	−46,356	8	0	0	NA	NA	NA	NA	−
Lyve1	chr7:118020264–118021043	780	5'_Distal	−3,128,028	7	1	1	+	+	−	−	−
Notch1_enh2	chr2:26345973–26347118	1146	Intragenic	12,796	9	0	0	NA	NA	NA	NA	−
Sox18	chr2:181397552–181398335	784	3'_Proximal	8,401	13	2	1	−	++	−	−	−

*EC/blood pattern on whole mount not validated in histological sections.
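
The Size (bp) column in Table 2 appears to equal the inclusive length of each mm9 interval (end − start + 1). The Python sketch below checks that reading on three rows copied from the table. The helper name `region_size` is hypothetical, and this is a consistency check rather than part of the authors' pipeline.

```python
import re

def region_size(region: str) -> int:
    """Inclusive length of a 'chr:start–end' interval (hypothetical helper)."""
    # Accept either an en dash or a plain hyphen between the coordinates.
    match = re.fullmatch(r"(chr\w+):(\d+)[–-](\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    start, end = int(match.group(2)), int(match.group(3))
    return end - start + 1

# Rows copied from Table 2: (neighboring gene, region, reported size in bp).
rows = [
    ("Apln", "chrX:45358891–45359918", 1028),
    ("Dab2", "chr15:6009504–6010497", 994),
    ("Eng", "chr2:32493216–32494019", 804),
]

for gene, region, reported in rows:
    # Print the computed size and whether it matches the reported size.
    print(gene, region_size(region), region_size(region) == reported)
```

For these three rows the computed and reported sizes agree, which suggests the region coordinates are 1-based and inclusive.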

Arterial and venous ECs have overlapping but distinct gene expression programs, yet only three artery-specific and no vein-specific transcriptional enhancers have been described (Wythe et al., 2013; Robinson et al., 2014; Becker et al., 2016; Sacilotto et al., 2013). We examined histological sections of the transient transgenic embryos to determine whether a subset of enhancers is selectively active in ECs of the dorsal aorta or cardinal vein. We identified an enhancer, Sema6d-enh, with activity predominantly in ECs that line the dorsal aorta but not the cardinal vein (Figure 5E, Table 2). A second enhancer, Sox7-enh, also showed selective activity in the dorsal aorta, although this was only reproduced in two embryos. Both were also active in the endocardium and endocardial derivatives of the cardiac outflow tract (Figure 5—figure supplement 3). We also identified two enhancers with activity at E11.5 predominantly in ECs that line the cardinal vein and not the dorsal aorta (Ephb4-enh and Mef2c-enh; Figure 5E, Table 2, and Figure 5—figure supplement 3). Interestingly, a core 44-bp region of Mef2c-enh had been previously reported to drive pan-EC reporter expression at E8.5–E9.5 (De Val et al., 2008). This enhancer’s activity pattern may be dynamically regulated at different developmental stages, as has been described previously for two artery-specific enhancers (Robinson et al., 2014; Becker et al., 2016). Further analysis of these enhancers will be required to confirm the artery- and vein-selective activity patterns that we observed, and to better characterize their temporospatial regulation.

Collectively, our data show that we have developed and validated a robust method for identification of transcriptional enhancers in Cre-marked lineages. Using this strategy, we discovered thousands of candidate skeletal muscle, EC, and blood cell enhancers. Based on our validation studies, we expect that a majority of these candidate regions have in vivo, cell-type specific transcriptional enhancer activity.

Transcription factor binding motifs enriched in skeletal muscle and EC/blood enhancers

Ep300 does not bind directly to DNA. Rather, transcription factors recognize sequence motifs in DNA and subsequently recruit Ep300. The transcription factors and transcription factor combinations that direct enhancer activity in skeletal muscle, blood, and endothelial lineages are incompletely described. To gain more insights into this question, we searched for transcription factor binding motifs that were over-represented in the candidate enhancer regions bound by Ep300 in skeletal muscle or blood/EC lineages. Starting from 1445 motifs for transcription factors or transcription factor heterodimers, we found 173 motifs over-represented in Ep300-T-fb or Ep300-M-fb regions (false discovery rate < 0.01% and frequency in Ep300 regions greater than 5%). Clustering and selection of representative non-redundant motifs left 40 motifs that were enriched in either Ep300-T-fb or Ep300-M-fb, or both (Figure 6A). Many closely related motifs were independently detected by de novo motif discovery (Figure 6—figure supplement 1A). Analysis of our T2-TRAP RNA-seq data identified genes expressed in embryonic ECs that potentially bind to these motifs (Supplementary file 2).
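
The two selection criteria described above (a false discovery rate cutoff and a minimum motif frequency in Ep300 regions) can be sketched in Python. The motif records, q-values, and frequencies below are invented for illustration, the class and function names are hypothetical, and the default cutoffs simply transcribe the thresholds as written in the text (0.01% expressed as the fraction 0.0001, and 5%). This is not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class MotifResult:
    name: str
    fdr: float        # false discovery rate as a fraction (invented values)
    frequency: float  # fraction of Ep300 regions containing the motif

def select_enriched(results, fdr_cutoff=0.0001, min_frequency=0.05):
    """Keep motifs that pass both the FDR and the frequency threshold."""
    return [r.name for r in results
            if r.fdr < fdr_cutoff and r.frequency > min_frequency]

results = [
    MotifResult("ETS", 1e-12, 0.40),         # passes both filters
    MotifResult("GATA", 1e-9, 0.22),         # passes both filters
    MotifResult("rare_motif", 1e-8, 0.02),   # significant but too infrequent
    MotifResult("weak_motif", 0.03, 0.30),   # frequent but not significant
]

print(select_enriched(results))
```

Requiring both criteria removes motifs that are statistically enriched but present in too few regions to be broadly informative.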

Figure 6 (with 1 supplement)

Motifs enriched in Ep300-T-fb and Ep300-M-fb regions.

(A) Motifs enriched in Ep300-T-fb or Ep300-M-fb regions. 1445 motifs were tested for enrichment in Ep300 bound regions compared to randomly permuted control regions. Significantly enriched motifs (neg ln p-value>15) were clustered and the displayed non-redundant motifs were manually selected. Heatmaps show statistical enrichment (left), fraction of regions that contain the motif (center), and GO terms associated with genes neighboring motif-containing, Ep300-bound regions (right; fraction of the top 20 GO biological process terms). Grey indicates that the motif was not significantly enriched (neg ln p-value≤15). (B) Conservation of sequences matching Tie2Cre- or Myf5Cre-enriched motifs within 100 bp of the summit of Ep300-T-fb or Ep300-M-fb regions, compared to randomly selected 12 bp sequences from the same regions. PhastCons conservation scores across 30 vertebrate species were used. ****p<0.0001, Kolmogorov–Smirnov test. (C) Luciferase assay of activity of enhancers containing indicated motifs. Three repeats of 20–30 bp regions from Ep300-bound enhancers linked to the indicated gene and centered on the indicated motif were cloned upstream of a minimal promoter and luciferase. The constructs were transfected into human umbilical vein endothelial cells. Luciferase activity was expressed as fold activation above that driven by the enhancerless, minimal promoter-luciferase construct. *p<0.05 compared to Mef2c-enhancer with mutated ETS:FOX2 motif (Mef2c mut). n = 3. (D) Luciferase assay of indicated motifs repeated three times within a consistent DNA context. Assay was performed as in C. *p<0.05 compared to a negative control sequence lacking the predicted motif. n = 3. Error bars in C and D indicate standard error of the mean.

https://doi.org/10.7554/eLife.22039.017

The GATA and ETS motifs were the most highly enriched motifs in the Tie2Cre-marked blood and EC lineages (Figure 6A). Interestingly, GO analysis of the subset of regions positive for these motifs showed that GATA-containing regions are highly enriched for hematopoiesis and heme synthesis, consistent with the critical roles of GATA1/2/3 in these processes (Bresnick et al., 2012). On the other hand, ETS-containing regions were highly enriched for functional terms linked to angiogenesis, also consistent with the key roles of ETS factors in angiogenesis (Wei et al., 2009) and our prior finding that the ETS motif is enriched in dynamic VEGF-dependent EC enhancers (Zhang et al., 2013). The Ebox motif, recognized by bHLH proteins, was the most highly enriched motif in the skeletal muscle lineage, consistent with the important roles of bHLH factors such as Myod and Myf5 in skeletal muscle development (Buckingham and Rigby, 2014). Interestingly, the Ebox motif was also highly enriched in Tie2Cre-marked cells, and genes neighboring these Ebox-containing regions were functionally related to both angiogenesis and heme synthesis. bHLH-encoding genes such as Hey1/2, Scl, and Myc are known to be important in blood and vascular development.

The database used for our motif search included 315 heterodimer motifs that were recently discovered through high throughput sequencing of DNA concurrently bound by two different transcription factors (Jolma et al., 2015). This allowed us to probe for enrichment of heterodimer motifs that may contribute to enhancer activity in blood, EC, and skeletal muscle lineages. One heterodimer motif that was highly enriched in Ep300-T-fb regions (and not Ep300-M-fb regions; referred to as Tie2Cre-enriched motifs) was ETS:FOX2 (AAACAGGAA), composed of a tail-to-tail fusion of FOX (TGTTT) and ETS (GGAA) binding sites (Figure 6A). GO analysis showed that this motif was closely linked to vascular biological process terms (Figure 6A). This motif was previously found to be sufficient to drive EC enhancer activity during vasculogenesis and developmental angiogenesis (De Val et al., 2008), validating that our approach is able to identify bona fide, functional heterodimer motifs. Interestingly, we discovered two additional ETS:FOX heterodimer motifs, which were also highly enriched in Ep300-T-fb regions and also linked to vascular biological process terms: ETS:FOX1 (GGATGTT), consisting of a head-to-tail fusion between ETS and FOX motifs, with the ETS motif located 5' to the FOX motif (Figure 6A, arrows over motif logo), and ETS:FOX3 (TGTTTACGGAA), a head-to-tail fusion with the FOX motif located 5' to the ETS motif. Other heterodimer motifs that were enriched in Ep300-T-fb regions and, to our knowledge, were previously unrecognized as regulatory elements in ECs were ETS:TBox, ETS:HOMEO, and ETS:Ebox. Similar analysis of Ep300-M-fb regions identified highly enriched Ebox-containing heterodimer motifs including Ebox:Hox, Ebox:HOMEO, and ETS:Ebox (‘Myf5Cre-enriched motifs’).
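
The composition of the ETS:FOX motifs can be checked directly from the sequences quoted above. In this short Python sketch (helper names are illustrative, and exact string matching stands in for the position weight matrices a real motif scan would use), ETS:FOX2 contains the reverse complement of the FOX core TGTTT immediately followed by the ETS core GGAA, whereas ETS:FOX3 carries both cores on the same strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def core_hits(motif: str, core: str):
    """Return sorted (position, strand) pairs for exact core matches on either strand."""
    hits = []
    for strand, pattern in (("+", core), ("-", reverse_complement(core))):
        start = motif.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = motif.find(pattern, start + 1)
    return sorted(hits)

FOX_CORE, ETS_CORE = "TGTTT", "GGAA"
for name, motif in (("ETS:FOX2", "AAACAGGAA"), ("ETS:FOX3", "TGTTTACGGAA")):
    print(name, "FOX:", core_hits(motif, FOX_CORE), "ETS:", core_hits(motif, ETS_CORE))
```

ETS:FOX1 (GGATGTT) contains truncated versions of both cores (GGA and TGTT), so this exact-match search does not report it; scoring against full motif matrices would be needed there.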

To assess functional significance of these motifs, we examined their evolutionary conservation. Using PhastCons genome conservation scores for 30 vertebrate species (Siepel et al., 2005), we measured the conservation of sequences matching Tie2Cre-enriched motifs within the central 200 bp of Ep300-T-fb regions. Whereas randomly selected 12 bp sequences from these regions exhibited a distribution of scores heavily weighted towards low conservation values, sequences matching Tie2Cre-enriched motifs showed a bimodal distribution consisting of highly conserved and poorly conserved sequences (Figure 6B). The conservation of individual heterodimer motifs such as ETS:FOX1-3 confirmed that they shared this bimodal distribution that included deeply conserved sequences (Figure 6—figure supplement 1B). These findings indicate that a subset of sequences matching Tie2Cre-enriched motifs, including the novel heterodimer motifs, are under selective pressure. Myf5Cre-enriched motifs within the center of Ep300-M-fb regions similarly adopted a bimodal distribution that includes a subset of motif occurrences with high conservation (Figure 6B and Figure 6—figure supplement 1B). This analysis supports the biological function of Tie2Cre- and Myf5Cre-enriched motifs in endothelial cell/blood or skeletal muscle enhancers, respectively.
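
Figure 6B compares the two score distributions with a Kolmogorov–Smirnov test. As a rough illustration of what that test measures, the Python sketch below computes the two-sample statistic, that is, the largest gap between the two empirical cumulative distributions. The conservation scores are invented, the function name is hypothetical, and the p-value calculation is omitted.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov–Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap

# Invented scores: motif matches are bimodal, random 12-mers are mostly low.
motif_scores = [0.0, 0.1, 0.9, 1.0]
random_scores = [0.0, 0.1, 0.1, 0.2]
print(ks_statistic(motif_scores, random_scores))
```

A bimodal sample with a highly conserved subset separates from a low-conservation background mainly at the upper end of the score range, which is where the gap between the cumulative distributions is widest in this toy example.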

To further functionally validate the transcriptional activity of these heterodimer motifs, we measured their enhancer activity using luciferase reporter assays. Three repeats of enhancer fragments containing motifs of interest were cloned upstream of a minimal promoter and luciferase. The constructs were transfected into human umbilical vein endothelial cells (HUVECs), and transcriptional activity was measured by luciferase assay, normalized to the enhancerless promoter-luciferase construct (Figure 6C). The well-studied ETS:FOX2 motif from an endothelial Mef2c enhancer (De Val et al., 2008) robustly stimulated transcription to about the same extent as the SV40 enhancer. As expected, mutation of the ETS:FOX2 motif markedly blunted its activity, supporting the specificity of this assay. Interestingly, the alternative ETS:FOX3 motif uncovered by our study was at least as potent in stimulating transcription as the previously described ETS:FOX2 motif. The other novel heterodimer motifs tested, ETS:HOMEO2 and ETS:Ebox, likewise supported strong transcriptional activity, as did the SOX motif, whose enrichment and associated GO terms were highly EC-selective.
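
Fold activation in these assays is the luciferase signal of a construct divided by that of the enhancerless promoter-luciferase control. The Python sketch below works through that arithmetic with invented readings (n = 3 per construct) and reports the standard error of the mean, as used for the error bars in Figure 6C and D. Dividing each replicate by the mean control signal is one common convention and is an assumption here, as are the function names.

```python
from math import sqrt
from statistics import mean, stdev

def fold_activation(sample, control):
    """Per-replicate fold activation relative to the mean control signal."""
    baseline = mean(control)
    return [value / baseline for value in sample]

def sem(values):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(values) / sqrt(len(values))

# Invented raw luciferase readings, three replicates each.
control = [100.0, 110.0, 90.0]      # enhancerless minimal promoter construct
enhancer = [900.0, 1100.0, 1000.0]  # construct carrying a 3x motif repeat

folds = fold_activation(enhancer, control)
print(round(mean(folds), 2), round(sem(folds), 2))
```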

To further assess and compare the transcriptional activity of these heterodimer motifs, we cloned 3x repeated motifs upstream of luciferase in a consistent DNA context that had minimal endogenous enhancer activity (Figure 6D). The ETS motif alone only weakly stimulated luciferase expression, whereas the previously described ETS:FOX2 motif robustly activated luciferase expression. Interestingly, both of the newly identified, alternative ETS:FOX motifs (ETS:FOX1 and ETS:FOX3) were more potent activators than ETS:FOX2. The other heterodimer motifs tested, ETS:HOMEO2, ETS:Ebox, and ETS:TBox, also demonstrated significant enhancer activity.

Together, unbiased discovery of cell type-specific enhancers coupled with motif analysis identified novel transcription factor signatures that are likely important for gene expression programs of blood, vasculature, and skeletal muscle.

Organ-specific EC enhancers

ECs in different adult organs have distinct gene expression programs that underlie organ-specific EC functions (Nolan et al., 2013; Coppiello et al., 2015). For example, heart ECs are adapted for the transport of fatty acids essential for fueling oxidative phosphorylation in the heart (Coppiello et al., 2015). On the other hand, lung ECs are adapted for efficient transport of gas, but possess specialized tight junctions to minimize transit of water (Mehta et al., 2014). Nolan et al. recently profiled gene expression in ECs freshly isolated from nine different adult mouse organs (Nolan et al., 2013). Clustering the 3104 genes with greater than 4-fold difference in expression across this panel identified genes with selective expression in ECs from a subset of organs (Figure 7—figure supplement 1A). 240 genes were preferentially expressed in ECs from heart (and skeletal muscle) compared to other organs, whereas 355 genes were preferentially expressed in ECs from lung (Figure 7—figure supplement 1B). One cluster contained genes co-enriched in lung and brain, including many tight junction genes such as claudin 5.

We asked if our strategy of lineage-specific Ep300^fb bioChIP-seq would allow us to identify enhancers linked to organ-specific EC gene expression. For these experiments, we used Tg(Cdh5-cre/ERT2)1Rha (also known as VEcad-CreERT2) (Sörensen et al., 2009) to drive Ep300^fb biotinylation in ECs (Figure 7—figure supplement 2); unlike Tie2Cre, this transgene does not label hematopoietic cells when induced with tamoxifen in the neonatal period. We isolated heart and lungs from adult (8-week-old) Ep300^fb/+; VEcad-CreERT2⁺; Rosa26^fsBirA/+ mice and performed bioChIP-seq. Triplicate repeat experiments were highly reproducible (Figure 7A). Inspection of Ep300^fb ChIP-seq signals from lung and heart ECs suggested that VEcad-CreERT2 successfully directed Ep300^fb enrichment from ECs. For example, Tbx3 and Meox2, transcription factors selectively expressed in lung and heart ECs, respectively, were associated with Ep300^fb-decorated regions in the matching tissues (Figure 7—figure supplement 3). Interestingly, Meox2 has been implicated in directing expression of heart EC-specific genes and in the pathogenesis of coronary artery disease (Coppiello et al., 2015; Yang et al., 2015). On the other hand, Tbx2/4 and Myh6, genes expressed in non-ECs in lung and heart, were not associated with regions of Ep300^fb enrichment (Figure 7—figure supplement 3).

Figure 7 (with 4 supplements)

Enhancers in adult heart and lung ECs.

VEcad-CreERT2 and a neonatal tamoxifen pulse were used to drive BirA expression in ECs. Ep300 regions in adult heart or lung ECs were identified by bioChIP-seq in biological triplicate. Ep300 regions were ranked into deciles by the ratio of the Ep300-VE-fb signal in heart to lung. (A) Tag heatmap shows that the top and bottom deciles have selective Ep300 occupancy in heart and lung ECs, respectively. (B) Expression of genes in heart ECs (red) or lung ECs (blue) neighboring Ep300 regions, divided into deciles by Ep300-VE-fb signal ratio in heart and lung. ***p<0.0001, Wilcoxon test. Expression values were obtained from Nolan et al. (2013). Box plots indicate quartiles, and whiskers indicate 1.5 times the interquartile range. (C) Genes with selective expression in heart or lung ECs were identified by K-means clustering (see Materials and methods and Figure 7—figure supplement 1). The number of Ep300-VE-fb regions in heart (red) or lung (blue) neighboring these genes was determined, stratified by decile. ***p<0.0001, Fisher's exact test. (D) Enrichment of selected motifs in Ep300-VE-fb regions from heart (decile 1) compared to lung (decile 10) or vice versa. Grey indicates no significant enrichment. Displayed non-redundant motifs were selected from all significant motifs by manual curation of clustered motifs.

https://doi.org/10.7554/eLife.22039.019

To obtain a broader, unbiased view of the adult EC Ep300^fb bioChIP-seq results, regions bound by Ep300^fb in either heart (14251) or lung (22174) were rank-ordered by the heart to lung Ep300^fb signal ratio (Supplementary file 1). Regions were grouped into deciles by this ratio, with the most heart-enriched regions in decile 1 and the most lung-enriched regions in decile 10. Next, for genes neighboring regions in each decile, we compared expression in heart versus lung. Genes neighboring decile 1 regions (greater Ep300^fb signal in heart) had higher median mRNA transcript levels in heart than lung (Figure 7B). In contrast, genes neighboring decile 10 regions (greater Ep300^fb signal in lung) had higher median mRNA transcript levels in lung than heart. We then focused on genes with selective expression in heart or lung. More decile 1 Ep300^fb regions were associated with genes with heart-selective expression than lung-selective expression, and more decile 10 Ep300^fb regions were associated with genes with lung-selective expression (in each case, p<0.0001, Fisher's exact test; Figure 7C). These results are consistent with greater heart- or lung-associated enhancer activity in Ep300^fb decile 1 or 10 regions, respectively.
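
The decile grouping can be sketched in Python as follows. Region identifiers and signal values are synthetic, a pseudocount is assumed in order to avoid division by zero, and equal-size binning of the ranked list is one plausible reading of "grouped into deciles". None of these details is taken from the authors' code.

```python
def assign_deciles(regions, pseudocount=1.0):
    """Map region_id -> decile from (region_id, heart_signal, lung_signal) tuples.

    Decile 1 holds the most heart-enriched regions, decile 10 the most
    lung-enriched ones.
    """
    ranked = sorted(
        regions,
        key=lambda r: (r[1] + pseudocount) / (r[2] + pseudocount),
        reverse=True,  # highest heart/lung ratio first
    )
    n = len(ranked)
    # Split the ranked list into ten bins of (near-)equal size.
    return {r[0]: (rank * 10) // n + 1 for rank, r in enumerate(ranked)}

# 20 synthetic regions: heart signal falls while lung signal rises.
regions = [(f"region_{i:02d}", 100.0 - 5 * i, 5.0 + 5 * i) for i in range(20)]
deciles = assign_deciles(regions)
print(deciles["region_00"], deciles["region_19"])
```

With 20 synthetic regions each decile receives two regions; the most heart-biased region lands in decile 1 and the most lung-biased in decile 10.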

We analyzed the GO terms that are over-represented for genes neighboring Ep300^fb decile 1 or 10 regions. Both sets of regions were highly enriched for GO terms related to vasculature and blood vessel development (Figure 7—figure supplement 4). Analysis of disease ontology terms showed that genes neighboring decile 1 regions (greater Ep300^fb signal in heart) were significantly associated with terms relevant to coronary artery disease, such as 'coronary heart disease', 'arteriosclerosis', and 'myocardial infarction', whereas genes neighboring decile 10 regions (greater Ep300^fb signal in lung) were less enriched for these terms. Conversely, genes neighboring decile 10 regions were selectively associated with hypertension and cerebral artery disease.

To gain insights into transcriptional regulators that preferentially drive heart or lung EC enhancers, we searched for motifs with differential enrichment in decile 1 compared to decile 10. Using decile 1 regions as the foreground sequences and decile 10 regions as the background sequences, we detected significant enrichment of the TCF motif in regions preferentially occupied by Ep300 in heart ECs (Figure 7D), consistent with prior work showing that a Meox2:TCF15 complex promotes heart EC-specific gene expression (Coppiello et al., 2015). Other motifs that were significantly enriched in decile 1 (heart) compared to decile 10 regions were FOX, ETS, ETS:FOX, and ETS:HOMEO motifs. We performed the reciprocal analysis to identify lung-enriched motifs. One motif enriched in decile 10 (lung) compared to decile 1 regions was the TBox motif. Interestingly, TBX3 is a lung EC-enriched transcription factor (Nolan et al., 2013). Other motifs over-represented in decile 10 compared to decile 1 regions were Ebox, ETS:Ebox, and Ebox:HOMEO motifs. Thus, differences in transcription factor expression or heterodimer formation in heart and lung ECs may contribute to differences in Ep300 chromatin occupancy and enhancer activity between heart and lung.

Discussion

Here we show that Cre-mediated, tissue-specific activation of BirA permits high affinity bioChIP-seq of factors with a bio epitope tag. Furthermore, we show that combining this strategy with the Ep300^fb knockin allele permits efficient identification of cell type-specific enhancers. The bio epitope has been knocked into a number of different transcription factor loci (He et al., 2012; Waldron et al., 2016; Jackson Labs 025982 (Rbpj), 025980 (Ep300), 025983 (Mef2c), 025978 (Nkx2-5), 025979 (Srf), and 025977 (Zfpm2)), and these alleles can be combined with Cre-activated BirA to permit lineage-specific mapping of transcription factor binding sites. Cell type-specific protein biotinylation will also be useful for mapping protein-protein interactions in specific cell lineages.

Using this technique, we identified thousands of candidate skeletal muscle and EC enhancers and showed that many of these candidate enhancers are likely to be functional. Furthermore, we showed that the technique can identify tissue-specific enhancers in postnatal tissues, and identified novel candidate enhancers that regulate organ-specific endothelial gene expression. These enhancer regions will be a valuable resource for future studies of transcriptional regulation in these systems.

Large scale identification of tissue-specific enhancers will facilitate decoding the mechanisms responsible for cell type-specific gene expression in development and disease. By analyzing these candidate regulatory regions, we revealed novel transcriptional regulatory motifs that likely participate in skeletal muscle development, angiogenesis, and organ-specific EC gene expression. Our recovery of a previously described FOX-ETS heterodimer binding site in EC enhancers (De Val et al., 2008) validates the ability of our approach to detect sequence motifs important for angiogenesis, and suggests that these motifs provide important clues to the transcriptional regulators that interact with them. Our study identified significant enrichment of two new FOX-ETS heterodimer motifs in which the FOX and ETS sites are in different positions and orientations. In their original study, Jolma and colleagues already demonstrated that these alternative FOX-ETS motifs are indeed bound by both FOX and ETS family proteins, implying that these proteins are able to collaboratively bind DNA in diverse configurations, potentially with DNA itself playing an important role in stabilizing the heterodimer (Jolma et al., 2015). Enrichment of these motifs in Ep300-T-fb regions, and their over-representation neighboring genes related to angiogenesis, suggests that these alternative FOX-ETS configurations are functional. We also recovered FOX-ETS motifs from Ep300 regions in adult ECs (and preferentially in heart ECs), suggesting that these motifs continue to be important in maintenance of adult vasculature. However, direct support of these inferences will require further experiments that identify the specific proteins involved and that dissect their functional roles in vivo.

Additional novel heterodimer motifs, such as ETS:EBox, ETS:HOMEO, and ETS:Tbox, were enriched in EC enhancers from both developing and adult ECs. This suggests that these motifs, and potentially the protein heterodimers that were reported to bind them (Jolma et al., 2015), are important for vessel growth and maintenance. These potential TF combinations, like the FOX-ETS combination, may act as a transcriptional code to create regulatory specificity at individual enhancers and their associated genes. Experimental validation of these hypotheses will be a fruitful direction for future studies.

Limitations

A limitation of our current protocol is the need for several million cells to obtain robust ChIP-seq signal. Optimization of chromatin pulldown and purification through application of streamlined protocols or microfluidics (Cao et al., 2015), combined with the use of improved library preparation methods that work on smaller quantities of starting material, will likely overcome this limitation. Another limitation of our strategy is that Ep300 decorates many, but not all, active transcriptional enhancers (He et al., 2011), and therefore Ep300 bioChIP-seq will not comprehensively detect all enhancers. Our strategy does not directly permit profiling of other chromatin features in a Cre-targeted lineage. This limitation could be overcome by developing additional proteins labeled with the bio-epitope. For instance, by bio-tagging histone H3, non-tissue restricted ChIP for a feature of interest could be followed by sequential high affinity histone H3 bioChIP. Finally, our technique's lineage specificity is dependent on the properties of the Cre allele used (Ma et al., 2008), and users must be cognizant of the cell labeling pattern of the Cre allele that they choose.

Materials and methods

Mice

Animal experiments were performed under protocols approved by the Boston Children's Hospital Animal Care and Use Committee (protocols 13-08-2460R and 13-12-2601). Ep300^fb mice were generated by homologous recombination in embryonic stem cells (ESCs). Targeted ESCs were used to generate a mouse line, which was bred to homozygosity. This line has been donated to Jackson Labs (Jax 025980). The Rosa26^fsBirA allele was derived from the previously described Rosa26^fsTRAP mouse (Zhou et al., 2013) (Jax 022367) by removal of the frt-TRAP-frt cassette using germline Flp recombination. The Rosa26^BirA (Driegen et al., 2005) (constitutive; Jackson Labs 010920), Tie2Cre (Kisanuki et al., 2001) (Jackson Labs 008863), Myf5Cre (Tallquist et al., 2000) (Jackson Labs 007893), EIIaCre (Williams-Simons and Westphal, 1999) (Jackson Labs 003724), Rosa26^mTmG (Muzumdar et al., 2007) (Jackson Labs 007576) and VEcad-CreERT2 (Sörensen et al., 2009) (Taconic 13073) lines were described previously.

Genotyping primers
Name		Sequence (5'->3')	Comments
Ep300fb-f		AATGCTTTCACAGCTCGC	0.28 kb for wild-type, 0.43 kb for Ep300fb knockin
Ep300fb-r		AAACCATAAATGGCTACTGC
Forward common		CTCTGCTGCCTCCTGGCTTCT	Rosa26-fs-BirA, 0.33 kb for wildtype, 0.25 kb for knockin
Wild type reverse		CGAGGCGGATCACAAGCAATA
CAG reverse		TCAATGGGCGGGGGTCGTT
LacZ-f		CAATGCTGTCAGGTGCTCTCACTACC	0.42 kb, genotyping of transient transgenic
LacZ-r		GCCACTTCTTGATGCTCCACTTGG
Primers to amplify Ep300 peak regions for transient transgenic assay. Four nucleotides (CACC) were added to the 5' end of all forward primers for TOPO cloning.
Name	Sequence (5'->3')
Apln_f	CACCGGAGGCTGAGCAATGAATAG
Apln_r	TTGGCTGGGGAAGAGTAAGC
Aplnr_f	CACCTCTCTCTCTGGCTTCG
Aplnr_r	CCTCAGAATGTTTTCATGG
Dab2_f	CACCGTGGAAATCATAGCAC
Dab2_r	GGTTGGAATAAAAGAGC
Egfl7_Enh1_f	CACCGCCTACCCAGTGCTGTTCC
Egfl7_Enh1_r	CTGGAGTGGAGTGTCACG
Egfl7_Enh2_f	CACCGCTAGGGGCTTCTAGTTC
Egfl7_Enh2_r	AGGTCTCTTCTGTGTCG
Egfl7_Enh3_f	CACCTGTTAGTGGTGCTCCC
Egfl7_Enh3_r	TCCAAGGTCACAAAGC
Emcn_f	CACCAGCACACCTCGTAAAATGG
Emcn_r	GAGTGAAGTAAGACATCGTCC
Eng_f	CACCAAACTAATTAAAAAACAAAGCAGGT
Eng_r	CATATGTACATTAGAACCATCCA
Ephb4_f	CACCTGGGTCTCATCAACCGAAC
Ephb4_r	CCTATCTACATCAGGGCACTG
Ets1_f	CACCTTCGTCAGAAATGATCTTGCCA
Ets1_r	TAGCAAGAGAGCCTGGTCAG
Foxc1_f	CACCTCTCTGCTTCAAGGCACCTT
Foxc1_r	TGGATAGCATGCAGAGGACA
Gata2_f	CACCTTCTCTTGGGCCACACAGA
Gata2_r	ATCTGCTCCACTCTCCGTCA
Lmo2_f	CACCTGGTTTTGCTTGCTAC
Lmo2_r	CATTTCTAAGTCTCCAC
Lyve1_f	CACCTACTGCCATGGAGGACTG
Lyve1_r	AGACACCTGGCTGCCTGATA
Mef2c_f	CACCGGAGGATTAAAAATTCCCC
Mef2c_r	CCTCTTAAATGTACGTG

Author details

Pingzhu Zhou (contributed equally)
Fei Gu (contributed equally)
Lina Zhang
Brynn N Akerberg
Qing Ma
Kai Li
Aibin He
Zhiqiang Lin
Sean M Stevens
Bin Zhou
William T Pu (for correspondence)
