
Mapping cell type-specific transcriptional enhancers using high affinity, lineage-specific Ep300 bioChIP-seq

  1. Pingzhu Zhou
  2. Fei Gu
  3. Lina Zhang
  4. Brynn N Akerberg
  5. Qing Ma
  6. Kai Li
  7. Aibin He
  8. Zhiqiang Lin
  9. Sean M Stevens
  10. Bin Zhou
  11. William T Pu (corresponding author)
  1. Boston Children’s Hospital, United States
  2. Shanghai University of Traditional Chinese Medicine, China
  3. Peking University, China
  4. Chinese Academy of Sciences, China
  5. Harvard University, United States
Tools and Resources
Cite as: eLife 2017;6:e22039 doi: 10.7554/eLife.22039

Abstract

Understanding the mechanisms that regulate cell type-specific transcriptional programs requires developing a lexicon of their genomic regulatory elements. We developed a lineage-selective method to map transcriptional enhancers, regulatory genomic regions that activate transcription, in mice. Since most tissue-specific enhancers are bound by the transcriptional co-activator Ep300, we used Cre-directed, lineage-specific Ep300 biotinylation and pulldown on immobilized streptavidin followed by next generation sequencing of co-precipitated DNA to identify lineage-specific enhancers. By driving this system with lineage-specific Cre transgenes, we mapped enhancers active in embryonic endothelial cells/blood or skeletal muscle. Analysis of these enhancers identified new transcription factor heterodimer motifs that likely regulate transcription in these lineages. Furthermore, we identified candidate enhancers that regulate adult heart- or lung-specific endothelial cell specialization. Our strategy for tissue-specific protein biotinylation opens new avenues for studying lineage-specific protein-DNA and protein-protein interactions.

https://doi.org/10.7554/eLife.22039.001

Introduction

The diverse cell types of a multicellular organism share the same genome but express distinct gene expression programs. In mammals, precise cell type-specific regulation of gene expression depends on transcriptional enhancers, non-coding regions of the genome required to activate expression of their target genes (Visel et al., 2009a; Bulger and Groudine, 2011). Enhancers are bound by transcription factors and transcriptional co-activators, which then contact RNA polymerase II engaged at the promoter, stimulating gene transcription.

Enhancers are nodal points of transcriptional networks, integrating multiple upstream signals to regulate gene expression. Because enhancers do not have defined sequence or location with respect to their target genes, mapping enhancers is a major bottleneck for delineating transcriptional networks. Recently, chromatin immunoprecipitation of enhancer features followed by sequencing (ChIP-seq) has been used to map potential enhancers. DNase hypersensitivity (Crawford et al., 2006; Thurman et al., 2012), H3K27ac (histone H3 acetylated on lysine 27) occupancy (Creyghton et al., 2010; Nord et al., 2013), or H3K4me1 (histone H3 mono-methylated on lysine 4) occupancy (Heintzman et al., 2007) are chromatin features that have been used to identify cell type-specific enhancers. While most enhancers are DNase hypersensitive, DNase hypersensitive regions are often not active enhancers (Crawford et al., 2006; Thurman et al., 2012). H3K27ac is enriched on cell type-specific enhancers (Creyghton et al., 2010; Nord et al., 2013), but may be a less accurate predictor of enhancers than other transcriptional regulators (Dogan et al., 2015). Chromatin occupancy of Ep300, a transcriptional co-activator that catalyzes H3K27ac deposition, has been found to accurately predict active enhancers (Visel et al., 2009b). However, antibodies for Ep300 are marginal for robust ChIP-seq, particularly from tissues, leading to low reproducibility, variation between antibody lots, and inefficient enhancer identification (Gasper et al., 2014).

Mammalian tissues are composed of multiple cell types, each with their own lineage-specific transcriptional enhancers. Thus, defining lineage-specific enhancers from mammalian tissues requires developing strategies that overcome the cellular heterogeneity of mammalian tissues, particularly when the lineage of interest comprises a small fraction of the cells in the tissue. Past efforts to surmount this challenge have taken the strategy of purifying nuclei from the cell type of interest using a lineage-specific tag. For instance, nuclei labeled by lineage-specific expression of a fluorescent protein have been purified by FACS (Bonn et al., 2012). This method is limited by the need to dissociate tissues and recover intact nuclei, and by the relatively slow rate of FACS and the need to collect millions of labeled nuclei. To circumvent the FACS bottleneck, cell type-specific overexpression of tagged SUN1, a nuclear envelope protein, has been used to permit affinity purification of nuclei (Deal and Henikoff, 2010; Mo et al., 2015). Although this mouse line was reported to be normal, SUN1 overexpression potentially could affect cell phenotype and gene regulation (Chen et al., 2012). Chromatin from isolated nuclei is then subjected to ChIP-seq to identify histone signatures of enhancer activity. However, as noted above, histone signatures may less accurately predict enhancer activity compared to occupancy by key transcriptional regulators (Dogan et al., 2015).

Here, we report an approach to identify murine enhancers active in a specific lineage within a tissue. We developed a knock-in allele of Ep300 in which the protein is labeled by the bio peptide sequence (de Boer et al., 2003; He et al., 2011). Cre recombinase-directed, cell type specific expression of BirA, an E. coli enzyme that biotinylates the bio epitope tag (de Boer et al., 2003), allows selective Ep300 ChIP-seq, thereby identifying enhancers active in the cell type of interest. Using this strategy, we identified thousands of endothelial cell (EC) and skeletal muscle lineage enhancers active during embryonic development. Extending the approach to adult organs, we defined adult EC enhancers, including enhancers associated with distinct EC gene expression programs in heart compared to lung. Analysis of motifs enriched in EC or skeletal muscle lineage enhancers predicted novel transcription factor motif signatures that govern EC gene expression.

Results

Efficient identification of enhancers using Ep300fb bioChIP-seq

We developed an epitope-tagged Ep300 allele, Ep300fb, in which FLAG and bio epitopes (de Boer et al., 2003; He et al., 2011) were knocked into the C-terminus of endogenous Ep300 (Figure 1A–B and Figure 1—figure supplement 1A). Transgenically expressed BirA (Driegen et al., 2005) biotinylates the bio epitope, permitting quantitative Ep300 pull down on streptavidin beads (Figure 1C). We have not noted abnormal phenotypes. Heart development and function are sensitive to Ep300 gene dosage (Shikama et al., 2003; Wei et al., 2008), yet Ep300fb/fb homozygous mice survived normally (Figure 1D) and Ep300fb/fb hearts expressed normal levels of Ep300 and had normal size and function (Figure 1—figure supplement 1B–E). These data indicate that Ep300fb is not overtly hypomorphic.

Figure 1 with 1 supplement
Generation and characterization of Ep300flbio allele.

(A) Experimental strategy for high affinity Ep300 pull down. A flag-bio epitope was knocked into the C-terminus of the endogenous Ep300 gene. The bio peptide sequence is biotinylated by BirA, widely expressed from the Rosa26 locus. (B) Targeting strategy to knock the flag-bio epitope into the C-terminus of Ep300. A targeting vector containing homology arms, flag-bio epitope, and Frt-neo-Frt cassette was used to insert the epitope tag into embryonic stem cells by homologous recombination. Chimeric mice were mated with Act:Flpe mice to excise the Frt-neo-Frt cassette in the germline, yielding the Ep300fb allele. (C) Biotinylated Ep300 is quantitatively pulled down by streptavidin beads. Protein extract from Ep300flbio/flbio; Rosa26BirA embryos was incubated with immobilized streptavidin. Input, bound, and unbound fractions were analyzed for Ep300 by immunoblotting. GAPDH was used as an internal control. (D) Ep300flbio/flbio mice from heterozygous intercrosses survived normally to weaning.

https://doi.org/10.7554/eLife.22039.002

To evaluate Ep300fb-based mapping of Ep300 chromatin occupancy, we isolated embryonic stem cells (ESCs) from Ep300fb/fb; Rosa26BirA/BirA mice. We then performed Ep300fb biotin-mediated chromatin precipitation followed by sequencing (bioChIP-seq), in which high affinity biotin-streptavidin interaction is used to pull down Ep300 and its associated chromatin (He et al., 2011). Biological duplicate sample signals and peak calls correlated well (93.6% overlap; Spearman r = 0.96; Figure 2A–B). We compared the results to publicly available Ep300 antibody ChIP-seq data generated by ENCODE (overlap between duplicate peaks 77.8%; r = 0.91; Figure 2A–B). Ep300 bioChIP-seq identified 48963 Ep300-bound regions (‘Ep300 regions’) shared by both replicates, compared to 15281 for Ep300 antibody ChIP-seq (Figure 2A,C). The large majority (89.6%) of Ep300 regions detected by antibody were also found by Ep300 bioChIP-seq, and Ep300 signal was substantially stronger using bioChIP-seq (Figure 2A,C,D). These data indicate that Ep300fb bioChIP-seq has improved sensitivity compared to Ep300 antibody ChIP-seq for mapping Ep300 chromatin occupancy in cultured cells.
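The Spearman correlation quoted for replicate signal can be illustrated with a minimal sketch. The signal values and the function names `average_ranks` and `spearman` are hypothetical, chosen only for illustration; this is not the authors' analysis pipeline, and the way signal was summarized per peak is not assumed here.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy per-peak signal for two hypothetical replicates (not real data).
rep1_signal = [5.0, 8.0, 12.0, 20.0, 33.0]
rep2_signal = [7.0, 6.0, 15.0, 11.0, 40.0]
rho = spearman(rep1_signal, rep2_signal)
print(f"Spearman r = {rho:.2f}")
```

Because the statistic depends only on ranks, it is insensitive to differences in sequencing depth that rescale the signal monotonically, which is one reason it suits replicate comparisons.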

Figure 2
Comparison of Ep300 bioChIP-seq to antibody ChIP-seq for mapping Ep300 chromatin occupancy.

(A–B) Comparison of biological duplicate antibody Ep300 ChIP-seq (Encode) and Ep300 bioChIP-seq (flbio). The Ep300 bioChIP-seq data had greater overlap between replicates and greater intragroup correlation. Most antibody peaks were covered by the bioChIP-seq data. There were 3.3 times more Ep300 regions detected by Ep300 bioChIP-seq. 89.6% of Ep300 regions detected by antibody ChIP-seq were recovered by Ep300 bioChIP-seq. Panel B shows Spearman correlation between samples over the peak regions. (C) Tag heatmap shows input-subtracted Ep300 antibody or bioChIP signal in the union of the Ep300-bound regions detected by each method. (D) Correlation plots show greater Ep300 bioChIP-seq signal compared to antibody ChIP-seq.

https://doi.org/10.7554/eLife.22039.004

Identification of tissue-specific enhancers using Ep300fb bioChIP-seq

We used Ep300fb/+; Rosa26BirA/+ mice to analyze Ep300fb genome-wide occupancy in embryonic heart and forebrain. We performed bioChIP-seq on heart and forebrain from embryonic day 12.5 (E12.5) embryos in biological duplicate (Figure 3A–B). There was high reproducibility (83% and 93%, respectively) between biological duplicates (Figure 3B and Figure 3—figure supplement 1A). In comparison, published Ep300 antibody ChIP-seq from E11.5 heart and forebrain (Visel et al., 2009b) had lower signal-to-noise and yielded few peaks when analyzed using the same peak detection algorithm (MACS2 [Zhang et al., 2008]). Using the originally published peaks, antibody-based Ep300 ChIP-seq yielded 9.5-fold or 3.0-fold fewer Ep300 regions in heart and forebrain, respectively (Figure 3B). These regions overlapped 58.7% and 64.7% of the Ep300fb bioChIP-seq regions, suggesting that the epitope-tagged allele has superior sensitivity and specificity for mapping Ep300-bound regions in tissues, as it does in cultured cells.

Figure 3 with 3 supplements
Tissue-specific Ep300 bioChIP-seq.

(A) Tag heatmap showing Ep300fb signal in heart and forebrain. Each row represents a region that was bound by Ep300 in heart, forebrain, or both. A minority of Ep300-enriched regions were shared between tissues. (B) Ep300fb pull down from E12.5 heart or forebrain. Ep300-bound regions identified by bioChIP-seq of Ep300fb/fb; Rosa26BirA tissues in biological duplicates are compared to regions identified by antibody-mediated Ep300 ChIP-seq (Visel et al., 2009b). (C) GO biological process terms enriched for genes that neighbor the tissue-specific heart or forebrain peaks. Bars indicate statistical significance.

https://doi.org/10.7554/eLife.22039.005

We compared Ep300fb regions from forebrain and heart (Supplementary file 1). Only a minority of Ep300fb regions (8.9% for heart and 31.3% for brain) were common between tissues (Figure 3A). Viewing Ep300fb bioChiP-seq signal at genes selectively expressed in heart or brain confirmed robust tissue-specific differences that overlapped enhancers with known tissue-specific activity (Figure 3—figure supplement 1B). Genes neighboring the Ep300fb occupied regions specific to heart or forebrain were enriched for gene ontology (GO) functional terms relevant to the respective tissue (Figure 3C). These results reinforce the conclusion that Ep300 occupies tissue-specific enhancers and indicate that this conclusion was not a consequence of insensitive detection of Ep300-occupied regions in earlier studies (Visel et al., 2009b).

Ep300 is a histone acetyltransferase, and one of its enzymatic products is histone H3 acetylated on lysine 27 (H3K27ac). We compared the genome-wide signal of Ep300fb and H3K27ac in E12.5 heart and forebrain (Figure 3—figure supplement 1C and data not shown). There was a high correlation between biological replicates (r = 0.98). Ep300 was also well correlated with H3K27ac (r = 0.64), independently validating the Ep300fb bioChIP-seq data. The previously published Ep300 antibody ChIP-seq data (Visel et al., 2009b) was less well correlated to H3K27ac (r = 0.37), although the correlation was highly statistically significant (p<0.0001). Interestingly, 26.4% and 52.9% of heart and brain H3K27ac regions were shared between tissues (Figure 3—figure supplement 1D) compared to 8.9% and 31.3% for Ep300fb heart and brain regions, respectively (Figure 3A), suggesting that Ep300fb occupancy is more tissue-specific.

We analyzed the prediction of active enhancers by our Ep300fb bioChIP-seq data. The VISTA Enhancer database (Visel et al., 2007) contains thousands of genomic regions that have been tested for tissue-specific enhancer activity using an in vivo transient transgenic assay. Of the tested regions, 185 had heart activity, and 130 (70%) of these overlapped Ep300fb regions that were reproduced in both biological duplicates. In comparison, only 105 (57%) of these regions overlapped the regions previously reported to be bound by Ep300 using antibody ChIP-seq.
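The comparison above amounts to asking what fraction of validated enhancers is overlapped by at least one called region. The sketch below shows that calculation on toy intervals; the names `overlaps` and `count_recovered`, the half-open coordinate convention, and the toy coordinates are assumptions made for illustration, while the counts in `reported` are the ones quoted in the text.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_recovered(validated: List[Interval], peaks: List[Interval]) -> int:
    """Count validated enhancers overlapped by one or more peaks."""
    return sum(1 for v in validated if any(overlaps(v, p) for p in peaks))


# Toy data, not taken from the study.
validated_enhancers = [
    ("chr1", 100, 600),
    ("chr1", 5000, 5400),
    ("chr2", 300, 900),
    ("chr3", 50, 250),
]
peaks = [
    ("chr1", 550, 700),
    ("chr2", 100, 301),
    ("chr2", 2000, 2100),
    ("chr3", 250, 400),  # adjacent to, but not overlapping, chr3:50-250
]
recovered = count_recovered(validated_enhancers, peaks)
sensitivity = recovered / len(validated_enhancers)
print(f"{recovered}/{len(validated_enhancers)} recovered ({sensitivity:.0%})")

# The same arithmetic applied to the counts reported in the text.
reported = {"bioChIP-seq": (130, 185), "antibody ChIP-seq": (105, 185)}
for method, (hits, total) in reported.items():
    print(f"{method}: {hits}/{total} = {100 * hits / total:.1f}%")
```

The nested scan is quadratic and is only suitable for small examples; genome-scale region sets are normally intersected with sorted sweeps or an interval-tree based tool.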

Recently human and mouse Ep300 and H3K27ac ChIP-seq data from fetal and adult heart were combined to yield a ‘compendium’ of heart enhancers, with the strength of ChIP-seq signal used to provide an ‘enhancer score’ ranging from 0 to 1 that correlated with the likelihood of regions covered in the VISTA database to show heart activity (Dickel et al., 2016). We compared our heart Ep300 regions to this compendium. Overall, 9438/72508 (13%) regions in the prenatal compendium overlapped with the Ep300 heart regions (Figure 3—figure supplement 2A). However, the overlap frequency increased markedly for regions with higher enhancer scores (Figure 3—figure supplement 2B). For example, if one considers the 3571 compendium regions with an enhancer score of at least 0.4 (corresponding to a validation rate in the VISTA database of ~25%), 2647 (74.1%) were contained within the heart Ep300 regions, and 63/68 (92.6%) regions with a score of at least 0.8 (validation rate ~43%) overlapped a heart Ep300 region. Thus, heart compendium regions that are more likely to have in vivo heart activity are largely covered by heart Ep300 regions. On the other hand, 10752 (53%) heart Ep300 regions were not covered by the compendium, suggesting that this database is incomplete, potentially as a result of its use of incomplete antibody-based Ep300 ChIP-seq data.

Ep300 antibody ChIP-seq was one of the criteria used to select some of the test regions in the VISTA Enhancer database; as an independent test free of this potentially confounding effect, we searched the literature for other heart enhancers that were confirmed using the transient transgenic assay. We identified 40 additional heart enhancers. Of these, 24 (60%) intersected the Ep300fb regions common to both replicates. In comparison, only 6/40 (15%) intersected the regions detected previously by Ep300 antibody ChIP-seq. Few heart enhancers were found in the regions unique to Ep300 antibody ChIP-seq (11/185 VISTA and 2/40 non-VISTA), compared to regions unique to Ep300fb bioChIP-seq (36/185 VISTA and 20/40 non-VISTA). We conclude that Ep300fb ChIP-seq predicts heart enhancers with sensitivity that is superior to antibody-mediated Ep300 ChIP-seq.

Other chromatin features have been used to predict transcriptional enhancers. We compared the accuracy of Ep300 bioChIP-seq to other chromatin features for heart enhancer prediction. To map accessible chromatin, we performed ATAC-seq (assay for transposase-accessible chromatin followed by sequencing) on E12.5 cardiomyocytes. E12.5 heart ChIP-seq data for modified histones (H3K27ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27me3, H3K9me3, H3K36me3) were obtained from publicly available datasets (see Materials and methods, Data Sources). Using a machine learning approach and the VISTA enhancer database as the gold standard, we evaluated the accuracy of each of these chromatin features, compared to Ep300 bioChIP-seq, for predicting heart enhancers (Figure 3—figure supplement 3). This analysis showed that Ep300 bioChIP-seq was the single most predictive chromatin feature (area under the receiver operating characteristic curve (AUC) = 0.805). ATAC-seq and H3K27ac also performed well (AUC = 0.749 and 0.747, respectively), whereas H3K4me1 was poorly predictive (AUC = 0.589). Combining Ep300 bioChIP-seq with ATAC-seq improved predictive accuracy (AUC = 0.866), equivalent to the value obtained by performing predictions with all of the chromatin features (AUC = 0.862). These analyses indicate that, of the features tested, Ep300 is the best single factor for enhancer prediction.
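For readers less familiar with the AUC metric used above, the sketch below computes it for a single feature from first principles: the AUC equals the probability that a randomly chosen positive region scores higher than a randomly chosen negative one, with ties counted as one half. The classifier used in the study is not described in this section, so this only illustrates the metric; `roc_auc` and the toy scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a validated enhancer (positive), 0 for a tested negative region.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy feature scores (e.g. normalized signal) and gold-standard labels.
feature_scores = [0.90, 0.80, 0.40, 0.35, 0.10]
gold_labels = [1, 1, 0, 1, 0]
auc = roc_auc(feature_scores, gold_labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a feature that ranks positives no better than chance, which puts the reported value for H3K4me1 (0.589) in context relative to Ep300 (0.805).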

Cre-activated, lineage-specific Ep300fb bioChIP-seq

In vivo biotinylation of Ep300fb requires co-expression of the biotinylating enzyme BirA. We reasoned that Ep300fb bioChIP-seq could be targeted to a Cre-labeled lineage by making BirA expression Cre-dependent. Therefore, we established Rosa26fsBirA, in which BirA expression is contingent upon Cre excision of a floxed-stop (fs) cassette (Figure 4A). In preliminary experiments, we showed that Rosa26fsBirA expression of BirA was Cre-dependent (Figure 4—figure supplement 1A), as was Ep300fb biotinylation (Figure 4B). When activated by Cre driven from Tek regulatory elements (Tg(Tek-cre)1Ywa/J; also known as Tie2Cre), BirA was expressed in endothelial and blood lineages (Figure 4C), consistent with this Cre transgene's labeling pattern. Thus, Rosa26fsBirA expresses BirA in a Cre-dependent manner.

Figure 4 with 1 supplement
Cre-directed, lineage-selective Ep300 bioChIP-seq.

(A) Experimental strategy. Lineage-specific Cre recombinase activates expression of BirA (HA-tagged) from Rosa26-flox-stop-BirA (Rosa26fsBirA). This results in Ep300 biotinylation in the progeny of Cre-expressing cells. (B) Cre-dependent Ep300 biotinylation using Rosa26fsBirA. Protein extracts were prepared from E11.5 embryos with the indicated genotypes. R26BirA/+ (ca) and R26fsBirA/+ indicate the constitutively active and Cre-activated alleles, respectively. (C) Immunostaining demonstrating selective Tie2Cre-mediated expression of HA-tagged BirA in ECs of R26fsBirA; Tie2Cre embryos. Arrows and arrowheads indicate endothelial and hematopoietic lineages, respectively. (D) Tissue-selective Ep300 bioChIP-seq. Tie2Cre (T; endothelial and hematopoietic lineages), Myf5Cre (M; skeletal muscle lineages), and EIIaCre (E; germline activation) were used to drive tissue-selective Ep300 bioChIP-seq. Correlation between ChIP-seq signals within peak regions is shown for triplicate biological repeats. Samples within groups were the most closely correlated. (E) Ep300fb bioChIP-seq signal at Gata2 (EC/blood specific) and Myod (muscle specific). Enhancers validated by transient transgenic assay are indicated along with the citation’s PubMed identifier (PMID). (F) Biological process GO terms illustrate distinct functional groups of genes that neighbor Ep300 bioChIP-seq regions driven by lineage-specific Cre alleles. The heatmap contains the top 10 terms enriched for genes neighboring each of the three lineage-selective Ep300 regions.

https://doi.org/10.7554/eLife.22039.009

Next, we compared Ep300fb bioChIP-seq from embryos when driven by Tie2Cre (endothelial and blood lineages) (Kisanuki et al., 2001), Myf5tm3(cre)Sor/J (referred to as Myf5Cre; skeletal muscle lineage) (Tallquist et al., 2000), or Tg(EIIa-cre)C5379Lmgd/J (also known as EIIaCre; ubiquitous) (Williams-Simons and Westphal, 1999). For the Tie2Cre and EIIaCre samples, we used E11.5 embryos, a stage with robust angiogenesis. For Myf5Cre, we used E13.5 embryos, when muscle lineage cells are in a range of stages in the muscle differentiation program, spanning muscle progenitors to differentiated muscle fibers. BioChiP-seq from biological triplicates showed high within-group correlation, and lower between-group correlation, demonstrating the strong effect of different Cre transgenes in directing Ep300fb bioChIP-seq (Figure 4D). Viewing the bioChiP-seq signals in a genome browser confirmed lineage-selective signal enrichment. For example, Tie2Cre drove high Ep300fb bioChIP-seq signal at a Gata2 intronic enhancer with known activity in endothelial and blood lineages (Figure 4E, top panel). There was less signal at this region in Myf5Cre and EIIaCre samples. At the skeletal muscle specific gene Myod, Myf5Cre drove strong Ep300fb bioChiP-seq signal at a known distal enhancer (Goldhamer et al., 1992), as well as a second Ep300 bound region about 12 kb upstream from the transcriptional start site.

To identify lineage-selective regions genome-wide, we filtered for regions with called peaks in which the lineage-specific Cre (Tie2Cre or Myf5Cre) Ep300fb signal was at least 1.5 times the ubiquitous Cre (EIIaCre) Ep300fb signal (Figure 4—figure supplement 1B–C). This led to the identification of 2411 regions with enriched signal in Tie2Cre (Ep300-T-fb) and 1292 regions with enriched signal in Myf5Cre (Ep300-M-fb), compared to 17382 regions with Ep300fb occupancy detected with ubiquitous biotinylation (Ep300-E-fb; Supplementary file 1). We analyzed the biological process gene ontology terms enriched for genes neighboring these three sets of Ep300 regions (Figure 4F). Ep300-T-fb regions were highly enriched for functional terms related to angiogenesis and hematopoiesis, whereas the Ep300-M-fb regions were highly enriched for functional terms related to skeletal muscle. Together, these results indicate that our strategy for Cre-driven, lineage-specific, Ep300fb bioChiP-seq successfully identifies regulatory regions that are associated with lineage relevant biological processes.
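The 1.5-fold filter described above can be sketched as follows. The `PeakSignal` record, its field names, and the toy values are hypothetical, and how the signals were normalized before comparison is not assumed here; the example only shows the thresholding step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PeakSignal:
    chrom: str
    start: int
    end: int
    lineage_signal: float     # e.g. Tie2Cre- or Myf5Cre-driven Ep300fb signal
    ubiquitous_signal: float  # EIIaCre-driven Ep300fb signal


def lineage_enriched(peaks: List[PeakSignal], min_ratio: float = 1.5) -> List[PeakSignal]:
    """Keep called peaks whose lineage signal is at least min_ratio times the ubiquitous signal."""
    # Multiplying instead of dividing avoids a special case when the
    # ubiquitous signal is zero. Input is assumed to be called peaks,
    # so the lineage signal itself is expected to be positive.
    return [p for p in peaks if p.lineage_signal >= min_ratio * p.ubiquitous_signal]


toy_peaks = [
    PeakSignal("chr6", 1000, 1500, 30.0, 10.0),  # 3.0-fold: kept
    PeakSignal("chr6", 4000, 4400, 14.0, 10.0),  # 1.4-fold: dropped
    PeakSignal("chr2", 200, 700, 15.0, 10.0),    # exactly 1.5-fold: kept
    PeakSignal("chr9", 900, 1300, 8.0, 0.0),     # no ubiquitous signal: kept
]
selected = lineage_enriched(toy_peaks)
for p in selected:
    print(p.chrom, p.start, p.end)
```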

Functional validation of enhancer activity of lineage-specific Ep300fb regions

We next set out to validate the in vivo enhancer activity of the regions with Cre-driven lineage-enriched Ep300fb occupancy. If a substantial fraction of the Ep300-T-fb regions have transcriptional enhancer activity, then genes neighboring these enhancers should be expressed at higher levels in Tie2Cre lineage cells. To test this hypothesis, we used Tie2Cre-activated translating ribosome affinity purification (Heiman et al., 2008; Zhou et al., 2013) (T2-TRAP) to obtain the gene expression of Tie2Cre-marked cell lineages. Using this lineage-specific expression profile, we then compared the expression of genes neighboring Ep300-T-fb, Ep300-M-fb, and Ep300-E-fb regions. Ep300-T-fb neighboring genes were more highly expressed compared to Ep300-E-fb neighboring genes (p<10^−38, Mann-Whitney U-test; Figure 5A). In contrast, there was no significant difference between Ep300-M-fb and Ep300-E-fb neighboring genes (Figure 5A). This result held regardless of the maximal distance threshold used to find the gene nearest to an Ep300 region (Figure 5—figure supplement 1A). We also compared the expression of genes with and without an associated Ep300-T-fb region. Ep300-T-fb-associated genes were more highly expressed than non-associated genes (Figure 5—figure supplement 1B). Some Ep300-T-fb-associated genes were not detected within actively translating transcripts (Figure 5—figure supplement 1C). This suggests that in some cases Ep300 enhancer binding is not sufficient to drive gene expression; other contributing factors likely include imprecision of the enhancer-to-gene mapping rule, and regulation at the level of ribosome binding to transcripts. Together, our data are consistent with Ep300-T-fb regions being enriched for enhancers that are active in the Tie2Cre-labeled lineage.
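The Mann-Whitney U-test used for these expression comparisons can be sketched in a few lines. This version uses the normal approximation with a tie correction and no continuity correction, which is an assumption made for brevity and may differ from the implementation used in the study; the function name and the toy expression values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Tag each value with its group (0 = x, 1 = y) and sort by value.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average rank shared by the tied run
        run = j - i + 1
        tie_term += run ** 3 - run
        rank_sum_x += mean_rank * sum(1 for k in range(i, j + 1) if combined[k][1] == 0)
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Toy expression values for genes near lineage-enriched vs ubiquitous regions.
lineage_genes = [5.0, 6.0, 7.0, 8.0]
ubiquitous_genes = [1.0, 2.0, 3.0, 4.0]
u_stat, p = mann_whitney_u(lineage_genes, ubiquitous_genes)
print(f"U = {u_stat}, p = {p:.4f}")
```

The normal approximation is reasonable for the thousands of genes compared in the study but is only rough for very small groups such as this toy example, where an exact test would normally be preferred.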

Figure 5 with 3 supplements
Functional validation of enhancer activity of Ep300-T-fb-bound regions.

(A) Expression of genes neighboring regions bound by Ep300fb in different Cre-marked lineages. Translating ribosome affinity purification (TRAP) was used to enrich for RNAs from the Tie2Cre lineage. Input or Tie2Cre-enriched RNAs were profiled by RNA sequencing. The expression of the nearest gene neighboring Ep300 regions in ECs (Ep300-T-fb regions), but not skeletal muscle (Ep300-M-fb regions), was higher than that of genes neighboring regions bound by Ep300 across the whole embryo (EIIaCre). Box and whiskers show quartiles and 1.5 times the interquartile range. Groups were compared to EIIaCre using the Mann-Whitney U-test. (B) Transient transgenesis assay to measure in vivo activity of Ep300-T-fb regions. Test regions were positioned upstream of an hsp68 minimal promoter and lacZ. Embryos were assayed at E11.5. (C) Summary of transient transgenic assay results. Out of 20 regions tested, nine showed activity in ECs or blood in three or more embryos, and two more showed activity in two embryos. See also Table 2. (D) Representative whole mount Xgal-stained embryos. Enhancers that directed LacZ expression in an EC or blood pattern in two or more embryos are shown. Numbers indicate embryos with LacZ distribution similar to shown image, compared to the total number of PCR positive embryos. (E) Sections of Xgal-stained embryos showing examples of enhancers active in arteries, veins, and endocardium, or selectively active in arteries or veins. AS: aortic sac; CV: cardinal vein; DA: dorsal aorta; EC: endocardial cushion; HV: head vein; LA: left atrium; LV: left ventricle; RV: right ventricle. Scale bars, 100 µm. See also Figure 5—figure supplements 1 and 2 and Table 2.

https://doi.org/10.7554/eLife.22039.011

To further evaluate the enhancer activity of Ep300-T-fb regions, we searched the literature and the VISTA Enhancer database (Visel et al., 2007) for genomic regions with endothelial cell activity validated by transient transgenesis (Table 1). Of 40 positive regions identified, 19 (47.5%) overlapped with Ep300-T-fb regions. Next, we used the transient transgenic assay to test the lineage-selective enhancer activity of 20 additional Ep300-T-fb regions. We selected regions that neighbored genes with EC-selective expression (T2-TRAP more than 10-fold enriched over input RNA) and with known or potential relevance to angiogenesis. Of the 20 tested Ep300-T-fb regions, eight drove reporter gene activity in at least three embryos in a vascular pattern, in both whole mounts and histological sections, and two additional regions drove reporter gene activity in a vascular pattern in two embryos (Figure 5B–D; Table 2; Figure 5—figure supplements 2–3). In retrospect, two of the positive enhancers (Eng; Mef2c) had been described previously (Table 2) (Pimanda et al., 2006; De Val et al., 2008). In some of these cases, there was also activity in blood cells, consistent with Tie2Cre activity in both blood and endothelial lineages. Additionally, one enhancer of Lmo2 was active in blood cells but not ECs. Thus, a substantial fraction (9/20; 45%) of regions identified by lineage-selective, Cre-directed bioChIP-seq have appropriate and reproducible in vivo activity.

Table 1

mm9 genome coordinates of regions with EC activity as determined by transient transgenic assay. Vista_XXX indicates that the region was obtained from the VISTA enhancer database. Liftover indicates that the region was inferred by liftover from the human genome. For enhancers obtained from the literature, PubMed was searched for ‘endothelial cell enhancer’. The resulting references were manually curated for transient transgenic testing of candidate endothelial cell enhancer regions.

https://doi.org/10.7554/eLife.22039.015
Chr    Start      End        Note
chr9   37206631   37209631   Robo4;PMID17495228;bloodVessels
chr4   94412022   94413640   Tek;PMID9096345;bloodVessels
chr8   106625634  106626053  Cdh5;PMID15746076;bloodVessels
chr11  49445663   49446520   Flt4;posEC;liftoverFromPMID19070576;FoxETS
chr6   99338223   99339713   FoxP1;posEC;liftoverFromPMID19070576;FoxETS
chr8   130910720  130911740  Nrp1;posEC;liftoverFromPMID19070576;FoxETS
chr18  61219017   61219690   Pdgfrb;posEC;liftoverFromPMID19070576;FoxETS
chr4   137475841  137476446  Ece1;posEC;liftoverFromPMID19070576;FoxETS;artery
chr4   114743698  114748936  Tal1;PMID14966269;endocardium;bloodVessels
chr2   119152861  119153661  Dll4;PMID23830865;arterial
chr13  83721919   83721962   Mef2c;PMID19070576;FoxETS
chr13  83711086   83711527   Mef2c;PMID15501228;panEC
chr2   32493213   32493467   Eng;liftoverFromPMID16484587;bloodVessels
chr2   32517844   32518197   Eng;liftoverFromPMID18805961;bloodVessels;blood
chr6   88152598   88153791   Gata2;PMID17395646;PMID17347142;bloodVessels
chr9   32337302   32337549   Fli1;PMID15649946;bloodVessels
chr5   76370571   76372841   KDR;PMID10361126;bloodVessels
chr6   125502138  125502981  VWF;PMID20980682;smallbloodVessels
chr5   148537311  148538294  Flt1;liftoverFromPMID19822898
chr17  34701455   34702277   Notch4;liftoverFromPMID15684396
chr2   155568649  155569132  Procr;liftoverfromPMID16627757;bloodVessels[7/17]
chr9   63916890   63917347   Smad6;liftoverfromPMID17213321;bloodVessels
chr19  37510161   37510394   Hhex;liftoverfromPMID15649946;bloodVessels;blood
chr5   76357892   76358715   Flk1;PMID27079877;bloodVessels;artery
chr11  32145270   32146411   vista_101;heart[9/12];bloodVessels[7/12]
chr13  28809515   28811310   vista_265;neural tube[7/8];bloodVessels[3/8]
chr17  12982583   12985936   vista_89;heart[6/10];bloodVessels[8/10]
chr4   131631431  131635142  vista_80;bloodVessels[10/10]
chr4   57858433   57860639   vista_261;bloodVessels[5/8]
chr5   93426293   93427320   vista_397;limb[12/13];bloodVessels[12/13]
chr8   28216799   28218903   vista_136;heart[7/10];other[6/10];bloodVessels[4/10]
chr3   87786601   87790798   liftoverFromvista_1891;somite[7/7];midbrain (mesencephalon)[6/7];limb[7/7];branchial arch[7/7];eye[7/7];heart[7/7];ear[7/7];bloodVessels[6/7]
chr6   116309006  116309768  liftoverFromvista_2065;bloodVessels[9/9]
chr19  37566512   37571839   liftoverFromvista_1866;bloodVessels[5/5]
chr7   116256647  116260817  liftoverFromvista_1859;neuraltube[8/8];hindbrain (rhombencephalon)[5/8];midbrain (mesencephalon)[8/8];forebrain[8/8];heart[7/8];bloodVessels[5/8];liver[4/8]
chr18  14006285   14007917   liftoverFromvista_1653;bloodVessels[5/8]
chr2   152613921  152616703  liftoverFromvista_2050;bloodVessels[5/5]
chr14  32045520   32047900   liftoverFromvista_2179;bloodVessels[5/7]
chr15  73570041   73577576   liftoverFromvista_1882;bloodVessels[8/8]
chr8   28216817   28218896   liftoverFromvista_1665;heart[5/7];bloodVessels[7/7]
Table 2

Summary of transient transgenic validation of candidate EC enhancers.

https://doi.org/10.7554/eLife.22039.016
Neighboring geneRegion (mm9)Size (bp)Location w/r geneDistance to TSSWhole mountSectionsRef. (PMID)
#PCR pos#LacZ pos# EC or blood posEndoArtVeinBlood cells
AplnchrX:45358891–4535991810283'_Distal28,6241033+++
Dab2chr15:6009504–60104979945'_Distal−239,78824109+++++++
Egfl7_enh1chr2:26427040–264280299905'_Distal−9,041933++++++++++
Engchr2:32493216–324940198045'_Distal−8,4971243++++++16484587
Ephb4chr5:137789649–1377904127645'_Proximal−1,306754+++++
Lmo2chr2:103733621–1037343787585'_Distal−64,152291410+++
Mef2cchr13:83721522–83722451930Intragenic78,9541065+++++19070576
Notch1_enh1chr2:26330255–26331184930Intragenic−28,6222233+++++++
Sema6dchr2:124380522–1243812857645'_Distal−55,12817107++++++
Egfl7_enh3chr2:26433680–264346429635'_Distal−2,4151222++++++
Sox7chr14:64576382–645771187373'_Distal14,207742+++
Aplnrchr2:85003436–850044129773'_Distal27,4071230NANANANA
Egfl7_enh2chr2:26431273–264319957235'_Proximal−4,942931+
Emcnchr3:136984933–13698610311715'_Distal−18,5241511+
Ets1chr9:32481485–32482133649Intragenic−21,8181110NANANANA
Foxc1chr13:31921976–319228278523'_Distal23,8871331*NANANANA
Gata2chr6:88101907–881026967905'_Distal−46,356800NANANANA
Lyve1chr7:118020264–1180210437805'_Distal−3,128,028711++
Notch1_enh2chr2:26345973–263471181146Intragenic12,796900NANANANA
Sox18chr2:181397552–1813983357843'_Proximal84011321++
  1. *EC/blood pattern on whole mount not validated in histological sections.
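The "Size (bp)" column of Table 2 appears to follow directly from the region coordinates if both ends are counted (size = end - start + 1). The following is a minimal sketch of that check for four rows copied from the table. The inclusive-coordinate convention is an inference from the listed values, and the helper names `parse_region` and `region_size` are hypothetical.

```python
import re

# Region strings and listed sizes copied from Table 2 (mm9 coordinates).
TABLE2_REGIONS = {
    "Apln": ("chrX:45358891–45359918", 1028),
    "Dab2": ("chr15:6009504–6010497", 994),
    "Eng": ("chr2:32493216–32494019", 804),
    "Mef2c": ("chr13:83721522–83722451", 930),
}


def parse_region(region):
    """Split 'chr:start–end' (en dash or hyphen) into (chromosome, start, end)."""
    match = re.fullmatch(r"(chr[0-9A-Za-z]+):(\d+)[–-](\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def region_size(region):
    """Size in bp, assuming both coordinates are inclusive (an assumption)."""
    _, start, end = parse_region(region)
    return end - start + 1


for name, (region, listed_size) in TABLE2_REGIONS.items():
    # Print the computed size and whether it matches the size listed in Table 2.
    print(name, region_size(region), region_size(region) == listed_size)
```

The same parser could be reused to convert the table rows to BED-like intervals, keeping in mind that BED uses 0-based, half-open coordinates.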

Arterial and venous ECs have overlapping but distinct gene expression programs, yet only three artery-specific and no vein-specific transcriptional enhancers have been described (Wythe et al., 2013; Robinson et al., 2014; Becker et al., 2016; Sacilotto et al., 2013). We examined histological sections of the transient transgenic embryos to determine if a subset is selectively active in ECs of the dorsal aorta or cardinal vein. We identified an enhancer, Sema6d-enh, with activity predominantly in ECs that line the dorsal aorta but not the cardinal vein (Figure 5E, Table 2). A second enhancer, Sox7-enh, also showed selective activity in the dorsal aorta, although this was only reproduced in two embryos. Both were also active in the endocardium and endocardial derivatives of the cardiac outflow tract (Figure 5—figure supplement 3). We also identified two enhancers with activity at E11.5 predominantly in ECs that line the cardinal vein and not the dorsal aorta (Ephb4-enh and Mef2c-enh; Figure 5E, Table 2, and Figure 5—figure supplement 3). Interestingly, a core 44-bp region of Mef2c-enh had been previously reported to drive pan-EC reporter expression at E8.5–E9.5 (De Val et al., 2008). This enhancer’s activity pattern may be dynamically regulated at different developmental stages, as has been described previously for two artery-specific enhancers (Robinson et al., 2014; Becker et al., 2016). Further analysis of these enhancers will be required to confirm the artery- and vein- selective activity patterns that we observed, and to better characterize their temporospatial regulation.

Collectively, our data show that we have developed and validated a robust method for identification of transcriptional enhancers in Cre-marked lineages. Using this strategy, we discovered thousands of candidate skeletal muscle, EC, and blood cell enhancers. Based on our validation studies, we expect that a majority of these candidate regions have in vivo, cell-type specific transcriptional enhancer activity.

Transcription factor binding motifs enriched in skeletal muscle and EC/blood enhancers

Ep300 does not bind directly to DNA. Rather, transcription factors recognize sequence motifs in DNA and subsequently recruit Ep300. The transcription factors and transcription factor combinations that direct enhancer activity in skeletal muscle, blood, and endothelial lineages are incompletely described. To gain more insights into this question, we searched for transcription factor binding motifs that were over-represented in the candidate enhancer regions bound by Ep300 in skeletal muscle or blood/EC lineages. Starting from 1445 motifs for transcription factors or transcription factor heterodimers, we found 173 motifs over-represented in Ep300-T-fb or Ep300-M-fb regions (false discovery rate < 0.01% and frequency in Ep300 regions greater than 5%). Clustering and selection of representative non-redundant motifs left 40 motifs that were enriched in either Ep300-T-fb or Ep300-M-fb, or both (Figure 6A). Many closely related motifs were independently detected by de novo motif discovery (Figure 6—figure supplement 1A). Analysis of our T2-TRAP RNA-seq data identified genes expressed in embryonic ECs that potentially bind to these motifs (Supplementary file 2).
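The two-part selection criterion described above (false discovery rate below 0.01% and motif frequency above 5% of Ep300 regions) can be sketched as a simple filter. This is an illustration only, not the pipeline used in the study: the `MotifResult` record, the function name, and all numeric values in the example are hypothetical.

```python
from dataclasses import dataclass

FDR_MAX = 0.0001   # 0.01 %, expressed as a fraction
FREQ_MIN = 0.05    # motif must occur in more than 5 % of Ep300 regions


@dataclass
class MotifResult:
    name: str
    fdr: float        # false discovery rate, as a fraction
    frequency: float  # fraction of Ep300-bound regions containing the motif


def is_over_represented(result):
    """Apply both thresholds stated in the text."""
    return result.fdr < FDR_MAX and result.frequency > FREQ_MIN


# Hypothetical values, chosen only to exercise each branch of the filter.
results = [
    MotifResult("ETS", fdr=1e-12, frequency=0.40),
    MotifResult("GATA", fdr=1e-9, frequency=0.25),
    MotifResult("rare_motif", fdr=1e-8, frequency=0.01),    # fails frequency
    MotifResult("weak_motif", fdr=0.02, frequency=0.30),    # fails FDR
]

selected = [r.name for r in results if is_over_represented(r)]
print(selected)
```

Requiring a minimum frequency in addition to statistical significance removes motifs that are strongly enriched but present in too few regions to explain much of the Ep300 binding.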

Figure 6 with 1 supplement see all
Motifs enriched in Ep300-T-fb and Ep300-M-fb regions.

(A) Motifs enriched in Ep300-T-fb or Ep300-M-fb regions. 1445 motifs were tested for enrichment in Ep300 bound regions compared to randomly permuted control regions. Significantly enriched motifs (neg ln p-value>15) were clustered and the displayed non-redundant motifs were manually selected. Heatmaps show statistical enrichment (left), fraction of regions that contain the motif (center), and GO terms associated with genes neighboring motif-containing, Ep300-bound regions (fraction of the top 20 GO biological process terms). Grey indicates that the motif was not significantly enriched (neg ln p-value≤15). (B) Conservation of sequences matching Tie2Cre- or Myf5Cre-enriched motifs within 100 bp of the summit of Ep300-T-fb or Ep300-M-fb regions, compared to randomly selected 12 bp sequences from the same regions. PhastCons conservation scores across 30 vertebrate species were used. ****p<0.0001, Kolmogorov–Smirnov test. (C) Luciferase assay of activity of enhancers containing indicated motifs. Three repeats of 20–30 bp regions from Ep300-bound enhancers linked to the indicated gene and centered on the indicated motif were cloned upstream of a minimal promoter and luciferase. The constructs were transfected into human umbilical vein endothelial cells. Luciferase activity was expressed as fold activation above that driven by the enhancerless, minimal promoter-luciferase construct. *p<0.05 compared to Mef2c-enhancer with mutated ETS:FOX2 motif (Mef2c mut). n = 3. (D) Luciferase assay of indicated motifs repeated three times within a consistent DNA context. Assay was performed as in C. *p<0.05 compared to negative control sequence lacking predicted motif. n = 3. Error bars in C and D indicate standard error of the mean.

https://doi.org/10.7554/eLife.22039.017

The GATA and ETS motifs were the most highly enriched motifs in the Tie2Cre-marked blood and EC lineages (Figure 6A). Interestingly, GO analysis of the subset of regions positive for these motifs showed that GATA-containing regions are highly enriched for hematopoiesis and heme synthesis, consistent with the critical roles of GATA1/2/3 in these processes (Bresnick et al., 2012). On the other hand, ETS-containing regions were highly enriched for functional terms linked to angiogenesis, also consistent with the key roles of ETS factors in angiogenesis (Wei et al., 2009) and our prior finding that the ETS motif is enriched in dynamic VEGF-dependent EC enhancers (Zhang et al., 2013). The Ebox motif, recognized by bHLH proteins, was the most highly enriched motif in the skeletal muscle lineage, consistent with the important roles of bHLH factors such as Myod and Myf5 in skeletal muscle development (Buckingham and Rigby, 2014). Interestingly, the Ebox motif was also highly enriched in Tie2Cre-marked cells, and genes neighboring these Ebox-containing regions were functionally related to both angiogenesis and heme synthesis. bHLH-encoding genes such as Hey1/2, Scl, and Myc are known to be important in blood and vascular development.

The database used for our motif search included 315 heterodimer motifs that were recently discovered through high throughput sequencing of DNA concurrently bound by two different transcription factors (Jolma et al., 2015). This allowed us to probe for enrichment of heterodimer motifs that may contribute to enhancer activity in blood, EC, and skeletal muscle lineages. One heterodimer motif that was highly enriched in Ep300-T-fb regions and not in Ep300-M-fb regions (such motifs are referred to as ‘Tie2Cre-enriched motifs’) was ETS:FOX2 (AAACAGGAA), composed of a tail-to-tail fusion of Fox (TGTTT) and ETS (GGAA) binding sites (Figure 6A). GO analysis showed that this motif was closely linked to vascular biological process terms (Figure 6A). This motif was previously found to be sufficient to drive EC enhancer activity during vasculogenesis and developmental angiogenesis (De Val et al., 2008), validating that our approach is able to identify bona fide, functional heterodimer motifs. Interestingly, we discovered two additional ETS:FOX heterodimer motifs, which were also highly enriched in Ep300-T-fb regions and also linked to vascular biological process terms: ETS:FOX1 (GGATGTT), consisting of a head-to-tail fusion between ETS and FOX motifs, with the ETS motif located 5' to the FOX motif (Figure 6A, arrows over motif logo), and ETS:FOX3 (TGTTTACGGAA), a head-to-tail fusion with the FOX motif located 5' to the ETS motif. Other heterodimer motifs that were enriched in Ep300-T-fb regions and that, to our knowledge, were previously unrecognized as regulatory elements in ECs were ETS:TBox, ETS:HOMEO, and ETS:Ebox. Similar analysis of Ep300-M-fb regions identified highly enriched Ebox-containing heterodimer motifs including Ebox:Hox, Ebox:HOMEO, and ETS:Ebox (‘Myf5Cre-enriched motifs’).
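The relationship between the half-sites and the composite ETS:FOX motifs can be made concrete with a few lines of sequence arithmetic. In the sketch below, the half-site and motif sequences are those quoted in the text. The helper `reverse_complement` and the variable names are illustrative, not part of the study's software.

```python
# Map each base to its complement; reversing afterwards gives the other strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


FOX_SITE = "TGTTT"            # Fox half-site quoted in the text
ETS_SITE = "GGAA"             # ETS core quoted in the text
ETS_FOX2 = "AAACAGGAA"        # composite motif quoted in the text
ETS_FOX3 = "TGTTTACGGAA"      # composite motif quoted in the text

# ETS:FOX2: the Fox site lies on the opposite strand, directly 5' of the ETS core.
fox_on_opposite_strand = reverse_complement(FOX_SITE)
print(fox_on_opposite_strand + ETS_SITE == ETS_FOX2)

# ETS:FOX3: both sites lie on the same strand, with the Fox site 5' of the ETS core.
print(ETS_FOX3.startswith(FOX_SITE) and ETS_FOX3.endswith(ETS_SITE))
```

In ETS:FOX2 the two sites abut with no spacer, whereas in ETS:FOX3 they are separated by two bases (AC), which is one way to see that the two configurations differ in both orientation and spacing.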

To assess functional significance of these motifs, we examined their evolutionary conservation. Using PhastCons genome conservation scores for 30 vertebrate species (Siepel et al., 2005), we measured the conservation of sequences matching Tie2Cre-enriched motifs within the central 200 bp of Ep300-T-fb regions. Whereas randomly selected 12 bp sequences from these regions exhibited a distribution of scores heavily weighted towards low conservation values, sequences matching Tie2Cre-enriched motifs showed a bimodal distribution consisting of highly conserved and poorly conserved sequences (Figure 6B). The conservation of individual heterodimer motifs such as ETS:FOX1-3 confirmed that they shared this bimodal distribution that included deeply conserved sequences (Figure 6—figure supplement 1B). These findings indicate that a subset of sequences matching Tie2Cre-enriched motifs, including the novel heterodimer motifs, are under selective pressure. Myf5Cre-enriched motifs within the center of Ep300-M-fb regions similarly adopted a bimodal distribution that includes a subset of motif occurrences with high conservation (Figure 6B and Figure 6—figure supplement 1B). This analysis supports the biological function of Tie2Cre- and Myf5Cre-enriched motifs in endothelial cell/blood or skeletal muscle enhancers, respectively.

To further functionally validate the transcriptional activity of these heterodimer motifs, we measured their enhancer activity using luciferase reporter assays. Three repeats of enhancer fragments containing motifs of interest were cloned upstream of a minimal promoter and luciferase. The constructs were transfected into human umbilical vein endothelial cells (HUVECs), and transcriptional activity was measured by luciferase assay, normalized to the enhancerless promoter-luciferase construct (Figure 6C). The well-studied ETS:FOX2 motif from an endothelial Mef2c enhancer (De Val et al., 2008) robustly stimulated transcription to about the same extent as the SV40 enhancer. As expected, mutation of the ETS:FOX2 motif markedly blunted its activity, supporting the specificity of this assay. Interestingly, the alternative ETS:FOX3 motif uncovered by our study was at least as potent in stimulating transcription as the previously described ETS:FOX2 motif. The other novel heterodimer motifs tested, ETS:HOMEO2 and ETS:Ebox, likewise supported strong transcriptional activity, as did the SOX motif, whose enrichment and associated GO terms were highly EC-selective.

To further assess and compare the transcriptional activity of these heterodimer motifs, we cloned 3x repeated motifs upstream of luciferase within a consistent DNA context that had minimal endogenous enhancer activity (Figure 6D). The ETS motif alone only weakly stimulated luciferase expression, whereas the previously described ETS:FOX2 motif robustly activated luciferase expression. Interestingly, both of the newly identified, alternative ETS:FOX motifs (ETS:FOX1 and ETS:FOX3) were more potent activators than ETS:FOX2. The other heterodimer motifs tested, ETS:HOMEO2, ETS:Ebox, and ETS:TBox, also demonstrated significant enhancer activity.
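As a sanity check on the construct design, one can confirm that each 3x repeat listed in Table 3 carries exactly three copies of its intended motif and that the negative control carries no ETS core. The sketch below uses the repeat units from Table 3 and the motif sequences quoted in the Results. The function `count_motif` is a hypothetical helper, and only the listed strand is scanned.

```python
def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlapping matches."""
    return sum(
        1
        for i in range(len(sequence) - len(motif) + 1)
        if sequence[i:i + len(motif)] == motif
    )


# Single repeat units from Table 3; each construct is the unit repeated 3 times.
UNITS = {
    "Neg. control": "TGTCATATCTAGAAGAG",
    "ETS_alone": "TGTCATATCTGGAAGAG",
    "ETS-FOX1": "TGTCCGGATGTTGAG",
    "ETS-FOX2": "TGTGTAAACAGGAAGTGAG",
    "ETS-FOX3": "TGTTGTTTACGGAAGTGAG",
}
CONSTRUCTS = {name: unit * 3 for name, unit in UNITS.items()}

# Motif expected in each construct (sequences as quoted in the Results).
EXPECTED_MOTIF = {
    "ETS_alone": "GGAA",
    "ETS-FOX1": "GGATGTT",
    "ETS-FOX2": "AAACAGGAA",
    "ETS-FOX3": "TGTTTACGGAA",
}

for name, motif in EXPECTED_MOTIF.items():
    print(name, count_motif(CONSTRUCTS[name], motif))

# The negative control differs from ETS_alone by one base and lacks the GGAA core.
print("Neg. control", count_motif(CONSTRUCTS["Neg. control"], "GGAA"))
```

Comparing `ETS_alone` with the negative control shows the design logic: a single G-to-A change (GGAA to AGAA) removes the ETS core while leaving the flanking context unchanged.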

Together, unbiased discovery of cell type specific enhancers coupled with motif analysis identified novel transcription factor signatures that are likely important for gene expression programs of blood, vasculature, and skeletal muscle.

Organ-specific EC enhancers

ECs in different adult organs have distinct gene expression programs that underlie organ-specific EC functions (Nolan et al., 2013; Coppiello et al., 2015). For example, heart ECs are adapted for the transport of fatty acids essential for fueling oxidative phosphorylation in the heart (Coppiello et al., 2015). On the other hand, lung ECs are adapted for efficient transport of gas, but possess specialized tight junctions to minimize transit of water (Mehta et al., 2014). Nolan et al. recently profiled gene expression in ECs freshly isolated from nine different adult mouse organs (Nolan et al., 2013). Clustering the 3104 genes with greater than 4-fold difference in expression across this panel identified genes with selective expression in ECs from a subset of organs (Figure 7—figure supplement 1A). 240 genes were preferentially expressed in ECs from heart (and skeletal muscle) compared to other organs, whereas 355 genes were preferentially expressed in ECs from lung (Figure 7—figure supplement 1B). One cluster contained genes co-enriched in lung and brain, including many tight junction genes such as claudin 5.

We asked if our strategy of lineage-specific Ep300fb bioChIP-seq would allow us to identify enhancers linked to organ-specific EC gene expression. For these experiments, we used Tg(Cdh5-cre/ERT2)1Rha (also known as VEcad-CreERT2) (Sörensen et al., 2009) to drive Ep300fb biotinylation in ECs (Figure 7—figure supplement 2); unlike Tie2Cre, this transgene does not label hematopoietic cells when induced with tamoxifen in the neonatal period. We isolated adult (eight-week-old) heart and lungs from Ep300fb/+; VEcad-CreERT2+; Rosa26fsBirA/+ mice and performed bioChIP-seq. Triplicate repeat experiments were highly reproducible (Figure 7A). Inspection of Ep300fb ChIP-seq signals from lung and heart ECs suggested that VEcad-CreERT2 successfully directed Ep300fb enrichment from ECs. For example, Tbx3 and Meox2, transcription factors selectively expressed in lung and heart ECs, respectively, were associated with Ep300fb-decorated regions in the matching tissues (Figure 7—figure supplement 3). Interestingly, Meox2 has been implicated in directing expression of heart EC-specific genes and in the pathogenesis of coronary artery disease (Coppiello et al., 2015; Yang et al., 2015). On the other hand, Tbx2/4 and Myh6, genes expressed in non-ECs in lung and heart, were not associated with regions of Ep300fb enrichment (Figure 7—figure supplement 3).

Figure 7 with 4 supplements see all
Enhancers in adult heart and lung ECs.

VEcad-CreERT2 and a neonatal tamoxifen pulse were used to drive BirA expression in ECs. Ep300 regions in adult heart or lung ECs were identified by bioChIP-seq in biological triplicate. Ep300 regions were ranked into deciles by the ratio of the Ep300-VE-fb signal in heart to lung. (A) Tag heatmap shows that the top and bottom deciles have selective Ep300 occupancy in heart and lung ECs, respectively. (B) Expression of genes in heart ECs (red) or lung ECs (blue) neighboring Ep300 regions, divided into deciles by Ep300-VE-fb signal ratio in heart and lung. ***p<0.0001, Wilcoxon test. Expression values were obtained from (Nolan et al., 2013). Box plots indicate quartiles, and whiskers indicate 1.5 times the interquartile range. (C) Genes with selective expression in heart or lung ECs were identified by K-means clustering (see Materials and methods and Figure 7—figure supplement 1). The number of Ep300-VE-fb regions in heart (red) or lung (blue) neighboring these genes was determined, stratified by decile. ***p<0.0001, Fisher's exact test. (D) Enrichment of selected motifs in Ep300-VE-fb regions from heart (decile 1) compared to lung (decile 10) or vice versa. Grey indicates no significant enrichment. Displayed non-redundant motifs were selected from all significant motifs by manual curation of clustered motifs.

https://doi.org/10.7554/eLife.22039.019

To obtain a broader, unbiased view of the adult EC Ep300fb bioChIP-seq results, regions bound by Ep300fb in either heart (14251) or lung (22174) were rank-ordered by the heart to lung Ep300fb signal ratio (Supplementary file 1). Regions were grouped into deciles by this ratio, with the most heart-enriched regions in decile 1 and the most lung-enriched regions in decile 10. Next, for genes neighboring regions in each decile, we compared expression in heart to expression in lung. Genes neighboring decile 1 regions (greater Ep300fb signal in heart) had higher median mRNA transcript levels in heart than in lung (Figure 7B). In contrast, genes neighboring decile 10 regions (greater Ep300fb signal in lung) had higher median mRNA transcript levels in lung than in heart. We then focused on genes with selective expression in heart or lung. More decile 1 Ep300fb regions were associated with genes with heart-selective expression than lung-selective expression, and more decile 10 Ep300fb regions were associated with genes with lung-selective expression (in each case, p<0.0001, Fisher's exact test; Figure 7C). These results are consistent with greater heart- or lung-associated enhancer activity in Ep300fb decile 1 or 10 regions, respectively.
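The rank-and-bin step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the input is taken to be one (name, heart signal, lung signal) tuple per region, the pseudocount `PSEUDOCOUNT` is an assumption added to avoid division by zero, and the synthetic signals exist only to show the binning.

```python
import math

PSEUDOCOUNT = 1.0  # assumption: avoids division by zero for regions with no signal


def heart_lung_log_ratio(heart_signal, lung_signal):
    """log2 ratio of heart to lung signal (positive means heart-enriched)."""
    return math.log2((heart_signal + PSEUDOCOUNT) / (lung_signal + PSEUDOCOUNT))


def assign_deciles(regions):
    """Map region name -> decile (1 = most heart-enriched, 10 = most lung-enriched).

    `regions` is a list of (name, heart_signal, lung_signal) tuples.
    """
    ranked = sorted(
        regions,
        key=lambda r: heart_lung_log_ratio(r[1], r[2]),
        reverse=True,
    )
    n = len(ranked)
    # Integer arithmetic places rank i (0-based) into one of 10 near-equal bins.
    return {name: (i * 10) // n + 1 for i, (name, _, _) in enumerate(ranked)}


# Synthetic example: 20 regions running from heart-enriched (r0) to lung-enriched (r19).
synthetic = [(f"r{i}", 20.0 - i, 1.0 + i) for i in range(20)]
deciles = assign_deciles(synthetic)
print(deciles["r0"], deciles["r19"])
```

Working with the log ratio rather than the raw ratio does not change the ranking, but it makes heart- and lung-enriched regions symmetric around zero, which is convenient when plotting.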

We analyzed the GO terms that are over-represented for genes neighboring Ep300fb decile 1 or 10 regions. Both sets of regions were highly enriched for GO terms related to vasculature and blood vessel development (Figure 7—figure supplement 4). Analysis of disease ontology terms showed that genes neighboring decile 1 regions (greater Ep300fb signal in heart) were significantly associated with terms relevant to coronary artery disease, such as ‘coronary heart disease’, ‘arteriosclerosis’, and ‘myocardial infarction’, whereas genes neighboring decile 10 regions (greater Ep300fb signal in lung) were less enriched. Conversely, genes neighboring decile 10 regions were selectively associated with hypertension and cerebral artery disease.

To gain insights into transcriptional regulators that preferentially drive heart or lung EC enhancers, we searched for motifs with differential enrichment in decile 1 compared to decile 10. Using decile 1 regions as the foreground sequences and decile 10 regions as the background sequences, we detected significant enrichment of the TCF motif in regions preferentially occupied by Ep300 in heart ECs (Figure 7D), consistent with prior work that found that a Meox2:TCF15 complex promotes heart EC-specific gene expression (Coppiello et al., 2015). Other motifs that were significantly enriched in decile 1 (heart) compared to decile 10 regions were FOX, ETS, ETS:FOX, and ETS:HOMEO motifs. We performed the reciprocal analysis to identify lung-enriched motifs. One motif enriched in decile 10 (lung) compared to decile 1 regions was the TBOX motif. Interestingly, TBX3 is a lung EC-enriched transcription factor (Nolan et al., 2013). Other motifs over-represented in decile 10 compared to decile 1 regions were Ebox, ETS:Ebox, and Ebox:HOMEO motifs. Thus differences in transcription factor expression or heterodimer formation in heart and lung ECs may contribute to differences in Ep300 chromatin occupancy and enhancer activity between heart and lung.

Discussion

Here we show that Cre-mediated, tissue-specific activation of BirA permits high affinity bioChIP-seq of factors with a bio epitope tag. Furthermore, we show that combining this strategy with the Ep300fb knockin allele permits efficient identification of cell type-specific enhancers. The bio epitope has been knocked into a number of different transcription factor loci (He et al., 2012; Waldron et al., 2016; Jackson Labs 025982 (Rbpj), 025980 (Ep300), 025983 (Mef2c), 025978 (Nkx2-5), 025979 (Srf), and 025977 (Zfpm2)), and these alleles can be combined with Cre-activated BirA to permit lineage-specific mapping of transcription factor binding sites. Cell type-specific protein biotinylation will also be useful for mapping protein-protein interactions in specific cell lineages.

Using this technique, we identified thousands of candidate skeletal muscle and EC enhancers and showed that many of these candidate enhancers are likely to be functional. Furthermore, we showed that the technique can identify tissue-specific enhancers in postnatal tissues, and identified novel candidate enhancers that regulate organ-specific endothelial gene expression. These enhancer regions will be a valuable resource for future studies of transcriptional regulation in these systems.

Large scale identification of tissue-specific enhancers will facilitate decoding the mechanisms responsible for cell type-specific gene expression in development and disease. By analyzing these candidate regulatory regions, we revealed novel transcriptional regulatory motifs that likely participate in skeletal muscle development, angiogenesis, and organ specific EC gene expression. Our recovery of a previously described FOX-ETS heterodimer binding site in EC enhancers (De Val et al., 2008) validates the ability of our approach to detect sequence motifs important for angiogenesis, and suggests that these motifs provide important clues to the transcriptional regulators that interact with them. Our study identified significant enrichment of two new FOX-ETS heterodimer motifs in which the FOX and ETS sites are in different positions and orientations. In their original study, Jolma and colleagues already demonstrated that these alternative FOX-ETS motifs are indeed bound by both FOX and ETS family proteins, implying that these proteins are able to collaboratively bind DNA in diverse configurations, potentially with DNA itself playing an important role in stabilizing the heterodimer (Jolma et al., 2015). Enrichment of these motifs in Ep300-T-fb regions, and their over-representation neighboring genes related to angiogenesis, suggests that these alternative FOX-ETS configurations are functional. We also recovered FOX-ETS motifs from Ep300 regions in adult ECs (and preferentially in heart ECs), suggesting that these motifs continue to be important in maintenance of adult vasculature. However, direct support of these inferences will require further experiments that identify the specific proteins involved and that dissect their functional roles in vivo.

Additional novel heterodimer motifs, such as ETS:EBox, ETS:HOMEO, and ETS:Tbox, were enriched in EC enhancers from both developing and adult ECs. This suggests that these motifs, and potentially the protein heterodimers that were reported to bind them (Jolma et al., 2015), are important for vessel growth and maintenance. These potential TF combinations, like the FOX-ETS combination, may act as a transcriptional code to create regulatory specificity at individual enhancers and their associated genes. Experimental validation of these hypotheses will be a fruitful direction for future studies.

Limitations

A limitation of our current protocol is the need for several million cells to obtain robust ChIP-seq signal. Optimization of chromatin pulldown and purification through application of streamlined protocols or microfluidics (Cao et al., 2015), combined with the use of improved library preparation methods that work on smaller quantities of starting material, will likely overcome this limitation. Another limitation of our strategy is that Ep300 decorates many, but not all, active transcriptional enhancers (He et al., 2011), and therefore Ep300 bioChIP-seq will not comprehensively detect all enhancers. Our strategy does not directly permit profiling of other chromatin features in a Cre-targeted lineage. This limitation could be overcome by developing additional proteins labeled with the bio-epitope. For instance, by bio-tagging histone H3, non-tissue restricted ChIP for a feature of interest could be followed by sequential high affinity histone H3 bioChIP. Finally, our technique's lineage specificity is dependent on the properties of the Cre allele used (Ma et al., 2008), and users must be cognizant of the cell labeling pattern of the Cre allele that they choose.

Materials and methods

Mice

Animal experiments were performed under protocols approved by the Boston Children's Hospital Animal Care and Use Committee (protocols 13-08-2460R and 13-12-2601). Ep300fb mice were generated by homologous recombination in embryonic stem cells. Targeted ESCs were used to generate a mouse line, which was bred to homozygosity. This line has been donated to Jackson Labs (Jax 025980). The Rosa26fsBirA allele was derived from the previously described Rosa26fsTRAP mouse (Zhou et al., 2013) (Jax 022367) by removal of the frt-TRAP-frt cassette using germline Flp recombination. The Rosa26BirA (Driegen et al., 2005) (constitutive; Jackson Labs 010920), Tie2Cre (Kisanuki et al., 2001) (Jackson Labs 008863), Myf5Cre (Tallquist et al., 2000) (Jackson Labs 007893), EIIaCre (Williams-Simons and Westphal, 1999) (Jackson Labs 003724), Rosa26mTmG (Muzumdar et al., 2007) (Jackson Labs 007576) and VEcad-CreERT2 (Sörensen et al., 2009) (Taconic 13073) lines were described previously.

Transient transgenics

Candidate regions approximately 1 kb in length were PCR amplified using primers listed in Table 3 and cloned into a gateway-hsp68-lacZ construct derived from pWhere (Invivogen) (He et al., 2011). Constructs were linearized by PacI and injected into oocytes by Cyagen, Inc. At least 5 PCR positive embryos were obtained per construct. Embryos were analyzed at E11.5 by whole mount LacZ staining. After whole mount imaging, embryos were embedded in paraffin and sectioned for histological analysis. Regions were scored as positive for EC activity if they displayed an EC staining pattern in whole mount that was validated in sections, using previously described criteria (Visel et al., 2009a). For scoring purposes, we required that an EC pattern was observed in at least three different embryos, although we also describe results for an additional two regions with activity observed in two different embryos.

Table 3

Oligonucleotides used in this study.

https://doi.org/10.7554/eLife.22039.024
Genotyping primers
Name	Sequence (5'->3')	Comments
Ep300fb-f	AATGCTTTCACAGCTCGC	0.28 kb for wild-type, 0.43 kb for Ep300fb knockin
Ep300fb-r	AAACCATAAATGGCTACTGC
Forward common	CTCTGCTGCCTCCTGGCTTCT	Rosa26-fs-BirA, 0.33 kb for wildtype, 0.25 kb for knockin
Wild type reverse	CGAGGCGGATCACAAGCAATA
CAG reverse	TCAATGGGCGGGGGTCGTT
LacZ-f	CAATGCTGTCAGGTGCTCTCACTACC	0.42 kb, genotyping of transient transgenic
LacZ-r	GCCACTTCTTGATGCTCCACTTGG
Primers to amplify Ep300 peak regions for transient transgenic assay.
4 nucleotides CACC have been added to all the forward primers for TOPO Cloning.
Name	Sequence (5'->3')
Apln_f	CACCGGAGGCTGAGCAATGAATAG
Apln_r	TTGGCTGGGGAAGAGTAAGC
Aplnr_f	CACCTCTCTCTCTGGCTTCG
Aplnr_r	CCTCAGAATGTTTTCATGG
Dab2_f	CACCGTGGAAATCATAGCAC
Dab2_r	GGTTGGAATAAAAGAGC
Egfl7_Enh1_f	CACCGCCTACCCAGTGCTGTTCC
Egfl7_Enh1_r	CTGGAGTGGAGTGTCACG
Egfl7_Enh2_f	CACCGCTAGGGGCTTCTAGTTC
Egfl7_Enh2_r	AGGTCTCTTCTGTGTCG
Egfl7_Enh3_f	CACCTGTTAGTGGTGCTCCC
Egfl7_Enh3_r	TCCAAGGTCACAAAGC
Emcn_f	CACCAGCACACCTCGTAAAATGG
Emcn_r	GAGTGAAGTAAGACATCGTCC
Eng_f	CACCAAACTAATTAAAAAACAAAGCAGGT
Eng_r	CATATGTACATTAGAACCATCCA
Ephb4_f	CACCTGGGTCTCATCAACCGAAC
Ephb4_r	CCTATCTACATCAGGGCACTG
Ets1_f	CACCTTCGTCAGAAATGATCTTGCCA
Ets1_r	TAGCAAGAGAGCCTGGTCAG
Foxc1_f	CACCTCTCTGCTTCAAGGCACCTT
Foxc1_r	TGGATAGCATGCAGAGGACA
Gata2_f	CACCTTCTCTTGGGCCACACAGA
Gata2_r	ATCTGCTCCACTCTCCGTCA
Lmo2_f	CACCTGGTTTTGCTTGCTAC
Lmo2_r	CATTTCTAAGTCTCCAC
Lyve1_f	CACCTACTGCCATGGAGGACTG
Lyve1_r	AGACACCTGGCTGCCTGATA
Mef2c_f	CACCGGAGGATTAAAAATTCCCC
Mef2c_r	CCTCTTAAATGTACGTG
Notch1_Enh1_f	CACCTCCCAAATGCTCCACGATG
Notch1_Enh1_r	GAGGAATGGCGAGAAATAGAC
Notch1_Enh2_f	CACCGAAGGCAGGCAGGAATAAC
Notch1_Enh2_r	TGGACAGGTGCTTTGTTG
Sema6d_f	CACCTCTTAACCACTATCTCC
Sema6d_r	ACTTCCTACACAGTTC
Sox18_f	CACCTTGGGGGGAAAGAGTG
Sox18_r	GACTTCATCCCATCTC
Sox7_f	CACCACAGAGCCCCTGCATATGT
Sox7_r	GCATGGTTTCTGAAGCCCAAAT
3x repeated enhancer regions for luciferase assay.
Core motifs of interest are highlighted in red.
Name	Sequence (5'->3')
ETS-FOX2 (Mef2c)	CAGGAAGCACATTTGTCTACGCTTTCCTGTCATAACAGGAAGAGCAGGAAGCACATTTGTCTACGCTTTCCTGTCATAACAGGAAGAGCAGGAAGCACATTTGTCTACGCTTTCCTGTCATAACAGGAAGAG
Mef2c-mut	CAAGAAGCACATTTGTCTACGCTTTCCTGTCATATCTAGAAGAGCAAGAAGCACATTTGTCTACGCTTTCCTGTCATATCTAGAAGAGCAAGAAGCACATTTGTCTACGCTTTCCTGTCATATCTAGAAGAG
ETS-FOX3 (Bcl2l1)	CAGTTATTTCAGGAAAGATCAGTTATTTCAGGAAAGATCAGTTATTTCAGGAAAGAT
ETS-HOMEO2 (Egfl7_Enh1)	GACAGACAGGAAGGCGGGACAGACAGGAAGGCGGGACAGACAGGAAGGCGG
ETS-HOMEO2 (Egfl7_Enh3)	ACACACTTCCTGTTTCCTGACACACTTCCTGTTTCCTGACACACTTCCTGTTTCCTG
ETS-HOMEO2 (Flt4)	ACAGTCACTTCCTGTTTTACAGTCACTTCCTGTTTTACAGTCACTTCCTGTTTT
ETS-HOMEO2 (Kdr)	CAACAACAGGAAGTGGACAACAACAGGAAGTGGACAACAACAGGAAGTGGA
Sox (Apln)	CAGTTCCCCATTGTTCTCGCAGTTCCCCATTGTTCTCGCAGTTCCCCATTGTTCTCG
Sox (Robo4)	GCCAGAACAATGAAGAACAAAGCCTGCACGGCCAGAACAATGAAGAACAAAGCCTGCACGGCCAGAACAATGAAGAACAAAGCCTGCACG
Sox (Sema3g)	CGAATGGAAAGGGCATTGTTCAGGGGAGAACGAATGGAAAGGGCATTGTTCAGGGGAGAACGAATGGAAAGGGCATTGTTCAGGGGAGAA
ETS-Ebox (Apln)	AGGCGGAAGCAGCTGGGATAGGCGGAAGCAGCTGGGATAGGCGGAAGCAGCTGGGAT
ETS-Ebox (Zfp521)	TTATCCACAGGAAACAGATGAGGATCGTTATCCACAGGAAACAGATGAGGATCGTTATCCACAGGAAACAGATGAGGATCG
3x repeated motifs within a similar DNA context.
Motifs are indicated in red.
Neg. control	TGTCATATCTAGAAGAGTGTCATATCTAGAAGAGTGTCATATCTAGAAGAG
ETS_alone	TGTCATATCTGGAAGAGTGTCATATCTGGAAGAGTGTCATATCTGGAAGAG
ETS-FOX1	TGTCCGGATGTTGAGTGTCCGGATGTTGAGTGTCCGGATGTTGAG
ETS-FOX2	TGTGTAAACAGGAAGTGAGTGTGTAAACAGGAAGTGAGTGTGTAAACAGGAAGTGAG
ETS-FOX3	TGTTGTTTACGGAAGTGAGTGTTGTTTACGGAAGTGAGTGTTGTTTACGGAAGTGAG
ETS_Ebox	TGTAGGAAACAGCTGGAGTGTAGGAAACAGCTGGAGTGTAGGAAACAGCTGGAG
ETS-HOMEO2	TGTAACCGGAAGTGAGTGTAACCGGAAGTGAGTGTAACCGGAAGTGAG
Tbx-ETS	TGTCACACCGGAAGGAGTGTCACACCGGAAGGAGTGTCACACCGGAAGGAG

Histology

Embryos were collected in ice-cold PBS and fixed in 4% paraformaldehyde overnight. For immunostaining, cryosections were stained with HA (Cell Signaling #3724, 1:100 dilution) and PECAM1 (BD Pharmingen, 553371, 1:200 dilution) antibodies and imaged by confocal microscopy (Olympus FV1000).

Ep300fb/fb;Rosa26BirA/BirA ES cell derivation and bioChIP

ESCs were derived as described previously (Bryja et al., 2006). Five-week-old Ep300fb/fb;Rosa26BirA/BirA female mice were hormonally primed and mated overnight with eight-week-old Ep300fb/fb;Rosa26BirA/BirA male mice. Uteri were collected on 3.5 dpc and embryos were flushed out under a microscope. After removal of the zona pellucida by treating the embryos with Tyrode’s solution (Sigma, T1788), the embryos were cultured in ESC medium (DMEM with high glucose, 15% ES cell-qualified FBS, 1000 U/mL LIF, 100 µM non-essential amino acids, 1 mM sodium pyruvate, 2 mM glutamine, 100 µM β-mercaptoethanol, and penicillin/streptomycin), supplemented with 50 µM PD98059 (Cell Signaling Technology, #9900) for 7–10 days. The outgrowth was dissociated with trypsin, and the cells were cultured in ESC medium in 24 well plates to obtain colonies, which were then clonally expanded. Five male ESC lines were retained for further experiments. Pluripotency of these ESC lines was confirmed by immunostaining for pluripotency markers Oct4, Sox2, and SSEA, and they were negative for mycoplasma.

The five Ep300fb/fb;Rosa26BirA/BirA ESC lines were cultured in 150 mm dishes to 70–80% confluence. Crosslinking was performed by adding formaldehyde to 1% and incubating at room temperature for 15 min. Chromatin was fragmented using a microtip sonicator (QSONICA Q700). The chromatin from 3 ESC lines was pooled for replicate one and from the other 2 ESC lines for replicate 2. Ep300fb and bound chromatin were pulled down by incubation with streptavidin beads (Life Technologies #11206D).

Cell culture and luciferase reporter assays

Candidate motif sequences were synthesized as oligonucleotides (Table 3) and cloned into plasmid pGL3-promoter (Promega) between MluI and XhoI. HUVEC cells (Lonza) were cultured to 50–60% confluence in 24-well plates and transfected in triplicate with 1 µg luciferase construct and 0.5 µg pRL-TK internal control plasmid, using 5 µl jetPEI-HUVEC (Polyplus). After 2 days, cells were analyzed using the dual luciferase assay (Promega). Luciferase activity was measured using a 96-well plate-reading luminometer (Victor2, Perkin Elmer). Results are representative of at least two independent experiments.
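
A dual luciferase readout is commonly expressed as the ratio of firefly signal (here, the pGL3-promoter construct) to Renilla signal (the pRL-TK internal control) in each well, averaged over the triplicate transfections. The sketch below illustrates that normalization only. The function names and the readings are hypothetical, not data from this study, and reporting activity as fold change over the empty vector is an assumption about how such results are usually summarized.

```python
from statistics import mean

def normalized_activity(firefly, renilla):
    """Per-well firefly/Renilla ratios for one construct (one value per well)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla need one reading per well")
    return [f / r for f, r in zip(firefly, renilla)]

def fold_over_control(construct_ratios, control_ratios):
    """Mean normalized activity of a construct relative to the control vector."""
    return mean(construct_ratios) / mean(control_ratios)

# Hypothetical triplicate readings in arbitrary luminescence units.
empty_vector = normalized_activity([1000, 1100, 900], [500, 550, 450])
motif_construct = normalized_activity([4200, 3900, 4100], [520, 480, 500])
print(round(fold_over_control(motif_construct, empty_vector), 2))
```

Dividing by the Renilla signal corrects each well for differences in transfection efficiency and cell number before constructs are compared.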

Western blotting

Immunoblotting was performed using standard protocols and the following primary antibodies: GAPDH, Fitzgerald, 10R-G109A (1:10,000 dilution); Ep300, Millipore, 05257 (1:2000 dilution); BirA, Abcam, ab14002 (1:1000 dilution).

Tissue collection for ChIP and bioChIP

bioChIP-seq was performed as described previously (He et al., 2014; He and Pu, 2010), with minor modifications. We used lower amplitude sonication to avoid fragmentation of Ep300 protein.

Ep300 and H3K27ac in E12.5 heart and forebrain

E12.5 embryonic forebrain and ventricle apex tissues were isolated from Swiss Webster (Charles River) females crossed to Ep300fb/fb;Rosa26BirA/BirA males. Cells were dissociated in a 2 mL glass Dounce homogenizer (large clearance pestle, Sigma P0485) and then cross-linked in 1% formaldehyde-containing PBS for 15 min at room temperature. Glycine was added to a final concentration of 125 mM to quench formaldehyde. Chromatin isolation was performed as previously described (He and Pu, 2010). Thirty forebrains or 60 heart apexes were used in each sonication. Conditions were titrated to achieve sufficient fragmentation (mean fragment size 500 bp) while avoiding degradation of Ep300 protein. We used a microtip sonicator (QSONICA Q700) at 30% amplitude and a cycle of 5 s on and 20 s off for 96 cycles in total. Sheared chromatin was precleared by incubation with 100 µl Dynabeads Protein A (Life Technologies, 10002D) for 1 hr at 4°C. For Ep300 bioChIP, 2/3 of the chromatin was then incubated with 250 µl (for forebrains) or 100 µl (for heart apexes) Dynabeads M-280 Streptavidin (Life Technologies, 11206D) for 1 hr at 4°C. The streptavidin beads were washed and bound DNA eluted as described (He and Pu, 2010). For H3K27ac ChIP, the remaining 1/3 of the chromatin was incubated with 10 µg (forebrain) or 5 µg (heart apex) H3K27ac antibody (ActiveMotif #39133) overnight at 4°C. Then 50 µl (forebrain) or 25 µl (heart apex) Dynabeads Protein A were added and incubated for 1 hr at 4°C. The magnetic beads were washed six times with RIPA buffer (50 mM HEPES, pH 8.0, 500 mM LiCl, 1% Igepal CA-630, 0.7% sodium deoxycholate, and 1 mM EDTA) and washed once in TE buffer. ChIP DNA was eluted at 65°C in elution buffer (10 mM Tris, pH 8.0, 1% SDS and 1 mM EDTA) and incubated at 65°C overnight to reverse crosslinks.
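
For orientation, the pulse schedule stated above (5 s on, 20 s off, 96 cycles) can be converted into total active sonication time and total elapsed time. The short sketch below does that arithmetic; the function name is hypothetical, and it assumes the final off period is counted as part of the run.

```python
def sonication_schedule(on_s, off_s, cycles):
    """Return (total sonication-on time, total elapsed time) in seconds."""
    total_on = on_s * cycles
    total_elapsed = (on_s + off_s) * cycles
    return total_on, total_elapsed

on_time, elapsed = sonication_schedule(on_s=5, off_s=20, cycles=96)
# Minutes of active sonication, then minutes elapsed including rest periods.
print(on_time / 60, elapsed / 60)
```

With these settings the sample receives 8 min of active sonication spread over 40 min, which is the quantity to hold constant when adapting the protocol to a different pulse pattern.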

Ep300 bioChIP of fetal EC cells

E11.5 embryos were isolated from pregnant Rosa26mTmG females crossed to Ep300fb/fb;Rosa26fsBirA/BirA;Tie2Cre males. Tie2Cre positive embryos were picked under a fluorescence microscope for further experiments. Crosslinking and sonication were performed as described above. We used 15 Tie2Cre positive embryos for each bioChIP replicate. We used 750 µl Dynabeads M-280 Streptavidin beads for Ep300fb pull-down; this amount was determined by empiric titration.

Ep300 bioChIP of fetal Myf5Cre labeled cells

E13.5 embryos were isolated from pregnant Rosa26mTmG females crossed to Ep300fb/fb;Rosa26fsBirA/BirA;Myf5Cre males. Cre positive embryos were picked under a fluorescence microscope for further experiments. Five embryos were used for each bioChIP replicate and incubated with 750 µl Dynabeads M-280 Streptavidin beads for Ep300fb pull-down.

Ep300 bioChIP of adult ECs

Ep300fb/fb;Rosa26fsBirA/mTmG;VEcad-CreERT2 pups were given two consecutive intragastric injections of 50 µl tamoxifen (2 mg/ml in sunflower seed oil) on postnatal days P1 and P2 to induce Cre activity. The lungs and heart apexes were collected when the mice were eight weeks old. Cross-linking and sonication were performed as described above. Six mice were used for each replicate. We used 600 µl (heart) or 1.5 mL (lung) streptavidin beads for the bioChIP.

bioChIP-seq and ChIP-seq

Libraries were constructed using a ChIP-seq library preparation kit (KAPA Biosystems KK8500). 50 ng of sonicated chromatin without pull-down was used as input.

Sequencing (50 nt single end) was performed on an Illumina HiSeq 2500. Reads were aligned to mm9 using Bowtie2 (Langmead and Salzberg, 2012) using default parameters. Peaks were called with MACS2 (Zhang et al., 2008). Murine blacklist regions were masked out of peak lists. For embryo samples, peak calling was performed against input chromatin background with a false discovery rate of less than 0.01. For adult samples, peak calling was poor using input chromatin background and therefore was performed using the ChIP sample only at a false discovery rate of less than 0.05. Aggregation plots, tag heat maps, and global correlation analyses were performed using deepTools 2.0 (Ramírez et al., 2014). bioChIP-seq signal was visualized in the Integrated Genome Viewer (Thorvaldsdóttir et al., 2013).
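
The two post-calling rules described above (a sample-specific false discovery rate cutoff and masking of blacklist regions) can be sketched as a simple filter. This is an illustration of the logic only, not the MACS2 implementation: MACS2 applies its own cutoff during peak calling, and the record layout, function names, and coordinates below are hypothetical.

```python
# FDR cutoffs stated in the text: 0.01 for embryo samples, 0.05 for adult samples.
FDR_CUTOFF = {"embryo": 0.01, "adult": 0.05}

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def filter_peaks(peaks, blacklist, sample_type):
    """Keep peaks below the FDR cutoff that do not touch a blacklisted region.

    peaks: list of dicts with keys chrom, start, end, qvalue
    blacklist: dict mapping chrom -> list of (start, end) tuples
    """
    cutoff = FDR_CUTOFF[sample_type]
    kept = []
    for peak in peaks:
        if peak["qvalue"] >= cutoff:
            continue  # fails the false discovery rate threshold
        masked = any(
            overlaps(peak["start"], peak["end"], start, end)
            for start, end in blacklist.get(peak["chrom"], [])
        )
        if not masked:
            kept.append(peak)
    return kept

# Hypothetical peaks and blacklist, for illustration only.
peaks = [
    {"chrom": "chr1", "start": 100, "end": 300, "qvalue": 0.001},
    {"chrom": "chr1", "start": 1000, "end": 1200, "qvalue": 0.03},
    {"chrom": "chr2", "start": 50, "end": 250, "qvalue": 0.0005},
]
blacklist = {"chr2": [(200, 400)]}
print(len(filter_peaks(peaks, blacklist, "embryo")))
print(len(filter_peaks(peaks, blacklist, "adult")))
```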

To associate genomic regions to genes, we used Homer’s AnnotatePeaks (Heinz et al., 2010) to select the gene with the closest transcriptional start site.

ATAC-seq

E12.5 heart ventricles were dissociated into single-cell suspensions using the Neonatal Heart Dissociation Kit (Miltenyi Biotec #130-098-373) with minor modifications from the manufacturer’s protocol. Embryonic tissue samples were incubated twice with enzyme dissociation mixes at 37°C for 15 min with gentle agitation by tube inversions between incubations. Cell mixtures were gently filtered through a 70 μm cell strainer and centrifuged at 300 x g, 5 min, 4°C. Red blood cells were lysed with 10X Red Blood Cell Lysis Solution (Miltenyi #130-094-183) and myocytes were isolated using the Neonatal Cardiomyocyte Isolation Kit (Miltenyi Biotec #130-100-825). 75,000 isolated cardiomyocytes were used for each ATAC-seq experiment. Libraries were prepared as previously described (Buenrostro et al., 2015).

TRAP

Translating ribosome RNA purification (TRAP) and RNA-seq were performed as described (Zhou et al., 2013). The ‘neighboring gene’ of an Ep300-bound region was defined as the gene whose TSS was closest to that region, and its expression was used in the comparison. In Figure 5A, no maximal distance limit was used; in Figure 5—figure supplement 1, a range of maximal distance thresholds was tested.
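
The ‘neighboring gene’ rule, with and without a maximal distance, can be sketched as a nearest-TSS lookup. The code below is a minimal illustration and not Homer's implementation; the function names, the use of the region center as the reference point, and the gene names and coordinates are all hypothetical.

```python
import bisect

def build_tss_index(genes):
    """genes: iterable of (chrom, tss_position, gene_name).

    Returns chrom -> (sorted TSS positions, gene names in the same order).
    """
    by_chrom = {}
    for chrom, pos, name in genes:
        by_chrom.setdefault(chrom, []).append((pos, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([p for p, _ in entries], [n for _, n in entries])
    return index

def neighboring_gene(index, chrom, region_center, max_distance=None):
    """Return (gene_name, distance) for the closest TSS, or None if there is
    no TSS on the chromosome or the closest one exceeds max_distance."""
    if chrom not in index:
        return None
    positions, names = index[chrom]
    i = bisect.bisect_left(positions, region_center)
    # Only the TSSs immediately left and right of the region can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - region_center))
    distance = abs(positions[best] - region_center)
    if max_distance is not None and distance > max_distance:
        return None
    return names[best], distance

# Hypothetical annotation, for illustration only.
index = build_tss_index([("chr1", 1000, "GeneA"), ("chr1", 50000, "GeneB")])
print(neighboring_gene(index, "chr1", 4000))
print(neighboring_gene(index, "chr1", 4000, max_distance=2000))
```

Leaving `max_distance` unset corresponds to the unlimited assignment, while passing a threshold leaves distant regions unassigned.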

Gene ontology analysis

Gene Ontology analysis was performed using GREAT (McLean et al., 2010). Results were ranked by the raw binomial P-value. To determine the fraction of terms relevant to a cell type or process, the top twenty biological process terms were manually inspected.
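
The ranking step can be sketched as follows: sort terms by raw binomial P-value, keep the top twenty, and report the fraction judged relevant. The relevance calls themselves came from manual inspection, so here they are supplied as an input set. The function name, term names, and P-values are hypothetical.

```python
def fraction_relevant(terms, relevant_names, top_n=20):
    """terms: list of (term_name, raw_binomial_p).
    relevant_names: set of term names judged relevant by manual inspection.
    Returns the fraction of the top_n terms (smallest P first) that are relevant.
    """
    ranked = sorted(terms, key=lambda term: term[1])
    top = ranked[:top_n]
    if not top:
        return 0.0
    hits = sum(1 for name, _ in top if name in relevant_names)
    return hits / len(top)

# Hypothetical terms and P-values, for illustration only.
terms = [("term_a", 1e-10), ("term_b", 1e-5), ("term_c", 1e-8)]
print(fraction_relevant(terms, {"term_a", "term_b"}, top_n=2))
```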

Motif analysis

Homer (Heinz et al., 2010) was used for motif scanning and for de novo motif analysis. Regions analyzed were 100 bp regions centered on the summits called by MACS2. The motif database used for motif scanning was the default Homer vertebrate motif database plus the heterodimer motifs described by Jolma et al. (2015). To select motifs for display, motifs from samples under consideration with negative ln p-value > 15 in any one sample were clustered using STAMP (Mahony et al., 2007). Non-redundant motifs were then manually selected.
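
Two of the rules above lend themselves to a short sketch: building the 100 bp window around each summit, and keeping motifs whose negative ln p-value exceeds 15 in at least one sample. The code below illustrates only these two steps (not Homer or STAMP). The function names, the clipping of windows at coordinate 0, and the motif scores are hypothetical.

```python
def summit_window(summit, width=100):
    """Half-open [start, end) window of `width` bp centered on a peak summit.
    The start is clipped at 0 for summits near the chromosome start."""
    half = width // 2
    return max(0, summit - half), summit + half

def select_motifs(neg_ln_p, threshold=15.0):
    """neg_ln_p: dict mapping motif -> {sample: negative ln p-value}.
    Returns motifs exceeding the threshold in at least one sample."""
    return sorted(
        motif
        for motif, by_sample in neg_ln_p.items()
        if any(value > threshold for value in by_sample.values())
    )

# Hypothetical scores, for illustration only.
scores = {
    "motif_1": {"sample_a": 20.0, "sample_b": 3.0},
    "motif_2": {"sample_a": 10.0, "sample_b": 14.9},
}
print(summit_window(500))
print(select_motifs(scores))
```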

Gene expression analysis

Translating ribosome RNA purification (TRAP) was performed as described (Zhou et al., 2013). E10.5 Rosa26fsTrap/+;Tie2Cre embryos were isolated from Swiss Webster strain pregnant females crossed to Rosa26fsTrap/Trap;Tie2Cre males. TRAP RNA from 50 embryos was pooled for RNA-seq. The polyadenylated RNA was purified by binding to oligo (dT) magnetic beads (ThermoFisher Scientific, 61005). RNA-seq libraries were prepared with ScriptSeq v2 kit (Illumina, SSV 21106) according to the manufacturer’s instructions. RNA-seq reads were aligned with TopHat (Trapnell et al., 2009) and expression levels were determined with htseq-count (Anders et al., 2015). Adult organ EC expression values were obtained from Nolan et al. (2013) (GEO GSE47067).

Comparison of enhancer prediction using different chromatin features

Enhancer regions in the VISTA database were used as the gold standard. Each chromatin feature that was analyzed was from E12.5 heart. Data sources are listed below under ‘Data sources’. For each chromatin factor, the average read intensity in each VISTA region with or without heart enhancer activity was calculated and used as the starting point for machine learning. We used the weighted KNN method as the classifier. The parameters used were: 10 nearest neighbors, Euclidean distance, and squared inverse weight. The classifier was trained and tested using 10-fold cross validation. The receiver operating characteristic (ROC) curve was used to evaluate the predictive accuracy of each factor, as measured by the area under the ROC curve (AUC).
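
The classification scheme described above can be sketched in plain Python: a k-nearest-neighbor score with squared inverse distance weights, held-out scores from 10-fold cross validation, and the AUC computed from those scores. This is a minimal stand-in written for illustration, not the software used in the study (which is not named in the text). The function names, fold assignment, and synthetic intensities are assumptions.

```python
import math
import random

def knn_score(train_x, train_y, query, k=10):
    """Share of squared-inverse-distance weight held by positive (label 1)
    neighbors among the k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    eps = 1e-12  # avoids division by zero when a neighbor coincides with the query
    weights = [(1.0 / (d * d + eps), y) for d, y in dists[:k]]
    total = sum(w for w, _ in weights)
    return sum(w for w, y in weights if y == 1) / total

def cross_validated_scores(x, y, folds=10, k=10, seed=0):
    """Return one held-out k-NN score per sample using `folds`-fold CV."""
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    scores = [0.0] * len(x)
    for f in range(folds):
        test_idx = order[f::folds]
        held_out = set(test_idx)
        train_x = [x[i] for i in order if i not in held_out]
        train_y = [y[i] for i in order if i not in held_out]
        for i in test_idx:
            scores[i] = knn_score(train_x, train_y, x[i], k)
    return scores

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic one-feature data (average read intensity), for illustration only.
rng = random.Random(1)
x = [(rng.gauss(5, 1),) for _ in range(40)] + [(rng.gauss(2, 1),) for _ in range(40)]
y = [1] * 40 + [0] * 40
print(roc_auc(y, cross_validated_scores(x, y)))
```

Each chromatin factor would be evaluated separately in this way, so that the resulting AUC values can be compared across factors.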

Conservation analysis

The phastCons score for the multiple alignments of 30 vertebrates to the mouse mm9 genome was used as the conservation score (Siepel et al., 2005). Sequences within Ep300 regions matching selected motif(s) were identified using FIMO (Grant et al., 2011) with default settings, and the average conservation score across the width of the sequence was used. To generate the random conservation background, 100 random motifs 12 bp wide (the average width of the motifs being analyzed) were used to scan the same regions.
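
The per-match conservation score is an average of per-base phastCons values across the width of each motif match. The sketch below shows that averaging step only; it does not run FIMO or generate the random motif background. The function name, the offset convention, and the score values are hypothetical.

```python
from statistics import mean

def motif_conservation(phastcons, matches):
    """Mean per-base conservation across each motif match.

    phastcons: list of per-base scores for one region (index 0 = region start)
    matches: list of (start, end) half-open offsets within the region
    Matches that fall outside the region are skipped.
    """
    averages = []
    for start, end in matches:
        window = phastcons[start:end]
        if window:
            averages.append(mean(window))
    return averages

# Hypothetical per-base scores for a short region, for illustration only.
region_scores = [0.0, 1.0, 1.0, 0.0]
print(motif_conservation(region_scores, [(1, 3), (0, 4)]))
```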

Literature searches

Heart and endothelial cell enhancers were identified by searching PubMed for ‘heart enhancer’ or ‘endothelial cell enhancer’, respectively, and then manually curating references for mouse or human enhancers with appropriate activity in murine transient transgenic assays.

Data sources

Sequencing data generated for this study are as follows: (1) ES_Ep300_bioChIP from Ep300fb/fb;Rosa26BirA/BirA ESCs; (2) FB_Ep300_bioChIP from E12.5 Ep300fb/+;Rosa26BirA/+ forebrains; (3) FB_H3K27ac_ChIP from E12.5 Ep300fb/+;Rosa26BirA/+ forebrains; (4) He_Ep300_bioChIP from E12.5 Ep300fb/+;Rosa26BirA/+ heart apex; (5) He_H3K27ac_ChIP from E12.5 Ep300fb/+;Rosa26BirA/+ heart apex; (6) EIIaCre_Ep300_bioChIP from E11.5 Ep300fb/+;Rosa26BirA/+ whole embryos; (7) Myf5Cre_Ep300_bioChIP from E13.5 Ep300fb/+;Rosa26fsBirA/+;Myf5Cre whole embryos; (8) Tie2Cre_Ep300_bioChIP from E11.5 Ep300fb/+;Rosa26fsBirA/+;Tie2Cre+ whole embryos; (9) Tie2Cre-TRAP and Tie2Cre-Input from E10.5 Rosa26fsTrap/+;Tie2Cre+ whole embryos; and (10) ATAC-seq from wild-type E12.5 ventricular cardiomyocytes. These data are available via the Gene Expression Omnibus (accession number GSE88789) or the Cardiovascular Development Consortium server (https://b2b.hci.utah.edu/gnomex/; login as guest).

The following public data sources were used for this study: ES_Ep300_ChIP, GSE36027; ES_Ep300_input, GSE36027; E12.5_Histone_input, GSE82850; E12.5_H3K27ac, GSE82449; E12.5_H3K27me3, GSE82448; E12.5_H3K36me3, GSE82970; E12.5_H3K4me1, GSE82697; E12.5_H3K4me2, GSE82667; E12.5_H3K4me3, GSE82882; E12.5_H3K9ac, GSE83056; E12.5_H3K9me3, GSE82787. Antibody Ep300 ChIP-seq data on forebrain (Visel et al., 2009b) and heart (Blow et al., 2010) are from GSE13845 and GSE22549, respectively. Adult organ EC gene expression data are from GEO GSE47067 (Nolan et al., 2013).

Accession numbers

Sequencing data generated for this study are available via the Gene Expression Omnibus (accession number GSE88789) or the Cardiovascular Development Consortium server (https://b2b.hci.utah.edu/gnomex/; login as guest; instructions for reviewer access are provided in a supplementary file).

References

  Tallquist MD, Weismann KE, Hellström M, Soriano P (2000) Early myotome specification regulates PDGFA expression and axial skeleton development. Development 127:5059–5070.

Decision letter

  1. Deepak Srivastava
    Reviewing Editor; Gladstone Institutes, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Mapping cell type-specific transcriptional enhancers using high affinity, lineage-specific p300 bioChIP-seq" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Kevin Struhl as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Benoit G Bruneau (Reviewer #1); Joshua D Wythe (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

We appreciate the novel in vivo biotinylation strategy, via generation of a flag-bio tagged allele of endogenous p300 in combination with a published R26-lsl-birA line and multiple tissue-specific Cre drivers, which demonstrated that p300 occupancy predicts lineage-specific transcription. Validation of the novel p300 biotinylation system in vivo through rigorous criteria demonstrates the utility of this reagent in predicting enhancer usage. We believe this is a high quality manuscript that will be of interest to the general community. However, there are a number of points that should be addressed, which are detailed below. Most importantly, please address the issue of how "neighboring genes" were determined and also describe the metrics by which enhancers were determined to be positive in transgenic embryos.

Essential revisions:

1) The authors state, after performing the TRAP-seq, that they compared the expression in ECs of "neighboring genes". However, we could not find within the manuscript any bioinformatic criteria explaining how they decided what a neighboring gene was. The genomic criteria (distance +/- from the TSS, for example) for suggesting a relationship of p300 occupancy and translating ribosomal-bound message should be clearly stated (and references to other papers using the same criteria would be appropriate, as well as comparison to genes that are actively translated without p300 enrichment in these tissues). Additionally, they should discuss many p300-T-fb enriched regions were near genes that weren't actively translating in the EC sample.

2) Overall, the criteria of 2 embryos (out of at least 5 PCR positive embryos) leading to a call as a "positive" enhancer lacks any interpretation or comment on the strength of the enhancer, as some of their elements (like Sox7, for instance) appear very weak and others appear somewhat non-specific (like Apln in the AER of the forelimb and hindlimb, or Egfl7_3 – which doesn't appear to be endothelial enriched or specific). Additionally, scoring an enhancer as positive when only 3 out of 22 showed EC activity seems arbitrary (especially for Notch1, where it is not even obvious that the 3rd embryo shown has activity). The transgenic results are somewhat puzzling, as the Sox7 enhancer (Figure 5) appears to clearly have signal in the atria on the wholemount view, but there is no myocardial signal in the section. Were these from 2 different samples? Perhaps a more nuanced interpretation of the results, and maybe commenting on the strength of the enhancers, would be useful. Finally, it is unclear what is meant by stating that 55% activity is consistent with the validation rate of other reports (as forebrain was 87%, midbrain 88%, and limb 88% in Visel et al., 2009; Blow et al. ranged between ~65-75%, etc.).

https://doi.org/10.7554/eLife.22039.037

Author response

Essential revisions:

1) The authors state, after performing the TRAP-seq, that they compared the expression in ECs of "neighboring genes". However, we could not find within the manuscript any bioinformatic criteria explaining how they decided what a neighboring gene was. The genomic criteria (distance +/- from the TSS, for example) for suggesting a relationship of p300 occupancy and translating ribosomal-bound message should be clearly stated (and references to other papers using the same criteria would be appropriate, as well as comparison to genes that are actively translated without p300 enrichment in these tissues). Additionally, they should discuss many p300-T-fb enriched regions were near genes that weren't actively translating in the EC sample.

The definition we used was the TSS closest to the p300 peak, without a maximal distance. This definition has been widely used, for example Blow et al., Nature Genetics, 2010. In the revision we also repeated the comparison with a series of different maximal distance thresholds and showed that the overall conclusion is not affected by the specific maximal threshold used (Figure 5—figure supplement 1).

As suggested, we also compared expression of p300-associated genes to those without a p300 peak. We found that p300-associated genes were more highly expressed.

Finally, we calculated the fraction of genes with and without associated p300 that were not detected within actively translating transcripts. We found that a higher fraction of p300-associated genes were detectably expressed. However, not all p300-associated genes were expressed, and not all expressed genes were p300-associated. This likely reflects multiple mechanisms of transcriptional activation, as well as imperfect rules used to associate genes to p300 regions.

2) Overall, the criteria of 2 embryos (out of at least 5 PCR positive embryos) leading to a call as a "positive" enhancer lacks any interpretation or comment on the strength of the enhancer, as some of their elements (like Sox7, for instance) appear very weak and others appear somewhat non-specific (like Apln in the AER of the forelimb and hindlimb, or Egfl7_3 – which doesn't appear to be endothelial enriched or specific). Additionally, scoring an enhancer as positive when only 3 out of 22 showed EC activity seems arbitrary (especially for Notch1, where it is not even obvious that the 3rd embryo shown has activity). The transgenic results are somewhat puzzling, as the Sox7 enhancer (Figure 5) appears to clearly have signal in the atria on the wholemount view, but there is no myocardial signal in the section. Were these from 2 different samples? Perhaps a more nuanced interpretation of the results, and maybe commenting on the strength of the enhancers, would be useful. Finally, it is unclear what is meant by stating that 55% activity is consistent with the validation rate of other reports (as forebrain was 87%, midbrain 88%, and limb 88% in Visel et al., 2009; Blow et al. ranged between ~65-75%, etc.).

We agree that scoring regions as positive or negative oversimplifies the activity, which also includes spatial pattern, specificity, and strength. For this reason, we provide the raw data on the regions, as both whole embryo images and sections. However, scoring regions as positive or negative simplifies summarizing the activity of many regions, and comparison to prior studies.

We adopted the definition of an active enhancer that has been used by the Pennacchio and Visel groups. In their series of studies, an active enhancer was defined as one that drove expression in the target tissue (excluding embryos with ubiquitous expression, which likely reflect an integration site effect). In other words, a “heart enhancer” is an enhancer that drives expression in the heart, without regard to its specificity for heart. The Pennacchio/Visel definition of a positive enhancer is one that shows expected activity in three or more independent embryos. This demonstrates reproducibility. In our original manuscript, we used a criterion of two or more independent embryos, but in the revision we adhered to the definition of three or more for calculation of the validation rate. We retained the description of the results for the additional two enhancers with reproducible activity in two but not three embryos, as we feel it likely that this activity is biologically meaningful. We rewrote the text to reflect these changes and to remove the statement that 55% activity is consistent with previously reported validation rates.

For Sox7, the enhancer had activity in the endocardium of the outflow tract, not the atria. This is accurately reflected in the tissue sections.

https://doi.org/10.7554/eLife.22039.038

Article and author information

Author details

  1. Pingzhu Zhou

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    PZ, Conceived of the approach, Designed and performed experiments, Analyzed data, and co-wrote and Edited the manuscript
    Contributed equally with
    Fei Gu
    Competing interests
    The authors declare that no competing interests exist.
  2. Fei Gu

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    FG, Analyzed data and Edited the manuscript
    Contributed equally with
    Pingzhu Zhou
    Competing interests
    The authors declare that no competing interests exist.
  3. Lina Zhang

    1. Department of Biochemistry, Institute of Basic Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
    Contribution
    LZ, Constructed plasmids and Acquired data on transient transgenic embryos
    Competing interests
    The authors declare that no competing interests exist.
  4. Brynn N Akerberg

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    BNA, Performed ATAC-seq
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-6470-6588
  5. Qing Ma

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    QM, Contributed to histological analysis of transient transgenic embryos
    Competing interests
    The authors declare that no competing interests exist.
  6. Kai Li

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    KL, Acquired data
    Competing interests
    The authors declare that no competing interests exist.
  7. Aibin He

    1. Institute of Molecular Medicine, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
    Contribution
    AH, Provided reagents and Expertise on bioChIP-seq
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-3489-2305
  8. Zhiqiang Lin

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    ZL, Acquired data
    Competing interests
    The authors declare that no competing interests exist.
  9. Sean M Stevens

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    Contribution
    SMS, Contributed to animal husbandry and Data acquisition
    Competing interests
    The authors declare that no competing interests exist.
  10. Bin Zhou

    1. State Key Laboratory of Cell Biology, CAS center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
    Contribution
    BZ, Generated the Ep300fb mice
    Competing interests
    The authors declare that no competing interests exist.
  11. William T Pu

    1. Department of Cardiology, Boston Children’s Hospital, Boston, United States
    2. Harvard Stem Cell Institute, Harvard University, Cambridge, United States
    Contribution
    WTP, Supervised the project, Designed experiments, Analyzed data, and co-wrote the manuscript
    For correspondence
    1. wpu@pulab.org
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-4551-8079

Funding

American Heart Association (12EIA8440003)

  • William T Pu

National Institutes of Health (U01HL098166)

  • William T Pu

National Institutes of Health (U01HL095712)

  • William T Pu

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

WTP was supported by funding from the National Heart, Lung, and Blood Institute (U01HL098166 and HL095712), by an Established Investigator Award from the American Heart Association, and by charitable donations from Dr. and Mrs Edwin A Boger. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Ethics

Animal experimentation: Animal experiments were performed under protocols approved by the Boston Children's Hospital Animal Care and Use Committee (protocols 13-08-2460R and 13-12-2601).

Reviewing Editor

  1. Deepak Srivastava, Reviewing Editor, Gladstone Institutes, United States

Publication history

  1. Received: October 2, 2016
  2. Accepted: January 23, 2017
  3. Accepted Manuscript published: January 25, 2017 (version 1)
  4. Version of Record published: February 7, 2017 (version 2)
  5. Version of Record updated: April 11, 2017 (version 3)

Copyright

© 2017, Zhou et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
