Long-range cis-regulatory elements such as enhancers coordinate cell-specific transcriptional programmes by engaging in DNA looping interactions with target promoters. Deciphering the interplay between the promoter connectivity and activity of cis-regulatory elements during lineage commitment is crucial for understanding developmental transcriptional control. Here, we use Promoter Capture Hi-C to generate a high-resolution atlas of chromosomal interactions involving ~22,000 gene promoters in human pluripotent and lineage-committed cells, identifying putative target genes for known and predicted enhancer elements. We reveal extensive dynamics of cis-regulatory contacts upon lineage commitment, including the acquisition and loss of promoter interactions. This spatial rewiring occurs preferentially with predicted changes in the activity of cis-regulatory elements and is associated with changes in target gene expression. Our results provide a global and integrated view of promoter interactome dynamics during lineage commitment of human pluripotent cells.https://doi.org/10.7554/eLife.21926.001
Virtually every cell in the body contains the same set of DNA, which encodes thousands of genes. The activities of these genes vary between different types of cells and at different points in time. As a result, our DNA contains a complex array of molecular switches that instruct genes to switch on and off at the right time and in the right cells. These molecular switches, termed regulatory elements, are often a long way away from the genes that they control, and this can make it difficult to find out which switch controls which genes.
DNA is made up of several different building blocks known as bases and the order of these bases encodes specific information about the gene. Every human cell contains approximately two meters of DNA, which is highly folded in the cell nucleus. This three-dimensional folding allows regions that are far apart on the DNA thread to physically contact each other. To reach the genes they control, regulatory elements form loops on the DNA that are near-impossible to predict from looking at the sequence of bases alone. Mapping the locations of these loops can reveal the hidden circuitry within our DNA and help us to understand how unwanted changes (mutations) within regulatory elements may cause disease.
Freire-Pritchett, Schoenfelder et al. set out to reveal how loops between genes and their regulatory elements change as the stem cells specialise into immature brain cells. The experiments show that the pattern of DNA loops is extensively altered after the stem cells specialise into brain cells, that is, some loops are lost and new ones form. Furthermore, the regulatory elements themselves were often toggled between “on” and “off” states. These two processes tend to occur together, so that new loops are formed at the same time as the switch is activated.
Freire-Pritchett, Schoenfelder et al. also show that individual genes are often connected to many different regulatory elements. The next step is to understand how these multiple connections work together to coordinate gene activity, and whether this information could be used to control how stem cells specialise. This knowledge may lead to the development of stem cell-based therapies for stroke, Parkinson’s disease and other conditions.https://doi.org/10.7554/eLife.21926.002
Cell fate decisions are associated with profound changes in chromatin organisation, which underlie the activation of lineage-specific and the silencing of lineage-inappropriate genes (Buecker and Wysocka, 2012; Bulger and Groudine, 2010; Calo and Wysocka, 2013; Hallikas et al., 2006; Ong and Corces, 2012). Cis-regulatory elements such as transcriptional enhancers play a key role in this process by integrating regulatory inputs from intrinsic and extracellular cues, and by mediating the recruitment of core activator and repressor complexes (Pennacchio et al., 2013; Shlyueva et al., 2014; Spitz and Furlong, 2012). The definition of chromatin signatures has enabled the genome-wide identification of enhancer elements across multiple human cell types (ENCODE Project Consortium, 2012; Heintzman et al., 2007, 2009; Pennacchio et al., 2006; Rada-Iglesias et al., 2011; Kundaje et al., 2015). Chromatin states can also provide a robust readout of cis-regulatory activity associated with poised and active enhancers and have been used to show that widespread changes in enhancer position and activity occur upon cell fate decisions such as the lineage commitment of pluripotent cells (Creyghton et al., 2010; Rada-Iglesias et al., 2011; Zentner et al., 2011).
Cis-regulatory elements are often considerable distances away from their target gene promoters and may not control their nearest genes (Carvajal et al., 2001; Jeong et al., 2006; Marinić et al., 2013; Pennacchio et al., 2006; Ruf et al., 2011; Sagai et al., 2005; Spitz et al., 2003). It is generally accepted that this long-range action is facilitated by DNA-looping interactions (Pennacchio et al., 2013; Shlyueva et al., 2014). However, specific determinants of chromosomal interactions are still not fully understood, which presents challenges for the high-confidence prediction of regulatory interactions from sequence and epigenetic information (Mora et al., 2016; Roy et al., 2015; Shlyueva et al., 2014; Whalen et al., 2016). As a result, the target genes of most cis-regulatory elements remain unknown. Furthermore, while it is generally accepted that many genes are controlled by multiple regulatory elements (Barolo, 2012; Miguel-Escalada et al., 2015) our understanding of multi-modular gene regulation remains limited, particularly in the context of mammalian development and stem cell differentiation.
Over the last decade, chromosome conformation capture (3C) and derived methods have enabled the biochemical mapping of looping interactions to offer new insights into their architecture across different cell types (Dekker et al., 2013; de Laat and Duboule, 2013; Schmitt et al., 2016). In particular, Hi-C has allowed genome-wide characterisation of higher order chromatin dynamics during differentiation at the level of contact domains, including A/B compartments and topologically associated domains (TADs) (Fraser et al., 2015; Dixon et al., 2015). The complexity of Hi-C samples creates challenges for a comprehensive identification of individual enhancer-promoter loops using this technology. However, analyses focusing on candidate-interacting regions or those bound by specific proteins (such as cohesin and RNA polymerase II) have made it possible to detect subsets of promoter-enhancer interactions at high resolution (Heidari et al., 2014; Li et al., 2015; Sanyal et al., 2012) and delineate their dynamics during cell differentiation and reprogramming. These studies provided evidence of interactions associated with transcriptional changes upon lineage commitment (Denholtz et al., 2013; Kieffer-Kwon et al., 2013; Phillips-Cremins et al., 2013; Zhang et al., 2013), as well as revealed interactions formed in anticipation of changes in gene activity (Apostolou et al., 2013; Ghavi-Helm et al., 2014; Wei et al., 2013). However, despite these advances, the global and unbiased high-resolution mapping of promoter cis-regulatory interactions that form and remodel during development and stem cell differentiation is still lacking. This hampers an integrated understanding of the cis-regulatory logic underlying transcriptional decisions during lineage commitment.
Recently, we developed Promoter Capture Hi-C that uses sequence capture to enrich Hi-C libraries for interactions involving the promoters of most annotated genes, providing a global view on promoter interactions that is independent of the activity of interacting regions and identity of proteins recruited to them (Mifsud et al., 2015; Schoenfelder et al., 2015a). Here, we use PCHi-C in human embryonic stem cells (ESCs) and ESC-derived neuroectodermal cells (NECs) (Bajpai et al., 2009) to create a high-resolution resource of promoter contacts and their dynamics during early lineage commitment in the context of extensive chromatin changes that occur at the interacting cis-regulatory regions as the cells differentiate (Rada-Iglesias et al., 2011). Our large-scale dataset links thousands of known and predicted enhancer elements with their putative target genes in human pluripotent and early lineage-committed cells, including those known to drive tissue-restricted reporter gene expression in transgene assays. We integrate the promoter interacting regions of each gene to define cis-regulatory units (CRUs) that provide a view of multi-modular gene regulation. We show that CRUs undergo extensive reorganisation during lineage commitment that involves both the ‘rewiring’ (acquisition or loss) of promoter interactions, as well as chromatin state changes at pre-existing interactions. Importantly, we demonstrate that this reorganisation is associated with changes in target gene expression, thereby providing insights into the transcriptional control of early human development.
We used PCHi-C to profile the interactomes of 21,841 promoters in human ESCs and NECs (Figure 1A). We generated NECs using an established protocol (Rada-Iglesias et al., 2011) (Figure 1—figure supplement 1A) and confirmed efficient differentiation by flow cytometry analysis and RNA-sequencing (Figure 1—figure supplement 1B,C). PCHi-C data normalisation and signal detection using the CHiCAGO pipeline (Cairns et al., 2016) identified 75,795 significant cis-interactions between promoters and other genomic regions in ESCs and 75,624 in NECs. In addition, approximately 300 significant trans-interactions were detected in each cell type. As examples of this rich dataset, high-confidence interactions are shown for the SOX2 and PAX6 promoters (Figure 1B and Figure 1—figure supplement 2A). These examples illustrate the multiple promoter-contacts observed, alongside the conventional Hi-C profiles additionally generated in this study that reveal higher-order genome topology over the same region. Overall, PCHi-C samples showed an 11 to 15-fold enrichment for promoter-containing interactions over conventional Hi-C. This data resource provides a global, high-resolution atlas of chromosomal interactions in human pluripotent and lineage-committed cells. Processed datasets have been made available through Open Science Framework (http://osf.io/sdbg4), and raw sequencing reads have been deposited to Gene Expression Omnibus (accession GSE86821).
To gain insight into the chromatin properties of the promoter-containing interactions, we integrated the PCHi-C data with published genome-wide histone modification profiles in ESCs and NECs (Rada-Iglesias et al., 2011). In both cell types, promoter-interacting regions (PIRs) were significantly enriched for histone marks that are associated with regulatory functions (Figure 1C), including H3 lysine 4 monomethylation (H3K4me1) and H3 lysine 27 acetylation (H3K27ac), which can identify enhancers in human cell types, as well as H3 lysine four trimethylation (H3K4me3), which is associated with transcriptional activation, and H3 lysine 27 trimethylation (H3K27me3), which is associated with Polycomb-mediated transcriptional repression (Di Croce and Helin, 2013; Heintzman et al., 2007, 2009). PIRs were also significantly enriched for sites bound by the architectural protein CTCF (Figure 1—figure supplement 2B; based on ENCODE data available for ESCs only), consistent with previous observations in other cell types (Jin et al., 2013; Phillips-Cremins et al., 2013; Sanyal et al., 2012).
We used ChromHMM (Ernst and Kellis, 2012) to integrate these histone marks and to define four combinatorial chromatin states in both ESCs and NECs, as follows: active (characterised by H3K4me3 and/or H3K27ac); Polycomb-associated (H3K27me3); poised (H3K4 methylation and H3K27me3); and background (no detectable signal for the tested histone modifications) (Figure 1—figure supplement 3A; see Materials and methods for details). Overall, we detected just under 20,000 PIRs in each cell type that harboured either active (12,250 in ESCs and 7510 in NECs), Polycomb-associated (3505 in ESCs and 5856 in NECs) or poised (2274 in ESCs and 4262 in NECs) chromatin state signatures, connecting a large set of putative regulatory sequences in human pluripotent and lineage committed cells to their target promoters. In addition, 25,727 PIRs in ESCs and 20,016 PIRs in NECs were in the background state.
The chromatin states of several example promoters, including those for the POU5F1, PRDM14 and CHD7 genes, together with each of their respective PIRs, are shown in Figure 1—figure supplement 3B. When analysing the whole dataset, we found a significant concordance between the chromatin states at promoters and their PIRs (Figure 1D), which is in line with previous studies in other human cell types (Jin et al., 2013; Mifsud et al., 2015; Sanyal et al., 2012) and provides validation of our dataset. Notably, poised and Polycomb-associated promoters showed similar interaction preferences for PIRs in either of these two states (Figure 1D). This finding suggests that poised and Polycomb-associated regions are broadly interchangeable in terms of their interaction affinities, which is consistent with a key role for Polycomb-group proteins in mediating interactions in the poised state (Schoenfelder et al., 2015b).
Taken together, these data provide a comprehensive resource that links many thousands of known and predicted regulatory elements with their putative target genes and will enable the investigation of regulatory contacts during human lineage commitment.
The enrichment of PIRs for specific chromatin regulatory features points to the presence of functional enhancer elements at these regions that could potentially provide inputs to the promoters they contact. To assess the enhancer activity of the identified PIRs, we examined whether they were known to efficiently drive reporter gene expression in embryonic day 11.5 mouse embryos based on information from the VISTA Enhancer Browser (Visel et al., 2007). As an initial example, we focused on the 39 PIRs detected in NECs that interact with the promoter of the neural transcription factor POU3F2. Strikingly, four out of the five POU3F2 PIRs tested experimentally in VISTA transgenic assays showed reporter activity exclusively in neural tissues, and one PIR was inactive (Figure 2A). Furthermore, the mRNA expression pattern of mouse Pou3f2 broadly matched the combined tissue-restricted pattern of the tested human POU3F2 PIRs (Figure 2A). These results highlight how PCHi-C can contribute to our understanding of the cis-regulatory networks for key developmental genes.
We next examined all putative enhancer elements and their PCHi-C-identified promoter targets. Overall, 219 PIRs in ESCs and 267 PIRs in NECs overlapped VISTA-annotated human elements (Supplementary file 2). Notably, we found that NEC PIRs were strongly enriched for sequences that could drive reporter-gene activity in neural tissues and in other neuroectodermal derivatives, such as the neural tube and cranial structures (z-score = 11, Figure 2B and Figure 2—figure supplement 1A). In contrast, ESC PIRs were enriched for sequences active in neural (z-score = 6.7) and non-neural tissues (z-score = 4.5) at similar levels (Figure 2B and Figure 2—figure supplement 1A). Interestingly, the enrichment of PIRs with neural enhancer activity in NECs relative to their enrichment in ESCs was even more pronounced when we focused on PIRs in an active chromatin state (Figure 2—figure supplement 1B). Collectively, these results validate the function of several hundreds of PIRs as cell-type-specific developmental enhancers.
We next sought to link enhancers documented in the VISTA Enhancer Browser to their putative target genes on the basis of PCHi-C data. We detected the interactions of 267 VISTA-annotated human enhancers with 277 target gene promoters in NECs (Supplementary file 2). Of these, 122 PIRs (46%) interacted with their nearest gene, which is consistent with their current annotation in the VISTA Enhancer Browser. The remaining PIRs, however, did not interact with their nearest gene in NECs, but engaged with more distal promoters (Supplementary file 2). Figure 2C shows PCHi-C-based reassignment of enhancer targets for several examples of key neural regulators including SOX2, SOX4, and FZD3 (Figure 2C), and the full results are listed in Supplementary file 2.
Taken together, these findings provide a functional validation of the detected human PIRs, and identify the putative promoter targets of multiple known enhancers.
We found interacting promoters to engage a median of four PIRs (Figure 3A), consistent with findings in other human cell types (Jin et al., 2013; Sanyal et al., 2012). To obtain an integrated view of promoter interactions, we considered PIRs connected to each promoter to jointly form a ‘cis-regulatory unit’ (CRU, Figure 3B). Focusing on protein-coding genes, and considering all promoters associated with at least one PIR, we defined 9008 CRUs in ESCs and 9361 in NECs, and studied their localisation, chromatin properties and dynamics during cell lineage commitment.
CRUs spanned a median of ~230 kb (with a range of 1 kb-200Mb) in both cell types (Figure 3C). We assessed their localisation with respect to higher order features of chromosome architecture including TADs and Insulated Neighborhoods (INs) (Dixon et al., 2012; Ji et al., 2016; Nora et al., 2012; Sexton et al., 2012). We defined TADs in ESCs and NECs using Hi-C data for these cells generated as part of this study (see Materials and methods). Overall, ~75% of CRUs were fully contained within a TAD in ESCs and NECs, which was significantly higher than expected by random (Figure 3D–G and Figure 3—figure supplement 1D,E). In the remaining ~25% of CRUs, either some or all PIRs localised outside of the promoter-harbouring TAD (Figure 3H and Figure 3—figure supplement 1D). We found that TAD boundaries crossed by promoter interactions were generally weaker than non-crossed boundaries (Wilcoxon test p-value=1.8e-14; Figure 3—figure supplement 1H). However, the ranges of strength scores for ‘crossed’ and ‘non-crossed’ TAD boundaries were highly overlapping, and even some of the strongest boundaries were penetrable to interactions (Figure 3—figure supplement 1H).
For INs, we used the published genomic coordinates (available for ESCs only) that were defined on the basis of cohesin ChIA-PET and CTCF-binding data (Ji et al., 2016). Just under 30% of CRUs were fully contained within IN boundaries, and this proportion increased to ~45% when considering the largest span of each overlapping set of INs as a single unit (Figure 3I and Figure 3—figure supplement 1I). These numbers significantly exceeded the proportions expected at random (Figure 3I and Figure 3—figure supplement 1I), but at the same time, also provided abundant examples of IN-spanning CRUs (Figure 3G,H). Taken together, these results suggest that CRUs are partially constrained by, but not fully contained within, higher order topological structures such as TADs and INs.
To investigate the potential regulatory features of CRUs, we first characterised their chromatin properties by considering the proportion of PIRs in each chromatin state within a CRU. Applying hierarchical clustering based on this property, we obtained eight distinct clusters of CRUs in both ESCs (Figure 4A) and NECs (Figure 4—figure supplement 1), corresponding to different combinations of PIR chromatin states within CRUs. We found that CRUs within three prevalent clusters contained PIRs in one predominant, non-background, chromatin state (clusters 1–3; Figure 4A and Figure 4—figure supplement 1A). In contrast to these ‘uniform’ CRUs, 18% of CRUs in ESCs and 24% in NECs contained combinations of PIRs in active, poised and Polycomb-associated chromatin states (clusters 4–7; Figure 4A and Figure 4—figure supplement 1). Finally, CRUs in cluster 8 contained PIRs exclusively in the background state (Figure 4A and Figure 4–figure supplement 1). Examples of genes in ESCs assigned to the different CRU clusters are shown in Figure 4B and Figure 4—figure supplement 2. Notably, the chromatin state of each promoter generally matched that of the most prevalent CRU chromatin state (Figure 4A and Figure 4—figure supplement 1). Overall, this classification provides a framework for exploring CRU properties.
We set out to investigate CRU chromatin state transitions on ESC to NEC differentiation. For this analysis, and in each cell type, we classified CRUs into either single-state active (containing active, and possibly also background-state, PIRs), single-state repressed (containing poised and/or Polycomb-associated, and possibly also background-state PIRs), or background (containing only background-state PIRs). CRUs containing a combination of both active and repressed (Polycomb-associated/poised) PIRs were classified as dual-state. We found that 65% of the single-state CRUs in ESCs remained single-state CRUs in NECs, although approximately half of them switched their state (Figure 5A). In addition, similar proportions of CRUs lost (11%) and acquired (13%) a dual-state configuration on ESC to NEC differentiation. These findings demonstrate that considerable reorganisation of CRUs occurs during lineage commitment.
CRU state transitions associated significantly with changes in the expression of their target genes (p-value<0.005, Fisher’s exact test; Figure 5B,C). In particular, genes that were transcriptionally upregulated upon ESC differentiation preferentially gained an active single-state in NECs, either through switching the chromatin state of a single-state CRU (Figure 5B) or through resolving a dual-state CRU (Figure 5C). Examples of CRUs undergoing each scenario include RGMB and MAP2, which are transcriptionally upregulated in NECs (Figure 5D,E). Pronounced chromatin changes were also detected at the CRUs of genes downregulated upon differentiation, including a loss of the active single-state and/or a transition to the repressed single-state ( Figure 5B,C; example shown in Figure 5F). Taken together, these results suggest that the modulation of CRU chromatin state is associated with transcriptional changes upon ESC differentiation. This modulation might potentially underlie many transcriptional changes in early lineage commitment.
To investigate the underlying processes that drive changes in CRU organisation during cell lineage commitment, we studied the dynamics of promoter interactions and chromatin states at the individual PIRs. We refer to changes in PIR connectivity as ‘rewiring’, and to chromatin state changes at PIRs as ‘recolouring’, and note that they do not need to be mutually exclusive (Figure 6A). To distinguish between interactions that are rewired and retained on ESC differentiation at high confidence, we applied additional filters to the PCHi-C data, resulting in 1153 rewired (present in only one cell type) and 1258 retained (present in both ESCs and NECs) interactions (see Materials and methods for details).
Importantly, we found that the co-occurrence of rewiring and recolouring interactions on ESC differentiation was significantly more common than expected at random (Figure 6B, p-value<0.001, Fisher’s exact test). Specifically, new interactions that were gained by NECs preferentially acquired the active state, or transitioned from the background to repressed state (Figure 6B, bottom row). Interactions that were lost on ESC differentiation were enriched for PIRs that transitioned from the active to poised/Polycomb-associated states, as well as for those switching to the background state (Figure 6B, middle row). Notably, the vast majority of rewiring events (99.7%) were not associated with larger-scale A/B compartment dynamics (not shown). Together, these observations indicate that lineage commitment associates with concerted changes in the connectivity and chromatin state of regulatory regions.
Interactions at NR2F1 (Figure 6C) and ZSCAN2 (Figure 6—figure supplement 1A) exemplify the preferential co-occurrence of rewiring and recolouring events, with interactions present in the cell type in which the respective PIR is in the active state. However, we also found examples of PIR rewiring that showed unchanged chromatin states in both cell types, such as those at the JAG1 and HAPL3 genes (Figure 6D and Figure 6—figure supplement 1B). Finally, we observed that 25% of PIRs that were retained in both cell types undergo chromatin state recolouring (Figure 6E, red segment). These regions included, for example, PIRs associated with the IRX3 and RAB3B promoters (Figure 6F and Figure 6—figure supplement 1C).
We asked how rewiring and recolouring events at PIRs contribute to gene expression dynamics on ESC to NEC differentiation. We found that the loss or gain of interactions with active-state PIRs associated significantly with changes in gene expression (Figure 6G), suggesting their functional contribution to transcriptional control. Notably, gene expression changes were detected at retained and recoloured PIRs (Figure 6G, left panel), and also when active-state PIRs were lost or gained through rewiring (Figure 6G, right panel).
Taken together, these findings demonstrate that chromatin state changes and rewiring of interactions at PIRs contribute jointly to transcriptional regulation. Furthermore, our results show that promoter interaction dynamics preferentially co-occur with chromatin state dynamics upon cell lineage commitment.
Our study presents an atlas of promoter interactions in human pluripotent and early lineage-committed cells, and offers new insights into the association between genome organisation and gene regulation. The high resolution of PCHi-C has enabled us to detect individual promoter-associated loops at a single restriction enzyme fragment resolution. We find that promoter-interacting regions in both cell types harbour multitudes of previously known and putative enhancer elements, which we link with their physically associated target genes. While the identified connections are predictive of regulatory relationships, it is important to note that the current data are correlative and will require functional validation using targeted genetic approaches and reporter assays. In addition to active enhancers, we find extensive promoter connectivity to regions associated with Polycomb-mediated repression and poising, reinforcing the role of Polycomb-group proteins in controlling chromosomal topology at transcriptionally inactive genes (Entrevan et al., 2016; Li et al., 2015; Schoenfelder et al., 2015b; Vieux-Rochas et al., 2015). Consistent with previous observations (Sanyal et al., 2012), we also detect large numbers of interactions between promoters of both active and inactive genes, and regions devoid of chromatin features. It is possible that such interactions are structural, rather than play gene regulatory roles. However, a regulatory function for some ‘unmarked’ PIRs also cannot be ruled out as recent mutagenesis experiments have identified functional elements that lack 'classic' chromatin annotations (Pradeepa et al., 2016; Rajagopal et al., 2016).
The high-resolution promoter-interaction information has enabled the identification of cis-regulatory units (CRUs) as sets of interactions connected to the same promoter. Taking the view of CRUs, we consider jointly the dynamics of chromatin states and interactions as ESCs differentiate, and assess the potential contribution of these processes to changes in gene expression during development. We observe that CRUs reconfigure extensively upon cell differentiation. These include the CRUs of ESC- and NEC-specific genes, for which CRU reconfiguration associates with transcriptional changes upon differentiation, as well as the CRUs of genes that are not expressed in either cell type, consistent with the model of progressive chromatin changes at lineage-inappropriate genes during lineage commitment (Spivakov and Fisher, 2007).
Previous studies on candidate loci have proposed that cell-state changes are associated with two predominant models of enhancer–promoter interaction dynamic that have been termed instructive and permissive (de Laat and Duboule, 2013). Instructive interactions are established de novo upon cell state change and are concomitant with target gene activity. In contrast, permissive interactions are already in place before the gene activation occurs and may therefore contribute to enhancer priming. Examples of instructive and permissive interactions have been described in pluripotent cell reprogramming and differentiation (Apostolou et al., 2013; Denholtz et al., 2013; Phillips-Cremins et al., 2013; Wei et al., 2013; Zhang et al., 2013), but little was known on a global scale about which model of enhancer-promoter interactions is predominant during lineage commitment. Here, we show that developmental changes at CRUs involve both the rewiring of ‘instructive’ interactions and the recolouring of the chromatin state of ‘permissive’ interacting regions. Notably, we find that these two processes tend to occur hand in hand, with the strongest association occurring between cell-type-specific promoter interactions and the active state of the respective PIRs. Importantly, CRU chromatin dynamics (at both rewired and preformed interactions) associates with consistent changes in gene expression, suggesting that both mechanisms are functionally important in mediating lineage-specific transcriptional programmes. The exact determinants of ‘permissive’ versus ‘instructive’ interactions remain to be elucidated and may depend on the identity of cis-acting factors recruited to the regulatory regions, as well as on local chromatin environments.
The CRU view provides an opportunity to consider multi-modular gene regulation in early human development that has hitherto been studied on a limited number of genes, predominantly in model organisms (Barolo, 2012; Cannavò et al., 2016; Hong et al., 2008). The ‘single-state’ architecture that we detect at the majority of CRUs is in line with observations of ‘shadow enhancers’ with overlapping activities in Drosophila (Hong et al., 2008). It has been suggested that this cis-regulatory organisation ensures the robustness of gene regulation and can buffer the effects of deleterious sequence variation, as well as providing opportunities for evolutionary innovation (Barolo, 2012; Cannavò et al., 2016; Hong et al., 2008; Perry et al., 2010).
‘Dual-state’ CRUs, although representing a relative minority of the CRUs we analysed, offer additional insights into signal integration at promoters. Specifically, the fact that the chromatin state of the promoter largely associates with the predominant chromatin state of the connected PIRs suggests that promoters may integrate signals from remote elements based on ‘majority vote logic’. Mechanistically, this logic may be a consequence of largely independent enhancer action (potentially at both single-state and dual-state CRUs) that is consistent with the ‘hit-and-run’ model of transcriptional regulation (Schaffner, 1988; Varala et al., 2015), and provides a flexible way to fine-tune the expression of multi-enhancer genes (Guerrero et al., 2010; Lagha et al., 2012). However, this model also does not preclude the possibility that promoter chromatin states at ‘dual-state’ CRUs undergo a continuous turnover depending on the state of the PIR they contact. In this case, the observed ‘majority-vote’ promoter chromatin states would correspond to the predominant state detected at the population level.
Theoretically, the generally independent enhancer action also enables activation signals from individual elements to quantitatively ‘add up’ (at least to some extent) to promote stronger transcriptional outputs (Arnold et al., 2013; Bothma et al., 2015; Lam et al., 2015; Spivakov, 2014). Our observation that the resolution of dual-state CRUs toward a uniformly active state generally results in increased expression (and vice versa) supports this model. However, analyses in Drosophila have identified exceptions to additive enhancer activity (Bothma et al., 2015) and have provided examples of enhancers that activate more than one promoter in a coordinated fashion (Fukaya et al., 2016), which is not immediately expected from the ‘hit-and-run’ looping model. Finally, there is also a possibility that multiple enhancers are jointly engaged in ‘chromatin hubs’ with promoters, rather than acting individually (Hanscombe et al., 1991; Jiang et al., 2016; Patrinos et al., 2004; Tolhuis et al., 2002; Wijgerde et al., 1995). These mechanistic questions go beyond the capabilities of Hi-C-based analyses of cell populations, and as such it is possible that multiple promoter interactions detected within a CRU take place either concurrently or simultaneously. The emerging studies at the single-molecule level (such as [Bartman et al., 2016; Fukaya et al., 2016]) will likely shed further light on the molecular mechanisms that underpin the principles of CRU organisation.
Promoter – enhancer interactions are vitally important for gene regulation and their disruption may lead to pronounced developmental abnormalities (Epstein, 2009). The high-resolution resource of the promoter-interaction landscape in pluripotent and early lineage-committed cells presented here, therefore, provides a stepping stone to understanding the logic of gene regulation and its aberrations during human embryogenesis.
ESCs (H9/WA09; obtained from WiCell (Madison, WI); RRID:CVCL_9773) were cultured at 37°C in 5% CO2 in air in Pluripro media and matrix (Cell Guidance Systems (Cambridge, UK)). Authentication of ESCs was achieved by confirming the expression of pluripotency genes and protein markers, and by SNP analysis of sequencing data. ESCs were routinely verified as mycoplasma-free using a PCR-based assay (Sigma (St. Louis, MO)). The H9/WA09 line is not on the list of commonly misidentified cell lines (International Cell Line Authentication Committee). ESCs were differentiated into NECs using a previously described protocol (Rada-Iglesias et al., 2011) and samples were harvested on day 7.
Following dissociation with accutase, ESCs and NECs were stained on ice for 45 min with CD326-AF647 (BioLegend (London, UK), Cat# 324212, RRID:AB_756086; 5 µL per million cells) and CD56-PE (BD Biosciences (San Jose, CA), Cat# 345812, RRID:AB_2629216; 20 µL per million cells) antibodies in 100 µl PBS containing 2% FBS. After washing, DAPI was included at a final concentration of 5 µl/mL for live/dead cell discrimination, and flow cytometry analysis was performed using a BD LSRFortessa with subsequent data analysis using FlowJo V10.1.
Hi-C and Promoter CHi-C libraries were generated essentially as described (Mifsud et al., 2015; Schoenfelder et al., 2015a), with minor modifications. 3 to 4 × 107 cells (ESCs or NECs) were fixed in 2% formaldehyde (Agar Scientific (Stansted, UK)) for 10 min, after which the reaction was quenched with ice-cold glycine (0.125 M final concentration). Cells were collected by centrifugation (400 x g for 10 min at 4°C), and washed once with PBS (50 ml). After another centrifugation step (400 x g for 10 min at 4°C), the supernatant was completely removed and the cell pellets were immediately frozen in liquid nitrogen and stored at −80°C. After thawing, the cell pellets were incubated in 50 ml ice-cold lysis buffer (10 mM Tris-HCl pH 8, 10 mM NaCl, 0.2% Igepal CA-630, protease inhibitor cocktail (Roche (Basel, Switzerland)) for 30 min on ice. After centrifugation to pellet the cell nuclei (650 x g for 5 min at 4°C), nuclei were washed once with 1.25 x NEBuffer 2. The nuclei were then resuspended in 1.25 x NEBuffer 2, SDS was added (0.3% final concentration) and the nuclei were incubated at 37°C for 1 hr with agitation (950 rpm). Triton X-100 was added to a final concentration of 1.7% and the nuclei were incubated at 37°C for 1 hr with agitation (950 rpm). Restriction digest was performed overnight at 37°C with agitation (950 rpm) with HindIII (NEB; 1500 units per 7 million cells). Using biotin-14-dATP (Life Technologies (Carlsbad, CA)), dCTP, dGTP and dTTP (all at a final concentration of 30 µM), the HindIII restriction sites were then filled in with Klenow (NEB (Ipswich, MA)) for 75 min at 37°C. After addition of SDS (1.42% final concentration) and incubation at 65°C with agitation (950 rpm) for 20 min, ligation was performed for 4 hr at 16°C (50 units T4 DNA ligase (Life Technologies) per 7 million cells starting material) in a total volume of 8.2 ml ligation buffer (50 mM Tris-HCl, 10 mM MgCl2, 1 mM ATP, 10 mM DTT, 100 µg/ml BSA, 0.9% Triton X-100) per 7 million cells starting material. After ligation, reverse crosslinking (65°C overnight in the presence of Proteinase K (Roche)) was followed by RNase A (Roche) treatment and two sequential phenol/chloroform extractions. After DNA precipitation (sodium acetate 3 M pH 5.2 (1/10 volume) and ethanol (2.5 x volumes)) overnight at −20°C, the DNA was spun down (centrifugation 3200 x g for 30 min at 4°C). The pellets were resuspended in 400 µl TLE (10 mM Tris-HCl pH 8.0; 0.1 mM EDTA), and transferred to 1.5 ml eppendorf tubes. After another phenol/chloroform extraction and DNA precipitation overnight at −20°C, the pellets were washed three times with 70% ethanol, and the DNA concentration was determined using Quant-iT Pico Green (Life Technologies). The efficiency of biotin incorporation was assayed by amplifying a 3C ligation product (primers available upon request), followed by digest with HindIII or NheI.
To remove biotin from non-ligated fragment ends, 40 µg of Hi-C library DNA were incubated with T4 DNA polymerase (NEB) for 4 hr at 20°C, followed by phenol/chloroform purification and DNA precipitation overnight at −20°C. After a wash with 70% ethanol, sonication was carried out to generate DNA fragments with a size peak around 400 bp (Covaris E220 settings: duty factor: 10%; peak incident power: 140W; cycles per burst: 200; time: 55 s). After end repair (T4 DNA polymerase, T4 DNA polynucleotide kinase, Klenow (all NEB) in the presence of dNTPs in ligation buffer (NEB)) for 30 min at room temperature, the DNA was purified (Qiagen (Hilden, Germany) PCR purification kit). dATP was added with Klenow exo- (NEB) for 30 min at 37°C, after which the enzyme was heat-inactivated (20 min at 65°C). A double size selection using AMPure XP beads (Beckman Coulter, Brea, CA) was performed: first, the ratio of AMPure XP beads solution volume to DNA sample volume was adjusted to 0.6:1. After incubation for 15 min at room temperature, the sample was transferred to a magnetic separator (DynaMag-2 magnet; Life Technologies), and the supernatant was transferred to a new eppendorf tube, while the beads were discarded. The ratio of AMPure XP beads solution volume to DNA sample volume was then adjusted to 0.9:1 final. After incubation for 15 min at room temperature, the sample was transferred to a magnet (DynaMag-2 magnet; Life Technologies). Following two washes with 70% ethanol, the DNA was eluted in 100 µl of TLE (10 mM Tris-HCl pH 8.0; 0.1 mM EDTA). Biotinylated ligation products were isolated using MyOne Streptavidin C1 Dynabeads (Life Technologies) on a DynaMag-2 magnet (Life Technologies) in binding buffer (5 mM Tris pH8, 0.5 mM EDTA, 1 M NaCl) for 30 min at room temperature. After two washes in binding buffer and one wash in ligation buffer (NEB), PE adapters (Illumina, San Diego, CA) were ligated onto Hi-C ligation products bound to streptavidin beads for 2 hr at room temperature (T4 DNA ligase NEB, in ligation buffer, slowly rotating). After washing twice with wash buffer (5 mM Tris, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween-20) and then once with binding buffer, the DNA-bound beads were resuspended in a final volume of 90 µl NEBuffer 2. Bead-bound Hi-C DNA was amplified with seven PCR amplification cycles using PE PCR 1.0 and PE PCR 2.0 primers (Illumina). After PCR amplification, the Hi-C libraries were purified with AMPure XP beads (Beckman Coulter). The concentration of the Hi-C libraries was determined by Bioanalyzer profiles (Agilent Technologies, Santa Clara, CA) and qPCR (Kapa Biosystems (Wilmington, MA)), and the Hi-C libraries were paired-end sequenced (HiSeq 1000, Illumina) at the Babraham Institute Sequencing Facility.
For Promoter Capture Hi-C, 500 ng of Hi-C library DNA was resuspended in 3.6 µl H2O, and custom hybridization blockers (Agilent Technologies) were added to the Hi-C DNA. Hybridization buffers and the custom-made RNA capture bait system (Agilent Technologies; designed as previously described (Mifsud et al., 2015): 37,608 biotinylated RNAs targeting the ends of 22,076 promoter-containing HindIII restriction fragments) were prepared according to the manufacturer’s instructions (SureSelect Target Enrichment, Agilent Technologies). The Hi-C library DNA was denatured for 5 min at 95°C, and then incubated with hybridization buffer and the RNA capture bait system at 65°C. After 24 hr incubation at 65°C, biotin/streptavidin pulldown (MyOne Streptavidin T1 Dynabeads; Life Technologies) and washes were performed according to the SureSelect Target enrichment protocol (Agilent Technologies). After the final wash, the beads were resuspended in 30 µl NEBuffer 2. After a post-capture PCR (four amplification cycles using Illumina PE PCR 1.0 and PE PCR 2.0 primers), the Promoter CHi-C libraries were purified with AMPure XP beads (Beckman Coulter). The concentration of the Promoter CHi-C libraries was determined by Bioanalyzer profiles (Agilent Technologies) and qPCR (Kapa Biosystems), and the Promoter CHi-C libraries were paired-end sequenced (HiSeq 1000, Illumina) at the Babraham Institute Sequencing Facility.
Raw sequencing reads were processed using the HiCUP pipeline (Wingett et al., 2015), which mapped sequencing read pairs against the human genome (GRCh37), filtered out experimental artifacts such as circularized reads and re-ligations, and removed all duplicate read pairs. The aligned Hi-C data were analysed using HOMER v4.7 (http://homer.salk.edu/homer/) (Heinz et al., 2010). Coverage- and distance-related correction factors of the binned data were computed at 25 kb and 250 kb resolutions, based on the iterative correction algorithm (Imakaev et al., 2012). TADs were identified based on directionality indices (Dixon et al., 2012) of Hi-C interactions 1 Mb upstream and downstream from a 25 kb sliding window every 5 kb steps, which were then smoothed using a running average over a ± 25 kb window. TADs were called between pairs of consecutive local maxima (TAD start) and minima (TAD end) of the smoothed directionality indices with a standard score difference (TAD ∆Z score) above 2.0, and the TAD ends were extended outward to the genomic bins with no directionality bias. These TAD definitions were used to compute the fraction of significant PCHi-C interactions falling inside TADs, alongside TADs reported by Dixon et al., 2015. To assess the strength of the TAD boundaries crossed by promoter interactions, we defined a TAD boundary strength score (TADB ∆Z score) as the difference between the smoothed directionality index values at the local maximum (end of the preceding TAD) and the local minimum (start of the following TAD). Defined this way, TADB ∆Z scores (unlike the TAD ∆Z scores) do not depend on the stringency of the opposite boundary of the respective TAD.
A/B compartments were called by computing the principal components of the distance- and coverage-corrected interaction profile correlation matrix at 250 kb resolution (Lieberman-Aiden et al., 2009). Positive values of the principal component were aligned with H3K4me3 ChIP-seq signals for H9 human ESCs (Rada-Iglesias et al., 2011). For chromosomes 4 and X, we used the second principal component instead of the first, as the first component described the preferential contact pattern within chromosome arms rather than compartments. The principal component values ranged from −42 to 42. To quantify the compartment changes of significant interactions, each side of the interaction was classified as A or B compartment if the principal component of its 250 kb bin was above 10 or below −10, respectively. Interactions falling within the 250 bins that had the principal component scores between −10 and 10 were considered as falling outside either compartment.
Interactions were called at the level of individual HindIII fragments using version 0.1.4 of the CHiCAGO pipeline (Cairns et al., 2016) based on two biological replicates for each cell type that were normalised and combined as part of the pipeline. CHiCAGO incorporates a convolution background model, which emcompasses the ‘Brownian’ (real, but expected interactions) and ‘technical’ (assay and sequencing artefacts) components, and a weighted multiple testing correction procedure trained on interaction distance. CHiCAGO interaction scores correspond to –log-transformed, weighted p-values for each fragment read pair. A score threshold of 12 was used (equivalent to a threshold of 5 in Chicago v1.0.0+ due to a soft-thresholding procedure introduced in this version). This threshold was chosen empirically based on balancing the enrichment for chromatin marks at PIRs with the overall number of detected interactions. Additionally, interactions with scores between 11 and 12 were included in the analysis if they scored above 12 in the other cell type.
Total RNA was extracted from ESCs and NECs using an RNeasy Mini Kit (Qiagen). Indexed mRNA-seq libraries were constructed from 500 ng total RNA using the Tru-Seq RNA Library Prep Kit v2 (Illumina). Library fragment size and concentration was determined using an Agilent Bioanalyzer 2100 and KAPA Library Quantification Kit (KAPA Biosystems). Samples were sequenced on an Illumina HiSeq as single-end libraries at the Babraham Institute Sequencing Facility. Reads were trimmed using trim galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with default parameters to remove the standard Illumina adapter sequence. Reads were mapped to the GRCh37 assembly using tophat (Trapnell et al., 2009). BAM files were imported to Seqmonk (http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). Raw read counts per transcript were calculated using the RNA-seq quantitation pipeline on the Ensembl v70 gene set using non-directional counts. Differential analysis of gene expression was performed using the default settings in DESeq2 (Love et al., 2014) without independent filtering of the results. Differentially expressed genes were called at padj < 0.05 and log2 fold change above 1.5 or below −1.5.
The histone modification ChIP-seq data (H3K4me1, H3K4me3, H3K27ac and H3K27me3) for ESCs and NECs were from Rada-Iglesias et al., 2011, available in Gene Expression Omnibus under accession number GSE24447. Data were converted to GRCh37 using liftOver (Kent et al., 2002). CTCF ChIP-seq data were from ENCODE (ENCODE Project Consortium, 2012).
Chromatin segmentations were performed on the basis of multiple histone modification ChIP datasets using a Hidden Markov Model-based method implemented in ChromHMM (version 1.10) (Ernst and Kellis, 2012) with default settings. The segmentation was carried out jointly through providing ‘concatenated’ data for both cell types as input. The resulting 16 states were curated into four broad chromatin states based on analysing their enrichment for different histone marks (Figure 1—figure supplement 3A) as follows. States 1–6 characterised by the presence of H3K4me3 and/or H3K27ac, and the absence of H3K27me3, were labelled ‘active’; states 7–9 showing a combination of H3K4 methylation and H3K27me3 were labelled ‘poised’; state 10 showing H3K27me3 and no H3K4 methylation was labelled ‘Polycomb-associated’; states 14–16 showing no detectable signal for the four tested histone modifications were labelled ‘background’. In addition, two more curated states were defined, but not considered further: states 11–12 were characterised by a ‘mixed’ pattern of both H3K27ac and H3K27me3, which likely arose from a technical issue such as heterogeneity within the samples; state 13 characterised by H3K4me1 alone was classified as ‘intermediate enhancers’, but the fraction of PIRs bearing this signature (~1%) was too small to analyse them as an individual category. HindIII fragments in the human genome (including baits and PIRs) were then classified according to the chromatin states detected within them. When more than one chromatin state was present, classification was resolved in the following manner: (i) any functional state (e.g. active, poised, Polycomb-associated) was prioritised above background; (ii) active, poised and Polycomb-associated states were prioritised above intermediate; (iii) poised state was prioritised above the Polycomb-associated state; (iv) active state together with any inactive state (i.e. poised or Polycomb-associated) was labelled as mixed. Based on these heuristics, we assigned a single chromatin state (including the background state) to 81% of PIRs in both cell types.
Transgenic reporter assays for enhancer activity are described within the VISTA Enhancer Browser (Visel et al., 2007). The enhancer sequences from VISTA were overlapped with PIRs, and their putative target genes were defined according to the PCHi-C detected promoter-PIR interactions.
HindIII fragments were mapped to TADs defined as described above and INs obtained from Ji et al., 2016. Baited fragments overlapping TAD boundaries, and those mapping outside INs, were excluded from respective analyses. For each CRU, the percentage of interactions that map within the same TAD or IN was calculated and these values were collected into 12 bins, The first and last bins contained the values of 0% and 100%, respectively, and the remaining bins contained all other values split into 10% intervals. These results were compared to 1000 random permutations of CRUs across all promoter fragments performed in a manner retaining the overall CRU structure.
Each CRU was categorised according to the fraction of PIRs in the active, poised and Polycomb-associated state. These fractions were used for hierarchical clustering based on Euclidian distances (method='Euclidian’ in dist function in R) with the average agglomeration method (method='average’ in hclust function in R). PIRs assigned intermediate or mixed chromatin states did not contribute to the clustering procedure, in the latter case because the states of the regulatory elements interacting with target promoters within these PIRs are not identifiable. CRUs containing only mixed or intermediate PIRs were not included in the analysis.
False-negative rates associated with stringent signal thresholds drive down the observed overlap between conditions and may overestimate the proportion of cell-type-specific interactions. Therefore, we applied additional criteria to identify high-confidence subsets of rewired and retained interactions based on replicate-level CHiCAGO interaction calls. First, we required that rewired interactions have scores above 12 in both biological replicates of the same cell type, and below 12 in both replicates of the other cell type. We then binned the interactions satisfying these criteria into five groups of equal size according to their interaction scores in the merged samples. Interactions belonging to the top bin in one cell type and the bottom bin in the other cell type were considered as rewired. Interactions with scores above 12 in the two replicates of both cell types were considered as retained. Applying these criteria and filtering out interactions with PIRs in the mixed and intermediate states, we obtained high-confidence sets of 1258 retained and 1153 rewired interactions that were used in the analysis.
Sequencing data have been deposited in Gene Expression Omnibus (GEO) with accession number GSE86821. Processed data including interaction peak calls in the WashU Genome Browser text format and RNA-seq raw read counts were deposited in the same GEO repository. CHiCAGO objects containing all detected interactions, ChromHMM segmentation data, DESeq2-processed RNA-seq data and the defitions of TADs have been made available through the Open Science Framework (http://osf.io/sdbg4).
Molecular stages of rapid and uniform neuralization of human embryonic stem cellsCell Death and Differentiation 16:807–825.https://doi.org/10.1038/cdd.2009.18
Enhancers: the abundance and function of regulatory sequences beyond promotersDevelopmental Biology 339:250–257.https://doi.org/10.1016/j.ydbio.2009.11.035
Modification of enhancer chromatin: what, how, and why?Molecular Cell 49:825–837.https://doi.org/10.1016/j.molcel.2013.01.038
Exploring the three-dimensional organization of genomes: interpreting chromatin interaction dataNature Reviews Genetics 14:390–403.https://doi.org/10.1038/nrg3454
Transcriptional regulation by polycomb group proteinsNature Structural and Molecular Biology 20:1147–1155.https://doi.org/10.1038/nsmb.2669
Regulation of genome architecture and function by polycomb proteinsTrends in Cell Biology 26:511–525.https://doi.org/10.1016/j.tcb.2016.04.009
Importance of globin gene order for correct developmental expressionGenes and Development 5:1387–1394.https://doi.org/10.1101/gad.5.8.1387
Genome-wide map of regulatory interactions in the human genomeGenome Research 24:1905–1917.https://doi.org/10.1101/gr.176586.114
Mechanisms of transcriptional precision in animal developmentTrends in Genetics 28:409–416.https://doi.org/10.1016/j.tig.2012.03.006
Transcriptional enhancers: functional insights and role in human diseaseCurrent Opinion in Genetics and Development 33:71–76.https://doi.org/10.1016/j.gde.2015.08.009
In the loop: promoter-enhancer interactions and bioinformaticsBriefings in Bioinformatics 17:bbv097.https://doi.org/10.1093/bib/bbv097
Enhancers: emerging roles in cell fate specificationEMBO Reports 13:423–430.https://doi.org/10.1038/embor.2012.52
Multiple interactions between regulatory regions are required to stabilize an active chromatin hubGenes and Development 18:1495–1509.https://doi.org/10.1101/gad.289704
Shadow enhancers foster robustness of Drosophila gastrulationCurrent Biology 20:1562–1567.https://doi.org/10.1016/j.cub.2010.07.043
EMAGE mouse embryo spatial gene expression database: 2014 updateNucleic Acids Research 42:D835–D844.https://doi.org/10.1093/nar/gkt1155
A predictive modeling approach for cell line-specific long-range regulatory interactionsNucleic Acids Research 43:8694–8712.https://doi.org/10.1093/nar/gkv865
Genome-wide mapping and analysis of chromosome architectureNature Reviews Molecular Cell Biology 17:743–755.https://doi.org/10.1038/nrm.2016.104
Transcriptional enhancers: from properties to genome-wide predictionsNature Reviews Genetics 15:272–286.https://doi.org/10.1038/nrg3682
Transcription factors: from enhancer binding to developmental controlNature Reviews Genetics 13:613–626.https://doi.org/10.1038/nrg3207
VISTA enhancer browser--a database of tissue-specific human enhancersNucleic Acids Research 35:D88–D92.https://doi.org/10.1093/nar/gkl822
Job DekkerReviewing Editor; University of Massachusetts Medical School, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Global reorganisation of cis-regulatory units upon lineage commitment of human embryonic stem cells" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Job Dekker (Reviewer#1), is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Fiona Watt as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Frank Alber (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. In the course of discussion the reviewers and editors decided that this work would be more appropriately reconsidered as a Tools and Resources (TR) paper rather than a Research Article (RA). The work appears to be a valuable resource but the primary research results are not considered sufficiently novel to warrant publication as an RA. Nonetheless, we hope you are willing to go forward with this in the category of a TR paper.
Freire-Pritchett et al. employ Capture Hi-C to detect chromatin interactions between promoters and distal elements in ES cells and in ESC derived neuroectodermal cells (NECs). The authors report significant interactions between many promoters and different distal elements in ESCs and NECs. Analysis of the chromatin state of these elements indicates that these are functional elements including (poised) enhancers. Further, the connectivity between promoters and distal elements differs between ESCs and NECs and this relates to the transcriptional status of the genes and chromatin status of the distal elements. The authors then define cis-regulatory units (CRUs) as promoters and their associated distal elements. Although many of the CRUs are contained within TADs and INs, as had been expected based on earlier studies, the authors report that many can extend beyond TAD and IN boundaries.
The main impact of this paper is to announce the availability of a large data set of enhancer-promoter contacts identified via capture Hi-C. Many of the experimental technologies and statistical analysis methods are established and rely on previously published methodologies. That said the reviewers raise concerns about specific aspects of the analysis. Further, many of the reported correlations are already known and the results confirm earlier studies (e.g. promoters interact with multiple distal elements, these elements coincide with cell type-specific enhancer-like elements and these connections are equally cell type-specific, and enriched for intra-TAD interactions). The main new aspect of this work is the fact that a genome-wide promoter-anchored interaction atlas is described, and this resource could be of interest to the community given that many study these cells.
Please address these main criticisms (more details are in the minor points section):
1) Many of the correlations between looping, gene expression and chromatin state are not novel. Focus on the new things, and present the data as a large dataset or resource for the community.
2) The reviewers raised issues related to the statistical analysis and computational methods to determine PIRs (and their hierarchical clustering), TADs, and CRUs (below). Please address all of them.
3) Related to the previous point, the reviewers raised concerns about the claim that CRUs are distinct from INs, TADs. The main point to address is how the methods to determine these features compare and whether lack of overlap can be due to sensitivity of the computational methods.
1) One concern is that the CHiCAGO method identifies significant interactions using only the general distance decay of interactions into account. It does not take into account that any pair of interactions between loci located within a TAD or IN, or between loci located in a similar compartment (A, B or subcompartment types) are generally higher than loci located in different TADs/INs/Compartments. As a result, a number of significant interactions are not necessarily "loops" (specific point-to-point interactions). This is not to say that these loci do not interact more frequently than expected given their genomic distance, but these interactions can reflect general higher order structures such as domains etc. This can explain why many interacting loci appear to not have chromatin marks: these can be "bystander" interactions that are the result of a nearby structural feature. I recommend that the authors repeat their analysis using a background model that takes domains into consideration.
2) The TAD calling procedure is quite simple. Is there any evidence that this procedure produces TAD calls of comparable quality to those produced by, for example, the HMM of Dixon et al. Nature 2012?
3) The TAD and IN comparison is unsatisfying. The main conclusion seems to be that CRUs are not particularly aligned with TADs and INs. This could be because the two phenomena are more or less independent of each other, or because the TADs, INs and CRUs are each defined in a fairly arbitrary fashion. If, for example, the simple TAD calling scheme employed here is not working well, then it would lead to the observed discordance.
4) The finding that CRUs can extend beyond TAD or IN boundaries is interesting, but the data is not sufficient to claim a new "feature of genome architecture". Further analyses are needed such as: when a CRU extends beyond a TAD boundary, is this a weak TAD boundary? It is well known that TADs are nested structures. Do CRUs remain within the larger nested domains? This seems to be the case for POUF2. Quantification of boundary strength is required, as it seems not inconsistent with the TAD/IN models of enhancer action if the authors find interactions across weak boundaries, but not strong boundaries. Also, are these interactions beyond the boundaries equally strong, or are they much lower in contact frequency
5) The main text claims that the promoter capture Hi-C data was processed to identify "significant cis-interactions." Looking at Methods, we learn that these contacts correspond to a threshold of 12, on a "log-transformed, weighted p-value" scale. This vague description makes it impossible to actually interpret the confidence associated with a threshold of 12, especially since we are told that the threshold corresponds to "a threshold of 5 in Chicago v1.0.0+ due to a soft-thresholding procedure introduced in this version." The choice of threshold must be fully justified and described. The text claims that Chicago uses an FDR control procedure, so the interactions that are presented here should be reported along with their FDR threshold.
6) The manuscript contains no details on precisely how ChromHMM was run. Details like software version numbers and any parameter settings should be given. Importantly, the text should specify whether the segmentation was carried out independently on each cell type, or jointly, and if the latter, whether the data tracks were "stacked" (i.e., 8 marks per position) or "concatenated" (4 marks per position, across two concatenated copies of the genome).
7) Details of the "curation" of the 16 inferred states should also be provided. When the 16 states were collapsed to four states, what was the basis for this collapsing? It seems strange that some states were eliminated at this stage because they contained multiple marks.
8) How are PIRs defined: at the level of individual restriction fragments, or are adjacent fragments that both score significant merged into a single PIR?
9) The authors do not show many tracks of real data (only in one of the supplemental figures). Throughout they only show arcs to indicate significant contacts. This makes it difficult to assess the quality of the data.
10) Promoters often interact with CTCF sites. Did the authors find this in their dataset as well?
11) The authors focus on cis-interactions. Can the authors describe trans interactions as well? Given that trans-regulation is considered rare, this analysis will provide context for interpretation of the cis data.
12) Did the authors detect any known enhancer-promoter pairs? Also, the authors imply that the distal elements that touch promoters are regulating these promoters, but beside correlations between interactions, transcription and chromatin state of the distal elements, no evidence for direct regulation is provided. This would require deletion of the elements, e.g. by CRISPR. The Vista analysis of the POUF2 interactions suggests that many PIRs are enhancers. That is similar in information as the observation that PIRs overlap loci with histone modifications associated with enhancers. The key thing to validate is that these enhancers indeed regulate the interacting promoter. The fact that the pattern of activity of these enhancers resembles POU3F2 expression is encouraging but not definitive. The authors should either provide direct evidence for functional relationships or clearly discuss the extent to which the current data predicts such relationships even though the data currently is correlative.
13) The authors examine the dynamic rewiring and recoloring of interactions as cells differentiate from ESCs to NECs. They report gain and loss of interactions and these are related to gain or loss of chromatin marks. This is interesting, but what needs to be tested is whether this is related compartment changes (see above). This is not to say these interactions are not relevant, but it would be important to determine how they relate to larger scale compartment changes. One interesting implication could be that in fact cell type-specific compartment changes are driven by altered interactions between functional elements that are active in a cell type-specific manner.
14) How do the authors interpret interactions that involve non-expressed genes?
15) The hierarchical clustering of PIRs by the prevalence of different chromatin state labels is fairly uninformative. The fact that these different categories exist in the data is not surprising, and the claim that these eight labels somehow have "potential implications for understanding the logic of signal integration at promoters" is not well supported.
16) In the Abstract, the sentence "Here, we generate[…]" should mention what assay was employed.
17).In the Introduction: "The extreme combinatorial complexity of Hi-C[…]" The complexity is really only quadratic, not combinatorial.
18). In the Results section: The URL for data availability should be included at the end of the sentence, "This data resource[…]"
19) Also in the Results section: "NEC PIRs were strongly enriched" This claim requires statistical support.
20) New paragraph at "We next sought[…]"
21) Two of the observations offered in the Discussion do not seem to be supported by direct evidence in Results: "we find extensive promoter connectivity to regions associated with Polycomb-associated repression and poising, and "we[…] detect large numbers of promoter interactions with regions devoid of chromatin features." These observations seem to be offered for the first time in Discussion.
22) The work does not show whether all the interactions are present simultaneously in a cell, or if the promoter state may vary between cells, depending on the probability with which the promoter is interacting with any of the potential alternative enhancers of a given chromatin state. Please discuss this issue in the main text.
23) Out of the ~21000 promoters, the authors focused their CRU analysis on only 16,000 protein-coding genes. Would the authors expect differences in the outcome if the other remaining promoters would be considered?
24) Out of the 16,000 protein-coding promoters about 9000 were defined as CRU. It may be good to know the selection criteria (i.e. is it based on a minimum number of PIRs per promoter?).
25) Figure 3B defines a CRU as a set of PIRs that seem to be concurrently interacting with the promoter in the same cell. Maybe the authors could indicate in the caption that it cannot be ruled out that some PIRs may provide alternative rather then concurrent interactions.https://doi.org/10.7554/eLife.21926.031
- Peter J Rugg-Gunn
- Paula Freire-Pritchett
- Stefan Schoenfelder
- Csilla Várnai
- Steven W Wingett
- Jonathan Cairns
- Mayra Furlan-Magaril
- Peter Fraser
- Mikhail Spivakov
- Amanda J Collier
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank Kristina Tabbada at the Babraham Institute Sequencing Facility, and Sarah Elderkin, Simon Andrews, Wolf Reik and members of our groups for insightful discussions. PF, PJR–G and MS are supported by the BBSRC. CV and PF are supported by the European Research Council (Advanced Grant 111608). AJC is supported by an MRC DTG Studentship (MR/J003808/1). RG–V was supported by the ERASMUS+ Program. PJR–G was supported by the Wellcome Trust (WT093736).
- Job Dekker, Reviewing Editor, University of Massachusetts Medical School, United States
© 2017, Freire-Pritchett et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.