Abstract
The enhancer-promoter looping model, in which enhancers activate their target genes via physical contact, has long dominated the field of gene regulation. However, the ubiquity of this model has been questioned due to evidence of alternative mechanisms and the lack of its systematic validation, primarily owing to the absence of suitable experimental techniques. In this study, we present a new MNase-based proximity ligation method called MChIP-C, allowing for the measurement of protein-mediated chromatin interactions at single-nucleosome resolution on a genome-wide scale. By applying MChIP-C to study H3K4me3 promoter-centered interactions in K562 cells, we found that it had greatly improved resolution and sensitivity compared to restriction endonuclease-based C-methods. This allowed us to identify EP300 histone acetyltransferase and the SWI/SNF remodeling complex as potential candidates for establishing and/or maintaining enhancer-promoter interactions. Finally, leveraging data from published CRISPRi screens, we found that most functionally-verified enhancers do physically interact with their cognate promoters, supporting the enhancer-promoter looping model.
Introduction
For decades since the discovery of the first enhancers, the enhancer-promoter (E-P) looping model, in which enhancers activate their target genes via physical contact, has dominated the field (Popay and Dixon, 2022; Ptashne, 1986). However, direct empirical evidence for this hypothesis has been scarce, and only a handful of credible enhancers were demonstrated to be in physical proximity to their promoter partners. These include classical enhancers and their established targets such as globin genes, HoxD cluster genes, Shh and PAX6 genes (Davies et al., 2016; Freire-Pritchett et al., 2017; Hua et al., 2021; Montavon et al., 2011; Oudelaar et al., 2019; Symmons et al., 2016; Tolhuis et al., 2002). Simultaneously, some recent microscopy observations do not seem to align well with the E-P looping model (Benabdallah et al., 2019; Karr et al., 2022).
The development of genomic assays based on proximity-ligation of chromatin, commonly known as C-methods (Cullen et al., 1993; Dekker et al., 2002; Denker and de Laat, 2016), has revolutionized the study of genome spatial organization, providing a complementary view to microscopic approaches (Denker and de Laat, 2016; Jerkovic and Cavalli, 2021; McCord et al., 2020). Both the resolution and throughput of genome-wide C-methods were instrumental in defining salient structural features of mammalian chromatin, including genomic compartments, topologically associating domains (TADs) and cohesin-dependent CTCF point interactions (Dixon et al., 2012; Lieberman-Aiden et al., 2009; Nora et al., 2012; Rao et al., 2014; Rowley and Corces, 2018; Sexton et al., 2012). However, unlike CTCF-based interactions, very few specific E-P interactions were observed with conventional restriction enzyme-based C-methods. Initially, sequencing depth inherently limited the resolution of all vs all approaches such as Hi-C, and thus a number of targeted approaches were developed. In-solution hybridization (CaptureC, CaptureHi-C) or immunoprecipitation (ChIA-PET, HiChIP, PLAC-seq) are exploited in these methods to enrich for interactions of specific regions of interest, thus decreasing the sequencing burden and allowing higher resolution (Fang et al., 2016; Fullwood et al., 2009; Hughes et al., 2014; Mifsud et al., 2015; Mumbach et al., 2016). While targeted methods clearly revealed a statistical preference of promoters to interact with enhancer-like regions and enabled detection of some individual E-P interactions, in some cases functionally-verified enhancers did not show spatial proximity to the promoters of their targets as measured with C-methods (Benabdallah et al., 2019; Gupta et al., 2017), raising further questions concerning the universality and functional relevance of E-P looping.
Although targeted methods have circumvented sequencing-depth limitations, the resolution of C-methods based on restriction endonuclease digestion cannot overcome the inherent restriction fragment length limitation, which often results in a high level of background noise and the inability to distinguish CTCF-based interactions from other types of chromatin spatial interactions (Goel and Hansen, 2021). The introduction of MNase-based C-methods, such as Micro-C and its targeted oligonucleotide hybridization-based counterparts, MCC, Tiled-MCC, and RCMC, reduced the resolution limitation, allowing finer chromatin structures to be distinguished (Aljahani et al., 2022; Goel et al., 2022; Hsieh et al., 2020; Hua et al., 2021; Krietenstein et al., 2020). Crucially, the usage of MNase also allowed milder detergent pretreatment of crosslinked cells, increasing the sensitivity of Micro-C, MCC and related techniques by retaining E-P interactions, which appear to be more susceptible to experimental conditions than CTCF-based chromatin interactions (Canver et al., 2015; Fulco et al., 2016; Gasperini et al., 2020). In parallel, the development of genomic and epigenomic CRISPR-based functional screens have greatly expanded the amount of available data on functional E-P interactions. Taken together, these advancements now open the door for the systematic investigation of the spatiotemporal mechanisms underlying distal control of mammalian transcription.
Here we present a new MNase-based C-method, MChIP-C, which allows measuring promoter-centered interactions at single nucleosome resolution on a genome-wide scale. We applied MChIP-C to construct a nucleosome-resolution map of promoter-centered interactions in human K562 cells. Taking advantage of the significantly improved resolution and sensitivity of our approach compared to restriction endonuclease-based C-methods, we investigated the molecular underpinnings of promoter-centered chromatin spatial interactions and identified EP300 histone acetyltransferase and the SWI/SNF remodeling complex as candidates for establishing and/or maintaining E-P spatial contacts. Finally, by comparing our chromatin interaction data with previously published results of CRISPRi screens (Fulco et al., 2019; Gasperini et al., 2019), we found that most functionally verified enhancers do interact with their cognate promoters, supporting the E-P looping model of enhancer activity and suggesting that in general physical interactions are central to enhancer activity in mammalian cells.
Results
MChIP-C: antibody-based targeted measurement of genome architecture at single-nucleosome resolution
We developed MChIP-C, a novel MNase-based proximity ligation technique which combines the genome-wide throughput of HiChIP with the exceptional resolution and sensitivity of MCC (Fig. 1a). The protocol starts similar to MCC, where cells are crosslinked with formaldehyde and permeabilized with digitonin. This milder cell permeabilization procedure has been shown to provide higher sensitivity for detecting E-P interactions (Hua et al., 2021). Next, chromatin is digested with micrococcal nuclease, DNA ends are blunted and proximity ligated. This is followed by sonication and chromatin immunoprecipitation with a specific antibody, as in HiChIP or PLAC-seq (Fig. 1a). Crosslinks are then reversed, and DNA is purified, followed by library preparation and sequencing. As in MCC, we avoided using biotin to enrich for ligated fragments, potentially increasing library complexity, but also resulting in a low proportion of informative chimeric read pairs in the sequencing libraries. To partially mitigate this, we decreased the level of DNA fragmentation and selected for longer DNA fragments (see Methods).
We performed 4 biological replicates of MChIP-C experiment on K562 cells using anti-H3K4me3 antibodies to focus on spatial interactions of active promoters. The MChIP-C libraries were paired-end sequenced with a total of ∼3.5B read pairs, yielding ∼3B uniquely mapped reads (Fig. 1b; Fig. S1b; Dataset S02). ∼91.5% (∼2.77B) of these mapped within 1 kb of each other and ∼4% (∼120M) mapped further than 5 kb away from each other. While the 4% fraction of informative reads was low relative to restriction enzyme-based C-methods (e.g. >60% informative reads in PLAC-seq), it is comparable to MCC (0.5-5% informative reads, see Methods).
After read mapping, we identified active promoters by first selecting nucleosome-size convergently mapping read pairs to obtain H3K4me3 occupancy profiles. We found that the profiles of replicates were highly similar (Pearson correlation 0.95-0.97) and match standard H3K4me3 ChIP-seq (Pearson correlation 0.89-0.91) (Fig. S1c,d). Next, we identified 11,898 consensus peaks amongst replicates and found these mostly colocalized with active RNA polymerase II TSSs (Fig. S1e; Dataset S03).
We then proceeded to map promoter interactions. We defined the H3K4me3 peaks as interaction viewpoints and selected the reads mapping to each of these viewpoints, resulting in 11,898 4C-like viewpoint interaction profiles. To gain a general overview of the data, we summed the profiles of all viewpoints into a single genome-wide profile which we refer to as “merged MChIP-C” (Fig. 1b). Both viewpoint-specific and merged MChIP-C correlated well between replicates (Pearson correlation 0.83-0.93 for merged profiles and 0.64-0.76 for viewpoint-separated profiles) (Fig. S1f), and therefore we combined all replicates, resulting in single-nucleosome resolution interaction profiles for 10,949 active promoters, after filtering low-coverage viewpoints.
The obtained interaction profiles demonstrate that promoters interact with each other and with non-promoter regions (Fig. 1b). We systematically identified localized interactions by searching for 250 bp bins with significantly higher than expected MChIP-C signal. For each of the replicates, we found that 70-87% of its interactions were present in at least one other replicate (corresponding to the amount of sequencing per replicate; see Methods), suggesting that called interactions were generally reproducible. Overall, we detected 111,543 promoter-promoter bin (P-P) interactions and 128,236 promoter-nonpromoter bin interactions (Fig. 1c; Dataset S04). Both types of interactions were mostly localized within TADs (Fig. S1h). We excluded P-P interactions from subsequent analyses and focused on 128,236 promoter-nonpromoter interactions linking 10,707 viewpoints with 98,721 unique nonpromoter bins (Dataset S05) which we refer to as PIRs (Promoter Interacting Regions).
MChIP-C provides a sensitive genome-wide view of promoter-centered interactions
We then asked how MChIP-C compares to similar approaches based on standard restriction enzymes. HiChIP and PLAC-seq are directly comparable to MChIP-C, as both HiChIP and PLAC-seq combine Hi-C with chromatin immunoprecipitation. Specifically, we compared our data to those of Chen et al. (Chen et al., 2022), who used PLAC-seq with anti-H3K4me3 antibodies in K562 cells. We first visually compared the PLAC-seq and MChIP-C proximity ligation profiles of a number of genes with well-described regulatory landscape in K562 cells: MYC, GATA1 and HBG2. We found that MChIP-C profiles showed clear highly-localized interaction peaks corresponding to the CTCF-bound sites and enhancers known to control the expression of these genes (Fig. 2a; Fig. S2). For instance, the MYC gene has seven well-characterized K562-specific enhancer elements (e1-e7) spread along the 2 Mb region downstream of the gene (Fulco et al., 2016; Lin et al., 2022) and MChIPC allowed detection of interactions between MYC promoter and 5 of these sites. PLAC-seq profiles are by contrast much noisier and lack clear peaks corresponding to either CTCF sites or known enhancers of the interrogated genes.
Next, we compared MChIP-C and PLAC-seq in a more systematic way by examining their merged interaction profiles around promoter-distal DHSs (Fig.2b). We observed that in MChIP-C 22.9% of PIRs colocalize with CTCF-bound sites (Fig. 1c; Dataset S05). We separately examined DHSs with and without CTCF, and found that both types of DHSs show increased interaction signal relative to background. To quantify this, we compared the MChIP-C signal at DHS centers to more distant MChIP-C “background” signal within 1-5 kb of the DHS (Fig. S2b), and found a 2.52 median fold enrichment of DHS centers (3.06 with CTCF, and 1.98 without CTCF). In contrast, we found that PLAC-seq shows a very weak enrichment of interactions at DHSs, even for those bound by CTCF (1.24 median fold enrichment with CTCF and 1.24 without CTCF). Next, we compared standard Micro-C data previously reported for K562 to MChIP-C, using only the promoter-centered interactions from the Micro-C dataset (see Methods). Although better than PLAC-Seq, Micro-C shows a reduced signal-to-noise ratio at DHSs compared to MChIP-C (1.36 median fold enrichment with CTCF and 1.28 without CTCF). Thus, in line with other recent reports on MNase-based C-methods (Aljahani et al., 2022; Goel et al., 2022; Hsieh et al., 2020; Hua et al., 2021; Krietenstein et al., 2020; Ramasamy et al., 2022), our results suggest that MChIP-C achieves superior sensitivity and resolution compared to C-methods based on standard restriction enzymes. Additionally, in the context of promoter-centered interactions, our results suggest that MChIP-C also outperforms standard whole-genome Micro-C.
CTCF orientation-biased interaction of CTCF-bound sites and promoters
As we observed an abundance of CTCF-bound PIRs in MChIP-C data, we asked whether these CTCF-promoter interactions correspond to loop extrusion-driven loops between convergent CTCF sites easily noticeable on the Hi-C heatmaps (Fudenberg et al., 2016; Rao et al., 2017, 2014; Sanborn et al., 2015). About half of the identified promoter-CTCF interactions (21,152 out of 41,230) showed CTCF binding at the promoter side as well. Interestingly, only 568 (∼2.7%) of those directly correspond to CTCF-CTCF loops reported in a previous Hi-C study (Rao et al., 2014). The other half of the MChIP-C interactions containing a CTCF-bound PIR (20,078/41,230) do not show CTCF binding at the promoter side at all. For this set of CTCF-less promoters, we analyzed the orientation of CTCF motifs in PIRs relative to the position of the promoter. We found that promoters preferentially interact with CTCF-bound sites if CTCF motifs are oriented towards the promoter (79% towards vs 21% away) (Fig. 3a). This bias holds true regardless of the position of the CTCF site relative to the transcription direction (80% vs 20% for upstream CTCF sites and 79% vs 21% for downstream sites). We also observed this bias if both the promoter and the PIR have CTCF ChIP-seq signal, but CTCF motifs are not in convergent/divergent orientation (Fig. S3). We conclude that interactions of promoters with CTCF-bound sites are biased by CTCF orientation, regardless of CTCF binding at the promoter side. This orientation bias suggests the possible involvement of loop extrusion.
Promoter-interacting enhancers bind a distinct and diverse set of protein factors
We next asked whether the sensitivity and resolution of MChIP-C could allow us to examine the molecular underpinnings of promoter-centered interactions which are not CTCF-based. While the question of what factors underlie P-E interactions is fundamental, this type of analysis would be problematic with restriction enzyme-based C-methods due to their inability to precisely map loop anchors to protein binding sites. To determine protein factors associated with PIRs, we first overlaid MChIP-C PIRs with 271 ChIP-seq profiles for K562 cells (Dataset S01). Most of the analyzed transcription-related factors and histone post-translational modifications are substantially enriched in MChIP-C PIRs (4-25X) (Fig. 3b; Dataset S06). Notable exceptions are facultative heterochromatin-associated histone mark H3K27me3 (1113 overlaps, 1866 (SD=43) expected by chance) and LINE1-binding protein ZNF146 (184 overlaps, 174 (SD=12) expected by chance). As expected, CTCF (22,448 / 1,589 (SD=39)), cohesin subunits RAD21 (10,629 / 398 (SD=17)) and SMC3 (12,838 / 557 (SD=23)) are among the most highly enriched factors.
Rather than separately consider each factor associated with a PIR, we next used hierarchical clustering to partition PIR-overlapping DHSs (i.e. promoter interacting DHSs) (N=18,837) into groups according to the factor binding profile of each DHS. We identified 5 major clusters (Fig. 3c; Fig. S4a; Dataset S07). Clusters 1 (N=4,881) and 2 (N=4,086) were almost universally bound by the structural factors CTCF (98.2% and 97.3% in clusters 1 and 2 correspondingly), cohesin subunits SMC3 (87.2% and 73.8%) and RAD21 (70.4% and 60.8%) as well as zinc finger protein ZNF143 (89.5% and 69.5%). A total of 24,633 interactions with a median distance of ∼91 kb were anchored in DHSs from these two clusters (Fig. S4b). DHSs from clusters 3 (N=3,472), 4 (N=2,625) and 5 (N=3,773) contained almost no CTCF- or cohesin-bound sites. No factors were overrepresented in cluster 4, while clusters 3 and 5 were enriched in dozens of transcription-related factors. Notably, CRISPRi-confirmed enhancers (Fulco et al., 2019; Gasperini et al., 2019) were significantly overrepresented in cluster 3 DHSs (182/293 overlaps; one-sided binomial test, P = 4.85*10-61), and we thus labeled P-cluster 3 DHS interactions as regulatory interactions. Overall, there were 8,037 regulatory interactions, which tended to be shorter (median length ∼48kb) than structural ones (Fig. S4b). Interestingly, cohesin subunits were absent in the vast majority of class 3 DHSs. In agreement with a number of recent observations (Hsieh et al., 2022; Thiecke et al., 2020) this may indicate that cohesin is not involved in the maintenance of the regulatory interactions, albeit such a role was suggested earlier (Kagey et al., 2010; Phillips-Cremins et al., 2013). In conclusion, we find diverse types of promoter interacting DHSs associated with the binding of different sets of factors. However, the clustering analysis was not able to pinpoint a specific factor, beyond CTCF and cohesin, which would parsimoniously explain the observed promoter-DHS interactions.
Genomic distance, CTCF, cohesin and EP300 are key determinants of promoter-centered interactions
We next sought to pinpoint additional factors which underlie promoter-centered interactions and to directly test their predictive power. We used a greedy forward feature selection approach, based on a random forest regression model trained to predict the strength of MChIP-C interactions between promoters and DHS sites (N=104,164). The initial random forest model consisted of five features: DHS-promoter genomic distance, CTCF ChIP-seq signal at both the promoter and the DHS, and motif orientations on both the promoter and the DHS. This initial model was able to explain ∼37% of variance in the MChIP-C signal. Then, we iteratively added the most predictive feature out of a set of 270 ChIP-seq profiles, where predictivity is calculated as R2 in 3-fold cross validation. Using this strategy, cohesin subunit RAD21 and histone acetyltransferase EP300 were automatically selected, after which predictive performance plateaued at an R2 of ∼43% (Fig. 3d). We also found the same factors as the most predictive of binary outcomes (high/low MChIP-C signal) reaching an AUC of 0.825.
As CTCF and RAD21 are mostly associated with structural loops, we asked whether EP300 mainly underlies regulatory interactions. Indeed, we find that EP300 was 3-fold enriched in cluster 3 DHSs and 93.5% of cluster 3 DHSs showed EP300 binding. Interestingly, we noticed that the only proteins that had more binding sites in cluster 3 DHSs are subunits of the SWI/SNF remodeling complex: DPF2 (97.5%), ARID1B (95.6%) and SMARCE1 (93.6%) (Fig. 3c). EP300 has been found to directly interact with SWI/SNF-complex (Alver et al., 2017; Blümli et al., 2021) and their chromatin binding profiles are highly correlated (Pearson correlation 0.65-0.70). Revisiting the binding profiles of these SWI/SNF subunits, we found their predictive power to be very close to that of EP300 (Fig. S4c). We also specifically examined the predictive power of RNA polymerase II, mediator complex, YY1 and BRD4, which were previously suggested to be associated with enhancer-promoter interaction (Hnisz et al., 2017; Kagey et al., 2010; Papantonis and Cook, 2011; Weintraub et al., 2017), but found these features to be weaker predictors of MChIP-C signal (Fig. S4c). Thus, we suggest that EP300 and/or the SWI/SNF complex might be involved in the formation of regulatory chromatin interactions independently of СTCF and cohesin.
MChIP-C data are largely consistent with the looping model of enhancer activity
The apparent lack of interaction between experimentally-validated enhancers and their cognate promoters in some studies employing C-methods has raised doubts regarding the classical promoter-enhancer looping model. We thus asked whether the enhanced sensitivity and resolution of MChIP-C could shed some light on the fundamental question of whether enhancers should interact with their targets in order to activate them. To address this, we systematically compared MChIP-C profiles with functionally verified E-P pairs identified by CRISPRi screens in K562 (Fulco et al., 2019; Gasperini et al., 2019). We compared 351 functionally verified E-P pairs, in which targeting CRISPRi to the enhancer changed the expression of the target, with 53,660 putative non-regulatory DHS-P pairs, in which CRISPRi showed no effect. Notably, we found that 61.5% (216/351, 95% CI [56.2%; 66.6%]) of the verified pairs have underlying spatial interactions detected with MChIP-C, while only 6.3% (3,399/53,660, 95% CI [6.1%; 6.5%]) of non-regulatory DHS-P pairs show such interactions (Fig. 4a,b; Datasets S08 and S09). Thus, we conclude that a majority of experimentally-validated enhancers exhibit interaction with their target promoters.
Next, we attempted to reproduce this analysis with PLAC-seq, Micro-C and Hi-C as an additional evaluation of their power to detect enhancer-promoter interactions (Fig. S5a,b; Datasets S08 and S09). We used PLAC-seq interactions reported reviously by Chen et al. (Chen et al., 2022), and for Hi-C and Micro-C we used the entire genome-wide datasets to identify interactions using Mustache (Roayaei Ardakany et al., 2020). With respect to recall/sensitivity, the recall of MChIP-C (61.5%, as mentioned above) was superior to those of PLAC-seq (15.0%), Micro-C (19.7%), and Hi-C (2.3%). In spite of this, the precision of the methods was comparable, with 6.0% for MChIP-C, 4.1% for PLAC-seq, 7.1% for Micro-C, and 2.5% for Hi-C. In terms of the false positive rate, MChIP-C was the highest with 6.3%, compared to 2.0% for PLAC-seq, 1.7% for Micro-C, and 0.6% for Hi-C. To control for sequencing depth, we also repeated the analysis with our data down-sampled to 46% so that the valid MChIP-C reads matched the number of valid PLAC-seq reads and was lower than the respective number in Micro-C (see Methods). We found that down-sampled MChIP-C maintains a high recall of 54.7%, with slightly better precision (7.1%) and false positive rate (4.7%). These results suggest that MChIP-C identifies more enhancer-promoter interactions than the alternatives.
Finally, we inspected more closely cases in which validated E-P pairs did not show interaction in MChIP-C (Fig. 4b, iv-vi). First, we noticed that, in some instances, weak interactions were visually apparent but were missed by the interaction calling algorithm, especially in viewpoints that had low coverage (Fig. 4b, iv). Indeed, when we considered 175/351 validated E-P pairs corresponding to high-coverage viewpoints, 70.3% (123/175, 95% CI [62.8%; 76.8%]) had underlying MChIP-C spatial interactions. Second, in some cases we noticed adjacent (few kb) H3K27ac-positive regions clearly interacting with the expected promoter, possibly suggesting inexact identification of the enhancer position due to the limited resolution of the CRISPRi screens (Gasperini et al., 2019) (Fig. 4b, v). Third, CRISPRi-detected E-P interactions are not guaranteed to represent direct effects. In some cases, an enhancer may impose secondary influence through the action of the directly targeted gene (Gasperini et al., 2019). For example, CRISPRi showed that an enhancer near LMO2 affects both LMO2 and CAT, a gene approximately 400kb away. MChIP-C only shows interaction with LMO2 but not with CAT (Fig. 4b, vi). We hypothesize that regulatory relationships between this enhancer and CAT are mediated through transcription factor encoded by the LMO2 gene and thus it is unreasonable to expect spatial interaction between them. Additionally, we expect that interactions between enhancers and indirect targets would not be overrepresented at short genomic distances. Indeed, we observed that the genomic distance between non-interacting E-P pairs (median 75 kb) is substantially larger than between interacting E-P pairs (median 28 kb), supporting the possibility of indirect activation (Fig. 4c). Another possible source of indirect effects are CTCF-occupied insulators. As was pointed out previously (Fulco et al., 2019), these sites can affect the expression of genes in CRISPRi screens, but do not need to spatially interact with their targets. Indeed, 59 of 351 verified E-P pairs show CTCF occupancy at enhancer side, and 14 of these lack E-P interaction in MChIP-C. In summary, we conclude that the fraction of verified E-P pairs in which we observe a spatial interaction is likely underestimated, which further supports the notion that most enhancers physically interact with their targets and that many such interactions may have not been identified previously due to technical limitations of the employed methods.
Discussion
In this study, we present a new high-resolution genome-wide C-method, MChIP-C, which integrates proximity ligation of MNase-digested chromatin with chromatin immunoprecipitation and deep sequencing. Like other MNase-based C-methods, MChIP-C allows precise mapping of localized chromatin interactions including E-P contacts which often evade capture by restriction endonuclease-based C-methods (Goel and Hansen, 2021). Uniquely, MChIP-C combines the resolution and sensitivity of MCC with the genome-wide scale of Micro-C.
Applying H3K4me3-directed MChIP-C to K562 cells, we explored the spatial connectivity of active promoters. Single-nucleosome resolution and high sensitivity allowed us to map tens of thousands of localized spatial interactions between promoters and distal genomic sites. We focused on promoter interactions with non-promoter elements (PIRs). Although P-P interactions were clearly visible in our maps, they were excluded from most of the analyses. The promoter-centered map of chromatin spatial contacts represents a valuable resource for further research. The quality of the resulting genome-wide profiles is supported by the fact that previously identified spatial contacts, such as contacts of globin gene promoters with their enhancers, are clearly visible.
In agreement with previous observations, promoters appear to interact primarily with the surrounding CTCF sites. Interestingly, CTCF-motifs in these sites are typically oriented towards the interacting promoter (Valton et al., 2022). This observation can be explained by the putative ability of active promoters to block movement of the loop extrusion complex which can lead to the establishment of promoter-CTCF loops with CTCF motifs preferentially oriented towards the promoters (Fig. 3a, i) (Banigan et al., 2022). Alternatively, the observed bias might be a consequence of a more general phenomenon known as architectural stripes, where a CTCF-bound site blocks loop extrusion in a directional manner, causing nonspecific interactions with an extended nearby region (Vian et al., 2018) (which may include the gene promoter) (Fig. 3a, ii).
Similarly to other MNase-based C-methods, MChIP-C is sensitive enough to identify many relatively weak chromatin interactions that are independent of CTCF binding. The most intriguing group of such spatial contacts is E-P regulatory interactions. We identified 8,472 interactions linking active promoters with distal DHSs bound by a set of enhancer-specific factors. These DHSs are highly enriched with functionally verified K562 regulatory sequences and contain chromatin marks typical for active enhancers (Fig. 3c; Fig. S4a). The increased proximity between promoters and these enhancer-like DHSs presumably represents regulatory E-P interactions.
One of the long-standing hypotheses explaining enhancer action at a distance is the E-P looping model (Popay and Dixon, 2022; Ptashne, 1986) which assumes that direct physical contact between an enhancer and its target promoter is an essential prerequisite for the promoter activation. Although it is known that at least some enhancers do establish physical contacts with their targets, technical limitations of methods assessing spatial proximity as well as scarcity of functional data on mammalian enhancers prevented accurate systematic evaluation of the E-P looping model. To explain the elusiveness of E-P interactions, it has been suggested that their highly dynamic nature may preclude capturing with existing proximity ligation techniques (Schoenfelder and Fraser, 2019). Here, we showed that 61.5% of functional enhancer-promoter pairs previously identified in K562 cells using CRISPRi screens (Fulco et al., 2019; Gasperini et al., 2019) do establish interactions, implying their spatial proximity. Moreover, due to various technical reasons, including insufficient MChIP-C coverage, CRISPRi enhancer mapping mistakes, and indirect regulatory effects, this fraction probably represents an underestimation of the true E-P interaction rate. Thus, it is likely that the majority of active enhancers directly interact with target promoters.
In addition to identifying interactions between promoters and their CRISPRi-validated enhancers, MChIP-C found many more apparently non-functional interactions of promoters with DHSs which did not show a significant effect on gene expression. Such interactions are also often observed in alternative C-methods, and in some cases could represent CRISPRi false negative pairs. These interactions could also include structural loops as well as pre-established interactions of promoters with conditionally-activated regulatory elements (Comoglio et al., 2018; Jin et al., 2013), and warrant further investigation leveraging the increased sensitivity and resolution of MChIP-C.
Although it is generally assumed that in the C-methods ligation occurs only between fragments that are in close spatial proximity, the mobility of DNA ends within fixed nuclei has not been extensively studied. It is possible that DNA ends available for ligation can scan a certain territory and thus MChIP-C detected interactions do not exactly represent direct physical contacts. However, we argue that our data demonstrates that, in most cases, enhancers establish some form of spatial interaction with their targets, resulting in increased ligation frequency.
Though we have shown that active mammalian enhancers in general interact with the target promoters, we cannot infer that this interaction is necessary for target activation. Testing whether it is possible to decouple spatial interactions from gene activity would require understanding the physical nature of these interactions. Various protein factors and a number of physical mechanisms including canonical protein-protein interactions, molecular crowding and liquid-liquid phase separation were implicated in this phenomenon (Hnisz et al., 2017; Kagey et al., 2010; Papantonis and Cook, 2013). One protein complex which was suggested to be involved is cohesin (Kagey et al., 2010; Phillips-Cremins et al., 2013). However, in line with studies exploring cellular responses to the acute loss of the cohesin complex (Hsieh et al., 2022; Thiecke et al., 2020), we observed almost no overlap between regulatory E-P interactions and cohesin binding. Different groups proposed other candidates for the role: the mediator complex, BRD4, YY1 and RNA polymerase II (Hnisz et al., 2017; Kagey et al., 2010; Papantonis and Cook, 2013; Phillips-Cremins et al., 2013; Weintraub et al., 2017). However, recent studies did not identify drastic changes in E-P interaction frequency caused by the degradation or the inhibition of any of these proteins (Crump et al., 2021; Hsieh et al., 2022; Ramasamy et al., 2022; Zhang et al., 2022).
In order to identify proteins that may be involved in E-P interaction, we assessed the ability of an assortment of ChIP-seq binding profiles to predict the strength of promoter-PIR interaction measured by MChIP-C. As expected, the binding of CTCF and cohesin best predicted the strength of the MChIP-C signal, as the latter is dominated by structural promoter-CTCF interactions. However, we also identified EP300 and SWI/SNF complex as strong predictors of MChIP-C interaction, and they are enriched in enhancer-like PIRs lacking CTCF. These hallmark enhancer proteins were previously shown to bind each other and to share chromatin interaction sites genome-wide (Alver et al., 2017; Blümli et al., 2021). Unlike cohesin, mediator, BRD4, YY1 and RNA polymerase II, EP300 and SWI/SNF have rarely been considered as mediators of E-P interaction. However, in our analysis, they consistently outperformed other enhancer-associated factors in terms of predictive power. Although these results may be affected by differences in the quality of the ChIP-seq data, the potential role of EP300 and SWI/SNF in E-P interaction warrants further experimental exploration. Since both EP300 and SWI/SNF are known as chromatin modifiers rather than structural proteins (albeit some studies suggest such a role for EP300 (Ma et al., 2021)), it is possible that their participation in E-P interaction is indirect. To this end, a scenario involving histone acetylation by EP300 may be considered. Although H3K27 residue is a well-documented EP300 nucleosomal target (Tie et al., 2009), a recent study identified several other lysine residues localized in H2B N-terminus that are acetylated by EP300 (Narita et al., 2022a, 2022b; Weinert et al., 2018). These H2B N-terminus acetylation marks correlate with the enhancer activity more strongly than H3K27ac and therefore qualify for being involved in E-P interaction.
Methods
MChIP-C experimental procedure
K562 cells were cultivated in DMEM medium supplemented with 10% fetal bovine serum and 1× penicillin/streptomycin in a humidified 37 °C incubator with 5% CO2. ∼3.5M cells were crosslinked for 10 min at room temperature in 2.5 mL of fresh full growth medium supplemented with 2% formaldehyde (Sigma-Aldrich, F8775). To quench the reaction, glycine was added to reach a final concentration of 0.2 M; the suspension was immediately transferred to ice and incubated there for 5 min. Crosslinked cells were centrifuged at 300g and 4°C, washed with cold phosphate-buffered saline (PBS) and centrifuged again. Pellets were resuspended in 350 μL of cold PBS supplemented with 0.5× protease inhibitor cocktail (PIC) (Bimake, B14001) and 0.5 mM PMSF. Digitonin (Sigma-Aldrich, D-5628) stock solution (1% in DMSO) was added to cells to reach a final concentration of 0.01% A 1 mL tip was used to carefully mix the suspension. Cells were permeabilized on ice for 7 min then centrifuged at 300g and 4°C and resuspended in 800 μL of MNase digestion buffer (10 mM Tris-HCl pH 7.5, 1 mM CaCl2). To achieve a sufficient level of chromatin digestion (50-75% of DNA in mononucleosomal fragments) we added 35-40U of MNase (Thermo Scientific, EN0181) and incubated the cells at 37°C with shaking for 1h. EGTA was added to reach a final concentration of 5 mM in order to stop digestion. Cells were centrifuged at 300g and room temperature, resuspended with a low-retention tip in PBS and transferred to a new, low-retention, tube. We pelleted the material once again at 1000g and resuspended it in 530 μL 1.05× ligation buffer (ThermoScientific, EL0012) supplemented with 0.105 mM dNTPs and 2.1 mM ATP. 50 μL of the material at this stage was separated as a control of digestion efficiency. 100U of T4 Polynucleotide Kinase (NEB, M0201L) and 50U of Klenow fragment of DNA I polymerase (NEB, M0210L) were added to the remaining material in order to repair DNA ends; the mix was incubated at 37 °C with shaking. After 1h 37.5U of T4 DNA ligase (ThermoScientific, EL0012) were added to the reaction and incubation was continued at 37°C for an additional hour. Before leaving the ligation reaction overnight, the mixer was cooled to 20°C and 37.5U more of T4 DNA ligase were added.
The next day, we took 50 μL aliquot as a ligation control, transferred the remaining chromatin to a 2 mL tube and centrifuged it at 5000g. The pellet was resuspended in 550 μL of ice cold ChIP lysis buffer (50 mM Tris-HCl pH 8.0, 1% SDS, 10 mM EDTA) supplemented with 0.5× PIC (Bimake, B14001) and 0.5 mM PMSF. All buffers used in the following sonication and immunoprecipitation steps were precooled to 4°C. We solubilized chromatin with 3 15-sec ultrasound pulses on “15” power setting of VirSonic 100 (VirTis) sonicator. Then, we cleared the solubilized chromatin with 21,000g centrifugation and transferred the supernatant fraction to a 30-kDa Amicon filter (Millipore, UFC503096). We changed the solvent with two successive washes with RIPA buffer (50 mM Tris–HCl pH 8.0, 150 mM NaCl, 1% Triton X-100 (v/v), 0.5% sodium deoxycholate, 0.1% SDS). After the final wash, we transferred the material to a new 1.5 mL tube, brought the volume of RIPA to 1 mL and supplemented it with 1× PIC (Bimake, B14001). Subsequent incubation with antibodies, immunoprecipitation and DNA isolation were performed as previously described (Golov et al., 2021). We used 5 μg of polyclonal anti-H3K4me3 antibodies (Active motif, 39016) per MChIP-C experiment. DNA of digestion and ligation controls was isolated in parallel with the immunoprecipitated DNA. After ethanol precipitation, DNA was reconstituted from each sample (controls and the MChIP-C) in 10 μL of 10 mM Tris-HCl buffer (pH 8.0). To assess efficiency of the digestion and ligation, the control samples were run on a 1.5% agarose gel (Fig. S1a).
MChIP-C NGS libraries were prepared from immunoprecipitated DNA with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB, E7645) according to the manufacturer’s recommendations, with the following modifications: 1) we decreased the reaction volumes two-fold; 2) instead of the adaptor included in the kit, we used 1 μL of barcoded Y-shaped Illumina TruSeq adapters (Illumina, 20015960 and 20015961) per reaction and extended ligation time to 1h. We used 17 μL (∼0.37×) of AMPure XP beads (Beckman Coulter, A63881) in the pre-PCR clean-up to select longer DNA fragments and therefore enrich the final library with ligation-informative pairs. Adapter-ligated material was amplified with 12-14 cycles of PCR using KAPA HiFi HotStart PCR Kit (Roche, 07958897001) and universal P5/P7 Illumina primers (see Golov et al., 2019). Post-PCR clean-up was performed without size selection using 1.3× volume of AMPure XP beads (Beckman Coulter, A63881).
Overall, we performed 4 biological replicates of H3K4me3 MChIP-C. The libraries were paired-end sequenced (PE100)on Illumina NovaSeq 6000 and BGI DNBSEQ-T7 devices with ∼0.5-1.5B read pairs per library (Fig. S1b, Dataset S02).
H3K4me3 ChIP-seq
∼3.5M K562 cells were crosslinked and permeabilized as described above. Permeabilized cells were resuspended in 550 μL of ice cold ChIP lysis buffer (50 mM Tris-HCl pH 8.0, 1% SDS, 10 mM EDTA) supplemented with 0.5× PIC (Bimake, B14001) and 0.5 mM PMSF. Chromatin was sheared with 4 30-sec ultrasound pulses on “15” power setting of VirSonic 100 (VirTis) sonicator. Incubation with antibodies, immunoprecipitation, DNA isolation and NGS library preparation were performed as described MChIP-C protocol, except for the pre-PCR clean-up step where DNA was cleaned with 35 μL (0.76×) of AMPure XP beads (Beckman Coulter, A63881). 12 cycles of PCR were used to amplify the ChIP-seq library, and it was paired-end sequenced (PE100) on a Illumina NovaSeq 6000 sequencer with ∼6M read pairs. The sequencing reads were aligned to the hg19 genome build and a genome-wide H3K4me3 occupancy profile was generated as described previously (Golov et al., 2021).
MChIP-C data primary processing
The sequenced reads of each replicate were mapped to the hg19 genome build with BWA-MEM (with -SP5M flags) and parsed with the pairtools package (v.0.3.0) (Open2C et al., 2023). Mononucleosomal (>100 bp and <201 bp apart; convergent) and distal (>5,000 bp apart; cis) read pairs were isolated for downstream analysis.
Mononucleosomal reads were down-sampled to 20M pairs, deduplicated with pairtools and converted to BAM files using samtools (v.1.15.1). Mononucleosomal peaks for each individual replicate were called with MACS2 (v.2.2.7.1) (FDR < 0.0001, maximum gap = 1,000 bp) and 50-bp occupancy profiles were generated with deeptools (v.3.5.1). To compare mononucleosomal profiles of replicates with each other and with conventional H3K4me3 ChIP-seq signal, we calculated read coverage in K562 DHSs (see Dataset S01 for the source of K562 DHS data) for each experiment with deeptools (‘multiBigwigSummary BED-file’ command), and compared pairwise Pearson correlation coefficients.
We considered genomic regions overlapping mononucleosomal peaks in at least 3 replicates to be consensus peaks. Nearby (< 1,000 bp apart) consensus peaks were merged with bedtools (v2.26.0) resulting in 11,898 MChIP-C viewpoints (Dataset S03). DNase sensitivity, H3K4me3 ChIP and CAGE signals (see Dataset S01 for the source of K562 CAGE data, two CAGE replicates were averaged with bigwigCompare from deeptools) in a 10 kb window centered on each MChIP-C viewpoint and distal DHSs were calculated and plotted with deeptools (‘computeMatrix reference-point’ & ‘plotHeatmap’ commands).
Next, we deduplicated distal read pairs using pairtools, produced BAM files and filtered out read pairs that did not map to the identified MChIP-C viewpoints using samtools. Filtered binary files were converted to replicate-specific coverage profiles, or merged MChIP-C profiles, which were then piled up together into an aggregate merged profile. To analyze and visualize interaction profiles of individual viewpoints, we selected from the merged profile a subset of reads mapping to the viewpoint of interest with samtools (‘samtools view -P’ command).
To evaluate reproducibility of the distal MChIP-C signal in replicates and to identify promoter-centered interactions, we analyzed the distribution of MChIP-C reads in 250 bp genomic bins. We started with deduplicated distal read pairsam files produced by pairtools; using the data on MChIP-C viewpoints, we extracted positions of distal read pairs mapping to the viewpoints with pairtools (‘pairtools split’ command) and bedtools (‘bedtools pairtobed’ command). Each of these read pairs was assigned to a viewpoint; if two reads in a pair mapped to two different viewpoints, the pair was assigned to both viewpoints. Then we overlapped (‘bedtools intersect’ command) the positions of the mate reads (other-end reads) with 250 bp genomic bins and aggregated the pairs mapping to the same viewpoint and the same other- end bin generating 4 replicate-specific BEDPE files. Each line of the resulting files contained the count of read pairs for a unique viewpoint-genomic bin (other-end bin) pair. To calculate inter-replicate correlation of viewpoint-separated MChIP-C signal, we selected pairs contained within a 1Mb genomic window with only one end mapping inside a viewpoint and estimated inter-replicate Pearson correlation. We also estimated inter-replicate correlation of the merged data. To achieve this, we used merged MChIP-C profiles to create 250 bp-binned coverage dataframes with deeptools (‘multiBigwigSummary bins’ command), then filtered out bins overlapping viewpoints as well as all bins containing zero reads in at least one replicate and finally calculated inter-replicate Pearson correlation coefficients. To generate distance decay curves, we calculated aggregated MChIP-C signal from replicate-specific BEDPE files in 30 distance bins of equal size on a log10 scale between 3.5 (3,162 bp) and 6.5 (3,162 kbp).
MChIP-C interaction calling
Before calling interactions, we normalized MChIP-C signal in each replicate by the replicate-matched mononucleosomal signal. We did this to account for differences of genomic regions in their propensity to be immunoprecipitated with H3K4me3 antibodies. To consider both viewpoint and other-end bin H3K4me3 occupancy, we normalized by mean mononucleosomal coverage of these two regions (we used ‘samtools bedcov’ command to estimate coverage of each viewpoint and each genomic bin).
We first identified significantly interacting viewpoint-other-end pairs in each replicate separately. In order to perform interaction calling, we wanted different viewpoint-specific profiles to be comparable, thus we filtered out pairs separated by more than 2.5 Mb and normalized signal once again, by viewpoint coverage. The viewpoint coverage was defined as a sum of all H3K4me3 occupancy-normalized MChIP-C signal pertaining to the specific viewpoint. The final MChIP-C signal was normalized twice (by H3K4me3 occupancy and by viewpoint coverage), we refer to it as normalized signal. To identify viewpoint-other-end pairs with signal significantly exceeding the background, we first assigned each analyzed pair with the distance rank (from 1 for other-end bins directly neighboring the viewpoints to 10,000 for bins separated by 2.5 Mb from the viewpoints). Then, in each distance rank starting with the 20th (since pairs of distal reads were separated by at least 5,000 bp we had almost no signal in the first 19 distance ranks) and ending with the 4,000th, we fitted a Weibull distribution-based background model and called pairs with p-value<0.01 and more than 3 reads interactions. We also included pairs with distance rank more than 4,000 and more than 3 reads in replicate-specific interaction list.
We combined all viewpoint-other-end pairs identified as significant interactions in at least one replicate in one dataset. We tested whether each of the interaction in this combined list had a corresponding viewpoint-other-end pair in each of the four individual replicate (we defined correspondence as cases where viewpoints are exactly aligned and other ends are aligned exactly or located within 1kb from each other). Then for each replicate we calculated proportion of interactions without a corresponding interaction present in at least one another replicate.
Next, we identified significantly interacting viewpoint-other-end pairs in an aggregate MChIP-C dataset. We summarized all replicate-specific distal MChIP-C signals in each viewpoint-other-end pair and performed interaction-calling procedure described above with minor changes. To reduce noise and avoid calling false positive interactions, we focused on 10,949 viewpoints with raw total coverage (viewpoint coverage before H3K4me3 occupancy normalization in all 4 replicates combined) of more than 1,000 read pairs. We also increased to 6 the threshold for minimal number of raw reads (in all 4 replicates combined) in bins which we considered to call viewpoint interacting bins. Finally, we divided all identified interactions into two groups: promoter-promoter (P-P) and promoter-nonpromoter (P-PIR) – depending on whether the other-end bin was localized within a viewpoint (Dataset S04).
Analysis of MChIP-C interactions crossing TAD boundary
We used K562 TAD boundaries from Rao et al. (Rao et al., 2014). The expected number of interactions crossing TAD boundaries was defined as a mean count of cross-boundary interactions in 100 independent random permutations changing the positions of the interaction other-ends while preserving the interaction distance distribution. The position of other-end bins in each permutation was established by random reshuffling of distances separating viewpoints from their original interaction partners (bins within other viewpoints or PIRs).
MCC data reanalysis
To estimate the typical proportion of valid ligation pairs in MCC sequencing output, we reanalyzed publicly available datasets from Hua et al. (Hua et al., 2021) and Downes et al. (Downes et al., 2021). We downloaded fastq files for at least 3 biological replicates from each interrogated cell type and mapped them to hg19 or mm10 genomes with bwa (bwa mem - SP5M). Then we parsed reads with pairtools and calculated the fraction of cis-read pairs separated by more than 5,000 bp.
PLAC-seq data reanalysis
To compare MChIP-C data with a similar restriction-endonuclease based C-method, we reanalyzed publicly available H3K4me3 PLAC-seq experiments (Chen et al., 2022). Raw sequencing reads were downloaded from the 4D Nucleome Data portal (https://data.4dnucleome.org/experiment-set-replicates/4DNESWX1J3QU/#raw-files) and processed similarly to MChIP-C reads. We relied on MChIP-C-defined viewpoints to obtain merged PLAC-seq profiles and viewpoint-specific PLAC-seq profiles. We used restriction fragment-length bins to build viewpoint-specific PLAC-seq profiles and fixed 250 bp bins to build merged profiles.
For downstream analysis, we utilized PLAC-seq interactions identified by the authors of original publication (GEO, GSE161873). We merged interactions identified in two individual experimental replicates in one set and lifted them to hg19 genome assembly with the UCSC LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver).
Micro-C data reanalysis
We also compared MChIP-C with a recently published K562 Micro-C data (Barshad et al., 2023). Raw Micro-C sequencing reads were downloaded from GEO (GSE206131) and analyzed with the pipeline we used for MChIP-C and PLAC-seq data processing. We once again relied on MChIP-C-defined viewpoints to extract merged promoter-anchored Micro-C profiles and viewpoint-specific Micro-C profiles. The data was binned in 100 bp or 250 bp bins to plot viewpoint-specific interaction profiles. In all other types of analysis we used fixed 250 bp bins.
We also created all-to-all Micro-C proximity ligation matrices using conventional distiller pipeline (v. 0.3.4) (https://github.com/open2c/distiller-nf). To generate iteratively balanced promoter-anchored Micro-C profiles, we dumped 250 bp binned cooler matrix to a text stream (‘cooler dump’ command) and then selected balanced signal from the bins residing within MChIP-C viewpoints. We then aggregated this signal into a merged genome-wide bedgraph file and converted it to a bigwig profile using bedgraphToBigwig UCSC utility.
To identify point interactions in the Micro-C dataset, we used Mustache (v 1.0.1) algorithm (Roayaei Ardakany et al., 2020). We called interactions in balanced contact matrices at resolutions of 250 bp, 500 bp, 1 kb, 2 kb, 5 kb and 10 kb using -pt 0.1 and -st 0.88 options. We combined interactions called at all resolutions in one set; if an interaction was detected at multiple resolutions, we retained a variant with coarser boundaries and discarded finer ones.
Hi-C data reanalysis
Raw Hi-C sequencing reads were downloaded from GEO (GSE63525) and all-to-all Hi-C proximity ligation matrices were generated with distiller pipeline. To obtain raw and iteratively balanced promoter-anchored Hi-C profiles, we dumped 1 kb binned cooler matrix to a text stream (‘cooler dump’ command) and then selected raw and balanced signal from the bins residing within MChIP-C viewpoints. We then aggregated these signals into two separate merged genome-wide bedgraph files (one containing raw signal and another containing balanced signal) and converted them to bigwig profiles with bedgraphToBigwig UCSC utility.
To identify point interactions in the Hi-C dataset, we used Mustache algorithm. We called interactions in balanced contact matrices at resolutions of 1 kb, 2 kb, 5 kb, 10 kb and 20 kb using -pt 0.1 and -st 0.88 options. We combined interactions called at all resolutions in one set; if an interaction was detected at multiple resolutions, we retained a variant with coarser boundaries and discarded finer ones.
Comparison of merged MChIP-C, PLAC-seq, promoter-anchored Micro-C and promoter-anchored Hi-C signals around distal DNase hypersensitive sites
DNase sensitivity, CTCF ChIP, H3K4me3 ChIP, merged MChIP-C, merged PLAC-seq and merged promoter-anchored Micro-C and Hi-C signals spanning 5 kb regions flanking distal CTCF sites and CTCF-less DHSs were plotted as heatmaps and average profiles with deeptools. Distal CTCF sites with the peak score (the 5th column of CTCF peak file) less than 250 and DHS sites with the signalValue (the 7th column of DHS peak file) less than 200 were excluded from the analysis. In all subsequent analysis concerning DHSs we used the same signalValue cutoff.
To plot violin graphs for proximity ligation signal in and around distal DHSs, we used 250-bp binned signal matrices generated by deeptools as an intermediary during heatmap generation. For each proximity ligation method, we compiled a set of signal bins and a set of background bins. For the signal set, we carved out from the matrix a pair of central columns corresponding to bins surrounding each individual analyzed distal DHS. For the background set, we selected 32 columns corresponding to bins separated by 1-5 kb from the center of DHSs (16 columns from each side of the DHSs).
Analysis of CTCF motif orientation in viewpoints and PIRs
The presence of CTCF motifs in viewpoints and PIR bins, as well as their orientation, were determined with HOMER (v4.11.1) (“annotatePeaks.pl”). To compare MChIP-C-identified CTCF-based interactions with Hi-C loops, we used the list of 6,057 K562 Hi-C loops identified by Rao et al. (Rao et al., 2014).
To evaluate number and orientation of CTCF-motifs in randomized sets of PIRs, we performed 100 independent random permutations, changing the positions of the PIRs while preserving P-PIR distance distribution. Then we assessed the presence and orientation of CTCF motifs in each generated set of PIRs and averaged the obtained results to get the number and distribution of motifs per randomization.
Transcription related factor enrichment analysis in PIRs
To assess transcription-related factor (TRF) enrichment in PIRs, we exploited a compendium of publicly available K562 ChIP-seq profiles (see Dataset S01 for a complete list of ChIP-seq datasets used in the present study). The majority of the analyzed datasets were downloaded from the ENCODE portal (268 out of 271). CDK8 (Pelish et al., 2015) (CistromeDB: 56379) and MED1 (CistromeDB: 74667) occupancy data were downloaded from the Cistrome database, BRD4 (Liu et al., 2017) – from GEO (GSM2635249). Hg38 datasets were lifted to hg19 with CrossMap (v.0.6.0). All ChIP-seq TRF peaks with the signalValue (the 7th column of each peak file) less than 25 were excluded from all the TRF-related analysis.
First, we overlapped MChIP-C PIRs with TRF binding sites. To estimate relative enrichment, we calculated the expected number of overlaps. We did this for each individual TRF by averaging the overlap counts in 100 independent random permutations, changing the positions of the PIRs while preserving P-PIR distance distribution. The position of PIRs in each permutation was established by random reshuffling of distances separating viewpoints from their original PIRs. Enrichment of TRF binding was then calculated as log2 of the ratio between the experimentally observed and the expected number of overlaps. Enrichment of TRF motifs in PIRs was evaluated with the HOMER package (“findMotifsGenome.pl” command).
PIR-overlapping DHS clustering and cluster annotation
To separate 18,837 PIR-overlapping DHSs (promoter interacting DHSs) into biologically-relevant groups according to their TRF binding, we used hierarchical clustering with binary features reflecting the binding status of the analyzed TRFs. Here we used only 167 TRFs bound to more than 1,000 promoter interacting DHSs and PIR enrichment higher than 2. We used the Ward’s Minimum Variance distance metric for clustering and chose a cutoff of 5 clusters.
For each cluster, we calculated the number of DHSs bound by each of the analyzed TRFs. Next, for each cluster we identified 20 TRFs which were bound to the greatest number of DHSs. Then we merged these 5 sets of TRFs into one list which was comprised of 28 unique TRFs. We plotted the binding of these selected TRFs in PIR-overlapping DHSs as a heatmap (Fig. 3c). To characterize the identified clusters, we overlapped cluster-assigned DHSs with K562 ChromHMM states (Ernst et al., 2011). We also overlapped DHSs with CRISPRi-verified enhancers from Fulco et al. (Fulco et al., 2019) and Gasperini et al. (Gasperini et al., 2019) in order to assess enhancer enrichment of each cluster with a one-sided binomial test.
To calculate the number and distance distributions of MChIP-C interactions associated with the identified DHS clusters, we overlayed the complete set of P-PIR interactions with positions of the promoter-interacting DHSs.
Random forest regression for modeling of MChIP-C signal
We used a random forest regression model (R library “ranger”) to predict normalized MChIP-C signal and to identify predictive features. We focused on a subset of viewpoint-DHS pairs separated by <250 kb. We identified 104,166 such viewpoint-DHS pairs with a non-zero count of MChIP-C reads and filtered out 2 outliers (chr16:214,895-215,045 and chr5:693,455-693,605) which were 175 and 82 IQRs above the third quartile. For training the initial model, we used the following features: viewpoint-DHS distance, CTCF motif orientation on both the viewpoint and DHS and CTCF ChIP-seq signal on both the viewpoint and DHS. Training parameters were default with mtry=3, num.trees=100.
Next, we used a greedy forward feature selection approach to add 10 features out of a set of 270 TRF ChIP-seq signals on the DHSs. At each feature selection step, we evaluated all features and added the most predictive feature to the model. To assess the predictive power of a feature, we retrained the model with the feature and used 3-fold cross-validation to evaluate either R2 or AUC (where MChIP-C was partitioned as higher/lower than the median). To assess reproducibility of the greedy feature selection, we repeated the entire procedure 3 times.
Estimating correlation between chromatin binding of EP300 and SWI/SNF-complex
To compare chromatin binding profiles of histone acetyltransferase EP300 and of the ARID1B, DPF2, SMARCE1 subunits of SWI/SNF-complex in K562 cells, we first calculated ChIP-seq read coverage in K562 DHSs for each of the proteins with deeptools (‘multiBigwigSummary BED-file’ command). Then we calculated Pearson’s correlation between EP300 coverage and coverages of each of the studied SWI/SNF subunits.
Downsampling of MChIP-C data and analysis of the downsampled datasets
For more thorough comparison of MChIP-C with other C-methods in terms of their ability to detect increased ligation frequency between distal regulatory elements and their target promoters, we downsampled generated MChIP-C dataset. We wanted to decrease the total number of unique distal pairs mapping to the viewpoints in the downsampled MChIP-C dataset so that it would be equal to the number of such reads in the complete PLAC-seq and Micro-C datasets. First, we evaluated the numbers of such pairs for all three approaches. It turned out that PLAC-seq and Micro-C datasets contained approximately two times less useful pairs than complete MChIP-C dataset (∼19.75 mln. and ∼23.15 mln. vs ∼42.7 mln.). Thus, we sampled 46% of pairs from 4 files containing deduplicated viewpoint mapped pairs corresponding to 4 biological replicates of MChIP-C (‘pairtools sample’ command). The downsampled dataset included ∼19,6 mln. pairs. We performed interaction calling in the downsampled MChIP-C dataset with the almost exactly same pipeline we used previously for the complete aggregate MChIP-C dataset. We made only two minor changes: decreased minimal raw coverage to 500 for viewpoints included in the analysis and decreased to 4 the threshold for minimal number of raw reads (in all 4 downsampled replicates combined) in tested viewpoint-other-end pairs.
Comparing MChIP-C interactions with functionally verified E-P pairs
We used K562 CRISPRi datasets from Fulco et al. (Fulco et al., 2019) and Gasperini et al. (Gasperini et al., 2019) to compile a list of 351 verified E-P pairs and 53,660 non-regulatory DHS-P pairs (Datasets S08 and S09). In order to directly compare these pairs to MChIP-C identified chromatin interactions, we included in this list only pairs with promoters overlapping MChIP-C viewpoints and pairs with distal sites within 5kb-1Mb of the promoter.
Then we overlapped the obtained E-P and DHS-P pairs with interactions identified with various C-methods: MChIP-C (complete and downsampled versions), PLAC-seq, Micro-C and Hi-C. Before overlapping we extended PIRs of the identified MChIP-C interactions by 500 bp in both directions, and loop anchors of the identified Micro-C and Hi-C interactions by 2500 bp in both directions. Using the obtained results, we separated E/DHS-P pairs in 4 groups for each analyzed method: DHS-P pairs lacking interactions, DHS-P pairs with underlying interactions, E-P pairs lacking interactions and E-P pairs with underlying interactions. We excluded all CTCF-bound E/DHS sites and visualized normalized MChIP-C signal as well as raw PLAC-seq, Micro-C and Hi-C signal around each individual E/DHS site (reflecting proximity between the site and the paired promoter) as a heatmap. To generate promoter-anchored Micro-C and Hi-C signal tracks, we dumped cooler matrices (250 bp binned for Micro-C and 1 kb binned for Hi-C) in a text streams (‘cooler dump’ command), selected signal from bins overlapping MChIP-C viewpoints and aggregated signal pertaining to each individual viewpoint to obtain MChIP-C-like BEDPE files. We generated such files for both raw and ICE-balanced signals. We finally plotted average MChIP-C (normalized), PLAC-seq (raw), Micro-C (raw and balanced) and Hi-C signal (raw and balanced) for each of the 4 E/DHS-P groups as an averaged profile.
We calculated sensitivity (recall), precision and false positive rate for the predictors based on the presence of interactions between promoters and their potential regulatory elements (E/DHS) identified with each of the analyzed C-methods. As a reference we used predictors based on the assignment of each DHS to the closest active gene (TPM > 0.5, from ENCODE ENCFF934YBO dataset) as its enhancer or the assignment of each DHS to the two closest active genes as their enhancer. Also as a reference we built a precision-recall and receiver operating characteristic curves for a predictor based on the thresholds inversely proportional to the genomic distance between the DHS/enhancer and TSS of the potential target gene.
Data processing and visualization
The following software and packages were used for analysis and visualization: bwa (v.0.7.17), Bowtie2 (v.2.3.4), pairtools (v.0.3.0), samtools (v.1.15.1), bedtools (v2.26.0), deeptools (v.3.5.1), R (v. 4.2.1; dplyr v.1.0.9, tidyr v.1.2.0, ggplot2 v.3.3.6, gplots v. 3.1.3, data.table v.1.14.8, GenomicRanges v.1.48.0, reshape2 v.1.4.4, fitdistrplus v.1.1-8, RColorBrewer v.1.1-3, dendextend v.1.16.0, dendroextras v.0.2.3, GGally v.2.1.2, ranger v.0.14.1, caret v.6.0-93, PRROC v.1.3.1), HOMER v4.11.1, CrossMap (v.0.6.0), Integrative Genomics Viewer (v.2.8.0), Adobe Illustrator (v.23.0.1).
Acknowledgements
This work was supported by the Russian Science Foundation (21-64-00001) and by the Russian Ministry of Science and Higher Education (075-15-2021-1062). MChIP-C experiments were performed using the equipment of IGB RAS facilities, supported by the Russian Ministry of Science and Higher Education.
Data, Materials, and Software Availability
MChIP-C and ChIP-seq sequencing data as well as derivative genomic profiles generated in this publication are available in National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) through accession GSE225087. Code to reproduce analysis is available on GitHub: https://github.com/KaplanLab.
References
- Analysis of sub-kilobase chromatin topology reveals nano-scale regulatory interactions with variable dependence on cohesin and CTCFNat Commun 13:1–13
- The SWI/SNF chromatin remodelling complex is required for maintenance of lineage specific enhancersNat Commun 8:1–10
- Transcription shapes 3D chromatin organization by interacting with loop-extruding cohesin complexesbioRxiv https://doi.org/10.1101/2022.01.07.475367
- RNA polymerase II dynamics shape enhancer– promoter interactionsNat Genet 55:1370–1380
- Decreased Enhancer-Promoter Proximity Accompanying Enhancer ActivationMol Cell 76:473–484
- Acute depletion of the ARID1A subunit of SWI/SNF complexes reveals distinct pathways for activation and repression of transcriptionCell Rep 37
- BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesisNature 527:192–197
- Systematic discovery and functional dissection of enhancers needed for cancer cell fitness and proliferationCell Rep 41
- Thrombopoietin signaling to chromatin elicits rapid and pervasive epigenome remodeling within poised chromatin architecturesGenome Res 28:295–309
- BET inhibition disrupts transcription but retains enhancer-promoter contactNat Commun 12
- Interaction between transcription regulatory regions of prolactin chromatinScience 261:203–206
- Multiplexed analysis of chromosome conformation at vastly improved sensitivityNat Methods 13:74–80
- Capturing chromosome conformationScience 295:1306–1311
- The second decade of 3C technologies: detailed insights into nuclear organizationGenes Dev 30:1357–1382
- Topological domains in mammalian genomes identified by analysis of chromatin interactionsNature 485:376–380
- Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locusNat Genet 53:1606–1615
- Mapping and analysis of chromatin state dynamics in nine human cell typesNature 473:43–49
- Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seqCell Res 26:1345–1348
- Global reorganisation of cis-regulatory units upon lineage commitment of human embryonic stem cellsElife 6https://doi.org/10.7554/eLife.21926
- Formation of Chromosomal Domains by Loop ExtrusionCell Rep 15:2038–2049
- Systematic mapping of functional enhancer-promoter connections with CRISPR interferenceScience 354:769–773
- Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbationsNat Genet 51:1664–1669
- An oestrogen-receptor-alpha-bound human chromatin interactomeNature 462:58–64
- A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic ScreensCell 176:377–390
- Towards a comprehensive catalogue of validated and target-linked human enhancersNat Rev Genet 21:292–310
- The macro and micro of chromosome conformation captureWiley Interdiscip Rev Dev Biol 10
- Region Capture Micro-C reveals coalescence of enhancers and promoters into nested microcompartmentsbioRxiv https://doi.org/10.1101/2022.07.12.499637
- Sensitivity of cohesin-chromatin association to high-salt treatment corroborates non-topological mode of loop extrusionEpigenetics Chromatin 14
- A Genetic Variant Associated with Five Vascular Diseases Is a Distal Regulator of Endothelin-1 Gene ExpressionCell 170:522–533
- A Phase Separation Model for Transcriptional ControlCell 169:13–23
- Enhancer-promoter interactions and transcription are largely maintained upon acute loss of CTCF, cohesin, WAPL or YY1Nat Genet 54:1919–1932
- Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin FoldingMol Cell 78:539–553
- Defining genome architecture at base-pair resolutionNature 595:125–129
- Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experimentNat Genet 46:205–212
- Understanding 3D genome organization by multidisciplinary methodsNat Rev Mol Cell Biol 22:511–528
- A high-resolution map of the three-dimensional chromatin interactome in human cellsNature 503:290–294
- Mediator and cohesin connect gene expression and chromatin architectureNature 467:430–435
- The transcription factor activity gradient (TAG) model: contemplating a contact-independent mechanism for enhancer-promoter communicationGenes Dev 36:7–16
- Ultrastructural Details of Mammalian Chromosome ArchitectureMol Cell 78:554–565
- Comprehensive mapping of long-range interactions reveals folding principles of the human genomeScience 326:289–293
- Nested epistasis enhancer networks for robust genome regulationScience 377:1077–1085
- In Situ Capture of Chromatin Interactions by Biotinylated dCas9Cell 170:1028–1043
- Co-condensation between transcription factor and coactivator p300 modulates transcriptional bursting kineticsMol Cell 81:1682–1697
- Chromosome Conformation Capture and Beyond: Toward an Integrative View of Chromosome Structure and FunctionMol Cell 77:688–708
- Mapping long-range promoter contacts in human cells with high-resolution capture Hi-CNat Genet 47:598–606
- A regulatory archipelago controls Hox genes transcription in digitsCell 147:1132–1145
- HiChIP: efficient and sensitive analysis of protein-directed genome architectureNat Methods 13:919–922
- A unique H2B acetylation signature marks active enhancers and predicts their target genesbioRxiv https://doi.org/10.1101/2022.07.18.500459
- The logic of native enhancer-promoter compatibility and cell-type-specific gene expression variationbioRxiv https://doi.org/10.1101/2022.07.18.500456
- Spatial partitioning of the regulatory landscape of the X-inactivation centreNature 485:381–385
- Pairtools: from sequencing data to chromosome contactsbioRxiv https://doi.org/10.1101/2023.02.13.528389
- A revised model for promoter competition based on multi-way chromatin interactions at the α-globin locusNat Commun 10
- Transcription Factories: Genome Organization and Gene RegulationChem Rev 113:8683–8705
- Fixing the model for transcription: the DNA moves, not the polymeraseTranscription 2:41–44
- Mediator kinase inhibition further activates super-enhancer-associated genes in AMLNature 526:273–276
- Architectural Protein Subclasses Shape 3D Organization of Genomes during Lineage CommitmentCell 153:1281–1295
- Coming full circle: On the origin and evolution of the looping model for enhancer-promoter communicationJ Biol Chem 298
- Gene regulation by proteins acting nearby and at a distanceNature 322:697–701
- The Mediator complex regulates enhancer-promoter interactionsbioRxiv https://doi.org/10.1101/2022.06.15.496245
- Cohesin Loss Eliminates All Loop DomainsCell 171:305–320
- A 3D map of the human genome at kilobase resolution reveals principles of chromatin loopingCell 159:1665–1680
- Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representationGenome Biol 21
- Organizational principles of 3D genome architectureNat Rev Genet 19:789–800
- Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomesProc Natl Acad Sci U S A 112:E6456–65
- Long-range enhancer-promoter contacts in gene expression controlNat Rev Genet 20:437–455
- Three-Dimensional Folding and Functional Organization Principles of the Drosophila GenomeCell 148:458–472
- The Shh Topological Domain Facilitates the Action of Remote Enhancers by Reducing the Effects of Genomic DistancesDev Cell 39:529–543
- Cohesin-Dependent and -Independent Mechanisms Mediate Chromosomal Contacts between Promoters and EnhancersCell Rep 32
- CBP-mediated acetylation of histone H3 lysine 27 antagonizes Drosophila Polycomb silencingDevelopment 136:3131–3141
- Looping and interaction between hypersensitive sites in the active beta-globin locusMol Cell 10:1453–1465
- A cohesin traffic pattern genetically linked to gene regulationNat Struct Mol Biol 29:1239–1251
- The Energetics and Physiological Impact of Cohesin ExtrusionCell 175:292–294
- Time-Resolved Analysis Reveals Rapid Dynamics and Broad Scope of the CBP/p300 AcetylomeCell 174:231–244
- YY1 Is a Structural Regulator of Enhancer-Promoter LoopsCell 171:1573–1588
- Enhancer-promoter contact formation requires RNAPII and antagonizes loop extrusionbioRxiv https://doi.org/10.1101/2022.07.04.498738
Article and author information
Author information
Version history
- Sent for peer review:
- Preprint posted:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
Copyright
© 2023, Golov et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- views
- 611
- downloads
- 22
- citations
- 0
Views, downloads and citations are aggregated across all versions of this paper published by eLife.