MChIP-C experimental and computational workflow.

a. An overview of the MChIP-C experimental procedure. b. General MChIP-C analysis pipeline. A 250-kb genomic region surrounding the TAL1 gene is shown with H3K4me3 MChIP-C profiles in K562 cells. Positions of individual viewpoints are highlighted by green rectangles and anchors. Identified MChIP-C interactions are shown as magenta (P-PIR) and dark violet (P-P) arcs. c. Summary statistics for all 239,779 promoter-centered MChIP-C interactions identified in K562 cells.

Comparison of MChIP-C with PLAC-seq and Micro-C.

a. Top: MChIP-C, PLAC-seq and Micro-C interaction profiles of the MYC promoter in K562 cells. MChIP-C interactions of the MYC promoter are shown as magenta arcs. Positions of 7 (e1-e7) CRISPRi-verified K562 MYC enhancers are highlighted as orange rectangles. Bottom: zoom in on two enhancer clusters. b. Systematic comparison of merged MChIP-C, merged PLAC-seq and merged promoter-anchored Micro-C signals in distal regulatory sites. Top: Merged MChIP-C, merged PLAC-seq and merged promoter-anchored Micro-C profiles in a 150-kb genomic region surrounding the α-globin gene domain. Viewpoints are highlighted as green rectangles. Positions of CTCF-bound and CTCF-less DNase hypersensitive sites outside viewpoints are depicted as blue and orange circles. Bottom: Heatmaps and averaged profiles of DNase sensitivity, CTCF ChIP, H3K4me3 ChIP, merged MChIP-C, merged PLAC-seq and merged promoter-anchored Micro-C signals centered on distal CTCF-bound and CTCF-less DNase hypersensitive sites.

Analysis of protein factors underlying MChIP-C interactions.

a. Left: CTCF-motif orientation bias in regions interacting with CTCF-less promoters. The majority (∼79%) of CTCF motifs are oriented towards the interacting promoter. Right: Schematic of two hypothetical loop-extrusion-dependent mechanisms that can account for the observed pattern: promoter LE-barrier activity (i) or CTCF-originating interaction stripes (ii). b. Enrichment of transcription-related factor (TRF) binding in MChIP-C PIRs. The y-axis shows enrichment (log2(observed/expected)) of binding for 271 examined TRFs, the x-axis shows the proportion of TRF-bound PIRs, and color shows enrichment of the corresponding motifs in PIRs (grey is assigned to TRFs lacking a DNA-binding motif). c. Hierarchical clustering of PIR-overlapping DHSs (N=18,837). The binding status of 168 TRFs highly enriched in PIRs is used as binary features. Binding of 28 selected TRFs (see Methods) in each PIR-overlapping DHS is shown as a heatmap. ChromHMM chromatin state distributions in each cluster are shown. DHSs overlapping CRISPRi-verified K562 enhancers (Fulco et al., 2019; Gasperini et al., 2019) are shown as orange dots. d. Predictive performance (3-fold cross-validation R²/AUC) of random forest models predicting MChIP-C signal for DHS-promoter pairs. Starting with an initial model based on distance and CTCF, the most predictive TRF features are added incrementally to the model (left to right).
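
The enrichment plotted in panel b is a log2 ratio of observed to expected binding. The following minimal Python sketch illustrates that calculation only. How the expected value was derived is not stated in this legend, so the function name `log2_enrichment` and the example counts are hypothetical.

```python
import math


def log2_enrichment(observed, expected):
    """Return log2(observed / expected) for one factor.

    `observed` and `expected` are assumed to be positive counts (or
    fractions) of bound regions; the choice of background used to obtain
    `expected` is an assumption left to the caller.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must both be positive")
    return math.log2(observed / expected)


# Hypothetical numbers: a factor bound at 40 PIRs where 10 are expected.
example = log2_enrichment(40, 10)
```

A value of 0 means binding at the expected level, positive values mean enrichment, and negative values mean depletion.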

The majority of functionally-verified enhancers do physically interact with their target promoters.

a. Bar plots representing the proportion of MChIP-C interacting pairs among nonfunctional DHS-P pairs and CRISPRi-verified E-P pairs. Heatmaps and average profiles of MChIP-C signal are shown for individual subsets. CRISPRi-verified enhancers shown in panel b are indicated by roman numerals. b. Examples of MChIP-C profiles for promoters physically interacting with their functionally verified enhancers (MChIP-C interaction has been found) (i-iii) and not interacting with them (MChIP-C interaction has not been found) (iv-vi). The viewpoints are highlighted by anchor symbols and green rectangles, the enhancers are highlighted by orange rectangles. c. Distance distribution boxplots for verified E-P pairs with and without MChIP-C interactions.

H3K4me3 MChIP-C experiment technical assessment.

a. 1.5% agarose gel electrophoresis of MChIP-C MNase digestion and ligation controls. b. Mapping and filtering statistics of 4 biological replicates of H3K4me3 MChIP-C experiments. c. A 1-Mb region on chromosome 16 with mononucleosomal MChIP-C profiles of 4 MChIP-C replicates reflecting H3K4me3 occupancy, positions of consensus MChIP-C mononucleosomal peaks, viewpoints and a conventional H3K4me3 ChIP-seq profile. d. Hexagonal heatmaps representing pairwise comparison of mononucleosomal MChIP-C profiles in 4 replicates and a conventional H3K4me3 ChIP-seq profile. Signals in K562 DHS sites are used for correlation and plotting. Pearson’s correlation coefficient (r) for each pair is shown. e. Heatmaps and average profile plots of DNase sensitivity, H3K4me3 ChIP and CAGE signal in 10-kb windows centered on either MChIP-C viewpoints or distal DHS sites (not overlapping MChIP-C viewpoints). f. Hexagonal heatmaps representing pairwise comparison of distal MChIP-C profiles in 4 replicates (either merged or separated by viewpoints). Pearson’s correlation coefficient (r) for each pair is shown. g. Distance-dependent decay of distal MChIP-C signal in 4 replicates. Signal is calculated in 30 distance bins of equal size on a log10 scale between 3.5 (3,162 bp) and 6.5 (3,162 kbp). h. Observed and expected numbers of cross-TAD boundary MChIP-C interactions. Standard deviations (SDs) of simulated expected numbers are indicated.
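
The binning in panel g can be illustrated with a short Python sketch. It assumes 30 equal-width bins in log10(distance) between 3.5 and 6.5, as stated above. The function names and the handling of out-of-range distances are illustrative assumptions, not the authors' code.

```python
import math


def log10_bin_edges(lo_exp=3.5, hi_exp=6.5, n_bins=30):
    """Return n_bins + 1 bin edges in bp, equally spaced on a log10 scale."""
    step = (hi_exp - lo_exp) / n_bins
    return [10 ** (lo_exp + i * step) for i in range(n_bins + 1)]


def bin_index(distance_bp, lo_exp=3.5, hi_exp=6.5, n_bins=30):
    """Return the 0-based bin of a distance, or None if it is out of range."""
    if distance_bp <= 0:
        return None
    x = math.log10(distance_bp)
    if x < lo_exp or x >= hi_exp:
        return None  # assumption: distances outside the range are dropped
    step = (hi_exp - lo_exp) / n_bins
    # Clamp to the last bin to guard against floating-point rounding.
    return min(int((x - lo_exp) / step), n_bins - 1)


edges = log10_bin_edges()  # first edge is about 3,162 bp, last about 3,162 kbp
```

Each bin spans 0.1 log10 units, so every bin covers the same fold change in distance (about 1.26-fold) rather than the same number of base pairs.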

Comparison of MChIP-C with other C-methods.

a. MChIP-C, PLAC-seq and Micro-C interaction profiles for GATA1 and HBG2 genes. Viewpoints are highlighted by anchor symbols and green rectangles, known K562 enhancers are highlighted by orange rectangles (only enhancers located 5 kb to 1 Mb from the viewpoints are shown). Note that even though these two viewpoints have comparatively low MChIP-C read coverage (3,213 and 2,577 read pairs), E-P physical interactions are still discernible in most cases. b. Aggregate ligation signal between all active promoters and CTCF-bound (bottom) and CTCF-less (top) distal DNase hypersensitive sites measured with various C-methods in K562 cells. Genomic bins within 1–5 kb of the DHS center were used as background. ICE denotes iteratively balanced Micro-C and Hi-C datasets.

CTCF-orientation bias in PIRs interacting with CTCF-occupied promoters.

The graph demonstrates CTCF-motif orientation bias in regions interacting with CTCF-occupied promoters. A significant portion of these promoters does not contain canonical CTCF binding motifs. CTCF-bound promoters tend to interact with CTCF motifs oriented towards them whether the CTCF-binding motif is present (right diagram) or absent (left diagram) in these promoters. Note that the bias holds even for codirectional CTCF motifs, when one of them is located within a promoter and the other within a PIR.

Extended characterization of protein factors underlying promoter-interacting DHSs.

a. Heatmap showing the binding of all 168 TRFs used for hierarchical clustering of promoter-interacting DHSs. Columns corresponding to proteins associated with clusters 1 and 2 (RAD21, SMC3, ZNF143 and CTCF) as well as to enhancer-associated factors potentially involved in physical E-P interactions (BRD4, H3K27ac, SMARCE1, ARID1B, DPF2, EP300, YY1, MED1, PolII-S5P, CDK8) are highlighted. b. Distance distribution boxplots for MChIP-C P-PIR interactions anchored in DHSs from identified clusters. c. Predictive power (3-fold cross-validation R²) of various enhancer-associated factors in the random forest model of MChIP-C signal strength. The predictive power of the initial model plus RAD21 (6 features total) is shown as a dashed line. The predictive power of each factor is shown after adding the factor to the 6 features and retraining the model.

Recall (sensitivity), precision and false positive rate for predictions of functional enhancer-promoter pairs using different C-methods.

a. Bar plots representing the proportion of interacting pairs according to PLAC-seq, Micro-C and Hi-C among nonfunctional DHS-P pairs and CRISPRi-verified E-P pairs. Heatmaps and average profiles of proximity ligation signal are shown for separate subsets in each individual method. b. Precision-Recall plot and ROC plot for various C-methods aiming to distinguish between nonfunctional DHS-P pairs and CRISPRi-verified E-P pairs in K562 cells. Stars denote the performance of distance-based predictors in which each analyzed DHS is assigned as an enhancer to the nearest active gene (1) or to the two nearest active genes (2). Curves represent the performance of the distance-based predictor using thresholds inversely proportional to the genomic distance between the DHS/enhancer and TSS of the target gene.
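
For illustration, the nearest-gene baseline marked by stars in panel b could be sketched as follows. This assumes each DHS and each active TSS is reduced to a single coordinate on the same chromosome. The function name, gene labels and positions are hypothetical and not taken from the study.

```python
def nearest_active_genes(dhs_pos, tss_by_gene, k=1):
    """Return the k genes whose TSS lies closest to a DHS.

    dhs_pos      -- DHS center coordinate in bp (assumed single chromosome)
    tss_by_gene  -- dict mapping gene label to TSS coordinate in bp
    k            -- 1 for the nearest gene, 2 for the two nearest genes
    """
    ranked = sorted(
        tss_by_gene.items(),
        # Sort by distance; break ties by gene label for determinism.
        key=lambda item: (abs(item[1] - dhs_pos), item[0]),
    )
    return [gene for gene, _ in ranked[:k]]


# Hypothetical active TSS positions and one DHS at 4,000 bp.
tss = {"geneA": 1000, "geneB": 5000, "geneC": 20000}
one_nearest = nearest_active_genes(4000, tss, k=1)
two_nearest = nearest_active_genes(4000, tss, k=2)
```

Under this scheme every DHS receives exactly k predicted targets, which is why each variant appears as a single point on the plots rather than as a curve.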