Fate of the TADs in chromosomes upon cohesin deletion.

(a) The number of TADs in all the chromosomes, identified by the TopDom method [44], in the wild type (WT) cells and the number of preserved TADs (P-TADs) after deleting the cohesin loading factor (ΔNipbl) in mouse liver. The percentage of P-TADs is ≈ 16%. (b) Same as (a) except the experimental data are analyzed for HCT-116 cells before (WT) and after RAD21 deletion. The percentage of P-TADs is 26%. (c) The total number of TADs and the number of P-TADs for each chromosome calculated using the mouse liver Hi-C data. The number above each bar is the percentage of P-TADs in each chromosome. (d) Same as (c) except the results are for chromosomes from the HCT-116 cell line. The percentage of P-TADs is greater in the HCT-116 cell line than in mouse liver for almost all the chromosomes, a feature that is more prominent in the distribution of P-TAD proportions (right).

Identification of preserved TADs (P-TADs) from the contact map using the TopDom method.

(a) Schematic representation used to determine the P-TADs. Yellow (blue) triangles represent the TADs identified using the TopDom method in WT (cohesin-depleted) contact maps (CMs) at 50kb resolution. The small square within each triangle represents a single locus (50kb). A TAD detected in the WT CM whose boundaries lie within ± one bin (50kb) of the boundaries of a TAD in the cohesin-depleted CM is deemed to be a P-TAD. (b) P-TAD upon cohesin loss in HCT-116 cells. The bar plots above the CMs show the epigenetic states. Red (blue) represents the active (inactive) state. The TAD between grey dashed lines is preserved upon cohesin loss. The parameter (in the red square) displayed at the bottom left of each map indicates the color scale used when plotting the contact maps in Juicebox [54].
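
The ± one bin matching rule in (a) can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name `preserved_tads`, the representation of a TAD as a `(start_bin, end_bin)` pair, and the requirement that both boundaries match the same TAD in the depleted map are assumptions made here.

```python
def preserved_tads(wt_tads, depleted_tads, tol_bins=1):
    """Return the WT TADs whose start and end bins both lie within
    tol_bins of the start and end of one TAD called in the depleted map.

    TADs are (start_bin, end_bin) pairs at 50kb resolution (assumed layout).
    """
    preserved = []
    for ws, we in wt_tads:
        for ds, de in depleted_tads:
            if abs(ws - ds) <= tol_bins and abs(we - de) <= tol_bins:
                preserved.append((ws, we))
                break  # one match is enough to call the TAD preserved
    return preserved


# Toy TAD calls (hypothetical bin indices, not data from the paper).
wt_calls = [(0, 10), (10, 25), (25, 40)]
depleted_calls = [(1, 10), (10, 32), (33, 40)]

p_tads = preserved_tads(wt_calls, depleted_calls)
fraction_preserved = len(p_tads) / len(wt_calls)
```

In this toy case only the first WT TAD survives, because the other two each have one boundary that shifts by more than one bin.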

CCM simulations reveal characteristics of P-TADs.

(a) The number of TADs in the simulated Chr10 and Chr13 chromosomes identified by the TopDom method [44] for Pl = 1. The number of P-TADs after CTCF loop depletion (Pl = 0) is also shown. (b) The number of P-TADs with epigenetic mismatches (blue) and those identified by the peaks in the boundary probability (green). (c)-(e) Comparison between CMs for the region of Chr13, with the upper (lower) triangle corresponding to Pl = 1 (Pl = 0). The black circles at the corners of the TADs are the CTCF loop anchors. The bar above the contact map shows the epigenetic states, with red (blue) representing A (B) loci. Arrows above the bar show the epigenetic switch. (c) After loop deletion, TAD structures disappear. (d) TADs whose boundaries are marked by epigenetic mismatches are preserved. (e) A TAD lacking at least one epigenetic mismatch is disrupted after CTCF loop loss. (f)-(h) Comparison of the CMs and mean spatial-distance matrices for the 2.5Mb genomic regions (25.7-28.2Mbp, 73.3-75.8Mbp and 102-104.5Mbp, respectively) with (upper) and without (lower) CTCF loops. The bottom graph shows the boundary probability, with high values indicating population-averaged TAD boundaries. Purple circles in the boundary probability graph represent the preferred boundaries. A subset of P-TAD boundaries match epigenetic mismatches (blue lines). P-TADs with high boundary probability are marked by green lines. The magenta line marks P-TADs that are not accounted for by an epigenetic mismatch or a physical boundary in 3D space but are found using the TopDom method.

Classification of P-TADs from Hi-C maps from two cell lines and link between boundary probability peak and epigenetic mismatch.

(a) The number of P-TADs in all the chromosomes (orange bar taken from Fig. 1a) that are accounted for by epigenetic mismatches (blue bar) as well as peaks in the boundary probability (green bar) after Nipbl loss in mouse liver. (b) Same as (a) except the analysis is done using experimental data for HCT-116 cells after ΔRAD21. (c) Example of a P-TAD in the WT 97.7Mb-100.2Mb region of Chr3 from the HCT-116 cell line. The CM resolution is 50kb. The mean DMs calculated using the 3D structures are in the middle panel. The dark-red circles at the boundaries of the TADs in the CMs are loop anchors detected using HiCCUPS [55]. The peaks in the boundary probability (bottom panel) are shown by purple circles. The epigenetic mismatch coincides with a peak in the boundary probability (compare top and bottom panels). The bottom plot shows the probability for each genomic position to be a single-cell domain boundary. (d) Same as (c) except the results correspond to the absence of RAD21. Although not as sharp, there is a discernible peak in the boundary probability where there is an epigenetic mismatch after removal of RAD21.

Fate of TADs after ΔNipbl in mouse liver cells.

(a)-(b) Comparison between Hi-C CMs (lower) and calculated CMs (upper) using the 3D structures obtained from the HIPPS method for the 3Mb genomic regions (Chr6:22.6Mb-26.1Mb in WT cells and Chr7:139Mb-142.5Mb in Nipbl-depleted cells, respectively). The distance threshold for calculated CMs is adjusted to achieve the best agreement between HIPPS and experiments. Calculated CMs are in very good agreement with Hi-C CMs in both WT and Nipbl-depleted cells. (c) Complete loss (Chr6:23.55Mb-26.05Mb) of TADs. (d)-(e) P-TADs (Chr7:139.5Mb-142Mb and Chr15:89.5Mb-92Mb). In all cases, the plots below the scale on top, identifying the epigenetic states [47], compare 50kb-resolution Hi-C CMs for the genomic regions of interest with Nipbl (upper) and without Nipbl (lower). Mean spatial-distance matrices, obtained from the Hi-C contact matrices using the HIPPS method [45], are below the CMs. The dark-red circles at the boundaries of the TADs in the CMs are loop anchors detected using HiCCUPS [4]. ChIP-seq tracks for CTCF, RAD21 and SMC3 in the WT cells [34] illustrate the correspondence between the locations of most of the detected loop anchors and the ChIP-seq signals. Bottom plots give the probabilities that each genomic position is at a single-cell domain boundary in the specified regions. Purple circles in the boundary probability graph represent the physical boundaries. A subset of physical boundaries in P-TADs coincide with epigenetic mismatches (blue lines), indicating that the probabilities of contact at these boundaries are small. P-TADs in (e), demarcated by green lines, have high peaks in the boundary probability without epigenetic mismatch.

Calculated 3D structures from Hi-C produce TAD-like structures found in imaging experiments

The left (right) panels show results for WT (ΔRAD21) cells for the region Chr21:34.6-37.1Mb. For visualization purposes we adopted the color scheme used in the imaging study [40]. (a) Hi-C contact maps for WT (left) and ΔRAD21 (right) cells [26]. (b) Mean distance matrices calculated from HIPPS-generated 3D structures. (c) Examples of calculated single-cell distance matrices with (left) and without (right) RAD21. Schematics of the structures for the two cells under the two conditions are given below. (d) Distribution of the boundary strengths, describing the steepness in the changes in the spatial distance across the boundaries. The left (right) is for the WT (ΔRAD21) cells. The blue (purple) histogram is calculated using HIPPS (experiments). (e) Position-dependent boundary probability for the WT (left) and RAD21-deleted (right) cells. The curve in blue (purple) is the calculated (measured) boundary probability for the WT cells. The orange (green) curve is from the calculations (experiments). The plots show that the locations of prominent peaks in the calculated boundary probability are in excellent agreement with experiments for the WT cells (left panel). Without RAD21, high peaks are absent in both cases (right panel). The calculated results were obtained without adjusting any parameter in the theory.

Certain TADs enriched in E-P/P-P interactions at the boundary are preserved upon cohesin deletion.

(a) Comparison between 5kb Micro-C CMs in the region (Chr8:72.24Mb-72.57Mb) in the WT (left panel) and cohesin-depleted (right panel) mESC cells [48]. The locations of cohesin loops (green squares) and E-P/P-P interactions (blue circles) plotted in the WT CMs are from experiments [48]. Each bar above the contact map shows epigenetic states (red: active, blue: inactive) annotated based on ChromHMM results [56]. The cohesin-dependent (green dashed lines) and cohesin-independent (blue dashed lines) TADs are detected in the WT cells using the TopDom method with the default parameter (w=5). P-TADs (blue dashed lines) are also found in cohesin-deleted cells. (b)-(c) Comparison between 20kb Micro-C CMs and mean DMs spanning the regions (Chr19:8.66-9.2Mb and Chr12:56.4-56.9Mb, respectively) in the presence (upper) and absence (lower) of cohesin. The graph below each DM shows the boundary probability calculated from 10,000 3D structures generated using the parameter-free HIPPS method. P-TADs between grey dashed lines are detected using the TopDom method (w=5). P-TADs with a high boundary-probability peak but without epigenetic mismatches are enriched in E-P/P-P interactions at the boundaries.

Parameters for bonding potentials

Schematic representation used to identify the P-TADs with epigenetic mismatches:

Dark grey triangles represent the P-TADs in the CM. The small square within each triangle represents a single locus (50kb). Red (blue) indicates the active (inactive) state in the bar below the CM. A transition between A and B epigenetic states is referred to as an epigenetic mismatch (green arrows). We examined whether each P-TAD has an epigenetic mismatch within ± 100kb of its boundaries (II). P-TADs that have only a single-locus (50kb) mismatch near their boundaries (I) or comprise < 70% of sequences in an identical epigenetic state (III) are excluded. The TAD marked by a yellow star is a P-TAD with an epigenetic mismatch at the TAD boundary.
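
The three criteria (I)-(III) can be read as a small filter over a per-bin string of A/B states. The sketch below is one possible interpretation, not the authors' code: the function names, the encoding of states as a string of `"A"`/`"B"` characters per 50kb bin, the reading of criterion (I) as "ignore transitions flanked by a run of only one bin", and the requirement that both boundaries carry a mismatch are all assumptions introduced here.

```python
from itertools import groupby


def run_lengths(states):
    """For every bin, the length of the same-state run it belongs to."""
    out = []
    for _, grp in groupby(states):
        n = len(list(grp))
        out.extend([n] * n)
    return out


def mismatch_near(states, boundary, window=2, min_run=2):
    """True if an A/B transition lies within `window` bins (2 bins = 100kb)
    of `boundary`, skipping transitions caused by a single isolated bin
    (assumed reading of criterion I)."""
    runs = run_lengths(states)
    lo = max(1, boundary - window)
    hi = min(len(states) - 1, boundary + window)
    for i in range(lo, hi + 1):
        flips = states[i - 1] != states[i]
        if flips and runs[i - 1] >= min_run and runs[i] >= min_run:
            return True
    return False


def tad_has_epigenetic_mismatch(states, start, end, purity=0.7):
    """Criterion III (>= 70% of bins in one state), then criterion II at
    both boundaries (requiring both is an assumption)."""
    interior = states[start:end]
    dominant = max(interior.count("A"), interior.count("B")) / len(interior)
    if dominant < purity:
        return False
    return mismatch_near(states, start) and mismatch_near(states, end)


# Toy state tracks (hypothetical): a pure A domain flanked by B chromatin,
# and the same domain whose right flank differs only by one isolated B bin.
clean = "BBBB" + "AAAAAAAA" + "BBBB"
single_bin = "BBBB" + "AAAAAAAA" + "B" + "AAA"
```

With these inputs, `tad_has_epigenetic_mismatch(clean, 4, 12)` accepts the domain, while the same call on `single_bin` rejects it because the right-hand transition comes from a single 50kb bin.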

CCM simulations for chromosome 13 (Chr13) from the GM12878 cell line:

(a) In the CCM, red (blue) spheres represent active (repressive) loci. The black open circles are the CTCF loop anchor locations. (b) Comparison of the simulated (Pl = 1, top half) and Hi-C CMs (bottom half). The bar above marks the epigenetic states with red (blue) representing active (repressive) loci. The values of the contact frequencies, converted to a log scale, are shown on the right. (c) Comparison between the Pearson correlation maps consisting of ρij for all loci pairs from simulations (top half) and experimental data (bottom half). The scale for the Pearson Correlation Coefficient (PCC) is on the right. (d) Distribution of the PCC, ρij, for all (i,j) pairs from simulations and experiment (1 is positive correlation, 0 is no correlation, and −1 corresponds to anti-correlation). The Kullback-Leibler divergence, Dkl, between the CCM prediction and experiment is small. (e) First eigenvector values (PC1) from Principal Component Analysis (PCA) using the correlation matrix for CCM. The compartments A and B are defined by positive (red) and negative (blue) values. (f) Snapshot of the folded Chr13. The color corresponds to genomic distance from one end point, ranging from red to green to blue. (g) Ensemble-averaged distance map obtained from simulations. (h) Ward Linkage Matrix (WLM) comparison between simulations and the one computed using Hi-C data. The PCC between the two distance matrices is ∼ 0.83, indicating reasonable agreement between simulations and experiments. (i) Contact map for the 8 Mbp region ((44-52)Mb) with the upper (lower) triangle corresponding to simulations (experiments). (j) On the right is an illustration of the TADs, identified using the Multi-CD method [72]. The dark-red circles are the positions of the loop anchors detected in the Hi-C experiment, which are formed by two CTCF motifs. A subset of TADs is defined by the CTCF loops, whereas others are not associated with loops. These could arise from segregation between the chromatin states of the neighboring domains, as reported in certain experimental studies [26, 73, 74]. The average sizes of the TADs detected using the Multi-CD method from Hi-C and simulated contact maps are ∼750kb and ∼700kb, respectively. (k) Snapshot of the TAD marked in (j). (m) Same as (j) except the TADs were calculated for the region ((75-83)Mb) in (l).

Organizational features of Chr10 from human cell line GM12878:

(a) Comparison between the simulated CM (Pl = 1.0, top half) and Hi-C experiments (bottom half). The bar above the contact map shows the epigenetic states with red (blue) representing active (repressive) loci. (b) Experimental (lower triangle) and simulated (upper triangle) Pearson correlation maps. (c) The distribution of the PCC, ρij, for each pair (i, j) from simulations and experiment. The value of the KL divergence at the bottom is obtained by comparing the distributions obtained in the simulations and experiments. (d) A conformation of the folded Chr10 (N=2,712) obtained using the CCM simulations. The colors correspond to genomic distance from the 5’ to the 3’ end. (e) Ensemble-averaged distance map calculated using the simulated structures. (f) Experimental (lower triangle) and simulated (upper triangle) WLMs. The PCC between the two WLMs is ∼ 0.75. The agreement between simulations and experiments is fair. (g) Hi-C map for the region (19.7-26.25)Mb with the upper (lower) triangle corresponding to simulations (experiments). (h) On the right is an illustration of the TADs. The dark-red circles are the positions of the loop anchors detected in the Hi-C experiment, formed by two CTCF motifs. (i) Snapshot of the TAD marked by the black line in (h). (k) Same as (h) except the TADs were calculated for the region (90.8-97.05)Mb in (j). The diversity of TAD structures is apparent.

Clustering of A and B loci is stronger after CTCF loop (cohesin) loss:

(a) Comparison between simulated contact maps using CCM (19-34Mb, upper panel) and Pearson correlation maps (19-29Mb, lower panel) for Chr13 (GM12878 cell line). The upper triangle (lower triangle) was calculated with (without) CTCF loops. The black circles in the upper triangle are the positions of the CTCF loop anchors detected in the Hi-C experiment [4]. The bar on top marks the epigenetic states with red (blue) representing active (repressive) loci. Upon CTCF loop loss, the plaid patterns are more prominent, and finer details of the compartment organization emerge. (b) 3D snapshots of A and B clusters identified using the DBSCAN algorithm with Pl = 1 (left panel) and Pl = 0 (right panel) computed from simulations of Chr13 with and without loops, respectively. Five A clusters (upper panel; red, orange, yellow, dark-green, light-green) and one B cluster (lower panel; white) are detected in this 3D structure with Pl = 1. Four A clusters and one B cluster are detected for Pl = 0. The size of a locus is σ50K ≈ 243nm [11]. (c) Box plot of the number (left) and average size (right) of A (B) clusters determined using 10,000 individual 3D structures for Pl = 1 and Pl = 0 for simulated Chr10 and Chr13. The size of the A (B) cluster, Sa (Sb), is defined as (the number of A (B) loci within the cluster)/(the total number of A (B) loci within the chromosome). Boxes depict median and quartiles. The black line with caps describes the range of values in the number and size. Loop loss creates a smaller number (enhancement in compartment strength) of A-type clusters whose sizes are larger (upper). A two-sided Mann-Whitney U test was performed for the statistical analysis. There is no change in the number and size of B clusters after loop deletion (lower). (d)-(e) Same as (c) except the results were determined using 10,000 3D structures generated with the HIPPS method from the experimental Chr11 and Chr19 CMs (Chr6 and Chr15 CMs) from mouse liver for the WT and ΔNipbl [34] (HCT-116 WT and ΔRAD21 cells [26]), respectively. The number of A clusters decreases by 18% and 27% after Nipbl loss in Chr11 and Chr19, respectively. (f)-(g) Pearson correlation matrix derived from 3D structures for Chr11 and Chr19 of mouse liver, respectively. Two loci separated by a distance smaller than 1.75σ are in contact (σ is the mean distance between the i and i + 1 loci for WT and ΔNipbl, respectively). The black circles in the upper triangle are loop anchors detected in the Hi-C map [34] using HiCCUPS [4]. (h) The percentage decrease in the number of A (B) clusters after CTCF loop or cohesin loss for some chromosomes in simulations and experiments as a function of the percentage of A (B) loci within the chromosome. When the proportion of B loci is much larger than that of A loci, there is no change in B clusters despite loop or cohesin deletion (upper panel).
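
Two definitions in this caption lend themselves to a short illustration: the normalized cluster size Sa (Sb) and the 1.75σ contact rule. The Python sketch below assumes cluster labels have already been assigned by some clustering routine (DBSCAN in the caption); the function names, the use of -1 for unclustered loci, and the toy inputs are hypothetical and chosen here only for illustration.

```python
import math
from collections import Counter


def cluster_sizes(labels, states, kind="A"):
    """Size of each cluster of `kind` loci, as a fraction of all `kind`
    loci in the chromosome. Label -1 is treated as unclustered (assumed)."""
    total = sum(1 for s in states if s == kind)
    counts = Counter(
        lab for lab, s in zip(labels, states) if s == kind and lab != -1
    )
    return {lab: c / total for lab, c in counts.items()}


def contact_map(coords, factor=1.75):
    """Binary contact map: loci i and j are in contact when their distance
    is below factor * sigma, where sigma is the mean distance between
    consecutive loci i and i + 1 in the same structure."""
    bonds = [math.dist(coords[i], coords[i + 1]) for i in range(len(coords) - 1)]
    cutoff = factor * sum(bonds) / len(bonds)
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) < cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


# Toy inputs (hypothetical): 8 loci with two A clusters and one stray A locus.
toy_states = "AAAABBAA"
toy_labels = [0, 0, 0, -1, 5, 5, 1, 1]
sizes = cluster_sizes(toy_labels, toy_states, kind="A")

# Four loci on a straight line with unit spacing: only neighbours touch.
line = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
cm = contact_map(line)
```

Normalizing by the total number of A (or B) loci makes cluster sizes comparable between chromosomes with different A/B compositions, which is the quantity plotted in the right-hand box plots.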

Enhancement of compartmentalization upon CTCF loop loss:

Compartmentalization saddle plots are shown for (a) Chr13 and (b) Chr10 with Pl = 1 (left) and Pl = 0 (right). Observed/expected matrix bins are arranged based on PC1, obtained from the contact maps without loops. Numbers at the centre of the maps represent compartment strengths defined as the ratio of ((A–A) and (B–B) interactions) to ((A–B) and (B–A) interactions) using the mean values from the corners. The increase in the compartment score (4.2 to 5.2 for Chr13 and 10.9 to 13.3 for Chr10) shows that the compartment features are accentuated in Pl = 0 (loop deletion) compared to Pl = 1, which accords well with the conclusions in the main text that uses a different method.
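
The compartment strength quoted above, the ratio of (A–A plus B–B) to (A–B plus B–A) corner means of the PC1-sorted observed/expected matrix, can be sketched as follows. The function name, the 20% default corner fraction, and the toy matrix are assumptions introduced for illustration; the caption does not state how large the corners are.

```python
def compartment_strength(oe, pc1, corner_frac=0.2):
    """Ratio of same-type to cross-type corner means of a saddle plot.

    oe          : observed/expected contact matrix (list of lists)
    pc1         : first principal component per bin (positive = A)
    corner_frac : fraction of bins forming each corner (assumed value)
    """
    n = len(pc1)
    order = sorted(range(n), key=lambda i: pc1[i])  # most B-like first
    k = max(1, int(n * corner_frac))
    b_bins, a_bins = order[:k], order[-k:]

    def mean_block(rows, cols):
        vals = [oe[i][j] for i in rows for j in cols]
        return sum(vals) / len(vals)

    same = mean_block(a_bins, a_bins) + mean_block(b_bins, b_bins)
    cross = mean_block(a_bins, b_bins) + mean_block(b_bins, a_bins)
    return same / cross


# Toy 4-bin example (hypothetical values): bins 0-1 are A, bins 2-3 are B.
toy_pc1 = [1.0, 0.8, -0.9, -1.0]
toy_oe = [
    [2.0, 2.0, 0.5, 0.5],
    [2.0, 2.0, 0.5, 0.5],
    [0.5, 0.5, 2.0, 2.0],
    [0.5, 0.5, 2.0, 2.0],
]
strength = compartment_strength(toy_oe, toy_pc1, corner_frac=0.5)
```

For this toy matrix the same-type corners average 2.0 and the cross-type corners 0.5, so the strength is 4.0; larger values indicate sharper A/B segregation.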

Calculation of boundary strength and boundary probability from the distance matrix at 50kb resolution:

(a) A schematic describing the chromosome model. (b) Each small square of size a (= 50 kb) represents the distance, rij, between two loci i and j. The red square is used to illustrate the idea. (c) Definition of the start-of-domain and end-of-domain boundary strengths in the N×N distance matrix. The distances between the loci are represented as arcs in various colors. (d) The distance maps in 10,000 cells are calculated from the 3D structures generated by the HIPPS method [68] with the Hi-C contact map from Schwarzer et al. [34] as input. Local maxima above a defined threshold in the start-of-domain and end-of-domain boundary strengths (yellow and green lines, respectively) are defined as domain boundaries in the WT Chr13. The start/end boundary probabilities for each locus are calculated as the proportion of cells in which the corresponding locus is a boundary location. The average of the start and end boundary probabilities over 10,000 cells is defined as the boundary probability for a given locus.
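
The procedure in (d) can be sketched in Python under stated assumptions. The caption does not spell out the formula for the boundary strengths, so the definition used here (mean distance across a candidate boundary divided by the mean distance within the flanking window) is an assumption made for illustration, as are the function names, the window `w`, and the threshold value. Only the later steps (local maxima above a threshold, the fraction of cells, and the average of start and end probabilities) follow the caption directly.

```python
def block_mean(dm, rows, cols):
    """Mean of dm[i][j] over the given rows and columns, skipping i == j."""
    vals = [dm[i][j] for i in rows for j in cols if i != j]
    return sum(vals) / len(vals)


def boundary_strengths(dm, w=2):
    """Start- and end-of-domain strengths per locus (assumed definition:
    cross-boundary mean distance over within-window mean distance).
    Requires w >= 2 so that each window holds at least one locus pair."""
    n = len(dm)
    start, end = [0.0] * n, [0.0] * n
    for i in range(w, n - w + 1):
        left, right = range(i - w, i), range(i, i + w)
        across = block_mean(dm, left, right)
        start[i] = across / block_mean(dm, right, right)   # domain starts at i
        end[i - 1] = across / block_mean(dm, left, left)   # domain ends at i-1
    return start, end


def call_boundaries(strength, threshold):
    """Indices that are local maxima of the strength profile above threshold."""
    return {
        i for i in range(1, len(strength) - 1)
        if strength[i] > threshold
        and strength[i] >= strength[i - 1]
        and strength[i] >= strength[i + 1]
    }


def boundary_probability(distance_maps, threshold=1.5, w=2):
    """Average of start and end boundary probabilities over all cells."""
    n, m = len(distance_maps[0]), len(distance_maps)
    start_counts, end_counts = [0] * n, [0] * n
    for dm in distance_maps:
        start, end = boundary_strengths(dm, w)
        for i in call_boundaries(start, threshold):
            start_counts[i] += 1
        for i in call_boundaries(end, threshold):
            end_counts[i] += 1
    return [(s + e) / (2 * m) for s, e in zip(start_counts, end_counts)]


# Toy "cells" (hypothetical): one with two 4-locus domains, one featureless.
two_domains = [[0 if i == j else (1 if i // 4 == j // 4 else 3)
                for j in range(8)] for i in range(8)]
featureless = [[0 if i == j else 1 for j in range(8)] for i in range(8)]
prob = boundary_probability([two_domains, featureless])
```

In the toy ensemble only one of the two cells has a boundary between loci 3 and 4, so each of those loci receives a probability of 0.25 (an end call at locus 3 and a start call at locus 4, each averaged with a zero from the other profile) and every other locus receives 0.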

ChromHMM chromatin state annotation in HCT-116 cells

Single-cell TAD-like structures are present for both Pl = 1 and Pl = 0. (a) Mean spatial-distance matrix for the genomic region (25.7-28.2Mbp) in CCM Chr13 without (left) and with (right) CTCF loops. (b) Examples of single-cell spatial-distance matrices calculated from the simulated 3D structures. TAD-like structures vary from cell to cell for both Pl = 1 (left) and Pl = 0 (right). Schematics of the structures for the four cells under the two conditions are given below. (c) Distribution of the boundary strengths before (left) and after (right) CTCF loop loss, describing the steepness in the changes in the spatial distance across the boundaries. (d) The probability for each locus to be a single-cell domain boundary in cells for Pl = 1 (left) and Pl = 0 (right).

Same as Appendix Fig. 8 except the results are for the genomic region (123.5-126Mb) in Chr4 of mouse liver [34] with (left) and without (right) the cohesin loading factor Nipbl. HIPPS generated single-cell spatial-distance matrices using Hi-C contact maps as inputs.

Same as Appendix Fig. 8 except the results are for the genomic region (182.05-184.55Mb) in Chr2 of HCT-116 [26] with (left) and without (right) a core component of the cohesin complex, RAD21. Single-cell 3D structures were calculated from Hi-C contact maps using HIPPS.

Epigenetic states contribute to the formation of domain boundaries.

P-TADs do not always have corner dots at their boundaries in the WT cells. (a) The number of P-TADs (after ΔNipbl) whose boundaries coincide with both epigenetic mismatches and corner dots (CTCF loop anchors) (red) and with only epigenetic mismatches (olive) in the WT chromosomes from mouse liver. (b) Same as (a) except the results are obtained using experimental data from HCT-116 cells. (c) Chr10:57Mb-59.5Mb in mouse liver and (d) Chr1:111.8Mb-114.3Mb in HCT-116 cells, respectively. Comparison between 50kb-resolution CMs for the 2.5Mb region with (upper) and without (lower) Nipbl (RAD21). The panels below show the mean DMs obtained from the 3D structures. ChIP-seq tracks for CTCF, RAD21 and SMC1 in WT cells [4, 34] illustrate the correspondence between the locations of the detected loop anchors and the ChIP-seq signals. Comparison of the CMs and boundary probabilities in (c) and (d) shows that the P-TAD boundaries (blue dotted lines) correspond well with epigenetic mismatches (blue line) even without corner dots in WT cells. Purple circles in the boundary probability graph represent the preferred boundaries.

Fate of TAD structures after loss of RAD21 in HCT-116 cells.

(a) Complete loss (Chr21:34.6Mb-37.1Mb). (b)-(c) Preserved (Chr3:97.7Mb-100.2Mb and Chr5:9Mb-11.5Mb). 50kb-resolution CMs for the 2.5Mb genomic regions of interest with (upper) and without (lower) RAD21 are shown in the middle panels. The dark-red circles at the boundaries of the TADs in the CMs are loop anchors detected using HiCCUPS [55]. The mean DMs calculated using the 3D structures with and without RAD21 are compared in the top and bottom panels. ChIP-seq tracks for CTCF, RAD21 and SMC1 in WT cells [4] illustrate the correspondence between the locations of the detected loop anchors and the ChIP-seq signals. Bottom plots show the probability for each genomic position to be a single-cell domain boundary in the regions of interest. Purple circles in the boundary probability graph represent the preferred boundaries. Some P-TAD boundaries match epigenetic mismatches (blue lines). Other P-TADs have only high peaks in boundary probability (green line) without evidence for epigenetic mismatch. The magenta line shows discordance between TopDom and the boundary probability.

Examples of discordance between TopDom and the boundary probability in mouse liver [34]. In all cases, the plots show the CM with TopDom results, the mean spatial-distance matrix and the boundary probability for the 2.5Mb regions (a) (Chr1:172Mb-174.5Mb), (b) (Chr4:137.5Mb-140Mb) and (c) (Chr10:8.7Mb-11.2Mb) with (top) and without (bottom) Nipbl. Purple circles in the boundary probability indicate the prominent physical boundaries in the 3D structures. The magenta lines represent discordance between TopDom and the boundary probability.