# Abstract

Compartment formation in interphase chromosomes is a result of spatial segregation between eu- and heterochromatin on a few mega base pairs (Mbp) scale. On the sub-Mbp scales, Topologically Associating Domains (TADs) appear as interacting domains along the diagonal in the Hi-C contact map (CM). Hi-C experiments showed that most of the TADs vanish upon deleting cohesin, while the compartment structure is maintained and is even enhanced. However, closer inspection of the data reveals that a non-negligible fraction of TADs is preserved (P-TADs) after cohesin loss. Imaging experiments show that, at the single-cell level, TAD-like structures are present even without cohesin. To provide a structural basis for these findings, we used polymer simulations to show that certain TADs with epigenetic mismatches across their boundaries survive after depletion of loops. More importantly, the three-dimensional structures show that many of the P-TADs have sharp physical boundaries. Informed by the simulations, we analyzed the Hi-C maps (with and without cohesin) in mouse liver and HCT-116, which affirmed that epigenetic mismatches and physical boundaries (calculated using the 3D structures) explain the origin of the P-TADs. Single-cell structures, calculated from using only the Hi-C map without *any parameters*, display TAD-like features in the absence of cohesin that are remarkably similar to the findings in imaging experiments, thus providing a cross validation of the computations. Some P-TADs, with physical boundaries, are relevant to the retention of enhancer-promoter/promoter-promoter interactions. Overall, our study shows that preservation of a subset of TADs upon removing cohesin is a robust phenomenon that is valid across multiple cell lines.

**eLife assessment:**

This **valuable** study, of interest for students of the biology of genomes, uses simulations in combination with published data to examine how many TADs remain after cohesin depletion. The authors suggest that a significant subset of chromosome conformations do not require cohesin, and that knowledge of specific epigenetic states can be used to identify regions of the genome that still interact in the absence of cohesin. The theoretical approaches and quantitative analysis are state-of-the-art, and the data quality and strength of the conclusions are **solid**. However, because "physical boundaries (of domains?)" in the model appear to be a consequence of preserved TADs, rather than the other way around, the functional insights are limited.

# I. INTRODUCTION

Advances in experimental techniques have provided details of the three-dimensional (3D) organization of chromosomes in diverse species[l–6]. The contact map (CM) [4, 6], inferred using chromosome conformation capture technique and related variants (referred to as Hi-C from now on), is a two-dimensional (2D) matrix, whose elements indicate the probability that two loci separated by a certain genomic distance are spatially adjacent. The Hi-C experiments on different mammalian cells suggest that there are two major length scales in the organization of interphase chromosomes. On the scale, *L*_{C} ∼ (2-5) Mbp (mega base pairs or Mbp), one observes checkerboard patterns in the CMs [4], which are thought to be associated with micro-phase separation between the two major epigenetic states, active (A) or euchromatin and inactive (B) or heterochromatin. On the length scale, *L*_{TAD}, from tens of kb up to a few Mb, chromatin domains, referred to as Topologically Associating Domains (TADs), appear as squares along the diagonal of the CMs [7, 8]. Contacts are enriched within the TADs, and are suppressed across the boundaries between the TADs. A number of polymer models [9–14] have shown that compartment formations and TADs may be explained using micro-phase separation. The use of the two length scales, *L*_{C} and *L*_{TAD}, in characterizing the organization of interphase chromosomes is now entrenched in the held, although there are suggestions that finer sub-TAD structures emerge at kilo base scales [15, 16]. In particular, recent Micro-C experiments have shown that there are fine structures starting from the nucleosome level [17–19], thus establishing the hierarchical organization of interphase chromosomes over a broad range of length scales.

TADs are thought to regulate gene expression by constraining the contacts between target gene and regulatory regions [20, 21]. As a consequence, perturbation or disruption of their integrity such as deletions, duplications or inversions of DNA segments within the TADs, could lead to aberrant gene expression [8, 22–28]. A class of chromatin loops, mediated by the ATP-dependent motor cohesin [29] and the DNA-binding protein CTCF protein (“cohesin-associated CTCF loop”), organizes a subset of the TADs [30]. Cohesin [29] extrudes DNA loops of varying length, which are terminated when cohesin encounters the transcriptional insulator CCCTC-binding factor (CTCF) [31]. This implies that cohesin and CTCF are often colocalized at the TAD boundary [4–6, 30, 32, 33].

Several experiments have shown that depletion of architectural proteins (Nipbl, RAD21, and CTCF) disrupts the organization of interphase chromosomes [26, 34–41]. Schwarzer et al. [34] showed that the removal of the cohesin loading factor, *Nipbl* in the mouse liver cell results in loss of TADs. They concluded that compartment formation, which is independent of cohesin, is a consequence of the underlying epigenetic landscape, while TAD formation requires cohesin. Similarly, it was found that upon removal of cohesin subunit, *RAD21* cohesin-associated CTCF loops and TADs are abolished [26, 39, 40]. Deletion of *RAD21* results in the complete loss of the so-called loop domains [26], which are formed when CTCF colocalizes with cohesin. In contrast, imaging experiments[40] showed that TAD-like structures, with sharp boundaries are present, at the single-cell level even after deleting cohesin. Three headlines emerge from these studies are: (i) They reinforce the two-length scale description of genome organization at the ensemble level. (ii) Factors that prevent the association of cohesin with chromosomes globally abolish the TADs and the Hi-C peaks, but preserve (or even enhance) compartmentalization. Experimental studies [4, 26, 34, 39, 40] and polymer simulations [12, 35, 42] have shown that the global epigenetic state determines compartment formation, while the more dynamic TADs, with finite lifetimes[43], require ATP-dependent cohesin motor. (iii) TAD-like features persist in single cells before and after auxin treatment, albeit with changes in the locations of sharp domain boundaries.

The results of super resolution experiments [40], at the single-cell level (described above), forced us to ask if there is evidence for preservation of TADs at the ensemble level. To this end, we analyzed the experimental CMs from mouse liver and HCT-Π6 cells, in the presence and absence of cohesin to assess if TADs are preserved at the ensemble level. We discovered that, on an average a fraction of TADs, identified using the TopDom method [44], are retained in the mouse liver and HCT116 cell lines (Fig.1) after removing cohesin. These findings raise the following questions. What is the structural basis for the retention of a small but not insignificant fraction of TADs that are preserved after cohesin loss? Is it possible to establish a structural explanation for TAD retention at the ensemble level (Hi-C), which would reconcile with the results in super resolution imaging experiments showing TAD-like structures at the single level, even without cohesin?

We answered the questions posed above by using the following strategy. We first performed polymer simulations for two chromosomes from the GM 12878 cell line using the Chromosome Copolymer Model (CCM) [11] with and without loop anchors, which roughly mimics the wild type (WT) and the absence of cohesin-associated CTCF loops, respectively. Informed by the results from the polymer simulations, we analyzed the experimental data from two cell lines [26, 34]. We discovered that epigenetic mismatch does account for a reasonable fraction (≈ 0.4 in mouse liver and ≈ 0.3 in HCT-116 cell lines) of P-TADs. We also generated the 3D structural ensemble of the chromosomes that are needed to locate the physical boundaries. The 3D structures were calculated using the parameter free HIPPS method [45] for all the chromosomes for both the cell lines. The analyses using the 3D coordinates accounted for about 53% of the P-TADs. More importantly, the 3D structures revealed TAD-like structures, at the single-cell level, exhibit in the presence and absence of cohesin, which is in accord with the super resolution imaging data [40]. Our work shows that the effects of cohesin removal on chromatin structures are nuanced, requiring analyses of both the epigenetic landscape as well as 3D structures in order to obtain a comprehensive picture of how distinct factors determine interphase chromosome organization in the nucleus. Our calculations for chromosomes from three cell lines lead to the robust conclusion that a subset of P-TADs is intact after deleting cohesin.

# II. RESULTS

## A non-negligible fraction of TADs are preserved upon removal of cohesin

Experiments [26, 34-37, 39, 40] have shown that deletion of cohesin loaders (*Nipbl in mouse liver, SCC2 in yeast*), cohesin subunit (*RAD21*) abolishes a substantial fraction of both cohesin-associated CTCF loops and the TADs. These observations across different cell lines raise an important question: Do all the TADs completely lose their contact patterns after removal of cohesin? To answer this question, we first analyzed 5Okb-resolution CMs from the two cell lines (mouse liver [34] and HCT-116 [26]) before and after degradation of *Nipbl* and *RAD21*, respectively (see Appendix Analyses of the experimental data for details). Using TopDom [44] to identify the TADs in the CMs, we discovered that roughly 16% (26%) of TADs in mouse liver (HCT116) cell are preserved (see Fig.2 to identify P-TADs using Hi-C data) upon *Nipbl* (*RAD21*) removal (Fig.l). Approximately, 659 TADs out of 4176 are preserved (Fig.1a) after removing *Nipbl* in the mouse liver cells. In the HCT-116 cells, 1226 TADs out of 4733 are preserved (Fig.1b) upon *RAD21* loss. Fig.1c and Fig.1d show that the number of P-TADs depends on the chromosome number. We ought to emphasize that the actual number of P-TADs, which depends on the TAD-calling protocol, is not as important as the finding that a non-negligible fraction is preserved after cohesin depletion.

## CCM simulations reproduce wild-type Hi-C maps

To explore the mechanism resulting in P-TADs, we first simulated the chromosome copolymer model [11], CCM (Appendix Fig.2a). To independently decipher the origins of P-TADs (Fig.1) in experiments, we calculated the CMs for Chr13, shown in Appendix Fig.2 (Chr10 in Appendix Fig.3) from the GM12878 cell line. The CCM simulations (Appendix Fig.2b) reproduce the ubiquitous checker board patterns well. The rectangle in Appendix Fig.2b represents the border of one such compartment formed primarily by interactions between the B-type loci.

In order to quantitatively compare the Hi-C data and the simulated CMs, we transformed the contact maps to Pearson correlation maps, which is used to assess if two loci have correlated interaction profiles (Appendix Fig.2c). The Kullback-Leibler (KL) divergence between the two probability distributions for the Pearson correlation coefficients, *ρ _{ĳ}*s, from simulations and experiments is 0.04 (see Appendix Fig.2d). We also performed Principal Component Analysis (PGA) on the Pearson correlation matrix to identify the compartment structure. Comparison of the PCA-derived first principal components (PC1) across the Chr13 reveals that A/B compartments observed in the CCM correspond well to those found in the experiments (Appendix Fig.2e).

We then compared the three-dimensional spatial organization between the simulations and experiments using the Ward Linkage Matrix (WLM), which is based on an agglomerative clustering algorithm. The simulated WLM is calculated from a spatial distance map of the organized chromosome, as described in Appendix Fig.2f-h. We constructed the experimental WLM by converting the Hi-C contact map to a distance map using the approximate relationship [2, 11, 45], . Here, *P _{ĳ}* is the contact probability and

*R*is the 3D spatial distance between loci

_{ĳ}*i*and

*j*(see Appendix Data Analyses). The Pearson correlation coefficient (

*ρ*) between experimental and simulated WLMs is 0.83 (Appendix Fig.2h).

_{wlmexp},wlm_{ccm}Snapshots of TAD structures in Appendix Fig.2k and n show that they are compact but structurally diverse. The average sizes of the TADs detected using TopDom [44] from Hi-C and simulated contact maps are ∼6l5*kbs* and ∼565*kbs*, respectively. Overall the emerging picture of the compartment and TAD structures using different methods are consistent with each other. The results in Appendix Fig.2 show that the agreement between the CCM simulations and Hi-C data is good, especially considering that (i) error estimates in the Hi-C experiments are essentially unknown, and (ii) more importantly, we did tune the value of any parameter to fit the experimental CMs. We surmise that the CCM faithfully reproduces the important features of the Hi-C maps for Chr13 (see Appendix Fig.3 for Chr10 results).

## Epigenetic mismatch accounts for a large fraction of P-TADs

Most of the TADs are not discernible after loop loss, as evidenced by the blurred edges in the contact maps (Fig.3c). In the CCM simulations of chromosomes from human GM12878 cell line a subset of TADs remains even after cohesin-associated CTCF loop loss. Fig.3a shows that ≈ 44% (Chr10) and 28% (Chr13) of initial TADs identified in the *P _{l} =* 1 simulations using TopDom [44] are preserved after loop loss (

*P*0). These values are in line with TAD preservation in chromosomes in HCT-116 [26] cells (see Fig.1d). The percentages of P-TADs depend on both the resolution of the Hi-C experiments as well as the algorithm used to identify the TADs. By using the same method to analyze both the simulation results and experimental data, it is hoped that the conclusions would be qualitatively robust.

_{l}=We used the simulation results to determine the reasons for P-TADs by comparing the results for *P _{L}*=l and

*P*0. The first observation is that some TADs, even with cohesin-associated CTCF loops, consist mostly of sequences in the same epigenetic state (A or B). Fig.3d compares the fate of one TAD in the region (19.3-21.8 Mb) in Chr13 between

_{l}=*P*=l and

_{L}*P*0

_{l}=*·*The highlighted TAD is preserved upon loop loss although the probabilities of contact within this TAD is reduced when

*P*(bottom) compared to

_{l}= 0*P*= 1 (top). Their boundaries correspond to a switch in the epigenetic state (the sequence location where the change in the epigenetic states occurs, as shown by the two black arrows). In contrast, a TAD in Fig.3e, that is present in the WT, is abolished when

_{l}*P*0

_{l}=*·*The disruption of this particular TAD, lacking in at least one epigenetic mismatch, occurs because it can interact more frequently with neighboring TADs composed of similar epigenetic states, which in this case is B-type loci. The importance of epigenetic mismatch in the P-TADs has been noted before [26].

The results in Figs.3d and e show that a mismatch in the epigenetic state across a TAD boundary in the WT is likely to result in its preservation after cohesin-associated CTCF loop loss. To test this assertion, we calculated the number of P-TADs that are associated with mismatches in the epigenetic states (see Appendix Data Analyses and Appendix Fig.1 for details). We considered the TAD boundary and epigenetic mismatch as overlapping if they are less than 100kb apart, which is reasonable given that Hi-C resolution adopted here is 5Okb. By using 100kb as the cut off, we only consider mismatch occurrences that exceed two loci. With this criterion, out of 216 (169) TADs calculated using TopDom 96 (47) are P-TADs for Chr13 (Chr10) (vertical blue bars in Fig.3b in which there are epigenetic mismatches in the WT).

The P-TADs with epigenetic mismatches, illustrated in Fig.3f, shows TADs in the 2.5Mbs region in Chr13. Among the three P-TADs, two of them, whose boundaries are marked by dashed blue lines, have epigenetic mismatch across the TAD boundary. These two TADs survive after removal of cohesin-associated CTCF loop.

## P-TADs have prominent spatial domain boundaries

Because there are a number of P-TADs that are preserved *even without epigenetic mismatches* across their boundaries, we wondered if the distance matrix, which requires 3D structures, would offer additional insights about P-TADs upon cohesin-associated CTCF loop loss. Recent imaging experiments[4O, 42, 46] revealed that TAD-like domain structures with spatially segregated heterogeneous conformations are present in single cells even without cohesin. The boundaries of TAD-like domains, identified from individual 3D structures, vary from cell to cell. They exhibit a preference to reside at specific genomic positions only upon averaging, as found in the Hi-C experiments. The boundary probability at each locus is the proportion of chromosome structures in which the locus is identified as a domain boundary in the 3D space. The locations of prominent peaks in the boundary probability frequently overlap with TADs detected by the population level Hi-C maps.

To explore the relation between P-TADs in ensemble averaged CMs and preferential boundaries in individual 3D structures of chromosomes, we first constructed individual spatial distance matrices (DMs) using 10,000 simulated 3D structures, which were used to identify the single-cell domain boundaries (see Appendix Data Analyses for details and Appendix Fig.6). We find preferential domains, with high peaks in the boundary probability along the genomic region as well as variations in single-cell domains both in *P _{L}* = 1 and

*Pl*= 0 (see Appendix Fig.8).

The simulations show that TADs with epigenetic mismatches across the boundary are likely to be preserved after cohesin-associated CTCF loop loss. Furthermore, Fig.3f (blue dashed lines) shows that single-cell domain boundaries preferentially reside at the TAD boundaries with epigenetic mismatches, leading to a prominent signature for the ensemble boundary after averaging over a population of cells. Interestingly, the P-TAD has prominent peaks in boundary probability, even without epigenetic mismatch, at the same genomic position as in the CM (green lines in Fig.3f and g). These observations imply that the presence of physical boundaries in the 3D structures may be used to identify P-TADs, especially in instances when there are no epigenetic mismatches.

Not all preferential boundaries identified in the distance matrices of the WT cells coincide with the TADs detected using the CM [40]. There is discordance in the TAD boundaries and high peaks in boundary probability. The top panel in Fig.3h (magenta lines) shows that in the (102 - 104.5) Mb range, TopDom predicts that there are three P-TADs in the WT after loop loss (see the top panel with *P _{l}* = 0). There are two prominent peaks in the WT boundary probability whose boundaries coincide with the TADs predicted by TopDom (see bottom panel with

*P*1 in Fig.3h). But the peak height for the third TAD is very small. At best, one can deduce from the boundary probabilities (compare the results in Fig.3h for

_{l}=*P*1 and

_{l}=*P*0) that the middle TAD is preserved, which would be consistent with the TopDom prediction.

_{l}=We calculated a standardized Z-score for the boundary probability in the genomic region to determine the preferred boundaries in single-cell domains. The number of P-TADs that are accounted for by prominent boundary peaks increases if Z-score is reduced. This implies that some P-TADs detected in the CMs using TopDom have weak physical boundaries in the 3D structures. We considered the maxima, with Z-score values larger than 0.7, as preferred boundaries in order to determine if P-TADs arise due to the presence of strong physical boundary. With this criterion, we obtained good agreement for the mean length of the TADs detected in the CM using the TopDom method. The averaged sizes of the TADs in Chr13 using TopDom and boundary probability are ∼565*kbs* and *∼*535*kbs*, respectively. Quantitative analysis of the boundary probabilities along the genomic region revealed ≈66% of the P-TADs in Chr10 and Chr13 have preferential positioning in single-cell domains (green bars in Fig. 3b). Most P-TADs with epigenetic mismatches display prominent peaks in the boundary probability (≈ 85%).

Let us summarize the primary lessons from the simulations, which form the basis for analyzing the experiments on chromosomes from mouse liver and HCT-116 cell lines. (i) Mismatch in the epigenetic state across the TAD boundary is a predominant factor in determining the P-TADs after CTCF loop deletion. (ii) The presence of physical boundaries in the 3D structures accounts for certain fraction of P-TADs, although in some instances TopDom predictions (used in (i)) are not compatible with boundaries deduced from 3D structures.

## Structural explanation of P-TADs upon cohesin removal from analysis of Hi-C data

In order to assess if the conclusions from simulations explain the experimental data, we first calculated the number of P-TADs whose boundaries have switches in the A/B epigenetic states in mouse liver and HCT-116 cell lines. We assigned chromatin state A (active, red) or B (repressive, blue) by analyzing the combinatorial patterns of histone marks using ChromHMM [47] (see Appendix analyses of the experimental data and Appendix Fig.7). An average over the 20 chromosomes shows that 280 P-TADs were associated with a mismatch between A and B epigenetic states upon Δ*Nipbl* in mouse liver (blue bar in Fig.4a). The corresponding number of P-TADs, averaged over 23 chromosomes, with epigenetic mismatches is 396 after deleting *RAD21* in HCT-116 (blue bar in Fig.4b). Not unexpectedly, TADs with epigenetic mismatches across their boundaries are preserved with high probability. We find that a large number of P-TADs are accounted (green bars in Fig.4a and Fig.4b) for by the presence of peaks in the boundary probabilities, which we discuss in detail below.

We then searched for a structural explanation for P-TADs in the two cell lines (Fig.4a and Fig.4b). A plausible hint comes from the the CCM simulations (Fig. 3), which show that boundary probabilities are good predictors of P-TADs. This implies that peaks in the boundary probabilities should correspond to P-TADs. Similar findings are obtained by analyzing the experimental data (Fig. 4c and d). Additional examples are discussed in Fig. 5. In light of these findings, we wondered if, in general, physical boundaries can be inferred directly using data from ensemble experiments. To this end, we used the HIPPS method (see Methods), to calculate an ensemble of 3D structures with the Hi-C contact map as the input. Several conclusions follow from the results in Fig.5. First, Figs. 5a and b show that HIPPS faithfully reproduces Hi-C contact maps. Second, using the ensemble of 3D structures, we calculated the locus-dependent boundary probabilities for both the WT and cohesin-depleted cells. Comparison of the peak positions in the averaged boundary probabilities and TAD boundaries shows that they often coincide, although there are discordances as well (Fig.5 and Appendix Figs. 12 and 13). Third, when there is a mismatch in the epigenetic states, a substantial fraction of P-TADs have high peaks in the boundary probabilities (see Fig.5 and Appendix Fig. 12). As in the simulations, a large fraction of P-TADs (≈ 67%) have high peaks in the boundary probabilities. Taken together, the results show that the predictions using boundary probabilities and the TopDom method are consistent. Finally, analyses of the experiments suggest that the epigenetic state as well as the presence of physical boundary in the 3D structures have to be combined in order to determine the origin of P-TADs in cohesin depleted cells.

With the near quantitative agreement with experiments, we performed detailed analyses, based on the epigenetic mismatches and boundary probabilities for chromosomes from the mouse liver (Fig.5). The Appendix contains analyses of the experimental data, and the results for HCT-116 cell are given in Appendix Fig. 12. To illustrate different scenarios, we consider the 2.5Mbs regions from Chr6 (Fig.5c), Chr7 (Fig.5d), and Chr15 (Fig.5e). (i) For Chr 6, there are three TADs according to TopDom (Fig.5c) in the WT. Upon Δ*Nipbl*, these TADs are abolished (compare the top and bottom panels in Fig 5c). The epigenetic track indicates that the region is mostly in the repressive (B) state. Quantification of the boundary probabilities along the *2.5Mb* region of Chr6 shows that the TADs also lack physical boundaries upon Š*Nipbl.* (ii) Examples of P-TADs that satisfy the epigenetic mismatch criterion are given in Fig.5d. Using TopDom, we identified several TADs (top panel in Fig.5d) in this region of Chr7. It is interesting that the boundary probabilities obtained from the HIPPS-generated distance matrices are also large when there is a mismatch in the epigenetic states. In these examples, both epigenetic mismatches and boundary probabilities give consistent results (see the dashed blue lines in Fig.5d). Two TADs in the WT (the ones on the right in the upper panel in Fig.5d) merge to form a single TAD in the Δ*Nipbl.* This observation is in accord with expectation based on epigenetic mismatch, whose corollary is that if there is a TAD within a region that contains predominantly A or B type loci they ought to merge upon Δ*Nipbl.* (iii) In the *2.5Mb* region of Chr15, there are three TADs in the WT (top panel in Fig.5e). The first and the third TADs have an epigenetic mismatch at only one boundary (blue dashed line), and the expectation is that they would not be preserved upon *Nipbl* removal. However, the boundary probabilities show that the TADs have physical boundaries in both, and thus they are preserved. Taken together, the results in Fig.5 show that only by combining the epigenetic mismatches (Hi-C data is sufficient) and the boundary probabilities (3D structures are required), one can account for a number of P-TADs.

## Single-cell structural change upon cohesin depletion

Finally, we asked whether HIPPS can capture the 3D structural changes in cohesin-depleted cell at the single-cell level. To this end, we compare the structures obtained using HIPPS with the imaging data [40], which examined the consequences of Δ*RAD21* in HCT-Π6. We used HIPPS to the Hi-C contact map on the same genomic region as in the experiment [40] to generate the 3D structures. The results of our calculations for a 2.5Mbp in Chr21 (34.6-37.1Mb) region from HCT-Π6 cell line for the WT and Δ*RAD21* are presented in Fig.6. The DMs were calculated from the calculated 3D structures using the HIPPS method (Fig.6a). The mean DMs for the WT and Δ*RAD21* are shown on the left and right panels in Fig.6b. Similar results for Chr 4 in mouse liver cell (Chr2 in HCT-116) are displayed in Appendix Fig.9 (Appendix Fig. 10). Several inferences may be drawn from the results in Fig.6. (1) There are large variations in the distance matrices and domain boundary locations/strengths from cell to cell (Fig.6c and d). This finding is in excellent agreement with imaging data [40]. (2) In both experiments and our calculations, there are TAD-like structures at the single level even after *RAD2I* is removed (see the right panel in Fig.6c for the theoretical predictions). We should add that TAD-like structures in single cells with and without cohesin have also been found using the strings and binders polymer model [12]. (3) The calculated boundary strength distribution (blue histogram in the left panel in Fig.6d) for the WT is in reasonable agreement with the measured distribution (purple histogram from [40]). Similarly, the calculated and measured boundary strength distributions for Δ*RAD21* cells are also in good agreement (right panel in Fig.6d). Just as in experiments [40], we find that the distributions of boundary strength are the same in the WT and in cells without *RAD21.* (4) We also find that the average locus-dependent boundary probability obtained using theory is in very good agreement with the reported experimental data (compare the curves in the left panel in Fig.6e for the WT and the ones on the right for Δ*RAD*21 cells).

# III. DISCUSSION

By analyzing the experimental Hi-C data [26, 34], we showed that upon cohesin loss a non-negligible fraction of TADs is preserved. To examine the factors that control the P-TADs, we first performed polymer simulations of two chromosomes in the presence and absence of cohesin-associated CTCF loops from the GM 12878 cell line. The simulations for chromosomes from a cell line reproduced the salient findings noted in the experiments in mouse liver and HCT-Π6 cells [26, 34]. The key findings in the simulations are: (1) Mismatches in the epigenetic states across the TAD boundary account for a large fraction of P-TADs. Certain P-TADs, that lack epigenetic mismatches, could be preserved as revealed by the presence of physical boundaries in the 3D structures. (2) In general, there is an overall consistency between predictions based on boundary probability calculations using 3D structures, and mismatches in the epigenetic switches across TAD boundaries.

Informed by the simulation results, we analyzed the experimental Hi-C data on chromosomes from two cell lines [26, 34]. Analyses of the structures determined from the Hi-C map using the HIPPS [45], with and without *Nipbl* or *RAD*21, showed that A-type loci form larger spatial clusters after cohesin removal, consistent with enhancement of compartments inferred from Hi-C CMs (Appendix Fig.4). More importantly, we found that a large fraction of P-TADs can be accounted for by the epigenetic mismatches and the preferential boundaries in 3D structures of chromosomes. Thus, a combination of the two perspectives accounts for the experimental observations pertaining to P-TADs upon cohesin loss[26, 34]. Most of the P-TADs with epigenetic mismatches in the CMs have a prominent peak in the boundary probability, which indicates a strong physical boundary in the 3D structures. If this interpretation is correct, then a natural conclusion is that not only micro-phase separation on the larger scale, *L*_{C} but also some special TADs on a shorter scale, *L*_{tad} are encoded in the epigenetic sequence.

## P-TADs are due to enhancer/promoter interactions

Cohesin is thought to directly or indirectly regulate enhancer-promoter (E-P) interactions. However, strikingly a recent Micro-C experiment discovered that E-P and promoter-promoter (P-P) interactions are largely insensitive to acute depletion of cohesin [48]. It has been previously shown that E-P/P-P interactions form one or multiple self-associating domains, strips extending from domain borders and loop-like structures at their intersections at a finer scale [4951]. Based on the recent finding [48], we explored if maintenance of finer-scale E-P and P-P interactions is the origin of the P-TADs without epigenetic mismatches by analyzing Micro-C data [48]. The left panel in Fig.7a shows cohesin-associated (green dashed line) and cohesin-independent (blue dashed line) TAD structures (defined using Topdom) in the WT cells. In the latter case, the TADs E-P and P-P loops (blue circles) exist at the boundary but there is *no epigenetic mismatch*, implying that it is a domain needed for E-P or P-P communication. Interestingly, the TADs were also conserved upon cohesin loss (right panel in Fig.7a). Analyses of the 3D structures (Fig.7b) also reveal that the TADs with E-P/P-P loops have strong physical boundaries even in the absence of cohesin. Fig.7c shows an example of a TAD with both E-P/P-P loops and cohesin/CTCF loops at the boundary in the WT cells is retained after cohesin deletion, and is associated with prominent boundary peaks. We propose that only a subset of TADs are conserved, potentially for functional reasons. Taken together, our observations suggest that maintenance of E-P/P-P interactions is the origin of the P-TADs even in the absence of epigenetic mismatches. It is worth emphasizing that these conclusions can only be obtained by analyzing the 3D structures, which we calculated from the Micro-C contact maps using the parameter free HIPPS method [45].

## TAD-like structure are preserved at the single-cell level

Remarkably, the conclusion that there are cell-to-cell variations in the distance maps, noted in imaging experiments [40], are also fully affirmed in the calculated 3D structures calculated directly from the ensemble Hi-C data. In addition, the calculations also show that in mouse liver and HCT-116 cell lines, TAD-like structures are present after removing *Nipbl* and *RAD21*, respectively. We established that the Hi-C data sets can be utilized to generate 3D structures using HIPPS, which allows us to examine the impact of perturbations on three-dimensional organization of interphase chromosomes at the single-cell level. These results are significant because (1) only a limited number of loci can be directly imaged whereas Hi-C data can be routinely generated at higher resolution, **(2)** the number of Hi-C data on various cell types and species currently is greater than that from imaging data.

## Comments on the Methods

In order to explore the factors that control the P-TADs, there are two assumptions. (1) The results of the Hi-C experiments are taken at face value in the sense. We view this as an assumption because errors in the Hi-C readouts may be difficult to evaluate even though such experiments are invaluable [52]. (2) The TADs were identified using TopDom, one of many TAD callers. A recent survey [53] shows that, although the finding that TADs are hierarchically organized domains is robust, there are substantial variations in the identification of these domains predicted by different methods. Although TopDom fairs reasonably well in comparison to other methods, there is no guarantee that it identifies the TAD location or the number of TADs accurately. It is only for convenience that we used TopDom as the reference to which the results using the boundary probabilities are compared.

# Acknowledgements

We are grateful to Alistair Boettiger for discussions and useful comments. We thank Sucheol Shin, Atreya Dey, and Debayan Chakraborty for useful discussions. This work was supported by the National Science Foundation (CHE 19-00033) and the Welch Foundation (F-0019) through the Collie-Welch Regents Chair.

# Data availability

Data presented in this study is available upon reasonable request to the corresponding author.

# Methods

We performed polymer simulations for the following reasons: (i) Because all TAD calling schemes are approximate, we evaluated the accuracy of given protocol (TopDom [44] in our study) using the well calibrated CCM. TAD identification in the CCM simulations could be made directly from the 3D structures, thus allowing us to test the validity of the TopDom method. (ii) The combination of 3D structures, assignment of epigenetic states using ChromHMM [47], and accurate calculation of the Hi-C maps using the CCM, was used to determine the origin of P-TADs. (iii) An added bonus is that the polymer simulations on an entirely different cell type (human GM 12878) could be used to assess the robustness of our conclusions.

To avoid biases in the formulation of the hypothesis to explain TAD preservation, we simulated chromosomes from the Human Cell line (GM 12878), which is different from the cell lines used in experiments [26, 34]. In the main text, we report results for Chr13 (19115.10 Mbp). The total number of *5Okb* loci is *N* = 1923, and the total number of loop anchor pairs is 72. To ensure that the results are robust, we also simulated Chr10 (Appendix Fig.3.

## Chromosome Copolymer Model (CCM) for Chromosomes

We modified the Chromatin Copolymer Model (CCM) [11] in order to simulate full length interphase chromosomes. In the CCM, chromosomes are modeled as a self-avoiding copolymer with A (B) type loci representing the active (repressive) epigenetic state. The connectivity between two nearest neighbor loci (*nn*), *i* and *i* + 1, separated by a distance *r _{nn}* = |

*r*– r

_{i}

_{i}_{+1}|, is given by a Finitely-Extensible Nonlinear Elastic (FENE) potential,

where *K _{Fene}* is the spring constant, and

*R*

_{0}is the maximum extent of the bond. Nonbonded interaction between two loci that are not directly connected to each other is given by the Lennard-Jones (LJ) potential,

where α and *ß* can be either A or B. Finally, we used a harmonic potential for the CTCF loop anchors *p* and *q* that are typically stabilized by cohesin. The loop anchor potential is,

where *K _{h}* is the spring constant and

*r*

_{0,h}is the equilibrium length between the CTCF loop anchors. The CCM energy function is,

The unit of energy is *k _{B}T*, where

*k*is the Boltzmann constant and

_{B}*T*is the temperature.

We used the CCM simulations in order to deduce the reasons for preservation of certain TADs when the loop anchors are deleted. The simulations must reproduce the two major findings [26, 34]: (a) propensity of the A and B loci to segregate should be enhanced upon removal of cohesin; (b) a fraction of TADs should be preserved upon cohesin or cohesin-associated CTCF loop loss. Each locus in the polymer is either A type (active locus is in red) or B type (repressive locus is in blue) (Appendix Fig.2a). The locus type is determined using the Broad ChromHMM track [57–59]. There are 15 chromatin states, out of which the first eleven are related to gene activity, based on which we group states 1-11 as active state (A), and states 12-15 as repressive state (B). The locations of the CTCF/cohesin-mediated loop anchors, which are fixed in the polymer simulations (cohesin is present), are obtained from the Hi-C data [4] (GSE63525_GMl2878_primary+replicate_HiCCUPS_with_motifs.txt.gz). Removal of the loop constraints mimics the absence of cohesin. In the Wild type (WT) simulations, the probability of loop anchor, *P _{L}* being present is unity. To model cohesin depletion, we set

*P*= 0 to assess the impact of deleting the loops on compartments and TADs.

_{L}## CCM at 5Okb-resolution

In our previous study [11], we used 1,200 bps resolution. Here, we used 50,000 bps (5Okb) resolution in order to model the entire length of the chromosomes. To determine the size of each locus, with *N _{bp}* base pairs, we assume [60] that the radius of gyration is . Assuming that a locus, with

*σ*

_{1,200}and

*σ*

_{50k}, represents a condensed polymer, with 1.2k and 50k base pairs, respectively, we expect that

*R*

_{g}_{,1200}∼ (1, 200)

^{1/3}and

*R*

_{g}_{,50,000}∼(50,000)

^{1/3}. By using this relation, we estimated the size of each locus,

*σ*

_{50k}= 3.466

*σ*

_{1200kb}. Similarly, the mass of the locus at 5Okb-resolution is modified as

*m*

_{50}

*k =*where

*m*

_{1,200}=l. The parameters for the bonding potentials (Eq.l and Eq.3) at 50kbps resolution of the CCM are given in Table I.

## Effective energy scales

The creation of CCM was motivated by the experimental observation that active and repressive loci segregate on a few mega base scale. By adopting the Flory-Huggins (FH) theory[60], the spatial segregation between A and B loci is modeled using a weaker A-B attraction compared to A-A and B-B interactions. With the assumption that *ε _{aa} = ε_{bb}* =

*ε*, which is made for simplicity, the only free parameter in the CCM is

*ε*By fixing the ratio to , we chose

_{ab.}*ε*such that the simulated CMs are in reasonable agreement with the Hi-C maps. Although a large number of energy functions could reproduce the Hi-C map [10, 13, 61], the CCM is perhaps the simplest copolymer model with only one unknown energy parameter. The unknown parameter is adjusted to match the predicted and measure Hi-C contact map as closely as possible.

## Simulations

We performed Langevin Dynamics (LD) simulations using the LAMMPS simulator by integrating the equations of motion,

where *r _{i}* is the position vector of the

*i*locus, and is the force on the

^{th}*i*locus,

^{th}*ζ*is the friction coefficient that is chosen to be . The random force

*δF*(

_{i}*t*) satisfies . We first did simulations using a small time step, , with only repulsive pairwise interactions between the loci to avoid numerical instabilities. After a certain number of time steps, the loci associated with the loops are in proximity, and undergo fluctuations around their equilibrium bond distance. At this stage, we increased the time step to 10

^{−2}τ

_{50k}, and turned on the attractive pairwise interactions, and continued the simulations for 10

^{8}Δ

*t*

_{50k}. We then performed LD simulations for an additional 10

^{8}Δ

*t*50

*k*to compute the structural properties. Because we are only interested in equilibrium structures, the values of and

*ζ*are irrelevant.

## Contact map

We calculated the contact frequencies, *Cĳ*, between loci *i* and *j* by computing the distance, *r _{ĳ} = |r_{j}* –

*r*| between them, and counting the number of instances, when

_{i}*r*1.75

_{ij}<*σ*

_{50k}. The set of elements,

*C*constituting the CM is a 2D representation of the chromosome organization (Appendix Fig.2(b) and Appendix Fig.3(a)).

_{ij}## Pearson correlation map

To assess the accuracy of the CCM predictions, we calculated the Pearson correlation maps (Appendix Fig.2(c) and Appendix Fig.3(b)) by first transforming the simulated CMs and the Hi-C data to a *log _{e}* scale. For each element,

*C*, we calculated,

_{ĳ}*Z*, using,

_{ĳ}
where , and *σ _{s}* is the standard deviation associated with

*C*. The Pearson correlation coefficient (PCC),

_{s}*ρĳ*, is calculated between the

*i*row,

^{th}*Xi*, and the

*j*column,

^{th}*Y*, associated with the matrix

_{j}*Z*whose elements are

*Z*. The PCC is where E denotes expectation,

_{ij}*µ*and

_{X}*µ*are the means of

_{Y}*X*and

*Y*, respectively, and

*σ*(

_{X}*σ*) is the standard deviation of

_{Y}*X*(

*Y*).

## Kullback-Leibler(KL) divergence

To measure the difference between two probability distributions that are functions of the same variable *x*, we calculated the Kullback-Leibler divergence (KL), *D _{kl}*(

*p*(

*x*),

*q*(

*x*)), which is a measure of the information loss when

*q*(

*x*) is used to approximate

*p*(

*x*). Here,

*p*(

*x*) and

*q*(

*x*) are the two probability distributions of a discrete random variable

*x*. Using the KL divergence, the difference between the PCC probability distributions obtained from the simulations,

*p*and experiments,

^{CCM}*p*were calculated. We define as as shown in Appendix Fig.2(d) and Appendix Fig.3(c).

^{EXP}## Ward Linkage Matrix (WLM)

We used the WLM, an agglomerative clustering algorithm method, to reveal the hierarchical organization on different length scales (Appendix Fig.2((h) and Appendix Fig.3(f)). In our previous study[ll], we showed that the contact probability is inversely proportional to a power of the spatial distance, . This relationship provides a way to convert Hi-C contact matrix to the spatial distance matrix. We computed WLM with our simulated spatial distance matrix, which is directly calculated in the simulations. To compare with experiments, we converted the Hi-C contact matrix to a spatial distance matrix, **R _{exp}** using the relation . The Ward Linkage Matrix (WLM),

**W,**from

**R**and the simulated spatial distance matrix

_{exp}**R**can be calculated, as described previously [11].

_{sim},## Density-based spatial clustering of applications with noise (DBSCAN)

DBSCAN is a clustering algorithm [62] that finds regions of high density by grouping together data points that are in proximity based on spatial distribution. For DBSCAN, two parameters, *Epsilon* and *MinPoints* are required; *Epsilon* is a threshold distance between two loci that is used to classify if they belong to the same cluster whereas *MinPoints* is the minimum number of data points needed to form a dense region. The *MinPoints* can be derived from the dimensions, *D* in the data points, as *MinPoints > D + 1*. We use the recommended value [62] *MinPoints* = 2×D.

The optimal *Epsilon* value is determined using k-distance graph. We set *MinPoints=6* and calculated the distance from every point to the *k ^{th}* nearest neighbor in each cell. The k-distances are plotted in an ascending order, and a reasonable value corresponds to the maximum curvature (elbow) in this plot. It is likely that optimal values depend on the chromosomes, A/B loci type and the cell type. For example, we found that the optimal

*Epsilon*values are and for A (B) loci in Chr13 (CCM), Chr19 (Mouse liver) and Chr15 (HCTΠ6) of both WT and CTCF loops/cohesin depleted cells, respectively (Each

*σ*represents the average distance between

*i*and

*i +*l loci in the chromosome.). With the optimal parameters, we identified the number of A (B) clusters,

*N*(

_{a}*N*) in 10,000 individual structures in the chromosome (see Appendix Fig.4). In addition, we calculated the size of each cluster,

_{b}*S*(

_{a}*S*), which is defined as (the number of A (B) loci within the cluster)/(the total number of A(B) loci within the chromosome).

_{b}The compartmental strength is enhanced after the removal of CTCF loops (Appendix Fig.5), indicating that CTCF loop loss leads to an enhanced tendency for micro-phase separation [26, 34-37, 40, 63, 64]. Thus, DBSCAN, a method that relies on 3D structures, and a method that uses only the CM produces qualitatively consistent picture of strengthening of compartments upon cohesin loss.

## TAD and P-TAD identification

TopDom [44] is one of many methods used to identify TADs. The average contact frequency around each locus, *i* between upstream (*i-w*+l, *i-w, …, i*) and downstream (*i+*l, *i+2, …, i+w*) regions with the free parameter, *w*, is calculated as the value of the binSignal. TAD boundaries are discovered as local minima in the binSignal. Subsequently, false detections in the local minima are filtered by using the Wilcox Rank Sum. We used the software package and source codes of TopDom (https://github.com/jasminezhoulab/TopDom) with default parameter, *w*=5. Two aspects concerning the implementation of TopDom should be kept in mind. (i) TopDom results change depending on parameter values. Large *w* produces big domains, reducing the total number of detected domains. (ii) There are some matrix columns/rows whose contact frequencies sum up to zero. We refer to them as missing bins. We selected only the domains whose boundaries have zero or one missing bin in a 25Okb range, since the presence of the missing bin influences contact insulation.

For completeness, let us define preserved TADs (P-TADs). We detected TADs using TopDom [44] based on the Hi-C data. First, *P-ΊADs* are those that remain in *both* the wild type (WT) cells and cohesin-depleted cells (Fig.2). If the boundaries between two TADs in cohesin-depleted cells are within ± 5Okb window from the corresponding boundary in the WT, and if there is greater than ≥ 80% overlap between the WT and cohesin-depleted cells, such a TAD is classified as a P-TAD. Second, *epigenetic mismatches across TAD boundaries* refer to the alteration of epigenetic state upon going from one TAD to the neighboring TAD (see Appendix Fig.1). For instance, one TAD consisting of predominantly euchromatin loci with the adjacent TAD comprising largely heterochromatin loci would create epigenetic mismatches across the boundary. We also used three dimensional structures of chromosomes, with and without cohesin, to calculate boundaries in three dimensions to determine the structural origin of P-TADs.

## P-TADs with epigenetic mismatches

The procedure of epigenetic mismatches is schematically shown in Appendix Fig.l. Mismatches that occupy only one locus (50*kb*) were excluded (see *I* in Appendix Fig.1). We considered the P-TAD boundary and epigenetic mismatch as overlapping if they are less than 100kb apart (*II* in Appendix Fig.l). Finally, P-TADs with epigenetic mismatches, consisting of <70% of sequences in identical epigenetic states, with epigenetic mismatches, were filtered out (*III* in Appendix Fig.l).

## Boundary strength and boundary probability

To measure the boundary strength [40, 65] for each locus *i*, we first calculated the median distance values *(L)* of the left three columns, each extending 6-elements below the diagonal, and the median value *(R)* of the right three 6-element columns below the diagonal. Similarly, the median value *(B)* of the right three 6-element columns above the diagonal, and the median value *(T)* of the left three 6-element columns above the diagonal were calculated.

The two boundary strengths, *L/R* (start-of-domain boundary strength) and *B/T* (end-of domain boundary strength) are computed as defined in Appendix Fig.6. The local maxima above a defined threshold in the start/end-of domain boundary strengths are identified as te start/end positions of the domain boundary, respectively. This is physically reasonable because at the boundary between two TADs is low which implies that has to be large. Based on the boundary positions in individual cells, we compute the start/end boundary probability for each locus as the fraction of chromosomes in which the corresponding locus is identified as a start/encl boundary of a domain. The average of these start and encl boundary probabilities for each locus is defined as the boundary probability at the locus (Appendix Fig.6).

## Appendix A: Analyses of the experimental data

A hypothesis that emerges from the CCM simulations is that the TADs, which are preserved with high probability upon cohesin deletion, have epigenetic mismatches across the TAD boundary, and are often accompanied by peaks in the boundary probability. In order to test this hypothesis, we analyzed the Hi-C contact map from Schwarzer et al. [34] (mouse liver) and Rao et al. [26] (HCT-116). In each experiment, cohesin loading factor, *Nipbl* and a core component of the cohesin complex, *RAD21* were depleted by employing a liverspecific, tamoxifen-inducible Cre driver and an auxin-inducible degron (AID), respectively. The availability of the wild-type (WT) and cohesin depleted (Δ*Nipbl* or Δ*RAD21*) CMs allows us to test the hypothesis derived from simulations.

In order to analyze the mouse liver data, we used the wild-type and *Nipbl*-depleted Hi-C contact maps at 5Okb-resolution from GEO: GSE9343l. The locations of the CTCF loops were determined using the HiCCUPS method in HiCPeaks [4] (https://github.com/XiaoTaoWang/HiCPeaks). HiCCUPS examines each pixel in a Hi-C contact matrix and detects loops by finding the pixels that are enriched relative to local neighborhoods (pixels to its lower-left, pixels to its left and right, pixels above and below, and pixels within a doughnut-shaped region surrounding the pixel of interest). We obtained the locations of the 3,301 loop anchors for 20 chromosomes from the wild type Hi-C contact maps. Epigenetic landscape is determined using the Broad ChromHMM track for mouse liver [66] (https://github.com/gireeshkbogu/chromatin_states_chromHMM_mm9). Among the 15 chromatin states in the track, we assigned states 1-10 and 15 to be in the active state (A) because they are related to gene transcription. States 11-14 correspond to heterochromatin, and hence are taken to the repressive (B).

For the HCT-116 cell, we used 5Okb-resolution Hi-C map for untreated and *RAD21*-depleted cells obtained from GEO: GSE104334. Chromatin state characterization was performed using ChromHMM [47] considering twelve histone modifications ChIP-seq data (H3K27ac, H3K9ac, H3K27me3, H3K36me3, H3K79me2, H3K9me2, H3K9me3, H4K2Omel, H3K4mel, H3K4me2, H3K4me3, and CTCF) that are available in the ENCODE Project Consortium. Chromatin functional states are annotated based on the study by Moudgil et. al. [67]. For this cell line, we assigned states 1-9 and 15 to be in the the A (active) because they are related to gene transcription. States 10-14 are repressed, and hence may be classified as repressive (B) (Appendix Fig.7). Locations of the chromatin loops for HCT 116 from Hi-C contact maps were determined using HiCCUPS in juicer [55] (https://github.com/ aidenlab/juicer) following the procedure from Rao at al [26]. Using this procedure, we detected 3,624 loop anchors for 23 chromosomes from wild type Hi-C contact maps.

### Calculation of the 3D structures of chromosome from Hi-C contact maps

In order to identify the A/B clusters and calculate the boundary probability, we need the 3D coordinates of the loci. We used the HIPPS (Hi-C-polymer-physics-structures) [68] method, which uses the Hi-C contact map as the input, to generate an ensemble of three-dimensional chromosome structures. In the HIPPS method, the mean distance matrix is generated using polymer physics using a power-law relation between the mean contact probability, between loci *i* and *j* and the average spatial distance, . An ensemble of 3D structures is calculated by using the mean distance matrix as a constraint using the principle of maximum entropy. We generated an ensemble of 10,000 individual 3D structures by applying the HIPPS method to Hi-C contact map for each chromosome for the two cell lines. The 3D structures are used to calculate the boundary positions (see below) to identify potential single-cell domains, which complements the TopDom analysis.

## Appendix B: Effect of *RAD21* removal

Single-cell studies [12, 40, 42] on a *2.5Mb* region (Chr21:34.6-37.1Mb) in HCTΠ6 cell showed that several pronounced TAD structures detected in the WT cells were eliminated if *RAD21* is degraded. This could be predicted by the EMH because the degraded TADs are mostly composed of active (A) loci. Our analysis for the same region also shows flat domain boundary probability (Appendix Fig. 12a). In contrast, Appendix Fig. 12b reveals that preferential TAD boundaries persist, and are retained despite *RAD21* loss, which is associated with epigenetic mismatches. Furthermore, some TADs without epigenetic mismatches are preserved, and could be identified based on the presence of physical boundaries in the 3D structures (green lines in Appendix Fig. 12c).

## Appendix C: Corner dots in P-TADs

Our analyses of the experimental data show that TADs whose borders correspond to both epigenetic mismatch and CTCF loop anchors in the WT cells are preserved after removal of cohesin. However, not all P-TADs have loop anchors at their boundaries (corner dots) in the WT cells (see Appendix Fig. 11). Out of the 280 (396) P-TADs with epigenetic mismatches in mouse liver (HCT-116) chromosomes, 117 (170) did not have corner dots at their boundaries before deleting *Nipbl (RAD2I)*. This suggests that some TAD boundaries are formed in the absence of corner dots. It is the underlying epigenetic landscape that is the predominant factor in the formation of such domain boundaries.

## Appendix D: Loss of cohesin-associated CTCF loops leads to an increase in the degree of compartmentalization

The first experimental finding is that cohesin knockdown results in preservation or even enhancement of the compartment structure[26, 34, 36, 39, 40]. To ensure that CCM reproduces this finding, we first explored how cohesin-associated CTCF loop deletion affects compartmentalization by performing simulations without loop anchors, which is implemented by setting *P _{l} =* 0 (see Methods). We find that the plaid patterns persist after the deletion of the loops (Appendix Fig.4a). Visual comparison between the two CMs in Appendix Fig.4a shows that this is indeed the case. Moreover, the rectangles in Appendix Fig.4a also show that finer features, absent when

*P*is unity, emerge when

_{l}*P*0, which is an indication of increase in the compartment strength. Pearson correlation maps (lower panel in Appendix Fig.4a) illustrate the reduction in the contact profiles between A and B loci (dark-blue color) after loop loss. There is an enhancement in interactions between same type loci (A-A or B-B) (dark-red color). The results in Appendix Fig.4a confirm the experimental finding that disruption of the loops not only creates finer features in the compartment structures but also results in increased number of contacts between loci of the same type, which is simultaneously accompanied by a decrease in the number of A-B interactions.

_{l}=We then investigated if the enhancement of compartments observed in the CMs has a structural basis. An imaging study[69] showed that active loci form larger spatial clusters in cohesin depleted cells. To probe whether this finding is reproduced in our CCM simulations, we first calculated the spatial distance matrix (DM) using 10,000 simulated 3D structures. For individual matrices, with and without cohesin-associated CTCF loops, we identified A (B)-dense regions as A (B) clusters, respectively, using the DBSCAN (Densitybased spatial clustering of applications with noise) method [62]. Appendix Fig.4b shows the number of clusters obtained using DBSCAN. The number of A clusters varied between individual structures, which reflects the heterogeneity in the chromosome organization. On an average, there are about five A clusters for *P _{L}* = 1, which decreases to four with

*P*0, which is one indication of the increase in the compartment strength. We also calculated the size of the clusters, which is the average fraction of loci in each cluster in individual structures (right panel in Appendix Fig.4c). On an average, the size of A clusters is greater after loop loss. This implies that active clusters merge in space to form larger and more connected clusters upon deleting the loops. In contrast, there is no change in the number and size of B clusters (see Appendix Fig.4a). Taken together, our observations for A and B clusters show that the A loci form larger clusters upon loop loss, which leads to stronger segregation between A and B loci after the loops are deleted.

_{L}=In the Appendix Fig.4, we used the DBSCAN method to demonstrate the enhanced compartmentalization upon cohesin deletion. In order to assess if our method produces results that are consistent with other techniques, we also calculated the compartment strength used in previous studies [70, 71]. In this method, chromosomal interaction frequencies are first normalized by the average interaction frequency at a given genomic distance, s. Then the distance corrected interaction frequencies are sorted based on the PC1 value for each locus *i*, which we calculated from the contact map. Finally, the frequencies were aggregated into 50 bins to obtain the saddle plot (Appendix Fig.5) in which the top left corner (B-B) indicates the frequency of interaction between the B compartments, while bottom right (A-A) represents the contact frequency between the A compartments. Top Right (B-A) and Bottom Left (A-B) corners are interaction frequencies between A and B compartments. The strength of the compartment is calculated using, ((A–A) + (B–B)) / ((A–B) + (B–A)). The values used for this ratio were determined by calculating the mean value of 20% bins in each corner of the saddle plot. We used cooltools for saddle plot calculations (https://github.com/open2c/cooltools).

### Relation to experiments regarding the compartmenlization

Besides being consistent with imaging experiments [69], we wondered whether the structural changes inferred from the Hi-C maps from the mouse liver and HCT-116 cell lines liver [26, 34] could be used to demonstrate the strengthening of compartments upon deleting cohesin. We calculated an ensemble of 10,000 3D structures for chromosomes from the two cell lines using the parameter free HIPPS method [45]. Consistent with the simulation results, we also observed variations in the spatial organization of A and B loci in individual structures in WT and cohesin depleted cells. DBSCAN analysis identified a smaller number A/B clusters with larger size in Δ*Nipbl* (Δ*RAD*21) cells compared to WT cells, as shown in Appendix Fig.4d for Chr11 and Chr19 (see Appendix Fig.4e for HCT-116 results).

To provide additional insights into the calculated conformations, we constructed Pearson correlation matrices using contact maps calculated from the 3D structures. Appendix Figs.4f and g show that cohesin loss induces stronger plaid patterns, which is consistent with simulation results (lower panel of Appendix Fig.4a) for Chr13 from GM12878 cell line. Comparison of the 3D structural changes with and without cohesin shows that the micro-phase separation between active and inactive loci is strengthened, with larger A and B physical clusters upon depletion of cohesin, which accords well with Hi-C experiments [69].

# References

- [1]An Overview of Genome Organization and How We Got There: from FISH to Hi-C
*Microbiol. Mol. Biol. Rev***79**:347–372 - [2]Spatial organization of chromatin domains and compartments in single chromosomes
*Science***353**:598–602 - [3]Super-resolution imaging reveals distinct chromatin folding for different epigenetic states
*Nature***529**:418–422 - [4]A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping
*Cell***159**:1665–1680 - [5]Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data
*Nat. Rev. Genet***14**:390–403 - [6]Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome
*Science***326**:289–293 - [7]Topological domains in mammalian genomes identified by analysis of chromatin interactions
*Nature***485**:376–380 - [8]Principles of genome folding into topologically associating domains
*Sei. Adv***5** - [9]Complexity of chromatin folding is captured by the strings and binders switch model
*Proc. Natl. Acad. Sci. U.S.A.***109**:16173–16178 - [10]Transferable model for chromosome architecture
*Proc. Natl. Acad. Sci. U.S.A.***113**:12168–12173 - [11]Interphase human chromosome exhibits out of equilibrium glassy dynamics
*Nat. Commun***9** - [12]Polymer physics indicates chromatin folding variability across single-cells results from state degeneracy in phase separation
*Nat. Commun***11**:3289–3289 - [13]Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains
*Nucleic Acids Res***42**:9553–9561 - [14]Active and poised promoter states drive folding of the extended HoxB locus in mouse embryonic stem cells
*Nat. Struct. Mol. Biol***24**:515–524 - [15]Architectural protein subclasses shape 3D organization of genomes during lineage commitment
*Cell***153**:1281–1295 - [16]Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation
*Mol. Syst. Biol***11** - [17]Mapping nucleosome resolution chromosome folding in yeast by micro-C
*Cell***162**:108–119 - [18]Micro-C XL: assaying chromosome conformation from the nucleosome to the entire genome
*Nat. Methods***13**:1009–1011 - [19]Resolving the 3D landscape of transcription-linked mammalian chromatin folding
*Mol. Cell***78**:539–553 - [20]Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes
*Cell***159**:374–387 - [21]The role of insulation in patterning gene expression
*Genes***10** - [22]Insulated Neighborhoods: Structural and Functional Units of Mammalian Gene Control
*Cell***167**:1188–1200 - [23]Ji, X.; Dadon, D.; Powell, B.; Fan, Z.; Borges-Rivera, D.; Shachar, S.; Weintraub, A.; Hnisz, D.; Pegoraro, G.; Lee, T.; Misteli, T.; Jaenisch, R.; Young, R. Cell Stem Cell 2016, 18, 262–275.
*Cell Stem Cell***18**:262–275 - [24]Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions
*Cell***161**:1012–1025 - [25]Activation of proto-oncogenes by disruption of chromosome neighborhoods
*Science***351**:1454–1458 - [26]Cohesin loss eliminates all loop domains
*Cell***171**:305–320 - [27]A CRISPR Connection between Chromatin Topology and Genetic Disorders
*Cell***161**:955–957 - [28]Polymer physics predicts the effects of structural variants on chromatin architecture
*Nat. Genet***50**:662–667 - [29]Shaping of the 3D genome by the ATPase machine cohesin
*Exp. Mol. Med***52**:1891–1897 - [30]Formation of Chromosomal Domains by Loop Extrusion
*Cell Rep***15**:2038–2049 - [31]CTCF: master weaver of the genome
*Cell***137**:1194–1211 - [32]CRISPR inversion of CTCF sites alters genome topology and enhancer/promoter function
*Cell***162**:900–910 - [33]Comparative Hi-C Reveals that CTCF Underlies Evolution of Chromosomal Domain Architecture
*Cell Rep***10**:1297–1309 - [34]Two independent modes of chromatin organization revealed by cohesin removal
*Nature***551**:51–56 - [35]Chromatin organization by an interplay of loop extrusion and compartmental segregation
*Proc. Natl. Acad. Sci. U.S.A.***115**:E6697–E6706 - [36]Topologically associating domains and chromatin loops depend on cohesin and are regulated by CTCF, WAPL, and PDS5 proteins
*EMBO J.***36**:3573–3599 - [37]The Cohesin Release Factor WAPL Restricts Chromatin Loop Extension”
*Cell***169**:693–707 - [38]Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization
*Cell***169**:930–944 - [39]Cohesin and CTCF differentially affect chromatin architecture and gene expression in human cells
*Proc. Natl. Acad. Sci. U.S.A.***111**:996–1001 - [40]Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells
*Science***362** - [41]New insights into genome folding by loop extrusion from inducible degron technologies
*Nat. Rev. Genet*:1–13 - [42]Characterizing chromatin folding coordinate and landscape with deep learning
*PLoS Comput. Biol***16**:1–19 - [43]CTCF and cohesin regulate chromatin loop stability with distinct dynamics
*ELĳe***6** - [44]TopDom: an efficient and deterministic method for identifying topological domains in genomes
*Nucleic Acids Res***44**:e70–e70 - [45]From Hi-C Contact Map to Three-Dimensional Organization of Interphase Human Chromosomes
*Phys. Rev. X***11** - [46]TAD-like single-cell domain structures exist on both active and inactive X chromosomes and persist under epigenetic perturbations
*Genome Biol***22**:1–26 - [47]ChromHMM: automating chromatin-state discovery and characterization
*Nat. Methods***9**:215–216 - [48]Enhancer-promoter interactions and transcription are largely maintained upon acute loss of CTCF, cohesin, WAPL or YY1
*Nat. Genet***54**:1919–1932 - [49]Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C
*Nature genetics***47**:598–606 - [50]The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements
*Genome Res***25**:582–597 - [51]Resolving the 3D landscape of transcription-linked mammalian chromatin folding
*Mol. Cell***73**:539–553 - [52]Systematic evaluation of chromosome conformation capture assays
*Nat. Methods***18**:1046–1055 - [53]Comparison of computational methods for the identification of topologically associating domains
*Genome Biol***19**:1–18 - [54]Juicebox. js provides a cloud-based visualization system for Hi-C data
*Cell systems***6**:256–258 - [55]Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments
*Cell Syst***3**:95–98 - [56]hnRNPK recruits PCGF3/5-PRC1 to the Xist RNA B-repeat to establish polycomb-mediated chromosomal silencing
*Mol. Cell***68**:955–969 - [57]ENCODE data in the UCSC Genome Browser: year 5 update
*Nucleic Acids Res***41**:D56–D63 - [58]Discovery and characterization of chromatin states for systematic annotation of the human genome
*Nat. Biotechnol***28**:817–825 - [59]Mapping and analysis of chromatin state dynamics in nine human cell types
*Nature***473**:43–49 - [60]Polymer physics
- [61]Heterochromatin drives compartmentalization of inverted and conventional nuclei
*Nature***570**:395–399 - [62]Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications
*Data Min Knowl Discoυ***2**:169–194 - [63]Cohesin-mediated interactions organize chromosomal domain architecture
*EMBO J***32**:3119–3129 - [64]Cohesin-based chromatin interactions enable regulated gene expression within preexisting architectural compartments
*Genome Res***23**:2066–2077 - [65]TAD-like single-cell domain structures exist on both active and inactive X chromosomes and persist under epigenetic perturbations
*Genome Biol***22** - [66]Chromatin and RNA Maps Reveal Regulatory Long Noncoding RNAs in Mouse
*Cell. Mol. Biol***36**:809–819 - [67]Self-Reporting Transposons Enable Simultaneous Readout of Gene Expression and Transcription Factor Binding in Single Cells
*Cell***182**:992–1008 - [68]From Hi-C Contact Map to Three-Dimensional Organization of Interphase Human Chromosomes
*Phys. Rev. X***11** - [69]BRD2 compartmentalizes the accessible genome
*Nat. Genet***54**:481–491 - [70]Liquid chromatin Hi-C characterizes compartment-dependent chromatin interaction dynamics
*Nat. Genet***53** - [71]A chromosome folding intermediate at the condensin-to-cohesin transition during telophase
*Nat. Cell Biol***21**:1393–1402 - [72]A unified framework for inferring the multi-scale organization of chromatin domains from Hi-C
*PLoS Comput. Biol***17**:1–27 - [73]Evolutionarily Conserved Principles Predict 3D Chromatin Organization
*Mol. Cell***67**:837–852 - [74]On the existence and functionality of topologically associating domains
*Nat. Genet***52**:8–16

# Article and author information

## Version history

- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
- Version of Record published:

## Copyright

© 2023, Jeong et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

# Metrics

- views
- 707
- downloads
- 77
- citation
- 1

Views, downloads and citations are aggregated across all versions of this paper published by eLife.