1. Chromosomes and Gene Expression

Exploring chromosomal structural heterogeneity across multiple cell lines

  1. Ryan R Cheng  Is a corresponding author
  2. Vinicius G Contessoto
  3. Erez Lieberman Aiden
  4. Peter G Wolynes
  5. Michele Di Pierro  Is a corresponding author
  6. Jose N Onuchic  Is a corresponding author
  1. Center for Theoretical Biological Physics, Rice University, United States
  2. Brazilian Biorenewables National Laboratory - LNBR, Brazilian Center for Research in Energy and Materials - CNPEM, Brazil
  3. Center for Genome Architecture, Baylor College of Medicine, United States
  4. Department of Chemistry, Rice University, United States
  5. Department of Physics & Astronomy, Rice University, United States
  6. Department of Biosciences, Rice University, United States
  7. Department of Physics, Northeastern University, United States
Research Article
Cite this article as: eLife 2020;9:e60312 doi: 10.7554/eLife.60312

Abstract

Using computer simulations, we generate cell-specific 3D chromosomal structures and compare them to recently published chromatin structures obtained through microscopy. We demonstrate using machine learning and polymer physics simulations that epigenetic information can be used to predict the structural ensembles of multiple human cell lines. Theory predicts that chromosome structures are fluid and can only be described by an ensemble, which is consistent with the observation that chromosomes exhibit no unique fold. Nevertheless, our analysis of both structures from simulation and microscopy reveals that short segments of chromatin make two-state transitions between closed conformations and open dumbbell conformations. Finally, we study the conformational changes associated with the switching of genomic compartments observed in human cell lines. The formation of genomic compartments resembles hydrophobic collapse in protein folding, with the aggregation of denser and predominantly inactive chromatin driving the positioning of active chromatin toward the surface of individual chromosomal territories.

Introduction

The 3D spatial organization of the chromosomes in the nucleus of eukaryotic cells appears to be cell-type-specific (Rao et al., 2014; Dixon et al., 2012; Dixon et al., 2015; Rowley et al., 2017; Dekker and Heard, 2015; Yu and Ren, 2017; Tang et al., 2015). What determines this cell-type-specific organization and how that organization relates to patterns of gene expression remain crucial questions in structural genomics.

DNA–DNA ligation experiments have revealed spatial compartmentalization, generally termed A/B compartmentalization (Lieberman-Aiden et al., 2009), and CTCF-mediated loop domains. It was observed that the A compartment chromatin contains a larger amount of the expressed genes while the B compartment chromatin is less transcriptionally active. Similar A/B compartmentalization has been observed across human cell lines (Rao et al., 2014; Dixon et al., 2012; Dixon et al., 2015) as well as in other species (Dixon et al., 2012; Rowley et al., 2017; Dudchenko and Shamim, 2018; Sexton et al., 2012; Eagen et al., 2015; Zhang et al., 2012), suggesting that compartmentalization is a conserved feature of genome organization across evolution. While single-cell structures can be interrogated using proximity ligation assays (Nagano et al., 2013; Stevens et al., 2017; Tan et al., 2018), high resolution has so far only been achieved through ligation methods when the experiments are performed over a large population of cells, thus averaging over the respective individual 3D structures.

Recent microscopy approaches have begun to reveal the 3D structures of segments of chromatin longer than a megabase at a spatial resolution on the nanometer scale (Bintu et al., 2018; Boettiger et al., 2016; Nir et al., 2018; Beliveau et al., 2015). These approaches not only allow for the quantification of pairwise and higher-order interactions between loci, but also allow some quantification of the structural variability in a population of cells. One consistent observation from the imaging approaches, as well as from single-cell DNA–DNA ligation experiments (Nagano et al., 2013; Stevens et al., 2017; Tan et al., 2018; Finn et al., 2019), has been the high degree of structural variability seen within an apparently homogeneous population of synchronized cells of a single cell type. Despite this variability, well-defined cell-type-specific DNA–DNA ligation maps for the ensemble emerge after population averaging the single-cell results.

The high degree of structural variability observed for chromatin necessitates structural models that go beyond a single energetic basin; without the existence of a native structure, Elastic Network Models (Atilgan et al., 2001) are likely not appropriate. Polymer models (Barbieri et al., 2012; Jost et al., 2014; Gürsoy et al., 2017; Brackley et al., 2016; Tjong et al., 2012; Nuebler et al., 2018; Zhang and Wolynes, 2015; Di Pierro et al., 2016; Wong et al., 2012; MacPherson et al., 2018) that describe the process of chromosome organization have been proposed. In particular, the Minimal Chromatin Model (MiChroM) has been shown to accurately predict the population-averaged DNA–DNA ligation maps (Di Pierro et al., 2016; Di Pierro et al., 2017; Di Pierro et al., 2018; Contessoto et al., 2019). Chromosomes are described as polymers subject to interactions which depend on the chromatin biochemical composition and on the genomic distance separating any two loci (Di Pierro et al., 2016). Genomic distance-dependent interactions recapitulate the effect of motors acting along the DNA polymer and result in lengthwise compaction of chromatin. Interactions depending on chromatin biochemical composition recapitulate transient binding among chromosomal loci and result in the emergence of compartmentalization through a process of phase separation, in which chromatin of the same biochemical type preferentially co-localizes. The propensity toward phase separation for chromosomes of human lymphoblastoid cells can be reliably predicted using epigenetic marking data (Di Pierro et al., 2017), suggesting that the information contained within the 1D epigenetic marking patterns decorating the chromatin polymer is sufficient to predict the ensemble of 3D chromosome structures. 
A neural network called MEGABASE (Di Pierro et al., 2017) was trained to quantify the statistical relationship between the experimental sub-compartment annotations and the histone methylation and acetylation markings tracks, as assayed using chromatin immunoprecipitation data. Once trained, MEGABASE can be used to predict the compartmentalization patterns of a chromosome using a set of epigenetic ChIP-Seq tracks as the sole input. Combining MEGABASE and MiChroM, we are able to simulate the structural dynamics of chromosomes.

We first use the MEGABASE+MiChroM computational pipeline (Di Pierro et al., 2017) to predict the 3D ensemble of chromosomal structures for several well-studied cell types: HMEC, HUVEC, IMR90, K562, HeLa-S3, and H1-hESC. To test these simulated 3D ensembles, we then generate ensemble averaged simulated ligation maps that are compared directly to population-averaged DNA–DNA ligation maps (Rao et al., 2014; Dixon et al., 2012). For the cell lines IMR90 and K562, we also use energy landscape tools to analyze the structures obtained through diffraction-limited microscopy by Bintu et al., 2018 for short ~2 Mb segments of chromatin and compare the experimental structural ensembles directly with the corresponding regions of the simulated chromosome 21 for IMR90 and K562. This comparison shows that not only the population averages but also the structural heterogeneity that is observed in human chromosomes in the interphase are consistent with our energy landscape model. Chromosomes do not adopt a single structure in the interphase, but rather, exhibit a high structural variability characteristic of a phase-separated liquid. We provide a detailed characterization of this structural heterogeneity for the experimentally imaged and simulated segments of chromatin using a collective variable commonly used to quantify structural similarity in protein folding theory. For a gene-rich chromatin segment, we uncover two dominant clusters of structures in both the experimental and simulated structural ensembles: closed structures and open dumbbell-like structures. The transition from a closed structure to an open dumbbell appears to be governed by a two-state process with an apparent free energy cost of about four times the effective information theoretic temperature. For a gene inactive segment, structural analysis reveals highly disordered structures that lack domain boundaries. 
Additionally, we further examine the structural differences between whole chromosomes belonging to different cell types. The simulations show that inactive segments of chromatin move to the interior of the chromosome, while gene active chromatin moves to the chromosome surface. This effect appears to be driven by the favorable effective interactions between loci belonging to the B compartment, which forms a stable interior core; a phenomenon reminiscent of the hydrophobic collapse much studied in protein folding.

Results and discussion

A polymer model of chromatin based on epigenetic features captures chromosome organization across different cell types

We previously developed a computational pipeline that can predict the 3D ensemble of chromosome structures by using chromatin immunoprecipitation tracks for histone modifications as input (Di Pierro et al., 2017). This approach was successfully used to predict the 3D chromosome structures for human lymphoblastoid cells (GM12878) using the experimental ChIP-Seq tracks for 11 histone modifications (Di Pierro et al., 2017), namely H2AFZ, H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me2, H3K9ac, H3K9me3, and H4K20me1. Predicted chromosome structures for human lymphoblastoid cells (GM12878) were found to be consistent with both DNA–DNA ligation and fluorescence in situ hybridization (FISH) experiments (Rao et al., 2014). Here we extend the predictions beyond GM12878 to other well-studied cell lines for which we have found sufficient epigenetic marking data.

Using the MEGABASE neural network, which was previously trained using data from GM12878, together with ChIP-Seq tracks for the same 11 histone modifications sourced from the Encyclopedia of DNA Elements (ENCODE) database, we generated sub-compartment annotations for all the autosomes of cell lines that had never been used in the training phase of the neural network. These sequences of sub-compartment annotations, or chromatin types, then serve as input for molecular dynamics simulations using the Minimal Chromatin Model (MiChroM) (Di Pierro et al., 2016). Using this combined approach, the chromosomal structural ensembles for six additional cell lines were generated: human fetal lung cells (IMR90), human umbilical vein endothelial cells (HUVEC), immortalized myelogenous leukemia cells (K562), human mammary epithelial cells (HMEC), human embryonic stem cells (H1-hESC), and HeLa-S3 cells.

For each cell type, averaging the simulated ensemble generates in silico DNA–DNA ligation maps, which are consistent with those determined experimentally. Figure 1 shows the comparison between simulated and experimental maps for IMR90 (Figure 1A), HUVEC (Figure 1B) and K562 (Figure 1C), demonstrating quantitative agreement. Corresponding comparisons of the compartmentalization patterns are also provided in Figure 1—figure supplements 1–3 for the additional cell types HMEC, H1-hESC, and HeLa-S3, as well as for GM12878 in Di Pierro et al., 2017. In particular, the Pearson’s R between the simulated and experimental maps of matching cell type as a function of genomic distance shows that the long-range patterns of compartmentalization are captured over tens of megabases. To establish a baseline for comparison, we calculated the Pearson’s R between the experimental DNA–DNA ligation maps of mismatching cell types. While the experimental observations on different cell lines do correlate with each other, the computational model distinguishes between cell types and matches the experimental map best when the cell types of simulation and experiment are the same. This last result demonstrates that the present theoretical model discriminates well between different cell lines. Further, the Pearson’s R as a function of genomic distance demonstrates high quantitative agreement for matching cell types, comparable to the agreement between two biological replicates for the GM12878 ligation maps (Rao et al., 2014).
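The distance-resolved comparison described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes each contact map is a square list of lists at a common resolution, and the function names `pearson_r` and `pearson_by_distance` as well as the `min_points` cutoff are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant diagonal
    return cov / math.sqrt(var_x * var_y)

def pearson_by_distance(map_a, map_b, min_points=3):
    """Correlate two square contact maps one off-diagonal at a time.

    Off-diagonal d holds all pairs of loci separated by d bins, so the
    result maps a genomic distance (in bins) to a Pearson's R.
    min_points is an assumed cutoff that skips very short diagonals.
    """
    n = len(map_a)
    result = {}
    for d in range(1, n):
        diag_a = [map_a[i][i + d] for i in range(n - d)]
        diag_b = [map_b[i][i + d] for i in range(n - d)]
        if len(diag_a) >= min_points:
            result[d] = pearson_r(diag_a, diag_b)
    return result
```

Correlating one off-diagonal at a time removes the dominant decay of contact probability with genomic distance, so the remaining signal reflects the compartmentalization pattern at that distance.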

Figure 1 (with 5 supplements)
Prediction of chromosome structures for differentiated cell lines and for immortalized leukemia cells.

The 3D ensemble of chromosome structures was predicted for the cell types (A) IMR90, (B) HUVEC, and (C) K562 using the ChIP-Seq histone modification tracks for the respective cell lines found on ENCODE; shown are the structural predictions for chromosome 2. As validation, the chromosome structures were compared with the DNA–DNA ligation experiments of Rao et al., 2014, where the simulated map is shown on the bottom left triangle and the experimental map is shown on the top right triangle. The datasets are visualized using Juicebox (Durand et al., 2016). The MEGABASE chromatin type annotation is shown as a color vector under the contact probability map, followed by the A/B compartment annotation (Rao et al., 2014) for the simulated map (red) and the experimental map (black), respectively. The Pearson’s R between the simulated and experimental contact maps for fixed genomic distances is plotted for the cell types IMR90, HUVEC, and K562, respectively, in thick lines. The Pearson’s R between the experimental maps of mismatching cell types is also shown with thin lines (see legend). The shaded region highlights that at relatively short genomic distances (<10 Mb), excluding CTCF-mediated loops from the simulation results in disagreement between the simulated and experimental maps. When loops are included in the simulations, the agreement between the simulation and experiment is recovered at the short genomic distances. (D) Pearson’s R as a function of genomic distance is plotted between the experimental map for chromosome 21 (IMR90) and MiChroM simulation with loops (thick red line) and without loops (thin red line). (E) A matrix of Pearson’s R between the AB annotation of the experimental ligation map and the simulated contact maps for different cell types, respectively. The high Pearson’s R signifies the consistency between the simulated maps and the experimental DNA–DNA ligation maps.
Additional comparisons between simulated and experimental DNA–DNA ligation maps are shown for cell lines HMEC, H1-hESC, and HeLa-S3 in Figure 1—figure supplements 1–3, respectively. A matrix of Pearson’s R between the AB annotation of the experimental ligation maps for different cell types is shown in Figure 1—figure supplement 4.

Additional comparisons between the experimental and simulated maps are shown in Figure 1—figure supplement 5. In particular, the scaling of the contact probability with genomic distance (Figure 1—figure supplement 5) appears to suggest that the chromosomes are denser in experiment than in the simulations for the cell lines HUVEC and H1-hESC. It is important to note that the simulations are not re-trained for the different cell line predictions; rather, all of the simulations are performed with a chromatin volume fraction of 0.1 (see Materials and methods for more details).

While we have focused so far on the spatial organization of entire chromosomes on the micrometer length scale, for a better comparison with the structures of chromosome 21 of IMR90 and K562 obtained from microscopy (Bintu et al., 2018), we have also incorporated in the polymer physics simulation the loops mediated by the activity of the protein CTCF.

Figure 1D shows that the inclusion of CTCF-mediated loops, which are easily incorporated into the model, improves the quality of the results for the short-range features of chromosome organization within 10 Mb in genomic distance; at larger length scales the model appears to be completely insensitive to CTCF-mediated loops. To date, we have only examined the effect of the absence or presence of loops on the chromosome structural ensemble. A more detailed treatment of the short-range chromatin contacts, particularly of many-body chromatin interactions (Perez-Rathke et al., 2020), has been shown to be important in recapitulating the organization of short segments of chromatin between 500 kb and 1.9 Mb in length.

Figure 1E shows the Pearson’s R between the AB annotation vectors derived directly from the DNA–DNA ligation maps and those obtained from MiChroM simulations for different cell types. The diagonal of Figure 1E corresponds to the Pearson’s R between AB annotations derived from experiment and simulation of matching cell types. The simulated and experimental annotations for the same cell types agree well with each other. Figure 1—figure supplement 4 shows the Pearson’s R between AB annotations derived from experiment alone for the different cell types. Notably, the high degree of correlation between the myelogenous leukemia cell line K562 and human lymphoblastoid (GM12878) maps observed in Figure 1E is apparent from DNA–DNA ligation maps alone (Figure 1—figure supplement 4). The agreement between the simulated and experimental A/B annotations is highest (Pearson’s R ~ 0.9) for the DNA–DNA ligation maps of GM12878, which is not surprising since the GM12878 map has an order of magnitude more reads than any other map and consequently has the highest resolution.

Taken together, these results demonstrate that long-range compartmentalization observed in the DNA–DNA ligation maps is well captured by the simulated structural ensembles for these well-studied cell lines using only information about the epigenetic marking patterns as input.

Chromatin structural ensembles from DNA-tracing reveal coexistence of open and closed structures

Recent developments in DNA-tracing have allowed the direct experimental determination of three-dimensional structures using diffraction-limited and super-resolution microscopy (Bintu et al., 2018; Boettiger et al., 2016; Nir et al., 2018; Beliveau et al., 2015). DNA-tracing is a technique that labels consecutive stretches of DNA with optical probes, which can be used to spatially resolve the positions of those probes using microscopy. It has become increasingly clear that unlike the situation for folded globular proteins, which typically can be reasonably well approximated for many purposes by a single native structure corresponding to the average conformation, chromatin appears to be highly dynamical and cannot be characterized by any single conformation. The heterogeneity of the chromosomal structural ensembles was first suggested by the analysis of the free energy landscape of chromosomes (Zhang and Wolynes, 2015; Di Pierro et al., 2016) and has been indirectly observed through single-cell DNA–DNA ligation experiments (Nagano et al., 2013; Stevens et al., 2017; Tan et al., 2018; Finn et al., 2019). The heterogeneity has now been confirmed by direct imaging of individual chromosomal structures (Bintu et al., 2018; Boettiger et al., 2016; Nir et al., 2018). As a consequence of this conformational plasticity, statistical ensembles (Zhang and Wolynes, 2015; Di Pierro et al., 2016; Di Pierro et al., 2017; Di Pierro et al., 2018; Zhang and Wolynes, 2016; Di Pierro, 2019; Goundaroulis et al., 2020; Bascom et al., 2019; Dekker et al., 2013; Kalhor et al., 2012) must be used in order to describe chromosomal structures in vivo.

In order to improve our understanding of the genomic structural ensembles, we characterize the structural heterogeneity of chromatin that was imaged using microscopy. We focus on the traced structures of Bintu et al., 2018, who obtained hundreds of imaged structures for short ~2 Mb segments of chromatin belonging to chromosome 21. These regions are 29.37–31.32 Mb (referred to here as Segment 1) of the IMR90 and K562 cell types and 20.0–21.9 Mb (referred to as Segment 2) of IMR90. Only structures where the positions of over 90% of the loci were resolved are used in the present analysis. This leaves 692 usable structures for IMR90 Segment 1, 244 usable structures for K562 Segment 1, and 752 usable structures for IMR90 Segment 2.
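The 90% completeness filter can be written in a few lines. The sketch below assumes a hypothetical in-memory representation in which each traced structure is a list of loci and an unresolved locus is stored as `None`; the actual file format of the tracing dataset is not described here, and the function names are illustrative.

```python
def fraction_resolved(structure):
    """Fraction of loci in a traced structure that have a resolved position.

    Assumes an unresolved locus is represented by None (an assumption
    made for this sketch, not the format of the published dataset).
    """
    return sum(p is not None for p in structure) / len(structure)

def usable_structures(structures, threshold=0.9):
    """Keep structures in which more than `threshold` of the loci are resolved."""
    return [s for s in structures if fraction_resolved(s) > threshold]
```

The comparison is strict (`>`), following the wording "over 90%" in the text.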

As previously reported (Bintu et al., 2018; Boettiger et al., 2016; Nir et al., 2018), the traced structures can be used to generate population-averaged contact maps, which turn out to be consistent with DNA–DNA ligation maps. Shown in Figure 2—figure supplement 1 are the averaged contact maps for the chromatin Segment 1 (IMR90 and K562) and Segment 2 (IMR90), respectively. Nevertheless, information is lost when converting from a 3D structural ensemble to a 2D contact map.
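As an illustration of how a set of traced structures reduces to a population-averaged contact map, here is a minimal sketch. It assumes coordinates in micrometers, unresolved loci stored as `None`, and a hard distance cutoff; the default of 0.33 μm mirrors the 330 nm contact definition quoted in the figure legends, while the function name and data layout are hypothetical.

```python
import math

def contact_map(structures, cutoff=0.33):
    """Population-averaged contact frequency for every pair of loci.

    structures: list of structures, each a list of (x, y, z) tuples in
    micrometers, with None marking an unresolved locus (assumed layout).
    A pair counts as a contact when its distance is below `cutoff`.
    Pairs never resolved together in any structure are reported as NaN.
    """
    n = len(structures[0])
    counts = [[0] * n for _ in range(n)]
    totals = [[0] * n for _ in range(n)]
    for s in structures:
        for i in range(n):
            if s[i] is None:
                continue
            for j in range(i + 1, n):
                if s[j] is None:
                    continue
                totals[i][j] += 1
                if math.dist(s[i], s[j]) < cutoff:
                    counts[i][j] += 1
    freq = [[1.0 if i == j else float("nan") for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if totals[i][j] > 0:
                freq[i][j] = freq[j][i] = counts[i][j] / totals[i][j]
    return freq
```

Each entry records only how often a pair falls below the cutoff, which is why two ensembles with different 3D structures can produce the same map.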

Focusing on the structural details that cannot be found in a contact map, we make a close examination of the types of structures observed in the tracing dataset using a collective variable commonly used in studying protein folding landscapes, Q, which quantifies the structural similarity between two structures α and β (Eastwood and Wolynes, 2001):

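$$Q_{\alpha\beta} = \frac{1}{N}\sum_{i<j}\exp\left(-\frac{\left(r_{ij}^{(\alpha)} - r_{ij}^{(\beta)}\right)^{2}}{2\delta^{2}}\right) \qquad (1)$$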

where $r_{ij}^{(\alpha)}$ and $r_{ij}^{(\beta)}$ are the distances between chromatin loci i and j in structures α and β, respectively, N is the number of pairs of loci included in the summation, and δ = 0.165 μm is the resolution length scale within which deviations in the distances between structures α and β are treated as being similar. The Q between any two structures ranges from 0 (dissimilar) to 1 (identical) over the entire set of pairwise distances between loci. The parameter Q is not solely based on contacts; a pair of chromatin loci can contribute to Q even if they are not spatially proximate, provided they are separated in both structures by a similar distance as set by δ. In this way, Q measures structure more stringently than a simple contact map does.
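The similarity measure Q translates directly into code. The following is a minimal sketch, assuming both structures list the same loci in the same order, with every position resolved and given in micrometers; the function names are illustrative.

```python
import math

def pair_distances(structure):
    """Distances r_ij for all pairs i < j of a structure given as (x, y, z) tuples."""
    n = len(structure)
    return [math.dist(structure[i], structure[j])
            for i in range(n) for j in range(i + 1, n)]

def q_similarity(struct_a, struct_b, delta=0.165):
    """Structural similarity Q between two structures (1 means identical distances).

    delta is the resolution length scale in micrometers. Both structures
    are assumed to contain the same loci in the same order, all resolved.
    """
    dist_a = pair_distances(struct_a)
    dist_b = pair_distances(struct_b)
    total = sum(math.exp(-((ra - rb) ** 2) / (2 * delta ** 2))
                for ra, rb in zip(dist_a, dist_b))
    return total / len(dist_a)
```

Because Q depends only on internal distances, it is unchanged by translating or rotating either structure, so no structural alignment is needed before comparing two traces.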

Using 1 − Q to define the distance between any two structures, hierarchical clustering of the traced structures for Segment 1 was applied to identify clusters having distinct structural features. These cluster sub-ensembles can be considered distinct conformational states. To see whether the Segment 1 structures for IMR90 and K562 exhibit a high degree of structural similarity, we combined their datasets before clustering.
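To make the clustering step concrete, here is a small agglomerative clustering sketch that operates on a precomputed matrix of 1 − Q distances. The linkage criterion is not stated in this section, so average linkage is an assumption made purely for illustration, and the function name is hypothetical. The naive implementation below is only suitable for small examples; a library routine would be preferable for hundreds of structures.

```python
def average_linkage_clusters(dist, n_clusters):
    """Agglomerative clustering on a precomputed distance matrix.

    dist: symmetric matrix with dist[a][b] = 1 - Q between structures a and b.
    Repeatedly merges the two clusters with the smallest average
    inter-cluster distance (average linkage, an assumed choice) until
    n_clusters remain. Returns clusters as sorted lists of indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs of structures.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)
```

Stopping at a fixed number of clusters corresponds to cutting the dendrogram at one height; the sub-clusters discussed in the text would come from applying the same procedure within a single cluster.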

When applied to the 936 combined experimental structures for Segment 1, the clustering algorithm yields three distinct clusters. These correspond to a closed dumbbell (Cluster 1), an open dumbbell (Cluster 2), and a highly dense chromatin state (Cluster 3) shown in Figure 2. The closed dumbbell, where the head and tail globular domains are in contact with one another, is the dominant state observed for Segment 1 in both IMR90 and K562, accounting for 97.4% of the imaged structures ($N_{\mathrm{closed}} = 912$). Cluster 1 can further be sub-divided into subgroups 1a, 1b, 1c, and 1d (Figure 2), which account for 75.5% of the structures in Cluster 1. The subgroups appear to capture various stages of the process of opening. The structures in subgroup 1b are fully collapsed, while structures in 1a, 1c, and 1d capture the progressive opening of the closed dumbbell. The distribution of the radius of gyration for structures belonging to sub-clusters 1a–1d is shown in Figure 2—figure supplement 2. The open dumbbell structures, where the head and tail domains have dissociated from one another, account for approximately 1.8% of the imaged data ($N_{\mathrm{open}} = 17$). Additionally, seven dense, highly compact structures were identified from clustering. Representative structures from the three clustered structural groups are shown in Figure 2 and the corresponding population-averaged contact maps are shown in Figure 2B and C for the closed and open structures, respectively.

Figure 2 (with 4 supplements)
Hierarchical clustering and the detailed structural analysis of traced Segment 1.

(A) The dendrogram representation of the hierarchical clustering of Segment 1 (chr21 29.37–31.32 Mb for IMR90 and K562 of Bintu et al., 2018), where 1 − Q is used as the distance between two structures. The clustering reveals three main clusters: closed dumbbell, open dumbbell, and highly dense structures. Further analysis of Cluster 1 reveals the presence of sub-clusters labeled 1a–1d that represent the gradual opening of the closed dumbbell. Representative traced structures are shown for each of the clusters and sub-clusters. The population-averaged contact maps for the closed and open structure clusters are shown respectively in (B) and (C), where 330 nm is used to define a contact between two 30 kb loci. (D) The distribution of the radius of gyration (top), the corresponding potential of mean force (center), and the distributions of radius of gyration for Cluster 1 and Cluster 2 (bottom) are shown for the traced structures of Segment 1 of IMR90 and K562. The distribution exhibits a heavy tail to the right of the average value, indicating the existence of open, elongated structures. (E) The UCSC Genes track is plotted along the genomic positions of Segment 1 using the Genome Browser (Kent et al., 2002). Figure 2—figure supplement 1 shows the contact maps for the experimentally traced segments of chromatin. Figure 2—figure supplement 2 shows the distributions of the radius of gyration for the sub-clusters of closed dumbbell structures obtained experimentally using tracing. Figure 2—figure supplement 3 shows the hierarchical clustering and detailed structural analysis of the experimentally traced Segment 2.

The high-density chromatin, Cluster 3, which was found when imaging both Segment 1 and Segment 2 (Bintu et al., 2018), is characterized by an extraordinarily high density of DNA, $\sim 2 \times 10^{3}$ mg/ml, as estimated for naked dsDNA. For comparison, the density of heterochromatin that is estimated using microscopy data is ~200 mg/ml (Imai et al., 2017); for this reason, we believe that these chromatin conformations are likely artifacts of the experimental protocol. We have therefore excluded Cluster 3 from further analysis.

Assuming that the opening of the chromatin Segment 1 is in an effective thermodynamic equilibrium implies a relative stability of $\log(N_{\mathrm{closed}}/N_{\mathrm{open}}) = (E_{\mathrm{open}} - E_{\mathrm{closed}})/k_BT \approx 4$, where $E_{\mathrm{open}} - E_{\mathrm{closed}} \sim 4k_BT$ is an apparent free energy difference between the closed and open states and T is an information theoretic temperature characterizing the ensemble (Zhang and Wolynes, 2017). Interestingly, the relative number of open and closed structures found in the simulations (discussed in the next section) is in remarkable agreement with this experimental finding.
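The arithmetic behind the estimate of about $4k_BT$ can be checked with a few lines of Python. The cluster counts are the ones reported in the text (912 closed and 17 open traced structures; 6275 closed and 125 open simulated structures); the function name is illustrative, and the natural logarithm is assumed, as is standard for a Boltzmann population ratio.

```python
import math

def relative_stability(n_closed, n_open):
    """Apparent free energy difference E_open - E_closed in units of k_B T.

    Uses the two-state Boltzmann relation
    N_closed / N_open = exp((E_open - E_closed) / k_B T).
    """
    return math.log(n_closed / n_open)

print(round(relative_stability(912, 17), 2))    # traced structures: 3.98
print(round(relative_stability(6275, 125), 2))  # simulated structures: 3.92
```

Both ensembles give a value close to 4, which is the sense in which the experimental and simulated populations agree.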

We then used the radius of gyration, $R_g$, as an additional order parameter for the structural ensembles of Segment 1 belonging to IMR90 and K562 (Figure 2D). A corresponding potential of mean force can be extracted from the distribution of $R_g$ as $\mathrm{PMF} = -k_BT\log P(R_g)$, which also shows the free energy difference of $\sim 4k_BT$ between the closed (Cluster 1) and open (Cluster 2) structural sub-ensembles. The distributions of $R_g$ are also shown for Clusters 1 and 2 in Figure 2D. The open conformations (Cluster 2) possibly belong to a free energy minimum in the PMF located at $R_g \approx 0.6$–$0.8$ μm, although additional statistics would be necessary to firmly establish the presence of this additional conformational mode. Interestingly, the vast majority of genes appear to be positioned along the linker region connecting the two globular domains (Figure 2E).
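A minimal sketch of how a potential of mean force can be estimated from sampled radii of gyration is given below. It assumes equally weighted loci, coordinates in micrometers, and a simple fixed-width histogram; the bin width of 0.05 μm and the function names are assumptions of this sketch, not values taken from the analysis.

```python
import math
from collections import Counter

def radius_of_gyration(structure):
    """Root-mean-square distance of the loci from their centroid."""
    n = len(structure)
    center = [sum(p[k] for p in structure) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in structure) / n)

def pmf_from_samples(rg_values, bin_width=0.05):
    """Histogram estimate of PMF = -log P(R_g), in units of k_B T.

    Returns {bin_index: pmf}, where bin_index b covers the interval
    [b * bin_width, (b + 1) * bin_width), shifted so the most populated
    bin sits at zero. Empty bins are omitted because log(0) is undefined.
    """
    bins = Counter(math.floor(rg / bin_width) for rg in rg_values)
    total = len(rg_values)
    pmf = {b: -math.log(c / total) for b, c in bins.items()}
    lowest = min(pmf.values())
    return {b: v - lowest for b, v in sorted(pmf.items())}
```

With this convention, a bin that is populated 50 times less often than the most populated bin lies at log(50), roughly 3.9, in units of $k_BT$.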

Unlike Segment 1, Segment 2 of IMR90 completely lacks loop domains and, consequently, the averaged contact maps for Segment 2 exhibit no additional features beyond the decay in contact probability as a function of genomic distance (Figure 2—figure supplement 1). Structural analysis reveals that, without the presence of loop domains, Segment 2 is highly disordered; while clustering reveals open and closed structures, the lack of loop domains and domain boundaries results in the loss of dumbbell-like structures (Figure 2—figure supplement 3). It should be noted that unlike Segment 1, Segment 2 has an absence of genes (Figure 2—figure supplement 4).

The chromosomal structures obtained from physical modeling are consistent with those observed with microscopy

We compare the chromosome structures sampled in the simulations to the diffraction-limited microscopy structures of Bintu et al., 2018, finding that the conformational states observed using microscopy are also found in the simulated structural ensemble without any calibration or fine tuning of parameters. While MEGABASE+MiChroM provides us with structures of entire chromosomes, we focus specifically on the same ~2 Mb chromatin segment within chromosome 21 for our direct comparison.

It is important to note that the simulated model, and the structural variability that it captures, was derived from the energy landscape learned from population-averaged DNA–DNA ligation data using the principle of maximum entropy (Di Pierro et al., 2016). MiChroM has been shown to be consistent with experimental ligation maps (Figure 1 and Di Pierro et al., 2016; Di Pierro et al., 2017; Contessoto et al., 2019), as well as the distribution of distances between Fluorescence in situ hybridization (FISH) probes (Di Pierro et al., 2017) and several observations regarding chromatin dynamics (Di Pierro et al., 2018).

Using 1 − Q as the distance between all simulated structures for Segment 1, we performed hierarchical clustering of the simulated structures. The dendrogram of this clustering is shown in Figure 3A, which uncovers two main clusters in the structural ensemble: a closed dumbbell (Cluster 1) and an open dumbbell (Cluster 2). The closed and open structures are consistent with those observed in the Bintu et al., 2018 datasets. The representative structures of the closed and open conformations are shown in Figure 3, alongside the averaged contact maps for each of the clusters (Figure 3B–C), which are consistent with those determined experimentally (shown in Figure 2B–C; Figure 3—figure supplement 3). The simulated Cluster 1 can again be sub-divided into subgroups; 1α, 1β, 1γ, and 1δ represent the four most populated subgroups (Figure 3), which comprise 66% of the simulated structures of Cluster 1. The subgroups appear to capture various stages of the process of opening. The structures in subgroup 1α are fully collapsed, while structures in 1β, 1γ, and 1δ capture the progressive opening of the closed dumbbell. The distributions of the radius of gyration of sub-clusters 1α–1δ are shown in Figure 3—figure supplement 1.

Figure 3 (with 3 supplements)
Hierarchical clustering and the detailed structural analysis of simulated chromatin segment.

(A) The dendrogram representation of the hierarchical clustering of simulated Segment 1 (chr21 29.37–31.32 Mb for IMR90 and K562) where 1 − Q (Equation 1) is used as the distance between two structures. The clustering reveals two main clusters: closed dumbbell (6275 out of 6400 structures) and open dumbbell (125 out of 6400 structures). The closed dumbbell can be subdivided into sub-clusters labeled 1α–1δ that represent the opening transition of the closed dumbbell. Representative structures are shown for each of the clusters and sub-clusters. The population-averaged contact maps for the clusters are shown respectively in (B) and (C), where 330 nm is used to define a contact between two 50 kb loci of the MiChroM model. The distribution of the radius of gyration is shown for the simulated structures of Segment 1 of IMR90 (D) and K562 (E) in comparison with the experimentally traced structures. (F) Distribution of the radius of gyration and the corresponding potential of mean force is shown for both experiment and simulation for all of the structures of Segment 1. Figure 3—figure supplement 1 shows the distributions of radius of gyration for the sub-clusters of closed dumbbell structures obtained from simulation. Figure 3—figure supplement 2 shows how minor deviations in the unit of length estimate can account for the differences in the experimental and simulated distributions of radius of gyration.

No highly dense structures exist in the simulations. Such structures would collapse the entire chromatin segment to the volume of a single monomer, an occurrence that is prohibited by the energy function used to model the system. This is in harmony with our view that the Cluster 3 structures seen in the experiments are artifacts of some sort.

For Segment 1, we performed our analysis on a set of 6400 structures, a representative subset of the simulated trajectories obtained by taking every 125,000th structure from the simulations. Both closed ($N_{\text{closed}} = 6275$) and open ($N_{\text{open}} = 125$) structures were identified by the clustering algorithm. Since MiChroM assumes an effective equilibrium thermodynamics representation of chromosome structures and dynamics, we can quickly calculate the relative stability between closed and open structures in the simulated ensemble as $k_BT \ln(N_{\text{closed}}/N_{\text{open}}) = E_{\text{open}} - E_{\text{closed}} \approx 4\,k_BT$, where $E_{\text{open}} - E_{\text{closed}}$ is the effective free energy difference between the closed and open states. This free energy difference is remarkably consistent with the value estimated using only the experimentally traced structures in the preceding section.
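The population-based estimate above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the published pipeline: the function name `free_energy_gap_kT` is illustrative, and the frame count assumes that "every 125,000th structure" counts integration steps across the 40 replicas of 20 × 10^6 steps described in the Methods.

```python
import math

def free_energy_gap_kT(n_closed, n_open):
    """Return (E_open - E_closed) in units of k_B*T from two-state populations.

    Assumes an effective equilibrium two-state picture, so that the
    population ratio is a Boltzmann factor.
    """
    if n_closed <= 0 or n_open <= 0:
        raise ValueError("both populations must be positive")
    return math.log(n_closed / n_open)

# Assumed bookkeeping: 40 replicas x 20e6 steps, one frame kept every 125,000 steps.
n_frames = 40 * 20_000_000 // 125_000

# Cluster populations reported for simulated Segment 1.
gap = free_energy_gap_kT(6275, 125)
print(f"{n_frames} frames, gap = {gap:.2f} kT")
```

With the reported populations the gap evaluates to just under 4 in units of k_BT, in line with the rounded value quoted in the text.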

Finally, we calculated the distribution of the radius of gyration, Rg, for the experimental traced structures of Bintu et al., 2018 and for the simulated MiChroM structures for Segment 1 belonging to IMR90 and K562 (shown in Figure 3D and Figure 3E, respectively). Using a length scale of 0.165 μm, calibrated previously (Di Pierro et al., 2017) from a single FISH experiment, yields excellent quantitative agreement between the experimentally observed structures and those predicted de novo from simulation. It is particularly remarkable that any discrepancies between the experimental and simulated datasets can in fact be captured within 5% error of our original length estimate (Figure 3—figure supplement 2). Similarly, Figure 3F shows the direct comparison between the distributions of Rg for Segment 1 as well as the corresponding potentials of mean force. We see then that MiChroM appears to reproduce the apparent free energy difference between open and closed structures found using the experimentally traced structures.
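For reference, the radius of gyration of a traced or simulated structure is the root-mean-square distance of its loci from their centroid. The sketch below assumes equal weights for all loci and interprets 0.165 μm as the size of one reduced MiChroM length unit; both function names are illustrative and are not taken from the authors' code.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of equally weighted loci from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one locus is required")
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    mean_sq = sum(
        sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)

def to_micrometers(length_reduced, unit_um=0.165):
    """Convert a length in reduced simulation units to micrometers.

    unit_um is the assumed size of one reduced length unit.
    """
    return length_reduced * unit_um

# Toy structure: two loci placed symmetrically about the origin.
rg = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
rg_um = to_micrometers(rg)
```

Rescaling `unit_um` by a few percent shifts every simulated Rg value by the same factor, which is the kind of adjustment examined in Figure 3—figure supplement 2.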

Comparative analysis of the chromosomal structural ensembles of different cell lines: connecting the epigenetic markings of loci with their radial positioning within territories

The frequency of chromatin type annotations predicted by MEGABASE over different cell types is shown in Figure 4A as a stacked bar chart that represents the distribution of chromatin type annotations predicted for each locus of chromosome 2 over all of the cell types. It is evident that certain loci have similar epigenetic marking patterns in all the cell types that we examined, either by being generally transcriptionally active loci, thus likely belonging to the A compartment, or by being transcriptionally inactive B compartment loci. On the other hand, several segments of chromatin switch compartments between different cell types.

Conservation of compartmentalization across cell types and the radial dependence of marked chromatin.

(A) A stacked bar chart is used to represent the distribution of chromatin type annotations predicted by MEGABASE as a function of the genomic position along chromosome 2 (hg19). The colors correspond to the chromatin types given in the Figure Legend. For a given genomic position, the relative height of a particular color indicates the fraction of that particular chromatin type predicted at that locus. (B) The MEGABASE prediction of the chromatin type is shown for the chromatin segment 39.9–40.6 Mb of chromosome 2 for HMEC, HUVEC, and IMR90. A black arrow in (A) highlights the location of this segment. (C) The chromatin segment 39.9–40.6 Mb of chromosome 2 is shown in a representative structure for each of the cell types, where the color of the segment denotes its MEGABASE annotation. For HMEC and IMR90, the segment of chromatin tends toward the chromosome surface, whereas the segment tends toward the interior for HUVEC. (D) The radial density as a function of the normalized radial distance is plotted for A compartment loci, B compartment loci, and all loci for simulations of chromosomes for the HMEC cell type. (E) The probability density functions of the radial distance are shown for the center of mass of the segment 39.9–40.6 Mb of chromosome 2 for HMEC, HUVEC, and IMR90, respectively. (F) A stacked bar chart is used to represent the distribution of chromatin type annotations predicted by MEGABASE as a function of the genomic position along chromosome 21 (hg19). The arrows indicate the locations of the traced segments of Bintu et al., 2018: Segment 1 (29.37–31.32 Mb) and Segment 2 (20.0–21.9 Mb). (G) The MEGABASE annotations of the traced chromatin segments are given for IMR90 and K562. (H) The distribution of radial distances of the center of mass of each traced segment is shown.

Using the structural ensemble from the simulations based on the predicted compartments, we then quantified the conformational differences between different cell types. On the chromosomal scale, structural differences emerge primarily from changes in the phase separation of epigenetically marked segments of chromatin. An example is illustrated in Figure 4B, which focuses on the region 39.9–40.6 Mb of chromosome 2 for HMEC, HUVEC, and IMR90. The MEGABASE classification (Figure 4B) identifies the segment in HMEC and IMR90 as belonging to the A compartment, whereas the segment for HUVEC should belong to the B compartment. Representative 3D structures for this segment for each of the respective cell types are shown in Figure 4C.

A plot of the radial density of A compartment loci and B compartment loci is shown in Figure 4D. These radial densities are consistent with previously reported simulations (Di Pierro et al., 2016). Taking a look at the radial distance of the center of mass of the segment of chromatin in each of the cell types, one finds that the A compartment loci tend to localize toward the surface of the chromosome, while the B compartment loci of the HUVEC cell type tend to localize in the interior (Figure 4E). A similar positioning of transcriptionally active chromatin toward the periphery of chromosomal territories was also observed by Nagano et al., 2013 in mouse cells using Hi-C experiments.
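The radial distance of a segment's center of mass can be computed directly from bead coordinates. The sketch below is illustrative only: it assumes one bead per 50 kb with bead 0 starting at genomic position 0, equal bead weights, and a half-open bead range; the helper names are hypothetical. It returns an unnormalized distance, since the normalization used for the figure is not specified here.

```python
import math

BEAD_SIZE_BP = 50_000  # one MiChroM bead spans 50 kb

def bead_index(position_bp):
    """Index of the bead containing a genomic position given in base pairs."""
    return position_bp // BEAD_SIZE_BP

def center_of_mass(coords):
    """Centroid of equally weighted beads."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def segment_radial_distance(coords, start_bp, end_bp):
    """Distance between a segment's centroid and the whole-chromosome centroid."""
    segment = coords[bead_index(start_bp):bead_index(end_bp)]
    if not segment:
        raise ValueError("segment contains no beads")
    return math.dist(center_of_mass(segment), center_of_mass(coords))

# Toy chromosome of four beads; the "segment" is the first 100 kb (beads 0 and 1).
toy = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
radial = segment_radial_distance(toy, 0, 100_000)
```

Under these assumptions the chromosome 2 segment 39.9–40.6 Mb maps to beads 798 through 811. Collecting the returned distance over all frames of an ensemble would give a distribution of the kind plotted in Figure 4E.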

We additionally use simulations to predict and examine the spatial positioning of the segments of chromatin examined by Bintu et al., 2018 in the context of the entire chromosome 21. The experimental traced structures could not be used to ascertain the spatial positioning of those chromatin segments within chromosome 21 since only short segments were imaged rather than the entire chromosome. Figure 4F shows a stacked bar chart that represents the distribution of chromatin types predicted by MEGABASE for each genomic position of chromosome 21. Figure 4G shows the MEGABASE predictions for the traced segments, showing that IMR90 Segment 1 (29.37–31.32 Mb) is composed of A-type chromatin while IMR90 Segment 2 (20.0–21.9 Mb) is primarily composed of B compartment chromatin types. K562 Segment 1 (29.37–31.32 Mb) appears to contain both A and B chromatin types. Figure 4H shows the radial distance distribution of the center of mass of these segments of chromatin, showing that IMR90 Segment 2 tends to be in the interior, IMR90 Segment 1 tends to lie near the chromosome surface, and K562 Segment 1 occupies an intermediate region.

The finding that there exists a radial ordering associated with the spatial compartmentalization is consistent with the fact that according to the MiChroM potential (Di Pierro et al., 2016), contact interactions between B loci exhibit the most favorable energetic stabilization of all chromatin interactions. On the other hand, A to B and A to A-type interactions are comparable in strength to each other, but are significantly weaker than the B to B interaction (see Supplementary file 1). In other words, according to the MiChroM energetic parameters (which were originally learned from Hi-C maps), B loci drive the phase separation of the chromosomes. Much like in a hydrophobic-polar model of protein folding, the B compartment loci form the stable core of the simulated chromosome, while A compartment loci, whose interactions with either A or B loci are weaker, tend toward the surface to minimize the free energy of the molecular assembly. Our theoretical model thus corroborates recent experiments (Falk et al., 2019; Strom et al., 2017) that suggest heterochromatin phase separation is a major driving force behind genome organization, further highlighting the important role of phase separation in biological organization (Di Pierro et al., 2016; Hnisz et al., 2017; Brangwynne et al., 2015; Banani et al., 2016; Shin and Brangwynne, 2017).

Discussion

DNA-tracing combined with diffraction-limited or super-resolution microscopy is beginning to shed light on the high degree of variability that is characteristic of chromosomal structures in the interphase (Bintu et al., 2018; Boettiger et al., 2016; Nir et al., 2018; Beliveau et al., 2015). These studies add to the growing body of evidence that a unique chromosomal fold simply does not exist in the interphase. Chromosome structures in the nucleus appear to be highly dynamical, owing to the many non-equilibrium processes in the cell, such as the activity of motor proteins.

The advances in genome imaging and the molecular simulation of chromosomes allow the development of parameters able to quantify the structural similarities between different chromosome structures, and the degree of heterogeneity in the ensemble of structures. Our results demonstrate that the collective variable Q, commonly used in protein folding studies and structural biophysics, is suitable for characterizing the structural ensemble of a segment of chromatin. Despite the high degree of conformational plasticity, it appears that for segments of chromatin as short as the ones imaged by Bintu et al., 2018 (~2 Mb in length), there do exist distinct clusters of chromatin structures that can be distinguished using Q. The dominant structures found for chromatin Segment 1 (chr21 29.37–31.32 Mb) examined using data from microscopy as well as from simulation can be described as being a closed dumbbell and an open dumbbell, where the ends of the dumbbell are the globular domains at the head and tail of the chromatin segment.

It is known that CTCF proteins bound along the genome act as gene insulators, probably through their suppressing activity toward loop extrusion (Sanborn et al., 2015; Fudenberg et al., 2016; Vian et al., 2018). Interestingly, a survey of the positioning of genes along Segment 1 reveals that the vast majority of genes appear clustered in the linker region of the chromatin segment (Figure 2E), sandwiched between the head and tail loop domains. On the other hand, there is an absence of genes located on Segment 2 (chr21 20.0–21.9 Mb) (Figure 2—figure supplement 4), which contains no loop domains. The experimentally imaged structures of Segment 2 lack the domain boundaries that segregate the head and the tail of the chromatin segment into globular domains, although they still exhibit open and closed conformations. These findings suggest a possible role in transcriptional regulation for the opening and closing of organized dumbbell structures. How open and closed structures would achieve such regulation of the transcriptional activity remains to be investigated. It is however clear that understanding the structure-function relationship in the genome is a crucial question that can only be answered using an accurate statistical characterization of the conformational ensembles.

Finally, our work refines the classical view of the spatial compartmentalization of chromatin. We find a striking dependence between radial positioning of chromatin and epigenetic marking patterns. Our theoretical model, MiChroM, predicts that transcriptionally active loci, typically belonging to the A compartment, move toward the surface of the chromosomal territory, while B compartment loci, typically inactive, move to the interior (Di Pierro et al., 2016). Since interactions among B-B loci result in the greatest energetic stabilization, aggregation of these loci seems to be the driving force behind both the phase separation of epigenetically similar chromatin into compartments and the expulsion of the active chromatin toward the periphery of chromosomal territories. In other words, according to the present energy landscape model, when the epigenetic marking patterns of a locus are rewritten from A to B, the locus moves toward the interior of the chromosome, perhaps affecting the transcriptional activity of the associated genes.

Materials and methods

We simulate the structural ensembles of chromosomes belonging to select human cell types using a previously introduced computational pipeline referred to as MEGABASE+MiChroM (Di Pierro et al., 2017). This pipeline takes chromatin immunoprecipitation tracks as input and computationally generates an ensemble of 3D structures of individual chromosomes at a resolution of 50 kb per monomeric unit. While this approach was initially trained and validated for chromosomes belonging to human lymphoblastoid cells (GM12878), we demonstrate that this approach can be readily applied to chromosomes belonging to any cell type given epigenetic histone modification data.

MEGABASE

Maximum Entropy Genomic Annotations from Biomarkers Associated to Structural Ensembles (MEGABASE) (Di Pierro et al., 2017) was trained to quantify the correlations between chromosome structural annotations (i.e., compartment annotations A1, A2, B1, B2, B3) with chromatin immunoprecipitation (ChIP-Seq) signals. This allowed for the inference of the chromatin types (compartment labels) for each 50 kb locus of chromatin, given information about the histone modifications present at that locus.

Discretization of ChIP-Seq data tracks

Chromatin immunoprecipitation (ChIP-Seq) data were downloaded from ENCODE (Dunham and Kundaje, 2012) for the cell lines explored in this manuscript: IMR90, HUVEC, K562, HMEC, H1-hESC, and HeLa-S3. We focused on 11 histone modification tracks: H2AFZ, H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me2, H3K9ac, H3K9me3, and H4K20me1. These 11 tracks were previously shown to contain sufficient information to predict the chromosome structural ensembles for GM12878 (Di Pierro et al., 2017).

For each chromosome, the ChIP-Seq signal is recast into data tracks at 50 kb resolution. This is performed by summing the ChIP-Seq signal contained within each 50 kb locus for each experiment.

The integrated ChIP-seq signal for each 50 kb locus is assigned a discrete state ranging from 1 (low signal) to 20 (high signal). This is performed by creating a histogram for each experiment of the integrated signal for all of the 50 kb loci in the chromosomes of each cell type. All loci belonging to the top 5% of the distribution with the highest signal are assigned the highest signal state, that is 20. The remaining 19 signal states are defined by partitioning the remainder of the distribution linearly with respect to the signal strength; loci are assigned to those states according to their integrated signal.
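The discretization rule described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the function name `discretize_track` is hypothetical, the input is assumed to be the integrated signal of one experiment pooled over all 50 kb loci of a cell type, and the lower 19 states are taken to be equal-width bins between the minimum signal and the top-5% cutoff.

```python
def discretize_track(signal, n_states=20, top_fraction=0.05):
    """Map integrated ChIP-Seq signal per locus to discrete states 1..n_states.

    Loci in the top `top_fraction` of the distribution receive the highest
    state; the rest are binned linearly in signal strength. Loci tied with
    the cutoff value also receive the highest state.
    """
    if not signal:
        return []
    ranked = sorted(signal)
    n_top = max(1, round(len(ranked) * top_fraction))
    cutoff = ranked[len(ranked) - n_top]   # smallest signal inside the top fraction
    lowest = ranked[0]
    width = (cutoff - lowest) / (n_states - 1)
    states = []
    for value in signal:
        if value >= cutoff or width == 0:
            states.append(n_states)
        else:
            # Equal-width bins 1..(n_states - 1) below the cutoff.
            bin_index = min(int((value - lowest) / width), n_states - 2)
            states.append(bin_index + 1)
    return states

# Toy track: 100 loci with signal 0, 1, ..., 99.
toy_states = discretize_track(list(range(100)))
```

Binning linearly in signal strength, rather than by rank, keeps weak background loci together in the lowest states while capping the influence of a few very strong peaks.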

Prediction of chromatin structural types from ChIP-Seq data using MEGABASE

The inferred probabilistic model (Di Pierro et al., 2017) can be marginalized to predict the chromatin type for a given locus l when given the experimental ChIP-Seq measurements at loci l-2, l-1, l, l+1, and l+2:

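$$\mathrm{CST}(l) = \underset{\mathrm{CST}}{\arg\max}\; P\left(\mathrm{CST} \mid \mathrm{Exp}_{1,\ldots,L}(l-2,\, l-1,\, l,\, l+1,\, l+2)\right) \tag{2}$$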

where L = 11 is the number of epigenetic histone modifications used in this study and Exp is a vector of discretized ChIP-Seq signals for loci l-2, l-1, l, l+1, and l+2. This allows for the prediction of the chromatin type (CST) for a given chromatin locus, given the ChIP-Seq signals for the 11 histone modifications at that locus and its four nearest neighbors. For additional details on the MEGABASE model, refer to Di Pierro et al., 2017. The predicted sequences of chromatin types can readily be obtained from our server (https://ndb.rice.edu/MEGABASE) (Contessoto et al., 2019) for different cell types and tissues with available ChIP-Seq histone modification tracks from ENCODE. The user can also supply ChIP-Seq tracks to generate sequences of chromatin type annotations for chromosomes of an unspecified cell type.

Minimal chromatin model (MiChroM)

The sequence of inferred chromatin types for each chromosome serves as input for our coarse-grained simulations of individual chromosomes using the Minimal Chromatin Model (MiChroM) (Di Pierro et al., 2016). MiChroM is a coarse-grained representation of individual chromosomes with the following potential energy:

$$U_{\text{MiChroM}}(\vec{r}) = U_{\text{HP}}(\vec{r}) + U_{\text{type-type}}(\vec{r}) + U_{\text{loops}}(\vec{r}) + U_{\text{ideal}}(\vec{r}) \tag{3}$$

where

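$$U_{\text{type-type}}(\vec{r}) = \sum_{\substack{k \ge l \\ k,l \,\in\, \text{Types}}} \alpha_{kl} \sum_{\substack{i \,\in\, \{\text{Loci of Type } k\} \\ j \,\in\, \{\text{Loci of Type } l\}}} f(r_{ij})$$
$$U_{\text{loops}}(\vec{r}) = \chi \sum_{(i,j) \,\in\, \{\text{Loops Sites}\}} f(r_{ij})$$
$$U_{\text{ideal}}(\vec{r}) = \sum_{d=3}^{500} \gamma(d) \sum_{i} f(r_{i,\,i+d})$$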

and the probability of crosslinking between chromatin loci i and j is modeled as

$$f(r_{ij}) = \frac{1}{2}\left(1 + \tanh\left[\mu\,(r_c - r_{ij})\right]\right). \tag{4}$$

The first term $U_{\text{HP}}$ is a homo-polymer potential that describes the connectivity (bonds and angles) between monomers; each monomer here represents a 50 kb span of DNA. The second term $U_{\text{type-type}}$ describes the sequence-dependent interactions between pairs of monomers; this term captures the phase separation of chromatin loci into spatial compartments. The parameters $\alpha_{kl}$ describe the energetic stabilization when two loci of chromatin type k and l are spatially proximal. The third term $U_{\text{loops}}$ describes the interaction between loop anchors that stabilize a CTCF-mediated loop. The final term $U_{\text{ideal}}$, referred to as the Ideal Chromosome (Zhang and Wolynes, 2015; Di Pierro et al., 2016; Zhang and Wolynes, 2016), describes the translationally invariant local ordering in chromatin; a pair of chromatin loci in close proximity is stabilized by an energy γ(d) that depends on the genomic distance between the pair, $d = |i - j|$. Although the ideal chromosome accounts for the loop-length dependent entropic effects as well as motor-driven processes acting along the chromatin polymer, the term remains agnostic regarding the precise mechanisms responsible for local ordering.
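The crosslinking probability of Equation 4 is a smooth switch between contact and no contact. The sketch below assumes the sigmoidal form f(r) = ½(1 + tanh[μ(r_c − r)]) and uses the values μ = 3.22 and r_c = 1.78 quoted in the Methods as defaults; the function name is illustrative.

```python
import math

def contact_probability(r, mu=3.22, rc=1.78):
    """Probability of crosslinking two loci separated by distance r (reduced units).

    mu sets the steepness of the switch and rc the distance at which the
    probability equals one half.
    """
    return 0.5 * (1.0 + math.tanh(mu * (rc - r)))

# The function falls from ~1 at short distances to ~0 beyond rc.
for r in (0.5, 1.78, 3.0):
    print(f"r = {r:4.2f}  f = {contact_probability(r):.4f}")
```

Because f is differentiable everywhere, it can be used inside an energy function for molecular dynamics, which a hard distance cutoff would not allow.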

The parameters μ=3.22 and rc=1.78 were adjusted for the contact maps of GM12878 B-lymphoblastoid cells in dataset GSE63525 (Rao et al., 2014). The parameters α, χ, and γ were iteratively trained (Di Pierro et al., 2016) to be consistent with the DNA–DNA ligation map of chromosome 10 of human lymphoblastoid cells (GM12878)(Rao et al., 2014). The resulting polymer model is confined in a hard wall sphere to approximately preserve the volume fraction of chromatin in the interphase of 0.1 (Rosa and Everaers, 2008). Here, we model individual chromosomes confined within a hard sphere that represents a chromosome territory rather than the nuclear envelope. It has been shown that the physical tethering of chromatin to the nuclear envelope via the nuclear lamina may play an important role during differentiation and development (Solovei et al., 2013). Further, the role of the nuclear lamina in genome organization has been computationally modeled (Laghmach et al., 2020; Lee et al., 2017; MacPherson et al., 2020). While we do not currently use an explicit representation of the lamina, it should be noted that the MiChroM model was trained on DNA–DNA ligation data and its energetic terms would implicitly account for the effect of the lamina on the intra-chromosomal organization of chromatin loci.

MiChroM considers five chromatin types A1, A2, B1, B2, B3 plus a non-specific type NA, which is used to label the centromere. The α parameters, which govern the type-to-type interactions, are given in Supplementary file 1. MiChroM makes no assumptions about the physical nature of the interactions that lead to compartmentalization. While the mechanistic details behind the chromatin type interactions are not fully understood, recent work (MacPherson et al., 2018) has shown that the binding of HP1 to chromatin can lead to compartmentalization via the oligomerization of HP1 to bridge the nucleosomes.

The parameter χ governing the loop interactions is equal to −1.612990.

The ideal chromosome potential is given by:

$$\gamma(d) = \frac{\gamma_1}{\log(d)} + \frac{\gamma_2}{d} + \frac{\gamma_3}{d^2}$$

with parameters γ1 = −0.030, γ2 = −0.351, γ3 = −3.727.
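The distance dependence of the ideal chromosome term can be explored numerically. The sketch below is hedged on two assumptions: that γ(d) has the reciprocal form γ1/log(d) + γ2/d + γ3/d², as in Di Pierro et al., 2016, and that the term acts only for genomic separations of 3 to 500 beads, matching the summation limits of Equation 3. The coefficients are supplied by the caller; the example call uses negative (stabilizing) values, which is an assumed sign convention.

```python
import math

def ideal_chromosome_gamma(d, gamma1, gamma2, gamma3, d_min=3, d_max=500):
    """Energy gamma(d) assigned to a contact between loci d beads apart.

    Returns 0.0 outside the assumed range d_min <= d <= d_max.
    """
    if d < d_min or d > d_max:
        return 0.0
    return gamma1 / math.log(d) + gamma2 / d + gamma3 / d ** 2

# Example coefficients; negative values correspond to stabilizing contacts.
coeffs = (-0.030, -0.351, -3.727)
gamma_short = ideal_chromosome_gamma(3, *coeffs)
gamma_long = ideal_chromosome_gamma(500, *coeffs)
```

With these coefficients the stabilization is strongest for loci a few beads apart and decays toward zero at large genomic separation, consistent with a term that describes local ordering.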

The reduced MiChroM energy function used in this manuscript omits CTCF-mediated loops unless stated otherwise:

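$$U_{\text{MiChroM}}(\vec{r}) = U_{\text{HP}}(\vec{r}) + \sum_{\substack{k \ge l \\ k,l \,\in\, \text{Types}}} \alpha_{kl} \sum_{\substack{i \,\in\, \{\text{Loci of Type } k\} \\ j \,\in\, \{\text{Loci of Type } l\}}} f(r_{ij}) + \sum_{d=3}^{500} \gamma(d) \sum_{i} f(r_{i,\,i+d}) \tag{5}$$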

For comparison with the DNA-tracing structures of Bintu et al., 2018, simulations of chromosome 21 for cell types IMR90 and K562 with CTCF-mediated loops were generated using the full energy function of MiChroM.

Langevin simulations

Langevin simulations of individual chromosomes at a resolution of 50 kb per monomeric unit were performed using the GROMACS (Abraham et al., 2015) molecular dynamics package. Initial structures were generated from a linear chain at a starting temperature of 3.0ε/kB and linearly cooled to a temperature of 1.0ε/kB over 5 × 10^6 steps with a time step of 0.002τ, where τ and ε are the units of time and energy for our model, respectively. Following equilibration, simulations were run at a constant temperature of 1.0ε/kB for 20 × 10^6 steps with a time step of 0.001τ. All simulations were run using a damping coefficient of 1.0τ. A total of 40 replicate simulations were run for each chromosome simulation. The resultant simulation trajectories are available for download at the Nucleome Data Bank (https://ndb.rice.edu/Data).

Simulated DNA–DNA ligation maps

The simulated contact probability pij between chromatin loci i and j is calculated by taking the expectation value of the probability of crosslinking (Equation 4) over the ensemble of chromosome structures obtained from simulation (Di Pierro et al., 2016):

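$$p_{ij} = \left\langle f(r_{ij}) \right\rangle = \frac{\int d\vec{r}\; f(r_{ij})\, \exp\left(-\beta U(\vec{r})\right)}{\int d\vec{r}\; \exp\left(-\beta U(\vec{r})\right)}. \tag{6}$$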

Here, pij is a matrix element of the simulated DNA–DNA ligation map.
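In practice, the ensemble average in Equation 6 can be estimated as a plain average of the crosslinking probability over sampled structures. The sketch below assumes that the frames are drawn from the constant-temperature simulation, so that no explicit Boltzmann reweighting is needed, and that Equation 4 has the tanh form with μ = 3.22 and r_c = 1.78; the function name `simulated_contact_map` is illustrative.

```python
import math

def simulated_contact_map(structures, mu=3.22, rc=1.78):
    """Estimate p_ij as the average of f(r_ij) over an ensemble of structures.

    structures: list of frames, each a list of (x, y, z) bead coordinates.
    The diagonal is left at 0.0 because self-contacts are not evaluated.
    """
    n_beads = len(structures[0])
    totals = [[0.0] * n_beads for _ in range(n_beads)]
    for coords in structures:
        for i in range(n_beads):
            for j in range(i + 1, n_beads):
                r = math.dist(coords[i], coords[j])
                f = 0.5 * (1.0 + math.tanh(mu * (rc - r)))
                totals[i][j] += f
                totals[j][i] += f
    n_frames = len(structures)
    return [[value / n_frames for value in row] for row in totals]

# Toy ensemble: one frame with beads at distance rc, one with beads far apart.
frames = [
    [(0.0, 0.0, 0.0), (1.78, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)],
]
p = simulated_contact_map(frames)
```

The double loop scales quadratically with the number of beads; for full chromosomes a vectorized distance computation would be preferable, but the averaging logic is the same.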

Notes

Unless explicitly stated otherwise, all genomic positions are reported using the positions of the hg19 assembly. All of the simulated chromosome structures discussed in this manuscript were deposited in the Nucleome Data Bank (NDB) (Contessoto et al., 2019) found at https://ndb.rice.edu.

Data availability

All of the simulated chromosome structures have been deposited in the Nucleome Data Bank (https://ndb.rice.edu/Data).

Decision letter

  1. Yibing Shan
    Reviewing Editor; DE Shaw Research, United States
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany
  3. Yibing Shan
    Reviewer; DE Shaw Research, United States
  4. Huafeng Xu
    Reviewer; Unaffiliated with any company or institution, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Using machine learning and polymer physics simulations, Cheng and colleagues generated structural models of chromosomes that are broadly consistent with single-cell imaging and capture the key characteristics of genomic compartmentation. With the continued increase of single-cell imaging data and improvement in accuracy for training and calibration of such computational modeling, this holds great potential in generating accurate and full-scale chromosome structures and revealing the underlying mechanism of epigenetic switching at a molecular level.

Decision letter after peer review:

Thank you for submitting your article "Exploring Chromosomal Structural Heterogeneity Across Multiple Cell Lines" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Yibing Shan as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Detlef Weigel as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Huafeng Xu (Reviewer #2); and Jie Liang (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

Cheng et al. combined two computational methods previously published by the authors – MiChroM and MEGABASE – to simulate structural models of chromatins across six cell lines. They demonstrated good correlation between the average pairwise segment-segment contacts in the simulated models and the contacts inferred from ligation experiments and found similar statistical properties in the simulated models and in the structures observed by super-resolution microscopy. A number of important findings have been obtained from this study. A key finding is the high structural variability and ensemble nature of chromatins; as a result, a chromatin generally cannot be described by a single stable structural fold. Furthermore, the study suggests that the structural linker region connecting compact domains contains most of the genes for a chromosome segment.

Revisions for this paper:

1) The authors showed that different epigenetics of different cell types lead to different distributions of A/B compartments, which further lead to active A compartments moving toward the surface of chromosomes, and inactive B compartments with stronger self-attraction moving to the interior. However, it is well-known that LAD attachment can have strong effects on chromatin phase separations, as shown experimentally by Solovei et al., 2013, on lamin, and more recently in phase-field modeling work of Lee et al., 2017, and Laghmach, Pierro and Potoyan, 2020. In fact, in some studies this appears to be a dominant factor. The authors may wish to clarify whether lamina is included in the model, and whether the effects of LAD association would affect the conclusion of the results. Can the authors provide a physical explanation of the A-B compartmentation beyond the analogy to hydrophobic collapse, and the possible physical elements that might regulate the open-close transition?

2) Figure 1E showed that predicted and measured A/B compartments have generally good correlation, with R varying for different cell types. It will be useful to plot the correlation R for A/B predictions/measurement, and R for simulated/measured Hi-C maps together. Are these R(A/B) and R(heatmaps) correlated? If not strongly correlated, why? Knowing this is useful as it may shed light on understanding to what extent the physical interactions encoded in the model (e.g. B-B hydrophobic-like interactions) explain the Hi-C maps, and how sensitive the current chromatin folding model is to the accuracy of A/B compartments assignment. Further, can the authors provide a physical explanation of the A-B compartmentation beyond the analogy to hydrophobic collapse, and the possible physical elements that might regulate the open-close transition?

3) The authors showed that inclusion of CTCF-loops improves agreement of simulation and measured Hi-C heatmaps. Is this property unique to CTCF? A recent study showed that a small set of driver interactions exists for many loci, not necessarily involving CTCF, where 15-35 judiciously selected specific driver interactions can fold loci of enhancers of 500KB-1.9MB, with excellent R (>0.95) between measured and simulated Hi-C at 5 KB (PMC6966897). The authors may wish to further elaborate on their finding, e.g. after conducting a control study where a randomly (or non-randomly) chosen non-CTCF loop is included/not included, and whether such effects are also observed.

4) Simulated Hi-C heatmaps are reported for 7 cell lines. What are the nuclear sizes of these cells, and is this size difference taken into account in the simulations? In general, why is this or is this not an issue? This is relevant, as different nuclear sizes may affect the overall folding landscape of chromatin, since nuclear size confinement is one of the strongest constraints on chromatin.

5.1) The comparison with single-cell imaging data of Bintu et al. is very interesting. It will be helpful if the authors show more details with examples of the degree of agreement between simulation and imaging studies. For example, a side-by-side comparison of best examples of single-cell heatmaps of Euclidean distance/contact between Bintu et al. measured single cells and simulated chromatin single-cell conformations. This can be supplemented with an overall scatter-plot of correlations of the pair-conformations of simulated/imaged conformations. Recent studies have shown some success in reproducing single cell Dip-C data through modeling (PMID PMC6966897), but comparison with imaging data is currently lacking, and the authors' results may set the standard for the field.

5.2) The 3D coordinates are available for both the simulated and the experimental structural ensembles. Thus, in addition to analyses of macroscopic and statistical properties such as the distribution of Rg and the probability of open/close conformations, direct structural comparison should be feasible, which can show that indeed some of the structures generated by the simulations are highly consistent with the experimental structures. Can we compute the average accessible surface area for each 50kb segment and compare between simulations and experiments? Can we plot the contact probability for each pair of segments in the simulations against that in the experimental structures?

5.3) In Figure 1, the simulated structures appear to have significantly more contacts than inferred from ligation experiments. Does this imply that the simulated structures might be too compact? Is this because the experiment sampling is more limited than the simulations? Or is there some potential systematic discrepancy despite the high-level consistency between the simulation and the experiment? This should be discussed.

6) While the clustering of both imaging and simulations clearly showed the heterogeneity of chromatin, how are clusters defined? If clustered by a different order parameter, e.g. Q instead of Rg, do we still have the same finding – I wonder if certain natural order emerges from clustering. If we cluster both the simulated and experimental structures together, do they mingle in the same clusters?

7) There is an interesting discussion of the two-state equilibrium model and the PMF estimated energy gap of 4 kBT (also in the eighth paragraph of the subsection “Chromatin structural ensembles from DNA tracing reveal coexistence of open and closed structures”). However, can imaging studies of Segment 1 of Bintu et al. really be thought of as sampling thermodynamic equilibrium? The experimental set-up may be quite complicated and it is unclear whether imaged cells are samples properly drawn from the equilibrium distribution. In fact, the authors stated that chromatins are highly dynamic and follow a non-equilibrium process (Discussion, first paragraph).

8) "The genomic distance-dependent interactions recaptures the effect of motors acting along the DNA polymer." There are also well-known loop-length-dependent entropic effects, which may be part of the observed genomic distance dependency. It would be helpful if the authors clarified whether only motor-driven active processes are at play in their model, or a mixture of both factors, and whether other hidden, unspecified events are effectively accounted for in their model. Also, a citation on the motor process here would be helpful.

9) The reported finding that the linker region connecting two compact domains contains most of genes for segment 1 is very nice. Is this observation general?

10) The finding reported provides guidance for further biophysical studies of chromatin, for example, simple approximations such as Gaussian network models for chromatin domains are unlikely to be successful in capturing the heterogeneity of chromatin. The authors may want to briefly discuss the Gaussian network model in light of their results.

Revisions expected in follow-up work:

There was agreement that the value of the work could be greatly enhanced if the authors used their model to address a question of biological interest. For example, can the authors use their structural models-which have much richer details than available from experiments-to shed light on the distinction between the structural differences across different cell types and the variations among individual cells within the same cell type? What do the structural models teach us about transcription regulation?

https://doi.org/10.7554/eLife.60312.sa1

Author response

Revisions for this paper:

1) The authors showed that different epigenetics of different cell types lead to different distributions of A/B compartments, which further lead to active A compartments moving toward the surface of chromosomes, and inactive B compartments with stronger self-attraction moving to the interior. However, it is well-known that LAD attachment can have strong effects on chromatin phase separations, as shown experimentally by Solovei et al., 2013, on lamin A/C, and more recently in phase-field modeling work of Lee et al., 2017 and Laghmach, Pierro and Potoyan, 2020. In fact, in some studies this appears to be a dominant factor. The authors may wish to clarify whether lamina is included in the model, and whether the effects of LAD association would affect the conclusion of the results. Can the authors provide a physical explanation of the A-B compartmentation beyond the analogy to hydrophobic collapse, and of the possible physical elements that might regulate the open-close transition?

We thank the reviewers for bringing up this excellent point. While modeling entire genomes is our future direction, we have so far modeled only individual chromosomes confined within a volume that represents a chromosomal territory. The confinement in our case is not the nuclear envelope and we do not currently have any explicit description of the nuclear lamina. It should be noted, however, that the MiChroM model was trained on DNA–DNA ligation data and its energetic terms would implicitly account for the effect of the lamina on the intra-chromosomal organization of chromatin loci. We agree that the nuclear lamina has been shown to play an important role during differentiation and development (Solovei et al., 2013). However, for interphase chromosomes in differentiated cell lines, we have demonstrated that a model without explicit lamina can generate ensembles of 3D chromosome structures that are consistent with DNA–DNA ligation and microscopy experiments. We have added text to the manuscript regarding the potential role of lamina, as well as the referenced citations.

The strength of this work is that we use a model that makes no assumptions about the physical nature of the interactions that lead to compartmentalization. The parameters of the model were simply trained to be consistent with the Hi-C maps from experiment.

Recent work [MacPherson, Beltran and Spakowitz, 2018] has demonstrated how molecular interactions such as the binding of HP1 to chromatin can plausibly lead to compartmentalization via the oligomerization of HP1 to bridge the nucleosomes. However, the specific details of the molecular interactions that give rise to phase separation in the genome are not fully understood. Also unclear is the interplay between compartmentalization and motor proteins acting along the polymer, such as extruding complexes or polymerases. This has been clarified in the Materials and methods section.

2) Figure 1E showed that predicted and measured A/B compartments have generally good correlation, with R varying for different cell types. It will be useful to plot the correlation R for A/B predictions/measurement, and R for simulated/measured Hi-C maps together. Are these R(A/B) and R(heatmaps) correlated? If not strongly correlated, why? Knowing this is useful as it may shed light on understanding to what extent the physical interactions encoded in the model (e.g. B-B hydrophobic-like interactions) explain the Hi-C maps, and how sensitive the current chromatin folding model is to the accuracy of A/B compartments assignment. Further, can the authors provide a physical explanation of the A-B compartmentation beyond the analogy to hydrophobic collapse, and of the possible physical elements that might regulate the open-close transition?

We were very much interested in different ways to assess the predictions of our simulations in comparison with the experimental Hi-C maps. We focused on the Pearson correlation between the A/B annotations of the simulated and experimental maps, as well as the Pearson correlation between the simulated and experimental maps at fixed genomic distances. These two measures are more stringent for comparing the compartmentalization patterns observed in the Hi-C maps.
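As an illustration of the second measure, the sketch below computes a Pearson correlation between two contact maps one genomic separation at a time, that is, diagonal by diagonal. It is a minimal sketch, not the analysis code used in the manuscript: the function name `pearson_by_distance`, the `max_sep` parameter, and the assumption that both maps are dense square arrays binned at the same resolution are all hypothetical choices made here.

```python
import numpy as np


def pearson_by_distance(sim_map, exp_map, max_sep=None):
    """Pearson r between two square contact maps at each fixed separation.

    Hypothetical helper: separation k (in bins) corresponds to the k-th
    off-diagonal of each map. Returns a dict {k: r}.
    """
    sim_map = np.asarray(sim_map, dtype=float)
    exp_map = np.asarray(exp_map, dtype=float)
    n = sim_map.shape[0]
    # Keep at least three points per diagonal by default.
    max_sep = n - 3 if max_sep is None else max_sep
    result = {}
    for k in range(1, max_sep + 1):
        a = np.diagonal(sim_map, offset=k)
        b = np.diagonal(exp_map, offset=k)
        if a.std() == 0 or b.std() == 0:
            # Correlation is undefined for a constant diagonal.
            result[k] = np.nan
            continue
        result[k] = np.corrcoef(a, b)[0, 1]
    return result
```

Correlating within a fixed separation removes the overall decay of contact frequency with genomic distance, which would otherwise dominate a map-wide correlation.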

While the calculation of R between the simulated and experimental maps was discussed in our previous works, this particular measure is sensitive to the depth of sequencing of the experiment. The experimental data are irregular in terms of depth of sequencing (and therefore sparsity), which confounds the comparison. Global features of the ligation map such as the A/B annotations are less sensitive to experimental irregularities than local features such as the direct ligation counts. Further, the Pearson correlations between the maps are all high (>0.9), but they do not appear to be correlated with the values in Figure 1E for the aforementioned reasons.

While MiChroM makes no assumptions of the molecular details that give rise to the phase separation, recent work in the Spakowitz lab [MacPherson, Beltran and Spakowitz, 2018] has shown that HP1 binding to chromatin can lead to distinct phases that resemble euchromatin and heterochromatin.

3) The authors showed that inclusion of CTCF-loops improves agreement of simulation and measured Hi-C heatmaps. Is this property unique to CTCF? A recent study showed that a small set of driver interactions exists for many loci, not necessarily involving CTCF, where 15-35 judiciously selected specific driver interactions can fold loci of enhancers of 500KB-1.9MB, with excellent R (>0.95) between measured and simulated Hi-C at 5 KB (PMC6966897). The authors may wish to further elaborate on their finding, e.g. after conducting a control study where a randomly (or non-randomly) chosen non-CTCF loop is included/not included, and whether such effects are also observed.

This is an interesting suggestion; however, we have not yet done such an in-depth analysis of short-ranged loops. Rather, we focused on the global architecture of entire chromosomes. To date, we have only examined simulations with or without these CTCF-mediated loops and observed their effects on the global architecture. Our findings simply show that the inclusion of CTCF-mediated loops is important for features on the Hi-C map that are less than 10Mb in genomic distance (Figure 1D). We have added the citation and modified the text accordingly to point out that a more detailed analysis, in particular of driver interactions, might be important for an improved understanding of the organization of short segments of chromatin.

4) Simulated Hi-C heatmaps are reported for 7 cell lines. What are the nuclear sizes of these cells, is this size difference taken into account in the simulations, and, in general, why is this or is this not an issue? This is relevant, as different nuclear sizes may affect the overall folding landscape of chromatin, since nuclear size confinement is one of the strongest constraints on chromatin.

The reviewers highlight a very important issue. However, at present we are not in a position to address this point due to the lack of suitable experimental data regarding the nuclear size and shape distributions. All of the simulations are performed at a constant DNA density that is consistent with what is found in the literature [Rosa and Everaers, 2008]. It is worth pointing out that, using this constant density, we did previously predict FISH-measured distances (Di Pierro et al., 2017); this at least indicates that the overall degree of compaction of chromatin is generally correct. We look forward to further refining our models as soon as there are additional experimental observations regarding the nuclear size and shape.

5.1) The comparison with single-cell imaging data of Bintu et al. is very interesting. It will be helpful if the authors show more details with examples of the degree of agreement between simulation and imaging studies. For example, a side-by-side comparison of best examples of single-cell heatmaps of Euclidean distance/contact between Bintu et al. measured single cells and simulated chromatin single-cell conformations. This can be supplemented with an overall scatter-plot of correlations of the pair-conformations of simulated/imaged conformations. Recent studies have shown some success in reproducing single cell Dip-C data through modeling (PMID PMC6966897), but comparison with imaging data is currently lacking, and the authors' results may set the standard for the field.

Thank you for your encouraging comment. We did in fact try to push the direct comparison between imaging and simulations further. We explored the comparison of ensemble-averaged quantities such as the Hi-C maps and explored Q and Rg. However, the structures of chromatin are quite fluid and it is difficult (and perhaps impossible) to choose which structures should be selected out of the ensemble to make a direct comparison. This is consistent with the observation and perspective that chromatin structures are like “snowflakes”, because no two are the same (see https://www.nature.com/articles/d41586-019-01426-w). This is further supported by single-cell Hi-C and Dip-C experiments. The individual structures are quite different from one another, and a direct comparison of structures with a scatter plot of the pair distances from simulation and experiment would simply indicate that the agreement is poor until we run our simulations longer to find a better match. For precisely this reason, the goal of this work was to go beyond a comparison of single structures and to provide a statistical characterization of the experimental and simulated data sets. The text has been amended to better convey this point.

5.2) The 3D coordinates are available for both the simulated and the experimental structural ensembles. Thus, in addition to analyses of macroscopic and statistical properties such as the distribution of Rg and the probability of open/close conformations, direct structural comparison should be feasible, which can show that indeed some of the structures generated by the simulations are highly consistent with the experimental structures. Can we compute the average accessible surface area for each 50kb segment and compare between simulations and experiments? Can we plot the contact probability for each pair of segments in the simulations against that in the experimental structures?

As mentioned in the previous response, no two structures are the same, confounding our direct analysis of structures. The surface area would be a very interesting quantity to compare and measure, which is directly related to buried or exposed regions of chromatin. However, for the experimental imaging data that is available, the surface area is not well defined. For example, the position of a locus in an image structure represents the mean position of the imaged locus. On the other hand, the contact probability for each pair of segments can easily be calculated; we thank the reviewers for the suggestion. We have included the scatter plot of the contact probability for each pair of segments between simulated and experimental structures in Figure 1—figure supplement 5.
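The contact probability for each pair of segments can be estimated from any ensemble of 3D coordinates, simulated or traced. The following is a minimal sketch under stated assumptions rather than the procedure used for the figure supplement: it uses a hard distance cutoff (the `cutoff` value is a free, hypothetical parameter, and a smooth contact function could be substituted), it assumes every structure has coordinates for every locus, and the function names are illustrative.

```python
import numpy as np


def contact_probability(ensemble, cutoff):
    """Fraction of structures in which each pair of loci is in contact.

    ensemble: array of shape (n_structures, n_loci, 3).
    cutoff:   distance below which two loci count as in contact
              (assumed hard threshold, same units as the coordinates).
    """
    coords = np.asarray(ensemble, dtype=float)
    # Pairwise displacement vectors for every structure at once.
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).mean(axis=0)


def upper_triangle_pairs(p_sim, p_exp):
    """Matched per-pair values (i < j) from two maps, ready for a scatter plot."""
    iu = np.triu_indices_from(p_sim, k=1)
    return p_sim[iu], p_exp[iu]
```

Missing loci in experimental traces (for example, coordinates stored as NaN) are not handled here and would need to be masked before averaging.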

5.3) In Figure 1, the simulated structures appear to have significantly more contacts than inferred from ligation experiments. Does this imply that the simulated structures might be too compact? Is this because the experiment sampling is more limited than the simulations? Or because there is some potential systematic discrepancy despite the high-level consistency between the simulation and the experiment. This should be discussed.

The reviewers bring up an excellent point. It is important to note that the DNA–DNA ligation maps of human lymphoblastoid cells (GM12878) [Rao et al., 2014] are by far the most well-sampled Hi-C maps in existence, with at least an order of magnitude more reads than the Hi-C maps of the other cell types discussed in this manuscript. While the simulated maps might appear more compact on visual inspection, this is not the case when we look at the average contact probability as a function of genomic distance, which we have added to Figure 1—figure supplement 5. For the majority of the cell types, the power law scaling of the contact probability is the same in the experiments and simulations. However, the average contact probability is notably higher in the experimental Hi-C maps for both HUVEC and H1-hESC with respect to the simulated maps; this can be seen in Figure 1—figure supplement 5 in both the power law scaling and the scatter plots of the contact probabilities. This could imply that the chromatin density is incorrect in these simulations. As mentioned previously, all of the simulations are performed at a constant DNA density that is consistent with [Rosa and Everaers, 2008]; we do not currently have any additional information regarding chromosomal density (as discussed in response to comment #4). Furthermore, the Minimal Chromatin model is not re-trained for each cell type.

The main text has been modified to better discuss these points.
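The average contact probability as a function of genomic distance, and its power law scaling, can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the map is assumed to be a dense square array at uniform bin size, and the exponent is obtained from a simple least-squares fit in log-log space over all separations, which may differ from the fitting range used for the figure supplement.

```python
import numpy as np


def contact_vs_distance(contact_map):
    """Mean contact probability P(s) at each genomic separation s (in bins)."""
    m = np.asarray(contact_map, dtype=float)
    n = m.shape[0]
    s = np.arange(1, n)
    # The k-th off-diagonal holds all pairs separated by k bins.
    p = np.array([np.diagonal(m, offset=k).mean() for k in s])
    return s, p


def power_law_exponent(s, p):
    """Slope of log P(s) versus log s, using only separations with P > 0."""
    mask = p > 0
    slope, _intercept = np.polyfit(np.log(s[mask]), np.log(p[mask]), 1)
    return slope
```

Comparing the two curves for a simulated and an experimental map separates a difference in overall contact level (a vertical offset in log-log space) from a difference in scaling (a different slope).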

6) While the clustering of both imaging and simulations clearly showed the heterogeneity of chromatin, how are clusters defined? If clustered by a different order parameter, e.g. Q instead of Rg, do we still have the same finding – I wonder if certain natural order emerges from clustering. If we cluster both the simulated and experimental structures together, do they mingle in the same clusters?

This is a deep question, complicated by the fact that both the order parameters and the clustering algorithm affect the observed clusters; yet our hope is to uncover the true, natural clusters present in the data. In our work, we used agglomerative hierarchical clustering with 1 – Q as the distance. Clusters are naturally defined in the hierarchical clustering scheme by the distances 1 – Q that separate different structures and are represented in the tree diagrams of Figures 2 and 3. We did not cluster by Rg, although after clustering by 1 – Q we did calculate the Rg of each sub-cluster. There are many different parameters that can potentially be used to characterize structural ensembles; regardless of whether or not Q is the best collective variable to characterize chromatin structures, we have demonstrated that it can capture important structural features in real and simulated chromatin structures, as both open and closed structures are identified from the experimental and simulated datasets.
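To make the clustering scheme concrete, the sketch below clusters an ensemble of structures by agglomerative hierarchical clustering on a 1 – Q distance matrix using SciPy. Several choices here are assumptions rather than details taken from this response: Q is written as the mean over locus pairs of a Gaussian of the difference in pair distances, the width `sigma` is a free parameter, average linkage is used, and the tree is cut into a fixed number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def q_similarity(d_a, d_b, sigma):
    """Assumed form of Q: mean Gaussian overlap of two sets of pair distances."""
    return np.exp(-((d_a - d_b) ** 2) / (2.0 * sigma ** 2)).mean()


def cluster_by_one_minus_q(ensemble, sigma, n_clusters):
    """Hierarchical clustering of structures using 1 - Q as the distance.

    ensemble: sequence of (n_loci, 3) coordinate arrays.
    Returns (labels, tree), where tree is the SciPy linkage matrix.
    """
    # Condensed intra-structure pair distances for each structure.
    dists = [pdist(np.asarray(s, dtype=float)) for s in ensemble]
    n = len(dists)
    one_minus_q = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            value = 1.0 - q_similarity(dists[a], dists[b], sigma)
            one_minus_q[a, b] = one_minus_q[b, a] = value
    # linkage expects a condensed distance vector.
    tree = linkage(squareform(one_minus_q, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, tree
```

The returned linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree diagram, and per-cluster quantities such as Rg can then be computed from the labels.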

7) There is an interesting discussion of the two-state equilibrium model and the PMF estimated energy gap of 4KT (also in the eighth paragraph of the subsection “Chromatin structural ensembles from DNA tracing reveal coexistence of open and closed structures”). However, can imaging studies of Segment 1 of Bintu et al. really be thought as from thermodynamic equilibrium? Its experimental set-up may be quite complicated and it is unclear whether imaged cells are samples properly drawn from the equilibrium distribution. In fact, the authors stated that chromatins are highly dynamic and follow a non-equilibrium process (Discussion, first paragraph).

Thank you for your encouraging comment. In these Hi-C experiments, cells are synchronized so as to be (predominantly) in interphase, the state of cellular homeostasis in between two successive cell divisions. Interphase is certainly not a state of thermodynamic equilibrium but is a long-lived steady state. This is also evident from the fact that it is possible to measure a well-defined characteristic contact map for the interphase ensemble. Our approach aims to capture the quasi-equilibrium of interphase using the typical Boltzmann distribution. We have no a priori guarantee that this is a reasonable approach; we do, however, clearly show that this approach is sufficient to explain a large number of experimental observations. In fact, at present there is no indication that a non-equilibrium theory is necessary, or useful, in modeling interphase human chromatin at the resolution treated in our manuscript (50kb), or larger. At the finer scale of kilobases, or in more dynamic phases of the cell life, we would clearly need to address the problem of non-equilibrium. To further complicate the matter, as the reviewers point out, it is also unclear how much the experimental procedure (fixing the cell, hybridization, etc.) degrades the observed ensemble.
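For orientation, the arithmetic behind a two-state Boltzmann picture can be written out. This is a worked illustration under the assumption that the quoted gap of 4 k_BT is the free energy difference ΔF between the higher and lower free energy states; the labels "high" and "low" are introduced here and do not specify which of the open or closed conformations is which.

```latex
\frac{p_{\mathrm{high}}}{p_{\mathrm{low}}}
  = e^{-\Delta F / k_B T}
  = e^{-4} \approx 0.018,
\qquad
p_{\mathrm{high}}
  = \frac{1}{1 + e^{\Delta F / k_B T}}
  = \frac{1}{1 + e^{4}} \approx 0.018 .
```

Under this assumption, roughly 2% of a quasi-equilibrium ensemble would populate the higher free energy state.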

8) "The genomic distance-dependent interactions recaptures the effect of motors acting along the DNA polymer." There are also well-known loop-length-dependent entropic effects, which may be part of the observed genomic distance dependency. It would be helpful if the authors clarified whether only motor-driven active processes are at play in their model, or a mixture of both factors, and whether other hidden, unspecified events are effectively accounted for in their model. Also, a citation on the motor process here would be helpful.

Thank you for your comment. The ideal chromosome includes both standard entropic effects and motors. Indeed, the approach is agnostic as to the full set of mechanisms at play. We have clarified the issue in the text as well as added references to the ideal chromosome.

9) The reported finding that the linker region connecting two compact domains contains most of genes for segment 1 is very nice. Is this observation general?

Thank you for your encouraging comment. Unfortunately, there is not yet much experimental imaging data; therefore, the observation that we made about one particular 2Mb chromatin locus cannot yet be generalized. The next few years will truly be exciting times as more and more experimental data are generated. We look forward to more tracing/imaging experiments that examine gene-rich regions.

10) The finding reported provides guidance for further biophysical studies of chromatin, for example, simple approximations such as Gaussian network models for chromatin domains are unlikely to be successful in capturing the heterogeneity of chromatin. The authors may want to briefly discuss the Gaussian network model in light of their results.

Thank you for your comment. One of our aims was precisely to shed light on this issue. We agree that Gaussian network models will likely not be able to capture the conformational plasticity that is observed in the microscopy data, and that we need to go beyond methods that describe a single basin in the energy landscape. We have amended the text to make this point.

Revisions expected in follow-up work:

There was agreement that the value of the work could be greatly enhanced if the authors used their model to address a question of biological interest. For example, can the authors use their structural models-which have much richer details than available from experiments-to shed light on the distinction between the structural differences across different cell types and the variations among individual cells within the same cell type? What do the structural models teach us about transcription regulation?

This is a wonderful suggestion that aligns perfectly with our research agenda for the coming years. Uncovering the structure-to-function relationship of the genome is a key direction of our center and we look forward to comparing our models to the new experimental observations being made using microscopy as well as ligation methods. Currently, our models suggest that gene active segments of chromatin tend to organize at the surface of the chromosome territory, which is consistent with single-cell Hi-C experiments (Nagano et al., 2013).

https://doi.org/10.7554/eLife.60312.sa2

Article and author information

Author details

  1. Ryan R Cheng

    Center for Theoretical Biological Physics, Rice University, Houston, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    ryan.r.cheng@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6378-295X
  2. Vinicius G Contessoto

    1. Center for Theoretical Biological Physics, Rice University, Houston, United States
    2. Brazilian Biorenewables National Laboratory - LNBR, Brazilian Center for Research in Energy and Materials - CNPEM, Campinas, Brazil
    Contribution
    Resources, Data curation, Software, Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1891-9563
  3. Erez Lieberman Aiden

    1. Center for Theoretical Biological Physics, Rice University, Houston, United States
    2. Center for Genome Architecture, Baylor College of Medicine, Houston, United States
    Contribution
    Resources, Data curation, Software, Visualization
    Competing interests
    No competing interests declared
  4. Peter G Wolynes

    1. Center for Theoretical Biological Physics, Rice University, Houston, United States
    2. Department of Chemistry, Rice University, Houston, United States
    3. Department of Physics & Astronomy, Rice University, Houston, United States
    4. Department of Biosciences, Rice University, Houston, United States
    Contribution
    Formal analysis, Investigation, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Michele Di Pierro

    1. Center for Theoretical Biological Physics, Rice University, Houston, United States
    2. Department of Physics, Northeastern University, Boston, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    m.dipierro@northeastern.edu
    Competing interests
    No competing interests declared
  6. Jose N Onuchic

    1. Center for Theoretical Biological Physics, Rice University, Houston, United States
    2. Department of Chemistry, Rice University, Houston, United States
    3. Department of Physics & Astronomy, Rice University, Houston, United States
    4. Department of Biosciences, Rice University, Houston, United States
    Contribution
    Funding acquisition, Investigation, Writing - original draft, Writing - review and editing
    For correspondence
    jonuchic@rice.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9448-0388

Funding

National Science Foundation (PHY-2019745)

  • Ryan R Cheng
  • Vinicius G Contessoto
  • Erez Lieberman Aiden
  • Peter G Wolynes
  • Michele Di Pierro
  • Jose N Onuchic

National Science Foundation (CHE-1614101)

  • Jose N Onuchic

Welch Foundation (C-1792)

  • Jose N Onuchic

Cancer Prevention and Research Institute of Texas

  • Jose N Onuchic

Welch Foundation

  • Vinicius G Contessoto

Sao Paulo Research Foundation and Higher Education Personnel (2016/13998-8)

  • Vinicius G Contessoto

Higher Education Personnel Improvement Coordination (2017/09662-7)

  • Vinicius G Contessoto

D. R. Bullard-Welch Chair at Rice University (Grant C-0016)

  • Peter G Wolynes

Welch Foundation (Q-1866)

  • Erez Lieberman Aiden

NIH Office of the Director (U01HL130010)

  • Erez Lieberman Aiden

NIH Office of the Director (UM1HG009375)

  • Erez Lieberman Aiden

NVIDIA Research Center Award

  • Erez Lieberman Aiden

McNair Medical Institute Scholar

  • Erez Lieberman Aiden

United States-Israel Binational Science Foundation (2019276)

  • Erez Lieberman Aiden

USDA (559-6040-8-001)

  • Erez Lieberman Aiden

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors would like to thank Ting Wu for helpful discussions. This work was supported by the Center for Theoretical Biological Physics, sponsored by the National Science Foundation (NSF Grant PHY-2019745). JNO was also supported by NSF Grant CHE-1614101 and by the Welch Foundation (Grant C-1792). JNO is a CPRIT Scholar in Cancer Research sponsored by the Cancer Prevention and Research Institute of Texas. VGC is a Robert A Welch Postdoctoral Fellow and was also funded by FAPESP (São Paulo Research Foundation and Higher Education Personnel: Grant 2016/13998-8) and CAPES (Higher Education Personnel Improvement Coordination: Grant 2017/09662-7). Additional support to PGW was provided by the D R Bullard-Welch Chair at Rice University (Grant C-0016). ELA was also supported by the Welch Foundation (Q-1866), an NVIDIA Research Center Award, a McNair Medical Institute Scholar Award, an NIH 4D Nucleome Grant (U01HL130010), an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375), a USDA award (559-6040-8-001) and a Binational Israeli Foundation Award (2019276).

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. Yibing Shan, DE Shaw Research, United States

Reviewers

  1. Yibing Shan, DE Shaw Research, United States
  2. Huafeng Xu, Unaffiliated with any company or institution, United States

Publication history

  1. Received: June 22, 2020
  2. Accepted: October 8, 2020
  3. Accepted Manuscript published: October 13, 2020 (version 1)
  4. Version of Record published: October 28, 2020 (version 2)

Copyright

© 2020, Cheng et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,344
    Page views
  • 245
    Downloads
  • 4
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

