Abstract
Cancer cells undergo widespread changes in epigenetic patterns that mediate cancer compromised gene expression programs during cancer progression. However, the alterations in higher-order genome organization in which these changes occur and their functional implications are less well understood. To explore how chromatin structure and epigenetic parameters of genome architecture changes during cancer progression at a fine scale and genome-wide, we generated high-resolution Micro-C contact maps in non-malignant, pre-cancerous, and metastatic MCF10 breast cancer epithelial cells. We profiled progression-associated reorganization of chromatin compartments, topologically associated domains (TADs), and chromatin loops, and also identified invariable chromatin features. We find large-scale compartmental shifts occur predominantly in early stages of cancer development, with more fine-scale structural changes in TADs and looping accumulating during the later transition to metastasis. We related these structural features to changes in gene expression, histone marks, and potential enhancers and found a large portion of differentially expressed genes physically connected to distal regulatory elements. While changes in chromatin loops were relatively rare during progression, differential loops were enriched for progression-associated genes, including those involved in proliferation, angiogenesis, and differentiation. Changes in either enhancer-promoter contacts or distal enhancer activity were accompanied by differential gene regulation, suggesting that changes in chromatin contacts are not necessary but can be sufficient for gene regulation. Together, our results demonstrate a functionally relevant connection between gene regulation and genome remodeling at many key genes during cancer progression.
Key findings
The cancer genome is reorganized throughout cancer progression at the level of compartments, chromatin domains, and loops
Compartmental shifts occur in early stages of cancer development, with more fine-scale structural changes accumulating during metastasis
Chromatin domain boundaries are weakened during cancer progression
Many progression-regulated genes exhibit changes in distal enhancer histone modifications that are bridged by stable chromatin loops
Changes in enhancer activity or subtle changes in chromatin contacts can rewire enhancer-promoter connections to facilitate changes in gene expression
Prominent changes in chromatin loops occur at a small subset of differentially regulated genes during progression
Introduction
The eukaryotic genome is highly organized in the cell nucleus. Amongst the most prominent structural features are kilobase-sized chromatin loops, medium-scale topologically associating domains (TADs) and higher-order compartments (1–5). How chromatin organization contributes to epigenetic control of gene regulation, including in physiological and pathological settings, such as cancer, remains only partially understood (6–9).
A prominent mechanism to generate higher order chromatin structures is loop extrusion (10–12). During this process, the multi-component cohesin complex is loaded onto chromatin and, using its intrinsic molecular motor activity, extrudes chromatin bidirectionally along the genome to form a loop or a domain until it encounters the major chromatin architectural protein CTCF bound to convergently oriented binding sites. The encounter of cohesin with bound CTCF stalls the extrusion process and generates a chromatin loop or TAD. While loop extrusion has universally been implicated in formation of chromatin loops and domains, formation of chromatin features by other mechanisms have also been observed, especially at a smaller scale (13–16).
A common property of higher order chromatin folding is that the resulting loops, domains and compartments, bring distal genome elements into spatial proximity and into proximity of regulatory elements to their target genes has been implicated in gene regulation (2). For example, it has been suggested that one function of TADs is to facilitate the interaction of regulatory elements, particularly gene enhancers, with their target genes located within the same TAD (17–21). Similarly, long-range interactions via chromatin loops are thought to be essential at some loci to bring gene enhancers into proximity to their target promoters (22–26). However, regulation of many gene loci also appears independent of chromatin organization, and enhancer-promoter proximity often does not correlate with gene activity (27–30). In fact, acute depletion of cohesin revealed genome-wide disruption of chromatin organization but surprisingly limited impact on gene expression (31). A possible explanation for these divergent findings is that chromatin organization may be functionally more relevant to bring about changes in gene activity rather than maintenance of gene expression as suggested by several cohesin depletion studies (30,32,33).
Genome organization is likely relevant for cancer and its progression. Mutations in loop extrusion machinery, such as cohesin and the cohesin processivity factor NIPBL or at CTCF binding sites have been reported in many cancers (6,7,34–36). In addition, many structural variants (SVs) such as deletions, duplications, and translocations, have been documented in various cancer subtypes where SVs and the ensuing reorganization around them can lead to aberrant gene regulation (37–39).
Despite these observations, the full extent of genome reorganization during cancer progression, and its functional consequences, remains largely unknown. Several studies primarily focused on large scale reorganizations have found changes in higher-order chromatin organization such as chromosome clustering and dynamic compartments, some of which correlated with changes in differentially expressed oncogenes and enhancers (40–48). Analysis of TADs in various cancers have found mixed results, with some studies pointing to increased TADs and gained boundaries and others observing more stable TAD organization or weakened boundaries (42,46,47,49,50). Furthermore, cancer-associated structural variants, such as chromosomal translocations, have been related to altered gene expression, for example via enhancer-hijacking (51,52). However, how local chromatin loops and TADs are restructured during oncogenic reprogramming and how these changes relate to cancer-associated gene expression has not been well documented.
We address the question of how local and global chromatin organization changes during cancer progression, and how they related to cancer gene expression programs, by generating high-resolution Micro-C maps in the well-established MCF10 breast cancer progression model (53). This cancer model consists of three epithelial cell lines that were all originally derived from the same non-cancerous patient (54). MCF10A is an adherent epithelial pre-malignant cell line that spontaneously immortalized from the initial patient sample. Pre-malignant MCF10AT1 cells were derived from MCF10A cells by overexpressing a mutant Ha-Ras oncogene (55). When xenografted into immunocompromised mice, these cells form precancerous lesions. Metastatic MCF10CA1a cells are derived from metastatic tumors generated from xenografted MCF10AT1 cells (56). The MCF10 progression series represents a spectrum of cells that share a similar genetic background but are increasingly more cancerous and they have been widely used to study the genetic and epigenetic changes that occur during cancer progression and epithelial-mesenchymal transitions (EMT) (41,57–65).
In this study, comparing fine-scale chromatin organization and other epigenetic features in the MCF10 cancer progression model has allowed us to identify stage-specific differences in genome reorganization and relate them to changes in gene expression, including of cancer progression-associated genes. Our results provide novel insights into the principles of chromatin-mediated gene regulation and into the dynamic structure-function relationship potentially contributing to genome regulation in cancer.
Results
Mapping global and local genome organization across breast cancer progression
To understand how genome organization changes during cancer progression, we generated high-resolution (5 kb) genome-wide maps of chromatin contacts using Micro-C in the MCF10A, MCF10AT1, and MCF10CA1a cancer progression series (Fig. 1A, Supp. Fig. 1A). We obtained high-quality data with at least 1 billion Micro-C contacts per cell line, spread across 2 biological replicates with 4 technical replicates each (Supp Table 1).

Reorganization of compartments, TADs, and loops during breast cancer progression
(A) A diagram of the experimental design. Three epithelial cell lines represent various stages of breast cancer progression; MCF10A are non-cancerous, MCF10AT1 are pre-malignant, and MCF10CA1a are metastatic. In each cell line we generated 5kb resolution Micro-C to identify features such as compartments, topologically associating domains (TADs), and chromatin loops. We overlapped these features with functional changes in gene expression from RNA-Seq, histone modifications and CTCF binding from ChIP-Seq, and chromatin accessibility from ATAC-Seq. (B) Micro-C maps of a 2 Mb region of chromosome 1 in MCF10A (non-cancerous), MCF10AT1 (pre-cancerous), and MCF10CA1a (metastatic) cells at 5 kb resolution. Each map has annotations for loop calls, both static (black boxes) and differential (red boxes). Below each map is a track indicating compartment calls from CALDER (dark red is most A-like, dark blue is most B-like) as well as insulation scores tracks with static (grey) and differential (red) boundaries marked. Ribbons indicate TAD calls for each cell type. (C) Lengths of the genome assigned to each compartment in each cell type. (D) TAD and (E) loop calls from each cell type, colored by the number of maps they were initially detected in. (F) Saddle plots of interactions between regions of different compartments in MCF10A, MCF10AT1, and MCF10CA1a. Bottom plots indicate the average eigenvector value for each compartment ventile. Plots shown are for chromosomes 2, 12, and 17 (see Methods). (G) Left; Differential TAD boundaries clustered by their timing of change, depicted in line plots and heatmap. Right; aggregate plots of weakened and strengthened TAD boundaries (n=100). (H) Left; Differential chromatin loops clustered by their timing of change, depicted in line plots and heatmap. Right; aggregate plots of weakened and strengthened loops (n=100).
We identified features of chromatin organization at several levels, including large-scale reorganizations of compartments, medium-scale changes in topologically associating domains (TADs), and fine-scale changes in chromatin loops (Fig. 1B). We assigned A and B compartments using CALDER at a resolution of 10kb, TADs using SpectralTAD combined with FAN-C boundary insulation score calculations at 10kb, and loops using SIP at 5- and 10kb resolution (see Methods for details). Each cell type had similar percentages of the genome assigned to the active A (47.1-50.2%) or inactive B (49.7-52.9%) compartment, with the two cancer cell types MCF10AT1 and MCF10CA1a having a slightly higher proportion of A compartment designations (Fig. 1C). Similarly, we detected a similar number of TADs (between 7,459-7,825) and chromatin loops (between 15,713-17,332) in each cell line (Fig. 1D-E). Although the three cell lines are karyotypically similar due to their shared genetic background, they contain large-scale chromosomal duplications and translocations which were identified by SKY karyotyping analysis (Supp. Fig. 1B), and numerical chromosome aberrations based on SKY and Micro-C sequencing depth analysis were included in the analysis of all chromatin features (see Methods; Supp. Fig. 1C). After this correction, chromatin loop counts showed a high degree of reproducibility between technical and biological replicates (Supp. Fig. 1D). We used this deep dataset to characterize structural features that are reshaped during cancer progression.
Cancer progression reorganizes compartments, TADs, and chromatin loops
Comparative analysis across the three cell types identified significant changes in all major chromatin features during cancer progression (Fig. 1, Supp. Fig. 2A-B, Supp. Table 2-4).
At a large-scale, we detected changes in compartmentalization. We observe a general shift towards the more active A compartment in early cancer progression, with a larger portion of genomic regions becoming more A-like (31.0%) compared to more B-like (26.0%) in the transition from MCF10A to MCF10AT1, while these changes are more balanced in the later transition from MCF10AT1 to MCF10CA1a (30.0% and 30.6%, respectively). Interactions within the most A-like and B-like compartments were predominant in the pre-cancerous MCF10A cells (Fig 1F). However, in MCF10AT1 and MCF10CA1a, stronger interactions appear between more intermediate regions, suggesting a greater degree of intermixing that is consistent with increased compartmental heterogeneity which appears to occur early during cancer progression (Fig. 1F) (42,66).
TAD boundaries represent genomic regions where upstream and downstream sequences are partially insulated from one another, with fewer contacts between them than within (1,5,19). We detected a total of 13,231 TADs across all three cell types, with 17,097 unique boundaries. TADs detected range in size from 190kb to as large as 3.8 Mb, with a mean of 663kb and a median of 460kb (Supp. Fig. 2C). Assessing changes in insulation score (IS) at TAD boundaries revealed 3,392 (19.8%) boundaries where the degree of insulation changed significantly over the course of cancer progression. Because individual boundaries may be simultaneously used by multiple TADs, the total number of TADs which changed during progression is 5,084 (38.5%) (Fig. 1G). There are nearly three times as many boundary changes between later stages in cancer progression (1,693 differential boundaries between MCF10AT1 and MCF10CA1a) than early stages (567 between MCF10A and MCF10AT1). Interestingly, TAD boundaries that gained or lost insulation during progression showed a significant enrichment for weakened boundaries (71.2%) with far fewer boundaries exhibiting increased insulation strength as cancer progressed (28.8%; permutation test, Supp. Fig. 2D). This late-stage weakening of boundaries may reflect a more heterogeneous cell population as cancer progresses (6,67,68).
Chromatin loops are formed by two distal genomic regions that are in more frequent contact than their surrounding or intervening sequences, indicated by higher contact frequency (1,3,10,11). We found 29,205 chromatin loops across all three cell lines, ranging in size from 50 Kb to 2 Mb, with a mean of 402kb and median of 270 Kb in length (Supp. Fig. 2C). 77.6% of loop anchors coincided with CTCF peaks, representing 95.0% of loops with at least one anchor bound by CTCF, and CTCF-bound loops were stronger and longer than non-CTCF loops (Supp. Fig. 2E-F). Loop boundaries often overlapped with TAD boundaries with 52.4% of TADs consisting of loop domains across all cell lines. However, a majority (73.0%) of chromatin loops did not include TADs (Supp. Fig. 2G). TADs without loop interactions at their boundaries tended to be larger, while loops without TADs can be found at all sizes but are enriched for shorter loops (Supp Fig. 2H).

Persistent chromatin loops connect differentially expressed genes to distal enhancers.
(A) Percentages of loops designated as either promoter-promoter, enhancer-promoter, enhancer-enhancer, or single-sided promoter or enhancer loops. (B) Distributions of loop sizes by enhancer/promoter designations. P-values represent T-tests comparing the means of different loop classes. Boxplots show the median (middle line), 25th and 75th quartiles (box perimeters), and range excluding outliers (dashed line whiskers). Outliers are defined as values that are over 1.5 times the interquartile range beyond the box bounds and are excluded from these plots. (C) Distributions of loop strength by enhancer/promoter designations. (D) Percentages of upregulated genes that have gained H3K27ac at promoters, distal enhancers, both, or gained loops. P-values represent T-tests comparing the means of various loop sets. Non-significant (n.s.) represents p-values above 0.05. (E) Log2 fold-change of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loop strength (blue), when overlapped. Grey dots indicate features that do not change significantly, while colored points are significantly differential features. Boxplots are defined as in (B). P-values represent T-tests comparing the means of each class to 0. Non-significant (n.s.) represents p-values above 0.01. (F) Percentages of downregulated genes that have gained H3K27ac at promoters, distal enhancers, both, or gained loops. (G) Log2 fold-change of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loop strength (blue), when overlapped. Boxplot details as defined in (E). (H) An example of an upregulated gene (SPRY1) connected to gained enhancers by static loops. Black boxes show loop annotations. Red compartment tracks indicate A compartments, while blue indicates B compartments. In CTCF signal tracks, red highlights indicate differential CTCF peaks. In H3K27ac and ATAC-Seq signal tracks, red highlights indicate differential enhancers as determined by changes in H3K27ac. Genes highlighted in black are differentially expressed. (I) An example of downregulated genes (SCNN1G, SCNN1B) connected to lost enhancers by static loops. Plot annotations are as described in (H).
To identify loops that changed significantly during cancer progression, we assessed changes in contact frequency among every loop in each cell type, correcting for karyotypic differences that result in differences in coverage between cell lines (see Methods). We identified 1,469 chromatin loops that change significantly over the course of cancer progression, including both weakened and strengthened contacts (Fig. 1H), representing 5.0% of all identified loops. Unlike TADs there was a more balanced number of changes between early (1,004 differential loops between MCF10A and MCF10AT1) and late (1,204 between MCF10AT1 and MCF10CA1a) progression stages, as well as between strengthened (679 loops, 46.2% of all differential loops) and weakened loops (790 loops, 53.8% of all differential loops). Interestingly, only a small portion (19.0%) of differential loops were accompanied by changes in CTCF binding (Supp Fig. 2I). Motif analysis of differential loop anchors revealed only weak motifs of various transcription factors enriched at the boundaries of gained and lost loops, although occupancy did not appear high enough to explain most of the changes we observe (Supp Fig. 2J). Weakened loops were often associated with a decrease in H3K27ac, a mark of active enhancers, consistent with the notion that active enhancers can help recruit loop extrusion machinery (Supp Fig. 2K; see below) (69,70).
Taken together, these results demonstrate significant global changes in genome organization during cancer progression across multiple scales from chromatin compartments to loops.
Persistent chromatin loops connect cancer progression-regulated genes with distal regulatory features
To explore the functional role of chromatin loops, we related them to gene expression and potential regulatory regions. We identified 17,185 expressed genes across all three cell types using RNA-Seq (see Methods) and 52,953 potential enhancers as defined by overlapping histone H3K27ac ChIP-Seq and ATAC-Seq accessibility peaks, commonly used to identify enhancer regions (see Methods) (71–73).
Approximately half of chromatin loops featured some combination of active gene promoters and enhancers within 10kb of loop anchors (Fig. 2A). We found that chromatin loops that connect two features (either enhancers or promoters; mean length 251-280 kb) are typically shorter than those that contain only one feature (mean length 338-372 kb) or none (Fig. 2B; mean length 495 kb; T-test p-value for all comparisons < 2.2e-16), with promoter-promoter loops being the smallest on average (mean length 251 kb). Interestingly, enhancer-promoter and enhancer-enhancer loops (mean counts 8.2, 8.2, respectively) were stronger than promoter-promoter loops (mean counts 7.3) despite being longer on average, suggesting that epigenetic signatures associated with active enhancers may support stronger contacts (Fig. 2C; T-test p-value for both comparisons < 2.6e-7).
We then explored how differentially expressed genes relate to long-range chromatin interactions. We identified 8,840 differentially expressed genes across all pairwise comparisons in the MCF10 cancer progression (Supp Fig. 3A-B). A similar number of genes changed in later stages of cancer development (4,968 between MCF10AT1 and MCF10CA1a) compared to early progression (4,773 between MCF10A and MCF10AT1). Reassuringly, as expected from previous studies (41,64), genes associated with epithelial morphogenesis and cell adhesion were upregulated early during progression, whereas regulation of differentiation, tissue development, metabolism, and signal transduction genes was observed during later stages of progression (Supp Fig. 4B). These changes are consistent with the development of an intermediate and diverse pre-cancerous state early on during progression, while late changes are known to facilitate metastasis and support the epithelial-to-mesenchymal-like transition observed phenotypically among the progression series (Supp Fig. 3C-D) (74–77).

Changes in enhancer acetylation or enhancer-promoter contact are associated with changes in gene expression.
Boxplots of distal enhancer H3K27ac (pink), enhancer-promoter contact (blue), and ABC score (purple), as well as gene log2 fold-change (yellow) for enhancer promoter pairs that feature (A) differential H3K27ac among enhancers, (B) differential enhancer-promoter contact frequency, and (C) differential H3K27ac for enhancer-promoter pairs supported by a chromatin loop. Boxplots in (D) and (E) represent sets of non-looped enhancer-promoter pairs with differential H3K27ac that are matched to the looped set in (C) by contact and distance, respectively. Boxplots show the median (middle line), 25th and 75th quartiles (box perimeters), and range excluding outliers (dashed line whiskers). Outliers are defined as values that are over 1.5 times the interquartile range beyond the box bounds and are excluded from these plots. P-values represent T-tests comparing the means of values in MCF10A and MCF10CA1a for enhancer activity, enhancer-promoter contact, and ABC Score, and T-tests comparing the mean of the value to 0 for gene log2 fold-change. (F) Contact distribution of all enhancer-promoter pairs (dashed line), compared to the looped enhancer-promoter pairs in (C, solid line), the contact-matched pairs in (D, blue shade), and the distance-matched pairs in (E, grey shade). (F) Distance distribution of all enhancer-promoter pairs (dashed line), compared to the looped enhancer-promoter pairs in (C, solid line), the contact-matched pairs in (D, blue shade), and the distance-matched pairs in (E, grey shade). (G) Summary boxplot of the gene log2 fold-change for the enhancer-promoter pairs previously shown in figures (A-E). P-values represent T-tests comparing the means of average gene log2 fold-changes values for different sets of enhancer-promoter pairs.

Differential loops are enriched for cancer-relevant differentially expressed genes
(A) Log2 fold-change of differentially expressed genes at the anchors of gained (blue), weakened (green), or static (grey) loops. Boxplots show the median (middle line), 25th and 75th quartiles (box perimeters), and range excluding outliers (dashed line whiskers). Outliers are defined as values that are over 1.5 times the interquartile range beyond the box bounds and are excluded from these plots. P-values represent T-tests comparing the mean of each set to 0. (B) Bar plot showing the number of differentially expressed genes at strengthened or weakened loop anchors. Bar segments are colored by whether the gene is changing the same (blue for upregulated in strengthened loops, green for downregulated in weakened loops) or opposite (grey) direction as the loop. P-value represents a Fisher’s Exact Test for whether the odds ratio (OR) is greater than 1. (C) GO term enrichments for genes upregulated in MCF10A, MCF10AT1, or MCF10CA1a. Size indicates p-value. Terms are color-coded based on gene type; morphogenesis (purple), proliferation (orange), and cell adhesion (teal). (D) An example of an upregulated gene (COL12A1) with a promoter that overlaps a strengthened loop with distal enhancers. Black boxes show loop annotations, while red boxes indicate differential loops. Red compartment tracks indicate A compartments, while blue indicates B compartments. In CTCF signal tracks, red highlights indicate differential CTCF peaks. In H3K27ac and ATAC-Seq signal tracks, red highlights indicate differential enhancers as determined by changes in H3K27ac. Genes highlighted in black are differentially expressed. (E) An example of a downregulated gene (WNT5A) with a promoter that overlaps with several weakened loops containing distal enhancers that lose H3K27ac. Plots are annotated as in (A).
To understand the regulatory modes of action of genes which were differentially expressed during cancer progression, we determined if they had gained or lost the activity-associated H3K27ac mark at their promoters or at distally looped enhancers. We found that while many genes only featured a corresponding change in H3K27ac at their promoter (53.9% of upregulated and 28.6% of downregulated genes), a large percentage also showed changes in distal enhancer activity (28.8% of upregulated and 17.4% of downregulated genes), suggesting that enhancer loops may be playing an important functional role in control of these genes (Fig. 2D, F). Comparing the direction of fold-change for genes and promoter H3K27ac, distal H3K27ac, or contact frequency with distal enhancers using Fisher’s Exact test revealed odds ratios significantly higher than 1 for all comparisons (9.4, 2.2, and 1.2, respectively for changes between MCF10A and MCF10CA1a), but that there was a stronger association with promoters than enhancers or loop strength (Supp. Fig. 3E-F). This trend was similar for genes that were differentially regulated both early and late, suggesting that the role of chromatin loops is consistent across all stages of cancer progression.
Comparing the changes in acetylation at all gene promoters and distal regulatory regions revealed that upregulated genes exhibit a significant increase in H3K27ac at distally looped enhancers, as well as a significant loss of repressive H3K27me3 marks (Fig. 2E; T-test p-value for both < 2.2e-16). This trend is less clear for downregulated genes, which feature both gained and lost H3K27ac at both enhancers and promoters as well as both gained and lost H3K27me3 at distal regions (Fig. 2G). These results suggest that the static chromatin structures observed during the cancer progression process contribute to the control of differentially regulated genes, particularly among upregulated genes.
Given that only 5% of loops changed significantly during progression (see Fig. 1), it is not surprising that only a small percentage of differentially expressed genes exhibited significant changes in chromatin contacts with distal enhancers (2.1% of upregulated and 2.2% of downregulated genes; Fig. 2D, F). However, on average there is no significant change in contact frequency between gene promoters and distal enhancers. This trend was similar between both early- and late-regulated genes (Supp. Fig. 3G). Most differentially expressed genes are in regions where chromatin structure is essentially stable, reinforcing that persistent structural changes are not universally required for gene regulation. For example, the SPRY1 gene, which regulates cell growth and differentiation and has been shown to be upregulated in triple-negative breast cancer tumors (78), is upregulated between MCF10AT1 and MCF10CA1a, and is statically looped to distal enhancers that gain H3K27ac (Fig. 2H). Similarly, the SCNN1G gene, which encodes for a subunit of a sodium channel and is suppressed in head and neck squamous cell cancer (79), is downregulated between MCF10AT1 and MCF10CA1a, and is statically looped to distal enhancers that lose H3K27ac (Fig. 2I). In both cases, the contact frequency remains relatively constant despite changes in distal enhancer acetylation.
Taken together, our results show that changes in gene expression are not necessarily accompanied by structural changes, and they suggest that stable chromatin loops may facilitate cancer progression by providing a pre-existing structure through which differentially regulated distal enhancers can act on target genes.
Changes in enhancer acetylation and enhancer-promoter contact are associated with changes in gene expression
To begin to distinguish the effects of enhancer-promoter contact and chromatin looping from enhancer activity effects, we compared gene expression changes at looped and non-looped enhancer-promoter pairs. To do so, we used the activity-by-contact (ABC) model to predict functional enhancer-promoter pairs. ABC combines estimates of enhancer accessibility and activity from ATAC-Seq and H3K27ac ChIP-Seq with enhancer-promoter contact frequency from Micro-C data to generate an estimate of the likelihood of functional enhancer-gene interactions (see Methods) (80). This method allowed us to identify distal regulatory regions that are functionally linked to gene promoters without specifically requiring overlap with chromatin loops. For example, an enhancer and promoter may be in high contact as measured by Micro-C because they overlap with loop anchors, or because they are at close genomic distance.
Applying the ABC model to our data identified 150,056 potential enhancer-promoter pairs across all three cell types. Of these, 53.4% are also promoters of other genes, 23.7% are within the bodies of other genes, and 22.9% are intergenic, and range in distance from 750 to 5M base pairs away from target gene promoters (Supp Fig. 4A). To better understand the relationship between contact frequency, enhancer activity, and gene expression, we asked how changes in enhancer activity or contact relate to gene expression at target promoters.
Pairwise comparison of the MCF10 progression lines indicated that changes in both contact frequency and enhancer activity appear to drive changes in enhancer-promoter networks predicted by ABC (Fig. 3). For example, observing potential enhancers with changes in H3K27ac between MCF10CA1a and MCF10A reveals that these enhancers also exhibit a change in contact frequency and are associated with upregulation of target genes (Fig. 3A). We also find that changes in contact frequency are associated with increases in H3K27ac and correlate with higher gene expression (Fig. 3B). These results show that not only changes in either contact frequency and enhancer activity correlate with increased gene expression, but they also correlate with each other, suggesting a potentially linked functional role during enhancer-promoter communication.
To then relate enhancer-promoter pairs to chromatin loops and to orthogonally assess whether chromatin loops are acting as a functional bridge for active enhancers, we compared looped and non-looped enhancer-promoter pairs. Enhancer-promoter pairs that have changes in distal H3K27ac and are supported by chromatin loops correlated with changes in gene regulation (Fig. 3C). This effect was stronger than distance-matched non-looped enhancer-promoter pairs, but similar to contact-matched non-looped pairs, suggesting that increased contact frequency caused by loop extrusion may contribute to the stronger correlation with gene expression (Fig. 3D-E). Contact-matched non-looped pairs were closer on average to the looped pairs of similar contact frequency, while distance-matched non-looped pairs were in less-frequent contact than looped pairs of similar genomic distance (Fig. 3F-G).
Comparing the distributions of target gene fold-change for these various sets of enhancer-promoter pairs reveals several trends (Fig. 3H). First, pairs with significant changes in contact have a larger mean gene fold-change than pairs with significant changes in activity, suggesting that either can contribute to changes in enhancer-promoter functional pairing but that contact may have a particularly strong impact. Second, looped enhancer-promoter pairs have a comparable or larger mean gene fold-change to pairs with changes in activity or contact, suggesting again that chromatin loops may support functional enhancer-promoter pairs. Lastly, looped pairs have a similar mean gene fold-change as contact-matched pairs, which in turn have a higher mean gene fold-change than distance-matched pairs, suggesting that the increased contact frequency that chromatin loops provide to enhancer-promoter pairs may be a driving force for the functional connection. These trends hold true for all (tested) pairwise comparisons between cell types (Suppl Fig. 4B-C).
Taken together, these findings demonstrate that not all gene regulatory changes are accompanied by chromatin reorganization, but when it occurs reorganization appears to facilitate functional changes.
Changes in chromatin looping are enriched for progression-associated differentially expressed genes
We next explored how changes in chromatin looping may functionally contribute to gene regulation during cancer progression. To do so, we compared changes in chromatin loop contacts to alterations in expression of progression-associated genes. Overall, while only a small subset of gene promoters overlaps with the anchors of differential chromatin loops (507 genes, 3.0% of expressed genes), those that do are significantly enriched for genes that are differentially expressed during cancer progression based on permutation analysis (331 differentially expressed genes, 65.3% of all differentially looped genes; Supp. Fig. 5A).
We asked whether there was a relationship between the formation or loss of loops and differential expression of these loop-associated genes. We indeed find that differential loops are more likely to change in the same direction as the gene (i.e. increased contact frequency with distal regions associated with increased gene expression) (Fig. 4A-B). The fold-change of differential expressed genes which also showed differential loping were significantly higher than a random sampling of differentially expressed genes (Fig. 4A). For example, of 3,261 genes that were differentially upregulated between MCF10A and MCF10CA1a, loops were significantly strengthened at 98 of these genes and significantly weakened at 31. Similarly, of 3,088 downregulated genes, 65 genes overlap weakened loop anchors and 41 genes overlap strengthened loop anchors. In contrast, the number of expected genes at strengthened or weakened loops for a random sampling of genes this size is 38 and 32, respectively (Supp. Fig. 5B-C). We also find a subset of chromatin loops and genes changed in opposite directions (Fig. 4B). The genes whose changes in expression correlate with changes in looping are enriched for several cancer-relevant pathways, such as morphogenesis, differentiation, and proliferation (Fig. 4C) (81).
In total, we identified 127 unique genes upregulated in areas that experience increased chromatin contacts, either at loop anchors or within existing structures. As an example, the promoter of the COL12A1 gene, which encodes a collagen protein known to be associated with aggressive and mesenchymal phenotypes, overlaps a loop boundary that is very weak in MCF10A cells where COL12A1 is not expressed (82–84). As COL12A1 gene expression is upregulated during progression, contacts increase at a distal region 310 kb away, and H3K27ac and accessibility also increase at these likely distal regulatory regions (Fig. 4D).
Similarly, we observe 123 unique genes that are downregulated during oncogenic progression. One example is WNT5A which encodes for an important signaling protein and is downregulated in breast cancer and across MCF10A progression (85–87). Similar to COL12A1, the promoter of WNT5A is involved in many differential distal regulatory contacts, ranging in distance from 240 to 640 kb (Fig. 4E). Unlike COL12A1, these contacts are strongest in MCF10A cells and severely weaken in MCF10AT1 and MCF10CA1a cells. Accompanying these changes in contact, the distal regulatory regions that appear to support WNT5A in MCF10A cells become deacetylated and decrease in accessibility.
Genome reorganization occurs at cancer-relevant genes and is accompanied by proximal and distal changes in histone marks
Finally, we aimed to comprehensively explore the genes that are differentially regulated in areas that also have strong changes in chromatin looping, to better understand the potential regulatory mechanisms at play.
A locus-by-locus view of gene and loop fold-change allows us to view the relationship between changes in expression and structure among each pairwise comparison of cells (Fig. 5A-C). While we see that a majority of genes have a corresponding change in looping (i.e. up-regulated genes overlapping strengthened loops), we observe that the percentage of corresponding changes increases in the later stages of cancer progression. For example, the percentage of differential loop-gene pairs where the gene overlaps at least one gained loop is 47.5% and 69.9% among genes up-regulated in MCF10A compared to MCF10AT1 and MCF10CA1a, respectively, 66.7% and 79.6% among genes up-regulated in MCF10AT1 compared to MCF10A and MCF10AT1, respectively, and 70.7% and 58.0% among genes up-regulated in MCF10CA1a compared to MCF10A and MCF10AT1, respectively. This may indicate that the regulatory impacts of changes in chromatin looping occur over longer timescales, or that genes impacted by changes in chromatin structure may be more common in metastatic cells. We do not, however, find any correlation between the magnitude of loop fold-change and gene fold-change.

Progression-associated differentially expressed genes exhibit local and distal epigenetic changes at differential loops.
(A-C) Log2 fold-change of genes (colored dots) and the differential loops they overlap with (black/grey dots) for genes and loops that change between (A) MCF10A and MCF10AT1, (B) MCF10AT1 and MCF10CA1a, and (C) MCF10A and MCF10CA1a. Gene labels are below. (D) Log2 fold-change between MCF10A and MCF10CA1a of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loops (blue) among upregulated genes that overlap with gained loops (darker colors) or lost loops (lighter colors). Boxplots are defined as in (A). P-values represent T-tests comparing the mean values of the features at loops that change in the same and those that change in opposite directions from the differential genes at their anchors. Non-significant (n.s.) p-values are any p-values above 0.01. (E) Log2 fold-change of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loops (blue) among downregulated genes that overlap with gained loops (darker colors) or lost loops (lighter colors). P-values represent T-tests comparing the mean values of the features at loops that change in the same and those that change in opposite directions from the differential genes at their anchors. Non-significant (n.s.) p-values are any p-values above 0.01.
To better understand how changes in looping are related to gene expression, we compared patterns of gene expression, promoter H3K27 acetylation, and distal enhancer H3K27 acetylation or trimethylation at looped genes that change in either the same or opposite direction as the loops they overlap (Fig. 5D-E, Supp. Fig. 5D-E). These two histone H3 modifications are mutually exclusive and have opposite effects on gene expression, marking enhancers and silencers, respectively (88). Both modifications are able to impact distal gene expression via chromatin interactions (89).
Up-regulated genes between MCF10A and MCF10CA1a show similar epigenetic signatures at both proximal and distal regions, regardless of whether the loop they overlap gets stronger or weaker (Fig. 5D). Promoter regions gain H3K27ac, while distal regions gain H3K27ac and lose H3K27me3. There is however a significant difference in the extent of distal changes depending on the loop direction, with strengthened loops exhibiting a significantly higher increase in distal H3K27ac and a decrease in H3K37me3 marks. This behavior further supports the notion that these distally looped regulatory regions are important functional elements. Down-regulated genes show distinctly different signatures (Fig. 5E). Genes that overlap with weakened loop anchors show decreased promoter H3K27ac and distal H3K27ac and increased distal H3K27me3, consistent with signatures typically associated with reduced gene expression (88). Interestingly, genes that overlap strengthened loop anchors show different patterns, with a gain in promoter H3K27ac and loss of distal H3K27me3 repressive marks. We conclude that expression of a subset of progression-associated genes is strongly correlated with loop formation. These trends are similar but weaker for genes that change between different cell types (Supp. Fig. 5D-E).
Taken together, our genome-wide analysis of structural and regulatory changes during MCF10A cancer progression has highlighted hundreds of restructured regions where cancer-relevant genes are differentially regulated. These findings suggest that, while relatively rare, changes in chromatin looping may be associated with regulatory changes that drive expression of hundreds of oncogenes during cancer progression.
Discussion
Multiple levels of chromatin organization and integration with epigenetic parameters contribute to regulation of gene expression. We have identified dynamic chromatin organizational changes on multiple scales across breast cancer progression using the MCF10A model system. By comparing both a pre-malignant and metastatic cell line to non-cancerous epithelial cells we were able to detect both early- and late-stage changes as well as similarities in genome structure and relate them to gene expression changes, including many cancer-relevant genes.
We found that compartmental shifts occur more often in early stages of cancer development. This behavior is consistent with previous studies that have shown intermixing of chromatin A and B compartments in cancer cells (42,50). The general shift towards more active compartments during cancer progression may reflect a broader epigenetic landscape and the greater heterogeneity in gene expression within cancer cell populations (6,90–92).
On a finer scale, structural changes in TADs occur more often during later stages of metastasis. We also find an abundance of weakened TAD boundaries in MCF10 breast cancer progression compared with boundaries that gain insulation. This finding is in line with previous studies which have shown TAD weakening in triple-negative breast cancer (49), but is contrary to other observations which have suggested that prostate cancer is associated with the formation of many additional TADs (46), or findings of minimal changes in TAD structure in colorectal or breast cancer (42,50). The variety of observations on TAD structure in cancer may reflect the differences in the types of cancer samples analyzed or the analysis methods used. In the MCF10 model, we were able to detect subtle changes in insulation score at TAD boundaries, weakening of which could again support a more heterogeneous structure in more aggressive cancer stages.
Chromatin loops functionally connect gene promoters to distal regulatory elements. In the MCF10 model, many genes differentially regulated during cancer progression are associated with chromatin loops shared between all three cell lines, but show changes in distal enhancer H3K27ac and H3K27me3. These trends were stronger with up-regulated genes, suggesting that we may need to explore different epigenetic signatures to better understand how chromatin structure may influence repression. Activity-by-contact analysis showed that looped enhancer-promoter pairs also exhibit greater correlation between distal enhancer H3K27ac and gene expression than non-looped enhancers due to their increased contact frequency. These findings suggest that persistent chromatin loops that do not change during cancer progression nevertheless have functional relevance and that they do so by bridging enhancers to target gene promoters.
Activity-by-contact analysis also revealed that subtle changes in chromatin contact can contribute to the rewiring of enhancer-promoter regulatory connections. Enhancer-promoter pairs that exhibit changes in contact correlated with stronger changes in target gene expression than those with only changes in activity, in line with the concept that contact with distal regulatory elements is an important component of gene regulation (93). We also found that changes in chromatin contact are associated with more modest changes in activity, and vice-versa. This correlation between enhancer activity and enhancer-promoter contact further points to a functional link between the two. Together, these results suggest that both contact frequency and activity contribute to enhancer-promoter connectivity.
Strong changes in contact at chromatin loops are relatively rare across cancer progression, but many clear examples do exist, and they are notably enriched for differentially expressed genes. Interestingly, only a small portion of differential loops can be explained by changes in CTCF binding at anchors, suggesting other forces may be influencing their contact frequency. Loss of H3K27ac can be seen at the anchors of lost loops, consistent with the idea that active enhancers can help recruit cohesin to chromatin (69,70). Therefore, some weakened loops might be explained by a loss of H3K27ac which leads to a loss of cohesin at that region.
We also found that genes at differential loops are more likely to be differentially regulated in the same direction as the change in loop strength. This supports the notion that a subset of genes may be regulated in a structure-dependent manner. This interpretation is in line with previous observations which have shown a subset of genes to be sensitive to cohesin or NIPBL depletion which disrupts chromatin loops (11,17,20,25,32,94–96). Importantly, these findings suggest this subset of structure-sensitive genes may include many with direct relevance to breast cancer progression.
Epigenetic signatures at gene promoters and distal regions differed based on the direction of gene change. Up-regulated genes consistently showed a gain of active-associated H3K27ac marks at both promoters and distal regions, and a loss of distal repressive H3K27me3 marks. These changes were shared between static, strengthened, and weakened loops, although up-regulated genes at strengthened loops had stronger distal changes. These findings are consistent with the loops functionally supporting interactions with distal enhancers via increase contact. However, down-regulated genes showed more complex patterns. Fewer down-regulated genes could be explained by changes in H3K27ac or H3K27me3 at the promoter or distal regions, with over half of down-regulated genes having no clear epigenetic driver, compared to only 15% of up-regulated genes. This suggests that these histone marks are not sufficient to explain down-regulated genes as well as they can explain up-regulated genes. Down-regulated genes at differential loops also showed opposite patterns based on the direction of loop change; weakened loops showed loss of distal H3K27ac and gain of H3K27me3 consistent with an inactivated enhancer, while strengthened loops showed the opposite. Additional studies will be required to fully understand the repressive mechanisms in this system and how they relate to chromatin structure.
Our study has several limitations. Based on its largely correlative nature, we are unable to determine the causal relationships of chromatin structure changes at the loci of differentially expressed genes. Follow-up studies exploring any of the hundreds of gene loci identified here as possibly being contact-dependent could help elucidate these relationships. Additionally, as a population-level assay, Micro-C is only able to provide aggregate data across an entire population of cells. To address how cell-to-cell heterogeneity contributes to some of the functional relationships we observe, and whether that heterogeneity is occurring at a cellular or population level, we would need to apply single-cell sequencing or imaging-based approaches. These questions will be the subjects of future studies.
Taken together, using the powerful MCF10 breast cancer model, we have generated a rich genomic dataset of structural and functional changes in the genome during breast cancer progression. Our data uncovers new insights into the structure function relationship in gene regulation and into the role of genome organization during malignant breast cancer progression.
Methods
Cell culture
MCF10A cells were obtained from AATC (CRL-10317). MCF10AT1 (MCF-10AneoT) and MCF10CA1a (MCF10CA1a.cl1) cells were obtained from the Karmanos Cancer Institute. All cells were received frozen, thawed in media and grown to a confluence of 80% after 2-5 passages. Stock solutions were frozen down to be used for all experiments.
MCF10A and MCF10AT1 cells were cultured in standard high calcium medium with growth factors, consisting of DMEM/F12 (Invitrogen, 21041025) supplemented with 1.05 mM CaCl2, 10 mM HEPES, 10 ug/ml insulin (Sigma, I1882), 20 ng/ml EGF (Peprotech, AF-100-15), 0.5 ug/ml Hydrocortisone (Sigma, H0135), 100 ng/ml cholera toxin (Sigma, C8052), and 5% horse serum (Invitrogen, 16050122). MCF10CA1a cells were cultured in standard high calcium medium without growth factors, consisting of DMEM/F12, 1.05 mM CaCl2, 10 mM HEPES, and 5% horse serum.
Karyotypic analysis
Metaphase chromosomes were prepared by incubating cells with 100 ng/ml Colcemid (Roche, Brighton, MA) for two hours, followed by mitotic shake-off. The mitotic cells were then treated with a hypotonic solution (0.075M KCl) for 20 minutes at 37°C. After this treatment, the cells were centrifuged, the supernatant was extracted, and the cells were initially fixed with a methanol/acetic acid solution (3:1). This step was repeated three times. Finally, the cells were fixed onto a slide using a humidity-controlled Thermotron (Thermotron, Holland MI).
SKY probes were purchased from Applied Spectral Imaging (Carlsbad, CA) and hybridized onto slides that were aged at 37°C for 3 days. Detection was then carried out according to the protocol provided by Applied Spectral Imaging, using the following purchased antibodies: Avidin Cy5 (Rockland Immunochemicals, Limerick, PA), Mouse Anti-Digoxin antibody (Sigma-Aldrich), and a goat anti-mouse antibody conjugated to Cy5.5 (Rockland Immunochemicals). Slides were then mounted and counterstained with DAPI (Vector Laboratories, Newark, CA). Hybridization occurred over a period of two days at 37°C. Approximately 20-25 metaphases were imaged and karyotyped using the ASI GenASIS 8.2.2 software on an Olympus BX63 microscope (Evident, Tokyo, Japan) equipped with a Spectral Cube (Applied Spectral Imaging, Carlsbad, CA).
Micro-C library preparation
Cells were frozen and pellets of 1M cells were used for Micro-C library preparation. The Micro-C library was prepared using the Dovetail® Micro-C Kit according to the manufacturer’s protocol. Briefly, the chromatin was fixed with 0.3M disuccinimidyl glutarate (DSG) and 37% formaldehyde in the nucleus. The cross-linked chromatin was then digested in situ with micrococcal nuclease (MNase). Following digestion, cells were lysed with 20% SDS to extract the chromatin fragments and the chromatin fragments were bound to Chromatin Capture Beads. Next, the chromatin ends were repaired and ligated to a biotinylated bridge adapter followed by proximity ligation of adapter-containing ends. After proximity ligation, the crosslinks were reversed, the associated proteins were degraded, and the DNA was purified then converted into a sequencing library using Illumina-compatible adaptors. Biotin-containing fragments were isolated using streptavidin beads prior to PCR amplification. Each library was sequenced on an Illumina Novaseq platform to generate 240-340 million 2 x 150 bp read pairs (average depth of 280M reads). Mico-C libraries were prepared by Dovetail Genomics (Scotts Valley, CA).
RNA-Seq library preparation
Total RNA was isolated from cells using Trizol (Life Technologies) and purified using the Direct-zol RNA kit (Zymo Research, Irvine, CA, USA: R2050). RNA quality and quantity were assessed using the RNA 6000 Nano Kit with the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). RNA quantity was further assessed using a Nanodrop2000 (Thermo Scientific, Lafayette, CO) and Qubit HS RNA assay (Thermo Fisher Scientific). Total RNA was depleted of ribosomal RNA (New England Biolabs, NEB #7400), reverse transcribed using a random hexamer strategy, and strand-specific adapters were added following the NEBNext Ultra II RNA-seq library prep kit (New England Biolabs, E7770). Paired-End sequencing was used to generate high depth coverage ranging from 80 to 120 million paired-end reads.
ChIP-Seq library preparation
ChIPseq for CTCF (Cell Signaling Technology, catalog number 3418) and histone marks H3K27ac (Abcam, ab4729) and H3K27Me3 (Abcam ab6002). Independent biological replicates of each cell line (MCF10A, MCF10AT1, MCF10CA1a) were performed as described, including the optional step of snap freezing of chromatin and excluding the optional third washing step (97). Additionally, the chromatin was precleared with Pierce protein A/G beads for 2 hours at 4°C prior to incubation with antibodies. For CTCF we used 20ul of antibody (Cell Signaling Technologies, 3418S) and 150ug of chromatin for each sample. For H3K27ac 10ug of chromatin was used and 2ug of antibody, while for H3K27me3 15ug of chromatin was used with 4ug of antibody.
ATAC-Seq library preparation
The OMNI-ATACseq protocol was followed as previously described, with an optimized 5 minutes of nuclear extraction rather than 3 minutes (98,99).
Micro-C processing
Micro-C data from each technical replicate (library) was processed from raw reads into contact maps using guidelines outlined by Dovetail Genomics (100). Paired reads were aligned to the hg38 human genome assembly (NCBI GRCh38) using BWA mem (version 0.7.17; settings: -5SP -T0) (101). Pairtools (version 0.3.0) was then used to parse ligations, sort reads, and remove PCR duplicates using the parse (settings: --min-mapq 40 --walks-policy 5unique --max-inter-align-gap 30), sort, and dedup (settings: --mark-dups) commands (102). Alignment files were generated using pairtools split to create .pairs and .sam files, and samtools view (version 1.17; settings: -bS) to create .bam files (103). Final .bam alignment files were sorted and indexed using samtools sort and index commands. The .bam files were used to extract QC metrics using a script from Dovetail Genomics, and calculate complexity using preseq lc_extrap (settings: -bam -pe -extrap 2.1e9 -step 1e7 - seg_len 1000000000). Pairs files were used to generate contact maps using juicer_tools pre (version 1.22.01), and normalized using SCALE (104).
For merged contact maps, we first merged pairs files using pairtools merge, and then ran juicer_tools pre command on the resulting .pairs files. In total, we generated contact maps for each library (technical replicates), each sample (biological replicates), each cell type, and we created one fully merged map including all reads for a precise map of shared features.
Micro-C copy number variation analysis
Structural variant and copy number variations were detected from contact maps using NeoLoopFinder calculate-cnv at 1Mb resolution (-e uniform) (105). Denylists of regions to ignore for feature calling were generated based on regions where SCALE normalization factors were unable to be calculated at 5kb or 10kb, ignoring single bins and merging continuous areas within 100 kb. Activity-by-contact analysis also combined this denylist with the ENCODE denylist for hg38 (106).
Micro-C compartment calling
Compartments were called using CALDER (version 2.0; R version 4.2.1) at 10kb resolution and SCALE normalization (107). We also used FAN-C (version 0.9.21) to calculate eigenvectors at 100kb using SCALE normalization to build saddle plots, and oriented eigenvectors manually based on overlap with active genes (108).
Micro-C topologically associated domain calling
Topologically associated domains (TADs) were called using SpectralTAD (version 1.18.0) at 10kb resolution with SCALE normalization, a window size of 300, and a level of 3, with a quality filter applied (109). We then created a unified TAD list by merging TADs that overlap at both ends between cell types, with a maximum gap of 10kb (1 pixel). We then combined this analysis with continuous insulation score (IS) calculations from FAN-C insulation command at 10kb resolution with SCALE normalization. We then used the FAN-C boundaries command to detect IS boundaries and only kept TADs that also overlapped with an IS boundary.
Micro-C chromatin loop calling
Chromatin loops were called using SIP (version 1.6.2), run at 5kb and 10kb and then merged to 10kb (-g 2 -fdr 0.05). Loops were called in each cell type map using SCALE normalization, in addition to the combined map, and then merged similarly to TADs to create a unified loop list (110).
RNA-Seq processing
All RNA-seq data was analyzed using the nf-core/rnaseq pipeline (111). Adapter and quality trimming was performed with Trim Galore (112). Reads were aligned to the hg38 reference genome using STAR (113) and gene expression was quantified using Salmon (114). Differentially expressed genes were called using DESeq2 (115). An adjusted p-value of 0.05 and a log2fold change of 1 were used as thresholds to select significant differential expression.
ChIP-Seq processing and peak calling
Adapters were cut (cutadapt v1.11) and low quality reads trimmed (Galaxy FASTQ Quality Trimmer 1.0.0; window 10, step 1, minimum quality 20). Reads were mapped to the human genome (hg38 canonical) using STAR version 2.4 (113) with splicing disabled (– alignIntronMax 1). Enriched regions (narrowPeak calls) for each replicate were generated using MACS2 (Feng et al., 2012) and replicates were then evaluated using deepTools (116) to correlate alignments and IDR (117) to evaluate peak call reproducibility. After pooling replicates, MACS2 (118) was used on H3K4me3 to call narrowPeak at high stringency (P- value <10e-5), these peaks were further filtered according to IDR cutoffs. FE wiggle tracks were generated using MACS2’s bdgcmp and UCSC’s bedGraphToBigwig utility.ChIP-seq datasets have been deposited in the Gene Expressions Omnibus (GEO) under accession code GSE98551 for CTCF and XYZ for H3K27ac and H3K27me3.
ATAC-Seq processing and peak calling
Read trimming and quality filtering was performed using FastQC (119) and TrimGalore (112). The filtered fastq were then downsampled to approximately 50 million reads per replicate. Reads were aligned to the hg38 reference genome using Bowtie2 (120). Mitochondrial, Multi-mapped, and low quality reads, and duplicates were removed using samtools (103). MACS2 (118) was used to call narrowPeaks, followed by IDR (117) to generate sample level peak sets.
Enhancer and promoter definitions
Gene promoters were defined as regions between 2000 bp upstream and 500 bp downstream of gene transcription start sites.
Potential enhancer regions were identified based on regions that contained both an ATAC-Seq and H3K27ac ChIP-Seq peak. For activity-by-contact analysis, potential enhancers were defined as 150,000 ATAC-Seq peaks with the highest levels of H3K27ac signal, but were subset for regions with H3K27ac peaks after running (see below).
Compartmental saddle plots
Saddle plots were made manually in R. We selected three chromosomes that had no major karyotypic differences between cell lines and had high correlation of eigenvectors between replicates with the same signage (chr2, chr12, chr17). For each chromosome and cell type, we sorted eigenvectors into 20 bins. We then calculated the mean observed/expected values (using SCALE normalization) between regions belonging to different bins, and plotted it as a heatmap.
Differential loop, TAD, gene, and peak analysis
Differential genes were calculated using DESeq2 (version 1.42.0) (115). Each cell type had 3 replicates, and a design of ∼cellType was used. No fold-change cutoff was applied. Genes with an adjusted p-value below 0.01 were considered significant.
Differential H3K27ac within ATAC-Seq peaks were calculated using a similar design, but with a p-value cutoff of 0.05.
Differential loops were also identified using DESeq2, but with additional adjustments. Raw, un-normalized loop counts were pulled from each technical replicate map, resulting in 8 values per cell type (4 technical replicates for each of 2 biological replicates per cell type). An LRT design was used of ∼technicalRep + biologicalRep + cellType, with ∼technicalRep + biologicalRep as the comparison. Size factors were also provided manually based on NeoLoopFinder output (Supp. Table). A fold-change cutoff of 1.5 was applied, as was an adjusted p-value cutoff of 0.025.
Differential TAD boundaries were detected using an alternative method. Insulation scores were pulled from all TAD boundaries at the technical replicate level (8 values per cell type). A T-test was then applied for each pairwise comparison of cell types, and p-values were adjusted using FDR. Boundaries with an adjusted p-value below 0.01 were considered significant.
Differential loops, TAD boundaries, and genes were clustered based on their patterns of change across the three cell types using k-means clustering of centered and scaled normalized counts.
Micro-C feature overlap and aggregate analysis
Micro-C feature overlap and analysis was conducted in R primarily using the GenomicRanges, InteractionSet, and mariner packages (121–123). Aggregate matrices of loop pixels and TAD boundaries were generated using mariner and visualized using plotgardener (123,124).
Activity-by-contact
The activity-by-contact was used based on Fulco et al. 2019 with slight adjustments (125). To allow for direct comparison of all enhancer-promoter pairs across cell lines, we manually defined potential enhancer regions and used the same input for each cell type. These potential enhancer regions were defined as they typically are in the pipeline, by finding 150,000 ATAC-Seq peaks with the highest H3K27ac levels. The output was filtered using a suggested ABC score cutoff of 0.025 to identify likely enhancer-promoter pairs. To allow for direct comparison with our other enhancer analysis, we filtered the output based on the enhancer regions also overlapping H3K27ac, which still represented a majority of the valid pairs identified.
Matched enhancer-promoter sets
Covariate-matched subset selection among non-looped enhancer-promoter pairs was performed using the matchRanges function from the nullranges package (126,127). Enhancer-promoter pair distance or mean Micro-C contact frequency were used as covariates. Matching was done with the stratified matching method without replacement.
Gene ontology and gene set enrichment analysis
Gene ontology (GO) term enrichment was performed in R using the gprofiler2 package (127). Gene set enrichment analysis (GSEA) was performed with the GSEA software, using size factor normalized RNA-seq counts as input (128) and the Hallmark H1 gene set.
ATAC-Seq motif analysis
Motif analysis of ATAC-Seq peaks within strengthened and weakened loop anchors was conducted using the HOMER suite findMotifsGenome.pl script with size given(129). ATAC-Seq peaks within the anchors of static loop anchors were used as background.
H3K27ac peak pileup analysis
H3K27ac ChIP-Seq analysis in the anchors of gained, lost, and static loops was conducted using deeptools (116). Alignment files were normalized using RPGC with the bamCoverage function, and adjusted using scale factors generated from edgeR TMM normalization factors of counts from overlapping H3K27ac and ATAC-Seq peaks (130). Aggregate profile plots were then created using the plotProfile command.
Genomic data visualization
Micro-C contact frequency maps, aggregate analysis plots, gene annotations, and genomic signal tracks (RNA-Seq, ChIP-Seq, ATAC-Seq) were visualized and plotted in R using the plotgardener package (124).
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and materials
The Micro-C datasets generated and analyzed during the current study are available in the GEO repository, accession GSE254045. CTCF ChIP-Seq data was previously published under GEO repository GSE98551. H3K27ac ChIP-Seq data was previously published under GEO repository GSE229295.
The code used to generate figures from the current study are available on Github at the following repository: https://github.com/ksmetz/MCF10-MicroC.
Funding
This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research (grant no. ZIABC010309-24 to T.M.), the Northern New England Clinical Translation Network (grant no. GM115516 to G.S.), funding from the National Cancer Institute (grant no. P01CA240685 to G.S., J.S., and S.F.), and funding from the Charlotte Perelman and Arthur Jason Perelman Fund for Cancer Research (G.S. and J.S.).
Authors’ contributions
KSMR, TM, AF, HG, JS, and GS conceptualized the project, designed the experiments, and interpreted the results. TM, SF, GS, and JS obtained funding. KSMR and AF executed experiments. KSMR and HG performed computational processing and analysis. SF conducted ChIP-Seq processing. KHH coordinated SKY karyotyping. KSMR and TM drafted the manuscript. KSMR, AF, HG, KHH, JS, GS, and TM participated in reviewing and editing the manuscript.
Supplemental figures

Karyotypic analysis and Micro-C loop reproducibility.
(A) Brightfield microscopy images of MCF10A, MCF10AT1, and MCF10CA1a cell lines. (B) Representative SKY karyotype example images from MCF10A, MCF10AT1, and MCF10CA1a cell lines. (C) Copy number variation (CNV) factors for loops across the genome as generated from NeoloopFinder. Highlighted regions indicate areas of karyotypic differences that were corrected for in the identification of differential loops; blue is regions with higher CNV in MCF10A, yellow is regions with higher CNV in MCF10AT1, and red is regions with higher CNV in MCF10CA1a. Regions that have shared karyotypic abnormalities among all three cell lines, such as the q arm of chromosome 5 which is duplicated and translocated to chromosome 9 in all cell types, did not require correction. (D) Micro-C chromatin loop count principal component analysis (PCA). Blue circles indicate MCF10A samples, yellow indicate MCF10AT1, and red indicates MCF10CA1a.

Differential loop and TAD feature details.
(A) The number of chromatin loops that exhibit a significant increase (blue) or decrease (green) in contact frequency between each pairwise comparison of cell types. (B) The number of topologically associating domain (TAD) boundaries that exhibit a significant increase (blue) or decrease (green) in insulation score between each pairwise comparison of cell types. (C) Size distributions of chromatin loops and TADs. (D) Permutation test results of weakened boundaries observed in each pairwise comparison as compared to a random sampling. (E) Percent of loops with one or both anchors overlapping CTCF ChIP-Seq peaks based on the strength of the loop (20 bins). (F) Boxplots of (left to right) maximum loop Micro- C counts, maximum loop count log2 fold-change, and loop length based on loop-CTCF class; neither (grey), one anchor overlap (light red), or both anchors overlap (dark red). (G) Venn diagram showing the degree of overlap between chromatin loops and TADs. (H) Feature sizes of loops that do not overlap TADs (light grey), loops that do overlap TADs (dark grey), TADs that do overlap loops (dark blue), and TADs that do not overlap loops (light blue). (I) Pie chart of the percentage of differential loops that have a correlated change in CTCF binding at either anchor; light grey indicates static CTCF peaks, dark grey indicates CTCF peaks that change in the opposite way as the loop (i.e. a strengthened loop with loss of CTCF binding), and dark red indicates CTCF peaks that change in the same way as the loop (i.e. a strengthened loop with gain of CTCF binding). (J) De novo motif results from HOMER for ATAC-Seq peaks within the anchors of gained/strengthened (top) and lost/weakened (bottom) chromatin loops. (K) H3K27ac ChIP-Seq aggregate profiles at the anchors of gained/strengthened, lost/weakened, and static loops.

Differential gene expression patterns across breast cancer progression.
(A) The number of genes that exhibit a significant increase (orange) or decrease (gold) in expression between each pairwise comparison of cell types. (B) Differential genes clustered by their timing of change, depicted in line plots and heatmap. (C) Gene ontology (GO) term enrichment for genes differential expressed in each pairwise comparison of cells. (D) Gene set enrichment analysis (GSEA) results showing gene sets enriched among genes differentially expressed early (MCF10AT1, top) or late (MCF10CA1a, bottom) in breast cancer progression. (E) The number of promoter H3K27ac peaks (left), distal enhancer H3K27ac peaks (middle), or loops (right) that change in a positive (grey) or negative (gold, red, blue) direction based on their overlap with up-regulated or down-regulated genes. (F) Same plots as (E) but subset for significantly differential features. (G) Percentages of up-regulated genes (left) and down-regulated genes (right) that have correlated changes in H3K27ac at promoters (gold), distal enhancers (red), both (orange), or loop contact frequency (blue), by pairwise comparison of cell types.

Enhancer-promoter pair details.
(A) Distribution of enhancer-promoter distances as predicted by the activity-by-contact (ABC) model. (B) Boxplots of distal enhancer H3K27ac (pink), enhancer-promoter contact (blue), and ABC score (purple), as well as gene log2 fold-change (yellow) for enhancer promoter pairs that feature (top to bottom) differential H3K27ac among enhancers, differential enhancer-promoter contact frequency, differential H3K27ac for looped enhancer-promoter pairs, contact-matched non-looped enhancer-promoter pairs, and distance-matched non-looped enhancer-promoter pairs. Comparison shows changes between MCF10A and MCF10AT1. (C) Same as (B) but showing comparisons between MCF10AT1 and MCF10Ca1a.

Differential genes within differential loops.
(A) The number of differential expressed genes among all genes (dashed light grey line), genes overlapping static loop anchors (dashed dark grey line), genes overlapping differential loop anchors (red line), and a permutation test of a random sampling of a similar number of genes (black line). (B) Permutation test results (n=1,000) for the number of up-regulated genes that overlap with strengthened/gained loops (red) compared to a random sampling (black). (C) Permutation test results (n=1,000) for the number of down-regulated genes that overlap with weakened/lost loops (red) compared to a random sampling (black). Boxplots in (D) and (E) represent log2 fold-change between MCF10A and MCF10AT1 (D) or MCF10AT1 and MCF10CA1a (E) of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loops (blue) among upregulated or downregulated genes that overlap with gained loops (darker colors) or lost loops (lighter colors. Boxplots are defined as in (A). P-values represent T-tests comparing the mean values of the features at loops that change in the same and those that change in opposite directions from the differential genes at their anchors. Non-significant (n.s.) p-values are any p- values above 0.01.
Acknowledgements
We thank Kathleen Quinn and Joseph Boyd for their work building the ChIP-Seq libraries and processing the ChIP-Seq data, respectively. We thank Darawalee Wangsa and Danny Wangsa in the CCR Genetics Branch and OMICS Technology Facility at the NCI for their expert SKY analysis. We thank Alquassem Abuorquob for his assistance in ChIP-Seq library preparation. STR analyses and Next Generation Sequencing was in part done with the assistance of the Vermont Integrated Genomics Resource (VIGR) at the Vermont Cancer Center, University of Vermont and the Genomics Sequencing Facility (GSF) at Greehey Children’s Cancer Research Institute UT Health San Antonio. We thank the members of the Misteli and Stein labs for feedback. We thank Jordan Zhang, Misha Gattengo, and Sierra Wilson at Dovetail Genomics for coordinating Micro-C library preparation and preliminary analysis.
Additional files
Additional information
Funding
National Cancer Institute (ZIABC010309-24)
National Cancer Institute (P01CA240685)
Northern New England Clinical Translation Network (GM115516)
Charlotte Perelman and Arthur Jason Perelman Fund for Cancer Research
References
- 1.The 3D genome as moderator of chromosomal communicationCell 164:1110https://doi.org/10.1016/j.cell.2016.02.007Google Scholar
- 2.Understanding 3D genome organization by multidisciplinary methodsNat Rev Mol Cell Biol 22:511–28https://hal.science/hal-03269159Google Scholar
- 3.The self-organizing genome: Principles of genome architecture and functionCell 183:28Google Scholar
- 4.Three-dimensional genome architecture: players and mechanismsNature Reviews Molecular Cell Biology 16:245–57https://www.nature.com/articles/nrm3965Google Scholar
- 5.Organizational principles of 3D genome architectureNature Reviews Genetics 19:789–800https://www.nature.com/articles/s41576-018-0060-8Google Scholar
- 6.The three-dimensional cancer genomeCurr Opin Genet Dev 36:1–7https://doi.org/10.1016/j.gde.2016.01.002Google Scholar
- 7.Regulation of 3D Organization and Its Role in Cancer BiologyFront Cell Dev Biol 10:879465Google Scholar
- 8.Disorder of three-dimensional chromosome structure plays a role in carcinogenesisClinical and Translational Discovery 1:e17https://onlinelibrary.wiley.com/doi/full/10.1002/ctd2.17Google Scholar
- 9.Rewiring cancer: 3D genome determinants of cancer hallmarksCurr Opin Genet Dev :91https://doi.org/10.1016/j.gde.2024.102307Google Scholar
- 10.Genome folding through loop extrusion by SMC complexesNature Reviews Molecular Cell Biology 22:445–64https://www.nature.com/articles/s41580-021-00349-7Google Scholar
- 11.New insights into genome folding by loop extrusion from inducible degron technologiesNature Reviews Genetics 24:73–85https://www.nature.com/articles/s41576-022-00530-4Google Scholar
- 12.Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomesProc Natl Acad Sci U S A 112:E6456–65https://www.pnas.org/doi/abs/10.1073/pnas.1518552112Google Scholar
- 13.Region Capture Micro-C reveals coalescence of enhancers and promoters into nested microcompartmentsNature Genetics 55:1048–56https://www.nature.com/articles/s41588-023-01391-1Google Scholar
- 14.Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin FoldingMol Cell 78:539–553https://doi.org/10.1016/j.molcel.2020.03.002Google Scholar
- 15.In vitro reconstitution of chromatin domains shows a role for nucleosome positioning in 3D genome organizationNature Genetics 56:483–92https://www.nature.com/articles/s41588-023-01649-8Google Scholar
- 16.Organizational principles of 3D genome architectureNat Rev Genet 19:789–800https://doi.org/10.1038/s41576-018-0060-8Google Scholar
- 17.Structural perturbation of chromatin domains with multiple developmental regulators can severely impact gene regulation and developmentbioRxiv. https://www.biorxiv.org/content/10.1101/2024.08.03.606480v1Google Scholar
- 18.Deletion of a single CTCF motif at the boundary of a chromatin domain with three FGF genes disrupts gene expression and embryonic developmentDev Cell https://www.cell.com/action/showFullText?pii=S1534580725000644Google Scholar
- 19.TADs: Dynamic structures to create stable regulatory functionsCurr Opin Struct Biol 81:102622Google Scholar
- 20.How subtle changes in 3D structure can create large changes in transcriptioneLife. 10:e64320https://doi.org/10.7554/eLife.64320Google Scholar
- 21.The stochastic nature of genome organization and functionCurr Opin Genet Dev 72:45–52PubMedGoogle Scholar
- 22.Pre-established Chromatin Interactions Mediate the Genomic Response to GlucocorticoidsCell Syst 7:146–160PubMedGoogle Scholar
- 23.Cohesin is required for long-range enhancer action at the Shh locusNature Structural C Molecular Biology 29:891–7https://www.nature.com/articles/s41594-022-00821-8Google Scholar
- 24.Long-range enhancer-promoter contacts in gene expression controlNat Rev Genet 20:437–55PubMedGoogle Scholar
- 25.Loop extrusion by cohesin plays a key role in enhancer-activated gene expression during differentiationbioRxiv https://www.biorxiv.org/content/10.1101/2023.09.07.556660v1
- 26.Coming full circle: On the origin and evolution of the looping model for enhancer–promoter communicationJ Biol Chem 298:102117Google Scholar
- 27.Decreased Enhancer-Promoter Proximity Accompanying Enhancer ActivationMol Cell 76:473–484PubMedGoogle Scholar
- 28.Functional dissection of the Sox9–Kcnj2 locus identifies nonessential and instructive roles of TAD architectureNature Genetics 51:1263–71https://www.nature.com/articles/s41588-019-0466-zGoogle Scholar
- 29.Synergy between cis-regulatory elements can render cohesin dispensable for distal enhancer functionbioRxiv https://doi.org/10.1101/2024.10.04.615095Google Scholar
- 30.Enhancer-promoter interactions and transcription are largely maintained upon acute loss of CTCF, cohesin WAPL or YY1. Nat Genet 54:1919–32PubMedGoogle Scholar
- 31.Cohesin Loss Eliminates All Loop Domains. Cell 171:305–320Google Scholar
- 32.Control of inducible gene expression links cohesin to hematopoietic progenitor self-renewal and differentiationNat Immunol 19:932–41PubMedGoogle Scholar
- 33.Genome-wide in vivo dynamics of cohesin-mediated loop extrusion and its role in transcription activationbioRxiv https://www.biorxiv.org/content/10.1101/2024.10.02.616323v1
- 34.Chromatin structure in cancerBMC Mol Cell Biol 23:1–10https://bmcmolcellbiol.biomedcentral.com/articles/10.1186/s12860-022-00433-6Google Scholar
- 35.Mutation hotspots at CTCF binding sites coupled to chromosomal instability in gastrointestinal cancersNat Commun 9PubMedGoogle Scholar
- 36.CTCF/cohesin-binding sites are frequently mutated in cancerNat Genet 47:818–21PubMedGoogle Scholar
- 37.Structural Variation in Cancer: Role Prevalence, and Mechanisms. Annu Rev Genomics Hum Genet 23:123–52PubMedGoogle Scholar
- 38.The impact of translocations and gene fusions on cancer causationNat Rev Cancer 7:233–45PubMedGoogle Scholar
- 39.Enhancers dysfunction in the 3D genome of cancer cellsFront Cell Dev Biol 11Google Scholar
- 40.Chromatin interaction analysis reveals changes in small chromosome and telomere clustering between epithelial and breast cancer cellsGenome Biol PubMedGoogle Scholar
- 41.Intranuclear and higher-order chromatin organization of the major histone gene cluster in breast cancerJ Cell Physiol 233:1278Google Scholar
- 42.Large-Scale Topological Changes Restrain Malignant Progression in Colorectal CancerCell 182:1474Google Scholar
- 43.Hi-C profiling of cancer spheroids identifies 3D-growth-specific chromatin interactions in breast cancer endocrine resistanceClin Epigenetics 13:1–13https://clinicalepigeneticsjournal.biomedcentral.com/articles/10.1186/s13148-021-01167-6Google Scholar
- 44.Estrogen Induces Global Reorganization of Chromatin Structure in Human Breast Cancer CellsPLoS One 9:e113354Google Scholar
- 45.Oncogene-mediated alterations in chromatin conformationProc Natl Acad Sci U S A 109:9083–8Google Scholar
- 46.Three-dimensional disorganization of the cancer genome occurs coincident with long-range genetic and epigenetic alterationsGenome Res 26:719–31https://genome.cshlp.org/content/26/6/719.fullGoogle Scholar
- 47.The 3D genomic landscape of differential response to EGFR/HER2 inhibition in endocrine-resistant breast cancer cellsBiochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms 1863:194631Google Scholar
- 48.Temporal dynamic reorganization of 3D chromatin architecture in hormone-induced breast cancer and endocrine resistanceNature Communications 10:1–14https://www.nature.com/articles/s41467-019-09320-9Google Scholar
- 49.Comparative characterization of 3D chromatin organization in triple-negative breast cancersExperimental C Molecular Medicine 54:585–600https://www.nature.com/articles/s12276-022-00768-2Google Scholar
- 50.Loss of multi-level 3D genome organization during breast cancer progressionbioRxiv Google Scholar
- 51.3D genomic analysis reveals novel enhancer-hijacking caused by complex structural alterations that drive oncogene overexpressionNature Communications 15:1–15https://www.nature.com/articles/s41467-024-50387-wGoogle Scholar
- 52.Genome-wide detection of enhancer-hijacking events from chromatin interaction data in rearranged genomesNature Methods 18:661–8https://www.nature.com/articles/s41592-021-01164-wGoogle Scholar
- 53.The MCF10 Model of Breast Tumor ProgressionCancer Res 81:4183–5Google Scholar
- 54.Isolation and characterization of a spontaneously immortalized human breast epithelial cell line, MCF-10Cancer Res 50:6075–86https://europepmc.org/article/med/1975513Google Scholar
- 55.MCF10AT: a model for the evolution of cancer from proliferative breast diseaseAm J Pathol 148:313Google Scholar
- 56.Malignant MCF10CA1 cell lines derived from premalignant human breast epithelial MCF10AT cellsBreast Cancer Res Treat 65:101–10PubMedGoogle Scholar
- 57.Regulation of miRNA-29c and its downstream pathways in preneoplastic progression of triple-negative breast cancerOncotarget 8:19645Google Scholar
- 58.Differential expression of acidic proteins with progression in the MCF10 model of human breast diseaseInt J Oncol 31:941–9http://www.spandidos-publications.com/10.3892/ijo.31.4.941/abstractGoogle Scholar
- 59.m6A regulates breast cancer proliferation and migration through stage-dependent changes in Epithelial to Mesenchymal Transition gene expressionFront Oncol 13:1268977Google Scholar
- 60.Delineating Genetic Alterations for Tumor Progression in the MCF10A Series of Breast Cancer Cell LinesPLoS One 5:e9201Google Scholar
- 61.Analyses of BMAL1 and PER2 Oscillations in a Model of Breast Cancer Progression Reveal Changes With MalignancyIntegr Cancer Ther 18:1534735419836494Google Scholar
- 62.Three-dimensional modelling identifies novel genetic dependencies associated with breast cancer progression in the isogenic MCF10 modelJ Pathol 240:315Google Scholar
- 63.Cytogenetic and cDNA Microarray Expression Analysis of MCF10A Human Breast Cancer Progression Cell LinesCancer Res 69:5946Google Scholar
- 64.Molecular signatures associated with transformation and progression to breast cancer in the isogenic MCF10 modelGenomics 92:419–28Google Scholar
- 65.Differential Expression of Key Signaling Proteins in MCF10 Cell Lines, a Human Breast Cancer Progression ModelMol Cell Pharmacol 4:31Google Scholar
- 66.Heterogeneity in chromatin structure drives core regulatory pathways in B-cell Acute Lymphoblastic LeukemiabioRxiv. https://www.biorxiv.org/content/10.1101/2024.10.04.616668v1
- 67.Increased heterogeneity in expression of genes associated with cancer progression and drug resistanceTransl Oncol 41:101879Google Scholar
- 68.Tumour heterogeneity and resistance to cancer therapiesNature Reviews Clinical Oncology 15:81–94https://www.nature.com/articles/nrclinonc.2017.166Google Scholar
- 69.Active regulatory elements recruit cohesin to establish cell-1 specific chromatin domainscited https://doi.org/10.1101/2023.10.13.562171Google Scholar
- 70.Building regulatory landscapes reveals that an enhancer can recruit cohesin to create contact domains, engage CTCF sites and activate distant genesNature Structural C Molecular Biology 29:563–74https://www.nature.com/articles/s41594-022-00787-7Google Scholar
- 71.Histone H3K27ac separates active from poised enhancers and predicts developmental stateProc Natl Acad Sci U S A 107:21931–6https://www.pnas.org/doi/abs/10.1073/pnas.1016071107Google Scholar
- 72.Glossary – ENCODEhttps://www.encodeproject.org/glossary/
- 73.Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genomeNature Genetics 39:311–8https://www.nature.com/articles/ng1966Google Scholar
- 74.Regulation of Gene Expression in Cancer—An OverviewCells 11:4058Google Scholar
- 75.Hallmarks of transcriptional intratumour heterogeneity across a thousand tumoursNature 618:598–606PubMedGoogle Scholar
- 76.The Molecular Signatures Database (MSigDB) hallmark gene set collectionCell Syst 1:417–25PubMedGoogle Scholar
- 77.Gene expression profiles of human breast cancer progressionProc Natl Acad Sci U S A 100:5974–9https://www.pnas.org/doi/abs/10.1073/pnas.0931261100Google Scholar
- 78.Suppression of Spry1 inhibits triple-negative breast cancer malignancy by decreasing EGF/EGFR mediated mesenchymal phenotypeSci Rep 6:23216Google Scholar
- 79.Aberrant inactivation of SCNN1G promotes the motility of head and neck squamous cell carcinomaPathol Res Pract 240:154175Google Scholar
- 80.Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbationsNature Genetics 51:1664–9https://www.nature.com/articles/s41588-019-0538-0Google Scholar
- 81.Hallmarks of cancer: The next generationCell 144:646–74https://www.cell.com/action/showFullText?pii=S0092867411001279Google Scholar
- 82.COL12A1 as a prognostic biomarker in HER2-enriched breast cancer and its association with immune infiltrationEur J Gynaecol Oncol 43:85–90https://www.ejgo.net/Google Scholar
- 83.Temporal profiling of the breast tumour microenvironment reveals collagen XII as a driver of metastasisNature Communications 13:1–21https://www.nature.com/articles/s41467-022-32255-7Google Scholar
- 84.COL12A1 Acts as a Novel Prognosis Biomarker and Activates Cancer-Associated Fibroblasts in Pancreatic Cancer through Bioinformatics and Experimental ValidationCancers (Basel) 15:1480Google Scholar
- 85.Paracrine WNT5A signaling inhibits expansion of tumor-initiating cellsCancer Res 75:1972Google Scholar
- 86.Loss of TGF-β or Wnt5a results in an increase in Wnt/β-catenin activity and redirects mammary tumour phenotypeBreast Cancer Res 11:R19Google Scholar
- 87.Multiple Roles of WNT5A in Breast CancerMed Sci Monit 22:5058Google Scholar
- 88.Characterization of an antagonistic switch between histone H3 lysine 27 methylation and acetylation in the transcriptional regulation of Polycomb group target genesNucleic Acids Res 38:4958Google Scholar
- 89.H3K27me3-rich genomic regions can function as silencers to repress gene expression via chromatin interactionsNature Communications 12:1–22https://www.nature.com/articles/s41467-021-20940-yGoogle Scholar
- 90.Epigenetic heterogeneity in cancerBiomarker Research 7:1–19https://biomarkerres.biomedcentral.com/articles/10.1186/s40364-019-0174-yGoogle Scholar
- 91.The epigenetic basis of cellular heterogeneityNat Rev Genet 22:235Google Scholar
- 92.Epigenetic plasticity and the hallmarks of cancerScience 357PubMedGoogle Scholar
- 93.Enhancer selectivity in space and time: from enhancer– promoter interactions to promoter activationNat Rev Mol Cell Biol 25:574Google Scholar
- 94.Cohesin-mediated 3D contacts tune enhancer-promoter regulationbioRxiv Google Scholar
- 95.Promoter-proximal CTCF binding promotes distal enhancer-dependent gene activationNature Structural C Molecular Biology 28:152–61https://www.nature.com/articles/s41594-020-00539-5Google Scholar
- 96.Cohesin-Dependent and -Independent Mechanisms Mediate Chromosomal Contacts between Promoters and EnhancersCell Rep 32PubMedGoogle Scholar
- 97.Using ChIP-seq Technology to Identify Targets of Zinc Finger Transcription FactorsMethods Mol Biol 649:437Google Scholar
- 98.An optimized ATAC-seq protocol for genome-wide mapping of active regulatory elements in primary mouse cortical neuronsSTAR Protoc 2:100854Google Scholar
- 99.An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissuesNat Methods 14:959–62https://doi.org/10.1038/nmeth.4396Google Scholar
- 100.Micro-C 0.1 documentationhttps://micro-c.readthedocs.io/en/latest/index.html
- 101.Fast and accurate short read alignment with Burrows–Wheeler transformBioinformatics 25:1754–60https://doi.org/10.1093/bioinformatics/btp324Google Scholar
- 102.Pairtools: from sequencing data to chromosome contactsbioRxiv https://doi.org/10.1101/2023.02.13.528389Google Scholar
- 103.Twelve years of SAMtools and BCFtoolsGigascience 10:1–4https://doi.org/10.1093/gigascience/giab008Google Scholar
- 104.Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C ExperimentsCell Syst 3:95–8PubMedGoogle Scholar
- 105.Genome-wide detection of enhancer-hijacking events from chromatin interaction data in rearranged genomesNature Methods 18:661–8https://www.nature.com/articles/s41592-021-01164-wGoogle Scholar
- 106.The ENCODE Blacklist: Identification of Problematic Regions of the GenomeScientific Reports 9:1–5https://www.nature.com/articles/s41598-019-45839-zGoogle Scholar
- 107.Systematic inference and comparison of multi-scale chromatin sub-compartments connects spatial organization to cell phenotypesNature Communications 12:1–11https://www.nature.com/articles/s41467-021-22666-3Google Scholar
- 108.FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture dataGenome Biol 21:1–19https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02215-9Google Scholar
- 109.SpectralTAD: An R package for defining a hierarchy of topologically associated domains using spectral clusteringBMC Bioinformatics 21:1–19https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03652-wGoogle Scholar
- 110.Analysis of Hi-C data using SIP effectively identifies loops in organisms from C.Elegans to mammals. Genome Res 30:447–58Google Scholar
- 111.nf-core/rnaseq: nf-core/rnaseq v3.0 - Silver SharkZenodo https://zenodo.org/records/4323183
- 112.Trim Galore!.Babraham Bioinformatics http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
- 113.STAR: ultrafast universal RNA-seq alignerBioinformatics 29:15–21http://www.ncbi.nlm.nih.gov/pubmed/23104886Google Scholar
- 114.Salmon provides fast and bias-aware quantification of transcript expressionNat Methods 14:417–9http://www.nature.com/articles/nmeth.4197Google Scholar
- 115.Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2Genome Biol 15:1–21https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8Google Scholar
- 116.deepTools2: a next generation web server for deep-sequencing data analysisNucleic Acids Res 44:W160–5https://doi.org/10.1093/nar/gkw257Google Scholar
- 117.Measuring reproducibility of high-throughput experimentsAnn. Appl. Stat. 5:1752–79https://doi.org/10.1214/11-AOAS466Google Scholar
- 118.Model-based Analysis of ChIP-Seq (MACS)Genome Biol 9:R137http://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-9-r137Google Scholar
- 119.FastQC: A Quality Control tool for High Throughput Sequence Data.Babraham Bioinformatics http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- 120.Fast gapped-read alignment with Bowtie 2Nat Methods 9:357–9http://www.nature.com/doifinder/10.1038/nmeth.1923Google Scholar
- 121.Software for Computing and Annotating Genomic RangesPLoS Comput Biol 9:e1003118https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003118Google Scholar
- 122.Infrastructure for genomic interactions: Bioconductor classes for Hi-C, ChIA-PET and related experimentsF1000Res 5:950Google Scholar
- 123.Mariner: explore the Hi-CsBioinformatics 40https://doi.org/10.1093/bioinformatics/btae352Google Scholar
- 124.Plotgardener: cultivating precise multi-panel figures in RBioinformatics 38:2042–5https://doi.org/10.1093/bioinformatics/btac057Google Scholar
- 125.Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbationsNature Genetics 51:1664–9https://www.nature.com/articles/s41588-019-0538-0Google Scholar
- 126.nullranges: Generation of null ranges via bootstrapping or covariate matchingGitHub https://nullranges.github.io/nullranges
- 127.gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:ProfilerF1000Res 9https://doi.org/10.12688/f1000research.24956.2Google Scholar
- 128.Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profilesProc Natl Acad Sci U S A 102:15545–50Google Scholar
- 129.Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identitiesMol Cell 38:576–89http://www.ncbi.nlm.nih.gov/pubmed/20513432Google Scholar
- 130.edgeR: a Bioconductor package for differential expression analysis of digital gene expression dataBioinformatics 26:139–40http://www.ncbi.nlm.nih.gov/pubmed/19910308Google Scholar
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.108135. This DOI represents all versions, and will always resolve to the latest one.
Copyright
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Metrics
- views
- 0
- downloads
- 0
- citations
- 0
Views, downloads and citations are aggregated across all versions of this paper published by eLife.