Figures and data

Reorganization of compartments, TADs, and loops during breast cancer progression
(A) A diagram of the experimental design. Three epithelial cell lines represent various stages of breast cancer progression: MCF10A are non-cancerous, MCF10AT1 are pre-malignant, and MCF10CA1a are metastatic. In each cell line we generated Micro-C maps at 5 kb resolution to identify features such as compartments, topologically associating domains (TADs), and chromatin loops. We overlapped these features with functional changes in gene expression from RNA-Seq, histone modifications and CTCF binding from ChIP-Seq, and chromatin accessibility from ATAC-Seq. (B) Micro-C maps of a 2 Mb region of chromosome 1 in MCF10A (non-cancerous), MCF10AT1 (pre-malignant), and MCF10CA1a (metastatic) cells at 5 kb resolution. Each map has annotations for loop calls, both static (black boxes) and differential (red boxes). Below each map is a track indicating compartment calls from CALDER (dark red is most A-like, dark blue is most B-like) as well as insulation score tracks with static (grey) and differential (red) boundaries marked. Ribbons indicate TAD calls for each cell type. (C) Lengths of the genome assigned to each compartment in each cell type. (D) TAD and (E) loop calls from each cell type, colored by the number of maps they were initially detected in. (F) Saddle plots of interactions between regions of different compartments in MCF10A, MCF10AT1, and MCF10CA1a. Bottom plots indicate the average eigenvector value for each compartment ventile. Plots shown are for chromosomes 2, 12, and 17 (see Methods). (G) Left: differential TAD boundaries clustered by their timing of change, depicted in line plots and heatmap. Right: aggregate plots of weakened and strengthened TAD boundaries (n=100). (H) Left: differential chromatin loops clustered by their timing of change, depicted in line plots and heatmap. Right: aggregate plots of weakened and strengthened loops (n=100).

Persistent chromatin loops connect differentially expressed genes to distal enhancers.
(A) Percentages of loops designated as either promoter-promoter, enhancer-promoter, enhancer-enhancer, or single-sided promoter or enhancer loops. (B) Distributions of loop sizes by enhancer/promoter designations. P-values represent T-tests comparing the means of different loop classes. Boxplots show the median (middle line), 25th and 75th percentiles (box perimeters), and range excluding outliers (dashed line whiskers). Outliers are defined as values that are more than 1.5 times the interquartile range beyond the box bounds and are excluded from these plots. (C) Distributions of loop strength by enhancer/promoter designations. (D) Percentages of upregulated genes that have gained H3K27ac at promoters, distal enhancers, both, or gained loops. P-values represent T-tests comparing the means of various loop sets. Non-significant (n.s.) represents p-values above 0.05. (E) Log2 fold-change of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loop strength (blue), when overlapped. Grey dots indicate features that do not change significantly, while colored points are significantly differential features. Boxplots are defined as in (B). P-values represent T-tests comparing the means of each class to 0. Non-significant (n.s.) represents p-values above 0.01. (F) Percentages of downregulated genes that have gained H3K27ac at promoters, distal enhancers, both, or gained loops. (G) Log2 fold-change of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loop strength (blue), when overlapped. Boxplot details as defined in (E). (H) An example of an upregulated gene (SPRY1) connected to gained enhancers by static loops. Black boxes show loop annotations. Red compartment tracks indicate A compartments, while blue indicates B compartments. In CTCF signal tracks, red highlights indicate differential CTCF peaks. In H3K27ac and ATAC-Seq signal tracks, red highlights indicate differential enhancers as determined by changes in H3K27ac. Genes highlighted in black are differentially expressed. (I) An example of downregulated genes (SCNN1G, SCNN1B) connected to lost enhancers by static loops. Plot annotations are as described in (H).
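The boxplot convention described in this legend (box at the quartiles, whiskers spanning the range after removing points more than 1.5 times the interquartile range beyond the box) can be illustrated with a minimal sketch. The function name `boxplot_summary`, the toy input values, and the choice of the "inclusive" quantile method are assumptions made for illustration; the plotting software used for the figures may compute quartiles slightly differently.

```python
import statistics

def boxplot_summary(values):
    """Summarize values using the 1.5 x IQR outlier rule (hypothetical helper)."""
    data = sorted(values)
    # Quartiles via linear interpolation; the interpolation method is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme points that are still inside the fences.
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inliers[0],
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }

# Toy values, not data from the study.
example = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
```

In this toy input the value 40 lies beyond the upper fence, so it is dropped from the plotted range and the upper whisker ends at 9.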

Changes in enhancer acetylation or enhancer-promoter contact are associated with changes in gene expression.
Boxplots of distal enhancer H3K27ac (pink), enhancer-promoter contact (blue), and ABC score (purple), as well as gene log2 fold-change (yellow) for enhancer-promoter pairs that feature (A) differential H3K27ac among enhancers, (B) differential enhancer-promoter contact frequency, and (C) differential H3K27ac for enhancer-promoter pairs supported by a chromatin loop. Boxplots in (D) and (E) represent sets of non-looped enhancer-promoter pairs with differential H3K27ac that are matched to the looped set in (C) by contact and distance, respectively. Boxplots show the median (middle line), 25th and 75th percentiles (box perimeters), and range excluding outliers (dashed line whiskers). Outliers are defined as values that are more than 1.5 times the interquartile range beyond the box bounds and are excluded from these plots. P-values represent T-tests comparing the means of values in MCF10A and MCF10CA1a for enhancer activity, enhancer-promoter contact, and ABC score, and T-tests comparing the mean of the value to 0 for gene log2 fold-change. (F) Contact distribution of all enhancer-promoter pairs (dashed line), compared to the looped enhancer-promoter pairs in (C, solid line), the contact-matched pairs in (D, blue shade), and the distance-matched pairs in (E, grey shade). (G) Distance distribution of all enhancer-promoter pairs (dashed line), compared to the looped enhancer-promoter pairs in (C, solid line), the contact-matched pairs in (D, blue shade), and the distance-matched pairs in (E, grey shade). (H) Summary boxplot of the gene log2 fold-change for the enhancer-promoter pairs shown in panels (A-E). P-values represent T-tests comparing the mean gene log2 fold-change values for different sets of enhancer-promoter pairs.

Differential loops are enriched for cancer-relevant differentially expressed genes
(A) Log2 fold-change of differentially expressed genes at the anchors of gained (blue), weakened (green), or static (grey) loops. Boxplots show the median (middle line), 25th and 75th percentiles (box perimeters), and range excluding outliers (dashed line whiskers). Outliers are defined as values that are more than 1.5 times the interquartile range beyond the box bounds and are excluded from these plots. P-values represent T-tests comparing the mean of each set to 0. (B) Bar plot showing the number of differentially expressed genes at strengthened or weakened loop anchors. Bar segments are colored by whether the gene changes in the same direction as the loop (blue for upregulated in strengthened loops, green for downregulated in weakened loops) or in the opposite direction (grey). P-value represents a Fisher’s Exact Test for whether the odds ratio (OR) is greater than 1. (C) GO term enrichments for genes upregulated in MCF10A, MCF10AT1, or MCF10CA1a. Size indicates p-value. Terms are color-coded based on gene type: morphogenesis (purple), proliferation (orange), and cell adhesion (teal). (D) An example of an upregulated gene (COL12A1) with a promoter that overlaps a strengthened loop with distal enhancers. Black boxes show loop annotations, while red boxes indicate differential loops. Red compartment tracks indicate A compartments, while blue indicates B compartments. In CTCF signal tracks, red highlights indicate differential CTCF peaks. In H3K27ac and ATAC-Seq signal tracks, red highlights indicate differential enhancers as determined by changes in H3K27ac. Genes highlighted in black are differentially expressed. (E) An example of a downregulated gene (WNT5A) with a promoter that overlaps with several weakened loops containing distal enhancers that lose H3K27ac. Plots are annotated as in (D).
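The one-sided Fisher's Exact Test named in panel (B) asks whether genes and loops change in the same direction more often than expected by chance (odds ratio greater than 1). A minimal sketch follows, assuming a 2x2 table with loop direction in rows and gene direction in columns; the function name `fisher_exact_greater` and the counts are hypothetical and are not taken from the figure.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (alternative: odds ratio > 1).

    Assumed table layout (illustrative):
                          gene up   gene down
      strengthened loop      a          b
      weakened loop          c          d
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)
    max_a = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed,
    # keeping the row and column totals fixed.
    p_value = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, max_a + 1)
    ) / denom
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value

# Toy counts, not values from the study.
odds_ratio, p_value = fisher_exact_greater(3, 1, 1, 3)
```

An odds ratio above 1 with a small p-value would indicate that concordant gene and loop changes (the blue and green bar segments) outnumber discordant ones (grey) beyond what the margins alone predict.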

Progression-associated differentially expressed genes exhibit local and distal epigenetic changes at differential loops.
(A-C) Log2 fold-change of genes (colored dots) and the differential loops they overlap with (black/grey dots) for genes and loops that change between (A) MCF10A and MCF10AT1, (B) MCF10AT1 and MCF10CA1a, and (C) MCF10A and MCF10CA1a. Gene labels are below. (D) Log2 fold-change between MCF10A and MCF10CA1a of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loops (blue) among upregulated genes that overlap with gained loops (darker colors) or lost loops (lighter colors). Boxplots are defined as in (A). P-values represent T-tests comparing the mean values of the features at loops that change in the same and those that change in opposite directions from the differential genes at their anchors. Non-significant (n.s.) p-values are any p-values above 0.01. (E) Log2 fold-change of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loops (blue) among downregulated genes that overlap with gained loops (darker colors) or lost loops (lighter colors). P-values represent T-tests comparing the mean values of the features at loops that change in the same and those that change in opposite directions from the differential genes at their anchors. Non-significant (n.s.) p-values are any p-values above 0.01.

Karyotypic analysis and Micro-C loop reproducibility.
(A) Brightfield microscopy images of MCF10A, MCF10AT1, and MCF10CA1a cell lines. (B) Representative SKY karyotype images from MCF10A, MCF10AT1, and MCF10CA1a cell lines. (C) Copy number variation (CNV) factors for loops across the genome as generated from NeoLoopFinder. Highlighted regions indicate areas of karyotypic differences that were corrected for in the identification of differential loops: blue marks regions with higher CNV in MCF10A, yellow marks regions with higher CNV in MCF10AT1, and red marks regions with higher CNV in MCF10CA1a. Regions that have shared karyotypic abnormalities among all three cell lines, such as the q arm of chromosome 5, which is duplicated and translocated to chromosome 9 in all cell types, did not require correction. (D) Micro-C chromatin loop count principal component analysis (PCA). Blue circles indicate MCF10A samples, yellow indicate MCF10AT1, and red indicate MCF10CA1a.

Differential loop and TAD feature details.
(A) The number of chromatin loops that exhibit a significant increase (blue) or decrease (green) in contact frequency between each pairwise comparison of cell types. (B) The number of topologically associating domain (TAD) boundaries that exhibit a significant increase (blue) or decrease (green) in insulation score between each pairwise comparison of cell types. (C) Size distributions of chromatin loops and TADs. (D) Permutation test results of weakened boundaries observed in each pairwise comparison as compared to a random sampling. (E) Percent of loops with one or both anchors overlapping CTCF ChIP-Seq peaks based on the strength of the loop (20 bins). (F) Boxplots of (left to right) maximum loop Micro-C counts, maximum loop count log2 fold-change, and loop length based on loop-CTCF class: neither (grey), one anchor overlap (light red), or both anchors overlap (dark red). (G) Venn diagram showing the degree of overlap between chromatin loops and TADs. (H) Feature sizes of loops that do not overlap TADs (light grey), loops that do overlap TADs (dark grey), TADs that do overlap loops (dark blue), and TADs that do not overlap loops (light blue). (I) Pie chart of the percentage of differential loops that have a correlated change in CTCF binding at either anchor; light grey indicates static CTCF peaks, dark grey indicates CTCF peaks that change in the opposite direction to the loop (e.g., a strengthened loop with loss of CTCF binding), and dark red indicates CTCF peaks that change in the same direction as the loop (e.g., a strengthened loop with gain of CTCF binding). (J) De novo motif results from HOMER for ATAC-Seq peaks within the anchors of gained/strengthened (top) and lost/weakened (bottom) chromatin loops. (K) H3K27ac ChIP-Seq aggregate profiles at the anchors of gained/strengthened, lost/weakened, and static loops.

Differential gene expression patterns across breast cancer progression.
(A) The number of genes that exhibit a significant increase (orange) or decrease (gold) in expression between each pairwise comparison of cell types. (B) Differential genes clustered by their timing of change, depicted in line plots and heatmap. (C) Gene ontology (GO) term enrichment for genes differentially expressed in each pairwise comparison of cells. (D) Gene set enrichment analysis (GSEA) results showing gene sets enriched among genes differentially expressed early (MCF10AT1, top) or late (MCF10CA1a, bottom) in breast cancer progression. (E) The number of promoter H3K27ac peaks (left), distal enhancer H3K27ac peaks (middle), or loops (right) that change in a positive (grey) or negative (gold, red, blue) direction based on their overlap with up-regulated or down-regulated genes. (F) Same plots as (E) but subset for significantly differential features. (G) Percentages of up-regulated genes (left) and down-regulated genes (right) that have correlated changes in H3K27ac at promoters (gold), distal enhancers (red), both (orange), or loop contact frequency (blue), by pairwise comparison of cell types.

Enhancer-promoter pair details.
(A) Distribution of enhancer-promoter distances as predicted by the activity-by-contact (ABC) model. (B) Boxplots of distal enhancer H3K27ac (pink), enhancer-promoter contact (blue), and ABC score (purple), as well as gene log2 fold-change (yellow) for enhancer-promoter pairs that feature (top to bottom) differential H3K27ac among enhancers, differential enhancer-promoter contact frequency, differential H3K27ac for looped enhancer-promoter pairs, contact-matched non-looped enhancer-promoter pairs, and distance-matched non-looped enhancer-promoter pairs. Comparison shows changes between MCF10A and MCF10AT1. (C) Same as (B) but showing comparisons between MCF10AT1 and MCF10CA1a.

Differential genes within differential loops.
(A) The number of differentially expressed genes among all genes (dashed light grey line), genes overlapping static loop anchors (dashed dark grey line), genes overlapping differential loop anchors (red line), and a permutation test of a random sampling of a similar number of genes (black line). (B) Permutation test results (n=1,000) for the number of up-regulated genes that overlap with strengthened/gained loops (red) compared to a random sampling (black). (C) Permutation test results (n=1,000) for the number of down-regulated genes that overlap with weakened/lost loops (red) compared to a random sampling (black). Boxplots in (D) and (E) represent log2 fold-change between MCF10A and MCF10AT1 (D) or MCF10AT1 and MCF10CA1a (E) of distal H3K27me3 (grey), distal H3K27ac (red), promoter H3K27ac (orange), gene expression (yellow), and loops (blue) among upregulated or downregulated genes that overlap with gained loops (darker colors) or lost loops (lighter colors). Boxplots are defined as in (A). P-values represent T-tests comparing the mean values of the features at loops that change in the same and those that change in opposite directions from the differential genes at their anchors. Non-significant (n.s.) p-values are any p-values above 0.01.
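The permutation tests in panels (A-C) compare an observed overlap count against counts obtained from random samples of genes. The sketch below shows one way such a test could be set up, assuming that the null distribution comes from drawing random gene sets of the same size as the loop-associated set. The function name `permutation_overlap_test`, the gene identifiers, and the sampling scheme are illustrative assumptions; the exact procedure used for the figure is described in the Methods and may differ.

```python
import random

def permutation_overlap_test(all_genes, loop_genes, de_genes, n_perm=1000, seed=0):
    """Compare the observed number of differential genes at loop anchors with
    the counts from random gene sets of equal size (hypothetical helper)."""
    rng = random.Random(seed)
    de_set = set(de_genes)
    loop_set = set(loop_genes)
    observed = len(loop_set & de_set)
    pool = sorted(set(all_genes))  # sorted so results are reproducible for a given seed
    null_counts = []
    for _ in range(n_perm):
        sample = rng.sample(pool, len(loop_set))
        null_counts.append(sum(1 for gene in sample if gene in de_set))
    # Empirical one-sided p-value with a +1 correction so it is never exactly zero.
    exceed = sum(1 for count in null_counts if count >= observed)
    p_value = (1 + exceed) / (n_perm + 1)
    return observed, null_counts, p_value

# Toy gene sets, not data from the study.
genes = [f"gene{i}" for i in range(100)]
observed, null_counts, p_value = permutation_overlap_test(
    all_genes=genes, loop_genes=genes[:10], de_genes=genes[:20]
)
```

Plotting `null_counts` as a distribution (black) against `observed` (red) would give a display analogous in layout to panels (B) and (C).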