Introduction

Breast cancer is the leading malignant disease in women worldwide. It is highly heterogeneous; based on molecular characteristics it can be divided into four broad subgroups: luminal A (estrogen receptor ER+ progesterone receptor PR+/-human epidermal growth factor receptor 2 HER2-), luminal B (ER+ PR+/-HER2+), HER2-enriched (ER-PR-HER2+), and triple-negative/basal (ER-PR-HER2-) (1). About 20 to 30% of early-stage breast cancer patients have a chance to ultimately develop distant metastasis, which accounts for 90% of all breast cancer-related deaths (2). Overall, metastasis implies the dissemination of cancer cells from the primary tumor site into the circulation system and later extravasation to a new site which leads to the formation of a secondary tumor. However, the organs affected by breast cancer metastasis vary remarkably, depending on the disease subtype, molecular features, primary tumor location, and secondary site microenvironment. This phenomenon is known as ‘metastasis organotropism’ (3). For example, breast cancer cells can metastasize to bone, lung, liver, and brain. However, bone metastasis accounts for a major percentage of metastatic cases compared to other organs (4). Moreover, luminal A and basal subtypes of breast cancer show a higher incidence of bone and lung metastasis, respectively, whereas luminal B preferentially metastasizes to both organs (5). Organotropic behavior is congruent with the ‘seed and soil’ hypothesis, according to which a cancer cell (seed) having specific intrinsic abilities will only be able to survive in a host local microenvironment (soil) that is favorable to it (6).

Secondary organ colonization by circulating tumor cells is a complex process, ruled by dynamic and evolving interactions between cancer cells and the host microenvironment. In most situations, the microenvironment is unfavorable, and the disseminated cells fail to survive in that ‘soil’. Despite the low survival rate of cancer cells during the metastatic colonization process, cells with the intrinsic capabilities to interact with the host microenvironment can survive and proliferate in the secondary site (7). Alternatively, cells might arrive at the secondary site and remain dormant under the influence of the local microenvironment (8-11) slowly acquiring characteristics that would later allow their proliferation. Importantly, the microenvironment surrounding primary breast tumors seems to condition the cells to specific metastatic profiles. For example, if the primary tumor stroma is enriched in transforming growth factor beta (TGFB), lung metastasis prevails through the induction of angiopoietin-like 4 (12). In contrast, a tumor stroma enriched with mesenchymal cytokines like C-X-C motif chemokine 12 (CXCL12) and insulin-like growth factor 1 (IGF1) predisposes primary cancer cell population to home into the bone marrow (13). How these different reactive microenvironments condition the origin of specific metastatic traits still poses an outstanding question.

The current view in the cancer biology field is that transcriptional gene signatures associated with organ-specific metastatic traits are regulated in part by cues from the reactive microenvironment that accompanies primary tumorigenesis (12-15). However, whether regulating those genes involves changes in 3D genome organization is still an open question. Recent advancements in imaging and molecular biology techniques have shown that the genome organizes in a non-random multi-layered fashion inside the nucleus, and each layer of organization contributes both independently and synergistically to gene expression regulation (16). At a large scale, the reorganization of genome compartments, in which euchromatic and heterochromatic genomic regions are spatially segregated, is also found to be associated with cancer progression (17-19). It has been shown that as breast cancer becomes endocrine-resistant, the change in compartment organization is accompanied by gene expression changes and differential estrogen receptor binding (17). Other studies have also found that an unintended consequence of various drug treatments and therapies is the rearrangement of chromatin architecture at different levels (17,20,21). Though these types of studies are essential for elucidating the association between the reorganization of the genome in three dimensions and oncogenic transformation, they do not consider the genome organization changes experienced by the metastatic cell that occur as a result of the interaction with the microenvironment of the secondary site. Genome spatial compartmentalization is strongly associated with cell type definition: genomic regions segregated into the A (euchromatic) or B (heterochromatic) compartments differ among cell types and tissue origin (22). These compartmentalization profiles help encode stable cell fate gene expression (23-25). Therefore, we hypothesize that similar to gene expression, metastatic cancer cells experience 3D genome structure rearrangements that mimic the compartment signatures of their new host microenvironment. To test this hypothesis, here, we focused on the breast cancer lung metastasis system. We compared genome organization data from different types of breast epithelial cell lines (normal, localized cancer, and metastatic) with normal and cancerous lung cells. Our analysis revealed that metastatic breast cancer cells have a higher proportion of lung-specific spatial compartment changes than localized breast cancer cells. We find similar brain microenvironment-specific compartment changes in prostate cancer cells that have the potential to metastasize to brain. These distal-site concordant compartment shifts often correspond to changes in gene expression in both cell lines and TCGA patient samples. However, we find that 3D genome changes do not always correlate with gene expression changes. Instead, particularly across the epithelial mesenchymal transition (EMT) spectrum, we find a set of genes that change expression without needing spatial reorganization and then a distinct set of epithelial and mesenchymal phenotype genes that change their spatial compartment without altered gene expression. We hypothesize that spatial genome organization facilitates immediate changes in gene expression in certain loci but can also poise certain regions for future activation in a way that favors EMT and adaptation to metastatic organ microenvironments.

Results

Modeling the breast cancer lung metastasis system

To investigate whether breast cancer spatial genome organization showed rearrangements specific to a distant organ metastatic site microenvironment, we needed to assemble genome spatial compartment data from different types of normal and cancerous cells for both breast and a secondary organ. Since most of breast cancer subtypes show a preference for lung metastasis, we narrowed our field to this metastatic site (26). Breast and lung normal and cancerous cell types for which publicly available genome organization (Hi-C) are available were selected for this work (Supplementary Table 1). A total of 15 cell types including primary cells and cancer cell lines from both organs were obtained. A classification of the cells used in this study based on their malignancy status and other molecular signatures is shown in Fig. 1a. Briefly, for the breast cancer system we considered both HMEC and MCF10A as normal (non-tumorigenic) breast epithelial cells and placed them at the starting node of the model. Cancer cell lines were then divided into three subcategories based on breast cancer molecular signatures: luminal (ZR751, BT474, ZR7530, MCF7 and T47D), HER2-enriched (HCC1954 and SKBR3) and triple-negative (HCC70, BT549 and MDA-MB-231). Further, within the luminal subtype, ZR751and BT474 were designated as localized cancers and the remaining three cell lines, ZR7530, MCF7, and T47D were used as models for ascites, lung, and lung metastatic breast cancers respectively. Similarly, for the HER2-enriched subcategory, HCC1954 and SKBR3 were considered localized and lung metastatic cancers, respectively. Finally, in the case of the triple-negative subtype, HCC70 and BT549 were used to model localized cancer whereas MDA-MB-231 represents lung metastatic cancer. For the lung portion of the system, HTBE primary cells represent normal lung epithelium and both the A549 and H460 cell lines represent localized lung cancer.

Spatial compartment identity segregates cells by subtype and epithelial-mesenchymal status.

a) Primary cells and cell lines used in this study are arranged depending on their molecular subtypes and malignancy status based on prior available characteristics. Green: normal breast and lung epithelium, red: luminal subtype, blue: HER2-enriched subtype, pink: triple-negative subtype, brown: localized lung cancer. b) Hierarchical clustering of genome-wide compartment identity data for all breast and lung cell types at 250 kb resolution; colors as in 1a. c) Principal component analysis of genome-wide compartment identity data for breast and lung cell types at 250 kb resolution; colors as in 1a. The PC2 axis is divided into two subspaces based on HER2 status (blue thick dotted line). d) GO term enrichment (biological processes) of the genes in genomic regions corresponding to the top 100 positive and 100 negative elements of the first eigenvector of genome-wide compartment identity PCA.

Subtype and epithelial-mesenchymal state-specific signatures are present in breast cancer compartmental organization

We compared the supervised ordering of the cell lines described above with the ordering derived from unsupervised analysis of genome-wide A/B compartment organization data. Compartment identity strengths were calculated using eigenvector decomposition on the 250 kb binned Hi-C data for each cell line (Methods). In this analysis a positive value signifies a euchromatic A compartment region, while a negative value represents a heterochromatic B compartment association.

When we applied hierarchical clustering on the genome-wide compartment data, we observed a grouping of cells that was congruent with the cell ordering constructed based on molecular and clinical features (Fig. 1b). For example, all the normal cell types, including both the breast and lung systems, were clustered together. Similarly, cells from the luminal subtype and the lung carcinoma cells were grouped in their respective clusters. However, we also observed a few interesting exceptions from the clustering result. MDA-MB-231 cells were grouped with the lung carcinoma cells. This implies that the genome organization of this cell line is closer to that of lung cells than to other breast epithelial cell lines. We also found that HCC70 and HCC1954, which are both cells from localized cancer, clustered relatively close to the normal epithelial cells. Collectively, this unsupervised analysis shows that the different subtypes of cancer exhibit distinctive 3D genome organization features at the compartment level (Fig. 1b), as previous studies suggested (18).

Performing principal component analysis (PCA) on the genome-wide compartment data revealed similar trends (Fig. 1c). Interestingly, the ordering of cell types along the first principal component (PC1) captured a transition between epithelial and mesenchymal-like cells. Cells characterized as highly epithelial were found on the left extreme of the PC1 axis, and as we move towards the right along PC1, the cells acquire a mesenchymal phenotype. For example, MDA-MB-231 cells are derived from healthy cells of epithelial origin but have undergone an epithelial-mesenchymal transition and show a mesenchymal-like phenotype (27,28). In our Hi-C principal components analysis, this cell line has a highly positive PC1 value. In contrast, MCF7 cells maintain a more epithelial morphology and are found at the opposite end of the PC1 spectrum from MDA-MB-231. The organization of cell lines along PC1 from spatial compartmentalization matches quite well with the previously characterized epithelial-mesenchymal classification of these cell lines based on transcriptomic data (Fig. S1) (29,30). The PC2 axis segregates the cell types depending on the presence or absence of the HER2, suggesting that in breast cancer cell lines, the presence or absence of this receptor and its downstream signaling network has important effects on the genome organization at the compartment scale. To further characterize the genomic regions that showed distinctive compartment signatures across cell lines, we extracted lists of all genes that fell into genomic bins with the 100 most positive and negative eigenvalues for PC1. Gene ontology enrichment results for these gene lists coincide with the idea that PC1 represents the EMT axis: PC1 positive genes have functions related to cell adhesion (tenascin C, JAG1, ACTA2), extracellular matrix organization (APP, LAMA2, FGF2, BMP4) and positive regulation of cell migration, consistent with a mesenchymal phenotype, while PC1 negative genes show GO terms related to epithelial maturation (Fig. 1d) Overall, our analyses show that the genome organization is not only able to distinguish different subtypes of cancers but also exhibits changes along the epithelial-mesenchymal transition.

Breast cancer transcriptome profiles also exhibit epithelial-mesenchymal state-based segregation

We next analyzed the gene expression signatures that emerged across different subtypes of breast cancer (data sources described in Supplementary Table 2) and compared those with the patterns obtained from the compartmental analysis. We first performed PCA on the transcriptome profile of all the breast and lung system cells considered in this study (Fig. 2a). From the low dimensional PC projection result, we observed that similar breast cancer subtype cells were arranged closely in the PC space based on their transcriptome profile. As in the compartment signature, MDA-MB-231 cells localize close to lung cancer cells A549 and H460, suggesting that this lung metastatic cell line has transcriptional similarities to lung cancer. We also noticed that along the largest PC axis (PC1), cells are again ordered based on their epithelial-mesenchymal transition state. Performing GO term enrichment analysis using the genes with the top 100 most positive and 100 most negative eigenvalues of the first eigenvector, we find that genes associated with PC1 positive values relate to epithelial organization, differentiation and negative regulation of cell migration, denoting a more epithelial phenotype (Fig. 2b). In contrast, PC1 negative genes relate to the mesenchymal functions with enriched terms for wound healing, hemostasis, cell movement, and ECM remodeling (Fig. 2b). Along with the unsupervised PCA, we also performed gene expression analysis in a supervised fashion using two different methods – gene set variation analysis (GSVA) and non-negative principal component analysis (nnPCA) (Fig. 2c and 2d, see Methods).

Gene expression differences also segregate breast cancer cell lines along an EMT axis.

a) Principal component analysis of transcriptome data for breast and lung cell types in this study. Cell types are colored based on their molecular subtypes as represented in Fig. 1a. b) GO term enrichment (biological processes) of the genes corresponding to top 100 positive and 100 negative elements of the first eigenvector of the transcriptome profile PCA. c, d) Projection of breast and lung cell types on a curated epithelial and mesenchymal axis based on their gene expression using gene set variation analysis (GSVA) (2c) and non-negative principal component analysis (nnPCA) (2d) methods.

Briefly, for each cell type, an E (epithelial) score and an M (mesenchymal) score were calculated based on the expression levels of a curated set of 232 epithelial and 193 mesenchymal genes (30,31). A higher E score represents more epithelial characteristics whereas a higher M score corresponds to a mesenchymal phenotype. The GSVA and nnPCA transcriptome analyses recapitulate a similar ordering of cell lines according to epithelial-mesenchymal characteristics.

Distinct EMT gene sets are affected by spatial compartment changes vs. transcription changes, revealing multiple layers of EMT signature

Numerous studies have already demonstrated that cancer cells segregate along an EMT axis by transcriptomics (29-34). Given that spatial compartmentalization is often correlated with gene expression (22), a simple possible explanation for the EMT segregation of cells by compartment signatures could be that genes which change expression also change compartment. An alternative hypothesis would be that transcription and 3D genome organization change represent two independent aspects of the EMT transition. To evaluate these possibilities, we compared the genes obtained from the compartmental PCA analysis, the transcriptome PCA analysis, and the curated epithelial-mesenchymal gene set (Fig. 3a). We found that the genes in regions that change spatial compartmentalization have very little overlap with the sets of genes that most varied in transcription across these cell lines or were curated as differently expressed EMT genes (Fig. 3a). To further investigate this result, we examined the transcription status of genes that changed compartment across the EMT spectrum and, conversely, the compartment status of genes that changed transcription (Fig. 3b, c, and d). As expected, hierarchical clustering of the detected highly varying compartments (Fig 3b, top) and highly varying transcripts (Fig 3c and d, bottom) resulted in clear segregation of the cell types into similar groupings (i.e., HMEC and MCF10A together, BT474 and SKBR3 together with an opposing pattern). But, the genes which strongly changed compartments between cell lines showed no consistent patterns of correlated expression change (Fig. 3b, bottom). Likewise, very few compartment differences existed across cell lines at genes whose transcription segregated cells by EMT status (Fig. 3c and d, top). Notably, for both the transcriptome PCA analysis and the curated EMT gene set, the majority of the differentially expressed genes are found in the active A compartment. Overall, our results favor the hypothesis that compartmentalization and gene expression signatures are capturing two independent factors in EMT transition. The genes which differ in expression across these cell lines are largely already accessible (A compartment) and therefore their expression is likely not controlled by spatial compartmentalization. Meanwhile, the additional EMT relevant genes that change compartment, but not transcription, may be now spatially poised for future gene regulation given additional environmental cues, as has been shown in other contexts (35-38).

Transcription and compartment changes capture distinct sets of EMT-related genomic regions.

a) Overlap analysis of the genes obtained from the compartmental analysis PC1 (red), the transcriptome analysis PC1 (green), and the curated breast cancer epithelial-mesenchymal gene set (blue). b-d) Hierarchical clustering of compartmental profiles (top row) and gene expression (bottom row) of the genes obtained from either (b) the compartmental analysis PC1, (c) transcriptome analysis PC1, or (d) the curated set of breast cancer epithelial and mesenchymal genes.

Lung metastatic breast cancer cell lines acquire lung-like genome architecture

To survive a metastatic environment, a cancer cell must express a repertoire of binding factors, extracellular matrix proteins, and ligands that positively interact with the secondary tissue microenvironment and foster colonization and growth. We hypothesized that for that to happen, the genome organization of the metastatic cell should change to resemble that of the cells at the secondary site, as a prelude of transcriptional activation. To evaluate this hypothesis, we compared the 3D genome organization of the metastatic and localized breast cancer cells with lung epithelial vs. lung cancer cells (Fig. 4a, 4b). As a baseline, we first compared the breast cancer compartment identity data with that of the normal breast epithelial cells. For each comparison, this resulted in four possible compartment identity combinations: AA (A compartment in both normal breast and breast cancer), AB (A compartment in normal breast and B compartment in breast cancer), BA (B compartment in normal breast and A compartment in breast cancer), and BB (B compartment in both normal breast and breast cancer). Similarly, the compartment identity of lung cancer cell lines was also compared with that of the normal lung cell. We then performed a cross-comparison between the breast and lung cancer comparisons, leading to a total of 16 compartment identity combinations (Fig. 4b). Of these, we were particularly interested in two specific combinations: AB_BB (Genomic regions which are in the B compartment in both normal lung and localized lung cancer, and which change from A to B in the transition from normal breast to breast cancer) and BA_AA (Genomic regions which switch from B to A in breast cancer compared to normal breast epithelium, and which are in the A compartment in both normal lung and lung cancer). These combinations were termed “lung permissive genome organization changes”, as the compartment switches in breast cancer make the cells more like the lung compartment pattern. These combinations allow us to distinguish changes that could foster lung-specific-survival changes from cancer-specific traits such as AB_AB and BA_BA (compartments that change in the same direction from normal to cancer cells in both breast and lung). We then evaluated whether lung-permissive 3D genome changes occurred more frequently in lung metastatic breast cancer than in localized breast cancer.

a) Normal breast epithelium (HMEC and MCF10A), localized breast cancer (BT474, HCC1954, and BT549), and lung metastatic breast cancer (MCF7, T47D, SKBR3, and MDA-MB-231) cells are used to model different stages of breast cancer progression. To model diseased lung, normal lung epithelium (HTBE) and localized lung cancer (A549 and H460) cells are used. b) Schematic diagram representing breast cancer lung permissive changes calculation. First, for both breast and lung, cancer cell compartment identity is compared with the normal epithelial cell. This results in four possible compartment identity combinations: AA (A compartment in both normal and cancer), AB (A compartment in normal and B compartment in cancer), etc. Then, a cross-comparison between the breast and lung systems leads to 16 compartment identity combinations. Two specific combinations: AB_BB and BA_AA are defined as lung permissive changes (yellow arrows). c) Fraction of lung permissive changes shown by localized and metastatic cancers from different breast cancer subtypes. d-f) TCGA patient gene expression (left) and GO term enrichment (biological processes) (right) for genes from regions exhibiting (d) BA_AA lung permissive changes in the luminal metastatic breast cancer subtype, (e) AB_BB lung permissive changes in case of HER2-enriched metastatic breast cancer subtype, or (f) BA_AA lung permissive changes in case of triple-negative metastatic breast cancer subtype. TCGA data from BRCA and LUAD tumor vs. normal sets used and sample size indicated.

To quantify the frequency of lung permissive changes, we calculated what fraction of the compartment changes in breast cancer vs. normal breast tissue were also counted as lung permissive. (e.g., for AB_BB breast cancer lung permissive changes, we calculate: # of AB_BB bins / # of breast cancer AB bins. Similarly, for BA_AA: # of BA_AA bins / # of breast cancer BA bins). We divided cells according to breast cancer subtype for this comparison. For the subtypes where several cell lines were available, we considered compartment identity combinations that were consistent between members of the group. We found that for luminal and triple-negative subtypes, metastatic breast cancer exhibits a higher level of lung permissive changes than localized cancer (Fig. 4c).

Genes affected by lung permissive compartment changes show lung-like expression levels cancer patients

After selecting the genomic regions that exhibit lung permissive changes in metastatic breast cancer cell lines, we extracted the underlying genes and used the GEPIA2 tool (39) to analyze the expression of these gene sets in TCGA breast and lung cancer patient data (Fig. 4d, 4e, and 4f; left panels). We found cases in which the gene expression levels change concordantly with the compartment. The results show that for the regions exhibiting lung permissive changes, the switch in compartmental identity is associated with the gene expression change from normal to tumor. That is, for BA_AA regions, the breast tumor expression of these genes is higher than normal breast tissue expression, coinciding with higher expression in the lung (Fig. 4d and f). Conversely, for AB_BB regions, the breast tumor expression of genes in these regions is lower than for normal tissue, corresponding to lower expression in the lung (Fig. 4e).

Organotropic 3D genome changes match target organ more than an unrelated organ

To verify that “lung permissive changes” truly occurred more than expected in lung metastatic cells, we used a different organ system-brain and brain cancer-as a negative control. Breast cancer can metastasize to brain(40,41), but the specific metastatic cell lines for which we have data are not metastatic to brain. Therefore, we would expect fewer “brain like” compartment changes in metastatic breast cancer that metastasizes to lung. Before comparing percent permissive changes between organs, we developed an adjusted version of the calculation to account for differing baseline probabilities of compartment change at the distant organ site (SFig. 2, Methods). The adjusted calculation still results in a higher level of lung permissive changes in lung metastatic breast cancer cell lines as compared to localized (SFig. 3d). For the brain comparison, we used the NHA cell line (normal human glia) and the SF9427 cell line (glioblastoma) as our reference models (SFig. 3a and 3b). We then calculated the adjusted level of brain permissive changes shown by localized and metastatic breast cancers (SFig. 3c). We found that the metastatic breast cancer cells used in this study showed lung-permissive changes more frequently than brain-permissive compartment changes (SFig. 3c and d). This is congruent with the fact that these metastatic breast cancer cell lines were all isolated from lung pleural effusions. We note that Hi-C data for brain metastatic breast cancer cell lines is not currently available for comparison.

Brain metastatic prostate cancer cell lines exhibit organotropic 3D genome changes toward brain

The analysis above suggests that lung metastatic breast cancer cell lines show more lung permissive than brain permissive genome organization changes. Next, we asked whether a cancer type that metastasizes to the brain exhibits brain-permissive compartment changes. For this purpose, we chose prostate cancer cell lines derived from localized (lymph node) and brain metastatic sites (LNCaP and DU145), and compared their genome architecture to the non-tumorigenic prostate epithelial cell line RWPE (SFig. 4a, b). We found that the brain metastatic cells showed more brain-permissive changes than the one derived from a lymph node (SFig. 4c). Also, particularly in the B to A direction, there were greater brain permissive changes in this prostate cancer system (Fig. 4c) than in the breast cancer system (SFig3c). This confirms that a cancer type that metastasizes to the brain matches the brain compartment profile more often than a cancer type that does not metastasize to the brain. We further found that the genes in the switched compartment regions have functions that could relate to leaving the prostate epithelium and adapting to the brain environment (SFig. 4d). For example, the DU145 (brain metastatic) brain-permissive B to A switched genes were enriched for functions in regulation of nervous system development (including semaphorin genes SEMA5A, SEMA3E, SEMA3A) and regulation of canonical WNT signaling pathway (such as WNT2, FZD4). Conversely, the regions that switch from A to B to match the brain B compartment and are enriched for keratinization and cell adhesion, suggesting a downregulation of prostatic epithelial properties.

Discussion

During metastasis, breast cancer cells spread preferentially to certain organs, a series of events widely known as metastasis organotropism. This phenomenon, also shown by other types of cancer, may be explained, to some extent, by the ‘seed and soil’ hypothesis which states that cancer cells will be able to colonize a distant site if the cells have intrinsic ability to survive in the microenvironment of the secondary site. Though the process of metastasis is complex and involves other factors, the primary tumor microenvironment conveys survival capabilities to cancer cells that can potentially favor survival in other organs; for this to occur, the cancer cell must engage specific gene expression signatures. However, there are many open questions related to the rise of organ-specific survival traits. Are these gene signatures a result or consequence of genome organization changes? What is the role of genome rearrangements in metastasis organotropism for which there are no significant gene expression alteration? Does the genome organization of the metastatic cell become similar to that of the cells at the distant site? In this work, we have made a step toward answering these questions by analyzing compartmental organization data of breast cancer cell lines derived from lung metastasis in comparison to lung cell lines.

Unsupervised clustering of compartment organization data showed that compartment organization can distinguish cancer cell lines by subtype, providing the first indication that lung metastatic cell lines may show similarities to lung cell genome organization. For example, the triple-negative invasive metastatic breast cancer cell MDA-MB-231, clusters close to the lung carcinoma cells (A549 and H460). The clustering result supports the conclusion that the breast cancer subtypes are not only quite different at the gene expression level, but also at the level of genome organization. Furthermore, it also shows that the genome organization is highly consistent among luminal subtype cancers, irrespective of the tissue from which the cells were collected, while the other subtypes such as triple negative, were more heterogeneous in their structure. This is congruent with studies involving gene expression which have shown that the triple-negative breast cancer category is highly heterogeneous, encompassing six subtypes (42).

When we compared the compartmental organization changes shown by the localized and lung metastatic breast cancers to the compartment state of the lung, we found that, for luminal and triple-negative subtypes, metastatic cells exhibit higher percentages of lung permissive changes compared to localized cancers. Given that these cancer cell lines were derived from lung pleural effusions, we cannot tell whether these metastatic cells acquired a lung-like compartmental organization at the primary tumor site or later while adapting to the metastatic site. To answer this, we would need temporal genome organization data of metastatic progression, which is not currently available. For the lung permissive chromatin conformation changes, we find that shifts toward the A compartment are favored compared to the B compartment. It may be possible that shifts toward the A compartment are predominant because both activation and gene repression are possible once a region is in the A compartment while a shift to the B compartment is not necessary to accomplish gene repression (35,43,44). Alternatively, this preference for shifts toward an A compartment could also be related to the general loss of heterochromatin previously observed in cancer (45). Indeed, we found that genes in these spatially reorganized regions, in patient samples, tend to change expression concordantly with their compartment changes. Overall, our results are consistent with the idea that genome compartmentalization can play a role in facilitating organ-specific metastasis.

While our gene expression and compartment analyses echo the current dogma that 3D genome compartments correlate with gene expression changes, further analysis revealed that gene expression and compartment signatures across breast cancer types surprisingly capture two orthogonal sets of genomic regions. We found that compartment profiles group breast cancer cell lines according to their epithelial vs. mesenchymal states (46). This observation becomes much more visible in the PCA transformed data as the cells are arranged along an epithelial-mesenchymal spectrum. While epithelial vs. mesenchymal states have been clearly associated with gene expression, it is also becoming clear that these states exhibit characteristic genome architectures as well. Indeed, a recent study shows that genome organization changes during epithelial-mesenchymal transition in ovarian cancer system, by considering some representative epithelial and mesenchymal cell lines (47). Here, we extend this conclusion for another major cancer, the breast cancer system by including a spectrum of breast and lung cancer and normal cell lines. Across this cohort, we find not only that both 3D genome architecture and gene expression changes occur during EMT, but also that these changes happen in different regions. The 3D genome changes that most strongly capture the differences across the epithelial and mesenchymal spectrum occur at regions where gene expression is not notably distinct between epithelial and mesenchymal cells. Likewise, the signature gene expression changes occur in regions that tend to be in the A compartment across all cell types. This suggests that many genes key to EMT are regulated by local factors in the A compartment, while major spatial shifts in chromosome regions may relate more to changing the likelihood of future inducibility of genes or alterations in the physical/mechanical properties of the cell during EMT, which can affect chromatin state (48,49). Overall, our results suggest that genome spatial compartment changes can help encode a cell state that favors metastasis (EMT) as well as enabling survival in a new organ context (lung or brain).

Methods

Hi-C data processing

To process Hi-C sequencing reads, we used the HiC-Pro (https://github.com/nservant/HiC-Pro) pipeline with default parameters (50). The reads were aligned to the hg19 reference genome. Also, for each cell type, all the available reads from all replicates were merged together to improve the quality of contact data for further downstream processing. After obtaining the valid pair interactions, we binned the interactions at 250 kb resolution for each individual chromosome and normalized the matrices using the ICE technique (51). Following that, the number of normalized contacts of each chromosome was scaled to 1M using ‘scaleMatrix.pl’ script from the cworld-dekker (https://github.com/dekkerlab/cworld-dekker) tool suite. And finally, in order to get the compartmental organization for each individual chromosome, principal components analysis was performed on the normalized and scaled matrices using the ‘matrix2compartment.pl’ script from the cworld-dekker tool suite. The compartment analysis assigns A/B compartment identity to genomic regions based on the sign of the elements of the eigenvector (A-positive and B-negative).

RNA-seq data processing

The RNA-seq files were processed as per the steps described in the previous publication (35). Briefly, the BBDuk (https://github.com/kbaseapps/BBTools) tool was used first for adapter and quality trimming (52). After that, the reads were mapped to hg19 human reference genome using STAR aligner (RRID:SCR_004463) (53). Once aligned reads were generated, HTSeq-Counts (https://github.com/simon-anders/htseq) tool was used to perform gene level feature count (54). As the RNA-seq files were obtained from various sources and generated using different experimental techniques, they suffer from the batch effect which masks the biological signal. To remove the batch effect, ComBat-seq (https://github.com/zhangyuqing/ComBat-seq) tool was applied on the raw gene counts data (55).

Hierarchical clustering of genome-wide compartmental organization

To cluster different cell types based on their compartmental organization, hierarchical clustering was performed on the genome-wide compartment identity data. First, the pairwise similarity score between all the cells were calculated using Pearson’s correlation. The similarity data was then further converted to distance data by subtracting from 1. And finally, hierarchical clustering was performed on the distance data using ‘ward’ linkage.

Gene set variation analysis (GSVA)

GSVA was performed to compute the enrichment scores for individual cell lines using RNA-seq data (56,57). A list of 232 epithelial signature genes, and a list of 193 mesenchymal signature genes were used to compute the E enrichment score (E score) and M enrichment score (M score) respectively (30,31,58).

Non-negative principal component analysis (nnPCA)

We performed nnPCA using the gene sets mentioned above. nnPCA provided higher resolutions for large number of samples in the EMT spectrum compared to GSVA according to previous studies (31,59). nnPCA determines the approximately orthogonal axes with non-negative coefficients (loadings) for features (genes). Variances of projections of data points (cell lines) onto these axes are maximized via an optimization method (60).

Correction for comparing organ permissive changes across different organs

It is possible that while comparing the host organ permissive changes across different secondary organs, the compartment switch ratio of different secondary localized cancers might include inherent bias to the comparison. To eliminate the bias, for the secondary organ systems, we first calculate the probability of a genomic region to be in a specific compartment in localized cancer given that the region belongs to the same compartment in normal cell (SFig. 3b and SFig. 4d). For the A compartment, p(TA|NA) represents the probability of a region in A compartment in localized cancer cell given that region also belongs to A compartment in normal cell. Similarly, p(TB|NB) for the B compartment. If we have more than one cell line representing the secondary site localized cancer, the minimum probability across all the cells is considered. Once we have those, the probabilities are subtracted from one and the resultant values are multiplied with corresponding secondary organ specific percentages. For example, (1-p(TA|NA)) is multiplied to the percentage of BA_AA and (1-p(TB|NB)) to the AB_BB.

Data availability

Datasets used for this project were obtained from NCBI GEO, EMBL ENA, 4DN Data Portal and ENCODE repositories and their accession IDs are given in Supplementary Tables 1 and 2.

Acknowledgements

The authors would like to thank all the members of McCord lab for their fruitful suggestions in developing the research project. This work was supported by the National Institutes of Health NIGMS grant R35GM133557 to R.P.M and American Cancer Society postdoctoral fellowship 134060-PF-19-183-01-CSM to R.S.M. T.H. is supported by the National Institutes of Health NIGMS grant R35GM149531.

Additional information

Disclosure statement

The authors declare no competing interests exist.

Author contributions

P.D. conceived the study in collaboration with R.P.M and R.S.M., performed data analyses, and prepared figures. R.S.M. performed the grouping and ordering of cancer cell lines into biological subtypes and interpreted the biological enrichment analysis. T.H. performed EMT gene signature analyses. R.P.M. supervised and guided the work. P.D. and R.P.M. wrote the manuscript with input from all authors.