Transcriptional coregulation in cis around a contact insulation site revealed by single-molecule microscopy

Maciej A Kerlin
Ilham Aboulfath-Ladid
Julia Roensch
Chloé Jaubert
Aude Battistella
Kyra JE Borgman
Antoine Coulon

Institut Curie, PSL Research University, Sorbonne Université, CNRS UMR3664, Laboratoire Dynamique du Noyau, Paris, France
Institut Curie, PSL Research University, Sorbonne Université, CNRS UMR168, Laboratoire Physique des Cellules et Cancer, Paris, France

https://doi.org/10.7554/eLife.106678.1

Figures and data

Population-level data suggest local transcriptional coordination in response to estrogen.
(A) From GRO-seq measurements²² of nascent transcription over time in response to estradiol (E2) in MCF7 cells, the similarity of transcriptional time profiles between genes is assessed using the Pearson correlation coefficient (i.e. insensitive to differences in offset or scale of the time profiles) and plotted against their genomic distance as a two-dimensional histogram. (B) Same analysis using only gene pairs separated by a strong contact insulation site as measured by Hi-C²³.
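As a minimal illustration of why the Pearson correlation coefficient is insensitive to the offset and scale of time profiles, the sketch below compares hypothetical profiles. The arrays `gene_a`, `gene_b`, and `gene_c` are made-up values for illustration only, not GRO-seq data, and the helper `pearson` is an assumed name.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two time profiles."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean()  # removing the mean cancels any offset
    b_c = b - b.mean()
    # dividing by the norms cancels any (positive) scale factor
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))

# Hypothetical time profiles (arbitrary units), for illustration only
gene_a = np.array([1.0, 3.0, 6.0, 4.0, 2.0])
gene_b = 10.0 + 5.0 * gene_a                   # same shape, other offset and scale
gene_c = np.array([5.0, 4.0, 1.0, 2.0, 6.0])   # roughly opposite shape

r_ab = pearson(gene_a, gene_b)  # close to 1 despite offset and scale differences
r_ac = pearson(gene_a, gene_c)  # negative
```

Two genes with identical response kinetics but different expression levels therefore score as highly similar, which is the property the analysis in panel (A) relies on.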

Measuring nascent transcription of three adjacent genes at individual alleles.
(A) Hi-C contact map and insulation score (low values indicate strong contact insulation) at our genomic locus of interest. (B) MCF7 cell imaged in five colors after combined oligo-based DNA FISH and single-molecule RNA FISH (gray: DAPI). Five allele loci are visible in the DNA channel. In the RNA channels, bright spots colocalize with allele DNA loci, revealing ongoing bursts of nascent RNAs, while dim spots correspond to single RNAs (see Methods and Figure S1D-I). Shown are maximum-intensity projections of band-pass filtered images. Scale bar: 5 μm. (C) Quantification of nascent transcription of FOS, JDP2, and BATF, measured by FISH at individual alleles, following a 40-min treatment with 100 nM estradiol (E2) or vehicle control (DMSO), represented as number of nascent RNAs (top) and fraction of active alleles (bottom). (D) Same data, represented as joint distributions of number of nascent RNAs at individual alleles, for all three gene pairs, in uninduced (DMSO) and induced (E2) conditions.

Multilevel decomposition of transcriptional correlations.
(A) For each gene pair, the Pearson correlation coefficient between nascent RNAs at individual alleles is separated into two additive components, isolating the contributions of trans and cis effects. (B) The latter is further separated into two additive components, quantifying correlations due to burst co-occurrence and burst size correlations, respectively. Averages of 5 replicates (dots) are shown as bars, with standard errors as error bars. See Methods for statistical tests and p-values. (C) Our data and computational approaches yield a multiscale understanding of transcriptional correlation patterns, from population-level heterogeneity to allele-specific coupling and gene bursting dynamics.

Local genomic context and transcription at the FOS-JDP2-BATF locus.
Genomic context of our locus of interest described by publicly available data from MCF7 cells: (A) Hi-C contact map²³ and an insulation score (low score indicates strong insulation), (B) ChIP-seq for CTCF and ERα on E2-treated cells²⁹,³⁰, and GRO-seq over time after E2 treatment²². (C) Histograms of EdU signal measured in MCF7 cells, after a 48 hrs treatment with different drug combinations, followed by 8 hrs of EdU incorporation. Percentages of arrested cells are indicated. (D) Distribution of fluorescence intensity of spots detected within nuclei of MCF7 cells, in the DNA FISH channel, using 12,005 fluorescent oligonucleotide probes labeling our locus of interest. Vertical line: intensity threshold above which a spot is considered to be an allele locus. (E) Histogram of the number of allele loci per nucleus. (F) Distributions of pairwise distances (left) between allele loci or (right) between allele loci and RNA FISH spots in each of the three RNA FISH channels (FOS, JDP2, and BATF). Vertical line: 650 nm. (G) Distributions of fluorescence intensity of FOS RNA spots, separating spots within 650 nm of an allele locus (yellow) from the rest (gray). The latter being considered single RNAs, the peak of their distribution is used, here and thereafter, to normalize fluorescence intensities. Vertical line: intensity threshold for an RNA spot to be unambiguously brighter than a single RNA, used to classify alleles as transcriptionally active. (H) Distributions of the number of nascent RNAs at individual alleles for all three genes, in uninduced and induced conditions. Black curve: expected distributions if the genes were not undergoing transcriptional bursting, i.e. producing individual RNAs as random and uncorrelated events.

Mathematical approaches to decompose transcriptional correlations.
(A) Considering measurements of nascent RNAs on two genes at individual alleles in single cells, the covariance of such measurements and the covariance of their sums or averages within cells can be combined to deduce the covariance across all the possible trans permutations. Intuitively, the covariance of the cell-wise sums contains the products of these sums (i.e. X₁Y₁ + X₂Y₂ +…) which, when expanded in terms of single-allele measurements (e.g. (x₁ + x₂ + x₃)(y₁ + y₂ + y₃) + (x₄ + x₅)(y₄ + y₅) +…), contain all intra-cellular pairs (i.e. x₁y₁ + x₁y₂ + x₁y₃ + x₂y₁ + x₂y₂ + x₂y₃ + x₃y₁ + x₃y₂ + x₃y₃ + x₄y₄ + x₄y₅ + x₅y₄ + x₅y₅ +…), out of which the cis terms can be subtracted using the allele-level covariance (i.e. containing only the cis terms, x₁y₁ + x₂y₂ + x₃y₃ + x₄y₄ + x₅y₅ +…). See Methods. (B) The trans covariance captures extrinsic factors (cell-to-cell differences in cell cycle, availability of machinery…) and equals the allele-level covariance under the null hypothesis that genes X and Y at the same alleles do not correlate more or less than any other (non-cis) pair within cells. The residual hence captures intrinsic variability, i.e. effects that are specifically occurring in cis. Illustrated are the two extreme scenarios where the variability is dominated by extrinsic factors (left) or by intrinsic factors (right). (C) Our general approach can be used recursively on data sets with several hierarchical levels: the cell-wise aggregated data (sums or averages of allele-level data) can itself be aggregated further at the cell-type level, then at the tissue level, etc., each time decomposing further the origin of the variability. (D) Potential applications in a broader context include single-cell and/or spatial multi-omics data, which may include time (development, response to stimuli…) and experimental conditions, as well as ‘omics’ measurements other than gene transcription (e.g. scATAC, scCUT&Tag, …).
(E) The joint distribution of nascent RNA counts from two genes can be separated into four quadrants, indicating for each allele whether either gene is bursting or not. The total covariance of RNA counts can be written as the sum of the covariance within quadrants and the covariance of the average RNA counts within quadrants, which can themselves be expressed as a rescaled version of the contingency table and deviation terms. Separating out the contribution of the contingency table from the covariance of RNA counts characterizes burst co-occurrence, while all the other terms involve burst sizes.
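The first step described in panel (E), writing the total covariance as a within-quadrant term plus a covariance of quadrant averages, is an instance of the law of total covariance and can be sketched as follows. This is a minimal sketch only: the function name and the two activity thresholds are hypothetical, and the subsequent rewriting in terms of the contingency table and deviation terms (described in the Methods) is not reproduced here.

```python
import numpy as np

def quadrant_covariance_split(x, y, threshold_x, threshold_y):
    """Split Cov(x, y) into within-quadrant and between-quadrant parts.

    Quadrants are defined by whether each gene is above its (assumed)
    activity threshold at a given allele.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Quadrant label 0..3: 2 * (x bursting) + (y bursting)
    quadrant = 2 * (x > threshold_x).astype(int) + (y > threshold_y).astype(int)

    within = 0.0
    mean_x_q = np.empty_like(x)  # quadrant average of x, per allele
    mean_y_q = np.empty_like(y)  # quadrant average of y, per allele
    for q in np.unique(quadrant):
        mask = quadrant == q
        mx, my = x[mask].mean(), y[mask].mean()
        # Covariance inside the quadrant, weighted by the quadrant's frequency
        within += mask.mean() * np.mean((x[mask] - mx) * (y[mask] - my))
        mean_x_q[mask] = mx
        mean_y_q[mask] = my

    # Covariance of the quadrant averages across all alleles
    between = np.mean((mean_x_q - x.mean()) * (mean_y_q - y.mean()))
    total = np.mean((x - x.mean()) * (y - y.mean()))
    return float(total), float(within), float(between)

# Toy counts at four alleles, one per quadrant (illustrative values only)
total, within, between = quadrant_covariance_split(
    [0, 0, 4, 6], [0, 5, 0, 7], threshold_x=1, threshold_y=1
)
print(total, within, between)  # 3.0 0.0 3.0
```

With population (1/N) covariances the identity total = within + between holds exactly, so any further split into burst co-occurrence and burst size contributions can be built on these two terms.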

Covariance matrices.
(A) The covariances calculated across all the alleles measured from a cell population can be separated into two components, respectively capturing trans effects (i.e. cell-to-cell heterogeneity) and cis effects (i.e. (co)variations that are not explained by cell-wise effects). See Methods and Figure S2A-B. Each component corresponds to a covariance matrix, the values of which are shown in panel (B) with the same color code. (B) Trans and cis components of the variances and covariances of nascent RNA counts, in uninduced (DMSO) and induced (E2) conditions. Each bar is the average of n = 5 replicates, shown as points. Numbers below the bars indicate the contributions of the cis and trans components to the total (co)variance in DMSO conditions.
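The separation into trans and cis components can be sketched numerically as below. This is a hedged illustration, not the exact estimator of the Methods: it assumes deviations are taken from the population-wide means, that the trans covariance is the average product over all ordered pairs of distinct alleles within the same cell, and that the cis component is the residual. The function name and the toy data are hypothetical.

```python
import numpy as np

def cis_trans_covariance(cell_ids, x, y):
    """Return (total, trans, cis) covariances of allele-level measurements.

    cell_ids[i] is the cell that allele i belongs to; x[i] and y[i] are the
    nascent RNA counts of the two genes at that allele.
    Assumes at least one cell contains two or more alleles.
    """
    dx = np.asarray(x, dtype=float) - np.mean(x)
    dy = np.asarray(y, dtype=float) - np.mean(y)
    total = float(np.mean(dx * dy))  # cis pairs only: x_i with y_i

    # Cell-wise sums of deviations, X_c and Y_c
    _, inverse, counts = np.unique(
        np.asarray(cell_ids), return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    sum_dx = np.bincount(inverse, weights=dx)
    sum_dy = np.bincount(inverse, weights=dy)

    # Products of cell-wise sums contain all intra-cellular pairs;
    # subtracting the cis terms leaves only the trans pairs.
    all_pairs = float(np.sum(sum_dx * sum_dy))
    cis_pairs = float(np.sum(dx * dy))
    n_trans = int(np.sum(counts * (counts - 1)))
    trans = (all_pairs - cis_pairs) / n_trans

    return total, trans, total - trans

# Toy extreme case: alleles differ only between cells (purely extrinsic)
total, trans, cis = cis_trans_covariance(
    cell_ids=[0, 0, 1, 1], x=[1, 1, 3, 3], y=[1, 1, 3, 3]
)
print(total, trans, cis)  # 1.0 1.0 0.0
```

In this toy case the trans covariance equals the allele-level covariance and the cis residual vanishes, matching the null hypothesis that genes at the same allele correlate no more than any other pair within a cell. Passing a variable against itself (x = y) gives the corresponding split of a variance.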