Introduction

Robust regulation of gene expression is essential for normal cell physiology and embryo development. Variations in expression levels or ectopic activation at inappropriate times or locations can significantly affect cell phenotypes, fitness, survival, and differentiation [13]. Gene misregulation can be related to different developmental defects and disease phenotypes [4, 5]. Despite the need for strict control, single cell imaging has revealed random fluctuations (noise) in promoter transcriptional activity over time [6, 7]. Several studies have characterized transcriptional noise in the expression of developmental genes across organisms [815], and it is important to understand how this noise in single cells affects overall spatial expression patterning and subsequent development.

Gene expression is characterized by transcriptional bursts: intermittent periods of promoter activity followed by periods of inactivity in which transcripts are not produced [1626]. Randomness in switching between active and inactive promoter states has been shown to be a key driver of intercellular variability in gene product levels across isogenic cell populations [2737]. By analyzing the properties of transcriptional bursting in various genetic backgrounds, we can better understand the mechanisms for regulating the levels of gene products [16, 38, 39].

We present a comprehensive analysis of the transcriptional bursting properties of reporter genes driven by various regulatory enhancers and endogenous genes in early Drosophila embryos. We analyze publicly available live imaging data where single-cell transcriptional activity is tracked as a function of time and position in the embryo [40,41]. Each gene is labeled with the 24x MS2 sequence, which allows visualization of nascent transcripts in each cell with a fluorescent marker (MCP-GFP), allowing high temporal resolution to investigate the dynamics of bursting [42, 43]. We develop a simple yet robust algorithm to estimate promoter transcriptional states based on the accumulation and decay of individual fluorescence trajectories. These inferred promoter states are then used to quantify the spatiotemporal statistics of transcriptional bursting across the gene expression domain.

Among the properties of transcriptional bursting studied, we consider the burst duration (ON time), the time between successive bursts (OFF time), the rate of nascent mRNA transcription (loading rate), and the span from the first to the last burst (activity time). To examine how parameters such as spatial heterogeneity, different genes, or different enhancers affect burst dynamics, we selected enhancers of Krüppel (Kr) and rhomboid (rho) genes. The mean transcriptional activity of these genes drives expression that follows a graded pattern along the anterior-posterior and dorsoventral axes of an embryo, respectively [44, 45]. Interestingly, we find that the mean ON and OFF times are homogeneous across the expression domain, not showing the expression gradient observed in the average transcription activity. This heterogeneity can be mainly explained by differences in activity time, suggesting that different cells have different durations of active transcription but similar bursting properties once activated.

We compare two redundant sna enhancers (sna shadow and sna proximal), which drive spatially homogeneous ventral expression, revealing distinct bursting characteristics and indicating that enhancers can modulate burst timing statistics [46]. Furthermore, we study the endogenous even-skipped (eve) gene, with its seven-strip spatial pattern [47, 48]. Although eve shows homogeneous burst duration, we also found spatially varied activity time and inter-burst duration, aligning with observations from the Kr and rho constructs.

Results

This study characterizes the spatiotemporal dynamics of key genes involved in Drosophila embryonic development. We analyze gene expression dynamics during the 14th nuclear cycle (NC14), which corresponds approximately to a time window of 2-3 hours after fertilization. We use the enhancers of rho, Kr, and sna, which drive transcription of the MS2 reporter genes that recapitulate endogenous gene expression [21]. We also analyze the transcription of the endogenous eve gene (see Methods). All of these genes play an important role in the formation of body patterns in Drosophila [49].

Spatio-temporal properties of transcription bursts

The experiments analyzed (see Methods) consist of confocal microscope images of living Drosophila embryos (Figure 1A). These images discern fluorescence dynamics from individual cells, allowing single-cell resolution analysis along the anteriorposterior and dorsoventral axes of the embryo (Figure 1B). This fluorescence trajectory corresponds to transcription dynamics captured using the MS2/MCP system (Figure 1C). In this system, the number of nascent mRNA molecules is proportional to the fluorescence signal (Figure 1D, top panel). This trajectory reveals different periods of intense activity (referred to as bursts) with stochastic duration indicated by τON and intervals between successive bursts denoted as τOFF . To estimate the transcriptional state, we developed an algorithm (see Methods) based on the finding that the fluorescence signal increase (during ON state) and decrease (during OFF state) is approximately linear over time (Figure S1), with relatively similar slopes (Figure S2). In Figure 1D bottom, we show an example of an inferred transcription state trajectory from the fluorescence signal (see Supplementary Information Section S3 and Figure S3 for technical details of the inference algorithm).

The MCP/MS2-mediated live imaging system enables the visualization of transcriptional activity at the single-cell level in living Drosophila embryos.

(A) Representative snapshot of MS2-based imaging, where endogenous eve is tagged with MS2 to visualize the spatial expression pattern in embryos. Regions of the embryo (ventral, dorsal, anterior and posterior) are provided. (B) A single cell in the embryo showing GFP fluorescence with intensity assumed to be proportional to the number of nascent mRNAs. (C) Schematic showing how the nascent transcripts are tagged with fluorescent proteins using the MS2/MCP system. (D) (Top:) Representative snapshots of a single nucleus producing transcripts (green puncta) over time. (Bottom:) Fluorescence dynamics over time (green line) and the inferred transcriptional state (red line). From the inferred state, we estimate the duration of active (ON) states (τON ) and the duration of inactive (OFF) states (τOFF ). The black dashed line shows the burst threshold: the level where any sudden increase from the base level is considered as a burst (see Methods). Bracket in the top of the trajectory represents the activity time: the time between the beginning of the first burst and the end of the last one.

The fluorescence intensity averaged throughout the nuclear cycle reveals a distinct spatial pattern in the embryo (Figure 1A). Here, we relate this mean fluorescence to the statistics of τON and τOFF . In addition to τON and τOFF , our algorithm also estimates the loading rate λ*, which is defined as the signal increase rate that best predicts the signal dynamics (see Methods) and is specific for each cell. Finally, we also estimate the time between the beginning of the first burst and the end of the last burst, which is called the activity time. We will study how these different burst properties vary spatially across the embryo, and which properties have the best concordance with the single-cell averaged mean fluorescence level.

The activity time as an important contributor to the control of gene expression

We first analyze the burst statistics of the embryos expressing rho and Kr genes. rho shows a graded expression along the dorsoventral (vertical) axis and Kr shows a gradient along the anterior-posterior (horizontal) axis (see Figure 2A and 2G) [44, 45]. rhoNEE enhancer that drives the neuroectodermal expression of rho is used to drive the reporter gene MS2-yellow expression [45]. On the other hand, the expression pattern of Kr is driven by the enhancer Kr CD2 [50].

Heterogeneous spatial expression patterns for enhancer rho and Kr.

(A,G) Gene expression driven by the rho (A) and Kr (G) enhancers which exhibit a dorsoventral and anterior-posterior gradient, respectively. The yellow box indicates, approximately, the region imaged by live imaging. (B,H) Representative fluorescence trajectories (green) and the inferred transcriptional state (red) for rho (B) and Kr (G). Fluorescence signals are normalized to their maximum. Activity time is represented by black brackets. Dashed horizontal line represents the burst threshold. (C,I) Average fluorescence intensity, (D,J) cell averaged τON , (E,K) average τOFF , and (F,L) the activity time. The position of each point represents one cell in the beginning of NC14 and its color represents the mentioned average burst property. Cells presented in (B,H) are highlighted in red in (C-F) and (I-L) respectively. Marginal plots show the mean variables calculated with a spatial binning. The error bars show the 5% and 95% quantiles, while the length of the boxes show the 25% and 75% quantiles for each bin. Black dots represent the mean value. The means are connected by lines to show the spatial trends. Marginal plots also present the color scale at their edges used in the central scatter plot.

Our initial hypothesis was that gene expression can be directly associated with the frequency and duration of transcriptional bursts. This means that in regions with low or high mean gene expression, we expect less or more frequent bursts, respectively. To verify the validity of this hypothesis, we plot the spatial patterns of the average fluorescence intensity of rho (Figure 2C-F). As examples of observed dynamics, four representative nuclei from the gene expression domain were selected and their transcriptional trajectories were plotted (Figure 2B). In Figure 2C, we show a scatter plot with each dot representing one cell. The position of each dot represents the position of the respective tracked cell at the beginning of NC14. The color of the dots represents the fluorescence level averaged throughout NC14. To better visualize the spatial gradient of the burst properties, we also made marginal plots by grouping the cells into bins along the dorsoventral and anterior-posterior axis and calculating the statistics for each bin.

Figure 2D shows spatial patterns of the cell-averaged τON . We excluded cells that did not show fluorescence levels above the burst threshold, as it is not possible to estimate the average τON . The marginal plots in Figure 2D show the variation of the cellaveraged τON for the selected bins. These bins reveal that the mean τON for multiple regions of the embryo is constant, with a value of about 1 min. Similar spatial trends for τOFF are presented in Figure 2E. Marginal plots show that the mean τOFF along the axes also shows a low variation throughout the embryo. In this case, the average τOFF is around 3 min.

The finding that the cell averaged τON and τOFF is spatially uniform contradicts the hypothesis of a non-homogeneous gene switching rate. To explain pattern formation in the selected genes, we noticed, as presented in Figure 2B, that the dynamics of gene expression can be simplified as periods of activity. During these long periods, multiple bursts of similar duration occur at an approximately constant rate. This motivated us to define the activity time as the span from the first burst to the final one. The duration of this activity time is marked with a bracket on the signal trajectory in Figure 2B.

We notice that spatial gene expression corresponds better to the patterns of activity time (Figure 2F) than to the cell-averaged τON and τOFF . This suggests that the activity time better reflects the spatial heterogeneity of gene expression. This is confirmed by the Kr construct, where mean fluorescence shows a similar spatial pattern to activity time (Figures 2G-2L). Although Kr has different mean fluorescence levels along the embryo axis compared to rho, the average τON and τOFF in Kr are relatively homogeneous with mean durations similar to those in rho. Our analysis indicates that the graded expression pattern results from the modulation of activity time rather than bursting properties.

Different enhancers can modify the time between successive bursts

Enhancers are short non-coding DNA sequences that interact with the target promoter to drive transcription (Figure 1C). Particular enhancers can increase the likelihood of transcriptional initiation by containing binding sites for transcription factors and other proteins that promote transcription [51]. We now ask whether different enhancers of the same gene can affect the burst-timing statistics. We studied the sna gene, which has a broad expression domain on the ventral side of an embryo, as shown in Figure 3A. sna is regulated by two enhancers, the proximal enhancer (snaPE ) that is located right upstream of the transcription start site (TSS) and the distal or shadow enhancer (snaSE ) located 8kb upstream of the TSS [46]. Both enhancers are known to drive sna expression in the same domain at the same time, but snaPE drives weaker expression than snaSE (Figures 3B and 3G) [52].

Spatial patterns of sna gene expression for different enhancers: shadow (SE) and proximal (PE)

(A) Image of the sna expression across the embryo. The yellow box indicates, approximately, the region imaged. (B, G) Representative gene expression trajectories normalized to the maximum (green) and the inferred transcriptional state (red) for snaPE (B) and snaSE (G). Activity time is represented by black brackets. Dashed horizontal line represents the burst threshold. These cells are highlighted in red in (C-F) and (H-K). (C, H) Spatial trends of average fluorescence, (D, I) average τON , (E, J) average τOFF , and (F, L) activity time. Marginal plots show how the mean changes in different spatial bins of the expression domain. Bar length represents the 25-75% interquantile distance and error bars, the 5-95% interquantile distance.

Comparing the bursting properties of snaPE and snaSE, we find some similarities with the rho and Kr genes. For example, τON and τOFF have low spatial heterogeneity and the activity time is highly correlated with the average fluorescence (Figures 3C-F and 3H-K). However, some differences are also observed. For example, the mean fluorescence intensity is higher for the SE enhancer than for the PE enhancer, although both regulate the same gene (Figures 3C and 3H). This difference can be explained by a shorter average τOFF for the SE enhancer. Since snaSE is considered a stronger enhancer than snaPE, producing more mRNA at a given duration, our results suggest that snaSE can drive stronger expression by spending less time in the OFF state, which can also explain similar results found previously [40]. Our analysis indicates that different enhancers of the same gene can induce different bursting behaviors and therefore regulate the level of mRNA production differently.

Statistical analysis of transcriptional bursting events

To better understand the statistical properties of the burst timing variables, we collect all the values of τON and τOFF for different replicate embryos and study their statistical distribution without discrimination by cell or position. More specifically, we quantify the mean τON and τOFF (denoted by ⟨τON⟩ and ⟨τOFF⟩, respectively). The random variability in τON and τOFF is measured by the squared coefficient of variation (the variance over the mean squared) and represented by and , respectively. These statistics are estimated over all cells in the embryo during NC14 for 3 different embryos (biological replicas) per construct.

While τOFF follows an exponential distribution, τON has a hypo-exponential distribution

The analysis of ⟨τON ⟩ and ⟨τOFF ⟩ is presented in Figure 4A. There, we show that ⟨τON ⟩ ≈ 1 min while ⟨τOFF ⟩ ≈ 3 min for Kr, rho, and snaPE. The main outlier corresponds to snaSEτOFF⟩ ≈ 2 min. These results show that the properties of the burst dynamics are consistent not only for different replicas, but also among different genes. Regarding the random variability of the time intervals, Figure 4B shows how while in all replicas and genes, for Kr, rho, and snaPE. The main outlier, snaSE, shows a variability of . As a reference for the interpretation of this metric, we recall that the coefficient of variation for an exponentially-distributed random variation is 1 (dashed line in Figure 4B).

Statistical properties of transcriptional bursting timing across different constructs.

(A) Mean duration of τON and τOFF for the four studied enhancers, each with three replicas. (B) Random variability of τON and τOFF measured by the squared coefficient of variation and respectively. The dashed line represents the coefficient of variation of one for an exponentially-distributed random variable. (C) Probability density for τON for the sna enhancers. The PE enhancer (left) is compared to the SE enhancer (right). The mean and squared coefficients of variation are presented with the 95% confidence interval. The solid line represents the best fit to a gamma distribution. (D) Analysis similar to (C) but for τOFF . The resolution of τON and τOFF is 20 s.

To better visualize the interpretation of the metrics of and , we plot the histograms of τON for snaPE and snaSE (Figure 4C). For these enhancers . This means that the distribution of τON is hypoexponential, i.e., with less random variation than an exponential distribution (which has a coefficient of variation of 1). We also plot the histogram of τOFF in Figure 4D for the same enhancers. τOFF for snaPE has a mean of approximately 2.8 min and a noise of similar to the exponential distribution. snaSE, on the contrary, shows a shorter ⟨τOFF ⟩ ≈ 1.7 min and a lower variability .

Correlation between burst variables an mean fluorescence

In this section, we explicitly examine how the cell-averaged fluorescence relates to four other variables: cell-averaged τON , cellaveraged τOFF , activity time and optimal loading rate λ* (equation (3) in Methods). We gather data from three biological replicate embryos for each construct and observe the trends between these variables.

We observe that activity time increases monotonically with increasing mean fluorescence intensity, reaching saturation at high levels of mean fluorescence (Figure 5A). On the other hand, cellaveraged τON , cell-averaged τOFF , and λ* show weaker correlations with mean fluorescence. Cell-averaged τON increases slightly with increasing mean fluorescence, validating our simplification that τON is essentially constant across transgenic constructs (Figure 5B). The cell-averaged τOFF shows a slightly decreasing trend with mean fluorescence, although the trend is weaker compared to the relationship between activity time and mean fluorescence (Figure 5C). Relatively longer periods of τOFF are observed for nuclei with lower gene expression. Finally, there is almost no correlation between the loading rate λ* and the mean fluorescence (Figure 5D).

Statistical properties of the transcriptional bursts with respect to the mean fluorescence for different enhancers.

We plot the trend of different signal statistics such as (A) activity time, (B) mean ON state duration, (C) mean OFF state duration, and (D) mean loading rate (the rate of signal increase) relative to the mean fluorescence. Each dot in the plots represents the mean value of a single cell obtained from its fluorescence trajectory during NC14. These trends suggest that the activity time aligns better with mean fluorescence.

The statistics of the eve gene expression

Until now, the data analyzed to estimate the burst statistics correspond to transgenic constructs. Next, we performed a similar analysis of the transcriptional burst statistics of eve, an endogenous gene with a more complex spatial pattern of expression. The average fluorescence level of eve has a pattern of seven stripes during early embryonic development (Figure 6A) [47]. These stripes are the result of interactions between eve and gap genes defining the position of specific para-segments that set the stages for the final segmentation of the embryo body [53]. As observed in Figure 6A, we analyze transcription trajectories of the nuclei corresponding to stripes 2-5. Examples of typical fluorescence trajectories of eve expression can be seen in Figure 6B showing, in general, a lower average intensity than the synthetic constructs (Kr, rho and sna) but a similar basal intensity. Regarding the timing statistics, we observe low spatial variability on the cell-averaged τON and τOFF on the spatial domain of eve (Figure 6D and 6E). Furthermore, these mean times also show values similar to those of the synthetic constructs: ⟨ τON ⟩ ≈ 1 min and ⟨τOFF⟩ ≈ 3 min. Again, the activity time appears to align better with the average fluorescence pattern (Figure 6F).

Spatial patterns of endogenous eve gene expression.

(A) Expression pattern of the eve gene. The yellow box indicates, approximately, the region imaged. (B) Expression trajectories normalized to the maximum (green) and the state of the inferred gene (red). Activity time is represented by black brackets. The horizontal dashed line represents the burst threshold which is the same as used in the constructs. These cells are highlighted in red in (C-F). (C) Spatial trends of average fluorescence, (D) average τON , (E) average τOFF , and (F) the activity time in the studied embryo domain. Bar length represents the 25-75% interquantile distance and error bars, the 5-95% interquantile distance.

Figure 7 presents a more detailed analysis of the relationship between mean fluorescence and burst properties for the eve gene, based on data from six biological replicates. As with synthetic constructs, cell-averaged fluorescence shows a strong alignment with activity time (Figure 7A), but only a weak correlation with cell-averaged τON (Figure 7B) and loading rate (Figure 7D). Cell-averaged τOFF shows a stronger decrease with mean fluorescence compared to the one observed in synthetic constructs (Figure 7C). The mean burst duration for eve shows relative consistency over replicas with similar mean (Figure 7E) and (Figure 7F). However, τOFF has slightly less consistency on the mean ⟨τOFF ⟩ while the random variability was hyperexponential .

Properties of transcriptional burst timing for the expression of endogenous eve gene.

Plots of average burst properties with respect to single-cell averaged fluorescence level (A) activity time. (B) Mean τON , (C) Mean τOFF and (D) Mean loading rate. Each dot in these plots corresponds to a cell and the statistics are calculated over the NC14 time-span. Error bars represent the binned mean trend with 95% confidence. We also present the statistics of the burst timing estimated using all bursts and all cells for 6 different biological replicas: (E) mean burst duration ⟨τON ⟩, (F) variability on burst duration , (G) mean time between successive bursts ⟨τOFF ⟩ and (H) variability in the time between bursts . Horizontal dashed lines in (F) and (H) correspond to a coefficient of variation of 1 for the exponential distribution. Error bars represent the 95% confidence interval.

The relatively large observed in eve is mainly due to the presence of abnormally long inter-burst times, often exceeding 12 min (Figure S4A). To validate this concept, we reanalyzed the burst statistics, focusing exclusively on cells with inter-burst times of less than 12 min. After excluding these cells, all burst properties align with those observed in synthetic constructs: ⟨τON ⟩ ≈ 1 min and ⟨τOFF ⟩ ≈ 3 min, while and (Figure S4C-D). The comparison of the burst properties versus the mean fluorescence after this filtration is also presented in Figure S4E showing a clearer trend in activity time versus fluorescence and a flatter relationship between the mean τOFF and the mean fluorescence. We believe that these long periods are due to the eve regulation by multiple stripe-specific enhancers, in contrast to transgenic constructs, which typically correspond to the effects of a single enhancer. The complex interaction between these enhancers likely contributes to increased time variability while maintaining a similar spatial pattern of ⟨τON⟩ and ⟨τOFF⟩ across stripes. In addition, it is known that eve stripe expression shifts anteriorly during NC14, such that some cells expressing eve at the posterior boundary of a stripe early in NC14 become transcriptionally inactive later [41]. For example, a cell located between stripe 1 and stripe 2 may initially express genes early in NC14 due to the activity of stripe 1 enhancers. Midway through NC14, it becomes transcriptionally inactive and later in NC14, as the expression domain shifts, it begins to express genes regulated by stripe 2-specific enhancers. This trajectory results in a long inactivity period between bursts, which contributes to higher variability. In fact, nuclei with inter-burst times of more than 12 min are mostly located at the edges of each stripe (Figure S4B).

In summary, our analysis suggests that the bursting properties of endogenous genes are comparable to those of developmental enhancers. These findings align with a previous study that characterized the bursting properties of eve using a reporter construct based on bacterial artificial chromosomes [48, 54]. Similarly to what has been observed in enhancers, the mean gene expression level for the eve gene is mainly aligned with changes in activity time despite showing a more complex spatial pattern (Figure 7A). We observed some cells with anomaly long interburst duration that biased the τOFF statistics. We hypothesized that these long periods are likely due to the coordinated action of multiple enhancers that collectively regulate eve expression.

Discussion

In this article, we aim to explain how the spatial pattern of mean gene expression in the early embryonic development of Drosophila emerges from the temporal dynamics of transcriptional burst activity at the single cell level. We analyze the dynamics of the amount of nascent RNA molecules measured using MS2 tagging. To infer the transcriptional state (active or inactive), we propose an algorithm based on the signal slope (see Methods). The simplicity of our algorithm helps us analyze thousands of fluorescence trajectories from individual nuclei with high throughput, providing reproducible results on inferred transcriptional states. Our analysis reveals that fluorescence accumulation and decay follow linear functions of time (Figure S1). Although simple, this dynamics requires a careful explanation using a model that considers a detailed description of RNA polymerase recruitment and elongation on the gene. Although this kind of modeling is beyond the scope of this article, we show that the process can be simplified by a two-state model as described by equations (1) and (2) in Methods to capture key transcriptional burst properties.

On the basis of the statistics of transcriptional bursts, it is possible to infer properties of the mechanism controlling the burst timing. There are known mechanisms that drive and regulate transcriptional bursting, such as chromatin dynamics [5559], 3D enhancer-promoter interactions [54, 6063], transcription factor binding [6468], mRNA polymerase pausing [69], and interactions between developmental promoter motifs and associated proteins [70]. Our observation of exponentially-distributed interburst timing (Figure 4B) suggests that burst initiation may be governed by a single rate-limiting step in these developmental enhancers. The observation of burst-duration timing to be hypoexponentially distributed as shown in Figure 4 B (that is, these times have statistical fluctuations less than an exponential distribution) implies a more complex transcriptional turn-OFF with multiple rate-limited steps. Findings of non-exponential promoter state durations have previously been made for other promoters in their OFF times [71, 72], and also for their ON times [73]. Further analysis of whether the durations of consecutive bursts are related (Figure S5) revealed weak correlations suggesting ON/OFF times being independent and identically distributed random variables as good approximations for the observed bursting behavior.

There is a rich body of literature connecting transcriptional bursting to fluctuations in levels of a given gene product [7484]. Borrowing this recent mathematical modeling framework, we explore the impact of non-exponential promoter switching times on the statistical fluctuations in gene expression levels. We linked transcription states (ON/OFF) to the generation and degradation of a gene product, with the time spent in each state itself being an independent and identically distributed random variable following an arbitrary distribution (see Supplementary Information Section S5). Our analysis shows that the noise in mRNA levels increases with the noise in the time spent in either of the ON and OFF states (Figure S6). These results imply that tight control of transcriptional burst durations, as seen in this study, will function to buffer stochastic variations in expressed mRNA levels. An intriguing finding is the consistency in the statistics of promoter ON/OFF times across cells and different genes suggesting common mechanisms controlling bursting timing. It is worth mentioning that many studies have suggested that the kinetics of bursting can vary under different conditions [48, 85, 86], and that the time between successive bursts was observed to be controlled by the gene enhancer. This control includes, as shown here, decreasing the mean duration and regulating its random variability to hypoexponential levels. In future research, we plan to modify the distance and position of the enhancer relative to the promoter to better understand the mechanisms of bursting regulation.

We found that one of the main contributors to the formation of the gene expression pattern is activity time (Figure 5), and there is also a weak dependence of the mean ⟨τOFF⟩ on transcription activity. This observation is similar to previous findings [85, 87]. The activity time and ⟨τOFF⟩ are related variables and their regulation can share similar mechanisms of action. Specifically, we found cells with isolated periods of long inactivity (>12 min) in between bursts suggesting that these cells can have more than one activity time. In recent work [88], monitoring the transcriptional activity of a promoter under the control of Polycomb proteins has shown a switch between an OFF state and a transcriptionally permissive state, with multiple bursts occurring during the permissive period. In this context, the activity time can be understood as the permissive period that is developmentally regulated to drive spatial expression patterning.

The observed importance of activity time in controlling gene expression patterns raises interesting questions about its molecular regulation. Since activity time represents the temporal window during which a gene undergoes bursting cycles, it probably reflects the duration of allowed promoter states for transcriptional activation. We hypothesize that binding of transcription factors (TFs) to cis-regulatory sequences could control this window of activity. For example, Dorsal, a transcription factor that activates sna and rho studied here, shows a graded expression along the dorsoventral axis of the embryo, showing its highest expression at the most ventral point and a gradual decrease in expression level [89]. The activity time of the Dorsal target genes (sna and rho), correlates with Dorsal level. Specifically, Dorsal target genes show a longer activity time in the ventral region compared to the more dorsolateral domain. Moreover, cooperative binding of multiple TFs at enhancers may create a stable microenvironment that persists for the duration of the activity time. Our observation that different enhancers (such as snaPE versus snaSE ) exhibit distinct activity time distributions supports this model, as these enhancers contain different combinations of TF binding sites that could affect the stability and duration of active transcriptional states.

In summary, we present a simplified model to capture the stochastic dynamics of transcription in single cells and propose high-throughput techniques to infer promoter states. Our results reveal that these burst timing statistics exhibit uniform spatial patterns within the embryo, with ON times showing tighter statistical fluctuations compared to the OFF times. Moreover, a major determinant of spatial expression regulation for different developmental enhancers is the activity time that is related to the permissive period of transcription.

Methods

Experimental details

MS2/MCP system is used to visualize nascent transcripts (Figure 1C). To build the MS2/MCP mechanism, 24 copies of MS2 sequences are inserted into the 5’ UTR or 3’ UTR of the target gene of interest. Upon transcription, MS2 sequences form a stem-loop structure, which can bind to maternally loaded MCP proteins fused with GFP. As a result, clustered MCP-GFP molecules at the site of active transcription can be detected as fluorescent puncta in each nucleus (Figure 1B).

The data analyzed in this article, for rho, Kr, and sna genes, correspond to the experiments carried out by the reference [40], where the regulatory enhancer of each gene drives the expression of the MS2-yellow reporter gene (Figure 1C). For eve, we used data from [41], where endogenous eve is tagged with MS2 using CRISPR-mediated genome editing. For the nonendogenous data, it has been observed that the dynamics of the reporter gene recapitulate the expression of the endogenous gene. All images are taken in embryos 2-3 hours after fertilization, corresponding to the 14th nuclear cycle (NC14). From each live imaging movie, we track the fluorescence intensity of the MS2 puncta in each nucleus for the duration of NC14 and use it as a measure of instantaneous transcriptional activity (Figure 1D).

Transcription state Inference

As shown in Figure 1D, a typical fluorescence trajectory reveals complex dynamics. We can segment the complete signal into periods of no activity when the fluorescence stays at a basal level (before the first burst and after the last one) and periods of activity when transcriptional bursting occurs. Our goal is to determine the precise moments at which these bursts occur which are related to the moment in which the fluorescence signal rises. To do so, we first explain the model on which the inference method is based and then explain the method itself.

Phenomenological modeling of the fluorescence signal dynamics

To simplify our analysis, we consider that the fluorescence signal dynamics is driven mainly by the transcriptional state. This state exists in two modes: ON and OFF. When the transcriptional state is ON, high transcriptional activity results in rising fluorescence levels (illustrated by dark green periods in Figure 1D). Conversely, during the OFF state, the absence of RNA polymerase recruitment causes the fluorescence signal to decrease or stabilize at its base level (light green periods in Figure 1D).

Given the possible transcriptional states (ON and OFF), we define the function ϕ(t ) as follows:

which quantifies the transcriptional state value. When the state is ON, we assume that the fluorescence levels increase at a constant rate. Similarly, when the gene is OFF, the fluorescence also decreases at a constant rate. This linear accumulation and decay approximation was concluded after a detailed analysis of fluorescence accumulation and decay trajectories in response to ON/OFF states, as presented in Figure S1. In addition to a good fit with linear functions of time, we observe that the slopes for decreasing/increasing dynamics have very similar mean absolute values (Figure S2).

In this context, we simplify the dynamics of the fluorescence signal assuming that the signal increases and decreases with the same absolute rate λ that could be different for each cell. Given the transcriptional state ϕ(t ), the fluorescence signal ym is modeled as per the following ordinary differential equation:

with the superscript m standing for the model-predicted signal. The burst threshold y is defined such that below this threshold, we assume that no bursting occurs and the transcription state is OFF (Figure 1). To determine y , we selected the maximum fluorescence level in cells that did not exhibit noticeable bursts. This value was found to be y = 0.1 (a.u.). Since basal fluorescence levels did not differ significantly between genes, we used the same threshold for all genes. Next, we will explain how, assuming (2), we infer the most probable λ and the corresponding transcriptional state for each cell.

Transcription state inference method

Since the data consists of sampling the fluorescence signal every Δt = 20s, we consider a discrete-time analog of (2) that follows

where t = {0, Δt , 2Δt , … }. Given experimental measurements yt , our objective is to infer the transcriptional state ϕt that best explains the data. The technical details of the defined algorithm are presented in Supplementary Information Section S3, and are briefly summarized here.

The inference method starts with the experimental trajectory yt, which is first smoothed to obtain the signal by convolving it with a triangular kernel function to not flatten the signal peaks too much (see Supplementary Information Section S3). Next, we propose a value for λ for all transcriptional bursts in a single nucleus. Given and λ, we first obtain the transcription activity ρt defined as

and is a quantity which estimates the rate of signal increase relative to λ. We perform a new round of smoothing of ρt with a square kernel function (see Supplementary Information section 3) obtaining the smoothed . In this way, the inferred transcriptional state ϕt is calculated as follows:

The resulting ϕt is replaced in (3) obtaining a trajectory that depends parametrically on the particular λ. Finally, the optimal λ, denoted as λ*, is defined as the loading rate for that specific nuclei that minimizes the mean square distance between the data yt and the resultant defined as:

The optimal λ* will predict the optimal ϕt by using (5) which is taken as the most accurate trajectory of the transcriptional state. Examples of best-fit trajectories ϕt and are shown in Figure S3D-E, respectively.

Data accessibility statement

Data set and data processing scripts are publicly available at https://doi.org/10.5281/zenodo.13208281 [90].

Acknowledgements

B.L. is funded by NIH R35 GM133425. A.S. and C.N. acknowledge support from NIH-NIGMS through grant R35GM148351.

Additional information

Author Contributions

  • Conceptualization: B.L. C.N., A.S.

  • Investigation: Z.V., C.N.

  • Data Curation: B.L., Z.V., C.N.

  • Formal Analysis: Z.V., C.N.

  • Funding Acquisition: A.S., B.L.

  • Investigation: C.N., Z.V., B.L., A.S.

  • Methodology: B.L., Z.V., C.N.

  • Resources: A.S., B.L.

  • Software: C.N., Z.V.

  • Supervision: A.S., B.L.

  • Visualization: C.N., Z.V.

  • Writing: C.N., B.L., A.S.

  • Writing – Review Editing: B.L., A.S.

Funding

National Institutes of Health (R35GM133425)

National Institutes of Health (R35GM148351)

Additional files

Supplementary Information