Deep learning linking mechanistic models to single-cell transcriptomics data reveals transcriptional bursting in response to DNA damage

eLife Assessment

This study presents DeepTX, a valuable methodological tool that integrates mechanistic stochastic models with single-cell RNA sequencing data to infer transcriptional burst kinetics at genome scale. The approach is broadly applicable and of interest to subfields such as systems biology, bioinformatics, and gene regulation. The evidence supporting the findings is solid, with appropriate validation on synthetic data and thoughtful discussion of limitations related to identifiability and model assumptions.

https://doi.org/10.7554/eLife.100623.4.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Landmark
Fundamental
Important
Valuable
Useful

Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses

Exceptional
Compelling
Convincing
Solid
Incomplete
Inadequate

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife Assessments

Abstract
Introduction
Results
Discussion
Methods
Appendix 1
Data availability
References
Article and author information
Metrics

Abstract

Cells must adopt flexible regulatory strategies to make decisions regarding their fate, including differentiation, apoptosis, or survival in the face of various external stimuli. One key cellular strategy that enables these functions is stochastic gene expression programs. However, understanding how transcriptional bursting, and consequently, cell fate, responds to DNA damage on a genome-wide scale poses a challenge. In this study, we propose an interpretable and scalable inference framework, DeepTX, that leverages deep learning methods to connect mechanistic models and single-cell RNA sequencing (scRNA-seq) data, thereby revealing genome-wide transcriptional burst kinetics. This framework enables rapid and accurate solutions to transcription models and the inference of transcriptional burst kinetics from scRNA-seq data. Applying this framework to several scRNA-seq datasets of DNA-damaging drug treatments, we observed that fluctuations in transcriptional bursting induced by different drugs were associated with distinct fate decisions: 5′-iodo-2′-deoxyuridine treatment was associated with differentiation in mouse embryonic stem cells by increasing the burst size of gene expression, while low- and high-dose 5-fluorouracil treatments in human colon cancer cells were associated with changes in burst frequency that corresponded to apoptosis- and survival-related fate, respectively. Together, these results show that DeepTX enables genome-wide inference of transcriptional bursting from single-cell transcriptomics data and can generate hypotheses about how bursting dynamics relate to cell fate decisions.

Introduction

Cells must employ flexible regulatory strategies to determine their fate, including cell differentiation, survival, or apoptosis, in response to diverse external stimuli (López-Maury et al., 2008; Stadhouders et al., 2019). One key cellular strategy that enables both beneficial and detrimental functions is the stochastic nature of gene expression programs, arising from intricate biochemical processes and termed ‘noise’ (Raj and Oudenaarden, 2008; Balázsi et al., 2011; Losick and Desplan, 2008). Noise can arise at many steps—from stochastic transcription factor binding and chromatin remodeling to variability in RNA transcription and protein translation. Among these diverse sources, transcriptional noise sits at the very start of the central dogma, converting upstream molecular fluctuations into the mRNA outputs that drive all downstream regulatory and functional processes. Transcriptional noise originates from a discontinuous pattern, wherein mRNA production occurs through alternating active and inactive gene states (Eling et al., 2019; Tunnacliffe and Chubb, 2020). This phenomenon, known as ‘transcriptional bursting’, has been observed in numerous experiments across both prokaryotic and eukaryotic cells (Golding et al., 2005; Chubb et al., 2006; Larson et al., 2011). The burst kinetics, characterized by burst frequency (BF) and burst size (BS), describe the stochastic behavior of molecular interactions at the gene expression process, offering valuable insights into how cells encode and leverage variability. However, it remains an unresolved question how cells make fate decisions in response to external signals through genome-wide regulation of transcriptional burst mechanisms (Lammers et al., 2020; Rodriguez and Larson, 2020).

DNA damage is a crucial and prevalent factor contributing to cellular responses, inducing varying degrees of cytotoxicity and thereby influencing diverse cell fate decisions (Krenning et al., 2019; Su, 2006; Hafner et al., 2019). For instance, low cytotoxicity limits cell proliferation (van den Berg et al., 2019; Arora et al., 2017; Barr et al., 2017; Deng et al., 2023), whereas heightened cytotoxicity prompts cell differentiation and senescence (Müllers et al., 2014; Feringa et al., 2018; Toledo et al., 2008; Zhao et al., 2023). Furthermore, extreme scenarios of high cytotoxicity can cause cells to engage in the apoptotic process (Yousefzadeh et al., 2021; Carneiro and El-Deiry, 2020; Roos and Kaina, 2013; Zheng et al., 2018). Experiments have shown that DNA damage can disrupt the gene transcription process (Lans et al., 2019). Specifically, DNA damage slows down the movement of RNA Pol II along the DNA strand (Muñoz et al., 2009). When more severe DNA damage is encountered, RNA Pol II exhibits a sliding pause behavior for DNA repair before resuming transcription (Figure 1A; Gregersen and Svejstrup, 2018; Geijer and Marteijn, 2018; Giono et al., 2016). These changes in RNA Pol II movement behavior would further reflect the cellular regulation of gene expression, particularly burst kinetics (Friedrich et al., 2019; Calia et al., 2023). A recent study has shown that cellular responses to DNA damage involve the regulation of BF on specific genes, rather than a genome-wide conclusion (Friedrich et al., 2019). They have demonstrated that cells respond to DNA damage by increasing gene expression noise, but they do not delve into the topic of transcriptional bursts (Calia et al., 2023; Desai et al., 2021). Therefore, the question of how DNA damage causes cells to regulate transcriptional burst kinetics on a genome-wide scale remains unresolved.

Figure 1

Download asset Open asset

Overview of the DeepTX framework.

(A) DNA-damaging drugs induce DNA double-strand damage, which will slow down or even stop the movement of RNA Pol II on the DNA double strands. Changes in the state of RNA Pol II movement will lead to changes in the kinetics of gene expression bursts (including burst frequency and burst size), which will affect cell fate decisions such as apoptosis, differentiation, and survival. (B) The input to the DeepTX framework is scRNA-seq data, and the output is the burst kinetics corresponding to each gene. The core of the DeepTX inference framework is a hierarchical model, which is a mixture of the stationary distribution of the mechanism model solved by the neural network (Step 1) and the binomial distribution followed by the sequencing process. This hierarchical model is used to infer the underlying dynamics parameters corresponding to the data distribution (Step 2).

To understand the dynamics of transcriptional bursts, ideally, one needs to observe the fluctuations of gene expression over a continuous time interval, such as smFISH, scNT-seq, and HT-smFISH (Femino et al., 1998; Qiu et al., 2020; Safieddine et al., 2023). However, most existing techniques can only observe a limited number of genes, making their application to genome-wide studies challenging. Fortunately, single-cell RNA sequencing (scRNA-seq) technology provides excellent opportunities to explore this question, emerging as the leading method for genome-wide mRNA measurements and revealing gene expression noise within individual cells (Tanay and Regev, 2017; Picelli et al., 2013; Zheng et al., 2017). Many studies have employed scRNA-seq to elucidate the diverse sources of gene expression noise and the fundamental principles of transcriptional dynamics on a genome-wide scale (Eling et al., 2019; Faure et al., 2017; Morgan and Marioni, 2018; Ochiai et al., 2020). However, scRNA-seq data provide only static snapshots of cellular states. To infer the underlying transcriptional burst kinetics, one can apply mathematical models of stochastic gene expression, typically under the assumption that the observed distributions of scRNA-seq data reflect the steady-state outcome of the underlying dynamic processes (Larsson et al., 2019; Luo et al., 2023a; Luo et al., 2023b).

For mathematical modeling, the gene expression model for DNA damage ought to encompass more sophisticated generalized properties compared to the traditional telegraph model, which is a common model describing gene expression burst kinetics by genes switching stochastically between active (ON) and inactive (OFF) states with a single-step process (exponentially distributed waiting times) (Stumpf et al., 2017; Voss and Hager, 2014). However, the presence of DNA damage necessitates modeling the transcriptional process as a multi-step process, rather than a single-step process, to capture the additional complexity introduced by the damage (Singh et al., 2013; Cavallaro et al., 2021). Many efforts in this field are addressing the challenges of describing this multi-step process of gene expression in an interpretable and tractable model. One common modeling approach is to introduce multiple intermediate states between inactivate and activate states (Zhang et al., 2012; Zhang and Zhou, 2014; Zhou and Zhang, 2012). Although this multi-state model can fit the experimental data better, it is hard to rationalize the state numbers and parameters (Desai et al., 2021; Rodriguez et al., 2019; Zoller et al., 2015). Another alternative modeling approach is direct non-Markovian modeling for non-exponential waiting times in gene states, which maps the multiple parameters from multi-state models to a small number of interpretable parameters that are easily observed experimentally (Soltani et al., 2015; Kumar et al., 2015; Schwabe et al., 2012; Stinchcombe et al., 2012; Zhang and Zhou, 2019a; Zhang and Zhou, 2019b). However, the ensuing difficulty is solving the analytic solution of the steady distribution from a non-Markovian model, which may require some numerical simulation methods that are computationally resource-intensive and time-consuming, such as Monte Carlo methods (Sisson et al., 2007).

Statistical inference, particularly in recovering burst kinetic parameters from genome-wide scRNA-seq data, necessitates efficient and scalable inference algorithms (Gómez-Schiavon et al., 2017). Therefore, the stochastic dynamical system must be quickly solved as parameters are continuously updated throughout the inference process. Deep learning exhibits a broad array of applications as a contemporary method for addressing complex systems (Jiang et al., 2021; Wang et al., 2019; Michoski et al., 2020). For example, some studies employed neural networks to establish the mapping relationship between a wide range of parameters in the gene expression model and the corresponding pre-simulated steady distribution (Figure 2B; Jiang et al., 2021; Wang et al., 2019; Davis et al., 2020; Tang et al., 2023b). This approach, based on neural networks, has been utilized for parameter inference in both deterministic and stochastic models (Jiang et al., 2021; Gaskin et al., 2023; Sukys et al., 2022; Tang et al., 2023a).

Figure 2 with 2 supplements see all

Download asset Open asset

Solving the TXmodel using deep learning.

(A) With the TXmodel and corresponding model parameters $θ$ , we obtain a steady distribution using the stochastic simulation algorithm (SSA) simulation and the moment statistics employing binomial moment theory. These results are compared with the distribution and moment statistics of the neural network’s output to calculate the neural network’s loss value. The parameters of the neural network are optimized using gradient descent until convergence is achieved, indicating attainment of optimal parameters. (B) Comparison between SSA results and DeepTX prediction results for four sets of test parameters. The gray bars and the red stepped line correspond to the distributions obtained from SSA simulations and DeepTX predictions, respectively. (C) Verification results of the moment statistics predicted by DeepTX and the true moments on a test set containing 1000 elements. (D) The loss curve during DeepTX training. The blue curve represents that the loss function is composed of Kullback–Leibler (KL) divergence of distributions, and the red curve represents that the loss function is composed of KL divergence of distributions and statistics. (E) Box plots of the Hellinger distance between the true distribution and the predicted distribution by DeepTX of different loss types on the test set.

In this study, we introduce DeepTX, a deep learning inference framework that integrates mechanistic models and deep learning methods to elucidate the genome-wide regulation of transcriptional bursts in DNA damage response using scRNA-seq data. The DeepTX framework comprises two modules: DeepTXsolver and DeepTXinferrer. Specifically, DeepTX initially employs the DeepTXsolver module to efficiently solve complex gene expression dynamics models using neural network architectures. It subsequently utilizes the DeepTXinferrer module to accurately infer potential transcriptional burst kinetic parameters using Bayesian methods. DeepTX demonstrates good performance on synthetic datasets. Furthermore, to investigate genome-wide regulation of transcriptional bursts in response to DNA damage, we applied the DeepTX framework to three DNA damage-related scRNA-seq datasets representing cell differentiation, apoptosis, and cell survival. As a result, we observed that the fate decision of mouse embryonic stem cells (mESCs) to undergo cell differentiation in response to DNA damage caused by 5′-iodo-2′-deoxyuridine (IdU) compounds was associated with enhanced burst size (BS enhancer) of genes linked to delayed mitosis phase transition, which may reflect regulatory programs related to reprogramming and differentiation. In colorectal cancer cells, low-dose 5-fluorouracil (5FU) treatment was associated with increased burst frequency (BF enhancer) of genes connected to intracellular oxidative stress, coinciding with apoptosis-related processes. By contrast, high-dose 5FU treatment was associated with BF enhancer genes enriched in telomerase-related pathways, which may mitigate oxidative stress and correlate with cellular resistance. In conclusion, DeepTX is a computational framework with utility and extensibility to infer transcriptional dynamics from scRNA-seq data.

Results

Overview of DeepTX framework

To understand how cell influences underlying transcriptional mechanisms in response to DNA damage, as described in the introduction section (Figure 1A), we propose an effective computational inference framework, referred to as DeepTX (Figure 1B). DeepTX takes as input a set of scRNA-seq datasets and then outputs genome-wide burst kinetics parameters with the benefit from deep learning methods (Figure 1B). The inference process of the DeepTX framework is composed of two crucial modules, DeepTXsolver and DeepTXinferrer. The first module, DeepTXsolver, is to solve for the stationary distribution of a gene expression dynamic model (Figure 1B, middle). A neural network is constructed in DeepTXsolver to approximate the mapping from model parameters to the corresponding stationary distribution. As in many biological contexts, there is a need to model dynamics that describe more realistic gene expression processes that are difficult to solve. For instance, gene expression under DNA damage is better represented by multi-step transcriptional models, which more faithfully reflect the underlying regulatory complexity. Moreover, sequencing noise should be integrated into the modeling framework (see Methods). The second module, DeepTXinferrer, infers the burst kinetics parameters (Figure 1B, middle). Using a trained neural network from DeepTXsolver, we can easily and quickly obtain the stationary distribution (called model distribution) of any parameters within a reasonable parameter space. DeepTXinferrer compares the model distribution to the stationary distribution of each gene in the scRNA-seq data (called data distribution) as the parameters are iterated until convergence and utilizes the Bayesian inference to output the posterior distribution of burst kinetic parameter (Figure 1B, right). We will describe these two modules in detail in the following two sections.

DeepTX solves the transcriptional model using deep learning

To enable the recovery of kinetic information from static scRNA-seq data, we first performed rational mechanistic modeling of gene expression processes with DNA damage. The gene transcription burst process is often characterized using the traditional two-state model (Singh et al., 2013; Cavallaro et al., 2021), which assumes that genes switch randomly between OFF and ON states with exponentially distributed waiting times for each state. However, the gene expression process is inherently a multi-step process, which particularly cannot be neglected under conditions of DNA damage. DNA damage can result in slowing or even stopping the RNA Pol II movement (Lans et al., 2019) and cause many macromolecules to be recruited for damage repair. This process will affect the spatially localized behavior of the promoter (Lans et al., 2019), causing the dwell time of promoter inactivation and activation that cannot be approximated by a simple two-state. We therefore used a generalized model we developed previously (Luo et al., 2023a), called TXmodel here, which extended the state waiting time to an arbitrary distribution, that is, the gene expression system is a non-Markovian system (Figure 2A and see Methods Modeling stochastic gene expression processes). More specifically, the waiting times for the transitions between the OFF and ON states are represented by two random variables, assumed to follow the arbitrary distributions $f_{off} (k_{off}, r_{off})$ and $f_{on} (k_{on}, r_{on})$ , respectively. Additionally, we assumed that the rates of mRNA synthesis and degradation are constants, that is, the waiting times for mRNA transcription and degradation follow exponential distributions with rate parameters $k_{syn}$ and $k_{deg}$ , respectively (Figure 2A, left). Nevertheless, this non-Markovian gene expression system is difficult to solve stationary distribution analytically, while numerical solution methods are time-consuming.

For that reason, we utilized a deep learning approach, which has been demonstrated to be effective in solving stochastic systems (Gupta et al., 2021), to construct a mapping from the parameter of the mechanistic TXmodel to its corresponding stationary distribution. More specifically, we aimed to train a fully connected neural network referred to as DeepTXsolver, whose inputs are the parameters of the TXmodel $θ$ and whose output is a stationary distribution parameterized by a mixed negative binomial distribution $P_{neuralnet}$ , which has good performance in approximating the solution of the chemical master equation (Perez-Carrasco et al., 2020; Öcal et al., 2022; Figure 2, Figure 2—figure supplement 1 and see Methods Training neural network to solve gene expression model). To generate effective training sets for DeepTXsolver, we first generated a large number of parameter sets within a reasonable range of parameter space by using Sobol sampling (see Methods Training neural network to solve gene expression model; Sobol’, 1967). Subsequently, for each parameter set, we employed a modified stochastic simulation algorithm (SSA) for non-Markovian TXmodels to generate numerous samples (Figure 2A and see Methods Training neural network to solve gene expression model). The resulting stationary distributions $P_{simulation}$ served as training labels. Additionally, we demonstrated that the arbitrary order binomial moments of the TXmodel can be analytically solved iteratively within the context of queueing theory (Zhang et al., 2021; Zhang et al., 2024). This enables the straightforward computation of significant summary statistics, including mean and Fano factor (Figure 2A, see Methods Modeling stochastic gene expression processes, and Appendix Model analysis). Thus, our loss function comprises two primary components: (1) the Kullback–Leibler (KL) divergence between the predicted distribution generated by the neural network $P_{neuralnet}$ and the labeled distribution generated by the SSA $P_{simulation}$ , which is a widely used metric for quantifying the difference between two probability distributions; (2) the logarithmic error of the predicted summary statistics computed by the neural network $\hat{s}$ and the labeled summary statistics $s$ computed by theoretical derivation (Figure 2A and see Methods Training neural network to solve gene expression model). Furthermore, we systematically investigated different model architectures and hyperparameter settings to identify the configuration that achieves the best predictive performance (see Methods Training neural network to solve gene expression model and Figure 2—figure supplement 2).

Consequently, the loss function of DeepTXsolver converges effectively during the training process. We utilized DeepTXsolver to predict the stationary distribution using parameters from the test set. We observed that it effectively fits all four representative distributions from the TXmodel, encompassing unimodal distribution at zero point, unimodal distribution at non-zero point, bimodal distribution with one peak at zero point, and bimodal distribution with both peaks at non-zero point (Figure 2B). These distributions have been extensively linked to cell fate decisions in numerous experiments (Gupta et al., 2011; Cohen et al., 2008; Bessarabova et al., 2010). Moreover, we demonstrated that DeepTXsolver accurately predicted crucial distribution properties, including mean, Fano factor, and bimodal coefficient, with a high correlation between theoretical and predicted summary statistics ( $r > 0.99$ , $p -value < 2.2 \times 10^{- 16}$ , $t$ -test, Figure 2C). It is noteworthy that, to assess the impact of the presence or absence of summary statistics on DeepTXsolver, we performed separate experiments using different loss functions. We demonstrated that incorporating summary statistics into the model led to quicker convergence of DeepTXsolver on the training set (Figure 2D) and yielded more robust and accurate predictions on the test set (Figure 2E).

Overall, DeepTXsolver effectively establishes mappings from parameters to stationary distributions. This approach circumvents the high computational resource requirements of classical simulation algorithms and ensures the algorithm’s efficiency and scalability in subsequent statistical inference, particularly in genome-wide inference.

DeepTX infers genome-wide transcriptional kinetics from scRNA-seq data

Having obtained the trained neural network model, we can rapidly compute the stationary distributions of any TXmodel parameters. These distributions represent the outcomes of the underlying true gene expression process, referred to as the ‘true distribution’ $P_{model}$ (Figure 3A). However, during the inference process, the gene expression distributions measured by the scRNA-seq data we utilized, termed the ‘observed distribution of the data’ $P_{data}$ , were subject to noise, including errors stemming from the measurement technique (Sarkar and Stephens, 2021; Figure 3A). Thus, we introduced a mechanistic hierarchical model to bridge the gap between the true and observed distributions. This hierarchical model considers that the observed data results from the interplay between the gene expression process and the measurement process. It combines the two distributions using a convolutional form $P_{data} (Y = y) = \int_{0}^{\infty} P_{measure} (y | x) P_{model} (x) d x$ (see Methods Inferring burst kinetics from scRNA-seq data with DeepTXinferrer). $P_{model} (x)$ represents the mixed negative binomial distribution approximated, whereas for $P_{measure} (y | x)$ , we opted for the binomial distribution. This choice is informed by the fact that it can approximate hypergeometric distributions representing sequencing sampling without replacement when the sample size is sufficiently large. Consequently, the final distribution can maintain the form of a mixed negative binomial distribution (Appendix Derivation of hierarchical negative binomial mixture distribution), denoted as the observed distribution of the model $P_{data}$ (Figure 3A).

Figure 3 with 4 supplements see all

Download asset Open asset

Inferring TXmodel from scRNA-seq data.

(A) The dynamic parameters are sampled from a given prior distribution, and as input to the neural network, the solution of the corresponding dynamic model can be obtained by using this parameter. The solution of the dynamic model is mixed with the binomial distribution to obtain the model observed distribution. Loss values were obtained using the observed distribution of the data obtained from the scRNA-seq data compared to the observed distribution of the model. The loss values are optimized until convergence using gradient descent to obtain the parameters of the mechanism model and the posterior distribution of its parameters. (B) The scatters represent the real BS and BF values of the stochastic simulation algorithm (SSA) synthetic set parameters, and the depth of the color represents the error between the inferred BS (BF) and the true BS (BF). (**C, D**) The blue solid line represents the edge density of the burst kinetics of the model, and the red dashed line represents the true value. (E) The marginal density for the five parameters of the model, where the red dashed line represents the true value. The relative error (RE) quantifies the discrepancy between the true parameter and the peak of the inferred posterior distribution.

Leveraging the hierarchical model, we can construct a scalable and interpretable module called DeepTXinferrer, facilitating genome-wide inference of transcriptional burst kinetics from scRNA-seq data. First, we input the prior distribution of mechanistic model parameters and a set of scRNA-seq data, and we can get the observed distribution of the model $P_{obsmodel}$ (derived successively from the trained neural network and the hierarchical model) and the observed distribution of the data $P_{obsdata}$ , respectively (Figure 3A). Second, we used the Hellinger distance $L (θ) = D (P_{obsmodel}, P_{obsdata})$ to measure the difference between $P_{obsmodel}$ and $P_{obsdata}$ (see Methods Inferring burst kinetics from scRNA-seq data with DeepTXinferrer). It is worth noting that the loss function remains solvable via gradients, despite the inclusion of complex computations, such as neural networks and hierarchical models. Specifically, to avoid the gradient-based optimization process from getting trapped in local optima, we employ a black-box optimization approach to select the initial point for parameter inference (Das and Suganthan, 2010). Consequently, we utilize a gradient-based optimizer to update the iterative parameters until convergence. Lastly, we employ a method based on loss potentials, which computes the posterior distribution of the parameters during the optimization of the loss function (Gaskin et al., 2023; Figure 3A, see Methods Inferring burst kinetics from scRNA-seq data with DeepTXinferrer). Subsequently, this process yields results of the transcriptional burst kinetics derived from the estimated parameters of the mechanistic model.

We assessed the effectiveness of DeepTXinferrer by conducting validation on synthetic data (see Methods Inferring burst kinetics from scRNA-seq data with DeepTXinferrer). Synthetic samples were generated for each parameter set, and our inference algorithm was then applied to derive the output transcriptional burst kinetics (see Methods Inferring burst kinetics from scRNA-seq data with DeepTXinferrer). As a result, the estimation of transcriptional burst kinetics demonstrated high accuracy and strong correlation across most parameter regions ( $r > 0.99$ , $p -value < 2.2 \times 10^{- 16}$ of $t$ -test; Figure 3—figure supplement 1), particularly within the region of fully expressed mRNA abundance (Figure 3B). Moreover, the estimated posterior distribution of each parameter suggests that the peak value closely approximates the true value (Figure 3C–E; Figure 3—figure supplement 2). Additionally, we verified the robustness of our algorithm concerning the number of samples, observing that higher sample numbers corresponded to increased accuracy in estimation (Figure 3—figure supplement 1). We additionally examined the stability of inferred burst phenotypes under varying capture efficiencies and found consistently strong correlations between inferred burst size and BF across capture efficiency settings (Figure 3—figure supplement 3). Compared to previous work (Larsson et al., 2019; Luo et al., 2023a; Tang et al., 2023b; Gu et al., 2025), DeepTX demonstrates superior performance in both efficiency and accuracy (Figure 3—figure supplement 4). In summary, our inference algorithm exhibited strong performance on synthetic data, affirming its reliability for application to real-world data in inferring transcriptional bursts.

In the subsequent sections, we will employ the DeepTXinferrer to explore how cells determine diverse fates in response to DNA damage stimuli through genome-wide regulation of transcriptional bursting. We analyzed three sets of scRNA-seq data representing distinct cell fates: cell differentiation, apoptosis, and survival. The first set of data is a mutually controlled scRNA-seq dataset of mESC cell differentiation induced by IdU drug treatment. This dataset contains 12,481 genes of transcriptomes from 812 cells and 13,780 genes of transcriptomes from 744 cells, respectively. The second set of data is a mutually controlled scRNA-seq dataset of human colon cancer cells treated with low-dose 5FU drug that causes apoptosis. This dataset contains 8534 genes of transcriptomes from 1673 cells and 7077 genes of transcriptomes from 632 cells, respectively. The third set of data is a mutually controlled scRNA-seq dataset of human colon cancer cells in which high-dose 5FU drug treatment resulted in cell-resistant survival. This dataset contains 8534 genes of transcriptomes from 1673 cells and 6661 genes of transcriptomes from 619 cells, respectively.

DeepTX reveals burst size enhancement induced by IdU treatment associated with cell differentiation

The administration of the IdU drug, a thymine analogue, influences the gene expression process within cells, subsequently impacting cell differentiation (Desai et al., 2021; Li et al., 2024). The perturbation in the gene expression process primarily stems from DNA damage, as IdU is randomly integrated into the DNA chain, thereby inducing DNA damage (Li et al., 2024). Although DNA damage caused by IdU treatment has little impact on genome-wide mean gene expression, it increases variance across the genome, thereby affecting cell differentiation (Desai et al., 2021). This is different from the understanding that increases in variance are usually caused by fluctuations in the mean gene expression (Newman et al., 2006). Meanwhile, the mean can be deconvolved as the product of BS and BF. This raises the question of the relationship between bursting dynamics and variance on a genome-wide scale, and how changes in bursting dynamics affect cell differentiation.

We used the DeepTX framework to infer a set of preprocessed scRNA-seq data (see Methods IdU and DMSO dataset) to obtain underlying burst kinetics (Figure 4). The difference between the statistics and distribution obtained by inference and the statistics and distribution of sequencing data is minimal, ensuring the correctness of the inference results (Figure 4—figure supplement 1). Subsequently, we observed considerable variations in BS and BF across different genes, with some genes exhibiting minimal changes in mean expression but pronounced differences in variance (Figure 4, Figure 4—figure supplement 2a–f). Notably, genes with significantly increased BS exhibit corresponding decreases in BF, suggesting relative stability in the average gene expression level (Figure 4—figure supplement 2d-f). Additionally, we observed a significant increase in transcriptional BS for most genes in the treatment group (Figure 4A). This indicates that the rise in variance primarily results from the enhanced BS induced by DNA damage (Figure 4B). Notably, this conclusion is consistent with the results of theoretical analysis (Appendix 2). To further explore the biological characteristics of BS enhancement, we identified genes with upregulated BS using differential analysis methods. Further, we perform GO enrichment analysis (The Gene Ontology Consortium, 2015) on the identified genes. We observed that the genes with enhanced BS revealed significant enrichment in terms related to mitotic cell cycle checkpoint signaling, alongside pathways involved in DNA damage response and cell cycle transitions (Smits and Medema, 2001; Figure 4D). Specifically, the mitotic checkpoint serves as a crucial safeguard to ensure accurate chromosome segregation and maintain genomic stability under DNA damage conditions. Activation of the mitotic checkpoint can influence cell fate decisions and differentiation potential. Sustained activation of the spindle assembly checkpoint has been reported to induce mitotic slippage and polyploidization, which in turn may enhance the differentiation potential of embryonic stem cells. Subsequently, we performed Gene Set Enrichment Analysis (GSEA) (Subramanian et al., 2005) on the identified differentially expressed genes associated with BS and also observed enrichment in pathways related to cell cycle transitions and mitotic intra-S DNA damage checkpoint signaling (Figure 4E, F). Similarly, we subjected genes significantly enhanced by BF in the treatment group (Figure 4—figure supplement 2f) to GO analysis (Figure 4—figure supplement 2g). While it is tempting to hypothesize that enhanced BS may contribute to DNA damage-related checkpoint activation and thereby influence cell cycle progression and differentiation, our current results only indicate an association between burst size enhancement and pathways involved in DNA damage response and checkpoint signaling. This is consistent with previous experimental observations showing that IdU treatment induces DNA damage and alters cell fate decisions through effects on the cell cycle (Rosina et al., 2019; Riccio et al., 2022; Liu et al., 2017; Figure 4C). Additionally, research indicates that enhanced cell differentiation results from delayed cell cycle transitions (Rosina et al., 2019).

Figure 4 with 2 supplements see all

Download asset Open asset

Burst size enhancement is linked to a delay in the cell cycle.

(A) Density map of BS of genes treated with 5′-iodo-2′-deoxyuridine (IdU) drugs compared to BS of genes not treated with drugs. Green dots represent upregulated BS differential genes. (B) Schematic representation of the simultaneous enhancement of both BS and noise of a gene accompanied by the mean value of the gene remaining unchanged after IdU drug treatment. In the log–log–log 3D space, the mean expression level lies on a blue plane defined by $\log (B S) + \log (B F) = \log (M e a n)$ , indicating that it is determined by the product of burst size and burst frequency. The orange plane captures a scaling relationship between expression noise and burst kinetics, described by $\log (B S) + \log (B F) = κ \log (N o i s e)$ , where $κ$ is a constant reflecting their covariation. Notably, the trajectory of the green sphere shows that, under a fixed mean expression level (i.e., confined to the blue plane), increased gene expression noise primarily results from an increase in burst size. (C) The schematic diagram illustrates the hypothesis that DNA damage induced by IdU treatment may affect gene transcription, potentially influencing cell cycle transitions during mitosis and regulating cellular differentiation. (D) Gene GO enrichment analysis is performed on the green dots of BS in (A) to obtain enrichment pathway diagram. The bold red pathways highlight the key components involved in the cellular differentiation mechanism. (**E, F**) Pathways obtained by GSEA enrichment analysis of BS of genes.

DeepTX reveals BF enhancement under low-dose 5FU treatment and its association with apoptosis

The chemotherapeutic compound, 5FU, exerts its cytotoxic effects on colorectal cancer cells by inflicting damage to their DNA (Kuipers et al., 2015; Bunz et al., 1999; Chang et al., 2014). This damage triggers apoptosis, a process of programmed cell death, thus inhibiting the growth and proliferation of malignant cells (Kuipers et al., 2015; Bunz et al., 1999; Chang et al., 2014). However, the impact of 5FU-induced DNA damage on the gene expression process and its potential relationship to cell apoptosis remains incompletely understood. In this section, we employ the DeepTX to investigate the effect of 5FU-induced DNA damage on the gene expression process in colorectal cancer cells with scRNA-seq data (Park et al., 2020). First, we inferred the burst kinetics from the scRNA-seq data of cells treated with low doses (10 μM) of 5FU and controls (Figure 5). After preprocessing the data (see Methods 5FU dataset), we inferred the scRNA-seq data to obtain a distribution that fits the scRNA-seq data (Figure 5—figure supplement 1), as well as the underlying burst kinetics (Figure 5A, B). Further, we found that the mean difference between the two groups of scRNA-seq data was not significant ( $p -value = 0.5$ of $t$ -test), but the variance and BS were significantly different ( $p -value < 0.05$ of $t$ -test; Figure 5A, Figure 5—figure supplement 2a–c). Then, we performed a differential analysis of the BS and BF of gene expression between the control cells and the cells treated with 10 μM 5FU, and then performed GO gene enrichment analysis on differential genes (Figure 5C, Figure 5—figure supplement 2d–f). We found that BS downregulated differential genes were mainly enriched in apoptosis pathways (Hongmei, 2012), electron exchange pathways, and oxidation-related pathways (Figure 5E, Figure 5—figure supplement 2g). Specifically, the enrichment analysis indicated that although the burst size of genes in 5FU-treated cells was generally larger than that in the control group, the genes primarily influencing cell apoptosis exhibited a downregulated BS.

Figure 5 with 2 supplements see all

Download asset Open asset

Low-dose 5-fluorouracil (5FU) treatment is linked to increased burst frequency and cell apoptosis.

(A) Comparison of gene bursting kinetics and statistics between control and 10-dose 5FU treated cells. (B) Density map of inferred burst size. (C) Volcano map of the fold changes (FC) of inferred burst size between control and 10-dose 5FU treated cells. (D) The diagram hypothesizing the potential association between low-dose 5FU treatment and the induction of apoptosis. (E) The enrichment analysis results were derived from genes that were downregulated in BS and upregulated in BF in the experimental group. The bold red pathways highlight the key components involved in the cellular apoptotic mechanism.

These enrichment results are in line with previously reported biological processes (Figure 5D). First, studies in recent years have found that many anticancer drugs trigger apoptosis and necrosis of cancer cells by activating the production of reactive oxygen species (ROS) (Sun and Rigas, 2008). And experiments have shown that 5FU can induce ROS production in colorectal cancer cells (oxidation-related pathways) (Laha et al., 2015; Bwatanglang et al., 2016; Chenna et al., 2022). Meanwhile, mitochondria are the main source of ROS production, and defects in the mitochondrial electron transport chain will increase ROS production (electron exchange pathways) (Adam-Vizi, 2005). As a result, excessive production of ROS can cause mitochondrial DNA damage and nuclear DNA damage (Figure 5D). In particular, when mitochondrial DNA is damaged, mitochondrial damage repair will further increase the pressure of ROS (Figure 5D; Shokolenko et al., 2014). Overall, our results support an association between 5FU-induced burst kinetics changes and pathways implicated in oxidative stress and apoptosis (Figure 5D; Handali et al., 2018; Hwang et al., 2001).

DeepTX reveals BF enhancement under high-dose 5FU treatment and its association with survival pathways

Low-dose 5FU treatment can induce cell apoptosis, while research shows high-dose 5FU treatment can induce cell resistance and evade apoptosis to survive (Kuranaga et al., 2001). Hence, this part mainly studies the impact of differences in burst kinetics of gene expression between high-dose (50 μM) 5FU-treated and control colorectal cancer cells on cell fate decisions.

First, we inferred the burst kinetics from the scRNA-seq data of cells treated with high doses (50 μM) of 5FU and controls (Figure 6). After data preprocessing (see Methods 5FU dataset), we inferred the scRNA-seq data to obtain the distribution that fits the scRNA-seq data (Figure 6—figure supplement 1), as well as the underlying burst kinetics (Figure 6A). We similarly observed the phenomenon that the average expression value of genes remained stable, while there was an enhanced increase in the genome-wide variance (Figure 6—figure supplement 2a–c). Further, we performed a difference analysis on the BS of the two sets of inferred data and selected genes with large differences in BS and BF for GO gene enrichment (Figure 6A, B, Figure 6—figure supplement 2d–f). We found that the BS downregulated differentially expressed genes were mainly enriched in the Cajal body-related pathway, the electron exchange pathway, and telomerase-related pathways (Nabetani and Ishikawa, 2011), but the apoptosis-related pathways were not significant (Figure 6D). In particular, according to experimental studies, telomerase-related pathways can activate telomerase and play an important role in the elongation of telomeres (Cristofari et al., 2007). These enrichment results suggest that high-dose 5FU treatment may be associated with processes such as telomerase activation and mitochondrial function maintenance, both of which have been implicated in cell survival and apoptosis evasion in previous experimental studies. For example, prior work indicates that Human Telomerase Reverse Transcriptase (hTERT) translocation can activate telomerase pathways to support telomere maintenance and reduce oxidative stress, which is thought to contribute to apoptosis resistance (Kuranaga et al., 2001). Overall, the observed transcriptional bursting changes are consistent with these reported survival-associated mechanisms.

Figure 6 with 2 supplements see all

Download asset Open asset

High-dose 5-fluorouracil (5FU) treatment may promote cell drug resistance by altering burst frequency.

(A) Density map of inferred burst size. (B) Volcano map of the fold changes (FC) of inferred burst size between control and 50-dose 5FU treated cells. (C) Schematic diagram hypothesizing the mechanism by which 5FU drug treatment may induce cells to acquire antioxidant properties, potentially enabling continued survival. (D) The enrichment analysis results were derived from genes that were downregulated in BS and upregulated in BF in the experimental group. (E) UMAP of G0 arrest quality score (QS) of control and 50-dose 5FU treated cells. (F) Proportion of G0 arrested (drug-resistant) and cycling (non-resistant) cells. (G) Comparison of gene bursting kinetics and statistics of G0 arrested genes between control and 50-dose 5FU treated cells. The light gray lines indicate the trend of changes in the corresponding bursting kinetics values, mean, and variance for each gene.

Gene enrichment results are consistent with existing ways of cells evading apoptosis (Figure 6C). That is, endogenous ROS production exceeds the cellular antioxidant defense capacity, leading to chemical damage to mitochondrial DNA. Meanwhile, high doses of 5FU cause hTERT translocation, which on the one hand triggers telomerase activity and maintenance pathways and enables cells to evade apoptosis (Kuranaga et al., 2001). On the other hand, this translocation will enrich hTERT in mitochondrial DNA, thereby reducing oxidative stress and mitochondrial DNA damage to protect mitochondria and protect cells from apoptosis (Lipinska et al., 2017).

Resistance to therapy has also been associated with G0 arrest, a non-proliferative state characteristic of persister cells (Wiecek et al., 2023). To verify the presence of resistant cells in cells treated with high doses of 5FU, we selected genes related to G0 arrest and performed gene set variation analysis (Hänzelmann et al., 2013) to obtain the G0 arrest quality score (Wiecek et al., 2023) of each cell and perform UMAP dimensionality reduction visualization (Figure 6E). Through arrest quality score discrimination, we found that the proportion of resistant cells was high among cells treated with high doses of 5FU (Figure 6F). This result was consistent with the results of our enrichment analysis. Specifically, by comparing the BS and mean values of these genes, we found that most of the BS of G0-arrest-related genes were upregulated, but the mean changes were small (Figure 6G).

Discussion

Gene expression is inherently stochastic, and transcriptional bursting contributes to inter-individual variability in gene expression patterns (Dar et al., 2012; Zenklusen et al., 2008; Phillips et al., 2012), although it represents only one of multiple sources of cellular heterogeneity. Experimental observations confirm this bursting mechanism as a cellular response to environmental changes, including DNA damage (Friedrich et al., 2019; Desai et al., 2021), but its precise role in DNA damage response and subsequent cell fate decisions remains an emerging hypothesis. In this study, we introduced a mechanism-based deep learning framework (DeepTX) for performing genome-wide inference on transcriptional burst kinetics of a comprehensive gene expression model using scRNA-seq data. DeepTX provides an efficient computational approach to characterize burst kinetics across genes, enabling the generation of testable hypotheses about how transcriptional bursting may be associated with DNA damage response and potential changes in cellular state.

Utilizing DeepTX, we analyzed three sets of scRNA-seq data comprising control and stimulus pairs driving different cell fate decisions: cell differentiation, apoptosis, and survival. This analysis unveiled significant insights into the underlying mechanisms of transcriptional burst kinetics. First, the IdU drug treatment-induced enhancement of BS in genes appears to be associated with a delayed transition in the cell mitosis phase, which may in turn be related to changes in cell reprogramming and differentiation. Interestingly, while prior studies have suggested that noise enhancement contributes to cell fate decisions, our observations indicate that BS enhancement may reflect a deeper regulatory feature correlated with cell differentiation. Second, low-dose 5FU treatment is associated with increased oxidative stress and an accompanying elevation in BF, which may relate to enhanced apoptotic activity. Consistent with previous findings linking apoptosis to ROS stress, our study suggests that ROS-induced DNA damage is correlated with changes in gene expression burst kinetics, which may contribute to the apoptotic outcomes observed. Third, high-dose 5FU treatment induces telomerase extension, mitigating oxidative stress and fostering drug resistance by upregulating BF. Despite prior evidence of drug-resistant cells, our enrichment analysis underscores telomerase extension’s role in resisting oxidative stress, underscoring the impact of BF enhancement.

We highlight several advantages of our DeepTX framework, which differs significantly from previous work that utilizes scRNA-seq data to study diverse noise sources from several aspects (Eling et al., 2019; Faure et al., 2017; Morgan and Marioni, 2018; Ochiai et al., 2020). First, DeepTX employs a mechanistic hierarchical model that is both interpretable and extensible. To enhance the model’s realism in interpreting scRNA-seq data, we developed a novel approach integrating the sequencing process and the underlying gene expression process. Specifically, we employed the TXmodel, a complex non-Markovian stochastic dynamic model that extends the waiting times for gene OFF and ON states into a non-exponential distribution, reflecting the multi-step nature of gene regulation. As a result, TXmodel not only captures gene expression dynamics under DNA damage but also depicts other multi-step regulatory processes, such as chromatin opening (Lomvardas and Thanos, 2002), preinitiation complex formation (Fuda et al., 2009), transcription factors binding (Nordick et al., 2022). Second, despite the mathematical complexity of the mechanistic hierarchical model, DeepTX renders it tractable. This is achieved by (1) directly mapping model parameters to corresponding stationary distributions via a neural network trained on extensive simulated single-cell RNA data and theoretical analysis summary statistics for TXmodel; (2) approximating the stationary distribution output of TXmodel, coupled with the sequencing process, by a mixed negative binomial distribution, maintaining the final distribution format. Third, DeepTX’s inference process is scalable, allowing genome-wide bursting kinetics inference from static scRNA-seq data. Despite the computational complexity of parameter inference involving hierarchical models and trained neural networks, DeepTX retains auto-differentiation for efficient and accurate gradient computation.

The DeepTX research paradigm contributes to a growing line of area aiming to link mechanistic models of gene regulation with scRNA-seq data. Maizels provided a comprehensive review of computational strategies for incorporating dynamic mechanisms into single-cell transcriptomics (Maizels, 2024). In this context, RNA velocity is one of the most important examples as it infers short-term transcriptional trends based on splicing kinetics and deterministic ODE models. However, such approaches are limited by their deterministic assumptions and cannot fully capture the stochastic nature of gene regulation. DeepTX can be viewed as an extension of this framework to stochastic modeling, explicitly addressing transcriptional bursting kinetics under DNA damage. Similarly, DeepCycle, developed by Sukys and Grima (Sukys and Grima, 2025) investigates transcriptional burst kinetics during the cell cycle, employing a stochastic age-dependent model and a neural network to infer burst parameters while correcting for measurement noise. By contrast, MIGNON integrates genomic variation data and static transcriptomic measurements into a mechanistic pathway model (HiPathia) to infer pathway-level activity changes, rather than gene-level stochastic transcriptional dynamics (Garrido-Rodriguez et al., 2021). In this sense, DeepTX and MIGNON are complementary, with DeepTX resolving burst kinetics at the single-gene level and MIGNON emphasizing pathway responses to genomic perturbations, which could inspire future extensions of DeepTX that incorporate sequence-level information.

The DeepTX research paradigm is poised for extension to encompass both dynamic models and biological data. On the one hand, our focus is currently confined to the transcriptional processes influenced by DNA damage, limiting the explanatory scope of the TXmodel to larger-scale biological phenomena such as alternative splicing (Wan et al., 2021; Gorin and Pachter, 2022), protein translation (Luo et al., 2023a; Golan-Lavi et al., 2017), epigenetic modification (Nicolas et al., 2018), and chromatin movement (Liu et al., 2016; Wang et al., 2024; Wang et al., 2023). Moreover, further development is needed to incorporate additional regulatory factors, including signaling pathways (Garrido-Rodriguez et al., 2021), cell cycle progression, cell size (Sukys and Grima, 2025), and leaky transcription (Kepler and Elston, 2001). These broader biological processes necessitate more intricate dynamic models, often challenging to solve and thus challenging to apply to realistic biological data.

On the other hand, inference from static scRNA-seq relies on the steady-state assumption, which has some limitations. First, in some scenarios, the cell system may exhibit highly transient transcriptional programs that do not satisfy stationarity, leading to biased or misleading parameter estimates. For example, immediately following a synchronized developmental stimulus such as serum shock-induced activation of immediate-early genes, transcription levels increase significantly (Hope et al., 1992). Second, because DeepTX infers the mean BF and size across the population, it cannot recover the underlying time-resolved dynamics or distinguish heterogeneous kinetic subpopulations. Third, lack of time-resolved measurements may affect the accuracy of inferences about dynamic parameters, especially the unidentifiability of parameters inferred from steady-state distributions, that is, multiple parameters leading to the same steady-state distribution. The unidentifiability of individual parameters is a common and critical problem in systems biology studies. Hence, in DeepTX, we employed a Bayesian approach based on loss potential to infer the posterior distributions of the parameters. Although DeepTX also encounters the issue of unidentifiability for individual parameters (Figure 3—figure supplement 2), the multimodal nature of the posterior distribution suggests that multiple distinct parameter sets can produce similarly good fits to the observed data, highlighting the inherent non-identifiability of the model. Nevertheless, in the multimodal posterior distribution, at least one of the posterior peaks aligns closely with the ground truth, thereby demonstrating the validity of the inferred result. In future work, integrating time-series single-cell measurements with other emerging data types could help overcome these limitations. Examples include increasingly comprehensive transcriptome data with both temporal and spatial resolution (Wen and Tang, 2022), as well as state-of-the-art multi-omics data (Maizels, 2024; Miao et al., 2021). Moreover, another limitation of the current implementation is that DeepTX is only trained on simulated datasets. While providing controlled benchmarks for mechanistic validation, it may not fully recapitulate the heterogeneity and stochastic complexity of real biological systems. In the future work, publicly available annotated scRNA-seq datasets could be used to complement this simulation-based training strategy and enhance generalizability.

In conclusion, DeepTX provides a step toward inferring complex transcriptional kinetics from scRNA-seq data through mechanism-based deep learning inference. It sheds light on the augmented burst kinetics in DNA damage response across various cell fate decisions, thereby offering novel insights.

Methods

The DeepTX framework

Framework overview

The DeepTX framework aims to infer the underlying bursting dynamics from static scRNA-seq data to explore the impact of changes in bursting dynamics on cell fate decisions. The DeepTX framework includes two modules, DeepTXsolver and DeepTXinferrer. DeepTXsolver can efficiently and accurately solve transcription models using neural network architecture, and DeepTXinferrer can infer scRNA-seq data to obtain its underlying transcriptional burst kinetics using Bayesian methods.

The input of the DeepTXsolver is the parameters of the mechanism model, and the output is the corresponding stationary distribution solution of the model. The input to the DeepTXinferrer is the discrete probability distribution of gene expression obtained from scRNA-seq data, and the output of the model is the parameters of the mechanism model.

The construction of the DeepTX framework consists of the following parts: (1) Model the gene expression process. (2) Train a neural network that solves the mechanism model using the dataset generated by SSA simulation. (3) Hierarchical model modeling to infer underlying burst kinetic parameters from scRNA-seq data.

When building the DeepTX Framework, we made the following assumptions: (1) The gene expression of cells was in a stationary distribution during sequencing. This assumption has been widely adopted in scRNA-seq studies, as it effectively captures mRNA expression distributions and enables the inference of underlying dynamic parameters (Larsson et al., 2019; Luo et al., 2023a; Ramsköld et al., 2024; Gupta et al., 2022). (2) Gene expression counts of the same cell type follow the same distribution. Given that cells of the same type typically share similar transcriptional programs, their gene expression distributions are approximately identical, supporting the validity of this commonly used assumption. (3) The solution of the model can be approximated by a mixed negative binomial distribution. Since the stationary solution of the chemical master equation can be expressed as a Poisson mixture, which can be more efficiently approximated using negative binomial mixtures (Gorin et al., 2024). (4) The state space sampled from a sufficiently long single simulation is statistically equivalent to that obtained from multiple simulations at steady state in gene expression models (Kuntz et al., 2021).

Modeling stochastic gene expression processes

To recover dynamic transcriptional burst kinetic parameters from static scRNA-seq data, we require the stochastic modeling approach to profile the gene expression process. The traditional telegraph model is often used to model gene transcriptional bursting, which assumes that switching between genes follows an exponential distribution (i.e., switching is a single-step process) (Peccoud and Ycart, 1995; Kim and Marioni, 2013). However, this assumption is an oversimplification of the gene expression process in the context of DNA damage. Due to the fact that DNA damage slows down and even stops the elongation of RNA Pol II across the double-strand DNA (Lans et al., 2019), the gene expression process should be regarded as a multi-step process. This property causes the waiting time for the gene switching between active and inactive states to follow a non-exponential distribution (Soltani et al., 2015; Zhang and Zhou, 2019a). Therefore, we use TXmodel (Luo et al., 2023a) to better characterize the gene transcription process in the context of DNA damage (Figure 2), whose chemical reaction network (CRN) can be written as follows:

\begin{aligned} R_{1} & : O F F \overset{f_{o f f} (t, ω_{o f f})}{\to} O N \\ R_{2} & : O N \overset{f_{o n} (t, ω_{o n})}{\to} O F F \\ R_{3} & : O N \overset{k_{s y n}}{\to} O N + m R N A \\ R_{4} & : m R N A \overset{k_{d e g}}{\to} \emptyset, \end{aligned}

where $R_{1}$ and $R_{2}$ describe the gene-state switching randomly between the ON and OFF states, and $f_{o f f} (t, ω_{o f f})$ and $f_{o n} (t, ω_{o n})$ are the distribution of dwell time in ON and OFF states, respectively. Specifically, dwell time distribution refers to the probability distribution of the time in which a gene remains in a particular transcriptional state (ON or OFF) before transitioning to the other state. While the gene is in the active state, the synthesis of mRNA molecules is governed by a Poisson process with a constant synthesis rate $k_{s y n}$ ( $R_{3}$ ). Meanwhile, each mRNA molecule can be stochastically degraded ( $R_{4}$ ) with a constant degradation rate $k_{d e g}$ . Overall, the TXmodel can be determined by a set of parameters, denoted as $θ = (ω_{o f f}, ω_{o n}, k_{s y n}, k_{d e g})$ .

The TXmodel as an extension of the traditional telegraph model can capture more realistic transcriptional burst kinetics, with BF and BS being two important parameters. The definition of BF is

B F = \frac{1}{⟨ τ_{o n} ⟩ + ⟨ τ_{o f f} ⟩},

where $⟨ τ_{o f f} ⟩ = \int_{0}^{\infty} t f_{o f f} (t, ω_{o f f}) d t$ and $⟨ τ_{o n} ⟩ = \int_{0}^{\infty} t f_{o n} (t, ω_{o n}) d t$ are the mean dwell time in OFF and ON states, respectively. BF represents the average time it takes for the promoter to complete one full stochastic cycle between its active and inactive states. This definition is slightly different from the traditional definition $1 / (k_{d e g} * τ_{o f f})$ , which can be regarded as a simplified version of our definition (Equation 2), under the assumptions that $τ_{o n}$ is negligible and $k_{d e g} = 1$ (i.e., rate parameters are normalized to be dimensionless). Although it is reasonable to neglect activation time $τ_{o n}$ , as it is typically much shorter than inactive time under some conditions, we chose a more complete way to define the BF so that it is applicable to more general situations. Additionally, Equation 2 has been widely used in recent literature (Ramsköld et al., 2024; Zoller et al., 2018; Hoppe et al., 2020). The definition of BS is

B S = k_{s y n} ⟨ τ_{o n} ⟩ .

BS refers to the average number of mRNA transcripts produced during a single transcriptional activation event of a gene. Notably, the mean transcription level of mRNA $⟨ y ⟩$ can be expressed as $⟨ y ⟩ = k_{s y n} ⟨ τ_{o n} ⟩ / (⟨ τ_{o n} ⟩ + ⟨ τ_{o f f} ⟩) = B S \times B F$ .

Without loss of generality, the gamma distribution can be used to characterize non-exponential distributions, that is, $f_{o f f} (t, ω_{o f f}) = r_{o f f}^{k_{o f f}} t^{k_{o f f} - 1} e^{- r_{o f f} t} / Γ (k_{o f f})$ such that $ω_{o f f} = (r_{o f f}, k_{o f f})$ and $f_{o n} (t, ω_{o n}) = r_{o n}^{k_{o n}} t^{k_{o n} - 1} e^{- r_{o n} t} / Γ (k_{o n})$ such that $ω_{o n} = (r_{o n}, k_{o n})$ , where $Γ (\cdot)$ is Gamma function, $r_{o f f}$ and $r_{o n}$ are the rate-parameters of state switching, $k_{o f f}$ and $k_{o n}$ are the possible number of reaction steps between OFF and ON state. The rationale for employing the gamma distribution lies in the complex, multi-step nature of promoter state transitions. Biologically, both the ‘ON’ and ‘OFF’ states of a promoter encompass multiple underlying sub-states, with transitions governed by a series of molecular events, each characterized by exponentially distributed waiting times. When aggregated, these sequential exponential processes yield non-exponential overall waiting time distributions. The gamma distribution, being a natural analytical extension of such a convolution of multiple exponential distributions, provides a mathematically and biologically appropriate approximation for such processes. Here, in the TXmodel, the state-switching waiting time follows a non-Markovian distribution (i.e., it is not exponentially distributed), indicating that the system exhibits non-Markovian dynamics.

It is a challenge to solve the stationary distribution of this gene expression system directly from a theoretical approach. But we proved that the binomial moments of this system can be obtained in an iterative form, by rationalizing this gene expression system as a queueing theory model. Thus, we can theoretically compute some important summary statistics $s (θ)$ given a parameter $θ$ , such as mean, variance, kurtosis, and bimodal coefficient (see details in Appendix 1.2 Modeling stochastic gene expression processes).

Training neural network to solve gene expression model

The stationary distribution of CRN (1) is difficult to be solved analytically. In general, SSAs are used to approximate the solution of chemical master equations (Boguñá et al., 2014; Masuda and Rocha, 2018). However, the inference process involves updating the parameters, and each update requires a simulation to obtain the stationary distribution, which makes statistical inference time-consuming. We note that deep learning methods may have potential applications in solving this challenge because they are widely used for approximating the solution of partial differential equations, chemical master equations, and non-Markovian chemical master equations (Jiang et al., 2021; Wang et al., 2019; Davis et al., 2020; Gupta et al., 2021; BarSinai et al., 2019).

Therefore, in this part, we used a neural network to approximately solve CRN (1). This enables CRN (1) to be solved efficiently and accurately. Details will be introduced in the following part.

Generating synthetic training dataset

To construct the mapping from the parameters of the TXmodel to the stationary distribution, we first need to generate the neural network’s training set, that is, a large-scale range of parameter sets as features and corresponding probability distributions as labels. For the features, we used the Sobol algorithm (Sobol’, 1967) to conduct large-scale sampling to generate the neural network dataset, which can cover the entire parameter space more evenly than random sampling.

For the labels, we perform the SSA of the TXmodel to obtain the stationary distribution given a set of preset parameters. For clarity, the synthetic training dataset is denoted by ${(θ_{i}, P_{s i m u l a t i o n, i})}$ , where $θ_{i}$ presents the model parameters of TXmodel, and $P_{s i m u l a t i o n, i}$ represents the stationary distribution solution obtained by SSA (Figure 2A). Specifically, the value range of each parameter component of $θ_{i}$ is $k_{o f f} \in (1, 15)$ , $r_{o f f} \in (0.1, 10)$ , $k_{o n} \in (1, 15)$ , $r_{o n} \in (0.01, 10)$ , and $k_{s y n} \in (0.1, 400)$ . The experimentally measured values of the mean duration of the OFF state, burst duration, transcriptional burst sizes, and the number of intermediate steps in transcriptional state transitions fall within the specified parameter ranges (Tunnacliffe and Chubb, 2020; Gupta et al., 2022).

Deep learning model architecture

We build the basic deep learning model architecture in DeepTX via fully connected neural networks, which has been effectively used to solve approximate solutions from biochemical reaction networks (Sukys et al., 2022; Gupta et al., 2021). The deep learning model architecture consists of an input layer, hidden layers, and an output layer.

First, the input layer contains five neurons corresponding to the parameters of the mechanism model $θ_{i} = (k_{o f f, i}, r_{o f f, i}, k_{o n, i}, r_{o n, i}, k_{s y n, i})$ . Second, the input layer is followed by $n_{h i d d e n}$ hidden layers with each layer containing $n_{n e u r o n s} = (n_{n e u r o n s, 1}, \dots, n_{n e u r o n s, n_{h i d d e n}})$ neurons. For the neurons in each hidden layer, we implement a nonlinear mapping by the commonly used ReLU activation function (Figure 2—figure supplement 1):

\begin{matrix} h_{i j}^{(1)} = R e L U (w_{1 j}^{(1)} k_{o f f, i} + w_{2 j}^{(1)} r_{o f f, i} + w_{3 j}^{(1)} k_{o n, i} + w_{4 j}^{(1)} r_{o n, i} + w_{5 j}^{(1)} k_{s y n, i}), (j = 1, 2, \dots, n_{n e u r o n s, 1}), \\ h_{i j}^{(2)} = R e L U (w_{1 j}^{(2)} h_{i 1}^{(1)} + w_{2 j}^{(2)} h_{i 2}^{(1)} + \dots + w_{n_{n e u r o n s, 1} j}^{(2)} h_{i, n_{n e u r o n s, 1}}^{(1)}), (j = 1, 2, \dots, n_{n e u r o n s, 2}), \\ ⋮ \\ h_{i j}^{(n_{h i d d e n})} = R e L U (w_{1 j}^{(n_{h i d d e n})} h_{i 1}^{(n_{h i d d e n} - 1)} + \dots + w_{n_{n e u r o n s, n_{h i d d e n} - 1} j}^{(n_{h i d d e n})} h_{i, n_{n e u r o n s, n_{h i d d e n} - 1}}^{(n_{h i d d e n} - 1)}), (j = 1, 2, \dots, n_{n e u r o n s, n_{h i d d e n}}), \end{matrix}

where $h_{i j}^{(n_{h i d d e n})}$ represents the value of the $j$ th neuron of the $n_{h i d d e n}$ th hidden layer of the $i$ th group of parameters, $w_{k j}^{(n_{h i d d e n})}$ represents the neural network weight connecting the $k$ th neuron of the $n_{h i d d e n}$ th hidden layer and the $j$ th neuron of the $(n_{h i d d e n} - 1)$ th hidden layer.

Third, in the output layer, we aim to allow neurons to approximate synthetic scRNA-seq data generated by TXmodel. Negative binomial distributions are a class of distributions that are widely used in modeling scRNA-seq data and have been shown to account for both the ‘drop-out’ phenomenon and the fact that the Fano factor is greater than 1 in scRNA-seq data (Risso et al., 2018; Love et al., 2014). Also, the negative binomial distribution has been employed as an approximate stationary solution to the TXmodel for many gene expression systems (Sukys et al., 2022; Öcal et al., 2022). We assign weights and parameters $(a_{i j}, m_{i j}, l_{i j})$ to the output layer neurons (Figure 2—figure supplement 1):

\begin{matrix} h_{i j}^{(o u t p u t)} = w_{1 j}^{(o u t p u t)} h_{i 1}^{(n_{h i d d e n})} + w_{2 j}^{(o u t p u t)} h_{i 2}^{(n_{h i d d e n})} + \dots + w_{n_{n e u r o n s, n_{h i d d e n}} j}^{(o u t p u t)} h_{i, n_{n e u r o n s, n_{h i d d e n}}}^{(n_{h i d d e n})}, (j = 1, 2, \dots, n_{o u t p u t}), \\ (a_{i 1}, a_{i 2}, \dots, a_{i n_{N B}}) = S o f t m a x (h_{i 1}^{(o u t p u t)}, \dots h_{i, 3 * j + 1}^{(o u t p u t)}, \dots, h_{i, 3 * n_{N B} - 2}^{(o u t p u t)}), \\ m_{i j} = E x p (h_{i, 3 * n_{N B} - 1}^{(o u t p u t)}), j = 1, \dots, n_{N B}, \\ l_{i j} = S i g m o i d (h_{i, 3 * n_{N B}}^{(o u t p u t)}), j = 1, \dots, n_{N B}, \end{matrix}

where $h_{i j}^{(o u t p u t)}$ represents the value of the $j$ th neuron of the output layer of the $i$ th set of parameters, $a_{i j}$ is the weight coefficient of the negative binomial distribution, satisfying $\sum_{j = 1}^{n_{N B}} a_{i j} = 1$ , $l_{i j}$ , $m_{i j}$ represents the $j$ th parameter of the $j$ th negative binomial distribution of the mixed negative binomial distribution, $n_{N B}$ is the number of negative binomial distributions. And the mixed negative binomial distributions can be expressed as (Figure 2—figure supplement 1):

P_{n e u r a l n e t, i} (n_{c o u n t s}; w (θ_{i})) = \sum_{j = 1}^{n_{N B}} a_{i j} (w (θ_{i})) \cdot N B (n_{c o u n t s}; m_{i j} (w (θ_{i})), l_{i j} (w (θ))),

where $N B (N = n_{c o u n t s}; l, m) = (\binom{n_{c o u n t s} + m - 1}{n_{c o u n t s}}) l^{n_{c o u n t s}} (1 - l)^{m}$ .

Loss function of DeepTXsolver

Our loss function includes two parts: one part is the KL divergence between the mixed negative binomial distribution generated by neural network $P_{n e u r a l n e t, i}$ and label distribution $P_{s i m u l a t i o n, i}$ generated by SSA, and the other part is the log difference of the moment statistics between theoretical solution of the TXmodel $s_{i j}$ and the output of the neural network ${\hat{s}}_{i j}$ :

L o s s = \sum_{i = 1}^{n_{b a t c h}} [K L (P_{s i m u l a t i o n, i}, P_{n e u r a l n e t, i}) + λ \sum_{j = 1}^{n_{s t a t s}} {(\log (\frac{s_{i j}}{{\hat{s}}_{i j}}))}^{2}],

where $K L (P_{s i m u l a t i o n, i}, P_{n e u r a l n e t, i}) = \sum_{n} P_{s i m u l a t i o n, i} (n) \log (P_{s i m u l a t i o n, i} (n) / P_{n e u r a l n e t, i} (n))$ , and $λ$ is the hyperparameter, represents the weight coefficient of the statistical loss, and $λ = 0.1$ , $n_{b a t c h}$ represents the number of samples, where each sample is characterized by $(θ_{i}, P_{s i m u l a t i o n, i})$ in each batch, and $n_{s t a t s}$ is the number of summary statistics. For the latter part of the loss function, we can still obtain the moment statistic by theoretical derivation, although the analytic distribution is difficult to solve. Adding moment statistics to the loss function can make the neural network more robust (Figure 2E). Specifically, we use four statistics here: $s_{i 1}$ indicates the mean, $s_{i 2}$ denotes the Fano factor, $s_{i 3}$ denotes kurtosis, $s_{i 4}$ represents the bimodal coefficient. Specifically, the Fano factor, which normalizes variance by the mean, provides a robust measure of transcriptional noise across genes or conditions with varying expression levels. It’s worth mentioning that the statistics here are scalable and we can add more information inside the loss function.

Training details of DeepTXsolver

Effective model training necessitates the selection of an appropriate optimizer to minimize the loss function. In our case, we employed the Adam optimizer (Kingma and Ba, 2014), a widely adopted choice in deep learning due to its robust performance. And each time, a small batch of samples is selected to update the neural network parameters $w$ (Goodfellow et al., 2016), and the iteration of all samples is called an epoch. Furthermore, we utilized the Glorot Uniform method to establish an optimal initial point for our neural network parameter (Glorot and Bengio, 2010).

Batch size and learning rate are key optimization hyperparameters affecting model convergence. A smaller batch size enhances model generalizability but lengthens training time; hence, a batch size of 64 is chosen for balance (Keskar et al., 2016). In the gradient descent process, the learning rate significantly influences the step size, where large rates may induce instability or divergence, and conversely, small rates may precipitate premature convergence to local optima. To counteract this, a decaying learning rate is used, allowing larger early updates for faster convergence and preventing instability in later stages. Further, training of the model stops if it has been trained for over 200 epochs or if the learning rate has been reduced five times. These criteria are adaptive and efficient as further training does not significantly enhance the model’s performance.

Hyperparameter tuning of DeepTXsolver

To obtain a neural network model with accurate prediction and generalization, we compared the model architectures in terms of the number of neurons per layer, the number of hidden layers, and the number of neurons in the output layer, and compared the performance of the model with different dataset sizes. And we applied the Hellinger distance between the true distribution and the predicted distribution by the neural network as an evaluation criterion.

First, we compared the performance of architectures with a single hidden layer with (Chubb et al., 2006; Gaskin et al., 2023; Laha et al., 2015; Subramanian et al., 2005) neurons and found that the performance of the model barely increased after increasing to 128 neurons (Figure 2—figure supplement 2a). Second, after comparing the performance of architectures with different numbers of hidden layers, we found that a single hidden layer has the best performance (Figure 2—figure supplement 2b). Third, comparing the performance of $3 i$ ( $i = 1, 2, \dots, 10$ ) neurons in the output layer, we found that after 12 neurons (4 negative binomial distribution mixtures), the performance of the model was not improved. Particularly, four negative binomial distribution mixtures are able to fit the bimodal nature of the distribution as well as ensure good training efficiency (Öcal et al., 2022).

Finally, to obtain a dataset that contributes to convergence as well as model generalization while ensuring training efficiency, we compared the model performance for dataset sizes of (500, 1000, 5000, 10,000, and 15,000) and found that increasing the dataset size further does not significantly improve the model performance beyond 15,000 samples (Figure 2—figure supplement 2c). Therefore, we identified a single hidden layer of 128 neurons and an output layer of 12 neurons as the appropriate model architecture, and 15,000 samples as the appropriate training set.

Inferring burst kinetics from scRNA-seq data with DeepTXinferrer

This section describes how to recover dynamic burst kinetics from static scRNA-seq data. In principle, once we have constructed the mapping from parameters to their corresponding stationary distributions through the pre-trained neural network in the Training neural network to solve gene expression model, we can quickly access the corresponding likelihood functions when updating the parameters during the statistical inference process. However, observed scRNA-seq data is the result of a combination of gene expression processes and sequencing processes, and the current pre-trained neural network can only yield a description of the gene expression process modeled by TXmodel. Therefore, we require extending it into a hierarchical model to depict the more realistic generation mechanism behind the scRNA-seq data (Sarkar and Stephens, 2021).

Modeling observed scRNA-seq data with a hierarchical model

The hierarchical model aims to couple two different processes, specifically, the true gene expression mechanism process and the sequencing process that immediately follows. In terms of a formula, the probability distribution of observations can be represented as a convolution:

P (Y = y) = \int_{0}^{\infty} P_{m e a s u r e} (y ∣ x) P_{m o d e l} (x) d x,

where $P_{m e a s u r e}$ is the measure model and $P_{m o d e l}$ is the expression model.

First, we model the sequencing process. Each cell can be regarded as a pool of mRNA, and $X_{c g}^{(t r u e)}$ represents the true expression amount of the $g$ th gene in the $c$ th cell. $Y_{c g}^{(o b s e r v e d)}$ refers to the observed value obtained by scRNA-seq from the given true expression value $X_{c g}^{(t r u e)}$ , satisfying the following conditional probability distribution:

Y_{c g}^{(o b s e r v e d)} ∣ X_{c g}^{(t r u e)} \sim P_{m e a s u r e} (y ∣ x),

where $P_{m e a s u r e} (y ∣ x)$ characterizes all noise in the sequencing process, and is generally assumed to be Binomial distribution or Poisson distribution, which has been proven experimentally and theoretically (Sarkar and Stephens, 2021; Wang et al., 2018). By incorporating an additional sampling probability, denoted as $α_{c g}$ , into the sequencing process, we are able to characterize capture efficiency. We make the assumption that intercellular molecules are independent of one another, and that only proportional products are captured and sequenced, following a Binomial distribution:

Y_{c g}^{(o b s e r v e d)} ∣ X_{c g}^{(t r u e)} \sim P_{m e a s u r e} (y ∣ x, α_{c g}),

where $α_{c g}$ is the sampling probability. Although the experimental capture efficiency ranges from 0.06 to 0.32 (Zheng et al., 2017; Macosko et al., 2015), we fixed the parameter $α_{c g} = 0.5$ to minimize complexity and unidentifiability issues. Further, we demonstrated that inference across different efficiencies (0.5, 0.3, 0.2) consistently yielded strong correlations between inferred burst size and BF.

Second, we model the true gene expression process, assuming that it satisfies the following probability distribution:

X_{c g}^{(t r u e)} \sim P_{m o d e l} (x) .

The selection of the true gene expression probability distribution is very critical. It needs to meet the following conditions: (1) It can well characterize the transcriptional burst kinetics. (2) The inference derived from this distribution necessitates both efficiency and precision.

In the Modeling stochastic gene expression processes, we have constructed a model of the gene expression process with DNA damage, enabling the model’s stationary distribution to encapsulate the bursty dynamics inherent in gene expression. The solution to the model is approximated by a mixed negative binomial distribution, which is obtained from a trained neural network. Importantly, this mixed negative binomial distribution enables efficient inference. Hence,

P_{m o d e l} (x) = P_{n e u r a l n e t} (x, θ) .

Further, the following hierarchical model can be obtained:

\begin{matrix} Y_{c g}^{(o b s e r v e d)} ∣ X_{c g}^{(t r u e)} & \sim B i n o m i a l (y_{c g} ∣ x_{c g}, α_{c g}), \\ X_{c g}^{(t r u e)} & \sim \sum_{j = 1}^{n_{N B}} a_{j} {N B}_{j} (x_{c g}; m_{j} (θ), l_{j} (θ)), \end{matrix}

where $X_{c g}^{(t r u e)}$ represents the random variable of the true expression value, $Y_{c g}^{(o b s e r v e d)}$ represents the random variable of the observed expression value, $B i n o m i a l$ represents the binomial distribution, $N B$ represents negative binomial distribution, and $α_{c g}$ represents the sampling probability. Further, we can obtain

Y_{c g}^{(o b s e r v e d)} \sim \sum_{x_{c g} = 1}^{n_{c o u n t}} (B i n o m i a l (y_{c g} ∣ x_{c g}, α_{c g}) \sum_{j = 1}^{n_{N B}} a_{j} {N B}_{j} (x_{c g}, m_{j} (θ), l_{j} (θ))) .

It is proved that $Y_{c g}^{(o b s e r v e d)}$ also follows a mixed negative binomial distribution (Appendix 1.3 Training neural network to solve gene expression model), denoted as $P_{o b s m o d e l}$ .

Estimation and optimization process of DeepTXinferrer

We can use the DeepTX framework to infer scRNA-seq data to obtain the corresponding gene expression burst kinetics through optimization methods. Given the expression counts of gene $g$ in all cells, its corresponding probability distribution $P_{o b s d a t a}$ can be obtained. We use the Hellinger distance to compare the error between the true probability distribution $P_{o b s d a t a}$ of genes and the model probability distribution $P_{m o d e l d a t a}$ . The following optimization problem can be obtained:

\hat{θ} = \underset{θ}{\arg min} \sum_{m = 0}^{n_{c o u n t}} {(\sqrt{P_{o b s m o d e l} (y_{c g} = m; θ)} - \sqrt{P_{o b s d a t a} (y_{c g} = m)})}^{2},

where $n_{c o u n t}$ is the maximum value of mRNA expression and $P_{o b s d a t a}$ is the density probability for each gene $g$ . Specifically, $P_{o b s m o d e l}$ is a function that is differentiable with respect to parameter $θ_{g}$ . We can use the gradient descent method to obtain the optimal parameters of the TXmodel and the corresponding distribution of the TXmodel.

Posterior distribution

The optimization method in the previous section can obtain the parameters of the mechanism model that satisfies the distribution of scRNA-seq data. In this section, we give a method to solve the confidence interval of each optimized parameter (Gaskin et al., 2023).

Each iteration of the optimization process can obtain the estimated value ${\hat{θ}}^{(k)}$ ( $k = 1, \dots, n_{i t e r a t i o n}$ ) and loss value $J (\hat{θ})$ corresponding to the estimated value. In particular, according to (Equation 15), the loss is calculated as follows:

J ({\hat{θ}}^{(k)}, θ) = J^{(k)} ({\hat{P}}_{o b s m o d e l} ({\hat{θ}}^{(k)}), P_{o b s d a t a}) = \sum_{m = 0}^{n_{c o u n t}} {(\sqrt{{\hat{P}}_{o b s m o d e l} (y = m; {\hat{θ}}^{(k)})} - \sqrt{P_{o b s d a t a} (y = m)})}^{2} .

The following posterior probability can be obtained:

π ({\hat{θ}}^{(k)} ∣ P_{o b s d a t a}) \sim \frac{n^{(k)} \exp (- J^{(k)} ({\hat{P}}_{o b s m o d e l}, P_{o b s d a t a}))}{\sum_{k = 1}^{n_{i t e r a t i o n}} \exp (- J^{(k)} ({\hat{P}}_{o b s m o d e l}, P_{o b s d a t a}))} π^{(0)} ({\hat{θ}}^{(k)}),

where $π^{(0)}$ is the prior, we take it as uniform density, $n^{(k)}$ represents the number of values ${\hat{θ}}^{(k)}$ takes during the iteration process, and $\exp (- J ({\hat{P}}_{m i x t u r e}, P_{s i m u l a t i o n}))$ represents the loss potential of each iteration. Let the posterior probability be $P_{p o s t e r i o r} ({\hat{θ}}^{(k)})$ , and the marginal distribution corresponding to each component of parameter ${\hat{θ}}^{(k)}$ can be obtained:

p ({\hat{θ}}_{j}^{(k)}) \sim \int P_{p o s t e r i o r} ({\hat{θ}}^{(k)}) d {\hat{θ}}_{- j}^{(k)} .

The subscript $- j$ means that we integrate the components of ${\hat{θ}}^{(k)}$ except ${\hat{θ}}_{j}^{(k)}$ . In particular, to allow the iterative parameters to traverse the entire parameter space, multiple optimizations can be performed with different initial values to improve the performance of the posterior distribution.

Validation on synthetic data

We conducted an evaluation of the inference performance of the DeepTX framework on synthetic scRNA-seq data, focusing on three key aspects. (2) Model accuracy: This refers to the minimal discrepancy between the distribution corresponding to the inferred parameters and the input distribution. (2) Parameter identifiability: This denotes the proximity of the values of the inferred parameters to the true parameters, suggesting that the model can accurately identify the true parameters. (3) Result robustness: This implies that the outcomes of each inference do not exhibit significant variations.

We created a synthetic dataset ${(θ_{i}^{s y n}, P_{i}^{s y n})} (i = 1, 2, \dots, 500)$ using Sobol sampling and SSA under preset parameters. The synthesized distribution $P_{i}^{s y n}$ is used as input to the DeepTX framework, and upon optimization, we obtained estimated parameters and their corresponding distribution ${(θ_{i}^{e s t}, P_{i}^{e s t})}$ . We showed that the Hellinger distance between the estimated and synthetic dataset’s distribution was minimal, indicating accurate estimations (Figure 3—figure supplement 1a). Additionally, the burst kinetics of both the synthetic data and the estimated data can be computed using their respective parameters, $θ_{i}^{s y n}$ and $θ_{i}^{e s t}$ . We showed that a strong correlation was observed between the synthetic burst kinetics $γ^{s y n} = ({B F}^{s y n}, {B S}^{s y n})$ and estimated burst kinetics $γ^{e s t} = ({B F}^{e s t}, {B S}^{e s t})$ (Figure 3—figure supplement 1b, c), suggesting that burst kinetics are identifiable.

Subsequently, we conducted a robustness verification of the framework’s ability to infer burst kinetics. The error was computed as follows:

E r r o r (γ^{s y n}, γ^{e s t}) = {(\log ({B F}^{s y n}) - \log ({B F}^{e s t}))}^{2} + {(\log ({B S}^{s y n}) - \log ({B S}^{e s t}))}^{2} .

Given a set of parameters $θ_{i}^{s y n}$ , we drew samples of sizes (100, 500, 1000, 1500, and 2000) from the probability distribution $P_{i}^{s y n}$ corresponding to $θ_{i}^{s y n}$ . For each sample size, we performed 50 repetitions, and then the DeepTX framework was used to infer the distribution obtained from each sampling, setting different initial values each time. We found that the inferred results are robust, and the robustness increases with the increase in the number of samples (Figure 3—figure supplement 1d, e).

Dataset

IdU and DMSO dataset

To investigate the effects of IdU on genome-wide gene expression, Desai et al. performed scRNA-seq on mESCs, both with and without IdU treatment. Consequently, they obtained scRNA-seq data for 12481 genes of transcriptomes from 812 cells and 13,780 genes of transcriptomes from 744 cells, respectively. The authors reported that IdU drug treatment increased transcriptional noise genome-wide but did not affect transcriptional mean, further affecting cell fate decisions.

We used the DeepTX framework to infer this set of scRNA-seq data to obtain potential transcriptional burst kinetics and further study the impact of changes in transcriptional burst kinetics on cell fate decisions. To eliminate the impact of technical noise on data inference, we preprocess the data as follows. First, we normalized the data $y_{c g}$ using Seurat, where $y_{c g}$ represents the expression level of the gene $g$ in the cell $c$ . The normalization process is that we multiplied each $y_{c g}$ by a scaling factor $S = y_{n o r m a l i z e d} / \sum_{g = 1}^{n_{g e n e s}} y_{c g}$ , where $y_{n o r m a l i z e d} = 50, 000$ . Particularly, the scaling factor $S$ is related to the total expression of genes in each cell. Second, after normalizing the expression, we divided it into bins with a unit of 1 to obtain the bin where each expression value is located. Finally, to eliminate the impact of low-expression genes on the inference process, we filtered out genes whose average gene expression is less than 1. After data processing, there are 6082 genes from 812 cells left in our IdU and 6082 genes from 744 cells left in DMSO datasets.

5FU dataset

To study the impact of 5FU drug-induced DNA damage on cell fate decisions, Park et al. performed scRNA-seq on colon cancer cells treated with 0, 10, and 50 μM 5FU drug (Park et al., 2020) and obtained 8534 genes from 1673 cells, 7077 genes from 632 cells, and 6661 genes from 619 cells, respectively. The authors found that colon cancer cells treated with different doses of 5FU exhibit different transcriptional phenotypes, and these different phenotypes correspond to different cell fate decisions (Park et al., 2020).

We used the DeepTX framework to infer these three sets of scRNA-seq data to obtain potential transcriptional burst kinetics, thereby studying the impact of changes in transcriptional burst kinetics caused by different doses of 5FU treatment on cell fate decisions. To eliminate the impact of technical noise on data inference, we preprocess the data in the following steps.

First, we normalize $y_{c g}$ by multiplying a scale factor $S$ , where $S = y_{n o r m a l i z e d} / \sum_{g = 1}^{n_{g e n e s}} y_{c g}$ , $y_{n o r m a l i z e d} = 2500$ . Second, we divide the normalized data into intervals of unit 1 to round the data. Finally, to eliminate the impact of low-expression genes on the inference process, we filtered out genes whose average gene expression is less than 1. After data processing, there are 730 genes from 593 cells left in the control dataset, 174 genes from 593 cells left in our 10 μM 5FU treatment dataset, and 187 genes from 564 cells left in the 50 μM 5FU treatment dataset.

Appendix 1

1. DeepTX framework

1.1 Modeling gene expression processes in DNA damage

Generally, gene expression switches between ON and OFF states, and the two-state model can model this gene expression process. However, DNA damage will cause multiple different transcription rates during DNA transcription, that is, multiple steps. Therefore, we use a more general model of gene transcription (TX) to model the transcription process, taking DNA damage into account. We first describe the classic telegraph model (CTM), which is a specific case of the TXmodel.

The CTM model is a widely used model for capturing Markovian gene expression burst kinetics. It describes stochastic gene expression as a sequence of four biochemical reactions involving two gene states (ON and OFF), mRNA transcription and degradation:

\begin{aligned} R_{1} & : O F F \overset{k_{o f f}}{\to} O N, R_{2} : O N \overset{k_{o n}}{\to} O F F, \\ R_{3} & : O N \overset{k_{s y n}}{\to} O N + m R N A, R_{4} : m R N A \overset{k_{d e g}}{\to} \emptyset, \end{aligned}

where k_off is the rate at which the gene switches from OFF to ON, k_on is the rate at which the gene switches from ON to OFF, k_syn is the rate of mRNA synthesis and k_deg is the rate of mRNA degradation. In this model, gene switching between active and inactive states is governed by a memoryless Markovian process, where the waiting times for transitions follow exponential distributions.

However, promoter-state switching involves multiple biochemical reaction processes, resulting in the number of effective states of most promoters being greater than two and diverse switching between states. Hence, we consider the more general TXmodel, in which the dwell time of the state switch follows the general distribution. The process of gene transcription is described by following the reaction diagram:

\begin{aligned} R_{1} & : O F F \overset{f_{o f f} (t, ω_{o f f})}{\to} O N, \\ R_{2} & : O N \overset{f_{o n} (t, ω_{o n})}{\to} O F F, \\ R_{3} & : O N \overset{k_{s y n}}{\to} O N + m R N A, \\ R_{4} & : m R N A \overset{k_{d e g}}{\to} \emptyset, \end{aligned}

where R₁ and R₂ describe the gene state switches randomly between the ON and OFF states. The dwell time distributions of OFF and ON states are $f_{o f f} (t, ω_{o f f})$ and $f_{o n} (t, ω_{o n})$ , respectively. When the gene is in the active state, R₃, the synthesis of mRNA molecules is governed by a Poisson process, with a probability per unit time of a synthesis event equal to k_syn. Each mRNA molecule is stochastically degraded, R₄, with probabilities per unit time, per molecule, of k_deg . Each TXmodel can be determined by a group of parameters, denoted as $θ = (ω_{o f f}, ω_{o n}, k_{s y n}, k_{d e g})$ .

1.2 Model analysis

In this part, we theoretically solve the statistics of the formula. Let M(t) represent the number of mRNA at time t, and G(t) be the state at time t. $E_{o f f} (t), E_{o n} (t)$ represents the elapsed time for the gene state to transition to the OFF (ON) state at time t. Then, ${M (t), G (t), E_{o f f} (t), E_{o n} (t); t \geq 0}$ is a continuous Markov process. Let $p_{o f f} (n, τ, t)$ and $p_{o n} (n, τ, t)$ represent n mRNA molecules produced at time t and the elapsed time is the probability density function of τ.

Therefore, we can get

\begin{aligned} p_{o f f} (n, τ, t) Δ τ & = Pr {N (t) = n, G (t) = O F F, τ < E_{o f f} (t) \leq τ + Δ τ}, \\ p_{o n} (n, τ, t) Δ τ & = Pr {N (t) = n, G (t) = O N, τ < E_{o n} (t) \leq τ + Δ τ} . \end{aligned}

From the above equation, we can get the following Chapman–Kolmogorov backward equation:

\begin{matrix} p_{off} (n, τ + Δ t, t + Δ t) = & p_{off} (n, τ, t) (1 - n r_{deg} Δ t) (1 - H_{off} (τ) Δ t) \\ + & p_{off} (n + 1, τ, t) (n + 1) r_{deg} Δ t (1 - H_{off} (τ) Δ t) + o (Δ t), \\ p_{on} (n, τ + Δ t, t + Δ t) = & p_{on} (n, τ, t) (1 - n r_{deg} Δ t) (1 - r_{syn} Δ t) (1 - H_{on} (τ) Δ t) \\ + & p_{on} (n + 1, τ, t) (n + 1) r_{deg} Δ t (1 - r_{syn} Δ t) (1 - H_{on} (τ) Δ t) \\ + & p_{on} (n - 1, τ, t) (1 - n r_{deg} Δ t) r_{syn} Δ t (1 - H_{on} (τ) Δ t) + o (Δ t), \end{matrix}

where $H_{off} (τ) = f_{off} (τ) / S_{off} (τ)$ is a hazard rate function with the survival function $S_{off} (τ) = \int_{τ}^{\infty} f_{off} (t) d t$ . The definitions of $H_{on} (τ)$ and $S_{on} (τ)$ are similar.

Next, we focus on the steady-state distributions. Simulations in the main text have verified that the stationary distributions of $p_{off} (n, τ, t)$ and $p_{on} (n, τ, t)$ exist, and they are denoted by $p_{off} (n, τ)$ and $p_{on} (n, τ)$ , respectively. Equation 23 converts to the following stationary chemical master equation in the limit of small $Δ t$ and large $t$ ,

\begin{matrix} \frac{\partial}{\partial τ} p_{off} (n, τ) & = (n + 1) k_{deg} p_{off} (n + 1, τ) - (n k_{deg} + H_{off} (τ)) p_{off} (n, τ), \\ \frac{\partial}{\partial τ} p_{on} (n, τ) & = (n + 1) k_{deg} p_{on} (n + 1, τ) + k_{syn} p_{on} (n - 1, τ) - (n k_{deg} + k_{syn} + H_{on} (τ)) p_{on} (n, τ), \end{matrix}

with the integral boundary condition

p_{off} (n, 0) = \int_{0}^{\infty} p_{on} (n, τ) H_{on} (τ) d τ, p_{on} (n, 0) = \int_{0}^{\infty} p_{off} (n, τ) H_{off} (τ) d τ .

Based on Equation 24 with its boundary conditions, we use the binomial moment method to calculate the mRNA stationary distribution $P (n) = \int_{0}^{\infty} [p_{off} (n, τ) + p_{on} (n, τ) d] τ$ and its statistical characteristics. Binomial moments of the mRNA stationary distribution are defined as $b_{m} = \sum_{m \geq n}^{\infty} (\binom{m}{n}) P (n)$ , where the symbol $(\binom{m}{n})$ represents the combinatorial number. Note that binomial moments converge to zero as their orders go to infinity, and they can be used to reconstruct $| P (n) |$ by $P (n) = \sum_{m = n}^{\infty} (- 1)^{m - n} (\binom{m}{n}) b_{m}$ . After some algebra calculations of Equation 24, we can obtain the $n$ th binomial moment of mRNA in a recursive form

b_{n} = \frac{1}{n!} {(\frac{k_{syn}}{k_{deg}})}^{n} \sum_{i = 0}^{n - 1} (\binom{n - 1}{i}) C_{i} \sum_{j = 0}^{n - 1 - i} (\binom{n - 1 - i}{j}) (- 1)^{n - 1 - i - j} {\tilde{S}}_{on} ((n - 1 - j) k_{deg}),

where

C_{n} = \frac{{\tilde{f}}_{off} (n k_{deg})}{1 - {\tilde{f}}_{off} (n k_{deg}) {\tilde{f}}_{on} (n k_{deg})} \sum_{i = 0}^{n - 1} (\binom{n}{i}) C_{i} \sum_{j = 0}^{n - i} (\binom{n - i}{j}) (- 1)^{n - i - j} {\tilde{f}}_{on} ((n - j) k_{deg}),

for $n = 1, 2, \dots$ . Here, $C_{0} = {(⟨ τ_{off} ⟩ + ⟨ τ_{on} ⟩)}^{- 1}$ is equal to the burst frequency. $\tilde{f} (s)$ represents the Laplace transform of function $f (t)$ . Especially, ${\tilde{S}}_{on} (s) = (1 - {\tilde{f}}_{on} (s)) / s$ and ${\tilde{S}}_{on} (0) = ⟨ τ_{on} ⟩$ . According to the relationship between binomial moments and central moments and Equations 26 and 27, we obtain the mean and noise of mRNA expression

\begin{matrix} Mean = \frac{k_{syn} ⟨ τ_{on} ⟩}{k_{deg} (⟨ τ_{off} ⟩ + ⟨ τ_{on} ⟩)}, \\ {CV}^{2} = \frac{1}{Mean} + \frac{⟨ τ_{off} ⟩}{⟨ τ_{on} ⟩} + \frac{⟨ τ_{off} ⟩ + ⟨ τ_{on} ⟩}{k_{deg} ⟨ τ_{on} ⟩^{2}} \frac{(1 - {\tilde{f}}_{off} (k_{deg})) (1 - {\tilde{f}}_{on} (k_{deg}))}{1 - {\tilde{f}}_{off} (k_{deg}) {\tilde{f}}_{on} (k_{deg})} . \end{matrix}

Note that if we let $τ_{off} k_{deg}$ and $τ_{on} k_{deg}$ be the rescaled random variables for OFF- and ON-state dwell times, and $k_{syn} / k_{deg}$ be the rescaled mean synthesis rate, the mean expression not only equals the product of the mean synthesis rate and the stationary probability of ON state, but also the product of burst size and burst frequency.

The binomial moment can obtain any order statistics of the model solution. Next, we introduce the six statistics we use in training neural networks: (1) The mean value $μ_{1}$ is the most commonly used indicator in statistics, and it represents the average level of mRNA expression in scRNA-seq data. (2) The noise strength is a measurement of the dispersion of the probability distribution, defined as $μ_{2} / μ_{1}^{2}$ , where $μ_{2}$ is the variance. (3) The Fano factor is another statistic that measures the dispersion of a probability distribution relative to a Poisson distribution, defined as $μ_{2} / μ_{1}$ . (4) The skewness is a description of the symmetry of the distribution, and it is defined as $μ_{3} / μ_{2}^{3 / 2}$ , where $μ_{3}$ is the third central moment. (5) The kurtosis describes whether the peak of the distribution is abrupt or flat, which is defined as $μ_{4} / μ_{2}^{2}$ , where $μ_{4}$ is the fourth central moment. (6) The bimodality coefficient can describe the bimodal distribution, which is usually a critical feature in a dynamical system. Precisely, we can calculate the central moments with the binomial moment:

μ_{k} (t) = (- b_{1} (t))^{k} + \sum_{l = 0}^{k - 1} \sum_{j = 0}^{k - 1} R (k, i, j) (j!) {(b_{1} (t))}^{j} b_{j} (t),

in which $R (k, i, j) = (- 1)^{i} (\binom{k}{i}) u (k - i, j)$ with $u (n, k) = \sum_{i = 0}^{k} (- 1)^{k - i} (\binom{k}{i}) i^{n}$ being the Stirling number of the second kind. Therefore, the above summary statistics can be expressed by binomial moments as follows:

\begin{matrix} Mean = b_{1}, \\ NoiseStrength = \frac{2 b_{2} + b_{1} - b_{1}^{2}}{b_{1}^{2}}, \\ FanoFactor = \frac{2 b_{2} + b_{1} - b_{1}^{2}}{b_{1}}, \\ Skewness = \frac{6 b_{3} + 6 b_{2} + b_{1} - 3 b_{1} (2 b_{2} + b_{1}) + 2 b_{1}^{3}}{{(2 b_{2} + b_{1} - b_{1}^{2})}^{3 / 2}}, \\ Kurtosis = \frac{24 b_{4} + 36 b_{3} + 14 b_{2} + b_{1} - 4 b_{1} (6 b_{3} + 6 b_{2} + b_{1}) + 6 b_{1}^{2} (2 b_{2} + b_{1}) - 3 b_{1}^{4}}{{(2 b_{2} + b_{1} - b_{1}^{2})}^{2}} - 3, \\ BimodalityCoefficient = \frac{{Skewness}^{2} + 1}{Kurtosis} . \end{matrix}

It should be noted that we can extend the summary statistics to higher-order moments because our binomial moments can compute arbitrary high-order moment statistics.

1.3 Algorithm pipeline of deepTX framework

Appendix 1—algorithm 1.
Dataset: We use the Sobol algorithm to sample from a preset range of TXmodel parameters to obtain the parameters ${θ_{i}}$ . For each set of $θ$ , the distribution and moments are obtained using the SSA algorithm and binomial moment theory to form the dataset ${(θ_{i}, P_{simulation, i}, s_{i})}_{i = 1, . . ., N}$ . Input: A batch parameter of TXmodel ${θ_{i}}_{batch}$ . Output: The solution of the TXmodel obtained from the neural network and the corresponding statistics ${(P_{neuralnet, i}, {\hat{s}}_{i})}_{batch}$ . Initialize the neural network architecture as well as the weights and select hyperparameters. For every epoch do Sampling from the dataset ${(θ_{i}, P_{simulation, i}, s_{i})}$ to get a batch with parameter set ${(θ_{i}, P_{simulation, i}, s_{i})}_{batch}$ . Input parameter set ${θ_{i}}_{batch}$ to the neural network to get the distribution and statistics predicted by the neural network ${(P_{neuralnet, i}, {\hat{s}}_{i})}_{batch}$ . Train the neural network by the loss function. $L = \sum_{i = 1}^{n_{batch}} [KL (P_{simulation, i}, P_{neuralnet, i}) + λ \sum_{j = 1}^{n_{stats}} {(\log (\frac{s_{i j}}{{\hat{s}}_{i j}}))}^{2}] .$ End for

Appendix 1—algorithm 1.

Dataset: We use the Sobol algorithm to sample from a preset range of TXmodel parameters to obtain the parameters

{θ_{i}}

. For each set of

θ

, the distribution and moments are obtained using the SSA algorithm and binomial moment theory to form the dataset

{(θ_{i}, P_{simulation, i}, s_{i})}_{i = 1, . . ., N}

.
Input: A batch parameter of TXmodel

{θ_{i}}_{batch}

.
Output: The solution of the TXmodel obtained from the neural network and the corresponding statistics

{(P_{neuralnet, i}, {\hat{s}}_{i})}_{batch}

.
Initialize the neural network architecture as well as the weights and select hyperparameters.
For every epoch do

Sampling from the dataset ${(θ_{i}, P_{simulation, i}, s_{i})}$ to get a batch with parameter set ${(θ_{i}, P_{simulation, i}, s_{i})}_{batch}$ .
Input parameter set ${θ_{i}}_{batch}$ to the neural network to get the distribution and statistics predicted by the neural network ${(P_{neuralnet, i}, {\hat{s}}_{i})}_{batch}$ .
Train the neural network by the loss function.
$L = \sum_{i = 1}^{n_{batch}} [KL (P_{simulation, i}, P_{neuralnet, i}) + λ \sum_{j = 1}^{n_{stats}} {(\log (\frac{s_{i j}}{{\hat{s}}_{i j}}))}^{2}] .$

End for

1.4 Derivation of hierarchical negative binomial mixture distribution

Assume the following hierarchical distribution:

\begin{aligned} Y ∣ X & \sim B i n o m i a l (y ∣ x, p), \\ X & \sim N B (n, q) . \end{aligned}

Calculate the marginal distribution as follows:

\begin{matrix} Pr {Y = y} & = \sum_{x = 0}^{\infty} P (X = x, Y = y) = \sum_{x = 0}^{\infty} P (Y = y ∣ X = x) P (X = x) \\ = \sum_{x = y}^{\infty} (\binom{x}{y}) (\binom{x + n - 1}{x}) p^{y} (1 - p)^{x - y} (1 - q)^{n} q^{x} (conditional probability is 0 if y > x) \\ = \sum_{x = y}^{\infty} \frac{x!}{y! (x - y)!} \cdot \frac{(x + n - 1)!}{x! (n - 1)!} \cdot \frac{(n + y - 1)!}{(n + y - 1)!} p^{y} (1 - p)^{x - y} (1 - q)^{n} q^{x} \\ = \sum_{x = y}^{\infty} (\binom{x + n - 1}{x - y}) (\binom{n + y - 1}{y}) p^{y} (1 - p)^{x - y} (1 - q)^{n} q^{x} q^{y} q^{- y} \\ = (\binom{n + y - 1}{y}) (p q)^{y} (1 - q)^{n} \sum_{x = y}^{\infty} (\binom{x + n - 1}{x - y}) ((1 - p) q)^{x - y} \\ = (\binom{n + y - 1}{y}) {(\frac{p q}{1 - q + p q})}^{y} {(\frac{1 - q}{1 - q + p q})}^{n} . \end{matrix}

Further,

Y \sim N B (n, \frac{p q}{1 - q + p q}) .

If we assume that

P (X = x) = w_{1} P_{1} (X = x) + w_{2} P_{2} (X = x),

where $P_{1}$ and $P_{2}$ are both negative binomial distributions, then we can calculate them separately, and the result is the addition of two negative binomial distributions.

2. Relationship between noise and burst dynamics

For a TXmodel,

\begin{matrix} B F = \frac{1}{⟨ τ_{on} ⟩ + ⟨ τ_{off} ⟩}, B S = r_{syn} ⟨ τ_{on} ⟩, \\ ⟨ X ⟩ = \frac{r_{syn} ⟨ τ_{on} ⟩}{r_{deg} (⟨ τ_{on} ⟩ + ⟨ τ_{off} ⟩)} = \frac{B F \cdot B S}{r_{deg}}, \\ η_{X}^{2} = \frac{1}{⟨ X ⟩} + \frac{r_{deg} {(⟨ τ_{off} ⟩)}^{2}}{r_{deg} ⟨ τ_{on} ⟩ ⟨ τ_{off} ⟩ + ⟨ τ_{on} ⟩ + ⟨ τ_{off} ⟩} . \end{matrix}

When the mean protein $⟨ X ⟩$ is fixed to a constant $M$ , BS and BF are inversely proportional. We consider the following cases to indicate that the noise $η_{X}^{2}$ is a monotonically decreasing (increasing) function about BF (BS).

(1) $⟨ τ_{on} ⟩$ is fixed.

$⟨ τ_{off} ⟩$ and $r_{syn}$ satisfy the following relation,

r_{syn} = M r_{deg} (1 + \frac{⟨ τ_{off} ⟩}{⟨ τ_{on} ⟩}) .

On the other hand, BF decreases with an increasing $⟨ τ_{off} ⟩$ , and BS increases with the corresponding increasing $r_{syn}$ . In this case,

η_{X}^{2} = \frac{1}{M} + \frac{r_{deg}}{\frac{r_{deg} ⟨ τ_{on} ⟩ + 1}{⟨ τ_{off} ⟩} + \frac{⟨ τ_{on} ⟩}{⟨ τ_{off} ⟩^{2}}}

increases with an increasing $⟨ τ_{off} ⟩$ . Therefore, the noise $η_{X}^{2}$ is a monotonically decreasing (increasing) function about BF (BS).

(2) $r_{syn}$ is fixed.

$⟨ τ_{off} ⟩$ and $⟨ τ_{on} ⟩$ satisfy the following relation,

⟨ τ_{off} ⟩ = K ⟨ τ_{on} ⟩,

where the constant $K = (1 - C) / C$ and $C = M r_{deg} / r_{syn}$ . Note that $C \in (0, 1)$ , then $K > 0$ . Thus, $⟨ τ_{off} ⟩$ is directly proportional to $⟨ τ_{on} ⟩$ .

On the other hand, BF decreases with an increasing $⟨ τ_{on} ⟩$ and the corresponding increasing $⟨ τ_{off} ⟩$ , and BS increases with an increasing $⟨ τ_{on} ⟩$ . In this case,

η_{X}^{2} = \frac{1}{M} + \frac{r_{deg}}{\frac{r_{deg}}{K} + \frac{1}{K ⟨ τ_{on} ⟩} + \frac{1}{K^{2} ⟨ τ_{on} ⟩}}

increases with an increasing $⟨ τ_{off} ⟩$ . Therefore, the noise $η_{X}^{2}$ is a monotonically decreasing (increasing) function about BF (BS).

(3) The growth of $r_{syn}$ is consistent with the growth of $⟨ τ_{on} ⟩$ .

$⟨ τ_{off} ⟩$ can be obtained by

⟨ τ_{off} ⟩ = K (r_{syn}) ⟨ τ_{on} ⟩,

where $K (r_{syn}) = (r_{syn} - M r_{deg}) / M r_{deg}$ is a monotonically increasing function about $r_{syn}$ . Similar to case 2, it is easy to derive that the noise $η_{X}^{2}$ is a monotonically decreasing (increasing) function about BF (BS).

3. The definition of quality score

The quality score (QS) is calculated based on the difference in z-scores derived from GSVA (gene set variation analysis) of gene sets upregulated and downregulated during the quiescent phase and is defined as $Q S = z (up genes) - z (down genes)$ (Wiecek et al., 2023). $z (up genes)$ represents the standardized enrichment score of the gene set upregulated during quiescence in each sample. A higher value indicates that the quiescence-associated upregulated genes are actively expressed, suggesting that the sample is more likely to be in a quiescent (G0) state. $z (down genes)$ corresponds to the standardized enrichment score of genes downregulated during quiescence. A lower value implies effective suppression of these genes, which is also consistent with quiescence. The difference score $Q S$ serves as an integrated indicator of the quiescent state: A higher value reflects simultaneous activation of quiescence-associated upregulated genes and repression of downregulated genes, indicating a gene expression profile that strongly aligns with the G0/quiescent state. A lower or negative value suggests a deviation from the quiescent signature, potentially reflecting a proliferative state or failure to enter quiescence.

4. The rationale for the parameter space

In our study, we considered five parameters: $θ = (k_{off}, r_{off}, k_{on}, r_{% o n}, k_{syn})$ . The parameters $k_{off}$ and $k_{on}$ represent the number of intermediate reaction steps involved in transcriptional state transitions. These values were sampled uniformly from the range 1–15, which aligns with biological evidence indicating that most genes undergo either direct (single-step) transitions or a small number of intermediate steps, typically fewer than ten (Tunnacliffe and Chubb, 2020). This range is sufficient to capture both widely used single-step models and more detailed multi-step mechanisms without introducing biologically implausible complexity.

Among these parameters, $r_{off}$ and $r_{on}$ denote the rate constants governing stochastic transitions between the OFF and ON transcriptional states, respectively. The mean duration of the OFF state, which corresponds to the time between transcriptional bursts, is given by $⟨ τ_{off} ⟩ = k_{off} / r_{off}$ , and falls within the range $⟨ τ_{off} ⟩ \in (0.1, 150)$ . Experimental measurements report a median value of $⟨ τ_{off} ⟩$ approximately 3.7 (Gupta et al., 2022), which is well contained within this range. Similarly, the mean duration of the ON state, referred to as the burst duration, is defined by $⟨ τ_{on} ⟩ = k_{on} / r_{on}$ , and spans the interval $⟨ τ_{on} ⟩ \in (0.1, 1500)$ . The experimentally observed median value of 0.12 (Gupta et al., 2022) confirms that the parameter range adequately captures biologically realistic dynamics.

The parameter $k_{syn}$ represents the normalized synthesis rate after accounting for molecular degradation. Its range was chosen based on empirical observations of transcriptional burst sizes, which typically vary from single molecules to several dozen (Gupta et al., 2022). Considering the relationship $B S = k_{syn} \cdot ⟨ τ_{on} ⟩$ , the selected range of $k_{syn}$ ensures that the experimentally observed burst sizes are well represented within the defined parameter space.

Data availability

All the scRNA-seq data can be obtained from the public database. The scRNA-seq data of mouse embryonic stem cells (mESC) + DMSO and mESC + IdU data reported here is available at the Gene Expression Omnibus (GEO) database under accession number: GSE176044. The scRNA-seq data of RKO cells with different 5-fluorouracil treatment is available at the Gene Expression Omnibus (GEO) database under accession number: GSE149224. The code for reproducing the presented analysis results is available at the GitHub repository (https://github.com/cellfateTX/DeepTX, copy archived at cellfateTX and cogitoErgoSum, 2025).

The following previously published data sets were used

1. Desai RV
2. Weinberger LS
(2021) NCBI Gene Expression Omnibus
ID GSE176044. A DNA Repair Pathway Can Regulate Transcriptional Noise to Promote Cell Fate Transitions.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE176044
1. Park SR
2. Kang HM
3. Lee JH
(2020) NCBI Gene Expression Omnibus
ID GSE149224. Transcriptome landscape of DNA-damage response at single-cell resolution.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE149224

References

1. Adam-Vizi V
(2005) Production of reactive oxygen species in brain mitochondria: contribution by electron transport chain and non–electron transport chain sources
Antioxidants & Redox Signaling 7:1140–1149.

https://doi.org/10.1089/ars.2005.7.1140
- Google Scholar
1. Arora M
2. Moser J
3. Phadke H
4. Basha AA
5. Spencer SL
(2017) Endogenous replication stress in mother cells leads to quiescence of daughter cells
Cell Reports 19:1351–1364.

https://doi.org/10.1016/j.celrep.2017.04.055
- Google Scholar
(2011) Cellular decision making and biological noise: from microbes to mammals
Cell 144:910–925.

https://doi.org/10.1016/j.cell.2011.01.030
- Google Scholar
1. Barr AR
2. Cooper S
3. Heldt FS
4. Butera F
5. Stoy H
6. Mansfeld J
7. Novák B
8. Bakal C
(2017) DNA damage during S-phase mediates the proliferation-quiescence decision in the subsequent G1 via p21 expression
Nature Communications 8:14728.

https://doi.org/10.1038/ncomms14728
- Google Scholar
(2019) Learning data-driven discretizations for partial differential equations
PNAS 116:15344–15349.

https://doi.org/10.1073/pnas.1814058116
- PubMed
- Google Scholar
(2010) Bimodal gene expression patterns in breast cancer
BMC Genomics 11 Suppl 1:S8.

https://doi.org/10.1186/1471-2164-11-S1-S8
- PubMed
- Google Scholar
(2014) Simulating non-Markovian stochastic processes
Physical Review E 90:042108.

https://doi.org/10.1103/PhysRevE.90.042108
- Google Scholar
1. Bunz F
2. Hwang PM
3. Torrance C
4. Waldman T
5. Zhang Y
6. Dillehay L
7. Williams J
8. Lengauer C
9. Kinzler KW
10. Vogelstein B
(1999) Disruption of p53 in human cancer cells alters the responses to therapeutic agents
The Journal of Clinical Investigation 104:263–269.

https://doi.org/10.1172/JCI6863
- PubMed
- Google Scholar
1. Bwatanglang IB
2. Mohammad F
3. Yusof NA
4. Abdullah J
5. Alitheen NB
6. Hussein MZ
7. Abu N
8. Mohammed NE
9. Nordin N
10. Zamberi NR
11. Yeap SK
(2016) In vivo tumor targeting and anti-tumor effects of 5-fluororacil loaded, folic acid targeted quantum dot system
Journal of Colloid and Interface Science 480:146–158.

https://doi.org/10.1016/j.jcis.2016.07.011
- PubMed
- Google Scholar
Preprint
(2023) Comparative Analysis between Single-Cell RNA-Seq and Single-Molecule RNA FISH Indicates That the Pyrimidine Nucleobase Idoxuridine (IdU) Globally Amplifies Transcriptional Noise
bioRxiv.

https://doi.org/10.1101/2023.03.14.532632
- Google Scholar
1. Carneiro BA
2. El-Deiry WS
(2020) Targeting apoptosis in cancer therapy
Nature Reviews. Clinical Oncology 17:395–417.

https://doi.org/10.1038/s41571-020-0341-y
- PubMed
- Google Scholar
(2021) 3 ′-5 ′ crosstalk contributes to transcriptional bursting
Genome Biology 22:56.

https://doi.org/10.1186/s13059-020-02227-5
- Google Scholar
Software
1. cellfateTX
2. cogitoErgoSum
(2025) DeepTX, version swh:1:rev:183e3cfb018c7baaaa77086b7825752ea6dcebbd
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:94982436900cba2c2bcdde869d27acc655f4b3a0;origin=https://github.com/cellfateTX/DeepTX;visit=swh:1:snp:9bec55bf5918d893787bc475a0b990980a595420;anchor=swh:1:rev:183e3cfb018c7baaaa77086b7825752ea6dcebbd
1. Chang GS
2. Chen XA
3. Park B
4. Rhee HS
5. Li P
6. Han KH
7. Mishra T
8. Chan-Salis KY
9. Li Y
10. Hardison RC
11. Wang Y
12. Pugh BF
(2014) A comprehensive and high-resolution genome-wide response of p53 to stress
Cell Reports 8:514–527.

https://doi.org/10.1016/j.celrep.2014.06.030
- PubMed
- Google Scholar
(2022) Mechanisms and mathematical modeling of ROS production by the mitochondrial electron transport chain
American Journal of Physiology. Cell Physiology 323:C69–C83.

https://doi.org/10.1152/ajpcell.00455.2021
- PubMed
- Google Scholar
1. Chubb JR
2. Trcek T
3. Shenoy SM
4. Singer RH
(2006) Transcriptional pulsing of a developmental gene
Current Biology 16:1018–1025.

https://doi.org/10.1016/j.cub.2006.03.092
- Google Scholar
1. Cohen AA
2. Geva-Zatorsky N
3. Eden E
4. Frenkel-Morgenstern M
5. Issaeva I
6. Sigal A
7. Milo R
8. Cohen-Saidon C
9. Liron Y
10. Kam Z
11. Cohen L
12. Danon T
13. Perzov N
14. Alon U
(2008) Dynamic proteomics of individual cancer cells in response to a drug
Science 322:1511–1516.

https://doi.org/10.1126/science.1160165
- Google Scholar
1. Cristofari G
2. Adolf E
3. Reichenbach P
4. Sikora K
5. Terns RM
6. Terns MP
7. Lingner J
(2007) Human telomerase RNA accumulation in Cajal bodies facilitates telomerase recruitment to telomeres and telomere elongation
Molecular Cell 27:882–889.

https://doi.org/10.1016/j.molcel.2007.07.020
- Google Scholar
1. Dar RD
2. Razooky BS
3. Singh A
4. Trimeloni TV
5. McCollum JM
6. Cox CD
7. Simpson ML
8. Weinberger LS
(2012) Transcriptional burst frequency and burst size are equally modulated across the human genome
PNAS 109:17454–17459.

https://doi.org/10.1073/pnas.1213530109
- Google Scholar
1. Das S
2. Suganthan PN
(2010) Differential evolution: a survey of the state-of-the-art
IEEE Transactions on Evolutionary Computation 15:4–31.

https://doi.org/10.1109/TEVC.2010.2059031
- Google Scholar
(2020) The use of mixture density networks in the emulation of complex epidemiological individual-based models
PLOS Computational Biology 16:e1006869.

https://doi.org/10.1371/journal.pcbi.1006869
- PubMed
- Google Scholar
1. Deng Z
2. Shen D
3. Yu M
4. Zhou F
5. Shan D
6. Fang Y
7. Jin W
8. Qian K
9. Li S
10. Wang G
11. Zhang Y
12. Ju L
13. Xiao Y
14. Wang X
(2023) Pectolinarigenin inhibits bladder urothelial carcinoma cell proliferation by regulating DNA damage/autophagy pathways
Cell Death Discovery 9:214.

https://doi.org/10.1038/s41420-023-01508-9
- PubMed
- Google Scholar
1. Desai RV
2. Chen X
3. Martin B
4. Chaturvedi S
5. Hwang DW
6. Li W
7. Yu C
8. Ding S
9. Thomson M
10. Singer RH
11. Coleman RA
12. Hansen MMK
13. Weinberger LS
(2021) A DNA repair pathway can regulate transcriptional noise to promote cell fate transitions
Science 373:eabc6506.

https://doi.org/10.1126/science.abc6506
- PubMed
- Google Scholar
(2019) Challenges in measuring and understanding biological noise
Nature Reviews. Genetics 20:536–548.

https://doi.org/10.1038/s41576-019-0130-6
- PubMed
- Google Scholar
(2017) Systematic analysis of the determinants of gene expression noise in embryonic stem cells
Cell Systems 5:471–484.

https://doi.org/10.1016/j.cels.2017.10.003
- Google Scholar
1. Femino AM
2. Fay FS
3. Fogarty K
4. Singer RH
(1998) Visualization of single RNA transcripts in situ
Science 280:585–590.

https://doi.org/10.1126/science.280.5363.585
- Google Scholar
(2018) Persistent repair intermediates induce senescence
Nature Communications 9:3923.

https://doi.org/10.1038/s41467-018-06308-9
- Google Scholar
(2019) Stochastic transcription in the p53-mediated response to DNA damage is modulated by burst frequency
Molecular Systems Biology 15:e9068.

https://doi.org/10.15252/msb.20199068
- Google Scholar
(2009) Defining mechanisms that regulate RNA polymerase II transcription in vivo
Nature 461:186–192.

https://doi.org/10.1038/nature08449
- Google Scholar
(2021) A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways
PLOS Computational Biology 17:e1008748.

https://doi.org/10.1371/journal.pcbi.1008748
- PubMed
- Google Scholar
(2023) Neural parameter calibration for large-scale multiagent models
PNAS 120:e2216415120.

https://doi.org/10.1073/pnas.2216415120
- Google Scholar
1. Geijer ME
2. Marteijn JA
(2018) What happens at the lesion does not stay at the lesion: Transcription-coupled nucleotide excision repair and the effects of DNA damage on transcription in cis and trans
DNA Repair 71:56–68.

https://doi.org/10.1016/j.dnarep.2018.08.007
- PubMed
- Google Scholar
(2016) The RNA response to DNA damage
Journal of Molecular Biology 428:2636–2651.

https://doi.org/10.1016/j.jmb.2016.03.004
- PubMed
- Google Scholar
Conference
1. Glorot X
2. Bengio Y
(2010)
Understanding the difficulty of training deep feedforward neural networks

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. pp. 249–256.
- Google Scholar
1. Golan-Lavi R
2. Giacomelli C
3. Fuks G
4. Zeisel A
5. Sonntag J
6. Sinha S
7. Köstler W
8. Wiemann S
9. Korf U
10. Yarden Y
11. Domany E
(2017) Coordinated pulses of mRNA and of protein translation or degradation produce EGF-Induced protein bursts
Cell Reports 18:3129–3142.

https://doi.org/10.1016/j.celrep.2017.03.014
- Google Scholar
(2005) Real-time kinetics of gene activity in individual bacteria
Cell 123:1025–1036.

https://doi.org/10.1016/j.cell.2005.09.031
- Google Scholar
(2017) BayFish: Bayesian inference of transcription dynamics from population snapshots of single-molecule RNA FISH in single cells
Genome Biology 18:164.

https://doi.org/10.1186/s13059-017-1297-9
- Google Scholar
Book
(2016)
Deep Learning

MIT Press.
- Google Scholar
1. Gorin G
2. Pachter L
(2022) Modeling bursty transcription and splicing with the chemical master equation
Biophysical Journal 121:1056–1069.

https://doi.org/10.1016/j.bpj.2022.02.004
- Google Scholar
1. Gorin G
2. Carilli M
3. Chari T
4. Pachter L
(2024) Spectral neural approximations for models of transcriptional dynamics
Biophysical Journal 123:2892–2901.

https://doi.org/10.1016/j.bpj.2024.04.034
- PubMed
- Google Scholar
1. Gregersen LH
2. Svejstrup JQ
(2018) The cellular response to transcription-blocking DNA damage
Trends in Biochemical Sciences 43:327–341.

https://doi.org/10.1016/j.tibs.2018.02.010
- Google Scholar
1. Gu J
2. Laszik N
3. Miles CE
4. Allard J
5. Downing TL
6. Read EL
(2025) Scalable inference and identifiability of kinetic parameters for transcriptional bursting from single cell data
Bioinformatics 41:btaf581.

https://doi.org/10.1093/bioinformatics/btaf581
- PubMed
- Google Scholar
1. Gupta PB
2. Fillmore CM
3. Jiang G
4. Shapira SD
5. Tao K
6. Kuperwasser C
7. Lander ES
(2011) Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells
Cell 146:633–644.

https://doi.org/10.1016/j.cell.2011.07.026
- PubMed
- Google Scholar
(2021) DeepCME: a deep learning framework for computing solution statistics of the chemical master equation
PLOS Computational Biology 17:e1009623.

https://doi.org/10.1371/journal.pcbi.1009623
- Google Scholar
1. Gupta A
2. Martin-Rufino JD
3. Jones TR
4. Subramanian V
5. Qiu X
6. Grody EI
7. Bloemendal A
8. Weng C
9. Niu S-Y
10. Min KH
11. Mehta A
12. Zhang K
13. Siraj L
14. Al’ Khafaji A
15. Sankaran VG
16. Raychaudhuri S
17. Cleary B
18. Grossman S
19. Lander ES
(2022) Inferring gene regulation from stochastic transcriptional variation across single cells at steady state
PNAS 119:e2207392119.

https://doi.org/10.1073/pnas.2207392119
- Google Scholar
1. Hafner A
2. Bulyk ML
3. Jambhekar A
4. Lahav G
(2019) The multiple mechanisms that regulate p53 activity and cell fate
Nature Reviews. Molecular Cell Biology 20:199–210.

https://doi.org/10.1038/s41580-019-0110-x
- PubMed
- Google Scholar
1. Handali S
2. Moghimipour E
3. Rezaei M
4. Ramezani Z
5. Kouchak M
6. Amini M
7. Angali KA
8. Saremy S
9. Dorkoosh FA
(2018) A novel 5-Fluorouracil targeted delivery to colon cancer using folic acid conjugated liposomes
Biomedicine & Pharmacotherapy 108:1259–1273.

https://doi.org/10.1016/j.biopha.2018.09.128
- Google Scholar
(2013) GSVA: gene set variation analysis for microarray and RNA-seq data
BMC Bioinformatics 14:7.

https://doi.org/10.1186/1471-2105-14-7
- PubMed
- Google Scholar
Book
1. Hongmei Z
(2012) Extrinsic and intrinsic apoptosis signal pathway review
In: Hongmei Z, editors. Apoptosis and Medicine. InTechOpen. pp. 3–23.

https://doi.org/10.5772/50129
- Google Scholar
1. Hope B
2. Kosofsky B
3. Hyman SE
4. Nestler EJ
(1992) Regulation of immediate early gene expression and AP-1 binding in the rat nucleus accumbens by chronic cocaine
PNAS 89:5764–5768.

https://doi.org/10.1073/pnas.89.13.5764
- Google Scholar
1. Hoppe C
2. Bowles JR
3. Minchington TG
4. Sutcliffe C
5. Upadhyai P
6. Rattray M
7. Ashe HL
(2020) Modulation of the promoter activation rate dictates the transcriptional response to graded BMP signaling levels in the Drosophila Embryo
Developmental Cell 54:727–741.

https://doi.org/10.1016/j.devcel.2020.07.007
- PubMed
- Google Scholar
1. Hwang PM
2. Bunz F
3. Yu J
4. Rago C
5. Chan TA
6. Murphy MP
7. Kelso GF
8. Smith RA
9. Kinzler KW
10. Vogelstein B
(2001) Ferredoxin reductase affects p53-dependent, 5-fluorouracil-induced apoptosis in colorectal cancer cells
Nature Medicine 7:1111–1117.

https://doi.org/10.1038/nm1001-1111
- Google Scholar
1. Jiang Q
2. Fu X
3. Yan S
4. Li R
5. Du W
6. Cao Z
7. Qian F
8. Grima R
(2021) Neural network aided approximation and parameter inference of non-Markovian models of gene expression
Nature Communications 12:2618.

https://doi.org/10.1038/s41467-021-22919-1
- PubMed
- Google Scholar
1. Kepler TB
2. Elston TC
(2001) Stochasticity in transcriptional regulation: origins, consequences, and mathematical representations
Biophysical Journal 81:3116–3136.

https://doi.org/10.1016/S0006-3495(01)75949-8
- Google Scholar
Preprint
(2016) On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
arXiv.

https://doi.org/10.48550/arXiv.1609.04836
- Google Scholar
1. Kim JK
2. Marioni JC
(2013) Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data
Genome Biology 14:R7.

https://doi.org/10.1186/gb-2013-14-1-r7
- PubMed
- Google Scholar
Preprint
1. Kingma DP
2. Ba J
(2014) Adam: A Method for Stochastic Optimization
arXiv.

https://doi.org/10.48550/arXiv.1412.6980
- Google Scholar
(2019) Life or death after a break: what determines the choice?
Molecular Cell 76:346–358.

https://doi.org/10.1016/j.molcel.2019.08.023
- PubMed
- Google Scholar
(2015) Colorectal cancer
Nature Reviews Disease Primers 1:15065.

https://doi.org/10.1038/nrdp.2015.65
- Google Scholar
(2015) Transcriptional bursting in gene expression: analytical results for general stochastic models
PLOS Computational Biology 11:e1004292.

https://doi.org/10.1371/journal.pcbi.1004292
- PubMed
- Google Scholar
1. Kuntz J
2. Thomas P
3. Stan GB
4. Barahona M
(2021) Stationary distributions of continuous-time markov chains: a review of theory and truncation-based approximations
SIAM Review 63:3–64.

https://doi.org/10.1137/19M1289625
- Google Scholar
(2001) Long-term cultivation of colorectal carcinoma cells with anti-cancer drugs induces drug resistance and telomere elongation: an in vitro study
BMC Cancer 1:10.

https://doi.org/10.1186/1471-2407-1-10
- PubMed
- Google Scholar
1. Laha D
2. Pramanik A
3. Chattopadhyay S
4. Dash SK
5. Roy S
6. Pramanik P
7. Karmakar P
(2015) Folic acid modified copper oxide nanoparticles for targeted delivery in in vitro and in vivo systems
RSC Advances 5:68169–68178.

https://doi.org/10.1039/C5RA08110F
- Google Scholar
1. Lammers NC
2. Kim YJ
3. Zhao J
4. Garcia HG
(2020) A matter of time: Using dynamics and theory to uncover mechanisms of transcriptional bursting
Current Opinion in Cell Biology 67:147–157.

https://doi.org/10.1016/j.ceb.2020.08.001
- PubMed
- Google Scholar
(2019) The DNA damage response to transcription stress
Nature Reviews. Molecular Cell Biology 20:766–784.

https://doi.org/10.1038/s41580-019-0169-4
- PubMed
- Google Scholar
1. Larson DR
2. Zenklusen D
3. Wu B
4. Chao JA
5. Singer RH
(2011) Real-time observation of transcription initiation and elongation on an endogenous yeast gene
Science 332:475–478.

https://doi.org/10.1126/science.1202142
- PubMed
- Google Scholar
(2019) Genomic encoding of transcriptional burst kinetics
Nature 565:251–254.

https://doi.org/10.1038/s41586-018-0836-1
- PubMed
- Google Scholar
1. Li C
2. Xu X
3. Chen S
4. Xu A
5. Guan T
6. Wu H
7. Pei D
8. Liu J
(2024) Epigenetic reshaping through damage: promoting cell fate transition by BrdU and IdU incorporation
Cell & Bioscience 14:01192.

https://doi.org/10.1186/s13578-024-01192-x
- Google Scholar
(2017) Telomerase and drug resistance in cancer
Cellular and Molecular Life Sciences 74:4121–4132.

https://doi.org/10.1007/s00018-017-2573-2
- PubMed
- Google Scholar
1. Liu T
2. Zhang J
3. Zhou T
(2016) Effect of interaction between chromatin loops on cell-to-cell variability in gene expression
PLOS Computational Biology 12:e1004917.

https://doi.org/10.1371/journal.pcbi.1004917
- PubMed
- Google Scholar
1. Liu Z
2. Wang L
3. Welch JD
4. Ma H
5. Zhou Y
6. Vaseghi HR
7. Yu S
8. Wall JB
9. Alimohamadi S
10. Zheng M
11. Yin C
12. Shen W
13. Prins JF
14. Liu J
15. Qian L
(2017) Single-cell transcriptomics reconstructs fate conversion from fibroblast to cardiomyocyte
Nature 551:100–104.

https://doi.org/10.1038/nature24454
- Google Scholar
1. Lomvardas S
2. Thanos D
(2002) Opening chromatin
Molecular Cell 9:209–211.

https://doi.org/10.1016/s1097-2765(02)00463-x
- PubMed
- Google Scholar
(2008) Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation
Nature Reviews. Genetics 9:583–593.

https://doi.org/10.1038/nrg2398
- PubMed
- Google Scholar
1. Losick R
2. Desplan C
(2008) Stochasticity and cell fate
Science 320:65–68.

https://doi.org/10.1126/science.1147888
- PubMed
- Google Scholar
1. Love MI
2. Huber W
3. Anders S
(2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Genome Biology 15:550.

https://doi.org/10.1186/s13059-014-0550-8
- Google Scholar
1. Luo S
2. Wang Z
3. Zhang Z
4. Zhou T
5. Zhang J
(2023a) Genome-wide inference reveals that feedback regulations constrain promoter-dependent transcriptional burst kinetics
Nucleic Acids Research 51:68–83.

https://doi.org/10.1093/nar/gkac1204
- Google Scholar
1. Luo S
2. Zhang Z
3. Wang Z
4. Yang X
5. Chen X
6. Zhou T
7. Zhang J
(2023b) Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model
Royal Society Open Science 10:221057.

https://doi.org/10.1098/rsos.221057
- PubMed
- Google Scholar
1. Macosko EZ
2. Basu A
3. Satija R
4. Nemesh J
5. Shekhar K
6. Goldman M
7. Tirosh I
8. Bialas AR
9. Kamitaki N
10. Martersteck EM
11. Trombetta JJ
12. Weitz DA
13. Sanes JR
14. Shalek AK
15. Regev A
16. McCarroll SA
(2015) Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets
Cell 161:1202–1214.

https://doi.org/10.1016/j.cell.2015.05.002
- Google Scholar
1. Maizels RJ
(2024) A dynamical perspective: moving towards mechanism in single-cell transcriptomics
Philosophical Transactions of the Royal Society B 379:20230049.

https://doi.org/10.1098/rstb.2023.0049
- Google Scholar
1. Masuda N
2. Rocha LEC
(2018) A gillespie algorithm for non-markovian stochastic processes
SIAM Review 60:95–115.

https://doi.org/10.1137/16M1055876
- Google Scholar
1. Miao Z
2. Humphreys BD
3. McMahon AP
4. Kim J
(2021) Multi-omics integration in the age of million single-cell data
Nature Reviews Nephrology 17:710–724.

https://doi.org/10.1038/s41581-021-00463-x
- Google Scholar
(2020) Solving differential equations using deep neural networks
Neurocomputing 399:193–212.

https://doi.org/10.1016/j.neucom.2020.02.015
- Google Scholar
1. Morgan MD
2. Marioni JC
(2018) CpG island composition differences are a source of gene expression noise indicative of promoter responsiveness
Genome Biology 19:81.

https://doi.org/10.1186/s13059-018-1461-x
- Google Scholar
(2014) Nuclear translocation of Cyclin B1 marks the restriction point for terminal cell cycle exit in G2 phase
Cell Cycle 13:2733–2743.

https://doi.org/10.4161/15384101.2015.945831
- Google Scholar
(2009) DNA damage regulates alternative splicing through inhibition of RNA polymerase II elongation
Cell 137:708–720.

https://doi.org/10.1016/j.cell.2009.03.010
- PubMed
- Google Scholar
1. Nabetani A
2. Ishikawa F
(2011) Alternative lengthening of telomeres pathway: recombination-mediated telomere maintenance mechanism in human cells
Journal of Biochemistry 149:5–14.

https://doi.org/10.1093/jb/mvq119
- PubMed
- Google Scholar
(2006) Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise
Nature 441:840–846.

https://doi.org/10.1038/nature04785
- Google Scholar
1. Nicolas D
2. Zoller B
3. Suter DM
4. Naef F
(2018) Modulation of transcriptional burst frequency by histone acetylation
PNAS 115:7153–7158.

https://doi.org/10.1073/pnas.1722330115
- Google Scholar
1. Nordick B
2. Yu PY
3. Liao G
4. Hong T
(2022) Nonmodular oscillator and switch based on RNA decay drive regeneration of multimodal gene expression
Nucleic Acids Research 50:3693–3708.

https://doi.org/10.1093/nar/gkac217
- PubMed
- Google Scholar
(2022) Inference and uncertainty quantification of stochastic gene expression via synthetic models
Journal of the Royal Society, Interface 19:20220153.

https://doi.org/10.1098/rsif.2022.0153
- PubMed
- Google Scholar
1. Ochiai H
2. Hayashi T
3. Umeda M
4. Yoshimura M
5. Harada A
6. Shimizu Y
7. Nakano K
8. Saitoh N
9. Liu Z
10. Yamamoto T
11. Okamura T
12. Ohkawa Y
13. Kimura H
14. Nikaido I
(2020) Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cells
Science Advances 6:eaaz6699.

https://doi.org/10.1126/sciadv.aaz6699
- Google Scholar
1. Park SR
2. Namkoong S
3. Friesen L
4. Cho C-S
5. Zhang ZZ
6. Chen Y-C
7. Yoon E
8. Kim CH
9. Kwak H
10. Kang HM
11. Lee JH
(2020) Single-cell transcriptome analysis of colon cancer cell response to 5-fluorouracil-induced DNA damage
Cell Reports 32:108077.

https://doi.org/10.1016/j.celrep.2020.108077
- Google Scholar
1. Peccoud J
2. Ycart B
(1995) Markovian modeling of gene-product synthesis
Theoretical Population Biology 48:222–234.

https://doi.org/10.1006/tpbi.1995.1027
- Google Scholar
(2020) Effects of cell cycle variability on lineage and population measurements of messenger RNA abundance
Journal of the Royal Society, Interface 17:20200360.

https://doi.org/10.1098/rsif.2020.0360
- PubMed
- Google Scholar
Book
1. Phillips R
2. Kondev J
3. Theriot J
4. Garcia HG
5. Orme N
(2012) Physical Biology of the Cell
Garland Science.

https://doi.org/10.1201/9781134111589
- Google Scholar
(2013) Smart-seq2 for sensitive full-length transcriptome profiling in single cells
Nature Methods 10:1096–1098.

https://doi.org/10.1038/nmeth.2639
- Google Scholar
1. Qiu Q
2. Hu P
3. Qiu X
4. Govek KW
5. Cámara PG
6. Wu H
(2020) Massively parallel and time-resolved RNA sequencing in single cells with scNT-seq
Nature Methods 17:991–1001.

https://doi.org/10.1038/s41592-020-0935-4
- Google Scholar
1. Raj A
2. Oudenaarden A
(2008) Nature, nurture, or chance: stochastic gene expression and its consequences
Cell 135:216–226.

https://doi.org/10.1016/j.cell.2008.09.050
- Google Scholar
(2024) Single-cell new RNA sequencing reveals principles of transcription at the resolution of individual bursts
Nature Cell Biology 26:1725–1733.

https://doi.org/10.1038/s41556-024-01486-9
- Google Scholar
1. Riccio F
2. Micarelli E
3. Secci R
4. Giuliani G
5. Vumbaca S
6. Massacci G
7. Castagnoli L
8. Fuoco C
9. Cesareni G
(2022) Transcription factor activation profiles (TFAP) identify compounds promoting differentiation of Acute Myeloid Leukemia cell lines
Cell Death Discovery 8:008117.

https://doi.org/10.1038/s41420-021-00811-7
- Google Scholar
1. Risso D
2. Perraudeau F
3. Gribkova S
4. Dudoit S
5. Vert JP
(2018) A general and flexible method for signal extraction from single-cell RNA-seq data
Nature Communications 9:284.

https://doi.org/10.1038/s41467-017-02554-5
- Google Scholar
1. Rodriguez J
2. Ren G
3. Day CR
4. Zhao K
5. Chow CC
6. Larson DR
(2019) Intrinsic dynamics of a human gene reveal the basis of expression heterogeneity
Cell 176:213–226.

https://doi.org/10.1016/j.cell.2018.11.026
- Google Scholar
1. Rodriguez J
2. Larson DR
(2020) Transcription in living cells: molecular mechanisms of bursting
Annual Review of Biochemistry 89:189–212.

https://doi.org/10.1146/annurev-biochem-011520-105250
- Google Scholar
1. Roos WP
2. Kaina B
(2013) DNA damage-induced cell death: from specific DNA lesions to the DNA damage response and apoptosis
Cancer Letters 332:237–248.

https://doi.org/10.1016/j.canlet.2012.01.007
- Google Scholar
(2019) Osteogenic differentiation of skeletal muscle progenitor cells is activated by the DNA damage response
Scientific Reports 9:5447.

https://doi.org/10.1038/s41598-019-41926-3
- Google Scholar
1. Safieddine A
2. Coleno E
3. Lionneton F
4. Traboulsi AM
5. Salloum S
6. Lecellier CH
7. Gostan T
8. Georget V
9. Hassen-Khodja C
10. Imbert A
11. Mueller F
12. Walter T
13. Peter M
14. Bertrand E
(2023) HT-smFISH: a cost-effective and flexible workflow for high-throughput single-molecule RNA imaging
Nature Protocols 18:157–187.

https://doi.org/10.1038/s41596-022-00750-2
- PubMed
- Google Scholar
1. Sarkar A
2. Stephens M
(2021) Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis
Nature Genetics 53:770–777.

https://doi.org/10.1038/s41588-021-00873-4
- Google Scholar
(2012) Transcription stochasticity of complex gene regulation models
Biophysical Journal 103:1152–1161.

https://doi.org/10.1016/j.bpj.2012.07.011
- PubMed
- Google Scholar
(2014) Aging: A mitochondrial DNA perspective, critical analysis and an update
World Journal of Experimental Medicine 4:46–57.

https://doi.org/10.5493/wjem.v4.i4.46
- PubMed
- Google Scholar
Conference
(2013) Stochastic analysis and inference of a two-state genetic promoter model
2013 American Control Conference (ACC). pp. 4563–4568.

https://doi.org/10.1109/ACC.2013.6580542
- Google Scholar
(2007) Sequential Monte Carlo without likelihoods
PNAS 104:1760–1765.

https://doi.org/10.1073/pnas.0607208104
- Google Scholar
1. Smits VAJ
2. Medema RH
(2001) Checking out the G2/M transition
Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression 1519:1–12.

https://doi.org/10.1016/S0167-4781(01)00204-4
- Google Scholar
1. Sobol’ IM
(1967) On the distribution of points in a cube and the approximate evaluation of integrals
USSR Computational Mathematics and Mathematical Physics 7:86–112.

https://doi.org/10.1016/0041-5553(67)90144-9
- Google Scholar
(2015) Inferring single-cell gene expression mechanisms using stochastic simulation
Bioinformatics 31:1428–1435.

https://doi.org/10.1093/bioinformatics/btv007
- Google Scholar
(2019) Transcription factors and 3D genome conformation in cell-fate decisions
Nature 569:345–354.

https://doi.org/10.1038/s41586-019-1182-7
- Google Scholar
(2012) Population density approach for discrete mRNA distributions in generalized switching models for stochastic gene expression
Physical Review E 85:061919.

https://doi.org/10.1103/PhysRevE.85.061919
- Google Scholar
1. Stumpf PS
2. Smith RCG
3. Lenz M
4. Schuppert A
5. Müller FJ
6. Babtie A
7. Chan TE
8. Stumpf MPH
9. Please CP
10. Howison SD
11. Arai F
12. MacArthur BD
(2017) Stem cell differentiation as a non-markov stochastic process
Cell Systems 5:268–282.

https://doi.org/10.1016/j.cels.2017.08.009
- PubMed
- Google Scholar
1. Su TT
(2006) Cellular responses to DNA damage: one signal, multiple choices
Annual Review of Genetics 40:187–208.

https://doi.org/10.1146/annurev.genet.40.110405.090428
- Google Scholar
1. Subramanian A
2. Tamayo P
3. Mootha VK
4. Mukherjee S
5. Ebert BL
6. Gillette MA
7. Paulovich A
8. Pomeroy SL
9. Golub TR
10. Lander ES
11. Mesirov JP
(2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
PNAS 102:15545–15550.

https://doi.org/10.1073/pnas.0506580102
- Google Scholar
1. Sukys A
2. Öcal K
3. Grima R
(2022) Approximating solutions of the Chemical Master equation using neural networks
iScience 25:105010.

https://doi.org/10.1016/j.isci.2022.105010
- Google Scholar
1. Sukys A
2. Grima R
(2025) Cell-cycle dependence of bursty gene expression: insights from fitting mechanistic models to single-cell RNA-seq data
Nucleic Acids Research 53:gkaf295.

https://doi.org/10.1093/nar/gkaf295
- PubMed
- Google Scholar
1. Sun Y
2. Rigas B
(2008) The thioredoxin system mediates redox-induced cell death in human colon cancer cells: implications for the mechanism of action of anticancer agents
Cancer Research 68:8269–8277.

https://doi.org/10.1158/0008-5472.CAN-08-2010
- Google Scholar
1. Tanay A
2. Regev A
(2017) Scaling single-cell genomics from phenomenology to mechanism
Nature 541:331–338.

https://doi.org/10.1038/nature21350
- Google Scholar
(2023a) Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics
Bioinformatics 39:btad395.

https://doi.org/10.1093/bioinformatics/btad395
- Google Scholar
1. Tang Y
2. Weng J
3. Zhang P
(2023b) Neural-network solutions to stochastic reaction networks
Nature Machine Intelligence 5:376–385.

https://doi.org/10.1038/s42256-023-00632-6
- Google Scholar
1. The Gene Ontology Consortium
(2015) Gene ontology consortium: going forward
Nucleic Acids Research 43:D1049–D1056.

https://doi.org/10.1093/nar/gku1179
- Google Scholar
(2008) ATR signaling can drive cells into senescence in the absence of DNA breaks
Genes & Development 22:297–302.

https://doi.org/10.1101/gad.452308
- Google Scholar
1. Tunnacliffe E
2. Chubb JR
(2020) What is a transcriptional burst?
Trends in Genetics 36:288–297.

https://doi.org/10.1016/j.tig.2020.01.003
- Google Scholar
Preprint
1. van den Berg J
2. Joosten SEP
3. Kim Y
4. Manjón AG
5. Krenning L
6. Koob L
7. Feringa FM
8. Klompmaker R
9. van den Broek B
10. Jalink K
11. Zwart W
12. Medema RH
(2019) DNA End-Resection in Highly Accessible Chromatin Produces a Toxic Break
bioRxiv.

https://doi.org/10.1101/691857
- Google Scholar
1. Voss TC
2. Hager GL
(2014) Dynamic regulation of transcriptional states by chromatin and transcription factors
Nature Reviews Genetics 15:69–81.

https://doi.org/10.1038/nrg3623
- Google Scholar
1. Wan Y
2. Anastasakis DG
3. Rodriguez J
4. Palangat M
5. Gudla P
6. Zaki G
7. Tandon M
8. Pegoraro G
9. Chow CC
10. Hafner M
11. Larson DR
(2021) Dynamic imaging of nascent RNA reveals general principles of transcription dynamics and stochastic splice site selection
Cell 184:2878–2895.

https://doi.org/10.1016/j.cell.2021.04.012
- PubMed
- Google Scholar
1. Wang J
2. Huang M
3. Torre E
4. Dueck H
5. Shaffer S
6. Murray J
7. Raj A
8. Li M
9. Zhang NR
(2018) Gene expression distribution deconvolution in single-cell RNA sequencing
PNAS 115:E6437–E6446.

https://doi.org/10.1073/pnas.1721085115
- Google Scholar
1. Wang S
2. Fan K
3. Luo N
4. Cao Y
5. Wu F
6. Zhang C
7. Heller KA
8. You L
(2019) Massive computational acceleration by using neural networks to emulate mechanism-based biological models
Nature Communications 10:4354.

https://doi.org/10.1038/s41467-019-12342-y
- Google Scholar
1. Wang Z
2. Luo S
3. Zhang Z
4. Zhou T
5. Zhang J
(2023) 4D nucleome equation predicts gene expression controlled by long-range enhancer-promoter interaction
PLOS Computational Biology 19:e1011722.

https://doi.org/10.1371/journal.pcbi.1011722
- Google Scholar
1. Wang Z
2. Zhang Z
3. Luo S
4. Zhou T
5. Zhang J
(2024) Power-law behavior of transcriptional bursting regulated by enhancer–promoter communication
Genome Research 34:106–118.

https://doi.org/10.1101/gr.278631.123
- Google Scholar
1. Wen L
2. Tang F
(2022) Recent advances in single-cell sequencing technologies
Precision Clinical Medicine 5:bac002.

https://doi.org/10.1093/pcmedi/pbac002
- PubMed
- Google Scholar
1. Wiecek AJ
2. Cutty SJ
3. Kornai D
4. Parreno-Centeno M
5. Gourmet LE
6. Tagliazucchi GM
7. Jacobson DH
8. Zhang P
9. Xiong L
10. Bond GL
11. Barr AR
12. Secrier M
(2023) Genomic hallmarks and therapeutic implications of G0 cell cycle arrest in cancer
Genome Biology 24:128.

https://doi.org/10.1186/s13059-023-02963-4
- PubMed
- Google Scholar
(2021) DNA damage-how and why we age?
eLife 10:e62852.

https://doi.org/10.7554/eLife.62852
- PubMed
- Google Scholar
(2008) Single-RNA counting reveals alternative modes of gene expression in yeast
Nature Structural & Molecular Biology 15:1263–1271.

https://doi.org/10.1038/nsmb.1514
- Google Scholar
1. Zhang J
2. Chen L
3. Zhou T
(2012) Analytical distribution and tunability of noise in a model of promoter progress
Biophysical Journal 102:1247–1257.

https://doi.org/10.1016/j.bpj.2012.02.001
- PubMed
- Google Scholar
1. Zhang J
2. Zhou T
(2014) Promoter-mediated Transcriptional Dynamics
Biophysical Journal 106:479–488.

https://doi.org/10.1016/j.bpj.2013.12.011
- Google Scholar
1. Zhang J
2. Zhou T
(2019a) Markovian approaches to modeling intracellular reaction processes with molecular memory
PNAS 116:23542–23550.

https://doi.org/10.1073/pnas.1913926116
- Google Scholar
1. Zhang J
2. Zhou T
(2019b) Stationary moments, distribution conjugation and phenotypic regions in stochastic gene transcription
Mathematical Biosciences and Engineering 16:6134–6166.

https://doi.org/10.3934/mbe.2019307
- PubMed
- Google Scholar
1. Zhang Z
2. Deng Q
3. Wang Z
4. Chen Y
5. Zhou T
(2021) Exact results for queuing models of stochastic transcription with memory and crosstalk
Physical Review E 103:062414.

https://doi.org/10.1103/PhysRevE.103.062414
- Google Scholar
1. Zhang J
2. Chen A
3. Qiu H
4. Zhang J
5. Tian T
6. Zhou T
(2024) Exact results for gene-expression models with general waiting-time distributions
Physical Review E 109:024119.

https://doi.org/10.1103/PhysRevE.109.024119
- Google Scholar
1. Zhao Y
2. Simon M
3. Seluanov A
4. Gorbunova V
(2023) DNA damage and repair in age-related inflammation
Nature Reviews Immunology 23:75–89.

https://doi.org/10.1038/s41577-022-00751-y
- Google Scholar
1. Zheng GXY
2. Terry JM
3. Belgrader P
4. Ryvkin P
5. Bent ZW
6. Wilson R
7. Ziraldo SB
8. Wheeler TD
9. McDermott GP
10. Zhu J
11. Gregory MT
12. Shuga J
13. Montesclaros L
14. Underwood JG
15. Masquelier DA
16. Nishimura SY
17. Schnall-Levin M
18. Wyatt PW
19. Hindson CM
20. Bharadwaj R
21. Wong A
22. Ness KD
23. Beppu LW
24. Deeg HJ
25. McFarland C
26. Loeb KR
27. Valente WJ
28. Ericson NG
29. Stevens EA
30. Radich JP
31. Mikkelsen TS
32. Hindson BJ
33. Bielas JH
(2017) Massively parallel digital transcriptional profiling of single cells
Nature Communications 8:14049.

https://doi.org/10.1038/ncomms14049
- PubMed
- Google Scholar
1. Zheng P
2. Chen Q
3. Tian X
4. Qian N
5. Chai P
6. Liu B
7. Hu J
8. Blackstone C
9. Zhu D
10. Teng J
11. Chen J
(2018) DNA damage triggers tubular endoplasmic reticulum extension to promote apoptosis by facilitating ER-mitochondria signaling
Cell Research 28:833–854.

https://doi.org/10.1038/s41422-018-0065-z
- PubMed
- Google Scholar
1. Zhou T
2. Zhang J
(2012) Analytical results for a multistate gene model
SIAM Journal on Applied Mathematics 72:789–818.

https://doi.org/10.1137/110852887
- Google Scholar
1. Zoller B
2. Nicolas D
3. Molina N
4. Naef F
(2015) Structure of silent transcription intervals and noise characteristics of mammalian genes
Molecular Systems Biology 11:823.

https://doi.org/10.15252/msb.20156257
- PubMed
- Google Scholar
(2018) Diverse spatial expression patterns emerge from unified kinetics of transcriptional bursting
Cell 175:835–847.

https://doi.org/10.1016/j.cell.2018.09.056
- PubMed
- Google Scholar

Article and author information

Author details

Zhiwei Huang
1. Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, China
2. School of Mathematics, Sun Yat-sen University, Guangzhou, China
Contribution
Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing

Contributed equally with
Songhao Luo

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0009-0007-1957-3940
Songhao Luo

Department of Mathematics, University of California Irvine, Irvine, United States

Contribution
Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing

Contributed equally with
Zhiwei Huang

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-1162-9608
Zihao Wang
1. Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, China
2. School of Mathematics, Sun Yat-sen University, Guangzhou, China
Contribution
Data curation, Validation, Investigation, Visualization, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-5440-843X
Zhenquan Zhang

School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China

Contribution
Data curation, Formal analysis, Validation, Investigation, Visualization, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-2913-4905
Benyuan Jiang

Guangdong Lung Cancer Institute, Guangdong Provincial People’s Hospital and Guangdong Academy of Medical Sciences, Guangzhou, China

Contribution
Resources, Supervision, Writing – review and editing

For correspondence
jiangbenyuan@gdph.org.cn

Competing interests
No competing interests declared
Qing Nie

Department of Mathematics, University of California Irvine, Irvine, United States

Contribution
Supervision, Writing – review and editing

For correspondence
qnie@uci.edu

Competing interests
No competing interests declared
Jiajun Zhang
1. Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, Guangzhou, China
2. School of Mathematics, Sun Yat-sen University, Guangzhou, China
Contribution
Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
zhjiajun@mail.sysu.edu.cn

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-7107-4814

Funding

National Key Research and Development Program of China (2021YFA1302500)

Jiajun Zhang

Natural Science Foundation of P.R. China (12171494)

Jiajun Zhang

National Natural Science Foundation of China (12301646)

Zhenquan Zhang

National Natural Science Foundation of China (12501700)

Zihao Wang

Key-Area Research and Development Program of Guangzhou, P. R. China (2019B110233002)

Jiajun Zhang

Guangdong Basic and Applied Basic Research Foundation (2022A1515011540)

Jiajun Zhang

Guangdong Basic and Applied Basic Research Foundation (2023A1515110273)

Zihao Wang

Guangdong Basic and Applied Basic Research Foundation (2024A1515012786)

Zihao Wang

Guangdong Provincial Key Laboratory of Mathematical and Neural Dynamical Systems (2024B1212010004)

Jiajun Zhang

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by National Key R&D Program of China (Grant No. 2021YFA1302500), Natural Science Foundation of P R China (Grants No. 12171494, No. 12301646 and No. 12501700), Key-Area Research and Development Program of Guangzhou, P R China (Grant No. 2019B110233002), Guangdong Basic and Applied Basic Research Foundation (Grants No. 2022A1515011540, No. 2023A1515110273, and No. 2024A1515012786), and Guangdong Provincial Key Laboratory of Mathematical and Neural Dynamical Systems (Grant No. 2024B1212010004).

Version history

Sent for peer review: July 10, 2024
Preprint posted: July 12, 2024
Reviewed Preprint version 1: December 13, 2024
Reviewed Preprint version 2: August 22, 2025
Reviewed Preprint version 3: January 16, 2026
Version of Record published: March 4, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.100623. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.