Introduction

Cells must employ flexible regulatory strategies to determine their fate, including cell differentiation, survival, or apoptosis, in response to diverse external stimuli (López-Maury L et al., 2008; Stadhouders R et al., 2019). One key cellular strategy that enables both beneficial and detrimental functions is the stochastic nature of gene expression programs, arising from intricate biochemical processes and termed “noise” (Raj A & van Oudenaarden A, 2008; Balázsi G et al., 2011; Losick R & Desplan C, 2008). Gene expression noise originates from a discontinuous transcriptional pattern, wherein mRNA production occurs through alternating active and inactive gene states (Eling N et al., 2019; Tunnacliffe E & Chubb JR, 2020). This phenomenon, known as “transcriptional bursting”, has been observed in numerous experiments across both prokaryotic and eukaryotic cells (Golding I et al., 2005; Chubb JR et al., 2006; Larson DR et al., 2011). However, it remains an unresolved question how cells make fate decisions in response to external signals through genome-wide regulation of transcriptional burst mechanisms (Lammers NC et al., 2020; Rodriguez J & Larson DR, 2020).

DNA damage is a crucial and prevalent factor contributing to cellular responses, inducing varying degrees of cytotoxicity and thereby influencing diverse cell fate decisions (Krenning L et al., 2019; Su TT, 2006; Hafner A et al., 2019). For instance, low cytotoxicity limits cell proliferation (van den Berg J et al., 2019; Arora M et al., 2017; Barr AR et al., 2017; Deng Z et al., 2023), whereas heightened cytotoxicity prompts cell differentiation and senescence (Müllers E et al., 2014; Feringa FM et al., 2018; Toledo LI et al., 2008; Zhao Y et al., 2023). Furthermore, extreme scenarios of high cytotoxicity can cause cells to engage in the apoptotic process (Yousefzadeh M et al., 2021; Carneiro BA & El-Deiry WS, 2020; Roos WP & Kaina B, 2013; Zheng P et al., 2018). Experiments have shown that DNA damage can disrupt the gene transcription process (Lans H et al., 2019). Specifically, DNA damage slows down the movement of RNA Pol II along the DNA strand (Muñoz MJ et al., 2009). When more severe DNA damage is encountered, RNA Pol II exhibits a sliding pause behavior for DNA repair before resuming transcription (Figure 1A) (Gregersen LH & Svejstrup JQ, 2018; Geijer ME & Marteijn JA, 2018; Giono LE et al., 2016). These changes in RNA Pol II movement behavior would further reflect the cellular regulation of gene expression, particularly burst kinetics (Friedrich D et al., 2019; Calia GP et al., 2023). A recent study has shown that cellular responses to DNA damage involve the regulation of burst frequency (BF) on specific genes, rather than a genome-wide conclusion (Friedrich D et al., 2019). They have demonstrated that cells respond to DNA damage by increasing gene expression noise, but they do not delve into the topic of transcriptional bursts (Calia GP et al., 2023; Desai RV et al., 2021). Therefore, the question of how DNA damage causes cells to regulate transcriptional burst kinetics on a genome-wide scale remains unresolved.

Overview of DeepTX framework. (A) Cells stimulated by DNA damaging drugs cause damage to the DNA double strands, which will slow down or even stop the movement of RNA Pol II on the DNA double strands. Changes in the state of RNA Pol II movement will lead to changes in the kinetics of gene expression bursts (including burst frequency and burst size), which will affect cell fate decisions such as apoptosis, differentiation, and survival. (B) The input to the DeepTX framework is scRNA-seq data, and the output is the burst kinetics corresponding to each gene. The core of the DeepTX inference framework is a hierarchical model, which is a mixture of the stationary distribution of the mechanism model solved by the neural network (Step 1) and the binomial distribution followed by the sequencing process. This hierarchical model is used to infer the out dynamics parameters corresponding to the data distribution (Step 2).

To understand the dynamics of transcriptional bursts, ideally, one needs to observe the fluctuations of gene expression over a continuous time interval, such as smFISH, scNT-seq, and HT-smFISH (Femino AM et al., 1998; Qiu Q et al., 2020; Safieddine A et al., 2023). However, most existing techniques can only observe a limited number of genes, making their application to genome-wide studies challenging. Fortunately, single-cell RNA sequencing (scRNA-seq) technology provides excellent opportunities to explore this question, emerging as the leading method for genome-wide mRNA measurements and revealing gene expression noise within individual cells (Tanay A & Regev A, 2017; Picelli S et al., 2013; Zheng GX et al., 2017). Many studies have employed scRNA-seq to elucidate the diverse sources of gene expression noise and the fundamental principles of transcriptional dynamics on a genome-wide scale (Eling N et al., 2019; Faure AJ et al., 2017; Morgan MD & Marioni JC, 2018; Ochiai H et al., 2020). However, scRNA-seq data are static snapshot measurements. To recover the underlying dynamic transcriptional burst kinetics, a mathematical approach is needed to model the dynamic gene expression stochastic processes by regarding the scRNA-seq data as a steady-state result of the dynamic model (Larsson AJM et al., 2019; Luo S et al., 2023; Luo S et al., 2023).

For mathematical modeling, the gene expression model for DNA damage ought to encompass more sophisticated generalized properties compared to the traditional telegraph model. This is because DNA damage-induced RNA pol II turns transcription into a multi-step process rather than a single-step process (Stumpf PS et al., 2017; Voss TC & Hager GL, 2014). Specifically, the burst kinetics can be conceptualized as genes switching between inactive and active states (Singh A et al., 2013; Cavallaro M et al., 2021). Many efforts in this field are addressing the challenges of describing this multi-step process of gene expression in an interpretable and tractable model. One common modeling approach is to introduce multiple intermediate states between inactivate and activate states (Zhang J et al., 2012; Zhang J & Zhou T, 2014; Zhou T & Zhang J, 2012). Although this multi-state model can fit the experimental data better, it is hard to rationalize the state numbers and parameters (Desai RV et al., 2021; Rodriguez J et al., 2019; Zoller B et al., 2015). Another alternative modeling approach is direct non-Markovian modeling for non-exponential waiting times in gene state, which maps the multiple parameters from multi-state models to a small number of interpretable parameters that are easily observed experimentally (Daigle BJ, Jr. et al., 2015; Kumar N et al., 2015; Schwabe A et al., 2012; Stinchcombe AR et al., 2012; Zhang J & Zhou T, 2019; Zhang JJ & Zhou TS, 2019). However, the ensuing difficulty is solving the analytic solution of the steady distribution from a non-Markovian model, which may require some numerical simulation methods that are computationally resource-intensive and time-consuming, such as Monte Carlo methods (Sisson SA et al., 2007).

Statistical inference, particularly in recovering burst kinetic parameters from genome-wide scRNA-seq data, necessitates efficient and scalable inference algorithms (Gómez-Schiavon M et al., 2017). Therefore, the stochastic dynamical system must be quickly solved as parameters are continuously updated throughout the inference process. Deep learning exhibits a broad array of applications as a contemporary method for addressing complex systems (Jiang Q et al., 2021; Wang S et al., 2019; Michoski C et al., 2020). For example, some studies employed neural networks to establish the mapping relationship between a wide range of parameters in the gene expression model and the corresponding pre-simulated steady distribution (Figure 2B) (Jiang Q et al., 2021; Wang S et al., 2019; Davis CN et al., 2020; Tang Y et al., 2023). This approach, based on neural networks, has been utilized for parameter inference in both deterministic and stochastic models (Jiang Q et al., 2021; Gaskin T et al., 2023; Sukys A et al., 2022; Tang W et al., 2023).

Solving TXmodel using deep learning. (A) With the TXmodel and corresponding model parameters θ, we obtain a steady distribution using the SSA simulation and the moment statistics employing binomial moment theory. These results are compared with the distribution and moment statistics of the neural network’s output to calculate the neural network’s loss value. The parameters of the neural network are optimized using gradient descent until convergence is achieved, indicating attainment of optimal parameters. (B) Comparison between SSA results and DeepTX prediction results for four sets of test parameters. The gray histograms represent the distribution simulated by SSA, while the red curve indicates the distribution predicted by DeepTX. (C) Verification results of the moment statistics predicted by DeepTX and the true moments on a test set containing 1000 elements. (D) The loss curve during DeepTX training. The blue curve represents that the loss function is composed of KL divergence of distributions, and the red curve represents that the loss function is composed of KL divergence of distributions and statistics. (E) Box plots of the Hellinger distance between the true distribution and the predicted distribution by DeepTX of different loss types on the test set.

In this study, we introduce DeepTX, a deep learning inference framework that integrates mechanistic models and deep learning methods to elucidate the genome-wide regulation of transcriptional bursts in DNA damage response using scRNA-seq data. The DeepTX framework comprises two modules: DeepTXsolver and DeepTXinferrer. Specifically, DeepTX initially employs the DeepTXsolver module to efficiently solve complex gene expression dynamics models using neural network architectures. It subsequently utilizes the DeepTXinferrer module to accurately infer potential transcriptional burst kinetic parameters using Bayesian methods. DeepTX demonstrates good performance on synthetic datasets. Furthermore, to investigate genome-wide regulation of transcriptional bursts in response to DNA damage, we applied the DeepTX framework to three DNA damage-related scRNA-seq datasets representing cell differentiation, apoptosis, and cell survival. As a result, firstly, we found that the fate decision of mouse embryonic stem cells (mESCs) to undergo cell differentiation in response to DNA damage caused by 5’-iodo-2’-deoxyuridine (IdU) compounds may be driven by enhancing burst size (BS enhancer) of genes that delay the cell mitosis phase transition cycle, thereby influencing cell reprogramming and differentiation. Secondly, the fate decision of colorectal cancer cells to cell apoptosis in response to DNA damage induced by low dose 5-fluorouracil (5FU) drug may be driven by enhancing burst frequency (BF enhancer) of genes that increase intracellular oxidative stress. Thirdly, the fate decision of colorectal cancer cells to cell survival in response to DNA damage induced by high dose 5FU drug may be driven by BS enhancer genes that induce telomerase extension to slow down oxidative stress and help cells to develop drug resistance. In conclusion, DeepTX is a computational framework with utility and extensibility to infer transcriptional dynamics from scRNA-seq data.

Results

Overview of DeepTX framework

To understand how cell influences underlying transcriptional mechanisms in response to DNA damage, as described in the introduction section (Figure 1A), we propose an effective computational inference framework, referred to as DeepTX (Figure 1B). DeepTX takes as input a set of scRNA-seq datasets and then outputs genome-wide burst kinetics parameters with the benefit from deep learning methods (Figure 1B). The inference process of the DeepTX framework is composed of two crucial modules, DeepTXsolver and DeepTXinferrer. The first module, DeepTXsolver, is to solve for the stationary distribution of a gene expression dynamic model (Figure 1B middle). A neural network is constructed in DeepTXsolver to approximate the mapping from model parameters to the corresponding stationary distribution. As in many biological contexts, there is a need to model dynamics that describe more realistic gene expression processes that are difficult to solve (such as the case with DNA damage). The second module, DeepTXinferrer, infers the burst kinetics parameter (Figure 1B middle). Using a trained neural network from DeepTXsolver, we can easily and quickly obtain the stationary distribution (called model distribution) of any parameter within a reasonable parameter space. DeepTXinferrer compares the model distribution to the stationary distribution of each gene in the scRNA-seq data (called data distribution) as the parameters are iterated until convergence and utilize the Bayesian inference to output the posterior distribution of burst kinetic parameter (Figure 1B right). We will describe these two modules in detail in the following two sections.

DeepTX solves the transcriptional model using deep learning

To enable the recovery of kinetic information from static scRNA-seq data, we first performed rational mechanistic modeling of gene expression processes with DNA damage. The gene transcription burst process is often characterized using the traditional two-state model (Singh A et al., 2013; Cavallaro M et al., 2021), which assumes that genes switch randomly between OFF and ON states with exponentially distributed waiting times for each state. However, the gene expression process under conditions of DNA damage can result in slowing or even stopping the RNA pol II movement (Lans H et al., 2019). Further, DNA damage will cause many macromolecules to be recruited for damage repair, and this process will affect the spatially localized behavior of the promoter (Lans H et al., 2019), causing promoter inactivation and activation to become a complex multi-step process. We therefore used a generalized model we developed previously (Luo S et al., 2023), called TXmodel here, which extended the state waiting time to an arbitrary distribution, i.e., the gene expression system is a non-Markovian system (Figure 2A and see Methods 1.2). More specifically, the waiting times for the transitions between the OFF and ON states are represented by two random variables, assumed to follow the arbitrary distributions foff (koff, roff) and fon (kon, ron), respectively. Additionally, we assumed that the rates of mRNA synthesis and degradation are constants, i.e., the waiting times for mRNA transcription and degradation follow exponential distributions with rate parameters ksyn and kdeg, respectively (Figure 2A left). Nevertheless, this non-Markovian gene expression system is difficult to solve stationary distribution analytically, while numerical solution methods are time-consuming.

For that reason, we utilized a deep learning approach, which has been demonstrated to be effective in solving stochastic systems (Gupta A et al., 2021), to construct a mapping from the parameter of the mechanistic TXmodel to its corresponding stationary distribution. More specifically, we aimed to train a fully connected neural network referred as DeepTXsolver, whose inputs are the parameters of the TXmodel θ and whose output is a stationary distribution parameterized by a mixed negative binomial distribution Pneuralnet, which has good performance in approximating the solution of the chemical master equation (Perez-Carrasco R et al., 2020; Öcal K et al., 2022) (Figure 2A and see Methods 1.3). To generate effective training sets for DeepTXsolver, we first generated a large number of parameter sets within a reasonable range of parameter space by using Sobol sampling (see Methods 1.3) (Sobol’ IyM, 1967). Subsequently, for each parameter set, we employed a modified stochastic simulation algorithm (SSA) for non-Markovian TXmodels to generate numerous samples (Figure 2A and see Methods 1.3). The resulting stationary distributions Psimulation served as training labels. Additionally, we demonstrated that the arbitrary order binomial moments of the TXmodel can be analytically solved iteratively within the context of queueing theory (Zhang Z et al., 2021; Zhang J et al., 2024). This enables the straightforward computation of significant summary statistics, including mean and Fano factor (Figure 2A, see Methods 1.2, and Supplementary Text 1.2). Thus, our loss function comprises two primary components: (i) the KL divergence between the predicted distribution generated by the neural network Pneuralnet and the labeled distribution generated by the SSA Psimulation; (ii) the logarithmic error of the predicted summary statistics computed by the neural network Ŝ and the labeled summary statistics s computed by theoretical derivation (Figure 2A and see Methods 1.3). Furthermore, we compared various optimizers and hyperparameters of model architectures to determine the best predicting performance (see Methods 1.3 and Supplementary Figure S2).

Consequently, the loss function of DeepTXsolver converges effectively during the training process. We utilized DeepTXsolver to predict the stationary distribution using parameters from the test set. We observed that it effectively fits all four representative distributions from the TXmodel, encompassing unimodal distribution at zero point, unimodal distribution at non-zero point, bimodal distribution with one peak at zero point, and bimodal distribution with both peaks at non-zero point (Figure 2B). These distributions have been extensively linked to cell fate decisions in numerous experiments (Gupta PB et al., 2011; Cohen AA et al., 2008; Bessarabova M et al., 2010). Moreover, we demonstrated that DeepTXsolver accurately predicted crucial distribution properties, including mean, Fano factor, and bimodal coefficient, with a high correlation between theoretical and predicted summary statistics (r > 0.99, p-value< 2.2*10-16 of t-test Figure 2C). It is noteworthy that, to assess the impact of the presence or absence of summary statistics on DeepTXsolver, we performed separate experiments using different loss functions. We demonstrated that incorporating summary statistics into the model led to quicker convergence of DeepTXsolver on the training set (Figure 2D) and yielded more robust and accurate predictions on the test set (Figure 2E).

Overall, DeepTXsolver effectively establishes mappings from parameters to stationary distributions. This approach circumvents the high computational resource requirements of classical simulation algorithms and ensures the algorithm’s efficiency and scalability in subsequent statistical inference, particularly in genome-wide inference.

DeepTX infers genome-wide transcriptional kinetics from scRNA-seq data

Having obtained the trained neural network model, we can rapidly compute the stationary distributions of any TXmodel parameters. These distributions represent the outcomes of the underlying true gene expression process, referred to as the “true distribution” Pmodel (Figure 3A). However, during the inference process, the gene expression distributions measured by the scRNA-seq data we utilized, termed the “observed distribution of the data” Pdata, were subject to noise, including errors stemming from the measurement technique (Sarkar A & Stephens M, 2021) (Figure 3A). Thus, we introduced a mechanistic hierarchical model to bridge the gap between the true and observed distributions. This hierarchical model considers that the observed data results from the interplay between the gene expression process and the measurement process. It combines the two distributions using a convolutional form (see Methods 1.4). Pmodel (x) represents the mixed negative binomial distribution approximated, whereas for Pmeasure (yx), we opted for the binomial distribution. This choice is informed by the fact that it can approximate hypergeometric distributions representing sequencing sampling without replacement when the sample size is sufficiently large. Consequently, the final distribution can maintain the form of a mixed negative binomial distribution (Supplementary Note 1.4), denoted as the observed distribution of the model Pdata (Figure 3A).

Inferring TXmodel from scRNA-seq data. (A) The dynamic parameters are sampled from a given prior distribution, and as input to the neural network, the solution of the corresponding dynamic model can be obtained by using this parameter. The solution of the dynamic model is mixed with the binomial distribution to obtain the model observed distribution. Loss values were obtained using the observed distribution of the data obtained from the scRNA-seq data compared to the observed distribution of the model. The loss values are optimized until convergence using gradient descent to obtain the parameters of the mechanism model and the posterior distribution of its parameters. (B) The scatters represent the real BS and BF values of the SSA synthetic set parameters, and the depth of the color represents the error between the inferred BS (BF) and the true BS (BF). (C, D) The blue solid line represents the edge density of the burst kinetics of the model, and the red dashed line represents the true value. (E) The marginal density for the five parameters of the model, where the red dashed line represents the true value.

Leveraging the hierarchical model, we can construct a scalable and interpretable module called DeepTXinferrer, facilitating genome-wide inference of transcriptional burst kinetics from scRNA-seq data. First, we input the prior distribution of mechanistic model parameters and a set of scRNA-seq data, and we can get the observed distribution of the model Pobsmodel (derived successively from the trained neural network and the hierarchical model) and the observed distribution of the data Pobsdata, respectively (Figure 3A). Second, we used the Hellinger distance L (θ) = D (Pobsmodel, Pobsdata) to measure the difference between Pobsmodel and Pobsdata (see Methods 1.4). It’s worth noting that the loss function remains solvable via gradients, despite the inclusion of complex computations, such as neural networks and hierarchical models. Consequently, we utilize a gradient-based optimizer to update the iterative parameters until convergence. Lastly, we employ a method based on loss potentials, which computes the posterior distribution of the parameters during the optimization of the loss function (Gaskin T et al., 2023) (Figure 3A, see Methods 1.4). Subsequently, this process yields results of the transcriptional burst kinetics derived from the estimated parameters of the mechanistic model.

We assessed the effectiveness of DeepTXinferrer by conducting validation on synthetic data (see Methods 1.4). Synthetic samples were generated for each parameter set, and our inference algorithm was then applied to derive the output transcriptional burst kinetics (see Methods 1.4). As a result, the estimation of transcriptional burst kinetics demonstrated high accuracy and strong correlation across most parameter regions (r > 0.99, p-value< 2.2*10-16 of t-test Supplementary Figure S3b, c), particularly within the region of fully expressed mRNA abundance (Figure 3B). Moreover, the estimated posterior distribution of each parameter suggests that the peak value closely approximates the true value (Figure 3C-E). Additionally, we verified the robustness of our algorithm concerning the number of samples, observing that higher sample numbers corresponded to increased accuracy in estimation (Figure 3D, E). In summary, our inference algorithm exhibited strong performance on synthetic data, affirming its reliability for application to real-world data in inferring transcriptional bursts.

In the subsequent sections, we will employ the DeepTXinferrer to explore how cells determine diverse fates in response to DNA damage stimuli through genome-wide regulation of transcriptional bursting. We analyzed three sets of scRNA-seq data representing distinct cell fates: cell differentiation, apoptosis, and survival. The first set of data is a mutually controlled scRNA-seq dataset of mESC cell differentiation induced by IdU drug treatment. This data set contains 12481 genes of transcriptomes from 812 cells and 13780 genes of transcriptomes from 744 cells, respectively. The second set of data is a mutually controlled scRNA-seq dataset of human colon cancer cells treated with low-dose 5FU drug that causes apoptosis. This data set contains 8534 genes of transcriptomes from 1673 cells and 7077 genes of transcriptomes from 632 cells, respectively. The third set of data is a mutually controlled scRNA-seq dataset of human colon cancer cells in which high-dose 5FU drug treatment resulted in cell-resistant survival. This data set contains 8534 genes of transcriptomes from 1673 cells and 6661 genes of transcriptomes from 619 cells, respectively.

DeepTX reveals that the BS enhancer induced by IdU treatment promotes cell differentiation

The administration of the IdU drug, a thymine analogue, influences the gene expression process within cells, subsequently impacting cell differentiation (Desai RV et al., 2021; Li C et al., 2024). The perturbation in the gene expression process primarily stems from DNA damage, as IdU is randomly integrated into the DNA chain, thereby inducing DNA damage (Li C et al., 2024). Although DNA damage caused by IdU treatment has little impact on genome-wide mean gene expression, it increases variance across the genome, thereby affecting cell differentiation (Desai RV et al., 2021). This is different from the understanding that increases in variance are usually caused by fluctuations in the mean gene expression (Newman JR et al., 2006). Meanwhile, the mean can be deconvolved as the product of BS and BF. This raises the question of the relationship between bursting dynamics and variance on a genome-wide scale, and how changes in bursting dynamics affect cell differentiation.

We used the DeepTX framework to infer a set of preprocessed scRNA-seq data (see Methods 2.1) to obtain underlying burst kinetics. The difference between the statistics and distribution obtained by inference and the statistics and distribution of sequencing data is minimal, ensuring the correctness of the inference results (Supplementary Figure S4a-l). Subsequently, we observed considerable variations in BS and BF across different genes (Figure 4A, Supplementary Figure S5d-f). Notably, genes with significantly increased BS exhibit corresponding decreases in BF, suggesting relative stability in the average gene expression level (Supplementary Figure S5d-f). Additionally, we observed a significant increase in transcriptional BS for most genes in the treatment group (Figure 4A). This indicates that the rise in variance primarily results from the heightened BS induced by DNA damage (Figure 4B). Notably, this conclusion is consistent with the results of theoretical analysis (Supplementary text 2).

Burst size enhancer delays cell cycle. (A) Density map of BS of genes treated with IdU drugs compared to BS of genes not treated with drugs. Green dots represent up-regulated BS differential genes. (B) Schematic representation of the simultaneous enhancement of both BS and noise of a gene accompanied by the mean value of the gene remaining unchanged after IdU drug treatment. The blue plane represents the relationship between burst kinetics (BS and BF) and the mean. The orange plane represents the relationship between mean and noise. (C) Schematic diagram of the mechanism by which IdU drug treatment affects the mitotic cell cycle transition of cells via DNA damage. (D) Gene GO enrichment analysis is performed on the green dots of BS in (A) to obtain enrichment pathway diagram. (E, F) Pathways obtained by GSEA enrichment analysis of BS of genes.

To discover the biological characteristics of BS enhancement, we identified genes with upregulated BS using a t-test-based differential analysis method. Further, we perform GO enrichment analysis (Consortium GO, 2015) on the identified genes. We observed that the genes with enhanced BS were mainly enriched in the signal transduction of DNA damage response and the pathway that regulates the cell mitosis phase transition cycle (Figure 4D). Subsequently, we performed Gene Set Enrichment Analysis (GSEA) (Subramanian A et al., 2005) on the identified differentially expressed genes associated with BS and observed enrichment in pathways related to cardiac muscle cell differentiation and mitotic intra-S DNA damage checkpoint signaling (Figure 4E, F). Similarly, we subjected genes significantly enhanced by BF in the treatment group (Supplementary Figure S5f) to GO analysis (Supplementary Figure S5g). Notably, the insignificance of the cell cycle transition pathway suggests that it is primarily influenced by BS-enhanced genes. Moreover, previous experimental studies have confirmed that treatment with IdU induces DNA damage, subsequently reprogramming cell differentiation fate (Rosina M et al., 2019; Riccio F et al., 2022; Liu Z et al., 2017) (Figure 4C). Additionally, research indicates that enhanced cell differentiation results from delayed cell cycle transitions (Rosina M et al., 2019).

DeepTX reveals that the BF enhancer induced by low-dose 5FU treatment accelerates cell apoptosis

The chemotherapeutic compound, 5-fluorouracil (5FU), exerts its cytotoxic effects on colorectal cancer cells by inflicting damage to their DNA (Kuipers EJ et al., 2015; Bunz F et al., 1999; Chang GS et al., 2014). This damage triggers apoptosis, a process of programmed cell death, thus inhibiting the growth and proliferation of malignant cells (Kuipers EJ et al., 2015; Bunz F et al., 1999; Chang GS et al., 2014). However, the impact of 5FU-induced DNA damage on the gene expression process and the underlying mechanism of cell apoptosis remains unclear. In this section, we employ the DeepTX to investigate the effect of 5FU-induced DNA damage on the gene expression process in colorectal cancer cells with scRNA-seq data (Park SR et al., 2020).

First, we inferred the burst kinetics from the scRNA-seq data of cells treated with low doses (10 μm) of 5FU and controls. After preprocessing the data (see Methods 2.2), we inferred the scRNA-seq data to obtain a distribution that fits the scRNA-seq data (Supplementary Figure S6a-l), as well as the underlying burst kinetics (Figure 5A, B).

Low dose 5FU treatment induces cell apoptosis by enhancing burst frequency. (A) Comparison of gene bursting kinetics and statistics between control and 10-dose 5FU treated cells. (B) Density map of inferred burst size. (C) Volcano map of the fold changes (FC) of inferred burst size between control and 10-dose 5FU treated cells. (D) Mechanism diagram of apoptosis induced by low-dose 5FU drug treatment. (E) Gene GO enrichment analysis is performed on the green dots of BS in (A) to obtain enrichment pathway diagram.

Further, we found that the mean difference between the two groups of scRNA-seq data was not significant (p-value = 0.5 of t-test), but the variance and BS were significantly different (p-value<0.05 of t-test Figure 5A and Supplementary Figure S7a-c). Then, we performed a differential analysis on the BS and BF of gene expression in the 5FU-untreated cells and 10 μm 5FU treated cells, and then performed GO gene enrichment analysis on differential genes (Figure 5C and Supplementary Figure S7d-f). We found that BS down-regulates differential genes were mainly enriched in apoptosis pathways, electron exchange pathways, and oxidation-related pathways (Figure 5D and Supplementary Figure S7g). Specifically, the enrichment analysis indicated that although the burst size of genes in 5FU-treated cells was generally larger than that in the control group, the genes primarily influencing cell apoptosis exhibited a down-regulated burst size

Our enrichment analysis results are consistent with existing experimental results (Figure 5C). First, studies in recent years have found that many anticancer drugs trigger apoptosis and necrosis of cancer cells by activating the production of reactive oxygen species (ROS) (Sun Y & Rigas B, 2008). And experiments have shown that 5FU can induce ROS production in colorectal cancer cells (oxidation-related pathways) (Laha D et al., 2015; Bwatanglang IB et al., 2016; Chenna S et al., 2022). Meanwhile, mitochondria are the main source of ROS production, and defects in the mitochondrial electron transport chain will increase ROS production (electron exchange pathways) (Adam-Vizi V, 2005). As a result, excessive production of ROS can cause mitochondrial DNA damage and nuclear DNA damage (Figure 5D). In particular, when mitochondrial DNA is damaged, mitochondrial damage repair will further increase the pressure of ROS (Figure 5C right) (Shokolenko IN et al., 2014). Hence, DNA damage ultimately leads to cell death through apoptosis (apoptosis pathways) (Figure 5C) (Handali S et al., 2018; Hwang PM et al., 2001).

DeepTX reveals that BF enhancer induced by high-dose 5FU treatment contributes to cell survival

Low-dose 5FU treatment can induce cell apoptosis, while research shows high-dose 5FU treatment can induce cell resistance and evade apoptosis to survive (Kuranaga N et al., 2001). Hence, this part mainly studies the impact of differences in burst kinetics of gene expression between high-dose (50 μm) 5FU-treated and untreated colorectal cancer on cell fate decisions.

After data preprocessing (see Methods 2.2), we similarly observed the phenomenon that the average expression value of genes remained stable, while there is an enhanced increase in the variance genome-wide (Supplementary Figure S9a-c). Then, after preprocessing the data (see Methods 2.2), we inferred the scRNA-seq data to obtain the distribution that fits the scRNA-seq data (Supplementary Figure S8a-l), as well as the underlying burst kinetics (Figure 6A). Further, we performed a difference analysis on the BS of the two sets of inferred data, and selected genes with large differences in BS and BF for GO gene enrichment (Figure 6A, B and Supplementary Figure S9d-f). We found that the BS down-regulates differential genes were mainly enriched in the Cajal body related pathway, the electron exchange pathway, and telomerase-related pathways, but the apoptosis-related pathways were not significant (Figure 6D). In particular, according to experimental studies, telomerase-related pathways can activate telomerase and play an important role in the elongation of telomeres (Cristofari G et al., 2007).

High dose 5FU treatment may promotes cell drug resistance by altering burst frequency. (A) Density map of inferred burst size. (B) Volcano map of the fold changes (FC) of inferred burst size between control and 50-dose 5FU treated cells. (C) Schematic diagram of the mechanism by which 5FU drug treatment induces cells to produce antioxidant properties and then continue to survive. (D) Gene GO enrichment analysis is performed on the green dots of BS in (A) to obtain enrichment pathway diagram. (E) UMAP of G0 arrest quality score (QS) of control and 50-dose 5FU treated cells. (F) Proportion of G0 arrested (drug-resistant) and cycling (non-resistant) cells. (G) Comparison of gene bursting kinetics and statistics of G0 arrested genes between control and 50-dose 5FU treated cells.

Gene enrichment results are consistent with existing ways of cells evading apoptosis (Figure 6E). That is, endogenous ROS production exceeds the cellular antioxidant defense capacity, leading to chemical damage to mitochondrial DNA (mtDNA). Meanwhile, high doses of 5FU causes Human Telomerase Reverse Transcriptase (hTERT) translocation, which on the one hand triggers telomerase activity and maintenance pathways, which enables cells to evade apoptosis (Kuranaga N et al., 2001). On the other hand, this translocation will enrich hTERT in mitochondrial DNA, thereby reducing oxidative stress and mitochondrial DNA damage to protect mitochondria and prevent cells from apoptosis (Lipinska N et al., 2017).

An important hallmark of cancer cell resistance is that cells are temporarily arrested in a non-proliferative G0 state (Wiecek AJ et al., 2023). To verify the presence of resistant cells in cells treated with high doses of 5FU, we selected genes related to G0 arrest and performed GSVA (Hänzelmann S et al., 2013) to obtain the G0 arrest quality score (Wiecek AJ et al., 2023) of each cell and perform UMAP dimensionality reduction visualization (Figure 6E). Through mass score discrimination, we found that the proportion of resistant cells was large among cells treated with high doses of 5FU (Figure 6F). This result was consistent with the results of our enrichment analysis. Specifically, by comparing the BS and mean values of these genes, we found that most of the BS of G0 arrest related genes were up-regulated, but the mean changes were small (Figure 6G).

Discussion

Gene expression is inherently stochastic, leading to inter-individual variations primarily attributed to differences in burst patterns (Dar RD et al., 2012; Zenklusen D et al., 2008; Phillips R et al., 2012). Experimental observations confirm this bursting mechanism as a cellular response to environmental changes, particularly DNA damage (Friedrich D et al., 2019; Desai RV et al., 2021), influencing cell fate decisions. In this study, we introduced a mechanism-based deep learning framework (DeepTX) for performing genome-wide inference on transcriptional burst kinetics of a comprehensive gene expression model using scRNA-seq data. This framework offers an efficient computational approach to explore how cells regulate transcriptional burst kinetics in response to DNA damage at the genome-wide level, affecting various cell fate decisions.

Utilizing DeepTX, we analyzed three sets of scRNA-seq data comprising control and stimulus pairs driving different cell fate decisions: cell differentiation, apoptosis, and survival. This analysis unveiled significant insights into the underlying mechanisms of transcriptional burst kinetics. Firstly, IdU drug treatment-induced BS enhancement delays the cell mitosis phase transition, impacting cell reprogramming and differentiation. Interestingly, while prior research suggested noise enhancement as the driver of cell fate decisions, we uncovered that BS enhancement is the deeper mechanism leading to cell differentiation, potentially resulting in cardiomyocyte differentiation. Secondly, low-dose 5FU treatment increases oxidative stress, promoting cell apoptosis by augmenting BF. Consistent with previous findings attributing apoptosis to ROS stress, our study reveals that ROS-induced DNA damage alters gene expression burst kinetics, ultimately triggering apoptosis. Thirdly, high-dose 5FU treatment induces telomerase extension, mitigating oxidative stress and fostering drug resistance by upregulating BF. Despite prior evidence of drug-resistant cells, our enrichment analysis underscores telomerase extension’s role in resisting oxidative stress, underscoring the impact of BF enhancement.

We highlight several advantages of our DeepTX framework. Firstly, DeepTX employs a mechanistic hierarchical model that is both interpretable and extensible. To enhance the model’s realism in interpreting scRNA-seq data, we developed a novel approach integrating the sequencing process and the underlying gene expression process. Specifically, we employed the TXmodel, a complex non-Markovian stochastic dynamic model that extends the waiting times for gene OFF and ON states into a non-exponential distribution, reflecting the multi-step nature of gene regulation. As a result, TXmodel not only captures gene expression dynamics under DNA damage but also depicts other multi-step regulatory processes, such as chromatin opening (Lomvardas S & Thanos D, 2002), preinitiation complex formation (Fuda NJ et al., 2009), and transcription factors binding (Nordick B et al., 2022). Secondly, despite the mathematical complexity of the mechanistic hierarchical model, DeepTX renders it tractable. This is achieved by (i) directly mapping model parameters to corresponding stationary distributions via a neural network trained on extensive simulated single-cell RNA data and theoretical analysis summary statistics for TXmodel; (ii) approximating the stationary distribution output of TXmodel, coupled with the sequencing process, by a mixed negative binomial distribution, maintaining the final distribution format. Thirdly, DeepTX’s inference process is scalable, allowing genome-wide bursting kinetics inference from static scRNA-seq data. Despite the computational complexity of parameter inference involving hierarchical models and trained neural networks, DeepTX retains auto-differentiation during inference, ensuring efficiency.

The DeepTX research paradigm is poised for extension to encompass both dynamic models and biological data. Currently, our focus is confined to the transcriptional processes influenced by DNA damage, limiting the explanatory scope of the TXmodel to larger-scale biological phenomena such as alternative splicing (Wan Y et al., 2021; Gorin G & Pachter L, 2022), protein translation (Luo S et al., 2023; Golan-Lavi R et al., 2017), epigenetic modification (Nicolas D et al., 2018), and chromatin movement (Liu T et al., 2016; Wang Z et al., 2024; Wang Z et al., 2023). These broader biological processes necessitate more intricate dynamic models, often challenging to solve and thus challenging to apply to realistic biological data. However, the construction of neural networks within the DeepTX framework, leveraging extensive simulated data and summary statistics, facilitates the implementation of these complex dynamic models. As for biological data, our analysis has been restricted to single-cell RNA data from control and stimulated groups to elucidate the impact of DNA damage on burst kinetics. Yet, the advancement of sequencing technology promises an influx of diverse data types ripe for integration into DeepTX. Examples include increasingly comprehensive transcriptome data with both temporal and spatial resolution (Wen L & Tang F, 2022), as well as state-of-the-art multi-omics data (Miao Z et al., 2021). In conclusion, DeepTX provides a step toward inferring complex transcriptional kinetics from scRNA-seq data through mechanism-based deep learning inference. It sheds light on the augmented burst kinetics in DNA damage response across various cell fate decisions, thereby offering novel insights.

Methods

1 The DeepTX framework

1.1 Framework overview

The DeepTX Framework aims to infer the underlying bursting dynamics from static single-cell RNA sequencing (scRNA-seq) data to explore the impact of changes in bursting dynamics on cell fate decisions. The DeepTX framework includes two modules, DeepTXsolver and DeepTXinferrer. DeepTXsolver can efficiently and accurately solve transcription models using neural network architecture, and DeepTXinferrer can infer scRNA-seq data to obtain its underlying transcription burst kinetics using Bayesian methods.

The input of the DeepTXsolver is the parameters of the mechanism model, and the output is the corresponding stationary distribution solution of the model. The input to the DeepTXinferrer is the discrete probability distribution of gene expression obtained from scRNA-seq data, and the output of the model is the parameters of the mechanism model.

The construction of the DeepTX framework consists of the following parts: (1) Model the gene expression process. (2) Train a neural network that solves the mechanism model using the data set generated by stochastic simulation algorithm (SSA) simulation. (3) Hierarchical model modeling to infer underlying burst kinetic parameters from scRNA-seq data.

When building the DeepTX Framework, we made the following assumptions: (1) The gene expression of cells was in a stationary distribution during sequencing. (2) Gene expression counts of the same cell type follow the same distribution. (3) The solution of the model can be approximated by a mixed negative binomial distribution. (4) Gene expression state space and time are equivalent (SSA simulation truncation).

1.2 Modelling stochastic gene expression processes

To recover dynamic transcriptional burst kinetic parameters from static scRNA-seq data, we require the stochastic modeling approach to profile the gene expression process. The traditional telegraph model is often used to model gene transcriptional bursting, which assumes that switching between genes follows an exponential distribution (i.e., switching is a single-step process) (Peccoud J & Ycart B, 1995; Kim JK & Marioni JC, 2013). However, this assumption is an oversimplification of the gene expression process in the context of DNA damage. Due to the fact that DNA damage slows down and even stops the elongation of RNA pol II across the double-strand DNA (Lans H et al., 2019), the gene expression process should be regarded as a multi-step process. This property causes the waiting time for the gene switching between active and inactive states to follow a non-exponential distribution (Daigle BJ, Jr. et al., 2015; Zhang J & Zhou T, 2019; Zhang JJ & Zhou TS, 2019). Therefore, we use TXmodel (Luo S et al., 2023) to better characterize the gene transcription process in the context of DNA damage (Figure 2), whose chemical reaction network (CRN) can be written as follows:

where R1 and R2 describe the gene-state switching randomly between the ON state and OFF state, and foff (t,ωoff) and fon (t,ωon) are the distribution of dwell time in ON and OFF state, respectively. While the gene is in the active state, the synthesis of mRNA molecules is governed by a Poisson process with a constant synthesis rate ksyn (R3). Meanwhile, each mRNA molecule can be stochastically degraded (R4) with a constant degradation rate kdeg. Overall, the TXmodel can be determined by a set of parameters, denoted as θ = (ωoff, ωon, ksyn, kdeg).

The TXmodel as an extension of the traditional telegraph model can capture more realistic transcriptional burst kinetics, with BF and BS being two important parameters. The definition of BF is

where and are the mean dwell time in OFF and ON state, respectively. The definition of BS is

Without loss of generality, the gamma distribution can be used to characterize non-exponential distributions, i.e., and , where Γ(·) is Gamma function, roff and ron are the rate-parameters of state switching, koff and kon are the possible number of reaction steps between OFF and ON state.

It is a challenge to solve the stationary distribution of this gene expression system directly from a theoretical approach. But we proved that the binomial moments of this system can be obtained in an iterative form, by rationalizing this gene expression system as a queueing theory model. Thus, we can theoretically compute some important summary statistics s (θ) given a parameter θ, such as mean, variance, kurtosis, and bimodal coefficient (see details in Supplementary Text 1.2).

1.3 Training neural network to solve gene expression model

The stationary distribution of CRN (1) is difficult to be solved analytically. In general, stochastic simulation algorithms are used to approximate the solution of chemical master equations (Boguñá M et al., 2014; Masuda N & Rocha LE, 2018). However, the inference process involves updating the parameters, and each update requires a simulation to obtain the stationary distribution, which makes statistical inference time-consuming. We note that deep learning methods may have potential applications in solving this challenge because it is widely used for approximating the solution of partial differential equations, chemical master equations, and non-Markovian chemical master equations (Jiang Q et al., 2021; Wang S et al., 2019; Davis CN et al., 2020; Gupta A et al., 2021; Bar-Sinai Y et al., 2019).

Therefore, in this part, we used a neural network to approximately solve CRN (1). This enables CRN (1) to be solved efficiently and accurately. Details will be introduced in the following part.

Generating synthetic training dataset

To construct the mapping from the parameters of TXmodel to the stationary distribution, we first need to generate the neural network’s training set, i.e., a large-scale range of parameter sets as features and corresponding probability distributions as labels. For the features, we used the Sobol algorithm (Sobol’ IyM, 1967) to conduct large-scale sampling to generate the neural network data set, which can cover the entire parameter space more evenly than random sampling.

For the labels, we perform the SSA of the TXmodel to obtain the stationary distribution given a set of preset parameters. For clarity, the synthetic training dataset is denoted by {(θi, Psimulation,i)}, where θi presents the model parameters of TXmodel, and Psimulation, i represents the stationary distribution solution obtained by SSA (Figure 2A). Specifically, the value range of each parameter component of θi is koff ∈(1,15), roff ∈(0.1,10), kon ∈(1,15), ron ∈(0.01,10), ksyn ∈(0.1, 400).

Deep learning model architecture

We build the basic deep learning model architecture in DeepTX via fully connected neural networks, which has been effectively used to solve approximate solutions from biochemical reaction networks (Sukys A et al., 2022; Gupta A et al., 2021). The deep learning model architecture consists of an input layer, hidden layers, and an output layer.

Firstly, the input layer contains 5 neurons corresponding to the parameters of the mechanism model θi = (koff,i, roff, i, kon,i, ron,i, ksyn,i). Secondly, the input layer is followed by nhidden hidden layers with each layer containing neurons. For the neurons in each hidden layer, we implement a nonlinear mapping by the commonly used ReLU activation function (Supplementary Figure S1):

where represents the value of the j -th neuron of the nhidden -th hidden layer of the i -th group of parameters, represents the neural network weight connecting the k -th neuron of the nhidden -th hidden layer and the j -th neuron of the (nhidden −1) -th hidden layer.

Thirdly, In the output layer, we aim to allow neurons to approximate synthetic scRNA-seq data generated by TXmodel. Negative binomial distributions are a class of distributions that are widely used in modeling scRNA-seq data and have been shown to account for both the “drop-out” phenomenon and the fact that the Fano factor is greater than 1 in scRNA-seq data (Risso D et al., 2018; Love MI et al., 2014). Also, the negative binomial distribution has been employed as an approximate stationary solution to the TXmodel for many gene expression systems (Sukys A et al., 2022; Öcal K et al., 2022). We assign weights and parameters (aij, mij, lij) output layer neurons (Supplementary Figure S1):

where represents the value of the j -th neuron of the output layer of the i -th set of parameters, aij is the weight coefficient of the negative binomial distribution, satisfying , lij, mij represents the i -th parameter of the j -th negative binomial distribution of the mixed negative binomial distribution, nNB is the number of negative binomial distribution. And the mixed negative binomial distributions can be expressed as (Supplementary Figure S1):

where .

Loss function of DeepTXsolver

Our loss function includes two parts: one part is the KL divergence between the mixed negative binomial distribution generated by neural network Pneuralnet, i and label distribution Psimulation,i generated by SSA, and the another part is the log difference of the moment statistics between theoretical solution of the TXmodel sij and the output of the neural network Ŝij:

where λ is the hyperparameter, represents the weight coefficient of the statistical loss, and λ = 0.1, nbatch represents the number of samples in each batch, and nstats is the number of summary statistics. For the latter part of loss function, we can still obtain the moment statistic by theoretical derivation, although the analytic distribution is difficult to solve. Adding moment statistics to the loss function can make the neural network more robust (Figure 2E). Specifically, we use four statistics here: si1 indicates the mean, si 2 denotes the Fano factor, si 3 denotes kurtosis, si 4 represents the bimodal coefficient. It’s worth mentioning that the statistics here is scalable and we can add more information inside the loss function.

Training details of DeepTXsolver

Effective model training necessitates the selection of an appropriate optimizer to minimize the loss function. In our case, we employed the Adam optimizer (Kingma DP & Ba J, 2014), a widely adopted choice in deep learning due to its robust performance. And each time, a small batch of samples is selected to update the neural network parameters w (Goodfellow I et al., 2016), and the iteration of all samples is called an epoch. Furthermore, we utilized the Glorot Uniform method to establish an optimal initial point for our neural network parameter (Glorot X & Bengio Y, 2010).

Batch size and learning rate are key optimization hyperparameters affecting model convergence. A smaller batch size enhances model generalizability but lengthens training time, hence a batch size of 64 is chosen for balance (Keskar NS et al., 2016). In gradient descent process, the learning rate significantly influences the step size, where large rates may induce instability or divergence, and conversely, small rates may precipitate premature convergence to local optima. To counteract this, a decaying learning rate is used, allowing larger early updates for faster convergence and preventing instability in later stages. Further, training of the model stops if it has been trained for over 200 epochs or if the learning rate has been reduced five times. These criteria are adaptive and efficient as further training does not significantly enhance the model’s performance.

Hyperparameter tunning of DeepTXsolver

To obtain a neuron network model with accurate prediction and generalization, we compared the model architectures in terms of the number of neurons per layer, the number of hidden layers, and the number of neurons in the output layer, and compared the performance of the model with different dataset sizes. And we applied the Hellinger distance between the true distribution and the predicted distribution by the neuron network as a judgment criterion.

First, we compared the performance of architectures with a single hidden layer with (16, 32, 64, 128, 256, 512) neurons, and found that the performance of the model barely increased after increasing to 128 neurons (Supplementary Figure S2a). Secondly, after comparing the performance of architectures with different numbers of hidden layers, we found that a single hidden layer has the best performance (Supplementary Figure S2b). Thirdly, comparing the performance of 3i (i = 1, 2,…10) neurons in the output layer, we found that after 12 neurons (4 negative binomial distribution mixtures), the performance of the model was not improved (Supplementary Figure S2c). Particularly, four negative binomial distribution mixtures are able to fit the bimodal nature of the distribution as well as ensure good training efficiency (Öcal K et al., 2022).

Finally, to obtain a dataset that contributes to convergence as well as model generalization while ensuring training efficiency, we compared the model performance for dataset sizes of (500, 1000, 5000,10000,15000) and found that increasing the dataset size further does not significantly improve the model performance after to 15000 samples (Supplementary Figure S2d). Therefore, we identified a single hidden layer of 128 neurons and an output layer of 12 neurons as the appropriate model architecture, and 15,000 samples as the appropriate training set.

1.4 Inferring burst kinetics from scRNA-seq data with DeepTXinferrer

This section describes how to recover dynamic burst kinetics from static scRNA-seq data. In principle, once we have constructed the mapping of the parameters to the corresponding stationary distributions through the pre-trained neural network in section 1.3, we can quickly access the corresponding likelihood functions when updating the parameters during the statistical inference process. However, true scRNA-seq data is the result of a combination of gene expression processes and sequencing processes, and the current pre-trained neural network can only yield a description of the gene expression process modeled by TXmodel. Therefore, we require to extent it into a hierarchical model to depict the more realistic generation mechanism behind the scRNA-seq data (Sarkar A & Stephens M, 2021).

Modeling observed scRNA-seq data with hierarchical model

The hierarchical model aims to couple two different processes, specifically, the true gene expression mechanism process and the sequencing process that immediately follows. In terms of a formula, the probability distribution of observations can be represented as a convolution:

where Pmeasure is the measure model and Pmodel is the expression model.

First, we model the sequencing process. Each cell can be regarded as a pool of mRNA, and represents the true expression amount of the g -th gene in the c -th cell. refers to the observed value obtained by scRNA-seq from the given true expression value , satisfying the following conditional probability distribution

where Pmeasure (y | x) characterizes all noise in the sequencing process, and is generally assumed to be Binomial distribution or Poisson distribution, which has been proven experimentally and theoretically (Sarkar A & Stephens M, 2021; Wang J et al., 2018). By incorporating an additional sampling probability, denoted as αcg, into the sequencing process, we are able to characterize capture efficiency. We make the assumption that intercellular molecules are independent of one another, and that only proportional products are captured and sequenced, following a Binomial distribution

where αcg is the sampling probability, and we set αcg = 0.5.

Second, we model the true gene expression process, assuming that it satisfies the following probability distribution

The selection of the true gene expression probability distribution is very critical. It needs to meet the following conditions: (i) It can well characterize the transcription burst kinetics. (ii) The inference derived from this distribution necessitates both efficiency and precision.

In section 1.2, we have constructed a model of the gene expression process with DNA damage, enabling the model’s stationary distribution to encapsulate the bursty dynamics inherent in gene expression. The solution to the model is approximated by a mixed negative binomial distribution, which is obtained from a trained neural network. Importantly, this mixed negative binomial distribution enables efficient inference. Hence,

Further, the following hierarchical model can be obtained

where represents the random variable of the true expression value, represents the random variable of the observed expression value, Binomial represents the binomial distribution, NB represents negative binomial distribution and represents the sampling probability. Further, we can obtain

It is proved that also follows a mixed negative binomial distribution (Supplementary text 1.3), denoted as Pobsmodel.

Estimation and optimization process of DeepTXinferrer

We can use the DeepTX framework to infer scRNA-seq data to obtain the corresponding gene expression burst kinetics through optimization methods. Given the expression counts of gene g in all cells, its corresponding probability distribution Pobsdata can be obtained. We use the Hellinger distance to compare the error between the true probability distribution Pobsdata of genes and the model probability distribution Pmodeldata. The following optimization problem can be obtained

where ncount is the maximum value of mRNA expression, Pobsdata is the density probability for each gene g. Specifically, Pobsdata is a function that is differentiable with respect to parameter θ g. We can use the gradient descent method to obtain the optimal parameters of the TXmodel and the corresponding distribution of the TXmodel. The stopping condition of the gradient descent method is that the loss value is less than 10−3.

Posterior distribution

The optimization method in the previous section can obtain the parameters of the mechanism model that satisfies the distribution of scRNA-seq data. In this section, we give a method to solve the confidence interval of each optimized parameter (Gaskin T et al., 2023).

Each iteration of the optimization process can obtain the estimated value and loss value corresponding to the estimated value. In particular, according to Equation (15), the loss is calculated as follows.

The following posterior probability can be obtained

where π (0) is the prior, we take it as uniform density, n(k) represents the number of values takes during the iteration process, and represents the loss potential of each iteration. Let the posterior probability be , and the marginal distribution corresponding to each component of parameter can be obtained

The subscript − j means that we integrate the components of θ(k) except . In particular, to allow the iterative parameters to traverse the entire parameter space, multiple optimizations can be performed with different initial values to improve the performance of the posterior distribution.

Validation on synthetic data

We conducted an evaluation of the inference performance of the DeepTX framework on synthetic scRNA-seq data, focusing on three key aspects. (i) Model accuracy: This refers to the minimal discrepancy between the distribution corresponding to the inferred parameters and the input distribution. (ii) Parameter identifiability: This denotes the proximity of the values of the inferred parameters to the true parameters, suggesting that the model can accurately identify the true parameters. (iii) Result robustness: This implies that the outcomes of each inference do not exhibit significant variations.

We created a synthetic dataset using Sobol sampling and SSA under preset parameters. The synthesized distribution is used as input to the DeepTX framework, and upon optimization, we obtained estimated parameters and their corresponding distribution .We showed that the Hellinger distance between the estimated and synthetic dataset’s distribution was minimal, indicating accurate estimations (Supplementary Figure S3a). Additionally, the burst kinetics of both the synthetic data and the estimated data can be computed utilizing their respective parameters, and . We showed that a strong correlation was observed between the synthetic burst kinetics γ syn = (BFsyn, BSsyn) and estimated burst kinetics γest = (BFest, BSest) (Supplementary Figure S3b-c), suggesting that burst kinetics are identifiable.

Subsequently, we conducted a robustness verification of the framework’s ability to infer burst kinetics. The error was computed as follows:

Given a set of parameters , we drew samples of sizes (100, 500, 1000, 1500, 2000) from the probability distribution corresponding to . For each sample size, we performed 50 repetitions, and then DeepTX framework is used to infer the distribution obtained from each sampling, setting different initial values each time. We found that the inferred results are robust, and the robustness increases with the increase in the number of samples (Supplementary Figure S3d-e).

2 Dataset

2.1 IdU and DMSO dataset

To investigate the effects of 5’-iodo-2’-deoxyuridine (IdU) on genome-wide gene expression, Desai et al. conducted scRNA-seq data on mouse embryonic stem cells, both with and without IdU treatment. Consequently, they obtained scRNA-seq data for 12481 genes of transcriptomes from 812 cells and 13780 genes of transcriptomes from 744 cells, respectively. The authors found a phenomenon that IdU drug treatment increased transcriptional noise genome-wide but did not affect transcriptional mean, further affecting cell fate decisions.

We used the DeepTX framework to infer this set of scRNA-seq data to obtain potential transcription burst kinetics, and further study the impact of changes in transcription burst kinetics on cell fate decisions. To eliminate the impact of technical noise on data inference, we preprocess the data as follows. Firstly,We normalized the data ycg using Seurat, where ycg represents the expression level of the gene g in the cell c. The normalization process is that we multiplied each ycg by a scaling factor , where ynormalized = 50000. Particularly, the scaling factor S is related to the total expression of genes in each cell. Secondly, after normalizing the expression, we divided it into bins with a unit of 1 to obtain the bin where each expression value is located. Finally, to eliminate the impact of low-expression genes on the inference process, we filter out genes whose average gene expression is less than 1. After data processing, there are 6082 genes from 812 cells left in our IdU and 6082 genes from 744 cells left in DMSO data sets.

2.2 5FU Dataset

To study the impact of 5-fluorouracil (5FU) drug-induced DNA damage on cell fate decisions, Park SR et al. performed scRNA-seq on colon cancer cells treated with 0 μm, 10 μm, and 50 μm 5FU drug (Park SR et al., 2020) and obtained 8534 genes from 1673 cells, 7077 genes from 632 cells, and 6661 genes from 619 cells, respectively. The authors found that Colon cancer cells treated with different doses of 5FU exhibit different transcriptional phenotypes, and these different phenotypes correspond to different cell fate decisions.

We used the DeepTX framework to infer these three sets of scRNA-seq data to obtain potential transcriptional burst kinetics, thereby studying the impact of changes in transcriptional burst kinetics caused by different doses of 5FU treatment on cell fate decisions. To eliminate the impact of technical noise on data inference, we preprocess the data in the following steps.

First, we normalize ycg by multiplying a scale factor S, where . Second, we divide the normalized data into intervals of unit 1 to round the data. Finally, to eliminate the impact of low-expression genes on the inference process, we filter out genes whose average gene expression is less than 1. After data processing, there are 730 genes from 593 cells left in control data set, 174 genes from 593 cells left in our 10 μm data set. 5FU treatment data set and 187 genes from 564 cells left in 50 μm 5FU treatment

Additional information

Funding

This work was supported by National Key R&D Program of China grant 2021YFA1302500 by Natural Science Foundation of P.R. China grants 12171494, 11931019, 11775314, 62373384, and 12301646 by Key-Area Research and Development Program of Guangzhou, P.R. China, grants 202007030004 and 2019B110233002 by Guangdong Basic and Applied Basic Research Foundation grants 2022A1515011540 and 2023A1515011982 by Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University grant 2020B1212060032 by the Fundamental Research Funds for the Central Universities, Sun Yat sen University grants 23qnpy48 and 23pnqy49 and by China Postdoctoral Science Foundation grant 2023M734061.

Author contributions

J.Z., Q.N., and B.J. conceived the project. J.Z., Z.H., and S.L. designed the algorithm. Z.H. and S.L. implemented the code, performed the simulations, analyzed the data, and interpreted the results. Z.W. and Z.Z. analyzed the data and interpreted the results. Q.N. and B.J. supported the data interpretation. Z.H., S.L., and J.Z. wrote the paper. All authors revised and approved the paper. J.Z. supervised the project.

Data availability

All the scRNA-seq data can be obtained from the public database. The scRNA-seq data of mouse embryonic stem cells (mESC) + DMSO and mESC + IdU data reported here is available at the Gene Expression Omnibus (GEO) database under accession number: GSE176044. The scRNA-seq data of RKO cell with different 5-fluorouracil treatment is available at the Gene Expression Omnibus (GEO) database under accession number: GSE149224.

Code availability

The code for reproducing the presented analysis results is available at GitHub repository (https://github.com/cellfateTX/DeepTX).