Abstract
Cells must adopt flexible regulatory strategies to make decisions regarding their fate, including differentiation, apoptosis, or survival in the face of various external stimuli. One key cellular strategy that enables these functions is stochastic gene expression programs. However, understanding how transcriptional bursting, and consequently, cell fate, responds to DNA damage on a genome-wide scale poses a challenge. In this study, we propose an interpretable and scalable inference framework, DeepTX, that leverages deep learning methods to connect mechanistic models and scRNA-seq data, thereby revealing genome-wide transcriptional burst kinetics. This framework enables rapid and accurate solutions to transcription models and the inference of transcriptional burst kinetics from scRNA-seq data. Applying this framework to several scRNA-seq datasets of DNA-damaging drug treatments, we observed that fluctuations in transcriptional bursting induced by different drugs could lead to distinct fate decisions: IdU treatment induces differentiation in mouse embryonic stem cells by increasing the burst size of gene expression, while 5FU treatment with low and high dose increases the burst frequency of gene expression to induce cell apoptosis and survival in human colon cancer cells. Together, these results show that DeepTX can be used to analyze single-cell transcriptomics data and can provide mechanistic insights into cell fate decisions.
Introduction
Cells must employ flexible regulatory strategies to determine their fate, including cell differentiation, survival, or apoptosis, in response to diverse external stimuli (López-Maury L et al., 2008; Stadhouders R et al., 2019). One key cellular strategy that enables both beneficial and detrimental functions is the stochastic nature of gene expression programs, arising from intricate biochemical processes and termed “noise” (Raj A & van Oudenaarden A, 2008; Balázsi G et al., 2011; Losick R & Desplan C, 2008). Gene expression noise originates from a discontinuous transcriptional pattern, wherein mRNA production occurs through alternating active and inactive gene states (Eling N et al., 2019; Tunnacliffe E & Chubb JR, 2020). This phenomenon, known as “transcriptional bursting”, has been observed in numerous experiments across both prokaryotic and eukaryotic cells (Golding I et al., 2005; Chubb JR et al., 2006; Larson DR et al., 2011). However, it remains an unresolved question how cells make fate decisions in response to external signals through genome-wide regulation of transcriptional burst mechanisms (Lammers NC et al., 2020; Rodriguez J & Larson DR, 2020).
DNA damage is a crucial and prevalent factor contributing to cellular responses, inducing varying degrees of cytotoxicity and thereby influencing diverse cell fate decisions (Krenning L et al., 2019; Su TT, 2006; Hafner A et al., 2019). For instance, low cytotoxicity limits cell proliferation (van den Berg J et al., 2019; Arora M et al., 2017; Barr AR et al., 2017; Deng Z et al., 2023), whereas heightened cytotoxicity prompts cell differentiation and senescence (Müllers E et al., 2014; Feringa FM et al., 2018; Toledo LI et al., 2008; Zhao Y et al., 2023). Furthermore, extreme scenarios of high cytotoxicity can cause cells to engage in the apoptotic process (Yousefzadeh M et al., 2021; Carneiro BA & El-Deiry WS, 2020; Roos WP & Kaina B, 2013; Zheng P et al., 2018). Experiments have shown that DNA damage can disrupt the gene transcription process (Lans H et al., 2019). Specifically, DNA damage slows down the movement of RNA Pol II along the DNA strand (Muñoz MJ et al., 2009). When more severe DNA damage is encountered, RNA Pol II exhibits a sliding pause behavior for DNA repair before resuming transcription (Figure 1A) (Gregersen LH & Svejstrup JQ, 2018; Geijer ME & Marteijn JA, 2018; Giono LE et al., 2016). These changes in RNA Pol II movement behavior would further reflect the cellular regulation of gene expression, particularly burst kinetics (Friedrich D et al., 2019; Calia GP et al., 2023). A recent study has shown that cellular responses to DNA damage involve the regulation of burst frequency (BF) on specific genes, rather than a genome-wide conclusion (Friedrich D et al., 2019). They have demonstrated that cells respond to DNA damage by increasing gene expression noise, but they do not delve into the topic of transcriptional bursts (Calia GP et al., 2023; Desai RV et al., 2021). Therefore, the question of how DNA damage causes cells to regulate transcriptional burst kinetics on a genome-wide scale remains unresolved.
To understand the dynamics of transcriptional bursts, ideally, one needs to observe the fluctuations of gene expression over a continuous time interval, such as smFISH, scNT-seq, and HT-smFISH (Femino AM et al., 1998; Qiu Q et al., 2020; Safieddine A et al., 2023). However, most existing techniques can only observe a limited number of genes, making their application to genome-wide studies challenging. Fortunately, single-cell RNA sequencing (scRNA-seq) technology provides excellent opportunities to explore this question, emerging as the leading method for genome-wide mRNA measurements and revealing gene expression noise within individual cells (Tanay A & Regev A, 2017; Picelli S et al., 2013; Zheng GX et al., 2017). Many studies have employed scRNA-seq to elucidate the diverse sources of gene expression noise and the fundamental principles of transcriptional dynamics on a genome-wide scale (Eling N et al., 2019; Faure AJ et al., 2017; Morgan MD & Marioni JC, 2018; Ochiai H et al., 2020). However, scRNA-seq data are static snapshot measurements. To recover the underlying dynamic transcriptional burst kinetics, a mathematical approach is needed to model the dynamic gene expression stochastic processes by regarding the scRNA-seq data as a steady-state result of the dynamic model (Larsson AJM et al., 2019; Luo S et al., 2023; Luo S et al., 2023).
For mathematical modeling, the gene expression model for DNA damage ought to encompass more sophisticated generalized properties compared to the traditional telegraph model. This is because DNA damage-induced RNA pol II turns transcription into a multi-step process rather than a single-step process (Stumpf PS et al., 2017; Voss TC & Hager GL, 2014). Specifically, the burst kinetics can be conceptualized as genes switching between inactive and active states (Singh A et al., 2013; Cavallaro M et al., 2021). Many efforts in this field are addressing the challenges of describing this multi-step process of gene expression in an interpretable and tractable model. One common modeling approach is to introduce multiple intermediate states between inactivate and activate states (Zhang J et al., 2012; Zhang J & Zhou T, 2014; Zhou T & Zhang J, 2012). Although this multi-state model can fit the experimental data better, it is hard to rationalize the state numbers and parameters (Desai RV et al., 2021; Rodriguez J et al., 2019; Zoller B et al., 2015). Another alternative modeling approach is direct non-Markovian modeling for non-exponential waiting times in gene state, which maps the multiple parameters from multi-state models to a small number of interpretable parameters that are easily observed experimentally (Daigle BJ, Jr. et al., 2015; Kumar N et al., 2015; Schwabe A et al., 2012; Stinchcombe AR et al., 2012; Zhang J & Zhou T, 2019; Zhang JJ & Zhou TS, 2019). However, the ensuing difficulty is solving the analytic solution of the steady distribution from a non-Markovian model, which may require some numerical simulation methods that are computationally resource-intensive and time-consuming, such as Monte Carlo methods (Sisson SA et al., 2007).
Statistical inference, particularly in recovering burst kinetic parameters from genome-wide scRNA-seq data, necessitates efficient and scalable inference algorithms (Gómez-Schiavon M et al., 2017). Therefore, the stochastic dynamical system must be quickly solved as parameters are continuously updated throughout the inference process. Deep learning exhibits a broad array of applications as a contemporary method for addressing complex systems (Jiang Q et al., 2021; Wang S et al., 2019; Michoski C et al., 2020). For example, some studies employed neural networks to establish the mapping relationship between a wide range of parameters in the gene expression model and the corresponding pre-simulated steady distribution (Figure 2B) (Jiang Q et al., 2021; Wang S et al., 2019; Davis CN et al., 2020; Tang Y et al., 2023). This approach, based on neural networks, has been utilized for parameter inference in both deterministic and stochastic models (Jiang Q et al., 2021; Gaskin T et al., 2023; Sukys A et al., 2022; Tang W et al., 2023).
In this study, we introduce DeepTX, a deep learning inference framework that integrates mechanistic models and deep learning methods to elucidate the genome-wide regulation of transcriptional bursts in DNA damage response using scRNA-seq data. The DeepTX framework comprises two modules: DeepTXsolver and DeepTXinferrer. Specifically, DeepTX initially employs the DeepTXsolver module to efficiently solve complex gene expression dynamics models using neural network architectures. It subsequently utilizes the DeepTXinferrer module to accurately infer potential transcriptional burst kinetic parameters using Bayesian methods. DeepTX demonstrates good performance on synthetic datasets. Furthermore, to investigate genome-wide regulation of transcriptional bursts in response to DNA damage, we applied the DeepTX framework to three DNA damage-related scRNA-seq datasets representing cell differentiation, apoptosis, and cell survival. As a result, firstly, we found that the fate decision of mouse embryonic stem cells (mESCs) to undergo cell differentiation in response to DNA damage caused by 5’-iodo-2’-deoxyuridine (IdU) compounds may be driven by enhancing burst size (BS enhancer) of genes that delay the cell mitosis phase transition cycle, thereby influencing cell reprogramming and differentiation. Secondly, the fate decision of colorectal cancer cells to cell apoptosis in response to DNA damage induced by low dose 5-fluorouracil (5FU) drug may be driven by enhancing burst frequency (BF enhancer) of genes that increase intracellular oxidative stress. Thirdly, the fate decision of colorectal cancer cells to cell survival in response to DNA damage induced by high dose 5FU drug may be driven by BS enhancer genes that induce telomerase extension to slow down oxidative stress and help cells to develop drug resistance. In conclusion, DeepTX is a computational framework with utility and extensibility to infer transcriptional dynamics from scRNA-seq data.
Results
Overview of DeepTX framework
To understand how cell influences underlying transcriptional mechanisms in response to DNA damage, as described in the introduction section (Figure 1A), we propose an effective computational inference framework, referred to as DeepTX (Figure 1B). DeepTX takes as input a set of scRNA-seq datasets and then outputs genome-wide burst kinetics parameters with the benefit from deep learning methods (Figure 1B). The inference process of the DeepTX framework is composed of two crucial modules, DeepTXsolver and DeepTXinferrer. The first module, DeepTXsolver, is to solve for the stationary distribution of a gene expression dynamic model (Figure 1B middle). A neural network is constructed in DeepTXsolver to approximate the mapping from model parameters to the corresponding stationary distribution. As in many biological contexts, there is a need to model dynamics that describe more realistic gene expression processes that are difficult to solve (such as the case with DNA damage). The second module, DeepTXinferrer, infers the burst kinetics parameter (Figure 1B middle). Using a trained neural network from DeepTXsolver, we can easily and quickly obtain the stationary distribution (called model distribution) of any parameter within a reasonable parameter space. DeepTXinferrer compares the model distribution to the stationary distribution of each gene in the scRNA-seq data (called data distribution) as the parameters are iterated until convergence and utilize the Bayesian inference to output the posterior distribution of burst kinetic parameter (Figure 1B right). We will describe these two modules in detail in the following two sections.
DeepTX solves the transcriptional model using deep learning
To enable the recovery of kinetic information from static scRNA-seq data, we first performed rational mechanistic modeling of gene expression processes with DNA damage. The gene transcription burst process is often characterized using the traditional two-state model (Singh A et al., 2013; Cavallaro M et al., 2021), which assumes that genes switch randomly between OFF and ON states with exponentially distributed waiting times for each state. However, the gene expression process under conditions of DNA damage can result in slowing or even stopping the RNA pol II movement (Lans H et al., 2019). Further, DNA damage will cause many macromolecules to be recruited for damage repair, and this process will affect the spatially localized behavior of the promoter (Lans H et al., 2019), causing promoter inactivation and activation to become a complex multi-step process. We therefore used a generalized model we developed previously (Luo S et al., 2023), called TXmodel here, which extended the state waiting time to an arbitrary distribution, i.e., the gene expression system is a non-Markovian system (Figure 2A and see Methods 1.2). More specifically, the waiting times for the transitions between the OFF and ON states are represented by two random variables, assumed to follow the arbitrary distributions foff (koff, roff) and fon (kon, ron), respectively. Additionally, we assumed that the rates of mRNA synthesis and degradation are constants, i.e., the waiting times for mRNA transcription and degradation follow exponential distributions with rate parameters ksyn and kdeg, respectively (Figure 2A left). Nevertheless, this non-Markovian gene expression system is difficult to solve stationary distribution analytically, while numerical solution methods are time-consuming.
For that reason, we utilized a deep learning approach, which has been demonstrated to be effective in solving stochastic systems (Gupta A et al., 2021), to construct a mapping from the parameter of the mechanistic TXmodel to its corresponding stationary distribution. More specifically, we aimed to train a fully connected neural network referred as DeepTXsolver, whose inputs are the parameters of the TXmodel θ and whose output is a stationary distribution parameterized by a mixed negative binomial distribution Pneuralnet, which has good performance in approximating the solution of the chemical master equation (Perez-Carrasco R et al., 2020; Öcal K et al., 2022) (Figure 2A and see Methods 1.3). To generate effective training sets for DeepTXsolver, we first generated a large number of parameter sets within a reasonable range of parameter space by using Sobol sampling (see Methods 1.3) (Sobol’ IyM, 1967). Subsequently, for each parameter set, we employed a modified stochastic simulation algorithm (SSA) for non-Markovian TXmodels to generate numerous samples (Figure 2A and see Methods 1.3). The resulting stationary distributions Psimulation served as training labels. Additionally, we demonstrated that the arbitrary order binomial moments of the TXmodel can be analytically solved iteratively within the context of queueing theory (Zhang Z et al., 2021; Zhang J et al., 2024). This enables the straightforward computation of significant summary statistics, including mean and Fano factor (Figure 2A, see Methods 1.2, and Supplementary Text 1.2). Thus, our loss function comprises two primary components: (i) the KL divergence between the predicted distribution generated by the neural network Pneuralnet and the labeled distribution generated by the SSA Psimulation; (ii) the logarithmic error of the predicted summary statistics computed by the neural network Ŝ and the labeled summary statistics s computed by theoretical derivation (Figure 2A and see Methods 1.3). Furthermore, we compared various optimizers and hyperparameters of model architectures to determine the best predicting performance (see Methods 1.3 and Supplementary Figure S2).
Consequently, the loss function of DeepTXsolver converges effectively during the training process. We utilized DeepTXsolver to predict the stationary distribution using parameters from the test set. We observed that it effectively fits all four representative distributions from the TXmodel, encompassing unimodal distribution at zero point, unimodal distribution at non-zero point, bimodal distribution with one peak at zero point, and bimodal distribution with both peaks at non-zero point (Figure 2B). These distributions have been extensively linked to cell fate decisions in numerous experiments (Gupta PB et al., 2011; Cohen AA et al., 2008; Bessarabova M et al., 2010). Moreover, we demonstrated that DeepTXsolver accurately predicted crucial distribution properties, including mean, Fano factor, and bimodal coefficient, with a high correlation between theoretical and predicted summary statistics (r > 0.99, p-value< 2.2*10-16 of t-test Figure 2C). It is noteworthy that, to assess the impact of the presence or absence of summary statistics on DeepTXsolver, we performed separate experiments using different loss functions. We demonstrated that incorporating summary statistics into the model led to quicker convergence of DeepTXsolver on the training set (Figure 2D) and yielded more robust and accurate predictions on the test set (Figure 2E).
Overall, DeepTXsolver effectively establishes mappings from parameters to stationary distributions. This approach circumvents the high computational resource requirements of classical simulation algorithms and ensures the algorithm’s efficiency and scalability in subsequent statistical inference, particularly in genome-wide inference.
DeepTX infers genome-wide transcriptional kinetics from scRNA-seq data
Having obtained the trained neural network model, we can rapidly compute the stationary distributions of any TXmodel parameters. These distributions represent the outcomes of the underlying true gene expression process, referred to as the “true distribution” Pmodel (Figure 3A). However, during the inference process, the gene expression distributions measured by the scRNA-seq data we utilized, termed the “observed distribution of the data” Pdata, were subject to noise, including errors stemming from the measurement technique (Sarkar A & Stephens M, 2021) (Figure 3A). Thus, we introduced a mechanistic hierarchical model to bridge the gap between the true and observed distributions. This hierarchical model considers that the observed data results from the interplay between the gene expression process and the measurement process. It combines the two distributions using a convolutional form (see Methods 1.4). Pmodel (x) represents the mixed negative binomial distribution approximated, whereas for Pmeasure (y ❘ x), we opted for the binomial distribution. This choice is informed by the fact that it can approximate hypergeometric distributions representing sequencing sampling without replacement when the sample size is sufficiently large. Consequently, the final distribution can maintain the form of a mixed negative binomial distribution (Supplementary Note 1.4), denoted as the observed distribution of the model Pdata (Figure 3A).
Leveraging the hierarchical model, we can construct a scalable and interpretable module called DeepTXinferrer, facilitating genome-wide inference of transcriptional burst kinetics from scRNA-seq data. First, we input the prior distribution of mechanistic model parameters and a set of scRNA-seq data, and we can get the observed distribution of the model Pobsmodel (derived successively from the trained neural network and the hierarchical model) and the observed distribution of the data Pobsdata, respectively (Figure 3A). Second, we used the Hellinger distance L (θ) = D (Pobsmodel, Pobsdata) to measure the difference between Pobsmodel and Pobsdata (see Methods 1.4). It’s worth noting that the loss function remains solvable via gradients, despite the inclusion of complex computations, such as neural networks and hierarchical models. Consequently, we utilize a gradient-based optimizer to update the iterative parameters until convergence. Lastly, we employ a method based on loss potentials, which computes the posterior distribution of the parameters during the optimization of the loss function (Gaskin T et al., 2023) (Figure 3A, see Methods 1.4). Subsequently, this process yields results of the transcriptional burst kinetics derived from the estimated parameters of the mechanistic model.
We assessed the effectiveness of DeepTXinferrer by conducting validation on synthetic data (see Methods 1.4). Synthetic samples were generated for each parameter set, and our inference algorithm was then applied to derive the output transcriptional burst kinetics (see Methods 1.4). As a result, the estimation of transcriptional burst kinetics demonstrated high accuracy and strong correlation across most parameter regions (r > 0.99, p-value< 2.2*10-16 of t-test Supplementary Figure S3b, c), particularly within the region of fully expressed mRNA abundance (Figure 3B). Moreover, the estimated posterior distribution of each parameter suggests that the peak value closely approximates the true value (Figure 3C-E). Additionally, we verified the robustness of our algorithm concerning the number of samples, observing that higher sample numbers corresponded to increased accuracy in estimation (Figure 3D, E). In summary, our inference algorithm exhibited strong performance on synthetic data, affirming its reliability for application to real-world data in inferring transcriptional bursts.
In the subsequent sections, we will employ the DeepTXinferrer to explore how cells determine diverse fates in response to DNA damage stimuli through genome-wide regulation of transcriptional bursting. We analyzed three sets of scRNA-seq data representing distinct cell fates: cell differentiation, apoptosis, and survival. The first set of data is a mutually controlled scRNA-seq dataset of mESC cell differentiation induced by IdU drug treatment. This data set contains 12481 genes of transcriptomes from 812 cells and 13780 genes of transcriptomes from 744 cells, respectively. The second set of data is a mutually controlled scRNA-seq dataset of human colon cancer cells treated with low-dose 5FU drug that causes apoptosis. This data set contains 8534 genes of transcriptomes from 1673 cells and 7077 genes of transcriptomes from 632 cells, respectively. The third set of data is a mutually controlled scRNA-seq dataset of human colon cancer cells in which high-dose 5FU drug treatment resulted in cell-resistant survival. This data set contains 8534 genes of transcriptomes from 1673 cells and 6661 genes of transcriptomes from 619 cells, respectively.
DeepTX reveals that the BS enhancer induced by IdU treatment promotes cell differentiation
The administration of the IdU drug, a thymine analogue, influences the gene expression process within cells, subsequently impacting cell differentiation (Desai RV et al., 2021; Li C et al., 2024). The perturbation in the gene expression process primarily stems from DNA damage, as IdU is randomly integrated into the DNA chain, thereby inducing DNA damage (Li C et al., 2024). Although DNA damage caused by IdU treatment has little impact on genome-wide mean gene expression, it increases variance across the genome, thereby affecting cell differentiation (Desai RV et al., 2021). This is different from the understanding that increases in variance are usually caused by fluctuations in the mean gene expression (Newman JR et al., 2006). Meanwhile, the mean can be deconvolved as the product of BS and BF. This raises the question of the relationship between bursting dynamics and variance on a genome-wide scale, and how changes in bursting dynamics affect cell differentiation.
We used the DeepTX framework to infer a set of preprocessed scRNA-seq data (see Methods 2.1) to obtain underlying burst kinetics. The difference between the statistics and distribution obtained by inference and the statistics and distribution of sequencing data is minimal, ensuring the correctness of the inference results (Supplementary Figure S4a-l). Subsequently, we observed considerable variations in BS and BF across different genes (Figure 4A, Supplementary Figure S5d-f). Notably, genes with significantly increased BS exhibit corresponding decreases in BF, suggesting relative stability in the average gene expression level (Supplementary Figure S5d-f). Additionally, we observed a significant increase in transcriptional BS for most genes in the treatment group (Figure 4A). This indicates that the rise in variance primarily results from the heightened BS induced by DNA damage (Figure 4B). Notably, this conclusion is consistent with the results of theoretical analysis (Supplementary text 2).
To discover the biological characteristics of BS enhancement, we identified genes with upregulated BS using a t-test-based differential analysis method. Further, we perform GO enrichment analysis (Consortium GO, 2015) on the identified genes. We observed that the genes with enhanced BS were mainly enriched in the signal transduction of DNA damage response and the pathway that regulates the cell mitosis phase transition cycle (Figure 4D). Subsequently, we performed Gene Set Enrichment Analysis (GSEA) (Subramanian A et al., 2005) on the identified differentially expressed genes associated with BS and observed enrichment in pathways related to cardiac muscle cell differentiation and mitotic intra-S DNA damage checkpoint signaling (Figure 4E, F). Similarly, we subjected genes significantly enhanced by BF in the treatment group (Supplementary Figure S5f) to GO analysis (Supplementary Figure S5g). Notably, the insignificance of the cell cycle transition pathway suggests that it is primarily influenced by BS-enhanced genes. Moreover, previous experimental studies have confirmed that treatment with IdU induces DNA damage, subsequently reprogramming cell differentiation fate (Rosina M et al., 2019; Riccio F et al., 2022; Liu Z et al., 2017) (Figure 4C). Additionally, research indicates that enhanced cell differentiation results from delayed cell cycle transitions (Rosina M et al., 2019).
DeepTX reveals that the BF enhancer induced by low-dose 5FU treatment accelerates cell apoptosis
The chemotherapeutic compound, 5-fluorouracil (5FU), exerts its cytotoxic effects on colorectal cancer cells by inflicting damage to their DNA (Kuipers EJ et al., 2015; Bunz F et al., 1999; Chang GS et al., 2014). This damage triggers apoptosis, a process of programmed cell death, thus inhibiting the growth and proliferation of malignant cells (Kuipers EJ et al., 2015; Bunz F et al., 1999; Chang GS et al., 2014). However, the impact of 5FU-induced DNA damage on the gene expression process and the underlying mechanism of cell apoptosis remains unclear. In this section, we employ the DeepTX to investigate the effect of 5FU-induced DNA damage on the gene expression process in colorectal cancer cells with scRNA-seq data (Park SR et al., 2020).
First, we inferred the burst kinetics from the scRNA-seq data of cells treated with low doses (10 μm) of 5FU and controls. After preprocessing the data (see Methods 2.2), we inferred the scRNA-seq data to obtain a distribution that fits the scRNA-seq data (Supplementary Figure S6a-l), as well as the underlying burst kinetics (Figure 5A, B).
Further, we found that the mean difference between the two groups of scRNA-seq data was not significant (p-value = 0.5 of t-test), but the variance and BS were significantly different (p-value<0.05 of t-test Figure 5A and Supplementary Figure S7a-c). Then, we performed a differential analysis on the BS and BF of gene expression in the 5FU-untreated cells and 10 μm 5FU treated cells, and then performed GO gene enrichment analysis on differential genes (Figure 5C and Supplementary Figure S7d-f). We found that BS down-regulates differential genes were mainly enriched in apoptosis pathways, electron exchange pathways, and oxidation-related pathways (Figure 5D and Supplementary Figure S7g). Specifically, the enrichment analysis indicated that although the burst size of genes in 5FU-treated cells was generally larger than that in the control group, the genes primarily influencing cell apoptosis exhibited a down-regulated burst size
Our enrichment analysis results are consistent with existing experimental results (Figure 5C). First, studies in recent years have found that many anticancer drugs trigger apoptosis and necrosis of cancer cells by activating the production of reactive oxygen species (ROS) (Sun Y & Rigas B, 2008). And experiments have shown that 5FU can induce ROS production in colorectal cancer cells (oxidation-related pathways) (Laha D et al., 2015; Bwatanglang IB et al., 2016; Chenna S et al., 2022). Meanwhile, mitochondria are the main source of ROS production, and defects in the mitochondrial electron transport chain will increase ROS production (electron exchange pathways) (Adam-Vizi V, 2005). As a result, excessive production of ROS can cause mitochondrial DNA damage and nuclear DNA damage (Figure 5D). In particular, when mitochondrial DNA is damaged, mitochondrial damage repair will further increase the pressure of ROS (Figure 5C right) (Shokolenko IN et al., 2014). Hence, DNA damage ultimately leads to cell death through apoptosis (apoptosis pathways) (Figure 5C) (Handali S et al., 2018; Hwang PM et al., 2001).
DeepTX reveals that BF enhancer induced by high-dose 5FU treatment contributes to cell survival
Low-dose 5FU treatment can induce cell apoptosis, while research shows high-dose 5FU treatment can induce cell resistance and evade apoptosis to survive (Kuranaga N et al., 2001). Hence, this part mainly studies the impact of differences in burst kinetics of gene expression between high-dose (50 μm) 5FU-treated and untreated colorectal cancer on cell fate decisions.
After data preprocessing (see Methods 2.2), we similarly observed the phenomenon that the average expression value of genes remained stable, while there is an enhanced increase in the variance genome-wide (Supplementary Figure S9a-c). Then, after preprocessing the data (see Methods 2.2), we inferred the scRNA-seq data to obtain the distribution that fits the scRNA-seq data (Supplementary Figure S8a-l), as well as the underlying burst kinetics (Figure 6A). Further, we performed a difference analysis on the BS of the two sets of inferred data, and selected genes with large differences in BS and BF for GO gene enrichment (Figure 6A, B and Supplementary Figure S9d-f). We found that the BS down-regulates differential genes were mainly enriched in the Cajal body related pathway, the electron exchange pathway, and telomerase-related pathways, but the apoptosis-related pathways were not significant (Figure 6D). In particular, according to experimental studies, telomerase-related pathways can activate telomerase and play an important role in the elongation of telomeres (Cristofari G et al., 2007).
Gene enrichment results are consistent with existing ways of cells evading apoptosis (Figure 6E). That is, endogenous ROS production exceeds the cellular antioxidant defense capacity, leading to chemical damage to mitochondrial DNA (mtDNA). Meanwhile, high doses of 5FU causes Human Telomerase Reverse Transcriptase (hTERT) translocation, which on the one hand triggers telomerase activity and maintenance pathways, which enables cells to evade apoptosis (Kuranaga N et al., 2001). On the other hand, this translocation will enrich hTERT in mitochondrial DNA, thereby reducing oxidative stress and mitochondrial DNA damage to protect mitochondria and prevent cells from apoptosis (Lipinska N et al., 2017).
An important hallmark of cancer cell resistance is that cells are temporarily arrested in a non-proliferative G0 state (Wiecek AJ et al., 2023). To verify the presence of resistant cells in cells treated with high doses of 5FU, we selected genes related to G0 arrest and performed GSVA (Hänzelmann S et al., 2013) to obtain the G0 arrest quality score (Wiecek AJ et al., 2023) of each cell and perform UMAP dimensionality reduction visualization (Figure 6E). Through mass score discrimination, we found that the proportion of resistant cells was large among cells treated with high doses of 5FU (Figure 6F). This result was consistent with the results of our enrichment analysis. Specifically, by comparing the BS and mean values of these genes, we found that most of the BS of G0 arrest related genes were up-regulated, but the mean changes were small (Figure 6G).
Discussion
Gene expression is inherently stochastic, leading to inter-individual variations primarily attributed to differences in burst patterns (Dar RD et al., 2012; Zenklusen D et al., 2008; Phillips R et al., 2012). Experimental observations confirm this bursting mechanism as a cellular response to environmental changes, particularly DNA damage (Friedrich D et al., 2019; Desai RV et al., 2021), influencing cell fate decisions. In this study, we introduced a mechanism-based deep learning framework (DeepTX) for performing genome-wide inference on transcriptional burst kinetics of a comprehensive gene expression model using scRNA-seq data. This framework offers an efficient computational approach to explore how cells regulate transcriptional burst kinetics in response to DNA damage at the genome-wide level, affecting various cell fate decisions.
Utilizing DeepTX, we analyzed three sets of scRNA-seq data comprising control and stimulus pairs driving different cell fate decisions: cell differentiation, apoptosis, and survival. This analysis unveiled significant insights into the underlying mechanisms of transcriptional burst kinetics. Firstly, IdU drug treatment-induced BS enhancement delays the cell mitosis phase transition, impacting cell reprogramming and differentiation. Interestingly, while prior research suggested noise enhancement as the driver of cell fate decisions, we uncovered that BS enhancement is the deeper mechanism leading to cell differentiation, potentially resulting in cardiomyocyte differentiation. Secondly, low-dose 5FU treatment increases oxidative stress, promoting cell apoptosis by augmenting BF. Consistent with previous findings attributing apoptosis to ROS stress, our study reveals that ROS-induced DNA damage alters gene expression burst kinetics, ultimately triggering apoptosis. Thirdly, high-dose 5FU treatment induces telomerase extension, mitigating oxidative stress and fostering drug resistance by upregulating BF. Despite prior evidence of drug-resistant cells, our enrichment analysis underscores telomerase extension’s role in resisting oxidative stress, underscoring the impact of BF enhancement.
We highlight several advantages of our DeepTX framework. Firstly, DeepTX employs a mechanistic hierarchical model that is both interpretable and extensible. To enhance the model’s realism in interpreting scRNA-seq data, we developed a novel approach integrating the sequencing process and the underlying gene expression process. Specifically, we employed the TXmodel, a complex non-Markovian stochastic dynamic model that extends the waiting times for gene OFF and ON states into a non-exponential distribution, reflecting the multi-step nature of gene regulation. As a result, TXmodel not only captures gene expression dynamics under DNA damage but also depicts other multi-step regulatory processes, such as chromatin opening (Lomvardas S & Thanos D, 2002), preinitiation complex formation (Fuda NJ et al., 2009), and transcription factors binding (Nordick B et al., 2022). Secondly, despite the mathematical complexity of the mechanistic hierarchical model, DeepTX renders it tractable. This is achieved by (i) directly mapping model parameters to corresponding stationary distributions via a neural network trained on extensive simulated single-cell RNA data and theoretical analysis summary statistics for TXmodel; (ii) approximating the stationary distribution output of TXmodel, coupled with the sequencing process, by a mixed negative binomial distribution, maintaining the final distribution format. Thirdly, DeepTX’s inference process is scalable, allowing genome-wide bursting kinetics inference from static scRNA-seq data. Despite the computational complexity of parameter inference involving hierarchical models and trained neural networks, DeepTX retains auto-differentiation during inference, ensuring efficiency.
The DeepTX research paradigm is poised for extension to encompass both dynamic models and biological data. Currently, our focus is confined to the transcriptional processes influenced by DNA damage, limiting the explanatory scope of the TXmodel to larger-scale biological phenomena such as alternative splicing (Wan Y et al., 2021; Gorin G & Pachter L, 2022), protein translation (Luo S et al., 2023; Golan-Lavi R et al., 2017), epigenetic modification (Nicolas D et al., 2018), and chromatin movement (Liu T et al., 2016; Wang Z et al., 2024; Wang Z et al., 2023). These broader biological processes necessitate more intricate dynamic models, often challenging to solve and thus challenging to apply to realistic biological data. However, the construction of neural networks within the DeepTX framework, leveraging extensive simulated data and summary statistics, facilitates the implementation of these complex dynamic models. As for biological data, our analysis has been restricted to single-cell RNA data from control and stimulated groups to elucidate the impact of DNA damage on burst kinetics. Yet, the advancement of sequencing technology promises an influx of diverse data types ripe for integration into DeepTX. Examples include increasingly comprehensive transcriptome data with both temporal and spatial resolution (Wen L & Tang F, 2022), as well as state-of-the-art multi-omics data (Miao Z et al., 2021). In conclusion, DeepTX provides a step toward inferring complex transcriptional kinetics from scRNA-seq data through mechanism-based deep learning inference. It sheds light on the augmented burst kinetics in DNA damage response across various cell fate decisions, thereby offering novel insights.
Methods
1 The DeepTX framework
1.1 Framework overview
The DeepTX Framework aims to infer the underlying bursting dynamics from static single-cell RNA sequencing (scRNA-seq) data to explore the impact of changes in bursting dynamics on cell fate decisions. The DeepTX framework includes two modules, DeepTXsolver and DeepTXinferrer. DeepTXsolver can efficiently and accurately solve transcription models using neural network architecture, and DeepTXinferrer can infer scRNA-seq data to obtain its underlying transcription burst kinetics using Bayesian methods.
The input of the DeepTXsolver is the parameters of the mechanism model, and the output is the corresponding stationary distribution solution of the model. The input to the DeepTXinferrer is the discrete probability distribution of gene expression obtained from scRNA-seq data, and the output of the model is the parameters of the mechanism model.
The construction of the DeepTX framework consists of the following parts: (1) Model the gene expression process. (2) Train a neural network that solves the mechanism model using the data set generated by stochastic simulation algorithm (SSA) simulation. (3) Hierarchical model modeling to infer underlying burst kinetic parameters from scRNA-seq data.
When building the DeepTX Framework, we made the following assumptions: (1) The gene expression of cells was in a stationary distribution during sequencing. (2) Gene expression counts of the same cell type follow the same distribution. (3) The solution of the model can be approximated by a mixed negative binomial distribution. (4) Gene expression state space and time are equivalent (SSA simulation truncation).
1.2 Modelling stochastic gene expression processes
To recover dynamic transcriptional burst kinetic parameters from static scRNA-seq data, we require the stochastic modeling approach to profile the gene expression process. The traditional telegraph model is often used to model gene transcriptional bursting, which assumes that switching between genes follows an exponential distribution (i.e., switching is a single-step process) (Peccoud J & Ycart B, 1995; Kim JK & Marioni JC, 2013). However, this assumption is an oversimplification of the gene expression process in the context of DNA damage. Due to the fact that DNA damage slows down and even stops the elongation of RNA pol II across the double-strand DNA (Lans H et al., 2019), the gene expression process should be regarded as a multi-step process. This property causes the waiting time for the gene switching between active and inactive states to follow a non-exponential distribution (Daigle BJ, Jr. et al., 2015; Zhang J & Zhou T, 2019; Zhang JJ & Zhou TS, 2019). Therefore, we use TXmodel (Luo S et al., 2023) to better characterize the gene transcription process in the context of DNA damage (Figure 2), whose chemical reaction network (CRN) can be written as follows:
where R1 and R2 describe the gene-state switching randomly between the ON state and OFF state, and foff (t,ωoff) and fon (t,ωon) are the distribution of dwell time in ON and OFF state, respectively. While the gene is in the active state, the synthesis of mRNA molecules is governed by a Poisson process with a constant synthesis rate ksyn (R3). Meanwhile, each mRNA molecule can be stochastically degraded (R4) with a constant degradation rate kdeg. Overall, the TXmodel can be determined by a set of parameters, denoted as θ = (ωoff, ωon, ksyn, kdeg).
The TXmodel as an extension of the traditional telegraph model can capture more realistic transcriptional burst kinetics, with BF and BS being two important parameters. The definition of BF is
where and are the mean dwell time in OFF and ON state, respectively. The definition of BS is
Without loss of generality, the gamma distribution can be used to characterize non-exponential distributions, i.e., and , where Γ(·) is Gamma function, roff and ron are the rate-parameters of state switching, koff and kon are the possible number of reaction steps between OFF and ON state.
It is a challenge to solve the stationary distribution of this gene expression system directly from a theoretical approach. But we proved that the binomial moments of this system can be obtained in an iterative form, by rationalizing this gene expression system as a queueing theory model. Thus, we can theoretically compute some important summary statistics s (θ) given a parameter θ, such as mean, variance, kurtosis, and bimodal coefficient (see details in Supplementary Text 1.2).
1.3 Training neural network to solve gene expression model
The stationary distribution of CRN (1) is difficult to be solved analytically. In general, stochastic simulation algorithms are used to approximate the solution of chemical master equations (Boguñá M et al., 2014; Masuda N & Rocha LE, 2018). However, the inference process involves updating the parameters, and each update requires a simulation to obtain the stationary distribution, which makes statistical inference time-consuming. We note that deep learning methods may have potential applications in solving this challenge because it is widely used for approximating the solution of partial differential equations, chemical master equations, and non-Markovian chemical master equations (Jiang Q et al., 2021; Wang S et al., 2019; Davis CN et al., 2020; Gupta A et al., 2021; Bar-Sinai Y et al., 2019).
Therefore, in this part, we used a neural network to approximately solve CRN (1). This enables CRN (1) to be solved efficiently and accurately. Details will be introduced in the following part.
Generating synthetic training dataset
To construct the mapping from the parameters of TXmodel to the stationary distribution, we first need to generate the neural network’s training set, i.e., a large-scale range of parameter sets as features and corresponding probability distributions as labels. For the features, we used the Sobol algorithm (Sobol’ IyM, 1967) to conduct large-scale sampling to generate the neural network data set, which can cover the entire parameter space more evenly than random sampling.
For the labels, we perform the SSA of the TXmodel to obtain the stationary distribution given a set of preset parameters. For clarity, the synthetic training dataset is denoted by {(θi, Psimulation,i)}, where θi presents the model parameters of TXmodel, and Psimulation, i represents the stationary distribution solution obtained by SSA (Figure 2A). Specifically, the value range of each parameter component of θi is koff ∈(1,15), roff ∈(0.1,10), kon ∈(1,15), ron ∈(0.01,10), ksyn ∈(0.1, 400).
Deep learning model architecture
We build the basic deep learning model architecture in DeepTX via fully connected neural networks, which has been effectively used to solve approximate solutions from biochemical reaction networks (Sukys A et al., 2022; Gupta A et al., 2021). The deep learning model architecture consists of an input layer, hidden layers, and an output layer.
Firstly, the input layer contains 5 neurons corresponding to the parameters of the mechanism model θi = (koff,i, roff, i, kon,i, ron,i, ksyn,i). Secondly, the input layer is followed by nhidden hidden layers with each layer containing neurons. For the neurons in each hidden layer, we implement a nonlinear mapping by the commonly used ReLU activation function (Supplementary Figure S1):
where represents the value of the j -th neuron of the nhidden -th hidden layer of the i -th group of parameters, represents the neural network weight connecting the k -th neuron of the nhidden -th hidden layer and the j -th neuron of the (nhidden −1) -th hidden layer.
Thirdly, In the output layer, we aim to allow neurons to approximate synthetic scRNA-seq data generated by TXmodel. Negative binomial distributions are a class of distributions that are widely used in modeling scRNA-seq data and have been shown to account for both the “drop-out” phenomenon and the fact that the Fano factor is greater than 1 in scRNA-seq data (Risso D et al., 2018; Love MI et al., 2014). Also, the negative binomial distribution has been employed as an approximate stationary solution to the TXmodel for many gene expression systems (Sukys A et al., 2022; Öcal K et al., 2022). We assign weights and parameters (aij, mij, lij) output layer neurons (Supplementary Figure S1):
where represents the value of the j -th neuron of the output layer of the i -th set of parameters, aij is the weight coefficient of the negative binomial distribution, satisfying , lij, mij represents the i -th parameter of the j -th negative binomial distribution of the mixed negative binomial distribution, nNB is the number of negative binomial distribution. And the mixed negative binomial distributions can be expressed as (Supplementary Figure S1):
where .
Loss function of DeepTXsolver
Our loss function includes two parts: one part is the KL divergence between the mixed negative binomial distribution generated by neural network Pneuralnet, i and label distribution Psimulation,i generated by SSA, and the another part is the log difference of the moment statistics between theoretical solution of the TXmodel sij and the output of the neural network Ŝij:
where λ is the hyperparameter, represents the weight coefficient of the statistical loss, and λ = 0.1, nbatch represents the number of samples in each batch, and nstats is the number of summary statistics. For the latter part of loss function, we can still obtain the moment statistic by theoretical derivation, although the analytic distribution is difficult to solve. Adding moment statistics to the loss function can make the neural network more robust (Figure 2E). Specifically, we use four statistics here: si1 indicates the mean, si 2 denotes the Fano factor, si 3 denotes kurtosis, si 4 represents the bimodal coefficient. It’s worth mentioning that the statistics here is scalable and we can add more information inside the loss function.
Training details of DeepTXsolver
Effective model training necessitates the selection of an appropriate optimizer to minimize the loss function. In our case, we employed the Adam optimizer (Kingma DP & Ba J, 2014), a widely adopted choice in deep learning due to its robust performance. And each time, a small batch of samples is selected to update the neural network parameters w (Goodfellow I et al., 2016), and the iteration of all samples is called an epoch. Furthermore, we utilized the Glorot Uniform method to establish an optimal initial point for our neural network parameter (Glorot X & Bengio Y, 2010).
Batch size and learning rate are key optimization hyperparameters affecting model convergence. A smaller batch size enhances model generalizability but lengthens training time, hence a batch size of 64 is chosen for balance (Keskar NS et al., 2016). In gradient descent process, the learning rate significantly influences the step size, where large rates may induce instability or divergence, and conversely, small rates may precipitate premature convergence to local optima. To counteract this, a decaying learning rate is used, allowing larger early updates for faster convergence and preventing instability in later stages. Further, training of the model stops if it has been trained for over 200 epochs or if the learning rate has been reduced five times. These criteria are adaptive and efficient as further training does not significantly enhance the model’s performance.
Hyperparameter tunning of DeepTXsolver
To obtain a neuron network model with accurate prediction and generalization, we compared the model architectures in terms of the number of neurons per layer, the number of hidden layers, and the number of neurons in the output layer, and compared the performance of the model with different dataset sizes. And we applied the Hellinger distance between the true distribution and the predicted distribution by the neuron network as a judgment criterion.
First, we compared the performance of architectures with a single hidden layer with (16, 32, 64, 128, 256, 512) neurons, and found that the performance of the model barely increased after increasing to 128 neurons (Supplementary Figure S2a). Secondly, after comparing the performance of architectures with different numbers of hidden layers, we found that a single hidden layer has the best performance (Supplementary Figure S2b). Thirdly, comparing the performance of 3i (i = 1, 2,…10) neurons in the output layer, we found that after 12 neurons (4 negative binomial distribution mixtures), the performance of the model was not improved (Supplementary Figure S2c). Particularly, four negative binomial distribution mixtures are able to fit the bimodal nature of the distribution as well as ensure good training efficiency (Öcal K et al., 2022).
Finally, to obtain a dataset that contributes to convergence as well as model generalization while ensuring training efficiency, we compared the model performance for dataset sizes of (500, 1000, 5000,10000,15000) and found that increasing the dataset size further does not significantly improve the model performance after to 15000 samples (Supplementary Figure S2d). Therefore, we identified a single hidden layer of 128 neurons and an output layer of 12 neurons as the appropriate model architecture, and 15,000 samples as the appropriate training set.
1.4 Inferring burst kinetics from scRNA-seq data with DeepTXinferrer
This section describes how to recover dynamic burst kinetics from static scRNA-seq data. In principle, once we have constructed the mapping of the parameters to the corresponding stationary distributions through the pre-trained neural network in section 1.3, we can quickly access the corresponding likelihood functions when updating the parameters during the statistical inference process. However, true scRNA-seq data is the result of a combination of gene expression processes and sequencing processes, and the current pre-trained neural network can only yield a description of the gene expression process modeled by TXmodel. Therefore, we require to extent it into a hierarchical model to depict the more realistic generation mechanism behind the scRNA-seq data (Sarkar A & Stephens M, 2021).
Modeling observed scRNA-seq data with hierarchical model
The hierarchical model aims to couple two different processes, specifically, the true gene expression mechanism process and the sequencing process that immediately follows. In terms of a formula, the probability distribution of observations can be represented as a convolution:
where Pmeasure is the measure model and Pmodel is the expression model.
First, we model the sequencing process. Each cell can be regarded as a pool of mRNA, and represents the true expression amount of the g -th gene in the c -th cell. refers to the observed value obtained by scRNA-seq from the given true expression value , satisfying the following conditional probability distribution
where Pmeasure (y | x) characterizes all noise in the sequencing process, and is generally assumed to be Binomial distribution or Poisson distribution, which has been proven experimentally and theoretically (Sarkar A & Stephens M, 2021; Wang J et al., 2018). By incorporating an additional sampling probability, denoted as αcg, into the sequencing process, we are able to characterize capture efficiency. We make the assumption that intercellular molecules are independent of one another, and that only proportional products are captured and sequenced, following a Binomial distribution
where αcg is the sampling probability, and we set αcg = 0.5.
Second, we model the true gene expression process, assuming that it satisfies the following probability distribution
The selection of the true gene expression probability distribution is very critical. It needs to meet the following conditions: (i) It can well characterize the transcription burst kinetics. (ii) The inference derived from this distribution necessitates both efficiency and precision.
In section 1.2, we have constructed a model of the gene expression process with DNA damage, enabling the model’s stationary distribution to encapsulate the bursty dynamics inherent in gene expression. The solution to the model is approximated by a mixed negative binomial distribution, which is obtained from a trained neural network. Importantly, this mixed negative binomial distribution enables efficient inference. Hence,
Further, the following hierarchical model can be obtained
where represents the random variable of the true expression value, represents the random variable of the observed expression value, Binomial represents the binomial distribution, NB represents negative binomial distribution and represents the sampling probability. Further, we can obtain
It is proved that also follows a mixed negative binomial distribution (Supplementary text 1.3), denoted as Pobsmodel.
Estimation and optimization process of DeepTXinferrer
We can use the DeepTX framework to infer scRNA-seq data to obtain the corresponding gene expression burst kinetics through optimization methods. Given the expression counts of gene g in all cells, its corresponding probability distribution Pobsdata can be obtained. We use the Hellinger distance to compare the error between the true probability distribution Pobsdata of genes and the model probability distribution Pmodeldata. The following optimization problem can be obtained
where ncount is the maximum value of mRNA expression, Pobsdata is the density probability for each gene g. Specifically, Pobsdata is a function that is differentiable with respect to parameter θ g. We can use the gradient descent method to obtain the optimal parameters of the TXmodel and the corresponding distribution of the TXmodel. The stopping condition of the gradient descent method is that the loss value is less than 10−3.
Posterior distribution
The optimization method in the previous section can obtain the parameters of the mechanism model that satisfies the distribution of scRNA-seq data. In this section, we give a method to solve the confidence interval of each optimized parameter (Gaskin T et al., 2023).
Each iteration of the optimization process can obtain the estimated value and loss value corresponding to the estimated value. In particular, according to Equation (15), the loss is calculated as follows.
The following posterior probability can be obtained
where π (0) is the prior, we take it as uniform density, n(k) represents the number of values takes during the iteration process, and represents the loss potential of each iteration. Let the posterior probability be , and the marginal distribution corresponding to each component of parameter can be obtained
The subscript − j means that we integrate the components of θ(k) except . In particular, to allow the iterative parameters to traverse the entire parameter space, multiple optimizations can be performed with different initial values to improve the performance of the posterior distribution.
Validation on synthetic data
We conducted an evaluation of the inference performance of the DeepTX framework on synthetic scRNA-seq data, focusing on three key aspects. (i) Model accuracy: This refers to the minimal discrepancy between the distribution corresponding to the inferred parameters and the input distribution. (ii) Parameter identifiability: This denotes the proximity of the values of the inferred parameters to the true parameters, suggesting that the model can accurately identify the true parameters. (iii) Result robustness: This implies that the outcomes of each inference do not exhibit significant variations.
We created a synthetic dataset using Sobol sampling and SSA under preset parameters. The synthesized distribution is used as input to the DeepTX framework, and upon optimization, we obtained estimated parameters and their corresponding distribution .We showed that the Hellinger distance between the estimated and synthetic dataset’s distribution was minimal, indicating accurate estimations (Supplementary Figure S3a). Additionally, the burst kinetics of both the synthetic data and the estimated data can be computed utilizing their respective parameters, and . We showed that a strong correlation was observed between the synthetic burst kinetics γ syn = (BFsyn, BSsyn) and estimated burst kinetics γest = (BFest, BSest) (Supplementary Figure S3b-c), suggesting that burst kinetics are identifiable.
Subsequently, we conducted a robustness verification of the framework’s ability to infer burst kinetics. The error was computed as follows:
Given a set of parameters , we drew samples of sizes (100, 500, 1000, 1500, 2000) from the probability distribution corresponding to . For each sample size, we performed 50 repetitions, and then DeepTX framework is used to infer the distribution obtained from each sampling, setting different initial values each time. We found that the inferred results are robust, and the robustness increases with the increase in the number of samples (Supplementary Figure S3d-e).
2 Dataset
2.1 IdU and DMSO dataset
To investigate the effects of 5’-iodo-2’-deoxyuridine (IdU) on genome-wide gene expression, Desai et al. conducted scRNA-seq data on mouse embryonic stem cells, both with and without IdU treatment. Consequently, they obtained scRNA-seq data for 12481 genes of transcriptomes from 812 cells and 13780 genes of transcriptomes from 744 cells, respectively. The authors found a phenomenon that IdU drug treatment increased transcriptional noise genome-wide but did not affect transcriptional mean, further affecting cell fate decisions.
We used the DeepTX framework to infer this set of scRNA-seq data to obtain potential transcription burst kinetics, and further study the impact of changes in transcription burst kinetics on cell fate decisions. To eliminate the impact of technical noise on data inference, we preprocess the data as follows. Firstly,We normalized the data ycg using Seurat, where ycg represents the expression level of the gene g in the cell c. The normalization process is that we multiplied each ycg by a scaling factor , where ynormalized = 50000. Particularly, the scaling factor S is related to the total expression of genes in each cell. Secondly, after normalizing the expression, we divided it into bins with a unit of 1 to obtain the bin where each expression value is located. Finally, to eliminate the impact of low-expression genes on the inference process, we filter out genes whose average gene expression is less than 1. After data processing, there are 6082 genes from 812 cells left in our IdU and 6082 genes from 744 cells left in DMSO data sets.
2.2 5FU Dataset
To study the impact of 5-fluorouracil (5FU) drug-induced DNA damage on cell fate decisions, Park SR et al. performed scRNA-seq on colon cancer cells treated with 0 μm, 10 μm, and 50 μm 5FU drug (Park SR et al., 2020) and obtained 8534 genes from 1673 cells, 7077 genes from 632 cells, and 6661 genes from 619 cells, respectively. The authors found that Colon cancer cells treated with different doses of 5FU exhibit different transcriptional phenotypes, and these different phenotypes correspond to different cell fate decisions.
We used the DeepTX framework to infer these three sets of scRNA-seq data to obtain potential transcriptional burst kinetics, thereby studying the impact of changes in transcriptional burst kinetics caused by different doses of 5FU treatment on cell fate decisions. To eliminate the impact of technical noise on data inference, we preprocess the data in the following steps.
First, we normalize ycg by multiplying a scale factor S, where . Second, we divide the normalized data into intervals of unit 1 to round the data. Finally, to eliminate the impact of low-expression genes on the inference process, we filter out genes whose average gene expression is less than 1. After data processing, there are 730 genes from 593 cells left in control data set, 174 genes from 593 cells left in our 10 μm data set. 5FU treatment data set and 187 genes from 564 cells left in 50 μm 5FU treatment
Additional information
Funding
This work was supported by National Key R&D Program of China grant 2021YFA1302500 by Natural Science Foundation of P.R. China grants 12171494, 11931019, 11775314, 62373384, and 12301646 by Key-Area Research and Development Program of Guangzhou, P.R. China, grants 202007030004 and 2019B110233002 by Guangdong Basic and Applied Basic Research Foundation grants 2022A1515011540 and 2023A1515011982 by Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University grant 2020B1212060032 by the Fundamental Research Funds for the Central Universities, Sun Yat sen University grants 23qnpy48 and 23pnqy49 and by China Postdoctoral Science Foundation grant 2023M734061.
Author contributions
J.Z., Q.N., and B.J. conceived the project. J.Z., Z.H., and S.L. designed the algorithm. Z.H. and S.L. implemented the code, performed the simulations, analyzed the data, and interpreted the results. Z.W. and Z.Z. analyzed the data and interpreted the results. Q.N. and B.J. supported the data interpretation. Z.H., S.L., and J.Z. wrote the paper. All authors revised and approved the paper. J.Z. supervised the project.
Data availability
All the scRNA-seq data can be obtained from the public database. The scRNA-seq data of mouse embryonic stem cells (mESC) + DMSO and mESC + IdU data reported here is available at the Gene Expression Omnibus (GEO) database under accession number: GSE176044. The scRNA-seq data of RKO cell with different 5-fluorouracil treatment is available at the Gene Expression Omnibus (GEO) database under accession number: GSE149224.
Code availability
The code for reproducing the presented analysis results is available at GitHub repository (https://github.com/cellfateTX/DeepTX).
References
- Tuning gene expression to changing environments: from rapid responses to evolutionary adaptationNature Reviews Genetics 9:583–593https://doi.org/10.1038/nrg2398
- Transcription factors and 3D genome conformation in cell-fate decisionsNature 569:345–354https://doi.org/10.1038/s41586-019-1182-7
- Nature, nurture, or chance: stochastic gene expression and its consequencesCell 135:216–226https://doi.org/10.1016/j.cell.2008.09.050
- Cellular decision making and biological noise: from microbes to mammalsCell 144:910–925https://doi.org/10.1016/j.cell.2011.01.030
- Stochasticity and cell fateScience 320:65–68https://doi.org/10.1126/science.1147888
- Challenges in measuring and understanding biological noiseNature Reviews Genetics 20:536–548https://doi.org/10.1038/s41576-019-0130-6
- What is a transcriptional burst?Trends in Genetics 36:288–297https://doi.org/10.1016/j.tig.2020.01.003
- Real-time kinetics of gene activity in individual bacteriaCell 123:1025–1036https://doi.org/10.1016/j.cell.2005.09.031
- Transcriptional pulsing of a developmental geneCurrent Biology 16:1018–1025https://doi.org/10.1016/j.cub.2006.03.092
- Real-time observation of transcription initiation and elongation on an endogenous yeast geneScience 332:475–478https://doi.org/10.1126/science.1202142
- A matter of time: using dynamics and theory to uncover mechanisms of transcriptional burstingCurrent Opinion in Cell Biology 67:147–157https://doi.org/10.1016/j.ceb.2020.08.001
- Transcription in living Cells: molecular mechanisms of burstingAnnual Review of Biochemistry 89:189–212https://doi.org/10.1146/annurev-biochem-011520-105250
- Life or death after a break: what determines the choice?Molecular Cell 76:346–358https://doi.org/10.1016/j.molcel.2019.08.023
- Cellular responses to DNA damage: one signal, multiple choicesAnnual Review of Genetics 40:187–208https://doi.org/10.1146/annurev.genet.40.110405.090428
- The multiple mechanisms that regulate p53 activity and cell fateNature Reviews Molecular Cell Biology 20:199–210https://doi.org/10.1038/s41580-019-0110-x
- DNA end-resection in highly accessible chromatin produces a toxic breakbioRxiv https://doi.org/10.1101/691857
- Endogenous replication stress in mother cells leads to 1uiescence of daughter cellsCell Reports 19:1351–1364https://doi.org/10.1016/j.celrep.2017.04.055
- DNA damage during S-phase mediates the proliferation-quiescence decision in the subsequent G1 via p21 expressionNature Communications 8https://doi.org/10.1038/ncomms14728
- Pectolinarigenin inhibits bladder urothelial carcinoma cell proliferation by regulating DNA damage/autophagy pathwaysCell Death Discovery 9https://doi.org/10.1038/s41420-023-01508-9
- Nuclear translocation of Cyclin B1 marks the restriction point for terminal cell cycle exit in G2 phaseCell Cycle 13:2733–2743https://doi.org/10.4161/15384101.2015.945831
- Persistent repair intermediates induce senescenceNature Communications 9https://doi.org/10.1038/s41467-018-06308-9
- ATR signaling can drive cells into senescence in the absence of DNA breaksGenes & Development 22:297–302https://doi.org/10.1101/gad.452308
- DNA damage and repair in age-related inflammationNature Reviews Immunology 23:75–89https://doi.org/10.1038/s41577-022-00751-y
- DNA damage-how and why we age?Elife 10https://doi.org/10.7554/eLife.62852
- Targeting apoptosis in cancer therapyNature Reviews Clinical Oncology 17:395–417https://doi.org/10.1038/s41571-020-0341-y
- DNA damage-induced cell death: from specific DNA lesions to the DNA damage response and apoptosisCancer Letters 332:237–248https://doi.org/10.1016/j.canlet.2012.01.007
- DNA damage triggers tubular endoplasmic reticulum extension to promote apoptosis by facilitating ER-mitochondria signalingCell Research 28:833–854https://doi.org/10.1038/s41422-018-0065-z
- The DNA damage response to transcription stressNature Reviews Molecular Cell Biology 20:766–784https://doi.org/10.1038/s41580-019-0169-4
- DNA damage regulates alternative splicing through inhibition of RNA polymerase II elongationCell 137:708–720https://doi.org/10.1016/j.cell.2009.03.010
- The cellular response to transcription-blocking DNA damageTrends in Biochemical Sciences 43:327–341https://doi.org/10.1016/j.tibs.2018.02.010
- What happens at the lesion does not stay at the lesion: transcription-coupled nucleotide excision repair and the effects of DNA damage on transcription in cis and transDNARepair 71:56–68https://doi.org/10.1016/j.dnarep.2018.08.007
- The RNA response to DNA damageJournal of Molecular Biology 428:2636–2651https://doi.org/10.1016/j.jmb.2016.03.004
- Stochastic transcription in the p53-mediated response to DNA damage is modulated by burst frequencyMolecular Systems Biology 15https://doi.org/10.15252/msb.20199068
- Comparative analysis between RNA-seq and single-molecule RNAFISH indicates that the pyrimidine nucleobase idoxuridine (IdU) globally amplifies transcriptional noisebioRxiv https://doi.org/10.1101/2023.03.14.532632
- ADNA repair pathway can regulate transcriptional noise to promote cell fate transitionsScience 373https://doi.org/10.1126/science.abc6506
- Visualization of single RNA transcripts in situScience 280:585–590https://doi.org/10.1126/science.280.5363.585
- Massively parallel and time-resolved RNA sequencing in single cells with scNT-seqNature Methods 17:991–1001https://doi.org/10.1038/s41592-020-0935-4
- HT-smFISH: a cost-effective and flexible workflow for high-throughput single-molecule RNA imagingNature Protocols 18:157–187https://doi.org/10.1038/s41596-022-00750-2
- Scaling single-cell genomics from phenomenology to mechanismNature 541:331–338https://doi.org/10.1038/nature21350
- Smart-seq2 for sensitive full-length transcriptome profiling in single cellsNature Methods 10:1096–1098https://doi.org/10.1038/nmeth.2639
- Massively parallel digital transcriptional profiling of single cellsNature Communications 8https://doi.org/10.1038/ncomms14049
- Systematic Analysis of the Determinants of Gene Expression Noise in Embryonic Stem CellsCell Systems 5:471–484https://doi.org/10.1016/j.cels.2017.10.003
- CpG island composition differences are a source of gene expression noise indicative of promoter responsivenessGenome Biology 19https://doi.org/10.1186/s13059-018-1461-x
- Genome-wide kinetic properties of transcriptional bursting in mouse embryonic stem cellsScience Adavances 6https://doi.org/10.1126/sciadv.aaz6699
- Genomic encoding of transcriptional burst kineticsNature 565:251–254https://doi.org/10.1038/s41586-018-0836-1
- Genome-wide inference reveals that feedback regulations constrain promoter-dependent transcriptional burst kineticsNucleic Acids Research 51:68–83https://doi.org/10.1093/nar/gkac1204
- Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph modelRoyal Society Open Science 10https://doi.org/10.1098/rsos.221057
- Stem cell differentiation as a non-Markov stochastic processCell Systems 5:268–282https://doi.org/10.1016/j.cels.2017.08.009
- Dynamic regulation of transcriptional states by chromatin and transcription factorsNature Reviews Genetics 15:69–81https://doi.org/10.1038/nrg3623
- Stochastic analysis and inference of a two-state genetic promoter model2013 American Control Conference IEEE :4563–4568
- 3 (‘)-5 (‘) crosstalk contributes to transcriptional burstingGenome Biology 22https://doi.org/10.1186/s13059-020-02227-5
- Analytical distribution and tunability of noise in a model of promoter progressBiophysical Journal 102:1247–1257https://doi.org/10.1016/j.bpj.2012.02.001
- Promoter-mediated transcriptional dynamicsBiophysical Journal 106:479–488https://doi.org/10.1016/j.bpj.2013.12.011
- Analytical results for a multistate gene modelSIAMJournal On Applied Mathmatics 72:789–818https://doi.org/10.1137/110852887
- Intrinsic dynamics of a human gene reveal the basis of expression heterogeneityCell 176:213–226https://doi.org/10.1016/j.cell.2018.11.026
- Structure of silent transcription intervals and noise characteristics of mammalian genesMolecular Systems Biology 11https://doi.org/10.15252/msb.20156257
- Inferring single-cell gene expression mechanisms using stochastic simulationBioinformatics 31:1428–1435https://doi.org/10.1093/bioinformatics/btv007
- Transcriptional bursting in gene expression: analytical results for general stochastic modelsPLoS Computational Biology 11https://doi.org/10.1371/journal.pcbi.1004292
- Transcription stochasticity of complex gene regulation modelsBiophysical Journal 103:1152–1161https://doi.org/10.1016/j.bpj.2012.07.011
- Population density approach for discrete mRNA distributions in generalized switching models for stochastic gene expressionPhysical Review E: Statistical Nonlinear and Soft Matter Physics 85https://doi.org/10.1103/PhysRevE.85.061919
- Markovian approaches to modeling intracellular reaction processes with molecular memoryProceedings of the National Academy of Sciences 116:23542–23550https://doi.org/10.1073/pnas.1913926116
- Stationary moments, distribution conjugation and phenotypic regions in stochastic gene transcriptionMathematical Biosciences and Engineering 16:6134–6166https://doi.org/10.3934/mbe.2019307
- Sequential Monte Carlo without likelihoodsProceedings of the National Academy of Sciences 104:1760–1765https://doi.org/10.1073/pnas.0607208104
- BayFish: Bayesian inference of transcription dynamics from population snapshots of single-molecule RNAFISH in single cellsGenome Biology 18https://doi.org/10.1186/s13059-017-1297-9
- Neural network aided approximation and parameter inference of non-Markovian models of gene expressionNature Communications 12https://doi.org/10.1038/s41467-021-22919-1
- Massive computational acceleration by using neural networks to emulate mechanism-based biological modelsNature Communications 10https://doi.org/10.1038/s41467-019-12342-y
- Solving differential equations using deep neural networksNeurocomputing 399:193–212https://doi.org/10.1016/j.neucom.2020.02.015
- The use of mixture density networks in the emulation of complex epidemiological individual-based modelsPLoS Computational Biology 16https://doi.org/10.1371/journal.pcbi.1006869
- Neural-network solutions to stochastic reaction networksNature Machine Intelligence 5:376–385https://doi.org/10.1038/s42256-023-00632-6
- Neural parameter calibration for large-scale multiagent modelsProceedings of the National Academy of Sciences 120https://doi.org/10.1073/pnas.2216415120
- Approximating solutions of the Chemical Master equation using neural networksiScience 25https://doi.org/10.1016/j.isci.2022.105010
- Modelling capture efficiency of single cell RNA-sequencing data improves inference of transcriptome-wide burst kineticsbioRxiv https://doi.org/10.1101/2023.03.06.531327
- DeepCME: a deep learning framework for computing solution statistics of the chemical master equationPLoS Computational Biology 17https://doi.org/10.1371/journal.pcbi.1009623
- Effects of cell cycle variability on lineage and population measurements of messenger RNA abundanceJournal of the Royal Society Interface 17https://doi.org/10.1098/rsif.2020.0360
- Inference and uncertainty quantification of stochastic gene expression via synthetic modelsJournal of the Royal Society Interface 19https://doi.org/10.1098/rsif.2022.0153
- On the distribution of points in a cube and the approximate evaluation of integralsZhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 7:784–802https://doi.org/10.1016/0041-5553(67)90144-9
- Exact results for queuing models of stochastic transcription with memory and crosstalkPhysical Review E 103https://doi.org/10.1103/PhysRevE.103.062414
- Exact results for gene-expression models with general waiting-time distributionsPhysical Review E 109https://doi.org/10.1103/PhysRevE.109.024119
- Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cellsCell 146:633–644https://doi.org/10.1016/j.cell.2011.07.026
- Dynamic proteomics of individual cancer cells in response to a drugScience 322:1511–1516https://doi.org/10.1126/science.1160165
- Bimodal gene expression patterns in breast cancerBMCGenomics 11https://doi.org/10.1186/1471-2164-11-s1-s8
- Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysisNature Genetics 53:770–777https://doi.org/10.1038/s41588-021-00873-4
- Epigenetic reshaping through damage: promoting cell fate transition by BrdU and IdU incorporationCell and Bioscience 14https://doi.org/10.1186/s13578-024-01192-x
- Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noiseNature 441:840–846https://doi.org/10.1038/nature04785
- Gene Ontology Consortium: going forwardNucleic Acids Research 43:D1049–1056https://doi.org/10.1093/nar/gku1179
- Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profilesProceedings of the National Academy of Sciences 102:15545–15550https://doi.org/10.1073/pnas.0506580102
- Osteogenic differentiation of skeletal muscle progenitor cells is activated by the DNA damage responseScientific Reports 9https://doi.org/10.1038/s41598-019-41926-3
- Transcription Factor Activation Profiles (TFAP) identify compounds promoting differentiation of Acute Myeloid Leukemia cell linesCell Death Discovery 8https://doi.org/10.1038/s41420-021-00811-7
- Single-cell transcriptomics reconstructs fate conversion from fibroblast to cardiomyocyteNature 551:100–104https://doi.org/10.1038/nature24454
- Colorectal cancerNature Reviews Disease Primers 1https://doi.org/10.1038/nrdp.2015.65
- Disruption of p53 in human cancer cells alters the responses to therapeutic agentsJournal of Clinical Investigation 104:263–269https://doi.org/10.1172/jci6863
- A comprehensive and high-resolution genome-wide response of p53 to stressCell Reports 8:514–527https://doi.org/10.1016/j.celrep.2014.06.030
- Single-Cell Transcriptome Analysis of Colon Cancer Cell Response to 5-Fluorouracil-Induced DNADamageCell Reports 32https://doi.org/10.1016/j.celrep.2020.108077
- The thioredoxin system mediates redox-induced cell death in human colon cancer cells: implications for the mechanism of action of anticancer agentsCancer Research 68:8269–8277https://doi.org/10.1158/0008-5472.Can-08-2010
- Folic acid modified copper oxide nanoparticles for targeted delivery in in vitro and in vivo systemsRSCAdvances 5:68169–68178https://doi.org/10.1039/C5RA08110F
- In vivo tumor targeting and anti-tumor effects of 5-fluororacil loaded, folic acid targeted quantum dot systemJournal of Colloid and Interface Science 480:146–158https://doi.org/10.1016/j.jcis.2016.07.011
- Mechanisms and mathematical modeling of ROS production by the mitochondrial electron transport chainAmerican Journal of Physiology: Cell Physiology 323:C69–c83https://doi.org/10.1152/ajpcell.00455.2021
- Production of reactive oxygen species in brain mitochondria: contribution by electron transport chain and non-electron transport chain sourcesAntioxidants & Redox Signaling 7:1140–1149https://doi.org/10.1089/ars.2005.7.1140
- Aging: a mitochondrial DNA perspective, critical analysis and an updateWorld Journal of Experimental Medicine 4:46–57https://doi.org/10.5493/wjem.v4.i4.46
- A novel 5-Fluorouracil targeted delivery to colon cancer using folic acid conjugated liposomesBiomedicine & Pharmacotherapy 108:1259–1273https://doi.org/10.1016/j.biopha.2018.09.128
- Ferredoxin reductase affects p53-dependent, 5-fluorouracil-induced apoptosis in colorectal cancer cellsNature Medicine 7:1111–1117https://doi.org/10.1038/nm1001-1111
- Long-term cultivation of colorectal carcinoma cells with anti-cancer drugs induces drug resistance and telomere elongation: an in vitro studyBMCCancer 1https://doi.org/10.1186/1471-2407-1-10
- Human telomerase RNA accumulation in Cajal bodies facilitates telomerase recruitment to telomeres and telomere elongationMolecular Cell 27:882–889https://doi.org/10.1016/j.molcel.2007.07.020
- Telomerase and drug resistance in cancerCellular and Molecular Life Sciences 74:4121–4132https://doi.org/10.1007/s00018-017-2573-2
- Genomic hallmarks and therapeutic implications of G0 cell cycle arrest in cancerGenome Biology 24https://doi.org/10.1186/s13059-023-02963-4
- GSVA: gene set variation analysis for microarray and RNA-seq dataBMCBioinformatics 14https://doi.org/10.1186/1471-2105-14-7
- Transcriptional burst frequency and burst size are equally modulated across the human genomeProceedings of the National Academy of Sciences 109:17454–17459https://doi.org/10.1073/pnas.1213530109
- Single-RNA counting reveals alternative modes of gene expression in yeastNature Structural & Molecular Biology 15:1263–1271https://doi.org/10.1038/nsmb.1514
- Physical biology of the cellGarland Science
- Opening chromatinMolecular Cell 9:209–211https://doi.org/10.1016/s1097-2765(02)00463-x
- Defining mechanisms that regulate RNA polymerase II transcription in vivoNature 461:186–192https://doi.org/10.1038/nature08449
- Nonmodular oscillator and switch based on RNA decay drive regeneration of multimodal gene expressionNucleic Acids Research 50:3693–3708https://doi.org/10.1093/nar/gkac217
- Dynamic imaging of nascent RNA reveals general principles of transcription dynamics and stochastic splice site selectionCell 184:2878–2895https://doi.org/10.1016/j.cell.2021.04.012
- Modeling bursty transcription and splicing with the chemical master equationBiophysical Journal 121:1056–1069https://doi.org/10.1016/j.bpj.2022.02.004
- Coordinated pulses of mRNA and of protein translation or degradation produce EGF-Induced protein burstsCell Reports 18:3129–3142https://doi.org/10.1016/j.celrep.2017.03.014
- Modulation of transcriptional burst frequency by histone acetylationProceedings of the National Academy of Sciences 115:7153–7158https://doi.org/10.1073/pnas.1722330115
- Effect of Interaction between chromatin loops on cell-to-cell variability in gene expressionPLoS Computational Biology 12https://doi.org/10.1371/journal.pcbi.1004917
- Power-law behavior of transcriptional bursting regulated by enhancer-promoter communicationGenome Research 34:106–118https://doi.org/10.1101/gr.278631.123
- 4D nucleome equation predicts gene expression controlled by long-range enhancer-promoter interactionPLoS Computational Biology 19https://doi.org/10.1371/journal.pcbi.1011722
- Recent advances in single-cell sequencing technologiesPrecision Clinical Medicine 5https://doi.org/10.1093/pcmedi/pbac002
- Multi-omics integration in the age of million single-cell dataNature Reviews Neurology 17:710–724https://doi.org/10.1038/s41581-021-00463-x
- Markovian modeling of gene-product synthesisTheoretical Population Biology 48:222–234https://doi.org/10.1006/tpbi.1995.1027
- Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing dataGenome Biology 14https://doi.org/10.1186/gb-2013-14-1-r7
- Simulating non-Markovian stochastic processesPhysical Review E: Statistical Nonlinear and Soft Matter Physics 90https://doi.org/10.1103/PhysRevE.90.042108
- A Gillespie algorithm for non-Markovian stochastic processesSIAMReview 60:95–115https://doi.org/10.1137/16M1055876
- Learning data-driven discretizations for partial differential equationsProceedings of the National Academy of Sciences 116:15344–15349https://doi.org/10.1073/pnas.1814058116
- A general and flexible method for signal extraction from single-cell RNA-seq dataNature Communications 9https://doi.org/10.1038/s41467-017-02554-5
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2Genome Biology 15https://doi.org/10.1186/s13059-014-0550-8
- Adam: a method for stochastic optimizationarXiv https://doi.org/10.48550/arXiv.1412.6980
- Deep learningMIT press
- Understanding the difficulty of training deep feedforward neural networksProceedings of the thirteenth international conference on artificial intelligence and statistics :249–256
- On large-batch training for deep learning: generalization gap and sharp minimaarXiv https://doi.org/10.48550/arXiv.1609.04836
- Gene expression distribution deconvolution in single-cell RNA sequencingProceedings of the National Academy of Sciences 115:E6437–E6446https://doi.org/10.1073/pnas.1721085115
Article and author information
Author information
Version history
- Sent for peer review:
- Preprint posted:
- Reviewed Preprint version 1:
Copyright
© 2024, Huang et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.