Deep learning linking mechanistic models to single-cell transcriptomics data reveals transcriptional bursting in response to DNA damage

  1. Zhiwei Huang
  2. Songhao Luo
  3. Zihao Wang
  4. Zhenquan Zhang
  5. Benyuan Jiang  Is a corresponding author
  6. Qing Nie  Is a corresponding author
  7. Jiajun Zhang  Is a corresponding author
  1. Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, China
  2. School of Mathematics, Sun Yat-sen University, China
  3. Department of Mathematics, University of California Irvine, United States
  4. School of Mathematics and Statistics, Guangdong University of Technology, China
  5. Guangdong Lung Cancer Institute, Guangdong Provincial People’s Hospital and Guangdong Academy of Medical Sciences, China
6 figures and 1 additional file

Figures

Overview of the DeepTX framework.

(A) DNA-damaging drugs induce DNA double-strand damage, which will slow down or even stop the movement of RNA Pol II on the DNA double strands. Changes in the state of RNA Pol II movement will lead to changes in the kinetics of gene expression bursts (including burst frequency and burst size), which will affect cell fate decisions such as apoptosis, differentiation, and survival. (B) The input to the DeepTX framework is scRNA-seq data, and the output is the burst kinetics corresponding to each gene. The core of the DeepTX inference framework is a hierarchical model, which is a mixture of the stationary distribution of the mechanism model solved by the neural network (Step 1) and the binomial distribution followed by the sequencing process. This hierarchical model is used to infer the underlying dynamics parameters corresponding to the data distribution (Step 2).

Figure 2 with 2 supplements
Solving the TXmodel using deep learning.

(A) With the TXmodel and corresponding model parameters θ, we obtain a steady distribution using the stochastic simulation algorithm (SSA) simulation and the moment statistics employing binomial moment theory. These results are compared with the distribution and moment statistics of the neural network’s output to calculate the neural network’s loss value. The parameters of the neural network are optimized using gradient descent until convergence is achieved, indicating attainment of optimal parameters. (B) Comparison between SSA results and DeepTX prediction results for four sets of test parameters. The gray bars and the red stepped line correspond to the distributions obtained from SSA simulations and DeepTX predictions, respectively. (C) Verification results of the moment statistics predicted by DeepTX and the true moments on a test set containing 1000 elements. (D) The loss curve during DeepTX training. The blue curve represents that the loss function is composed of Kullback–Leibler (KL) divergence of distributions, and the red curve represents that the loss function is composed of KL divergence of distributions and statistics. (E) Box plots of the Hellinger distance between the true distribution and the predicted distribution by DeepTX of different loss types on the test set.

Figure 2—figure supplement 1
Network architecture of DeepTX.

(a) The input of the neural network is the parameter of the mechanism model, and the output is the parameter of the mixed negative binomial distribution. The neural network has one or more hidden layers. The layers are connected through the weight parameter w. Specifically, the activation functions corresponding to the hidden layer and the output layer are marked at the bottom of the layer. (b) Schematic diagram of mixed negative binomial distribution.

Figure 2—figure supplement 2
Hyperparameter tuning for DeepTX.

(a) Hyperparameter tuning of neuron number. (b) Hyperparameter tuning of neuron hidden layer architecture. (c) Hyperparameter tuning of the number of dataset size. (d) Hyperparameter tuning of the mixture components.

Figure 3 with 4 supplements
Inferring TXmodel from scRNA-seq data.

(A) The dynamic parameters are sampled from a given prior distribution, and as input to the neural network, the solution of the corresponding dynamic model can be obtained by using this parameter. The solution of the dynamic model is mixed with the binomial distribution to obtain the model observed distribution. Loss values were obtained using the observed distribution of the data obtained from the scRNA-seq data compared to the observed distribution of the model. The loss values are optimized until convergence using gradient descent to obtain the parameters of the mechanism model and the posterior distribution of its parameters. (B) The scatters represent the real BS and BF values of the stochastic simulation algorithm (SSA) synthetic set parameters, and the depth of the color represents the error between the inferred BS (BF) and the true BS (BF). (C, D) The blue solid line represents the edge density of the burst kinetics of the model, and the red dashed line represents the true value. (E) The marginal density for the five parameters of the model, where the red dashed line represents the true value. The relative error (RE) quantifies the discrepancy between the true parameter and the peak of the inferred posterior distribution.

Figure 3—figure supplement 1
Precision and robustness of inferred results.

(a) Verification of the Hellinger distance between the distribution obtained by DeepTXsolver and the distribution obtained by stochastic simulation algorithm (SSA) simulation of the test set. (b, c) Correlation scatterplot of inferred and theoretical BS (BF). (d, e) Box plots of BF and BS obtained by inferring from different cell numbers, where the red dashed line corresponds to the true parameter.

Figure 3—figure supplement 2
Posterior marginal distributions of TXmodel parameters under an alternative setting.

(a–e) Posterior marginal densities for the five kinetic parameters inferred by DeepTX on synthetic data. The red dashed line indicates the ground-truth value used to generate the data.

Figure 3—figure supplement 3
Comparison of inferred burst dynamics under different capture efficiencies.

(a, b) Scatter plots comparing the inferred BS and BF between capture efficiencies of 0.3 and 0.5. (c, d) Scatter plots comparing the inferred BS and BF between capture efficiencies of 0.2 and 0.5.

Figure 3—figure supplement 4
Comparison of inference accuracy and efficiency between DeepTX and other algorithms.

(a) Hellinger distance for accuracy. (b) Average inference time per sample (seconds) for efficiency.

Figure 4 with 2 supplements
Burst size enhancement is linked to a delay in the cell cycle.

(A) Density map of BS of genes treated with 5′-iodo-2′-deoxyuridine (IdU) drugs compared to BS of genes not treated with drugs. Green dots represent upregulated BS differential genes. (B) Schematic representation of the simultaneous enhancement of both BS and noise of a gene accompanied by the mean value of the gene remaining unchanged after IdU drug treatment. In the log–log–log 3D space, the mean expression level lies on a blue plane defined by log(BS)+log(BF)=log(Mean), indicating that it is determined by the product of burst size and burst frequency. The orange plane captures a scaling relationship between expression noise and burst kinetics, described by log(BS)+log(BF)=κlog(Noise), where κ is a constant reflecting their covariation. Notably, the trajectory of the green sphere shows that, under a fixed mean expression level (i.e., confined to the blue plane), increased gene expression noise primarily results from an increase in burst size. (C) The schematic diagram illustrates the hypothesis that DNA damage induced by IdU treatment may affect gene transcription, potentially influencing cell cycle transitions during mitosis and regulating cellular differentiation. (D) Gene GO enrichment analysis is performed on the green dots of BS in (A) to obtain enrichment pathway diagram. The bold red pathways highlight the key components involved in the cellular differentiation mechanism. (E, F) Pathways obtained by GSEA enrichment analysis of BS of genes.

Figure 4—figure supplement 1
Inference results of scRNA-seq data treated with DMSO and 5′-iodo-2′-deoxyuridine (IdU).

(a, b) Comparative analysis of the mean and variance between the inferred distribution and the distribution of control scRNA-seq data. (c) The goodness-of-fit test is demonstrated for all inferred genes, with the histogram displaying gene counts exhibiting good fit (blue) and poor fit (red) relative to mean expression levels. (d–f) Comparison between mRNA distributions and DeepTX-inferred distributions for scRNA-seq data of three genes. The gray bars and the red stepped lines correspond to the distributions derived from scRNA-seq data and inferred by DeepTX, respectively. (g–l) The result of IdU treatment scRNA-seq data, similar to (a–f).

Figure 4—figure supplement 2
Analysis results of scRNA-seq data treated with 5′-iodo-2′-deoxyuridine (IdU) and DMSO.

(a–c) Mean, variance, and CV analysis results of IdU treatment and control scRNA-seq data, the blue scatter represents genes. (d) The number of intersections of genes upregulated by BF (BS) and genes downregulated by BF (BS). (e) Red dots represent genes significantly downregulated by BS, and green dots represent genes significantly upregulated by BS. (f) Red dots represent genes significantly downregulated by BF, and green dots represent genes significantly upregulated by BF. (g) Enrichment analysis results of BS downregulated differential genes.

Figure 5 with 2 supplements
Low-dose 5-fluorouracil (5FU) treatment is linked to increased burst frequency and cell apoptosis.

(A) Comparison of gene bursting kinetics and statistics between control and 10-dose 5FU treated cells. (B) Density map of inferred burst size. (C) Volcano map of the fold changes (FC) of inferred burst size between control and 10-dose 5FU treated cells. (D) The diagram hypothesizing the potential association between low-dose 5FU treatment and the induction of apoptosis. (E) The enrichment analysis results were derived from genes that were downregulated in BS and upregulated in BF in the experimental group. The bold red pathways highlight the key components involved in the cellular apoptotic mechanism.

Figure 5—figure supplement 1
Inference results of scRNA-seq data treated with low-dose 5-fluorouracil (5FU) and without 5FU treatment.

(a, b) Comparative analysis of the mean and variance between the inferred distribution and the distribution of control scRNA-seq data. (c) The goodness-of-fit test is demonstrated for all inferred genes, with the histogram displaying gene counts exhibiting good fit (blue) and poor fit (red) relative to mean expression levels. (d–f) Comparison between mRNA distribution and inferred distribution by DeepTX for scRNA-seq data of three genes. The gray bars and the red stepped line correspond to the distributions derived from scRNA-seq data and inferred by DeepTX, respectively. (g–l) The result of low-dose 5FU treatment scRNA-seq data, similar to (a–f).

Figure 5—figure supplement 2
Analysis results of scRNA-seq data treated with low-dose 5-fluorouracil (5FU) and without 5FU treatment.

(a–c) Mean, variance, and CV analysis results of low-dose 5FU treatment and control scRNA-seq data, each blue scatter represents a gene. (d) The number of intersections of genes upregulated by BF (BS) and genes downregulated by BF (BS). (e) Red dots represent genes significantly downregulated by BF, and green dots represent genes significantly upregulated by BF. (f) Red dots represent genes significantly downregulated by BS, and green dots represent genes significantly upregulated by BS. (g) Enrichment analysis results of BS upregulated differential genes.

Figure 6 with 2 supplements
High-dose 5-fluorouracil (5FU) treatment may promote cell drug resistance by altering burst frequency.

(A) Density map of inferred burst size. (B) Volcano map of the fold changes (FC) of inferred burst size between control and 50-dose 5FU treated cells. (C) Schematic diagram hypothesizing the mechanism by which 5FU drug treatment may induce cells to acquire antioxidant properties, potentially enabling continued survival. (D) The enrichment analysis results were derived from genes that were downregulated in BS and upregulated in BF in the experimental group. (E) UMAP of G0 arrest quality score (QS) of control and 50-dose 5FU treated cells. (F) Proportion of G0 arrested (drug-resistant) and cycling (non-resistant) cells. (G) Comparison of gene bursting kinetics and statistics of G0 arrested genes between control and 50-dose 5FU treated cells. The light gray lines indicate the trend of changes in the corresponding bursting kinetics values, mean, and variance for each gene.

Figure 6—figure supplement 1
Inference results of scRNA-seq data treated with high-dose 5-fluorouracil (5FU) and without 5FU treatment.

(a, b) Comparative analysis of the mean and variance between the inferred distribution and the distribution of control scRNA-seq data. (c) The goodness-of-fit test is demonstrated for all inferred genes, with the histogram displaying gene counts exhibiting good fit (blue) and poor fit (red) relative to mean expression levels. (d–f) Comparison between mRNA distributions and DeepTX-inferred distributions for scRNA-seq data of three genes. The gray bars and the red stepped line correspond to the distributions derived from scRNA-seq data and inferred by DeepTX, respectively. (g–l) The result of high-dose 5FU treatment scRNA-seq data, similar to (a–f).

Figure 6—figure supplement 2
Analysis results of scRNA-seq data treated with high-dose 5-fluorouracil (5FU) and without 5FU treatment.

(a–c) Mean, variance, and CV analysis results of high-dose 5FU treatment and control scRNA-seq data, each blue scatter represents a gene. (d) The number of intersections of genes upregulated by BF (BS) and genes downregulated by BF (BS). (e) Red dots represent genes significantly downregulated by BF, and green dots represent genes significantly upregulated by BF. (f) Red dots represent genes significantly downregulated by BS, and green dots represent genes significantly upregulated by BS. (g) Enrichment analysis results of BS upregulated differential genes.

Additional files

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Zhiwei Huang
  2. Songhao Luo
  3. Zihao Wang
  4. Zhenquan Zhang
  5. Benyuan Jiang
  6. Qing Nie
  7. Jiajun Zhang
(2026)
Deep learning linking mechanistic models to single-cell transcriptomics data reveals transcriptional bursting in response to DNA damage
eLife 13:RP100623.
https://doi.org/10.7554/eLife.100623.4