# Pathway dynamics can delineate the sources of transcriptional noise in gene expression

1. School of BioSciences, University of Melbourne, Australia
2. Department of Mathematics and Statistics, La Trobe University, Australia
3. School of Mathematics and Statistics, University of Melbourne, Australia
9 figures, 12 tables and 3 additional files

## Figures

Figure 1 Modeling the effects of both intrinsic and extrinsic noise. (A) A schematic of the Telegraph process, with nodes A (active) and I (inactive) representing the state of the gene. Transitions between the states A and I occur stochastically at rates μ and λ, respectively. The parameter K is the mRNA transcription rate, and δ is the degradation rate. (B) The compound model incorporates extrinsic noise by assuming that parameters θ of the Telegraph model vary across an ensemble of cells, according to some probability distribution f(θ;η). (C) Variation in the parameters across the cell population leads to greater variability in the mRNA copy number distribution.
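The compound model in panels (B) and (C) can be illustrated with a minimal sketch, assuming a standard Gillespie simulation of the Telegraph process with λ taken as the on-rate (I to A) and μ as the off-rate (A to I). The function names, the Gamma distribution on K, and all numerical values are illustrative assumptions, not the authors' code or parameter choices.

```python
import random
import statistics


def simulate_telegraph(lam, mu, K, rng, delta=1.0, t_end=10.0):
    """Gillespie simulation of one cell; returns the mRNA count at time t_end.

    lam: I -> A switching rate, mu: A -> I switching rate,
    K: transcription rate while active, delta: mRNA degradation rate.
    """
    active, n, t = False, 0, 0.0
    while True:
        switch = mu if active else lam      # leave the current gene state
        prod = K if active else 0.0         # transcription only when active
        deg = delta * n                     # first-order degradation
        total = switch + prod + deg
        t += rng.expovariate(total)         # waiting time to the next event
        if t > t_end:
            return n
        u = rng.random() * total            # choose which event fires
        if u < switch:
            active = not active
        elif u < switch + prod:
            n += 1
        else:
            n -= 1


def sample_population(n_cells, draw_K, seed, lam=2.0, mu=3.0):
    """Sample one mRNA count per cell; draw_K(rng) supplies each cell's K."""
    rng = random.Random(seed)
    return [simulate_telegraph(lam, mu, draw_K(rng), rng) for _ in range(n_cells)]


if __name__ == "__main__":
    # Fixed K (intrinsic noise only) versus K ~ Gamma with the same mean
    # (hypothetical extrinsic noise distribution, chosen for illustration).
    fixed = sample_population(1000, lambda rng: 20.0, seed=0)
    compound = sample_population(1000, lambda rng: rng.gammavariate(4.0, 5.0), seed=1)
    print("fixed:    mean", statistics.mean(fixed), "var", statistics.pvariance(fixed))
    print("compound: mean", statistics.mean(compound), "var", statistics.pvariance(compound))
```

Both populations share the same mean transcription rate, so any extra spread in the compound sample reflects the cell-to-cell variation in K rather than a change in average expression.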
Figure 2 Accuracy of our integral representations for the Telegraph and negative binomial distributions. (A) For each of the results in (3)-(5), we compare the (fixed-parameter) Telegraph and negative binomial distributions with their respective compound representations for two different sets of parameter values. The top panel (pink) shows comparisons for (3), with parameter values (left) λ=2, μ′=12, K′=100, μ=3, and K∼BetaK′(5,9), and (right) λ=1, μ′=20, K′=100, μ=2 and K∼BetaK′(3,18). The middle panel (green) gives comparisons for (4), with parameter values (left) λ=10, β=2, μ=2 and K∼Gamma(12,2) and (right) λ=1, β=1, μ=2 and K∼Gamma(3,1). The bottom panel (coral) gives comparisons for (5). The parameter values (left) are λ′=10, λ=15 and c=2 and (right) are λ′=2, λ=5 and c=3. (B) The top figure compares a Telegraph(2,4,60) distribution with samples from a compound Telegraph distribution with normal noise Norm(37,10) on the transcription rate parameter. The middle figure compares a NegBin(5,0.5) distribution with samples from a compound Telegraph distribution with normal noise Norm(5.5,2.3) on the transcription rate parameter. The bottom figure compares a NegBin(5,1) distribution with samples from a compound negative binomial distribution with normal noise Norm(2.3,0.6) on the burst intensity parameter.
Figure 3 A comparison of joint distributions in the case of moderate extrinsic noise and no extrinsic noise. The plots are generated from a three-stage model of gene transcription, incorporating the production of nascent mRNA, mature mRNA and protein. Details of the model can be found in Figure 4 (model 𝐌4) and the associated text. The top panel shows nascent-mature, nascent-protein and mature-protein joint distributions in the case of extrinsic noise, while the bottom panel displays the corresponding plots in the case of no extrinsic noise. Extrinsic noise produces a visibly more correlated joint distribution, which forms the basis of the pathway-reporter method.
Figure 4 Stochastic models of gene expression. (A) The model 𝐌1 is the simplest model of mRNA maturation. Here, nascent (unspliced) mRNA are shown in red/blue wavy lines; the blue segments represent introns and the red segments represent the exons. Nascent mRNA are synthesised at the rate KN, and spliced into mature mRNA (red wavy lines) at rate KM. Degradation of the mature mRNA occurs at rate δM. The model 𝐌2 is the well-known two-stage model of gene expression. The model 𝐌3 is the extension of the two-stage model to include promoter switching. The nodes A (active) and I (inactive) represent the state of the gene, with transitions between states occurring at rates λ and μ. The remaining parameters are the same as those in the model 𝐌2. The model 𝐌4 extends the model 𝐌3 by incorporating mRNA maturation. Here, KN is the transcription rate parameter, and KM is the maturation rate. All other parameters are the same as in 𝐌3. (B) Time series simulation of the copy number and activity state of a gene modelled by 𝐌4. For ease of visualisation, the parameters were artificially chosen as λ=2, μ=2.5, KN=40, KM=4, Kp=4 and δp=1, with all parameters scaled relative to δm=1. (C) As λ approaches 0, we see a higher correlation in the copy numbers of nascent mRNA, mature mRNA and protein. Again, the parameters are artificially chosen to be λ=0.1, μ=2.5, KN=80, KM=4, Kp=4 and δp=1, with all parameters scaled relative to δm=1.
Figure 5 with 1 supplement Heatmaps for the intrinsic contribution to the covariance. These heatmaps estimate the level of overshoot in the pathway-reporter approach for the nascent-protein and mature-protein reporters; blue regions show an overshoot of less than ≃0.05. Here, the intrinsic contribution is calculated using stochastic simulations of the model 𝐌4. For the mature-protein and nascent-protein reporters, we consider three different values of the parameter μ, specifically μ=2, μ=10, and μ=20. In all cases, the parameter δp and the on-rate λ are varied between 0.01 and 0.5, and 0.5 and 5, respectively. The parameters of the model 𝐌4 are scaled so that δM=1. The maturation rate is fixed at 20, with the parameters KN and KP chosen to produce a mean protein level of 1000, a mean nascent mRNA level of 5 and a mean mature mRNA level of 50. Each individual pixel is generated from a sample of size 3000, although there is still some instability in the convergence for the nascent-protein reporter, particularly as the overshoot estimate starts to increase and as μ becomes larger. To produce more accurate values, the case of μ=2 was averaged over two full experiments while μ=20 was averaged over three. This was also done for the mature-protein reporter; however, for these images there was almost no visible difference between the various runs of the experiment and their averages. Each of the three μ values takes approximately 7–10 hr of computation, depending on the lead-in time before sampling within a simulation. Figure 5—figure supplement 1 gives a heatmap for the overshoot in the pathway-reporter approach for nascent-mature pathway reporters.
Figure 5—figure supplement 1 Heatmap for the intrinsic contribution to the covariance for nascent-mature pathway reporters. The nascent-mature reporter concerns only mRNA and so is independent of all protein-related parameters. The heatmap shows the intrinsic contribution for values of λ and μ between 0.1 and 20, with the same parameter selections for KN, KM as in Figure 5 of the main text. Similar simulations for average nascent mRNA levels of 3 and of 8, and mature mRNA levels of 30 and of 160 produced almost identical heatmaps.
Figure 6 Multiscale model of transcriptional bursting with additional features of the cell cycle. In this model, the gene stochastically switches between three states: two active states, S10 and S11, and one inactive state S0. Gene activation occurs in two steps, initially by the binding of transcription factors (at rate λ1, reversible at rate μ1), and then as a secondary step by the binding and pause of the RNA polymerase (at rate λ2). Transitions from S11 to S0 also occur at rate μ1, due to detachment of both the transcription factors and polymerase. Transcription of nascent mRNA (at rate KN) occurs only in state S11 and results in immediate transition to state S10. Nascent mRNA mature at rate KM, and are subsequently translated into protein at rate Kp. Degradation of mRNA and protein occurs with rates δm and δp, respectively. We verify our pathway reporter method on three variations of the multiscale model. First, we assume all reactions are first-order Poisson processes (Case (2) in the main text). We then incorporate further details of the mRNA maturation process, where maturation of nascent mRNA occurs after a fixed amount of time (Case (3)). Finally, we incorporate features of the cell cycle such as gene replication, dosage compensation, cell division, and cell-cycle length variability, as well as incorporating more realistic Erlang-distributed maturation times (Case (4)).
Appendix 4—figure 1 Comparison of convergence of η² estimates for low and high mRNA levels by way of nascent-mature reporters and mature-mature reporters. Low level corresponds to mean nascent mRNA level of 0.5, and mean mature mRNA level of 5. High level corresponds to mean nascent mRNA level of 5, and mean mature mRNA level of 50. In both cases, the simulated genes are constitutive and noise is on all parameters except for δM=1. The green line gives the squared coefficient of variation for KN, set to 0.2, which is the value the various reporters are expected to estimate. (A) Convergence of the η² estimate over the first 2000 samples in the low- and high-output genes. (B) Convergence of the η² estimate over 100,000 samples in the low-output gene only.
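The nascent-mature estimate described in this caption can be sketched as follows, assuming the reporter is the covariance of the two copy numbers normalised by the product of their means. The sketch further assumes that, for a constitutive gene with fixed parameters, nascent and mature counts are independent Poisson variables at steady state with means KN/KM and KN/δM, and that KN and KM carry independent Gamma-distributed noise; these distributions, the function names, and the mean values are illustrative assumptions rather than the authors' simulation settings.

```python
import math
import random


def poisson(rate, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    limit = math.exp(-rate)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def gamma_with_cv2(mean, cv2, rng):
    """Gamma variate with the given mean and squared coefficient of variation."""
    shape = 1.0 / cv2
    return rng.gammavariate(shape, mean / shape)


def sample_cell(rng, mean_KN=50.0, mean_KM=10.0, delta_M=1.0):
    """One cell of a hypothetical high-output constitutive gene.

    Extrinsic noise: KN has squared CV 0.2 (as in the caption); KM has an
    assumed squared CV of 0.1; delta_M is held fixed at 1.
    """
    KN = gamma_with_cv2(mean_KN, 0.2, rng)
    KM = gamma_with_cv2(mean_KM, 0.1, rng)
    nascent = poisson(KN / KM, rng)      # steady-state nascent count given KN, KM
    mature = poisson(KN / delta_M, rng)  # steady-state mature count given KN
    return nascent, mature


def pathway_reporter(pairs):
    """Normalised covariance Cov(X, Y) / (E[X] E[Y]) of paired counts."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    return cov / (mean_x * mean_y)


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = [sample_cell(rng) for _ in range(20000)]
    print("pathway-reporter estimate of the squared CV of KN:", pathway_reporter(pairs))
```

Under these assumptions the independent noise on KM cancels in the ratio, so the estimate should approach the squared coefficient of variation of KN (0.2 here) as the sample size grows.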
Appendix 4—figure 2 Comparison of convergence for low and high mRNA levels by way of mature-protein and mature-mature reporters. Low level corresponds to mean nascent mRNA level of 0.5, and mean mature mRNA level of 5. High level corresponds to mean nascent mRNA level of 5, and mean mature mRNA level of 50. In both cases the simulated genes are constitutive and noise is on all parameters except for δM=1. The noise on KN has squared coefficient of variation equal to 0.2, which is shown as the red horizontal line. Our theory shows that mature-protein reporters will return an overshoot that is negligible in the high-output gene (the blue horizontal line), but larger in the low-output gene (light blue horizontal line); these values are calculated in the text. (A) Comparison of convergence for low and high mRNA levels over the first 2000 samples. (B) Convergence of the η² estimate over 20,000 samples in the case of the low-output gene only. Two examples of each are given, to show the variation in behaviour.
Appendix 4—figure 3 Convergence for reporter pairs for low gene activity. In each case, the mean nascent mRNA level is 0.5, the mean mature mRNA level is 5, and the mean protein level is 500. The simulated genes are constitutive and noise is on all parameters except for δM=1. Each graph shows the convergence of 600 individual reporter simulations, for each combination of reporters from nascent mRNA, mature mRNA and protein. Each reporter simulation is from 10,000 samples, with the reporter estimates calculated at intervals of 100. The noise on KN has squared coefficient of variation equal to 0.2, which should be identified by both the nascent-mature reporter and mature-mature dual reporter. As in Figure Convergence of Pathway and Dual Reporters, the mature-protein reporter should converge to an estimate of approximately 0.2315. Nascent-nascent and protein-protein reporters identify combined noise on more parameters, so do not converge to 0.2. The lower graph shows each of nascent-mature, mature-mature, mature-protein and nascent-protein in the same plot for direct comparison.

## Tables

Table 1
Table 2
###### Table 2—source data 1

This is an Excel spreadsheet containing the data used to produce the final values in Table 2.

https://cdn.elifesciences.org/articles/69324/elife-69324-table2-data1-v2.xlsx
Table 3
###### Table 3—source data 1

This is an Excel spreadsheet containing the data used to produce the final values in Table 3.

https://cdn.elifesciences.org/articles/69324/elife-69324-table3-data1-v2.xlsx
Table 4
###### Table 4—source data 1

This is an Excel spreadsheet containing the data used to produce the final values in Table 4.

https://cdn.elifesciences.org/articles/69324/elife-69324-table4-data1-v2.xlsx
Appendix 5—table 1
Appendix 5—table 2
Appendix 5—table 3
Appendix 5—table 4
Appendix 5—table 5
Appendix 5—table 6
Appendix 5—table 7
Appendix 5—table 8

###### Supplementary file 1

Simulation results of the pathway-reporter method for constitutive genes across 60 different parameter values.

We consider noise on all of the parameters except mRNA decay in a constitutive model with mRNA maturation and protein translation. Refer to the Excel spreadsheet ConstitutiveaResults.xlsx for full details of the simulation, including the chosen noise distributions and parameters.

https://cdn.elifesciences.org/articles/69324/elife-69324-supp1-v2.xlsx
###### Supplementary file 2

Simulation results for the overshoot estimate in the pathway-reporter method for bursty genes across 448 different parameter values.

Refer to the Excel spreadsheet NoiseFreeaResults.xlsx for full details of the simulation, including the chosen noise distributions and parameters.

https://cdn.elifesciences.org/articles/69324/elife-69324-supp2-v2.xlsx
###### Transparent reporting form
https://cdn.elifesciences.org/articles/69324/elife-69324-transrepform-v2.pdf
