Pathway dynamics can delineate the sources of transcriptional noise in gene expression

  1. Lucy Ham  Is a corresponding author
  2. Marcel Jackson
  3. Michael PH Stumpf  Is a corresponding author
  1. School of BioSciences, University of Melbourne, Australia
  2. Department of Mathematics and Statistics, La Trobe University, Australia
  3. School of Mathematics and Statistics, University of Melbourne, Australia
9 figures, 12 tables and 3 additional files

Figures

Modeling the effects of both intrinsic and extrinsic noise.

(A) A schematic of the Telegraph process, with nodes A (active) and I (inactive) representing the state of the gene. Transitions between the states A and I occur stochastically at rates μ and λ, respectively. The parameter K is the mRNA transcription rate, and δ is the degradation rate. (B) The compound model incorporates extrinsic noise by assuming that parameters θ of the Telegraph model vary across an ensemble of cells, according to some probability distribution f(θ;η). (C) Variation in the parameters across the cell population leads to greater variability in the mRNA copy number distribution.

Accuracy of our integral representations for the Telegraph and negative binomial distribution.

(A) For each of the results in (3 - 5), we compare the (fixed-parameter) Telegraph and negative binomial distributions with their respective compound representations for two different sets of parameter values. The top panel (pink) shows comparisons for (3), with parameter values (left) λ=2, μ=12, K=100, μ=3, and KBetaK(5,9), and (right) λ=1, μ=20, K=100, μ=2 and KBetaK(3,18). The middle panel (green) gives comparisons for (4), with parameter values (left) λ=10, β=2, μ=2 and KGamma(12,2) and (right) λ=1, β=1, μ=2 and KGamma(3,1). The bottom panel (coral) gives comparisons for (5). The parameter values (left) are λ=10, λ=15 and c=2 and (right) are λ=2, λ=5 and c=3. (B) The top figure compares a Telegraph(2,4,60) distribution with samples from a compound Telegraph distribution with normal noise Norm(37,10) on the transcription rate parameter. The middle figure compares a NegBin(5,0.5) with samples from a compound Telegraph distribution with normal noise Norm(5.5,2.3) on the transcription rate parameter. The bottom figure compares a NegBin(5,1) distribution with samples from a compound negative binomial distribution with normal noise Norm(2.3,0.6) on the burst intensity parameter.

A comparison of joint distributions in the case of moderate extrinsic noise and no extrinsic noise.

The plots are generated from a three-stage model of gene transcription, incorporating the production of nascent mRNA, mature mRNA and protein. Details of the model can be found in Figure 4 (model 𝐌4) and the associated text. The top panel shows nascent-mature, nascent-protein and mature-protein joint distributions in the case of extrinsic noise, while the bottom panel displays the corresponding plots in the case of no extrinsic noise. Extrinsic noise produces a visibly more correlated joint distribution, which forms the basis of the pathway-reporter method.

Stochastic models of gene expression.

(A) The model 𝐌1 is the simplest model of mRNA maturation. Here, nascent (unspliced) mRNA are shown in red/blue wavy lines; the blue segments represent introns and the red segments represent the exons. Nascent mRNA are synthesised at the rate KN, and spliced into mature mRNA (blue wavy lines) at rate KM. Degradation of the mature mRNA occurs at rate δM. The model 𝐌2 is the well-known two-stage model of gene expression. The model 𝐌3 is the extension of the two-stage model to include promoter switching. The nodes A (active) and I (inactive) represent the state of the gene, with transitions between states occurring at rates λ and μ. The remaining parameters are the same as those in the model 𝐌2. The model 𝐌4 extends the model 𝐌3 by incorporating mRNA maturation. Here, KN is the transcription rate parameter, and KM is the maturation rate. All other parameters are the same as in 𝐌3. (B) Time series simulation of the copy number and activity state of a gene modelled by 𝐌4. For ease of visualisation, the parameters were artificially chosen as λ=2, μ=2.5, KN=40, KM=4, Kp=4 and δp=1, with all parameters scaled relative to δm=1. (C) As λ approaches 0, we see a higher correlation in the copy numbers of nascent mRNA, mature mRNA and protein. Again, the parameters are artificially chosen to be λ=0.1, μ=2.5, KN=80, KM=4, Kp=4 and δp=1, with all parameters scaled relative to δm=1.

Figure 5 with 1 supplement
Heatmaps for the intrinsic contribution to the covariance.

These heatmaps estimate the level of overshoot in the pathway-reporter approach for the nascent-protein and mature-protein reporters; blue regions show an overshoot of less than 0.05.

Here, the intrinsic contribution is calculated using stochastic simulations of the model 𝐌4. For the mature-protein and nascent-protein reporters, we consider three different values of the parameter μ, specifically μ=2, μ=10, and μ=20. In all cases, the parameter δp and the on-rate λ are varied between 0.01 and 0.5, and 0.5 and 5, respectively. The parameters of the model 𝐌4 are scaled so that δM=1. The maturation rate is fixed at 20, with the parameters KN and KP chosen to produce a mean protein level of 1000, a mean nascent mRNA level of 5 and a mean mature mRNA level of 50. Each individual pixel is generated from a sample of size 3000, although there is still some instability in the convergence for the nascent-protein reporter, particularly as the overshoot estimation starts to increase, and particularly as μ is larger. To produce more accurate values, the case of μ=2 was averaged over two full experiments while μ=20 was averaged over three. This was also done for the mature-protein reporter, however for these images there was almost no visible difference between the various runs of the experiment and their averages. Each of the three μ values takes approximately 7–10 hr of computation, depending on lead in time before sampling within a simulation. Figure 5—figure supplement 1 gives a heatmap for the overshoot in the pathway-reporter approach for nascent-mature pathway reporters.

Figure 5—figure supplement 1
Heatmap for the intrinsic contribution to the covariance for nascent-mature pathway reporters.

The nascent-mature reporter concerns only mRNA and so is independent of all protein-related parameters. The heatmap shows the intrinsic contribution for values of λ and μ between 0.1 and 20, with the same parameter selections for KN, KM as in Figure 5 of the main text. Similar simulations for average nascent mRNA levels of 3 and of 8, and mature mRNA levels of 30 and of 160 produced almost identical heatmaps.

Multiscale model of transcriptional bursting with additional features of the cell cycle.

In this model, the gene stochastically switches between three states: two active states, S10 and S11, and one inactive state S0. Gene activation occurs in two steps, initially by the binding of transcription factors (at rate λ1, reversible at rate μ1), and then as a secondary step by the binding and pause of the mRNA polymerase (at rate λ2). Transitions from S11 to S0 also occur at rate μ1, due to detachment of both the transcriptional factors and polymerase. Transcription of nascent mRNA (at rate KN) occurs only in state S11 and results in immediate transition to state S10. Nascent mRNA mature at rate KM, and are subsequently translated into protein at rate Kp. Degradation of mRNA and protein occur with rates δm and δp, respectively. We verify our pathway reporter method on three variations of the multiscale model. First, we assume all reactions are first-order Poisson processes (Case (2) in the main text). We then incorporate further details of the mRNA maturation process, where nascent mRNA occurs after a fixed amount of time (Case (3)). Finally, we incorporate features of the cell-cycle such as gene replication, dosage compensation, cell division, and cell-cycle length variability, as well as incorporating more realistic Erlang distributed maturation times (Case (4)).

Appendix 4—figure 1
Comparison of convergence of η2 estimates for low and high mRNA levels by way of nascent-mature reporters and mature-mature reporters.

Low level corresponds to mean nascent mRNA level of 0.5, and mean mature mRNA level of 5. High level corresponds to mean nascent mRNA level of 5, and mean mature mRNA level of 50. In both cases, the simulated genes are constitutive and noise is on all parameters except for δM=1. The green line gives the squared coefficient of variation for KN, set to 0.2, which is the value the various reporters are expected to estimate. (A) Convergence of the η2 estimate over the first 2000 samples in the low- and high-output genes. (B) Convergence of the η2 estimate over 100,000 samples in the low-output gene only.

Appendix 4—figure 2
Comparison of convergence for low and high mRNA levels by way of mature-protein and mature-mature reporters.

Low level corresponds to mean nascent mRNA level of 0.5, and mean mature mRNA level of 5. High level corresponds to mean nascent mRNA level of 5, and mean mature mRNA level of 50. In both cases the simulated genes are constitutive and noise is on all parameters except for δM=1. The noise on KN has squared coefficient of variation equal to 0.2, which is shown as the red horizontal line. Our theory shows that mature-protein reporters will return an overshoot that is negligible in the high-output gene (the blue horizontal line), but larger in the low output gene (light blue horizontal line); these values are calculated in the text. (A) Comparison of convergence for low and high mRNA levels over the first 2000 samples. (B) Convergence of the η2 estimate over 20,000 samples in the case of the low-output gene only. Two examples of each are given, to show the variation in behaviour.

Appendix 4—figure 3
Convergence for reporter pairs for low gene activity.

In each case, the mean nascent mRNA level is 0.5, the mean mature mRNA level of 5, and the mean protein level is 500. The simulated genes are constitutive and noise is on all parameters except for δM=1. Each graph shows the convergence of 600 individual reporter simulations, for each combination of reporters from nascent mRNA, mature mRNA and protein. Each reporter simulation is from 10,000 samples, with the reporter estimates calculated at intervals of 100. The noise on KN has squared coefficient of variation equal to 0.2, which should be identified by both the nascent-mature reporter and mature-mature dual reporter. As in Figure Convergence of Pathway and Dual Reporters, the mature-protein reporter should converge to an estimate of approximately 0.2315. Nascent-nascent and protein-protein reporters identify combined noise on more parameters, so do not converge to 0.2. The lower graph shows each of nascent-mature, mature-mature, mature-protein and nascent-protein in the same plot for direct comparison.

Tables

Table 1
Summary of the non-identifiability results.

in lines 1, 3, and 5 are our contributions, while the remaining representations (lines 2 and 4) are known and can be obtained as special cases of our results. Note that here we use Tele(λ,μ,K) to denote a Telegraph distribution with parameters λ,μ,K. In lines 3 and 4, the parameter β>0 can be chosen freely and determines the mean burst intensity in the resulting compound system. In line 5, the parameters b,θ>0 are again mean burst intensities, and b can be chosen freely in the determination of the distribution of θ.

Copy no. distribution q~(n;η)Underlying distribution p~(n;θ)Noise distribution f(θ;η)
Tele(λ,μ,K)Tele(λ,μ,K)KBetaK(λ+μ,μμ)
Tele(λ,μ,K)Pois(K)KBetaK(λ,μ)
NegBin(λ,ββ+1)Tele(λ,μ,K)KGamma(λ+μ,β)
NegBin(λ,ββ+1)Pois(K)KGamma(λ,β)
NegBin(λ,bb+1)NegBin(λ,θθ+1)θBetaPrimeb(λλ,λ)
Table 2
A comparison of the pathway-reporter method and the dual-reporter method for constitutive expression under the model 𝐌1.

Here, PR (NM) gives the results of the nascent and mature pathway reporters, while DR (Mat) gives the results of dual reporters calculated from the mature mRNA. We considered noise on both the transcription rate (KN) and the maturation rate (KM). The decay rate is fixed at one, with the other parameters scaled accordingly. In each case, the maturation rate KM is varied according to a Gamma(8,1.25) distribution, which has coefficient of variation 0.125. The values given are the average of 100 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. Our theory predicts that pathway-reporters will identify the noise on the nascent transcription rate KN (ηext2). The noise distribution parameters are chosen to produce an average nascent mRNA copy number of approximately five and an average mature mRNA copy number of approximately 50.

TheorySimulation
(r)1-2 ηext2Noise (KN)Pr (NM)DR (Mat)
0.00KN=500.00±0.010.00±0.00
0.10Beta133.3˙(6,10.5)0.10±0.010.10±0.01
0.20Gamma(5,10)0.20±0.020.20±0.01
0.50Beta300(1.5,7.5)0.49±0.040.50±0.03
Table 2—source data 1

This is an Excel spreadsheet containing the data used to produce the final values in Table 2.

https://cdn.elifesciences.org/articles/69324/elife-69324-table2-data1-v2.xlsx
Table 3
A comparison of the pathway-reporter method and the dual-reporter method for constitutive expression under the model 𝐌2.

Here PR (MP) gives the results of the mRNA-protien pathway reporters, while DR (Mat) gives the results of dual reporters calculated from the mature mRNA. We considered noise on the transcription rate (Km), the protein synthesis rate (Kp), and the protein decay rate (δp). The mRNA decay rate is fixed at one. In each case, we varied Kp according to a Gamma(5,0.4) distribution and δp according to a Gamma(8,0.125) distribution; the corresponding noise strengths are 0.20 and 0.125, respectively. We considered different noise distributions on Km, which produce a range of noise strengths. The noise distribution parameters are selected to produce a mean mRNA of approximately 50 and a mean number of approximately 1000 proteins in each simulation. The values given are the average of 100 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. As our theory predicts, the mRNA-protein reporters identify the noise on the transcription rate parameter Km (ηext2).

TheorySimulation
(r)1-2 ηext2Noise (Km)Pr (MP)DR (Mat)
0.00Km=500.00±0.010.00 ± 0.00
0.10Beta133.3˙(6,10.5)0.10±0.010.10 ± 0.01
0.20Gamma(5,10)0.20±0.020.20±0.01
0.50Beta300(1.5,7.5)0.51±0.040.50±0.03
Table 3—source data 1

This is an Excel spreadsheet containing the data used to produce the final values in Table 3.

https://cdn.elifesciences.org/articles/69324/elife-69324-table3-data1-v2.xlsx
Table 4
A comparison of the pathway-reporter method and dual-reporter method for bursty expression.

Here PR (NP) gives the results of the nascent and protein pathway reporters, PR (MP) gives the results of the mRNA and protein reporters, while DR (Mat) gives the results of the dual reporters calculated from the mature mRNA. We consider noise on all of the parameters except for δM and KM; see discussion in main text. The values given are the average of 100 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. Our theory predicts that pathway-reporters will identify the noise at both the promoter level (λ,μ) and transcriptional level (KN); the total extrinsic noise in each case is given by ηext2. As before, the noise distribution parameters are chosen to produce an average nascent mRNA copy number of 5 and an average mature mRNA copy number of 50, and an average number of 1000 proteins.

MeanSimulation
(r)1-5 λμKNKPδPPr (MP)Pr (NP)DR (Mat)
0.5115020.10.46±0.060.38±0.070.32±0.07
1215020.10.39±0.050.34±0.070.32±0.05
120105020.10.66±0.150.52±0.220.50±0.15
2210060.30.35±0.040.29±0.050.27±0.03
22055060.30.61±0.090.47±0.150.47±0.09
101010060.30.29±0.030.27±0.040.27±0.02
Table 4—source data 1

This is an Excel spreadsheet containing the data used to produce the final values in Table 4.

https://cdn.elifesciences.org/articles/69324/elife-69324-table4-data1-v2.xlsx
Appendix 5—table 1
A comparison of the pathway-reporter method and dual-reporter method for constitutive gene expression with Erlang-distributed maturation times (Case (1)A).

Here, PR (NP) gives the results of the nascent and protein pathway reporters, PR (MP) gives the results of the mRNA and protein reporters, while DR (Mat) gives the results of the dual reporters calculated from the mature mRNA. The maturation time Tmat is chosen to be Erlang distributed with mean length 1/30,0.05, and 0.1, respectively. We consider the rate parameters for the remaining exponentially distributed times to be constant, so that there is no extrinsic noise. The pathway-reporters correctly identify the zero extrinsic noise contribution.

ParametersSimulation
(r)1-6 KNTmat(mean)KPδPPr(NM)Pr (MP)Pr (NP)DR (Mat)
3000.03˙0.06˙0.10.00±0.0010.00±0.00010.00±0.00030.00±0.0001
2000.0510.10.00±0.0010.00±0.00010.00±0.00040.00±0.0001
1000.120.10.00±0.0010.00±0.00020.00±0.00010.00±0.0004
Appendix 5—table 2
A comparison of the pathway-reporter method and dual-reporter method for constitutive gene expression and fixed maturation time (Case (1)B).

For each of the parameters KP,δP we selected a scaled Beta(5,6) distribution, with squared coefficient of variation η2=0.1; the scaling is chosen in each case to achieve a mean value equal to the parameter value. The parameter KN is given the noise distribution Beta(3,6), which has a slightly higher coefficient of variation η2=0.2. In order to benchmark against dual reporters, the maturation time was fixed in each case. The extrinsic noise contribution predicted by the pathway-reporters matches well with the dual reporter values.

MeanSimulation
(r)1-4 TmatKNKPδPPr(NM)Pr (MP)Pr (NP)DR (Mat)
0.0520010.10.20±0.010.20±0.020.20±0.030.20±0.01
0.110020.10.20±0.010.21±0.030.21±0.030.20±0.01
Appendix 5—table 3
A comparison of the pathway-reporter method and dual-reporter method for the multiscale model (Case (2)A above).

We consider fixed parameters values (that is, no extrinsic noise). As our theory predicts, the pathway-reporters correctly identify zero extrinsic noise.

ParametersSimulation
(r)1-6 λ1μ1λ2=KNKMKPδPPr (MP)Pr (NP)DR (Mat)
224002020.10.02±0.0030.00±0.010.00±0.002
42012102020.10.02±0.0030.01±0.010.00±0.01
Appendix 5—table 4
A comparison of the pathway-reporter method and dual-reporter method for the multiscale model (Case 2.B).

For each of the parameters λ1,μ1,KP,δP we selected a scaled Beta(5,6) distribution, with squared coefficient of variation η2=0.1; the scaling is chosen in each case to achieve a mean value equal to the parameter value. The parameter λ2=KN is given the noise distribution Beta(3,6), which has a slightly higher coefficient of variation η2=0.2. In order to benchmark against dual reporters, the maturation rate was fixed at 20. As our theory suggests, the extrinsic noise contribution predicted by the pathway reporters matches well with the dual-reporter values.

MeanSimulation
(r)1-5 λ1μ1λ2=KNKPδPPr (MP)Pr (NP)DR (Mat)
2240020.10.18±0.030.16±0.040.16±0.02
420121020.10.29±0.040.27±0.070.28±0.03
Appendix 5—table 5
A comparison of the pathway-reporter method and dual-reporter method for the multiscale model with a fixed duration of maturation (Case (3)A).

Here the time to maturation Tmat is chosen to be consistent with the mean of the stochastic maturation time used in our other models (where the maturation time is exponentially distributed). We consider all rate parameters to be constant, that is, there is no extrinsic noise. Pathway-reporters correctly identify the zero extrinsic noise contribution.

ParametersSimulation
(r)1-6 λ1μ1λ2=KNTmatKPδPPr (MP)Pr (NP)DR (Mat)
224000.0520.10.02±0.0040.00±0.010.00±0.01
42012100.0520.10.02±0.0030.00±0.010.00±0.01
Appendix 5—table 6
A comparison of the pathway-reporter method and dual-reporter method for the multiscale model with a fixed duration of maturation (Case (3)B).

Here the maturation time, Tmat, is set to 0.05. For each of the parameters λ1,μ1,KP,δP, we selected a scaled Beta(5,6) distribution, with squared coefficient of variation η2=0.1; the scaling is chosen in each case to achieve a mean value equal to the parameter value. The parameter λ2=KN is given the noise distribution Beta(3,6), which has a slightly higher coefficient of variation η2=0.2. The extrinsic noise values given by pathway reporters match well with those obtained by dual reporters.

MeanSimulation
(r)1-5 λ1μ1λ2=KNKPδPPr (MP)Pr (NP)DR (Mat)
2240020.10.18±0.020.17±0.040.15±0.02
420121020.10.30±0.050.30±0.120.28±0.05
Appendix 5—table 7
A comparison of the pathway-reporter method and dual-reporter method for the multiscale model with Erlang-distributed maturation times and cell-cycle effects (Case (4)A).

Here, the time to maturation, Tmat, is chosen to be consistent with the mean of the stochastic maturation time used in our other models (where the maturation time is exponentially distributed). Specifically, we choose TmatErlang(3,60), with mean length 3/60=0.05, matching our earlier benchmarking using exponentially-distributed maturation time, with mean length 0.05. We consider the rate parameters for the remaining exponentially distributed times to be constant, that is, there is no extrinsic noise beyond that contributed by the cell-cycle effects.

ParametersSimulation
(r)1-5 λ1μ1λ2=KNKPδPPr (MP)Pr (NP)DR (Mat)
2240020.10.04±0.010.02±0.0010.04±0.01
420121020.10.02±0.0030.01±0.010.02±0.01
Appendix 5—table 8
A comparison of the pathway-reporter method and dual-reporter method for the multiscale model with Erlang-distributed maturation times and cell-cycle effects (Case (4)B).

The Erlang distributed maturation time is chosen as in Appendix 5—tables 7. For each of the parameters λ1,μ1,KP,δP, we selected a scaled Beta(5,6) distribution, with squared coefficient of variation η2=0.1; the scaling is chosen in each case to achieve a mean value equal to the parameter value. The parameter λ2=KN is given the noise distribution Beta(3,6), which has a slightly higher coefficient of variation η2=0.2.

MeansSimulation
(r)1-6 λ1μ1λ2=KNTmatKPδPPr (MP)Pr (NP)DR (Mat)
224000.0520.10.21±0.040.18±0.040.23±0.02
42012100.0520.10.34±0.050.30±0.120.36±0.02

Additional files

Supplementary file 1

Simulation results of the pathway-reporter method for constitutive genes across 60 different parameter values.

We consider noise on all of the parameters except mRNA decay in a constitutive model with mRNA maturation and protein translation. Refer to the excel spreadsheet ConstitutiveaResults.xlsx for full details of the simulation, including the chosen noise distributions and parameters.

https://cdn.elifesciences.org/articles/69324/elife-69324-supp1-v2.xlsx
Supplementary file 2

Simulation results for the overshoot estimate in the pathway-reporter method for bursty genes across 448 different parameter values.

Refer to the excel spreadsheet NoiseFreeaResults.xlsx for full details of the simulation, including the chosen noise distributions and parameters.

https://cdn.elifesciences.org/articles/69324/elife-69324-supp2-v2.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/69324/elife-69324-transrepform-v2.pdf

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Lucy Ham
  2. Marcel Jackson
  3. Michael PH Stumpf
(2021)
Pathway dynamics can delineate the sources of transcriptional noise in gene expression
eLife 10:e69324.
https://doi.org/10.7554/eLife.69324