Pathway dynamics can delineate the sources of transcriptional noise in gene expression
Abstract
Singlecell expression profiling opens up new vistas on cellular processes. Extensive celltocell variability at the transcriptomic and proteomic level has been one of the standout observations. Because most experimental analyses are destructive we only have access to snapshot data of cellular states. This loss of temporal information presents significant challenges for inferring dynamics, as well as causes of celltocell variability. In particular, we typically cannot separate dynamic variability from within cells (‘intrinsic noise’) from variability across the population (‘extrinsic noise’). Here, we make this nonidentifiability mathematically precise, allowing us to identify new experimental setups that can assist in resolving this nonidentifiability. We show that multiple generic reporters from the same biochemical pathways (e.g. mRNA and protein) can infer magnitudes of intrinsic and extrinsic transcriptional noise, identifying sources of heterogeneity. Stochastic simulations support our theory, and demonstrate that ‘pathwayreporters’ compare favourably to the wellknown, but often difficult to implement, dualreporter method.
eLife digest
In biology, seemingly random variation within or between cells can have significant effects on a number of cellular processes, like how cells divide and develop. For example, how often a gene is switched on, or ‘expressed’, can randomly fluctuate over time. This ‘noise’ may lead to a cell having slightly more of a particular molecule, causing it to behave differently to other cells in the population. However, it is currently unclear how this random variation is created and controlled in cells, and what effect this has on biological systems as a whole.
When a gene is expressed, its sequence typically gets copied in to a molecule called mRNA, which is then processed and used to build the protein encoded by the gene. By measuring the levels of mRNA molecules in individual cells, researchers have been able to investigate how gene expression varies within populations. These experiments are carried out on dead cells at a single point in time, and mathematical models are then applied to detect noise in the molecular data.
This approach, however, precludes how noise changes over time, making it difficult to determine the source of celltocell variability. In particular, whether the variation detected is the result of genuine random molecular changes (intrinsic noise), or external factors – such as temperature and pH – fluctuating in the cells environment (extrinsic noise).
Here, Ham et al. have built on previous mathematical models to identify a new approach for investigating the source of molecular noise. They found that for any given gene it is impossible to understand what causes its activity levels to vary just from data on its mRNA levels. Instead, information on other molecules that are affected by expression of the gene (termed ‘pathway reporters’) can provide a clearer picture of whether molecular variability is the result of intrinsic or extrinsic noise.
The mathematical models developed by Ham et al. reveal what can and cannot be learned about noise from gene expression data. Furthermore, pathwayreporters are easier to measure experimentally than other reporters that are typically used to study the origins and effects of celltocell variability. These findings could help researchers design single cell experiments that are better for studying noise, leading to a deeper understanding of how different types of variation impact cell biology.
Introduction
Noise is a fundamental aspect of every cellular process Shahrezaei and Swain, 2008b. Frequently it is even of functional importance, for example in driving cellfate transitions. Sometimes it can afford evolutionary advantages, for example, in the context of bethedging strategies. Sometimes, it can be a nuisance, for example, when it makes cellular signal processing more difficult. But noise is nearly ubiquitous at the molecular scale, and its presence has profoundly shaped cellular life. Analysing and understanding the sources of noise, how it is propagated, amplified or attenuated, and how it can be controlled, has therefore become a cornerstone of modern molecular cell biology.
Noise arising in gene expression has arguably attracted most of the attention so far (but see e.g. Filippi et al., 2016 and Jetka et al., 2018 for the analysis of noise at the signalling level). Generally speaking, gene expression noise is separable into two sources of variability, as pioneered by Swain et al., 2002. Intrinsic noise is generated by the dynamics of the gene expression process itself. The process, however, is often influenced by other external factors, such as the availability of promoters and of RNA polymerase, the influence of long noncoding RNA as a transcriptional regulator Goodrich and Kugel, 2006, as well as differences in the cellular environment. Such sources of variability contribute extrinsic noise, and reflect the variation in gene expression and transcription activity across the cell population. As such, understanding extrinsic noise lies at the heart of understanding cellpopulation heterogeneity.
So far, identifying the sources of gene expression noise from transcriptomic measurements alone has proven difficult Paulsson, 2004; Pedraza and Paulsson, 2008. The fundamental hindrance lies in the fact that singlecell RNA sequencing, which provides most of the available data, is destructive, so that datasets reflect samples from across a population, rather than samples taken repeatedly from the same cell. As temporal information is lost in such measurements Komorowski et al., 2011, it may be impossible to distinguish temporal variability within individual cells (e.g. burstiness), from ensemble variability across the population (i.e. extrinsic noise). A number of numerical and experimental studies have suggested this confounding effect Jones and Elf, 2018; Jones et al., 2014; Zopf et al., 2013, showing that systems with intrinsic noise alone exhibit behaviour that is indistinguishable from systems with both extrinsic and intrinsic noise. This is examined more formally in Ham et al., 2020a, where we show that the moment scaling behaviour and transcript distribution may be indistinguishable from situations with purely intrinsic noise. The limitations in inferring dynamics from population data are becoming increasingly evident, and a number of studies that seek to address some of these problems have emerged Skinner et al., 2016; Gorin and Pachter, 2020a.
Here we provide adetailed analysis of the extent to which sources of variability are identifiable from population singlecell omics data. We are able to prove rigorously that it is in general impossible to identify the sources of variability, and consequently, the underlying transcription dynamics, from observed transcript abundance distributions alone. Systems with intrinsic noise alone can always present identically to similar systems with extrinsic noise. For practical purposes, the effect does not appear to depend on the precise choice of distribution, but holds more generally. Moreover, we demonstrate that extrinsic noise invariably distorts the apparent degree of burstiness of the underlying system: data which seems ‘bursty’ is not necessarily generated by a bursty process, if there is celltocell variability across the population. This is a stronger nonidentifiability result than has previously been obtained Ingram et al., 2008; Shahrezaei and Swain, 2008a; Khammash, 2009; Munsky et al., 2009; Komorowski et al., 2011, and has important ramifications for our analysis of experimental data: we cannot assess causes of transcriptional variability, if we do not simultaneously assess celltocell variability in the transcriptional machinery. Our results highlight (in fact prove mathematically) the requirement for additional information, beyond the observed copy number distribution, in order to constrain the space of possible dynamics that could give rise to the same distribution.
This seemingly intractable problem can at least partially be resolved with a brilliantly simple approach: the dualreporter method Swain et al., 2002. In this approach, noise can be separated into extrinsic and intrinsic components, by observing correlations between the expression of two independent, but identically distributed fluorescent reporter genes. Dualreporter assays have been employed experimentally to study the noise contributed by both global and genespecific effects Elowitz et al., 2002; Raser and O'Shea, 2005; Raj et al., 2006. A particular challenge, however, is that dual reporters are rarely identically regulated Raj et al., 2006; Quarton et al., 2020, and are not straightforward to set up in every system. More recently, it has been shown that the independence of dual reporters cannot be guaranteed in some systems Naik et al., 2021. As a result, there have been efforts in developing alternative methods for decomposing noise Quarton et al., 2020; Singh, 2014; Lin and Amir, 2021. Here we develop a widely applicable generalisation (and simplification) of the original dualreporter approach Swain et al., 2002. We demonstrate that nonidentical and notnecessarily independent reporters can provide an analogous noise decomposition. Our result shows that measurements taken from the same biochemical pathway (e.g. mRNA and protein) can replace dual reporters, enabling the noise decomposition to be obtained from a single gene. This completely circumvents the requirement of conditionally independent and identically regulated reporter genes. The results obtained from our ‘pathwayreporter’ method are also borne out by stochastic simulations, and compare favourably with the dualreporter method. In the case of constitutive expression, the results obtained from our decomposition are identical to those obtained from dual reporters. For bursty systems, we show that our approach provides a satisfactorily close approximation, except in extreme cases. Our methodology is verified mathematically for the most common models of gene transcription, but holds independently of the specific nature of the gene expression dynamics, as we verify in silico across a range of more detailed models.
Materials and methods
A simple model for stochastic mRNA dynamics is the Telegraph model: a twostate model describing promoter switching, transcription, and mRNA degradation. In this model, all parameters are fixed, and gene expression variability arises due to the inherent stochasticity of the transcription process. As discussed above, this process will often be influenced by extrinsic sources of variability, and so modifications to account for this additional source of variability are required.
The telegraph model
Request a detailed protocolThe Telegraph model was first introduced in Ko, 1991, and has since then been widely employed in the literature to model bursty gene expression in eukaryotic cells Bahar Halpern et al., 2015; So et al., 2011; Suter et al., 2011; Larsson et al., 2019. In this model, the gene switches probabilistically between an active state and an inactive state, at rates λ (onrate) and μ (offrate), respectively. In the active state, mRNAs are synthesised according to a Poisson process with rate $K$, while in the inactive state, transcription does either not occur, or possibly occurs at some lower Poisson rate, ${K}_{0}\ll K$. Degradation of mRNA molecules occurs independently of the gene promoter state at rate δ. Figure 1A shows a schematic of the Telegraph model. Throughout the discussion here, we will rescale all parameters of the Telegraph model by the mRNA degradation rate, so that $\delta =1$. The steadystate distribution for the mRNA copy number can be explicitly calculated as Peccoud and Ycart, 1995,
Here, θ denotes the parameter vector $(\mu ,\lambda ,K,\delta )$, the function ${}_{1}F_{1}$ is the confluent hypergeometric function Abramowitz and Stegun, 1965, and, for real number $x$ and positive integer $n$, the notation ${x}^{(n)}$ abbreviates the rising factorial of $x$ (also known as the Pochhammer function). Throughout, we refer to the probability mass function ${\stackrel{~}{p}}_{T}(n;\theta )$ as the Telegraph distribution with parameters θ.
Constitutive gene expression is a limiting case of the Telegraph model, which arises when the offrate μ is 0, so that the gene remains permanently in the active state. In this case, the Telegraph distribution simplifies to a Poisson distribution with rate $K$; the distribution $\mathrm{P}\mathrm{o}\mathrm{i}\mathrm{s}(K)$.
At the opposite extreme is instantaneously bursty gene expression in which mRNA are produced in very short bursts with potentially prolonged periods of inactivity in between. This mode of gene expression has been frequently reported experimentally, particularly in mammalian genes Raj et al., 2006; Bahar Halpern et al., 2015; Suter et al., 2011; Larsson et al., 2019. Transcriptional bursting may be treated as a limit of the Telegraph model, where the offrate, μ, tends to infinity, while the onrate λ remains fixed. In this limit, it can be shown Jones and Elf, 2018; Ham et al., 2020a that the Telegraph distribution converges to the negative binomial distribution $\mathrm{N}\mathrm{e}\mathrm{g}\mathrm{B}\mathrm{i}\mathrm{n}(\lambda ,\frac{K}{\mu +K})$.
The compound distribution
Request a detailed protocolWe can account for random celltocell variation across a population by way of a compound distribution Ham et al., 2020b
where $\stackrel{~}{p}(n;\theta )$ is the stationary probability distribution of a system with fixed parameters θ and $f(\theta ;\eta )$ denotes the multivariate distribution for θ with hyperparameters η. Often we will take $\stackrel{~}{p}(n;\theta )$ to be the stationary probability distribution of the Telegraph model ((1)), and refer to (2) as the compound Telegraph distribution. Sometimes $\stackrel{~}{p}(n;\theta )$ will be the Poisson distribution or the negative binomial distribution, depending on the underlying mode of gene activity. Figure 1B gives a pictorial representation of the compound distribution.
The compound distribution is valid in the case of ensemble heterogeneity, that is, when parameter values differ between individual cells according to the distribution $f(\theta ;\eta )$, but remain constant over time Filippi et al., 2016. This model is also a valid approximation for individual cells with dynamic parameters, provided these change sufficiently slower than the transcriptional dynamics Lenive et al., 2016. In general, the compound Telegraph distribution $\stackrel{~}{q}(n;\eta )$ will be more dispersed than a Telegraph distribution to account for the uncertainty in the parameters; see Figure 1C. Such dispersion is widely observed experimentally, and as demonstrated in Ham et al., 2020a, reflects the presence of extrinsic noise.
In the context of gene expression, it has been shown experimentally that some of the primary sources of extrinsic noise have an autocorrelation time comparable to the cell cycle Rosenfeld et al., 2005. It is these slow changes in variability that justify the assumptions of the compound model. Typical sources of extrinsic variability for each parameter in the Telegraph model are summarised in Table 1 of Ham et al., 2020a. A further significant source of heterogeneity arises from the differences in cellcycle phases across the population Skinner et al., 2016. Such effects have been shown to obscure the precise underlying transcriptional dynamics Zopf et al., 2013; Huh and Paulsson, 2011, and impede the inference of transcriptional parameters from experimental data Beentjes et al., 2020. The compound model we consider here is able to capture this variability, provided that the parameter change within each cell phase is relatively slow, and any dynamic parameter changes during the transition between phases can be considered as ephemeral. A more explicit treatment of the mechanisms and changes during the cell cycle is challenging to study analytically, and theoretical modelling is only in its infancy Cao and Grima, 2020; Beentjes et al., 2020; PerezCarrasco et al., 2020. Later, we verify our proposed noise decomposition on detailed models of gene transcription, incorporating salient features of the cellcycle, such as gene replication, dosage compensation, binomial partitioning of products due to cell division, and cellcycle length variability.
Results
Identifiability considerations
Decoupling the effects of extrinsic noise from experimental measurements has been notoriously challenging. In the context of (2), the distribution $f(\theta ;\eta )$ reflects the population heterogeneity, but experimental data provides samples only of $\stackrel{~}{q}(n;\eta )$. How much can we deduce of the underlying dynamics (that is, $\stackrel{~}{p}(n;\theta ))$, and the population heterogeneity ($f(\theta ;\eta )$), from measurements of transcripts from across the cell population ($\stackrel{~}{q}(n;\eta )$)?
Of course, even though we may be able to infer the parameter η from experimental data, the expression $\stackrel{~}{p}(n;\theta )$ is really a family of distributions, parameterised by θ. This presents two fundamental challenges. The first is the possibility that there are different families of distributions $\stackrel{~}{p}(n;\theta )$ that can yield the same compound distribution, $\stackrel{~}{q}(n;\eta )$, but which are generated by different mechanisms, that is noise distributions, $f(\theta ;\eta )$. The second, perhaps more subtle, challenge is that, even for a fixed family of distributions $\stackrel{~}{p}(n;\theta )$ it may be possible that different choices of the noise distribution $f(\theta ;\eta )$ could still yield the same compound distribution $\stackrel{~}{q}(n;\eta )$. In these situations, we cannot hope to infer a unique solution for the noise distribution. This belongs to the important class of identifiability problems Villaverde, Villaverde and Banga, 2017; it has important ramifications for the interpretability of parameter estimates obtained from experimental data Ingram et al., 2008. Indeed, if two or more model parameterisations are observationally equivalent (in this case, in the form of the transcript abundance distribution $\stackrel{~}{q}(n;\eta )$), then not only does this cast doubt upon the suitability of the model to represent (and subsequently predict) the system, but it also obstructs our ability to infer mechanistic insight from experimental data.
An example of the first identifiability problem arises from a wellknown example of a compound distribution, (2): when $f(\theta ;\eta )$ is a gamma distribution and $\stackrel{~}{p}(n;\theta )$ is a Poisson distribution, corresponding to constitutive gene expression, the resulting compound distribution $\stackrel{~}{q}(n;\eta )$ is a negative binomial distribution Greenwood and Yule, 1920. But this is the same distribution as that arising from instantaneously bursty gene expression Ham et al., 2020a. Such identifiability instances may be circumvented if there is confidence in the basic mode of gene activity, that is, if there is reason to believe that a gene is not constitutively active, for example. We find, however, that there are numerous instances that can present insurmountable identifiability problems.
Bursty gene expression
We first observe that any Telegraph distribution with fixed parameters can be identically obtained from a Telegraph distribution with parameter variation. As shown in the supplementary material (Appendix Pathway dynamics can delineate the sources of transcriptional noise in gene expression), any Telegraph distribution ${\stackrel{~}{p}}_{T}(n;\lambda ,{\mu}^{\prime},{K}^{\prime})$ can be written as,
where $\mu <{\mu}^{\prime}$ and ${f}_{{K}^{\prime}}(t;\lambda +\mu ,{\mu}^{\prime}\mu )$ is the probability density function for a scaled beta distribution ${\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}}_{K}(\lambda +\mu ,{\mu}^{\mathrm{\prime}}\mu )$ with support $[0,{K}^{\prime}]$. Thus, any Telegraph distribution can be obtained by varying the transcription rate parameter on a narrower Telegraph distribution (i.e. with a smaller offrate) according to a scaled beta distribution. Figure 2A (top panel) compares the representation obtained in (3) with the corresponding fixedparameter Telegraph distribution for two different sets of parameters. When $\mu =0$ the representation given in (3) simplifies to the wellknown Poisson representation of the Telegraph distribution in terms of the scaled beta distribution Srividya et al., 2009.
Instantaneously bursty gene expression
The previous result extends to instantaneously bursty systems. The copy number distribution of an instantaneously bursty system can be obtained from both bursty and instantaneously bursty dynamics, provided that there is appropriate parameter variation. The supplementary material contains the relevant derivations; refer to Appendix Pathway dynamics can delineate the sources of transcriptional noise in gene expression. In the following, we let ${\stackrel{~}{p}}_{\text{NB}}(n;r,\beta )$ denote the probability mass function of a $\mathrm{N}\mathrm{e}\mathrm{g}\mathrm{B}\mathrm{i}\mathrm{n}(r,\frac{\beta}{\beta +1})$ distribution, where $\beta >0$. Then for any negative binomial distribution of the form $\mathrm{N}\mathrm{e}\mathrm{g}\mathrm{B}\mathrm{i}\mathrm{n}(\lambda ,\frac{\beta}{\beta +1})$ we have,
where $f(t;\lambda +\mu ,\beta )$ is the probability density function of a $\mathrm{G}\mathrm{a}\mathrm{m}\mathrm{m}\mathrm{a}(\lambda +\mu ,\beta )$ distribution. This result generalises the aforementioned wellknown representation of the negative binomial distribution Greenwood and Yule, 1920, which corresponds to the particular case of $\mu =0$. In Figure 2A (middle panel), we compare the representation obtained in (4) with the corresponding fixedparameter negative binomial distribution for two different sets of parameters.
We also obtain the following representation for a negative binomial distribution in terms of a scaled beta prime distribution,
where ${f}_{b}(b\theta 1;\lambda {\lambda}^{\prime},{\lambda}^{\prime})$ is the probability mass function of a scaled beta prime ${\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}\mathrm{P}\mathrm{r}\mathrm{i}\mathrm{m}\mathrm{e}}_{b}(\lambda {\lambda}^{\mathrm{\prime}},\lambda )$ distribution, where $b>0$ and $\lambda >{\lambda}^{\prime}$. This equivalently corresponds to scaled beta noise ${\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}}_{b}(\lambda {\lambda}^{\mathrm{\prime}},{\lambda}^{\mathrm{\prime}})$ on the inverse of the expected burst intensity. Thus, the distribution of any instantaneously bursty system with mean burst intensity $b$ can be obtained from one with greater burst frequency, by varying the mean burst intensity θ according to a shifted beta prime distribution. Figure 2A (bottom panel), compares the representation obtained in (5) with the associated fixedparameter negative binomial distribution for two different sets of parameters.
An exception: constitutive expression
It has long been known Feller, 1943 that a compound Poisson distribution uniquely determines the compounding distribution. In the context of (2), this means the full extrinsic noise distribution $f(\theta ,\eta )$ is identifiable from $\stackrel{~}{q}(n;\eta )$. As we will demonstrate in related work (Ham et al., in preparation) , it is therefore possible to extract the extrinsic noise distribution, $f(\theta ,\eta )$, from transcript copy number measurements.
Implications for parameter inference
Estimates of kinetic parameters from experimental data suggest that gene expression is often either bursty or instantaneously bursty (i.e., $\mu \gg \lambda $). In turn, the assumption that geneinactivation events occur far more frequently than geneactivation events is often used to derive other models of stochastic mRNA dynamics Jia and Grima, 2020; Cao and Grima, 2020; Beentjes et al., 2020. The representations given in (3 – 5), however, show that both estimating parameters and the underlying dynamics from the form of the copy number distribution alone can be misleading. Noise on the transcription rate will invariably produce copy number data that is suggestive of a more bursty model. To illustrate this, consider an example in which the underlying process is a (mildly) bursty Telegraph system with distribution ${\stackrel{~}{p}}_{T}(n;2,3,K)$. Now assume that noise on the transcription rate parameter $K$ follows the scaled Beta distribution on the interval $[0,100]$ with $\alpha =\lambda +\mu =2+3=5$ and $\beta ={\mu}^{\prime}\mu =123=9$. The shape of this noise distribution closely resembles a slightly skewed Gaussian distribution, with the majority of transcription rates between around 10 and 60. This noise on the transcription rate $K$ within the Telegraph system ${\stackrel{~}{p}}_{T}(n;2,3,K)$ will present identically to the significantly burstier system ${\stackrel{~}{p}}_{T}(n;2,12,100)$.
It is of practical importance to recognise that, while the nonidentifiability results (summarised in Table 1) are dependent on specific noise distributions, for practical purposes any similar distribution will produce a similar effect. To demonstrate this, we replace the various noise distributions required for the representations in (3 – 5), with suitable normal distributions truncated at 0. In each case, we sample 1000 data points from the corresponding compound distribution, and compared this with the associated fixedparameter copy number distribution. The results are shown in Figure 2B. The truncated normal distribution is not chosen on the basis of biological relevance, but rather to demonstrate that even a symmetric noise distribution (except for truncation at 0) produces qualitatively similar results to the distributions used in the precise nonidentifiability results. In every case, the effect of varying the transcription rate or burst intensity parameter according to a unimodal noise distribution is to produce copy number distributions that are generally consistent with systems that appear burstier.
Finally, we note that our results explain previous empirical observations that static measurements of mRNA are not always sufficient to infer the underlying dynamics of gene activity. Skinner et al., 2016 address some of these limitations by quantifying both nascent and mature mRNA in individual cells, as well incorporating cellcycle effects into their analysis of two mammalian genes. A more developed treatment of model identifiability is given in Gorin and Pachter, 2020a, where it shown how a stochastic model incorporating the downstream processing of mRNA can be used to distinguish a particular instance of nonidentifiability. More specifically, the authors consider the nonidentifiability problem noted in Ham et al., 2020a, arising from the GammaPoisson compound representation of the negative binomial distribution; a particular case of (4) above. Despite identical distributions at the nascent level, the marginal distributions of the processed (mature) mRNA are found to be substantially different. It is likely that a similar analysis will be valuable in the context of the other identifiability problems we have given in Table 1. In the next section, we take a more general approach to resolving nonidentifibiality and exploit the properties of complex gene expression dynamics to determine not only the presence of extrinsic noise, but also estimate its magnitude.
Resolving nonidentifiability
The results of the previous section show that additional information, beyond the observed copy number distribution, is required to constrain the space of possible dynamics that could give rise to the same distribution. One way to narrow this space of possibilities, is to determine the intrinsic and extrinsic contributions to the total variation in the system.
The dualreporter method
The total gene expression noise, as measured by the squared coefficient of variation ${\eta}^{2}$, can be decomposed exactly into a sum of intrinsic and extrinsic noise contributions Swain et al., 2002. The decomposition applies to dynamic noise Hilfinger and Paulsson, 2011, and generalises to higher moments in Hilfinger et al., 2012. Sets of dual reporters at multiple levels of the transcriptional pathway has been shown to achieve a finer breakdown of noise into subcategories Bowsher and Swain, 2012. As shown in Hilfinger and Paulsson, 2011, the noise decomposition is equivalent to the normalised Law of Total Variance (Ross, 2014). Indeed, if $X$ is the random variable denoting the number of molecules of a certain species (e.g. mRNA or protein) in a given cell, then we can decompose the total noise by conditioning $X$ on the state $\mathbf{\mathbf{Z}}=({Z}_{1},\mathrm{\dots},{Z}_{n})$ of the environmental variables Z_{1},…,Zn,
It has been shown Swain et al., 2002; Hilfinger and Paulsson, 2011 that if X_{1} and X_{2} are random variables denoting the expression levels of independent (conditional on $\mathbf{\mathbf{Z}}$) and identically distributed gene reporters, then the extrinsic noise contribution ${\eta}_{\text{ext}}^{2}$ in (6) can be identified by the normalised covariance between X_{1} and X_{2},
Decomposing noise with nonidentical reporters
The dualreporter method requires distinguishable measurements of transcripts or proteins from two conditionally independent and identically distributed reporter genes integrated into the same cell. In practice, however, dual reporters rarely have identical dynamics, which is widely considered to be a significant challenge to interpreting experimental results Quarton et al., 2020. We show that, under certain conditions, the decomposition in (6) can alternatively be obtained from nonidentically distributed and notnecessarily independent reporters.
Our result relies on the observation that the covariance of any two variables can be decomposed into the expectation of a conditional covariance and the covariance of two conditional expectations (the Law of Total Covariance Ross, 2014). If $X$ and $Y$ denote, for example, the numbers of molecules of two chemical species (eg. mRNA and protein) in a given cell, then the covariance of $X$ and $Y$ can be decomposed by conditioning on the state $\mathbf{\mathbf{Z}}=({Z}_{1},\mathrm{\dots},{Z}_{n})$ of the environmental variables Z_{1},…,Z_{n},
We will see that in many cases of interest the random variable $\mathrm{E}(X;\mathbf{Z})$ (as a function of $\mathbf{\mathbf{Z}}$) splits across common variables with $\mathrm{E}(Y;\mathbf{Z})$. By this we mean that $\mathrm{E}(X;\mathbf{Z})=f({\mathbf{Z}}_{X}){h}_{X}({\mathbf{Z}}^{\mathrm{\prime}})$ and $\mathrm{E}(Y;\mathbf{Z})=g({\mathbf{Z}}_{Y}){h}_{Y}({\mathbf{Z}}^{\mathrm{\prime}})$, where $\mathbf{Z}}_{X$ are the variables of $\mathbf{\mathbf{Z}}$ that appear in $\mathrm{E}(X;\mathbf{Z})$ but not in $\mathrm{E}(Y;\mathbf{Z})$, and dually, ${\mathbf{\mathbf{Z}}}_{Y}$ are those in $\mathrm{E}(Y;\mathbf{Z})$ that are not in $\mathrm{E}(X;\mathbf{Z})$. The variables ${\mathbf{\mathbf{Z}}}^{\prime}$ are those variables from $\mathbf{\mathbf{Z}}$ not in $\mathbf{Z}}_{X}\cup {\mathbf{Z}}_{Y$. In these cases, the component of $\mathrm{C}\mathrm{o}\mathrm{v}(X,Y)$ that is contributed by the variation in $\mathbf{\mathbf{Z}}$ (the extrinsic component) may be written as the covariance of the functions ${h}_{X}({\mathbf{\mathbf{Z}}}^{\prime})$ and ${h}_{Y}({\mathbf{\mathbf{Z}}}^{\prime})$. Conveniently, in the cases of interest here, the two functions ${h}_{X}$ and ${h}_{Y}$ coincide, and this is the form we use in the following decomposition principle. The supplementary material (Appendix A) contains the proof of this result.
The noise decomposition principle (NDP)
Assume that there are measurable functions $f$, $g$, and $h$ such that $\mathrm{E}(X;\mathbf{Z})$ and $\mathrm{E}(Y;\mathbf{Z})$ split across common variables by way of $\mathrm{E}(X;\mathbf{Z})=f({\mathbf{Z}}_{X})h({\mathbf{Z}}^{\mathrm{\prime}})$ and $\mathrm{E}(Y;\mathbf{Z})=g({\mathbf{Z}}_{Y})h({\mathbf{Z}}^{\mathrm{\prime}})$. Then, provided that the variables ${Z}_{1},\mathrm{\dots},{Z}_{m}$ are mutually independent, the normalised covariance of $\mathrm{E}(X;\mathbf{Z})$ and $\mathrm{E}(Y;\mathbf{Z})$ will identify the total noise on $h({\mathbf{\mathbf{Z}}}^{\prime})$ (i.e. ${\eta}_{h({\mathbf{\mathbf{Z}}}^{\prime})}^{2}$).
As we show in the next section, there are many situations where the random variable $\mathrm{E}(X;\mathbf{Z})$ is precisely the common part of $\mathrm{E}(Y;\mathbf{Z})$ and $\mathrm{E}(X;\mathbf{Z})$ (i.e., $h({\mathbf{Z}}^{\mathrm{\prime}})=\mathrm{E}(X;\mathbf{Z})$), and the normalised intrinsic contribution to the covariance is either zero or negligible. In these cases, the normalised covariance of $X$ and $Y$ will identify precisely the extrinsic noise contribution ${\eta}_{\text{ext}}^{2}$ to the total noise ${\eta}_{X}^{2}$. To see this, consider the situation where $\mathrm{E}(Y;\mathbf{Z})=f({\mathbf{Z}}_{Y})\mathrm{E}(X;\mathbf{Z})$. Then provided $f({\mathbf{Z}}_{Y})$ and $E(X;\mathbf{\mathbf{Z}})$ are independent random variables, the extrinsic contribution to the covariance of $X$ and $Y$ is given by,
If the normalised intrinsic contribution to the covariance is either zero or is negligible, it follows from (8) that
Thus, under certain conditions, measuring the covariance between two nonidentically distributed and notnecessarily independent reporters can replace dual reporters.
The pathwayreporter method
We show that for some reporters $X$ and $Y$ belonging to the same biochemical pathway, the covariance of $X$ and $Y$ continues to identify the extrinsic, and subsequently intrinsic, noise contributions to the total noise. The basis of the pathwayreporter method depends on the emergent covariances between the various species (e.g. nascent/mature mRNA and protein) in the gene expression pathway. Qualitatively, this effect can be seen in Figure 3, which compares simulated sample distributions of a simple fourstage model of gene transcription (refer to the model ${\mathbf{\mathbf{M}}}_{4}$ below) in the case of moderate extrinsic noise to the case with no extrinsic noise. The plots are the bivariate distributions for nascentmature, nascentprotein, and matureprotein levels, respectively. This will be made more precise below, where we find that it is possible to extract quantitative information about the extrinsic noise distribution itself from this data.
Throughout this section, we assume that extrinsic noise sources act independently i.e., the environmental variables ${Z}_{1},\mathrm{\dots},{Z}_{n}$ of $\mathbf{\mathbf{Z}}$ are mutually independent. Additionally, our modelling focuses only on a single gene copy, although the same analysis applies to multiple but indistinguishable gene copies; we refer to the supplementary material (Appendix B) for more details.
Measuring noise from a constitutive gene
We consider first the simplest case where the underlying process is constitutive. We begin with a stochastic model of mRNA maturation, which we denote by ${\mathbf{\mathbf{M}}}_{1}$; Figure 4A (top left) gives a schematic of the constitutive model. In this model, the gene continuously produces nascent mRNA according to a Poisson process at constant rate ${K}_{N}$, which are subsequently spliced into mature mRNA according at rate ${K}_{M}$. Degradation of mature mRNA occurs as a firstorder Poisson process with rate ${\delta}_{M}$. The model ${\mathbf{\mathbf{M}}}_{1}$, together with its extensions, has been considered in a number of recent studies Gorin and Pachter, 2020a; Cao and Grima, 2020; La Manno et al., 2018; Gorin and Pachter, 2020b; Bergen2020, and has a known solution for the stationary joint probability distribution Jahnke and Huisinga, 2007 given by,
where $n$ is the number of nascent mRNA, $m$ is the number of mature mRNA, and the parameter $\theta =({K}_{N},{K}_{M},{\delta}_{M})$. We use ${X}_{N}$ to denote the number of nascent mRNA, ${X}_{M}$ the number of mature mRNA produced from the same constitutive gene, and $\mathbf{\mathbf{Z}}=({K}_{N},{K}_{M},{\delta}_{M})$. To simplify notation, we abbreviate the variables in ${\mathbf{\mathbf{Z}}}_{{X}_{N}}$ as ${\mathbf{\mathbf{Z}}}_{N}$, and similarly for ${\mathbf{\mathbf{Z}}}_{{X}_{M}}$. It follows immediately from (11) that ${X}_{N}$ and ${X}_{M}$ are independent conditional on $\mathbf{\mathbf{Z}}$, and so the intrinsic contribution to the covariance of ${X}_{N}$ and ${X}_{M}$ (the first term of (8)) is 0. It is also easy to see from (11) that $E({X}_{N};\mathbf{\mathbf{Z}})=f({\mathbf{\mathbf{Z}}}_{N}){K}_{N}$ and $E({X}_{M};\mathbf{\mathbf{Z}})=g({\mathbf{\mathbf{Z}}}_{M}){K}_{N}$, where $f({\mathbf{\mathbf{Z}}}_{N})=\frac{1}{{K}_{M}}$ and $g({\mathbf{\mathbf{Z}}}_{M})=\frac{1}{{\delta}_{M}}$. Since the extrinsic noise sources are assumed to act independently, it follows that the Noise Decomposition Principle (NDP) of the previous section holds. We then have that $\mathrm{C}\mathrm{o}\mathrm{v}({X}_{N},{X}_{M})={\eta}_{{K}_{N}}^{2}$, where ${\eta}_{{K}_{N}}^{2}$ is the total noise on the transcription rate parameter ${K}_{N}$. Thus, measuring $\mathrm{C}\mathrm{o}\mathrm{v}({X}_{N},{X}_{M})$ can replace dual reporters to decompose the gene expression noise at the transcriptional level.
To support our mathematical results, we simulate the model ${\mathbf{\mathbf{M}}}_{1}$ subject to parameter variation using the stochastic simulation algorithm (SSA). Table 2 compares the extrinsic noise contributions found from various simulations with the corresponding theoretical values. In each simulation, the degradation rate ${\delta}_{m}$ is fixed at 1, with the other parameters scales accordingly. The maturation rate ${K}_{M}$ is sampled according to a $\mathrm{G}\mathrm{a}\mathrm{m}\mathrm{m}\mathrm{a}(8,0.0125)$ distribution, which has coefficient of variation 0.125. We consider different noise distributions on ${K}_{N}$, producing a range of noise strengths. Our theory predicts that pathwayreporters will identify the total noise on ${K}_{N}$. Overall, we observe an excellent agreement between the results obtained by the pathwayreporter method, the dualreporter method and the theoretical noise. There is consistently slightly more variation in the pathwayreporter results compared with the dualreporter results.
To explore the pathwayreporter method more widely, we consider 60 different parameter combinations to produce a range of mean copy numbers consistent with those reported experimentally. We also consider different noise distributions taken from the scaled Beta distribution family in order to produce a range of noise strengths; refer to Supplementary file 1. The pathwayreporter method performs favourably compared to the dualreporter method calculated from mature mRNA, and consistently outperforms the dualreporter method on nascent mRNA.
Next, we consider the simplest stochastic model of gene expression that includes both mRNA and protein dynamics: the wellknown ‘twostage model’ of gene expression, which, together with its threestage extension to include promoter switching has been widely studied Raser and O'Shea, 2005; Raj et al., 2006; Thattai and van Oudenaarden, 2001; Friedman et al., 2006; Singh and Hespanha, 2007; Shahrezaei and Swain, 2008a; Bokes et al., 2012; Molina et al., 2013. We denote this model by ${\mathbf{\mathbf{M}}}_{2}$; see Figure 4A (top right) for a schematic of this model. In this model, mRNA are synthesised according a Poisson process at rate ${K}_{m}$, which are then later translated into protein at rate ${K}_{p}$. Degradation of mRNA and protein occur as firstorder Poisson processes with rates ${\delta}_{m}$ and ${\delta}_{p}$, respectively. If ${X}_{m}$ denotes the number of mRNA, ${X}_{p}$ the number of proteins produced from the same constitutive gene, and if $\mathbf{\mathbf{Z}}=({K}_{m},{K}_{p},{\delta}_{m},{\delta}_{p})$, then the stationary means and covariance are given by Thattai and van Oudenaarden, 2001; Singh and Hespanha, 2007:
It is easily verified that $\mathrm{E}({X}_{p};\mathbf{Z})=f({\mathbf{Z}}_{p})\mathrm{E}({X}_{m};\mathbf{Z})$, where $f({\mathbf{\mathbf{Z}}}_{p})=\frac{{K}_{p}}{{\delta}_{p}}$. Thus, it follows from the NDP that the normalised contribution of $\mathrm{C}\mathrm{o}\mathrm{v}({X}_{m},{X}_{p})$ contributed by $\mathbf{\mathbf{Z}}$ will identify the extrinsic noise contribution to the total noise on ${X}_{m}$. Now, if we assume that ${\delta}_{m}$ is fixed across the cellpopulation, and all parameters are scaled so that ${\delta}_{m}=1$, we have the following expression for the intrinsic contribution to the covariance of ${X}_{m}$ and ${X}_{p}$; refer to the supplementary material (Appendix B) for details.
Since mRNA tends to be less stable than protein, we have ${\delta}_{p}<1$, and often ${\delta}_{p}\ll 1$Bernstein et al., 2002; Schwanhäusser et al., 2011. So, we can expect $\alpha \ll 1$. Further, for many genes we can expect the number of mRNA per cell (${K}_{m}$) to be in the order of tens, so $1/E({K}_{m})<1$. It follows that $\mathrm{E}(\mathrm{C}\mathrm{o}\mathrm{v}({X}_{m},{X}_{p};\mathbf{Z}))\ll \text{}1$, so that $\mathrm{C}\mathrm{o}\mathrm{v}({X}_{m},{X}_{p})$ will closely approximate the extrinsic noise at the transcriptional level.
We test our theory using stochastic simulations of the model ${\mathbf{\mathbf{M}}}_{2}$ subject to parameter variation. Table 3 gives a comparison of the results of the mRNAprotein reporters and dual reporters. In each case, we varied ${K}_{p}$ according to a $\mathrm{G}\mathrm{a}\mathrm{m}\mathrm{m}\mathrm{a}(5,0.4)$ distribution and ${\delta}_{p}$ according to a $\mathrm{G}\mathrm{a}\mathrm{m}\mathrm{m}\mathrm{a}(8,0.125)$ distribution; the corresponding noise strengths are 0.20 and 0.125, respectively. We consider different noise distributions on ${K}_{m}$, which produce a range of noise strengths, and the noise distribution parameters are selected to produce a mean mRNA of approximately 50 and a mean number of approximately 1000 proteins in each simulation. As our theory predicts, the mRNAprotein reporters identify the extrinsic noise contribution to the total noise on ${X}_{m}$. Again, we can see from Table 3 that there is excellent agreement between the results of the pathway reporters and the dual reporters, with slightly more variation in the pathwayreporter results. A larger exploration of the parameter space reveals similar results; these are provided in Supplementary file 1. Thus, despite mRNA and protein numbers not being strictly independent, they can, for practical purposes, replace dual reporters to decompose the noise at the transcriptional level.
We note that both the pathwayreporter (nascentmature or matureprotein) and dualreporter methods show slower convergence when copy numbers are low. Pathway reporters usually show fractionally slower convergence and fractionally more variation than a dual reporter, as suggested by the standard deviations in Tables 2 and 3. A more detailed exploration of convergence is given in the supplementary material (Appendix C).
Measuring noise from a facultative (bursty) gene
The most common mode of gene expression that is reported experimentally is burstiness Jones and Elf, 2018; Raj et al., 2006; Bahar Halpern et al., 2015; Suter et al., 2011; Larsson et al., 2019; Golding et al., 2005, in which mRNA are produced in short bursts with periods of inactivity in between. One example is gene regulation via repression, which naturally leads to periods of gene inactivity. Here, we consider a fourstage model of bursty gene expression, which incorporates both promoter switching and mRNA maturation; we denote this model by ${\mathbf{\mathbf{M}}}_{4}$; see Figure 4A (bottom left). This model has recently been considered in Cao and Grima, 2020, where the marginal distributions are solved in some limiting cases. In this model, the gene switches probabilistically between an active state (A) and an inactive state (I), at rates λ (onrate) and μ (offrate), respectively. In the active state, nascent mRNA is synthesised according to a Poisson process at rate ${K}_{N}$, while in the inactive state transcription does not occur. Nascent mRNA is spliced into mature mRNA at rate ${K}_{M}$; this in turn is later translated into protein, with rate ${K}_{P}$. Degradation of mRNA and protein occur independently of the promoter state at rates ${\delta}_{M}$ and ${\delta}_{P}$, respectively.
For this model, we have three natural candidates for pathway reporters: (a) nascent and mature mRNA (b) mature mRNA and protein, and (c) nascent mRNA and protein reporters. Below we show that nascent mRNA–protein reporters yield consistently good estimates of the extrinsic noise contribution ${\eta}_{\mathrm{ext}}^{2}$ to the total gene expression noise, while nascent–mature and mature RNA–protein reporters are reliable in some restricted cases. We begin by showing that each of the reporter pairs (a), (b), and (c) satisfy the Noise Decomposition Principle. We then demonstrate computationally, that despite a lack of independence between these reporter pairs, the pathwayreporter method can still be used to decompose the total gene expression noise at the transcriptional level. Throughout, we let ${X}_{N}$ denote the number of nascent mRNA, we let ${X}_{M}$ denote the number of mature mRNA, and let ${X}_{P}$ denote the number of proteins produced from the same gene. We also let $\mathbf{\mathbf{Z}}=\{\lambda ,\mu ,{K}_{N},{K}_{M},{K}_{P},{\delta}_{M},{\delta}_{P}\}$.
Following Raj et al., 2006, we assume that the transcription rate ${K}_{N}$ is large relative to the other parameters. We further assume that the maturation rate ${K}_{M}$ is large (i.e. ${K}_{M}>{\delta}_{M}$), which is supported by experiments Cao and Grima, 2020. Then, using the results of Raj et al., 2006 and arguments similar to those in Cao and Grima, 2020, it can be shown that the stationary averages for the nascent mRNA, mature mRNA and protein levels are given by,
respectively. The supplementary material (Appendix B) provides more detail on how these expressions can be obtained.
We consider first the nascentmature pathway reporters, case (a). From (14), it is easily seen that $\mathrm{E}({X}_{N};\mathbf{Z})=f({\mathbf{Z}}_{N}){K}_{N}\frac{\lambda}{(\lambda +\mu )}$ and $\mathrm{E}({X}_{M};\mathbf{Z})=g({\mathbf{Z}}_{M}){K}_{N}\frac{\lambda}{(\lambda +\mu )}$, where $f({\mathbf{Z}}_{N})=\frac{1}{{K}_{M}}$ and $g({\mathbf{\mathbf{Z}}}_{M})=\frac{1}{{\delta}_{M}}$. So the NDP holds, and the normalised covariance of $\mathrm{E}({X}_{N};\mathbf{Z})$ and $\mathrm{E}({X}_{M};\mathbf{Z})$ will identify the extrinsic noise on the transcription component $K}_{N}\frac{\lambda}{(\lambda +\mu )$. Consider now the matureprotein reporters, case (b). Again, we can see from (14) that $\mathrm{E}({X}_{M};\mathbf{Z})=f({\mathbf{Z}}_{P})\mathrm{E}(X;\mathbf{Z})$, where $f({\mathbf{Z}}_{P})=\frac{{K}_{P}}{{\delta}_{P}}$. Thus, the NDP holds, and so the normalised covariance of $\mathrm{E}({X}_{M};\mathbf{Z})$ and $\mathrm{E}({X}_{P};\mathbf{Z})$ will identify the total noise on $\mathrm{E}({X}_{M};\mathbf{Z})$ (the extrinsic noise on ${X}_{M}$). For the nascentprotein reporters, case (c), it is easy to see that $\mathrm{E}({X}_{N};\mathbf{Z})=f({\mathbf{Z}}_{N}){K}_{N}\frac{\lambda}{(\lambda +\mu )}$, where $f({\mathbf{Z}}_{N})=\frac{1}{{K}_{M}}$, and $\mathrm{E}({X}_{P};\mathbf{Z})=g({\mathbf{Z}}_{P}){K}_{N}\frac{\lambda}{(\lambda +\mu )}$, where $g({\mathbf{Z}}_{P})=\frac{{K}_{P}}{{\delta}_{M}{\delta}_{P}}$. Thus, again the NDP holds, and the normalised covariance of $\mathrm{E}({X}_{N};\mathbf{Z})$ and $\mathrm{E}({X}_{P};\mathbf{Z})$ will identify the noise on the transcriptional component $K}_{N}\frac{\lambda}{(\lambda +\mu )$.
In order for the pathwayreporter method to provide a close approximation to the extrinsic noise in cases (a), (b), and (c), we require that the normalised intrinsic contribution to the covariance is either zero or negligible. This condition will hold provided there is sufficiently small correlation between the reporter pairs. In the case of (prokaryotic) mRNA and protein, this lack of correlation has been been verified experimentally in E. coli (Taniguchi et al., 2010). More generally, it is possible to provide an intuition about the conditions under which the lack of correlation might hold. The time series of copy numbers for each of nascent mRNA, mature mRNA and protein broadly follow each other, each with delay from its predecessor (Figure 4B). Parameter values that reduce this delay will tend to increase correlation, and thereby increase the normalised intrinsic contribution to the covariance. The primary example of this effect is seen when ${\delta}_{p}$ approaches, or even exceeds ${\delta}_{m}$ (or for nascentmature reporters, when ${\delta}_{M}$ approaches the maturation rate). A further contributor to high correlation between mRNA and protein, is when the system undergoes long timescale changes. In this situation, the copy numbers tend to drop to very low values for extended periods. The primary parameter influencing this type of behaviour is the activerate λ, specifically, when λ tends to 0 (Figure 4C). An illustrative example of this can be seen by considering a Telegraph system in the limit of slow switching, which produces a copy number distribution that converges to a scaled PoissonBernoulli compound distribution: even without any extrinsic noise, the pathway reporter method will identify the ${\eta}^{2}$ value of the corresponding scaled Bernoulli distribution.
An extensive computational exploration of the parameter space (Supplementary file 2) supports our intuition, though the strength of the effect varies across the three different reporter pairs. This is further corroborated by the heatmaps shown in Figure 5, which for three fixed values of μ and a broad spectrum of ${\delta}_{p}$ and λ values, give the intrinsic term ${\eta}_{\mathrm{int}}^{2}$ in (8), for fixed $\mathbf{\mathbf{Z}}$. Thus, the heatmaps provide an estimate for the overshoot error in the pathwayreporter approach. Note that blue pixels represent an overshoot estimate of less than 0.05.
For nascentprotein reporters, the normalised intrinsic contribution to the covariance is satisfactorily small (less than 0.05) except for (a) high values of ${\delta}_{p}$ in unison with (b) low values of λ (less than 1, although lower values are acceptable if ${\delta}_{p}$ is small). The assumption (a) that ${\delta}_{P}<{\delta}_{M}$ is known to be true for a large number of genes, and is justified by the difference in the mRNA and protein lifetimes. While there is of course variation across genes and organisms, values of ${\delta}_{P}\le 0.5{\delta}_{M}$ and even ${\delta}_{P}\le 0.2{\delta}_{M}$ seem reasonable for the majority of genes. In E. coli Taniguchi et al., 2010 and yeast Belle et al., 2006, for example, mRNA are typically degraded within a few minutes, while most proteins have lifetimes at the level of 10 s of minutes to hours. For mammalian genes Schwanhäusser et al., 2011, it has been reported that the median mRNA decay rate ${\delta}_{M}$ is (approximately) five times larger than the median protein decay rate ${\delta}_{P}$, determined from 4200 genes. Assumption (b) requires that the gene is sufficiently active. In a recent paper by Larsson et al., 2019, the promoterswitching rates λ, μ, and the transcription rate $K$ of the Telegraph model are estimated from singlecell data for over 7000 genes in mouse and human fibroblasts. Of those genes with mean mRNA levels greater than 5, we found that over 90% have a value for λ of at least 0.5, and over 65% have a value for λ greater than 1. In Cao and Grima, 2020, a comprehensive list of genes (ranging from yeast cells through to human cells) with experimentally inferred parameter values are sourced from across the literature (see Table 1 in Cao and Grima, 2020). After scaling the parameter values of the 26 genes reported there, we find that around 88% have a value for λ of at least 0.5, and approximately 58% have a value for λ greater than 1. Thus, the heatmaps given in Figure 5 (top panel) suggest that nascentprotein reporters will provide a satisfactory estimate of the extrinsic noise level for a substantial fraction of genes.
The mature nRNA–protein reporters are less reliable, with the requirement of slow protein decay and higher onrate being more pronounced than for the nascent mRNA–protein reporters; this is evident from Figure 5 (second panel). The performance of the nascent–mature reporter is of course independent of ${\delta}_{p}$, but is only viable in the case of a large onrate (see Figure 5—figure supplement 1).
We test our approach for each of the pathway reporter pairs (a), (b), and (c) against dual reporters using stochastic simulations. Table 4 shows the results from six simulations across a spectrum of behaviours from moderately slow switching, to fast switching as well as a range of levels of burstiness. For each of the parameters $\mu ,\lambda ,{K}_{P},{\delta}_{P}$, we selected a scaled $\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}(5,6)$ distribution, with squared coefficient of variation ${\eta}^{2}=0.1$; the scaling is chosen in each case to achieve a mean value equal to the parameter value given in Table 4. It is routinely verified that scaling these distributions does not change the value of ${\eta}^{2}$. The parameter ${K}_{N}$ is given the noise distribution $\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}(3,6)$, which has a slightly higher coefficient of variation ${\eta}^{2}=0.2$. In order to achieve direct benchmarking against the dual reporters, the parameter ${K}_{M}$ is fixed; we select ${K}_{M}=10$. This is because the nascentprotein pathway reporter estimates noise on the value of ${K}_{N}\frac{\lambda}{\lambda +\mu}$, while the mature mRNA dualreporter measures noise on $\frac{{K}_{N}}{{K}_{M}}\frac{\lambda}{\lambda +\mu}$, and these coincide only when ${K}_{M}$ is fixed. The mean values of ${K}_{N}$ and ${K}_{P}$ are chosen to achieve approximate average nascent mRNA levels, mature mRNA levels and protein levels at 5, 50, and 1000 respectively, given the chosen values of $\lambda ,\mu ,{\delta}_{P}$.
The results for the nascent mRNA–protein reporters, case (c), given in Table 4 show comparable performance to dual reporters, with only modest overshoot; even in the worst performing case of $\lambda =0.5$, $\mu =1$ the result of the pathway reporters is within one standard deviation, in a very tight distribution. The error heatmaps of Figure 5 provide an accurate estimate of the overshoot in the nascentprotein results in Table 4. As an example, the first row is most closely matched by the heatmap at top left of Figure 5, which at $\lambda =0.5$ and ${\delta}_{P}=0.1$ is suggestive of an error around the boundary between blue and red (around 0.06). The same accuracy is obtained for the other rows. As predicted, the matureprotein reporters show significantly more overshoot, especially with the less active genes. Improved accuracy can again be obtained by subtracting the estimated overshoot given in the error heatmaps from the obtained value. Thus for example, the error heatmap for $\mu =2$ (Figure 5 lower left) gives an error approximately 0.07 for $\lambda =1,{\delta}_{P}=0.1$, which agrees very closely to the actual overshoot of 0.07 shown in the corresponding row of Table 4. An overshoot of approximately 0.06 is suggested by the heatmap for $\mu =2$, when $\lambda =2,{\delta}_{P}=0.3$, which leads to a correction from 0.35 in Table 4 to a value of 0.29. This is quite consistent with the dual reporter benchmark of 0.27. As expected (based on Figure 5—figure supplement 1), nascentmature reporters do not perform well on bursty systems except for high λ and so the values are not included in Table 4; only in the case of $\lambda =\mu =10$ does the result begin to approach the dual reporter value, returning $0.32\pm 0.03$.
Generality of the pathway reporter method
To test the robustness of our pathway reporter approach, we validate our theoretical results on various other gene expression dynamics. (1): We begin by considering a more detailed model of the mRNA maturation process, where the nascent mRNA maturate after a fixed amount of time. The assumption of a fixed maturation time is sometimes taken to approximate the combined effect of intermediate maturation processes such as initiation, elongation, and release Xu et al., 2016. More specifically, we consider the model ${\mathbf{\mathbf{M}}}_{4}$ above (Figure 4D) in the case of constitutive expression ($\lambda =1$, $\mu =0$), and replacing the firstorder reaction ${K}_{M}$ by a fixed interval of time. Additionally, we explore maturation times sampled from Erlang distributions, to account for the fact that maturation can involve several shorter stochastic processes. We find that the extrinsic noise contribution obtained using the nascent and mature mRNA reporters match closely to the true (dual reporter) values across a range of maturation times; refer to the supplementary material for details.
(2): Next we consider an extension of a model of transcriptional bursting given in Bartman et al., 2019; Cao et al., 2020. The model we consider is the same as in Bartman et al., 2019; Cao et al., 2020, however, is extended to include protein synthesis and degradation. This model captures the transcriptional process at a finer level of detail, and is argued in Cao et al., 2020 to be the model most closely matching experimental observations. In this more nuanced ‘multiscale’ model, the gene stochastically switches between three states: two active states S_{10} and S_{11}, and one inactive state S_{0}. The activation of the gene occurs in two steps, initially by the binding of transcriptional factors (transition from S_{0} to S_{10} at rate ${\lambda}_{1}$, and reversible at rate ${\mu}_{1}$), and then as a secondary step, by the binding and pause of the mRNA polymerase (transition from S_{10} to S_{11} at rate ${\lambda}_{2}$). Transitions from S_{11} to S_{0} also occur at rate ${\mu}_{1}$, due to detachment of both the transcriptional factors and polymerase. Transcription of nascent mRNA (at rate ${K}_{N}$) occurs only in state S_{11} and results in immediate transition to state S_{10}. Nascent mRNA maturate at rate ${K}_{M}$, and are subsequently translated into protein at rate ${K}_{p}$. Degradation of mRNA and protein occur with rates ${\delta}_{m}$ and ${\delta}_{p}$, respectively. All reactions are considered to be firstorder with exponentially distributed waiting times between successive reactions. A schematic of the model is given in Figure 6 (inner rectangle). In this case, we are again able to prove that the Noise Decomposition Principle holds for all reporter pairs taken from this pathway using existing formulæ derived in Cao et al., 2020 for the marginal means. For details refer to the supplementary material (Appendix Convergence of Pathway and Dual Reporters).
(3): We combine models (1) and (2) above, incorporating the fixed time maturation of (1) with the multiscale approach of (2).
(4): The cell cycle is a major contributing factor to extrinsic noise, introducing population heterogeneity (as cells are at different stages of the cell cycle), as well as internal dynamics to parameter values. Here we incorporate the salient features of the cell cycle into model (3), which is measurable as extrinsic noise by our methodology. Specifically, we model the effects of (i) gene replication, (ii) dosage compensation, (iii) binomial partitioning of products due to cell division, as well as (iv) cellcycle length variability. Refer to Figure 6 for a schematic. This model is an extension of that solved in Cao and Grima, 2020. Our model further incorporates the multiscale dynamics of model (3) and the Erlangdistributed maturation times of model (1). As far as we are aware, this model has not been explored even by stochastic simulations before. A detailed description of how the above cellcycle effects are captured in our model is given as follows. (i) Replication results in a doubling of the gene from one to two at the replication time, t_{r}. This replication occurs over a period which is much shorter than the length of the cell cycle, and we follow the assumptions in Cao and Grima, 2020 by considering it to occur instantaneously. (ii) Dosage compensation changes the rate at which the gene switches from inactive to active (${\lambda}_{1}$) upon replication at time t_{r}. Again following Cao and Grima, 2020, the activation rate after replication is 70% of the rate prior to replication. (iii) Binomial partitioning of nascent mRNA, mature mRNA and protein at cell division. We assume that nascent mRNA, mature mRNA, and protein segregate into the two daughter cells, with independent probability $1/2$. We follow just one of the daughter cells, chosen at random with equal probability. (iv) Cellcycle length variability. The time between successive cell divisions is stochastic, and is assumed to be sampled from an Erlang distribution. This has been shown to be consistent with experimental data Cao and Grima, 2020. The time to replication, and subsequently to cell division, are both chosen from an Erlang distribution with shape parameter 12, which produces a total cell cycle length distributed according to $\mathrm{E}\mathrm{r}\mathrm{l}\mathrm{a}\mathrm{n}\mathrm{g}(24,\lambda )$; this matches the $\mathrm{E}\mathrm{r}\mathrm{l}\mathrm{a}\mathrm{n}\mathrm{g}(24,\lambda )$ cell cycle length in Cao and Grima, 2020, where replication is at the exact halfway point. We similarly choose a rate parameter $\lambda ={\lambda}_{1}$ chosen at a commensurate proportion to our mRNA decay rate of $\delta =1$.
In each of the cases (1 – 4) above, we find that the correlations between reporter pairs is negligible, and the predicted contribution of extrinsic noise matches those obtained from the dual reporter method across a range of parameter combinations. Details of the simulation methods and results can be found in the supplementary material (Appendix Convergence of Pathway and Dual Reporters). In summary, the results show that our pathway reporter approach is remarkably insensitive to the specific dynamics of mRNA and protein synthesis. In particular, the correlations between reporter pairs do not strongly depend on the details of the gene expression model used.
Discussion
Despite the proliferation of experimental methods for singlecell profiling, the ability to extract transcriptional dynamics from measured distributions of mRNA copy numbers is limited. In particular, the multiple factors that contribute to mRNA heterogeneity can confound the measured distribution, which hinders analysis. Theoretical innovations that allow us to quantify and help in identifying the causes of these observable effects are therefore of great importance. In this work, we have demonstrated, through a series of mathematical results, that it is impossible to delineate the relative sources of heterogeneity from the measured transcript abundance distribution alone: multiple possible dynamics can give rise to the same distribution. Our approach involves establishing integral representations for distributions that are commonly encountered in singlecell data analysis, such as the negative binomial distribution and the stationary probability distribution of the Telegraph model. We show that a number of wellknown representations can be obtained from our results. A particular feature of our nonidentifiability results is that population heterogeneity inflates the apparent burstiness of the system. It is therefore necessary to collect further information, beyond measurements of the transcripts alone, in order to constrain the number of possible theoretical models of gene activity that could represent the system. In particular, additional work may be required to determine the true level of burstiness of the underlying system.
We have developed a theoretical framework for estimating levels of extrinsic noise, which can assist in resolving nonidentifiability problems. The dual reporter method of Swain et al., 2002 already provides one such approach; but it is experimentally challenging to set up in many systems, and requires strictly identical and conditionally independent pairs of gene reporters. Our Noise Decomposition Principle directly generalises the theoretical underpinnings of the dual reporter method and related approaches Bowsher and Swain, 2012; Jetka et al., 2018; here we have used it to identify a practical approach—the pathwayreporter method—for obtaining an effective and experimentally more easily implementable noise decomposition. Our approach allows us to use measurements of two different species from the transcriptional pathway of a single gene copy instead of having to set up a more cumbersome dual reporter assay. The accuracy of the pathwayreporter method is provably identical for constitutive gene expression, and in the case of nascentmature mRNA reporters, the measurements are readily obtainable from current singlecell data Shah et al., 2018; La Manno et al., 2018; Skinner et al., 2016. For bursty systems, the method in general provides only an approximation of the extrinsic noise. We are, however, able to demonstrate computationally, that one of the proposed pathway reporters provides a satisfactory estimate of the extrinsic noise for most genes. The other pathway reporters also provide viable estimates of the extrinsic noise in some cases. We further validate our theoretical framework on synthetic data for genes with various underlying gene expression dynamics. Our results show that the pathway reporter method is independent of the specific dynamics of mRNA and protein synthesis, and therefore should be applicable to a broad range of experimental scenarios.
Despite the generality of our theoretical contribution, our pathwayreporter approach has some caveats. In particular, the approach relies on the assumption that extrinsic noise sources act independently. Experimentally, however, these may be correlated. For example, it has been suggested Hilfinger et al., 2016; Cole and LutheySchulten, 2017 that the transcription and translation rates in E. coli anticorrelate. Additional work is required to determine degree to which the independence of noise sources is a reasonable assumption.
Recent developments in singlecell profiling now allow simultaneous measurements of transcripts and proteins in thousands of single cells Stoeckius et al., 2017; Peterson et al., 2017; Reimegård et al., 2019. As discussed in Gorin and Pachter, 2020a, experimental improvements that would additionally allow measurements of nascent transcripts, coupled with theoretical developments such as those presented here, will allow for identification of noise sources on a genomewide scale. Our work reveals that extrinsic noise distorts the multivariate copy number distribution of the different species in the gene expression pathway. We have exploited this to yield reliable estimates of noise strength, which we are confident will assist in setting better practices for model fitting and inference in the analysis of singlecell data. A more nuanced analysis of this multivariate distribution may offer even further insight into model and noise identifiability.
Appendix 1
Derivations of the nonidentifiability results
In the main text we provide a number of examples of when the compound distribution does not have a unique representation. We here provide the full details of the derivations of these results.
Bursty expression: telegraph representation
Consider first a Telegraph distribution ${\stackrel{~}{p}}_{T}(n;\lambda ,{\mu}^{\prime},{K}^{\prime})$ and the probability density function of the scaled beta distribution ${\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}}_{{K}^{\mathrm{\prime}}}(\lambda +\mu ,{\mu}^{\mathrm{\prime}}\mu )$, given by
Note that this distribution has support $[0,{K}^{\prime}]$. In the main text, we claim that the Telegraph distribution ${\stackrel{~}{p}}_{T}(n;\lambda ,{\mu}^{\prime},{K}^{\prime})$ can be obtained from compounding a Telegraph distribution by a scaled beta distribution with pdf $f(t;\lambda +\mu ,{\mu}^{\prime}\mu )$. In other words, that ${\stackrel{~}{p}}_{T}(n;\lambda ,{\mu}^{\prime},{K}^{\prime})$ can be written as:
This is Equation (3) in the main text. Starting from the right hand side of (15), we have
Substituting $t={K}^{\prime}T$ and simplifying, the right hand side of (16) becomes
Now, using the integral representation Olver et al., 2010 [13.4.2] of the confluent hypergeometric function (with $a=\lambda +n$, $b=\lambda +{\mu}^{\prime}+1$ and $c=\lambda +\mu +1$), this becomes
which is the left hand side of (15). Hence, we have that (15) holds.
Instantaneously bursty expression: negative binomial representations
We consider first the representation given in Equation (4) in the main text. Recall that we let ${\stackrel{~}{p}}_{\text{NB}}(n;r,\beta )$ denote the probability mass function of a $\mathrm{N}\mathrm{e}\mathrm{g}\mathrm{B}\mathrm{i}\mathrm{n}(r,\frac{\beta}{\beta +1})$ distribution, where $\beta >0$. In the main text, we claim that for any negative binomial distribution of the form $\mathrm{N}\mathrm{e}\mathrm{g}\mathrm{B}\mathrm{i}\mathrm{n}(\lambda ,\frac{\beta}{\beta +1})$ (where $\beta >0$) we have,
where $f(t;\lambda +\mu ,\beta )$ is the probability mass function of a $\mathrm{G}\mathrm{a}\mathrm{m}\mathrm{m}\mathrm{a}(\lambda +\mu ,\beta )$ distribution. Beginning with the right hand side of (17) we have,
We now apply the following identity given in Saad and Hall, 2003 [Appendix 1] with $a=\lambda +n$, $b=\lambda +\mu +n$, $k=1$, $d=\lambda +\mu +n$ and $h=\beta $.
So the left hand side of (18) becomes
which is the right hand side of (17). Hence, we have that (17) holds. We now consider the representation given in Equation (5) of the main text. Here we claim that any negative binomial distribution of the form $\mathrm{N}\mathrm{e}\mathrm{g}\mathrm{B}\mathrm{i}\mathrm{n}({\lambda}^{\mathrm{\prime}},\frac{b}{1+b})$ (where $b>0$) can be written as,
where ${f}_{b}(b\theta 1;\lambda {\lambda}^{\prime},{\lambda}^{\prime})$ is the probability mass function of a scaled beta prime ${\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}\mathrm{P}\mathrm{r}\mathrm{i}\mathrm{m}\mathrm{e}}_{b}(\lambda {\lambda}^{\mathrm{\prime}},\lambda )$ distribution, where $b>0$ and $\lambda >{\lambda}^{\prime}$. Starting from the right hand side of (20), we have
Now substituting $H=b\theta 1$ and simplifying, we obtain
and letting $y=b+1$ and simplifying we have that
Here, we used the fact that the integral in the variable $y$ is the probability density function of a $\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}\mathrm{P}\mathrm{r}\mathrm{i}\mathrm{m}\mathrm{e}(\lambda {\lambda}^{\mathrm{\prime}},{\lambda}^{\mathrm{\prime}}+n)$ distribution. Thus, Equation (21) simplifies to
which is the right hand side of (20). Thus, we have that (20) holds.
Appendix 2
Proof of the Noise Decomposition Principle
Here we provide the proof of the Noise Decomposition Principle given in the main text. For convenience we restate the principle below.
The Noise Decomposition Principle (NDP)
Assume that there are measurable functions $f$, $g$ and $h$ such that $\mathrm{E}(X;\mathbf{Z})$ and $\mathrm{E}(Y;\mathbf{Z})$ split across common variables by way of $\mathrm{E}(X;\mathbf{Z})=f({\mathbf{Z}}_{X})h({\mathbf{Z}}^{\mathrm{\prime}})$ and $\mathrm{E}(Y;\mathbf{Z})=g({\mathbf{Z}}_{Y})h({\mathbf{Z}}^{\mathrm{\prime}})$. Then provided that the variables ${Z}_{1},\mathrm{\dots},{Z}_{m}$ are mutually independent, the normalised covariance of $\mathrm{E}(X;\mathbf{Z})$ and $\mathrm{E}(Y;\mathbf{Z})$ will identify the total noise on $h({\mathbf{Z}}^{\mathrm{\prime}})$ (i.e., ${\eta}_{h({Z}^{\prime})}^{2}$).
Consider first the covariance of $\mathrm{E}(X;\mathbf{Z})$ and $E(Y;\mathbf{\mathbf{Z}})$. We have
Note that here we used the fact that the variables in $\mathbf{Z}=({Z}_{1},\dots ,{Z}_{n})$ are mutually independent. Now using the fact that $\mathrm{E}(X)=\mathrm{E}(\mathrm{E}(X;\mathbf{Z}))$ and $\mathrm{E}(Y)=\mathrm{E}(\mathrm{E}(Y;\mathbf{Z}))$ (the Law of Total Expectation), and then normalising we obtain
Hence, under certain conditions the normalised covariance of $\mathrm{E}(X;\mathbf{Z})$ and $\mathrm{E}(Y;\mathbf{Z})$ will identify the total noise on $h({\mathbf{\mathbf{Z}}}^{\prime})$.
Appendix 3
Pathway reporters
Justification for simulating only one copy of the gene
Our simulations and theory have been based over reporters from a single gene copy, whereas in practice there may be multiple copies of the gene that contribute to the overall mRNA. If there are mechanisms in place to distinguish the mRNA or protein of one gene copy from another then the theory and analysis we have developed in main paper holds without change. In the case where it is not possible to distinguish mRNA and protein (respectively) from the multiple gene copies, then we now observe that the general theory continues to hold, provided there is independence between the gene copies; this assumption has been verified experimentally Skinner et al., 2016. Here, we demonstrate this in the case of two gene copies, though the general case for more than two genes is essentially identical but is notationally cumbersome. We are considering a situation where the variable $X$ in the Noise Decomposition Principle is the sum of two independent variables ${X}_{1},{X}_{2}$ and $Y$ is the sum of two independent variables ${Y}_{1},{Y}_{2}$. We assume common dependence on the environmental variables $\mathbf{\mathbf{Z}}$ so that $E({X}_{1};\mathbf{\mathbf{Z}})=E({X}_{2};\mathbf{\mathbf{Z}})$, $E({Y}_{1};\mathbf{\mathbf{Z}})=E({Y}_{2};\mathbf{\mathbf{Z}})$. Using these equalities and the independence of $X}_{1},{X}_{2$ and ${Y}_{1},{Y}_{2}$ in $X={X}_{1}+{X}_{2}$, $Y={Y}_{1}+{Y}_{2}$, we find the numerator of
is simply $4\mathrm{C}\mathrm{o}\mathrm{v}(\mathrm{E}({X}_{1};\mathbf{Z}),\mathrm{E}({Y}_{1};\mathbf{Z}))$, while the denominator is $4\mathrm{E}({X}_{1})\mathrm{E}({Y}_{1})$. Thus the noise decomposition coincides with that for the single copy $X}_{1},{Y}_{1$. Further work may be required to consider systems where there is independence between, or there is significant deviations in the gene copies.
Upper bound for the intrinsic contribution to the covariance: constitutive expression
In the main text, we claim that the error in the pathwayreporter approach in the case of mRNAprotein reporters is negligible (i.e. the error is $\ll 1$); refer to Equation (13) in the main text. We here provide full details of the derivation of this expression. First let ${X}_{m}$ and ${X}_{p}$ be the number of mRNA and protein produced from the same constitutive gene modelled by the 'twostage' model, ${\mathbf{\mathbf{M}}}_{2}$ (see Figure 4A (top right) of the main text). Also, let $\mathbf{Z}=\{{K}_{m},{\delta}_{m},{K}_{p},{\delta}_{p}\}$.
We restate here the expression for the intrinsic contribution to the covariance of ${X}_{m}$ and ${X}_{p}$ given as Equation (13) of the main text.
We require the following expressions for the stationary mean mRNA level and protein level of the twostage model Thattai and van Oudenaarden, 2001; Singh and Hespanha, 2007.
Assuming that ${\delta}_{m}$ is fixed across the cellpopulation, and all parameters are scaled so that ${\delta}_{m}=1$, it follows from (28), that
Using the Total Law of Expectation, we also have,
Thus, it follows that
Determination of the marginal means for model ${\mathbf{\mathbf{M}}}_{4}$
In order to establish the Noise Decomposition principle in the case of bursty gene expression, we rely on expressions for the marginal stationary means of the model ${\mathbf{\mathbf{M}}}_{4}$; see Figure 4A (bottom left) and the associated caption for more details. To derive the marginal means for nascent and mature mRNA and for protein (Equation (14) of the main text), we first observe that the nascent mRNA population may treated identically to that of mRNA in general (that is, no distinction between nascent and mature), as in Peccoud and Ycart, 1995, except that mRNA decay is replaced by the sum of decay and maturation. As in the work of Cao and Grima, 2020, the assumption of fast maturation allows us to ignore decay completely in the nascent phase, so that the marginal distribution is identical to that of Peccoud and Ycart, 1995, except with decay replaced by maturation. This leads to a marginal nascent mRNA mean of
The marginal means for mature mRNA and protein are derived in Raj et al., 2006 under the assumption that the transcription rate parameter ${K}_{N}$ is large relative to the other parameters. The expressions are given by
Formally, the marginal means in Raj et al., 2006 are for the threestage model ${\mathbf{\mathbf{M}}}_{3}$, which ignores the downstream processing of mRNA, such as splicing. The assumption of fast maturation however, justifies the treatment of the nascent phase of mRNA as an ephemeral step within the Poissonian modelling of mRNA transcription.
There are a number of possibly compounding assumptions on the parameters here, but simulations show that there is a lot of tolerance, with even only moderate maturation and transcription still returning sample means consistent with the formulas.
Measuring noise from an instantaneously bursty gene
Here we use the simple stochastic model of Singh and Bokes, 2012 that includes both instantaneous transcriptional bursting and mRNA maturation. This model can be obtained as a limiting case of the model ${\mathbf{\mathbf{M}}}_{4}$ (Figure 4A (top left) in the main text), where the offrate μ has tended toward infinity, while the onrate λ remains fixed. Under this condition, transcription is rare enough to be considered instantaneous, leading to ‘bursts’ of transcriptional activity. The intensity of the bursts $M$ is known to be geometrically distributed with mean burst size $B=\frac{{K}_{N}}{\mu}$.
If we let ${X}_{N}$ and ${X}_{M}$ denote the number of nascent and mature mRNA, respectively, then according to Singh and Bokes, 2012, the steadystate marginal means and covariance are given by
It is easily seen from (32) that $\mathrm{E}({X}_{N};\mathbf{Z})=f({\mathbf{Z}}_{N})B\lambda$ and $\mathrm{E}({X}_{M};\mathbf{Z})=g({\mathbf{Z}}_{M})B\lambda$, where $f({\mathbf{Z}}_{N})=\frac{1}{{K}_{M}}$ and $g({\mathbf{Z}}_{M})=\frac{1}{{\delta}_{M}}$. Thus, the Noise Decomposition Principle holds. Using arguments similar to those given in the above section, it is straightforward to derive the following expression for the error in the pathway reporter approach:
We can expect that $\alpha \approx 1$, so that unless the burst frequency is substantial, that is $\lambda \gg 1$, the overshoot in the the nascentmature pathwayreporter approach is significant.
Appendix 4
Convergence of pathway and dual reporters
Convergence of both pathway reporters and dual reporters in low output genes is slower than for high output genes, and this affect is most pronounced for reporters taken from nascent mRNA. Fig. Convergence of Pathway and Dual Reporters compares convergence in some low output genes and high output genes for nascentmature pathway reporters and the maturemature dual reporter, in the case of constitutive expression. From Equation (11) of the main text and the NDP, these should measure precisely the extrinsic noise on the transcription rate parameter ${K}_{N}$. In the low output gene, both the dual reporter and pathway reporters are yet to show accurate measurement of the overall extrinsic noise after 1000 samples (Figure Convergence of Pathway and Dual ReportersA), although they are providing rough estimates after as little as a few hundred samples. Pathway reporters for nascentmature typically exhibit slightly slower convergence than maturemature dual reporter; the difference is marginal, but can be seen for both the high output gene (compare the values after 500 samples) and the low output gene (compare the values after 1000 samples). In this figure, all parameters (except the reference parameter ${\delta}_{M}$) were given scaled Beta distribution noise. The noise on ${K}_{N}$ is a $\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}(3,6)$ distribution scaled to achieve $E({K}_{N})=5$ in the low output case and $E({K}_{N})=50$ in the high output case. The squared coefficient of variation is 0.2, and is shown in green in the figure.
Convergence for the matureprotein reporters is considered in Figure Convergence of Pathway and Dual Reporters. Here, Equation (13) of the main text shows that we should expect an overshoot in comparison to the maturemature dual reporter. We have again compared a low output gene with a high output gene, and with all parameters (except ${\delta}_{M}$) experiencing noise. The transcription is again given scaled $\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}(3,6)$ distribution, having squared coefficient of variation 0.2, with scaling to achieve $E({K}_{N})=5$ in low output and $E({K}_{N})=50$ in the high output. The noise distribution for ${\delta}_{P}$ is a $\mathrm{B}\mathrm{e}\mathrm{t}\mathrm{a}(8,6)$ distribution, scaled to achieve a mean value of 0.2. Computational sampling from 10^{7} samples finds the value of $\frac{\mathrm{E}(1/({\delta}_{p}+1))}{\mathrm{E}(1/{\delta}_{p})}$ as approximately 0.1573. The high output gene then expects an overshoot of 0.003146 from the matureprotein reporters, while the low output gene expects an overshoot of approximately 0.03146. Figure Convergence of Pathway and Dual ReportersA shows comparison of convergence over the first 2000 samples, with the theoretical values accommodating the calculated overshoot. Figure Convergence of Pathway and Dual ReportersB shows the same over 20,000 samples; this time two lowoutput two highoutput genes are considered, so that the variation around the expected long term value can be seen. It is evident that reasonable estimates are given after a relatively modest number of samples, but there is a very long delay to a highly accurate convergence for the pathway reporters in the low output gene.
Appendix 5
Generality of the pathway reporter method: model and simulation details
We verify our pathway reporter approach on synthetic data for genes with various underlying gene expression dynamics. In the main text, we consider four different scenarios, beyond the standard fourstage model of gene transcription. Below we give details of these models, and their method of simulation.
(1): First, we consider a more detailed model of the mRNA maturation process, where the nascent mRNA maturate after a more finely controlled time interval. Specifically we adjust the constitutive model of gene transcription (${\mathbf{\mathbf{M}}}_{4}$ of the main text with $\lambda =1$ and $\mu =0$) to include a fixedduration of maturation. We also explore timeintervals sampled from Erlang distributions, to account for the fact that maturation can involve several shorter stochastic processes. The process is simulated using an adaptation of SSA, whereby the time of maturation of an individual nascent mRNA is calculated (by sample from the chosen distribution) at the point it is transcribed, and this timepoint is queued in a register of maturation times; this can be trivially adapted to other distributions; generalisedErlang for example. At each iteration of the simulation, determination of the next event involves taking the smallest of the various exponentially sampled Markov processes (on, off, mRNA transcription, etc) and the next maturation event (the minimum of the maturation register). In the case that maturation is the next event, the corresponding maturation time is removed from the maturation register.
(2): We next consider a model of transcriptional bursting given in Bartman et al., 2019; Cao et al., 2020. In this ‘multiscale’ model, gene activation occurs in two steps, initially by the binding of transcription factors, and then as a secondary step by the binding and pause of the mRNA polymerase. For details of the model refer to Figure 6 and the surrounding paragraphs in the main text. We here show that our Noise Decomposition Principle holds for all reporters taken from this gene pathway, and verify computationally that correlations between some of the reporter pairs are negligible; the process is simulated by the SSA. In the following, we assume all reactions are firstorder, characterised by exponential waiting times between successive reactions. Let ${X}_{N}$ denote the number of nascent mRNA, let ${X}_{M}$ denote the number of mature mRNA, and let ${X}_{P}$ denote the number of proteins produced from the same gene. Also let $\mathbf{\mathbf{Z}}=\{{\lambda}_{1},{\mu}_{1},{\lambda}_{2},{K}_{N},{K}_{M},{K}_{P},{\delta}_{M},{\delta}_{P}\}$. We assume that the maturation rate ${K}_{M}$ is large (i.e. ${K}_{M}>{\delta}_{M}$), which is supported by experiments Cao and Grima, 2020. Then using the results of Cao et al., 2020, the stationary averages for the nascent mRNA, mature mRNA and protein levels are given by,
respectively. Here $\gamma}_{1}={\lambda}_{1}+{\mu}_{1$ and ${\gamma}_{2}={K}_{N}+{\lambda}_{2}+{\mu}_{1}$. We begin by considering the nascentmature pathway reporters. From (34), it is easily seen that $\mathrm{E}({X}_{N};\mathbf{Z})=f({\mathbf{Z}}_{N}){K}_{N}\frac{{\lambda}_{1}{\lambda}_{2}}{{\gamma}_{1}{\gamma}_{2}}$ and $\mathrm{E}({X}_{M};\mathbf{Z})=g({\mathbf{Z}}_{M}){K}_{N}\frac{{\lambda}_{1}{\lambda}_{2}}{{\gamma}_{1}{\gamma}_{2}}$, where $f({\mathbf{Z}}_{N})=\frac{1}{{K}_{M}}$ and $g({\mathbf{\mathbf{Z}}}_{M})=\frac{1}{{\delta}_{M}}$. So the NDP holds, and the normalised covariance of $\mathrm{E}({X}_{N};\mathbf{Z})$ and $\mathrm{E}({X}_{M};\mathbf{Z})$ will identify the extrinsic noise on the transcriptional component ${K}_{N}\frac{{\lambda}_{1}{\lambda}_{2}}{{\gamma}_{1}{\gamma}_{2}}$. For the matureprotein reporters, we can again see from (34) that $\mathrm{E}({X}_{M};\mathbf{Z})=f({\mathbf{Z}}_{P})\mathrm{E}(X;\mathbf{Z})$, where $f({\mathbf{Z}}_{P})=\frac{{K}_{P}}{{\delta}_{P}}$. Thus the NDP holds, and so the normalised covariance of $\mathrm{E}({X}_{M};\mathbf{Z})$ and $\mathrm{E}({X}_{P};\mathbf{Z})$ will identify the strength of extrinsic noise in the mRNA copy number distribution. For the nascentprotein reporters, it is easy to see that $\mathrm{E}({X}_{N};\mathbf{Z})=f({\mathbf{Z}}_{N}){K}_{N}\frac{{\lambda}_{1}{\lambda}_{2}}{{\gamma}_{1}{\gamma}_{2}}$, where $f({\mathbf{\mathbf{Z}}}_{N})=\frac{1}{{K}_{M}}$, and $\mathrm{E}({X}_{P};\mathbf{Z})=g({\mathbf{Z}}_{P}){K}_{N}\frac{{\lambda}_{1}{\lambda}_{2}}{{\gamma}_{1}{\gamma}_{2}}$, where $g({\mathbf{\mathbf{Z}}}_{P})=\frac{{K}_{P}}{{\delta}_{M}{\delta}_{P}}$. Thus, again the NDP holds, and the normalised covariance of $\mathrm{E}({X}_{N};\mathbf{Z})$ and $\mathrm{E}({X}_{P};\mathbf{Z})$ will identify the noise on the transcriptional component $K}_{N}\frac{{\lambda}_{1}{\lambda}_{2}}{{\gamma}_{1}{\gamma}_{2}$.
(3): We combine models (1) and (2) above, incorporating the fixed time maturation of (1) with the finelevel transcriptional approach of (2). Again, we employ the adaptation of the SSA described in (1).
(4): We incorporate the salient features of the cellcycle into model (3), as well as introducing Erlangdistributed maturation times. Specifically, the model accounts for the effects of gene replication, dosage compensation, binomial partitioning of products due to cell division, as well as cellcycle length variability. Within each phase of the cell cycle, the handling of maturation times for a given gene copy is exactly as in Case (1) above, but within the overall system described in Case (3). To capture the cell cycle, the SSA is now further modified to handle the two key sporadic changes: gene replication and cell division. After gene replication, we perform two independent modified SSA simulations, with common parameter values. The final copy numbers are obtained as a sum of those from each gene copy. At cell division, we follow just one daughter cell, selecting the inherited copy numbers by way of binomial partitioning from the mother cell values at the point of division (including the maturation register). The length of these cellcycle phases is sampled from the chosen Erlang distribution, on commencement of the simulation of the phase.
The results of pathway reporters for each of the Cases (1 – 4) above are given in Appendix 5—tables 1–8 below. In all cases, the values given are the average of 20 simulations, each calculated from 500 copy number samples, and the errors are ± one standard deviation. For each model, we consider two parameter sets, each without extrinsic noise (Subcase A) and with extrinsic noise (Subcase B). For Case (1), the model parameters are chosen to produce an average nascent mRNA copy number of 10, and an average number of 2000 proteins; the average mature mRNA copy number varies according to the maturation time. We find that all of the pathway reporters accurately predict the true extrinsic noise levels (as given by dual reporters) across a range of maturation times; refer to Appendix 5—table 1, 2. For Cases (2 – 4), the model parameters are chosen to produce an average nascent mRNA copy number of 5 and an average mature mRNA copy number of 100, and an average number of 2000 proteins. We verify computationally that the matureprotein and nascentprotein reporter pairs provide accurate predictions of extrinsic noise contributions across a range of noise levels. For these results refer to Appendix 5—tables 3–8. We mention that the results of Case (4)B (Appendix 5—tables 8) may suggest a slight undershoot in the pathway reporter values in comparison to dual reporters. The difference is, however, within one standard deviation, and are calculated from a relatively small sample size due to the complexity of the simulation and corresponding run time. Investigating the presence and possible causes of this marginal effect for this model and other complicated models may be of interest in further work.
Data availability
All methods and simulation results are shared via a github site https://github.com/leham/PathwayReporters (copy archived at https://archive.softwareheritage.org/swh:1:rev:269e0fffe4fc716db6991ccf78ad2191e509c2e1). There is no original data associated with this manuscript.
References

BookHandbook of Mathematical Functions with Formulas, Graphs, and Mathematical TablesMassachusetts, United States: Courier Corporation.

Bursty gene expression in the intact mammalian liverMolecular Cell 58:147–156.https://doi.org/10.1016/j.molcel.2015.01.027

Exact and approximate distributions of protein and mRNA levels in the lowcopy regime of gene expressionJournal of Mathematical Biology 64:829–854.https://doi.org/10.1007/s0028501104335

A stochastic model of gene expression with polymerase recruitment and pause releaseBiophysical Journal 119:1002–1014.https://doi.org/10.1016/j.bpj.2020.07.020

Stochastic gene expression in a single cellScience 297:1183–1186.https://doi.org/10.1126/science.1070919

On a general class of "Contagious" DistributionsThe Annals of Mathematical Statistics 14:389–400.https://doi.org/10.1214/aoms/1177731359

NoncodingRNA regulators of RNA polymerase II transcriptionNature Reviews. Molecular Cell Biology 7:612–616.https://doi.org/10.1038/nrm1946

Special function methods for bursty models of transcriptionPhysical Review. E 102:022409.https://doi.org/10.1103/PhysRevE.102.022409

Extrinsic noise and HeavyTailed laws in gene expressionPhysical Review Letters 124:108101.https://doi.org/10.1103/PhysRevLett.124.108101

Exactly solvable models of stochastic gene expressionThe Journal of Chemical Physics 152:144106.https://doi.org/10.1063/1.5143540

Nonidentifiability of the source of intrinsic noise in gene expression from singleburst dataPLOS Computational Biology 4:e1000192.https://doi.org/10.1371/journal.pcbi.1000192

Solving the chemical master equation for monomolecular reaction systems analyticallyJournal of Mathematical Biology 54:1–26.https://doi.org/10.1007/s002850060034x

Small protein number effects in stochastic models of autoregulated bursty gene expressionThe Journal of Chemical Physics 152:084115.https://doi.org/10.1063/1.5144578

Bursting onto the scene? exploring stochastic mRNA production in bacteriaCurrent Opinion in Microbiology 45:124–130.https://doi.org/10.1016/j.mib.2018.04.001

Stochastic gene expression: modeling, analysis, and identificationIFAC Proceedings Volumes 42:1022–1028.https://doi.org/10.3182/200907063FR2004.00170

A stochastic model for gene inductionJournal of Theoretical Biology 153:181–194.https://doi.org/10.1016/s00225193(05)804217

Disentangling intrinsic and extrinsic gene expression noise in growing cellsPhysical Review Letters 126:078101.https://doi.org/10.1103/PhysRevLett.126.078101

Listening to the noise: random fluctuations reveal gene network parametersMolecular Systems Biology 5:318.https://doi.org/10.1038/msb.2009.75

BookNIST Handbook of Mathematical FunctionsCambridge, United Kingdom: Cambridge University Press.

Markovian modeling of GeneProduct synthesisTheoretical Population Biology 48:222–234.https://doi.org/10.1006/tpbi.1995.1027

Effects of cell cycle variability on lineage and population measurements of messenger RNA abundanceJournal of the Royal Society Interface 17:20200360.https://doi.org/10.1098/rsif.2020.0360

Multiplexed quantification of proteins and transcripts in single cellsNature Biotechnology 35:936–939.https://doi.org/10.1038/nbt.3973

Uncoupling gene expression noise along the central dogma using genome engineered human cell linesNucleic Acids Research 48:9406–9413.https://doi.org/10.1093/nar/gkaa668

Gene regulation at the singlecell levelScience 307:1962–1965.https://doi.org/10.1126/science.1106914

Integrals containing confluent hypergeometric functions with applications to perturbed singular potentialsJournal of Physics A: Mathematical and General 36:7771–7788.https://doi.org/10.1088/03054470/36/28/307

The stochastic nature of biochemical networksCurrent Opinion in Biotechnology 19:369–374.https://doi.org/10.1016/j.copbio.2008.06.011

Consequences of mRNA transport on stochastic variability in protein levelsBiophysical Journal 103:1087–1096.https://doi.org/10.1016/j.bpj.2012.07.015

ConferenceStochastic analysis of gene regulatory networks using moment closure2007 American Control Conference.

General properties of transcriptional time series in Escherichia coliNature Genetics 43:554–560.https://doi.org/10.1038/ng.821

Stochasticity of gene products from transcriptional pulsingPhysical Review. E, Statistical, Nonlinear, and Soft Matter Physics 79:031911.https://doi.org/10.1103/PhysRevE.79.031911

Simultaneous epitope and transcriptome measurement in single cellsNature Methods 14:865–868.https://doi.org/10.1038/nmeth.4380

Stochastic kinetics of nascent RNAPhysical Review Letters 117:128101.https://doi.org/10.1103/PhysRevLett.117.128101

Cellcycle dependence of transcription dominates noise in gene expressionPLOS Computational Biology 9:e1003161.https://doi.org/10.1371/journal.pcbi.1003161
Decision letter

Ramon GrimaReviewing Editor

Patricia J WittkoppSenior Editor; University of Michigan, United States

Ramon GrimaReviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
The revisions have substantially improved the manuscript. In particular the authors have clarified the connection of the present work to previous work and shown that their method's accuracy in identifying noise sources does not depend on the details of the gene expression model used. The novel method is based on multiple generic reporters from the same biochemical pathways. Since now it is possible to do simultaneous measurements of transcripts and proteins in single cells, this method offers a viable alternative to the standard dual reporter method, as a means to infer the magnitudes of intrinsic and extrinsic transcriptional noise.
Decision letter after peer review:
[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]
Thank you for submitting the paper "Pathway dynamics can delineate the sources of transcriptional noise in gene expression" for consideration at eLife. Your article has been reviewed by 3 reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. Although the work is of interest, we are not convinced that the findings presented have the potential significance that we require for publication in eLife.
Specifically, some of the main critical comments overlapping between one or more reviews are: the need for experimental confirmation of the approach, major issues with the model assumptions and potential overlap with previously published works. Several recommendations have been made by the referees and we hope you find these useful. However it appears that the changes needed are considerable and the paper would need to be rewritten, hence the decision to reject the paper in its current form.
Reviewer #1:
Ham et al. present a manuscript that attempts to answer two main questions:
(i) Is it possible to identify the relative sources of population heterogeneity from
the measured transcript abundance distribution alone?
(ii) Can one develop a method that can reliably estimate the strength of intrinsic and extrinsic noise and that does not require identical and independent pairs of
gene reporters?
In the context of this paper, extrinsic noise is assumed to be variations in parameters across cells; these variations are static and do not depend on time. The authors then proceed to answer question (i) by writing the observed distribution of transcripts as a compound distribution given by Equation 2 which takes as input the distribution of the telegraph model (that is analytically known in closedform) – the underlying dynamics – and the distribution of parameters across a cellular population – the population heterogeneity. As summarized in Table I, they find that there are various different choices of these 2 distributions that can lead to the same compound (observed) distribution, thus highlighting the issue of nonidentifiability if one only has available transcription abundance distributions. I note that the issue of nonidentifiability is already known and described in the literature, however the authors identify some cases that were previously unknown. This framework is useful and the results are interesting; however the present formalism does not take into account some of the most important and ubiquitous sources of noise, namely those due to cell division (binomial partitioning of products upon division) and the cellcycle (replication and cellcycle length variability) because the authors use the conventional telegraph model that does not account for these phenomena – see Ref 38 where it is shown that these sources of noise can easily mimic that due to transcriptional bursting.
In the second part of the paper, the authors address question (ii). The dual reporter method of Swain et al. (Ref 4) is the standard method to decompose noise into its intrinsic and extrinsic contributions by using two independent reporter genes integrated into the same cell. The assumption that the reporters have identical dynamics is a strong one and presents difficulties in interpreting the results. Ham et al. present an alternative method which does not need dual reporters, rather makes use of 2 pieces of data (for e.g. mRNA and protein) belonging to the same biochemical pathway. The result is based on the decomposition of the covariance of any two variables according to the law of total covariance. This "pathwayreporter method" is shown to be accurate provided there is a small correlation between the reporter pairs – this is a strong assumption, in my view, and it is difficult to make a strong case for it generally. Also when carrying out verification of the method using synthetic data, the correlations between reporter pairs may strongly depend on the details of the model used. For e.g. nascent mRNA maturation to mRNA is in the present paper modeled via a onestep firstorder process but more biologically faithful models (see for example PMID: 27667861 and 33976195) model this via a reaction step with a fixed time delay (modelling elongation + termination). The correlations calculated using the latter are likely stronger than the former because the former assumes exponentially distributed times. My main concern however is that the method presented is similar to another one described in the paper PMID: 22529351 which also uses the law of total covariance to understand how the variation in a species is determined by other species in the same pathway. There is also no comparison of the present method with another common one whereby the extrinsic noise is estimated as the part of the expression for the coefficient of variation that is not dependent on the mean molecule numbers; see for e.g. Ref 38 and Ref. 55 (Figure 2B).
Reviewer #1:
I found this paper very interesting to read and I think there is ample scope for the development of new methods in this field. Hence I am generally supportive of this paper.
My main concerns that I would like the authors to address, in order of importance, are:
(i) Novelty – is their method different than PMID: 22529351? They seem to be based on similar mathematical formalisms.
(ii) Nascent mRNA is more faithfully modeled using a delay step rather than a firstorder step. Using the latter is approximative at best but will certainly decrease the correlations and maybe making your method perform pathwayreporter better than it would otherwise. The issue might be that the inclusion of such a delay step will lead to a nonMarkovian model in which case I am not sure that the same mathematical steps follow
(iii) Cellcycle and cell division effects are strong sources of noise and it would be ideal that these are at least discussed extensively; even better if they can be integrated into a modified telegraph model and then one can ask the same questions about what one can identify from a compound distribution based on Equation 2.
(iv) Discussion of their method vs the other method of obtaining the extrinsic noise from plots of the coefficient of variation squared versus molecule numbers.
(iv) The manuscript should be checked carefully as I found a number of grammatical mistakes scattered throughout.
Reviewer #2:
The paper deals, theoretically, with the question of decomposing noise into intrinsic and extrinsic components. This has previously been done primarily using the dual reporter method. Here, the authors suggest alternative methods that could in principle bypass the need for using two reporters.
I do not find the results very relevant from the biological perspective. The authors only verify their methods on stochastic simulations, where the model assumptions are specified by hand and thus fit perfectly into their theoretical framework. But how the proposed protocols would perform on experimental data remains unclear. The authors also seem unaware of recent work where essentially the same problem is addressed (Lin and Amir, PRL 2021) – inferring extrinsic vs. intrinsic noise from data – and a method not relying on dual reporters is tested on two existing experimental datasets. Related to the above, it was unclear to me precisely what sort of data would be needed in practice to implement their pathwayreporter method – and whether such data are currently available.
Another example of the potential discrepancy between mathematics and working with actual data is the authors' discussion of the identifiability problem. It seems to me that in reality distributions are always subject to noise (e.g. due to sampling errors) and it is unclear to me whether the sort of rigorous analysis they perform in the paper (that shows whether or not identifiability is possible or not in principle, assuming a perfectly measured distribution) is relevant at all to real data. (e.g.. even if two distributions are mathematically different, would it be possible to distinguish them in practice?)
It seems to me that the paper in its current format is a much better fit for specialized journals in biophysics/mathematics such as the biophysical journal or physical review.
Reviewer #3:
It is now wellknown that singlecell expression data exhibits significant celltocell heterogeneity, which can stem from both "extrinsic" factors (e.g. enzyme concentrations, energy content, cellular environment etc.) or "intrinsic" noise (e.g. firing of reactions). Ham et al. rigorously show how extrinsic factors can cause nonidentifiability of a model from single timepoint geneexpression data, and also lead to incorrect conclusions about the underlying process if they are excluded from the model. Then they propose a novel pathway reporter (PR) approach for quantifying the contributions of the extrinsic and the intrinsic factors to the overall celltocell heterogeneity. These results are illustrated with examples based on the telegraph geneexpression model which includes both nascent and mature mRNA. While the developed results are interesting and mathematically accurate, they come with many caveats, and they do not present a significant advance over existing works on this topic. My reasons for this assessment are mentioned below:
1. Nonidentifiability results: The observations regarding nonidentifiability of the dynamic parameters from the compound distribution model are not surprising and even though specific examples in Table 1 are nicely worked out, I do not see how they contribute substantially to the existing knowledge on the subject. Essentially nonidentifiability will hold for nearly all choices of tilde{p} and f, and identifiability, if it exists, is an exception rather than the norm. This is consistent with what the authors observed in the simulation study outlined at the end of page 10.
2. The Pathway Reporter (PR) scheme: The PR scheme is presented as an alternative to the dual reporter (DR) scheme. However PR also suffers from issues similar to the DR scheme:
– DR assumes conditional independence of reporters. Likewise, PR also requires that such an independence (at least at the level of covariance) in order to argue that the intrinsic noise contribution is zero in the reporter covariance. The paper establishes this approximately for certain parameter regimes by using analytical expressions for the steadystate covariances. However generally such neat expressions are not readily available. It is unclear how one can check that this expected covariance is small. Of course if a fully identified model is available, then one can check it with simulations, but then one can simply compute the extrinsic noise directly.
– Another critical assumption of the PR scheme is that the conditional expectation for the reporters given the extrinsic factors Z, should nicely separate into a function of the common parameters and a function of independent parameters. Even though this works for the specific linear geneexpression model considered in the paper, it is unclear if this would work more generally. Moreover having such a splitting is (up to an additional scaling factor) is not much more general than having identical conditional expectations, as needed by the DR approach.
3. Applicability to experimental studies: As the authors mention in the paper, it is difficult to experimentally construct reporters satisfying the conditions of the DR scheme. However the paper does not discuss how for unknown experimental systems one can identify/construct reporters that satisfy the conditions of the PR scheme. In light of my previous comment, this seems to be as challenging as the DR approach, if one does not have a wellcharacterized mathematical model.
4. Lack of a meaningful connection: There is very little connection between the two parts of the paper (nonidentifiability and PR scheme). A more substantial connection is expected, as in the abstract it is said that: "Here we mathematically formalize this nonidentifiability; but we also use this to identify how new experimental setups coupled to statistical noise decomposition can resolve this nonidentifiability."
An example showing how noise decomposition helped turn a nonidentifiable system into an identifiable one is lacking. Moreover, it is unclear how quantification of the extrinsic noise enables resolution of the challenging nonidentifiability issue in the compound model.
5. All the results are presented for a specific geneexpression model and its derivatives. As the paper claims to provide a general approach for analyzing noise sources, more examples need to be provided, for a clearer evaluation of the PR method and comparison with the dual reporter method.
6. The paper only considers single timepoint snapshot data measured at steadystate. What happens if multiple timepoints, including the transience, is also taken into account. Certainly model identifiability would improve and the PR scheme should still work. The authors should consider such examples in the paper.
7. The description of the dual reporter method is misleading – at several places the paper says that the dual reporter method requires "independent genereporters". However only conditional independence is required by the method. If there is full independence, then the covariance (extrinsic noise) would always be zero.
8. On page 21, first paragraph it is mentioned that "For nascentprotein reporters, the normalized intrinsic contribution to the covariance is satisfactorily small (less than 0.05) for (a) high values of δp in unison with (b) low values of λ (less than 1, though lower values are acceptable if δp is small)". However based on the discussion prior/post this statement and also Figure 5, it seems that λ values should be high (not low!) for the normalized intrinsic contribution to the covariance to be small.
https://doi.org/10.7554/eLife.69324.sa1Author response
[Editors’ note: The authors appealed the original decision. What follows is the authors’ response to the first round of review.]
Reviewer #1:
[…] I found this paper very interesting to read and I think there is ample scope for the development of new methods in this field. Hence I am generally supportive of this paper.
My main concerns that I would like the authors to address, in order of importance, are:
(i) Novelty – is their method different than PMID: 22529351? They seem to be based on similar mathematical formalisms.
These are based on similar mathematical formalisms, but beyond this there is no appreciable overlap. To clarify, the primary similarity is that both papers involve an analysis of the covariance between reporters. In our submission, this analysis is between two reporters in the same pathway. In the article PMID: 2259351, this is between reporters in different genes. While it is true that PMID: 2259351 considers dual reporters (not pathway reporters) from multiple levels of a system, we emphasise that this is not the same as an analysis of covariance between single reporters from multiple levels of a system, as we do. There is certainly relevance of the PMID: 2259351 method in our broader discussion of noise decomposition, and we have now included this in the revised manuscript.
(ii) Nascent mRNA is more faithfully modeled using a delay step rather than a firstorder step. Using the latter is approximative at best but will certainly decrease the correlations and maybe making your method perform pathwayreporter better than it would otherwise. The issue might be that the inclusion of such a delay step will lead to a nonMarkovian model in which case I am not sure that the same mathematical steps follow
While we understand that this is not the primary concern of the referee, we agree that it is a point deserving consideration. We would first like to address the validity of the model for nascent mRNA maturation used in the submitted manuscript. This model is standard in the literature and the article PMID: 27667861 states that the model we use is a “reasonable interpretation of the release kinetics" of the maturation process. PMID: 27667861 also notes that the release kinetics are currently not wellunderstood but that the nascent mRNA distribution is insensitive to whether the maturation time is exponentially distributed or is modelled as a deterministic time delay, provided that the transcription rate is significantly greater than the switching (on/off rates) rates; this assumption is generally the case experimentally (refer to Section 14.3 and Figure S3 of the Supplementary Material). Our expectation then is that the nascent/maturemRNA pathway reporter scheme will produce similar results irrespective of the details of the model used.
In order to substantiate this intuition, we have now tested the performance of our method for an alternative model of maturation, where the nascent mRNA maturate after a fixed amount of time. We also explore maturation times sampled from Erlang distributions to account for the fact that maturation can involve several shorter stochastic processes. As expected, we find that pathway reporters continue to correctly identify the extrinsic noise contribution across a range of maturation times; a discussion of these results have now been added to the revised manuscript (see the new section titled “Generality of the Pathway Reporter method"). We mention, in addition, that the issue does not arise for the maturemRNA/protein (or the nascentmRNA/protein) pathway reporter method in our article, which for at least some reasonable parameter ranges, is sufficient to provide good estimates of extrinsic noise.
Furthermore, we have now additionally considered various other gene expression dynamics, including a nuanced model of transcriptional bursting (capturing polymerase recruitment and pause release), as well as models that account for cellcycle effects such as gene replication, dosage compensation, binomial partitioning due to cell division, and cellcycle length variability. In all cases, the pathway reporter method is able to accurately identify noise sources, showing that our approach does not depend on the precise dynamics of mRNA and protein synthesis. In particular, we find that the correlations between pathway reporter pairs do not depend on the details of the gene expression model used. A new section that encompasses these results, as well as addressing the generality of the pathway reporter method has been added to the revised manuscript; please refer to the new section titled “Generality of the Pathway Reporter method" and the supplementary material. Together our results continue to suggest that our method will serve as a useful tool for gene expression analysis.
(iii) Cellcycle and cell division effects are strong sources of noise and it would be ideal that these are at least discussed extensively; even better if they can be integrated into a modified telegraph model and then one can ask the same questions about what one can identify from a compound distribution based on Equation 2.
Cellcycle and cell division/replication effects do indeed have a significant impact on gene expression variability. Cellcycle effects arise predominantly from differences in the transcription rate across the cell population, and can be modelled already by Equation 2 under reasonable assumptions, without modification to the underlying telegraph model. A more explicit treatment of the mechanisms and changes along the cell cycle are challenging to study analytically (see Reference 38 for example), and theoretical models to account for such details are only starting to emerge. However, we agree that an interesting future direction would be to address how taking into account such details in the context of Equation 2 may also affect the inference of biochemical parameters. At this stage, we feel that a more detailed discussion of these effects should be included, and we have now amended the manuscript to reflect this; please refer to the final paragraph of the section titled “The Compound Distribution".
(iv) Discussion of their method vs the other method of obtaining the extrinsic noise from plots of the coefficient of variation squared versus molecule numbers.
The method of obtaining extrinsic noise from such plots is mentioned already in the introduction of the submitted manuscript. Recent work of two of the authors proves that this method is subject to nonidentifiability issues (at least from measurements of transcript abundance); see Ham et al., Phys. Rev. Lett, 2020 (Reference 12 of the submitted manuscript). There it is demonstrated that the coefficient of variation squared as a function of copy numbers cannot distinguish the presence of extrinsic noise.
(iv) The manuscript should be checked carefully as I found a number of grammatical mistakes scattered throughout.
The manuscript has been thoroughly checked, and all grammatical mistakes that we can find have been amended.
Reviewer #2:
The paper deals, theoretically, with the question of decomposing noise into intrinsic and extrinsic components. This has previously been done primarily using the dual reporter method. Here, the authors suggest alternative methods that could in principle bypass the need for using two reporters.
I do not find the results very relevant from the biological perspective. The authors only verify their methods on stochastic simulations, where the model assumptions are specified by hand and thus fit perfectly into their theoretical framework. But how the proposed protocols would perform on experimental data remains unclear.
This is a very valid observation in general concerning the translation from mathematical models to practice. The article considers already some existing databases of transcriptomic data, finding that around 90% of the genes lie within the parameter range for reasonable estimates via our method (and certainly identification of the presence of extrinsic noise), with 70% in the range for accurate measurement. We also give discussion of newly emerging experimental approaches required for the most widely applicable of our pathway combinations; these are in the final discussion of the article.
While for some models we have been able to mathematically prove that our method applies, we have now considered the robustness of our pathway reporter method on more nuanced models of gene transcription. More specifically, we consider models incorporating complex maturation processes (at the response of Referee #1), nuanced models of transcriptional bursting (capturing polymerase recruitment and pause release), as well as incorporating the effects of the cell cycle such as gene replication, dosage compensation, binomial partitioning of products due to cell division and cellcycle length variability. In all cases, the pathway reporter method is able to accurately identify noise sources, showing that our approach does not depend on the precise dynamics of mRNA and protein synthesis. We hope this will help in assuring the referee that the method does not significantly depend on the precise details of the gene expression dynamics, nor on the availability of analytical formulas. A new section that encompasses these results has been added to the revised manuscript; please refer to the section titled “Generality of the Pathway Reporter Method" in the revised version.
The authors also seem unaware of recent work where essentially the same problem is addressed (Lin and Amir, PRL 2021) – inferring extrinsic vs. intrinsic noise from data – and a method not relying on dual reporters is tested on two existing experimental datasets. Related to the above, it was unclear to me precisely what sort of data would be needed in practice to implement their pathwayreporter method – and whether such data are currently available.
We thank the reviewer for drawing our attention to this work. The article by Lin and Amir was yet to appear at the time of submission, which partly explains why the work slipped the attention of the authors. Lin and Amir’s work does offer an alternative approach to decomposing noise, which we now reference in the revised manuscript. The generality of their approach appears comparable to ours, and is demonstrated on what is now a similar range of model variations. The applicability also appears comparable: their approach relies upon lineage (singlecell trajectory) measurements of protein concentration over many generations, and such high throughput measurements have only recently become available. Some of our reporter pairs also require experimental data that is only recently emerging, while others are obtainable from nascent/maturemRNA measurements, which have been routinely achieved for some time.
Another example of the potential discrepancy between mathematics and working with actual data is the authors' discussion of the identifiability problem. It seems to me that in reality distributions are always subject to noise (e.g. due to sampling errors) and it is unclear to me whether the sort of rigorous analysis they perform in the paper (that shows whether or not identifiability is possible or not in principle, assuming a perfectly measured distribution) is relevant at all to real data. (e.g.. even if two distributions are mathematically different, would it be possible to distinguish them in practice?)
We respectfully disagree that mathematically different distributions in general will not be distinguishable in practice: the calibration of models to data is one of the central pillars of the scientific method, and in gene expression in particular, has lead to significant improvements in the understanding of the underlying mechanisms. The nonidentifiability results presented in the manuscript show that the concern expressed by the referee does indeed hold in the case of the analysis of transcript abundance alone. It is a particularly strong instance, as it shows that totally different processes can present identically in experimental data. This is proved using precise distributions, but the manuscript verifies that the effect is robust, and that “for practical purposes any similar distribution will produce a similar effect”; see Figure 2B for example. A direct consequence of our results is that extrinsic noise invariably leads to data appearing more bursty than the true underlying system; we prove this mathematically for the most popular model underlying current gene expression studies, and demonstrate this through simulations for other instances. Discussion of this important observation can be found in the introductory section as well as in the section just mentioned. To ensure that these points are made as clearly as possible, we have now amended some of the phrasing in the manuscript.
It seems to me that the paper in its current format is a much better fit for specialized journals in biophysics/mathematics such as the biophysical journal or physical review.
Reviewer #3:
It is now wellknown that singlecell expression data exhibits significant celltocell heterogeneity, which can stem from both "extrinsic" factors (e.g. enzyme concentrations, energy content, cellular environment etc.) or "intrinsic" noise (e.g. firing of reactions). Ham et al. rigorously show how extrinsic factors can cause nonidentifiability of a model from single timepoint geneexpression data, and also lead to incorrect conclusions about the underlying process if they are excluded from the model. Then they propose a novel pathway reporter (PR) approach for quantifying the contributions of the extrinsic and the intrinsic factors to the overall celltocell heterogeneity. These results are illustrated with examples based on the telegraph geneexpression model which includes both nascent and mature mRNA. While the developed results are interesting and mathematically accurate, they come with many caveats, and they do not present a significant advance over existing works on this topic. My reasons for this assessment are mentioned below:
1. Nonidentifiability results: The observations regarding nonidentifiability of the dynamic parameters from the compound distribution model are not surprising and even though specific examples in Table 1 are nicely worked out, I do not see how they contribute substantially to the existing knowledge on the subject. Essentially nonidentifiability will hold for nearly all choices of tilde{p} and f, and identifiability, if it exists, is an exception rather than the norm. This is consistent with what the authors observed in the simulation study outlined at the end of page 10.
We agree that our nonidentifiability results may not be surprising to some audiences (nonidentifiability issues have already been reported empirically), and we are open about this in the manuscript. This does not, however, undermine the value in proving that they are true and uncovering previously unknown, and more encompassing, nonidentifiability cases. One of the results, for example, has recently formed the basis of the research paper doi.org/10.1101/2020.09.25.312868.
Our nonidentifiability results are in many ways more of an entry point to our main noise decomposition methods. However, there is a very important consequence of this preliminary section: (as noted in the comments to Referee #2) the results demonstrate that extrinsic noise invariably leads to data appearing more bursty than the actual underlying system, and that this holds for all parameter regimes of the standard models. We feel this is a particularly important message to be highlighted, especially to an interdisciplinary community where the analysis of transcriptional processes increasingly relies on calibrating models to experimental data.
2. The Pathway Reporter (PR) scheme: The PR scheme is presented as an alternative to the dual reporter (DR) scheme. However PR also suffers from issues similar to the DR scheme:
– DR assumes conditional independence of reporters. Likewise, PR also requires that such an independence (at least at the level of covariance) in order to argue that the intrinsic noise contribution is zero in the reporter covariance. The paper establishes this approximately for certain parameter regimes by using analytical expressions for the steadystate covariances. However generally such neat expressions are not readily available. It is unclear how one can check that this expected covariance is small. Of course if a fully identified model is available, then one can check it with simulations, but then one can simply compute the extrinsic noise directly.
While we used the specific expressions to prove approximate independence (conditional on the cell environment), the approximate independence can also be verified across a range of models, where analytical expressions are not available. Our computational efforts already provided this corroboration to some extent, and we have now explored a broader range of models, with the same result. Please refer to our response of point (5) below for more details of these newly added models.
– Another critical assumption of the PR scheme is that the conditional expectation for the reporters given the extrinsic factors Z, should nicely separate into a function of the common parameters and a function of independent parameters. Even though this works for the specific linear geneexpression model considered in the paper, it is unclear if this would work more generally. Moreover having such a splitting is (up to an additional scaling factor) is not much more general than having identical conditional expectations, as needed by the DR approach.
The Noise Decomposition Principle presented in the paper is a convenient and powerful framework that appears to be general enough to capture a flexible range of reporter pairs to estimate extrinsic noise. The assumptions are not restrictive. We have focused on the linear gene expression model because it is broadly applicable, and we can use the available analytical formulas to provide a mathematical verification of the method. As in the previous response, however, the computational simulations also provide a demonstration of what our method is capable of measuring. We have now tested the theoretical framework in extensive simulations for a range of other, more nuanced, gene expression dynamics, to explore applicability in the absence of analytical solutions. For these results, refer to the new section in the manuscript titled "Generality of the Pathway Reporter Method". The results show that our approach is remarkably insensitive to the specific dynamics of mRNA and protein synthesis, and therefore should be broadly applicable to a range of experimental scenarios.
3. Applicability to experimental studies: As the authors mention in the paper, it is difficult to experimentally construct reporters satisfying the conditions of the DR scheme. However the paper does not discuss how for unknown experimental systems one can identify/construct reporters that satisfy the conditions of the PR scheme. In light of my previous comment, this seems to be as challenging as the DR approach, if one does not have a wellcharacterized mathematical model.
As discussed, the general approach is verified computationally across a range of models, including those where we cannot rely on precise formulas. Additionally, the models we consider are widely used for identifying modes of gene activity from experimental data. In such situations, our assumptions have been implicitly invoked already, and the consequences of our observations would appear to hold the same validity as other consequences frequently being drawn from experimental data.
4. Lack of a meaningful connection: There is very little connection between the two parts of the paper (nonidentifiability and PR scheme). A more substantial connection is expected, as in the abstract it is said that: "Here we mathematically formalize this nonidentifiability; but we also use this to identify how new experimental setups coupled to statistical noise decomposition can resolve this nonidentifiability."
An example showing how noise decomposition helped turn a nonidentifiable system into an identifiable one is lacking. Moreover, it is unclear how quantification of the extrinsic noise enables resolution of the challenging nonidentifiability issue in the compound model.
We initially show that mRNA measurements alone cannot be used to identify even the presence of extrinsic noise; we then proceed to show that with pathway reporters it can; and moreover, we can measure its strength. We feel that this is a very clear connection. The pathway reporter scheme resolves nonidentifiability to the extent that the presence of extrinsic noise can be identified and measured, whereas from copy number data alone it cannot, as we demonstrate in the first part of the manuscript. There is no claim in the article that we can identify the precise noise distribution in a compound model, and we are not sure this can be done in general, though further consideration of that claim will be an interesting topic for future work. It is, we believe, important to be able to demonstrate convincingly and unequivocally the presence of extrinsic noise from experimental data.
5. All the results are presented for a specific geneexpression model and its derivatives. As the paper claims to provide a general approach for analyzing noise sources, more examples need to be provided, for a clearer evaluation of the PR method and comparison with the dual reporter method.
We thank the reviewer for this suggestion, as we believe the addition of a more robust evaluation of the pathway reporter method has substantially improved the paper. We have now considered various other gene expression dynamics, including detailed models of the maturation process (at the suggestion of Referee #1), a more nuanced model of transcriptional bursting (capturing polymerase recruitment and pause release), as well as accounting for the cellcycle effects such as gene replication, dosage compensation, binomial partitioning of products due to cell division, and cellcycle length variability. In all cases, the pathway reporter method is able to accurately identify noise sources, showing that our approach does not depend on the precise dynamics of mRNA and protein synthesis. In particular, we find that the correlations between pathway reporter pairs do not depend on the details of the gene expression model used. A new section that encompasses these results, as well as addressing the generality of the pathway reporter method has been added to the revised manuscript. Together our results continue to suggest that our method will serve as a useful tool for gene expression analysis.
6. The paper only considers single timepoint snapshot data measured at steadystate. What happens if multiple timepoints, including the transience, is also taken into account. Certainly model identifiability would improve and the PR scheme should still work. The authors should consider such examples in the paper.
The extension to multiple time points is certainly of interest to us. In previous and related work, we have considered time course data for cells and the identifiability of model characteristics (PMID:21551095) in contrast to snapshot data. So this is definitely an avenue for further exploration that we intend to follow. But in the submitted manuscript, the focus is on the development of an alternative noise decomposition for snapshot data as such data sets are becoming readily available. The analysis of the transient behaviour of dissipative dynamical systems in the presence of noise requires a much more detailed analysis than we attempt here and is clearly outside the scope of what can be dealt with here.
7. The description of the dual reporter method is misleading – at several places the paper says that the dual reporter method requires "independent genereporters". However only conditional independence is required by the method. If there is full independence, then the covariance (extrinsic noise) would always be zero.
We have now amended the manuscript to make this implicit understanding explicit.
8. On page 21, first paragraph it is mentioned that "For nascentprotein reporters, the normalized intrinsic contribution to the covariance is satisfactorily small (less than 0.05) for (a) high values of δp in unison with (b) low values of λ (less than 1, though lower values are acceptable if δp is small)". However based on the discussion prior/post this statement and also Figure 5, it seems that λ values should be high (not low!) for the normalized intrinsic contribution to the covariance to be small.
We thank the reviewer for drawing our attention to this. It appears that this error was introduced at some stage in preparation (it is expressed exactly as the referee suggests in the original BioaRχiv version of the manuscript, for example). We regret this potential cause for confusion, and have carefully checked and corrected all typographical mistakes that we can find in the paper.
https://doi.org/10.7554/eLife.69324.sa2Article and author information
Author details
Funding
University of Melbourne (DRM)
 Lucy Ham
 Michael PH Stumpf
Volkswagen Foundation (93 062)
 Michael PH Stumpf
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
The authors gratefully acknowledge Rowan D Brackston for helpful discussions in the early stages of this research. We thank Lior Pachter and Gennady Gorin for fruitful discussions on noise identifiability. We also wish to thank Arjun Raj for providing valuable feedback on this work. LH. and MPHS. were supported by the University of Melbourne Driving Research Momentum initiative.
Senior Editor
 Patricia J Wittkopp, University of Michigan, United States
Reviewing Editor
 Ramon Grima
Reviewer
 Ramon Grima
Version history
 Preprint posted: September 30, 2020 (view preprint)
 Received: April 12, 2021
 Accepted: October 11, 2021
 Accepted Manuscript published: October 12, 2021 (version 1)
 Version of Record published: November 22, 2021 (version 2)
Copyright
© 2021, Ham et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 2,603
 Page views

 361
 Downloads

 14
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Chromosomes and Gene Expression
A new in vitro system called RecSeq sheds light on how mRNA molecules compete for the machinery that translates their genetic sequence into proteins.

 Chromosomes and Gene Expression
 Structural Biology and Molecular Biophysics
Histone posttranslational modifications (PTMs) play a critical role in chromatin regulation. It has been proposed that these PTMs form localized ‘codes’ that are read by specialized regions (reader domains) in chromatinassociated proteins (CAPs) to regulate downstream function. Substantial effort has been made to define [CAP: histone PTM] specificities, and thus decipher the histone code and guide epigenetic therapies. However, this has largely been done using the reductive approach of isolated reader domains and histone peptides, which cannot account for any higherorder factors. Here, we show that the [BPTF PHD finger and bromodomain: histone PTM] interaction is dependent on nucleosome context. The tandem reader selectively associates with nucleosomal H3K4me3 and H3K14ac or H3K18ac, a combinatorial engagement that despite being in cis is not predicted by peptides. This in vitro specificity of the BPTF tandem reader for PTMdefined nucleosomes is recapitulated in a cellular context. We propose that regulatable histone tail accessibility and its impact on the binding potential of reader domains necessitates we refine the ‘histone code’ concept and interrogate it at the nucleosome level.