
Intrinsic adaptive value and early fate of gene duplication revealed by a bottom-up approach

Guillermo Rodrigo (corresponding author), Mario A Fares
CSIC – UPV, Spain; CSIC – UV, Spain; University of Dublin, Ireland
Research Article
Cite as: eLife 2018;7:e29739 doi: 10.7554/eLife.29739

Abstract

The population genetic mechanisms governing the preservation of gene duplicates, especially in the critical very initial phase, have remained largely unknown. Here, we demonstrate that gene duplication confers per se a weak selective advantage in scenarios of fitness trade-offs. Through a precise quantitative description of a model system, we show that a second gene copy serves to reduce gene expression inaccuracies derived from pervasive molecular noise and suboptimal gene regulation. We then reveal that such phenotypic accuracy yields a selective advantage on the order of 0.1% on average, which would allow the positive selection of gene duplication in populations of moderate to large size. This advantage is greater at higher noise levels and intermediate concentrations of the environmental molecule, where fitness trade-offs become more evident. Moreover, we discuss how genome rearrangement rates strongly condition the eventual fixation of duplicates. Overall, our theoretical results highlight an original adaptive value for cells carrying new-born duplicates, broadly analyze the selective conditions that determine their early fates in different organisms, and reconcile population genetics with evolution by gene duplication.

https://doi.org/10.7554/eLife.29739.001

Introduction

Gene duplication has enthralled researchers for decades because of its link to the emergence of major evolutionary innovations in organisms of varying complexity (Ohno, 1970). Understanding this process hinges on its early stage, when the fate of the new-born gene is decided (Innan and Kondrashov, 2010). A classical theory predicts the fixation of duplicated genes in the population under neutral selective conditions (i.e. by random genetic drift; Kimura, 1983; Lynch and Conery, 2003). Because a neutral new-born gene fixes with a probability on the order of the inverse of the population size, its loss is the most common evolutionary fate. Once a duplicate is fixed, it is generally accepted that genetic redundancy relaxes selective constraints on one or both gene copies, which increases their mutational load (Lynch and Conery, 2000; Keane et al., 2014). On rare occasions, this evolutionary process leads to the origin of a novel, previously unexplored function in one of the gene copies (Conant and Wolfe, 2008).

However, because gene duplication can impose a cost on the cell by requiring additional resources for expression (Wagner, 2005; Lynch and Marinov, 2015; Price and Arkin, 2016), especially in simple organisms, purifying selection could preclude its fixation. Gene duplication can also unbalance tightly regulated pathways that are instrumental for the cell (Papp et al., 2003; Birchler et al., 2005), leading to disease in complex organisms (Tang and Amon, 2013). A long-recognized rationale is that the duplicated genes that were fixed in the population immediately contributed adaptive value to the organism (Innan and Kondrashov, 2010). Nevertheless, it remains strikingly unclear to what extent natural selection could also take part in the process that drives the fixation, and initial maintenance, of duplicated genes according to population genetics (Lynch, 2007).

Two basic hypotheses have been proposed to explain the selective advantage of duplicated genes. First, the higher gene expression level resulting from duplication could be favorable (Riehle et al., 2001). This hypothesis requires that the ancestral (pre-duplication) system be far from its optimal operation point, so far that a nearly 100% increase in expression is beneficial. This seems plausible in extreme circumstances, but not in the routine environments to which the organism should be adapted (King and Masel, 2007). It is then not surprising that many of the reported examples in which a greater gene copy number is favorable relate to sporadic, mainly stressful environments (Riehle et al., 2001; Gonzalez et al., 2005). Arguably, if a duplicate were fixed in one of these environments, it would be rapidly removed by purifying selection once the extreme circumstance ceased. Moreover, beneficial point mutations in the cis-regulatory region of the gene of interest would mostly suffice to cope with many environmental changes (Wray, 2007). Thus, this model is insufficient to explain the origin of most duplications, although it could explain some particular cases.

Second, the functional backup provided by the second gene copy upon duplication may allow the rapid accumulation of beneficial mutations, either to develop a novel function (Zhang et al., 1998; Bergthorsson et al., 2007) or to escape from the conflict of optimizing alternative functions (Hittinger and Carroll, 2007; Des Marais and Rausher, 2008). Such mutations can indeed be positively selected, as suggested by the dN/dS values above 1 reported for different genomic sequences (Han et al., 2009; Fischer et al., 2014). This requires, nevertheless, that the frequency of cells carrying a second gene copy in the population increase to a point at which a mutation in the duplicate is likely to be found, a condition that is not met during the critical very initial phase following duplication (Lynch et al., 2001). Therefore, such adaptive processes, although important for the long-term maintenance of duplicates, contribute little to increasing their fixation probabilities.

In addition to these two hypotheses, it has been proposed that gene duplication could compensate for errors in the phenotypic response due to a loss of expression caused by genotypic or phenotypic mutations (Clark, 1994; Nowak et al., 1997; Wagner, 1999). This model needs to invoke high error rates to have an impact at the population level from the beginning, and then for genotypes with duplication to become prevalent by overcoming genetic drift. Errors in phenotype could also be caused by stochastic fluctuations in gene expression (Elowitz et al., 2002; Balázsi et al., 2011), with gene duplication eventually reducing the amplitude of such fluctuations (Kafri et al., 2006; Lehner, 2010; Rodrigo and Poyatos, 2016). However, this strategy only works on average; that is, duplication may ensure more accuracy when multiple gene expression decisions are considered. Thus, it is not obvious whether one or a few individuals with duplication can invade a population, especially in a fluctuating environment. This key, largely unexplored question may have hindered support for this idea. Other mechanistic models have been proposed beyond the demand for increased expression or the accumulation of beneficial mutations (Innan and Kondrashov, 2010), yet they do not convincingly resolve the main population genetic dynamical issue.

In this work, we tested the idea of error buffering to reveal the adaptive value that gene duplication has per se. Subsequently, we developed a comprehensive model to explain the early fate of duplicates that is compatible with population genetics (Lynch et al., 2001; Lynch, 2007), global gene expression patterns (Qian et al., 2010; Gout and Lynch, 2015; Cardoso-Moreira et al., 2016; Lan and Pritchard, 2016), and unexpectedly high gene copy number variation rates (Reams et al., 2010; Schrider et al., 2013). To this end, instead of performing a conventional sequence analysis (top-down approach), we followed a precise quantitative framework, based on biochemistry, to study the benefit for the cell of having a second gene copy without functional divergence (bottom-up approach). Using a gene of Escherichia coli (lacZ) as a model system to which to apply our theory, we showed, without loss of generality, that the sum of two different, partially correlated responses reduces gene expression inaccuracies (Rodrigo and Poyatos, 2016). These inaccuracies are a consequence of the inherently stochastic nature of all molecular reactions underlying gene expression (Raser et al., 2004; Carey et al., 2013) and of suboptimal gene regulation (Dekel and Alon, 2005; Price et al., 2013). Here, we considered intrinsic and extrinsic noise sources (Elowitz et al., 2002), that is, stochastic fluctuations that are specific to a gene and fluctuations that are not, so gene duplication is expected to buffer only intrinsic fluctuations. In turn, cell fitness can weakly increase on average if such errors in gene expression are costly (Wang and Zhang, 2011), that is, if the system is deterministically centered on its optimal operation point so that a stochastic fluctuation takes it away from that point; genotypes with duplication can then be fixed in the population. We further studied the genetic and environmental conditions that are most favorable for the selection of gene duplication.

Results

Quantitative biochemical view of a fitness trade-off

In cellular systems, fitness trade-offs arise because beneficial actions involve costs. Fitness is a complex quantity integrating multiple components, so the enhancement of one component (vital attribute) usually entails the reduction of another (e.g. stress resistance vs. reproductive success; Casanueva et al., 2012). This becomes critical when the environment changes, as the relevance of each component largely depends on external conditions. Such components can be described in different ways according to the problem. A paradigmatic and simple fitness trade-off emerges when an enzyme needs to be expressed to metabolize a nutrient present in the environment (Figure 1a,b,c). On the one hand, the cell growth rate (here taken as a metric of fitness; Elena and Lenski, 2003) increases as the enzyme metabolizes the nutrient. On the other hand, enzyme expression imposes a cost on the cell (i.e. reduces its growth rate). Therefore, enzyme expression needs to be very precise to ensure optimal or near-optimal behavior (cost-benefit analysis). To solve this issue, regulatory mechanisms (mainly transcriptional) evolved to link enzyme expression inside the cell with the amount of nutrient available in the environment. An example of this paradigmatic system is the well-known lactose utilization network of E. coli (Jacob and Monod, 1961), where lactose (nutrient, environmental molecule) activates, through inhibition of LacI (transcription factor), the production of LacZ (enzyme). Because this system has been quantitatively characterized (Dekel and Alon, 2005; Kuhlman et al., 2007; Eames and Kortemme, 2012), we used it to apply a theoretical framework (see Materials and methods) that reveals the intrinsic adaptive value of gene duplication under a fitness trade-off.

Figure 1. Fitness trade-off related to metabolic benefit and expression cost.

(a) Scheme of a paradigmatic genetic system, coupling regulation and metabolism, in which an environmental signal determines the physiology of the cell. The environmental molecule can be metabolized by the cell, and it can also transcriptionally activate the expression of enzymes. A particular case is the lactose utilization system of E. coli. (b) Scheme of the same system with gene duplication. (c) Illustrative chart of the fitness trade-off showing four different cellular regimes. When the signal molecule (lactose) is absent from the medium, expression of the enzyme (LacZ) is not required. However, when the signal molecule is present, the enzyme is required for its metabolic processing. (d) Fitness (W) landscape as a function of lactose (contributing to the benefit; x denotes its concentration) and LacZ (contributing to both the benefit and the cost; y denotes its concentration). This landscape was experimentally determined. x0 denotes the lactose EC50 on LacZ expression, so x/x0 is a normalized lactose concentration. (e) Dose-response curve between lactose and LacZ. The solid line corresponds to the actual regulation (experimentally determined), whilst the dashed line corresponds to a hypothetical optimal regulation (obtained by imposing dW/dy = 0). (f) Sensitivity to changes in lactose dose, either in fitness (dW/dx, solid line) or in LacZ (dy/dx, dashed line), characterizing the nonlinear phenotypic plasticity of the cell. Each curve is normalized by its maximum. This also measures sensitivity to molecular noise. The region where information transfer is high is shaded.

https://doi.org/10.7554/eLife.29739.002

Cell fitness increases monotonically with lactose dose (following Michaelis-Menten kinetics) but presents an optimum with LacZ expression (Figure 1d). This is because lactose does not introduce a cost into the system, whereas LacZ does. Here, we simply considered a cost function based on LacZ expression (i.e. more expression, more cost), with a marginal cost of 0.036 in the units of the model (Dekel and Alon, 2005). A more precise choice, however, would be a cost function based on the activity of lactose permease (LacY; Eames and Kortemme, 2012), the gene of the lac operon responsible for lactose uptake, rather than on LacZ expression. The regulation of the system appears to be quite accurate, as the actual and optimal dose-response curves roughly match (Figure 1e). By generating dose-response curves with values of x0 (lactose EC50 on LacZ) between 0.01 and 1 mM, we found that most of them deviate from the optimal one (p = 0.02; Euclidean distance as a metric). This entails great phenotypic plasticity of the cell to cope with lactose variations. However, plasticity is not equal for all environmental changes. Whilst the system (in terms of LacZ expression or cell fitness) reaches maximal sensitivity at intermediate doses, it is quite insensitive at very low or very high doses, where lactose-LacZ information transfer drops (Figure 1f).
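The cost-benefit trade-off, and the optimal curve obtained by imposing dW/dy = 0, can be illustrated with a closed-form sketch. It assumes a cost-benefit form of the kind used by Dekel and Alon (2005), W(x, y) = 1 + δ·y·x/(K + x) − η0·y/(1 − y/M), that is, a Michaelis-Menten benefit and a convex expression cost. This differs from the simpler cost function used in the text, and all parameter values below are illustrative assumptions, not fitted values. Setting dW/dy = 0 gives y* = M(1 − √(η0(K + x)/(δx))), clipped at zero when the cost outweighs the benefit.

```python
import math

# Illustrative parameters (assumed for this sketch; not the paper's fitted values)
ETA0 = 0.02   # marginal cost of LacZ at low expression
M = 1.8       # expression level at which the cost would diverge
DELTA = 0.17  # maximal benefit per unit of LacZ at saturating lactose
K = 0.4       # lactose half-saturation constant (mM)


def fitness(x, y):
    """Relative growth rate W(x, y): Michaelis-Menten benefit minus a convex cost."""
    benefit = DELTA * y * x / (K + x)
    cost = ETA0 * y / (1.0 - y / M)
    return 1.0 + benefit - cost


def optimal_expression(x):
    """LacZ level y* solving dW/dy = 0, clipped at 0 when cost always exceeds benefit."""
    if x <= 0:
        return 0.0
    ratio = ETA0 * (K + x) / (DELTA * x)
    return max(0.0, M * (1.0 - math.sqrt(ratio)))


for x in (0.01, 0.1, 1.0, 10.0):
    y_opt = optimal_expression(x)
    print(f"x = {x:5.2f} mM  y* = {y_opt:.3f}  W(x, y*) = {fitness(x, y_opt):.4f}")
```

Because the cost is convex and the benefit is linear in y, the stationary point is a maximum, and the optimal expression rises with lactose dose, which is the qualitative shape of the dashed curve in Figure 1e.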

Gene duplication helps to better resolve the fitness trade-off

LacZ expression in E. coli involves a variety of noisy processes, such as LacI expression, LacI-DNA binding, RNA polymerase-DNA binding, and transcriptional elongation (Elowitz et al., 2002; Raser et al., 2004; Carey et al., 2013). The resulting stochastic fluctuations in expression can have an impact on fitness (Figure 2). Using a simple mathematical model, we simulated the stochastic LacZ expression of the wild-type system for a varying lactose dose (Figure 3a,b). The magnitudes of the stochastic fluctuations were chosen so as to produce typical variations in lactose EC50 of 10–100%, up or down, resulting in gene expression noise values of around 0.5, compatible with experimental results (Elowitz et al., 2002). At a given dose, these simulations correspond to different single-cell responses. We also considered a system with two copies of the lacZ gene, with total expression equal to that of the one-copy system, and simulated its stochastic response (Figure 3c). For the moment, we imposed gene dosage sharing to evaluate quantitatively the benefit for the cell of having a second gene copy without invoking the need for more expression. We observed that the system with gene duplication produces a more accurate response (i.e. a response closer to the deterministic one), highlighting the role of gene copy number in noise buffering (Rodrigo and Poyatos, 2016).
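To make the noise-buffering argument concrete, the sketch below assumes a simple additive Gaussian noise model, which is our own illustration and not the paper's stochastic function. Each of n copies expresses (y_total/n)·(1 + ηex·ξe + ηin·ξk), where ξe is shared by all copies in a cell (extrinsic noise) and each ξk is private to copy k (intrinsic noise). Under this model, duplication leaves the extrinsic variance untouched while the intrinsic variance scales as 1/n. The Gaussian draws can yield negative expression, which is harmless for a variance calculation but would not be acceptable in a realistic simulator.

```python
import math
import random


def total_expression(n_copies, eta_in, eta_ex, y_total=1.0, rng=random):
    """One single-cell sample of total expression under an assumed additive-noise model."""
    shared = rng.gauss(0.0, 1.0)       # extrinsic noise: common to all copies in this cell
    per_copy = y_total / n_copies      # gene dosage sharing: same mean total expression
    total = 0.0
    for _ in range(n_copies):
        own = rng.gauss(0.0, 1.0)      # intrinsic noise: independent for each copy
        total += per_copy * (1.0 + eta_ex * shared + eta_in * own)
    return total


def expression_variance(n_copies, eta_in, eta_ex, samples=100_000, seed=1):
    """Monte Carlo estimate of the variance of total expression across cells."""
    rng = random.Random(seed)
    values = [total_expression(n_copies, eta_in, eta_ex, rng=rng) for _ in range(samples)]
    mean = math.fsum(values) / len(values)
    return math.fsum((v - mean) ** 2 for v in values) / len(values)


def predicted_variance(n_copies, eta_in, eta_ex, y_total=1.0):
    """Closed form: extrinsic variance is unchanged, intrinsic variance scales as 1/n."""
    return y_total ** 2 * (eta_ex ** 2 + eta_in ** 2 / n_copies)


for eta_ex in (0.0, 0.3, 0.5):
    one = predicted_variance(1, 0.5, eta_ex)
    two = predicted_variance(2, 0.5, eta_ex)
    print(f"eta_in = 0.5, eta_ex = {eta_ex}: variance reduction {1 - two / one:.0%}")
```

With purely intrinsic noise the closed form gives exactly a 50% variance reduction for two copies versus one; with ηin = ηex the reduction falls to 25%, because the shared extrinsic component cannot be averaged out.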

Figure 2. Schematics of cell fitness as a function of gene expression.

Fitness function can (a) present a maximum, (b) be flat, or (c) present a minimum. Depending on the local shape, stochastic fluctuations in expression can be costly, neutral, or beneficial, respectively.

https://doi.org/10.7554/eLife.29739.003
Figure 3. Selective advantage of gene duplication.

(a) Block diagram of the system. Gene expression is calculated by means of a stochastic function, whilst fitness is calculated by means of a deterministic one. (b, c) Single-cell responses at different lactose doses (stochastic simulations, noise amplitudes of ηin = 0.5 and ηex = 0). Lactose and LacZ concentrations are denoted by x and y, respectively. The solid white line corresponds to the deterministic simulation. In (b) the genotype contains a single copy of the lacZ gene, whilst in (c) it contains two copies. The value of mutual information (I) is shown in both cases: 1.29 bits of information for a singleton and 1.58 bits for a duplicate (about 25% increase in fidelity; significance assessed by a z-test, p ≈ 0 with 10⁴ points). (d) Selection coefficient (S) of a genotype with two copies of the lacZ gene over another with just one copy. The mean selection coefficient is shown (dashed line). Skewness coefficient of 2.63. W values were calculated from the x, y values shown in (b, c). (e) Fitness (W) as a function of LacZ (constant x = 0.2 mM), showing the distributions of expression (boxplots) for one or two gene copies. The actual LacZ expression is shown (dashed line). (f) Dose-response curve between lactose concentration and the median LacZ expression (⟨y⟩). The solid lines correspond to the actual responses for one (black) or two (gray) gene copies (ηin = 0.5 and ηex = 0), whilst the dashed line corresponds to the optimal response. (g) Mean selection coefficient (⟨S⟩) landscape of gene duplication as a function of the median lactose dose (⟨x⟩, fluctuating dose) and the amplitude of intrinsic noise (ηin, with fixed ηex = 0.3). In all these plots, the expression levels of the duplicates with respect to the singletons are equal (ymax,1 = ymax,2 = 0.5).

https://doi.org/10.7554/eLife.29739.004

In addition, we calculated the proposed fitness function for each single-cell response. Small gene expression inaccuracies (e.g. an excess of enzyme for the available substrate) translate into fitness losses because the fitness landscape is hill-like in terms of the genotype-environment interaction (Figure 1d). To properly compare how each system resolves the fitness trade-off, we then calculated the selection coefficient for each response. We found a skewed distribution, peaked at 0, with a positive mean of 0.08% (Figure 3d). This entails that phenotypic responses generated by duplicated genes give, on average, higher fitness than responses generated by singleton genes. To better illustrate this, we represented cell fitness as a function of LacZ expression (Figure 3e), uncovering two reasons why gene duplication is adaptive. First, the variance of the stochastic fluctuations (noise) in gene expression is reduced by 50% upon duplication when only intrinsic fluctuations are considered (Wang and Zhang, 2011). When both intrinsic and extrinsic fluctuations are considered, the variance is reduced by 15–25%. In either case, this increases fitness on average, because the system displays near-optimal behavior in the deterministic regime, so fluctuations are costly. Second, the population response upon duplication is slightly closer to the optimal operation point (Figure 3e,f). The model-based median dose-response curve (corresponding to the experimental response at the population level) is sigmoidal with a Hill coefficient of 4 (Dekel and Alon, 2005). This results in a slope (LacZ vs. normalized lactose) of 1 at x0, calculated as n/4 (n is the Hill coefficient). This slope is higher than that of the optimal dose-response curve, which is 0.47 at x0. When duplication is considered (maintaining the same expression levels), however, the median dose-response curve shows a slope of 0.75 at x0 (corresponding to an effective Hill coefficient of 3; Figure 3f). Duplication thus brings the response closer to the optimal one: the actual dose-response curve is more nonlinear than the optimal curve, a feature that can indeed be amended by genetic redundancy (Gammaitoni, 1995; Rodrigo and Poyatos, 2016).
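The n/4 rule follows from differentiating the normalized Hill function h(u) = uⁿ/(1 + uⁿ), with u = x/x0: h′(u) = n·uⁿ⁻¹/(1 + uⁿ)², which equals n/4 at u = 1. The short check below verifies this numerically with a central finite difference. It assumes LacZ is normalized to its maximum, so it covers the slopes of 1 (n = 4) and 0.75 (n = 3) discussed above, but not the optimal curve's slope of 0.47, which depends on the fitness function rather than on a Hill form.

```python
def hill(u, n):
    """Normalized Hill activation with u = x / x0; output lies between 0 and 1."""
    return u ** n / (1.0 + u ** n)


def slope_at_ec50(n, h=1e-6):
    """Central finite-difference slope of hill(u, n) at u = 1, i.e. at x = x0."""
    return (hill(1.0 + h, n) - hill(1.0 - h, n)) / (2.0 * h)


for n in (1, 2, 3, 4):
    print(f"n = {n}: numerical slope {slope_at_ec50(n):.6f}, n/4 = {n / 4}")
```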

Finally, we calculated how much selection exists, on average, as a two-dimensional function of the magnitude of intrinsic noise and the concentration of lactose in the medium (Figure 3g). This highlights the fundamental link between noise reduction in gene expression and selective advantage (cell fitness). More specifically, we found that the higher the intrinsic noise, the higher the adaptive value of gene duplication. This is because intrinsic noise generates the heterogeneity between the responses of the two gene copies that is required to limit large stochastic fluctuations in total gene expression. We also found that the adaptive value of gene duplication is maximal at intermediate lactose doses, where the sensitivity of the system is highest (Figure 1f). Outside this regime, stochastic fluctuations, according to our simple mathematical model, have less impact on the phenotype (Blake et al., 2006).
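Why the benefit grows with intrinsic noise can be seen with a second-order (Jensen's inequality) argument. As an illustration rather than the paper's model, assume that near the optimum fitness is quadratic, W(y) = W_opt − (c/2)(y − y_opt)², and that expression fluctuates around y_opt with the variances of an additive noise model (extrinsic variance ηex², intrinsic variance ηin²/n for n copies with shared dosage). Then E[W] = W_opt − (c/2)·Var(y), so the mean selection coefficient of the duplicate is set by the variance it removes. The curvature c = 0.1 is a hypothetical value. This sketch reproduces the rise with ηin and the absence of any gain under purely extrinsic noise, but not the dependence on lactose dose, which requires the full nonlinear dose-response.

```python
def mean_fitness(var_y, w_opt=1.0, curvature=0.1):
    """E[W] for W(y) = w_opt - (curvature / 2) * (y - y_opt)^2 with y centered on y_opt.

    Exact for a quadratic landscape: E[W] = w_opt - (curvature / 2) * Var(y).
    """
    return w_opt - 0.5 * curvature * var_y


def mean_selection_coefficient(eta_in, eta_ex, y_total=1.0, curvature=0.1):
    """Mean S of a two-copy genotype over a one-copy genotype with equal mean expression."""
    var_one = y_total ** 2 * (eta_ex ** 2 + eta_in ** 2)
    var_two = y_total ** 2 * (eta_ex ** 2 + eta_in ** 2 / 2)  # intrinsic part halved
    w_one = mean_fitness(var_one, curvature=curvature)
    w_two = mean_fitness(var_two, curvature=curvature)
    return w_two / w_one - 1.0


for eta_in in (0.0, 0.25, 0.5, 1.0):
    print(f"eta_in = {eta_in}: <S> = {mean_selection_coefficient(eta_in, 0.3):.4%}")
```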

Gene duplication can be positively selected in a population thanks to more accurate responses

If gene duplication enhances cell fitness on average by reducing gene expression inaccuracies, this trait would be expected to be positively selected in a population (Kimura, 1983). To verify this assumption, we performed in silico evolution experiments (see Materials and methods), in which a mixed population of cells carrying singletons and duplicates was monitored, considering equal LacZ expression in both types of cells (Figure 4a). The population was left to evolve without introducing any bias, with time-dependent stochastic fluctuations in gene expression uncorrelated from cell to cell. For simplicity, we simulated a scenario of experimental evolution (Elena and Lenski, 2003; Dekel and Alon, 2005), although the dynamics in nature might be more complex. We found that the frequency of cells carrying duplicates in the population increases with time, and that this increase is well predicted by population genetic dynamics with the mean selection coefficient (Figure 4b). Notably, this indicates that the mean selection coefficient, which can be calculated mathematically a priori, is sufficient to capture the complexity underlying the stochastic evolutionary dynamics of the system (Hegreness et al., 2006).
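The population genetic prediction can be sketched, under simplifying assumptions of our own (haploid cells, deterministic selection, no drift, no formation or deletion of duplicates, constant mean selection coefficient), as the standard recursion f_{t+1} = f_t(1 + S)/(1 + S·f_t). In this recursion the odds f/(1 − f) of carrying the duplicate grow by a factor (1 + S) per generation, which gives the closed form below. The value S = 0.0008 simply reuses the 0.08% mean reported for Figure 3d and is not a fitted parameter.

```python
def duplicate_frequency(f0, s, generations):
    """Iterate haploid selection: f_{t+1} = f_t * (1 + s) / (1 + s * f_t)."""
    f = f0
    for _ in range(generations):
        f = f * (1.0 + s) / (1.0 + s * f)
    return f


def duplicate_frequency_closed_form(f0, s, t):
    """Closed-form solution: the odds f / (1 - f) are multiplied by (1 + s) ** t."""
    r = (1.0 + s) ** t
    return f0 * r / (1.0 - f0 + f0 * r)


f1000 = duplicate_frequency(0.5, 0.0008, 1000)
print(f"frequency of duplicate carriers after 1000 generations: {f1000:.3f}")
```

Comparing the iterated and closed-form values is a quick consistency check; adding drift would require a stochastic (e.g. binomial resampling) step on top of this deterministic core.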

Figure 4. In silico evolution experiments.

(a) Scheme of an evolutionary procedure, in which serial dilution passages are applied, to assess the performance in a cell population of a genotype with two copies of the lacZ gene over another with just one copy. (b) Time-dependent frequency of cells with gene duplication (f). Open circles and error bars correspond to in silico evolution experiments (mean and standard deviation of three replicates) with an initial frequency of f0 = 0.5, fluctuating lactose dose, and noise levels of ηin = 0.5 and ηex = 0. The solid line corresponds to the theoretical prediction. (c) f at 1000 generations (f1000) as a function of the amplitude of intrinsic noise (ηin). Experiments and prediction with f0 = 0.5 and ηex = 0. The dashed line corresponds to the theoretical prediction with ηex = 1. (d) f1000 as a function of the median lactose dose (⟨x⟩). Experiments and prediction with f0 = 0.5, ηin = 0.5 and ηex = 0.5. (e) f1000 as a function of the lactose fluctuation amplitude (Δx); Δx = 0 corresponds to a constant lactose dose. Experiments and prediction with the same values of f0, ηin and ηex as in (d). Three replicates were also considered in (c), (d) and (e). In all these plots, the expression levels of the duplicates with respect to the singletons are equal (ymax,1 = ymax,2 = 0.5).

https://doi.org/10.7554/eLife.29739.005

In addition, we studied the effect of the magnitude of molecular noise, distinguishing between intrinsic and extrinsic noise (Elowitz et al., 2002). As predicted from our previous results, the higher the intrinsic noise of the system, the higher the frequency of gene duplication in the population (Figure 4c). By contrast, the higher the extrinsic noise, the lower the frequency (Figure 4c), as this type of noise affects the responses of the two copies in the same way. Note that there is no gain following duplication when only extrinsic noise is considered. Furthermore, we studied the effect of the environment (lactose dose). As predicted, we found an intermediate median dose at which the frequency of gene duplication in the population is highest (Figure 4d). We also found that the larger the amplitude of lactose fluctuations, the lower the frequency (Figure 4e). This is because, when lactose fluctuates from very low to very high doses, the signal-to-noise ratio is large enough to ensure a relatively accurate response with just one gene copy (Hansen et al., 2015). Importantly, in all these cases the population genetic dynamics, with the corresponding mean selection coefficients, correctly explained the observed frequencies.

Fixation is conditioned by the unexpected recurrence of formation and deletion of duplicates in a population

Gene duplicates can be spontaneously produced in the cell, through different mechanisms (Hastings et al., 2009), at very high rates. These rates, measured in mutation accumulation experiments, range from 10⁻⁴ dup./gene/gen. in prokaryotes (Reams et al., 2010) to 10⁻⁷ dup./gene/gen. in higher eukaryotes (Schrider et al., 2013). Once produced, most of these duplicates are deleted because they are unstable, at a rate that appears to be higher than the formation rate (Reams et al., 2010; Schrider et al., 2013). In the particular case of the lacZ gene, the formation rate is 3·10⁻⁴ dup./gene/gen. and the deletion rate is 4.4·10⁻² -/gene/gen. (in a single bacterial cell; data for Salmonella enterica). Therefore, neglecting fitness effects, gene duplication can be understood as a recurrent process that reaches an equilibrium point given by the ratio between the formation and deletion rates (Figure 5a). This equilibrium point would be lower if fitness effects (mostly detrimental) were considered. This entails about 2·10⁵ cells carrying lacZ duplicates in a typical natural E. coli population of 2·10⁸ cells (Lynch et al., 2016), that is, a frequency of about 0.1%. This surprising scenario has an immediate consequence: duplicated genes cannot be fixed in the population by drift under neutral selective conditions (Figure 5b), a result already anticipated (Clark, 1994) and in clear discrepancy with conventional wisdom (Lynch, 2007). Indeed, the formation-deletion balance would always take the system back to the same equilibrium point.
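The statement that the formation-deletion balance always returns the system to the same point can be checked with a deterministic per-generation recursion. This is a sketch under our own assumptions (haploid cells, no drift, selection acting before formation and deletion within each generation). Under neutrality the fixed point is f* = α/(α + β) for formation rate α and deletion rate β, independent of the starting frequency; with the lacZ rates quoted above this neutral value is about 0.7%, which, as noted above, detrimental fitness effects would lower. A selection coefficient larger than the deletion rate, such as the S = 10% of Figure 5b, instead sustains a high frequency of carriers.

```python
def step(f, formation, deletion, s=0.0):
    """One generation: selection on duplicate carriers, then formation and deletion."""
    f = f * (1.0 + s) / (1.0 + s * f)          # haploid selection; carriers have fitness 1 + s
    return f * (1.0 - deletion) + (1.0 - f) * formation


def simulate(f0, formation, deletion, generations, s=0.0):
    """Iterate the recursion from an initial frequency f0 of duplicate carriers."""
    f = f0
    for _ in range(generations):
        f = step(f, formation, deletion, s)
    return f


ALPHA, BETA = 3e-4, 4.4e-2                     # lacZ formation and deletion rates from the text
neutral_eq = ALPHA / (ALPHA + BETA)

for f0 in (0.0, 0.5, 1.0):
    print(f"neutral, f0 = {f0}: f after 2000 generations = {simulate(f0, ALPHA, BETA, 2000):.5f}")
print(f"predicted neutral equilibrium: {neutral_eq:.5f}")
print(f"with S = 10%: {simulate(0.0, ALPHA, BETA, 2000, s=0.1):.3f}")
```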

Figure 5. Gene duplication leading to double expression.

(a) Scheme of the formation-deletion balance in gene duplication. (b) Time-dependent frequency of cells with gene duplication (f) when the formation and deletion rates of a second lacZ copy are considered. Sequence remodeling was not taken into account. The solid line corresponds to a scenario of neutrality, whilst the dashed line corresponds to a scenario of positive selection (with S = 10%). (c) Schematics of fitness as a function of expression showing the effect of gene duplication. Two scenarios are considered: deleterious duplication (left; hill-like fitness landscape) and nearly neutral duplication (right; quasi-flat fitness landscape). (d) Mean selection coefficient (⟨S⟩) as a function of lactose dose upon lacZ duplication doubling gene expression (ymax,1 = ymax,2 = 1). The solid line corresponds to noise levels of ηin = ηex = 0.3 (moderate), whilst the dashed line corresponds to ηin = ηex = 1 (high). (e) Identification of effectively neutral selective conditions (when |⟨N⟩·⟨S⟩|<1, region shaded) in terms of gene expression (y) and genome size (G), which determines the effective population size (⟨N⟩). In this context, no benefit was considered (a = 0), with moderate noise levels.

https://doi.org/10.7554/eLife.29739.006

However, the preceding argument only considers a static picture, ignoring the dynamics of the genetic process. In bacteria (lacZ gene), the time to reach the equilibrium point is about 68 generations (three times the inverse of the deletion rate), a relatively short transient. By contrast, in flies (Drosophila melanogaster), the formation rate is 10⁻⁷ dup./gene/gen. and the deletion rate is 10⁻⁶ -/gene/gen. (Schrider et al., 2013). Although this would yield equilibrium frequencies of up to 10%, the transient periods would be longer than 10⁶ generations (0.2 Ma in natural conditions; Pool, 2015). Fixation could then happen by drift, as effective population sizes in flies are about 10⁶ (Lynch et al., 2016), although not persistently. Note that the inverse of this number indeed specifies an upper limit for the deletion rate. In addition, the formation-deletion balance could be shifted if further genome rearrangements affecting duplicated genes were considered, such as gene relocation (about 10⁻¹¹ fixed rearr./gene/gen. for D. melanogaster; Ranz et al., 2001). In effective terms, gene relocation would reduce the deletion rate and, consequently, make fixation more likely (Wong and Wolfe, 2005). Such relocation would also shift the intrinsic-extrinsic noise balance toward more uncoupled responses (Becskei et al., 2005), which could enhance the benefit of intrinsic noise reduction.
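The contrast between bacterial and fly timescales follows from a first-order approximation. In the neutral formation-deletion recursion, the distance to equilibrium shrinks by a factor (1 − α − β) per generation, so roughly 3/(α + β) generations close 95% of the gap (e⁻³ ≈ 0.05). Because α is much smaller than β, this is close to the "three times the inverse of the deletion rate" used in the text. The rates are those quoted above; the helper names are our own.

```python
def equilibrium_frequency(formation, deletion):
    """Neutral balance between duplicate formation and deletion."""
    return formation / (formation + deletion)


def relaxation_time(formation, deletion, e_folds=3.0):
    """Generations to close ~95% (e^-3) of the gap to equilibrium."""
    return e_folds / (formation + deletion)


RATES = {
    "bacteria (lacZ)": (3e-4, 4.4e-2),
    "flies (D. melanogaster)": (1e-7, 1e-6),
}

for name, (alpha, beta) in RATES.items():
    print(f"{name}: f* = {equilibrium_frequency(alpha, beta):.3g}, "
          f"transient = {relaxation_time(alpha, beta):.3g} generations")
```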

Most of the new-born duplicates lead to increased expression and are costly for the cell

So far, we have demonstrated that a duplicated gene offers a selective advantage provided the total gene expression level is maintained with one or two copies (gene dosage sharing). However, this condition is not usually met during the critical very initial phase, when the duplicate has just been born. In general, we can assume that the expression level is doubled upon duplication, although this may vary with the chromosomal position of the duplicated gene and the cell type (Stranger et al., 2007). An increase in expression due to gene duplication is indeed detrimental in most environments (Figure 5c,d; Price and Arkin, 2016), so positive or neutral selective conditions are difficult to invoke to explain the fixation of this type of genotypic change, mainly in prokaryotes and lower eukaryotes (Lynch and Marinov, 2015). For instance, at a constant 0.13 mM lactose, we obtained mean selection coefficients between −28% (at very high noise levels) and −1% (without noise) upon duplication of the lacZ gene (assuming double expression), which yield negligible fixation probabilities (almost 0) in a sufficiently large bacterial population. It can be argued, nevertheless, that the cost of over-expression decreases as genome size increases (Lynch and Marinov, 2015). This assumption, together with the negative correlation between complexity and population size (Lynch and Conery, 2003), makes effectively neutral selective conditions a plausible explanation for the fixation of expressed duplicates (e.g., essential genes) in higher eukaryotes (Figure 5e; Makino et al., 2009).

Only in the absence of lactose, when the enzyme is not needed, is duplication strictly neutral (no benefit, and no cost owing to regulation). But neutral selective conditions can be reached de facto if the absolute value of the selection coefficient is lower than the inverse of the effective population size (Kimura, 1983). This condition is challenging for prokaryotes, as their population sizes are very large (Lynch and Conery, 2003). In our particular case, we obtained mean selection coefficients on the order of −10⁻¹⁰ (at moderate noise levels) when the nutrient is scarce (1 μM lactose), which could favor the fixation of a lacZ duplicate by genetic drift.
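The effective-neutrality criterion can be combined with Kimura's diffusion approximation for the fixation probability of a single new mutant. The sketch assumes a haploid Wright-Fisher population whose effective size equals its census size N, so that P_fix = (1 − e^(−2s))/(1 − e^(−2Ns)), which tends to 1/N as s → 0. The negative-s branch is algebraically rearranged to avoid floating-point overflow when N|s| is large. With N = 2·10⁸ and s = −10⁻¹⁰, |N·s| = 0.02 and P_fix stays within a few percent of the neutral 1/N, whereas s = −1% gives a probability that underflows to zero.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for one new haploid mutant (Ne = N, initial frequency 1/N)."""
    if s == 0:
        return 1.0 / n
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-2 * n * s)
    a = -s
    # Same formula for s < 0, rewritten so exp() never overflows when n * a is large
    return math.expm1(2 * a) * math.exp(-2 * n * a) / -math.expm1(-2 * n * a)


def effectively_neutral(s, n):
    """Kimura's criterion: selection is negligible when |N * s| < 1."""
    return abs(n * s) < 1


N = 2e8  # natural E. coli population size quoted in the text
for s in (0.0, -1e-10, -1e-8, -0.01):
    print(f"s = {s:g}: |N s| = {abs(N * s):g}, neutral-like = {effectively_neutral(s, N)}, "
          f"P_fix = {fixation_probability(s, N):.3g}")
```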

Gene dosage sharing upon duplication, fitness increase on average, and estimation of the fixation probability

Can a cell carrying an expressed new-born duplicate (in principle, operating close to a local optimum) overcome the cost of an additional copy and then invade the population without invoking the need for more expression (to face an extreme environment)? We predicted that the genetic variability existing in a population would allow adaptive gene duplications to arise (Figure 6a). Mutations in the cis-regulatory region of the lacZ gene may change its wild-type expression level. According to previous results (Otwinowski and Nemenman, 2013), the distribution of mutations in terms of maximal promoter activity is peaked at 1 but skewed to the left (Figure 6b). This indicates that about 10% of them yield cells with nearly 50% lower expression. Thus, if a gene duplication event occurred in one of these cells, the genotypic change would be selectively advantageous (Figure 6c). The frequency of such cells in the population depends, of course, on the mutation rate; the greater the ability to generate genetic diversity, the higher the chances of reaching adaptive duplications. For E. coli, where the per-base mutation rate is 10⁻¹⁰ mut./bp/gen. (Lee et al., 2012), this frequency can be estimated at 10⁻⁹ (i.e. 0.2 mutants with nearly 50% lower expression per generation in a natural population of 2·10⁸ cells). Hence, the probability that a duplication and such a mutation concur in the same cell in a generation (duplication after promoter mutation) is 10⁻⁴ (= 0.2·10⁻³/2; i.e., one suitable concurrence every 10⁴ generations).

Figure 6. Gene duplication leading to maintained expression.

(a) Schematic of fitness as a function of expression, showing a path to adaptive gene duplication without the need for more expression. Two steps are considered: first, a base-pair mutation that halves the expression level; then, a duplication that recovers the ancestral level. (b) Distribution of the activity of lac promoter mutants based on experimental data, measured as the maximal LacZ expression (ymax, irrespective of lactose dose). The mean activity is shown (dashed line). Skewness coefficient of −0.68. (c) ⟨S⟩ of the promoter mutants versus the wild-type system (solid line), with fluctuating lactose dose and high noise levels. The dashed line corresponds to the comparison between promoter mutants that duplicated the lacZ gene and the wild-type system. (d) Fixation probability (Pfix) of gene duplication as a function of the mutation rate of the cell (μ), with ⟨S⟩ = 0.19% and ⟨N⟩ = 2·10^8. (e) ⟨S⟩ as a function of the expression imbalance between the two lacZ copies (ymax,1 / ymax,2) when the system recovers its ancestral expression level (ymax,1 + ymax,2 = 1, i.e. ymax,1 = ymax,2 = 0.5 in the balanced case), with constant x = 0.13 mM and high noise levels. Arrows illustrate the corresponding promoter strengths.

https://doi.org/10.7554/eLife.29739.007

In particular, at constant 0.13 mM lactose, we obtained a relatively high mean selection coefficient of 0.19% when the wild-type expression is recovered upon duplication (in a highly noisy scenario). However, to ensure fixation the selection coefficient has to be greater than the duplication deletion rate (Figure 5b), a condition that is not met here. Indeed, the high deletion rates observed in bacteria (Reams et al., 2010) protect them from acquiring genetic redundancy (perhaps this is why lacZ is not duplicated in E. coli even though duplication may be beneficial). In other local genetic contexts, also in bacteria, the deletion rate of a lacZ duplicate can be as low as 4.1·10^−4 per gene per generation (Reams et al., 2012). In this scenario, a selection coefficient of 0.19% would lead to fixation. We then estimated a global fixation probability of 3·10^−7 (= 2·15·10^−4·10^−4, i.e. 2S′ with the effective selection coefficient S′ = S − μd ≈ 15·10^−4, multiplied by the concurrence probability of 10^−4; Figure 6d; see Materials and methods). Remarkably, this estimate is much higher than 5·10^−9, the fixation probability under hypothetical neutrality (1/⟨N⟩; Kimura, 1983).
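The fixation estimate above is simple arithmetic, and a short Python sketch makes each factor explicit. It plugs the quoted values into Pfix = 2S′ with S′ = S − μd (as stated in Materials and methods) and multiplies by the concurrence probability of 10^−4. The function and constant names are illustrative assumptions, not the authors' code.

```python
# Values quoted in the text (E. coli lacZ, constant 0.13 mM lactose, high noise).
S = 0.0019        # mean selection coefficient of the duplicant (0.19%)
MU_D = 4.1e-4     # lowest reported lacZ duplicate deletion rate (per gene per generation)
P_CONCUR = 1e-4   # probability that promoter mutation and duplication concur (per generation)
N_EFF = 2e8       # effective population size <N>


def global_fixation_probability(s, mu_d, p_concur):
    """2 S' times the chance that the suitable genotype arises, with S' = S - mu_d."""
    s_eff = s - mu_d
    if s_eff <= 0.0:
        return 0.0  # deletion outpaces selection: no fixation in this simplified picture
    return 2.0 * s_eff * p_concur


p_sel = global_fixation_probability(S, MU_D, P_CONCUR)
p_neutral = 1.0 / N_EFF  # neutral expectation, 1/<N>
print(f"selected: {p_sel:.1e}, neutral: {p_neutral:.1e}")
```

With the higher deletion rate of 4.4·10^−2, S′ becomes negative and the sketch returns 0, mirroring the statement that fixation is not ensured in that genetic context.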

A fitness increase on average due to expression noise reduction could also lead to the fixation of duplicates in eukaryotes, as nothing prevents assuming the same positive selective conditions (Raser et al., 2004; Hansen et al., 2015), which in this case far exceed the duplication deletion rates. For D. melanogaster, for instance, where the per-base mutation rate is 5·10^−9 mut./bp/gen. (Schrider et al., 2013) and complete gene duplications have little impact on fitness (Emerson et al., 2008; note that other genome rearrangements not affecting entire genes are significantly deleterious), we estimated that 0.05 mutants with nearly 50% lower expression and up to 10^5 carriers of a duplicate of the gene of interest would be found in the natural population. Hence, the probability of concurrence in the same organism (duplication after promoter mutation) would be 2.5·10^−3. Consequently, the global fixation probability would be about 10^−5; again, higher than the one under hypothetical neutrality (Kimura, 1983).

Maintenance of a duplicate upon fixation in the population

A subsequent change in lactose dose would be highly detrimental if a second lacZ copy had been fixed in the population either under neutrality (owing to negligible expression) or under strong selection (owing to expression demand). In the former case, an increase in lactose would be detrimental; in the latter, a decrease would. Consequently, either the elimination of the duplicate by purifying selection (Lynch and Conery, 2000) or the accumulation of mutations that lower LacZ expression to recover the ancestral phenotype (Force et al., 1999; Qian et al., 2010) would be promoted, with clonal interference in the case of asexual populations (Rozen et al., 2002; Desai et al., 2007). In the latter case, the two gene copies could be maintained in the genome for a long time by buffering costly stochastic fluctuations of intrinsic nature, provided they held similar expression levels (Figure 6e; Gout and Lynch, 2015); otherwise, the gain in accuracy would decrease. Conversely, if a second lacZ copy were fixed under weak selection along the path shown in Figure 6a, it would be safe from changes in lactose dose.

Inspection of the genomes of organisms in which genetic drift is not, in principle, a suitable force to drive the fixation of duplicates (e.g. bacteria or yeast; Lynch and Marinov, 2015) gave us some empirical insight, despite the masking produced by subsequent evolutionary trajectories. In many cases of duplication, there is no significant increase in total expression (e.g. duplicates in Saccharomyces cerevisiae vs. singletons in Schizosaccharomyces pombe; Qian et al., 2010). Thus, either the duplicates were fixed by dosage in a particular environment and then returned to ancestral expression levels, or they were fixed by other means. In either case, the preservation of the ancestral function in the second copy is expected (DeLuna et al., 2008). Whether noise reduction was actually relevant for some fixations is hard to say without an experimental approach to measure variability and selection (revealing the fitness landscape; Figure 2); nevertheless, it seems a plausible mechanism according to our results, as already suggested by computational analyses of gene expression patterns (Lehner, 2010) and metabolic flux balances (Wang and Zhang, 2011) in yeast.

If dosage mattered at some point, the function encoded by the duplicated gene would have been more important at the time of duplication than it is today. In E. coli, for example, genes fsaA and fsaB are paralogs with high sequence (69%) and functional similarity, coding for a genuine fructose-6-phosphate aldolase (Sánchez-Moreno et al., 2012). The relevance of this enzyme for present-day E. coli is unclear, suggesting that fsaB might have been fixed by dosage in past habitats in which rare sugars were frequent. However, if noise were the critical aspect, the system would present some regulation linking environment with phenotype, and the function would be routinely used by the cell. In particular, E. coli expresses two redundant gluconokinases, encoded by genes gntK and idnK (51% sequence identity), to face environments in which gluconate, produced by glucose oxidation, is the carbon source (Vivas et al., 1994). Similar to the regulation of lacZ by lactose (Jacob and Monod, 1961), gluconate activates the expression of gntK and idnK by inhibiting the transcriptional repressor GntR (Afroz et al., 2014). Again, there would be a trade-off between metabolic benefit and expression cost (Figure 1c; read gluconate instead of lactose and GntK/IdnK instead of LacZ). Arguably, the duplication might have been fixed in this case to cope with gene expression inaccuracies, especially when GntR produces bimodal responses (captured in single-cell experiments; Afroz et al., 2014).

A comprehensive model compatible with population genetics to explain the early fate of gene duplications

Taking all our results together, we formulated a comprehensive model to explain the early fate (viz., fixation or elimination) of gene duplications (Figure 7). Notably, this model is compatible with population genetics, involving both positive and neutral selective conditions (Lynch, 2007). On the one hand, a significant number of duplicates could be fixed by genetic drift only in complex organisms (i.e. higher eukaryotes; sector A in Figure 7). This would be due to their greater ability to allocate additional resources for expression (Lynch and Marinov, 2015), and to their apparently low duplication deletion rate relative to the inverse of the population size (Schrider et al., 2013). However, these fixed duplications would not be stable, owing to the formation-deletion balance (Reams et al., 2010); for long-term preservation, they would require the accumulation of beneficial mutations (Han et al., 2009) or the relocation of the second copy in the genome to prevent its deletion (Ranz et al., 2001). This would lead to late fates of sub- or neo-functionalization (Force et al., 1999; Conant and Wolfe, 2008).

Figure 7. General model explaining the fixation of duplicated genes as a function of the degree of selection in the population, and their long-term preservation in the genome.

Representative silhouettes correspond to bacteria (prokaryotes), yeasts (lower eukaryotes), insects, plants, and mammals (higher eukaryotes).

https://doi.org/10.7554/eLife.29739.008

On the other hand, positive selection could drive the fixation of duplicates in both complex and simple organisms. When environmental changes were relatively rapid, only organisms with short generation times (i.e. prokaryotes and lower eukaryotes) could fix duplications (sector C in Figure 7; Riehle et al., 2001). However, such duplications would be quickly eliminated from the population afterwards (once the environment changed again), as genome rearrangement rates are orders of magnitude higher than per-base mutation rates (Reams et al., 2010). By contrast, when a given environmental change was prolonged, any organism, irrespective of its generation time, could fix duplications (sector D in Figure 7; Emerson et al., 2008). In this case, the duplications would be under strong positive selection and, consequently, would be preserved for a long time. Furthermore, all organisms could fix duplications by producing more accurate responses (sector B in Figure 7), without the need for significant environmental changes, provided that the gene of interest was noisily expressed (Elowitz et al., 2002; Raser et al., 2004) and the duplication deletion rate was lower than the weak selective advantage. In the very long term, these weak selective conditions could also allow the exploration of novel functions, as they ensure the preservation of duplicates without invoking fortuitous exploration in the ancestral state (Bergthorsson et al., 2007), with amplification when the advantage provided by the narrowed novel function is higher than the advantage from noise reduction.

Discussion

The inherently stochastic nature of gene expression is an evolutionary driver when it is linked to cell fitness, dictating the selection of particular genetic architectures (Batada and Hurst, 2007; Maamar et al., 2007). Our results demonstrate that gene duplication can be positively selected as an architecture that enhances information transfer in genetic networks (i.e. mitigation of expression errors; Rodrigo and Poyatos, 2016). Accordingly, the genetic robustness observed upon the accumulation of genetic redundancy (Keane et al., 2014) would be more a consequence than a selected trait (Kafri et al., 2006). Note that aggregating the responses of two genes mitigates intrinsic fluctuations, but not extrinsic ones. Hence, duplication would be more favorable in scenarios in which intrinsic noise predominates. The balance between intrinsic and extrinsic noise depends on the particular environmental conditions and on the regulatory structures in which the gene is embedded. Intrinsic noise can be significant when the medium is rich in nutrients, expression levels are low, and no further regulation affects the gene (Swain et al., 2002). For example, competence in Bacillus is mainly governed by intrinsic noise (Maamar et al., 2007). For our model to apply, noise has to impinge mainly on the regulation of the system, that is, disturb the link between the signal molecule and gene expression (Blake et al., 2006). Moreover, our results highlight that a population genetic model based on the mean selection coefficient is enough to explain the complex, stochastic evolutionary dynamics of duplication fixation. Of note, the reported intrinsic adaptive value, which cannot be captured by sequence analyses, was derived from basic mathematical models of gene regulation and cell fitness (Dekel and Alon, 2005).

Notably, our theory of error buffering upon duplication leads to a series of testable predictions. First, the gene expression level is indicative of the fixation path. The theory requires that gene expression be roughly maintained (i.e. gene dosage sharing, duplicates vs. singletons) so as to minimize deleterious fitness effects. This would hold for several fixed duplicates in different organisms (Qian et al., 2010; Gout and Lynch, 2015; Cardoso-Moreira et al., 2016; Lan and Pritchard, 2016), although most newly formed duplicates would be under strong purifying selection due to the cost of over-expression, as already proposed (Lynch and Conery, 2000). By contrast, fixed duplicates showing increased gene expression levels would reflect the effect of genetic drift (Lynch and Conery, 2003) or positive selection for dosage after prolonged environmental changes (e.g. the case of flies; Emerson et al., 2008; Cardoso-Moreira et al., 2016).

Second, noisy genes are expected to be more duplicable (as seems to happen in yeast; Lehner, 2010; Dong et al., 2011) when noise has deleterious fitness effects. Indeed, the gain experienced by the system upon duplication is greater when gene expression inaccuracies are significant (Rodrigo and Poyatos, 2016). This would explain the TATA box enrichment in the cis-regulatory regions of duplicated genes, as these motifs are associated with high plasticity (i.e. high sensitivity to multiple environmental changes) and with high gene expression noise through transcriptional bursts (Blake et al., 2006; Lehner, 2010). Note that if noise were beneficial (e.g. as a survival strategy in fluctuating environments; Acar et al., 2008), duplication would not be favored. Moreover, we might argue that essential genes would be less duplicable (He and Zhang, 2006) as a consequence of their reduced gene expression noise (Batada and Hurst, 2007). Genes under the control of regulatory structures that buffer noise (e.g. negative feedback) would not be duplicable either (Warnecke et al., 2009). However, this consideration should be taken with caution, as genes that are not essential a priori could be duplicated and then, upon fixation, accumulate beneficial mutations (Han et al., 2009) that ensure their long-term preservation, resulting a posteriori in essential genes through functional diversification (as seems to be the case in mammals; Makino et al., 2009).

Third, the local genetic context would be highly determinant of the fixation of a duplicate (Reams et al., 2012), explaining why some genes are more duplicable than others in scenarios of apparent neutrality (hot spots; Perry et al., 2006). Moreover, duplicates would be much shorter-lived in prokaryotes than in eukaryotes (Lynch and Conery, 2003), owing to differences of orders of magnitude in duplication deletion rates. Ultimately, a precise experimental determination of the molecular rates of gene copy number variation would reveal to what extent natural selection has actually rivaled random genetic drift in shaping complexity over the history of life (Rodrigo, 2017).

These predictions are, nevertheless, subject to some limitations. On the one hand, they rest on a simplified mathematical model that does not consider the many molecular/genetic attributes that implicitly impinge on gene expression, such as promoter sequence-dependent noise levels (Metzger et al., 2015), response coupling due to genetic proximity (Becskei et al., 2005), or recursive fitness-expression dependence (Klumpp et al., 2009). On the other hand, it is difficult to provide direct empirical evidence supporting the fixation of duplicates through the reduction of intrinsic noise. In this regard, we expect to carry out in the future an experimental approach (Dekel and Alon, 2005; Keane et al., 2014) complementary to this theoretical study. Despite these limitations, our results complete a mechanistic model in which duplicates are fixed either by genetic drift (no selection) or by gene dosage (strong selection) with the addition of a new principle, viz., that the reduction of gene expression inaccuracies upon duplication can result in a weak selective advantage.

Materials and methods

Fitness function

The lac operon of E. coli (Jacob and Monod, 1961) was taken as the biological model system to which a mathematical framework was applied, and cell growth rate was taken as the fitness metric (W; Elena and Lenski, 2003). In this particular case, the benefit function reads B = a·y·x / (k + x), where a accounts for the increase in growth rate due to lactose utilization (x denotes its concentration; y denotes the normalized LacZ expression), and k is the Michaelis-Menten constant. In addition, the cost function reads C = b·y / (h − y), where b accounts for the decrease in growth rate due to LacZ expression, and h for the maximal resources available in the cell (Dekel and Alon, 2005). Thus, the fitness function reads W = W0·(1 + B − C), where W0 is the cell growth rate in the absence of lactose (x = 0). Note that this model underestimates the adaptive ability of the bacterium by not considering the effect of LacY. Moreover, the normalized LacZ expression in the deterministic regime is given by y = x^n / (x0^n + x^n), where x0 is the lactose regulatory constant and n the Hill coefficient (accounting for the regulatory sensitivity). In this model, LacZ is not expressed in the absence of lactose. If y > h, we assumed W = 0. All parameter values were experimentally fitted, resulting in W0 = 1 h^−1, a = 0.17, k = 0.40 mM, b = 0.036, h = 1.80, x0 = 0.13 mM, and n = 4 (Dekel and Alon, 2005). The optimal LacZ expression (yopt) was obtained by imposing dW/dy = 0, resulting in yopt = h − [b·h·(k + x) / (a·x)]^(1/2).
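The fitness model above translates directly into code. The following Python sketch uses the fitted parameter values listed in the text; the function names are hypothetical, and clamping yopt at 0 for low lactose (where the closed form turns negative) is our own assumption rather than part of the original description.

```python
import math

# Parameters listed in the text (fitted by Dekel and Alon, 2005); x, k and x0 in mM.
W0, a, k, b, h, x0, n = 1.0, 0.17, 0.40, 0.036, 1.80, 0.13, 4


def expression(x):
    """Deterministic normalized LacZ expression, y = x^n / (x0^n + x^n)."""
    return x**n / (x0**n + x**n)


def fitness(y, x):
    """Growth rate W = W0 (1 + B - C), with B = a y x / (k + x) and C = b y / (h - y)."""
    if y >= h:  # the text sets W = 0 for y > h; y = h itself would divide by zero
        return 0.0
    benefit = a * y * x / (k + x)
    cost = b * y / (h - y)
    return W0 * (1.0 + benefit - cost)


def optimal_expression(x):
    """Closed form from dW/dy = 0: y_opt = h - sqrt(b h (k + x) / (a x)).

    Clamped at 0 (our assumption): a negative value means no expression is worthwhile.
    """
    return max(0.0, h - math.sqrt(b * h * (k + x) / (a * x)))


x_ref = 0.13
y_star = optimal_expression(x_ref)
print(f"y_opt = {y_star:.3f}, W(y_opt) = {fitness(y_star, x_ref):.4f}, "
      f"W(deterministic y) = {fitness(expression(x_ref), x_ref):.4f}")
```

Because the cost term is convex in y, the stationary point given by the closed form is a maximum, which a finite-difference check of dW/dy confirms numerically.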

Stochastic gene expression

The normalized LacZ expression in the presence of molecular noise was modeled, in steady state, as y = ymax·(x·z1·z0)^n / [x0^n + (x·z1·z0)^n], where ymax is the maximal expression level (in general, ymax = 1), and z1 and z0 are random variables accounting for intrinsic and extrinsic noise sources, respectively. Both were log-normally distributed [with mean 0 for both log(z1) and log(z0), and standard deviation ηin for log(z1) and ηex for log(z0)]. This accounts for the noisy de-repression of the promoter and the subsequent expression due to lactose. Note that whilst LacZ can show a bistable expression pattern with non-metabolizable synthetic compounds (Ozbudak et al., 2004), its expression is monostable with lactose (van Hoek and Hogeweg, 2006). For simplicity, transient LacZ expression was overlooked, and the noise levels were considered constant during a cell cycle. The median response of a population is denoted by ⟨y⟩.

Typical values characterizing the magnitude of the stochastic fluctuations (ηin and ηex) range between 0.1 and 0.5. They lead to values of gene expression noise (understood as the coefficient of variation) between 0.26 and 0.72 (in the case of ηin = ηex and x = x0), in agreement with experimental reports (Elowitz et al., 2002).
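The noise model and the quoted range of expression noise can be checked by Monte Carlo sampling. The Python sketch below assumes natural logarithms for log(z1) and log(z0) and uses pure standard-library sampling; the function names and sample sizes are our own choices. At x = x0 with ηin = ηex, it should give a coefficient of variation close to 0.26 for η = 0.1 and close to 0.72 for η = 0.5, and a median response ⟨y⟩ of 0.5.

```python
import math
import random

x0, n = 0.13, 4  # lactose regulatory constant (mM) and Hill coefficient, as in the text


def noisy_expression(x, eta_in, eta_ex, rng, y_max=1.0):
    """One draw of y = ymax (x z1 z0)^n / [x0^n + (x z1 z0)^n] with log-normal z1, z0.

    Assumes natural logarithms: log(z1) ~ N(0, eta_in) and log(z0) ~ N(0, eta_ex).
    """
    z1 = math.exp(rng.gauss(0.0, eta_in))  # intrinsic noise factor
    z0 = math.exp(rng.gauss(0.0, eta_ex))  # extrinsic noise factor
    u = (x * z1 * z0) ** n
    return y_max * u / (x0**n + u)


def coefficient_of_variation(values):
    mean = math.fsum(values) / len(values)
    var = math.fsum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


def expression_noise(x, eta_in, eta_ex, samples=100_000, seed=1):
    """Monte Carlo estimate of the expression noise (CV) and of the median response <y>."""
    rng = random.Random(seed)
    ys = sorted(noisy_expression(x, eta_in, eta_ex, rng) for _ in range(samples))
    return coefficient_of_variation(ys), ys[len(ys) // 2]


for eta in (0.1, 0.5):
    cv, median = expression_noise(x0, eta, eta)
    print(f"eta_in = eta_ex = {eta}: CV = {cv:.2f}, median <y> = {median:.2f}")
```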

Gene duplication

The combined expression of two genes coding for LacZ in the presence of molecular noise was modeled as y = ymax,1·(x·z1·z0)^n / [x0^n + (x·z1·z0)^n] + ymax,2·(x·z2·z0)^n / [x0^n + (x·z2·z0)^n], where z2 is a random variable accounting for intrinsic noise on the second copy, with the same distribution as z1 (z1 and z0 as before). Note that whilst extrinsic fluctuations (z0) are shared by both copies, intrinsic fluctuations (z1 and z2) are independent for each gene copy (Elowitz et al., 2002). Moreover, the expression levels of the duplicates relative to the singletons can be adjusted through ymax,1 and ymax,2, with ymax,1 = ymax,2 = 0.5 for equal total expression and ymax,1 = ymax,2 = 1 for double expression.
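To illustrate why the second copy buffers only intrinsic noise, the following Python sketch compares the CV of one full-strength copy (ymax = 1) with that of two half-strength copies (ymax,1 = ymax,2 = 0.5) at the same mean expression. The sampling scheme, names, and noise values are illustrative assumptions. With intrinsic noise alone, averaging two independent draws halves the variance, so the CV ratio is 1/√2 (about a 29% reduction); a shared extrinsic factor z0 brings the ratio closer to 1.

```python
import math
import random

x0, n = 0.13, 4  # as in the text


def hill(v):
    return v**n / (x0**n + v**n)


def cv(values):
    mean = math.fsum(values) / len(values)
    var = math.fsum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


def noise_ratio(x, eta_in, eta_ex, samples=100_000, seed=2):
    """CV(two half-strength copies) / CV(one full-strength copy) at equal total expression."""
    rng = random.Random(seed)
    single, dup = [], []
    for _ in range(samples):
        z0 = math.exp(rng.gauss(0.0, eta_ex))  # extrinsic: shared by both copies
        z1 = math.exp(rng.gauss(0.0, eta_in))  # intrinsic: copy 1
        z2 = math.exp(rng.gauss(0.0, eta_in))  # intrinsic: copy 2, independent of copy 1
        single.append(hill(x * z1 * z0))  # ymax = 1
        dup.append(0.5 * hill(x * z1 * z0) + 0.5 * hill(x * z2 * z0))  # ymax,1 = ymax,2 = 0.5
    return cv(dup) / cv(single)


print(f"intrinsic noise only: {noise_ratio(x0, 0.3, 0.0):.3f}")
print(f"equal intrinsic and extrinsic noise: {noise_ratio(x0, 0.3, 0.3):.3f}")
```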

In addition, the bacterial model was modified to simulate the effect of gene duplication in organisms of different complexity. To that end, the parameter h in the cost function was set in terms of the genome size (G, in Mbp of haploid genome), simply as h ≈ 0.36·G (e.g. G ≈ 5 for E. coli, giving h ≈ 1.8, or G ≈ 3000 for H. sapiens), assuming that complex organisms have more resources to accommodate new gene expression (Lynch and Marinov, 2015). The effective population size (here denoted by ⟨N⟩), which determines the fixation of new genotypes, was also set in terms of G as ⟨N⟩ ≈ 3·10^9 / G^1.44, an equation roughly inferred from previously reported estimates (Lynch and Conery, 2003).
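The two genome-size scalings can be written as one-line functions. In this Python sketch the function names are hypothetical; the formulas are exactly those given above. For E. coli (G ≈ 5) the sketch gives h = 1.8, matching the fitted value, and ⟨N⟩ ≈ 3·10^8, whereas for H. sapiens (G ≈ 3000) it gives h = 1080 and ⟨N⟩ ≈ 3·10^4.

```python
def resource_capacity(genome_mbp):
    """h ≈ 0.36 G, with G the haploid genome size in Mbp."""
    return 0.36 * genome_mbp


def effective_population_size(genome_mbp):
    """<N> ≈ 3·10^9 / G^1.44, the rough fit quoted in the text."""
    return 3e9 / genome_mbp**1.44


for name, genome in (("E. coli", 5.0), ("H. sapiens", 3000.0)):
    print(f"{name}: h = {resource_capacity(genome):.1f}, "
          f"<N> = {effective_population_size(genome):.1e}")
```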

Information transfer

Mutual information (I) was used as a metric of information transfer by considering the system as a communication channel between the environmental molecule (lactose) and the functional protein (the enzyme LacZ) resulting from gene expression. I was calculated between log(x) and y as previously done (Rodrigo and Poyatos, 2016). To model the variation of lactose, a log-normally distributed random variable was considered [with mean 0 and, unless otherwise specified, standard deviation 1 for log(x/x0)]. The median lactose dose is denoted by ⟨x⟩, and the fluctuation amplitude, denoted by Δx, corresponds to the standard deviation of log(x). To compare two I values statistically, we followed the approximation proposed by Cellucci et al. (2005) to obtain an equivalent correlation coefficient, and then applied Fisher's r-to-z transformation.
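The comparison step can be sketched in Python as follows. It assumes that the equivalent correlation coefficient is that of a bivariate Gaussian with the same mutual information, I = −½·ln(1 − r²) with I in nats (multiply values in bits by ln 2 first). This is our reading of the approximation, and the authors' exact implementation may differ. The two-sided p-value treats the difference of Fisher z values from independent samples of sizes n1 and n2 as standard normal. The input values in the usage line are hypothetical and do not come from the paper.

```python
import math


def equivalent_correlation(mi_nats):
    """Correlation of a bivariate Gaussian with mutual information I: I = -0.5 ln(1 - r^2)."""
    return math.sqrt(1.0 - math.exp(-2.0 * mi_nats))


def compare_mutual_information(mi1, n1, mi2, n2):
    """Two-sided test of I1 versus I2 through Fisher's r-to-z transformation."""
    z1 = math.atanh(equivalent_correlation(mi1))
    z2 = math.atanh(equivalent_correlation(mi2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))  # equals 2 (1 - Phi(|z|))
    return z_stat, p_value


# Hypothetical inputs: I = 1.0 vs 0.5 nats, 1000 cells each.
z, p = compare_mutual_information(1.0, 1000, 0.5, 1000)
print(f"z = {z:.1f}, p = {p:.1e}")
```

For very large I, the equivalent correlation rounds to 1 and math.atanh raises a domain error, so in practice such values would need to be capped.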

Genotype-phenotype map

Here, the LacZ expression defines the phenotype of the cell (i.e. its metabolic capacity), and for the wild-type genotype it depends on lactose through LacI regulation (Jacob and Monod, 1961). Because differences in fitness are very small, the normalized expression (y) was assumed to be independent of fitness (Klumpp et al., 2009). Potential beneficial mutations are those that change the lac promoter activity (the cis-regulatory region of lacZ, of about 10^2 bp). From the analysis of a large library of mutants (Kinney et al., 2010), summarized in a linear model of categorical variables (Otwinowski and Nemenman, 2013), the distribution of maximal LacZ expression upon single-point mutations was inferred. For simplicity, no epistatic interactions were taken into account, although they could matter. Mutations were also assumed to affect only the mean expression level and not the noise, even though the latter might happen (Metzger et al., 2015).

In silico evolution

A medium with a maximal capacity of N = 10^5 cells was considered, and serial dilution passages were simulated (Elena and Lenski, 2003) with a dilution factor of D = 100 (in terms of volume, with deterministic dominance). The dilution period was set to 1 d, and lactose varied with the same period. The doubling time of a given cell was 1/W, with W calculated from the stochastic LacZ expression. In the absence of saturation, the cell volume increased as 2^(W·t), where t is the time in h. Because doublings occur in about 1 h, the number of generations per passage is bounded by log2(D) = 6.64. Two genotypes were put in competition: one with a single copy of lacZ, the other with two copies. No mutations were allowed to occur.
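A minimal, deterministic sketch of one such competition is given below in Python. Unlike the authors' C++ simulation, it makes several simplifying assumptions. Each genotype has a fixed growth rate (there is no per-cell stochastic LacZ expression). Both genotypes grow as n·2^(W·t) until the total reaches the capacity N, with no growth after saturation within the day. Dilution by D is deterministic, with no sampling noise. The function names are hypothetical. Because each passage spans about log2(D) ≈ 6.64 generations, the simulated frequency should track r = r0·2^(S·t).

```python
import math


def grow_to_saturation(n1, n2, w1, w2, capacity):
    """Grow both genotypes as n * 2**(W t) until n1 + n2 reaches capacity; return (n1, n2, t)."""
    def total(t):
        return n1 * 2 ** (w1 * t) + n2 * 2 ** (w2 * t)

    lo, hi = 0.0, 1.0
    while total(hi) < capacity:  # bracket the saturation time
        hi *= 2.0
    for _ in range(100):  # bisection on t (hours)
        mid = 0.5 * (lo + hi)
        if total(mid) < capacity:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return n1 * 2 ** (w1 * t), n2 * 2 ** (w2 * t), t


def serial_dilution(w_single, w_dup, capacity=1e5, dilution=100, passages=50, f0=0.01):
    """Frequency of the two-copy genotype after each daily passage (deterministic)."""
    n1 = capacity / dilution * (1.0 - f0)
    n2 = capacity / dilution * f0
    freqs = []
    for _ in range(passages):
        n1, n2, _hours = grow_to_saturation(n1, n2, w_single, w_dup, capacity)
        n1, n2 = n1 / dilution, n2 / dilution  # transfer 1/D of the culture
        freqs.append(n2 / (n1 + n2))
    return freqs


def predicted_frequency(f0, s, passages, dilution):
    """f from r = r0 2^(S t), taking t = passages * log2(D) generations."""
    r = f0 / (1.0 - f0) * 2.0 ** (s * passages * math.log2(dilution))
    return 1.0 / (1.0 + 1.0 / r)


freqs = serial_dilution(1.0, 1.0019)  # W in h^-1; S = 0.19% for the duplicant
print(f"simulated: {freqs[-1]:.4f}, predicted: {predicted_frequency(0.01, 0.0019, 50, 100):.4f}")
```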

Population genetics

In scenarios of competition between two subpopulations (i.e. two different genotypes), the ratio between them (r) reads r = r0·2^(S·t), where r0 is the initial ratio, S the selection coefficient, and t the time measured in generations (Hegreness et al., 2006). Denoting by W and W′ the fitness values of the two genotypes (with W′ > W), the selection coefficient is S = W′/W − 1. When fitness changes over time, the mean selection coefficient (⟨S⟩) is used. The frequency of the advantageous genotype in the population is f = 1/(1 + 1/r). The dynamics of a single beneficial mutant arising in an evolutionary experiment of serial dilution passages, with maximal population size N and dilution factor D, is given by r = 2^(S·t)/⟨N⟩, where ⟨N⟩ = N/D^(1/2) is the geometric mean population size (also taken as the effective population size; Lewontin and Cohen, 1969). The fixation probability is Pfix = 2S, and the characteristic fixation time is tfix = log2(⟨N⟩^2)/S. Note that the time for 50% invasion of the population is thalf-fix = log2(⟨N⟩)/S = tfix/2. By contrast, Pfix = 1/⟨N⟩ and tfix = 2⟨N⟩ for a neutral mutant (Kimura, 1983).
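The single-mutant expressions above can be collected into a few helper functions. This Python sketch implements only the formulas as written, and the function names are hypothetical. For the in silico setting (N = 10^5, D = 100), it also confirms that r = 1, i.e. f = 50%, is reached exactly at thalf-fix.

```python
import math


def selection_coefficient(w_new, w_old):
    """S = W'/W - 1."""
    return w_new / w_old - 1.0


def frequency(r):
    """Frequency of the advantageous genotype, f = 1 / (1 + 1/r)."""
    return 1.0 / (1.0 + 1.0 / r)


def geometric_mean_size(n_max, dilution):
    """<N> = N / D^(1/2) for serial passages between N/D and N."""
    return n_max / math.sqrt(dilution)


def single_mutant(s, n_eff):
    """(Pfix, t_fix, t_half), times in generations, for a single beneficial mutant."""
    p_fix = 2.0 * s
    t_fix = math.log2(n_eff**2) / s
    t_half = math.log2(n_eff) / s
    return p_fix, t_fix, t_half


def neutral_mutant(n_eff):
    """(Pfix, t_fix) for a neutral mutant, as quoted in the text."""
    return 1.0 / n_eff, 2.0 * n_eff


def ratio_single_mutant(s, n_eff, t):
    """r(t) = 2^(S t) / <N>, starting from one mutant cell."""
    return 2.0 ** (s * t) / n_eff


n_eff = geometric_mean_size(1e5, 100)  # in silico setting of the text: <N> = 10^4
p_fix, t_fix, t_half = single_mutant(0.0019, n_eff)
print(f"Pfix = {p_fix:.4f}, t_fix = {t_fix:.0f}, t_half = {t_half:.0f} generations")
```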

By contrast, if multiple beneficial mutants are recurrently produced at rate μb, the dynamics is given by r = μb·N·2^(S·t) / [S·log(D)·⟨N⟩] ≈ μb·2^(S·t)/S, as in each passage μb·N different mutants are generated (valid for μb·N > 1; Desai et al., 2007). Because mutants are now recurrent, Pfix = 1, and the characteristic fixation time reads tfix = log2[⟨N⟩·S/μb]/S. When m different mutations accumulate successively, tfix ≈ tfix(m) + thalf-fix(m−1) + … + thalf-fix(1); that is, a subsequent mutation can start its fixation once the preceding mutation has invaded 50% of the population (Lang et al., 2013). If μb·N << 1, the system can be treated as in the case of a single beneficial mutation, and the dynamics can be written as r = 2^(S·(t − T))/⟨N⟩, with a delay T = log2(D)/(μb·N), the mean number of generations required to produce a mutant, and Pfix = 2S.
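The recurrent-mutant expressions can be sketched in the same way. In the Python code below, thalf-fix for recurrent mutants is our own derivation: it is the time at which r ≈ μb·2^(S·t)/S reaches 1, i.e. f = 50%, since the text does not spell it out. The function names and the illustrative inputs are likewise assumptions.

```python
import math


def t_fix_recurrent(s, mu_b, n_eff):
    """t_fix = log2(<N> S / mu_b) / S: time for r = mu_b 2^(S t) / S to reach <N>."""
    return math.log2(n_eff * s / mu_b) / s


def t_half_recurrent(s, mu_b):
    """Time for r = mu_b 2^(S t) / S to reach 1 (f = 50%); derived here, not from the text."""
    return math.log2(s / mu_b) / s


def t_fix_successive(s_values, mu_b, n_eff):
    """t_fix(m) + t_half(m-1) + ... + t_half(1) for m mutations fixed one after another."""
    *earlier, last = s_values
    return t_fix_recurrent(last, mu_b, n_eff) + sum(t_half_recurrent(s, mu_b) for s in earlier)


def mutant_delay(mu_b, n_max, dilution):
    """T = log2(D) / (mu_b N): mean generations to produce a mutant when mu_b N << 1."""
    return math.log2(dilution) / (mu_b * n_max)


# Illustrative inputs: mu_b = 10 mu = 1e-9 (see Genetic diversity), <N> = 2e8.
print(f"one mutation: {t_fix_recurrent(0.0019, 1e-9, 2e8):.0f} generations")
print(f"two mutations: {t_fix_successive([0.0019, 0.0019], 1e-9, 2e8):.0f} generations")
```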

Moreover, in the case of gene duplication, if duplicants are recurrently produced at rate μc and deleted at rate μd, the dynamics is given by r ≈ μc·2^(S′·t)/S′, with S′ = S − μd as an effective selection coefficient (valid for μc·N > 1 and S > μd). Again, if μc·N << 1, the system can be treated as in the case of a single beneficial mutation, with Pfix = 2S′. If S << μd, the stationary solution can be approximated by r ≈ μc/μd for effectively neutral mutations, or by r ≈ μc/(μd − S) for deleterious mutations.

Genetic diversity

The per-base mutation rate of E. coli is μ = 10^−10 mut./bp/gen. (Lee et al., 2012). Cultures of this bacterium may reach population sizes of up to N = 10^9 cells (⟨N⟩ = 2·10^8). This means, on average, 0.02 (= μ·⟨N⟩) mutants of a given base pair in the population. The number of base pairs, mainly in the cis-regulatory region, whose mutation halves the expression of a gene of interest can be estimated at 10 (based on data for lacZ). Thus, μb = 10·μ, which means 0.2 (= μb·⟨N⟩) mutants of this type in the population on average. This frequency may be even higher if we consider not only mutations in the lac promoter, but also mutations in the coding region or those affecting the activity of its regulators (e.g. CRP; Kinney et al., 2010).

In addition, for the lacZ gene, the duplication formation rate is μc = 3·10^−4 dup./gene/gen., and the duplication deletion rate is μd = 4.1·10^−4–4.4·10^−2 per gene per generation (Reams et al., 2010; Reams et al., 2012). In the absence of lactose, duplications are neutral (S = 0), which means, on average, a duplication frequency in the population of 0.68–42% [= μc/(μc + μd)]. By contrast, in the presence of lactose, duplications are deleterious (S ≈ −28%), and the average duplication frequency is then 0.09–0.11% [= μc/(μc + μd − S)]. Note that the deletion rates are difficult to estimate experimentally, as this requires starting from a genotype with new-born (mostly unstable) duplications; nonetheless, they are essential to properly understand the fixation process.
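The genetic-diversity numbers above follow from a few products and the stationary frequency f = r/(1 + r). With r ≈ μc/(μd − S), this gives f = μc/(μc + μd − S). The Python sketch below reproduces them from the quoted rates; the constant and function names are illustrative assumptions. It recovers the 0.02 and 0.2 mutant counts and the ranges of 0.68–42% (no lactose) and 0.09–0.11% (lactose present).

```python
MU = 1e-10       # per-base mutation rate of E. coli (mut./bp/gen.)
N_EFF = 2e8      # effective population size <N>
MU_B = 10 * MU   # about 10 base pairs whose mutation halves expression
MU_C = 3e-4      # lacZ duplication formation rate (dup./gene/gen.)


def duplicate_frequency(mu_c, mu_d, s=0.0):
    """Stationary f = mu_c / (mu_c + mu_d - S), i.e. r / (1 + r) with r = mu_c / (mu_d - S)."""
    return mu_c / (mu_c + mu_d - s)


print(f"mutants per base pair: {MU * N_EFF:.2f}; half-expression mutants: {MU_B * N_EFF:.1f}")
for mu_d in (4.1e-4, 4.4e-2):
    neutral = duplicate_frequency(MU_C, mu_d)          # no lactose: S = 0
    costly = duplicate_frequency(MU_C, mu_d, s=-0.28)  # lactose present: S ≈ -28%
    print(f"mu_d = {mu_d:.1e}: {100 * neutral:.2f}% without lactose, "
          f"{100 * costly:.2f}% with lactose")
```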

Availability of resources

MATLAB code to model gene expression (y) and cell fitness (W) and C++ code to perform the in silico evolution (as described above) are freely available for download at https://sourceforge.net/projects/rodrigo-duplications/files (Rodrigo, 2017b). A copy is archived at https://github.com/elifesciences-publications/rodrigo-duplications.

References

  25. Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545.
  53. Lynch M, O'Hely M, Walsh B, Force A (2001). The probability of preservation of a newly arisen gene duplicate. Genetics 159:1789–1804.
  75. Rodrigo G (2017b). rodrigo-duplications. SourceForge.


Decision letter

  1. Diethard Tautz
    Reviewing Editor; Max-Planck Institute for Evolutionary Biology, Germany

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: this article was originally rejected after discussions between the reviewers, but the authors were invited to resubmit after an appeal against the decision.]

Thank you for submitting your work entitled "Intrinsic adaptive value and early fate of gene duplication revealed by a bottom-up approach" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Ashley Teufel (Reviewer #3).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

As you will see, the referees found the problem to be interesting and appreciated your approach, but they also raised a number of issues which preclude publication in eLife. These include:

My read of the reviews is that while there might be an interesting glimmer of an idea here, it hasn't been cashed out properly and what we're left with is a not particularly persuasive model/argument.

It seems that the problems include:

1) An inadequate treatment of previous literature in the area;

2) A possible overstatement of the possible fitness benefits associated with reducing intrinsic noise;

3) Highly optimistic assumptions about how dosage compensation would work and how gene expression would be partitioned.

Reviewer #1:

The authors propose that gene duplication reduces gene expression noise, which could be beneficial and hence lead to the fixation of newly duplicated genes. However, I believe this model is untenable. My detailed comments follow.

1) There is no lack of evolutionary models to explain the fixation and long-term retention of duplicates. In terms of fixation, which is the focus of the present study, a new duplicate may be fixed by positive selection for increased gene dose (for either the main or minor function of the gene) or genetic drift. The Introduction seems to suggest a lack of suitable models (hence need for a new model), which is misleading.

2) The present model relies on a reduction of gene expression noise caused by gene duplication. This noise reduction is tiny. In the best case scenario, the intrinsic noise measured by CV is reduced by 29% upon duplication. But because expression noise is mainly from extrinsic noise, which is not reduced by gene duplication, the fraction of total expression noise reduced is minute (likely <5%).

3) The fitness benefit from 5% noise reduction will be swamped by a much greater fitness cost of doubling the expression of the gene owing to gene duplication. So, the authors propose that the total expression level of the duplicate pair is halved by a mutation. Simply halving the total expression is actually not sufficient, because the above calculation of the noise reduction assumes equal expression levels of the two genes. If the two genes have different expression levels, the amount of noise reduced becomes even smaller. So, two very special mutations that reduce the expression of each gene by ~50% are required. I believe the probability of simultaneously acquiring two such mutations in a cell is almost 0.

4) Compared with the probability of acquiring the above two mutations, the probability of acquiring mutation(s) conferring a new function may be larger. In other words, neofunctionalization is probably more likely to happen than the scenario proposed by the authors.

5) The authors based their calculations on one gene (lacZ) but wrote as if the calculations applied to all genes.

6) I wonder why lacZ is not duplicated in E. coli if their theory predicts that duplication of lacZ is beneficial.

7) They provide no empirical evidence for even one duplicate gene that was likely fixed by the mechanism proposed.

Reviewer #2:

Rodrigo and Fares examine the immediate effect of gene duplication on gene expression and fitness, specifically on the reduction of noise in gene expression. The subject is of interest to a large community, including those interested in gene duplication, the evolution of regulatory systems and microbial adaptation. The approach to the problem is rather novel and brings to light new aspects of the question of why a newly arisen duplicate would be maintained if increased dosage itself is not favored. The fact that gene duplication may reduce intrinsic noise in gene expression has been noticed before (Wang and Zhang, as cited by the authors) and derives directly from the fact that the average of two random values drawn from a distribution tends to be closer to the mean of that distribution than either single value, at least for the type of distributions we are dealing with. Showing this using a well-known regulatory system is valuable, especially if other factors are considered, such as trade-offs that may derive from the cost of expression or the cost of having two gene copies. That being said, the manuscript as presented would require additional work before it can be considered for publication. Here are some points:

1) The writing needs to be improved. Some wordings are confusing and also the overall structure of the paper would benefit from a more logical organization. The different alternative assumptions should be confronted directly side by side to clarify the limitations and the conditions in which the system could evolve. The issue of trade-off that is introduced as being important in the Introduction should be better addressed.

Here are some examples of sentences that need to be revisited:

- The first sentence of the Abstract is difficult to read. The fact that duplication contributes to complexity implies that duplicates are maintained. Why use “albeit”?

- Introduction, second paragraph, first sentence. This sentence is also hard to follow and brings multiple important elements in a single sentence.

- Introduction, fifth paragraph, second to last sentence. Again hard to follow.

- Introduction, last paragraph. What is a real gene?

- “Here, we simply considered a cost function based on LacZ expression, although it would be more precise a cost based on lactose permease (LacY) activity (Eames and Kortemme, 2012)”. This sentence is difficult to follow.

- “The population was let to evolve without introducing any artifact…”? What does artifact refer to here?

- Subsection “Most of the new-born duplicated genes are costly for the cell and do not offer phenotypic accuracy”, last paragraph. It is strange to start this paragraph with “even though”. We would expect a contrast to be made but it is not the case.

- Subsection “Most of the new-born duplicated genes are costly for the cell and do not offer phenotypic accuracy”: “as long as” is not used in the proper context.

- Subsection “Fixation is conditioned by the unexpected recurrence of creation and deletion of gene duplications in a population”, first sentence. The word “created” is used to refer to gene duplication. Gene duplication is a process that cannot be created. Gene duplicates can be created, although I would refrain from using “created” here.

- The authors use “simple” and “punctual” mutation rates. They may want to refer to per base or nucleotide mutation rate instead, or use other standard nomenclature.

- Subsection “A comprehensive model compatible with population genetics to explain the early fate of gene duplications”, second paragraph. It would be better to state directly the effect of generation time rather than mention “simple” organisms because they have short generation time.

.…

2) A large fraction of the results presented assume that the sum of expression of the two copies is equal to the expression of the ancestral copy. This assumption is later relaxed in the paper. However, because expression is likely to scale with copy number, this assumption is most likely extremely optimistic. In addition, it is possible that with two copies, repression is not as efficient and the genes are now expressed even when not needed. The two different scenarios (2X expression and 1X expression, and their intermediates if possible) should be compared side by side, and better arguments should be presented as to why 1X is achievable.

3) The issue of intrinsic and extrinsic noises should be brought earlier in the paper as this is a very important consideration. They could be introduced in the Introduction. Gene duplication is not expected to reduce extrinsic noise and extrinsic noise is usually the primary source of differences among cells. As far as I understand, they are treated as potentially contributing equally in the model, which is clearly not the case in reality.

4) An alternative to duplication is also an increase in expression level, which would make protein abundance more often above the critical expression value and thus also increase fitness, without the need for duplication. Mutations that increase abundance would also then compete with duplications.

5) Subsection “Gene duplication helps to better resolve the fitness trade-off”, second paragraph. The authors describe the fitness landscape as rugged but a rugged fitness landscape has multiple local peaks, which is not the case here.

6) The authors define and introduce phenotypic accuracy, which is basically the inverse of noise. I am not sure more terms are necessary in this field. Not sure also that the use of information transmission helps this study and adds anything to the results.

7) Subsection “Gene duplication helps to better resolve the fitness trade-off”, last paragraph. The authors say that the two surfaces resemble each other. This interpretation appears to be rather subjective. It would be useful to explain why this matters and how similar they really are.

8) The authors introduce the concept of trade-off in the Introduction and argue that it is an important factor in evolution, but they largely ignore it as a constraint on the evolution of expression. At the same time, they state that an increase in expression is detrimental in most environments (subsection “Most of the new-born duplicated genes are costly for the cell and do not offer phenotypic accuracy”, first paragraph). This question needs clarification and, again, a better organization of the text would make it possible to contrast the systems with and without trade-offs more clearly.

9) The authors use a biological context of laboratory populations and experimental evolution. For instance, they say in the first paragraph of the subsection “Fixation is conditioned by the unexpected recurrence of creation and deletion of gene duplications in a population” that typical bacterial populations are 10^9 cells. I presume that they refer to cell populations in the laboratory. It would be more appropriate to refer to biological conditions that occur in nature. Even if laboratory conditions favor some evolutionary paths or dynamics, this would be irrelevant if those conditions do not exist in nature. This comment is also relevant for the simulations with dilutions and exponential growth in a flask. These simulations would be interesting if they were tested experimentally in the laboratory in this study. However, since we want to understand evolution in nature, why not use what is expected to be relevant in natural populations, including effective population size estimates, which I presume have been computed for E. coli? Since theory has shown that duplication reduces noise, what readers will really be interested in is whether this is sufficient to favor the maintenance of duplicates in a biologically realistic system.

10) Subsection “Phenotypic accuracy can lead to the fixation of a new-born duplicated gene in the population”, first paragraph. Cis regulatory mutations are assumed to act on the average expression and not on the noise in expression. This is a convenient assumption but not necessarily true. Mutations most likely affect both at the same time (See the work of P. Wittkopp). This could reduce the mutational target site available for mutations reducing expression level.

11) Subsection “Phenotypic accuracy can lead to the fixation of a new-born duplicated gene in the population”, first paragraph. The authors discuss the fact that about 10% of mutations affect expression (reduction by about 50%). To calculate the rate at which these mutations occur, one needs to know how many sites in the genome have these effects, not what fraction of the mutations that have been studied reduce expression. It is not 10% of all mutations in the genome that reduce expression, but rather 50% of the 75 bp region studied by Kinney et al. This should be clarified in the calculation. Also, doesn't the probability that a duplication and a mutation that reduces expression by 50% occur in the same cell in the same generation depend on their equilibrium frequencies and, in some way, on the effective population size? The order of appearance would also matter, because reducing the expression of only one of the copies (if the mutation occurs after the duplication) is not going to bring the expression level back to the ancestral state.

12) Discussion, second paragraph. It is not clear that all of the results mentioned here derive from the theory proposed and if the results actually suggest a reinterpretation of the results. To be useful, it would be important to have predictions from this model that would be specific to this model and could not be explained by the previous models proposed. Also, it would be useful if some were tested here to actually show that some cases known in nature seem to fit the model. Any variation in gene copy number in bacteria that cannot be explained by dosage effects alone or other models of duplicate evolution?

13) The authors assume that the gene expression partitioning seen for pairs of duplicates is 50-50%, but according to Gout and Lynch, this is very often not the case. It is not clear how an expression partitioning that is not 50% contributes to reduce noise in expression. This could be explored here.

14) Some reasoning needs to be revisited carefully. For instance, in the third paragraph of the Discussion, the authors predict that essential genes would be less duplicable as a consequence of their reduced expression noise. Essential genes are not created essential and may derive from non-essential genes, which were noisy initially. If these genes show less expression noise because they contribute more to fitness, it means that selection for lower noise could have favoured their duplication (at the same time making them non-essential, making this effect hard to see).

15) Subsection “In silico evolution”. Why is evolution envisioned as if it occurred in the laboratory? It is already unclear if experimental evolution reflects evolution in natural systems so simulating experimental evolution appears to move away from nature.

16) Subsection “Genetic diversity”, last paragraph. Wouldn't the equilibrium frequency just be Uc/Ud?

17) Figure 1. The legend should explain what x/x0 is.
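Point 16 can be made concrete with a minimal two-state sketch. It assumes neutrality, deterministic (infinite-population) dynamics, a per-generation rate Uc at which duplicates form and a rate Ud at which they are deleted; the function names and rate values are illustrative assumptions, not taken from the manuscript. Under these assumptions the exact equilibrium is Uc/(Uc + Ud), which reduces to Uc/Ud when Uc is much smaller than Ud.

```python
def equilibrium_frequency(u_c, u_d):
    """Closed-form fixed point of p' = p + u_c*(1 - p) - u_d*p (neutral case)."""
    return u_c / (u_c + u_d)


def iterate(u_c, u_d, p0=0.0, generations=10_000):
    """Iterate the deterministic formation/deletion recursion from frequency p0."""
    p = p0
    for _ in range(generations):
        # Non-carriers gain a duplicate at rate u_c; carriers lose it at rate u_d.
        p = p + u_c * (1.0 - p) - u_d * p
    return p


if __name__ == "__main__":
    u_c, u_d = 1e-5, 1e-2  # hypothetical rates: formation much rarer than deletion
    exact = equilibrium_frequency(u_c, u_d)
    approx = u_c / u_d
    simulated = iterate(u_c, u_d)
    print(f"exact {exact:.6e}  Uc/Ud {approx:.6e}  iterated {simulated:.6e}")
```

With selection added, the fixed point shifts, so this only illustrates the neutral baseline the question refers to.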

Reviewer #3:

This manuscript puts forth an interesting new theory on how newly birthed duplicated genes could eventually fix in a population. While the work laid out here seems to be of large interest, I have a few concerns that I would like to see addressed before publication.

My main concern with this publication is that the bulk of the work is centered on examining a system where a duplication does not result in a change of total expression, which is at best a very rare occurrence. While this is discussed later in the manuscript, some justification of why this situation was chosen as the basis of this work should be included in the "Gene duplication helps to better resolve the fitness trade-off" section.

The claim that the actual and the optimal dose-response curves (Figure 1E and Figure 2F) are similar doesn't seem very convincing. Showing this data in something like a q-q plot and reporting a correlation would aid in the argument. This is especially important for Figure 2F when you make the comparison between the duplicate and the singleton, because there does not appear to be much difference between the two.

The comparison of non-normalized mutual information is confusing. Stating what the I values are and that one is 25% higher than the other doesn't convey the message that the duplication changes fidelity in a significant way. Is there an additional metric that could be used to better make this point?

The set up in the Introduction could be improved by adding further detail about why reducing gene expression inaccuracies results in increased fitness.

Often the model is linked to values that have been "experimentally determined" but there doesn't appear to be a clear reference to where these values have come from.

The amount of intrinsic noise is an important parameter in this model. Any statement about the amount of intrinsic noise that exists in biological systems would aid in linking this model back to the biology. Is a moderate (0.3) amount of intrinsic noise to be expected?

The Discussion section largely centers on further directions of this work and ends abruptly. Including a section about the limitations of this work and also casting this work into a larger context would be appreciated.

I believe that eLife requires that you make any code used available. I would suggest putting your simulation code in a repository and including the link in the manuscript.

Overall, this is an interesting manuscript, but I feel the way some of the data are presented could be changed to strengthen the authors' arguments. Including more detailed statistical analysis and expanding some portions of the text for clarity would improve this manuscript substantially.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for resubmitting your work entitled "Intrinsic adaptive value and early fate of gene duplication revealed by a bottom-up approach" for further consideration at eLife. Your revised article has been favorably evaluated by Diethard Tautz as the Senior and Reviewing Editor, and two reviewers.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

One reviewer still has some comments on the presentation of your results. Please check these carefully and clarify the issues as much as possible. Such changes should improve the impact that this manuscript will eventually have. While I do not think that further reviewing will be necessary after these changes are introduced, I should like to ask you to provide nonetheless a careful response letter, indicating which changes were incorporated.

Reviewer #2:

I maintain my comments on the previous version of the manuscript. I believe the paper is hard to follow and extremely specialized such that it is hard to evaluate whether the observations are generalizable.

Important concepts in some sections are not introduced properly in the Introduction (tradeoffs for instance do not only include production costs but any other types of negative effects, including in other environments). Some assumptions made for the analysis are not well detailed, for instance the extent of noise in expression, the cost of expression. Another example is the statement made from Figure 6A that most mutations are nearly neutral. Given what was said about the importance of gene expression tuning and the large Ne for E. coli, most of these changes are most likely not neutral at all. This is a surprising statement given that the paper shows that small changes in the distribution of expression levels can affect the fate of gene duplication.

What we would like to know is under which noise regime (shown to be likely based on observations) this mechanism could affect evolution, given a clear set of assumptions that are shown to be realistic. I do not feel we know this by reading the paper as it is.

Some of the concepts introduced are not defined properly, for instance phenotypic accuracy. Here the authors say that phenotypic accuracy (…subsection “Gene duplication helps to better resolve the fitness trade-off”, second paragraph) is the fact that phenotypic responses generated by duplicated genes give on average higher fitness values than responses generated by singletons. This is a simple corollary of the fact that duplication reduces noise; it is not a new concept that needs to be defined. Using such definitions is just a distraction that reduces our understanding. The same could be said about information content. This is not appropriate for a generalist journal such as eLife.

It is not clear why we need simulations at all if the selection coefficients have been estimated, given all of the analytical work that has been done previously (fixation probability versus Ne and S).
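The analytical relation referred to here (fixation probability as a function of population size and selection coefficient) can be sketched with Kimura's diffusion approximation. This is a minimal sketch under stated assumptions: a haploid Wright-Fisher population whose effective size equals its census size N, a single new mutant, and no recurrent formation or deletion of duplicates. The s and N values are illustrative, chosen only to echo the order of magnitude of the selection coefficients discussed in this exchange.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for a single new mutant in a haploid
    Wright-Fisher population of size n (assumes Ne = n).

    Intended for s >= 0 here; strongly deleterious s with large n would overflow.
    """
    if s == 0:
        return 1.0 / n  # neutral case: fixation probability equals initial frequency
    # u = (1 - exp(-2s)) / (1 - exp(-2ns)); expm1 keeps precision for small s or n*s
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


if __name__ == "__main__":
    for s in (1e-3, 5e-4):
        for n in (10**2, 10**3, 10**4, 10**5, 10**6):
            u = fixation_probability(s, n)
            ratio = u * n  # advantage relative to a neutral mutant (1/N)
            print(f"s={s:.0e} N={n:.0e} u={u:.2e} u/(1/N)={ratio:.1f}")
```

Under this approximation a beneficial mutant fixes with probability close to 2s once Ns is well above 1, compared with 1/N for a neutral one, which is the usual criterion for selection outweighing drift.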

The section on expression demand in extreme environments (subsection “Expression demand in extreme environments can also lead to the fixation of a newborn duplicate in the population”) does not really deal with the question at hand, which is the effect of duplication on noise reduction. There are examples of arbitrary assumptions here too, for instance the choice of a lac promoter with 40% lower activity as a starting point.

Examples of sections with lack of logical flow:

Introduction, fifth paragraph; subsection “Gene duplication helps to better resolve the fitness trade-off”, first two paragraphs; subsection “Expression demand in extreme environments can also lead to the fixation of a newborn duplicate in the population”, second paragraph.

Reviewer #3:

This version of the manuscript is much improved. I thank the authors for their careful and detailed comments. I especially appreciate the inclusion of significance statistics and the addition of the "Maintenance of a duplication upon fixation in the population" section.

https://doi.org/10.7554/eLife.29739.011

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Reviewer #1:

The authors propose that gene duplication reduces gene expression noise, which could be beneficial and hence lead to the fixation of newly duplicated genes. However, I believe this model is untenable. My detailed comments follow.

1) There is no lack of evolutionary models to explain the fixation and long-term retention of duplicates. In terms of fixation, which is the focus of the present study, a new duplicate may be fixed by positive selection for increased gene dose (for either the main or minor function of the gene) or genetic drift. The Introduction seems to suggest a lack of suitable models (hence need for a new model), which is misleading.

We do not agree with this assessment, as saying that the fixation of gene duplicates is well explained just by selection for increased dosage or by drift ignores a lot of literature. The search for alternative models dates back several decades. As already pointed out by Spofford (Am. Nat. 103:407-432, 1969), there is a significant gap in our understanding of gene duplication concerning the critical initial phase, a gap that still needs to be filled. Please see the review by Innan and Kondrashov, 2010, where several models are discussed regarding this long-standing problem. Our work indeed goes in this direction, trying to determine whether duplication is advantageous per se, without invoking the need for more expression or a new function. At this point, we would like to stress that our focus is on the earliest stages of selection on gene duplication.

Many observations indicate that dosage or drift are not sufficient explanations. For example, an increased expression can be useful in extreme circumstances (e.g., to face a stress), but not in routine environments to which the organism should be adapted. Moreover, many duplicates do not contribute to increasing total expression (Qian et al., 2010), which implies that they were not selected for dosage. It is certainly unlikely that all these duplicates were fixed fortuitously, so additional criteria are required. We believe that all this is well explained in the Introduction, so we ask the reviewer to reconsider his/her position after re-reading the manuscript.

But there is more. Fixation by selection for increased dosage or by drift is a model that cannot explain, for example, why duplicates maintain expression, as well as function (DeLuna et al., 2008), or why duplicates are enriched in TATA boxes, elements that contribute to increased variability (Lehner, 2010). These relevant aspects are discussed in our manuscript, and our results indeed expand our knowledge about gene expression control mechanisms, and in particular the mechanism of gene duplication.

2) The present model relies on a reduction of gene expression noise caused by gene duplication. This noise reduction is tiny. In the best-case scenario, the intrinsic noise measured by CV is reduced by 29% upon duplication. But because expression noise is mainly from extrinsic noise, which is not reduced by gene duplication, the fraction of total expression noise reduced is minute (likely <5%).

We are sorry to say these numbers are wrong. The variance of the stochastic fluctuations in gene expression is reduced by 50% upon duplication when only intrinsic noise is taken into account. This leads to an average selection coefficient of 0.1% (duplicate vs. singleton), which is sufficient for the duplicate to be selected (Figure 4B). When extrinsic noise is also considered, the stochastic fluctuations in gene expression are reduced to a lesser, but still significant, extent upon duplication, leading to selection coefficients of about 0.05%, which are still selectable in populations larger than 10^4 individuals (Figure 4C). When intrinsic and extrinsic noise are of similar magnitude, the reduction is 25%. When extrinsic noise is double the intrinsic noise, the reduction is 17%. We have rewritten the text to read: “the variance of the stochastic fluctuations (noise) in gene expression is reduced by 50% upon duplication (Wang and Zhang, 2011; when only intrinsic fluctuations are considered). However, when both intrinsic and extrinsic fluctuations are considered, the variance is reduced by 15 – 25%. In any case, this increases fitness on average”. After this clarification, we hope the reviewer now agrees with us.

3) The fitness benefit from 5% noise reduction will be swamped by a much greater fitness cost of doubling the expression of the gene owing to gene duplication. So, the authors propose that the total expression level of the duplicate pair is halved by a mutation. Simply halving the total expression is actually not sufficient, because the above calculation of the noise reduction assumes equal expression levels of the two genes. If the two genes have different expression levels, the amount of noise reduced becomes even smaller. So, two very special mutations that reduce the expression of each gene by ~50% are required. I believe the probability of simultaneously acquiring two such mutations in a cell is almost 0.

As said before, the reviewer’s claim of a 5% noise reduction is erroneous. That being said, the model only requires one mutation, so the reviewer’s second claim, about two mutations, is also erroneous. Cell populations inherently present genetic variability, and within a few generations one could find a mutant with half expression. The idea is that duplication occurs in that mutant cell. The precise probabilities of occurrence and fixation are calculated in this work. Note that we have recalculated them in the new version following the suggestion of Reviewer #2.

4) Compared with the probability of acquiring the above two mutations, the probability of acquiring mutation(s) conferring a new function may be larger. In other words, neofunctionalization is probably more likely to happen than the scenario proposed by the authors.

As stated before, the model is not based on the accumulation of two mutations, but just one. Therefore, this comment does not apply.

5) The authors based their calculations on one gene (lacZ) but wrote as if the calculations applied to all genes.

Indeed. The theory is general and is not specific to any gene. We chose the lacZ gene for illustrative purposes and because the dose-response curve and associated fitness function are known in this case (experimentally validated). The lac operon has been studied since the times of Jacob and Monod as a paradigmatic system from which to derive general principles of gene regulation. See, e.g., the works by Elowitz et al., 2002 on stochastic gene expression or by Garcia and Phillips (PNAS 108:12173-12178, 2011) on the thermodynamics of transcriptional repression, both with the lac promoter. This is normal practice in the field of dynamic systems biology, where the detailed study of the dynamic behavior of natural systems makes it possible to derive general rules applicable to any system with similar molecular features (see Alon’s textbook). In the words of Savageau, “the lactose (lac) operon of Escherichia coli serves as the paradigm for gene regulation, not only for bacteria, but also for all biological systems from simple phage to humans. The details of the systems may differ, but the key conceptual framework remains, and the original system continues to reveal deeper insights with continued experimental and theoretical study” (Math. Biosci. 231:19-38, 2011).

6) I wonder why lacZ is not duplicated in E. coli if their theory predicts that duplication of lacZ is beneficial.

As we discuss in the manuscript, duplicates are created and deleted continuously (subsection “Fixation is conditioned by the unexpected recurrence of formation and deletion of duplicates in a population”). Because these rates have been shown to be higher than expected (see e.g. Genetics 184:1077-1094, 2010 or Genetics 194:937-954, 2013), the selection coefficient has to be greater than the deletion rate to escape from such birth-death processes and reach fixation. In bacteria, the deletion rate for the lacZ gene is quite high (~10^-2, Genetics 184:1077-1094, 2010), which is greater than the average selection coefficient that we calculated (0.1% = 10^-3), thereby preventing its duplication. For other genes (or the same lacZ gene in another genetic context), the deletion rate can be much lower (~10^-4, Genetics 192:397-415, 2012), which is smaller than the selection coefficient, so fixation of a duplicate can occur. This relevant aspect has been missed by this reviewer. To reinforce this, we have added the following sentence: “perhaps, this is why lacZ is not duplicated in E. coli despite this may be beneficial”.

Certainly, we are not interested in analyzing the duplicability of particular genes, but in providing a general theory to explain the intrinsic adaptive value of duplicates, something that completes the model of fixation by dose selection or drift (Figure 7).

7) They provide no empirical evidence for even one duplicate gene that was likely fixed by the mechanism proposed.

We have added the following text in the new version of the manuscript: “The genomic inspection of organisms in which genetic drift is not, in principle, a suitable force to drive the fixation of duplicates (e.g., bacteria or yeast; Lynch and Marinov, 2015) gave us some empirical insight, despite the masking produced by subsequent evolutionary trajectories. […] Arguably, duplication might have been fixed in this case to cope with gene expression inaccuracies, especially when GntR produces bimodal responses (captured in single-cell experiments; Afroz et al., 2014)”.

This contains some analysis of duplicated genes in bacteria and yeast to support our theory.
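The variance arithmetic in the response to point 2 can be checked with a short sketch. It assumes (as in the averaging argument Reviewer #2 describes) that each of two independent copies carries half the mean expression with the same coefficient of variation as the singleton, that extrinsic variance is unaffected by duplication, and that intrinsic and extrinsic variances add. The function names are illustrative, not the authors' code.

```python
import math


def split_intrinsic_variance(v_single):
    """Intrinsic variance of two independent copies at half expression each.

    Each copy keeps the singleton's CV, so its standard deviation is sigma/2
    (variance v_single/4); independent variances add, giving v_single/2.
    """
    return 2.0 * (v_single / 4.0)


def total_variance_reduction(v_int, v_ext):
    """Fractional drop in total expression variance after duplication."""
    before = v_int + v_ext
    after = split_intrinsic_variance(v_int) + v_ext  # extrinsic part unchanged
    return 1.0 - after / before


def cv_reduction(variance_reduction):
    """Fractional drop in CV for a given drop in variance, mean held fixed."""
    return 1.0 - math.sqrt(1.0 - variance_reduction)


if __name__ == "__main__":
    cases = {
        "intrinsic only": (1.0, 0.0),
        "intrinsic = extrinsic": (1.0, 1.0),
        "extrinsic = 2 x intrinsic": (1.0, 2.0),
    }
    for label, (v_int, v_ext) in cases.items():
        dv = total_variance_reduction(v_int, v_ext)
        print(f"{label}: variance -{dv:.1%}, CV -{cv_reduction(dv):.1%}")
```

The three cases give variance reductions of 50%, 25% and about 17%, matching the figures in the response. The same arithmetic converts between the two metrics used in this exchange: with the mean unchanged, a 50% cut in variance corresponds to a 1 - 1/√2 ≈ 29% cut in CV.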

Reviewer #2:

[…] 1) The writing needs to be improved. Some wordings are confusing and also the overall structure of the paper would benefit from a more logical organization. The different alternative assumptions should be confronted directly side by side to clarify the limitations and the conditions in which the system could evolve. The issue of trade-off that is introduced as being important in the Introduction should be better addressed.

Here are some examples of sentences that need to be revisited:

- The first sentence of the Abstract is difficult to read. The fact that duplication contributes to complexity implies that duplicates are maintained. Why use “albeit”?

Removed.

- Introduction, second paragraph, first sentence. This sentence is also hard to follow and brings multiple important elements in a single sentence.

Split.

- Introduction, fifth paragraph, second to last sentence. Again hard to follow.

Rewritten.

- Introduction, last paragraph. What is a real gene?

“Real” removed.

- “Here, we simply considered a cost function based on LacZ expression, although it would be more precise a cost based on lactose permease (LacY) activity (Eames and Kortemme, 2012)”. This sentence is difficult to follow.

Rewritten.

- “The population was let to evolve without introducing any artifact…”? What does artifact refer to here?

Replaced by “bias”.

- Subsection “Most of the new-born duplicated genes are costly for the cell and do not offer phenotypic accuracy”, last paragraph. It is strange to start this paragraph with “even though”. We would expect a contrast to be made but it is not the case.

Removed.

- Subsection “Most of the new-born duplicated genes are costly for the cell and do not offer phenotypic accuracy”: “as long as” is not used in the proper context.

Restructured.

- Subsection “Fixation is conditioned by the unexpected recurrence of creation and deletion of gene duplications in a population”, first sentence. The word “created” is used to refer to gene duplication. Gene duplication is a process that cannot be created. Gene duplicates can be created, although I would refrain from using “created” here.

Fixed.

- The authors use “simple” and “punctual” mutation rates. They may want to refer to per base or nucleotide mutation rate instead, or use other standard nomenclature.

Fixed.

- Subsection “A comprehensive model compatible with population genetics to explain the early fate of gene duplications”, second paragraph. It would be better to state directly the effect of generation time rather than mention “simple” organisms because they have short generation time.

Fixed.

2) A large fraction of the results presented assume that the sum of expression of the two copies is equal to the expression of the ancestral copy. This assumption is later relaxed in the paper. However, because expression is likely to scale with copy number, this assumption is most likely extremely optimistic. In addition, it is possible that with two copies, repression is not as efficient and the genes are now expressed even when not needed. The two different scenarios (2X expression and 1X expression, and their intermediates if possible) should be compared side by side, and better arguments should be presented as to why 1X is achievable.

Our model does consider that expression is doubled after duplication. Thus, it scales with copy number, and we never assumed the contrary. The relevant question is, hence, how this scenario is compatible with dosage sharing. We resolved this by proposing that the genetic variability existing in a population allows for mutants with half expression (see Figure 6B), which can recover the ancestral expression level after duplication. The effect of intermediates between 1x and 2x is shown in Figure 6C.

In the first part of the manuscript, we assumed that a genotype carrying a duplicate with dosage sharing already existed to study its eventual selective advantage. We have added the following sentence: “For the moment, we ensured gene dosage sharing to evaluate in a quantitative way the goodness of having a second gene copy for the cell without invoking the need for more expression”. In the second part, we studied how to reach such genotype, calculating the precise probabilities of occurrence and fixation. We have added new figures to elucidate our point (Figures 2, 5C, 6A in the new version).

In the particular case of the lac promoter, repression is very strong (the effective dissociation constant between LacI and DNA is on the order of nM, i.e., a few LacI molecules are sufficient to turn the gene off completely). Repression can be altered by duplication when it is weak, but this effect can be neglected to develop a first theory.

3) The issue of intrinsic and extrinsic noises should be brought earlier in the paper as this is a very important consideration. They could be introduced in the Introduction. Gene duplication is not expected to reduce extrinsic noise and extrinsic noise is usually the primary source of differences among cells. As far as I understand, they are treated as potentially contributing equally in the model, which is clearly not the case in reality.

Although the Introduction is already quite long, we have expanded it to include more detail about gene expression noise. The balance between intrinsic and extrinsic noise depends on the particular environmental conditions and the regulatory structures in which the gene is embedded.

In some cases extrinsic noise dominates, in others it does not (see e.g. PNAS 99:12795-12800, 2002 and Nature 439:861-864, 2006). For example, when the medium is rich in nutrients, the expression is low, and no further regulation affects the gene, intrinsic noise dominates. However, when the medium is poor, the expression is high, and the gene belongs to a complex regulatory network, extrinsic noise dominates. According to our results, when only intrinsic noise is considered, genetic variability is reduced by about 50%, while when both intrinsic and extrinsic noise are considered it is reduced by about 25%. Nevertheless, extrinsic noise does not seem to substantially affect the selective advantage provided by duplication (see Figure 4C).

Moreover, if the reviewer looks carefully at the model, he/she will notice that intrinsic and extrinsic noise are treated differently. Intrinsic noise is specific to each copy (z1 and z2), while extrinsic noise is common to both copies (z0). Thus, our model does reflect what happens in reality.
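A minimal Monte Carlo sketch of why a shared term (z0) and per-copy terms (z1, z2) behave differently under duplication. For illustration only, it assumes that noise perturbs each copy's lactose EC50 log-normally, x0·exp(ηex z0 + ηin zi), that the response is Hill-type with coefficient 2, and that the two copies share the dose (each contributes half). These forms and values are hypothetical, not the manuscript's equations.

```python
import numpy as np

def expression(x, x0, hill=2.0):
    """Hill-type dose response; x0 plays the role of the lactose EC50 (assumed form)."""
    return x**hill / (x0**hill + x**hill)

def sample_expression(x, x0, eta_in, eta_ex, copies, n, rng):
    """Total expression of `copies` gene copies in n cells under dosage sharing."""
    z0 = rng.standard_normal(n)                # extrinsic: shared by all copies in a cell
    total = np.zeros(n)
    for _ in range(copies):
        zi = rng.standard_normal(n)            # intrinsic: independent for each copy
        ec50 = x0 * np.exp(eta_ex * z0 + eta_in * zi)
        total += expression(x, ec50) / copies  # each copy contributes 1/copies of the dose
    return total

rng = np.random.default_rng(1)
for eta_in, eta_ex in [(0.3, 0.0), (0.3, 0.3), (0.0, 0.3)]:
    v1 = sample_expression(1.0, 1.0, eta_in, eta_ex, 1, 200_000, rng).var()
    v2 = sample_expression(1.0, 1.0, eta_in, eta_ex, 2, 200_000, rng).var()
    print(f"eta_in={eta_in}, eta_ex={eta_ex}: var(duplicate)/var(singleton) = {v2 / v1:.2f}")
```

With only intrinsic noise the variance ratio should be close to 0.5 (averaging two independent copies), with only extrinsic noise close to 1 (both copies move together), and with both it falls in between.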

4) An alternative to duplication is also an increase in expression level, which would make protein abundance more often above the critical expression value and thus also increase fitness, without the need for duplication. Mutations that increase abundance would also then compete with duplications.

This is precisely what we studied; the results are shown in Figures 6E, F. See the associated text in the subsection “Expression demand in extreme environments can also lead to the fixation of a newborn duplicate in the population”.

5) Subsection “Gene duplication helps to better resolve the fitness trade-off”, second paragraph. The authors describe the fitness landscape as rugged but a rugged fitness landscape has multiple local peaks, which is not the case here.

We have replaced “rugged” by “hill-like” in the new version.

6) The authors define and introduce phenotypic accuracy, which is basically the inverse of noise. I am not sure more terms are necessary in this field. Not sure also that the use of information transmission helps this study and adds anything to the results.

Although we consider the term “phenotypic accuracy” relevant, as it is almost self-explanatory, we have reduced its use in the manuscript. The term refers to placing the system at an operating point close to the optimum, that is, making the phenotype as accurate as possible. This is more than noise reduction (subsection “Gene duplication helps to better resolve the fitness trade-off”).

On the other hand, the use of information transmission further characterizes noise reduction across lactose doses. It could indeed be omitted, but it provides a single parameter that describes the accuracy of the dynamic response. In addition, it follows the tradition of using this information-theoretic quantity to characterize regulatory systems with acquired genetic redundancy (Science 334:354-358, 2011, eLife 4:e06559, 2015, or PLoS Comput. Biol. 12:e1005156, 2016).
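As an illustration of how information transmission condenses a noisy dose response into one number, here is a plug-in (histogram) estimator of mutual information. It is a generic estimator with an arbitrary bin count, not necessarily the one used in the manuscript; the dose-response curve (EC50 = 1, Hill coefficient 2) and the noise levels are hypothetical.

```python
import numpy as np

def mutual_information_bits(inputs, outputs, bins=10):
    """Plug-in (histogram) estimate of I(input; output) in bits."""
    joint, _, _ = np.histogram2d(inputs, outputs, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of the input
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of the output
    nz = p_xy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))

rng = np.random.default_rng(3)
log_dose = rng.uniform(-2.0, 1.0, 50_000)                  # hypothetical log10 lactose doses
mean_response = 1.0 / (1.0 + 10.0 ** (-2.0 * log_dose))    # Hill curve, EC50 = 1, n = 2
for noise in (0.05, 0.2):
    response = mean_response + noise * rng.standard_normal(log_dose.size)
    print(f"noise sd {noise}: I = {mutual_information_bits(log_dose, response):.2f} bits")
```

Less noisy responses transmit more information about the dose. Plug-in estimates are biased upward when samples are few relative to the number of bins.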

7) Subsection “Gene duplication helps to better resolve the fitness trade-off”, last paragraph. The authors say that the two surfaces reassemble. This interpretation appears to be rather subjective. It would be useful to explain why this matters and how similar they really are.

The two surfaces resemble each other in that both present their maximum at the same lactose dose, and we quantified their similarity with a Spearman’s correlation test. This matters because it shows that mutual information (simply measured from gene expression data) can be used as a proxy for the selection coefficient (hard to measure). We have added the following sentence: “We could then predict a cell fitness increment from measuring a reduction in gene expression noise”.
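Spearman's correlation is Pearson's correlation computed on ranks. A numpy-only sketch (assuming no ties), applied to two made-up surfaces over (log dose, noise level) that peak at the same dose; the surfaces are placeholders, not the manuscript's data.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks (assumes no ties)."""
    a, b = np.ravel(a), np.ravel(b)
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Hypothetical surfaces on a (log10 dose, noise level) grid, both peaking at log10 dose = -0.3.
dose, noise = np.meshgrid(np.linspace(-2.0, 1.0, 30), np.linspace(0.1, 0.5, 9))
info_gain = noise * np.exp(-(dose + 0.3) ** 2)                       # stand-in for the MI increase
selection = 1e-3 * noise**2 * np.exp(-((dose + 0.3) / 0.8) ** 2)     # stand-in for s
print(f"Spearman rho = {spearman_rho(info_gain, selection):.3f}")
```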

8) The authors introduce the concept of trade-off in the Introduction and argue that this is an important factor in evolution but largely ignore them as a constraint on the evolution of expression. At the same time, they state that an increase in expression is detrimental in most environments (subsection “Most of the new-born duplicated genes are costly for the cell and do not offer phenotypic accuracy”, first paragraph). This question needs clarification and again, a better organization of the text would allow to better contrast the systems with and without trade-offs.

We do not understand these statements. The trade-off, as stated in the manuscript (subsection “Quantitative biochemical view of a fitness trade-off”), arises because the enzyme (LacZ) simultaneously generates a benefit (in metabolism) and a cost (due to expression). See Figure 1C. The model always takes into account the balance between metabolic benefit and expression cost (i.e., the trade-off). There are no simulations in which the expression cost is neglected. Stochastic fluctuations in expression, as well as changes in mean expression, are evaluated on the basis of this trade-off.

The central aspect of this work is now illustrated in Figure 2. When the system is close to its optimal operating point (i.e., maximal fitness), changes in gene expression are costly (i.e., reduce fitness). These changes can be stochastic, due to molecular noise, or deterministic. Thus, noise reduction by gene duplication is a useful strategy to increase fitness, provided the total gene expression is maintained (Figure 3E). If gene expression were increased after duplication, the system would move away from the optimum, with a consequent reduction in fitness (Figure 5C). This is why we proposed a model in which a mutation prior to duplication is required. However, when the system is far from the optimum, changes in gene expression can be favorable (Figure 6E). In that case, duplication is selected for increased dosage.
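The argument can be illustrated with a second-order approximation of fitness around the optimum, W(y) = w_max − c (y − y*)², for which E[W] = w_max − c (bias² + variance). This is a hedged sketch, not the paper's model: the quadratic form, the Gaussian per-copy noise and the assumption that each copy keeps the same coefficient of variation at any mean level (consistent with noise acting on regulation) are all assumptions.

```python
import numpy as np

def mean_quadratic_fitness(y, y_opt, w_max=1.0, curvature=1.0):
    """Mean of W(y) = w_max - curvature * (y - y_opt)**2 over expression samples y."""
    return float(np.mean(w_max - curvature * (y - y_opt) ** 2))

rng = np.random.default_rng(2)
n, y_opt, cv = 200_000, 1.0, 0.3

def copy_output(level):
    """One gene copy expressing `level` on average with a fixed coefficient of variation."""
    return level * (1.0 + cv * rng.standard_normal(n))

singleton = copy_output(y_opt)
duplicate_shared = copy_output(y_opt / 2) + copy_output(y_opt / 2)   # dosage sharing
duplicate_doubled = copy_output(y_opt) + copy_output(y_opt)          # 2x expression

for name, y in [("singleton", singleton),
                ("duplicate, dosage sharing", duplicate_shared),
                ("duplicate, doubled dose", duplicate_doubled)]:
    print(f"{name:28s} mean W = {mean_quadratic_fitness(y, y_opt):.3f}")
```

Analytically, these settings give E[W] = 0.91 (variance 0.09), 0.955 (variance halved to 0.045) and −0.18 (bias 1 plus variance 0.18): the duplicate helps only when the total dose is preserved.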

9) The authors use a biological context that is laboratory populations and experimental evolution. For instance, they say in the first paragraph of the subsection “Fixation is conditioned by the unexpected recurrence of creation and deletion of gene duplications in a population”, that typical bacteria populations are 10⁹ cells. I presume that they refer to cell populations in the laboratory. It would be more appropriate to refer to biological conditions that occur in nature. Even if laboratory conditions favor some evolutionary paths or dynamics, it would be irrelevant if the conditions do not exist in nature. This comment is also relevant for the simulations with dilutions and exponential growth in a flask. These simulations would be interesting if they were tested experimentally in the laboratory in this study. However, since we want to understand evolution in nature, why not use what is expected to be relevant in natural populations, including effective population size estimates, which have been computed for E. coli I presume. Since theory has shown that duplication reduces noise, what the readers will be really interested in is whether this is sufficient to favor the maintenance of duplicates in a biologically realistic system.

In this new version, we have recalculated the probability of fixation using the effective population size of E. coli (about 10⁸).
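For orientation, a sketch of Kimura's diffusion approximation for the fixation probability of a single new mutant in a haploid Wright-Fisher population with a constant selection coefficient s, u = (1 − e^(−2s)) / (1 − e^(−2 Ne s)). This is the textbook formula, not necessarily the exact calculation in the manuscript. With Ne ≈ 10⁸ and s on the order of 10⁻³, Ne s ≫ 1 and u ≈ 2s.

```python
import math

def fixation_probability(s, ne):
    """Kimura's approximation for one new haploid mutant with constant selection s."""
    if s == 0:
        return 1.0 / ne                          # neutral limit
    x = 2.0 * ne * s
    if x < -700:                                 # strongly deleterious: avoid overflow in expm1
        return math.expm1(-2.0 * s) * math.exp(x)
    return math.expm1(-2.0 * s) / math.expm1(-x)

for ne in (1e4, 1e6, 1e8):
    print(f"Ne = {ne:.0e}: u = {fixation_probability(1e-3, ne):.6f} (2s = 0.002)")
```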

10) Subsection “Phenotypic accuracy can lead to the fixation of a new-born duplicated gene in the population”, first paragraph. Cis regulatory mutations are assumed to act on the average expression and not on the noise in expression. This is a convenient assumption but not necessarily true. Mutations most likely affect both at the same time (See the work of P. Wittkopp). This could reduce the mutational target site available for mutations reducing expression level.

This is a very interesting remark. We have introduced a citation to Metzger et al. (Nature 521:344-347, 2015) in this new version. We now say: “Mutations were also assumed to affect only the mean expression level and not the noise, even though this latter might happen (Metzger et al., 2015)”. Also in the Discussion, “These predictions involve, nevertheless, some limitations. On the one hand, due to a simplified mathematical model not considering the many molecular/genetic attributes that impinge implicitly on gene expression, such as promoter sequence-dependent noise levels (Metzger et al., 2015), response coupling due to genetic proximity (Becskei et al., 2005), or recursive fitness-expression dependence (Klumpp et al., 2009)”.

11) Subsection “Phenotypic accuracy can lead to the fixation of a new-born duplicated gene in the population”, first paragraph. The authors discuss the fact that about 10% of mutations affect expression (reduction by about 50%). To calculate the rate at which these mutations occur, one needs to know how many sites in the genome have these effects, not what fraction of mutations that have been studied reduce expression. It is not 10% of all mutations in the genome that reduce expression, but rather 50% of the 75 bp region as studied by Kinney et al. This should be clarified in the calculation. Also, the probability that a duplication and a mutation that reduces expression by 50% occur in the same cell in the same generation depends on their equilibrium frequency and somehow the effective population size? The order of appearance would also matter because reducing the expression of only one of the copies (if the mutation occurs after the duplication) is not going to bring the expression level to the ancestral state.

There, we were talking about the mutations falling in the promoter (as the sentence starts with “Mutations in the cis-regulatory region of the lacZ gene…”). The 10% refers to a 75 bp region, not the whole genome.

We would argue that the probability of co-occurrence (of a mutation that halves expression and a duplication) is independent of the population size. The population size determines the number of generations one must wait for such a co-occurrence.

We certainly did not consider the order of appearance of these mutational events. Accounting for it halves our estimate. We have amended this.
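The scaling argument can be made explicit with a back-of-the-envelope sketch. It assumes that the duplication and the expression-halving promoter mutation are independent events in the same cell and generation, takes the 75 bp region and the 10% fraction from the response above, and applies a factor 0.5 for the required order. The duplication and per-base mutation rates are hypothetical placeholders, not values from the manuscript.

```python
def expected_waiting_generations(pop_size, dup_rate, per_base_rate,
                                 region_bp=75, fraction_halving=0.1, order_factor=0.5):
    """Expected generations until a cell carries both events (hypothetical model)."""
    halving_rate = per_base_rate * region_bp * fraction_halving  # per cell per generation
    per_cell = dup_rate * halving_rate * order_factor           # independent of pop_size
    return 1.0 / (pop_size * per_cell)                          # waiting time shrinks with N

# Placeholder rates, for illustration only.
for n in (1e6, 1e8):
    g = expected_waiting_generations(n, dup_rate=1e-5, per_base_rate=1e-10)
    print(f"N = {n:.0e}: ~{g:.2e} generations")
```

The per-cell probability does not involve the population size; only the waiting time does, and ignoring the order of events (order_factor = 1) would underestimate it by a factor of two.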

12) Discussion, second paragraph. It is not clear that all of the results mentioned here derive from the theory proposed and if the results actually suggest a reinterpretation of the results. To be useful, it would be important to have predictions from this model that would be specific to this model and could not be explained by the previous models proposed. Also, it would be useful if some were tested here to actually show that some cases known in nature seem to fit the model. Any variation in gene copy number in bacteria that cannot be explained by dosage effects alone or other models of duplicate evolution?

We have rewritten the Discussion to better interpret our results. In addition, we have added text in the Results (subsection “Maintenance of a duplicate upon fixation in the population”), regarding the “genomic inspection of organisms”, to support a fixation model in which noise reduction is relevant. We have added relevant references accordingly.

13) The authors assume that the gene expression partitioning seen for pairs of duplicates is 50-50%, but according to Gout and Lynch, this is very often not the case. It is not clear how an expression partitioning that is not 50% contributes to reduced noise in expression. This could be explored here.

This is precisely what we did in former Figure 6G, citing in the text Gout and Lynch, 2015.

14) Some reasoning needs to be revisited carefully. For instance, in the third paragraph of the Discussion, the authors predict that essential genes would be less duplicable as a consequence of their reduced expression noise. Essential genes are not created essential and may derive from non-essential genes, which were noisy initially. If these genes show less expression noise because they contribute more to fitness, it means that selection for lower noise could have favoured their duplication (at the same time making them non-essential, making this effect hard to see).

We have added this interesting remark in the Discussion: “However, this consideration should be taken with caution, as genes not essential a priori could be duplicated and then, upon fixation, accumulate beneficial mutations (Han et al., 2009) to ensure preservation for long time, resulting a posteriori in essential genes due to functional diversification (as it seems in the case of mammals; Makino et al., 2009)”.

15) Subsection “In silico evolution”. Why is evolution envisioned as if it occurred in the laboratory? It is already unclear if experimental evolution reflects evolution in natural systems so simulating experimental evolution appears to move away from nature.

Experimental evolution was simulated as a way to determine whether a duplicate, increasing fitness on average due to noise reduction, would be able to invade a population. Invasion is not obvious because the fitness increase upon duplication occurs only on average. Of course, this framework is only a proxy for what occurs in nature, but we consider it sufficient for our purposes. We have acknowledged this aspect in the manuscript: “For simplicity, we simulated a scenario of experimental evolution (Elena and Lenski, 2003; Dekel and Alon, 2005), although the dynamics in nature might be more complex”.
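A minimal serial-dilution sketch of invasion by a duplicate whose advantage holds only on average. All specifics are assumptions for illustration, not the authors' C++ implementation: 100-fold growth per transfer, singleton growth rate normalized to 1, binomial sampling at the bottleneck, and a daily advantage s drawn from a normal distribution whose spread can exceed its mean.

```python
import numpy as np

def serial_dilution(days, bottleneck, f0, s_mean, s_sd=0.0, fold_growth=100.0, seed=0):
    """Hypothetical serial-dilution competition of duplicate vs singleton cells."""
    rng = np.random.default_rng(seed)
    t = np.log(fold_growth)                   # growth period, in units of 1/(singleton rate)
    n_dup = rng.binomial(bottleneck, f0)
    freqs = [n_dup / bottleneck]
    for _ in range(days):
        s = rng.normal(s_mean, s_sd)          # advantage fluctuates from transfer to transfer
        grown_single = (bottleneck - n_dup) * np.exp(t)
        grown_dup = n_dup * np.exp((1.0 + s) * t)
        p_dup = grown_dup / (grown_single + grown_dup)
        n_dup = rng.binomial(bottleneck, p_dup)   # dilution bottleneck introduces drift
        freqs.append(n_dup / bottleneck)
        if n_dup == 0 or n_dup == bottleneck:     # loss or fixation
            break
    return np.array(freqs)

traj = serial_dilution(days=300, bottleneck=10**6, f0=0.5, s_mean=0.01, s_sd=0.02)
print(f"duplicate frequency after {len(traj) - 1} transfers: {traj[-1]:.3f}")
```

Starting from a single duplicate cell (f0 = 1/bottleneck) and running many seeds would give an empirical invasion probability under a fluctuating, rather than constant, selection coefficient.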

16) Subsection “Genetic diversity”, last paragraph. Wouldn't the equilibrium frequency just be Uc/Ud?

No, the frequency is given by Uc/(Uc+Ud), whereas Uc/Ud gives the ratio of cells with and without the duplicate. If Ud>>Uc, the frequency and the ratio are approximately equal.
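A quick deterministic check, assuming neutral two-state dynamics in an effectively infinite population: singleton cells acquire a duplicate at rate Uc and duplicate cells lose it at rate Ud per generation, so the duplicate frequency obeys f ← f + Uc(1 − f) − Ud f, with fixed point Uc/(Uc + Ud).

```python
def equilibrium_frequency(u_c, u_d):
    """Fixed point of f' = f + u_c*(1 - f) - u_d*f."""
    return u_c / (u_c + u_d)

def iterate_frequency(u_c, u_d, generations, f=0.0):
    """Iterate the neutral creation/deletion recursion from an initial frequency f."""
    for _ in range(generations):
        f = f + u_c * (1.0 - f) - u_d * f
    return f

u_c, u_d = 1e-3, 1e-2     # hypothetical creation and deletion rates
print(iterate_frequency(u_c, u_d, 5000), equilibrium_frequency(u_c, u_d), u_c / u_d)
```

At equilibrium f/(1 − f) = Uc/Ud, so the ratio and the frequency differ by a factor 1 − f, which is negligible only when Ud ≫ Uc.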

17) Figure 1. Should explain what is x/x0 in the legend.

This corresponds to a normalized lactose concentration. This has been specified in the legend.

Reviewer #3:

This manuscript puts forth an interesting new theory on how newly birthed duplicated genes could eventually fix in a population. While the work laid out here seems to be of large interest, I have a few concerns that I would like to see addressed before publication.

My main concern with this publication is that the bulk of the work is centered on examining a system where a duplication does not result in a change of total expression, which is at best a very rare occurrence. While this is discussed later in the manuscript, some justification of why this situation was chosen as the basis of this work should be included in the "Gene duplication helps to better resolve the fitness trade-off" section.

The seminal study by Qian et al. (Trends Genet. 26:425-430, 2010) showed just the contrary. In fact, many old duplicates that were fixed still maintain a high degree of functional similarity, and each copy shows a substantial decrease in its expression level after duplication to maintain a similar level with respect to the ancestral genotype (dosage sharing model). In addition, recent works also appear to support this hypothesis, e.g., the ones by Lynch (Mol. Biol. Evol. 32:2141-2148, 2015) and Pritchard (Science 352:1009-1013, 2016).

That being said, our model does consider that expression is doubled after duplication. The relevant question is, hence, how this scenario is compatible with dosage sharing. We resolved this by proposing that the genetic variability existing in a population allows for mutants with half expression (see Figure 6B), which can recover the ancestral expression level after duplication. This is one of the key aspects of this work, and we hope it is now clear.

The claim that the actual and the optimal dose-response curves (Figure 1E and Figure 2F) are similar doesn't seem very convincing. Showing this data in something like a q-q plot and reporting a correlation would aid in the argument. This is especially important for Figure 2F when you make the comparison between the duplicate and the singleton, because there does not appear to be much of difference between both.

A q-q plot does not apply here, as we are not dealing with probability distributions. In any case, we could use the Euclidean distance to measure the distance between the two curves. We have added in the new version the following: “By generating different dose-response curves with values of x0 (lactose EC50 on LacZ) between 0.01 and 1 mM, we found that most of them deviate from the optimal one (P = 0.02; Euclidean distance as a metric)”.

But the problem with this comment is that the reviewer seems to overlook the nature of the two curves in question. The actual dose-response curve between lactose and LacZ is well described by a sigmoidal function in which the EC50 value is represented by the parameter x0 in our model. This curve was obtained simply by fitting the resulting LacZ values against the different lactose doses. The optimal dose-response curve, by contrast, was obtained mathematically by differentiating the fitness function (i.e., dW/dy = 0); this fitness function was constructed by Dekel and Alon (2005) from the experimental evaluation of the metabolic benefit and the expression cost, and does not involve the parameter x0. In this regard, it is remarkable that both curves roughly overlap, which suggests that the lac promoter evolved to reach optimality (indeed, the main result of Dekel and Alon). We relied on this model to develop our theoretical work, but we cannot re-explain that study in full here.
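The two constructions can be sketched side by side. The snippet assumes a benefit-minus-cost fitness of the Dekel and Alon (2005) form, W(y, x) = δ y x/(K + x) − η0 y/(1 − y/M), with illustrative placeholder parameters (not the manuscript's values). Setting dW/dy = δx/(K + x) − η0 M²/(M − y)² = 0 gives y* = M[1 − √(η0 (K + x)/(δ x))], clipped at zero. The Hill-type "actual" curve (maximum 1, Hill coefficient 2) is likewise hypothetical; the Euclidean distance to y* is scanned over EC50 values between 0.01 and 1 mM.

```python
import numpy as np

# Illustrative placeholder parameters (assumed; not taken from the manuscript).
DELTA, K_L, ETA0, M = 0.17, 0.4, 0.02, 1.8   # benefit, lactose constant (mM), cost, capacity

def fitness(y, x):
    """Benefit-minus-cost fitness of expression y at lactose dose x (mM)."""
    return DELTA * y * x / (K_L + x) - ETA0 * y / (1.0 - y / M)

def optimal_expression(x):
    """Closed-form solution of dW/dy = 0; expression cannot be negative."""
    y = M * (1.0 - np.sqrt(ETA0 * (K_L + x) / (DELTA * x)))
    return np.clip(y, 0.0, None)

def hill_response(x, x0, y_max=1.0, hill=2.0):
    """Hypothetical sigmoidal dose response with EC50 = x0."""
    return y_max * x**hill / (x0**hill + x**hill)

doses = np.logspace(-2, 1, 50)             # lactose, mM
y_opt = optimal_expression(doses)
x0_grid = np.logspace(-2, 0, 41)           # candidate EC50 values, 0.01 to 1 mM
dist = np.array([np.linalg.norm(hill_response(doses, x0) - y_opt) for x0 in x0_grid])
print(f"closest EC50 ~ {x0_grid[dist.argmin()]:.3f} mM (distance {dist.min():.3f})")
```

The optimal curve is derived from W alone and never uses x0, which is why agreement between the two curves is informative.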

The comparison of non-normalized mutual information is confusing. Stating what the I values are and that one is 25% higher than the other doesn't convey the message that the duplication changes fidelity in a significant way. Is there an additional metric that could be used to better make this point?

Information transfer is a well-known quantity in information theory, introduced by Shannon several decades ago, to describe input-output associations. In recent years, it has been used in biology to describe genetic systems [see e.g. the works by Levchenko (Science 334:354-358, 2011) or by O’Shea (eLife 4:e06559, 2015)], although it is still unfamiliar to many researchers. In this case, information transfer (measured as mutual information) captures all the stochasticity underlying gene expression data in a single quantity that can be compared across conditions. An increase of 25% in mutual information is indeed significant. We have added the following in the new version: “significance assessed by a z-test, P = 0 with 10⁴ points” (Results) and “To compare statistically two I values, we followed the approximation proposed by Cellucci et al., (2005) to obtain an equivalent correlation coefficient, and then the Fisher’s r-to-z transformation” (Materials and methods).
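The two-step comparison can be sketched as follows. The MI-to-correlation step here assumes the bivariate-Gaussian relation I = −½ ln(1 − r²), with I in nats; whether this coincides exactly with the Cellucci et al. approximation should be checked against that paper. The equivalent correlations are then compared as independent samples with Fisher's r-to-z transformation. The I values below are hypothetical.

```python
import math

def mi_to_correlation(mi_nats):
    """Equivalent correlation under a Gaussian assumption: I = -0.5*ln(1 - r**2)."""
    return math.sqrt(-math.expm1(-2.0 * mi_nats))

def compare_mutual_information(mi1, n1, mi2, n2):
    """Two-sided z-test on Fisher-transformed equivalent correlations."""
    z1 = math.atanh(mi_to_correlation(mi1))
    z2 = math.atanh(mi_to_correlation(mi2))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))

bits_to_nats = math.log(2.0)
z, p = compare_mutual_information(1.0 * bits_to_nats, 10**4, 0.8 * bits_to_nats, 10**4)
print(f"z = {z:.1f}, two-sided P = {p:.2e}")
```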

Perhaps the reviewer is more familiar with other measures of association, such as Pearson’s correlation. Mutual information outperforms this kind of correlation (see e.g. PLoS Comput. Biol. 12:e1005156, 2016). In fact, calculating mutual information is equivalent to calculating the G statistic, a likelihood-ratio statistic (see Sokal's textbook).
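The equivalence can be checked directly: for a contingency table of counts with total N, the likelihood-ratio statistic G = 2 Σ O ln(O/E), with E computed from the margins, equals 2N times the plug-in mutual information in nats. A small sketch with a made-up 3×3 table:

```python
import numpy as np

def g_statistic(table):
    """Likelihood-ratio (G) statistic for independence in a contingency table."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    expected = observed.sum(axis=1, keepdims=True) * observed.sum(axis=0, keepdims=True) / n
    nz = observed > 0
    return float(2.0 * np.sum(observed[nz] * np.log(observed[nz] / expected[nz])))

def plug_in_mi_nats(table):
    """Plug-in mutual information (nats) of the empirical joint distribution."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px * py)[nz])))

table = [[30, 10, 5], [8, 25, 12], [2, 9, 40]]   # hypothetical counts
n = float(np.sum(table))
print(g_statistic(table), 2.0 * n * plug_in_mi_nats(table))
```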

The set up in the Introduction could be improved by adding further detail about why reducing gene expression inaccuracies results in increased fitness.

Although the Introduction is already quite long, we have expanded it to include more detail about gene expression noise.

Often the model is linked to values that have been "experimentally determined" but there doesn't appear to be a clear reference to where these values have come from.

As it is said in the Materials and methods section, the parameters that define the fitness function come from the work by Dekel and Alon (Nature 436:588-592, 2005).

The amount of in. noise is an important parameter in this model. Any statement about the amount of in. noise that exists in biological systems would aid in linking this model back to the biology. Is a moderate (0.3) amount of in. noise to be expected?

The values of noise that we considered in this study are based on previous studies analyzing by means of single-cell techniques gene expression variability. We have added the following: “Typical values characterizing the magnitude of the stochastic fluctuations (ηin and ηex) range between 0.1 and 0.5. They lead to values of gene expression noise (understood as the coefficient of variation) between 0.26 and 0.72 (in the case of ηin = ηex and x = x0), in agreement with experimental reports (Elowitz et al., 2002)”.

The Discussion section largely centers on further directions of this work and ends abruptly. Including a section about the limitations of this work and also casting this work into a larger context would be appreciated.

We have expanded the Discussion. Note however that Figure 7 already puts our work in context.

I believe that eLife requires that you make any code used available. I would suggest putting your simulation code in a repository and including the link in the manuscript.

Regarding the fitness and gene expression models, we detailed the precise equations in the Materials and methods section, as well as all parameter values. In fact, the fitness function was developed by Dekel and Alon (2005), as stated in the manuscript. Any reader could implement this model easily, as we did. Regarding the in silico evolution experiments, we implemented our own code to evolve a cell population within a framework of experimental evolution (serial dilutions). With the instructions we provide in the manuscript, it is straightforward to reproduce our results. Nevertheless, with a view to the manuscript's acceptance in eLife, we have deposited in SourceForge a Matlab file that models fitness and gene expression as a function of lactose, and a C++ file that performs the in silico evolution of a population of cells according to this fitness function.

Overall, this is an interesting manuscript but I feel the way some of the data is presented could be changed to strengthen the author's arguments. By including more detailed statistical analysis and expanding some portions of text for clarity would improve this manuscript substantially.

The reviewer should note that this is not a bioinformatic study in which several statistical analyses are carried out, but a dynamical systems study (a mathematical approach to evolutionary systems biology). Here, we relied on bottom-up models of stochastic gene expression and cell fitness to perform our calculations. Comments typical of evaluations of bioinformatic studies (such as requests for more statistical analyses, correlation plots, etc.) do not seem appropriate here.

[Editors’ note: the author responses to the re-review follow.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

One reviewer still has some comments on the presentation of your results. Please check these carefully and clarify the issues as much as possible. Such changes should improve the impact that this manuscript will eventually have. While I do not think that further reviewing will be necessary after these changes are introduced, I should like to ask you to provide nonetheless a careful response letter, indicating which changes were incorporated.

Reviewer #2:

I maintain my comments on the previous version of the manuscript. I believe the paper is hard to follow and extremely specialized such that it is hard to evaluate whether the observations are generalizable.

We have followed all these suggestions to improve readability.

Important concepts in some sections are not introduced properly in the Introduction (tradeoffs for instance do not only include production costs but any other types of negative effects, including in other environments). Some assumptions made for the analysis are not well detailed, for instance the extent of noise in expression, the cost of expression. Another example is the statement made from Figure 6A that most mutations are nearly neutral. Given what was said about the importance of gene expression tuning and the large Ne for E. coli, most of these changes are most likely not neutral at all. This is a surprising statement given that the paper shows that small changes in the distribution of expression levels can affect the fate of gene duplication.

We have added the following in the first Results section: “In cellular systems, fitness trade-offs arise because beneficial actions involve costs. […] Such components can be described in different ways according to the problem. A paradigmatic and simple fitness trade-off emerges when…”.

Regarding the extent of noise, we have added: “The magnitudes of the stochastic fluctuations were chosen as to end in typical variations of lactose EC50 of 10 – 100%, up or down, resulting in values of gene expression noise, around 0.5, compatible with experimental results (Elowitz et al., 2002).” This information was specified in the Materials and methods section, but now it is also included in the Results section.

Regarding the cost of expression, we have added: “with a marginal cost of 0.036 in the units of the model (Dekel and Alon, 2005)”.

The sentence about mutations in the promoter has been rewritten as: “This indicates that about 10% of them yield cells with nearby 50% lower expression” (removing the claim about neutrality).

What we would like to know is under which noise regime (showed to be likely based on observations) this mechanism could affect evolution given a clear set of assumptions that are shown to be realistic. I do not feel we know this by reading the paper as it is.

We have introduced the following text in the Discussion section: “Certainly, by aggregating the responses of two genes, intrinsic fluctuations can be mitigated, but not fluctuations of extrinsic nature. […] To follow our model, noise has to mainly impinge the regulation of the system, i.e., disturb the link between the signal molecule and gene expression (Blake et al., 2006)”.

Some of the concepts introduced are not defined properly, for instance phenotypic accuracy. Here the authors say that phenotypic accuracy (subsection “Gene duplication helps to better resolve the fitness trade-off”, second paragraph) is the fact that phenotypic responses generated by duplicated genes give on average higher fitness values than responses generated by singletons. This is a simple corollary of the fact that duplication reduces noise; it is not a new concept that needs to be defined. Using such definitions is just a distraction that reduces our understanding. The same could be said about information content. This is not appropriate for a generalist journal such as eLife.

We have avoided the introduction of the term “phenotypic accuracy”.

It is not clear why we need simulations at all if the selection coefficients have been estimated, given all of the analytical work that has been done previously (fixation prob. versus Ne and S).

As noise reduction is a strategy that works on average (i.e., observing its effects requires sampling), we believe that the simulations (Figure 4) are informative. Having a constant selection coefficient (a cell with the duplication performs better all the time) is not the same as having an average selection coefficient (a cell with the duplication performs better most of the time, but not always).

The section on expression demand in extreme environments (subsection “Expression demand in extreme environments can also lead to the fixation of a newborn duplicate in the population”) does not really deal with the question in hand, which is the effect of duplication of noise reduction. There are examples of arbitrary assumptions here too, for instance the consideration of a lac promoter with 40% lower activity as a starting point.

We have removed this section (text and corresponding figures) from the manuscript.

Examples of sections with lack of logical flow:

Introduction, fifth paragraph;

Rewritten as: “But this strategy works on average, i.e., duplication may warrant more accuracy when multiple decisions in gene expression are considered. […] Other mechanistic models have been proposed beyond the demand for increased expression or the accumulation of beneficial mutations (Innan and Kondrashov, 2010), yet do not convincingly resolve the main population genetic dynamical issue”. We have removed the mention to the trade-off here to avoid confusion, now introduced at the beginning of the Results section.

Subsection “Gene duplication helps to better resolve the fitness trade-off”, first two paragraphs;

Rewritten and removed parts related to information transfer. We have also added more details about Figure 3F.

Subsection “Expression demand in extreme environments can also lead to the fixation of a newborn duplicate in the population”, second paragraph.

Removed.

https://doi.org/10.7554/eLife.29739.012

Article and author information

Author details

  1. Guillermo Rodrigo

    1. Instituto de Biología Molecular y Celular de Plantas, CSIC – UPV, Valencia, Spain
    2. Instituto de Biología Integrativa y de Sistemas, CSIC – UV, Paterna, Spain
    Contribution
    Conceptualization, Formal analysis, Validation, Methodology, Writing—original draft
    For correspondence
    guillermo.rodrigo@csic.es
    Competing interests
    No competing interests declared
    ORCID 0000-0002-1871-9617
  2. Mario A Fares (deceased)

    1. Instituto de Biología Molecular y Celular de Plantas, CSIC – UPV, Valencia, Spain
    2. Instituto de Biología Integrativa y de Sistemas, CSIC – UV, Paterna, Spain
    3. Trinity College Dublin, University of Dublin, Dublin, Ireland
    Contribution
    Validation, Writing—original draft
    Competing interests
    No competing interests declared

Funding

Ministerio de Economía y Competitividad (BFU2015-66894-P)

  • Guillermo Rodrigo

Ministerio de Economía y Competitividad (BFU2015-66073-P)

  • Mario A Fares

Generalitat Valenciana (GVA/2016/079)

  • Guillermo Rodrigo

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by grants BFU2015-66894-P (to GR) and BFU2015-66073-P (to MAF) from the Spanish Ministry of Economy (MINECO/FEDER), and also by grant GVA/2016/079 from the Generalitat Valenciana (to GR).

Reviewing Editor

  1. Diethard Tautz, Reviewing Editor, Max-Planck Institute for Evolutionary Biology, Germany

Publication history

  1. Received: June 20, 2017
  2. Accepted: January 4, 2018
  3. Accepted Manuscript published: January 5, 2018 (version 1)
  4. Version of Record published: January 17, 2018 (version 2)

Copyright

© 2018, Rodrigo et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
