Mutational robustness changes during long-term adaptation in laboratory budding yeast populations

  1. Milo S Johnson  Is a corresponding author
  2. Michael M Desai  Is a corresponding author
  1. Department of Organismic and Evolutionary Biology, Harvard University, United States
  2. Quantitative Biology Initiative, Harvard University, United States
  3. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, United States
  4. Department of Physics, Harvard University, United States

Abstract

As an adapting population traverses the fitness landscape, its local neighborhood (i.e., the collection of fitness effects of single-step mutations) can change shape because of interactions with mutations acquired during evolution. These changes to the distribution of fitness effects can affect both the rate of adaptation and the accumulation of deleterious mutations. However, while numerous models of fitness landscapes have been proposed in the literature, empirical data on how this distribution changes during evolution remains limited. In this study, we directly measure how the fitness landscape neighborhood changes during laboratory adaptation. Using a barcode-based mutagenesis system, we measure the fitness effects of 91 specific gene disruption mutations in genetic backgrounds spanning 8000–10,000 generations of evolution in two constant environments. We find that the mean of the distribution of fitness effects decreases in one environment, indicating a reduction in mutational robustness, but does not change in the other. We show that these distribution-level patterns result from differences in the relative frequency of certain patterns of epistasis at the level of individual mutations, including fitness-correlated and idiosyncratic epistasis.

Editor's evaluation

Johnson and Desai developed an innovative yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. They use this system to test whether these mutations have different effects on evolving lineages as they adapt over time. As expected, the mean fitness effect does decline in most (but not all) populations as lineages adapt, but in another condition, mean fitness effects of mutations do not change as the populations adapt. The authors suggest an intriguing interpretation that the ‘control coefficient’ of selection on growth can shift between different genetic modules over time, resulting in differing magnitudes of epistasis.

https://doi.org/10.7554/eLife.76491.sa0

Introduction

Evolutionary adaptation relies on recombination and spontaneous mutagenesis to constantly introduce variation into populations, upon which natural selection can act. The fate of a single mutation – and its impacts on the dynamics of adaptation – depends on how it affects organismal fitness, which we know depends in complex ways on the rest of the genetic background (reviewed in de Visser et al., 2011; Domingo et al., 2019; Lehner, 2011). Understanding this genotype dependence, or epistasis, involves analyzing key features of the fitness landscape, the high-dimensional map between genotype and fitness (Wright, 1932).

To investigate these questions, numerous studies have surveyed epistatic interactions among a variety of different types of mutations. These studies have found that beneficial mutations isolated from laboratory evolution experiments tend to exhibit negative epistasis. That is, they are less beneficial in combination than would be expected from the combination of their effects in isolation (Karkare et al., 2021; MacLean et al., 2010; Ono et al., 2017; Rokyta et al., 2011; Schenk et al., 2013; Schoustra et al., 2016; Zee and Velicer, 2017). However, some examples of positive epistasis among beneficial mutations have also been observed (Chou et al., 2009; Fumasoni and Murray, 2020; Hsieh et al., 2020; Khan et al., 2011; Levin-Reisman et al., 2019). Surveys of interactions between deleterious mutations have produced numerous examples of both positive and negative epistasis (Costanzo et al., 2016; Elena and Lenski, 1997; Hall and MacLean, 2011; Jasnos and Korona, 2007; Lalić and Elena, 2012; Sanjuán et al., 2004; Van Leuven et al., 2021).

While this earlier work has identified a wide range of epistatic interactions among specific combinations of mutations, several recent studies of epistasis in laboratory microbial systems have found that general patterns of fitness-correlated epistasis often emerge. These fitness-correlated patterns typically favor less-fit genotypes: for both beneficial and deleterious mutations, we usually find that the fitness effect of a mutation is negatively (rather than positively) correlated with the fitness of the genetic background on which it occurs (Chou et al., 2011; Johnson et al., 2019; Khan et al., 2011; Kryazhimskiy et al., 2014). These patterns have been termed diminishing returns and increasing costs for beneficial and deleterious mutations, respectively. These trends are particularly intriguing to those interested in modeling the dynamics of adaptation for two reasons. The first is the analytical promise of a simple, monotonic relationship between fitness and the fitness effects of mutations. The second is the potential for these patterns of epistasis to explain declining adaptability, a commonly observed phenomenon in laboratory evolution experiments in which the rate of fitness increase slows as evolution proceeds (reviewed in Couce and Tenaillon, 2015; see Wünsche et al., 2017 for an analysis of the link between diminishing returns and declining adaptability).

The initial studies that identified patterns of fitness-mediated epistasis did so in the context of a relatively small number of beneficial mutations, finding that these mutations became systematically less beneficial in more-fit genetic backgrounds. More recent work has shown that as populations evolve the average fitness effect of a spontaneous beneficial mutation decreases (Aggeli et al., 2021; Wünsche et al., 2017). Much less work has been done to characterize how the effects of deleterious mutations interact with the beneficial mutations that fix during evolution (Remold and Lenski, 2004). We recently showed that the fitness effects of larger panels of ~100–1000 random insertion mutations (both beneficial and deleterious) tend to be systematically less beneficial or more deleterious in more-fit backgrounds (Johnson et al., 2019). However, the genetic backgrounds in that study were derived from a cross between two diverged yeast strains. It remains unclear whether similar patterns would hold across genetic backgrounds that are the result of long-term laboratory evolution because the mutations that drive evolutionary adaptation are selected along a line of descent, which in principle could affect their epistatic interactions.

Here, we directly address this question by measuring the fitness effects of a panel of insertion mutations during the course of a long-term laboratory evolution experiment in budding yeast. Specifically, we isolated clones from six timepoints spanning 8000–10,000 generations of adaptation to each of two constant laboratory environments from the ongoing evolution experiment we have recently described (Johnson et al., 2021). While the yeast strains used in our prior experiment differed at tens of thousands of segregating loci, the strains in this experiment differ by only tens or hundreds of mutations that fixed successively during evolution. By looking at the effects of insertion mutations in these strains, we are measuring a panel of hidden phenotypes (the fitness effects of the mutations) that may change predictably or stochastically during evolution. The widespread presence of epistasis observed in biological systems suggests that these hidden phenotypes may be important to long-term evolutionary outcomes as the fitness effects of mutations effectively open and close doors to unique pathways for evolution (Johnson et al., 2021; Karkare et al., 2021; Kvitek and Sherlock, 2011).

Robustness can be broadly defined as invariance in the face of perturbation (Masel and Siegal, 2009). Here, we are concerned specifically with mutational robustness, a measure of how invariant phenotypes are to mutations (Lauring et al., 2013). Our approach complements several recent studies that have analyzed changes in mutational robustness during evolution by conducting mutagenesis followed by phenotypic measurements. For example, Novella et al., 2013 found that vesicular stomatitis virus strains evolved in the lab gained robustness, measured based on survival after mutagenesis. In contrast, Butković et al., 2020 found that a strain of turnip mosaic potyvirus evolved in Arabidopsis thaliana lost robustness over time, measured as the change in the ability of the virus to retain its level of plant infectivity after mutagenesis. Our barcode-based mutagenesis system makes it possible to dissect these overall changes in robustness by measuring how they emerge as a result of changes in the effects of individual mutations.

In this study, we aim to leverage this mutation-level data to better understand the structure of epistasis in evolving populations. First, we analyze the overall distribution of fitness effects to measure how mutational robustness changes during adaptation in each environment. Next, we ask whether the distribution-level changes we observe can be explained by patterns of fitness-correlated epistasis among individual mutations. Finally, we examine how the effects of these mutations change in each of the evolving populations, asking whether epistasis is more often driven by predictable adaptations common across populations or by specific mutations fixed in a single population.

Results

Changes in the distribution of fitness effects during evolution

We isolated two clones from each of six timepoints from six haploid populations evolved in rich media at a permissive temperature (YPD 30°C) and from six haploid populations evolved in a defined media environment at a high temperature (SC 37°C), a total of 144 clones. We measured the fitness of each of these clones in the environment to which they adapted, finding that fitness increases steadily through time in both the YPD 30°C and SC 37°C environments, and displays a general pattern of declining adaptability (Figure 1A).

Figure 1 with 9 supplements see all
The distribution of fitness effects (DFE) mean declines in one of two environments during evolution.

(A) Changes in fitness during the evolution experiment, measured as the average fitness of two clones isolated from six timepoints in each population. In each graph, zero is the fitness of a fluorescent reference used in that environment. Error bars represent the standard deviation of the fitnesses measured for the two clones (points without error bars have errors smaller than the point size). (B) The mean fitness effect of the insertion mutations measured in the clones isolated from each timepoint. Asterisks represent a significant correlation (p<0.05, Wald test) between generation and DFE mean in that population alone. In both (A) and (B), the six populations are indicated by color. Error bars represent the standard error of the DFE mean, calculated from the standard errors of individual mutations (see ‘Materials and methods’). Additional DFE statistics are shown in Figure 1—figure supplement 4.

We next created a library containing each of 91 insertion mutations in each of our 144 clones. This set of mutations was identified previously as a subset of random insertion mutations that have measurable effects in the strains from Johnson et al., 2019. These mutations have a similar spectrum of effects in the clones isolated for this study (Figure 1—figure supplement 8), suggesting that they are also a broad sample of insertion mutations with measurable fitness effects in these strains. We measured the fitness effect of each mutation in each clone using barcode-based competition assays as described in Johnson et al., 2019. Because the molecular dynamics of evolution in these haploid populations are characterized by successive selective sweeps, we expect the two clones isolated from each population at each timepoint to have very similar genetic backgrounds. When we compare the average fitness effect measurement for each insertion mutation between these clones, we generally see strong agreement, with a few exceptions (Figure 1—figure supplements 13). These exceptions likely represent rare but important genetic differences between clones from the same population-timepoint. Given this, we chose to analyze our data in two ways. First, we improve the reliability of our fitness effect measurements for each population-timepoint by using measurements from combined barcodes (cBCs) from both clones, treating them as we treated biological replicates in Johnson et al., 2019. Second, we treat each clone independently. Figure 1—figure supplement 7, Figure 2—figure supplement 5, Figure 3—figure supplement 5, and Figure 4—figure supplement 1 show that our qualitative conclusions are unchanged when using this second approach.

We find that the distribution of fitness effects (DFE) of our 91 insertion mutations changes over the course of the evolution experiment. In the YPD 30°C environment, the mean of the DFE decreases over time as fitness increases during evolution (Figure 1B), consistent with a general pattern of both diminishing returns and increasing costs. The negative relationship between generations evolved and the mean of the DFE is significant in the entire set of population-timepoints (p=2.0 × 10–6, Wald test) and in four of our six individual populations (p<0.05, Wald test). In contrast, although the DFE does shift in individual populations evolved in our SC 37°C environment, we do not see any consistent patterns (Figure 1B).

In both environments, we find that the changes in the mean of the DFE are modest compared to our previous experiment (Figure 1—figure supplement 6 shows that the slope of the regression between the DFE mean and background fitness in YPD 30°C is less than half the magnitude of the corresponding slope in Johnson et al., 2019). The reasons for this are unclear, but may reflect the fact that fitness differences between the strains we study here are caused by a smaller number of mutations. We note that because of the modest differences in the DFE between clones, the noise in our measurements contributes significantly to the variation. One potentially biased source of this noise is missing measurements: not all mutations have fitness effects measured in each population-timepoint due to differences in transformation efficiency or, most commonly, mutations that had too few read counts in the first two timepoints of the fitness assay. As in Johnson et al., 2019, missing measurements of strongly deleterious mutations are more common in more-fit strains (clones from later timepoints) in YPD 30°C, suggesting that the negative correlation between generations evolved and the mean of the DFE would be stronger if more of these deleterious mutations had been successfully measured in clones from later timepoints. Indeed, if we look at a limited set of mutations with measurements in every population-timepoint or if we ‘fill in’ missing measurements using their average fitness effect across population-timepoints, we see similar or stronger patterns of change (p<5 × 10–7, Wald test) in the DFE mean in YPD 30°C (Figure 1—figure supplement 5). This relationship between the DFE mean and generations evolved also holds (p<5 × 10–8, Wald test) in our analysis where clones are treated independently (Figure 1—figure supplement 7).

Epistasis at the level of individual mutations

We next turn to the components of these distribution-level patterns: epistatic patterns for individual mutations. To look at these patterns, we focus on mutations that have fitness measurements in at least 20 population-timepoints in each environment (77 mutations in YPD 30°C, 74 in SC 37°C, 70 in both). We start by looking for patterns of fitness-correlated epistasis. We classify each mutation in each environment as correlated negatively or positively with background fitness if the correlation is significant at the p<0.05 level (Wald test) and the absolute value of the slope is greater than 0.05, and classify them as not significantly correlated if they do not meet these criteria. The cutoff of 0.05 for the slope was chosen to filter for mutations with an effect size across the range of background fitness that is larger than the typical standard error for our fitness effect measurements (see ‘Materials and methods’). For both environments, we find numerous examples of both negative and positive correlations, corresponding to increasing costs or decreasing costs, respectively, for deleterious mutations (and to diminishing returns or increasing returns, respectively for beneficial mutations).

In Figure 2, we show examples illustrating how the fitness effect of specific mutations vary across clones isolated from each environment, plotted as a function of the fitness of each clone in that environment (i.e., the background fitness). We find numerous examples of mutations that exhibit negative, positive, and nonsignificant correlations with background fitness in both environments (panels along the diagonal). We also find examples where a specific mutation exhibits a nonsignificant correlation in one environment and either a positive or negative correlation in the other (off-diagonal panels). Overall, consistent with our previous results, we observe about twice as many negative correlations as positive correlations. As we would expect from the DFE-level results, this imbalance is more pronounced in the YPD 30°C environment: 33/77 (~43%) mutations have fitness effects that decline significantly as background fitness increases in YPC 30°C compared to 17/74 (~23%) in SC 37°C. We also find that only 9/77 (~12%) mutations display the opposite pattern (fitness effects increase significantly as background fitness increases) in YPD 30°C, while 13/74 (~18%) display this pattern in SC 37°C. Because we are primarily focused on comparing the frequency of each pattern across environments, we report these values before multiple-hypothesis-testing correction here and in Figure 2; after a Benjamini–Hochberg multiple-hypothesis correction, these values fall to 24/77 (~31%), 15/74 (~20%), 9/77 (~12%), and 11/74 (~15%), respectively.

Figure 2 with 5 supplements see all
Patterns of fitness-correlated epistasis.

Each panel shows an example of a specific mutation with a particular combination of relationships (negative, positive, or nonsignificant correlation between fitness effect of the mutation, s, and background fitness) in the two environments; numbers indicate the total number of mutations displaying each pair of relationships. Each point depicts the fitness effect (y-axis) of one insertion mutation measured in one population-timepoint, with the measured fitness of that population-timepoint represented on the x-axis. Error bars show the standard error of both measurements (see ‘Materials and methods’). Axes are colored to identify the environment: in each panel, the blue axes on the left are data from YPD 30°C and the black axes on the right are data from SC 37°C. Points are colored by population, as in Figure 1. Each set of example plots is labeled by where the mutation is in the genome (i.e., which gene it disrupts). Additional comparisons of patterns of epistasis in these experiments and those from Johnson et al., 2019 are shown in Figure 2—figure supplements 14.

Modeling the determinants of fitness effects

The examples in Figure 2 demonstrate that correlations between fitness effects and fitness are common but often do not explain the bulk of epistasis. By definition, the fitness-correlated patterns we observe are the result of interactions between our insertion mutations and mutations that fix during the evolution experiment. If these interactions are all strictly ‘fitness-mediated,’ the fitness effects of mutations will be fully explained by a background fitness effect. Alternately, correlations between fitness effects and fitness could arise based on the average effect of a number of idiosyncratic effects that are more likely to be negative versus positive. To understand the relative contribution of these determinants of epistasis, we compare three linear models used to explain the fitness effects of a single mutation in each of our two environments:

  1. The fitness model (XM): the fitness effect of the mutation is a linear function of background fitness.

  2. The idiosyncratic model (IM): the fitness effect of the mutation can change in any population at any timepoint (and all subsequent timepoints) when an interacting mutation fixes in that population (see below for our constraints on fitting these parameters).

  3. The full model (FM): the fitness effect of the mutation is affected by both a linear effect of background fitness and the idiosyncratic interactions of fixed mutations, as in the idiosyncratic model.

We fit each model by ordinary least squares (OLS). We define the fitness effect of each mutation in the ancestral strain as the mean fitness effect measured among clones from the first timepoint, and fix the intercept of each of our models accordingly. For the idiosyncratic and full models, we add idiosyncratic parameters iteratively, choosing the parameter that improves the Bayesian information criteria (BIC) the most at each step. These coefficients represent epistasis between the insertion mutation in question and one or more unknown mutations that fix during evolution in one population. Because mutations generally fix between every pair of timepoints during evolution, there could in principle be one idiosyncratic parameter for each timepoint in each population, but allowing all of these parameters would constitute overfitting. To combat this possibility, we do not allow parameters that fit a single point (e.g., a parameter for an effect at the final timepoint), and we only allow one parameter per population. We stop this iterative process of adding parameters if the BIC improves by less than 2 during a step (or when there is one parameter per population). Note that because of this iterative parameter adding procedure, the full model may have a different set of idiosyncratic parameters and will sometimes have less explanatory power than the idiosyncratic model (i.e., IM is not nested in FM).

Figure 3A shows how well each model explains the fitness effect data for each mutation in each environment. We find that the fitness model often explains a modest amount of variance (mean R2: 18% in YPD 30°C, 19% in SC 37°C), but the idiosyncratic model (mean R2: 54% in YPD 30°C, 41% in SC 37°C) and the full model (mean R2: 53% in YPD 30°C, 44% in SC 37°C) usually offer more explanatory power. The examples in Figure 3B also demonstrate that epistasis is not strictly fitness-mediated; we commonly observe a stepwise change in the fitness effect of an insertion mutation in one evolving population, likely indicating epistasis between the insertion mutation and one or more mutations that fix in that population at a particular timepoint.

Figure 3 with 37 supplements see all
Determinants of fitness effects.

(A) For each environment, we plot the standard deviation of the fitness effect across all population-timepoints and the square root of the variance explained by each of our three models. The colored squares below each bar represent which model has the lowest Bayesian information criteria (BIC) for each mutation. Mutations shown in red or black are insertions in or near the corresponding gene, respectively; stars indicate the mutations shown in panel (B). Only mutations with fitness-effect measurements in at least 20 population-timepoints are shown. The insets show the distribution of all coefficients in the idiosyncratic model (IM) and full model (FM), pooled across all mutations. (B) Examples of idiosyncratic (left half) and full (right half) model fits. Model predictions are shown by dashed lines, and lines with contributions from indicator variables associated with a particular population are the same color as the points from that population (colors are the same as in Figure 1).

We can also ask which model best explains the data using the BIC, which penalizes models based on the number of parameters. The small squares below the bars in Figure 3A indicate which model has the lowest BIC for each mutation. In YPD 30°C, the full model has the lowest BIC for 40/77 (~52%) mutations and the idiosyncratic model has the lowest BIC for 37/77 (~48%). In SC 37°C, the full model has the lowest BIC for 49/73 (~67%) mutations and the idiosyncratic model has the lowest BIC for 24/73 (~33%). When we assess how well each model fits the entire dataset in each environment, the full model has a lower BIC than the idiosyncratic model in both environments.

Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations. While these coefficients can arise in our modeling procedure due to noise alone, we find far more coefficients in our empirical dataset than in a simulated or shuffled dataset (see ‘Materials and methods,’ Figure 3—figure supplement 4). The bulk of the coefficients we find in YPD 30°C are negative (214/291, ~74%), while both positive (131/245, ~53%) and negative (114/245, ~47%) coefficients are common in our idiosyncratic model in SC 37°C (Figure 3A insets). Note that many of the positive epistatic terms in our idiosyncratic model in SC 37°C are the result of a consistent reduction in the fitness costs of some deleterious mutations in the first 2000 generations of evolution in all populations (e.g., see the mutation in VAM6 in Figure 3B and others in Figure 3—figure supplements 637, and see Figure 3—figure supplement 3 for a breakdown of coefficients by individual mutations).

The differences we observe in epistatic patterns between environments could be caused by interactions between epistasis and the environment (‘G×G×E’ effects), differences in the adaptive targets (i.e., the functional modules subject to selection) in each environment, or a combination of the two. To tease apart these possibilities, we measured the fitness effects of mutations in clones evolved in YPD 30°C in the SC 37°C environment. We assayed the background fitness of each of these clones in SC 37°C and found that populations evolved in YPD 30°C sometimes experience large fitness declines in the SC 37°C environment (Figure 4C). We also observe widespread epistasis in this alternate environment, but do not observe any overall trends in the mean of the DFE in SC 37°C over the course of evolution in YPD 30°C (Figure 4A and B). Instead, the variance in our data is dominated by one particularly low fitness clone (from population P1C05, generation 7550) in which several insertion mutations were strongly beneficial (Figure 4C).

Figure 4 with 1 supplement see all
Patterns of epistasis in a nonevolution environment.

(A) Same as Figure 3A, but for clones from YPD 30°C assayed in SC 37°C. We plot the standard deviation of the fitness effect across all population-timepoints and the square root of the variance explained by each of our three models. The colored squares below each bar represent which model has the lowest Bayesian information criteria (BIC) for each mutation. Mutations shown in red or black are insertions in or near the corresponding gene, respectively; stars indicate the mutations shown in panel (B). Only mutations with fitness-effect measurements in at least 20 population-timepoints are shown. The inset shows the distribution of all coefficients in the idiosyncratic model (IM) and full model (FM), pooled across all mutations. (B) Example IM model fit, as in Figure 3B. The model predictions are shown by bold dashed lines, and lines with contributions from indicator variables associated with a particular population are the same color as the points from that population (colors are the same as in Figure 1 and panel C). (C) The fitness and distribution of fitness effects (DFE) mean over time in YPD 30°C populations assayed in SC 37°C. The asterisk indicates a significant correlation (p<0.05). Error bars on fitness represent the standard deviation of the fitnesses measured for the two clones, but note that we were only able to measure the fitness of one clone at several population-timepoints due to low fitnesses relative to our reference; the corresponding points here have no error bars. Error bars on the DFE mean represent the standard error of the DFE mean, calculated from the standard errors of individual mutations (see ‘Materials and methods’).

When we apply the same set of models to this dataset, we again find that both background fitness and idiosyncratic effects can explain how many mutations’ fitness effects vary, with the latter again outperforming the former (Figure 4A). Notably, we observe relatively more positive epistatic coefficients (109/241, ~45%) than in YPD 30°C (77/291, ~26%), though we also observe a heavy tail of strongly negative coefficients. These patterns of idiosyncratic epistasis are not due to outsized contributions from a few populations; the distributions of IM coefficients for YPD 30°C populations are different in the two assay environments across populations (Figure 3—figure supplement 2). Overall, these results support the hypothesis that G×G×E effects underlie the differences we observe between environments (see also Hall et al., 2019), though they do not rule out the possibility that differences in adaptive targets between environments may also contribute.

Discussion

Shifting distributions of fitness effects

By using our barcode-based mutagenesis system to assay the fitness effects of 91 specific gene disruption mutations across numerous genetic backgrounds spanning 8000–10,000 generations of laboratory evolution, we have described how overall mutational robustness (defined in terms of the average effect of this type of insertion mutation) changes during evolution. We then dissected these overall effects in terms of how the fitness effects of individual mutations change during evolution. We find that populations adapting to our YPD 30°C environment become less robust to deleterious mutations over time. This shift in the mean of the DFE in YPD 30°C is not caused by strictly fitness-mediated shifts in the fitness effects of mutations, but is instead the result of an excess of negative idiosyncratic epistatic effects (Figure 3). In contrast, in clones isolated from populations evolved in SC 37°C, epistatic interactions are more evenly divided between negative and positive effects.

The fact that populations evolved in YPD 30°C lost robustness as they increased in fitness over time is broadly consistent with our earlier work showing that the DFE becomes more strongly deleterious in more-fit genetic backgrounds (Johnson et al., 2019). However, the loss of robustness we observe here was not as strong as in this previous work (see Figure 1—figure supplement 6 for a comparison to the effect size in Johnson et al., 2019), and we did not observe any predictable change in the DFE mean in SC 37°C. One potential explanation for the weaker patterns we observe at the DFE level in this experiment is that there are simply less mutations involved compared to the genetic backgrounds from our earlier work: the genotypes in our previous experiment were derived from a cross between two yeast strains that differed at tens of thousands of loci, so the pool of mutations was both larger and more balanced (each mutation present in ~50% of clones) than in this study. If the overall patterns of fitness-correlated epistasis arise due to the collective effect of numerous idiosyncratic interactions, rather than a genuine fitness-mediated effect, we would therefore expect weaker trends here (Lyons et al., 2020; Reddy and Desai, 2021).

Our results also illustrate a form of hidden evolutionary unpredictability, despite the fact that our populations increased in fitness over time along a predictable trajectory (Johnson et al., 2021). As these populations adapted, they accumulated mutations that carry with them epistatic interactions with potential future mutations across the rest of the genome. The patterns of epistasis we observe among the insertion mutations in our study demonstrate that the predictability of these second-order effects varies widely: some potential future mutations show strong fitness-correlated effects in every population, while others are affected by a small number of idiosyncratic interactions with mutations that fix during evolution. These kinds of unpredictable patterns of epistasis could lead to fundamentally unpredictable evolutionary outcomes, with changes in the fitness effects of mutations dynamically closing off or opening up evolutionary pathways during adaptation.

Patterns of epistasis among individual mutations

Most of the insertion mutations we analyze in this experiment are deleterious across most or all genetic backgrounds. We find that these deleterious mutations tend to become more strongly deleterious over time in populations evolved in the YPD 30°C environment. This pattern results from an overabundance of negative epistatic interactions, which could involve either the beneficial mutations that drive fixation events or the neutral or weakly deleterious mutations that hitchhike to fixation during selective sweeps (McDonald et al., 2016). The strong predictive power of background fitness for the fitness effects of some mutations suggests that interactions with beneficial mutations are driving these patterns of epistasis, but idiosyncratic effects that deviate from these relationships hint at a role for interactions with hitchhiker mutations as well (Figure 3). Systematic backcrossing and mutagenesis experiments would be required to disentangle these patterns (along with any cases of higher-order epistasis involving multiple mutations that fix during evolution), but we suspect both types of interactions contribute to the epistasis we observe.

Among the relatively small number of beneficial insertion mutations we analyze, we find that the beneficial effects tend to decline over time, and almost universally shift to neutral or deleterious effects later in the experiment (Figure 3—figure supplement 1). These results provide additional examples of diminishing-returns epistasis among beneficial mutations, which can at least partially explain the pattern of declining adaptability often observed in microbial evolution experiments (Chou et al., 2011; Khan et al., 2011; Kryazhimskiy et al., 2014; Wünsche et al., 2017). One of these mutations has a clear functional story behind it: the beneficial effect of an insertion mutation in ADE5,7 is the result of breaking the adenine synthesis pathway upstream of a toxic intermediate, so when populations in SC 37°C fix other loss-of-function mutations that also break the pathway between the first two sampled timepoints in the evolution experiment, this mutation becomes neutral (we do not see this effect in YPD 30°C because most populations have fixed a mutation in the adenine pathway by the first sampled timepoint; for a more in-depth discussion of this case of epistasis and contingency, see Johnson et al., 2021).

The picture that emerges from our data is one in which idiosyncratic epistatic effects are largely unpredictable but also unbalanced (have a nonzero mean), which leads to correlations with background fitness. In both our own work and other studies of microbial evolution, the mean epistatic effect is negative: beneficial mutations tend to have negative epistatic interactions with both deleterious (Johnson et al., 2019, this study) and beneficial mutations (Chou et al., 2011; Hall et al., 2019; Karkare et al., 2021; Khan et al., 2011; Kryazhimskiy et al., 2014; Ono et al., 2017; Pearson et al., 2012; Perfeito et al., 2014; Rokyta et al., 2011; Wünsche et al., 2017).

At the broadest level, we can distinguish two potential sources of these unbalanced patterns of epistasis: the structure of biological systems and the set of mutations that fix during evolution. Both theoretical and experimental work has shown that genes within functional modules tend to have similar interaction profiles with other genes (Costanzo et al., 2016; Segrè et al., 2005). Given this kind of ‘monochromatic’ epistasis, all that is necessary for fitness-correlated epistasis to appear during evolution is for beneficial mutations to be clustered in a few modules. The strength and direction of fitness-correlated epistasis will then depend on the particular modules targeted by selection and how those modules interact with the rest of the cell. For example, the targets of adaptation in SC 37°C may be more related to heat stress than general components of growth (e.g., mutations in LCB3, which have been shown to reduce killing in certain heat stress conditions, are enriched in populations in SC 37°C) (Ferguson-Yankey et al., 2002; Johnson et al., 2021). If the strongly deleterious effects of some of our insertion mutations are exacerbated by heat stress, but a beneficial mutation reduces heat stress, we can expect a positive interaction between the two mutations. In contrast, we hypothesize that the deleterious effects of some insertion mutations become more pronounced when growth rate is increased by mutations in YPD 30°C (Johnson et al., 2019). In order to understand or predict these differences in epistasis specific to the evolution environment, we will need to better understand the functional structure of biological systems.

Higher-order epistasis and the evolution of robustness

Each mutation that fixes during evolution has an immediate first-order effect on fitness, but also carries with it a second-order set of pairwise interactions whose strength and direction is determined by the structure of functional relationships between genes and biological modules. Through higher-order interactions, a mutation can also change the structure of these functional relationships, altering the complexity, redundancy, robustness, and evolvability of biological systems (reviewed in de Visser et al., 2003; Masel and Siegal, 2009; Masel and Trotter, 2010; Payne and Wagner, 2019). Our work here suggests that mutational robustness tends to decrease during evolution in some environments, but our data is limited to interactions with the set of ~200 mutations that fixed during 10,000 generations of laboratory adaptation. We expect the effects we observe here to dominate during rapid adaptation, but over longer evolutionary timescales, robustness may be more dependent on changes in the higher-order structure of biological systems.

Ideas and speculation

We find that in one evolution environment a tendency towards negative epistasis leads to a shift in the DFE towards more deleterious mutations over time, while in another evolution environment no such shift occurs. Clearly, the details matter, and it is difficult to draw general conclusions. In this section, we speculate on how work in metabolic control theory (MCT) may help explain the functional underpinnings of our results and fitness-correlated epistasis more generally.

MCT describes how mutations in idealized metabolic pathways change the control coefficients of other mutations (Szathmáry, 1993). In a strictly serial pathway, a mutation reducing the activity of one enzyme will decrease the control other enzymes have over flux through the pathway, but a mutation increasing the activity of one of the enzymes will increase the control of others. If two enzymes act instead in parallel, these patterns are reversed: a mutation that increases activity of one will decrease the control of the other on flux, and vice versa (for a thorough treatment of how epistasis arises and propagates in metabolic networks, see Kryazhimskiy, 2021).

If flux through the pathway is correlated with fitness, these patterns of interactions predict epistasis between mutations in different enzymes in the pathway. For example, if flux is negatively correlated with fitness in a strictly serial pathway, beneficial mutations will be those that reduce flux, so they will reduce the control of other enzymes in the pathway, such that these beneficial mutations exhibit negative epistasis. This is the case for beneficial loss-of-function mutations in the broken adenine-synthesis pathway present in the ancestors of our evolution experiment (Johnson et al., 2021): one beneficial loss-of-function mutation in this pathway (e.g., in ADE4) lowers the control coefficients of the rest of the enzymes in that pathway, such that loss-of-function mutations in these enzymes become less beneficial (e.g., our insertion mutation in ADE5,7 in this experiment, Figure 3—figure supplement 1).

MCT is a mathematical framework for describing these interactions, but the general qualitative principles can be applied beyond enzymes in metabolic pathways to understand patterns of fitness-correlated epistasis (MacLean, 2010). One hypothesis for the functional underpinnings of increasing-costs epistasis can be framed in this way. First, we assert that the large-scale components of growth in the yeast cell functionally act in serial. While these components of growth (e.g., cell wall production, ribosome production, and DNA replication) do not belong to an actual serial pathway, they clearly do not act in parallel: these components generally cannot ‘fill in’ for each other. That is, they are nonredundant. We therefore propose that an increase in the function of one of these components (in terms of growth rate) will increase the control coefficients of the rest of the components. Similarly, a decrease in the function of one component will decrease the control coefficients of the rest. We can intuitively understand this based on the idea of limitation: if DNA replication slows to a halt, growth rate will become less sensitive to changes in the speed of cell wall production. In contrast, if a population in which growth is limited by DNA replication fixes a mutation that improves DNA replication, we expect the control coefficient of cell wall production to increase, meaning deleterious mutations that slow cell well production will become more deleterious. We believe this effect can explain much of the increasing-costs epistasis we have observed.

Consider the following metaphor for this effect. You work for a car manufacturer, and your factory’s goal is to produce cars as quickly as possible. You work with a small team that builds the wheels. Your team is efficient, but the engine team is much slower. Because the engine team is limiting production, you don’t feel under pressure from the boss at all – in fact, one of your team members slacks off sometimes, but the company hardly suffers (read: their control coefficient is low, deleterious mutations have small effects). One day the engine team purchases a new robot and dramatically speeds up their process. Suddenly cars are waiting for wheels, and the pressure on your team increases dramatically – now when your teammate is slacking, it slows the entire production line down (read: their control coefficient is high, deleterious mutations have larger effects, costs have increased).

In our discussion, we speculate that we see increasing costs more frequently in YPD 30°C because adaptation in that environment is more focused on improving core components of growth compared to adaptation in SC 37°C where selection for improvement in heat tolerance or survival may be more common. The phenotype of survival does not fit as neatly into our car manufacturing metaphor: we have no strong hypotheses for how control coefficients should change as populations increase heat tolerance or for how large-scale phenotypes such as growth and survival integrate in terms of competitive fitness. However, we speculate that in SC 37°C adaptive mutations are less likely to be affecting the core components of growth, and therefore cause increasing costs less often than mutations selected in YPD 30°C (i.e., adaptive mutations in SC 37°C are less likely to increase the control coefficients of random mutations). To move closer to predictive models of epistasis, we will need to better understand the functional relationships between these large-scale cellular phenotypes.

We end our discussion by noting that changes in robustness over longer evolutionary timescales may depend not on the type of changes we observe in our evolution experiment, but on changes to the higher-order structure of biological systems. Within our metaphor, this is the difference between changes to each team’s effectiveness and changes to the structure of the assembly line (or even to the ultimate product of the system). Defining what constitutes a change to the structure of the system itself is a difficult problem, analogous to defining ‘novelty’ in evolution (Murray, 2020), but most would agree that there are fundamental differences in the complexity and redundancy of the genetic systems of viruses and humans, for example.

Both experimental and theoretical work have suggested that higher genome complexity and redundancy is associated with more negative epistasis between deleterious mutations (Macía et al., 2012; Sanjuán and Elena, 2006; Sanjuán and Nebot, 2008; though see Agrawal and Whitlock, 2010). Negative epistasis between deleterious mutations implies that deleterious mutations increase the control coefficients of other mutations and that beneficial mutations will decrease control coefficients. We can understand this pattern as a broad extension of MCT: the case of strictly parallel pathways and fitness correlated with flux, in which two deleterious mutations exhibit negative epistasis, is one example of a functionally redundant relationship. Therefore, in organisms with higher functional redundancy, we expect to observe more positive interactions between beneficial and deleterious mutations and less increasing-costs epistasis. In other words, the second-order loss of mutational robustness we observe during adaptation should be stronger in organisms with low redundancy. While there may be more unseen factors affecting the types of interactions beneficial mutations participate in, present data suggests that increasing-costs epistasis may be specific to organisms or environments where the functional redundancy of genes or biological modules is low.

Epistasis between beneficial mutations

An apparent contradiction emerges from our explanation for increasing-costs epistasis: if adaptation increases the control coefficients of core components of growth, why do we not see more positive epistasis between beneficial mutations in evolution experiments? Why do we instead usually see negative, diminishing-returns epistasis and declining adaptability? We propose that this discrepancy arises due to differences in the availability and form of beneficial and deleterious mutations. Based on previous work, we expect beneficial mutations in laboratory evolution experiments to be primarily loss-of-function mutations (Murray, 2020). In contrast to deleterious mutations, which can be spread across the genome, these types of beneficial mutations will rarely exist in well-adapted core components of growth, and will instead be clustered in a few adaptive targets. Within these targets, we believe loss-of-function beneficial mutations are often functionally redundant, meaning that they tend to decrease each other’s control coefficients for fitness. Beneficial mutations can be redundant by inactivating the same deleterious pathway (e.g., ADE pathway mutations discussed above), solving the same general problem (e.g., mutations shortening lag in Karkare et al., 2021), or changing a phenotype with a nonlinear fitness function (Chiu et al., 2012; Chou et al., 2014; Keren et al., 2016; Lunzer et al., 2005; Otwinowski et al., 2018). Nonmonotonic fitness functions can arise from phenotypes with both potential benefits and costs (Dekel and Alon, 2005), such that negative interactions between beneficial mutations and the benefit can lead to fitness-correlated epistasis that crosses neutrality, exhibiting diminishing returns, increasing costs, and sign epistasis (Figure 3—figure supplement 1).

This explanation provides a prediction: we will be more likely to see synergistic epistasis during evolution experiments when we observe beneficial gain-of-function mutations. Chou et al., 2009 provides a particularly strong example of a beneficial gain-of-function (promoter capture) mutation that is more beneficial in more-fit genetic backgrounds. In the long-term Escherichia coli evolution experiment, potentiating mutations acquired during adaptation in one population interacted positively with a beneficial gain-of-function mutation (also a promoter capture), enabling aerobic citrate utilization (Blount et al., 2012). Studies of evolutionary repair also provide examples of synergistic interactions between apparently nonredundant beneficial mutations (Fumasoni and Murray, 2020; Hsieh et al., 2020). These counterexamples underscore the fact that diminishing-returns epistasis is not a rule; it is a pattern that is overrepresented in evolution experiments due to a tendency for beneficial mutations to be loss-of-function mutations and to be clustered in a few adaptive targets. These tendencies may be weaker later in evolution experiments when mutations are spread more evenly across cellular modules, such that a period of declining adaptability caused by diminishing-returns epistasis early in an experiment gives way to a period of relatively constant fitness gains (Good and Desai, 2015).

A final note on terminology for epistasis

Very few papers discuss epistasis between beneficial and deleterious mutations – most theoretical and experimental work has focused on epistasis between two beneficial mutations or two deleterious mutations. With these same-signed pairs of mutations, the terms negative and positive epistasis are consistent. Two beneficial mutations that interact negatively imply two deleterious reversions that also interact negatively. However, when we consider one of these beneficial mutations and the deleterious reversion of the other, they interact positively. We provide this note to clarify that increasing-costs epistasis, in which deleterious mutations exhibit negative epistasis with beneficial mutations, should not be considered the same as previous results demonstrating negative epistasis between two deleterious or two beneficial mutations. Instead, we should expect it in systems where more positive epistasis is observed between same-signed mutations. While epistasis is already a concept overladen with terminology, we submit that in some cases it may be more useful to classify interactions between mutations or cellular components as being functionally redundant or nonredundant in terms of fitness.

Materials and methods

Strains

All strains used for this study were isolated from the evolution experiment described in Johnson et al., 2021. We isolated two clones from each of our focal populations at each sequencing timepoint. For this experiment, we used clones from 12 MATa populations, 6 from the YPD 30°C environment and 6 from the SC 37°C environment. We decided to include population P1B04, which exhibits a cell-clumping phenotype in preliminary imaging data, and to exclude population P1B03, which diploidized during evolution, and populations P3C04, P3F05, P3D05, and P3E02, which lost G-418 resistance during evolution (not being able to select on G-418 during transformation could allow the HygMX cassette to replace the KanMX cassette, leading to leakage during the selection step). Otherwise, we chose populations randomly. The ancestor of these populations is MJM361 (MATa, YCR043C:KanMX, STE5pr-URA3, ade2-1, his3Δ::3xHA, leu2Δ::3xHA, trp1-1, can1::STE2pr-HIS3 STE3pr-LEU2, HML::NATMX, rad5-535).

Barcoded Tn7 libraries

Request a detailed protocol

We used a previously created set of Tn7-based plasmid libraries to introduce the same set of ~100 mutations into each of our strains (Johnson et al., 2019). These plasmids contain a section of the yeast genome corresponding to one of these ~100 locations, interrupted by a Tn7 insertion containing a random DNA barcode and a HygMX cassette. Each barcode uniquely identifies the mutation and the plasmid library via a mapping established in earlier work (Johnson et al., 2019).

Transformation

Request a detailed protocol

Our yeast transformation protocol is a scaled-up version of that used in Johnson et al., 2019, based on the method described in Gietz and Schiestl, 2007. We grew strains from freezer stocks overnight, diluted 750 μL into 15 mL YPD + ampicillin (100 g/mL), grew for 4 hr, pelleted the cells, and resuspended in 900 μL transformation mix and 100 μL plasmid DNA cut with NotI-HF (corresponds to 2 μg of plasmid; cut at 37°C for 3 hr, then heat-inactivated at 65°C for 10 min). We then heat-shocked this mixture at 42°C for 1 hr, recovered in 3 mL YPD + ampicillin for 2 hr, plated 25 μL on antibiotic selection plates to check efficiency, and then combined the rest with 40 mL YPD supplemented with antibiotics. For both agar and liquid-selective media, we included hygromycin (300 μg/mL), clonNat (20 μg/mL), and G-418 (200 μg/mL). We made frozen glycerol stocks of each transformation after ~48 hr of growth. All growth was conducted at 30°C either in a test tube on a roller drum (recovery) or in a baffled flask on an orbital shaker (all other steps).

We transformed 2 clones from 12 populations at 6 timepoints for a total of 144 transformations. We organized these transformations into three ‘VTn assays,’ each associated with 48 transformations using our 48 unique barcoded libraries.

Fitness assays

Request a detailed protocol

Again, we followed the protocols established in Johnson et al., 2019 for our fitness assays. We used two types of media: rich YPD media (1% Bacto yeast extract [VWR #90000-726], 2% Bacto peptone [VWR #90000-368], 2% dextrose [VWR #90000-904]) and synthetic complete (SC) media (0.671% YNB with nitrogen [Sunrise Science #1501-250], 0.2% SC [Sunrise Science # 1300-030], 2% dextrose). We assayed our transformed libraries of clones from the YPD 30°C environment in both their evolution environment (YPD 30°C) and the SC 37°C environment, and clones from the SC 37°C environment in their evolution environment (SC 37°C). We first arrayed our transformation glycerol stocks into two 96-well plates corresponding to the two evolution environments, and then inoculated 8 μL from each well of these plates into four (for SC 37°C assays) or eight (for YPD 30°C assays) flat-bottom polypropylene 96-well plates containing 126 μL of media, supplemented with the same antibiotics as during the initial selection. To ensure efficacy of the antibiotics in the SC 37°C environment, we used media with MSG instead of ammonium sulfate (1.71 g/L YNB without amino acids or ammonium sulfate, 2 g/L SC, 1 g/L MSG). After this period of growth, we used YPD and SC supplemented with ampicillin (100 μg/mL) and tetracycline (25 μg/mL), matching the conditions of the evolution experiment. After 40 hr of growth in these plates, we started daily transfers.

At each daily transfer, we diluted YPD 30°C cultures 1/210 and SC 37°C cultures 1/28. During these transfers, we combined and mixed cultures from each well corresponding to the same clone/transformation to increase population size and reduce bottleneck noise. In the first (T0) transfer, we combined cultures from the eight plates that were initially inoculated from the freezer stock and diluted them into 20 96-well plates. In all subsequent transfers (T1–4), we combined cultures from all 20 plates and diluted them into 20 new plates. Specifically, for YPD 30°C, we diluted 3 μL from each well of 20 plates into 60 μL YPD (60 μL total, 1/2 dilution), mixed, then diluted 16 μL into 112 μL YPD (1/23 dilution), mixed, and distributed 2 μL into 126 μL YPD in 20 plates (1/26 dilution). For SC 37°C, we diluted 3 μL from each well of 20 plates into 60 μL SC (60 μL total, 1/2 dilution), mixed, then diluted 60 μL into 60 μL SC (1/2 dilution), mixed, and distributed 2 μL into 126 μL SC in 20 plates (1/26 dilution).

Barcode sequencing

Request a detailed protocol

Our fitness assays in YPD 30°C were originally performed alongside assays in SC 37°C that were later abandoned due to an issue with expired reagents (and repeated with appropriate reagents in our second round of assays). During these assays, we combined equal volumes of culture at the end of each transfer from every well corresponding to each of the three VTn assays in each environment. We can pool the cultures corresponding to each VTn assay because we know which barcodes correspond to which plasmid library/clone, so we can divide our barcode count data appropriately during sequencing analysis. We performed DNA extractions from two 1.5 mL pellets for each assay-timepoint from our YPD 30°C fitness assays and from four 1.5 mL pellets from our SC 37°C fitness assays using Protocol I from the Yeastar Genomic DNA Kit (Zymo Research), as described previously (Johnson et al., 2019). We then amplified barcodes using a two-step PCR protocol. We performed four first-round PCRs with 19 µL gDNA, 25 L 2X Kapa Hotstart Hifi MM, 3 µL 10 M TnRS1 primer, and 3 µL 10 M TnFX primer, and ran the PCR protocol: (1) 95°C 3:00, (2) 98°C 0:20, (3) 60°C 0:30, (4) 72°C 0:30, GO TO step 2 three times, and (5) 72°C 1:00. We purified these PCRs with PCRClean DX Magnetic Beads (Aline) using a 0.85× ratio. We then set up two second-round PCRs per sample by combining 25 µL purified PCR µ1 product, 1.5 µL ddH2O, 10 µL Kapa Hifi Buffer, 1 µL KAPA HiFi HotStart DNA Polymerase, 5 µL 5 M N7XX primer (Nextera), and 5 µL 5 M S5XX primer (Nextera), and ran the PCR protocol: (1) 95°C 3:00, (2) 98°C 0:20, (3) 61°C 0:30, (4) 72°C 0:30, GO TO step 2 19 times, and (5) 72°C 2:00. We purified the resulting libraries with Aline beads, using a 0.7× ratio, then repeated the purification with a 0.65× ratio, and finally sequenced our pooled libraries on a NextSeq 550 (Illumina).

From reads to barcode counts

Request a detailed protocol

We process our sequencing data as described previously (Johnson et al., 2019). We first filter reads based on inline indices and quality scores, use regular expressions to extract barcode sequences, and combine barcode counts across timepoints for each VTn assay. Next, we use a single-bp-deletion-neighborhood method to correct errors in raw barcodes, assigning them to the set of known barcodes from each of our plasmid libraries. By associating barcodes with plasmid libraries, we associate them both with a fitness assay for a particular clone and with a particular insertion mutation, and we divide our barcode count data accordingly.

Estimating fitness effects from barcode counts

Request a detailed protocol

Again, we follow Johnson et al., 2019, with minor differences. First, we convert barcode counts to log-frequencies at each timepoint. After this preliminary step, we noticed a large number of log-frequency spikes, restricted largely to one timepoint in one of our VTn assays in SC 37C. These spikes in frequency very likely represent low-level sequencing library contamination from another timepoint due to primer cross-contamination. In Figure 1—figure supplement 9, we show that this is confined to this single timepoint and demonstrate how we can use a simple heuristic (excluding lineages whose log-frequency at timepoint 2 is 0.5 greater than both timepoint 1 and timepoint 3) to remove the barcoded lineages affected by this sequencing library contamination. This step excludes, on average, less than 1% of the reads from timepoint 2 in these assays. Next, we calculate fitness effects for each barcode, as described in Johnson et al., 2019. After excluding timepoints with less than 5000 total barcode counts, we measure the log-frequency slope for each barcode at each consecutive pair of timepoints, excluding timepoints in which the barcode has less than 10 counts. We scale each of these log-frequency slopes by the median log-frequency slope of barcodes associated with five neutral reference mutations, and then average these scaled values to get one fitness measurement for each barcode. As in Johnson et al., 2019, we observe a small fraction of outlier barcodes, which follow starkly different log-frequency trajectories than the other barcodes associated with the same insertion mutation, presumably due either to pre-existing mutations in the transformed culture or transformation artifacts (including two mutations being transformed together). We use a log-likelihood ratio test to identify barcodes whose read counts are inconsistent with barcodes near the median fitness measured for one insertion mutation. Based on iterative exclusion and exploration of frequency trajectories for this experiment, we chose a heuristic cutoff of 40 for the log-likelihood ratio required to exclude barcodes (for a detailed description of this method, see Johnson et al., 2019). Finally, to decrease noise from low-frequency barcode lineages while retaining the independent measurements unique barcodes provide, we randomly combine counts from individual barcodes into a maximum of five combined barcodes (‘cBCs’) per insertion mutation. Next, we repeat the fitness measurement process described above to get final fitness measurements for each cBC (scbc).

Again following Johnson et al., 2019, we calculate the mean and standard error of all cBCs fitness measurements for each insertion mutation (m) in each clone (c):

Sm,c=Σcbcsscbcnum_cbcs,σm,c,cbc=cbcs(scbcsm,c)2(num_cbcs1)num_cbcs

For each clone, we also calculate the standard error of our fitness measurements of our set of neutral barcodes (σneut,c). Since sm,c is the difference between the measured cBC fitness for mutation m and the measurement of neutrality, we calculate the standard error for sm,c as the square root of the sum of squares of these two errors:

σm,c=σm,c,cbc2+σneut,c2

To obtain the fitness effect of a mutation for a population-timepoint (sm,p,t), we calculate an inverse-variance weighted average on the fitness effect measurements from each clone:

sm,p,t=clonessm,cσm,c2clones1σm,c2

We only calculate sm,p,t if there are at least three cBCs between the two clones. If one of the two clones has only one cBC, we use σm,c from the other replicate as an estimate of the standard error to be used in inverse-variance weighted averaging. Note that because we only consider data with at least three cBCs during our analysis in which we treat clones independently, this estimate of the standard error is only used for the inverse-variance weighted averaging.

We calculate the standard error on this sm,pt measurement as described above for individual clones using the s values for all cBCs associated with mutation m in both clones:

σm,pt,cbc=cbcs(scbcsm,p,t)2(num_cbcs1)num_cbcs

As described in Johnson et al., 2019, we use this conservative measure of standard error instead of the one given by inverse-variance weighting in order to better capture unspecified biological error between the two replicates. Again, we combine this standard error with the standard error of all neutral cBCs in both clones (σneut,pt):

σm,p,t=σm,p,t,cbc2+σneut,p,t2

To test whether mutations have effects significantly different from zero for each population-timepoint, we fit all cBC s values for a single mutation by OLS as described in Johnson et al., 2019. The OLS model includes a fitted term for the fitness effect of the mutation (smut), a fitted term (βmut) for differences in the effect between clones (indicated by r), and a normally distributed noise term (emut).

scbc=smut+βmutr+emut

We use the t-statistic for the intercept to calculate p-values for whether smut0 for each mutation. We perform a Benjamini–Hochberg correction at the 0.05 level on the entire set of p-values we find.

Measuring changes in the DFE and accounting for missing measurements

Request a detailed protocol

For each population-timepoint or clone, in each condition, we calculate the mean of the fitness effects of all mutations with at least three cBCs, excluding distributions with less than 60 mutations with fitness effect measurements. We calculate the standard error of the mean of the distribution of fitness effects by combining variances from two sources: error in our measurement of mutation fitness effects and errors in our measurement of mean fitness. Since this second component of variance is shared across mutations, it is not scaled by the number of mutations:

σDFEmean,p,t=mutationsσm,p,t,cbc2n2+σneut,p,t2

where n is the number of mutations with measurements in the distribution (we use an analogous formula in our analysis on individual clones). Importantly, note that this measure of error for the DFE mean describes only noise from measured mutations and does not address the fact that some mutations do not have measurements in some clones or population-timepoints.

We examined the effect of these missing fitness effect measurements in several ways. First, we examined the mean of the DFE in sets of mutations shared across the set of population-timepoints we had assayed successfully in each condition. For the analysis of individual clone DFEs in each condition, we try to find the largest set of clones with at least 40 mutations with measurements in every clone. We do this by iterating through the list of clones, sorted by the number of measured mutations, and adding them to a set of clones until the number of mutations with measurements in every clone drops below 40. Next, we create ‘filled-in’ DFEs for each population-timepoint and each clone in which missing measurements are replaced with the mean fitness effect measured across all strains in a given condition. The pattern observed in YPD 30°C, in which the mean fitness effect decreases as populations evolve and gain fitness, is stronger (in terms of the p-value and R2 of the regression) in both the set of shared mutations and the filled-in DFE than in our original analysis.

Finally, we examined the number of strongly deleterious mutations (defined as having a mean fitness effect <–0.05 across all population-timepoints in a given condition) that were not measured in each population-timepoint. In YPD 30°C, these strongly deleterious mutations are less likely to be measured in more fit strains, again suggesting that the relationship we observe would be stronger with complete data. The results of these analyses are plotted in Figure 1—figure supplement 5.

Modeling the determinants of epistasis

Request a detailed protocol

Because our modeling approach can be strongly influenced by outliers, we only consider fitness effect measurements for mutations with at least five cBCs across the two clones for these analyses (or three cBCs for the corresponding analysis in which clones are treated separately). First, we perform least-squares regression between background fitness and fitness effect for each mutation in each environment. As described in the main text, we classify mutations as being negatively or positively correlated with fitness based on both a statistical test (p<0.05, Wald test) and the effect size (|slope|>0.05). This heuristic slope cutoff is meant to filter for cases in which the range of fitness effects under consideration is larger than the typical noise for a single fitness effect measurement. In the YPD 30°C environment, a slope of 0.05 across an ~0.15 range of fitnesses will mean fitness effects should vary ~0.007, which is also the mean standard error of our fitness effect measurements in that environment (an analogous calculation in SC 37°C would yield a lower threshold, but we keep 0.05 for consistency).

Next, we fit our data with the three linear models described in the main text. To fix the intercepts of our models correctly, we first transform the fitness effect of each mutation by subtracting the mean fitness effect measured across all populations at the first timepoint of the evolution experiment (we denote each transformed fitness effect for mutation m in population p, timepoint t, measured in condition e as s~m,p,t,e below). We similarly transform our background fitness variable by subtracting the average fitness measurement across all populations at the first timepoint (we denote each transformed background fitness in population p, timepoint t, measured in environment e as x~p,t,e below). Then we fix the intercept at (0, 0) in the modeling described below. Note that our plots showing these model fits use the natural scales for fitness and fitness effects, not these transformed scales. The fitness model (XM) can be written as

s~m,p,t,e=βm,ex~p,t,e+em,e

where em,e is a normally distributed noise term, and βm,e is a fitted parameter representing the slope between background fitness and the fitness effect of the mutation.

The idiosyncratic model (IM) can then be written as

s~m,p,t,e=p,tαm,p,t,eip,t,e+em,e

ip,t,e is an indicator variable that is 1 at timepoints >= t in population p and zero in all other cases, and αm,p,t,e is a fitted parameter associated with that indicator variable.

The full model (FM) can then be written as

s~m,p,t,e=βm,ex~p,t,e+p,tαm,p,t,eip,t,e+em,e

In both idiosyncratic model and full model, the idiosyncratic epistasis terms (αm,p,t,eip,t,e) are added iteratively. At each step, we add the parameter that decreases the BIC the most if that decrease is more than 2. We do not allow parameters that fit only a single point, and we do not allow more than one parameter per population (i.e., the maximum number of idiosyncratic parameters for a given mutation in a given environment is the number of populations, six). Importantly, the idiosyncratic terms in the idiosyncratic model and the full model for a given mutation in a given environment may be different, so IM is not nested within FM and IM may explain more variance than FM in some cases. Because we consider the fixed intercept term to be part of our models, we compute R2 for each model as 1 – (the sum of squared residuals/the centered sum of squares). This R2 value can be negative if the model explains less variance than a model with only a free intercept term, in which case we set R2 to zero and do not plot the model in Figure 3 or Figure 4. All model fitting was performed by OLS using the Python package statsmodels (Seabold and Perktold, 2010).

To test how much noise affects our model fitting procedure, we ran our analysis on a simulated dataset and a shuffled dataset. In both cases, we focus on the sets of up to 36 fitness effect measurements associated with a mutation and a condition (with measurements in six populations and six timepoints). For the simulated dataset, we drew fitness effects for each set of mutations from a normal distribution with a mean of zero and a standard deviation equal to the mean empirical standard error for the fitness effects within the set of mutations (i.e., we drew all sm,p,t,e values for a given m and e from a normal distribution with a standard deviation of mean(σm,p,t,e)). For the shuffled dataset, we randomly shuffled within each set (i.e., we randomly shuffled all sm,p,t,e values for a given m and e).

The distributions of the coefficients obtained from our modeling procedure for the empirical, shuffled, and simulated datasets are plotted in Figure 3—figure supplement 4. In YPD 30°C, we found 291, 91, and 87 IM coefficients in our empirical data, shuffled data, and simulated data, respectively. In SC 37°C, we found 245, 78, and 81 IM coefficients in our empirical data, shuffled data, and simulated data, respectively. In clones isolated from evolution in YPD 30°C and assayed in SC 37°C, we found 241, 60, and 60 IM coefficients in our empirical data, shuffled data, and simulated data, respectively. In YPD 30°C, we found 140, 45, and 36 FM coefficients in our empirical data, shuffled data, and simulated data, respectively. In SC 37°C, we found 88, 38, and 46 FM coefficients in our empirical data, shuffled data, and simulated data, respectively. In clones isolated from evolution in YPD 30°C and assayed in SC 37°C, we found 199, 63, and 57 FM coefficients in our empirical data, shuffled data, and simulated data, respectively.

Measuring background fitness

Request a detailed protocol

We measured the background fitness of clones with fluorescence-based competitive fitness assays in duplicate for each clone in each environment using the reference strains strain 2490A-GFP1 and 11470A-GFP1 for the YPD 30°C and SC 37°C clones, respectively. We used the 2490A-GFP1 reference when we assayed the YPD-30°C-evolved clones in SC 37°C because some of these clones have very low fitness and 2490A-GFP1 has a lower fitness than 11470A-GFP1. We used the fitness difference measured between these two references in Johnson et al., 2021 to standardize the fitness measurements YPD-30°C-evolved clones in SC 37°C so that all fitness measurements in SC 37°C are on the same scale. Fitness assays were performed and data was analyzed as described in Johnson et al., 2021. Briefly, we maintained mixed cultures of our clones and fluorescent references for three daily growth cycles, as described above, and measured the frequency of fluorescent cells at each transfer using flow cytometry. We then calculated the fitness of each clone as the slope of the natural log of the ratio between the frequencies of the nonreference and reference cell populations over time. Finally, we calculated the mean and standard error of the fitness measurements for the two clones associated with each population-timepoint.

Data availability

Raw sequencing data has been deposited in the GenBank SRA (accession: SRP351176). All code used in this project is available on GitHub (https://github.com/mjohnson11/VTn_pipeline, copy archived at swh:1:rev:02d2b41d54dd22487df1c75f9e381411c5ef0376). All figures are based on data included in Supplementary File 1.

The following data sets were generated
    1. Johnson MS
    2. Desai MM
    (2021) NCBI BioProject
    ID PRJNA789529. Mutational robustness changes during long-term adaptation in laboratory budding yeast populations.

References

  1. Conference
    1. Wright S
    (1932)
    The roles of mutation, inbreeding, crossbreeding, and selection in evolution
    Proc Proceedings of the Sixth International Congress of Genetics. pp. 356–366.

Decision letter

  1. Vaughn S Cooper
    Reviewing Editor; University of Pittsburgh, United States
  2. Aleksandra M Walczak
    Senior Editor; CNRS LPENS, France
  3. Craig Miller
    Reviewer

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Mutational robustness changes during long-term adaptation in laboratory budding yeast populations" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Craig Miller (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

(1) Justify the selection of the 91 mutants and how they are broadly representative. If they are not for the following reasons, considerable additional experiments are required.

In the authors' 2019 study, of ~1,147 insertion mutants assayed, ~91 of them had fitness effects distinguishable from neutral, i.e. ~8% are "effective". If (a) the percentage of "effective mutations" are similar between the genetic backgrounds used in the 2019 study and the current study, and (b) the genetic backgrounds used in these two studies are not correlated, then "effective mutations" from the 2019 study could still be "effective" only by chance. If the same percentage holds, then only 8% of 91 may (7 mutants) be drivers of the DFE or epistasis forms reported here. Though the paper suggests that 20-30 mutants are deleterious in the current dataset (figure 1 supplement 5), the lack of proper error estimates as mentioned below makes it difficult to evaluate these estimates

(2) Clarify the statistical power of various comparisons, as follows. Fitness is obtained by averaging effects on two clones from a population and variation between these measurements due to genetic differences or error are insufficiently accounted for. Some measures have r-squares <0.5, which begs the question, how much of the noise is coming from mutational differences between clones and how much from measurement noise? Was there any replication of fitness estimation on identical clones so that one has a good estimate of the measurement noise? How is noise accounted for and how does it influence the epistasis modeling? Much is made about the amount of the distribution with positive vs negative coefficients-but sampling noise alone will make coefficients deviate from zero.

3) Both reviewers and the editor were confused by the rationale and choice of the nested statistical framework, as noted in greater detail below by Reviewer 2. For example, It is unclear why for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Also, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail, or is the choice of model selection explained adequately.

Reviewer #1 (Recommendations for the authors):

Comments on figures:

Figure 1B: What is the error bar on y-axis? Without an error bar, conclusions cannot be made, particularly due to small fitness changes on y-axis (0.015 to 0.02 fitness changes while the difference between two clones from the same population is up to ~0.02). As acknowledged in Line 124, noise in fitness measurements contribute significantly to the modest differences in the DFE between clones.

Figure 2: Make it clear that each dot in the figure represents a disruption mutation, whose fitness is averaged on the background of two clones from the same timepoint/population. Define "background fitness". The label of x-axis is currently missing. Fitness errors on x-axis are not considered when inferring the significance of the correlation. Furthermore, the examples in Figure 2 are sufficient to conclude that epistasis is not strictly fitness-mediated. Authors could foreshadow the conclusions of the next section.

Figure 3: Line 214-215, walk the reader through the paper. Current writing is difficult to understand. What do the coefficients represent?

3B: Include a positive or negative model of IM. Consider also show models with x-axis as fitness to keep it consistent with 3C. Similarly, show examples with coefficient ~0 in 3C. The current selection of examples in 3B and 3C can lead to misunderstanding of the IM and Full-M.

One thing I don't quite understand is that models with more parameters as in IM or full-M are going to give better explanatory power than those with fewer parameters as in Fitness-M. Are there alternative ways to evaluate these models taking the number of parameters in the model into consideration?

Line-by-Line comments:

Line 81: Introduce the concept of mutational robustness. The immediate relation between mutational robustness and directional evolution experiments where bottleneck/drift likely plays little role in is not obvious. The authors should explain it in the introduction to motivate the rest of the manuscript.

Line 106: Explain why these 91 insertion mutations were chosen.

Line 108-112: Fitness correlation between clones from the same population is not enough to justify the averaging of their fitness as detailed in Public Review.

Line 118: How was p-value calculated? Was error of fitness measurements considered?

Line 159-160: What is the rationale underlying the choice of 0.05 for slope cutoff?

Line 203-210: Briefly describe the models, for instance, what (how many) parameters were used. If possible, include the formula of the model.

Line 235-236: Explain why the remaining epistatic effects are more likely to be negative.

Line 237-238: between the epistasis and environment ("GXGXE" effects).

Line 243: Which figure to look at?

Additional analysis that authors could consider:

Will the conclusion change if more/fewer time points from the evolving populations has been studied (for instance, clones were isolated from longer or shorter evolution)?

How could the magnitude of fitness increase during evolution affect the conclusion (i.e. the maximum fitness increase of clones from YPD is ~0.06 while the maximum fitness increase in SC is ~0.03)?

Reviewer #2 (Recommendations for the authors):

I am enthusiastic about this work. My main criticism is that the modeling is not laid out with enough care and thoroughness, leaving me to question some of the downstream conclusions. I have also offered many specific comments to the authors that are all in the form of comments within the manuscript pdf. I have emailed this annotated pdf to the editors who will upload it on my behalf.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Mutational robustness changes during long-term adaptation in laboratory budding yeast populations" for further consideration by eLife. Your revised article has been evaluated by Aleksandra Walczak (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Reviewer 1 identified issues with some of the supplemental figures, which must be fixed. However, we all appreciate this extensive revision and look forward to a final version.

Reviewer #1 (Recommendations for the authors):

The authors have sufficiently addressed the previous concerns over the choice of mutants and statistical analysis.

However, multiple supplement figures in the revised manuscript do not have any data points. The technical errors should be fixed before the publication.

I believe once the errors are fixed, the manuscript is ready to be published.

https://doi.org/10.7554/eLife.76491.sa1

Author response

Essential revisions:

(1) Justify the selection of the 91 mutants and how they are broadly representative. If they are not for the following reasons, considerable additional experiments are required.

In the authors' 2019 study, of ~1,147 insertion mutants assayed, ~91 of them had fitness effects distinguishable from neutral, i.e. ~8% are "effective". If (a) the percentage of "effective mutations" are similar between the genetic backgrounds used in the 2019 study and the current study, and (b) the genetic backgrounds used in these two studies are not correlated, then "effective mutations" from the 2019 study could still be "effective" only by chance. If the same percentage holds, then only 8% of 91 may (7 mutants) be drivers of the DFE or epistasis forms reported here. Though the paper suggests that 20-30 mutants are deleterious in the current dataset (figure 1 supplement 5), the lack of proper error estimates as mentioned below makes it difficult to evaluate these estimates

We agree that if the identity of “effective mutations” was completely uncorrelated between the 2019 study and this one, this would imply that only ~8% of the 91 mutations (7 mutants) would have fitness effects in our data set. However, this would be surprising if true, since the strains used in the two studies are in fact quite similar. To be specific, the structure of the relatedness of the strains differs (reflecting laboratory adaptation in this study as compared to offspring of a cross in the 2019 paper), but the strains we consider in this paper are about as closely related to the segregants considered in the 2019 paper as those segregants are to each other.

To test this, prior to beginning this experiment we conducted a pilot study to confirm that these mutations had effects in the strains for this experiment (and our final dataset is in effect a more complete and less noisy version of this pilot). To explicitly show this comparison, we have added a new Figure Supplement (Figure 1 —figure supplement 8), which we reproduce below. As is clear from this figure, the range of fitness effects of the 91 mutations measured in this experiment and their effects in our 2019 experiment are quite similar (i.e. they are similarly “effective” in both studies). In fact, the percentage of fitness effect measurements that are significantly different from zero (after a Benjamini-Hochberg correction at the 0.05 level) is 57% here, compared to 48% in the previous experiment (we do not expect higher percentages since we often observe patterns of epistasis that include fitness effects near or at neutrality).

To make this point more clear we have also added an additional clarification in the main text: “This set of mutations was identified previously as a subset of random insertion mutations that have measurable effects in the strains from Johnson et al. (2019). These mutations have a similar spectrum of effects in the clones isolated for this study (Figure 1 —figure supplement 8), suggesting that they are also a broad sample of insertion mutations with measurable fitness effects in these strains.”

We also appreciate the comment on Figure 1 —figure supplement 5. The y-axis label “# of deleterious muts. measured” in our original draft was misleading: it is really a measure of the number of “strongly deleterious mutations” (defined as having a mean fitness effect < -0.05 across all population-timepoints in a given condition) that were successfully measured in each population-timepoint. We have added this phrasing to the main text, the figure itself, and the figure caption to clarify that this only shows strongly deleterious mutations (and relates only to the issue of missing measurements).

Finally, we note that any experiment of this type faces a fundamental limitation on the number of fitness effects that can be measured, and it is very likely that we would discover many more interesting cases of epistasis were we able to test the effects of a larger set of mutations over the course of these evolutionary trajectories. However, based on the analysis shown in Figure 1 —figure supplement 8, we believe these 91 mutations are comparably “effective” in these strains as they were in the previous experiment.

2) Clarify the statistical power of various comparisons, as follows. Fitness is obtained by averaging effects on two clones from a population and variation between these measurements due to genetic differences or error are insufficiently accounted for. Some measures have r-squares <0.5, which begs the question, how much of the noise is coming from mutational differences between clones and how much from measurement noise? Was there any replication of fitness estimation on identical clones so that one has a good estimate of the measurement noise? How is noise accounted for and how does it influence the epistasis modeling? Much is made about the amount of the distribution with positive vs negative coefficients-but sampling noise alone will make coefficients deviate from zero.

These are good questions, and we have taken several steps to clarify these statistical comparisons. First, we have more clearly described how we calculate the standard errors of s for mutations in each assay using the internal replication structure provided by the multiple redundant barcodes associated with each mutation (in Materials and methods; see below for revised text). These errors are now shown as error-bars in Figure 1 —figure supplements 1-3. Second, we describe how we calculate Benjamini-Hochberg corrected p-values describing the probability each mutation in each population-timepoint has a fitness effect of zero, which we include in our final dataset and use to address point 1 above. Both of these steps closely follow what was performed and described in Johnson et al. (2019).

Third, we have conducted a parallel analysis in which we treat each clone independently, as suggested by reviewer #1, and show that our conclusions are qualitatively unchanged. We have also changed the language in the main text to reflect the point you make here:

“Because the molecular dynamics of evolution in these haploid populations are characterized by successive selective sweeps, we expect the two clones isolated from each population at each timepoint to have very similar genetic backgrounds. When we compare the average fitness effect measurement for each insertion mutation between these clones, we generally see strong agreement, with a few exceptions (Figure 1 —figure supplements 1-3). These exceptions likely represent rare but important genetic differences between clones from the same population-timepoint. Given this, we chose to analyze our data in two ways. First, we improve the reliability of our fitness effects for each population-timepoint by using measurements from cBCs from both clones, treating them as we treated biological replicates in Johnson et al., 2019. Second, we treat each clone independently. Figure 1 —figure supplement 7, Figure 2 —figure supplement 5, Figure 3 —figure supplement 5, and Figure 4 —figure supplement 1 show that our qualitative conclusions are unchanged when using this second approach.”

Fourth, we have more carefully examined the possibility that noise in our measurements could affect the results of our modeling procedures. In the main text, we have included a more complete explanation of the procedures we use to prevent overfitting during modeling:

“Because mutations generally fix between every pair of timepoints during evolution, there could in principle be one idiosyncratic parameter for each timepoint in each population, but allowing all of these parameters would constitute over-fitting. To combat this possibility, we do not allow parameters that fit a single point (e.g., a parameter for an effect at the final timepoint), and we only allow one parameter per population.”

Even with these restrictions, we agree that noise in our fitness effect measurements is likely to give rise to some of the fitted parameters in our models. In order to assess the extent of this potential issue, we created both a simulated and a randomly-shuffled dataset and re-ran our modeling procedure on them. We explain how we constructed these datasets and how many coefficients resulted for each model in the Material and Methods. The distributions of these coefficients are plotted in Figure 3 —figure supplement 4. We find that while a considerable number of coefficients arise when we apply our models to these simulated and shuffled datasets, a larger number and a wider range of coefficients arise from our empirical data. This suggests that we should treat individual idiosyncratic coefficients with skepticism when looking at our data, but that we can still draw meaning from the overall patterns in these distributions. Given these results, we have added the following line to the main text: “While these coefficients can arise in our modeling procedure due to noise alone, we find far more coefficients in our empirical dataset than in a simulated or shuffled dataset (see Materials and methods, Figure 3 —figure supplement 4).” We have also removed our speculative discussion of the distribution of full model coefficients in SC 37°C, which is about a deviation in the distribution of coefficients that is similar to those we see in our simulated/shuffled dataset modeling. None of our conclusions rest upon this point; it was originally included as an interesting observation.

Here is the relevant new text in Materials and methods:

“To test how much noise affects our model fitting procedure, we ran our analysis on a simulated dataset and a shuffled dataset. In both cases, we focus on the sets of up to 36 fitness effect measurements associated with a mutation and a condition (with measurements in 6 populations and 6 timepoints). For the simulated dataset, we drew fitness effects for each set of mutations from a normal distribution with a mean of zero and a standard deviation equal to the mean empirical standard error for the fitness effects within the set of mutations (i.e. we drew all sm,p,t,e values for a given m and e from a normal distribution with a standard deviation of mean(σm,p,t,e)). For the shuffled dataset, we randomly shuffled within each set (i.e. we randomly shuffled all sm,p,t,e values for a given m and e).

The distributions of the coefficients obtained from our modeling procedure for the empirical, shuffled, and simulated datasets are plotted in Figure 3 —figure supplement 4. In YPD 30°C, we found 291, 91, and 87 IM coefficients in our empirical data, shuffled data, and simulated data, respectively. In SC 37°C, we found 245, 78, and 81 IM coefficients in our empirical data, shuffled data, and simulated data, respectively. In clones isolated from evolution in YPD 30°C and assayed in SC 37°C, we found 241, 60, and 60 IM coefficients in our empirical data, shuffled data, and simulated data, respectively. In YPD 30°C, we found 140, 45, and 36 FM coefficients in our empirical data, shuffled data, and simulated data, respectively. In SC 37°C, we found 88, 38, and 46 FM coefficients in our empirical data, shuffled data, and simulated data, respectively. In clones isolated from evolution in YPD 30°C and assayed in SC 37°C, we found 199, 63, and 57 FM coefficients in our empirical data, shuffled data, and simulated data, respectively.”

We have added more clear descriptions of the modeling details to both the methods and Results sections. The counterintuitive phenomenon where the idiosyncratic model sometimes explains more variance than the full model is indeed due to the fitting process. Since idiosyncratic parameters are added iteratively and must reduce the Bayesian Information Criteria by at least 2 at each step, in some cases the full model (in which fitness is included as a parameter) has fewer parameters and/or less explanatory power than the idiosyncratic model. Another way to explain this is that while the fitness model is nested in the full model (both have fitness as a parameter), the idiosyncratic model is not nested in the full model (there may be parameters in the idiosyncratic model that are not in the full model).

3) Both reviewers and the editor were confused by the rationale and choice of the nested statistical framework, as noted in greater detail below by Reviewer 2. For example, It is unclear why for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Also, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail, or is the choice of model selection explained adequately.

Reviewer #1 (Recommendations for the authors):

Comments on figures:

Figure 1B: What is the error bar on y-axis? Without an error bar, conclusions cannot be made, particularly due to small fitness changes on y-axis (0.015 to 0.02 fitness changes while the difference between two clones from the same population is up to ~0.02). As acknowledged in Line 124, noise in fitness measurements contribute significantly to the modest differences in the DFE between clones.

For Figure 1B, we have calculated standard errors on the DFE mean using the errors from the measurements of fitness effects that make up the distribution, and added this text to the methods:

“We calculate the standard error of the mean of the distribution of fitness effects by combining variances from two sources: error in our measurement of mutation fitness effects and errors in our measurement of mean fitness. Since this second component of variance is shared across mutations, it is not scaled by the number of mutations:

σDFEmean,p,t=mutationsσm,p,t,cbc2n2+σneut,p,t2 Where n is the number of mutations with measurements in the distribution (we use an analogous formula in our analysis on individual clones).”

We note that these errors do not consider missing measurements, which are likely to be important. These are discussed in the main text and explored more fully in Figure 1 —figure supplement 5. We have added text to the methods clarifying this analysis, and this edited section in the main text describes the results:

“As in Johnson et al., 2019, missing measurements of strongly deleterious mutations are more common in more-fit strains (clones from later timepoints) in YPD 30°C, suggesting that the negative correlation between generations evolved and the mean of the DFE would be stronger with complete data. Indeed, if we look at a limited set of mutations with measurements in every population-timepoint, or if we “fill in” missing measurements using their average fitness effect across population-timepoints, we see similar or stronger patterns of change (P<5×10-7, Wald Test) in the DFE mean in YPD 30°C (Figure 1 —figure supplement 5).”

Figure 2: Make it clear that each dot in the figure represents a disruption mutation, whose fitness is averaged on the background of two clones from the same timepoint/population. Define "background fitness". The label of x-axis is currently missing. Fitness errors on x-axis are not considered when inferring the significance of the correlation. Furthermore, the examples in Figure 2 are sufficient to conclude that epistasis is not strictly fitness-mediated. Authors could foreshadow the conclusions of the next section.

We now include error bars for standard errors on both axes of Figure 2, and we have changed the caption of Figure 2 to clearly describe the graphs:

“Each point depicts the fitness effect (y-axis) of one insertion mutation measured in one population-timepoint, with the measured fitness of that population-timepoint represented on the x-axis.”

We do not label the axes of the individual graphs in this figure to avoid clutter, instead labeling the mini graphs surrounding the figure. We think that along with the caption text above, the axes labels are well described.

You are correct that we do not consider the errors on the x-axis when doing least-squares regression on these data. During this revision we researched and implemented methods for errors-in-variables regression models such as orthogonal regression and the method described in York et al. (2004). While these methods can be used to more accurately infer the parameters of a linear relationship, they are not necessary for detecting the presence of a relationship, and we found that they can be sensitive to errors in our measurement of standard error (i.e. they can overfit points with especially low standard errors, which may sometimes arise by chance in our error calculations (which are now described in more depth in Materials and methods)). We ultimately determined that in this case, ordinary least squares (OLS) is an appropriate method, because (1) we have uncorrelated errors on the x and y axes, and (2) we are concerned with the presence of a relationship between variables rather than an accurate measurement of the slope of a known relationship. When using OLS, errors on either axis will tend to attenuate our estimates of the slope (towards zero), so this is a conservative approach.

We agree that this figure makes the point that epistasis is not strictly fitness-mediated. We have added this line to the start of the Modeling section: “The examples in Figure 2 demonstrate that correlations between fitness effects and fitness are common but often don’t explain the bulk of epistasis.”

Figure 3: Line 214-215, walk the reader through the paper. Current writing is difficult to understand. What do the coefficients represent?

We have rewritten this sentence to provide more clarity: “The examples in Figure 3B also demonstrate that epistasis is not strictly fitness-mediated; we commonly observe a stepwise change in the fitness effect of an insertion mutation in one evolving population, likely indicating epistasis between the insertion mutation and one or more mutations that fix in that population at a particular timepoint.”

The following sentence on the next page clarifies what the coefficients represent: “Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations.”

3B: Include a positive or negative model of IM. Consider also show models with x-axis as fitness to keep it consistent with 3C. Similarly, show examples with coefficient ~0 in 3C. The current selection of examples in 3B and 3C can lead to misunderstanding of the IM and Full-M.

We agree that the figure should show more diverse examples of the modeling results. We have updated Figure 3 to include 4 examples of each model, showing a range of magnitudes of both positive and negative coefficients. While we appreciate the suggestion to use fitness on the x-axis, we have opted to continue using generations for the idiosyncratic model plots in order to emphasize that we can think of the determinants of fitness effects without referencing fitness, instead relying on how genotypes are related based on shared ancestry.

One thing I don't quite understand is that models with more parameters as in IM or full-M are going to give better explanatory power than those with fewer parameters as in Fitness-M. Are there alternative ways to evaluate these models taking the number of parameters in the model into consideration?

We have addressed this point above in the essential revisions section.

Line-by-Line comments:

Line 81: Introduce the concept of mutational robustness. The immediate relation between mutational robustness and directional evolution experiments where bottleneck/drift likely plays little role in is not obvious. The authors should explain it in the introduction to motivate the rest of the manuscript.

We have added the following sentences to the beginning of the paragraph to introduce the concept of mutational robustness, with the relevant citations: “Robustness can be broadly defined as invariance in the face of perturbation (Masel and Siegel, 2009). Here we are concerned specifically with mutational robustness, a measure of how invariant phenotypes are to mutations (Lauring et al. 2013).”

Line 106: Explain why these 91 insertion mutations were chosen.

We now address this point as described above.

Line 108-112: Fitness correlation between clones from the same population is not enough to justify the averaging of their fitness as detailed in Public Review.

See our response to this point above.

Line 118: How was p-value calculated? Was error of fitness measurements considered?

We have updated the text here and elsewhere to indicate that Wald Tests were used to calculate p-values for least-squares regression. We discuss our choice to use ordinary least-squares (assuming equal errors) above.

Line 159-160: What is the rationale underlying the choice of 0.05 for slope cutoff?

This is a good question, we have added the following text to the main text and Methods to explain why we chose to use this heuristic cutoff:

In the main text: “The cutoff of 0.05 for the slope was chosen to filter for mutations with an effect size across the range of fitness that is larger than the typical standard error for our fitness effect measurements (see Materials and methods).”

In Materials and methods: “As described in the main text, we classify mutations as being negatively or positively correlated with fitness based on both a statistical test (P<0.05, Wald Test) and the effect size (|slope|>0.05). This heuristic slope cutoff is meant to filter for cases in which the range of fitness effects under consideration is larger than the typical noise for a single fitness effect measurement. In the YPD 30°C environment, a slope of 0.05 across an ~0.15 range of fitnesses will mean fitness effects should vary ~0.007, which is also the mean standard error of our fitness effect measurements in that environment (an analogous calculation in SC 37°C would yield a lower threshold, but we keep 0.05 for consistency).”

Line 203-210: Briefly describe the models, for instance, what (how many) parameters were used. If possible, include the formula of the model.

We have included a more in-depth description of each model, including formulas, in the Materials and methods.

Line 235-236: Explain why the remaining epistatic effects are more likely to be negative.

The point we were making here was that in a model including a background fitness effect (the full model), there are more negative than positive coefficients in P3 SC 37°C. However, since this was based on a relatively small number of coefficients, we have removed this sentence in the new draft (see response above about how noise can affect the fitted coefficients).

Line 237-238: between the epistasis and environment ("GXGXE" effects).

We have made this change, with slightly different wording: “between epistasis and the environment”.

Line 243: Which figure to look at?

We have included a reference to Figure 4A-B here.

Additional analysis that authors could consider:

Will the conclusion change if more/fewer time points from the evolving populations has been studied (for instance, clones were isolated from longer or shorter evolution)?

How could the magnitude of fitness increase during evolution affect the conclusion (i.e. the maximum fitness increase of clones from YPD is ~0.06 while the maximum fitness increase in SC is ~0.03)?

The relationships between the mean of the DFE and generations or background fitness remain significant (P<0.05, Wald Test, least-squares regression) in YPD 30°C when we exclude the first and second timepoints, but not once we also exclude the third timepoint. We believe this analysis is practically an analysis of power, which will decrease as the number of timepoints or the range in fitness increase changed, and we have included the data necessary for other researchers to ask this type of question in Supplementary File 1.

Reviewer #2 (Recommendations for the authors):

I am enthusiastic about this work. My main criticism is that the modeling is not laid out with enough care and thoroughness, leaving me to question some of the downstream conclusions. I have also offered many specific comments to the authors that are all in the form of comments within the manuscript pdf. I have emailed this annotated pdf to the editors who will upload it on my behalf.

Thank you for the detailed comments here and in the attached pdf, and in particular for your deep engagement with the ideas in the Discussion and Ideas and Speculation sections. We have made numerous small edits throughout the paper based on your suggestions, and we have attached a pdf in which we respond to each of your comments along with this revision.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Reviewer 1 identified issues with some of the supplemental figures, which must be fixed. However, we all appreciate this extensive revision and look forward to a final version.

Reviewer #1 (Recommendations for the authors):

The authors have sufficiently addressed the previous concerns over the choice of mutants and statistical analysis.

However, multiple supplement figures in the revised manuscript do not have any data points. The technical errors should be fixed before the publication.

We believe the reviewer is referring to several blank plots in Figure 1 —figure supplements 1-3 and Figure 1 —figure supplement 9, which are due to missing data. It is also possible that there were PDF conversion issues that removed points from supplemental figures, which will be fixed for the final figure submission.

To clarify the blank plots, we have added the following to the figure captions for Figure 1 —figure supplements 1-3: “Blank graphs represent cases in which one or both replicates did not have fitness measurements, due to low read counts or insufficient neutral barcodes for mean fitness estimation.” We have also added the following to the figure caption for Figure 1 —figure supplement 9: “Assays in which less than three timepoints have at least 5000 reads are not plotted.”

https://doi.org/10.7554/eLife.76491.sa2

Article and author information

Author details

  1. Milo S Johnson

    1. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    2. Quantitative Biology Initiative, Harvard University, Cambridge, United States
    3. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Formal analysis, Investigation, Writing – original draft, Writing – review and editing
    For correspondence
    milo.s.johnson.13@gmail.com
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0169-2494
  2. Michael M Desai

    1. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    2. Quantitative Biology Initiative, Harvard University, Cambridge, United States
    3. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
    4. Department of Physics, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    mdesai@oeb.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9581-1150

Funding

National Science Foundation (Graduate Research Fellowship)

  • Milo S Johnson

National Science Foundation (PHY-1914916)

  • Michael M Desai

National Institutes of Health (GM104239)

  • Michael M Desai

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Sergey Kryazhimskiy, Alena Martsul, Andrew Murray, and members of the Desai lab for useful discussions about experimental design and analysis. We thank Shreyas Gopalakrishnan, Juhee Goyal, and Megan E Dillingham for their help with isolating the clones used in this experiment. We thank Craig Miller and one anonymous reviewer for helpful discussion and comments during the revision process. This work was supported by an NSF Graduate Research Fellowship (to MSJ), the NSF (PHY-1914916), and the NIH (GM104239). Computational work was performed on the Cannon cluster supported by the Research Computing Group at Harvard University.

Senior Editor

  1. Aleksandra M Walczak, CNRS LPENS, France

Reviewing Editor

  1. Vaughn S Cooper, University of Pittsburgh, United States

Reviewer

  1. Craig Miller

Publication history

  1. Received: December 17, 2021
  2. Preprint posted: December 20, 2021 (view preprint)
  3. Accepted: July 25, 2022
  4. Accepted Manuscript published: July 26, 2022 (version 1)
  5. Version of Record published: August 5, 2022 (version 2)

Copyright

© 2022, Johnson and Desai

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 525
    Page views
  • 198
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Milo S Johnson
  2. Michael M Desai
(2022)
Mutational robustness changes during long-term adaptation in laboratory budding yeast populations
eLife 11:e76491.
https://doi.org/10.7554/eLife.76491
  1. Further reading

Further reading

    1. Computational and Systems Biology
    2. Evolutionary Biology
    Guillem Santamaria, Chen Liao ... Joao B Xavier
    Research Article Updated

    Microbes have disproportionate impacts on the macroscopic world. This is in part due to their ability to grow to large populations that collectively secrete massive amounts of secondary metabolites and alter their environment. Yet, the conditions favoring secondary metabolism despite the potential costs for primary metabolism remain unclear. Here we investigated the biosurfactants that the bacterium Pseudomonas aeruginosa makes and secretes to decrease the surface tension of surrounding liquid. Using a combination of genomics, metabolomics, transcriptomics, and mathematical modeling we show that the ability to make surfactants from glycerol varies inconsistently across the phylogenetic tree; instead, lineages that lost this ability are also worse at reducing the oxidative stress of primary metabolism on glycerol. Experiments with different carbon sources support a link with oxidative stress that explains the inconsistent distribution across the P. aeruginosa phylogeny and suggests a general principle: P. aeruginosa lineages produce surfactants if they can reduce the oxidative stress produced by primary metabolism and have excess resources, beyond their primary needs, to afford secondary metabolism. These results add a new layer to the regulation of a secondary metabolite unessential for primary metabolism but important to change physical properties of the environments surrounding bacterial populations.

    1. Evolutionary Biology
    2. Neuroscience
    Antoine Beauchamp, Yohan Yee ... Jason P Lerch
    Research Advance Updated

    The ever-increasing use of mouse models in preclinical neuroscience research calls for an improvement in the methods used to translate findings between mouse and human brains. Previously, we showed that the brains of primates can be compared in a direct quantitative manner using a common reference space built from white matter tractography data (Mars et al., 2018b). Here, we extend the common space approach to evaluate the similarity of mouse and human brain regions using openly accessible brain-wide transcriptomic data sets. We show that mouse-human homologous genes capture broad patterns of neuroanatomical organization, but the resolution of cross-species correspondences can be improved using a novel supervised machine learning approach. Using this method, we demonstrate that sensorimotor subdivisions of the neocortex exhibit greater similarity between species, compared with supramodal subdivisions, and mouse isocortical regions separate into sensorimotor and supramodal clusters based on their similarity to human cortical regions. We also find that mouse and human striatal regions are strongly conserved, with the mouse caudoputamen exhibiting an equal degree of similarity to both the human caudate and putamen.