Quantifying microbial fitness in high-throughput experiments

Justus Wilhelm Fink
Michael Manhart

Institute of Integrative Biology, Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, United States
Department of Environmental Microbiology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers University, Piscataway, United States

https://doi.org/10.7554/eLife.102635.2

Figures and data

Overview of the choice of encoding and the choice of time scale for quantifying relative fitness.
(A) Example trajectory of relative abundance x (top panel) for a mutant invading and eventually replacing a wild-type population. The same trajectory is plotted under the encoding log x (middle panel) and the encoding logit x = log(x/(1 − x)) (bottom panel). (B) Flowchart to predict the future relative abundance of a mutant given a relative fitness value s^m = dm/dt for some encoding m. The current relative abundance x_t is transformed into the new variable m_t = m(x_t), then projected into the future through a linear extrapolation using s^m (upper horizontal arrow), and finally converted back into a frequency x_{t+Δt} using the decoding function m⁻¹. (C) Four scenarios for a mutant with the same relative fitness per-cycle but with different underlying population dynamics. For each scenario, we show an example trajectory of absolute abundance (stacked) for the wild-type (dark grey) and mutant population (light grey). Each scenario is mapped as a single dot onto the log fold-change (LFC) diagram (center plot), where colored areas indicate positive (green area) and negative (blue area) relative fitness per-cycle (compare Eq. (9)). Fitness per-cycle has isoclines that are parallel to the identity line in the LFC diagram. (D) Scenario for rank discrepancy between relative fitness per-cycle s_cycle and relative fitness per-generation s_gen. For a given competition (red dot), rank discrepancy occurs in a bow tie-shaped area of the fold-change space (shaded red). Any competition in the right half of this area (blue dot) will have higher mutant fitness under one statistic but lower mutant fitness under the other (right inset). The small plots on the left show possible population dynamics that generate this fold-change variation. Here the growth rates of the mutant and wild-type are identical in the two scenarios, but the growth period is different (e.g., from additional resources).
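
The encode, extrapolate, decode procedure of panel B can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the numerical values (x_t = 0.01, s = 0.5, Δt = 10) are hypothetical choices made for this example.

```python
import math

def logit(x):
    """Encoding m(x) = log(x / (1 - x))."""
    return math.log(x / (1.0 - x))

def inv_logit(m):
    """Decoding function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-m))

def predict_abundance(x_t, s, dt, encode=logit, decode=inv_logit):
    """Encode x_t, extrapolate linearly with slope s over dt, then decode."""
    m_t = encode(x_t)
    m_future = m_t + s * dt
    return decode(m_future)

# Hypothetical values: a mutant at 1% with relative fitness 0.5 per unit time.
x_logit = predict_abundance(0.01, 0.5, 10.0)
x_log = predict_abundance(0.01, 0.5, 10.0, encode=math.log, decode=math.exp)
print(x_logit, x_log)
```

With these numbers the logit encoding returns a frequency between 0 and 1, whereas the same extrapolation under the log encoding decodes to a value above 1, which shows why the choice of encoding matters for long-range predictions.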

Comparison of mutant fitness rankings with different time scales using empirical trait variation.
(A) Overview of pairwise coculture simulations. For each mutant strain (grey), we simulate a competition growth cycle against a reference wild-type strain (blue) using the estimated traits and laboratory parameters for the initial condition (N₀ = 0.05 OD, R₀ = 111 mM glucose, x = 0.5; see Methods, and Figure S4 for the underlying trait distribution). (B) Rank discrepancy between relative fitness per-cycle s_cycle and per-generation s_gen. Higher rank is defined as higher fitness, and the rank difference is the difference between a mutant's rank under the two statistics. The blue halo marks the mutant with the greatest rank difference. (C) Covariation between wild-type and mutant fold-change across all simulated competitions for all mutant strains (grey dots). The blue halo marks the same mutant as in panel B, with its red bow tie-shaped area defining the space of LFCs that would have a rank discrepancy with it (compare Figure 1D), which includes many other mutant competitions.
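
A rank discrepancy of the kind shown in panel B can be reproduced with a small Python sketch. Several things here are assumptions for illustration only: the per-generation statistic is taken to be the per-cycle statistic divided by the wild-type LFC, the rank difference is taken as per-cycle rank minus per-generation rank, and the three (LFC_mut, LFC_wt) pairs are hypothetical numbers, not values from the data set.

```python
def fitness_per_cycle(lfc_mut, lfc_wt):
    """Per-cycle statistic: difference of log fold-changes."""
    return lfc_mut - lfc_wt

def fitness_per_generation(lfc_mut, lfc_wt):
    """Assumed per-generation form: per-cycle fitness divided by wild-type LFC."""
    return (lfc_mut - lfc_wt) / lfc_wt

def ranks(values):
    """Rank 1 is the lowest fitness; tied values share the lowest rank in their group."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical (LFC_mut, LFC_wt) pairs for three competitions A, B, C.
competitions = [(5.0, 4.0), (7.2, 6.0), (3.3, 3.0)]
s_cycle = [fitness_per_cycle(m, w) for m, w in competitions]
s_gen = [fitness_per_generation(m, w) for m, w in competitions]

# Assumed sign convention: rank under s_cycle minus rank under s_gen.
rank_diff = [rc - rg for rc, rg in zip(ranks(s_cycle), ranks(s_gen))]
print(rank_diff)
```

Competition B has a larger wild-type LFC than A, so it ranks above A per-cycle but below A per-generation, which is the bow tie situation sketched in Figure 1D.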

Discrepancy between fitness per-generation and per-cycle in epistasis detection and long-term trend.
(A) Schematic detection of epistasis for two single mutants (light and dark circle). Epistasis occurs if the corresponding double mutant (two-component circle) deviates from the additive fitness (parallelogram shape in solid lines; Methods). (B) Correlation between epistasis in relative fitness per-cycle s_cycle and per-generation s_gen. Each dot corresponds to a hypothetical double mutant that combines pairs of mutations systematically perturbing each of the three growth traits (Figure 2A) in our minimal model of population dynamics (Methods). (C) Heatmap for epistasis in the per-cycle fitness s_cycle for all double mutants in our simulated data set, same data as in panel B. (D) Heatmap for epistasis in the per-generation fitness s_gen, same data as in panel B. All epistasis plots are based on 1:1 competition growth cycles with the wild-type (Figures S9–S14). See Figure S15 for epistasis patterns at low mutant relative abundance. (E) Hypothetical scenario for trait evolution in an evolution experiment. An evolving population decreases lag time to zero (orange line), increases growth rate until saturation (blue line), and gradually decreases biomass yield (green line). This trend is similar to initial observations from the LTEE [98]. (F) Corresponding long-term trend in relative fitness based on the trait evolution in panel E. We estimate relative fitness per-cycle (grey line) and per-generation (red line) every 250 generations. Dotted vertical lines mark the end of trait evolution in lag time and growth rate in panel E. For the actual fitness trend in the LTEE, see Figure S16.
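
The additive test for epistasis in panel A can be written as a short Python sketch. It assumes that epistasis is the double-mutant fitness minus the sum of the single-mutant fitness values, and that the per-generation statistic is the per-cycle statistic divided by the wild-type LFC. The LFC values are hypothetical and chosen only to show that the two statistics can disagree.

```python
def fitness_per_cycle(lfc_mut, lfc_wt):
    """Per-cycle statistic: difference of log fold-changes."""
    return lfc_mut - lfc_wt

def fitness_per_generation(lfc_mut, lfc_wt):
    """Assumed per-generation form: per-cycle fitness divided by wild-type LFC."""
    return (lfc_mut - lfc_wt) / lfc_wt

def epistasis(s_1, s_2, s_12):
    """Deviation of the double mutant from the additive expectation."""
    return s_12 - (s_1 + s_2)

# Hypothetical (LFC_mut, LFC_wt) pairs from three 1:1 competitions.
lfc = {"mutant_1": (5.0, 4.0), "mutant_2": (4.5, 4.0), "double": (7.5, 6.0)}
order = ("mutant_1", "mutant_2", "double")

eps_cycle = epistasis(*(fitness_per_cycle(*lfc[k]) for k in order))
eps_gen = epistasis(*(fitness_per_generation(*lfc[k]) for k in order))
print(eps_cycle, eps_gen)
```

In this example the double mutant is exactly additive per-cycle but shows negative epistasis per-generation, because the wild-type LFC differs between the competitions.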

Rank discrepancies between fitness statistics in LTEE competition data.
(A) Rank difference between relative fitness per-cycle s_cycle and per-generation s_gen as a function of time in the LTEE. For each of the 12 lines in the LTEE, we compute s_cycle and s_gen as a function of time by averaging the data from the LTEE competition data set [11] across replicate measurements (Sec. S9). For some lines, the time series is truncated due to measurement difficulties [11]. Based on the quantitative fitness values s_cycle and s_gen, we compute fitness rankings between the replicate lines at any given time point (higher rank means higher fitness; compare Figure 2B). The rank difference is the difference between a line's rank under the two statistics, and we combine the rank differences between all pairs of evolving lines to compute the maximum rank difference at each time point. Insets: At three chosen time points (t = 4000, t = 15000, t = 30000 generations), we show the correlation between s_cycle and s_gen (each dot corresponds to one of the 12 lines in the LTEE). We quantify this correlation with the Spearman rank correlation λ (panel title). Colors indicate the time point and correspond to the color bar in panel B. (B) Covariation between relative fitness per-cycle s_cycle and per-generation s_gen for evolved populations of the LTEE. Each line of the LTEE contributes roughly two competitions per time point due to replicate measurements (Sec. S9) [11]. The color of the points indicates the evolutionary time point; see panel (C) for the color bar. We highlight the evolved population with the greatest rank difference (blue halo; same as in Figure S17). (C) Same as panel B but comparing relative fitness under the logit encoding and under the log encoding.
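
The two summary quantities in panel A, a Spearman rank correlation and a maximum rank difference, can be computed as in the following sketch. The fitness values are hypothetical, the Spearman formula used here assumes no tied values, and taking the largest absolute rank difference across lines is an assumed reading of how the maximum is formed.

```python
def ranks(values):
    """Rank 1 is the lowest fitness; tied values share the lowest rank in their group."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(a, b):
    """Spearman rank correlation via the rank-difference formula (assumes no ties)."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def max_rank_difference(a, b):
    """Largest absolute rank difference across lines (assumed definition)."""
    return max(abs(x - y) for x, y in zip(ranks(a), ranks(b)))

# Hypothetical fitness values for four evolving lines at one time point.
s_cycle = [0.10, 0.30, 0.20, 0.40]
s_gen = [0.05, 0.08, 0.09, 0.12]

rho = spearman(s_cycle, s_gen)
biggest = max_rank_difference(s_cycle, s_gen)
print(rho, biggest)
```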

The choice of library abundance and reference group in bulk competition experiments.
(A) Overview of a pairwise competition experiment with one mutant (top row) and scenarios for bulk competition experiments with many different mutants (middle and bottom rows). The two bulk competition scenarios differ in their initial fraction of the mutant library (colored ovals) in the inoculum (open box). For each scenario, we show a schematic growth cycle (log absolute abundance) in the inset on the right. (B) Schematic relative abundance trajectories for a mutant compared to two alternative subpopulations. We distinguish between the total relative abundance x_i with respect to the population as a whole (height of green band in the top box) and the pairwise relative abundance x_i,wt with respect to the wild-type (height of green band in the bottom box; Eq. (20)). We indicate the sign of total relative fitness (Eq. (24)) and pairwise relative fitness (Eq. (25)) on the right. (C) The absolute error between bulk and pairwise competition experiments. We show the total relative fitness (dark grey dots; Eq. (24)) and the pairwise relative fitness (orange dots; Eq. (25)) for mutants in the minimal population dynamics model (Figure 2A, Methods), parameterized by the yeast knockout data (Figure S4), in a bulk competition growth cycle with low mutant library abundance (panel A, case II; Methods). The absolute error is defined as the bulk fitness statistic minus the relative fitness in pairwise competition (Eq. (S92) in Sec. S15). The inset shows the absolute error for pairwise relative fitness (Eq. (25)) in a bulk competition growth cycle with 99.9% library abundance (light grey dots; case III). This case still includes a wild-type reference at a small percentage so that a pairwise relative fitness can be computed. The x-axis and the orange dots in the inset are identical to the main plot. (D) The relative error in bulk competition experiments as a function of mutant library abundance in the inoculum. Each line corresponds to a knockout mutant in our data set and represents the relative error between the pairwise relative fitness in bulk competition and the relative fitness in pairwise competition (Eq. (S102) in Sec. S16). The black vertical lines show the recommended mutant library abundance for our data set based on Eq. (11) (x_lib ≈ 24.6%) and based on the more conservative Eq. (S118) (x_lib ≈ 0.02%, Sec. S16). Note that these bounds are calculated a priori and are not the result of a fitting procedure.
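
The distinction in panel B between total and pairwise relative abundance can be illustrated with a small Python sketch. Eqs. (20), (24), and (25) are not reproduced in this section, so the code assumes per-cycle statistics under the logit encoding; the strain names and abundances are hypothetical.

```python
import math

def logit(x):
    """Encoding m(x) = log(x / (1 - x))."""
    return math.log(x / (1.0 - x))

def total_and_pairwise_fitness(n_start, n_end, mutant, wild_type):
    """Return (s_total, s_pairwise) for one growth cycle.

    n_start and n_end map strain names to absolute abundances.
    Both statistics are changes in logit relative abundance (assumed form).
    """
    x_tot_0 = n_start[mutant] / sum(n_start.values())
    x_tot_1 = n_end[mutant] / sum(n_end.values())
    x_pair_0 = n_start[mutant] / (n_start[mutant] + n_start[wild_type])
    x_pair_1 = n_end[mutant] / (n_end[mutant] + n_end[wild_type])
    return logit(x_tot_1) - logit(x_tot_0), logit(x_pair_1) - logit(x_pair_0)

# Hypothetical bulk competition: the library is dominated by a slow-growing mutant j.
n_start = {"wt": 100.0, "i": 10.0, "j": 890.0}
n_end = {"wt": 1000.0, "i": 80.0, "j": 1780.0}

s_total, s_pairwise = total_and_pairwise_fitness(n_start, n_end, "i", "wt")
print(s_total, s_pairwise)
```

Here mutant i grows more slowly than the wild-type, so its pairwise fitness is negative, yet it still gains in total relative abundance because most of the population is the even slower mutant j. The two statistics therefore have opposite signs.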

Empirical benchmarks of bulk fitness against pairwise competition experiments.
(A) Fitness measurements as reported by Opijnen et al. [103] for knockouts of S. pneumoniae. Pairwise fitness was measured in 1:1 competitions of knockout strains with the wild-type over a single growth cycle using the per-generation statistic [26]. Bulk fitness was measured from a single growth cycle using 100% initial abundance of the mutant library and a set of neutral strains as the reference to calculate a per-generation fitness, replacing LFC_wt with the library LFC in Eq. (S31) (Sec. S5). Since we were unable to obtain the original data, we extracted these fitness data from Figure 3c in the original publication and subtracted 1 to plot them as s_gen. See Figure S24 for the absolute and relative error of bulk fitness based on these data. (B) Fitness measurements as reported by Levy et al. [13] for evolved genotypes of S. cerevisiae after 80 generations of the evolution experiment. Pairwise fitness was measured in 1:1 competitions of knockout strains with a fluorescently labeled wild-type over seven growth cycles using relative fitness per-cycle under the log encoding, divided by a fixed number of generations inferred from the dilution factor. Bulk fitness was measured from relative abundance data throughout the evolution experiment using a multi-step algorithm, Bayesian inference, and a set of neutral strains as the reference, leading to a pairwise fitness estimate (compare SI Eq. (50) in Levy et al. [13]). We extracted the fitness data from the table presented in SI Sec. 8.1 of the original publication. See the corresponding supplementary figure for the absolute and relative error of bulk fitness based on these data.

Parameter settings for Gaussian Process optimisation.

Predicting the absolute and relative abundance of microbial populations.
(A) Example time series of absolute abundance for a single microbial population (light grey). The observer (eye symbol) of the time series at time t can ask about the future trend in absolute abundance (arrows). (B) Same as panel (A) but for two microbial strains, such as a wild-type and a mutant. The absolute abundance of the wild-type strain (light grey) is stacked on top of the absolute abundance of the mutant strain (dark grey). (C) The relative abundance for the mutant strain corresponding to panel (B). We sketch the mean relative abundance x over time (green line) inferred from multiple replicate cocultures. An error band (light green) around the time series demonstrates the variation between replicates.

The effect of encodings on a non-logistic relative abundance trajectory.
Example trajectory of relative abundance x (top panel) for a mutant invading and eventually replacing a wild-type population, simulated with the Gompertz equation dx/dt = rx log(1/x). Below, we show the Gompertz trajectory under the encodings log x and logit x = log(x/(1 − x)). Compare to Figure 1A for a trajectory simulated with the logistic equation.
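
A short Python sketch makes the contrast with the logistic case concrete. It uses the closed-form solution of the Gompertz equation above, x(t) = exp(log(x₀) · e^(−rt)), and checks linearity with a second difference over three equally spaced time points. The parameter values x₀ = 0.01 and r = 0.5 are hypothetical.

```python
import math

def logit(x):
    """Encoding m(x) = log(x / (1 - x))."""
    return math.log(x / (1.0 - x))

def gompertz(t, x0, r):
    """Closed-form solution of dx/dt = r x log(1/x) with x(0) = x0."""
    return math.exp(math.log(x0) * math.exp(-r * t))

def logistic(t, x0, s):
    """Closed-form solution of dx/dt = s x (1 - x) with x(0) = x0."""
    return 1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-s * t))

def second_difference(encode, trajectory, x0, rate):
    """Zero (up to rounding) when the encoded trajectory is linear in time."""
    m0, m1, m2 = (encode(trajectory(t, x0, rate)) for t in (0.0, 1.0, 2.0))
    return m2 - 2.0 * m1 + m0

x0, rate = 0.01, 0.5  # hypothetical values
curv_logistic_logit = second_difference(logit, logistic, x0, rate)
curv_gompertz_logit = second_difference(logit, gompertz, x0, rate)
curv_gompertz_log = second_difference(math.log, gompertz, x0, rate)
curv_gompertz_loglog = second_difference(
    lambda x: math.log(-math.log(x)), gompertz, x0, rate
)
print(curv_logistic_logit, curv_gompertz_logit, curv_gompertz_log, curv_gompertz_loglog)
```

The logit encoding linearizes the logistic trajectory, but neither log x nor logit x linearizes the Gompertz trajectory. For this equation the encoding log(log(1/x)) is linear in time, with slope −r.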

The variation of wild-type and mutant log fold-change under resource consumption constraints.
(A) Schematic fold-change variation under a perfect resource consumption constraint between wild-type and mutant. Each dot corresponds to a mutant strain in a 1:1 competition growth cycle with the wild-type strain (Sec. S5). There is negative covariation between mutant and wild-type LFCs because the resource constraint forces one to decrease if the other increases. For our model of population dynamics, we can calculate this constraint exactly (black line: Eq. (S36) in Sec. S5). For two mutants (red points) we highlight the bow tie-shaped areas (red shading; compare Figure 1D) where other competitions must lie to have a rank discrepancy between relative fitness per-cycle and per-generation with those mutants, but they are inevitably empty due to the negative covariance across competitions from the resource constraint. Inset: Correlation of fitness per-cycle and per-generation for these same competitions, showing no rank discrepancy. (B) Same as (A) but schematically showing how a deviation from the resource consumption constraint can allow competitions to fall in each other’s red bow tie-shaped regions (e.g., blue, orange, and green competitions compared to the red one), and thus have a rank discrepancy between relative fitness per-cycle and per-generation (inset).

Empirical trait distribution for single gene-knockouts of yeast.
(A) Overview of the growth curve data set and the estimated growth traits for the knockout library of Saccharomyces cerevisiae (Methods) [118]. (B) Covariation between estimated maximum growth rate g and lag time λ across all mutant strains (grey dots; Pearson correlation coefficient r = −0.17, p = 7 × 10⁻³⁰) as well as wild-type replicates (orange dots; r = −0.16, p = 0.002). The reference wild-type strain for our pairwise coculture simulations is defined by the median trait values (black cross) of all wild-type replicates. (C) Covariation between measured maximum growth rate g and biomass yield Y across all mutant strains (grey dots; r = 0.21, p = 8 × 10⁻⁴⁴) as well as wild-type replicates (orange dots; r = −0.06, p = 0.25). Histograms on the top and right sides of (B) and (C) are marginal distributions along each axis, with colors matching the scatter points and the vertical black line marking the median wild-type trait value.

Replicate measurements for growth rate, lag time, and yield in our empirical trait data set.
(A) Covariation of growth rate between replicate measurements of the knockouts (grey dots; Pearson correlation coefficient r = 0.94, p = 0.00). Each dot represents a mutant genotype from the single-gene knockout collection in Saccharomyces cerevisiae [118]. For the vast majority of genotypes in our data set (4163 out of 4492 knockouts) we are able to fit two traits from two independent growth curve measurements (Figure S4A; Methods). (B) Covariation of lag time between replicate measurements of the knockouts (grey dots; r = 0.90, p = 0.00). (C) Covariation of biomass yield between replicate measurements of the knockouts (grey dots; r = 0.43, p = 4.81 × 10⁻¹⁸⁸).

The distribution of fitness effects under relative fitness per-cycle and per-generation.
(A) Distribution of relative fitness per-cycle s_cycle for the single-gene knockouts of yeast (grey) and wild-type replicates (orange) in our data set (Figure S4A, Methods) [118], based on the fitness data from Figure 2B. The vertical black line marks zero relative fitness (i.e., the median wild-type). (B) Same as (A) but with relative fitness per-generation s_gen. (C) Covariation in fitness ranks between relative fitness per-cycle s_cycle and per-generation s_gen. For each fitness statistic, we calculate the mutant ranking (higher rank means higher fitness, and mutants with equal fitness are assigned the lowest rank in the group), based on the rank data from Figure 2B.

Exploring alternative conditions for rank discrepancy between fitness statistics in yeast gene-deletion data.
Plot columns (A) and (B) have the same format as Figure S4B,C, and columns (C) and (D) match Figure 2B,C (in reverse order). Row (A): Low mutant frequency x = 0.01. Row (B): Standard mutant frequency x = 0.5, but where we artificially eliminate variation in lag time to test the effect on the fitness rank discrepancy. Row (C): Same as row (B) but with no variation in growth rate. Row (D): Same as row (B) but with no variation in biomass yield. Row (E): Same as row (B), but yields of mutants lower than the median wild-type yield are set to the wild-type value.

Hypothetical case where relative fitness per-cycle and per-generation are anticorrelated across mutants.
(A) Covariation between growth rate and lag time for a synthetically generated set of mutants (grey dots) and a single wild-type (orange dot). The variation here may represent the standing variation in an evolved population with improved growth rate and lag time over the wild-type ancestor. Because the points overlap graphically, they appear as a single line. (B) Covariation between growth rate and biomass yield for the mutants (grey dots) and wild-type (orange dot) in this synthetic data set. (C) Covariation between relative fitness per-cycle s_cycle and per-generation s_gen. We simulate pairwise competitions for all mutants against the wild-type with exactly the same settings as for the empirical data set (Figure 2A; Methods). (D) Covariation between mutant and wild-type fold-change in the competitions. Based on the competition data in panel C, we estimate the LFC values for all mutant-to-wild-type competitions (grey dots). We highlight the bow tie-shaped area of rank discrepancy for the mutant with the lowest overall fold-change (red shading; compare Figure 1D). The dashed black line shows the isocline for zero relative fitness per-cycle (Eq. (9)). (E, F) Distributions of relative fitness per-cycle (E) and per-generation (F) for the mutants in panels (A) and (B), based on the fitness data from panel (C).

Competition growth cycles for double mutants — first column of Figure 3C,D.
Each row corresponds to a pair of mutations that perturb growth traits in the minimal model of population dynamics (Figure 2A, Methods). This figure shows all combinations where the first mutant increases growth rate (first column of Figure 3C,D). Column (A): Perturbations of growth traits for single and double mutants. Column (B): Growth curves of wild-type (grey) and mutant 1 (blue). Column (C): Growth curves of wild-type (grey) and mutant 2 (red). Column (D): Growth curves of wild-type (grey) and double mutant (purple).

Competition growth cycles for double mutants — second column of Figure 3C,D.
Same as Figure S9 but for mutant pairs where the first mutant decreases growth rate.

Competition growth cycles for double mutants — third column of Figure 3C,D.
Same as Figure S9 but for mutant pairs where the first mutant increases lag time.

Competition growth cycles for double mutants — fourth column of Figure 3C,D.
Same as Figure S9 but for mutant pairs where the first mutant decreases lag time.

Competition growth cycles for double mutants — fifth column of Figure 3C,D.
Same as Figure S9 but for mutant pairs where the first mutant increases biomass yield.

Competition growth cycles for double mutants — sixth column of Figure 3C,D.
Same as Figure S9 but for mutant pairs where the first mutant decreases biomass yield. For the double mutant where the second mutant also decreases yield, we cannot use the additive mutation effects (Methods) because that would result in a negative yield for the double mutant. Here we have defined the double mutant to have a yield value of Y_mut = 0.1. This choice does not affect the fitness or epistasis, since biomass yield is a neutral trait with fitness zero in our model of population dynamics.

Epistasis patterns at low mutant relative abundance.
Same as Figure 3B–D but where the mutant starts each competition at low relative abundance (x₀ = 0.01; Sec. S8), compared to the 1:1 initial conditions in Figure (x₀ = 0.5).

Long-term fitness trends in the LTEE under relative fitness per-cycle and per-generation.
(A) Fit of a hyperbolic model (pink line; Eq. (S56) in Sec. S9) and a power-law model (cyan line; Eq. (S57) in Sec. S9) to a pooled time-series of relative fitness per-generation. We pool the fitness per-generation from all 12 lines of the LTEE into a single data set (grey dots) and repeat the fits of Wiser et al. [123] (Sec. S9). We compute the fraction of variance explained R² as a measure for the quality of fit. (B) Same as panel (A) but using relative fitness per-cycle. (C) Same as panel (A) but averaging fitness across all 12 lines at each time point. Note that the fraction of variance explained is much higher than in panel (A), because the averaged time-series has fewer points (Sec. S9). (D) Same as panel (C) but using relative fitness per-cycle. For a comparison between our results and the original fit by Wiser et al. [123], see Sec. S9.
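
To make the comparison between the pooled panels (A, B) and the averaged panels (C, D) concrete, here is a minimal sketch of how the fraction of variance explained R² can differ between a pooled and an averaged time-series for the same model curve. Everything in it is an assumption for illustration: the saturating curve `toy_curve` is a generic stand-in (Eqs. (S56) and (S57) are not reproduced here), the 12 per-line offsets are made up, and no fitting is performed.

```python
def r_squared(observed, predicted):
    """Fraction of variance explained: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def toy_curve(t, a=0.7, b=4000.0):
    """Hypothetical saturating trend, used only to generate toy data."""
    return 1.0 + a * t / (t + b)


times = [500, 1000, 2000, 5000, 10000]
# Twelve hypothetical lines, each shifted by a constant offset (symmetric around zero).
offsets = [-0.05 + 0.1 * k / 11 for k in range(12)]

# Pooled series: every line contributes one point per time point.
pooled_obs = [toy_curve(t) + d for d in offsets for t in times]
pooled_pred = [toy_curve(t) for _ in offsets for t in times]

# Averaged series: one mean value per time point.
mean_obs = [sum(toy_curve(t) + d for d in offsets) / len(offsets) for t in times]
mean_pred = [toy_curve(t) for t in times]

r2_pooled = r_squared(pooled_obs, pooled_pred)
r2_mean = r_squared(mean_obs, mean_pred)
```

In this toy setup the averaged series lies on the curve, so its R² is essentially 1, while the pooled series keeps the scatter between lines and scores lower (roughly 0.96 here).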

Comparison of relative fitness per-generation and per-cycle across the complete LTEE competition data set.
(A) Rank discrepancy between relative fitness per-cycle and per-generation for evolved populations of the LTEE. For each fitness statistic, we calculate a ranking (higher rank means higher fitness and mutants with equal fitness are assigned the lowest rank in the group; compare Figure 2B) across all 12 evolved lines and all time points. The rank difference is defined as the rank in minus the rank in . We highlight the evolved population with the greatest rank difference (blue halo). (B) Covariation between the wild-type and mutant fold-change for the competition data from the LTEE [123]. The term “mutant” (y-axis) refers to an evolved population at a given time point from one of the 12 lines of the LTEE. We highlight the evolved population with the greatest rank difference (blue halo; same as in panel (A)) and draw its corresponding bow tie-shaped area of rank discrepancy between and (red shading; compare Figure 1D). The coloring in all panels refers to the time point at which the evolved population was sampled from the LTEE.
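
The ranking convention used in panel (A), where a higher rank means higher fitness and tied mutants share the lowest rank of their group, can be sketched in plain Python. The function names and the toy fitness values are hypothetical; the order of subtraction simply follows the order of the two arguments.

```python
def rank_min_ties(values):
    """Rank values so that higher values get higher ranks (1 = lowest value).

    Tied values all receive the lowest rank in their group.
    """
    first_rank = {}
    for position, value in enumerate(sorted(values), start=1):
        # Keep only the first (lowest) position at which each value appears.
        first_rank.setdefault(value, position)
    return [first_rank[value] for value in values]


def rank_difference(fitness_a, fitness_b):
    """Per-genotype rank under statistic A minus rank under statistic B."""
    ranks_a = rank_min_ties(fitness_a)
    ranks_b = rank_min_ties(fitness_b)
    return [ra - rb for ra, rb in zip(ranks_a, ranks_b)]


# Toy values for four genotypes under two fitness statistics.
stat_a = [0.10, 0.30, 0.30, 0.50]
stat_b = [0.02, 0.08, 0.05, 0.05]

ranks_a = rank_min_ties(stat_a)
ranks_b = rank_min_ties(stat_b)
diff = rank_difference(stat_a, stat_b)
```

A nonzero entry in `diff` marks a genotype whose position in the ordering depends on which fitness statistic is used.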

Comparison of log-encoding and logit-encoding for relative fitness per-cycle in the LTEE competition data set.
(A) Same as Figure S17A but for rank discrepancy between logit-encoded relative fitness per-cycle and log-encoded relative fitness per-cycle for evolved populations of the LTEE. (B) Covariation between the wild-type relative abundance and mutant relative abundance at the start of the competition growth cycle for the competition data from the LTEE [123]. The term “mutant” in the axis labels refers to an evolved population at a given time point from one of the 12 lines of the LTEE. The coloring in all panels refers to the time point at which the evolved population was sampled from the LTEE.

Predicting relative fitness with monoculture proxies under different scenarios of trait variation.
Plots in columns (A) and (B) show growth trait variation across mutants, with the same format as Figure S4B,C. Plots in column (C) show quality of prediction for different monoculture proxies under the trait distribution in columns (A) and (B). As the ground truth, we estimate the relative fitness per-cycle using a simulated 1:1 competition growth cycle (Figure 2A; Methods). For each mutant i, we compute the growth rate difference Δg = g_i − g_wt, the lag time difference Δlag = λ_i − λ_wt, the difference in biomass yield Δyield = Y_i − Y_wt, the difference in log fold-change ΔLFC = LFC_i − LFC_wt, and the difference in area under the growth curve ΔAUC = AUC_i − AUC_wt from simulated monoculture growth curves (Sec. S10). We quantify the agreement between the monoculture proxy and the relative fitness per-cycle using the Spearman correlation coefficient τ. Row (A): Empirical growth trait variation from data on yeast single-gene knockout mutants (identical to Figure S4B,C). Row (B): Same as row (A) but where we artificially eliminate variation in lag time to test the effect on the monoculture proxies. Row (C): Same as row (A) but with no variation in growth rate. Row (D): Same as row (A) but with no variation in biomass yield.
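
The agreement measure in column (C) can be sketched as a rank correlation between a monoculture proxy and the ground-truth fitness. The sketch below implements a Spearman correlation as the Pearson correlation of average ranks; the helper names and the toy numbers are hypothetical, and handling ties by average ranks is an assumption (it differs from the lowest-rank convention used in the rank-difference figures).

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy proxy (growth rate difference) and toy ground-truth fitness per-cycle.
delta_g = [-0.2, -0.1, 0.0, 0.1, 0.3]
fitness = [-0.5, -0.2, 0.1, 0.05, 0.4]
rho = spearman(delta_g, fitness)
```

Because only ranks enter, any strictly increasing transformation of the proxy leaves the coefficient unchanged, which is why it suits proxies measured in different units.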

The choice of the cutoff time for evaluating the area under the curve (AUC).
(A) Distribution of saturation times in monoculture for the knockouts (grey bars) and wild-type replicates (orange bars) in our empirical data set (Figure S4). The saturation time t_sat is defined as the time when the limiting resource is depleted (R(t_sat) = 0) and can be estimated numerically from the trait data (Methods). Three vertical lines indicate example choices for the cutoff time t_eval of the AUC: the most frequent saturation time (t_eval = 13 hours; red line), a saturation time halfway in the decay of the distribution (t_eval = 16 hours; black line), and an external time scale (t_eval = 24 hours; blue line). (B–D) Covariation between relative fitness per-cycle and AUC with the cutoff times marked in panel (A): 13 hours (B), 16 hours (C), and 24 hours (D). We compute the AUC from simulated monoculture growth curves of knockouts (grey dots) and wild-type replicates (orange dots). As the ground truth, we take the relative fitness per-cycle from a pairwise competition (Figure 2A). The red lines show the best fit from a linear regression of relative fitness per-cycle on the AUC.
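
The role of the cutoff time t_eval can be illustrated with a small trapezoidal integration that stops at the cutoff. This is a sketch under stated assumptions: the function name `auc_until`, the linear interpolation at the cutoff, and the piecewise-linear toy growth curve are all hypothetical and not taken from the Methods.

```python
def auc_until(times, values, t_eval):
    """Trapezoidal area under (times, values) from times[0] up to t_eval.

    The last interval is truncated by linear interpolation at t_eval.
    If t_eval lies beyond the final time point, the whole curve is integrated.
    """
    if t_eval < times[0]:
        raise ValueError("t_eval must not precede the first time point")
    area = 0.0
    for k in range(len(times) - 1):
        t0, t1 = times[k], times[k + 1]
        y0, y1 = values[k], values[k + 1]
        if t0 >= t_eval:
            break
        if t1 > t_eval:
            # Interpolate the curve value at the cutoff and shorten the interval.
            y1 = y0 + (y1 - y0) * (t_eval - t0) / (t1 - t0)
            t1 = t_eval
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Toy growth curve (hours, abundance): linear rise until saturation at 10 hours.
hours = [0.0, 10.0, 20.0, 30.0]
abundance = [0.0, 1.0, 1.0, 1.0]

auc_13 = auc_until(hours, abundance, 13.0)
auc_16 = auc_until(hours, abundance, 16.0)
auc_24 = auc_until(hours, abundance, 24.0)
```

Once the toy curve has saturated, every extra hour before the cutoff adds the same plateau area, so a later t_eval shifts weight from the growth phase towards the final abundance.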

The error in mutant fitness rankings between bulk and pairwise competition experiments.
(A) Rank difference between total relative fitness and pairwise relative fitness in a bulk competition growth cycle with low mutant library abundance (Figure 5A, case II). Based on the fitness data in Figure 5C, we calculate a mutant ranking for total relative fitness in bulk (Eq. (24)) and a ranking for pairwise relative fitness in bulk (Eq. (25)) (higher rank means higher fitness and mutants with equal fitness are assigned the lowest rank in the group). The rank difference is defined as the rank in total relative fitness minus the rank in pairwise relative fitness. The ranks exactly match in this case because the mutants are sufficiently rare in case II that the population fitness is almost identical to the wild-type fitness. (B) Based on the pairwise relative fitness at low mutant library abundance in Figure 5C (orange dots; case II), we calculate a mutant ranking for pairwise relative fitness in bulk (Eq. (25)) and a mutant ranking for the relative fitness per-cycle in pairwise competition (Eq. (9)). The rank difference (orange dots) is defined as the rank in bulk competition minus the rank in pairwise competition. The rank differences are small because the mutants are sufficiently rare in case II that higher-order effects in the bulk competition are minor. (C) Based on the fitness data in the inset of Figure 5C, we estimate a mutant ranking for pairwise relative fitness (Eq. (25)) in the case of a bulk competition growth cycle with high mutant library abundance (grey dots; compare Figure 5A, case III). For each case, the rank difference is defined as the rank in the bulk competition minus the rank in pairwise competition. The rank difference for case II (orange dots) is identical to the data in panel (B). All fitness statistics in this figure are based on the logit-encoding; however, since the underlying relative abundances are small (x ≪ 1), it is approximately equivalent to the log-encoding (log x ≈ logit x).
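
The approximate equivalence of the two encodings at small relative abundance follows from a standard series expansion, sketched here for completeness:

```latex
\operatorname{logit} x
  = \log\frac{x}{1-x}
  = \log x - \log(1-x)
  = \log x + x + \frac{x^{2}}{2} + \dots
  \approx \log x
  \qquad (x \ll 1)
```

The leading correction is of order x, so for a relative abundance of x = 0.01 the two encodings differ by about 0.01.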

A decomposition for the error from higher-order interactions in bulk competition experiments.
For a bulk competition growth cycle with low mutant library abundance (Figure 5A, case II), we calculate the pairwise relative fitness (Eq. (25)) for the knockouts in our empirical data set (Figure S4A) using a previously established approximation [122] (Sec. S14). The error from higher-order interactions is defined as the pairwise fitness in bulk minus the fitness in pairwise competition. Based on the approximation for our model of population dynamics (Sec. S14), we derive a decomposition that separates the absolute error into two terms that we call the fitness-dependent error term (dark orange dots) and the fitness-independent error term (light orange dots). For full details on the decomposition, see Sec. S15.

The relative error between bulk and pairwise competition experiments.
(A) Same as Figure 5C but showing relative rather than absolute error. The relative error of each bulk statistic is defined as the absolute error (Figure 5C), divided by the fitness in pairwise competition (Eq. (S102) in Sec. S16). A dashed grey line indicates the threshold of 1% relative error. (B) Same as Figure but showing relative rather than absolute error. (C) Same as the inset of Figure 5C but showing relative rather than absolute error. On the x-axis, we plot the absolute value of relative fitness per-cycle in the pairwise competition.

The bulk fitness error based on measurements by van Opijnen et al. [155].
(A) Absolute error between the pairwise fitness measured in bulk competition and the pairwise fitness measured in pairwise competition for the set of single-gene knockouts of S. pneumoniae shown in Figure 3c in the original publication. We calculate the Pearson correlation coefficient r and the p-value for the null hypothesis of zero correlation, and indicate the linear regression between the two variables with a red line. (B) Same as (A) but showing relative rather than absolute error. We define a set of knockouts with non-neutral fitness (blue dots; ) to calculate the geometric mean relative error (blue line). See Figure 6A for fitness data and experiment conditions.
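
The two summary quantities in panel (B), the relative error and its geometric mean over non-neutral knockouts, can be sketched as follows. Several details are assumptions and labeled as such: the non-neutrality cutoff `min_abs_fitness` is a hypothetical parameter (the criterion is not given in this caption), the error is taken in absolute value so that the logarithm is defined, and the fitness values are toy numbers rather than the published measurements.

```python
import math


def relative_errors(bulk, pairwise, min_abs_fitness):
    """Absolute bulk-vs-pairwise error divided by the pairwise fitness magnitude.

    Genotypes with |pairwise fitness| <= min_abs_fitness (a hypothetical
    cutoff) are treated as neutral and skipped.
    """
    errors = []
    for s_bulk, s_pair in zip(bulk, pairwise):
        if abs(s_pair) > min_abs_fitness:
            errors.append(abs(s_bulk - s_pair) / abs(s_pair))
    return errors


def geometric_mean(values):
    """Geometric mean of strictly positive values (a zero would break the log)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Toy fitness values for four genotypes; the third one is nearly neutral.
s_pairwise = [0.10, -0.20, 0.001, 0.40]
s_bulk = [0.11, -0.18, 0.05, 0.44]

errors = relative_errors(s_bulk, s_pairwise, min_abs_fitness=0.01)
gm_error = geometric_mean(errors)
```

Excluding near-neutral genotypes matters because dividing by a fitness close to zero inflates the relative error regardless of how small the absolute error is.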

The bulk fitness error based on measurements by Levy et al. [32].
Same as Figure but for the set of evolved genotypes of S. cerevisiae shown in Figure 2d in the original publication. In (A), we calculate the Pearson correlation coefficient r and the p-value for the null hypothesis of zero correlation, and indicate the linear regression between the two variables with a red line. In (B) we define a set of knockouts with non-neutral fitness (blue dots; ) to calculate the geometric mean relative error (blue line). See Figure 6A for fitness data and experiment conditions.
