Quantifying microbial fitness in high-throughput experiments

Justus Wilhelm Fink; Michael Manhart

doi:10.7554/eLife.102635.1

Introduction

Fitness describes the fate of mutations as they arise in a population [1] and serves to define other core evolutionary concepts, such as trade-offs [2]. At the same time, fitness is an often confusing term in evolutionary biology [3–5]. Fitness measures can be classified by their role in theory [6] and by their scale of measurement [7–9], but these arguments offer little guidance on how to measure fitness in microorganisms in practice. In microbial ecology and evolution in particular, empirical fitness measurements allow us to test general evolutionary theory with fast-growing model organisms [10–15] and can also be used to detect microbial interactions [16, 17], annotate gene function [18, 19], and understand the spread of antibiotic resistance genes [20, 21].

The classic approach to measuring the fitness effects of mutations uses pairwise competition experiments, in which the mutant competes with a wild-type in co-culture [22–24]. This approach is ideal because it closely mimics the dynamics of spontaneous mutations, and it is therefore typically used in experimental evolution [10, 12, 25–27]. For example, the Long-Term Evolution Experiment (LTEE) has evolved Escherichia coli over tens of thousands of generations and measured the fitness of each evolved lineage in competition with the ancestor [11, 26, 28]. Since pairwise competitions become infeasible for a large number of strains, a second approach is to measure properties of genotypes in monoculture, such as the growth rate, and combine these measurements into a relative fitness estimate. Monoculture growth curves are straightforward to measure, even for entire libraries of single-gene knockout mutants [29–31], but empirical tests have shown that the monoculture growth rate is insufficient to predict mutant fitness [32–34]. Summary statistics such as the area under the curve (AUC) perform better [35–37]. However, this approach breaks down for strains that engage in cross-feeding or toxin production, and attempts to improve monoculture-based fitness estimates require additional experiments [34, 37] and do not go beyond qualitative fitness rankings [37].

A third and more recent approach is to measure relative fitness in bulk competition experiments based on high-throughput barcode sequencing [13, 38]. For example, transposon-insertion mutagenesis generates libraries of barcoded gene knockouts that can be tracked as they grow in a single batch culture [39]. Estimating the fitness of each barcode allows us to identify genes that are particularly important in the growth environment [18, 19, 40], in a given genetic background [14, 15, 41, 42], and in the context of an ecological community [16, 17, 42, 43]. Bulk fitness estimation also plays a major role in identifying beneficial lineages in experimental evolution [13, 44] (not unlike the tracking of viral pathogens in public health [45–47]) and in testing their fitness in new environments [48–52]. Many of these studies make different choices about the design of experiments and the calculation of fitness, often without explaining why a certain fitness measure was used or how to compare between data sets.

The inconsistent choices for quantifying microbial fitness make it difficult to compare fitness across experiments, especially because different groups tend to use different definitions. For example, the LTEE reports relative fitness per generation [11, 26], whereas other evolution experiments exclusively report relative fitness per cycle [12, 53]. New techniques, like barcode sequencing, often spark their own fitness metrics [18, 40, 54–56]. The consequences of this choice are also unclear: besides affecting the quantitative fitness value, a different fitness metric might lead to a different ranking of mutants in a genetic screen (and thus change the importance we assign to those genes). The probability of fixation for spontaneous mutations [57, 58], the speed of adaptation [59], and the effect size for gene essentiality experiments [18, 19] all depend on our ability to estimate quantitative fitness values. This raises three key questions: How do these fitness statistics differ, and are some of them equivalent? Can we say which choices are optimal in light of first principles or practical considerations? And what does this tell us about how to design these experiments? Previous work has addressed aspects of these questions [6–9, 60], but the arguments are scattered across disciplines and based on generic models of population growth [6, 7, 9, 61] rather than on the specific dynamics of microbes in laboratory experiments. Here we address these questions within a unified framework, using realistic microbial population dynamics with empirical traits to systematically test different fitness metrics.

Results

Predictive power varies across different relative fitness statistics

While there are other notions of fitness relevant in different evolutionary contexts (Sec. S1, Fig. S1), in this article we focus on relative fitness, as this is the most common object of high-throughput laboratory experiments in microbes [18, 26, 40, 55, 56]. To generalize the wide range of relative fitness statistics used in these studies, we define a genotype’s relative fitness as any number that is sufficient to predict the genotype’s relative abundance x(t) over a short time horizon (Fig. S1). The simplest approach is to take a linear expansion of x(t) and use its slope at time t to predict the change over a time window Δt:

$$x(t+\Delta t) \approx x(t) + \left.\frac{dx}{dt}\right|_t \Delta t. \tag{1}$$

Under this linear approximation, the slope s^linear = dx/dt|_t constitutes a relative fitness statistic, since it is sufficient to predict the change in relative abundance. Figure 1A (top panel) shows a schematic trajectory of relative abundance x(t) with two examples of this predictive approach (dashed arrows). As we can see, the naive linear statistic of relative fitness can significantly under- or overestimate the actual change in relative abundance, because x(t) changes nonlinearly in time.

Overview of the choice of encoding and the choice of timescale for quantifying relative fitness.
(A) Example trajectory of relative abundance x (upper panel) for a mutant invading and eventually replacing a wild-type population. The same trajectory is plotted under the encoding log x (middle panel) and the logit-encoding log(x/(1 − x)) (lower panel). (B) The general flow-chart to predict the future relative abundance of a mutant given a relative fitness value s^m = dm/dt for some encoding m. The current relative abundance x_t is transformed into the new variable m_t = m(x_t), then projected into the future through a linear extrapolation using s^m (upper horizontal arrow), and finally converted back into a frequency x_{t+Δt} using the decoding function m⁻¹. (C) Four scenarios for positive mutant fitness with different underlying population dynamics. For each scenario, we show an example trajectory of absolute abundance (stacked) for the wild-type (dark grey) and mutant population (light grey). Each scenario is mapped as a single dot onto the fold-change diagram (center plot); colored areas indicate positive (green area) and negative relative fitness per-cycle (blue area; compare Eq. (8)). (D) Basic constellation for misranking between relative fitness per-cycle and relative fitness per-generation. For a given competition (red dot), misranking occurs in a bow-tie area of the fold-change space (red shade). Any competition in the right half of this area (grey dot) will have higher mutant fitness per-cycle but lower mutant fitness per-generation (right inset). The small plots on the left show possible population dynamics that generate this fold-change variation.

As an alternative, we can transform the relative abundance x into a new variable m(x) that improves the quality of prediction (Fig. 1B; compare [62]). We define an encoding as a smooth, strictly increasing function of relative abundance. This one-to-one mapping allows us to predict the change in relative abundance using a linear expansion of the encoded relative abundance, rather than of the relative abundance itself:

$$x(t+\Delta t) \approx m^{-1}\!\left(m(x(t)) + s^m\,\Delta t\right), \tag{2}$$

where m⁻¹ is the inverse of the encoding function and

$$s^m = \left.\frac{d\,m(x)}{dt}\right|_t \tag{3}$$

is the relative fitness under the encoding m. Because dm/dt = m′(x) dx/dt by the chain rule, the relative fitness of neutral mutations (dx/dt = 0) is always zero, independent of the encoding.

For example, Fig. 1A (middle panel) shows the relative abundance under the log-encoding m(x) = log x; this encoding is implicit in typical plots of relative abundance on a logarithmic scale [44]. This yields an approximately linear trajectory at early times, where the slope of the encoded trajectory s^log therefore predicts the change better than the slope of the relative abundance s^linear (compare top and middle panels of Fig. 1A). However, under the log-encoding the prediction quality remains poor at later stages of the trajectory. In the case of Fig. 1A, we see that the ideal encoding is given by m(x) = logit x (bottom panel), where the logit function is defined as

$$\operatorname{logit} x = \log\frac{x}{1-x}. \tag{4}$$

This is ideal because it transforms the nonlinear trajectory of relative abundance x(t) into a linear trajectory that can be fully predicted from the linear expansion in Eq. (2).
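The following Python sketch illustrates Eqs. (1)–(4) under an assumption made only for illustration: the mutant follows logistic dynamics with a known rate s, so that the slope dm/dt of each encoding can be written analytically via the chain rule with dx/dt = s x(1 − x). The function names (`logistic`, `predict`, `ENCODINGS`) and all parameter values are hypothetical.

```python
import math


def logistic(t, x0, s):
    """Relative abundance under logistic dynamics, i.e. logit x(t) = logit x0 + s*t."""
    return 1.0 / (1.0 + (1.0 - x0) / x0 * math.exp(-s * t))


def logit(x):
    return math.log(x / (1.0 - x))


def expit(y):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-y))


# Each encoding m: (m, inverse m^-1, slope dm/dt evaluated at abundance x).
# The slopes follow from the chain rule with the logistic rate dx/dt = s*x*(1 - x).
ENCODINGS = {
    "linear": (lambda x: x, lambda y: y, lambda x, s: s * x * (1.0 - x)),
    "log": (math.log, math.exp, lambda x, s: s * (1.0 - x)),
    "logit": (logit, expit, lambda x, s: s),
}


def predict(x_t, s, dt, encoding):
    """Eq. (2): extrapolate linearly in the encoded variable, then decode with m^-1."""
    m, m_inv, slope = ENCODINGS[encoding]
    return m_inv(m(x_t) + slope(x_t, s) * dt)


x0, s, t, dt = 0.01, 0.5, 10.0, 4.0
x_t, x_true = logistic(t, x0, s), logistic(t + dt, x0, s)
for name in ENCODINGS:
    print(f"{name:>6}: predicted {predict(x_t, s, dt, name):.3f}, true {x_true:.3f}")
```

With these values, the linear and log predictions overshoot (both even exceed 1), while the logit prediction reproduces the true trajectory up to floating-point rounding, because the logit encoding inverts logistic dynamics.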

Why is the logit function m(x) = logit x the ideal encoding of relative abundance for a fitness statistic in this case? Since we generated the relative abundance x(t) in Fig. 1A using logistic dynamics, the logit function is linearly related to the inverse function of x(t) (Sec. S2). More generally, the ideal encoding for given relative abundance dynamics x(t) is any linear function of the inverse of those dynamics, m(x) = a t(x) + b, where t(x) denotes the time at which the trajectory reaches relative abundance x. Note that the arbitrary scale a also accommodates log and logit encodings with a different base of the logarithm (discussed in Ref. [7]). The inversion of x(t) is equivalent to removing frequency (relative abundance) dependence from the relative fitness statistic s^linear [63]. For example, Fig. S2 shows that if we apply the logit-encoding to a trajectory with non-logistic underlying dynamics (e.g., a Gompertz model), the logit transform reduces but does not entirely remove the nonlinearity in the relative abundance trajectory (see the mathematical notes by Mallet [64] for more examples). We note that it is possible to generalize the concept of encodings to absolute abundance as well (Sec. S3).
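To make the inverse-dynamics argument concrete, the sketch below assumes one common Gompertz parameterization, x(t) = exp(−b e^{−ct}); Fig. S2 may use a different form. Inverting it gives t(x) = [log b − log(−log x)]/c, so the ideal encoding is m(x) = −log(−log x) = c t − log b. The code compares discrete second differences (a curvature measure that vanishes for trajectories linear in time) under the logit encoding and under this inverse-dynamics encoding. All names and values are hypothetical.

```python
import math


def gompertz(t, b=5.0, c=0.5):
    """Assumed Gompertz form for relative abundance: x(t) = exp(-b * exp(-c * t))."""
    return math.exp(-b * math.exp(-c * t))


def logit(x):
    return math.log(x / (1.0 - x))


def gompertz_encoding(x):
    """Inverse-dynamics encoding for the assumed form: -log(-log x) = c*t - log(b)."""
    return -math.log(-math.log(x))


def second_differences(values):
    """Discrete curvature on an evenly spaced grid; zero for a trajectory linear in time."""
    return [values[k + 1] - 2 * values[k] + values[k - 1] for k in range(1, len(values) - 1)]


times = [0.5 * k for k in range(20)]
xs = [gompertz(t) for t in times]
curv_logit = max(abs(d) for d in second_differences([logit(x) for x in xs]))
curv_ideal = max(abs(d) for d in second_differences([gompertz_encoding(x) for x in xs]))
print(f"max |second difference|: logit {curv_logit:.3g}, inverse-dynamics {curv_ideal:.3g}")
```

The logit encoding leaves visible curvature in the Gompertz trajectory, while the inverse-dynamics encoding makes it linear up to rounding error.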

Of course, the exact dynamics of microbial populations are generally not known, and if they were known, there would be no need to make predictions using the linear expansion in Eq. (2). In the absence of an exact model, there are mathematical and practical reasons why the logit function is a sensible encoding for relative abundance. First, logistic dynamics can be interpreted as the lowest-order approximation to more complex dynamics of relative abundance (Sec. S2). Second, the logit encoding is particularly suited to the binomial sampling noise in experimental measurements of relative abundance (Fig. S3). Logit is the canonical link function for binomial random variables, meaning that it normalizes the measurement variance across relative abundances (i.e., it addresses heteroscedasticity), regardless of whether the underlying dynamics are logistic [65–67]. For these reasons, we focus exclusively on the logit-based relative fitness for the remainder of this work.

Relative fitness statistics require a choice of timescale

In practice, the relative abundance of a genotype is only available as a trajectory of discrete time points (see crossmarks in Fig. S1C). To estimate the relative fitness at a given time point, we calculate the finite difference of the encoded abundance between the current time t and the previous time point:

$$\hat{s}^m(t) = \frac{m(x(t)) - m(x(t-\Delta t))}{\Delta t}, \tag{5}$$

where Δt is the time difference between observations. For discretized time steps it is also possible to define a multiplicative fitness that describes the ratio, rather than the difference, of encoded relative abundance between time points (Sec. S4).

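In the case of a single mutant competing against a wild-type strain, the estimated relative fitness of the mutant (using Eq. (5)) under the logit encoding is

$$\hat{s}^{\mathrm{logit}} = \frac{1}{\Delta t}\left[\log\frac{x_{\mathrm{mut}}(t)}{x_{\mathrm{wt}}(t)} - \log\frac{x_{\mathrm{mut}}(t-\Delta t)}{x_{\mathrm{wt}}(t-\Delta t)}\right], \tag{6}$$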

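where x_mut is the relative abundance of the mutant and x_wt = 1 − x_mut is the relative abundance of the wild-type. This form of relative fitness is widely used in empirical measurements of microbial fitness [12, 66]. A special property of the logit encoding is that the ratio of relative abundances equals the ratio of absolute abundances (x_mut/x_wt = N_mut/N_wt, since the total population size cancels), so we can rewrite Eq. (6) in terms of the log fold-changes of the mutant and wild-type strains:

$$\hat{s}^{\mathrm{logit}} = \frac{\mathrm{LFC}_{\mathrm{mut}} - \mathrm{LFC}_{\mathrm{wt}}}{\Delta t}, \tag{7}$$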

where the log fold-change of each strain is LFC = log[N(t)/N(t − Δt)] and N is the biomass of the strain. The LFC is sometimes also referred to as a Malthusian parameter [9, 26, 68].
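As a quick numerical check of the step from Eq. (6) to Eq. (7), the following sketch computes the logit-encoded fitness once from mutant frequencies and once from the log fold-changes of absolute abundances. The biomass values and function names are hypothetical.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def logit_fitness_from_frequencies(x_mut_start, x_mut_end, dt):
    """Eq. (6): change in the logit-encoded mutant frequency per unit time."""
    return (logit(x_mut_end) - logit(x_mut_start)) / dt


def logit_fitness_from_lfc(n_mut_start, n_mut_end, n_wt_start, n_wt_end, dt):
    """Eq. (7): difference of natural-log fold-changes divided by the time interval."""
    lfc_mut = math.log(n_mut_end / n_mut_start)
    lfc_wt = math.log(n_wt_end / n_wt_start)
    return (lfc_mut - lfc_wt) / dt


# Hypothetical biomasses (e.g., OD units) at the start and end of one interval.
n_mut = (0.02, 1.2)
n_wt = (0.03, 0.9)
x_start = n_mut[0] / (n_mut[0] + n_wt[0])
x_end = n_mut[1] / (n_mut[1] + n_wt[1])

s6 = logit_fitness_from_frequencies(x_start, x_end, dt=1.0)
s7 = logit_fitness_from_lfc(*n_mut, *n_wt, dt=1.0)
print(round(s6, 6), round(s7, 6))  # both equal log(2) = 0.693147
```

Both routes give the same value because the total biomass cancels from x_mut/x_wt; this identity holds for the logit encoding but not, for example, for the log encoding.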

A key element of this estimate of relative fitness (Eq. (5)) is the time interval Δt. There are three common ways to choose a time interval Δt in empirical measurements of microbial fitness. The simplest is to use a fixed clock time between measurements (e.g., one day). However, many microbial populations, especially in laboratory experiments [26, 69] but also in some natural environments [70–72], grow in discrete growth cycles dictated by pulses of nutrients (batch culture). These cycles define an intrinsic timescale of population dynamics and also determine the time point of sampling. In this case, it is convenient to quantify relative fitness per cycle

$$s_{\mathrm{cycle}} = \mathrm{LFC}_{\mathrm{mut}} - \mathrm{LFC}_{\mathrm{wt}}, \tag{8}$$

where we have chosen the logit-encoding and Δt = 1 growth cycle in Eq. (5).

Another important timescale for microbial populations is the generation time. In general, the relative fitness per cycle (Eq. (8)) may depend on the number of generations in the growth cycle, and it can be valuable to normalize for this dependence, especially when comparing across environments [26]. However, defining the number of generations for a population with multiple genotypes growing at different rates is ambiguous. In the case of a single mutant competing with a wild-type, it is common to consider only the number of generations experienced by the wild-type strain, which is estimated by the log fold-change LFC_wt = log[N_wt(t)/N_wt(t − Δt)]. It is convenient to express the generations in base e to match the natural logarithm in the logit-encoding, but this could equivalently be done by converting all logarithms to base 2. Thus the relative fitness of a mutant per generation is defined by choosing Δt = LFC_wt in Eq. (5). In the case of the logit-encoding, the relative fitness per-generation is

$$s_{\mathrm{gen}} = \frac{\mathrm{LFC}_{\mathrm{mut}} - \mathrm{LFC}_{\mathrm{wt}}}{\mathrm{LFC}_{\mathrm{wt}}} = \frac{\mathrm{LFC}_{\mathrm{mut}}}{\mathrm{LFC}_{\mathrm{wt}}} - 1, \tag{9}$$

where we have replaced ratios of relative abundances with ratios of absolute abundances in Eq. (6) and rearranged to express it entirely in terms of log fold-changes (Eq. (7)). The relative fitness per generation with the logit encoding is equivalent to the fitness statistic used in the LTEE [26] and other studies [73–77]. Some authors assume a fixed number of generations per growth cycle [12, 15], but since this may not hold, we use the term “per-generation” more strictly for fitness statistics where the wild-type generations are explicitly measured.
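A minimal sketch of the two statistics from Eqs. (8) and (9), computed from hypothetical start and end biomasses of a single growth cycle. The function names are illustrative; returning `None` when the wild-type does not grow is our own convention for the undefined case, not part of the original definitions.

```python
import math


def lfc(n_start, n_end):
    """Log fold-change (natural log) of a strain's biomass over one growth cycle."""
    return math.log(n_end / n_start)


def fitness_per_cycle(lfc_mut, lfc_wt):
    """Eq. (8): logit-encoded relative fitness with a time step of one growth cycle."""
    return lfc_mut - lfc_wt


def fitness_per_generation(lfc_mut, lfc_wt):
    """Eq. (9): logit-encoded relative fitness with the time step set to LFC_wt.

    Returns None if the wild-type shows no net growth (LFC_wt <= 0), because the
    number of wild-type generations is then not well defined.
    """
    if lfc_wt <= 0:
        return None
    return (lfc_mut - lfc_wt) / lfc_wt


# Hypothetical biomasses (OD) at the start and end of one growth cycle.
lfc_wt = lfc(0.025, 0.5)   # wild-type grows 20-fold
lfc_mut = lfc(0.025, 0.6)  # mutant grows 24-fold
print(fitness_per_cycle(lfc_mut, lfc_wt))       # equals log(1.2) up to rounding
print(fitness_per_generation(lfc_mut, lfc_wt))  # equals log(1.2) / log(20) up to rounding
print(fitness_per_generation(lfc_mut, lfc(0.025, 0.02)))  # wild-type declines: None
```

Because Eq. (9) is a ratio of log fold-changes, it gives the same value whether all logarithms are natural or base 2; only Eq. (8) carries the units of the chosen logarithm.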

Relative fitness per-generation ranks mutants differently than relative fitness per-cycle

Even though the relative fitness of a mutant per growth cycle (Eq. (8)) and per generation (Eq. (9)) are both common statistics in microbial fitness measurements, they are not merely equivalent quantities in different units. One major discrepancy between these fitness statistics occurs when the wild-type strain does not grow or experiences net death during the growth cycle, because the number of wild-type generations (denominator of Eq. (9)) is then not well-defined [5, 78]. This occurs for microbes under high drug concentrations, as well as for microbial populations under harsh environmental conditions such as those found in sediments or wastewater [79, 80]. Figure 1C shows four qualitatively different scenarios for a mutant competing with a wild-type strain, parameterized according to their LFCs. A comparison between relative fitness per-generation and per-cycle only makes sense when both the wild-type and the mutant show net growth (scenario 1 in Fig. 1C).

However, even when the LFCs are positive, the relative fitness per-generation can produce different rankings than the relative fitness per-cycle. For example, Fig. 1D shows two mutant genotypes (red and grey) that have opposite rankings under these fitness statistics: the grey mutant has higher relative fitness per-cycle, but the red mutant has higher relative fitness per-generation. The disagreement in ranking requires positive covariation in the LFCs (red bow-tie area in Fig. 1D), such that the mutant with higher LFC also induces a higher LFC in the wild-type (Sec. S5, Fig. S4).
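The misranking can be reproduced with two hypothetical competitions, where the grey mutant has a higher LFC and also raises the wild-type LFC (positive covariation). The LFC values below are invented for illustration and do not correspond to Fig. 1D.

```python
def per_cycle(lfc_mut, lfc_wt):
    """Eq. (8): difference of log fold-changes over one growth cycle."""
    return lfc_mut - lfc_wt


def per_generation(lfc_mut, lfc_wt):
    """Eq. (9): the same difference scaled by the wild-type LFC (assumes LFC_wt > 0)."""
    return (lfc_mut - lfc_wt) / lfc_wt


# Hypothetical (LFC_mut, LFC_wt) pairs from two separate competitions with the wild-type.
red = (3.0, 2.0)   # lower mutant LFC, and a lower wild-type LFC in that competition
grey = (5.2, 4.0)  # higher mutant LFC, which also coincides with a higher wild-type LFC
print("per-cycle:", per_cycle(*red), round(per_cycle(*grey), 3))                # 1.0 1.2
print("per-generation:", per_generation(*red), round(per_generation(*grey), 3))  # 0.5 0.3
```

Grey wins per-cycle (1.2 versus 1.0) but loses per-generation (0.3 versus 0.5), because its larger wild-type LFC inflates the denominator of Eq. (9).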

Fitness statistics disagree over predicted ranking of single gene knockouts

While we have shown that relative fitness per-cycle and per-generation can lead to different mutant rankings in principle, we need to know whether this outcome occurs under realistic scenarios of microbial population dynamics. We thus simulate fitness rankings for gene knockout mutants using empirically measured growth traits, based on a previously published growth curve dataset for the single-gene knockout collection in Saccharomyces cerevisiae [29, 81]. We estimate growth traits from growth curves (Fig. 2A; Methods; Sec. S6) and find large variation in lag time, growth rate, and biomass yield among the single-gene knockouts (grey dots in Fig. 2B,C; also see Fig. S5); although the knockouts affect multiple traits (pleiotropy), the traits are only weakly correlated with each other. To estimate relative fitness per-cycle (Eq. (8)) and per-generation (Eq. (9)), we simulate a pairwise competition for each knockout against the wild-type (Fig. 2D) using a consumer-resource model with a single limiting nutrient (Methods) [82, 83]. While the relative fitness per-cycle and per-generation are highly correlated across mutants overall (Fig. S6), there are major differences in the ranks of individual mutants (Fig. 2E). For example, measuring the relative fitness per-generation ranks one beneficial mutant (grey dot highlighted with blue circle in Fig. 2E) 145 positions higher than quantifying relative fitness per-cycle (where higher rank corresponds to higher fitness). Indeed, when we plot the set of competitions in this fitness ranking on the LFC diagram, we find many pairs of points that have positive covariation in the mutant and wild-type LFC and thus give rise to ranking differences (compare Figs. 2F and 1D), despite a negative covariation overall (compare Figs. 2F and S4B).

Comparison of mutant fitness rankings with different statistics on empirical trait variation.
(A) Overview of the growth curve dataset and the estimated growth traits for the knockout library of *Saccharomyces cerevisiae* (Methods). (B) Covariation between estimated steady-state growth rate g and lag time λ across all mutant strains (grey dots; Pearson correlation coefficient r = −0.17, p = 7 × 10⁻³⁰) as well as wild-type replicates (orange dots; r = −0.16, p = 0.002). The reference wild-type strain for our pairwise co-culture simulations is defined by the median trait values (black cross) of all wild-type replicates. (C) Covariation between measured steady-state growth rate g and biomass yield Y across all mutant strains (grey dots; r = 0.21, p = 8 × 10⁻⁴⁴) as well as wild-type replicates (orange dots; r = −0.06, p = 0.25). (D) Overview of pairwise co-culture simulations. For each mutant strain (orange), we simulate a competition growth cycle against a reference wild-type strain using the estimated traits (panel A) and laboratory parameters for the initial condition (N₀ = 0.05 OD, R₀ = 111 mM glucose, x = 0.5; Methods) and quantify the relative fitness of the mutant with different statistics (Eqs. (8) and (9)). (E) Rank disagreement between relative fitness per-generation and per-cycle. For each fitness statistic, we calculate the mutant ranking (higher rank means higher fitness, and mutants with equal fitness are assigned the lowest rank in the group). The rank difference is defined as the rank in relative fitness per-generation minus the rank in relative fitness per-cycle. (F) Covariation between wild-type and mutant fold-change across all simulated competitions, with mutant strains (grey dots) and wild-type replicates (orange dots). For each wild-type replicate, we simulate a pairwise co-culture competition against the reference wild-type strain. We highlight the mutant with the greatest rank difference (blue halo) in panels E and F, together with its corresponding bow-tie area of misranking (compare Fig. 1D).
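The ranking convention of Fig. 2E can be sketched as follows. We take the rank difference as the per-generation rank minus the per-cycle rank, so a positive value means a mutant ranks higher per-generation (as for the highlighted mutant in Fig. 2E). Ties receive the lowest rank in their group. The helper names and the four (LFC_mut, LFC_wt) pairs are hypothetical.

```python
def rank_min(values):
    """Rank so that higher fitness gives a higher rank (1 = lowest fitness).

    Tied values get the lowest rank in their group: 1 + the number of strictly
    smaller values (the "min" tie convention).
    """
    return [1 + sum(v < value for v in values) for value in values]


def rank_difference(s_gen, s_cycle):
    """Rank under per-generation fitness minus rank under per-cycle fitness."""
    return [g - c for g, c in zip(rank_min(s_gen), rank_min(s_cycle))]


# Hypothetical (LFC_mut, LFC_wt) pairs from four separate pairwise competitions.
lfcs = [(3.0, 2.0), (5.2, 4.0), (2.9, 3.0), (3.1, 3.0)]
s_cycle = [mut - wt for mut, wt in lfcs]
s_gen = [(mut - wt) / wt for mut, wt in lfcs]
print(rank_difference(s_gen, s_cycle))  # [1, -1, 0, 0]
```

The quadratic-time `rank_min` is written for readability; for libraries with thousands of knockouts, `scipy.stats.rankdata(values, method="min")` applies the same ascending, lowest-rank-for-ties convention in O(n log n).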

What features of the population dynamics are responsible for the schism between relative fitness per-cycle and per-generation? We explore this by systematically varying the mutant trait distributions and initial conditions in our simulations (Fig. S7). This demonstrates that the two fitness statistics disagree on the ranking if the mutants have diminished biomass yield and are competed at a high initial frequency (50%) against the wild-type (rows A and D in Fig. S7). Intuitively, these conditions are necessary because they allow mutants to alter the wild-type LFC. Even though the wild-type growth rate is unaffected by mutants, the wild-type LFC will change in the presence of mutants that have lower yield, because they reduce the time to resource depletion (row D in Fig. S7). A change in wild-type LFC also requires the mutant to be present at high relative abundance initially (row A in Fig. S7; see Sec. S7 for detailed conditions). By this mechanism, it is possible to construct a set of growth traits for which relative fitness per-generation and per-cycle deliver completely opposite rankings (Fig. S8).
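The mechanism can be illustrated with a deliberately simplified model, which we assume here for illustration and which is not necessarily identical to the consumer-resource model in Methods: each strain has a lag phase followed by exponential growth, consumes (N − N₀)/Y units of a single limiting resource, and stops growing when the resource is exhausted. Units, parameter values, and function names are hypothetical.

```python
import math


def biomass(n0, g, lag, t):
    """Biomass after a lag phase of length `lag`, then exponential growth at rate g."""
    return n0 * math.exp(g * max(0.0, t - lag))


def saturation_time(strains, r0, t_max=200.0):
    """Time at which the single limiting resource r0 is used up.

    Each strain is a tuple (n0, g, lag, yield_); a strain consumes (N - n0) / yield_
    units of resource. Consumption never decreases with time, so bisection works.
    """
    def consumed(t):
        return sum((biomass(n0, g, lag, t) - n0) / y for n0, g, lag, y in strains)

    lo, hi = 0.0, t_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if consumed(mid) < r0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wild_type_lfc(mutant_yield, x_mut, n0=0.05, r0=1.0, g=0.4, lag=2.0):
    """Wild-type LFC in co-culture with a mutant that differs only in biomass yield."""
    wild_type = (n0 * (1.0 - x_mut), g, lag, 1.0)
    mutant = (n0 * x_mut, g, lag, mutant_yield)
    t_sat = saturation_time([wild_type, mutant], r0)
    return g * max(0.0, t_sat - lag)


neutral = wild_type_lfc(mutant_yield=1.0, x_mut=0.5)
low_yield_high_freq = wild_type_lfc(mutant_yield=0.5, x_mut=0.5)
low_yield_low_freq = wild_type_lfc(mutant_yield=0.5, x_mut=0.01)
print(f"{neutral:.3f} {low_yield_high_freq:.3f} {low_yield_low_freq:.3f}")
```

In this toy setting, a low-yield mutant at 50% initial frequency lowers the wild-type LFC by about 0.38, but at 1% it lowers it by only about 0.01, consistent with the two conditions identified in Fig. S7.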

As a further example, we test how the choice of fitness statistic affects conclusions from the LTEE. We reanalyze competition data from the LTEE [11] using the relative fitness per-generation (equivalent to the original definition used in those studies) and the relative fitness per-cycle (Sec. S8). We see that the two statistics disagree on the ranking of some evolved lineages, both overall (Fig. S9) and at a given time point (Fig. S10). We confirm that the long-term fitness trend in the LTEE is the same for both statistics (Fig. S11), but it is possible to construct hypothetical scenarios of evolution where this would not be the case (Fig. 3A-B). Beyond this qualitative mismatch, the two statistics can also lead to different conclusions about the presence of epistasis between multiple mutations (Fig. 3C-F, Fig. S12). For example, measuring relative fitness per-generation will detect negative magnitude epistasis between a mutation that only affects lag time and a mutation that only affects biomass yield (Fig. 3C), but measuring relative fitness per-cycle shows no epistasis (Fig. 3D). This is because mutations that increase biomass yield increase the wild-type LFC and, in this case, decrease the relative fitness per-generation (compare Fig. S12A and C). By the same mechanism, relative fitness per-cycle detects epistasis between mutations that affect growth rate and biomass yield (Fig. 3F), but no such epistasis is present in relative fitness per-generation (Fig. 3E). This again demonstrates that it is possible to arrive at different biological conclusions from the same experimental data depending on the choice of fitness statistic.
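The lag-yield case can be mimicked with additive epistasis, defined here as the double-mutant fitness minus the sum of single-mutant fitnesses. The LFC values are hypothetical, chosen only so that the yield mutation raises the wild-type LFC in every competition where it is present; they are not the simulated values behind Fig. 3.

```python
def per_cycle(lfc_mut, lfc_wt):
    """Eq. (8)."""
    return lfc_mut - lfc_wt


def per_generation(lfc_mut, lfc_wt):
    """Eq. (9); assumes LFC_wt > 0."""
    return (lfc_mut - lfc_wt) / lfc_wt


def epistasis(stat, single_a, single_b, double):
    """Additive epistasis: fitness of the double mutant minus the sum of single-mutant fitnesses.

    Each argument is a (LFC_mut, LFC_wt) pair from a separate competition against the wild-type.
    """
    return stat(*double) - stat(*single_a) - stat(*single_b)


# Hypothetical LFCs: the yield mutation raises the wild-type LFC from 3.0 to 3.25.
lag_mutant = (3.5, 3.0)
yield_mutant = (3.5, 3.25)
double_mutant = (4.0, 3.25)

print(round(epistasis(per_cycle, lag_mutant, yield_mutant, double_mutant), 4))       # 0.0
print(round(epistasis(per_generation, lag_mutant, yield_mutant, double_mutant), 4))  # -0.0128
```

The per-cycle effects add up exactly, while dividing by competition-specific wild-type LFCs turns the same data into negative epistasis per-generation.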

Potential consequences of the choice of fitness statistic for the interpretation of evolutionary data.
(A) Hypothetical scenario for trait evolution in a long-term evolution experiment. An evolving population decreases in lag time (orange line), increases in growth rate up to a maximum (blue line), and keeps decreasing in biomass yield (green line). This trend is similar to initial observations from the LTEE [94]. (B) Corresponding long-term trend in relative fitness based on the trait evolution in panel A. We estimate relative fitness per-cycle (grey line) and per-generation (red line) every 250 generations. Dotted grey lines mark the end of trait evolution in lag time and growth rate in panel A. For the actual fitness trend in the LTEE, see Fig. S11. (C) Epistasis plot for lag time and yield using relative fitness per-generation. Colored dots show the fitness for a single mutant with shorter lag time (blue dot), a single mutant with higher biomass yield (red dot), and a double mutant with both mutations (purple dot). (D) Epistasis plot for lag time and yield using relative fitness per-cycle. (E) Epistasis plot for growth rate and yield using relative fitness per-generation. Colored dots show the fitness for a single mutant with higher growth rate (blue dot), a single mutant with higher biomass yield (red dot), and a double mutant with both mutations (purple dot). (F) Epistasis plot for growth rate and yield using relative fitness per-cycle. All epistasis plots are based on 50:50 competition growth cycles with the wild-type (compare panels C,D with Fig. S12A-C, and panels E,F with Fig. S12D-F).

Higher-order effects distort relative fitness measured in bulk competitions

So far we have focused on measuring relative fitness in pairwise competitions, but this is usually not practical for large numbers of mutants. Measuring traits of genotypes in monoculture to predict their pairwise relative fitness (Sec. S1) is convenient, but when we test this strategy using our simulation framework, we find that most traits perform poorly (Sec. S9, Fig. S13); the AUC has some success, but this success depends on the choice of timescale (Fig. S14). Another approach to measuring relative fitness is to use bulk competition experiments, where many mutant genotypes compete simultaneously in a single culture and each genotype is tracked through DNA barcode sequencing [39, 44]. However, this raises the question of how well the relative fitness of a mutant in bulk competition corresponds to its relative fitness in pairwise competition with the wild-type, since growth in bulk might be influenced by the presence of other mutant genotypes, a phenomenon known as a higher-order interaction [83–85].

An important choice in bulk competition experiments is the relative abundance of the library of all mutant genotypes compared to the wild-type. The invasion of a spontaneous mutation into an existing population is best captured by pairwise competitions with low initial relative abundance of the mutant (Case I, Fig. 4A), and one way to recreate this scenario in a bulk competition is to use a low relative abundance for the mutant library overall (Case II, Fig. 4A) [40, 48–52]. A practical problem with Case II is that individual genotypes in the library will have low absolute abundances, which leads to stochasticity in the population dynamics and the sequencing preparation [13, 86, 87]. Therefore, it is common to compete the mutant library by itself (Case III, Fig. 4A) [13, 15, 17–19, 88].

The choice of library abundance and reference group in bulk competition experiments.
(A) Overview of a pairwise competition experiment (upper row) and multiple scenarios for bulk competition experiments (middle and bottom rows) with different initial fractions of the mutant library (colored ovals) in the inoculum (open box). For each scenario, the inset on the right shows a schematic growth cycle (log absolute abundance). (B) Schematic relative abundance trajectories for a mutant compared to two alternative subpopulations. We distinguish between the total relative abundance x_i with respect to the population as a whole (height of green band in the top box) and the pairwise relative abundance x_iwt with respect to the wild-type (height of green band in the bottom box; Eq. (18)). We indicate the sign of total relative fitness (Eq. (22)) and pairwise relative fitness (Eq. (23)) on the right. (C) The absolute error between bulk and pairwise competition experiments. The total relative fitness (grey dots; Eq. (22)) and the pairwise relative fitness (red dots; Eq. (23)) for empirical knockouts (Fig. 2A) in a bulk competition growth cycle with low mutant library abundance (panel A, case II; Methods). The absolute error is defined as the bulk fitness statistic minus the relative fitness in pairwise competition (Eq. (S81)). The inset shows the absolute error for pairwise relative fitness (Eq. (23)) in a bulk competition growth cycle with high mutant library abundance (blue dots; case III). The x-axis and the red dots in the inset are identical to the main plot. (D) The relative error in bulk competition experiments as a function of mutant library abundance in the inoculum. Each line corresponds to a knockout in our dataset and represents the relative error between the pairwise relative fitness in bulk competition and the relative fitness in pairwise competition (Eq. (S91)). Black lines show the recommended mutant library abundance for our dataset based on Eq. (10) (x_lib ≈ 24.6%) and based on Eq. (S109) (x_lib ≈ 0.02%, Sec. S15).

A second important choice is whether to quantify fitness of a mutant relative to the whole population (“total relative fitness”; Fig. 4B, top panel) [18, 19, 88] or relative to another specific genotype, like the wild-type (“pairwise relative fitness”; Fig. 4B, bottom panel; Methods) [40, 48–52]. In the case of a mutant library growing by itself (Case III, Fig. 4A), it is still possible to estimate a pairwise fitness by using known neutral mutants as the reference population [15, 17] and calculating fitness relative to this group (Secs. S10 and S11).

To test the consequences of these choices, we simulate a bulk competition (batch culture) using the trait data for yeast single-gene knockouts and compare the estimates of total and pairwise relative fitness to fitness measured in pairwise competition, which we use as the ground truth (Methods). Figure 4C shows that the total relative fitness has systematically higher error than the pairwise relative fitness, since the total fitness measures the genotype relative not just to the wild-type but to all other mutants as well (Sec. S12). For example, a mutant that grows identically to the wild-type (neutral phenotype) has zero increase in the pairwise relative abundance, but may have a net increase in the total relative abundance due to the poor growth of other mutants (compare top and bottom panel in Fig. 4B). This affects the total relative fitness of all mutants in a uniform way (Fig. 4C, Sec. S12), such that total and pairwise relative fitness completely agree in the ranking of genotypes (Fig. S15A).
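A minimal sketch of the neutral-mutant example above, assuming that both bulk statistics use the logit encoding with Δt = 1 cycle; the exact forms of Eqs. (22) and (23) are not reproduced in this section, so treat these functions as our reading rather than the authors' definitions. The reference group can be the wild-type or, as in Case III, a pool of known neutral barcodes. Abundances and names are hypothetical.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def total_relative_fitness(n_start, n_end, i):
    """Change over one cycle in the logit of strain i's share of the whole population."""
    x_start = n_start[i] / sum(n_start)
    x_end = n_end[i] / sum(n_end)
    return logit(x_end) - logit(x_start)


def pairwise_relative_fitness(n_start, n_end, i, ref):
    """Change over one cycle in the logit of strain i's frequency relative to a reference group.

    `ref` lists the indices pooled as the reference (the wild-type, or known neutral
    barcodes). Since logit(N_i / (N_i + N_ref)) = log(N_i / N_ref), this equals
    LFC_i - LFC_ref.
    """
    ref_start = sum(n_start[j] for j in ref)
    ref_end = sum(n_end[j] for j in ref)
    return math.log(n_end[i] / ref_end) - math.log(n_start[i] / ref_start)


# Hypothetical bulk culture: wild-type, a neutral mutant, and a poorly growing mutant.
WT, NEUTRAL, POOR = 0, 1, 2
start = [0.5, 0.25, 0.25]
end = [10.0, 5.0, 1.0]
print(total_relative_fitness(start, end, NEUTRAL))           # positive
print(pairwise_relative_fitness(start, end, NEUTRAL, [WT]))  # zero
```

The neutral mutant gains in total relative fitness only because the poorly growing mutant falls behind, whereas its pairwise relative fitness against the wild-type stays at zero.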

While the pairwise relative fitness is the preferred method for estimating fitness in bulk competitions, the presence of higher-order interactions means it still deviates from the fitness in pairwise competitions. For the specific population dynamics in our simulation (Methods; Sec. S13), we decompose the higher-order interaction into two terms, a fitness-independent term and a fitness-dependent term that acts as an amplifier for fitness values (Fig. S16; Sec. S14). Intuitively, the population in bulk competition consumes resources more slowly and gives more time for growth rate differences to accrue. Levy et al. [13] tested for such higher-order effects by comparing fitness estimates for ca. 30 beneficial lineages in bulk with estimates from pairwise competition; they found that bulk fitness was typically higher, with some variation from lineage to lineage, consistent with our results here.

Can we reduce these deviations by changing the relative abundance of the mutant library (Fig. 4A)? When we repeat our simulation using a high mutant library abundance (Case III, Fig. 4A), we find that the absolute error for pairwise relative fitness increases proportionally (inset in Fig. 4C). Clearly, minimizing the abundance of the mutant library minimizes the strength of higher-order interactions (which also reduces ranking disagreement; Fig. S15B,C).

Since low mutant abundances create practical problems for experiments, it is valuable to identify a maximum abundance for the mutant library that keeps the error from bulk fitness estimates below a desired threshold. It is convenient to express this threshold in relative rather than absolute terms (compare Fig. S17 to Fig. 4C). Figure 4D shows the relative error from higher-order interactions across a range of mutant library abundances. Using the specific population dynamics of our model, we derive the following rule: assuming that the mutants in the library vary only in growth rate, the mutant library abundance should be below

where ϵ is the desired threshold on the relative error, g_wt is the wild-type growth rate, and g_lib is the growth rate of the library as a whole (Sec. S15). In the case of our single-gene knockout library, Eq. (10) predicts a maximum library abundance of 24.6% based on the wild-type and library growth rate (g_wt = 0.406, g_lib = 0.389) for a relative error ϵ = 1%. Figure 4D shows that this maximum library abundance keeps the relative error below 1% for high-fitness mutants (bright yellow), because they are dominated by growth rate effects, but fails for mutants close to neutrality because they have a trade-off between growth rate and lag time, which the estimate in Eq. (10) neglects. It is possible to derive a more precise bound that keeps the relative error below 1% for all mutant genotypes (here: x_lib = 0.02%, see vertical line in Fig. 4D), but this requires prior knowledge of the trait covariation in the mutant library (Sec. S15).

Discussion

Best practices for quantifying mutant fitness in high-throughput experiments

In this work, we have introduced a conceptual framework (summarized in Fig. 5) to derive common statistics of relative fitness from three essential choices:

The choice of the state variable for relative abundance (encoding m; Eq. (2) and Fig. 1B).
The choice of the time scale for the change in relative abundance (Δt; Eq. (5)).
In bulk competition experiments, the choice of the subpopulation that acts as the reference for the relative abundance of a chosen mutant genotype (Fig. 4B).

Figure 5. Flow diagram for the quantification of relative fitness from time-series data.
Given the relative abundance of the mutant genotype at two consecutive timepoints (e.g. the start and end of a growth cycle), the user has to choose an encoding (Fig. 1A-B) and the time scale for evaluating the change in relative abundance (Eq. (5)). In bulk competition experiments, multiple definitions of relative abundance are possible depending on the choice of the reference subpopulation (compare Fig. 4B; Methods). Each combination of these choices (dashed black lines) leads to a different fitness statistic, and we summarize our recommendations (thick black line) in the discussion.

The combination of these choices leads to a range of fitness statistics, including those commonly used in population genetics, experimental evolution, and transposon-insertion screens (Fig. 5). However, as we compare these statistics in simulated competition experiments, we find that the choice of time scale can lead to different mutant rankings (Figs. 2D and 2E) and the choice of the reference subpopulation can lead to a systematic offset in fitness values (Fig. 4C). Based on these insights, we recommend the following choices for quantifying relative fitness of a mutant:

Use the logit encoding of relative abundance, because under the null model of logistic dynamics this linearizes the trajectory of relative abundance and regularizes measurement noise (Fig. S3).
Use a fixed extrinsic time scale (e.g., a single growth cycle), rather than an intrinsic time scale (e.g., the number of generations), which introduces an additional factor of variation between competitions.
In bulk competition experiments, use the pairwise fitness of the mutant relative to the wild-type reference subpopulation (either by including a barcoded wild-type or by grouping neutral mutants into a virtual wild-type) because this more closely matches the relative fitness in pairwise competition.
Since relative fitness measurements in bulk competitions inevitably carry an error from higher-order interactions, we recommend minimizing the relative abundance of the mutant library as far as practically feasible (Fig. 4D). For a given error tolerance, Eq. (10) gives an estimate for the maximum library abundance.
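The recommendations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes read or cell counts per genotype at the start and end of one growth cycle, and the function name `pairwise_logit_fitness` and the genotype labels are hypothetical. Because the logit of the pairwise relative abundance N_i/(N_i + N_ref) equals log(N_i/N_ref), the per-cycle pairwise fitness only needs count ratios, so the total sequencing depth cancels. Zero counts would need separate handling (for example pseudocounts), which this sketch omits.

```python
import math


def pairwise_logit_fitness(counts_start, counts_end, reference):
    """Per-cycle pairwise logit fitness of each genotype vs. a reference group.

    counts_start, counts_end: dicts mapping genotype -> positive count at the
        start and end of a single growth cycle (Delta t = 1 cycle).
    reference: genotype names pooled into the reference subpopulation, e.g. a
        barcoded wild-type, or neutral lineages grouped as a virtual wild-type.
    """
    ref_start = sum(counts_start[g] for g in reference)
    ref_end = sum(counts_end[g] for g in reference)
    fitness = {}
    for g in counts_start:
        if g in reference:
            continue
        # logit(n_g / (n_g + n_ref)) == log(n_g / n_ref)
        logit_start = math.log(counts_start[g] / ref_start)
        logit_end = math.log(counts_end[g] / ref_end)
        fitness[g] = logit_end - logit_start
    return fitness
```

For example, with a reference pooled from a barcoded wild-type and one lineage assumed to be neutral, a mutant whose count ratio to the reference rises 1.5-fold over the cycle receives a pairwise fitness of log 1.5 ≈ 0.405 per cycle.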

If measuring direct competition is not feasible, we empirically find that the area under the growth curve (AUC) is the best approximation of the true fitness ranking (Fig. S13), but one must carefully choose the time scale (Fig. S14). Our recommendations agree with previous criticisms of relative fitness per-generation [5, 8, 9, 78] and total relative fitness [60, 89] but differ from the standard practice in high-throughput genetic screens [18, 19, 88] and many evolution experiments [33, 75], including the LTEE [11, 26]. The choice of the reference group is a well-recognized issue in high-throughput evolutionary studies, in which one first estimates a total relative fitness and then subtracts a correction based on the fitness of the wild-type [12, 50, 52] or a mean population fitness [13, 41, 42, 51, 89]. In contrast, we recommend choosing the reference group at the level of relative abundance (Fig. 4B; Eq. (18)).

Consequences of fitness quantification choices for microbial ecology and evolution

The quantitative differences between relative fitness statistics affect our ability to make evolutionary predictions. In particular, the fixation probability of a spontaneous mutation depends on the magnitude of relative fitness compared to other timescales like mutation and drift [57, 58], and in microbial populations with clonal interference, it depends on the entire distribution of fitness effects [59, 90]. For example, multiplying the mutation rate by an incompatible measure of fitness would lead to predicting the wrong speed of adaptation. In the context of multiple mutations in the same cell, different fitness statistics can lead to different conclusions about the presence of magnitude epistasis (Fig. 3C-F), and in the context of gene essentiality tests, they may affect the outcome of a significance test (e.g., using log or logit with the test statistic in Ref. [18]).

Using different fitness statistics can even lead to differences in mutant rankings (see Fig. S8 for an extreme case). Since measurements often serve as a first screen to narrow down the investigation to the top set of genes [17, 40], a difference in mutant ranking means the investigation might miss out on relevant genes because of the choice of fitness statistic.

Other sources of discrepancy in quantifying mutant fitness

Besides the conceptual choices of fitness quantification discussed in this article, experimental limitations can create discrepancies between replicate fitness measurements of the same mutant [36, 50, 91]. For example, relative abundance measurements entail sampling uncertainties (when we sample liquid for colony counting or DNA sequencing) [44, 60, 61], copy number variation [18, 44], as well as PCR jackpots and sequencing read errors [13, 86, 87]. Furthermore, there will inevitably be some variation in initial conditions between replicates and fluctuations during the fitness assay [36, 87, 91]. On the one hand, the conceptual choices for quantifying fitness can mitigate experimental uncertainty. For example, the logit encoding normalizes the sampling errors over the time series of relative abundance (Fig. S3B). On the other hand, there is sometimes a trade-off between conceptual choices and experimental precision, such as the choice of the initial mutant library abundance, which needs to be low to minimize higher-order interactions and yet high enough to minimize sampling uncertainty [44]. Future work will need to elucidate the interplay between conceptual and experimental sources of discrepancy in fitness measurements.

How general is the disagreement between fitness statistics?

The discrepancy between relative fitness per-cycle and per-generation (Figs. 1D and 2E), as well as the offset in the total relative fitness compared to pairwise fitness (Fig. 4C), will hold under any form of population dynamics. However, the details of the population dynamics do matter for the higher-order interactions that cause the discrepancy between relative fitness from bulk and pairwise competitions (Fig. 4D). The empirical evidence for the presence and mechanisms of higher-order interactions in microbial populations is limited [83–85]. Nevertheless, the fact that we see higher-order interactions even under competition for a single resource suggests that these effects will also be present under more complex dynamics. Indeed, a comparison of mutant fitness measurements found systematic overestimates for fitness from bulk compared to pairwise competitions [13]. Since this could not be explained by the offset from using total relative fitness, it is indicative of higher-order interactions. Future work should provide more comparisons between relative fitness from bulk and pairwise competitions to better understand higher-order interactions.

Methods

Inferring growth traits for the single-gene knockout collection in Yeast

We use a previously published dataset [29] for the single-gene knockout collection in Saccharomyces cerevisiae [81], where the authors tracked growth of each genotype for 47 hours in monoculture, using microwell plates with defined growth medium. We downloaded this growth curve dataset from the PROPHECY database [92], choosing specifically the dataset measured in Synthetic Defined medium. We last accessed the PROPHECY website (http://prophecy.lundberg.gu.se/) on March 30, 2020. As of May 30, 2024, the website no longer seems accessible, so we have included a reformatted version of the data with our code repository (https://github.com/justuswfink/24FitnessQuantification). From the raw time series of optical density, the authors subtracted a background correction based on blank wells, applied another correction for nonlinearity at high optical densities, and smoothed the growth curve to remove electrical noise [29].

We start with the curves in the published dataset (9951 curves) and apply further trimming and smoothing steps (Sec. S6). From this data we calculate the time series of instantaneous per-capita growth rate N⁻¹ dN/dt (where N is the optical density) and identify time windows where the rate is approximately constant, which we interpret as distinct growth phases (Sec. S6). We only include curves that have a single phase of constant exponential growth followed by a stationary phase of approximately zero growth (9424 curves).

We quantify the biomass yield, maximum growth rate, and the lag time from each remaining curve as follows. First, we estimate the initial abundance N_initial (average optical density over the first three time points) and the final abundance N_final (average optical density over the stationary phase) from the growth curve. Then we calculate the biomass yield as Y = (N_final − N_initial)/R(0), where R(0) = 111 mM (20 g/L) is the initial concentration of glucose (assuming glucose is the single limiting resource). To estimate the maximum growth rate, we average the instantaneous growth rate over the exponential phase. Finally, we estimate the lag time as the time at which the tangent to the log growth curve during the exponential phase (with slope equal to the maximum growth rate) intersects the horizontal line at the log initial abundance (log N₀) [93]. We excluded curves with negative initial OD, curves with negative inferred lag times, and curves with a low quality of fit between the measured time series and a simulated curve based on the inferred trait values (R² < 0.95; see below for the model). Our final database includes trait estimates for 9195 curves, which represents 92.4% of the original dataset [29].
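The three trait estimates can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the exponential and stationary phases are assumed to be already identified as inclusive index windows (the phase detection of Sec. S6 and the R² filter are not reproduced), the instantaneous rate is taken from forward differences of log OD, and averaging the lag estimate over all points of the exponential window is our own choice. The function name `infer_growth_traits` is hypothetical.

```python
import math


def infer_growth_traits(times, od, exp_window, stat_window, r0=111.0):
    """Estimate biomass yield, maximum growth rate, and lag time from one curve.

    times, od: sample times (h) and background-corrected optical densities.
    exp_window, stat_window: (first, last) inclusive index pairs of the
        exponential and stationary phases, assumed to be known already.
    r0: initial concentration of the limiting resource (111 mM glucose).
    """
    n_initial = sum(od[:3]) / 3.0  # mean of the first three time points
    s0, s1 = stat_window
    n_final = sum(od[s0:s1 + 1]) / (s1 - s0 + 1)  # mean over stationary phase
    biomass_yield = (n_final - n_initial) / r0

    # Instantaneous per-capita rate d(log N)/dt by forward differences,
    # averaged over the exponential phase.
    e0, e1 = exp_window
    rates = [
        (math.log(od[k + 1]) - math.log(od[k])) / (times[k + 1] - times[k])
        for k in range(e0, e1)
    ]
    growth_rate = sum(rates) / len(rates)

    # Lag time: where a line of slope growth_rate through an exponential-phase
    # point meets the horizontal line log(N_initial); averaged over the window.
    log_n0 = math.log(n_initial)
    lags = [
        times[k] - (math.log(od[k]) - log_n0) / growth_rate
        for k in range(e0, e1 + 1)
    ]
    lag_time = sum(lags) / len(lags)
    return biomass_yield, growth_rate, lag_time
```

On a synthetic lag-exponential-stationary curve (N₀ = 0.05, g = 0.4 h⁻¹, λ = 3 h, saturation at 13 h, sampled every 0.5 h), passing the exponential window (7, 25) and the stationary window (27, 94) recovers g = 0.4 h⁻¹ and λ = 3 h up to rounding error.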

Each single-gene deletion strain was measured in two technical replicate growth curves, using a second plate with identical layout in the same plate reader. Some genotypes have only one estimate of the growth traits because the other replicate did not pass our filters (273 genotypes), but for most genotypes we retain two replicate estimates in our final dataset (4163 genotypes); a few genotypes even have three (2 genotypes) or four replicate estimates (54 genotypes) because these genotypes were included multiple times by the original authors [29]. From these replicates we finally calculate the average yield, growth rate, and lag time for each genotype. The dataset also contained many replicate growth curves of the wild-type strain, 374 of which passed our filters. Since wild-type traits inferred from these replicates had large variation (potentially due to measuring the wild-type across many different plates and days), we define the wild-type trait as the median, not the mean. Although the wild-type measurements included in this dataset have large trait variation (orange dots in Fig. 2B,C), the fact that the variation across knockouts is significantly greater for growth rate (Levene’s test p = 1.3 × 10⁻⁹) and lag time (Levene’s test p = 0.025), and that the replicate measurements of gene knockouts are correlated (Fig. S5), suggests that these trait measurements do capture true genetic variation. In contrast, the variation in biomass yield across knockouts is not significantly different from the variation across wild-type replicates (Levene’s test p = 0.44), suggesting that the variation in biomass yield is driven by non-genetic factors like the abundance or batch of resources used for each growth measurement.

Simulating population dynamics in competition experiments

Throughout this work, we model the competition between a set of genotypes during a single batch culture growth cycle using the following model of population dynamics [82, 83]:
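$$\frac{dN_j}{dt} = g_j(t, R)\,N_j, \qquad N_j(0) = x_j(0)\,N_0, \tag{11a}$$
$$\frac{dR}{dt} = -\sum_j \frac{1}{Y_j}\,\frac{dN_j}{dt}, \qquad R(0) = R_0, \tag{11b}$$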
where N_j is the absolute abundance, x_j(0) the initial relative abundance, g_j(t, R) the growth rate, and Y_j the biomass yield of genotype j, where the genotypes include a wild-type as well as one or more mutants. The initial absolute abundance of all genotypes together is N₀, and the concentration of the single limiting resource is R with initial value R₀. Note that Eq. (11b) assumes that cells consume resources only for biomass growth and not for maintenance of existing biomass.

To capture the sigmoidal shape of typical growth curves with a lag phase, exponential growth phase, and saturation phase (e.g., Fig. 2A), we model the dependence of growth rate on time and resource concentration as
$$g_j(t, R) = g_j\,\Theta(t - \lambda_j)\,\Theta(R), \tag{12}$$
where Θ is the step function:
$$\Theta(z) = \begin{cases} 1, & z > 0, \\ 0, & z \le 0. \end{cases} \tag{13}$$
In this model, a genotype has an initial lag phase of time λ_j where no growth occurs, followed by a phase of constant exponential growth at rate g_j, and ending when the resource concentration R reaches zero. The time to resource depletion, which we also call the saturation time, is defined by the implicit equation R(t_sat) = 0. The simple form of the growth response (Eq. (12)) means that the absolute abundance of genotype j at saturation is
$$N_j(t_{\mathrm{sat}}) = x_j(0)\,N_0\,e^{g_j (t_{\mathrm{sat}} - \lambda_j)}, \tag{14}$$
and its log fold-change is
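$$\mathrm{LFC}_j = \log\frac{N_j(t_{\mathrm{sat}})}{N_j(0)} = g_j\,(t_{\mathrm{sat}} - \lambda_j). \tag{15}$$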
To calculate relative fitness per-generation (Eq. (9)) or per-cycle (Eq. (8) with Δt = 1), we numerically determine the saturation time t_sat by integrating the biomass and resource dynamics (Eq. (11)) as described in previous work [82, 83]. We note that it is also possible to derive an approximate expression for the saturation time t_sat, which we use for our theoretical calculations (Sec. S7).
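To make the saturation time concrete, here is a minimal sketch under the model assumptions above. With the growth response of Eq. (12), each genotype grows as N_j(t) = x_j(0) N₀ exp[g_j max(0, t − λ_j)] until depletion, and integrating Eq. (11b) gives the remaining resource R(t) = R₀ − Σ_j [N_j(t) − N_j(0)]/Y_j. Because R(t) decreases monotonically, t_sat can be found by bisection. This is a swapped-in technique (the text above integrates the ODEs numerically), and the function names, the search bound `t_max`, and all parameter values are assumptions; `t_max` also keeps `math.exp` from overflowing for large growth rates.

```python
import math


def saturation_time(x0, n0, g, lag, yields, r0, t_max=100.0, tol=1e-12):
    """Bisection for t_sat, the time at which the single resource is exhausted."""

    def resource_left(t):
        consumed = 0.0
        for xj, gj, lj, yj in zip(x0, g, lag, yields):
            nj0 = xj * n0
            nj = nj0 * math.exp(gj * max(0.0, t - lj))  # lag, then exponential
            consumed += (nj - nj0) / yj  # resource used only for new biomass
        return r0 - consumed

    lo, hi = 0.0, t_max
    if resource_left(hi) > 0:
        raise ValueError("resource not depleted before t_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if resource_left(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_fold_changes(t_sat, g, lag):
    """LFC_j = g_j (t_sat - lambda_j), or 0 if the lag outlasts the cycle."""
    return [gj * max(0.0, t_sat - lj) for gj, lj in zip(g, lag)]
```

For instance, a hypothetical wild-type and mutant at equal initial frequency, `saturation_time([0.5, 0.5], 0.01, [0.40, 0.45], [2.0, 2.5], [1.0, 1.0], 1.0)`, returns the depletion time, and `log_fold_changes` then gives LFC_wt and LFC_mut, from which the per-cycle and per-generation statistics follow.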

Testing total and pairwise relative fitness in bulk competition experiments

We explore total and pairwise relative fitness as two alternative measures of fitness in bulk competition experiments, defined as follows. Consider a population with multiple genotypes such that each genotype i has absolute abundance N_i(t) at time t. The total relative abundance of the genotype is
$$x_i(t) = \frac{N_i(t)}{\sum_k N_k(t)}. \tag{16}$$
The “total” here refers to the fact that this is the abundance of genotype i relative to all other genotypes in the population. Similarly, we can think of the definition of relative fitness discussed in the main text (Eq. 3) as the total relative fitness of this genotype under an encoding m:
$$s_i^m = \frac{d\,m(x_i)}{dt}. \tag{17}$$
However, sometimes we want to track the dynamics of a genotype relative to another specific genotype; for example, to follow the fate of a mutant against the wild-type in a bulk competition experiment. For a pair of genotypes i ≠ j, we thus define the pairwise relative abundance
$$x_{ij}(t) = \frac{N_i(t)}{N_i(t) + N_j(t)}, \tag{18}$$
which is the relative abundance of genotype i in a subpopulation of genotype i and j. To predict the change in the pairwise relative abundance, we can use the slope
$$s_{ij}^m = \frac{d\,m(x_{ij})}{dt}, \tag{19}$$
which defines the pairwise relative fitness of genotype i with respect to genotype j. In the special case of a population with only two strains, the pairwise relative fitness and total relative fitness are identical but they may differ with more than two genotypes (Sec. S12).

It is also possible to define the total and pairwise relative fitness statistics as finite differences over a time interval, rather than as instantaneous derivatives. For example, we can define them over a growth cycle starting at t = 0 and ending at t = t_sat:

For these fitness statistics in the bulk competitions, we use the logit encoding

where we have rewritten it in the form in which it is presented in bulk competition experiments [15, 51, 52]. The logit encoding has mathematical advantages for coarse-graining the relative fitness of genotype groups (Sec. S10), but the log encoding is another common choice in the literature [17, 41, 50]. In practice, the relative abundances of the individual mutants in our simulated bulk competition experiments are low enough that these two encodings are approximately equivalent (logit x ≈ log x for x ≪ 1).
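As a small numerical illustration of the difference between total and pairwise relative fitness, the sketch below computes both as finite differences of the logit encoding over one growth cycle. It assumes hypothetical abundances for three genotypes (a wild-type, a focal mutant, and one other mutant), each growing by a fixed log fold-change; the function names are illustrative.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def total_logit_change(n_start, n_end, i):
    """Change in logit of the total relative abundance N_i / sum_k N_k."""
    x0 = n_start[i] / sum(n_start)
    x1 = n_end[i] / sum(n_end)
    return logit(x1) - logit(x0)


def pairwise_logit_change(n_start, n_end, i, j):
    """Change in logit of the pairwise relative abundance N_i / (N_i + N_j)."""
    x0 = n_start[i] / (n_start[i] + n_start[j])
    x1 = n_end[i] / (n_end[i] + n_end[j])
    return logit(x1) - logit(x0)
```

With start abundances [1.0, 0.1, 0.1] and log fold-changes [5.0, 5.2, 4.0], the pairwise fitness of the focal mutant (index 1) against the wild-type (index 0) is exactly 5.2 − 5.0 = 0.2, whereas its total relative fitness is about 0.26 because the slower third genotype also enters the denominator. Dropping the third genotype makes the two statistics coincide, as expected for two strains.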

Unlike the comparison between per-cycle and per-generation relative fitness, where we focused on rank differences (Fig. 2E), here we can evaluate absolute differences in fitness estimates because the total and pairwise fitness in bulk competition are measured in the same units as the relative fitness in pairwise competition. For all mutant genotypes, we calculate the absolute error between these bulk fitness estimates (Eq. (22),(23)) and the relative fitness in pairwise competitions (Eq. (6)), which we take as the ground truth (Fig. 4C). Note that the pairwise competition could depend on the initial relative abundance of the mutant; we have chosen a very low relative abundance (10⁻⁶) that mimics a mutant arising de novo and for which this dependence is very weak.

Acknowledgements

JWF wishes to thank Luis-Miguel Chevin, Olivier Tenaillon, Henrique Teotónio and the participants of the 2021 ENS autumn course on Experimental Evolution for valuable discussions early on. Both authors are grateful to Benjamin Raach and Gatwa Tshinsele-Van Bellingen for a critical reading of the manuscript. JWF and MM were supported by an Ambizione grant from the Swiss National Science Foundation (PZ00P3 180147).

Supplementary Information

S1. Different types of fitness under example models of population dynamics

We distinguish between three related but distinct notions of fitness [1], all based on predicting dynamics of a population [2–5]. The first type of fitness is absolute fitness, which is a property of a single genotype by itself and serves to predict the change in the genotype’s absolute abundance N(t) over a future time window Δt (Fig. S1A). This is important for questions about extinction and evolutionary rescue [6]. The second type of fitness is relative fitness, which is a property of two genotypes as it describes how the relative abundance x(t) of one genotype changes compared to the other over a time Δt (Fig. S1B). This is important to determine the fixation probability of new mutations [7, 8]. In general these dynamics are stochastic [7, 9, 10], but throughout this paper we focus on their average behavior across replicate cultures (as sketched in Fig. S1C).

A practical challenge of working with relative fitnesses is that they must be measured between all pairs of genotypes in co-culture competitions. Therefore it is common to infer relative fitness of two genotypes based on some individual properties of the genotypes [11–14]. We denote this third notion of fitness as the fitness potential; it is a property of an individual genotype, but unlike the absolute fitness, it has no meaning by itself; it is the ratio or difference of fitness potentials that is used to derive relative fitness between two genotypes [15, 16]. The collection of fitness potential values across a large set of genotypes forms a fitness landscape [17, 18]. We note that fitness as defined here gives information about short-time dynamics but not necessarily the long-term outcome (compare [19]). For example, this excludes the ratio of growth rates [14, 20] or the resource concentration R* in chemostat equilibrium [21], since these quantities cannot tell you how fast the absolute or relative abundance is changing.

In this section we explicitly calculate relative fitness of a mutant under a few example models of population dynamics, using the different encodings of relative abundance as described in the main text (Fig. 1A). Consider a competition coculture between a wild-type genotype with absolute abundance N_wt(t) and a mutant genotype with absolute abundance N_mut(t). We can describe their dynamics according to the ordinary differential equations (ODEs)
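$$\frac{dN_{\mathrm{wt}}}{dt} = g_{\mathrm{wt}}\,N_{\mathrm{wt}}, \qquad \frac{dN_{\mathrm{mut}}}{dt} = g_{\mathrm{mut}}\,N_{\mathrm{mut}}. \tag{S1}$$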
Note that the per-capita growth rates g_wt and g_mut of each genotype can depend on both genotypes to reflect competition or other interactions. The relative abundance of the mutant genotype at time t is
$$x(t) = \frac{N_{\mathrm{mut}}(t)}{N_{\mathrm{wt}}(t) + N_{\mathrm{mut}}(t)}. \tag{S2}$$
The dynamics of the mutant relative abundance are therefore described by
$$\frac{dx}{dt} = (g_{\mathrm{mut}} - g_{\mathrm{wt}})\,x\,(1 - x). \tag{S3}$$
As defined in the main text (Eq. (3)), the relative fitness of a mutant is s^m = dm/dt for an encoding m(x) of the relative abundance x. Under the trivial linear encoding (m(x) = x), the relative fitness is therefore just the right-hand side of the relative abundance ODE (Eq. (S3)):
$$s^{\mathrm{lin}} = \frac{dx}{dt} = (g_{\mathrm{mut}} - g_{\mathrm{wt}})\,x\,(1 - x). \tag{S4}$$
For the log encoding m(x) = log x, we use the identity d log x/dt = x⁻¹dx/dt to obtain the log-encoded relative fitness:
$$s^{\log} = \frac{d \log x}{dt} = (g_{\mathrm{mut}} - g_{\mathrm{wt}})\,(1 - x). \tag{S5}$$
Finally, for the logit encoding m(x) = logit x = log(x/(1 − x)), we use the identity d logit x/dt = x⁻¹(1 − x)⁻¹ dx/dt to obtain the logit-encoded relative fitness:
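$$s^{\mathrm{logit}} = \frac{d\,\mathrm{logit}\,x}{dt} = g_{\mathrm{mut}} - g_{\mathrm{wt}}. \tag{S6}$$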
By comparing relative fitness values under the linear encoding (Eq. (S4)) and under the logit encoding (Eq. (S6)), we see how the logit encoding has removed the explicit dependence on the mutant relative abundance (factors of x and 1− x), although there can be implicit dependence on the mutant relative abundance within the per-capita growth rates of each strain (g_wt and g_mut) due to density-dependent growth rates.
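The role of the encoding can be checked numerically. The sketch below assumes constant, hypothetical per-capita growth rates (0.40 and 0.45 per unit time) and initial abundances, computes the exact mutant relative abundance x(t) under exponential growth of both genotypes, and estimates s^m = dm(x)/dt by central differences for the linear, log, and logit encodings. All names and values are illustrative.

```python
import math

G_WT, G_MUT = 0.40, 0.45    # hypothetical constant per-capita growth rates
N_WT0, N_MUT0 = 0.99, 0.01  # hypothetical initial absolute abundances


def x_of_t(t):
    """Mutant relative abundance when both genotypes grow exponentially."""
    n_wt = N_WT0 * math.exp(G_WT * t)
    n_mut = N_MUT0 * math.exp(G_MUT * t)
    return n_mut / (n_wt + n_mut)


ENCODINGS = {
    "linear": lambda x: x,
    "log": math.log,
    "logit": lambda x: math.log(x / (1.0 - x)),
}


def fitness_under_encodings(t, h=1e-6):
    """Central-difference estimate of s^m = d m(x(t))/dt for each encoding m."""
    return {
        name: (m(x_of_t(t + h)) - m(x_of_t(t - h))) / (2.0 * h)
        for name, m in ENCODINGS.items()
    }
```

Evaluated at t = 0, 50, and 100, the logit-encoded estimate stays at g_mut − g_wt = 0.05, while the log and linear estimates track 0.05(1 − x) and 0.05 x(1 − x) as the mutant rises in frequency.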

If the per-capita growth rates g_wt and g_mut (Eq. (S1)) are constants, then these constant growth rates also act as fitness potentials since they each depend only on a single genotype but their difference determines relative fitness (under the logit encoding, Eq. (S6)) between the genotypes. The growth rate is not a fitness potential under more complex dynamics, however. For example, consider a competition model with explicit density dependence:

where the growth rates decrease as the genotype abundances reach their carrying capacities K_wt and K_mut, and the maximum growth rates at low abundances are r_wt and r_mut. In this case, the relative fitness under the logit encoding is (from Eq. (S6))

In this case, there is no fitness potential because it is not possible to separate Eq. (S8) into a difference between terms that only depend on each genotype separately.

S2. The role of logistic population dynamics in logit-encoded relative fitness

Here we show how the logit encoding of relative abundance is related to the logistic model of population dynamics. For a relative abundance x, logistic dynamics are
$$\frac{dx}{dt} = r\,x\,(1 - x), \tag{S9}$$
where r is the exponential rate at which relative abundance increases from low values. This form emerges from the general dynamics of relative abundance (Eq. (S3)) when the difference in per-capita growth rates g_mut and g_wt is constant. One can also interpret the logistic model as a lowest-order approximation for more complex dynamics. That is, consider a general equation for relative abundance:
$$\frac{dx}{dt} = f(x) \tag{S10}$$
for an arbitrary function f(x). Since this function must obey the boundary conditions f(0) = 0 and f(1) = 0 (the relative abundance must stop changing when it either goes extinct or fixes), a polynomial expansion of f(x) must have roots at these values:

Thus the logistic model in Eq. (S9) is a lowest-order approximation even when the true dynamics are more complex.

The logistic differential equation in Eq. (S9) has the solution
$$x(t) = \frac{x(0)\,e^{rt}}{1 - x(0) + x(0)\,e^{rt}}. \tag{S12}$$
The logit encoding of the logistic relative abundance has linear dependence on time:
$$\mathrm{logit}\,x(t) = \log\frac{x(t)}{1 - x(t)} = \mathrm{logit}\,x(0) + r\,t. \tag{S13}$$
This is another way to see why the relative fitness under the logit encoding (the time derivative of Eq. (S13)) is constant under logistic dynamics. Mathematically, this occurs because the logit function is the inverse of the logistic dynamics (Eq. (S12)), up to a shift and rescaling. Thus if the relative abundance dynamics are different from logistic (Eq. (S9)), the logit encoding no longer exactly linearizes the trajectory of relative abundance and thus is no longer the optimal encoding for relative fitness (see example in Fig. S2).
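A short numerical check of this argument, assuming illustrative values r = 0.5 and x(0) = 0.01: along the closed-form logistic solution the logit rises by exactly r per unit time, whereas for a hypothetical non-logistic rate f(x) = r x(1 − x)(1 + a x) with a = 2 (a higher-order term of the kind allowed by the polynomial expansion above) the logit slope is r(1 + a x) and grows as the mutant spreads. The non-logistic trajectory is integrated with a classical fourth-order Runge-Kutta scheme; the function names are illustrative.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def logistic_solution(x0, r, t):
    """Closed-form solution of dx/dt = r x (1 - x)."""
    e = math.exp(r * t)
    return x0 * e / (1.0 - x0 + x0 * e)


def rk4_trajectory(f, x0, t_end, n_steps):
    """Integrate dx/dt = f(x) with classical 4th-order Runge-Kutta.

    Returns the list of states (including x0) and the step size.
    """
    h = t_end / n_steps
    x = x0
    xs = [x]
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs, h
```

Integrating the non-logistic rate to t = 10 with 1000 steps, the finite-difference logit slope rises from about 0.51 at the start to nearly 1.5 at the end, so no single constant describes this trajectory under the logit encoding.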

S3. Definition of absolute fitness for a genotype

Here we give an explicit definition of a genotype’s absolute fitness, analogous to the definition of relative fitness in the main text (Eqs. (1)–(3)). Conceptually, absolute fitness is any number that is sufficient to predict a genotype’s absolute abundance N over a short time window. Let an encoding m(N) be any smooth, strictly increasing function of the absolute abundance N. We can then predict the absolute abundance over a time window Δt using a linear expansion of the encoded abundance (analogous to main text Eq. (2) for relative fitness):
$$N(t + \Delta t) \approx m^{-1}\!\left[m(N(t)) + a^m\,\Delta t\right], \tag{S14}$$
where m^-1 is the inverse of the encoding function and
$$a^m = \frac{d\,m(N)}{dt} \tag{S15}$$
is defined as the absolute fitness of the genotype under the encoding m (analogous to main text Eq. (3) for relative fitness; see also Fig. 1B). For example, the absolute fitness of a genotype under the log encoding m(N) = log N is the per-capita growth rate:
$$a^{\log} = \frac{d \log N}{dt} = \frac{1}{N}\,\frac{dN}{dt}. \tag{S16}$$
In general, the ideal encoding of absolute abundance is the inverse function of the absolute abundance trajectory N(t) (up to a shift and rescaling), so that the first-order expansion in Eq. (S14) is exact and the absolute fitness a^m is sufficient to determine changes in absolute abundance up to any future time. The log encoding m(N) = log N is therefore ideal when absolute abundance grows or decays exponentially, while the logit encoding m(N) = logit N is ideal for a population that grows with logistic density dependence (Eq. (S7) in case of a single genotype).

Absolute fitness and relative fitness are related, since the relative abundance of a genotype is determined by normalizing its absolute abundance by the absolute abundance of all genotypes in the population. Specifically, the relative fitness of a genotype is determined by the absolute fitnesses of all genotypes in the population. For example, in the case of two genotypes, the relative fitness under the logit encoding (Eq. (S6)) is the difference between the genotypes’ absolute fitnesses under the log encoding (Eq. (S16)). In the case of constant per-capita growth rates, these log-encoded absolute fitnesses also act as fitness potentials.

S4. Relative fitness predictions in discrete time: additive vs. multiplicative form

In our framework, relative fitness is a statistic that predicts relative abundance in an additive equation (Eq. (2)) but sometimes the dynamics of individual genotypes are modeled using a multiplicative form of fitness. In this section, we show how these multiplicative fitness statistics are related to the additive fitness statistics, in particular for the logit encoding.

We consider a population of a wild-type and a mutant genotype, where we track the mutant’s relative abundance over multiple rounds of competition (e.g. growth cycles)

and the variable x changes each round according to some underlying population dynamics.

In a modeling approach typical of many studies in population genetics [22], these population dynamics are captured in the genotype-specific growth factors
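$$f_{\mathrm{mut}} = \frac{N_{\mathrm{mut}}(r+1)}{N_{\mathrm{mut}}(r)}, \qquad f_{\mathrm{wt}} = \frac{N_{\mathrm{wt}}(r+1)}{N_{\mathrm{wt}}(r)}, \tag{S18}$$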
which drive the update equation for the mutant relative abundance
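$$x(r+1) = \frac{f_{\mathrm{mut}}\,x(r)}{f_{\mathrm{mut}}\,x(r) + f_{\mathrm{wt}}\,[1 - x(r)]}, \tag{S19}$$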
which is the discrete-time analogue to a differential equation (Eq. (S1)). We divide Eq. (S19) by 1 − x(r + 1) to obtain the form
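$$\frac{x(r+1)}{1 - x(r+1)} = \frac{f_{\mathrm{mut}}}{f_{\mathrm{wt}}}\,\frac{x(r)}{1 - x(r)}, \tag{S20}$$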
which allows us to recognize the ratio of growth factors f_mut/f_wt as a relative fitness statistic, since it is sufficient to predict the relative abundance of the mutant genotype (under the encoding m(x) = x/(1 − x)). But the statistic f_mut/f_wt acts as a multiplicative factor in Eq. (S20), whereas the general form of relative fitness s^m (Eq. (2)) acts as an additive term. What is the relationship between the multiplicative and the additive forms of discrete-time relative fitness statistics?

To describe discrete rounds of population dynamics in our framework, we treat the relative abundance x(r) as samples from a continuous time series, separated by a time gap Δt = t(r + 1) − t(r). For a chosen encoding m, the relative abundance of the mutant genotype in the future round is predicted by
$$m(x(r+1)) = m(x(r)) + s^m\,\Delta t, \tag{S21}$$
since this is how we defined relative fitness s^m in Eq. (2). This relative fitness acts as an additive term, but applying the exponential function to both sides of Eq. (S21) gives the update equation
$$e^{m(x(r+1))} = e^{m(x(r))}\,e^{s^m \Delta t}, \tag{S22}$$
where the additive fitness together with the timescale Δt acts as a multiplicative factor. Specifically for the logit encoding m(x) = logit x we have
$$\frac{x(r+1)}{1 - x(r+1)} = e^{s^{\mathrm{logit}} \Delta t}\,\frac{x(r)}{1 - x(r)}. \tag{S23}$$
We compare Eq. (S23) to Eq. (S20) and solve for the relative fitness of the mutant genotype
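$$s^{\mathrm{logit}} = \frac{1}{\Delta t}\,\log\frac{f_{\mathrm{mut}}}{f_{\mathrm{wt}}} = \frac{\log f_{\mathrm{mut}} - \log f_{\mathrm{wt}}}{\Delta t} \tag{S24}$$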
as a function of the growth factors f_mut, f_wt. Using the definition of the growth factors (Eq. (S18)), we see that Eq. (S24) is simply the discrete-time relative fitness for the logit encoding in terms of the mutant and wild-type LFC (Eq. (S30)). As a general point, we note that the growth factor f_mut qualifies as an absolute fitness (since it is sufficient to predict the absolute abundance N(r+1)), but does not constitute a relative fitness statistic (since we also need to know f_wt; see Eq. (S20) or Eq. (S24)).
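The correspondence between growth factors and the additive logit fitness can be verified by iterating the update equation for x(r+1) directly. In this sketch the growth factors (120 and 100 per round) are hypothetical and the function names are illustrative; `multiplicative_fitness` uses the exact form w = exp(s Δt).

```python
import math


def next_relative_abundance(x, f_mut, f_wt):
    """One round of growth: mutant relative abundance after the growth factors act."""
    return f_mut * x / (f_mut * x + f_wt * (1.0 - x))


def logit(x):
    return math.log(x / (1.0 - x))


def logit_fitness(f_mut, f_wt, dt=1.0):
    """Additive logit fitness per unit time: (log f_mut - log f_wt) / dt."""
    return (math.log(f_mut) - math.log(f_wt)) / dt


def multiplicative_fitness(s, dt=1.0):
    """Dimensionless multiplicative fitness w = exp(s * dt)."""
    return math.exp(s * dt)
```

Iterating `next_relative_abundance` from x = 0.01 raises logit x by log 1.2 ≈ 0.182 every round. `multiplicative_fitness` returns f_mut/f_wt = 1.2 whether Δt is one cycle or LFC_wt = log f_wt (per generation), because the product s^logit Δt is the same in both cases.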

More generally, for a given encoding function m(x) we define the mutant’s multiplicative relative fitness over a discrete growth cycle as
$$w^m = e^{s^m \Delta t}, \tag{S25}$$
where s^m is the additive relative fitness for this growth-cycle under the chosen encoding (Eq. (3)) and Δt is the duration of the growth-cycle in time units that match s^m. Equation (S25) shows that the additive fitness s^m has time units, but the multiplicative fitness does not and is formally a dimensionless quantity. These dimensionless units are preserved in the approximate formula
$$w^m \approx 1 + s^m\,\Delta t, \tag{S26}$$
which is the first-order expansion of Eq. (S25) in the limit of weak selection (|s^mΔt| ≪1). For example, if we choose to measure relative fitness with the logit encoding on the time-scale per-cycle (Δt = 1 cycle), we get

which is the same multiplicative fitness as if we choose to measure the relative fitness on the time scale per-generation (Δt = LFC_wt):

In both cases, the time-units for the discrete-time additive fitness cancel in the product term.

How does the multiplicative fitness w^logit rank a set of mutant genotypes compared to the relative fitness per-cycle and per-generation? Equation (S27) shows that the multiplicative fitness w^logit agrees with the ranking by relative fitness per-cycle s^logit, but it can differ from the ranking by relative fitness per-generation, since both terms of the product in Eq. (S28) depend on the mutant (compare Fig. 2F and Sec. S7).

Finally, we address why the relative fitness per-cycle agrees in ranking with the multiplicative fitness w^logit but can disagree with the fitness statistic defined in the Long-Term Evolution Experiment [23] as
$$W = \frac{\mathrm{LFC}_{\mathrm{mut}}}{\mathrm{LFC}_{\mathrm{wt}}}. \tag{S29}$$
By comparing Eq. (S29) to the multiplicative fitness in Eq. (S26), we see that the LTEE fitness statistic W does not derive from the general form of the multiplicative fitness (Eq. (S25)): all statistics derived from this approximation contain a multiplying factor Δt that cancels the units of time in s^m. Because the LTEE statistic W (Eq. (S29)) lacks the factor LFC_wt that appears in the logit-based multiplicative fitness w^logit (Eq. (S28)), the two statistics have different units of time and can produce different rankings.
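To see how these statistics can order competitions differently, the sketch below computes them from hypothetical pairs of log fold-changes. It uses the exact multiplicative form w^logit = exp(s^logit Δt) rather than the first-order approximation, together with the LTEE statistic W = LFC_mut/LFC_wt, which equals 1 + s_gen. The LFC values for competitions A and B are invented so that both LFCs are higher in B, and the function names are illustrative.

```python
import math


def fitness_statistics(lfc_mut, lfc_wt):
    """Fitness statistics for one competition from its two log fold-changes."""
    s_cycle = lfc_mut - lfc_wt  # logit, per cycle (Delta t = 1 cycle)
    s_gen = s_cycle / lfc_wt    # logit, per generation (Delta t = LFC_wt)
    return {
        "s_cycle": s_cycle,
        "s_gen": s_gen,
        # exp(s * Delta t) is identical for both time scales
        "w_logit": math.exp(s_cycle),
        # LTEE statistic, equal to 1 + s_gen
        "W_ltee": lfc_mut / lfc_wt,
    }


def ranking(competitions, key):
    """Competition labels sorted from highest to lowest value of one statistic."""
    stats = {name: fitness_statistics(*lfcs) for name, lfcs in competitions.items()}
    return sorted(stats, key=lambda name: stats[name][key], reverse=True)
```

With A = (LFC_mut, LFC_wt) = (5.5, 5.0) and B = (8.6, 8.0), the per-cycle fitness and w^logit rank B above A (0.6 vs. 0.5 per cycle), whereas the per-generation fitness and W rank A above B (W = 1.1 vs. 1.075).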

S5. Derivation of mismatch conditions for relative fitness per-cycle and per-generation

In this section, we derive the conditions for a ranking mismatch between the relative fitness per-cycle and per-generation (for the logit-encoded relative abundance) across a set of competition experiments. Consider a batch culture where a competing wild-type and mutant genotype have log fold-changes LFC_wt and LFC_mut over a single growth cycle. The LFCs are convenient variables to describe these dynamics since we can express the mutant’s relative fitness per-cycle as (main text Eq. (8))

and the mutant’s relative fitness per-generation as (main text Eq. (9))

Here we assume both LFCs are nonzero (so that measuring fitness per-generation is meaningful; see discussion in main text).

To determine how these two fitness statistics lead to different rankings, we consider two competition experiments A and B, which may represent two different mutants competing against the same wild-type or the same mutant tested in two different environments. A mismatch in ranking occurs when the fitness per-cycle is higher in competition B, while the fitness per-generation is higher in competition A:

We insert the expressions for relative fitness per-cycle (Eq. (S30)) and per-generation (Eq. (S31)) into Eq. (S32) to rewrite the condition for ranking mismatch in terms of the LFCs:

For a given competition A, Eq. (S33) defines a region in the space of LFCs where competition B can lie such that the two competitions are ranked differently by fitness per-cycle and per-generation (Fig. 1D shows an example as the red-shaded area). Biologically, these constraints describe a situation where both the mutant and wild-type LFCs are higher in competition B than in competition A (i.e., the gray point is up and to the right of the red point in Fig. 1D), but the LFC increases must be sufficiently balanced between the mutant and wild-type (i.e., the point lies within the red area in Fig. 1D).
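The mismatch condition can be checked directly from the LFCs. This sketch encodes the definition in Eq. (S32): per-cycle fitness is higher in B, while per-generation fitness is higher in A. It assumes the per-cycle and per-generation forms LFC_mut − LFC_wt and (LFC_mut − LFC_wt)/LFC_wt, and nonzero wild-type LFCs. The function name and example values are illustrative only.

```python
def ranking_mismatch(lfc_a, lfc_b):
    """Return True if competitions A and B are ranked differently per-cycle vs per-generation.

    lfc_a, lfc_b: (LFC_mut, LFC_wt) tuples for competitions A and B (natural-log fold-changes).
    """
    mut_a, wt_a = lfc_a
    mut_b, wt_b = lfc_b
    # Relative fitness per-cycle: difference of LFCs
    s_cycle_a, s_cycle_b = mut_a - wt_a, mut_b - wt_b
    # Relative fitness per-generation: per-cycle value divided by the wild-type LFC
    s_gen_a, s_gen_b = s_cycle_a / wt_a, s_cycle_b / wt_b
    return s_cycle_b > s_cycle_a and s_gen_a > s_gen_b


# B has both LFCs higher than A, with a balanced increase: mismatch
print(ranking_mismatch((1.5, 1.0), (2.6, 2.0)))  # True
# B has a lower mutant LFC and a higher wild-type LFC: no mismatch
print(ranking_mismatch((1.5, 1.0), (1.4, 1.1)))  # False
```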

Typically, however, the LFCs of the wild-type and the mutant are not independent, since they are jointly constrained by the fact that both strains compete for the same finite resources. For example, assume that a single limiting resource with concentration R is consumed in proportion to the growth of each genotype’s biomass according to

where Y_wt and Y_mut are the wild-type and mutant biomass yields (stoichiometry of biomass to resource).

We can integrate Eq. (S34) to obtain

The growth cycle stops when no resource remains (R(t) = 0), such that

where we have expressed the genotype abundances at the end of the growth cycle in terms of their LFCs. Equation (S36) thus entails a constraint between the wild-type and mutant LFCs. For a set of mutant competitions with the same initial resource concentration R(0), initial abundances N_wt(0) and N_mut(0), and yields Y_wt and Y_mut, the mutant and wild-type LFCs are constrained by Eq. (S36) to fall along a one-dimensional curve (black line in Fig. S4A). Geometrically, this constraint on the LFCs is incompatible with the requirements for a ranking mismatch between fitness per-cycle and per-generation for a pair of mutant competitions (compare the black line and the red-shaded areas in Fig. S4A). However, if some mutant competitions deviate from this constraint, for example by having different yields Y_mut or initial conditions N_wt(0) and N_mut(0), then ranking mismatches may be possible (Fig. S4B).
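The geometric argument can be checked numerically. Integrating Eq. (S34) until R = 0 and writing each final abundance as N(0) e^LFC, we read the constraint in Eq. (S36) as R(0) = N_wt(0)(e^LFC_wt − 1)/Y_wt + N_mut(0)(e^LFC_mut − 1)/Y_mut. This is our reconstruction and assumes natural-log LFCs. The sketch solves the constraint for LFC_mut along a range of LFC_wt and checks that, on this curve, the per-cycle and per-generation statistics order the competitions identically. All parameter values are hypothetical.

```python
import numpy as np


def lfc_mut_on_constraint(lfc_wt, R0, Nwt0, Nmut0, Ywt, Ymut):
    """Mutant LFC implied by the resource constraint for a given wild-type LFC."""
    lfc_wt = np.asarray(lfc_wt, dtype=float)
    # Resource left for the mutant after the wild-type has grown by exp(lfc_wt)
    remaining = R0 - Nwt0 * np.expm1(lfc_wt) / Ywt
    if np.any(remaining <= 0):
        raise ValueError("wild-type LFC exceeds what the resource allows")
    # Solve Nmut0 * (exp(lfc_mut) - 1) / Ymut = remaining for lfc_mut
    return np.log1p(Ymut * remaining / Nmut0)


# Hypothetical set of competitions sharing R(0), initial abundances, and yields
lfc_wt = np.linspace(3.0, 4.3, 50)
lfc_mut = lfc_mut_on_constraint(lfc_wt, R0=1.0, Nwt0=0.01, Nmut0=0.01, Ywt=1.0, Ymut=1.0)

s_cycle = lfc_mut - lfc_wt
s_gen = s_cycle / lfc_wt
# Along the constraint, LFC_mut falls as LFC_wt rises, so both statistics move together
same_order = np.array_equal(np.argsort(s_cycle), np.argsort(s_gen))
print(same_order)  # True
```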

S6. Analysis of growth curves to identify growth phases

In this section, we describe in more detail how we identify growth phases from the original dataset of growth curves and use this to choose a subset of curves that matches the simplified growth dynamics of our population dynamics model (Fig. 2A). As mentioned in the main text (Methods), we downloaded the original data from the PROPHECY DATABASE (http://prophecy.lundberg.gu.se/), specifically the dataset for growth in Synthetic Defined medium first analysed and reported in [24]. The original growth curve data are already corrected for background and instrument non-linearities [24] (summarized in Methods), but we applied additional corrections as follows. First, we concatenate the original 51 data files (for different plate reader runs) into a single, consecutive dataframe and manually handle a duplication in one of the files (Experiment NO. 18). This file has no measurements for the first time point (t = 0) due to a technical error in the original data export [24]. For curves from this experimental run, we set the initial time point to NaN in Python, so that these points are ignored in any subsequent calculation of averages. More generally, we trim the first four time points of all growth curves (equivalent to 1 h 20 min of 47 h total) and remove OD measurements below a noise threshold (OD = 0.001), as this improves the quality of the fit later on.
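A minimal sketch of the trimming and noise-masking steps for a single curve follows. It assumes the curve is stored as NumPy arrays of time (minutes) and OD. The constant and function names are hypothetical, while the thresholds are the ones quoted above.

```python
import numpy as np

N_TRIM = 4              # leading time points removed from every curve
OD_NOISE_FLOOR = 0.001  # OD values below this are treated as noise


def preprocess_curve(t, od):
    """Trim the first time points and mask low OD values as NaN; return (t, log OD)."""
    t = np.asarray(t, dtype=float)[N_TRIM:]
    od = np.asarray(od, dtype=float)[N_TRIM:].copy()
    # NaN entries (including a missing t = 0 value) are skipped later by nan-aware averages
    od[od < OD_NOISE_FLOOR] = np.nan
    return t, np.log(od)


# Hypothetical curve sampled every 20 minutes, with a missing first time point
t_raw = np.arange(10) * 20.0
od_raw = [np.nan, 0.01, 0.01, 0.01, 0.0005, 0.02, 0.04, 0.08, 0.16, 0.32]
t_clean, log_od = preprocess_curve(t_raw, od_raw)
```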

After pre-processing, we estimate a smooth time series for the instantaneous growth rate in each growth curve using a previously published script, gaussianprocess.py, by Swain et al. [25] that implements the Gaussian Process approach to smoothing (download from https://swainlab.bio.ed.ac.uk/software.html). We apply this script to the logarithmic absolute abundance log OD and reconstruct a smoothed trajectory f(t) ≈ log OD(t) as well as the first derivative df/dt and the second derivative d²f/dt² [25]. Effectively, the script estimates three hyperparameters that capture the shape of each curve, and we find that the estimation works best if we constrain the parameter ranges as outlined in Table S1.

Table S1. Parameter settings for Gaussian Process optimisation.

With a smoothed time series at hand, we now identify ‘plateaus’ of constant growth rate using the functions available in SciPy [26]. For each growth curve, we start by identifying so-called ‘plateau seeds’, which are small intervals where the second derivative is below a chosen threshold, d²f/dt² < 5 · 10⁻⁶. For each plateau seed k in the growth curve, we calculate the average growth rate ĝ_k in the time window of the plateau. Due to noise in the second derivative, we find many plateau seeds that are adjacent and need to be merged. To do so, we iterate over the plateau seeds in the growth curve and merge the current candidate k with the previous plateau k − 1, unless one of the following conditions is true:

Both plateau seeds are too long (duration of 100 minutes or more).
The transition time between the plateau seeds is too large (200 minutes or more).
The first plateau has a significantly different growth rate than the second one, such that the following equation is satisfied
where g_k, g_k−1 are the average growth rates in each plateau (per minute).
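The seed-finding and merging rules above can be sketched as follows. The sketch assumes the smoothed derivatives are NumPy arrays on a common time grid (minutes). We use scipy.ndimage.label to find contiguous runs below the threshold, and we compare the absolute value of d²f/dt² with the threshold. The equation for the third criterion is not reproduced in this text, so RATE_TOLERANCE is a hypothetical relative-difference stand-in. Recomputing the rate over the merged window is also our assumption.

```python
import numpy as np
from scipy import ndimage

D2_THRESHOLD = 5e-6        # threshold on |d²f/dt²|
MAX_SEED_DURATION = 100.0  # minutes; criterion 1
MAX_TRANSITION = 200.0     # minutes; criterion 2
RATE_TOLERANCE = 0.2       # hypothetical relative tolerance standing in for criterion 3


def find_plateau_seeds(dfdt, d2fdt2):
    """Contiguous runs of time points where |d²f/dt²| is below the threshold."""
    dfdt = np.asarray(dfdt, dtype=float)
    labels, n_seeds = ndimage.label(np.abs(np.asarray(d2fdt2)) < D2_THRESHOLD)
    seeds = []
    for k in range(1, n_seeds + 1):
        idx = np.flatnonzero(labels == k)
        seeds.append({"start": int(idx[0]), "end": int(idx[-1]),
                      "rate": float(np.mean(dfdt[idx]))})
    return seeds


def merge_plateau_seeds(t, dfdt, seeds):
    """Merge each seed into the previous plateau unless one of the three criteria holds."""
    t = np.asarray(t, dtype=float)
    dfdt = np.asarray(dfdt, dtype=float)
    merged = []
    for seed in seeds:
        if not merged:
            merged.append(dict(seed))
            continue
        prev = merged[-1]
        both_long = (t[prev["end"]] - t[prev["start"]] >= MAX_SEED_DURATION
                     and t[seed["end"]] - t[seed["start"]] >= MAX_SEED_DURATION)
        long_transition = t[seed["start"]] - t[prev["end"]] >= MAX_TRANSITION
        scale = max(abs(prev["rate"]), abs(seed["rate"]), 1e-12)
        rates_differ = abs(prev["rate"] - seed["rate"]) > RATE_TOLERANCE * scale
        if both_long or long_transition or rates_differ:
            merged.append(dict(seed))
        else:
            # Extend the previous plateau and recompute its average growth rate
            prev["end"] = seed["end"]
            prev["rate"] = float(np.mean(dfdt[prev["start"]:prev["end"] + 1]))
    return merged
```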

Empirically, we find that the merged plateaus are shorter than one would expect from visually inspecting the growth curve. Therefore, we extend the remaining plateaus in each growth curve as follows: for each plateau, we take the lowest and highest values of the growth rate df/dt in its time window as a growth rate corridor. We extend the plateau to the left until df/dt leaves that corridor, and we extend it to the right in the same way. By construction, the resulting plateau is equal to or larger than the original time window, and we recalculate the average growth rate over the new time window.
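The extension step can be sketched as below. The sketch assumes df/dt is sampled on the same grid as the plateau indices, and the function name is hypothetical.

```python
import numpy as np


def extend_plateau(dfdt, start, end):
    """Grow the window [start, end] while df/dt stays inside the window's min-max corridor."""
    dfdt = np.asarray(dfdt, dtype=float)
    lo, hi = dfdt[start:end + 1].min(), dfdt[start:end + 1].max()
    while start > 0 and lo <= dfdt[start - 1] <= hi:
        start -= 1
    while end < len(dfdt) - 1 and lo <= dfdt[end + 1] <= hi:
        end += 1
    # Recompute the average growth rate over the extended window
    return start, end, float(dfdt[start:end + 1].mean())


dfdt = [0.0, 0.0095, 0.0098, 0.0100, 0.0102, 0.0099, 0.0200]
print(extend_plateau(dfdt, 2, 4)[:2])  # (2, 5)
```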

From this analysis, we obtain a list of growth phases for each curve, which allows us to choose a subset of curves that match our model of population dynamics (Methods). We only choose curves that have two plateaus, where the first plateau has significant growth (exponential phase) and the second plateau has no growth (stationary phase). Here we define significant growth as an average growth rate in the plateau time window of at least 0.0011 per minute. This forms the set of growth curves that we use to estimate growth traits (9424 curves).
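The selection rule reduces to a simple filter on the plateau growth rates of each curve. In this sketch we assume that "no growth" means an average rate below the same 0.0011 per minute threshold, and the function name is hypothetical.

```python
GROWTH_THRESHOLD = 0.0011  # per minute, the significance threshold quoted above


def matches_growth_model(plateau_rates):
    """True for curves with exactly two plateaus: exponential growth, then no growth.

    plateau_rates: average growth rates (per minute) of the plateaus, in time order.
    """
    if len(plateau_rates) != 2:
        return False
    exponential_rate, stationary_rate = plateau_rates
    return exponential_rate >= GROWTH_THRESHOLD and stationary_rate < GROWTH_THRESHOLD


print(matches_growth_model([0.005, 0.0001]))  # True
print(matches_growth_model([0.005, 0.003]))   # False
```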

S7. The saturation time in our model of population dynamics

In this section, we restate an explicit expression for the saturation time t_sat in pairwise competition that was derived in earlier work and shows how mutants can influence resource depletion and the wild-type LFC (Eq. (15)). Using the same model of population dynamics (Methods, Eq. (11)), previous work [27, 28] derived an approximate formula for t_sat that shows how it depends on the underlying parameters:

where

is the effective lag time of the population,

is the effective e-fold growth time (reciprocal exponential growth rate) of the population,

is the effective yield of the population, and

is the log fold-change of the total biomass, which depends on the initial absolute abundance N₀ of all genotypes and the initial concentration of resources R₀.

Equation (S38) shows how each competing genotype influences the saturation time t_sat and thus the LFCs of all other genotypes via Eq. (15). For example, adding a mutant with a slow growth rate increases the effective e-fold growth time (Eq. (S40)), while a mutant with a long lag time increases the effective lag time (Eq. (S39)). Genotypes also influence the saturation time (Eq. (S38)) through the effective yield, which is the harmonic average of the yields of all genotypes (Eq. (S41)). Because this is a harmonic average, adding mutants with low biomass yield Y_j can significantly shorten the saturation time, whereas mutants that are more efficient (high Y_j) have little influence on the duration of the growth cycle. Note that genotypes must be at sufficiently high relative abundance to significantly influence the effective population traits, since the contribution of each genotype j is weighted by its relative abundance x_j.
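Instead of evaluating the approximate expression, t_sat can also be found numerically. The sketch below assumes that the model in Eq. (11) has each genotype j grow exponentially at rate g_j after a lag λ_j and consume resource in proportion 1/Y_j until the resource is exhausted. It then illustrates the asymmetric effect of low-yield versus high-yield mutants. N₀ = 0.05 OD and R₀ = 111 mM are the values used in Sec. S9, while the growth rate, lag, and yield values are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq


def saturation_time(N0, g, lag, Y, R0):
    """Time at which the resource is exhausted in a lag/exponential/resource-depletion model.

    N0, g, lag, Y: per-genotype initial abundance, growth rate, lag time, and yield.
    """
    N0, g, lag, Y = (np.asarray(v, dtype=float) for v in (N0, g, lag, Y))

    def consumed_minus_R0(t):
        # Each genotype grows only after its lag and consumes (N(t) - N(0)) / Y of the resource
        growth_time = np.maximum(0.0, t - lag)
        return np.sum(N0 * np.expm1(g * growth_time) / Y) - R0

    # Bracket the root: nothing is consumed at t = 0, so the function is negative there
    t_hi = lag.max() + 1.0 / g.min()
    while consumed_minus_R0(t_hi) <= 0:
        t_hi *= 2.0
    return brentq(consumed_minus_R0, 0.0, t_hi)


R0, N0 = 111.0, 0.05           # mM and OD
g, lag, Y_wt = 0.5, 2.0, 0.01  # per hour, hours, OD per mM (hypothetical)


def tsat_with_mutant(Y_mut):
    # 50:50 mixture of wild-type and a mutant that differs only in yield
    return saturation_time([N0 / 2, N0 / 2], [g, g], [lag, lag], [Y_wt, Y_mut], R0)


t_base = tsat_with_mutant(Y_wt)
t_low = tsat_with_mutant(Y_wt / 10)   # inefficient mutant
t_high = tsat_with_mutant(Y_wt * 10)  # efficient mutant
print(round(t_base, 2), round(t_low, 2), round(t_high, 2))  # 8.29 5.23 9.44
```

In this example, the tenfold less efficient mutant shortens the cycle by about three hours, whereas the tenfold more efficient mutant lengthens it by only about one hour.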

S8. Analysis of fitness trajectories from the long-term evolution experiment

A previous analysis of the Long-Term Evolution Experiment (LTEE) performed by Wiser et al. [29] found that evolved populations of Escherichia coli increased in relative fitness over 50,000 generations without converging to a maximum fitness. Here we re-analyze the same data by directly comparing the relative fitness (under the logit encoding) per-cycle and per-generation to see if the choice of fitness statistic changes the conclusion. The experimental protocol of the LTEE has been described elsewhere [23, 30], but we briefly summarize the main aspects: Starting with a single ancestral strain of E. coli, 12 replicate populations were inoculated in 1988 and are perpetually grown in batch cultures with serial transfers, such that 1% of the population biomass is transferred to fresh growth medium each day. Samples from each replicate population are stored every 500 generations, leading to a record of evolved populations over time [30].

Previous work performed competition experiments between the ancestral population and each evolved population (every 500-2000 generations) by combining them in equal proportions and growing them over a single batch culture growth cycle, with measurements of their initial and final absolute abundances taken by colony counting [29]. These data have been prepared in a convenient format by Good et al. [31] and are available for download at https://github.com/benjaminhgood/LTEE-metagenomic/blob/master/additional_data/Concatenated.LTEE.data.all.csv. For a few of the 12 populations, the time series is truncated: population Ara+6 has competition measurements up to generation 4000, population Ara-2 up to generation 30,000, and population Ara+2 up to generation 32,000 [29]. Note that the evolved population tested in these competitions is not a single genotype, but a sample of multiple genotypes that were present in that evolving population at that time. From these values of absolute abundance, we compute the log fold-change (LFC) of the evolved and ancestral populations in each competition and then calculate the evolved population’s relative fitness per-cycle (Eq. (S30)) and per-generation (Eq. (S31)). The original dataset has two to four replicate measurements for each evolved sample, corresponding to repeats of the competition experiment on different days [29]. Initially, we collect all competition experiments into a single dataset (n = 928 competitions) and find that relative fitness per-generation and per-cycle differ in the ranking of these competitions (Fig. S9A,B). For example, one competition is ranked 261 positions lower in relative fitness per-generation than per-cycle (where higher ranks indicate higher fitness). We can understand the ranking mismatch from the underlying LFCs in the competition experiments (Fig. S9C), which show considerable scatter, with some mutant-wild-type pairs in a positive covariation (compare Fig. S9C and Fig. 1D). The scatter occurs because the biomass yield evolves downward over time [32, 33], shifting the wild-type LFCs downward in 50:50 competitions with the evolved populations (horizontal trend across time points in Fig. S9C).
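A minimal sketch of these calculations from absolute abundances follows. It includes a helper that reports the largest drop in rank between the two statistics, using rank 1 for the lowest fitness. The colony counts below are hypothetical and are not taken from the LTEE dataset.

```python
import numpy as np
from scipy.stats import rankdata


def log_fold_change(n_initial, n_final):
    return np.log(np.asarray(n_final, dtype=float) / np.asarray(n_initial, dtype=float))


def fitness_per_cycle_and_generation(evo_initial, evo_final, anc_initial, anc_final):
    lfc_evo = log_fold_change(evo_initial, evo_final)
    lfc_anc = log_fold_change(anc_initial, anc_final)
    s_cycle = lfc_evo - lfc_anc   # relative fitness per-cycle
    s_gen = s_cycle / lfc_anc     # relative fitness per-generation
    return s_cycle, s_gen


def max_rank_drop(s_cycle, s_gen):
    """Largest number of positions a competition drops per-generation relative to per-cycle."""
    # rankdata assigns rank 1 to the lowest value, so a higher rank means higher fitness
    return int(np.max(rankdata(s_cycle) - rankdata(s_gen)))


# Three hypothetical competitions (colony counts at the start and end of one cycle)
s_cycle, s_gen = fitness_per_cycle_and_generation(
    evo_initial=[100, 100, 100], evo_final=[15000, 9000, 1000],
    anc_initial=[100, 100, 100], anc_final=[10000, 3000, 700],
)
print(max_rank_drop(s_cycle, s_gen))  # 1
```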

We can also construct a time series of relative fitness for each population and compare fitness rankings at individual time points. For each of the 12 populations in the LTEE, we define the relative fitness per-generation at time t by averaging the value across all competition experiments with the frozen sample from time t. Similarly, we define a time series of relative fitness per-cycle for each population. We then pool all time series into a single dataset (with the truncation described above) and test how the two statistics rank the 12 populations at each point in the experiment. We find that the mismatch between relative fitness per-cycle and per-generation is consistently low at all time points (Fig. S10), with a few exceptions (for example, at generation 4,000 the two statistics disagree on the ranking of the top six populations).

A key result from previous analysis of this dataset is that the evolving populations increase indefinitely in relative fitness, rather than leveling off at some maximum fitness value. The original analysis by Wiser et al. [29] tested the long-term trend by fitting the statistic W to a hyperbolic model of the time series:

where t is the evolutionary time point at which the evolved population is measured against its ancestor, and f(t) is the fitness statistic measured from that competition (here fitted to the measured values of W). The important feature of the hyperbolic model is that it assumes that relative fitness saturates at a maximum fitness over long times (lim_t→∞ f(t) = 1 + a). To contrast this model, they also tested a power law

under which fitness increases without bound over long times (lim_t→∞ f(t) = ∞). To repeat this analysis with the relative fitness per-cycle and per-generation used in this article, we must adjust the models to account for the fact that W takes 1 as its neutral value (occurring at t = 0 by definition), while the relative fitness per-cycle and per-generation are zero under neutrality.

Thus we use

as the hyperbolic model and

as the power law model.

As a control against the original analysis of Wiser et al. [29], we first fit the time series of the relative fitness per-generation to the hyperbolic (Eq. (S45)) and the power law (Eq. (S46)) models. Wiser et al. compared their two models using the Bayesian Information Criterion [29], but since the models have the same number of parameters and are fit to the same data, this is mathematically equivalent to comparing the values of R². Figure S11A shows that in our analysis, the power law model has a higher quality of fit (R² = 0.701) than the hyperbolic model (R² = 0.682) for the relative fitness per-generation, consistent with the original result by Wiser et al. for the statistic W [29]. Since the equations for the hyperbolic model differ only by a constant offset (compare Eqs. (S43) and (S45)), our fit of the relative fitness per-generation to Eq. (S45) is mathematically equivalent to the fit of W to Eq. (S43). Therefore, our fit should give exactly the same results as the original publication [29] for the hyperbolic model (but we could not find the fitted values of a and b in [29], so we were unable to check this). This is not true for the power law, because the modified power law used by Wiser et al. [29] (Eq. (S44)) includes the offset within the parentheses, rather than as an added constant outside (compare Eq. (S44) to Eq. (S46)). This means that the fit of W to the power law with an initial value of 1 (Eq. (S44), fitted values α = 0.00515, β = 0.0950; matching [29]) is different from the fit of the relative fitness per-generation to the power law with an initial value of zero (Eq. (S46), fitted values α = 0.000007, β = 0.299891).
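For reference, the following sketch writes the four models as Python functions. The forms are our reconstruction from the text and should be treated as assumptions, not as Eqs. (S43)–(S46) verbatim. In this reconstruction, the hyperbolic models differ only by the constant 1, the power law for W keeps the offset inside the parentheses, and the power law for relative fitness is (αt)^β, which is consistent with the fitted values quoted above. The fitting helper uses scipy.optimize.curve_fit with nonnegative bounds, a choice we make here.

```python
import numpy as np
from scipy.optimize import curve_fit


def hyperbolic_W(t, a, b):
    """W(0) = 1 and W saturates at 1 + a."""
    return 1.0 + a * t / (t + b)


def hyperbolic_s(t, a, b):
    """Same model shifted to a neutral value of 0; saturates at a."""
    return a * t / (t + b)


def power_law_W(t, alpha, beta):
    """W(0) = 1 with the offset inside the parentheses; unbounded."""
    return (alpha * t + 1.0) ** beta


def power_law_s(t, alpha, beta):
    """s(0) = 0; unbounded."""
    return (alpha * t) ** beta


def r_squared(y, y_fit):
    y, y_fit = np.asarray(y, dtype=float), np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_score(model, t, y, p0):
    """Least-squares fit of a two-parameter model; returns (parameters, R^2)."""
    params, _ = curve_fit(model, t, y, p0=p0, bounds=(0.0, np.inf), x_scale="jac")
    return params, r_squared(y, model(np.asarray(t, dtype=float), *params))
```

Since both models have two parameters, comparing the R² values that fit_and_score returns for each model on the same data matches the BIC comparison described above.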

We next perform a new analysis by calculating the relative fitness per-cycle from the same time series data (12 time series pooled together, one for each line) and fitting this fitness statistic to the same hyperbolic (Eq. (S45)) and power law (Eq. (S46)) models. As shown in Fig. S11B, the power law model outperforms the hyperbolic model for the per-cycle fitness, consistent with the model performance for the per-generation fitness. This suggests that Wiser et al.’s original conclusion about fitness increasing without bound [29] is robust to the choice of fitness statistic.

As we previously mentioned, the fitness measurements of the replicate populations are not uniformly distributed across time (Fig. S11A). There are fewer fitness measurements at late time points (generation 34,000 and higher) because three populations were eventually excluded from the fitness measurements [29]. To further corroborate our results, we therefore repeat the model fits using a single time series of the evolved fitness, rather than fitting the models to all 12 fitness time series simultaneously. We calculate the average fitness per-generation and per-cycle in the evolution experiment as the average across all 12 populations at each time point and fit this population-averaged time series to the hyperbolic (Eq. (S45)) and power law (Eq. (S46)) models (Wiser et al. [29] refer to this as a fit to the “grand mean”). Figure S11C,D shows that the power law still has a better quality of fit than the hyperbolic model. We thus conclude that the increasing fitness trend reported by Wiser et al. [29] is also robust to the uneven distribution of measurements over time.

We note that the quality of fit R reported in the main text of Wiser et al. [29] differs from the quality of fit we show in our plots. While both studies fit the models to all 12 populations simultaneously, Wiser et al. [29] evaluated the quality of fit by calculating R for the fitted model (blue and pink lines in Fig. S11A) against the fitness time series averaged over replicates (grey points in Fig. S11C). That is, they fit the model to the data without averaging over populations, but calculated the quality of fit using the population-averaged data. Since we consider this inconsistent, we have followed the more standard approach of calculating the quality of fit on the same input data used for the fit (i.e., correlating the blue and pink lines in Fig. S11A with the grey points in the same plot). As a consequence, the correlations between the fitted models and data reported in the original publication (hyperbolic R = 0.969, power law R = 0.986; see [29]) are systematically higher than the values we find in our re-analysis (hyperbolic R = 0.826, power law R = 0.837; the square roots of the R² values in Fig. S11A). Although the long-term fitness dynamics in the case of the LTEE are robust to the choice of fitness statistic, we note that it is possible to construct scenarios of microbial growth trait evolution where relative fitness saturates in one statistic but not in the other (Fig. 3). For theoretical work on the long-term trend and on models other than the power law or hyperbolic model tested here, see [34–36].

S9. Testing AUC and other fitness potentials using simulated competition experiments

In this section we explain our tests of estimated fitness potentials against true relative fitness in simulated competition experiments. We first focus on the area under the growth curve (AUC), which is defined as

where N(t) is a growth curve of absolute abundance (or a proxy such as optical density) and t_eval is a cut-off time for evaluating the area. Many previous studies [12, 15, 16, 37, 38] and growth curve analysis packages [39] have used this definition. The idea of AUC is that, unlike estimated fitness potentials that only account for individual traits of a genotype’s growth (e.g., growth rate or lag time; see below), the AUC literally integrates the whole growth dynamics into a single number. For example, both fast growth rate and short lag time are manifested in greater AUC. Note that while one can attempt to use the AUC as an approximate fitness potential (as we investigate here), it is not a measure of absolute fitness (Sec. S3) since it is insufficient to predict changes in absolute abundance (i.e., the area under the growth curve does not determine the change in absolute abundance from beginning to end).
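A minimal sketch of Eq. (S47) using the trapezoidal rule on a sampled curve follows. Restricting the calculation to samples with t ≤ t_eval, rather than interpolating exactly at t_eval, is a simplification we make here.

```python
import numpy as np


def area_under_curve(t, N, t_eval):
    """Trapezoidal approximation of the integral of N(t) from the first sample to t_eval."""
    t = np.asarray(t, dtype=float)
    N = np.asarray(N, dtype=float)
    keep = t <= t_eval
    t_w, N_w = t[keep], N[keep]
    return float(np.sum(0.5 * (N_w[1:] + N_w[:-1]) * np.diff(t_w)))


t = np.arange(0.0, 11.0, 1.0)
print(area_under_curve(t, t, t_eval=10.0))  # 50.0
```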

To compute the AUC for the set of single-gene deletion genotypes in our data set, we first simulate a growth curve for each genotype under the population dynamics model in Eq. (11) with the traits estimated from the original data (Methods). We use simulated growth curves, with trait values inferred from the measured growth curves, rather than the actual measured growth curves since we are comparing the AUCs to relative fitness also from simulations of competitions; we do not have actual competition data for these genotypes, so it would be an apples-to-oranges comparison if we used AUCs from the actual growth curve data. Furthermore, in empirical growth curves, the AUC is also influenced by technical variation in the initial biomass N₀ and the initial concentration of resources R₀, so using simulations removes these effects.

Figure S14A shows the distribution of saturation times t_sat (as defined for the population dynamics model in Methods) numerically calculated for all simulated growth curves of the deletion mutants. We calculate the AUC for each growth curve using Eq. (S47) with an evaluation time of t_eval = 16 hours, since that includes the stationary phase in the vast majority of our simulated growth curves (Fig. S14A). From the mutant’s AUC we calculate the AUC-based estimator of the mutant’s relative fitness

where AUC_wt is the AUC for a simulated growth curve of the wild-type (using the median wild-type traits in our database; see Methods). We then simulate a single growth cycle of the mutant competing against the wild-type (Eq. (11) with equal initial abundance of the mutant and wild-type) to calculate the mutant’s “true” (under the model assumptions) relative fitness per-cycle with the logit encoding. We simulate both the coculture and the monoculture growth dynamics with the same initial biomass N₀ = 0.05 OD and resource concentration R₀ = 111 mM [24]. Repeating this analysis for all mutant genotypes in the dataset, we calculate the Spearman rank correlation between the AUC-predicted relative fitness from monoculture ŝ_AUC and the true relative fitness in competition (Fig. S13, column C).
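The pipeline in this paragraph can be sketched end to end under several simplifying assumptions. First, we assume that Eq. (11) describes a lag phase, exponential growth at rate g, and a plateau once the resource is exhausted, with resource consumption of 1/Y per unit biomass. Second, we take ŝ_AUC = AUC_mut/AUC_wt − 1 as a hypothetical form; the Spearman correlation is unchanged by any increasing transformation of AUC_mut/AUC_wt. Third, the trait values are hypothetical. N₀ = 0.05 OD, R₀ = 111 mM, and t_eval = 16 hours are the values quoted in the text.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import spearmanr

N0, R0 = 0.05, 111.0  # initial OD and resource (mM), as in the text


def monoculture_curve(t, g, lag, Y):
    """Lag, then exponential growth until the resource is exhausted, then a plateau."""
    t_sat = lag + np.log1p(R0 * Y / N0) / g
    return N0 * np.exp(g * (np.clip(t, lag, t_sat) - lag))


def coculture_s_cycle(mut, wt):
    """Per-cycle logit relative fitness, LFC_mut - LFC_wt, in a 50:50 competition."""
    # mut and wt are (growth rate, lag, yield) tuples
    g, lag, Y = (np.array([m, w], dtype=float) for m, w in zip(mut, wt))
    n0 = np.array([N0 / 2, N0 / 2])

    def consumed_minus_R0(t):
        return np.sum(n0 * np.expm1(g * np.maximum(0.0, t - lag)) / Y) - R0

    t_hi = lag.max() + 1.0 / g.min()
    while consumed_minus_R0(t_hi) <= 0:
        t_hi *= 2.0
    t_sat = brentq(consumed_minus_R0, 0.0, t_hi)
    lfc = g * np.maximum(0.0, t_sat - lag)
    return lfc[0] - lfc[1]


def auc_rank_correlation(mutants, wt, t_eval=16.0, dt=0.01):
    """Spearman correlation between the AUC estimator and the simulated per-cycle fitness."""
    t = np.arange(0.0, t_eval + dt / 2, dt)

    def auc(curve):
        return np.sum(0.5 * (curve[1:] + curve[:-1]) * np.diff(t))

    auc_wt = auc(monoculture_curve(t, *wt))
    s_auc = [auc(monoculture_curve(t, *m)) / auc_wt - 1.0 for m in mutants]
    s_true = [coculture_s_cycle(m, wt) for m in mutants]
    rho, _ = spearmanr(s_auc, s_true)
    return rho


# Hypothetical wild-type: growth rate 0.5 per hour, lag 2 hours, yield 0.01 OD per mM
wt = (0.5, 2.0, 0.01)
mutants = [(g, 2.0, 0.01) for g in np.linspace(0.45, 0.55, 11)]  # only growth rate varies
print(auc_rank_correlation(mutants, wt))
```

Varying lag times and yields in the mutant list, or changing t_eval, is one way to explore the trade-off described in the next paragraph.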

The success of the AUC depends on the evaluation time t_eval, which sets the time window over which the integral (Eq. (S47)) captures information from the growth curve. Figure S14B–D shows the correlation between the relative fitness in coculture and the AUC estimator ŝ_AUC (Eq. (S48)) for three values of t_eval: the mean saturation time of all genotypes in monoculture (t_eval ≈ 13 hours), a significantly longer value (t_eval ≈ 24 hours), and an intermediate value (t_eval ≈ 16 hours; used for Fig. S13). For short evaluation times (t_eval ≈ 13 hours), the AUC underestimates the fitness of mutants with long lag but fast growth, which leads to a nonlinear relationship between the AUC estimator and the true relative fitness (compare Fig. S14B and Fig. S14C). For long evaluation times (t_eval ≈ 24 hours), there is greater scatter between the AUC predictor and true relative fitness for the highest-fitness mutants (compare the spread in Fig. S14D and Fig. S14C). Intuitively, these mutants have short lag time or fast growth rate and saturate early in monoculture, so their AUC values are effectively set by the biomass yield, which has no predictive value for the competition outcome. In summary, Fig. S14B–D shows a trade-off between accurately ranking highly deleterious mutants (which needs long t_eval) and ranking highly beneficial mutants (which needs short t_eval).

Besides AUC it is possible to use other features of the monoculture growth curve as approximate fitness potentials. For example, one can use the monoculture growth rates alone as fitness potentials:

or the monoculture lag times:

Another possible fitness potential is the absolute abundance at saturation, which in our model of population dynamics (Eq. (11)) is proportional to the biomass yield:

Finally, one can also use the difference in the monoculture log fold-changes (LFCs):

This looks similar to the definition of relative fitness per-cycle (Eq. (S30)) but is distinct because it uses the LFCs in monoculture rather than the true LFCs in coculture (which may be different).

We test each of these fitness potentials against the relative fitness in pairwise competition, using different input datasets of trait variation and the ‘GNU Parallel’ command to speed up the simulation process [40]. Figure S13 rows B and C show that growth rate and lag time can act as perfect fitness potentials if that trait is the only trait with variation across mutants. This is because the relative fitness is proportional to differences in each of these traits when they are the only source of variation (see Sec. S13). Section S13 also shows that differences in biomass yield (Y_mut − Y_wt) have no effect on fitness by themselves, which is why the biomass yield of the strains in monoculture is a poor fitness potential to estimate relative fitness. This large but neutral variation in the biomass yield across mutants in our datasets means that the LFC is also a poor fitness potential.

All of these trait-based fitness potentials are outperformed by the AUC, which provides the best approximation of the mutant fitness ranking in coculture under realistic trait variation (Fig. S13, row A). More broadly, it is important to treat absolute and relative fitness, as well as fitness potentials, as distinct concepts, serving different purposes [1, 41]. As we show here in simulation (Fig. S13), and others have shown in experiments [15, 16, 42, 43], measuring fitness potentials is not enough to demonstrate that a mutant genotype will outcompete the wild-type.

S10. Coarse-graining pairwise relative fitness in multi-genotype populations

In this section, we point out the specific advantages of the logit encoding for coarse-graining pairwise relative fitness in bulk competition experiments. While the pairwise relative fitness is defined for any encoding m, the logit encoding endows it with some convenient mathematical properties not shared by other encodings (e.g., the log encoding). The logit encoding of the pairwise relative abundance has the property

meaning that it is antisymmetric under exchange of the indices i and j (logit x_ij = − logit x_ji) and additive across pairs of indices (logit x_ij = logit x_ik + logit x_kj). Since the logit-encoded pairwise relative fitness is just the time derivative of the logit function (Eq. (19)), it carries equivalent properties of antisymmetry and additivity:

We also note that the logit encoding of the pairwise relative abundance has the property:

Rescaling the relative abundances of either genotype thus does not change the pairwise relative fitness (since it only shifts the logit by a constant, which does not affect its derivative). This means that pairwise relative fitness is an “intensive” property of a genotype, analogous to intensive properties in statistical mechanics (such as temperature) that do not scale with system size. For example, if we split a mutant genotype into two subgroups (e.g., differentiated by a neutral marker), the pairwise relative fitness of each mutant subgroup with respect to the wild-type will be the same as the pairwise relative fitness of the mutant genotype as a whole compared to the wild-type. In contrast, the logit-encoded total relative fitness does not satisfy this property, since logit(a x_i) ≠ logit x_i + constant.
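These properties are easy to check numerically. The sketch assumes the pairwise relative abundance is x_ij = x_i/(x_i + x_j), so that logit x_ij = ln(x_i/x_j), and it uses hypothetical abundances.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def pairwise(x, i, j):
    """Pairwise relative abundance of genotype i with respect to genotype j."""
    return x[i] / (x[i] + x[j])


x = np.array([0.2, 0.5, 0.3])  # hypothetical relative abundances of genotypes 0, 1, 2

# Antisymmetry: logit x_01 = -logit x_10
antisymmetric = np.isclose(logit(pairwise(x, 0, 1)), -logit(pairwise(x, 1, 0)))
# Additivity: logit x_01 = logit x_02 + logit x_21
additive = np.isclose(logit(pairwise(x, 0, 1)),
                      logit(pairwise(x, 0, 2)) + logit(pairwise(x, 2, 1)))

# Rescaling genotype 0 by a (e.g., keeping one neutral subgroup) shifts logit x_01 by log(a);
# pairwise abundances do not require the rescaled vector to sum to 1
a = 0.25
x_scaled = x.copy()
x_scaled[0] *= a
shift_pairwise = logit(pairwise(x_scaled, 0, 1)) - logit(pairwise(x, 0, 1))


def total_logit_shift(xi):
    """Change in the logit of a total relative abundance xi when it is rescaled by a."""
    return logit(a * xi) - logit(xi)


print(antisymmetric, additive, np.isclose(shift_pairwise, np.log(a)))
print(total_logit_shift(0.2), total_logit_shift(0.5))  # shifts differ, so not a constant
```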

When the encoding m is the logit function, the pairwise relative fitness per-cycle still satisfies the above properties (antisymmetry and additivity with respect to indices i, j, and invariance under relative abundance rescaling) since those are properties of the underlying logit encoding. This is also apparent from interpreting the per-cycle fitness statistic as an integral of the instantaneous statistic:

S11. The relative fitness between coarse-grained groups of genotypes

In this section, we generalize the concept of pairwise relative fitness to pairs of genotype groups rather than pairs of individual genotypes. In a multi-genotype population with non-overlapping subsets of genotypes 𝒜 and ℬ, define

as the relative abundance of 𝒜 genotypes compared to ℬ genotypes at time t. Analogous to Eq. (19), we define the fitness of group 𝒜 relative to group ℬ as

for an encoding m. Under the logit encoding, it turns out that the fitness between these two groups can be conveniently expressed as a weighted average of the pairwise fitness between the member genotypes in each group:

To prove Eq. (S60), we first note that

Thus we can rewrite the logit-encoded relative pairwise fitness between 𝒜 and ℬ as

where we have invoked Eq. (S61) on the second line. We can expand each term on the right-hand side of Eq. (S62) as

and then insert Eq. (S63) into Eq. (S62) to obtain

where we have collected the normalization factors (sums over relative abundances in 𝒜 and ℬ) as a single prefactor. We then rewrite each product of sums in Eq. (S64) as

to finally obtain

Identifying the term in the inner parentheses as the pairwise selection coefficient (Eq. (19)) then results in Eq. (S60).
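Equation (S60) can be checked numerically in a simple case. The sketch assumes exponential growth at constant, hypothetical rates r_i, for which the instantaneous pairwise fitness is s_ij = r_i − r_j. It compares a finite-difference derivative of logit x_𝒜ℬ = ln(X_𝒜/X_ℬ) with the average of the s_ij weighted by x_i x_j/(X_𝒜 X_ℬ), where X_𝒜 and X_ℬ are the summed relative abundances of each group. These weights are our reading of Eq. (S60).

```python
import numpy as np


def group_fitness_direct(N0, r, A, B, h=1e-6):
    """d/dt logit x_AB at t = 0 by central finite differences, for N_i(t) = N0_i exp(r_i t)."""
    N0, r = np.asarray(N0, dtype=float), np.asarray(r, dtype=float)

    def logit_AB(t):
        N = N0 * np.exp(r * t)
        # logit of X_A / (X_A + X_B) equals log(X_A / X_B)
        return np.log(N[A].sum() / N[B].sum())

    return (logit_AB(h) - logit_AB(-h)) / (2.0 * h)


def group_fitness_weighted(N0, r, A, B):
    """Weighted average of pairwise fitnesses s_ij = r_i - r_j over i in A, j in B."""
    N0, r = np.asarray(N0, dtype=float), np.asarray(r, dtype=float)
    x = N0 / N0.sum()
    s = r[:, None] - r[None, :]
    weights = np.outer(x[A], x[B]) / (x[A].sum() * x[B].sum())
    return float(np.sum(weights * s[np.ix_(A, B)]))


N0 = [0.2, 0.3, 0.4, 0.1]  # hypothetical abundances
r = [1.0, 0.8, 0.9, 1.2]   # hypothetical growth rates
A, B = [0, 1], [2, 3]
print(group_fitness_direct(N0, r, A, B), group_fitness_weighted(N0, r, A, B))
```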

Equation (S60) establishes that the relative fitness between a pair of genotype groups is a weighted sum of relative fitnesses between individual pairs of genotypes in those groups, but this holds for relative fitness defined at an instant in time (since it is based on derivatives). For relative fitness defined over a finite time interval (e.g., a growth cycle in batch culture), an analogous but approximate result holds. We first write the relative fitness over a growth cycle time interval as an integral over the instantaneous relative fitness (inserting Eq. (S60) into Eq. (S57)):

where t_sat is the end time of the growth cycle (Methods, Sec. S7). The integral in Eq. (S67) is difficult to calculate because the relative abundance trajectories x_i(t), x_j(t) depend on the relative fitness of the genotypes in a non-trivial way. Instead, we make the approximation that the relative abundances do not change significantly over the growth cycle (x_i(t) ≈ x_i(0)). We can then pass the integral through the sums in Eq. (S67) to show that the per-cycle relative fitness between a pair of genotype groups is also approximately a weighted sum of per-cycle relative fitnesses:

Conceptually, assuming the relative abundances are approximately constant over the growth cycle is equivalent to assuming selection is weak. One can also show this mathematically by expressing the relative abundances x_i(t) in terms of the pairwise relative fitnesses in Eq. (S67) and keeping only terms to leading order in the pairwise relative fitness.
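The weak-selection approximation in Eq. (S68) can be probed in the same way. In this sketch, each genotype's abundance is multiplied by exp(LFC_i) over the cycle, so the exact per-cycle fitness between groups is the change in ln(X_𝒜/X_ℬ). The approximation instead weights the pairwise per-cycle fitnesses LFC_i − LFC_j by the initial abundances. The abundances and LFCs are hypothetical, and delta scales the strength of selection.

```python
import numpy as np


def group_fitness_cycle_exact(x0, lfc, A, B):
    """Per-cycle logit fitness of group A relative to group B, from each genotype's LFC."""
    x0, lfc = np.asarray(x0, dtype=float), np.asarray(lfc, dtype=float)
    xT = x0 * np.exp(lfc)  # unnormalized final abundances (normalization cancels in the ratio)
    return np.log(xT[A].sum() / xT[B].sum()) - np.log(x0[A].sum() / x0[B].sum())


def group_fitness_cycle_approx(x0, lfc, A, B):
    """Weighted sum of pairwise per-cycle fitnesses LFC_i - LFC_j, with initial-abundance weights."""
    x0, lfc = np.asarray(x0, dtype=float), np.asarray(lfc, dtype=float)
    s = lfc[:, None] - lfc[None, :]
    weights = np.outer(x0[A], x0[B]) / (x0[A].sum() * x0[B].sum())
    return float(np.sum(weights * s[np.ix_(A, B)]))


def relative_error(delta):
    """Relative error of the approximation; group A = {0, 1}, group B = {2} (hypothetical)."""
    x0 = [0.2, 0.3, 0.5]
    lfc = 1.0 + delta * np.array([0.5, -0.2, 0.0])
    A, B = [0, 1], [2]
    exact = group_fitness_cycle_exact(x0, lfc, A, B)
    approx = group_fitness_cycle_approx(x0, lfc, A, B)
    return abs(approx - exact) / abs(exact)


print(relative_error(0.001), relative_error(1.0))  # small for weak selection, large for strong
```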

We finally note that the total relative fitness (instantaneous Eq. (17) and per-cycle Eq. (20)) is a special case of the relative fitness between groups (Eqs. (S60) and (S68)), where 𝒜 is the single genotype i and ℬ is all other genotypes besides i:

S12. Fitness error from the frame of reference in bulk competition experiments

Here we calculate the error that arises from measuring the total relative fitness of each mutant in a bulk competition experiment of a mutant library, rather than the pairwise relative fitness between each mutant and the wild-type. We call this difference the error from the frame of reference, the frame of reference being either the whole population in total fitness or the wild-type in pairwise fitness. Note that this is an error between two different fitness quantifications of the same bulk competition experiment; Sec. S14 addresses the error (arising from higher-order interactions) between fitness quantifications in bulk versus pairwise competition experiments. Here we only consider relative fitness under the logit encoding and measured per growth cycle, so we drop these labels to simplify notation.

Consider a bulk competition experiment of a wild-type and a library of a large number of mutants over a single batch growth cycle of duration t_sat (Methods). The total relative fitness of mutant i is (Eq. (20))

while its pairwise relative fitness compared to the wild-type is (Eq. (21))

Using the coarse-graining rules from Sec. S11 (namely Eqs. (S68) and (S70)), we can express the total relative fitness of mutant i as a weighted sum of the pairwise relative fitnesses between the mutant and the wild-type and between the mutant and the rest of the mutant library:

where the notation lib \ i refers to the mutant library excluding mutant i. The approximation here comes from our assumption in Eq. (S68) that selection is weak enough that the relative abundances of genotypes do not change much over the growth cycle. Since we can rewrite the pairwise relative fitness between mutant i and the rest of the library as the difference between two pairwise relative fitnesses, that of mutant i relative to the wild-type and that of the rest of the library relative to the wild-type (using Eq. (S55))

we insert this into Eq. (S73) to obtain

The difference between the total and pairwise relative fitness is therefore

Since mutant libraries in these experiments typically contain hundreds or thousands of mutants, the contribution of a single mutant i is small and thus we can assume that the properties of the mutant library excluding mutant i (lib \ i) are approximately the same as the library as a whole (so that x_lib\i ≈ x_lib, s_lib\i,wt ≈ s_lib,wt). This allows us to further simplify the error as

Equation (S77) shows why this offset between the total and pairwise relative fitnesses is approximately independent of the focal mutant i (hence the constant shift for all mutant points in main text Fig. 4C). The sign of the error from the frame of reference depends on the mean fitness of the mutant library: a mutant library that is overall deleterious relative to the wild-type (s_lib,wt < 0) causes the total relative fitness σ_i of a mutant to overestimate its pairwise relative fitness s_i,wt. Intuitively, this is because the total relative fitness compares the mutant to a mixed population of wild-type and other mutants, which on average are worse competitors than the wild-type; the mutant therefore appears better than it would if compared to the wild-type alone (compare the top and bottom panels in Fig. 4B). Equation (S77) also shows that the error from the frame of reference is reduced if the mutant library is neutral relative to the wild-type (s_lib,wt = 0) or if the mutant library makes up a small fraction of the culture biomass (x_lib ≪ 1). In bulk competition experiments with barcoded mutant libraries, these conditions are often not met, since the mutant libraries tend to be overall deleterious (as in our simulated bulk competition for the yeast single-gene knockouts, see Fig. S6) and are inoculated at high relative abundance [44–48]. Since this makes the error in Eq. (S77) significant, we instead recommend including barcoded wild-type strains as references in the bulk competition, so that pairwise fitness can be quantified relative to them (Eq. (S72)) rather than using the total relative fitness. Using a mix of barcoded and non-barcoded wild-type cells can further optimize this protocol and reduce sequencing costs [49].
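The constant offset can be checked in a toy bulk competition. Combining the weighted-sum rule with the difference rule of Eq. (S55) suggests, to first order, σ_i − s_i,wt ≈ −x_lib s_lib,wt; this explicit form is our reading of the argument above and is consistent with the sign and limits described for Eq. (S77), not a quotation of it. The sketch below assumes pure exponential growth for a fixed time (no lag or saturation) and a hypothetical library of 1000 deleterious mutants at equal initial abundance, then compares the exact error for every mutant with this first-order prediction.

```python
import math
import random

def logit(x):
    return math.log(x / (1.0 - x))

def frame_of_reference_errors(g_wt, g_mut, x_lib0, t):
    """Return (sigma_i - s_i,wt for each mutant, s_lib,wt) for a toy bulk culture
    that grows exponentially for time t (no lag, no saturation). Initial total = 1."""
    n_wt0 = 1.0 - x_lib0
    n_mut0 = [x_lib0 / len(g_mut)] * len(g_mut)   # equal initial mutant abundances
    n_wt = n_wt0 * math.exp(g_wt * t)
    n_mut = [n * math.exp(g * t) for n, g in zip(n_mut0, g_mut)]
    total = n_wt + sum(n_mut)
    errors = []
    for n_i0, n_i in zip(n_mut0, n_mut):
        sigma_i = logit(n_i / total) - logit(n_i0)                # total relative fitness
        s_i_wt = math.log(n_i / n_wt) - math.log(n_i0 / n_wt0)   # pairwise vs wild-type
        errors.append(sigma_i - s_i_wt)
    s_lib_wt = math.log(sum(n_mut) / n_wt) - math.log(x_lib0 / n_wt0)
    return errors, s_lib_wt

random.seed(0)
g_wt = 0.40                                   # hypothetical wild-type growth rate (1/h)
g_mut = [g_wt - abs(random.gauss(0.0, 0.01)) for _ in range(1000)]  # deleterious library
x_lib0 = 0.5
errors, s_lib_wt = frame_of_reference_errors(g_wt, g_mut, x_lib0, t=10.0)
print(f"s_lib,wt = {s_lib_wt:+.4f}")
print(f"sigma_i - s_i,wt ranges over [{min(errors):.4f}, {max(errors):.4f}]")
print(f"first-order prediction -x_lib * s_lib,wt = {-x_lib0 * s_lib_wt:.4f}")
```

Every mutant receives nearly the same positive offset, matching the intuition that a deleterious library makes each mutant look fitter than it is relative to the wild-type.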

Finally, we point out a difference between the best practice we recommend here (Discussion) and a widespread practice for estimating fitness in bulk competition experiments. Transposon-seq experiments that grow the mutant library by itself typically start with an estimate of total relative fitness and then subtract the median total relative fitness of the knockouts [45–47, 50] or a mean total relative fitness [44, 51, 52]. However, these corrections are either not explicitly founded in the choice of a reference group (like a set of neutral genotypes or a wild-type), which makes the correction appear ad hoc [45, 47, 50], or the reference is a strain that is not part of the culture, as in fitness estimates for barcoded lineages that are evaluated against the initial ancestor without that ancestor actually being present [44, 51, 52]. Adding to the confusion, even studies that do include a wild-type describe their method as the total relative fitness of the mutant under the log encoding minus the total relative fitness of the wild-type under the log encoding [49, 53, 54]. The result is a pairwise relative fitness under the logit encoding (Eq. (23)), but presenting it this way obscures the choice of encoding and the relationship to the classic, logit-based selection coefficient used in pairwise competition experiments (Eq. (6)). We hope that our framework provides more clarity: the choice of the reference group happens at the level of relative abundance, by calculating a pairwise relative abundance (Eq. (18) or Eq. (S58)), and this removes the need for any correction on the fitness values themselves.
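The identity behind this remark is elementary: subtracting the wild-type's log-encoded total relative fitness from the mutant's cancels the population-wide normalization and leaves the change in ln(x_mut/x_wt), which is the logit-encoded pairwise relative fitness. A minimal check with hypothetical relative abundances (the function names are illustrative only):

```python
import math

def total_fitness_log(x0, x1):
    """Total relative fitness under the log encoding: change in ln(x) over a cycle."""
    return math.log(x1) - math.log(x0)

def pairwise_fitness_logit(xi0, xi1, xj0, xj1):
    """Pairwise relative fitness under the logit encoding: change in ln(x_i / x_j)."""
    return math.log(xi1 / xj1) - math.log(xi0 / xj0)

# hypothetical relative abundances at the start and end of one growth cycle
x_mut0, x_mut1 = 0.002, 0.0015
x_wt0, x_wt1 = 0.10, 0.13

corrected = total_fitness_log(x_mut0, x_mut1) - total_fitness_log(x_wt0, x_wt1)
pairwise = pairwise_fitness_logit(x_mut0, x_mut1, x_wt0, x_wt1)
print(corrected, pairwise)   # equal up to floating-point rounding
```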

S13. Pairwise relative fitness using an explicit model of population dynamics

In the model of population dynamics (Methods, Eq. (11)), we can calculate the pairwise relative fitness of genotypes based on the approximation of the saturation time (Sec. S7; [27, 28]). The pairwise relative fitness of genotype i relative to genotype j (per-cycle and under the logit encoding) is

where

The e-fold growth time for genotype j is τ_j = 1/g_j, and the terms Δλ_ij = λ_i − λ_j and Δτ_ij = τ_i − τ_j are the differences between the two genotypes' lag times and e-fold growth times. Since the terms in Eq. (S78d) depend on the covariation between growth and lag, we interpret these terms as couplings between the growth and lag phases; they are zero if only two genotypes are present.

S14. Fitness error from higher-order interactions between pairwise and bulk competition experiments

In this section, we calculate the error in relative fitness of a mutant arising from higher-order interactions in bulk competition experiments with large mutant libraries, compared to the “true” relative fitness in pairwise competitions between just the focal mutant and wild-type alone. Here we only consider relative fitness under the logit encoding and measured per growth cycle, so we drop these labels to simplify notation. Let the pairwise relative fitness of mutant i compared to the wild-type (Eq. (21)) be

in the pairwise competition with the wild-type alone, and

in the bulk competition with all other mutants in the library. The superscripts pair and bulk indicate that the dynamics of x_i,wt(t) and the saturation time t_sat may be different in the two competitions. The difference between these two measurements of relative fitness is

Since the difference between the bulk and pairwise competitions is the presence of the other mutants in the library, we interpret this difference as the fitness error from higher-order interactions among the mutants.

We now calculate how this error depends on the underlying growth traits of the genotypes using the population dynamics model (Methods, Eq. (11)) and the explicit expression for relative fitness from Sec. S13. Based on the approximate pairwise relative fitness in this model (Eq. (S78)), the relative fitness in the pairwise competition is the sum of two terms

while the relative fitness in the bulk competition is the sum of three terms

where the third term represents the coupling between growth and lag phases present only in populations with more than two genotypes (Eq. (S78d)). We define the higher-order effects on the selection for lag time as

Using Eq. (S78b), we can express this in terms of the underlying traits as

where is the difference in effective e-fold growth times (Eq. (S40)) in the bulk competition and in the pairwise competition of mutant i and the wild-type.

We similarly define the higher-order effects in the selection on growth rate as

and calculate and from Eq. (S78c) to get the expression

where LFC^bulk is the log fold-change (Eq. (S42)) of the total biomass in the bulk competition and LFC^pair,i is the log fold-change of the total biomass in the pairwise competition between mutant i and the wild-type. We note that total biomass growth in the bulk competition LFC^bulk depends on the mutant library through the effective biomass yield (Eq. (S41)), but this dependence is weak because the yield enters only logarithmically into Eq. (S42). We thus assume that the bulk and pairwise competition have equal biomass growth (LFC^bulk ≈ LFC^pair,i) for all mutants i such that the higher-order effect on the growth rate selection (Eq. (S87)) is proportional to the increase in the mean doubling time

Since this has the same form as the higher-order effect on lag time selection in Eq. (S85), we can combine Eqs. (S88) and (S85) into

Because the effective growth time in the pairwise competition is approximately just the effective growth time of the wild-type alone τ_wt (assuming the mutant is competed against it at low initial relative abundance), this means that

That is, the relative error of lag and growth selection from higher-order interactions is approximately independent of the individual mutant and instead depends on properties of the wild-type (τ_wt) and of the whole mutant library (through the effective e-fold growth time of the bulk competition). This is why we observe an approximately constant slope across all mutants in Fig. S16 (dark orange points) when comparing the error against the pairwise relative fitness. The slight deviation from the constant slope is due to our approximation that the LFCs are the same in the bulk and pairwise competitions; this approximation is good but not exact for the yeast deletion library, and hence causes a slightly different scaling between the growth (Δs_growth) and lag (Δs_lag) error terms compared to Eq. (S90).

Finally, we note that the growth-lag coupling terms do not have a simple scaling with pairwise relative fitness since they depend quadratically on trait differences; this is shown by the bright orange points in Fig. S16. In the main text, we therefore refer to Δs_i,wt,lag + Δs_i,wt,growth as the fitness-dependent error and to the growth-lag coupling term as the fitness-independent error from higher-order interactions.

S15. Choosing the mutant library abundance in bulk competition experiments

Section S14 showed that measuring relative fitness of a mutant in a bulk competition (with a library of other mutants also present) entails an error due to higher-order interactions among the mutants, compared to measuring relative fitness in a pairwise competition consisting of just the mutant and wild-type. Here we show how this error depends on the relative abundance of the mutant library in the bulk competition, so that we can estimate the range of library abundances that keep the error below a desired threshold.

Calculating the relative error on fitness

Let the absolute error in relative fitness from higher-order interactions for a mutant i be Δs_i,wt (Eq. (S81)), and let the relative fitness of this mutant in a pairwise competition be s^pair_i,wt (Eq. (S79)). Since the error Δs_i,wt depends on the relative abundance x_lib of the whole mutant library, our goal is to determine what range of x_lib keeps the error in relative fitness (compared to the “true” relative fitness in the pairwise competition) below a chosen threshold ϵ:

In Sec. S14 we calculated the dependence of Δs_i,wt on the underlying growth traits. However, since here we are mainly concerned with the dependence on the library abundance x_lib, we present an alternative calculation that better captures that dependence.

We start by pointing out that we can express the relative fitness in pairwise (Eq. (S79)) and bulk competitions (Eq. (S80)) in terms of the saturation times for these competitions, using the explicit solution to the population dynamics model in Eq. (14):

We can thus express the error from higher-order interactions (Eq. (S81)) as

where Δg_i,wt = g_i − g_wt. Equation (S94) shows that the mutant library affects the fitness of individual mutants by changing the saturation time. Mathematically, this is equivalent to the results of Sec. S14, which showed that the higher-order effects are mediated primarily by the difference in effective e-fold growth times between bulk and pairwise competitions; here the same effect is expressed in terms of the saturation time t_sat, which is not identical to the effective e-fold growth time but is related to it through Eq. (S38). The error in Eq. (S94) is also proportional to the growth rate advantage of the mutant compared to the wild-type. In particular, mutant genotypes that differ only in lag time are not affected by the mutant library, since their advantage is accrued once at the beginning of the growth cycle and therefore does not scale with the total time of competition.
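Equation (S94) can be reproduced in a minimal simulation. The sketch below assumes a simplified lag-exponential model in which each genotype grows at rate g after its lag time λ and all growth stops once the total biomass reaches a fixed saturation value (equal yields). This is a stand-in for, not a reproduction of, the model in Methods Eq. (11), and all trait values and helper names are hypothetical. Mutant A has a growth advantage, mutant B differs from the wild-type only in lag time, and a slower-growing library makes up about half of the bulk culture.

```python
import math

def biomass(t, n0, g, lag):
    # Lag-exponential growth: no growth before the lag time, exponential after.
    return sum(n * math.exp(r * max(0.0, t - l)) for n, r, l in zip(n0, g, lag))

def saturation_time(n0, g, lag, n_sat, t_hi=200.0):
    # Bisection on the time at which total biomass reaches n_sat
    # (biomass increases with t, so the crossing is unique).
    t_lo = 0.0
    for _ in range(100):
        t_mid = 0.5 * (t_lo + t_hi)
        if biomass(t_mid, n0, g, lag) < n_sat:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)

def s_vs_wt(g_i, lag_i, g_wt, lag_wt, t_sat):
    # Per-cycle logit relative fitness: change in ln(N_i / N_wt) up to t_sat
    # (valid when t_sat exceeds both lag times).
    return g_i * (t_sat - lag_i) - g_wt * (t_sat - lag_wt)

N_SAT = 100.0                                         # total biomass starts at 1.0
G = {"wt": 0.40, "A": 0.42, "B": 0.40, "lib": 0.36}   # growth rates (1/h)
LAG = {"wt": 2.0, "A": 2.0, "B": 1.5, "lib": 2.0}     # lag times (h)

def t_sat_for(abundances):
    names = list(abundances)
    return saturation_time([abundances[k] for k in names],
                           [G[k] for k in names], [LAG[k] for k in names], N_SAT)

t_bulk = t_sat_for({"wt": 0.5, "A": 0.005, "B": 0.005, "lib": 0.49})
results = {}
for m in ("A", "B"):
    t_pair = t_sat_for({"wt": 0.99, m: 0.01})
    error = (s_vs_wt(G[m], LAG[m], G["wt"], LAG["wt"], t_bulk)
             - s_vs_wt(G[m], LAG[m], G["wt"], LAG["wt"], t_pair))
    predicted = (G[m] - G["wt"]) * (t_bulk - t_pair)   # Delta g_i,wt * (t_bulk - t_pair)
    results[m] = (error, predicted)
    print(f"mutant {m}: error = {error:+.5f}, Delta g * Delta t_sat = {predicted:+.5f}")
```

In this simplified model the identity is exact whenever t_sat exceeds all lag times, since ln(N_i/N_wt) changes by g_i(t_sat − λ_i) − g_wt(t_sat − λ_wt); the lag-only mutant B has zero error, and the slower library delays saturation so that mutant A's fitness is overestimated.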

According to Eq. (S38), the saturation time t_sat depends on the effective lag time (Eq. (S39)), effective e-fold growth time (Eq. (S40)), and the log fold-change LFC (Eq. (S42)) for the competition. To simplify the calculation, we introduce a few assumptions. We assume all mutants have the same yields Y_i such that the LFCs in the bulk and pairwise competitions are identical (LFC^bulk = LFC^pair). We also assume that the relative abundance of the mutant in the pairwise competition is small enough that the saturation time in that case is set entirely by the wild-type traits:

Note that this means that the difference in saturation times is independent of the specific mutant i; the only dependence of the mutant i on the overall error from higher-order interactions is through the difference in growth rates in Eq. (S94).

We can thus write the fitness error as

The dependence on the mutant library relative abundance x_lib is contained within the effective lag time and the effective e-fold growth time of the bulk competition. Using the definitions in Eqs. (S39) and (S40), we can show that

Inserting Eqs. (S97) and (S98) into Eq. (S96), we obtain

where Δλ_lib,wt = λ_lib − λ_wt is the difference in lag times, and Δτ_lib,wt = τ_lib − τ_wt the difference in e-fold growth times (reciprocal growth rates), between the library (defined according to Eq. (S39)) and the wild-type. All the dependence on the library relative abundance x_lib is now contained in the fraction outside the square brackets in Eq. (S99).

Since the relative fitness of the mutant library as a whole compared to the wild-type in bulk competition is (using Eq. (S78))

we can rewrite Eq. (S99) as

This shows that a mutant library that is neutral relative to the wild-type in the bulk competition removes the error from higher-order interactions, even when that neutrality is based on a trade-off between growth rates and lag times (Δλ_lib,wt = −Δτ_lib,wt LFC^bulk). We note that the error from higher-order interactions (Eq. (S101)) appears similar to the error from the frame of reference (Eq. (S77)), but the former includes the relative mutant growth rate Δg_i,wt/g_wt as an additional prefactor.

Calculating the upper bound on relative abundance of the mutant library

To calculate the maximum relative library abundance that keeps the relative error below the threshold ϵ, we input the absolute error in Eq. (S99) into the relative error bound of Eq. (S91) and rearrange to isolate the dependence on the library abundance x_lib:

where

depends on the focal mutant i but does not depend on the library abundance x_lib. The left-hand side of Eq. (S102) varies between zero and one as a function of x_lib. This means that if ϵ A_i > 1, any value of x_lib will satisfy Eq. (S102), meaning that the relative error on mutant i from higher-order interactions will always be less than ϵ. For example, this holds when the mutant i has the same growth rate as the wild-type (Δg_i,wt = 0, which causes A_i → ∞), even if it varies in lag time and/or other mutants have variation in growth rates.

We thus next consider the case where ϵ A_i < 1. We multiply both sides of Eq. (S102) by the denominator on the left-hand side (always positive) to obtain

and then collect all the terms involving x_lib on the left hand side:

Since ϵ A_i < 1, the factor multiplying x_lib on the left-hand side of Eq. (S105) must be positive and thus we can divide both sides to obtain an upper bound on the library abundance x_lib such that the relative error on fitness is less than ϵ :

Since ϵ will typically be small, we can simplify the right-hand side of Eq. (S106) by approximating it to first-order in ϵ :

This approximation holds as long as ϵ A_i (g_lib − g_wt)/g_lib ≪ 1.

The maximum mutant library abundance determined by Eq. (S106) or (S107) is specific to a single mutant genotype i, meaning that the relative fitness errors for other mutants may exceed ϵ even if the inequality for mutant i is satisfied. To keep the relative fitness errors for all mutants less than ϵ, we need to choose a library abundance such that

This means that Eq. (S107) must be satisfied for all mutants j:

Thus we must determine the minimum value of A_j over all mutants j. Using the definition in Eq. (S103), A_j is minimized for the mutant with the minimum value of

where we have calculated the pairwise relative fitness in Eq. (S92) using the pairwise saturation time t_sat^pair from Eq. (S95). Mutants that minimize Eq. (S110) are those with trade-offs between lag times and growth rates such that their overall pairwise relative fitness with respect to the wild-type is zero. This makes sense, since these mutants will have a relative fitness close to zero in the pairwise competition but nonzero fitness relative to the wild-type in the bulk competition, as the relative selection on lag time and growth rate shifts (e.g., in Eq. (S78)).
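A toy lag-exponential model illustrates why near-neutral mutants with growth-lag trade-offs are the hardest case. The sketch below assumes each genotype grows at rate g after its lag time and that all growth stops when total biomass reaches a common saturation value; this is a simplified stand-in for the Methods model, and all trait values are hypothetical. Two mutants differ from the wild-type only in growth rate, while a third grows faster but lags longer, tuned to be approximately neutral in the pairwise competition. The relative error (s^bulk − s^pair)/s^pair is the same for both growth-only mutants, in line with a single library-abundance bound when only growth rates vary, but very large for the trade-off mutant.

```python
import math

def biomass(t, n0, g, lag):
    # Lag-exponential growth: no growth before the lag time, exponential after.
    return sum(n * math.exp(r * max(0.0, t - l)) for n, r, l in zip(n0, g, lag))

def saturation_time(n0, g, lag, n_sat, t_hi=200.0):
    # Bisection: total biomass increases with t, so the crossing is unique.
    t_lo = 0.0
    for _ in range(100):
        t_mid = 0.5 * (t_lo + t_hi)
        if biomass(t_mid, n0, g, lag) < n_sat:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)

def s_vs_wt(g_i, lag_i, g_wt, lag_wt, t_sat):
    # Change in ln(N_i / N_wt) from t = 0 to t_sat (valid when t_sat > both lags).
    return g_i * (t_sat - lag_i) - g_wt * (t_sat - lag_wt)

N_SAT, X_MUT = 100.0, 1e-6               # saturation biomass; focal mutant starts rare
G_WT, LAG_WT = 0.40, 2.0                 # hypothetical wild-type traits
G_LIB, LAG_LIB, X_LIB = 0.36, 2.0, 0.3   # hypothetical slower-growing library

# wild-type-only saturation time, used to tune the trade-off mutant to neutrality
t_wt = LAG_WT + math.log(N_SAT / (1.0 - X_MUT)) / G_WT
G_TRADE = 0.44
LAG_TRADE = t_wt - G_WT * (t_wt - LAG_WT) / G_TRADE
mutants = {"faster": (0.42, LAG_WT), "slower": (0.38, LAG_WT),
           "trade-off": (G_TRADE, LAG_TRADE)}

rel_error = {}
for name, (g_i, lag_i) in mutants.items():
    t_pair = saturation_time([1.0 - X_MUT, X_MUT], [G_WT, g_i], [LAG_WT, lag_i], N_SAT)
    t_bulk = saturation_time([1.0 - X_LIB - X_MUT, X_LIB, X_MUT],
                             [G_WT, G_LIB, g_i], [LAG_WT, LAG_LIB, lag_i], N_SAT)
    s_pair = s_vs_wt(g_i, lag_i, G_WT, LAG_WT, t_pair)
    s_bulk = s_vs_wt(g_i, lag_i, G_WT, LAG_WT, t_bulk)
    rel_error[name] = (s_bulk - s_pair) / s_pair
    print(f"{name:>9}: s_pair = {s_pair:+.2e}, relative error = {rel_error[name]:+.3g}")
```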

Special case of variation in growth rates only

To determine an even simpler estimate on the maximum mutant library abundance, we consider the special case where genotypes vary only in growth rates and not lag times. Using the definition in Eq. (S103), A_i = g_lib/(g_wt - g_lib) for all mutants i, and thus there is a single bound on library abundance for all mutants (using Eq. (S107)):

This is the same as Eq. (10) in the main text.

We test Eq. (10) using our simulated competition data for the yeast single-gene deletion library (Methods) with variation in all three traits: lag time, growth rate, and biomass yield (Fig. 2B-C). We compute the growth rates of the mutant library (g_lib = 0.389 h⁻¹ using Eq. (S40)) and the wild-type (g_wt = 0.406 h⁻¹). Using a relative error threshold of ϵ = 0.01, the maximum mutant library relative abundance according to Eq. (10) is x_lib = 24.6%. Figure 4D shows that this mutant library abundance indeed is able to keep the relative error below the threshold for mutants with high relative fitness, but, as expected, this mutant library abundance fails for mutants close to neutrality.

We furthermore compute the maximum library abundance with Eq. (S109), based on the precise trait variation in our dataset. The mutant library has an effective lag time λ_lib = 1.95 h (compared to the wild-type lag time λ_wt = 1.92 h) and an effective e-fold growth time τ_lib = 2.57 h (compared to the wild-type τ_wt = 2.46 h), leading to an overall negative relative fitness based on LFC^bulk = log 100 (a typical fold-change in our dataset) and the limit of low relative abundance of the library. Inserting these trait values into Eq. (S109) yields a much smaller maximum mutant library abundance. Figure 4D shows that this smaller mutant library abundance is able to keep the relative error below the desired threshold for all mutant genotypes. Note, however, that this is still not fully exact, since we have ignored the underlying variation in biomass yield in our trait dataset.
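As a quick consistency check on these numbers, the snippet below recovers the quoted growth rates from the e-fold growth times (g = 1/τ) and compares the library's lag difference with the lag change that the neutrality condition Δλ_lib,wt = −Δτ_lib,wt LFC^bulk (stated after Eq. (S101)) would require. It assumes that "log 100" denotes the natural logarithm. Because the library both lags longer and grows more slowly than the wild-type, no growth-lag trade-off offsets its growth deficit, consistent with its negative relative fitness.

```python
import math

# Trait values quoted in the text (hours)
tau_lib, tau_wt = 2.57, 2.46   # effective e-fold growth times
lag_lib, lag_wt = 1.95, 1.92   # effective lag times
lfc_bulk = math.log(100)       # assumption: "log" is the natural logarithm

g_lib, g_wt = 1.0 / tau_lib, 1.0 / tau_wt   # growth rate = 1 / e-fold growth time
d_lag = lag_lib - lag_wt
d_tau = tau_lib - tau_wt
# Neutrality would need the trade-off d_lag = -d_tau * LFC^bulk
lag_needed = -d_tau * lfc_bulk

print(f"g_lib = {g_lib:.4f} 1/h, g_wt = {g_wt:.4f} 1/h")
print(f"library lag excess: {d_lag:+.2f} h; lag change needed for neutrality: {lag_needed:+.2f} h")
```

The growth rates agree with the quoted g_lib = 0.389 h⁻¹ and g_wt = 0.406 h⁻¹ to within the rounding of the τ values.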


Author information

Justus Wilhelm Fink

Michael Manhart
