Quantifying microbial fitness in high-throughput experiments

Justus Wilhelm Fink; Michael Manhart

doi:10.7554/eLife.102635.2

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Reviewing Editor
Sara Mitri
University of Lausanne, Lausanne, Switzerland
Senior Editor
Aleksandra Walczak
CNRS, Paris, France

Reviewer #1 (Public review):

The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski's Long Term Evolution Experiment (LTEE), they derive a set of best practice rules aimed at extracting the optimal amount of information from such experiments.

The study is very complete on a technical level, and the conceptual weaknesses raised in the first round of reviews have been fully addressed in the revision.

https://doi.org/10.7554/eLife.102635.2.sa3

Reviewer #2 (Public review):

Summary:

The manuscript "Quantifying microbial fitness in high-throughput experiments" provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight the impact of differing fitness definitions and the methodologies utilized for analysis and how that can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments.

Strengths:

The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.

Comments on revisions:

The authors have comprehensively addressed all previous comments and suggestions. In particular, the addition of the new methods section, 'A guide to calculate pairwise relative fitness under the logit encoding from bulk competition data', significantly improves the clarity of the implementation and helps in the overall interpretation of the framework.

https://doi.org/10.7554/eLife.102635.2.sa2

Reviewer #3 (Public review):

Summary:

The authors present analyses of different fitness measures derived from empirical data from yeast knock-out mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different "encodings" of relative abundance data and conclude that logit-transformations are preferred, because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.

Strengths:

The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamics and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.

Weaknesses:

The study has a few limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question which fitness measure is best "in the light of first principles". The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as "golden standard"), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies). -- The authors have convinced me in their response that their recommendations were fundamentally related to the resource competition model, and the changes in introduction and discussion help to appreciate the choice of fitness measure in relation to the research question.

A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited. In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Fig. 2B), where the replicate estimates for the wildtype show a similar negative correlation. -- The use of a resource competition model for fitness inference is generally well motivated now. I accept their argument that resource competitive differences are most important for microbial strains with small genetic differences (e.g. from mutant libraries or from the same evolution experiment). However, it is relevant to note that this ignores situations that are rather common, where the wild-type strain produces an anticompetitor toxin or causes growth inhibition through metabolite products that lower the pH (and derived strains will likely contain resistant mutations).

Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures. -- In their response, the authors explain that their aim was different, i.e. to provide a "proof of principle" that the choices of fitness measure can produce discrepancies even when they follow the same growth model.

Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Fig. 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are. -- The authors have made extensive improvements to address how different growth parameters, especially lag and growth rate, differently affect apparent epistasis based on measures at different time scales (per generation vs per cycle). These provide a more comprehensive analysis of down-stream consequences for epistasis detection.

Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamic and evolutionary processes. In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Fig. 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and "improves the quality of predictions", is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion. -- Thanks for this explanation (indeed, I confused "logistic dynamics" with "logistic growth model"); the additional explanations and text reductions have improved accessibility for experimentalists.
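
The distinction mentioned in the last remark can be made concrete with a short worked derivation. This is only a sketch, assuming that the relative abundance x(t) of a mutant follows the logistic equation with a constant selection coefficient s; the symbols are chosen here for illustration and are not taken from the manuscript. Under that assumption the logit of the relative abundance changes linearly in time, which is the sense in which the logit encoding "linearizes" the dynamics:

```latex
\frac{dx}{dt} = s\,x\,(1-x)
\quad\Longrightarrow\quad
\frac{d}{dt}\,\ln\frac{x}{1-x}
  = \frac{1}{x\,(1-x)}\,\frac{dx}{dt}
  = s
\quad\Longrightarrow\quad
\operatorname{logit} x(t) = \operatorname{logit} x(0) + s\,t,
\qquad
\operatorname{logit} x \equiv \ln\frac{x}{1-x}.
```

Here the logistic equation describes the relative abundance of one genotype in a mixed population, which is a separate matter from the logistic growth model for absolute population size.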

Comments on revisions:

I appreciate the thorough and effective response to all recommendations and have no further comments.

https://doi.org/10.7554/eLife.102635.2.sa1

Author response:

The following is the authors’ response to the original reviews.

We thank both editors and the three reviewers for their constructive criticism of our work. As a result of these comments, we have made several significant revisions to the paper that we believe strengthen and clarify our major results:

(1) Following suggestions from Reviewers #1 and #3, we have improved our introduction to the different fitness concepts (lines 105–148) and streamlined the discussion of the logit encoding (lines 175–190). In particular, we have moved the most technical points to the SI (Sec. S3).

(2) Based on criticisms of our usage of the population dynamics model from Reviewers #1 and #3, we significantly revised our explanation of the motivation and interpretation of this model (lines 284–310 and 323–336) and our discussion of the generalizability of these results (lines 678–728), including the possible effects of interactions besides resource competition.

(3) Following a request from Reviewer #3, we have expanded our analysis of epistasis to systematically test all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 344–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

(4) Following concerns from Reviewers #2 and #3 about the limited empirical data, we have expanded our analysis of the LTEE data (new main text Fig. 4, revised text on lines 416–439, and revised SI Figs. S16–S18) and have analyzed two new benchmarking datasets for bulk fitness to test our predictions (new main text Fig. 6, new Results subsection on lines 561–590, and new SI Figs. S24 and S25).

(5) Following the criticism of Reviewer #3 about the lack of a clear recommendation on fitness quantification that provides the greatest value for a given scientific question, we have better explained what we think the scientific consequences of fitness are as a motivation for our analysis (lines 82–88, 319–322, and 615–630) and replaced the final flowchart figure with a step-by-step guide in the Methods to implement our recommendations in practice (lines 964–982).

Reviewer #1 (Public review):

The authors point out that the fitness estimates obtained from different experimental assays (monoculture, pairwise competition, or bulk competition) are not generally equivalent, not even with regard to the fitness ranking of different genotypes. Using a computational model based on experimentally measured growth phenotypes for knockout strains in yeast, as well as data from Lenski’s Long Term Evolution Experiment (LTEE), they derive a set of best practice rules aimed at extracting the optimal amount of information from such experiments.

The study is very complete on a technical level and I have no suggestions for further analyses. However, I feel the readability and the conceptual focus of the manuscript could be significantly improved by rearranging the material with regard to the contents of the main text vs. the Methods and the Supplement. Detailed recommendations:

(1) Regarding readability, the large number of references to material in the Methods and Supplement fragment the main text and make it difficult to follow.

We understand the challenges these references pose to the flow of the main text; we have attempted to keep those references to a minimum, while ensuring that technical details of the work are fully documented and referenced for completeness.

(2) Conceptually, it seems to me that the current presentation obscures the reasons why we should care about fitness in the first place. In the first paragraph of Results, the authors define fitness “as any number that is sufficient to predict the genotype’s relative abundance x(t) over a short-time horizon”. To me, this seems like an extremely narrow and not very interesting definition. Instead, I view fitness as an intrinsic property of a genotype that allows us to predict its performance under a range of conditions, including in particular conditions that are different from the experimental setup that was used to obtain the fitness estimates. The latter viewpoint is well expressed in Supplementary Section S1, where the authors discuss the notion of fitness potential. I would recommend to move at least part of this discussion to the main text.

We appreciate the reviewer’s viewpoint and have moved that conceptual discussion from the SI to the beginning of the Results section to give readers a broader perspective on fitness (lines 105–148). We use “potential” in analogy with potential energy in physics and have clarified this on lines 126–135.

What we call fitness potential, like the other notions of fitness we discuss in this paper (relative and absolute fitness), is still specific to an environmental condition. Fitness as a property intrinsic to a genotype and independent of any environment, as the reviewer mentions, is an interesting concept but beyond the scope of this paper, which is focused on analyzing fitness measurements that are inevitably environment-specific; we have clarified this on lines 142–148. While it is true that this definition of fitness is narrow, it is what can be empirically measured directly, and thus we believe it is crucial to understand how to best interpret that data.

By comparison, the arguments in favor of the logit encoding that currently open the Results section are rather straightforward and could be shortened significantly.

We agree and have condensed this section (lines 175–192).

(3) Similarly, the modeling strategy used in this work is quite subtle and needs to be explained more fully in the main text. The authors use growth traits (lag time, growth rate, and yield) extracted from monoculture experiments on a yeast knockout collection and feed them into a specific mathematical model to simulate pairwise and bulk competition scenarios. Since a key claim of the work is that monoculture experiments are generally poor predictors of competitive fitness, the basis for this conclusion and the assumptions on which it is based need to be described clearly in the main text. In the current version of the manuscript, this information has been largely relegated to the Methods section.

We agree that our motivation for the population dynamics model and growth curve data was not clearly explained. We have significantly revised this section of the Results in the main text (lines 284–310).

In particular, we recognize the potential for misunderstanding this material: we do not intend the relative fitness values calculated from this model to be interpreted as predictions of the true relative fitness between yeast deletion strains. Rather, we use the population dynamics model for our proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). We have added a statement to highlight existing work on monoculture predictors for competition outcomes [32, 34, 36, 37] on lines 453–459.

Reviewer #1 (Recommendations for the authors):

In the discussion of the LTEE in Section S8, the authors write on page 8 that “we couldn’t fit the fitted values a,b in ref. 29 so we were unable to check it”. I don’t understand this sentence - is the claim that the fit in ref. 29 was incorrect?

We have clarified this point in the SI (now Sec. S9). Our point was not that the fit in Wiser et al. 2013 is incorrect, but merely that we could not find the exact values of the fitted parameters they obtained documented in their paper, so we could not compare our own fitted parameters directly to theirs.

Also, at the end of the section, the authors refer to theory work on the long-term fitness trend in the LTEE. Here, two early references arguing for a logarithmic increase in fitness could be mentioned as well:

International Journal of Modern Physics B 12:361-391 (1998) Evolution and Extinction Dynamics in Rugged Fitness Landscapes Paolo Sibani, Michael Brandt, and Preben Alstrøm

J. Stat. Mech. (2008) P04014 Evolution in random fitness landscapes: the infinite sites model Su-Chan Park and Joachim Krug

We thank the reviewer for providing these two references and have added them to the list of previous works on long-term fitness trends at the end of the section (now Sec. S9).

Reviewer #2 (Public review):

Summary:

The manuscript “Quantifying microbial fitness in high-throughput experiments” provides a comprehensive analysis of the various approaches to quantifying fitness in microbial evolution, focusing on three primary factors: encoding of relative abundance, time scale of measurement, and the choice of reference subpopulation. The authors systematically explore how these choices impact fitness statistics and provide recommendations aimed at standardizing practices in the field. This manuscript aims to highlight the impact of differing fitness definitions and the methodologies utilized for analysis and how that can significantly alter interpretations of mutant fitness, affecting evolutionary predictions and the overall understanding of genetic interactions in the experiments. Although this manuscript focuses on a critical issue in the quantification of fitness in high-throughput experiments, it heavily relies on only one experimental dataset (Warringer et al. 2003) and one organism, i.e., yeast (Saccharomyces cerevisiae) grown in a defined medium, so the environmental influence is not completely captured. While the theoretical framework is strong, more experimental examples with more organisms (i.e., more datasets) in their analysis and comparison would enhance the manuscript, especially its conclusion.

We have expanded our analysis of competition data from the Long-Term Evolution Experiment in E. coli (lines 416–439), including adding a main text figure (Fig. 4) along with the three SI figures (Figs. S16–S18). We have also added two completely different data sets that directly test our predicted discrepancies in fitness estimates from bulk competition experiments. From these data we have added a new main text figure (Fig. 6), two new SI figures (Figs. S24 and S25), and a new section at the end of the Results (lines 563–590).

We wish to clarify, though, that the aim of this study is to develop theory on fitness quantification choices and minimal examples to demonstrate the potential for discrepancies between these choices. While we appreciate the reviewer’s interest in understanding how discrepancies in fitness statistics vary across organisms and environments, that is an empirical question beyond the scope of this paper.

Strengths:

The choices for quantifying fitness in evolution experiments are critical and highly relevant given the increasing prevalence of high-throughput experiments in evolutionary biology. The authors methodically categorize fitness statistics and their implications, providing clarity on a complex subject. This structured approach aids in understanding the nuances of fitness measurement. The manuscript effectively highlights how different choices in fitness measurement can influence fitness rankings and the understanding of epistasis, which is important for modeling evolutionary dynamics.

Weaknesses:

The theoretical framework is robust, but the manuscript could benefit from more empirical examples to illustrate how different fitness quantification methods lead to varied conclusions in experiments.

Please see our response to the previous comment on this point.

The discussion on the choice of reference subpopulation could be expanded with the influence of the environment or the condition. Different types of reference groups might yield different implications for fitness calculations, and further elaboration would enhance this section.

While we agree that studying how environmental conditions affect fitness is an important and interesting problem, it goes beyond the scope of this paper, which focuses on the basic theory of quantifying microbial fitness from high-throughput experiments. Applications of this theory to empirical questions about environmental variation would be best served by their own studies. We have added a statement clarifying this goal (lines 144–148).

We are unsure how the choice of reference subpopulation is related to this issue. In our view, if the goal of a mutant fitness measurement is to predict how that mutant would behave when arising spontaneously and competing against its immediate ancestor, the gold-standard reference subpopulation must always be the mutant’s immediate ancestor, or another mutant that is known to be phenotypically equivalent to the ancestor (e.g., neutral mutants in the case of a large mutant library). Other choices of reference subpopulations would not provide directly meaningful information in this regard.

The authors overgeneralize some findings; for instance, the implications of fitness measurement choices could vary significantly across different microbes or experimental conditions. A more detailed discussion would strengthen the conclusion.

We certainly agree that the consequences of fitness quantification choices could vary significantly across organisms and environments; our goal for this paper is to demonstrate what discrepancies are possible in principle and in particular how they depend on basic features of microbial population dynamics (e.g., variation in yield). We have added two separate paragraphs in the Discussion section to address the generalizability of our results in the context of pairwise (lines 678–710) and bulk fitness measurements (lines 711–728).

Overall, this manuscript is a significant contribution to the field of evolutionary biology, addressing a critical issue in the quantification of fitness but lacks more experimental support to make it a wider claim. By systematically exploring the factors that influence fitness measurements, the authors provide valuable insights that can guide future research - the framework is computationally thorough but needs a more detailed explanation of concepts instead of generalizing.

We have improved our explanation of several of the important concepts. In particular, we have significantly revised our explanation of the population dynamics model (lines 284–310) to emphasize its role as a null model to demonstrate how fundamental aspects of microbial growth are sufficient to cause discrepancies between fitness statistics. We have also revised two paragraphs on the generalizability of our results in the Discussion section (lines 678–728).

Further work is needed, particularly to incorporate empirical examples and expand certain discussions to include environmental variation and their impact, which would improve clarity and applicability.

We have added a sentence at the beginning of the Results section to acknowledge the environmental dependence of fitness (lines 142–148). We believe further discussion of that issue is beyond the scope of this paper, as it would require a significant amount of additional data and/or environmental modeling.

Reviewer #2 (Recommendations for the authors):

In addition to the comments from the previous sections, other specific comments:

(1) Figure 5 needs to be populated with additional parameter details. For example, include brief descriptions of each parameter involved in the encoding, time scale, and reference choices. This will help users understand the implications of each choice. Adding these details will make the flow diagram more comprehensive, aiding researchers in implementing these steps more clearly.

Following this comment and another comment about this figure from Reviewer #3, we decided to replace this figure with a new Methods section with step-by-step instructions (lines 964–982).

(2) Duplication in Line 620: “Nevertheless, the fact that we see the fact that we see...” This redundancy needs to be corrected.

We thank the reviewer for pointing this out; we have rewritten this paragraph.

(3) More experimental data comparisons and their assessment concerning various microbial systems and multiple environmental conditions are recommended to support the claim.

Please see our responses to the related public comments.

Reviewer #3 (Public review):

Summary:

The authors present analyses of different fitness measures derived from empirical data from yeast knockout mutants and the long-term evolution experiment (LTEE) with Escherichia coli to explore discrepancies and identify preferred methods to estimate relative fitness in high-throughput experiments. Their work has three components. They first discuss the different “encodings” of relative abundance data and conclude that logit transformations are preferred because they transform nonlinear abundance trajectories into linear trajectories with greater predictive power. Next, they compare per-generation with per-growth cycle relative fitness estimates inferred from simulations of pairwise competitions based on published growth traits for the yeast strains and on published pairwise competition measurements for the LTEE data. Both data sets show quantitative and qualitative (i.e. rank order) discrepancies of estimates across different time scales, which are highlighted by considering possible underlying causes (i.e. trade-offs between growth traits) and consequences (i.e. epistasis among mutations affecting different growth traits). Finally, the authors compare simulated pairwise and bulk (i.e. where many mutants compete during a growth cycle in a single environment) competition assays based on the yeast knock-out mutants and demonstrate an optimal ratio of collective mutants to wild-type strains that minimizes both sampling error and overestimation of fitness estimates when compared with pairwise competitions.

Strengths:

The study deals with a highly relevant topic. Fitness is central to general evolutionary theory, but also poorly defined and implies different traits for different organisms and conditions. For microbes, which are often used in evolution experiments, high-throughput experiments may yield different measures to quantify abundance over time, from individual growth traits to bulk competition experiments. Hence, it is relevant to consider discrepancies among those measures and identify preferred measures with respect to predicting population dynamics and evolutionary processes. The present study contributes to this aim by (i) making readers aware of differences among commonly used fitness estimates, (ii) showing that simulated (yeast) and calculated (E. coli) competitive fitness may differ across time scales, and (iii) showing that bulk competitions may yield relative fitness estimates that are systematically higher than pairwise competitions. The study is rather thorough on the theory side, with extensive derivations and analyses of various fitness measures using their resource competition model in the Supplementary Information. The study ends with a few practical recommendations for preferred methods to infer relative fitness estimates, that may be useful for experimentalists and stimulate further investigations.

Weaknesses:

The study has several limitations. Perhaps the most apparent limitation is the lack of a clear answer to the question of which fitness measure is best “in the light of first principles”. The authors show clear discrepancies between fitness estimates across different time scales or using different reference genotypes in bulk competition and provide useful recommendations based on practical considerations (e.g. using pairwise competitions as the “golden standard”), but it remains unclear whether these measures provide the greatest value for the questions researchers may want to answer with them (e.g. predict shifts in genotype frequencies).

We agree on the importance of considering the scientific questions researchers want to answer in determining the best way to quantify fitness. We have revised both the Introduction (lines 82–88) and the Discussion (lines 615–630) to more clearly explain possible downstream questions researchers may wish to answer with fitness data, and thus why discrepancies in that data based on analysis choices may be important.

We believe that the text does provide a specific recommendation (second subsection of the Discussion, lines 635–658) for how to quantify relative fitness: using the logit encoding (rather than other encodings), measuring fitness per-cycle (rather than per-generation), and using the wild-type or a phenotypically-equivalent proxy as reference subpopulation to calculate pairwise fitness in a bulk competition (rather than using the mutant library as a whole). This recommendation is based on first principles: the logit encoding is based on the principle of the logistic equation as the null model of relative abundance dynamics (lines 635–637), the choice of the per-cycle timescale is based on the principle that in non-steady state environments the time scale for measuring selection should not depend on the wild-type growth (lines 640–645), and the choice of reference population is based on the principle that a mutant’s fitness should serve as a predictor of its dynamics when arising de novo at low frequency and competing against its wild-type (lines 648–653).
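
The three choices in this recommendation can be illustrated with a minimal Python sketch. It is not the step-by-step guide from the Methods; the function names, the example counts, and the convention of counting generations as wild-type doublings are all assumptions made here for illustration. The sketch computes a per-cycle relative fitness as the change in logit-encoded relative abundance over one growth cycle, once with the wild-type as reference and once with the whole library as reference.

```python
import math


def logit(x):
    """Log odds of a relative abundance x, with 0 < x < 1."""
    return math.log(x / (1.0 - x))


def fitness_vs_reference(mut_before, mut_after, ref_before, ref_after):
    """Per-cycle relative fitness against a chosen reference subpopulation.

    This is the change in logit of the mutant's abundance within the
    mutant-plus-reference pair, which equals the change in ln(mutant/reference).
    """
    return math.log(mut_after / ref_after) - math.log(mut_before / ref_before)


def fitness_vs_whole_library(mut_before, mut_after, total_before, total_after):
    """Per-cycle relative fitness with the whole population as reference."""
    return logit(mut_after / total_after) - logit(mut_before / total_before)


def per_generation(s_per_cycle, wildtype_fold_change):
    """Rescale a per-cycle value by the number of wild-type doublings.

    Assumption: a generation is one doubling of the wild-type, so the
    absolute fold-change of the wild-type over the cycle must be known.
    """
    return s_per_cycle / math.log2(wildtype_fold_change)


# Hypothetical abundances before and after one growth cycle (illustration only).
before = {"wt": 1000, "mutA": 1000, "mutB": 1000}
after = {"wt": 1000, "mutA": 2000, "mutB": 500}

s_pairwise = fitness_vs_reference(
    before["mutA"], after["mutA"], before["wt"], after["wt"]
)
s_library = fitness_vs_whole_library(
    before["mutA"], after["mutA"], sum(before.values()), sum(after.values())
)

print(f"mutA vs wild-type reference: {s_pairwise:.3f}")
print(f"mutA vs whole-library reference: {s_library:.3f}")
```

In this made-up example the two reference choices give different values for the same mutant, because the whole-library reference also contains the other mutant. The per-generation rescaling needs the absolute fold-change of the wild-type, which relative read counts alone do not provide.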

A second limitation is that the authors analyse fitness differences arising solely from resource competition, whereas microbes often interact via other mechanisms, e.g. the production of anticompetitor toxins, cross-feeding of metabolites, or lack of growth to enhance their persistence in stress conditions. Without simulations of these processes, understanding discrepancies among fitness measures is necessarily limited.

We agree that other interactions are important in many microbial ecosystems and could affect measurements of fitness. We discuss the possibility of these other interactions and their potential consequences for fitness on lines 697–710.

We focus on resource competition in this paper, however, for two reasons. One is that we are using it as a null model: resource competition is always present, and thus it provides an important baseline for discrepancies in fitness statistics in the absence of any other assumptions. Indeed, our results are that this minimal assumption alone is sufficient to produce a wide range of significant discrepancies, which provides the proof of principle that choices of fitness quantification matter. We have clarified this in a revised explanation of the population dynamics model on lines 294–304.

The second reason is that fitness measurements of the type discussed in this paper are typically performed on mutants that have only small genetic differences from their ancestor (e.g., a point mutation or gene deletion). While more complex interactions between such similar genotypes are not impossible, we expect them to be rare, in which case resource competition is the only interaction. Explicit modeling of other interactions is an important question for future work, but would require more detailed models and data of those phenomena, and thus would go beyond the scope of the present study. We have added text to explain our emphasis on resource competition on lines 298–301 and 690–697.

In addition, the analysis of trade-offs between growth traits causing these discrepancies during resource competition seems confounded by biases in measurement error or parameter estimation, at least for growth rate and lag time (Figure 2B), where the replicate estimates for the wildtype show a similar negative correlation.

The tradeoff between growth traits was only an incidental observation and is not necessary for the fitness statistic discrepancies we analyze in this paper; the only important pattern in the growth traits is the existence of mutants with reduced yields (so as to reduce the wild-type log fold-change in a competition) as well as variation in one other trait under selection (lag time or growth rate in this model). We have clarified this mechanism on lines 328–336, which is demonstrated by Fig. S7. Since these tradeoffs are not relevant to the results and we agree that their significance may be unreliable due to the noisiness of the data, we have removed mention of them.

Third, the study does not validate relative fitness predictions from growth traits (as is done for the yeast mutants) with measured relative fitness estimates using competition assays, while such data are available, e.g. for the LTEE. This would strengthen their inferences about preferred fitness measures.

The goal of our modeling with the yeast growth trait data is not to test the ability to predict competition experiments from monoculture data; that has been the focus of previous studies [32, 34, 36, 37]. Rather, we use the population dynamics model for a proof of principle: that the most basic features of microbial population dynamics in laboratory experiments, as captured by this model (resource competition, lag phase, growth phase, saturation), are sufficient to create discrepancies between common fitness statistics used in these experiments (different encodings, time scales, choices of reference subpopulations). The yeast growth curve data merely provides realistic parameters for this model, to ensure we are studying a biologically relevant regime of the dynamics. To avoid this misconception, we have revised our explanation of this model and the data on lines 284–310.

Fourth, the analysis of epistasis between mutations affecting different growth traits (shown in Figure 3) based on the LTEE data could be better introduced and analysed more comprehensively. Now, the examples given in panels C-F seem rather idiosyncratic and readers may wonder how general these consequences of using fitness estimates based on different time scales are.

We agree that this analysis was incomplete and missed an opportunity to emphasize this important consequence of fitness quantification. We have thus expanded this analysis into a systematic test of all possible double mutants between qualitative types of trait perturbations in the model. We have added a new main text figure (Fig. 3), new SI figures (Figs. S9–S15), a new subsection in the Results (lines 346–395), and corresponding new sections in the Methods (lines 864–892) and SI (Sec. S8).

Finally, the study is generally less accessible to experimentalists due to the extensive and principled treatment of specific population dynamic models and fitness inferences. This may distract from the overarching aim to identify fitness measures that are most accurate and useful for predictions of population dynamics and evolutionary processes.

We appreciate this concern as we do hope to make the paper as broadly accessible as possible, especially to experimentalists who measure microbial fitness. To this end, we have reduced the technical discussion of encodings in the first section of the Results (lines 164–187); revised explanations of the population dynamics model (lines 284–310), importance of growth trait variation (lines 328–336), and epistasis (lines 346–395) to better emphasize the conceptual intuition of these parts; and added a step-by-step guide for our recommended best practices of quantifying fitness in bulk competition experiments (lines 964–982).

In this light, the motivation for the initial discussion of the importance of how to best encode relative abundance (Figure 1) is unclear. Also, the conclusion, that logit encoding is preferred, because it linearizes logistic growth dynamics and “improves the quality of predictions”, is not further motivated. Experimentalists using non-linear models to infer fitness from growth curves or competition assays may miss the relevance of this discussion.

The motivation for the discussion of encodings is that it is one of the choices made differently by researchers, mainly using either the logit (more common in experimental evolution and population genetics studies) or log encoding (more common in TnSeq analyses). As such we believe it is important to explain where this choice comes from (a transformation of relative abundance data to make it approximately linear in time, and thus amenable to characterization by a single slope parameter) and why we believe the logit encoding is more logical in most cases. We have streamlined and revised this subsection to make it clearer (lines 164–187).
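
To make the difference between the two encodings concrete, the following is a minimal Python sketch, not code from the manuscript. It assumes relative fitness is the change in encoded relative abundance over one growth cycle; the function names and the abundance values are hypothetical and chosen only for illustration.

```python
import math

def log_encoding(x):
    """Log encoding of relative abundance x (0 < x <= 1)."""
    return math.log(x)

def logit_encoding(x):
    """Logit encoding of relative abundance x (0 < x < 1)."""
    return math.log(x / (1.0 - x))

def relative_fitness(x_start, x_end, encoding):
    """Change in encoded relative abundance over one growth cycle."""
    return encoding(x_end) - encoding(x_start)

# Hypothetical data: a rare mutant and a common mutant whose
# relative abundances both double over one cycle.
examples = {"rare": (0.001, 0.002), "common": (0.2, 0.4)}

for label, (x0, x1) in examples.items():
    s_log = relative_fitness(x0, x1, log_encoding)
    s_logit = relative_fitness(x0, x1, logit_encoding)
    print(f"{label}: log = {s_log:.3f}, logit = {s_logit:.3f}")
```

In this sketch the two encodings nearly agree for the rare mutant, because 1 − x is close to 1, and diverge for the common mutant, which is the regime where the choice of encoding matters.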

Our argument for favoring the logit encoding in most cases is based on the logistic model being a null model for relative abundance dynamics (Sec. S3). In light of the reviewer’s comments, we have realized this may be confusing because there are two common usages of logistic dynamics that are biologically distinct. What we mean by logistic model is the dynamics of relative abundance x of a mutant in competition with other genotypes:

$$\frac{dx}{dt} = s\,x\,(1 - x). \tag{1}$$

Here s turns out to be the relative fitness under the logit encoding. On the other hand, researchers also use a logistic ODE to describe the dynamics of absolute abundance N of a single strain in monoculture (e.g., as in a growth curve):

$$\frac{dN}{dt} = r\,N\left(1 - \frac{N}{K}\right). \tag{2}$$

We believe the reviewer’s last point refers to Eq. (2), whereas our argument about the logit encoding is based on Eq. (1). We have added a note to clarify this distinction for the reader (lines 192–196).
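
For readers who want the step that connects the logistic model of relative abundance to the logit encoding, a short derivation follows. It assumes Eq. (1) takes the standard logistic form dx/dt = s x (1 − x) with constant s; the symbol x_0 for the initial relative abundance is notation introduced here for illustration.

```latex
\begin{align}
\frac{d}{dt}\ln\frac{x}{1-x}
  &= \frac{1}{x}\frac{dx}{dt} + \frac{1}{1-x}\frac{dx}{dt}
   = \frac{1}{x(1-x)}\frac{dx}{dt}
   = s, \\
\ln\frac{x(t)}{1-x(t)}
  &= \ln\frac{x_0}{1-x_0} + s\,t .
\end{align}
```

Under this assumption the logit of relative abundance is linear in time with slope s, so a single slope parameter characterizes the dynamics at any relative abundance, not only when the mutant is rare.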

Reviewer #3 (Recommendations for the authors):

In addition to my general comments in the public review, I have several more specific recommendations:

(1) Line 183-189: unclear why logit-based relative fitness is preferred. Abundance data are not typically binomial.

We agree this claim about abundance data was incorrect and have removed it. We have revised the section to focus on motivating the logit encoding from logistic dynamics of relative abundance as a null model for most systems (main text lines 175–187 and Sec. S3).

(2) Line 205: it may be mentioned that s(logit) is the same as the “selection rate constant” often used in microbial studies.

We have added a sentence clarifying the equivalence of the logit-encoded relative fitness to the selection coefficient in population genetics (lines 188–190).

(3) Line 368: why do mutations that increase biomass yield also increase WT LFC? Is this, because they grow slower and hence allow the WT more time to grow?

Mutants with higher yield allow the wild-type to achieve higher log fold-change because those mutants consume fewer resources per cell, which frees up more resources for the wild-type to consume and increase its overall growth. It’s not about growth rate or time, as this would occur even for mutants whose growth rates are identical to the wild-type’s. We have revised our explanation of how variation in growth traits differentially affects fitness statistics (lines 323–340) and epistasis (lines 361–378).
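
The yield effect can be illustrated with a deliberately stripped-down calculation, offered as a sketch rather than as the model used in the manuscript. It assumes two strains with identical growth rates and no lag, growing exponentially until a shared resource is exhausted, with yield defined as biomass produced per unit of resource. The function name and all parameter values are hypothetical.

```python
import math

def wild_type_lfc(n_wt, n_mut, yield_wt, yield_mut, resource):
    """Wild-type log fold-change over one batch cycle.

    Assumption: both strains grow at the same rate with no lag, so both
    reach the same fold-change F when the shared resource runs out.
    Resource balance:
        (F - 1) * (n_wt / yield_wt + n_mut / yield_mut) = resource
    """
    resource_per_fold = n_wt / yield_wt + n_mut / yield_mut
    fold_change = 1.0 + resource / resource_per_fold
    return math.log(fold_change)

# Hypothetical parameters in arbitrary units, equal initial abundances.
for yield_mut in (0.5, 1.0, 2.0):
    lfc = wild_type_lfc(n_wt=1.0, n_mut=1.0, yield_wt=1.0,
                        yield_mut=yield_mut, resource=100.0)
    print(f"mutant yield {yield_mut}: wild-type LFC = {lfc:.3f}")
```

Because growth rates are equal by construction, any change in the wild-type log fold-change across the three cases comes only from the mutant's yield: a higher-yield mutant uses less resource per unit of growth and leaves more for the wild-type.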

(4) Line 382-386: you may want to cite Ram et al. (2019, 10.1073/pnas.1902217116), who also did such analyses for experimental data from E. coli.

We have cited this work as Ref. [34].

(5) Line 415: perhaps use “bulk relative fitness” instead of “total relative fitness”, to contrast with “pairwise relative fitness”.

We acknowledge the language in this section can be subtle. However, “bulk” is not a sufficient identifier for the concept of total relative fitness as bulk competition experiments (with many genotypes competing simultaneously) can be used to measure either total relative fitness or pairwise relative fitness. (In pairwise competition experiments with only two genotypes, these two types of fitness are identical.) As such we adhere to our original language but have added words to clarify which type of experiment (bulk or pairwise) we are talking about in a given context (e.g., on lines 495–504).
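
The distinction between the two statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: it assumes the logit encoding, takes total relative fitness to use the rest of the population as the reference, and takes pairwise relative fitness to use a single reference genotype. The function names, the "wt" label, and the abundance values are hypothetical.

```python
import math

def logit(x):
    """Logit of a relative abundance x (0 < x < 1)."""
    return math.log(x / (1.0 - x))

def total_relative_fitness(x_start, x_end, genotype):
    """Logit-encoded fitness relative to the rest of the population."""
    return logit(x_end[genotype]) - logit(x_start[genotype])

def pairwise_relative_fitness(x_start, x_end, genotype, reference="wt"):
    """Logit-encoded fitness relative to one reference genotype.

    The logit of the abundance within the pair reduces to the log ratio
    of the two genotypes' abundances.
    """
    def pair_logit(x):
        return math.log(x[genotype] / x[reference])
    return pair_logit(x_end) - pair_logit(x_start)

# Hypothetical bulk competition: m1 keeps pace with the wild-type
# while m2 declines.
bulk_start = {"wt": 0.5, "m1": 0.25, "m2": 0.25}
bulk_end = {"wt": 0.6, "m1": 0.3, "m2": 0.1}
print(total_relative_fitness(bulk_start, bulk_end, "m1"))
print(pairwise_relative_fitness(bulk_start, bulk_end, "m1"))

# Hypothetical pairwise competition with only two genotypes.
pair_start = {"wt": 0.5, "m1": 0.5}
pair_end = {"wt": 0.4, "m1": 0.6}
print(total_relative_fitness(pair_start, pair_end, "m1"))
print(pairwise_relative_fitness(pair_start, pair_end, "m1"))
```

In the bulk example m1 is neutral relative to the wild-type but has positive total relative fitness, because the rest of the population includes the declining m2. In the two-genotype example the two statistics coincide.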

(6) Line 451-453: why does a population in bulk competition consume resources more slowly than in pairwise competitions?

Mutant libraries used in bulk competition experiments usually include a large number of deleterious mutants, which grow more slowly than the wild-type. Thus these populations typically consume resources more slowly than a population in a pairwise competition would, where a large part of the population is the wild-type.

(7) Line 565: I don’t understand how one can compare relative fitness to other timescales.

Relative fitness, as we’ve defined it, has units of rate, since it describes the rate of change of relative abundance (or an encoding of it) over some time scale (e.g., a batch growth cycle or a generation). Therefore it can be compared to other time scales of the system, such as the rate of new mutations arising or the rate of genetic drift fluctuations, as long as they are measured in the same units. This comparison is important to population genetics analyses, such as determining whether the population is in the strong selection-weak mutation limit or the clonal interference regime.

(8) Line 620 repeats text.

Thank you, we have revised this paragraph and removed the typo.

(9) Figure 1C+D: the link between the scenarios on the left and the graphs on the right may be better explained. For example, it may help to make explicit that the 4 scenarios in panel C show the same relative fitness per cycle and that mutant and wildtype have the same growth rate, but different growth periods in both scenarios in panel D. It is also unclear whether the grey dot links to the upper scenario in D.

We have clarified this issue in the caption and changed the colors to avoid this confusion.

(10) Figure 2E: it is unclear why “mutants with equal fitness are assigned the lowest rank”.

This was a technical comment about how to handle ties in our analysis of mutant rankings, but it is moot since no exact ties actually occur in our simulations. We have removed this remark to avoid confusion.

(11) Figure 2F: the axis labels are confusing, as for the WT estimates no LFC mutant exists. It would also help to make explicit in the legend against which WT replicate/reference strain each strain has competed.

We agree the inclusion of wild-type replicates in this plot was confusing and unnecessary, so we have removed them. The mutants compete against a wild-type with traits defined by their median values across all wild-type replicates; this is noted in Fig. 2A and the Methods section on our analysis of this data (lines 809–813).

(12) Figure 5: I am not sure this is needed, as its information is rather limited.

We agree and have removed this figure.

https://doi.org/10.7554/eLife.102635.2.sa0
