# Abstract

The adaptive dynamics of evolving microbial populations takes place on a complex fitness landscape generated by epistatic interactions. The population generically consists of multiple competing strains, a phenomenon known as clonal interference. Microscopic epistasis and clonal interference are central aspects of evolution in microbes, but their combined effects on the functional form of the population’s mean fitness are poorly understood. Here, we develop a computational method that resolves the full microscopic complexity of an evolving population subject to a standard serial dilution protocol. We find that stronger microscopic epistasis gives rise to fitness trajectories with slower growth independent of the number of competing strains, which we quantify with power-law fits and understand mechanistically via a random walk model that neglects dynamical correlations between genes. We show that clonal interference leads to fitness trajectories with faster growth (in functional form) without microscopic epistasis, but has a negligible effect when epistasis is sufficiently strong, indicating that the role of clonal interference depends intimately on the underlying fitness landscape.

# 1 Introduction

Laboratory evolution experiments have demonstrated the widespread prevalence of microscopic epistasis, the tendency for the phenotype associated with a mutation to depend on the background genotype in which it emerged (Khan et al., 2011; Chou et al., 2011; Wang et al., 2013; Good and Desai, 2015; Bakerlee et al., 2022; Kryazhimskiy et al., 2014; Diaz-Colunga et al., 2022). Basic evolutionary theory indicates that for moderate mutation rates the overall population will consist of many competing strains, because additional mutations can emerge before an existing mutation has time to fix in the culture (Gerrish and Lenski, 1998; Desai and Fisher, 2007; de Visser and Rozen, 2006; Park and Krug, 2007). This “clonal interference” is consistently observed in laboratory experiments such as Lenski’s long-term evolution experiment (LTEE) (Fogle et al., 2008; Lenski et al., 1991; Lenski and Travisano, 1994; Lenski, 2017; Wiser et al., 2013). Yet, despite the agreed-upon ubiquity of both aspects of evolution, there are few quantitative predictions for how they affect some of the most common experimental outputs, such as the mean fitness of the population. The central difficulty arises from the need to treat both the population and the genome at the microscopic level, which requires sophisticated analytical tools or high-resolution experiments.

The traditional approach in evolutionary theory is to make use of assumptions and statistical model classes that sidestep these complexities. Most theoretical studies of long-term adaptation take place in the “strong selection weak mutation” (SSWM) limit (Gillespie, 1983, 1984, 1991; Orr, 2002; Good et al., 2012), where the time for a beneficial mutation event to occur is large in comparison to the time for a typical beneficial mutation to fix in the population. While convenient due to analytical simplifications, this limit neglects clonal interference by ensuring that the culture consists of a single dominant strain at almost all times, and is hence known to be invalid for populations under standard laboratory conditions (de Visser and Rozen, 2006). As an unfortunate by-product, even if the predictions of a model are found to match experimental data, it is not clear how the addition of clonal interference will change the results.

To avoid the analytical and computational challenges associated with modeling the genome at microscopic granularity, significant theoretical effort has been spent studying macroscopic “rugged” or “uncorrelated” models (Kauffman and Levin, 1987; Kauffman and Weinberger, 1989; Wilke, 2004; Macken and Perelson, 1989; Flyvbjerg and Lautrup, 1992; Kingman, 1978), which posit the fitness of a mutant to be drawn at random from a fixed distribution (Park and Krug, 2008). More recently, macroscopic “fitness-parameterized” models – which assume the distribution of fitness effects (DFE) depends only on the fitness of the parent – have garnered interest as a way to model correlations in the fitness landscape (Kryazhimskiy et al., 2009). Although macroscopic models have provided significant insight into evolutionary dynamics, both classes exhibit serious disadvantages. Rugged landscapes make predictions that are known to violate experimental measurements, such as the typical length of an adaptive walk (Orr, 2006). Fitness-parameterized models can help correct some of these issues, but to do so they require an assumption about how the DFE depends on fitness, which is typically unknown because of its experimental intractability.

One approach that corrects the deficiencies of macroscopic models is the use of microscopic models, which treat the genome as a sequence of loci each with a binary label indicating the presence or absence of a mutation. Perhaps the most well-studied microscopic model in evolutionary theory is Kauffman’s NK model (Kauffman and Levin, 1987), but similar microscopic models can be obtained systematically via Fourier expansion (Neher and Shraiman, 2011). This approach leads to the class of spin glass models well-studied in the statistical physics (Sherrington and Kirkpatrick, 1975; Sompolinsky and Zippelius, 1982; Arous et al., 2001), theoretical neuroscience (Amit et al., 1985a,b; Hopfield, 1982), ecology (Roy et al., 2020), machine learning (Choromanska et al., 2015), and combinatorial optimization literatures (Mezard and Montanari, 2009; Mezard et al., 1987). Recently, spin glass models have been used to study the role of epistasis in adaptive dynamics, leading to insight into the generation of slow, logarithmic fitness trajectories (Guo et al., 2019) and how macroscopic epistasis emerges from widespread microscopic interactions (Reddy and Desai, 2021). But due to the computational expense of full-scale microscopic simulations, these prior works make the SSWM assumption, which greatly simplifies the resulting adaptive dynamics to a process that is analytically tractable.

In order to understand the role played by microscopic epistasis and clonal interference in real-world evolving populations, we present a systematic numerical study of the evolutionary dynamics of a microbial culture under serial dilution on a microscopic fitness landscape far from the SSWM regime. To do so, we consider a spin-glass type model that enables us to independently tune the magnitude of epistasis and the level of clonal interference in the culture. The model contains both additive and epistatic terms. The relative magnitude of the two terms can be adjusted, and the epistatic interaction takes place pairwise between two loci on the genome.

To resolve the adaptive dynamics, we develop a high-performance, OpenMP-based multi-threaded implementation of the resulting stochastic process in C++, which we use to study an adaptive walk to a fitness peak comprising a few hundred fixation events. The implementation leverages several algorithmic advances to capture the complete microscopic details of the process over long timescales: an efficient algorithm for computation of the fitness that leverages the structure of the epistatic interaction, a hashing-based method for storage of strains by genotype for fast splitting and joining, and an efficient approach for diluting the culture. These algorithmic advances render tractable the computation of the entire distribution of fitness effects, the complete sequence of fixed mutations along with their individual effects, and the number of remaining beneficial mutations for any strain at any time with realistic population sizes (one hundred million bacteria) over realistic timescales (tens of thousands of generations). In the strong clonal interference regime, tracking all microscopic details generates hundreds of gigabytes of data; the resulting datasets are processed by custom-built Python code that produces standard, experimentally-measurable observables.

Our framework brings insight into both real-world experimental systems and modern approaches in evolutionary theory. To this end, we show that hill-climbing dynamics on a random and sparse fitness landscape with two-point interactions cannot give rise to the slow, low-exponent power law trajectories observed in Lenski’s LTEE even with clonal interference, suggesting that other factors such as structured interactions might be at play. Moreover, we show that any macroscopic fitness-parameterized model used to describe a microscopic process must depend intrinsically on the level of clonal interference in the population, implying that the DFE in a macroscopic model for an experimental system must be tuned to the mutation rate of the culture.

The paper is organized as follows. In Section 2, we describe details of the fitness landscape and the simulation. In Section 3.1, we describe how microscopic epistasis slows fitness trajectories independent of the level of clonal interference. In Sections 3.2 and 3.3, we consider the use of macrosopic models to interpret and explain mechanistically the effect of microscopic epistasis. In Section 3.4, we show how the effect of clonal interference depends on the strength of microscopic epistasis, and that fitness parameterized models must be tuned to the level of clonal interference in the population. We conclude with a discussion and suggest some extensions in Section 4.

# 2 Model details

## 2.1 Basic setup

To study the effect of microscopic epistasis on the average fitness trajectory, it is useful to model the genome as a sequence of sites and to consider fitness landscapes that specify the fitness as a function of the state of the genome. We study a generic finite-sites microscopic model (Figure 1) inspired by the Sherrington–Kirkpatrick spin glass in statistical physics (Sherrington and Kirkpatrick, 1975). In the next subsection, we elaborate on the generality of a model of this form.

### The fitness model

With *L* denoting the length of the genome, the fitness of a strain with genotype ** α** ∈ {±1}

^{L}is given by the expression

Here, *F*_{off} is an arbitrary offset value that can be used to fix the initial fitness independent of the initial genotype. Inspired by experimental competition assays in typical laboratory microbial evolution settings, we compare the fitness of a given strain to the fitness of the ancestral strain. To do so, we choose *F*_{off} so that the fitness of the ancestral strain is equal to one; this arbitrary shift has no effect on the dynamics or our conclusions.

In (1), each *h _{i}* represents the instantaneous contribution of a mutation at gene

*i*to the fitness of the strain in the absence of epistasis. Each

*J*=

_{ij}*J*describes the microscopic epistasis between mutations at genes

_{ji}*i*and

*j*. Realistic biological fitness landscapes are thought to be rugged, containing many local extrema. By taking the values of

*h*and

_{i}*J*to be random, (1) gives rise to such a complex fitness landscape.

_{ij}### Disorder statistics

Because biological networks are typically sparse (Tong et al., 2004; Costanzo et al., 2016), our model is such that each gene only interacts on average with a fraction 0 < *ρ* ≤ 1 of the other genes. *ρ* is typically on the order of a few percent in realistic networks, and we set *ρ* accordingly. These considerations lead to the choice of distributions

with *J _{ii}* = 0 for all

*i*. Above, N(

*μ*,

*σ*

^{2}) denotes a normal distribution with mean

*μ*and variance

*σ*

^{2}, and Ber(

*ρ*) denotes a Bernoulli distribution with parameter

*ρ*. The variances are set to and where 0 ≤

*β*≤ 1 sets the magnitude of microscopic epistasis and Δ > 0 sets the magnitude of generic fitness increments. This scaling with

*β*ensures that the fitness increase at initialization is approximately independent of

*β*. This choice provides a useful setting to qualitatively compare the speed of fitness trajectories in addition to quantitative measures such as fitting parametric functional forms (Guo and Amir, 2022). Typical fitness increments in laboratory experiments are on the order of a few percent of the ancestral strain’s fitness (Barrick et al., 2009), and we choose Δ to match this observation. Our conclusions about the roles of epistasis and clonal interference (and their mechanisms) do not depend on the specific choice.

### Fitness trajectories

The fitness landscape is defined by a sparse and random genetic interaction network (Figure 1A). Adaptation dynamics produce fitness trajectories (Figure 1B) qualitatively consistent with experimental observations of long-term evolution in microbial populations (Lenski, 2017), leading to power-law trajectories with exponents between 0.45 and 3.1. For details on how these trajectories and exponents are produced, see Appendices A & B.

### Landscape structure

The parameter *β* can be used to continuously tune the ruggedness of the landscape (Figure 1C). For *β* = 0, there is no microscopic epistasis, and the evolutionary dynamics corresponds to a hill-climbing event towards the single fitness maximum with value ∑_{i}|*h _{i}*|. In the opposite extreme for β = 1, mutations do not have any effect independent of the genetic background, and the landscape is rife with local fitness maxima. By systematically varying this parameter and observing its effect on the average fitness trajectory throughout the approach to a fitness maximum, we quantify the role of microscopic epistasis in

*slowing*the functional form (i.e., reducing the power-law exponent) of the fitness trajectory both with and without clonal interference.

## 2.2 Choice of model

There are several compelling reasons to study the fitness model in Equation (1) that we now highlight.

### Generality

Equation (1) represents the two lowest-order terms in a Fourier expansion of an arbitrary function defined on the Boolean hypercube (Neher and Shraiman, 2011). As such, our model represents a rigorously quantifiable approximation of *any* choice of fitness model defined on the hypercube.

### Tunable epistasis

The presence of the continuous parameter *β* allows us to systematically vary the relative contribution of the epistatic interaction, enabling a detailed study of the effect of microscopic epistasis on the dynamics of adaptation. The well-known NK model (Kauffman and Levin, 1987) similarly contains a parameter (*K*, the number of genes in an epistatic interaction) that can be used to tune the ruggedness of the landscape. However, this parameter is discrete and changing it leads to a more drastic shift in the structure of the fitness landscape.

### Tunable clonal interference

For low mutation rate, the population is in the SSWM regime and the evolutionary dynamics correspond to sequential sweeps of beneficial mutations throughout the population, which consists of a single dominant strain (Figure 1D). As the mutation rate increases, the adaptation dynamics become richer, enabling higher-order effects such as multiple mutations (Weissman et al., 2009), stochastic tunneling (Iwasa et al., 2004; Guo et al., 2019), and competing populations, all of which emerge naturally in our simulations.

### Efficiency

Equation (1) has significant computational advantages—summarized by Lemma C.1— that enable us to develop a large-scale simulation. Leveraging algebraic simplifications intrinsic to the model’s mathematical structure, the fitness of a given strain may be computed as a correction to the fitness of a reference strain that is updated adaptively to track the state of the population. The resulting “adaptive fast fitness computation” is several orders of magnitude faster than a naive calculation, and our ability to simulate to long times in the strong clonal interference regime hinges upon it.

# 3 Results

## 3.1 Landscape ruggedness slows the fitness trajectory

We first study the effect of microscopic epistasis on the functional form of the fitness trajectory in both the SSWM (*μ* = 10^{−8}) and clonal interference (*μ* = 2 × 10^{−4}) mutation regimes (Figure 2). Intuitively, by complicating the fitness landscape and increasing the difficulty of the corresponding optimization problem, we expect that greater levels of microscopic epistasis may lead to a slower fitness trajectory with a longer duration required to reach a peak. We find that the value of the fitness peak increases slightly with increasing *β*. To eliminate this variability, each mean fitness trajectory is normalized to lie between the values 1 and 2 for visualization. The fitness trajectory takes more time to approach its asymptotic value as *β* increases, indicating a slower approach to equilibrium (Figure 2A/B).

Insight into the mechanism by which epistasis slows the fitness trajectory can be obtained by visualizing the substitution trajectories (Figure 2C/D), which describe the number of mutations that have fixed in the population at time *t*. The substitution trajectories demonstrate that increasing the amount of microscopic epistasis smoothly leads to an accumulation of more fixed mutations at each time. Because the initial rank (the number of available beneficial mutations) is identical for each value of *β*, the substitution trajectories suggest a simple picture: as *β* increases, a greater number of new available beneficial mutations are generated per each typical fixation event. Moreover, because more mutations are needed to cease adaptation, each typical fixation event must provide less progress towards the fitness peak. Mathematically, this corresponds to a greater prevalence of “flat regions” in the fitness landscape, which have been identified as a source of slow dynamics in previous studies of spin-glass physics (Kurchan and Laloux, 1996).

These observations highlight the role of microscopic epistasis in slowing down long-term evolutionary dynamics, but they do not make a quantitative claim about the functional form of the fitness trajectory. To make such a claim, we can fit the data to a predictive model and study how the model parameters depend on *β*. The functional form of the fitness trajectory can be computed analytically in the SSWM regime in a fitness-parameterized context approximately met by our fitness model with *β* = 0 (Good and Desai, 2015); the resulting trajectory is given by the power-law relaxation

with *F*_{∞} = ∑_{i}|*h _{i}*|,

*c*= 2, and

*F*

_{0}= 1 by our choice of

*F*

_{off}. We find empirically that in both the SSWM and clonal interference regimes, the power-law in (2) provides a good fit to the mean fitness trajectory for

*β*> 0.

We fit (2) to the mean fitness trajectory (see Appendix B details) over a range of values of *β* (Figure 2E). In both regimes, the relaxation exponent c decreases monotonically with increasing *β*, indicating a quantitative slowdown of the fitness trajectory with increasing levels of microscopic epistasis. In the next section, we will show that we can estimate the exponent *c* = 0.5 for *β* = 1.0 in the SSWM regime.

## 3.2 A fitness-parameterized mapping

The mechanism suggested by the substitution trajectories can be confirmed by mapping the microscopic model to a fitness-parameterized landscape with two effective parameters: a single beneficial fitness increment given by the expected beneficial fitness increment 〈Δ*F _{b}*〉

^{1}, and a beneficial mutation rate set by the rank

*R*(Figure 3). The reduction to a few-parameter fitness-parameterized model has been justified both theoretically (Good et al., 2012) and experimentally (Hegreness et al., 2006) in the clonal interference regime, and we find that it similarly provides useful insight in the SSWM regime.

In both mutation regimes, for all nonzero values of *β*, the expected beneficial increment behaves linearly as a function of fitness (Figure 3B/D). This phenomenon is known as “macroscopic” or “global” epistasis, and has been shown to be an emergent property of a class of finite-sites models similar to the one considered here (Reddy and Desai, 2021). Consistent with the observations of the previous section, the expected beneficial increment (scaled relative to the fitness peak) decreases with *β* at fixed fitness, which highlights that microscopic epistasis tends to reduce the progress towards the peak provided by each typical fixation event.

Unlike the beneficial fitness increment, the rank *R*(*F*) exhibits more variation in functional form as *β* is varied, which progresses towards linearity as *β* tends to one (Figure 3A/C). Considering only *β* > 0, *R*(*F*) decreases with increasing *β* at each *F*; including *β* = 0, this is also true for sufficiently large *F*. This indicates that at any fixed distance from the fitness peak, it is less likely to achieve a beneficial mutation.

Taken together, these observations demonstrate that microscopic epistasis leads to a slower fitness trajectory through two complementary effects. At each fixed value of fitness, beneficial mutations are less likely to be found by random mutation; moreover, more of them are required to reach the peak due to a lower typical (normalized) increment. These additional beneficial mutations are generated by the epistatic interaction as each fixation event occurs.

### Quantitative model

These arguments can also be justified mathematically, leading to a quantitative prediction of *c* = 0.5 for *β* = 1, consistent with the result in Figure 2E. In such a two-parameter macroscopic model, the fitness evolves according to the dynamics

where each quantity on the right-hand side is evaluated at *F*(*t*). In the SSWM regime, according to Haldane’s formula (Haldane, 1927). The preceding paragraphs demonstrate that decreases as a function of *β* for each fixed *F*, giving rise to a slower trajectory. From Figure 3A/B, both *R*(*F*) and 〈Δ*F _{b}*〉(

*F*) are approximately linear and reach zero at

*F*= 2. Hence

*R*≈

*k*〈Δ

*F*〉 for a fixed

_{b}*k*> 0. Equation 3 then reads

which predicts that asymptotically

for a fixed *B* > 0. This prediction provides a complement to the analytical result of *c* = 2 in the SSWM regime with *β* = 0.

## 3.3 A random walk model

The previous sections provided an explanation for the effect of microscopic epistasis on long-term adaptation dynamics: as the level of microscopic epistasis increases, the typical number of available beneficial mutations at fixed fitness decreases while the number of fixed mutations required to reach the peak increases, leading to a slower trajectory. In this section, we formulate a mechanistic model that provides a qualitative explanation for why microscopic epistasis generates new beneficial mutations and slows the trajectory.

### Distribution of increments

The previous section highlighted that the mechanism is common to both the SSWM and clonal interference regimes, for which reason we resort to the SSWM limit in the subsequent analysis. In the SSWM limit, a single empirical distribution of fitness effects is induced by the fitness landscape

where Δ*F _{i}*(

*t*) = −2

*α*(

_{i}*t*) (

*h*+ ∑

_{i}_{j}

*J*(

_{ij}α_{j}*t*) is the fitness effect of a mutation at gene

*i*at time

*t*. Because there is only a single strain, the dynamics of adaptation can be characterized entirely by the evolution of

*ρ*in time. When a mutation at site

_{t}*i*fixes (which can only occur if Δ

*F*(

_{i}*t*) > 0), the corresponding increment is updated:

In the absence of microscopic epistasis, each such fixation event would decrease the rank by one until all available beneficial mutations have fixed. However, due to microscopic epistasis, the fixation of a mutation at gene *i* causes a change in all other fitness increments:

The update to the fitness increment in (6) is complex due to the presence of correlations between *α _{i}* and

*α*induced by the coupling

_{j}*J*. Moreover, the distribution of

_{ij}*α*must be conditioned on the event that Δ

_{i}α_{j}J_{ij}*F*> 0. Previous studies in spin glass (Horner, 2007; Eastham et al., 2006) and electron glass (Mogilyanskii and Raikh, 1989; Amir et al., 2008) physics have obtained significant physical insight by neglecting dynamical correlations, and here we take a similar approach.

_{i}### Update statistics

The neglect of dynamical correlations implies that the effect of each fixation event is to add a random Gaussian noise term with probability *ρ* (the network sparsity parameter) to all other increments,

Above, *μ _{β}* and are mean and variance parameters of the noise distribution; these can be estimated numerically from data to account for initialization from a state with

*R*= 100 and to condition on beneficial mutation events (Appendix B). This process corresponds to a biased random walk (Equation (7)) with nonlocal transport (Equation (5)) on the fitness increments.

### Coarse-grained simulation

To test this mechanistic model, we developed a coarse-grained simulation methodology based on Gillespie’s stochastic simulation algorithm (Gillespie, 1976) (Appendix B). Fitting the power law in (2) to the mean fitness trajectories shows that the random walk model predicts a monotonically decreasing exponent and an increased number of fixed mutations to reach a local fitness maximum as a function of *β* (Figure 4A). This is qualitatively consistent with the results of applying the same coarse-grained simulation approach in the SSWM approximation to the full fitness model (Figure 4B). Due to the neglect of correlations, the random walk model predicts a lower number of generated beneficial mutations and a correspondingly higher fitness exponent.

### Why are beneficial mutations generated?

Because only beneficial mutations can fix in the SSWM limit, the transport in (5) is asymmetric, and beneficial mutations will be rapidly converted to deleterious mutations with equal magnitude but opposite sign. The noise in (7) broadens the distribution of fitness increments, which converts deleterious mutations with small magnitude into beneficial mutations with small magnitude and vice-versa. The buildup of deleterious mutations results in a diffusive flux from the deleterious half to the beneficial half, providing a simple mechanism for the formation of new beneficial mutations (Figure 4C). Empirically, we find that increases with *β*, driving the generation of a greater number of mutations with increasing *β* and correspondingly slower trajectory.

## 3.4 The effect of clonal interference

Clonal interference is known to reduce fixation probabilities (Gerrish and Lenski, 1998; Lin et al., 2020b), and hence to slow down the rate of adaptation when compared to an SSWM model with the same parameters. Given its prevalence in realistic evolving laboratory populations, models incorporating both clonal interference and macroscopic epistasis have been developed to predict the slow power law fitness trajectory observed in Lenski’s long-term evolution experiment (Wiser et al., 2013). Despite this interest, the effect of clonal interference on the shape of the fitness trajectory is still poorly understood.

In fitness-parameterized models, for sufficiently weak clonal interference, the way clonal interference affects the shape of the fitness trajectory depends on how the beneficial mutation rate changes with fitness (Guo and Amir, 2022). In particular, for typical models where beneficial mutations become less prevalent as the population climbs up the hill, clonal interference *accelerates* the fitness trajectory. While this result provides insight into the role of clonal interference in laboratory populations, it is not clear if it holds more generally in a microscopic framework, or when the magnitude of clonal interference becomes large. Here, we demonstrate that the effect of clonal interference on the fitness trajectory depends on the strength of microscopic epistasis. In its absence, clonal interference is seen to accelerate the trajectory, while for a sufficiently strong epistatic interaction, clonal interference does not quantitatively affect the speed of the trajectory.

In simulation, the magnitude of clonal interference is tuned by adjusting the mutation rate *μ*. Increasing the mutation rate increases clonal interference, but also accelerates adaptation by allowing for more mutations each day. This effect can be eliminated by normalizing time by the mutation rate, so as to isolate the role of clonal interference itself. Viewed in these units, clonal interference decelerates the fitness trajectory due to the suppression of fixation probabilities (Figure 7)).

When measuring the speed of a fitness trajectory, it is more rigorous to assign a quantitative measure by fitting a functional form such as the power law in (2), which will directly estimate the timescale parameter *a* independently for each *μ*. Fitting this power law reveals exponents that hover around *c* ≈ 0.8 in the epistatic setting (here, *β* = 0.25), but increase from *c* ≈ 1.85 for *μ* = 10^{−8} to *c* ≈ 2.6 for *μ* = 10^{−4} in the non-epistatic setting (Figure 5). These results can also be visualized qualitatively by normalizing time in units such that the initial rate of fitness increase is constant for different values of *μ*, as was done for the results in Figure 2 via the definition of *β* (Figure 8).

### A change in the effective landscape

To understand this result – and why it differs from the predictions of the fitness-parameterized setting – we can map the microscopic model to an effective fitness-parameterized landscape (Figure 6). This mapping reveals a striking observation: the effective macroscopic model depends on the mutation rate, which is typically treated as an independent parameter in the standard fitness-parameterized framework. Intuitively, there are many states in the landscape with the same value of *F* but which differ in their conditional distribution of increments *ρ*(Δ*F*|*F*); these states are dynamically selected in a way that depends on the level of clonal interference. In both the epistatic and non-epistatic landscapes, we find that the distribution becomes more sharply peaked for higher levels of clonal interference, while it has a heavier tail for lower levels (Figure 6A/B). This occurs because clonal interference suppresses the fixation of low-effect mutations. The result is that systems with high clonal interference typically reach a given *F* through fewer, more valuable mutations, while systems with low clonal interference typically reach *F* through the accumulation of more low-value mutations.

This effect can be visualized over the trajectory by making use of the summary statistics 〈Δ*F _{b}*〉 (expected increment) and

*R*(rank) viewed as a function of the fitness. In both landscapes, the expected increment decreases with increasing

*μ*(Figure 6C/D). By contrast, the behavior of the rank depends on the strength of microscopic epistasis. Without epistasis, the rank increases with

*μ*(Figure 6E), while with epistasis, the rank becomes non-monotonic in

*μ*, and becomes comparable at high fitness (Figure 6F). This observation suggests the importance of the rank in setting the speed of the fitness trajectory, analogous to the results of Section 3.2. Both with and without microscopic epistasis, the behavior of the rank with

*μ*parallels that of the exponent

*c*.

# 4 Discussion and conclusions

In this paper, motivated by laboratory serial dilution experiments, we developed a high-performance simulation approach to study the dynamics of long-term adaptation. We focused on a generic microscopic model that considers the microbial genome as a collection of sites with a binary value indicating the presence of a mutation. Our model contains a non-dimensional parameter 0 ≤ *β* ≤ 1 that enables us to smoothly tune the relative contribution of microscopic epistasis to the fitness effect Δ*F _{i}* of a mutation at gene

*i*. In addition, we can tune an overall mutation rate

*μ*to adjust the magnitude of clonal interference in the culture. By independently varying the parameters

*β*and

*μ*, we mapped out a phase diagram that describes the effects of microscopic epistasis and clonal interference on the functional form of the mean fitness trajectory.

Our simulation approach gives us the ability to probe microscopic details that are challenging to obtain experimentally, such as statistics of the distribution of fitness effects and the rank over time. The approach also allows us to study regimes such as strong microscopic epistasis and strong clonal interference that have eluded previous theoretical study. In addition to its generality, our model has computational advantages that enable us to probe the long-time dynamics required to reach a local fitness maximum.

## The role of microscopic epistasis

By mapping the model to a simplified fitness-parameterized landscape, we showed that as the strength of microscopic epistasis increases, more mutations are needed to reach a local fitness maximum. In addition, beneficial mutations become less likely; these two properties together lead to a slower trajectory. We isolated a mechanism for this phenomenon – the generation of new, low-effect beneficial mutations mediated by the epistatic interaction when the culture is at high fitness – and we showed through a random walk model that this generation is sufficient to slow the fitness trajectory.

A by-product of our analysis is an observation that, as microscopic epistasis increases in strength, the beneficial mutation rate becomes an increasingly linear function of fitness, similar to the phenomenon of “global epistasis” observed for the expected beneficial fitness increment (Reddy and Desai, 2021). This suggests that the strength of epistasis in realistic microbial populations could be inferred by measuring the beneficial mutation rate as a function of fitness experimentally.

## The role of clonal interference

Through a similar analysis, we observed that in the microscopic context considered here, any equivalent fitness-parameterized model must depend on the mutation rate *μ*, a parameter that is typically tuned independently. In effect, the change in fixation probabilities induced by clonal interference filters the accessible genotypes with a given fitness *F*, giving rise to a conditional distribution of fitness increments *ρ _{μ}*(Δ

*F*|

*F*) that depends on

*μ*.

Based on this observation, we showed that the effect of clonal interference on the fitness trajectory differs from prior predictions in the fitness-parameterized setting and depends on the strength of epistasis. In a non-epistatic model, increasing clonal interference accelerates the fitness trajectory. For sufficiently strong microscopic epistasis, clonal interference has no effect on the speed of the fitness trajectory. We leave the development of a mechanistic model capturing this phenomenon to future work.

## The beneficial mutation rate

A surprising observation is that the trends of the rank with *β* in Figure 3A/B and with *μ* in Figure 6E/F are both consistent with the behavior of the exponent with *β* and *μ*. This is also similar to predictions in the fitness-parameterized framework, where it was found that the effect of clonal interference depends on how the beneficial mutation rate changes with fitness (Guo and Amir, 2022). Taken together, these results suggest that the behavior of the beneficial mutation rate as a function of fitness plays a central role in setting the speed of the fitness trajectory.

## 4.1 Future directions

Our model and simulation can be extended in many exciting directions. One possibility is to allow for horizontal gene transfer between bacterial strains. By allowing large jumps across the fitness landscape, horizontal gene transfer may have a similar effect to microscopic epistasis, and could slow the fitness trajectory by generating groups of new beneficial mutations (Slomka et al., 2020); on the other hand, it could also accelerate the fitness trajectory by allowing for larger steps towards a fitness maximum. Another possibility is to bias the model, so that the *h _{i}* and

*J*have non-zero means, and to study the effect of these mean values on the long-term dynamics. A third possibility is to allow for further structure in the interaction matrix

_{ij}*J*, rather than the i.i.d. random entries considered here. For example, by allowing for low-rank structure in

*J*, one could in principle quantify the role of connected modules of mutations on the functional form of the fitness trajectory (Parter et al., 2008; Landau and Sompolinsky, 2016, 2021). A final direction would be to form a mechanistic model for the effect of clonal interference both with and without epistasis, similar to the random walk model developed to understand the effect of microscopic epistasis.

In addition to the extensions considered above, our simulation environment forms a fertile testing ground for theoretical predictions. The relaxation exponents as a function of *β* considered in Figure 2E could in principle be predicted within the random walk approximation using the techniques developed by Horner (2007), or within a dynamical mean-field theory that studies the evolution of the average value of the mutation variables 〈*α _{i}*(

*t*)〉 ∈ [−1, 1] over time (Ginzburg and Sompolinsky, 1994; Sompolinsky and Zippelius, 1981; Sommers, 1987). Solving such a model, in addition to analytically quantifying the slowing of the fitness trajectory with

*β*, would provide a prediction for a functional form that could be tested against experimental fitness trajectories.

# Acknowledgements

We thank Haim Sompolinsky, Guy Bunin, and Yoav Ram for many useful discussions, and we thank Jimmy Almgren-Bell for contributions to an initial version of the simulation software. N. M. B. was partially supported by the Research Training Group in Modeling and Simulation funded by the National Science Foundation via grant RTG/DMS – 1646339. C. H. R. was partially supported by the Applied Mathematics Program of the U.S. DOE Office of Science Advanced Scientific Computing Research under Contract No. DE-AC02-05CH11231.

# A Materials and methods

Here, we provide an overview of the simulation methodology, as well as some necessary definitions for the results to follow. Further information on the simulation approach is available in Appendix B.

## Strains

We begin with a population of size *N*_{0} and with an initial ancestral genotype *α*^{(0)} drawn randomly as *L* i.i.d. random variables taking the values ±1 with equal probability. We define mutations with respect to this initial sequence and define a strain by the path taken on the hypercube from the initial genotype to its current genotype. Due to microscopic epistasis, the fitness effect of a mutation can only be defined in the context of its genetic background. Defining strains by their path ensures that mutations at the same site with different fitness effects are correctly tracked throughout the simulation. Each path is stored in a hash table, and the hashes are compared when new strains are generated to see if the new strain already exists.

## Growth

We assume that each strain grows exponentially with rate given by its fitness. The primary quantity we study is the population mean fitness *F*(*t*), which is defined as the average fitness over all cells in the population. To obtain *F*(*t*), we average its value over replicate trajectories initialized from different ancestral genotypes within the same fitness landscape.

## Mutations

As the strains grow, mutants are generated with a fixed mutation rate *μ* > 0, which describes the probability of a cell gaining a mutation when it divides. The genotype of a mutant is obtained from the genotype of the parent by flipping a site uniformly at random. A mutation is said to have *fixed* if it is present in all strains. For a mutant with genotype *α*^{(c)} produced from a parent with genotype *α*^{(p)}, the fitness increment is Δ*F*(*α*^{(c)}, *α*^{(p)}) = *F* (*α*^{(c)}) − *F* (*α*^{(p)}), while the selection coefficient is *s*(*α*^{(c)}, *α*^{(p)}) = Δ*F*(*α*^{(c)}, *α*^{(p)})/*F*(*α*^{(p)}).

## Dilution protocol

We study a batch culture subject to a standard serial dilution protocol (Lenski et al., 1991; Lin et al., 2020a; Desai et al., 2007; Kryazhimskiy et al., 2012). The population is allowed to grow until the total number of bacteria in the simulation reaches *N _{f}* =

*D*×

*N*

_{0}where

*D*is the dilution factor. When the total population size reaches

*N*, we say that a day has been completed. At the end of each day, the population is diluted back to size

_{f}*N*

_{0}by sampling from a multivariate hypergeometric distribution. Repeated sampling from this distribution is computationally intensive, and we developed an efficient approximate scheme to do so in the strong clonal interference regime (Appendix B).

## Simulation parameters

Motivated by Lenski’s long-term evolution experiment (Lenski et al., 1991; Lenski and Travisano, 1994), we set *D* = 100, *N*_{0} = 10^{6}, and *N _{f}* = 10

^{8}in all simulations. We fix

*ρ*= 0.05 and Δ = 0.005, respectively motivated by sparsity of biological networks and the overall magnitude of typical fitness increments in laboratory experiments, as stated earlier. We set

*L*= 1000, which corresponds to considering mutations at the level of each gene in

*E. coli*. Because typically only a fraction of possible mutations are beneficial, we fix the initial rank to 100. The mutation rate

*μ*and epistatic parameter

*β*are varied and will be specified along with the results.

# B Further simulation details

Recall that the fitness of a strain with genotype * α* ∈ {±1}

^{L}is given by

## Strain growth

Because each strain grows with rate proportional to its fitness, we may write

In (9), *N _{i}*(

*t*) represents the size of strain

*i*at time

*t*,

*α*^{(i)}is the genotype for strain

*i*, and Δ

*t*> 0 is a timestep (set to Δ

*t*= 0.01 in all simulations). To avoid biasing the growth for strains with low bacteria count, we allow throughout the simulation.

## Mutations

After a step of size Δ*t*, following (9), the number of new bacteria produced by a given strain is equal to

To ensure that the number of mutants generated by strain *i* does not exceed the number of bacteria Δ*N _{i}*(

*t*) generated by strain

*i*, we draw a Poisson random variable

and then set the number of mutants to be

Let *α*^{(c)} denote a child strain produced by parent strain with genotype *α*^{(p)}. To obtain *α*^{(c)}, we draw an index *j* ∈ {1,…, *L*} uniformly at random and flip the sign of . The path through genome space for the child strain is obtained by appending the index *j* to the path through genome space for the parent strain.

As new strains are generated through mutation events, we must check if they already exist in the population. If the strain already exists, the mutant joins the existing strain rather than defining a new one. Checking for the existence of a newly generated strain in the overall population can be performed efficiently by hashing the list of integers defining the path through genome space. All current paths can be stored in a set defining the active strains: the time complexity of checking set membership scales as with the number of strains, which is a significant reduction compared to checking all new mutant strains against all existing strains.

## Dilution protocol

The number of bacteria in each strain that make it through dilution to the following day follows a multivariate hypergeometric distribution. To efficiently sample from this distribution, we sequentially draw from the hypergeometric marginal distributions (Gentle, 1998). We first sort the population by number of bacteria in descending order. We then sample *k*_{1} ~ Hyper (*N _{f}*,

*N*

_{1},

*N*

_{0}) where

*N*

_{1}is the size of the largest strain. We then recursively apply this procedure, choosing

*k*~ Hyper(

_{j}*N*− ∑

_{f}_{l<j}

*k*,

_{l}*N*,

_{j}*N*

_{0}− ∑

_{l<j}

*k*) until we have drawn N

_{l}_{o}bacteria or we have gone through the entire population. For greater efficiency, each hypergeometric marginal distribution can be replaced with a draw from a binomial distribution. We verified that our results are independent to this approximation.

## Replicates

Each simulation is performed with a number of replicate populations. Each replicate is instantiated with the same quenched disorder as specified by the *h _{i}* and

*J*. The dynamics for each replicate differ through the random initialization of the ancestral genotype

_{ij}

*α*^{(0)}and through the sequence of random mutations. In the experiments studying the role of clonal interference, the same

*h*and

*J*are used as

*μ*is varied.

## Power-law fitting methodology

To obtain fitness exponents, the mean (computed over replicates) trajectory is fit via nonlinear least-squares using the scipy function curve_fit. The standard error of the mean is used to weight the residuals in the loss function. Error bars are computed via the bootstrap method, by randomly sampling subsets of trajectories and fitting models to the mean over each subset. The corresponding estimates define an empirical distribution over parameters, from which we compute quantiles and use the median as the estimate of the parameters.

## Random walk statistics

We perform the following procedure to obtain an estimate of *μ _{β}* and . We average over initial landscapes and n_inits initializations with rank

*R*for each value of

*β*. For each initialization, we select a single beneficial mutation at random and compute the changes to all other

*R*− 1 beneficial fitness increments. We compute the empirical mean and variance of these changes and average the resulting estimates over all landscapes and initializations. In the numerical experiments reported here, we set

*R*= 100, n_inits = 10, and n_landscapes = 10, though we found that our results were insensitive to the choice of number of landscapes and initializations.

## Coarse-grained simulation methodology

To simulate the discrete random walk model, we compute an initial empirical distribution of fitness increments according to (8) and save the sparsity pattern defined by the randomly drawn *J _{ij}*. At each fixation event, we flip Δ

*F*↦ −Δ

_{i}*F*and adjust each Δ

_{i}*F*↦ Δ

_{j}*F*+

_{j}*η*for

_{j}*j*≠

*i*as discussed in the main text. If

*J*= 0 in the originally drawn genetic network, we set

_{ij}*η*= 0. For comparison, the full SSWM dynamics can be simulated using (8) directly.

_{j}Fixation probabilities are determined by Haldane’s formula *p*_{fix}(Δ*F*) ~ Δ*F*/*F*. To fit our discrete random walk model into the framework of Gillespie’s stochastic simulation algorithm, we define “chemical reactions” corresponding to each possible fixation event. The reaction propensity for a mutation at site *i* is taken to be equal to *p*_{fix}(Δ*F _{i}*). We choose a mutation site at random with probability proportional to its fixation probability. We randomly draw the time that occurred before the next fixation event according to an exponential distribution with mean (up to an overall mutation rate fixing the units of time).

# C Fast fitness computation

The following lemma gives a fast algorithm for computing the fitness of a given bacterial strain. The proof proceeds by noting that rather than computing (1) directly, we can compute the fitness of a strain *α*^{(c)} with respect to its parent strain *α*^{(p)}. It concludes by observing that the fitness of *α*^{(p)} can be related to the fitness of an arbitrary reference strain *α*^{(r)}. This reference strain can be adjusted on-the-fly to track the state of the population.

## Lemma C.1

*Let α*

^{(r)}∈ {±1}

^{L}

*denote the genotype of a reference strain,*

**α**^{(p)}∈ {±1}

^{L}

*denote the genotype of the parent strain, and*

**α**^{(c)}∈ {±1}

^{L}

*denote the genotype of the child strain obtained from*

**α**^{(p)}

*via a mutation at gene k. Then*,

*where M _{r} denotes the set of mutations of the parent with respect to the reference strain*.

Equation (10) states that if we store the elements of the vector , we can compute the fitness of a child strain in terms of the fitness of the parent strain in time complexity . This is a massive improvement over the complexity corresponding to a naive calculation of the quadratic form, and a large improvement over the complexity corresponding to computing the fitness of the child with respect to the fitness of the parent, particularly if the reference strain is updated to keep |*M _{r}*| small. We found empirically that setting the reference strain to be equal to the dominant strain every time the dominant strain accumulates 15 new fixed mutations was a good heuristic.

### Proof

Observe that we may write

Now, write that with

Plugging this in completes the proof.

# References

- Mean-field model for electron-glass dynamics
*Phys. Rev. B***77** - Spin-glass models of neural networks
*Phys. Rev. A***32** - Storing infinite numbers of patterns in a spin-glass model of neural networks
*Phys. Rev. Lett***55** - Aging of spherical spin glasses
*Probability Theory and Related Fields***120**:1–67 - Idiosyncratic epistasis leads to global fitness–correlated trends
*Science***376**:630–635 - Genome evolution and adaptation in a long-term experiment with Escherichia coli
*Nature***461** - Open problem: The landscape of the loss surfaces of multilayer networks
*Proceedings of The 28th Conference on Learning Theory, volume 40 of Proceedings of Machine Learning Research* - Diminishing returns epistasis among beneficial mutations decelerates adaptation
*Science***332**:1190–1192 - A global genetic interaction network maps a wiring diagram of cellular function
*Science***353** - Clonal Interference and the Periodic Selection of New Beneficial Mutations in Escherichia coli
*Genetics***172**:2093–2100 - Beneficial mutation selection balance and the effect of linkage on positive selection
*Genetics***176** - The speed of evolution and maintenance of variation in asexual populations
*Current Biology***17**:385–394 - Global epistasis on fitness landscapes
*arXiv:2210.03677* - Mechanism for the failure of the edwards hypothesis in the sherrington-kirkpatrick spin glass
*Phys. Rev. B***74** - Evolution in a rugged fitness landscape
*Physical Review A***46**:6714–6723 - Clonal interference, multiple mutations and adaptation in large asexual populations
*Genetics***180** - Random Number Generation and Monte Carlo Methods
- The fate of competing beneficial mutations in an asexual population
*Genetics***102** - A general method for numerically simulating the stochastic time evolution of coupled chemical reactions
*Journal of Computational Physics***22** - A simple stochastic gene substitution model
*Theoretical Population Biology***23**:202–215 - Molecular Evolution Over the Mutational Landscape
*Evolution***38**:1116–1129 - The causes of molecular evolution
- Theory of correlations in stochastic neural networks
*Phys. Rev. E***50** - The impact of macroscopic epistasis on long-term evolutionary dynamics
*Genetics***199** - Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations
*Proceedings of the National Academy of Sciences***109** - The effect of weak clonal interference on average fitness trajectories in the presence of macroscopic epistasis
*Genetics***220** - Stochastic tunneling across fitness valleys can give rise to a logarithmic long-term fitness trajectory
*Science Advances***5** - A mathematical theory of natural and artificial selection, part v: Selection and mutation
*Mathematical Proceedings of the Cambridge Philosophical Society***23** - An equivalence principle for the incorporation of favorable mutations in asexual populations
*Science***311** - Neural networks and physical systems with emergent collective computational abilities
*Proceedings of the National Academy of Sciences***79** - Time dependent local field distribution and metastable states in the sk-spin-glass
*The European Physical Journal B***60** - Stochastic tunnels in evolutionary dynamics
*Genetics***166** - Towards a general theory of adaptive walks on rugged landscapes
*Journal of Theoretical Biology***128** - The NK model of rugged fitness landscapes and its application to maturation of the immune response
*Journal of theoretical biology***141**:211–245 - Negative epistasis between beneficial mutations in an evolving bacterial population
*Science***332**:1193–1196 - A simple model for the balance between selection and mutation
*Journal of Applied Probability***15**:1–12 - The dynamics of adaptation on correlated fitness landscapes
*Proceedings of the National Academy of Sciences***106** - Population subdivision and adaptation in asexual populations of saccharomyces cerevisiae
*Evolution***66**:1931–1941 - Global epistasis makes adaptation predictable despite sequence-level stochasticity
*Science***344**:1519–1522 - Phase space geometry and slow dynamics
*Journal of Physics A: Mathematical and General***29**:1929–1948 - The impact of structural heterogeneity on excitation-inhibition balance in cortical networks
*Neuron***92** - Macroscopic fluctuations emerge in balanced networks with incomplete recurrent alignment
*Phys. Rev. Research***3** - Experimental evolution and the dynamics of adaptation and genome evolution in microbial populations
*The ISME Journal***11** - Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations
*Proceedings of the National Academy of Sciences***91**:6808–6814 - Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations
*The American Naturalist***138**:1315–1341 - Evolution of microbial growth traits under serial dilution
*Genetics***215**:767–777 - Evolution of microbial growth traits under serial dilution
*Genetics***215**:767–777 - Protein evolution on rugged landscapes
*Proceedings of the National Academy of Sciences of the United States of America***86**:6191–6195 - Information, Physics, and Computation
- Spin glass theory and beyond
*World Scientific* - Self-consistent description of coulomb gap at finite temperatures
*Soviet Physics - JETP (English Translation)***68** - Statistical genetics and evolution of quantitative traits
*Rev. Mod. Phys***83** - The Population Genetics of Adaptation: The Adaptation of Dna Sequences
*Evolution***56**:1317–1330 - The Population Genetics of Adaptation on Correlated Fitness Landscapes: The Block Model
*Evolution***60**:1113–1124 - Clonal interference in large populations
*Proceedings of the National Academy of Sciences***104**:18135–18140 - Evolution in random fitness landscapes: the infinite sites model
*Journal of Statistical Mechanics: Theory and Experiment***2008** - Facilitated variation: How evolution learns from past environments to generalize to new environments
*PLOS Computational Biology***4** - Global epistasis emerges from a generic model of a complex trait
*eLife***10** - Complex interactions can create persistent fluctuations in high-diversity ecosystems
*PLOS Computational Biology***16**:1–14https://doi.org/10.1371/journal.pcbi.1007827 - Solvable model of a spin-glass
*Phys. Rev. Lett***35** - Experimental evolution of Bacillus subtilis reveals the evolutionary dynamics of horizontal gene transfer and suggests adaptive and neutral effects
*Genetics***216** - Path-integral approach to ising spin-glass dynamics
*Phys. Rev. Lett***58** - Dynamic theory of the spin-glass phase
*Physical Review Letters***47**:359–362 - Relaxational dynamics of the Edwards-Anderson model and the mean-field theory of spin-glasses
*Physical Review B*:6860–6875 - Global mapping of the yeast genetic interaction network
*Science***303** - Genetic background affects epistatic interactions between two beneficial mutations
*Biology letters***9** - The rate at which asexual populations cross fitness valleys
*Theor Popul Biol***75** - The speed of adaptation in large asexual populations
*Genetics***167**:2045–2053 - Long-term dynamics of adaptation in asexual populations
*Science***342**

# Article and author information

### Author information

## Version history

- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Reviewed Preprint version 2:
- Version of Record published:

## Copyright

© 2023, Boffi et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

# Metrics

- views
- 820
- downloads
- 83
- citations
- 2

Views, downloads and citations are aggregated across all versions of this paper published by eLife.