Abstract
Epistasis between mutations can make adaptation contingent on evolutionary history. Yet despite widespread ‘microscopic’ epistasis between the mutations involved, microbial evolution experiments show consistent patterns of fitness increase between replicate lines. Recent work shows that this consistency is driven in part by global patterns of diminishingreturns and increasingcosts epistasis, which make mutations systematically less beneficial (or more deleterious) on fitter genetic backgrounds. However, the origin of this ‘global’ epistasis remains unknown. Here, we show that diminishingreturns and increasingcosts epistasis emerge generically as a consequence of pervasive microscopic epistasis. Our model predicts a specific quantitative relationship between the magnitude of global epistasis and the stochastic effects of microscopic epistasis, which we confirm by reanalyzing existing data. We further show that the distribution of fitness effects takes on a universal form when epistasis is widespread and introduce a novel fitness landscape model to show how phenotypic evolution can be repeatable despite sequencelevel stochasticity.
Introduction
Despite the idiosyncrasies of epistasis, a number of laboratory microbial evolution experiments show systematic patterns of convergent phenotypic evolution and declining adaptability. A striking example is provided by the Escherichia coli longterm evolution experiment (LTEE) (Figure 1a): 12 replicate populations that adapt in parallel show remarkably similar trajectories of fitness increase over time (Wiser et al., 2013; Lenski et al., 2015), despite stochasticity in the identity of fixed mutations and the underlying dynamics of molecular evolution (Tenaillon et al., 2016; Good et al., 2017). Similar consistent patterns of fitness evolution characterized by declining adaptability over time have also been observed in parallel yeast populations evolved from different genetic backgrounds and initial fitnesses (Kryazhimskiy et al., 2014; Figure 1b) and in other organisms (Elena and Lenski, 2003; Perfeito et al., 2014; Wünsche et al., 2017; Sanjuán et al., 2005; Couce and Tenaillon, 2015; Jerison et al., 2017; Schenk et al., 2013). Declining adaptability is thought to arise from diminishingreturns epistasis (Khan et al., 2011; Chou et al., 2011; Kryazhimskiy et al., 2014), where a global coupling induced by epistatic interactions systematically reduces the effect size of individual beneficial mutations on fitter backgrounds. Diminishingreturns manifests as a striking linear dependence of the fitness effect of a mutation on background fitness (Figure 1c). While diminishingreturns can be rationalized as the saturation of a trait close to a fitness peak, recent work shows a similar dependence on background fitness even for deleterious mutations, which become more costly on higher fitness backgrounds (Johnson et al., 2019). This suggests that fitter backgrounds are also less robust to deleterious effects (Figure 1d), a phenomenon that has been termed increasingcosts epistasis. The origin of the global coupling that results in these effects is unknown.
Put together, these empirical observations suggest that the contributions to the fitness effect, s_{i}, of a mutation at a locus $i$ in a given genetic background can be written as
where ${s}_{\text{additive},i}$ is the additive effect of the mutation, ${s}_{\text{genotype},i}$ is its genotypedependent epistatic contribution independent of the background fitness $y$ (i.e., idiosyncratic epistasis), and c_{i} quantifies the magnitude of global epistasis for locus $i$. Equation (1) reflects the observation that the strength of global epistasis depends on the specific mutation and applies independently of whether its additive effect is deleterious (increasingcosts) or beneficial (diminishingreturns). Over the course of adaptation in a fixed environment, global epistatic feedback on mutational effects can lead to a longterm decrease in adaptability. If this feedback dominates, Equation (1) suggests that the dependence of the fitness effect on evolutionary history is summarized entirely by the current fitness, and therefore results in predictable fitness evolution.
Here, we show that diminishingreturns and increasingcosts epistasis are a simple consequence of widespread epistasis (WE). This is consistent with recent work (Lyons et al., 2020) that proposes a similar argument to explain these phenomena. However, while the core idea is similar, we present here an alternative framework based on the Fourier analysis of fitness landscapes, which leads to new insights and quantitative predictions. In particular, our framework leads to novel predictions for the relationship between the magnitude of global epistasis and the stochastic effects of microscopic epistasis, which we confirm by reanalyzing existing data. Extending this framework, we further quantify how the distribution of fitness effects (DFEs) shifts as the organism adapts and how the fitness effect of a mutation depends on the sequence of mutations that have fixed over the course of adaptation (i.e., historical contingency). While specific historical relationships depend on the genetic architecture, we introduce a novel fitness landscape model with an intuitive architecture for which the entire history is summarized by the current fitness. Using this fitness landscape model, we investigate the longterm dynamics of adaptation and elucidate the architectural features that lead to predictable fitness evolution.
Results
Diminishingreturns and increasingcosts epistasis
We begin by examining the most general way to express the relationship between genotype and fitness (i.e., to describe the fitness landscape). A map between a quantitative trait (such as fitness), $y$, and the underlying genotype can be expressed as a sum of combinations of $\mathrm{\ell}$ biallelic loci ${x}_{1},{x}_{2},\mathrm{\dots},{x}_{\mathrm{\ell}}$ that take on values ${x}_{i}=\pm 1$ (Hordijk and Stadler, 1998; Neher and Shraiman, 2011; Weinberger, 1991; Szendro et al., 2013; Weinreich et al., 2013):
where $\overline{y}$ is a constant that sets the overall scale of fitness. The symmetric convention ${x}_{i}=\pm 1$ for the two allelic variants is less often used than ${x}_{i}=0,1$, but it is an equivalent formulation, which we employ here because it will prove more convenient for our purposes (see Poelwijk et al., 2016 for a discussion). The coefficients of terms linear in x_{i} represent the additive contribution of each locus to the fitness (i.e., its fitness effect averaged across genotypes at all other loci), the higherorder terms quantify epistatic interactions of all orders, and $\overline{y}$ is the average fitness across all possible genotypes. Importantly, Equation (2) makes apparent the idiosyncrasies induced by epistasis: a mutation at a locus with $\mathrm{\ell}$ interacting partners has an effect composed of ${2}^{\mathrm{\ell}1}$ contributions.
To explicitly compute the fitness effect of a mutation at locus $i$ on a particular genetic background, we simply flip the sign of x_{i}, keeping all other x_{j} constant, and write down the difference in fitness that results. This fitness effect will generally involve a sum over a large number of terms involving the $f$’s in Equation (2). While this may suggest that an analysis of fitness effects via Equation (2) is intractable, the analysis in fact simplifies considerably if the locus has a significant number of independent interactions that contribute to the fitness (i.e., provided that the number of independent, nonzero epistatic terms associated to the locus is large). In this case, we show that the fitness effects of individual mutations decrease linearly with background fitness and the fluctuations around this linear trend are normally distributed. In other words, widespread independent idiosyncratic epistatic interactions lead to the observed patterns of diminishingreturns and increasingcosts epistasis.
We present a derivation of this result in the SI. Here, we explain the key intuition using a heuristic argument. The argument is based on a simple idea: for a welladapted organism ($y>\overline{y}$) with complex epistatic interactions, a mutation is more likely to disrupt rather than enhance fitness. To be quantitative, consider a highly simplified scenario where some number $N$ of the $f$’s in Equation (2) are ±1 at random and the others are 0. In this case, the fitness of a given genotype is a sum of ${N}_{+}$ and ${N}_{}$ interactions that contribute positively and negatively to the trait, respectively, each with unit magnitude, so that $y=\overline{y}+{N}_{+}{N}_{}$. When positive and negative interactions balance, the organism is in a ‘neutrally adapted’ state ($y\approx \overline{y}$). By selecting for positive interactions, adaptation generates a bias so that ${N}_{+}>{N}_{}$ and $y>\overline{y}$. If locus $i$ involved in a fraction v_{i} of all of $N={N}_{+}+{N}_{}$ interactions is mutated, the effect of the mutation, on average, is to flip the sign of ${N}_{+}{v}_{i}$ positive interactions and ${N}_{}{v}_{i}$ negative interactions. The new fitness is then ${y}_{i}=y2{N}_{+}{v}_{i}+2{N}_{}{v}_{i}=\overline{y}+(12{v}_{i})(y\overline{y})$ and thus ${s}_{i}={y}_{i}y=2{v}_{i}(y\overline{y})$. The negative linear relation between the background fitness, $y$, and the fitness effect of the mutation, s_{i}, is immediately apparent and emerges as a systematic trend simply due to a sampling bias towards positive interactions. Of course, while this relation is true on average, it is possible that locus $i$ affects more or less positive interactions due to sampling fluctuations. Provided only that $N$ is large and the interactions are independent, these fluctuations are approximately Gaussian with magnitude $\sqrt{N{v}_{i}(1{v}_{i})}$.
This basic argument holds beyond the simple model with unit interactions. In the more general case, if the mutation is directed from ${x}_{i}=1\to +1$, we show in the SI that its fitness effect, s_{i}, on a background of fitness $y$ can be written as
where,
and ${\stackrel{~}{\u03f5}}_{i}$ is a genotype and locusdependent term that is distributed across genotypes with mean zero and variance expressed in terms of the $f$’s from Equation (2) (see SI for details). In the following equation and similar ones henceforth, a summation such as ${\sum}_{j>k\ne i}{f}_{ijk}^{2}$ is meant to denote a sum over pairs $j,k$, where each pair appears only once and no pair that includes index $i$ appears. Symmetry of the $f$’s w.r.t. interchanged indices is also assumed (e.g., ${f}_{ijk}={f}_{jik}$). The numerator of ${\stackrel{~}{v}}_{i}$ in Equation (4) is proportional to the covariance of fitness effects and background fitness, and the denominator is the variance of background fitness across genotypes. A similar equation for the case ${x}_{i}=+1\to 1$ can be derived. The choice of $+1\to 1$ or $1\to +1$ is simply a matter of convention. If the convention is reversed, the coefficients of oddorder in Equation (2), that is, ${f}_{i},{f}_{ijk},\mathrm{\dots}$, should also switch signs. It can be easily checked that reversing the signs of these quantities in the expression for ${\stackrel{~}{v}}_{i}$ above leads to the expression for ${\stackrel{~}{v}}_{i}$ when ${x}_{i}=+1\to 1$.
Note that in general ${\stackrel{~}{v}}_{i}$ is not guaranteed to be positive and ${\stackrel{~}{\u03f5}}_{i}$ is arbitrary and determined by the genotypefitness map. However, consistent patterns emerge when locus $i$ has a large number of independent, nonzero epistatic terms and the additive effects ${f}_{1},{f}_{2},\mathrm{\dots}$ of its interacting partners are not much larger than the epistatic terms (defined further below), which we call the WE limit. In the WE limit, ${\stackrel{~}{\u03f5}}_{i}$ is normally distributed across genotypes with variance proportional to ${\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i})$. This follows from the same reasoning as in our heuristic argument with unit interactions above (see SI for details). In addition, ${\stackrel{~}{v}}_{i}$ is typically positive, giving rise to a negative linear trend (i.e., diminishingreturns and increasingcosts). We can see this by taking the third and higherorder terms in Equation (4) to be zero, in which case ${\stackrel{~}{v}}_{i}$ is positive if ${\sum}_{j\ne i}{f}_{ij}^{2}>{\sum}_{j\ne i}{f}_{j}{f}_{ij}$. This will typically be true in the WE limit because we expect ${\sum}_{j\ne i}{f}_{ij}^{2}$ to scale with the number of interacting partners $\mathrm{\ell}$, while each term in ${\sum}_{j\ne i}{f}_{j}{f}_{ij}$ can be positive or negative and thus the sum scales as $\sqrt{\mathrm{\ell}}$ if the terms are independent. Thus when locus $i$ has a large number of interacting partners, ${\stackrel{~}{v}}_{i}$ is typically positive unless the magnitude of the additive terms ($a$) is much larger than the magnitude of the epistatic terms ($e$), $a\gg e\sqrt{\mathrm{\ell}}$. This argument is easily extended to the case when the third and higherorder terms are nonzero (see SI); the upshot is that the bias towards ${\stackrel{~}{v}}_{i}$ positive gets stronger with increasing epistasis.
The conditions for the WE limit are more likely to hold when the number of loci, $\mathrm{\ell}$, that affect the trait is large. Therefore, we expect to generically observe patterns of diminishingreturns and increasingcosts epistasis for a complex trait involving many loci. Importantly, whether we observe a negative linear trend does not depend on the magnitude of a locus’ epistatic interactions relative to its own additive effect, but rather relative to the additive effects of its interacting partners. If we are not in the WE limit, and instead the additive effects dominate (i.e., $a\gg e\sqrt{l}$), then Equation (4) suggests that the slope of the linear trend can be either positive or negative. We will show further below that recent experimental data demonstrates that both scenarios can be relevant: some loci have $a\ll e\sqrt{l}$ while others have $a\gg e\sqrt{l}$, with the former creating a bias towards the observed negative linear trends that characterize diminishingreturns and increasingcosts epistasis.
We note that Equation (3) immediately leads to testable quantitative predictions: in the WE limit, the distribution of the residuals, ${\stackrel{~}{\u03f5}}_{i}$, obtained from regressing s_{i} and $y$ is entirely determined by the slope of the regression, $2{\stackrel{~}{v}}_{i}$. Specifically, we predict that these residuals (the deviations of individual genotype fitnesses from the overall diminishingreturns or increasingcosts trend) should be normally distributed with a variance proportional to ${\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i})$. However, this condition only applies if diminishingreturns arises from the WE limit. It does not hold if epistasis is negligible, if locus $i$ interacts significantly with only a few other dominant loci, or if the epistatic terms are interrelated (e.g., when global epistasis arises from a nonlinearity applied to an unobserved additive trait; Starr and Thornton, 2016; Sailer and Harms, 2017; Otwinowski et al., 2018). The latter case may still lead to a negative linear trend, but the statistics of the residuals will differ from Equation (3) (see SI for a discussion).
It is convenient to subsequently work with the symmetric version of Equation (3), where the fitness effects of both ${x}_{i}=1\to +1$ and its reversion ${x}_{i}=+1\to 1$ (whose fitness effect is negative of the former) are included in the regression against their respective background fitness. In this case, the additive term is averaged out, and we show (SI) that in the WE limit,
where ${\eta}_{i}$ depends on the genetic background and the locus, and is normally distributed with zero mean and variance $V$, and
Here, $V$ is the total genetic variance due to all loci (i.e., the variance in fitness across all possible genotypes) while ${V}_{i}$ is the contribution to the total variance by the $f$’s involving locus $i$. We therefore refer to v_{i} as the variance fraction (VF) of locus $i$. We show further below that for certain fitness landscapes v_{i} can also be interpreted as the fraction of pathways affected by a locus. For these reasons, we focus on v_{i}, which is half of the negative slope, rather than the slope. Note that the v_{i}’s do not sum to one unless there is no epistasis (with epistasis, ${\sum}_{i}{v}_{i}>1$, reflecting the fact that the variance contributed by different loci overlap). While the directed mutation case discussed previously is the relevant one when presenting experimental data (e.g., Figure 1c, d), it is conceptually simpler to work with the symmetric case. These two cases coincide and ${v}_{i}\approx {\stackrel{~}{v}}_{i}$ in the WE limit if the additive effect of a locus is small (i.e., ${f}_{i}^{2}\ll {\sum}_{j\ne i}{f}_{ij}^{2}+{\sum}_{j>k\ne i}{f}_{ijk}^{2}+\mathrm{\dots}$).
Our results show that the VF v_{i} plays an important role. It determines the slope of the negative relationship between the fitness effect and background fitness. At the same time, it determines the magnitude of the idiosyncratic fluctuations away from this trend. We also note that this slope can be used to experimentally probe the contribution of a locus to the trait (i.e., its VF) taking into account all orders of epistasis, which circumvents the estimation of the individual $f$’s in Equation (2). The theory additionally predicts that the slope obtained by regressing the sum of fitness effects of two mutations at loci $i,j$ against background fitness is proportional to ${v}_{ij}={v}_{i}+{v}_{j}2{e}_{ij}$, where ${e}_{ij}$ quantifies the magnitude of epistatic interactions of all orders between $i$ and $j$ (SI).
Importantly, while the fitness effects of individual mutations (and hence the DFEs) may change over the course of evolution due to epistasis, the distribution of variance fractions (DVF) across loci, $P(v)$, is an invariant measure of the range of effect sizes available to the organism during adaptation. As we will see, this means that the DVF plays an important role in determining longterm adaptability.
Numerical results and experimental tests
To illustrate our analytical results, we first demonstrate that the effects described above are reproduced in numerical simulations. To do so, we numerically generated a genotypephenotype map of the form in Equation (2), with $\mathrm{\ell}=400$ loci and an exponential DVF, $P(v)={\overline{v}}^{1}{e}^{v/\overline{v}}$, where $\overline{v}=0.02$ (Materials and methods). This DVF is shown in Figure 2a. Note that $\overline{v}\mathrm{\ell}\gg 1$ corresponds to an epistatic landscape; $\overline{v}\mathrm{\ell}=8$ chosen here thus corresponds to a model within the WE limit (note that ${\stackrel{~}{v}}_{i}\approx {v}_{i}$ in this parameter range). Using this numerical landscape, we measured the fitness effect of mutations at 30 loci across 640 background genotypes with a range of fitnesses (Figure 2b). Our results recapitulate the predicted linear dependence on background fitness (Figure 1c, d), with a negative slope equal to twice the VF predicted from Equation (5). We further simulated the evolution of randomly generated genotypes similar to the experimental procedure used in Kryazhimskiy et al., 2014 (Figure 2c), finding that our results reproduce the patterns of declining adaptability observed in experiments (Figure 1b). Note that ∼10 mutations are fixed during this simulated evolution; declining adaptability here is not due to a finitesites effect.
As described previously, Equation (5) implies a proportional relationship between the magnitude of global epistasis (quantified by the slope of the relationship between the fitness effect of a mutation and the background fitness) and the magnitude of microscopic epistasis (quantified by the residual variance around this linear trend); see also Figure 3a. We verify this relationship in simulations (Figure 2d). We predict that the slope obtained by regressing the sum of fitness effects of two mutations at loci $i,j$ against background fitness is proportional to ${v}_{ij}={v}_{i}+{v}_{j}2{e}_{ij}$. We further assume that ${e}_{ij}=O({\overline{v}}^{2})$ (specifically, ${e}_{ij}={v}_{i}{v}_{j}$ for the genotypephenotype map used for numerics). Since v_{i} and v_{j} are typically small for a complex trait, we expect nearadditivity ${v}_{ij}\approx {v}_{i}+{v}_{j}$ and that any deviations are subadditive, which is confirmed in simulations (Figure 2e, f).
While testing the latter prediction on double mutants requires further experiments, we can immediately test the relationship between the slope and the distribution of residuals from existing experimental data. To do so, we reanalyzed the data from Johnson et al., 2019, which measured the fitness effect of 91 insertion mutants on about 145 backgrounds. These background strains were obtained by crossing two yeast strains that differed by $\approx 40,000$ singlenucleotide polymorphisms (SNPs). Of these 40,000 loci, $\mathrm{\ell}\approx 40$ have been identified as causal loci with currently available mapping resolution (Bloom et al., 2015). In Figure 3, we show the estimated ${\stackrel{~}{v}}_{i}$ (negative onehalf of the slope of the bestfit line) and the VF v_{i} for each of the 91 mutations. These mutations were selected after screening for nonzero effect, and thus the DVF is biased upwards. The mean VF is $\overline{v}\approx 0.06$. The wide range of v_{i} observed in the data implies that the epistatic influence of loci varies greatly across loci, and we will show further below that this is crucial for maintaining a supply of beneficial mutations even when the organism is welladapted to the environment.
Our theoretical results imply that we expect the linear relationship between background fitness and fitness effect to be negative if the additive effects of a locus’ interacting partners are not much larger than the epistatic terms. Specifically, we define the additivity of interacting loci (AoIL) for locus $i$ as
which we show can be estimated from data (Materials and methods and SI). If the AoIL is less than half, Equation (4) implies that the linear trend is guaranteed to be negative. If instead the AoIL is greater than 0.5, the trend can be either positive or negative. The data shows a range of AoIL between 0 and 1 across loci. As predicted by our theory, we find that the loci with AoIL lt_{0.5} always show negative trends and the ones with AoIL gt_{0.5} show both negative and positive trends (Figure 3c). Importantly, the sign of the trend is determined by the AoIL and not by the additivity of the mutated locus, which we define as
The additivity across loci also has a wide range. However, small additivity does not necessarily imply a negative trend (Figure 3d).
We next used the data from Johnson et al., 2019 to analyze the relationship between the slope of the linear trend and the residual variance around this trend. We find that the experimental data confirms our theoretical prediction that the residual variance is proportional to ${\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i}$) if the AoIL is small (Figure 3e, ${R}^{2}=0.5$ for loci with AoIL ¡ 0.5 and ${R}^{2}=0.42$ for all loci). The Gaussiandistributed term in Equation (3) also predicts the shape of the distribution of the residuals given the VFs, which aligns well with the empirical distribution of the residuals (Figure 3f).
Together, these theoretical results and our reanalysis of experimental data show that linear patterns of global diminishingreturns and increasingcosts epistasis are a simple consequence of widespread epistatic interactions. The DVFs observed in data (Figure 3b) further imply that the epistatic influence of different loci on fitness can vary across a wide range. In what follows, we show that these two observations can be put together to make general predictions about the DFEes, and consequently the longterm dynamics of adaptation. The key ingredient that enables this analysis (including Equation (5)) is that in the WE limit fitness and fitness effects are jointly normal (with respect to a uniform distribution over all possible genotypes), which allows us to quantify complex dependencies between these variables in terms of pairwise covariances.
The distribution of fitness effects
Longterm adaptation is determined by the DFEs of possible mutations and the stochastic dynamical processes that lead to fixation. While Equation (5) represents the distribution of the fitness effects of a specific mutation at locus $i$ over all genotypes in the population that have fitness $y$, we are instead interested in the DFE, where fitness effects are measured for all the mutations arising in the background of a particular genotype that has fitness $y$. For now we ignore the influence of evolutionary history on the DFE; we expand on that complication in the following section.
Examining the DFE over $\mathrm{\ell}$ loci for a randomly chosen genotype of fitness $y$ can be thought of as sampling the fitness effects ${s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}$ from the conditional joint distribution $P({s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}y)$, which generally depends on epistasis. If the number of independent, nonzero epistatic terms is large, then $P({s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}y)$ is a multivariate normal distribution defined by the means and covariances of the $\mathrm{\ell}+1$ variables $y,{s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}$, which in turn can be computed in terms of the $f$’s from Equation (2). In particular, the conditional means and covariances are ${\text{Mean}}_{y}({s}_{i})=2{v}_{i}(y\overline{y})$, ${\text{Cov}}_{y}({s}_{i},{s}_{j})=4V({e}_{ij}{v}_{i}{v}_{j})$, where ${e}_{ij}$ is the epistatic VF between loci $i$ and $j$ and ${e}_{ii}={v}_{i}$. This implies that the conditional correlation between fitness effects is $({e}_{ij}{v}_{i}{v}_{j})/\sqrt{{v}_{i}{v}_{j}(1{v}_{i})(1{v}_{j})}$.
The DFE simplifies considerably if we make certain additional assumptions on the magnitude of epistatic interactions. If we assume the typical VF $\overline{v}$ is small (i.e., $\overline{v}\ll 1$) and also that ${e}_{ij}$ is $O({\overline{v}}^{2})$, then correlations are $O(\overline{v})$ and thus negligible. Then, in a particular sample ${s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}$, we can think of each s_{i} as being drawn independently with mean $2{v}_{i}(y\overline{y})$ and variance $4{v}_{i}V$. To compute the DFE, $\rho (sy)$, we first sample the VF from the DVF, $P(v)$, and then sample a Gaussian random variable with the aforementioned mean and variance. This leads to the DFE
where $\phi $ is the standard normal pdf. Curiously, the correlations between s_{i}’s vanish when ${e}_{ij}={v}_{i}{v}_{j}$, in which case the above equation is exact and the DFE is determined entirely by the DVF. Further below, we introduce a specific fitness landscape model for which this relation does hold. Diminishingreturns is naturally incorporated in Equation (9): the mean of $s$ is $2\overline{v}(y\overline{y})$, that is, the DFE shifts progressively towards deleterious values with increasing fitness.
Historical contingency in adaptive trajectories
A key unresolved question is the extent to which evolutionary history influences the DFE and the dynamics of adaptation (Agarwala and Fisher, 2019). That is, what does our theory say about historical contingency?
Suppose a clonal population of fitness y_{0} accumulates $k$ successive mutations resulting in fitnesses ${y}_{1},{y}_{2},\mathrm{\dots},{y}_{k}$. By virtue of arising on the same ancestral background, the fitness gain of a new mutation, ${s}_{k+1}$, is in general correlated with the full sequence of past fitnesses and the identity of the $k$ mutations through its epistatic interactions with them. Based on these correlations, we use wellknown properties of conditional normal distributions (Eaton, 1983) to write
where the weights ${w}_{k+1,i}$ depend on the VF (${v}_{k+1}$) of the new mutation and its epistatic interactions with past mutations. Here, $\u03f5$ is the normally distributed residual that depends on the initial genotype and the weights (SI). Equation (10) is a generalization to a sequence of mutations of Equation (5), which we can think of as the special case where $k=0$.
To gain intuition, it is useful to first analyze Equation (10) when $k=1$ (i.e., to compute the effect of a second mutation conditional on the first). In this case, we show in the SI that
where ${s}_{1}={y}_{1}{y}_{0}$ is the fitness effect due to mutation 1. The first term on the righthand side is the dependence on the fitness of the immediate ancestor, similar to the corresponding term in Equation (5). The second term quantifies the influence of epistasis between loci 1 and 2 on s_{2}. When ${e}_{12}={v}_{1}{v}_{2}$, dependence on s_{1} vanishes entirely and s_{2} depends only on y_{1}. In contrast, if loci 1 and 2 do not interact, ${e}_{12}=0$, and s_{2} is, on average, larger if the mutation at 1 is beneficial compared to when it is deleterious. This has an intuitive interpretation: diminishingreturns applies to the overall fitness and the mechanism through which it acts is epistasis. However, if mutations 1 and 2 do not interact, then the increase in fitness corresponding to mutation 1 does not actually reduce the effect of mutation 2 (as expected by diminishingreturns) so the expected effect of mutation 2 is larger. This analysis suggests that during adaptation, since selection favors mutations with stronger fitness effects on the current background, a mutation that interacts less with previous mutations is more likely to be selected.
To identify the conditions under which history plays a minimal role, we would like to examine when ${s}_{k+1}$ depends only on the current fitness, y_{k}, and is independent of both the past fitnesses and idiosyncratic epistasis. If this were true, then Equation (5) would apply for new mutations that arise through the course of a single evolutionary path (i.e., the fitness effect of a new mutation is ‘memoryless’ and depends only on its VF and the current fitness). Surprisingly, such a condition does exist. We show that this occurs when the magnitude of epistatic interactions between the new mutation and the $k$ previous mutations, ${e}_{k+1,1:k}$, satisfies a specific relation: ${e}_{k+1,1:k}={v}_{k+1}{v}_{1:k}$, where ${v}_{1:k}$ is the combined VF of the $k$ previous mutations (SI). In general, this condition is not satisfied, implying that there will be historical contingency that can be analyzed using the framework above. Remarkably, it turns out that a fitness landscape model for which the condition is satisfied does exist and arises from certain intuitive assumptions on the organization of biological pathways and cellular processes. This fitness landscape model additionally serves as an example of a landscape where global epistasis can vary substantially across loci. We describe this model below.
The connectedness model
We introduce the ‘connectedness’ model (CN model, for short). In this model, each locus $i$ is involved in a fraction ${\mu}_{i}$ of independent ‘pathways,’ where each pathway has epistatic interactions between all loci involved in that pathway (Figure 4a). The probability of an epistatic interaction between three loci ($i,j,k$) is then proportional to ${\mu}_{i}{\mu}_{j}{\mu}_{k}$ since this is the probability that these loci are involved in the same pathway. When the number of loci $\mathrm{\ell}$ is large, we show that in this model ${v}_{i}={\mu}_{i}/(1+{\mu}_{i})$, and when $\mathrm{\ell}$ is small, ${v}_{i}={\mu}_{i}/\overline{\mu}\mathrm{\ell}$, where $\overline{\mu}$ is the average over all loci (SI). The CN model therefore has a specific interpretation: the outsized contribution to the fitness from certain loci (large v_{i}) is due to their involvement in many different complex pathways (large ${\mu}_{i}$) and not from an unusually large perturbative effect on a few pathways. The distribution, $P(\mu )$, across loci determines the DVF.
Statistical fitness landscapes such as the NK model and the Rough Mt. Fuji model (Neidhart et al., 2013; Agarwala and Fisher, 2019; Kauffman and Weinberger, 1989; Stadler and Happel, 1999; Aita et al., 2000; Altenberg, 1997) are related to the CN model. Specifically, the CN model is a subclass of the broader class of generalized NK models (see Hwang et al., 2018 for a review). However, oftenstudied fitness landscape models have one important difference that distinguishes them and gives qualitatively different dynamics of adaptation (shown further below): in contrast to the CN model, classical fitness landscapes are typically ‘regular.’ That is, the VF of every locus is assumed to be the same (except the star neighborhood model, which has a bimodal DVF; Hwang et al., 2018).
The CN model is equivalent to a Gaussian fitness landscape with exponentially decaying correlations (SI). The CN model has tunable ruggedness, where the landscape transitions from additivity to maximal epistasis with increasing $\overline{\mu}$. Maximal epistasis corresponds to ${\mu}_{i}=1$ (and hence ${v}_{i}=1/2$) for all $i$. From Equation (5), this implies that the new fitness after a mutation occurs is independent of the previous fitness, consistent with the expectation from a HouseofCards (HoC) model (Kauffman and Levin, 1987) (where genotypes have uncorrelated fitness). Regular fitness landscape models with exponentially decaying correlations have memoryless fitness effects under the restrictive assumption that every locus is equivalent (Agarwala and Fisher, 2019). We show that the dynamics of adaptation of the more general CN model are also memoryless, that is, the condition detailed in the previous section holds true (SI). Yet, as we show below, the predicted dynamics for the CN model are very different to those from a regular fitness landscape model.
We emphasize that the wellconnectedness assumed for the CN model is not a requirement for Equation (5) to hold. However, how diminishingreturns influences the longterm dynamics of adaptation depends on the specific genetic architecture and the corresponding fitness landscape. Consider, for example, an alternative model of genetic networks organized in a modular structure (Figure 4b). In this model, each locus is part of a single module and interacts epistatically with other loci in that module to determine the fitness of that module; overall fitness is then determined as a function of the module fitnesses. In this case, the variance contributed by a locus is due to its additive contribution and from epistasis between loci restricted to its module. While the argument for diminishingreturns still applies to the fitness as a whole, it follows from the same argument that diminishingreturns should also apply to each module separately. Consequently, the dynamics of adaptation for the modular model are different from the CN model. For simplicity, we analyze the dynamics of adaptation for the CN model and postpone a discussion of how the dynamics differ for different models to subsequent work.
The dynamics of adaptation
We now examine the DFE that follows from Equation (5) and what that implies for longterm adaptation under the conditions for memoryless fitness effects. We henceforth assume a large number of loci with sparse epistasis (though the total number of nonzero epistatic terms is still large). This implies that $\mathrm{\ell}\gg 1$, ${v}_{i}\ll 1$ and $\overline{v}\mathrm{\ell}\gg 1$; for simplicity, we also assume strongselectionweakmutation (SSWM) selection dynamics and $s\ll 1,Ns\gg 1$, where $s$ are fitness effects and $N$ is the population size. Under these conditions, a mutation sweeps and fixes in a population before another one arises. The probability of fixation of a beneficial mutation, ${p}_{\text{fix}}$, is then proportional to its fitness effect (Haldane, 1927).
It is convenient to rescale fitnesses based on the total variance in fitness across all possible genotypes by defining $z={V}^{1/2}(y\overline{y}),\sigma ={V}^{1/2}s,\nu ={V}^{1/2}\eta $. Note that $\nu $ is normally distributed with zero mean and unit variance. Here, $z$ has an intuitive interpretation as the ‘adaptedness’ of the organism. When the organism is neutrally adapted ($z\ll 1$), positive and negative epistatic contributions to the fitness are balanced and diminishingreturns is negligible. Diminishingreturns is relevant when the organism is welladapted ($z\gg 1$). Below, we give the intuition behind our analysis, which is presented in full detail in the SI.
In the neutrally adapted regime, the linear negative feedback in Equation (5) is negligible and the DFE is determined by the distribution of $\simeq {v}^{1/2}\nu $. Loci with large $v$ can lead to a DFE with a long tail. If $\overline{v}$ is the typical VF of a locus, the fitness increases as $z\sim {n}_{s}{\overline{v}}^{1/2}$, where n_{s} is the number of substitutions. Since $\overline{v}$ is a measure of overall epistasis, this implies that epistasis speeds adaptation in the neutrally adapted regime by allowing access to more influential beneficial mutations.
Fitness increases until the effect of the negative feedback cannot be neglected. From Equation (5), this happens when $\overline{v}z\sim {\overline{v}}^{1/2}\nu $ (i.e., when ${z}^{2}\sim {\overline{v}}^{1}$). Intuitively, fitness begins to plateau when its accumulated benefit from substitutions is comparable to the scale of the total genetic variance (${n}_{s}\overline{v}\sim 1$) and further improvements are due to rare positive fluctuations. In this welladapted regime, diminishingreturns and increasingcosts epistasis strongly constrain the availability of beneficial mutations, whose effects can be quantified in this model: for a mutation to have a fitness effect $\sigma $, we require from Equation (5) that $\nu \simeq \sigma /2{v}^{1/2}+{v}^{1/2}z$, which has probability $\sim {e}^{{\nu}^{2}/2}$. Beneficial effects of large $\sigma $ arise when $\nu $ has a large positive deviation. The most likely $v$ that leads to a particular $\sigma $ is when $\nu $ is smallest (i.e., at ${v}^{*}\simeq \sigma /2z$), in which case $\nu \simeq \sqrt{2\sigma z}$, yielding a tail probability $\sim {e}^{\sigma z}$. Remarkably, the beneficial DFE in the welladapted regime is quite generally an exponential distribution independent of the precise form of the DVF (unless it is singular). In particular, we show in the SI that for the DFE, $\rho (\sigma z)$,
which depends solely on the adaptedness of the organism. The exponential form arises because of the Gaussianity of $\nu $, but the argument can be easily extended to $\nu $ with nonGaussian tails.
An exponential beneficial DFE has been previously proposed by Orr, 2003 but arises here due to a qualitatively different argument. Orr’s result instead follows from extreme value theory: suppose the fitness effects of $\mathrm{\ell}$ loci ($\mathrm{\ell}\gg 1$) are sampled from a DFE $\rho (\sigma )$ and $F(\sigma )\equiv {\int}_{\mathrm{\infty}}^{\sigma}\rho ({\sigma}^{\prime})\mathit{d}{\sigma}^{\prime}$. Then, the probability that a beneficial mutation has at least a certain effect size $\sigma $ is $P({\sigma}_{b}\ge \sigma )=\frac{1F(\sigma )}{1F(0)}\approx \frac{\mathrm{ln}F(\sigma )}{\mathrm{ln}F(0)}$, where the latter approximation holds when beneficial mutations are rare (i.e., $1F(0)$ is small). A wellknown result from extreme value theory (Gumbel, 2004; Majumdar et al., 2020) implies that for a large family of distributions $\rho (\sigma )$ and for $\mathrm{\ell}\gg 1$, we have $\mathrm{\ell}\mathrm{ln}F(\sigma )\propto {e}^{k\sigma}$ (for some constant $k$) and therefore $P({\sigma}_{b}\ge \sigma )={e}^{k\sigma}$. This argument is consistent with our results, but does not yield the dependence of $k$ on adaptedness and the rate of beneficial mutations without additional information about $\rho (\sigma )$.
Under SSWM assumptions, from Equation (12), the typical effect size of a fixed mutation is ${\sigma}_{\text{fix}}\sim {z}^{1}$, which typically has a VF,
The above relation makes precise the effects of increasingcosts epistasis on adaptation. As adaptation proceeds, the delicate balance of high fitness configurations constrains fixed beneficial mutations to have moderate VFs. A mutation of small VF is likely to confer small benefit and is lost to genetic drift, while one with a large VF is more likely to disrupt an established high fitness configuration.
This intuition is not captured in regular fitness landscape models, which assume statistically equivalent loci, that is, ${v}_{i}=\overline{v}$ for all $i$ and $P(v)=\delta (v\overline{v})$ is singular. From Equation (9), we see that this leads to a Gaussian DFE whose mean decreases linearly with increasing fitness, in contrast to the exponential DFE in our theory. The key difference is the lack of loci with intermediate effect, which drive adaptation in the welladapted regime. As a consequence, the rate of beneficial mutations declines exponentially (${U}_{b}\sim {e}^{\overline{v}{z}^{2}/2}$) and the fitness thus sharply plateaus at $z\sim {\overline{v}}^{1/2}$. In contrast, our theory predicts a much slower depletion of beneficial mutations, ${U}_{b}\sim {z}^{2}$ (SI). The rate of adaptation is $dz/dt\sim {U}_{b}{p}_{\text{fix}}{\sigma}_{\text{fix}}\sim {z}^{4}$ (since ${p}_{\text{fix}}\sim {\sigma}_{\text{fix}}$), which leads to a slow but steady powerlaw gain in fitness, $z\sim {t}^{1/5}$. The rate of fixation of beneficial mutations is $d{n}_{s}/dt\sim {U}_{b}{p}_{\text{fix}}\sim {z}^{3}\sim {t}^{3/5}$, which gives ${n}_{s}\sim {t}^{2/5}$.
We verify our analytical results using numerics. As before, we generated a genotypephenotype map using the CN model with an exponential DVF, $P(v)={\overline{v}}^{1}{e}^{v/\overline{v}}$ and $\mathrm{\ell}=400$ loci. The DFE can be calculated exactly by plugging in this $P(v)$ in Equation (9):
We simulated the evolution of randomly generated genotypes from $z=0$ to $z=2.5$ and $z=5$, and the DFE across all loci was measured (we chose $\overline{y}=0,V=1$ so that $y=z,s=\sigma $). The theoretical prediction for the DFE, Equation (14), closely aligns with the numerical results (Figure 4c).
Due to computational constraints, it is difficult to simulate evolution deep into the welladapted regime. To compute the shape of adaptive trajectories and their variability, we instead simulated SSWM dynamics using the DFE directly from Equation (14), beginning from a neutrally adapted fitness ($z=0$). Typical trajectories (Figure 4d) show rapid adaptation to the welladapted regime beyond which the fitness grows slowly as ${t}^{1/5}$, as predicted from theory. The predictions for the number of fixed beneficial mutation are also recapitulated (Figure 4e).
Discussion
Recent empirical studies have observed consistent patterns of diminishingreturns and increasingcosts epistasis. Our model gives a simple explanation for these observations. In particular, we showed that these patterns are generic consequences of widespread microscopic epistatic interactions. The intuition underlying this result is that a random mutation typically has a larger disruptive effect on the delicate balance of microscopic epistasis that underpins a fitter background. Our model predicts a quantitative relationship between the magnitudes of global epistasis (i.e., the negative slope of diminishingreturns and increasingcosts epistasis) and microscopic epistasis, which we confirmed using existing data (Figure 3).
A similar explanation for diminishingreturns and increasingcosts epistasis has been recently proposed by Lyons et al., 2020. While our core argument for diminishingreturns and increasingcosts epistasis is the same as in that work, our Fourier analysis framework dissects the features of the fitness landscape necessary to observe these phenomena in terms of experimentally measurable average effects (i.e., the $f$’s in Equation (2)). In particular, we show that the additivity of a locus’ interacting partners critically determines whether the trend is negative or unbiased. In addition, the Fourier analysis framework yields predictions for the DFEs, the historical influence of past mutations on the fitness effect of a newly mutated site, and motivates the proposed ‘connectedness’ fitness landscape model. The analysis of experimental data presented in Lyons et al. complements the experimental data considered here, lending further empirical support for the prevalence of epistasis and its importance in determining longterm adaptability.
Our model leads to other experimentally testable predictions. The most direct and accessible test of the theory is to measure the fitness for all possible combinations of mutations at $\sim $10–15 significant loci and compare (using Equation (6)) the magnitude of global epistasis to the measured fitness coefficients (the $f$’s). Additionally, we predict that the magnitude of global epistasis of a double mutant should be nearly the sum of magnitudes of the corresponding single mutants, and any deviations should be biased towards subadditivity. Since the predictions involve measuring residual variance, experimental noise can be an important confounding factor.
The observation that diminishingreturns occurs as a ‘regression to the mean’ effect on certain fitness landscapes has been noted previously (Draghi and Plotkin, 2013; Greene and Crona, 2014). The theory developed here quantifies precisely when we should expect to observe these patterns. We emphasize that our key result, Equation (5), is a general statistical relation that holds if epistasis is widespread, irrespective of the specific genetic architecture and the corresponding fitness landscape. Weak epistasis with many loci is sufficient to observe noticeable patterns of global epistasis. However, the argument fails if the contribution of a locus is purely additive or when epistasis is limited to one or a handful of other loci. In the latter case, we expect the fitness effect of a mutation to be dominated by the allelic states of its partner loci, and thus take on a few discrete values. A few examples from Johnson et al., 2019 indeed exhibit this pattern (e.g., cases where the fitness effect of a specific mutation depends primarily on the allelic state at a single other locus).
We highlight a distinction between global epistasis discussed in this work and another form of global epistasis (also known as ‘nonspecific’ epistasis) typically used in protein evolution to describe nonspecific epistatic interactions due to a nearly additive trait transformed by a nonlinear function (Starr and Thornton, 2016; Otwinowski et al., 2018; Otwinowski, 2018; Sailer and Harms, 2017; Husain and Murugan, 2020). This nonlinear function creates systematic relationships between epistasis terms and breaks the condition of independent epistatic terms required for our arguments to apply. Specific nonlinearities such as an exponential function may indeed lead to a negative linear trend on average, but the structure of the residuals differs from the one in Equation (5) and observed in data.
A surprising empirical observation is that the negative linear relationship between fitness effect and ancestral fitness characteristic of global epistasis has different slopes for different loci. Our model identifies the negative slope as twice the fraction of variance contributed by a locus to the trait. To explain the wide range of VFs observed in data, we developed the CN model, a framework to think about the organization of cellular processes that can lead to loci of widely varying VFs. In the CN model, loci have a large VF due to their involvement in many different pathways rather than due to a large effect on a single pathway. The CN model can be viewed as a statistical fitness landscape where loci can have a range of VFs, specified by the DVFs. In the special case of every locus having the same VF, the CN model corresponds to a fitness landscape with tunable ruggedness and exponentially decaying correlations.
Extending our framework to incorporate adaptation, we showed that the DFE depends only on the current fitness, rather than the entire evolutionary history, under the intuitive assumptions behind the CN model. The theory therefore gives a simple explanation for why phenotypic evolution can be predictable, even while the specific mutations that underlie this evolution are highly stochastic.
Our framework has an implicit notion of ‘adaptedness’ without referencing a Gaussianshaped phenotypic optimum, often assumed in models of adaptation (e.g., Fisher’s geometric model) (Fisher, 1930; Orr, 2005; Martin and Lenormand, 2006). Over the course of adaptation, the DFE shifts towards deleterious values, reflecting diminishingreturns, which naturally arises from our basic arguments. For a welladapted organism, we show that the DFE for beneficial mutations takes on an exponential form and leads to universal adaptive dynamics. While an exponential DFE for beneficial mutations has been proposed previously based on extreme value theory (Orr, 2003), our result arises due to an entirely different argument: the tail of the beneficial DFE is determined by loci of intermediate size whose disruptive effect due to increasingcosts is small, yet whose effect size is large enough not to be lost due to genetic drift.
Our theory further predicts declining adaptability, with rapid adaptation in a neutrally adapted regime followed by much slower increases in fitness, resulting in powerlaw adaptive trajectories when the organism is welladapted. This is consistent with observations from the E. coli LTEE (Wiser et al., 2013; Lenski et al., 2015). Our model predicts a quicker decline in the number of substitutions (${n}_{s}\sim {t}^{2/5}$) compared to the near linear trend observed in the LTEE data (Good et al., 2017). However, the dynamics of fixation in the LTEE deviate strongly from SSWM assumptions. This may explain the discrepancy, although we note that existing theory has only analyzed the effects of clonal interference and other breakdowns in SSWM assumptions for a constant DFE and weak epistasis (Good et al., 2012; Schiffels et al., 2011). Further work will be required to understand how these effects interact with global epistasis. For example, we may expect that the effect of a highly beneficial mutation at a segregating locus is more likely to be attenuated due to interference from subsequent deleterious mutations, while a lessfit lineage has a larger pool of beneficial mutations and is thus more likely to ‘leapfrog’ over morefit lineages.
Materials and methods
The code and data to generate the figures are available at Reddy, 2020.
Simulations
Request a detailed protocolWe use a fitness landscape model with $\mathrm{\ell}$ loci to generate the genotypefitness map. Each locus is assigned a sparsity µ from $P(\mu )$, which is an exponential distribution with mean $\overline{\mu}$. Each of $M$ independent pathways sample loci with each locus $i$ having probability ${\mu}_{i}$ of being selected to a pathway. We choose $\mathrm{\ell}=400,\overline{\mu}=0.02,M=500$ so that $\overline{\mu}\mathrm{\ell}=8$ ensures significant epistasis. All loci in a pathway interact with each other, where additive and higherorder coefficient terms of all orders were drawn independently from a standard normal distribution. The total fitness is the sum of contributions from the $M$ pathways. We normalize the coefficients so that the sum of squares of all coefficients is 1, that is, the total variance across genotypes is 1. The mean, $\overline{y}$, is close to zero from our sampling procedure. The above procedure is a simple and efficient way to generate epistatic terms to order $\sim $20, beyond which the computational requirements are limited by the exponentially increasing demand. Note that the effects described in the paper were also observed with only pairwise and cubic epistatic terms.
The VFs shown in Figure 2a can be calculated numerically from the definition. From the theory, given our choice of $P(\mu )$, these should follow an exponential distribution with mean $\overline{v}\approx \overline{\mu}/(1+\overline{\mu})$. There may be deviations since $M$ is finite whereas the calculations assume $M\to \mathrm{\infty}$. To generate Figure 2b, in order to get a range of background fitnesses, we first sample 128 random genotypes. These have fitnesses close to zero; in order to obtain a range of fitness values, we simulated the evolution of these 128 genotypes up to $y=1,2,3,4,5$ under SSWM assumptions to get $128\times 5=640$ genotypes at roughly five fitness values. The fitness effect of applying a mutation (i.e., flipping its sign) is measured for 30 randomly chosen loci (which are kept fixed) over each of the 640 genotypes. This is shown for 5 of the 30 and for the mean over the 30 loci in Figure 2b.
To generate Figure 2c, we sampled 64 random genotypes and 12 replicates of each. The evolution of these 768 genotypes was simulated for a total of 50 generations with a mutation rate of 1 per generation. The mean fitness gain over the 12 replicates is plotted for each of the 64 founders against their initial fitness.
To generate Figure 2d, the residuals are measured using the same procedure as for the experimental data analysis described below for the initial 128 genotypes at $y\approx 0$ and the 30 loci with the largest VF.
Double mutants were created by mutating all pairs of the 30 randomly chosen loci on the 640 evolved genotypes. Their mean fitness effect was computed and plotted along with the mean fitness effect for single mutants, shown in Figure 2e. The VF of the pair of loci for the double mutant was estimated as before and compared to the sum of the estimated VFs of the corresponding single mutants. This is shown in Figure 2f.
To generate the plots in Figure 4c, we simulated the evolution of 128 randomly sampled genotypes to $y=2.5$ and $y=5$. The fitness effect of 200 randomly sampled loci was measured and the distribution is plotted.
Analysis of the data from Johnson et al.
Request a detailed protocolThe data from Johnson et al., 2019 consists of the fitness after the addition of 91 insertion mutations on each of 145 background genotypes. The fitness of a particular mutation at locus $i$ can be modeled as
where ${y}_{i},y$ are the mutant and background fitnesses, respectively, ${c}_{i},{b}_{i}$ are constants for each locus, and the residual ${\text{Residual}}_{i}(g)$ depends on the background genotype $g$.
We estimate the VF ${v}_{i}=(1{\widehat{\rho}}_{i})/2$, where the Pearson correlation ${\widehat{\rho}}_{i}=\text{Corr}({y}_{i}\oplus y,y\oplus {y}_{i})$, where the symbol ⊕ denotes that the mutant and background fitness datasets are concatenated. ${\stackrel{~}{v}}_{i}$ is estimated as the negative onehalf of the slope of the best linear fit of ${s}_{i}={y}_{i}y$ and $y$. The residuals for each of the 145 genotypes for each of the 91 mutations are simply
where the overline represents an average over the 145 genotypes, which is used as an estimate of the constant term and ${c}_{i}=2{\stackrel{~}{v}}_{i}1$. In Figure 3b, we plot the distribution of estimated v_{i} and ${\stackrel{~}{v}}_{i}$. In Figure 3c, we compute the AoIL for each locus using Equation (7), which we show in the SI to be $\text{Cov}({s}_{i},{y}_{i}+y)/(\text{Cov}({s}_{i},{y}_{i}+y)+\text{Var}({s}_{i}))$. In Figure 3d, we compute the additivity using Equation (8). The additive effect is ${f}_{i}=\overline{({y}_{i}y)}/2$, and $\text{Var}({s}_{i})/4$ gives the sum of squares of the epistatic terms (SI). In Figure 3e, we compute the variance of the residuals across the 145 genotypes for each locus and plot it against the locus’ estimated ${\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i})$. In Figure 3f, we plot the distribution of residuals over all genotypes and loci. The prediction is that in the WE limit the distribution of residuals is determined by $2{\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i})\eta $, where $\eta $ is a Gaussian random variable. We multiply $\sqrt{{\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i})}$ for each locus with 10,000 i.i.d. standard normal random variables, pool the resulting numbers for all loci, and plot the predicted distribution in Figure 3f. The distributions are variancematched. While Figure 3e shows that the variance of the residuals aligns with the theoretical prediction of being proportional to slope, Figure 3f shows that the data is also consistent with the predicted Gaussianity of the backgroundgenotypedependent contribution.
Appendix 1
Diminishingreturns and increasingcosts epistasis in a model of a complex trait
In the main text, we propose that the fitness effect after a mutation at locus $i$ can be written as
Here, ${\eta}_{i}$ is a genotype and locusdependent contribution, which is distributed as a meanzero Gaussian with variance equal to the total genetic variance, $\overline{y}$ is a constant, and v_{i} is the variance fraction, which is defined further below. Equation (17) corresponds to the symmetric case where the fitness effects of the mutations ${x}_{i}=1\to +1$ and ${x}_{i}=+1\to 1$ are simultaneously regressed against their respective background fitness. The directed case when the mutation is specified to change from ${x}_{i}=1\to +1$ (or ${x}_{i}=+1\to 1$) will be considered in the directed mutation section below.
We show that Equation (17) arises under certain conditions in a generic model of a complex trait with $\mathrm{\ell}\gg 1$ biallelic loci. Essentially, we would like to compute the distribution of the new fitness, y_{i}, across genotypes after a mutation at locus $i$ given the current fitness $y$ and the set of all parameters $\mathrm{\Theta}$ of the model (i.e., $P({y}_{i}y,\mathrm{\Theta})$, with ${s}_{i}={y}_{i}y$). Using the chain rule of probability, we can write
where the sum is over all possible genotypes. While $P({y}_{i}g,\mathrm{\Theta})$ is determined by the genotypefitness map, $P(gy,\mathrm{\Theta})$ is the crucial factor that gives weight only to the genotypes that yield the current fitness $y$. If the fitness is much larger than the mean fitness over all possible genotypes, Equation (18) implicitly ensures that weight is given to only those genotypic configurations that lead to such an unusually large fitness. We will analyze the case of a ‘microcanonical ensemble’ where every genotypic configuration that leads to a particular fitness is equally likely (with no linkage), that is, the prior $P(g)$ across genotypes is uniform (see the section 'Maximum entropy interpretation' for a discussion).
It is difficult to directly evaluate the sum in Equation (18). In the following sections, we give a simple derivation but elaborated to highlight the key assumption that leads to Equation (17). In short, the negative correlation between ${s}_{i}={y}_{i}y$ and $y$ implied by Equation (17) is a trivial consequence of y_{i} and $y$ having the same distribution w.r.t. $P(g)$. $\eta}_{i$ is in general arbitrary and determined by the genotypefitness map. However, ${\eta}_{i}$ is normally distributed if we make certain assumptions about the structure of epistasis in the genotypefitness map. The emergence of the negative linear relationship for a directed mutation ${x}_{i}=1\to +1$ is subtler.
Fourier representation of the fitness function
The fitness $y(g)$ for a genotype of length $\mathrm{\ell}$, $g=\{{x}_{1},{x}_{2},\mathrm{\dots},{x}_{\mathrm{\ell}}\}$, where ${x}_{i}=\pm 1$, can be generally written as Poelwijk et al., 2016; Neher and Shraiman, 2011
The symmetric choice of ${x}_{i}=\pm 1$ is chosen for mathematical convenience. In this form, the total variance contribution from the qthorder epistatic terms is ${\sum}_{{i}_{1}>{i}_{2}>\mathrm{\dots}{i}_{q}}{f}_{{i}_{1}{i}_{2}\mathrm{\dots}{i}_{q}}^{2}$, and the total genetic variance, defined as
is the sum of variance contributions from orders $q=1$ to $\mathrm{\ell}$, that is, the sum of squares of all the $f$’s. The sum over the ${2}^{\mathrm{\ell}}$ genotypes is an expectation value assuming all genotypes are equally likely; we will denote this expectation using an overline hereafter. We use this expectation value as a proxy for empirical averages over the ‘background’ genotypes in a population. With a sufficient number of background genotypes, the empirical average should converge to this expectation value.
The representation in Equation (19) is a Fourier representation of the fitness function on the $\mathrm{\ell}$dimensional hypercube and makes calculations much simpler. For instance, to get the terms from each order, we have
It is useful to define the variance contribution due to a particular locus $i$ as (symmetry w.r.t. interchanging indices is used for each term throughout)
We also define the variance fraction of locus $i$,
which plays a key role in the model.
Derivation of Equation (17)
We would like to relate the fitness before and after locus $i$ is flipped (i.e., ${x}_{i}\to {x}_{i}$), denoted by $y$ and y_{i}, respectively. We have from Equation (19),
where ${y}_{\overline{i}}$ and ${\xi}_{i}$ respectively contain all the terms not containing and containing locus $i$, that is,
We have ${y}_{\overline{i}}=(y+{y}_{i})/2\overline{y}$ and ${\xi}_{i}=(y{y}_{i})/2$. Since ${y}_{\overline{i}}$ and ${\xi}_{i}$ contain all the genotype dependence, we can write
From the properties of the Fourier representation in Equation (21), it is easy to see that the means are $\overline{{y}_{\overline{i}}}=0,\overline{{\xi}_{i}}=0$, the variances are $\overline{{y}_{\overline{i}}^{2}}=V{V}_{i}$, $\overline{{\xi}_{i}^{2}}={V}_{i}$, and the covariance is $\overline{{y}_{\overline{i}}{\xi}_{i}}=0$.
The calculations so far have been exact. We now make the key assumption that ${y}_{\overline{i}}$ and ${\xi}_{i}$ are both normaldistributed across genotypes. This assumption is similar to that of Fisher’s infinitesimal model, where the distribution of trait values across strains for a complex trait is argued to be normaldistributed since the trait value is due to infinitesimal independent contributions from many loci. While ${y}_{\overline{i}}$ is easily seen to be normaldistributed for large $\mathrm{\ell}$, an argument can be made for ${\xi}_{i}$ only if locus $i$ has a large number of independent, nonzero epistatic terms and the additive term f_{i} is smaller in magnitude than the epistatic terms; specifically, we require that ${f}_{i}^{2}\lesssim {\sum}_{j\ne i}{f}_{ij}^{2}+{\sum}_{k<j\ne i}{f}_{ijk}^{2}+\mathrm{\dots}$. If instead ${f}_{i}^{2}\gg {\sum}_{j\ne i}{f}_{ij}^{2}+{\sum}_{k<j\ne i}{f}_{ijk}^{2}+\mathrm{\dots}$, then ${\xi}_{i}$ is bimodal, where the two modes correspond to ${\xi}_{i}\approx \pm {f}_{i}$ at ${x}_{i}=\pm 1$. For loci with pairwise and thirdorder epistasis, the number of pairwise and thirdorder epistatic terms scale $\propto \mathrm{\ell}$ and $\propto {\mathrm{\ell}}^{2}$, respectively, which justifies the normality assumption for large $\mathrm{\ell}$ even if individual epistatic terms are smaller in magnitude than the additive terms.
Under these assumptions and since ${y}_{\overline{i}}$ and ${\xi}_{i}$ are linearly independent, we have
where $\phi $ is the standard normal pdf. Therefore, y_{i} is normaldistributed across genotypes and from the above equation can be written as
where the VF ${v}_{i}={V}_{i}/V$ was defined previously and $\overline{{\eta}_{i}}=0,\overline{{\eta}_{i}^{2}}=V$. This leads to the form in Equation (17),
The above derivation was presented to clarify the basic assumptions. Simply computing the covariance between $y$ and y_{i} in Equations (24) and (25), we get $\overline{(y\overline{y})({y}_{i}\overline{y})}=\overline{{y}_{\overline{i}}^{2}}\overline{{\xi}_{i}^{2}}=V2{V}_{i}$. The correlation is then $12{v}_{i}$. Equation (32) follows if additionally ${y}_{i},y$ are jointly Gaussian, which is true if locus $i$ has many independent, nonzero epistatic terms.
Directed mutation
Previously, we considered the symmetric flip ${x}_{i}\to {x}_{i}$ and averaged over all $\mathrm{\ell}$ loci including $i$. Here, we consider the case when the mutation is specified to change either from ${x}_{i}=1\to +1$ or ${x}_{i}=+1\to 1$. In this case, we should average over all loci except $i$.
We consider ${x}_{i}=1\to +1$ (the opposite case is similar). The fitness before the mutation is
and the fitness after the mutation is
so that
The tilde and hat are used here to distinguish the fitness from the y_{i} defined previously for the symmetric case corresponding to the flip ${x}_{i}\to {x}_{i}$. We can easily compute averages over genotypes using the Fourier orthogonality relations. To compute the expectation value of products of two quantities, say s_{i} and ${y}_{i}^{}$, we simply compute a dot product where the components of the vectors are the coefficients of terms in the expansion above. For example, $\overline{{s}_{i}{y}_{i}^{}}$ is $(\overline{y}{f}_{i})(2{f}_{i})+{\sum}_{j\ne i}({f}_{j}{f}_{ij})(2{f}_{ij})+{\sum}_{j>k\ne i}({f}_{jk}{f}_{ijk})(2{f}_{ijk})+\mathrm{\dots}$. We have for the means, variances, and covariances,
The slope when s_{i} is regressed against ${y}_{i}^{}$ is $\overline{({s}_{i}{\overline{s}}_{i})({y}_{i}^{}\overline{{y}_{i}^{}})}/\overline{({y}_{i}^{}\overline{{y}_{i}^{}}{)}^{2}}$. We can define a ‘modified’ VF ${\stackrel{~}{v}}_{i}$ as half the negative slope,
Writing the linear form based on this correlation, we get
where ${\eta}_{i}$ is again normally distributed (in the WE limit) with zero mean and variance $\overline{({y}_{i}^{}\overline{{y}_{i}^{}}{)}^{2}}$ and $K}_{i}^{2}=\frac{\overline{({s}_{i}{\overline{s}}_{i}{)}^{2}}}{\overline{({y}_{i}^{}\overline{{y}_{i}^{}}{)}^{2}}}4{\stackrel{~}{v}}_{i}^{2$. Note that, unlike v_{i}, ${\stackrel{~}{v}}_{i}$ can be negative.
However, we argue that ${\stackrel{~}{v}}_{i}$ is typically positive in the WE limit, which leads to a negative linear trend. The second term in the numerator on the righthand side of Equation (42) has the same number of terms as $\overline{({s}_{i}{\overline{s}}_{i}{)}^{2}}$, but these terms appear as products of Fourier coefficients that may have opposing signs. In particular, if ${\sum}_{j\ne i}{f}_{ij}^{2}>{\sum}_{j\ne i}{f}_{j}{f}_{ij}$, $\sum _{j>k\ne i}{f}_{ijk}^{2}>\sum _{j\ne i}{f}_{jk}{f}_{ijk}$, and so on, then ${\stackrel{~}{v}}_{i}$ is guaranteed to be positive. If we denote the typical magnitude of qthorder epistasis terms as e_{q} (e_{1} corresponds to additive effects), each of this relationships has the form ${e}_{q+1}^{2}\mathrm{\ell}>{e}_{q}{e}_{q+1}\sqrt{\mathrm{\ell}}$ when $\mathrm{\ell}\gg 1$, that is, ${e}_{q}<{e}_{q+1}\sqrt{\mathrm{\ell}}$. If the number of loci is sufficiently large, then these relationships will hold even if the typical magnitude of individual higherorder epistasis terms is smaller than the lowerorder terms. We therefore expect that the second term in the numerator on the righthand side of Equation (42) is smaller than $\frac{\overline{({s}_{i}{\overline{s}}_{i}{)}^{2}}}{2}$ when $\mathrm{\ell}\gg 1$. A similar argument can be made for the cross terms ${f}_{j}{f}_{ij},{f}_{jk}{f}_{ijk},\mathrm{\dots}$ once the squares in the denominator of the righthand side of Equation (42) are expanded.
When $\mathrm{\ell}\gg 1$, we can then write
Further, in the WE limit, ${K}_{i}^{2}\approx 4{\stackrel{~}{v}}_{i}4{\stackrel{~}{v}}_{i}^{2}$ so that the variance of ${K}_{i}{\eta}_{i}$ is $\propto {\stackrel{~}{v}}_{i}(1{\stackrel{~}{v}}_{i})$.
To estimate the ratio of the magnitudes of the second and first terms in the numerator on the righthand side of Equation (42) from data, we use the expression for the covariance,
to get
Comments on the result
Fitness as a nonlinear function of an additive trait
The negative linear trend observed in data may arise due to the measured fitness being a nonlinear function of an unobserved additive trait. In this case, the epistasis terms are systematically related to each other and the independence assumptions used to derive Equation (32) break down. In short, we show that specific nonlinearities can indeed lead to a negative linear trend, but the statistics of the residuals observed in data make this possibility unlikely.
Suppose the fitness is $y=\varphi (u)$, where $\varphi $ is a nonlinear function, $u={f}_{0}+{\sum}_{i}{f}_{i}{x}_{i}$ is the unobserved additive trait, f_{0} is a constant, and f_{i} are additive coefficients. For a linear trend, we require ${s}_{i}=\varphi ({u}_{i})\varphi (u)\propto \varphi (u)$, where ${u}_{i}=u2{f}_{i}{x}_{i}$ for the flip ${x}_{i}\to {x}_{i}$. For small f_{i} relative to the other coefficients, we can Taylor expand $\varphi ({u}_{i})$ and show that we require $\varphi (u)\propto {e}^{u}$ to get a linear trend. This nonlinearity creates a linear trend with slope ${e}^{2{f}_{i}{x}_{i}}1$. For a negative linear trend, we require $2{f}_{i}{x}_{i}>0$. However, even if this condition is true, the relation ${s}_{i}=({e}^{2{f}_{i}{x}_{i}}1)y$ is exact and there are no residuals.
To introduce residuals, suppose instead that $u={f}_{0}+{\sum}_{i}{f}_{i}{x}_{i}+h$, where $h$ is a HoC term, that is, it is an independent Gaussian random variable (mean 0, variance ${\sigma}_{h}^{2}$) across genotypes and repeatable across measurements. $h$ introduces epistasis in the unobserved trait. We have ${s}_{i}=({e}^{2{f}_{i}{x}_{i}+{h}_{i}h}1)y$, where h_{i} is the HoC term after the mutation, so that the average fitness effect conditional on $y$ is ${\overline{s}}_{i}=({e}^{2{f}_{i}{x}_{i}+{\sigma}_{h}^{2}}1)y$. The conditional variance of the residuals is $\overline{({s}_{i}{\overline{s}}_{i}{)}^{2}}={e}^{4{f}_{i}{x}_{i}+2{\sigma}_{h}^{2}}({e}^{2{\sigma}_{h}^{2}}1){y}^{2}$. Note that the residual variance is no longer proportional to the slope and this variance increases as ${y}^{2}$, which are both inconsistent with the data.
Maximum entropy interpretation
The expectation values are averages over all the genotypes assuming that every genotype of a particular fitness is equally likely, that is, the distribution over genotypes is uniform. This assumption is analogous to ensemble averages over a microcanonical ensemble in statistical physics, where one assumes that all the particle configurations that have a particular energy are equally likely. The experimental setting in Johnson et al., 2019 is similar. The background genotypes are generated from a cross between two strains, which due to recombination makes each locus have equal probability of being one of the two alleles. Closely linked loci may be considered together as blocks. Some of the loci are partially linked, which may lead to deviations from the predictions. The expressions derived above can be easily extended to the case with different background genotype statistics.
The uniform distribution has an informationtheoretic interpretation as the distribution that has the maximum entropy (MaxEnt) given no additional knowledge of how the genotype was generated. Equation (32) can therefore be viewed as the MaxEnt prediction of the fitness effect if locus $i$ is mutated conditioned on the current observed fitness $y$. A key idea that will be used throughout the paper is that when each locus $i$ has a significant number of independent, nonzero epistatic terms, the distribution of fitness and fitness effects is jointly normal with respect to the uniform prior over genotypes. From wellknown properties of multivariate normal distributions, the MaxEnt predictions of unobserved variables are multilinear forms of the observed variables. For example, the MaxEnt prediction for s_{i} given an observed sequence of past fitness is an autoregressive Gaussian process defined by the covariance between the unobserved and observed variables (see section 'History dependence' below).
Varying a subset of loci
In addition, the sums in Equation (19) are over $\mathrm{\ell}$ loci involved in a trait. In reality, we may vary a subset $P$ of all loci and take averages over only this subset. The derivation still follows through in this case; we can simply write the fitness in terms of effective parameters as
Here, we are abusing notation — it is to be assumed that a coefficient with a particular combination of indices appears only once in the sums. The results from the previous section still apply w.r.t. these new effective parameters. For instance, the total variance and the variance due to locus $i\in P$ are
and the effective VF is ${\stackrel{~}{V}}_{i}/\stackrel{~}{V}$.
Relationship to the fluctuationdissipation relation
Equation (32) is analogous to the fluctuationdissipation theorem in statistical physics (Kubo, 1966), which relates the response of a thermodynamic system to a perturbation. The relationship between the magnitude of macroscopic epistasis (the slope in Equation (32)) and the variance due to the background genotypes is analogous to the relationship between viscous drag and Brownian motion (Kubo, 1966). For Brownian motion, the normality arises due to numerous independent collisions of a particle with neighboring particles. In our case, natural selection acts as an external perturbation that pushes the system away from equilibrium (here $\overline{y}$). Diminishingreturns naturally arises as the tendency of the system to revert to its entropically favored equilibrium state.
Variance fraction of double mutants
Using the arguments from the previous section, it is easy to show that the variance of a double mutant at loci $i$ and $j$, ${V}_{ij}$, is necessarily subadditive. In particular, we have
is the total epistatic variance between loci $i$ and $j$. The correlation between the new fitness after a double mutation and the previous fitness is $12{V}_{ij}/V\equiv 12{v}_{ij}=12{v}_{i}2{v}_{j}+2{e}_{ij}$, where ${e}_{ij}\equiv {I}_{ij}/V$ is the epistatic variance fraction between $i$ and $j$.
Appendix 2
Connectedness model
We introduce a ‘connectedness’ model (the CN model, for short), where each locus has a probability ${\mu}_{i}$ of being involved in any particular interaction. We can interpret ${\mu}_{i}$ as the fraction of independent pathways that involve locus $i$, where each pathway has epistatic interactions between all loci involved in that pathway. The number of independent pathways, $M$, is assumed to be very large. The probability of an epistatic interaction between, say loci $i,j,k$, is proportional to ${\mu}_{i}{\mu}_{j}{\mu}_{k}$ since this is the probability that these loci are involved in the same pathway. The magnitude of the interaction term is ${f}_{ijk}^{2}\propto {\mu}_{i}{\mu}_{j}{\mu}_{k}$, where the proportionality is the magnitude of the perturbative effect of the mutations, which is assumed to be constant for all orders of interaction. We set this quantity to unity since it simply scales the fitness coefficients and does not affect subsequent results. The CN model leads to a specific interpretation (and hence its name): the outsized contribution to the variance from a particular locus is due to its involvement in many different complex pathways and not from an unusually large perturbative effect on a few pathways. For large $\mathrm{\ell}$, the CN model is specified by the distribution, $P(\mu )$, across loci. In particular, given $P(\mu )$, we can calculate the total genetic variance, $V$, and the variance due to locus $i$, ${V}_{i}$. We define $\overline{\mu}\equiv {\int}_{0}^{1}\mu P(\mu )\mathit{d}\mu $.
We calculate the expected total variance across statistical ensembles. Note that here the expectations are averages over ensembles where the parameters of the model are resampled, in contrast to the derivations presented in sections above, which were ensemble averages over equally likely genotypes. Since each pathway is independently sampled, expectations approximate the values in a single realization as $M\to \mathrm{\infty}$. All expectations are denoted $\u27e8.\u27e9$. We calculate the expected variance contribution from one pathway: since all pathways are statistically identical, the total variance from $M$ pathways is simply $M$ times the expected contribution from a single pathway. The contribution from the qthorder interaction between loci ${i}_{1},{i}_{2},\mathrm{\dots},{i}_{q}$ is $\u27e8{f}_{{i}_{1}{i}_{2}\mathrm{\dots}{i}_{q}}^{2}\u27e9={\prod}_{n=1}^{q}{\mu}_{{i}_{n}}$. The expected total variance is
The variance due to terms involving locus $i$ is
Therefore, we have
There are two qualitatively different regimes, $\overline{\mu}\mathrm{\ell}\ll 1$ and $\overline{\mu}\mathrm{\ell}\gg 1$. When $\overline{\mu}\mathrm{\ell}\ll 1$, each pathway typically contains a single locus and should lead to an additive model. In this limit, we can write from Equation (61),
which is consistent with the expectation that ${v}_{i}\sim {\mathrm{\ell}}^{1}$ for an additive model. In the opposite limit, $\overline{\mu}\mathrm{\ell}\gg 1$, we have
The CN model therefore leads to an intuitive interpretation of the VF as being determined by the sparsity of interactions of a locus. Equation (63) yields an upper bound at ${v}_{i}=0.5$, at ${\mu}_{i}=1$. From Equation (31), we see that this case is equivalent to the HoC model of maximal epistasis where the new fitness after a mutation is independent of the previous fitness. The DVF $P(v)$ is directly determined by $P(\mu )$; the CN model can therefore be used as a generative model to generate fitness landscapes with arbitrary DVFs.
We can further calculate the epistatic variance between two loci in the CN model. The total epistatic variance ${I}_{ij}$ between loci $i$ and $j$ is
In the limit $\overline{\mu}\mathrm{\ell}\gg 1$, the epistatic VF after dividing by $V$ is then simply ${e}_{ij}={\mu}_{i}{\mu}_{j}/(1+{\mu}_{i})(1+{\mu}_{j})={v}_{i}{v}_{j}$. Using Equation (53), we have
If v_{i}’s are small, the CN model predicts nearadditivity between the effects of two loci. This is not inconsistent with the strong epistasis assumption implicit in the limit $\overline{\mu}\mathrm{\ell}\gg 1$: though the total contribution of epistatic interactions to the genetic variance may be large, the epistatic variance between two specific loci can still be negligible. This is because the majority of epistatic variance is due to the combinatorially large number of higherorder epistatic terms whose individual effects themselves can be weak.
Relationship to statistical fitness landscapes
Statistical fitness landscapes such as the NK model and the Rough Mt. Fuji model are closely related to the CN model described above. The CN model falls under the broad class of generalized NK models (Hwang et al., 2018). In generalized NK models, epistasis is due to modules (or ‘pathways’) of $K$ loci that interact epistatically with each other. The different NK models differ in how the loci are assigned to the modules and the interaction structure within the module. In the CN model, each locus has a locusspecific probability of being part of any module and the interaction structure within a module is alltoall. The locusspecific probability gives rise to a highly nonregular model, that is, loci can have a wide range of contributions to the total variance. This feature gives rise to qualitatively different adaptation properties. We will show this further below in addition to showing that the CN model has a special memoryless property.
Wellstudied fitness landscapes such as Kauffman’s NK model and the Rough Mt. Fuji landscape are regular, that is, every locus contributes equally to the variance. In other words, the DFE, which is determined by the DVF in our picture, instead comes from a single constant value $v=\overline{v}$. The fluctuations of the fitness effects are solely due to the genotypedependent term $\eta $, which is a Gaussian. In a later section, we show that the DFE in this case corresponds to a Gaussian with mean $2\overline{v}(y\overline{y})$. As adaptation proceeds and $y$ increases, the DFE shifts to the deleterious side but retains its Gaussian shape. The adaptation properties that result from this DFE are quite different from those arising from our theory. Further, while the regular fitness landscape picture may lead to a good approximation of our results when the number of mutations is large, it does not capture the different magnitudes of diminishingreturns for different loci observed in experiments.
The landscape of the regular CN model
The CN model for the special case of homogeneous loci, ${\mu}_{i}=\overline{\mu}$ for all $i$, is similar to the NK model but with one major difference: the number of loci in each pathway in the NK model is fixed at $K$ loci, whereas in the CN model the typical pathway size is controlled by a continuous parameter $\overline{\mu}$. This introduces contributions at every epistatic order while effectively imposing sparsity on the contributions from higherorder interactions. We now show that the regular CN model has tunable ruggedness, that is, it transitions from an additive model to a model with maximal epistasis with increasing $\overline{\mu}$ and has exponentially decaying correlations for $\overline{\mu}\mathrm{\ell}\gg 1$.
The regular fitness landscapes often discussed are stationary Gaussian processes on a $\mathrm{\ell}$dimensional hypercube. The regular CN model (i.e., ${\mu}_{i}=\overline{\mu}$ for all $i$) also falls into this class as $M\to \mathrm{\infty}$. The key quantity that defines such a fitness landscape is the covariance function between the fitnesses of genotypes $g$ and ${g}^{\prime}$, $C(g,{g}^{\prime})=C(d)$, where $d\equiv g{g}^{\prime}$ is the Hamming distance between two genotypes. We now compute the covariance $C(d)$ for the regular CN model. Each order term in each pathway is independent and the covariances for each order add up over all pathways. We first calculate the expectation of the firstorder term in a single pathway, which is
Here, we have used $\u27e8{f}_{i}^{2}\u27e9=\overline{\mu}$, where the $\overline{\mu}$ comes from the probability of locus $i$ being selected for the module as argued previously. The covariance is linear in the distance between genotypes, as one would expect from an additive model. Directly calculating the higherorder terms is more complicated because of the ordering restriction $i>j$ for the secondorder term and for higher orders. As noted previously (Stadler and Happel, 1999; Neidhart et al., 2013; Agarwala and Fisher, 2019), these can instead be calculated using combinatorics, which we will demonstrate with the secondorder term. We have for the secondorder covariance
where the elementwise product ${(g{g}^{\prime})}_{i}\equiv {x}_{i}{x}_{i}^{\prime}$ is 1 if ${x}_{i}={x}_{i}^{\prime}$ match and −1 otherwise. If $d$ is the distance between $g$ and ${g}^{\prime}$, then the elementwise product $g{g}^{\prime}$ has $d1$ terms and $\mathrm{\ell}d$ 1 terms. The term in the summation above is 1 if both ${(g{g}^{\prime})}_{i}$ and ${(g{g}^{\prime})}_{j}$ are chosen from the $\mathrm{\ell}d$ subset or both are chosen from $d$ subset. This term is −1 if one of the two is chosen from the $\mathrm{\ell}d$ subset and the other from the $d$ subset. The number of terms that are 1 s are therefore $\left(\genfrac{}{}{0pt}{}{\mathrm{\ell}d}{2}\right)\left(\genfrac{}{}{0pt}{}{d}{0}\right)+\left(\genfrac{}{}{0pt}{}{\mathrm{\ell}d}{0}\right)\left(\genfrac{}{}{0pt}{}{d}{2}\right)$ and the number of terms that are −1 s are $\left(\genfrac{}{}{0pt}{}{\mathrm{\ell}d}{1}\right)\left(\genfrac{}{}{0pt}{}{d}{1}\right)$. This argument is easily extended to higher orders. The general qthorder contribution to the covariance is
It is easily verified that the firstorder term matches. When $d=0$, we recover the binomial coefficients, as expected. The summation above is precisely the Krawtchouk polynomial ${\mathcal{K}}_{q}(d;\mathrm{\ell},2)$ (Olver et al., 2010), which we will denote by ${\mathcal{K}}_{q}(\mathrm{\ell},d)$. We therefore have
The generating function of the Krawtchouk polynomials yields
The above expression is consistent with intuition: when $\overline{\mu}\mathrm{\ell}\ll 1$, the covariance is linear in $d$, as expected for an additive model. In the opposite limit of $\overline{\mu}\mathrm{\ell}\gg 1$ and $0<d<\mathrm{\ell}/2$, the constant term 1 can be ignored, and we see that the covariance is proportional to ${e}^{\lambda d}$, where $\lambda =\mathrm{ln}\left(\frac{1+\overline{\mu}}{1\overline{\mu}}\right)$. Epistasis is maximal when $\overline{\mu}\to 1$, in which case $\lambda \to \mathrm{\infty}$ and the covariance rapidly goes to zero with $d$. This is the HoC model of maximal epistasis.
The landscape of the general CN model
The landscape of the general CN model, where loci can have different ${\mu}_{i}$, is no longer stationary but the correlation structure can still be calculated. The qthorder contribution to the covariance between genotypes $g$ and ${g}^{\prime}$ in the general case is
It is easy to see that when the contributions from all orders are added up, the covariance $C(g,{g}^{\prime})$ has a rather simple product form
The correlation $c(g,{g}^{\prime})=C(g,{g}^{\prime})/\sqrt{C(g,g)C({g}^{\prime},{g}^{\prime})}$ is
The above relation is exact. When ${\mu}_{i}\ll 1$, $\overline{\mu}\mathrm{\ell}\gg 1$, then $1\pm {\mu}_{i}\approx {e}^{\pm {\mu}_{i}}$ and the 1’s in the numerator and denominator above can be ignored. We get
When ${\mu}_{i}$’s are equal, we recover the homogeneous case with exponentially decaying correlations.
Appendix 3
Adaptation
The distribution of fitness effects
Longterm adaptation is determined by the DFEs and the stochastic dynamical processes that lead to fixation. Before analyzing the properties of adaptive walks, we clarify what our previous analysis, which ultimately led to Equation (32), means for the DFE.
Equation (32) represents the distribution of the fitness effect of a specific mutation at locus $i$ over all genotypes in the population that have fitness $y$. We are instead interested in the DFE, where fitness effects are measured for all the mutations of a particular genotype that has fitness $y$. The DFE for a particular genotype generally depends on idiosyncratic epistatic interactions between loci. In order to make this explicit, we turn to our analysis framework described previously. For notational convenience, we will assume $\overline{y}=0$ and $V=1$ in this subsection and the next.
The DFE over $\mathrm{\ell}$ loci for a particular genotype of a fitness $y$ can be thought of as a sample, ${s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}$, from the conditional joint distribution $P({s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}y)$. From our assumption of numerous, independent epistatic terms for each locus, this joint distribution is defined entirely in terms of the means and covariances of the $\mathrm{\ell}+1$ variables $y,{s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}$. Recall that the fitness effect of a mutation at locus $i$ is ${s}_{i}={y}_{i}y=2{\xi}_{i}$ as defined in Equation (26). As shown previously, we have $\overline{{s}_{i}y}=2{v}_{i}$. We also have $\overline{{s}_{i}{s}_{j}}=4{e}_{ij}$ with ${e}_{ii}={v}_{i}$. The means of all the variables are zero. Based on this covariance structure, we can compute $P({s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}y)$ using the properties of the conditional distribution of a multivariate normal distribution. It is straightforward to show that conditional on the fitness $y$, the conditional means are given by ${\text{Mean}}_{y}({s}_{i})=2{v}_{i}y$ and the conditional covariances are ${\text{Cov}}_{y}({s}_{i},{s}_{j})=4({e}_{ij}{v}_{i}{v}_{j})$. This relation makes clear that in general the DFE from a sample ${s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}\sim P({s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}y)$ depends on the epistatic interactions between all pairs of loci via ${e}_{ij}$. Note that ${\text{Var}}_{y}({s}_{i})=4{v}_{i}(1{v}_{i})$, which leads to Equation (32) for the marginal distribution $P({s}_{i}y)$ of a particular locus.
The DFE simplifies considerably if we make certain additional assumptions on the nature of epistatic interactions. In particular, the conditional correlation between fitness effects is
If we assume the typical VF, $\overline{v}$ is small (i.e., $\overline{v}\ll 1$) and also that ${e}_{ij}$ is $O({\overline{v}}^{2})$, then correlations are $O(\overline{v})$ and thus negligible. Then, in a particular sample ${s}_{1},{s}_{2},\mathrm{\dots},{s}_{\mathrm{\ell}}$, we can think of each s_{i} as being drawn independently with mean $2{v}_{i}y$ and variance $4{v}_{i}(1{v}_{i})$. If $\mathrm{\ell}$ is large, then this leads to a DFE
where $P(v)$ is the DVFs across the loci and $\phi $ is the standard normal pdf. Surprisingly, this relation becomes exact for the CN model, where we have shown that ${e}_{ij}={v}_{i}{v}_{j}$ (Equation (66)) and therefore the correlations between s_{i}’s vanish. In this case, the DFE is thus determined entirely by the DVF, but we will show later that it has certain universal properties independent of even the DVF. Note that we derive the DFE starting from rather general assumptions on the organization of the genotypephenotype map in contrast to past models that assume the DFE as a starting point.
Above, we have measured the DFE for the subset of genotypes that have fitness $y$ without regard to their evolutionary history. Over the course of adaptation, mutations are fixed and certain fitness changes are observed. We would then like to measure the DFE for those genotypes that have undergone a particular adaptive trajectory. As we show below, the DFE again simplifies considerably if certain relations hold.
History dependence
Using an analysis similar to the one in the previous section, we quantify history dependence by calculating the correlations between the new fitness and the adaptive history conditional on the genotypes that have undergone a specific sequence of events in the past.
To be precise, suppose an initial clonal population of fitness y_{0} gains $k$ successive mutations and the corresponding sequence of fitnesses is ${y}_{1},{y}_{2},\mathrm{\dots},{y}_{k}$. We would like to quantify how the fitness of a new mutation at locus $k+1$, ${y}_{k+1}$, depends on past fitnesses and the idiosyncratic epistatic interactions between the previous $k$ mutations and the new mutation. The correlation between any two fitnesses y_{i} and y_{j} ($i<j$) is given by $12{v}_{i+1:j}$, where ${v}_{i+1:j}$ is the VF of the loci $i+1,i+2,\mathrm{\dots},j$ (the subscript notation will be used throughout). In general, ${v}_{i+1:j}$ accounts for the epistatic interactions of all orders between these $ji$ loci and is expressible in terms of the coefficients of our original complex trait model in Equation (19). One can then write the covariance matrix between ${y}_{0},{y}_{1},\mathrm{\dots},{y}_{k+1}$. In block form, this is
The mean of ${y}_{k+1}$ conditional on ${y}_{0},{y}_{1},\mathrm{\dots},{y}_{k}$ is a linear weighted sum of the past fitnesses:
where $\mathit{\bm{y}}$ is a vector with elements ${y}_{0},{y}_{1},\mathrm{\dots},{y}_{k}$. In other words, ${y}_{k+1}$ can be written as
where $\u03f5$ is a meanzero stochastic term that depends on the genotype of the initial population and whose variance can be calculated from $\mathrm{\Sigma}$. To gain intuition, it is useful to explicitly calculate the case of $k=1$. In this case, the covariance matrix is
We have
We therefore have
The dependence on the past has complex dependencies on epistasis between loci 1 and 2 even in this highly simplified case. To identify what contributes to history dependence beyond just the most recent fitness, we rewrite Equation (87) as
From here, we observe that the dependence on v_{1} and y_{0} vanishes precisely when ${v}_{12}={v}_{1}+{v}_{2}2{v}_{1}{v}_{2}$. This is not true for all landscapes. However, as noted previously, this is in fact true for the CN model (Equation (66)) when $\overline{\mu}\mathrm{\ell}\gg 1$. In this case, we get the simple relation
where the prefactor in the second term comes from normalization and $\eta $ is a Gaussian random variable.
From Equation (53), we have the general relation ${v}_{12}={v}_{1}+{v}_{2}2{e}_{12}$, where e_{12} is the epistatic VF between loci 1 and 2. We can then rewrite Equation (88) as
for ${v}_{1}\ll 1$ and ${s}_{1}={y}_{1}{y}_{0}$. An intuitive interpretation of this result is presented in the main text.
Sufficient condition for memoryless fitness gains
The $k=1$ case suggests that the relation for memoryless fitness gains (${e}_{12}={v}_{1}{v}_{2}$) could in fact be true for all $k$ under the CN model, which indeed turns out to be the case, as we show below. Motivated by the $k=1$ case, we would like to have ${w}_{k+1,k}=12{v}_{k+1}$ in Equation (83) and the rest of the weights equal to zero. If this is the case,
Multiplying both sides by y_{j} and computing the correlations, we get the condition $12{v}_{j+1:k+1}=(12{v}_{j+1:k})(12{v}_{k+1})$ for all $j$. Therefore, a sufficient condition for memoryless fitness gain is
for all $k$ and for all $j<k$. We will now show that this is true for the CN model for $j=0$, that is, ${v}_{1:k+1}={v}_{k+1}+{v}_{1:k}2{v}_{k+1}{v}_{1:k}$; the rest trivially follows. Let us first analyze what terms contribute towards ${V}_{1:k}$ (dividing by $V$ gives ${v}_{1:k}$). When loci 1 through $k$ are mutated, their effect is to flip the signs of their coefficients in Equation (19). The ones that have changed sign contribute to the decorrelation between the fitnesses before and after the set of mutations, but the epistatic terms that have an even number of flips do not change and therefore their contribution has to be subtracted. ${V}_{1:k}$ therefore is the sum of squares of all the coefficients in Equation (19) whose loci have flipped an odd number of times. To keep track of indices, suppose ${i}_{1},{i}_{2},\mathrm{\dots}$ are used to denote the indices of the $k$ loci (which take values from 1 to $k$) and ${j}_{1},{j}_{2},\mathrm{\dots}$ for the rest. Then,
and so on. Now, when the $k+1th$ locus is also flipped, to compute ${V}_{1:k+1}$, we can add up the two variances ${V}_{1:k}$ and ${V}_{k+1}$ except for the cross terms that have an even number of sign flips and that include both the $k+1\mathrm{t}\mathrm{h}$ locus and the other $k$ loci. These have to be subtracted twice because they appear both in ${V}_{1:k}$ and ${V}_{k+1}$. These terms are
where the $F\mathrm{s}$ are defined in a similar fashion as in Equation (93) except the sums over $j\mathrm{s}$ run from $k+2$ to $\mathrm{\ell}$ instead of $k+1$ to $\mathrm{\ell}$. We get the general relation
For the CN model specifically, we have $\u27e8{f}_{{i}_{1}{i}_{2}\mathrm{\dots}{i}_{k}}^{2}\u27e9={\prod}_{j=1}^{k}{\mu}_{{i}_{j}}$. This implies
Performing a similar calculation for $\u27e8{V}_{1:k}\u27e9$, we find
which gives
for $\mu \mathrm{\ell}\gg 1$ since ${v}_{k+1}={\mu}_{k+1}/(1+{\mu}_{k+1})$. Dividing Equation (95) by $V$ throughout concludes the derivation.
Adaptedness
Under the conditions of memoryless fitness gains from the previous section, we can write
where variables have been rescaled as $z={V}^{1/2}(y\overline{y}),\sigma ={V}^{1/2}s,\nu ={V}^{1/2}\eta $. This equation suggests various forms for the DFE, $\rho (\sigma z)$, depending on the DVF, $P(v)$.
$z$ is an intuitive measure of ‘adaptedness’: (1) when $z$ is negative, the organism is in the unlikely situation of being ‘negatively adapted’ to the environment. Beneficial mutations are much more likely than deleterious mutations and adaptation is dominated by loci that have a large $v$. (2) When $z\ll 1$, the organism is ‘neutrally adapted’. The number of beneficial and deleterious mutations is balanced. (3) When $z\gg 1$, the organism is ‘welladapted,’ where the DFE is strongly skewed towards deleterious mutations.
We will analyze adaptation in the neutrally adapted and the welladapted regimes. When the organism is negatively adapted, Equation (99) predicts that a few substitutions are sufficient to quickly reach the neutrally adapted state. In addition, we assume the VF of each locus is small, that is, $v\ll 1$ but the number of loci $\mathrm{\ell}$ is large enough so that overall epistasis $\approx \overline{v}\mathrm{\ell}\gg 1$ ($\overline{v}$ is the mean VF). We can then ignore the $1v$ factor and rewrite Equation (99) as
The intuition behind the analytical results is discussed in the main text. We present the formal calculations here.
Analytical results for an exponential DVF
We begin by calculating the DFE, $\rho (\sigma z)$, for the specific case when the DVF is an exponential,
where $\overline{v}\ll 1$ is the mean VF across loci. The true DVF likely has a large fraction of loci that have no effect; accounting for these loci simply scales the mutation rate and will be ignored in this analysis. The exponential DVF leads to analytical predictions for the shape of the average fitness trajectories under certain simplifying assumptions about the underlying selective forces.
From Equation (100), we have
Here, $\nu $ is a standard normal random variable, which we integrate out, giving
where $\phi $ is the normal pdf. For $P(v)$ given in Equation (101), this integral can be calculated exactly:
The resulting DFE is a double exponential with scale $\sim {(z/2\pm \sqrt{2{\overline{v}}^{1}+{z}^{2}}/2)}^{1}$. For $z\ll 1$ small, the DFE is symmetric around the origin, as expected, with a scale determined by the DVF. As $z$ increases, the DFE skews towards deleterious effects. The typical magnitudes of beneficial and deleterious effects are not independent from the overall ratio of beneficial to deleterious mutations. The welladapted regime is reached when ${z}^{2}$ is comparable to ${\overline{v}}^{1}$. To clearly delineate the two regimes, it is useful to define new variables $x=z\sqrt{\overline{v}/2}$ and $\lambda =\sigma \sqrt{2/\overline{v}}$. In the new variables, the DFE is
The neutrally adapted and welladapted regimes then correspond to $x\lesssim 1$ and $x\gtrsim 1$, respectively. The mean rate of adaptation in units of generations on the $x$ scale is $\u27e8dx\u27e9=\u27e8dz\u27e9\sqrt{\overline{v}/2}=\u27e8\sigma \u27e9\sqrt{\overline{v}/2}=\u27e8\lambda \u27e9\overline{v}/2$, where the expectation is taken over fixation probabilities and the DFE. We assume SSWM so that ${p}_{\text{fix}}(\sigma )\sim 2\sigma $ or ${p}_{\text{fix}}(\lambda )\sim 2\lambda \sqrt{\overline{v}/2}$ for positive $\lambda $ and 0 otherwise. We find
where $N$ is population size, and $U$ is the effective mutation rate for loci with nonzero effect. Integrating over $\lambda $, we get
Integrating over $dx$ starting from an initial x_{0}, we obtain an approximation to the mean fitness trajectory
Note that this is only an approximation for small $x$ since the typical fixed beneficial effect is a discrete jump. We show the result from Equation (109) in Appendix 1—figure 1a, where the dependencies on the equilibrium fitness $\overline{y}$ and genetic variance $V$ are highlighted. There are two independent parameters that determine the scale ($\sqrt{\overline{v}/2V}$) and location ($\overline{y}$) of the fitness, and one parameter ($NU{n}_{\text{gen}}$) that determines the time scale. From Equation (109), the $x\sim t$ and $x\sim {t}^{1/5}$ scalings in the neutrally adapted ($x\ll 1$) and welladapted regime ($x\gg 1$) respectively are apparent.
The number of substitutions, n_{s}, as a function of $x$ can also be calculated under the SSWM assumption. We get
which can be mapped onto time using Equation (109). The scalings are ${n}_{s}\sim x\sim t$ in the neutrally adapted regime and ${n}_{s}\sim {x}^{2}\sim {t}^{2/5}$ in the welladapted regime.
Asymptotics and general scaling results
We now derive the asymptotic properties of the DFE in the welladapted regime (${\overline{v}}^{1/2}z\gg 1$). Writing out the Gaussian pdf in Equation (80), we have
where the change of variables $v=u/z$ is used. When $z\gg 1$, Laplace’s method can be used. The exponent is minimized when $u=\sigma /2$, which gives
The contribution towards a fitness effect $\sigma $ at large $z$ comes largely from loci with VF $v\approx \sigma /2z$. The exponential form of the beneficial DFE is determined entirely by the Gaussian tails of the genotypedependent term. The argument can be easily generalized to nonGaussian tail probabilities using a similar calculation. Equation (116) implies that the ratio of the probabilities of beneficial and deleterious mutations is independent of the DVF as long as it has sufficient mass at $v=\sigma /2z$:
Such a relationship has been hypothesized previously based on simulations of an additive finitesites model and the form of ${p}_{\text{fix}}$ close to the highinterference limit that could result in a fitness plateau (Rice et al., 2015). Using our theory, we have shown that this result is indeed true independent of the DVF and under our core hypotheses of normality and memoryless fitness gains. If ${p}_{\text{fix}}\sim {e}^{\sqrt{V}{T}_{c}\sigma}$ in the highinterference limit, where ${T}_{c}$ is the coalescent time, fitness should plateau when ${p}_{\text{fix}}(\sigma )\rho (\sigma z)={p}_{\text{fix}}(\sigma )\rho (\sigma z)$, which is at
The $\sqrt{V}$ appears to account for the rescaling $\sigma =s/\sqrt{V}$.
We get Equation (116) only if $P(v)$ has probability mass at $v=\sigma /2z$. This is not the case in the exponentially correlated fitness landscape model with homogeneous loci, that is, if, for instance, $P(v)=\delta (v\overline{v})$, we instead get from Equation (80)
The DFE in this case is therefore a Gaussian with mean shifting towards the deleterious side. The ratio of beneficial to deleterious mutations goes rapidly to zero as $\simeq \phi (\sqrt{\overline{v}}z)/\sqrt{\overline{v}}z$ and adaptation sharply plateaus beyond $z={\overline{v}}^{1/2}$.
From Equation (116), various scaling results can be derived that apply independent of $P(v)$ and the Gaussian assumption on $\nu $. We retain the exponential form for convenience. The rate of beneficial mutations is
The integral has an effective upper cutoff at $w\sim O(1)$ and can be approximated as ${U}_{b}\approx UP(0)/2{z}^{2}\sim {z}^{2}$ under certain assumptions for $P(v)$ for small $v$. The prefactor $P(0)$ suggests that the number of beneficial mutations depends on the number of smalleffect loci. While strong epistatic loci drive adaptation in the neutrally adapted regime, adaptation in the welladapted regime is instead driven by weakly epistatic loci.
From Equation (116), the typical size of a beneficial mutation is $\sigma \sim {z}^{1}$. Under SSWM, ${p}_{\text{fix}}\sim \sigma \sim {z}^{1}$. As argued previously, since ${U}_{b}\sim {z}^{2}$, we get $dz/dt\sim {z}^{4}$ and therefore we obtain the two scaling relations
which apply independently of the form of the DVF in the welladapted regime.
Data availability
The code and data used to generate the figures are available at https://github.com/greddy992/global_epistasis (copy archived at https://archive.softwareheritage.org/swh:1:rev:ab20956034094e5789a5e0b74fb015d7f5341c31/).

DryadData from: Longterm dynamics of adaptation in asexual populations.https://doi.org/10.5061/dryad.0hc2m
References

Adaptive walks on highdimensional fitness landscapes and seascapes with distancedependent statisticsTheoretical Population Biology 130:13–49.https://doi.org/10.1016/j.tpb.2019.09.011

BookNk Fitness Landscapes, in “The Handbook of Evolutionary Computation”Browse Books.

BookMultivariate Statistics: A Vector Space ApproachJOHN WILEY & SONS, INC.https://doi.org/10.2307/2347710

Evolution experiments with microorganisms: the dynamics and genetic bases of adaptationNature Reviews Genetics 4:457–469.https://doi.org/10.1038/nrg1088

The changing geometry of a fitness landscape along an adaptive walkPLOS Computational Biology 10:e1003520.https://doi.org/10.1371/journal.pcbi.1003520

BookA mathematical theory of natural and artificial selection, part v: selection and mutationIn: Haldane J. B. S, editors. Mathematical Proceedings of the Cambridge Philosophical Society. Cambridge University Press. pp. 838–844.https://doi.org/10.1111/j.1469185X.1924.tb00546.x

Amplitude spectra of fitness landscapesAdvances in Complex Systems 01:39–66.https://doi.org/10.1142/S0219525998000041

Physical constraints on epistasisMolecular Biology and Evolution 37:2865–2874.https://doi.org/10.1093/molbev/msaa124

Universality classes of interaction structures for NK fitness landscapesJournal of Statistical Physics 172:226–278.https://doi.org/10.1007/s109550181979z

Towards a general theory of adaptive walks on rugged landscapesJournal of Theoretical Biology 128:11–45.https://doi.org/10.1016/S00225193(87)800292

The NK model of rugged fitness landscapes and its application to maturation of the immune responseJournal of Theoretical Biology 141:211–245.https://doi.org/10.1016/S00225193(89)800190

The fluctuationdissipation theoremReports on Progress in Physics 29:255–284.https://doi.org/10.1088/00344885/29/1/306

Sustained fitness gains and variability in fitness trajectories in the longterm evolution experiment with Escherichia coliProceedings of the Royal Society B: Biological Sciences 282:20152292.https://doi.org/10.1098/rspb.2015.2292

Idiosyncratic epistasis creates universals in mutational effects and evolutionary trajectoriesNature Ecology & Evolution 4:1685–1693.https://doi.org/10.1038/s4155902001286y

Statistical genetics and evolution of quantitative traitsReviews of Modern Physics 83:1283–1300.https://doi.org/10.1103/RevModPhys.83.1283

Exact results for amplitude spectra of fitness landscapesJournal of Theoretical Biology 332:218–227.https://doi.org/10.1016/j.jtbi.2013.05.002

BookNIST Handbook of Mathematical Functions Hardback and CDROMCambridge university press.

The distribution of fitness effects among beneficial mutationsGenetics 163:1519–1526.

The genetic theory of adaptation: a brief historyNature Reviews Genetics 6:119–127.https://doi.org/10.1038/nrg1523

Biophysical inference of epistasis and the effects of mutations on protein stability and functionMolecular Biology and Evolution 35:2345–2354.https://doi.org/10.1093/molbev/msy141

Inferring the shape of global epistasisPNAS 115:E7550–E7558.https://doi.org/10.1073/pnas.1804015115

The ContextDependence of mutations: a linkage of formalismsPLOS Computational Biology 12:e1004771.https://doi.org/10.1371/journal.pcbi.1004771

Patterns of epistasis between beneficial mutations in an antibiotic resistance geneMolecular Biology and Evolution 30:1779–1787.https://doi.org/10.1093/molbev/mst096

Random field models for fitness landscapesJournal of Mathematical Biology 38:435–478.https://doi.org/10.1007/s002850050156

Quantitative analyses of empirical fitness landscapesJournal of Statistical Mechanics: Theory and Experiment 2013:P01005.https://doi.org/10.1088/17425468/2013/01/P01005

Fourier and Taylor series on fitness landscapesBiological Cybernetics 65:321–330.https://doi.org/10.1007/BF00216965

Should evolutionary geneticists worry about higherorder epistasis?Current Opinion in Genetics & Development 23:700–707.https://doi.org/10.1016/j.gde.2013.10.007

Diminishingreturns epistasis decreases adaptability along an evolutionary trajectoryNature Ecology & Evolution 1:61.https://doi.org/10.1038/s415590160061
Decision letter

Naama BarkaiSenior and Reviewing Editor; Weizmann Institute of Science, Israel

Dmitri A PetrovReviewer; Stanford University, United States

Kristina CronaReviewer; American University, United States

Joachim KrugReviewer; University of Cologne, Germany
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Thank you for submitting your article "Global epistasis emerges from a generic model of a complex trait" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by Naama Barkai as the Senior and Reviewing Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Dmitri A Petrov (Reviewer #1); Kristina Crona (Reviewer #2); Joachim Krug (Reviewer #3).
As you will see below, all reviewers were excited about the quality of the work. Still, to make is accessible to broad audience, the paper would benefit from revising the writing, making it more accessible and precise. Please relate to all comments below.
Reviewer #1 (Recommendations for the authors):
I have little to add to the paper and I strongly support its publication.
Some points that come to mind as a reader:
1. I would have liked more detailed analysis of the connection between the extreme value theory and the results in the paper.
2. most experimental evolution is not done in the WMSS regime and indeed the authors suggest that the failure of their predictions in the cases of LTEE might be due to the strong mutation regime of the LTEE. Can more be said about the expected differences in predictions in the strong mutation regime?
3. I imagine these arguments were made in engineering or catastrophe theory or economics or control theory – is this true? The results feel so generic that they would apply to all complex systems.
4. What do you results say about the evolution of modularity and how epistasis would evolve?
Reviewer #2 (Recommendations for the authors):
The authors should address the fact that diminishing return and increases costs occur for landscapes beyond the type considered. For instance, consider a biallelic 20locus fitness landscapes associated with stabilizing selection, where genotypes are represented as bit strings and
wg=1∣∑gi10∣20
A mutation 0 ↦ 1 for a genotype g results in the fitness difference
120if∑gi≥10120if∑gi<10
There is no regression effect here. However, by adding random noise to the fitness values and instead assume that wg=1∣∑gi10∣20+εg the result is a landscape with both diminishing return and increased costs effects.
A. Major comments and recurrent issues.
1. Throughout the paper, the manuscript seems to assume SSWM, and fall back on Gillespie’s theory for small fitness differences between genotypes. For instance the author writes (Page 15):”This analysis suggests that during adaptation, since selection favors mutations with stronger fitness effects on the current background, a mutation that interacts less with previous mutations is more likely to be selected.” However, if two beneficial mutations have strong effects than they are about equally likely to go to fixation.
2. Every mathematical expression in the main text should have an explanation or a reference (for biology journals that is a reasonable requirement).
3. References are missing throughout the manuscript. There is not even a reference for”Krawtchouk polynomials”.
4. The SI is too difficult to read in my view. More detailed argument would help, and also useful references for readers who need to catch up, or review, say Fourier representations or Brownian motion.
5. Throughout the main text, the manuscript abuses notation for Fourier coefficients (without comments). The expansion on Page 2 assumes that i > j for f_{ij} (and similarly) but later on the authors use f_{ij} where i < j (at first the notation was confusing to me, especially subscripts of the type i 6 = j > k). At least a comment should be provided, or otherwise the notation should be changed (in fact the manuscript uses different notation in the SI, Page 9).
6. The following type of expression was unclear to me before I read the SI (on Page 6 and other places in the manuscript):
∑ f_{j} f_{ij}+∑ f_{jk} f_{ijk} +….
7. The expression”slope for a double mutant”, and similarly, sounds strange (on Page 9 and other places in the manuscript). Here the fitness of a double mutant is compared to the sum of the fitnesses for the corresponding single mutants (something similar should be stated for clarity).
8. It would be great with an explicit example (a toy example) of a landscape where global epistasis varies substantially between loci, for some intuitive understanding.
B. Detailed comments in order of appearance. Most comments below are very minor and concerns notation or wording. The list is not intended to be complete (in fact most comments concern the first few pages and the corresponding part of the SI). Similar minor issues were noted throughout the manuscript, but all of it can be handled swiftly.
1. Page 1:” However, the mechanistic basis of this global epistasis remains unknown.”
The manuscript provides statistical rather than mechanistic explanations. (A mechanistic explanation would concern protein folding or something.)
2. Page 5: “In other words, Equation 2 implies that widespread independent idiosyncratic epistatic interactions lead to the observed patterns of diminishingreturns and increasingcosts epistasis.”
Equation 2 does not imply anything, since such an expansion is possible for any genotype. It would make more sense to simply say:”We argue that wide spread independent idiosyncratic epistatic lead to diminishing return and increasing costs epistasis.” Alternatively, explain the connection to Equation 2 in a more precise way.
(The heuristic argument on the same page is very nice!)
3. Page 6, L 103: The definition of v~iis very important and should be clarified. Explain the symbols ∑ f_{j} f_{ij}+∑ f_{jk} f_{ijk} +… and explain the symbols in the denominator. (It would also be nice with a brief description of how the expression is composed from the regression argument in the SI.)
4. Page 6, L108:”Note that these results hold for any fitness landscape”. This is not true, as the text also states a few lines later.
5. Page 6, L 120: Why does the sum scale as l?
6. Page 8, L 162:”The theory additionally predicts.…” What exactly is the claim? Does the claim concern any double mutant such that the corresponding single mutations are beneficial?
7. Page 8, L 171:”To test our analytical results” Sounds strange, simulations do not verify analytical results. What exactly is the purpose of the simulation (an illustration of the analytical results perhaps)?
8. SI: It would be much easier to read the SI if it started with Part A followed by Part I. The reason is that Part I uses v_{i} that is defined in Part A.
9. SI: The notation makes the reading more difficult than necessary in a few places. For instance the symbols y_{i} and y_{i} are unrelated, whereas y_{i} and ξ_{i} are closely related. In addition, more informative notation could save time for a reader, such as y(x_{i} = 1), y(x_{i} = −1) rather than y~iand y^i.
10. SI, L102: Since v~iis important, it seems reasonable with more explanations of the derivation. Properties of the Fourier expansions are used for cancelling out terms and other simplifications (which is also mentioned), but explicit arguments are not provided.
11. Why is v~idefined as half of the negative slope? Later on 2v~ishows up in formulas. It would make more sense to define v~ias the slope, which would also make it easier to interpret some formulas in the main text.
Reviewer #3 (Recommendations for the authors):
The results for the symmetric model are much simpler and more transparent – the slope of the regression is strictly negative, and has a natural interpretation in terms of variance fraction, whereas the corresponding expression (4) in the directed case looks intimidatingly complicated and obscure. I had a hard time understanding the difference between the two cases and the meaning of the statement that in the symmetric model "the fitness effects of both x_{i} = 1 \to +1 and x_{i} = +1 \to 1 are regressed against the background fitness". My interpretation of this is now the following:
To apply the symmetric model to experimental data, along with each mutation also the corresponding reversion would have to be included in the data set. Whereas the selection coefficient of the reversion is simply the negative of that of the mutation, it happens on a different background, and therefore the effect of including it in the regression is not trivial. In practice, including the reversions is of course not possible, and therefore the directed model is needed, although conceptually the symmetric model is more appealing.
If the authors agree with the above, I suggest that they add some explanation along these lines in the manuscript.
The following points concern the same issue:
1. In the application to experimental data, it is of course also not possible (or meaningful) to distinguish between the two directions x_{i} = 1 \to +1 and x_{i} = +1 \to 1. It is therefore a bit worrisome that (3,4) depend on the direction. I assume that in practice the data analysis is anyway restricted to the WE limit where the difference between the two direction does not matter. Is that correct? If so, it should be explicitly stated.
2. The description of how the model is applied to the data of Johnson et al. (Methods and Materials, lines 524 ff.) is not entirely clear. Why can the variance fraction be estimated from the Pearson correlation between y_{i} and y? And what does it mean (in line 534) that the slope of the regression is either 2v_{i} – 1 or 2v~i 1? If the directed model applies, the slope should always be 2v~i – 1. Related to this, in Figure 3a it seems 2v should be replaced by 2v~.
https://doi.org/10.7554/eLife.64740.sa1Author response
Reviewer #1 (Recommendations for the authors):
I have little to add to the paper and I strongly support its publication.
Some points that come to mind as a reader:
1. I would have liked more detailed analysis of the connection between the extreme value theory and the results in the paper.
We have now included a derivation of Orr’s result and how that relates to the result we obtain (lines 406414 of RM). Our results are consistent with Orr’s result, and give additional information about how the beneficial DFE and the rate of beneficial mutations change with adaptedness.
2. most experimental evolution is not done in the WMSS regime and indeed the authors suggest that the failure of their predictions in the cases of LTEE might be due to the strong mutation regime of the LTEE. Can more be said about the expected differences in predictions in the strong mutation regime?
This is an interesting issue, and we have added a few comments about the dynamics in the strong mutation regime (lines 523530 in the RM). Future work will be required to say anything more quantitative, because while existing theory has extensively analyzed this regime for a fixed DFE and weak epistasis, there is much less understanding of how strong mutations interact with global epistasis.
3. I imagine these arguments were made in engineering or catastrophe theory or economics or control theory – is this true? The results feel so generic that they would apply to all complex systems.
We wondered the same and are unaware of a similar result in other fields. Perhaps the most closely related result is the fluctuationdissipation theorem from statistical physics, which is also based on the idea that many independent interactions lead to a negative linear feedback and Gaussian fluctuations of proportional magnitudes. We discuss this relationship in the SI (see section on Relationship to fluctuationdissipation theorem).
4. What do you results say about the evolution of modularity and how epistasis would evolve?
The Fourier framework here assumes a fixed landscape and does not capture global architectural changes in the landscape over the course of longterm evolution. We think the evolution of modularity is an interesting question and plan to develop a framework that incorporates both epistasis and a plastic landscape in followup work. Our intuition is that modularity is typically favored in changing environments, as expected. However, our results suggest that a fullyconnected architecture has on average mutations of large effect due to epistasis, which can speed adaptation in the neutrallyadapted regime. We suspect that a nonmodular architecture may be favored in certain cases depending on how much environments differ and change.
Reviewer #2 (Recommendations for the authors):
The authors should address the fact that diminishing return and increases costs occur for landscapes beyond the type considered. For instance, consider a biallelic 20locus fitness landscapes associated with stabilizing selection, where genotypes are represented as bit strings and
wg=1∣∑gi10∣20
A mutation 0 ↦ 1 for a genotype g results in the fitness difference
120if∑gi≥10120if∑gi<10
There is no regression effect here. However, by adding random noise to the fitness values and instead assume that wg=1∣∑gi10∣20+εg the result is a landscape with both diminishing return and increased costs effects.
A. Major comments and recurrent issues.
1. Throughout the paper, the manuscript seems to assume SSWM, and fall back on Gillespie’s theory for small fitness differences between genotypes. For instance the author writes (Page 15):”This analysis suggests that during adaptation, since selection favors mutations with stronger fitness effects on the current background, a mutation that interacts less with previous mutations is more likely to be selected.” However, if two beneficial mutations have strong effects than they are about equally likely to go to fixation.
We have expanded on our SSWM and small fitness differences assumptions, and included the expression for the fixation probability (line 373377 in RM). We expect the small differences assumption to have minimal effect on the longterm dynamics as the typical beneficial fitness effect in the welladapted regime is small relative to the fitness (~ 1/2z^{2}, where z is adaptedness and z >> 1 in the welladapted regime). We have also added a brief comment in the Discussion regarding nonSSWM dynamics.
2. Every mathematical expression in the main text should have an explanation or a reference (for biology journals that is a reasonable requirement).
We have added references, referred to the SI or elaborated on the derivation for each of the numbered equations in the main text.
3. References are missing throughout the manuscript. There is not even a reference for”Krawtchouk polynomials”.
We have added a reference for Krawtchouk polynomials and elsewhere (line 929).
4. The SI is too difficult to read in my view. More detailed argument would help, and also useful references for readers who need to catch up, or review, say Fourier representations or Brownian motion.
We have added references to Brownian motion (lines 845848) and Fourier representations (lines 732).
5. Throughout the main text, the manuscript abuses notation for Fourier coefficients (without comments). The expansion on Page 2 assumes that i > j for f_{ij} (and similarly) but later on the authors use f_{ij} where i < j (at first the notation was confusing to me, especially subscripts of the type i 6 = j > k). At least a comment should be provided, or otherwise the notation should be changed (in fact the manuscript uses different notation in the SI, Page 9).
We have added a footnote before Equation 4 explaining the notation.
6. The following type of expression was unclear to me before I read the SI (on Page 6 and other places in the manuscript):
∑ f_{j} f_{ij} +∑ f_{jk} f_{ijk} +….
The symbols are now explained in a footnote (point 5 above). We have added a sentence to hint where v~ is coming from (line 118120 of RM).
7. The expression”slope for a double mutant”, and similarly, sounds strange (on Page 9 and other places in the manuscript). Here the fitness of a double mutant is compared to the sum of the fitnesses for the corresponding single mutants (something similar should be stated for clarity).
We agree and have modified this phrase for clarity (lines 183185 and 210211 of RM).
8. It would be great with an explicit example (a toy example) of a landscape where global epistasis varies substantially between loci, for some intuitive understanding.
We note that the CN model is a landscape where global epistasis varies across loci. We have added a sentence to highlight that the CN model serves this purpose (lines 322324 in RM) As a side note, this was our original motivation for constructing the CN model. Coincidentally, it turned out the CN model satisfied the minimal contingency condition as well.
B. Detailed comments in order of appearance. Most comments below are very minor and concerns notation or wording. The list is not intended to be complete (in fact most comments concern the first few pages and the corresponding part of the SI). Similar minor issues were noted throughout the manuscript, but all of it can be handled swiftly.
1. Page 1:” However, the mechanistic basis of this global epistasis remains unknown.”
The manuscript provides statistical rather than mechanistic explanations. (A mechanistic explanation would concern protein folding or something.)
We agree and have removed “mechanistic” here and elsewhere.
2. Page 5:”In other words, Equation 2 implies that widespread independent idiosyncratic epistatic interactions lead to the observed patterns of diminishingreturns and increasingcosts epistasis.”
Equation 2 does not imply anything, since such an expansion is possible for any genotype. It would make more sense to simply say:”We argue that wide spread independent idiosyncratic epistatic lead to diminishing return and increasing costs epistasis.” Alternatively, explain the connection to Equation 2 in a more precise way.
(The heuristic argument on the same page is very nice!)
We agree and have removed “Equation 2 implies that”.
3. Page 6, L 103: The definition of v~iis very important and should be clarified. Explain the symbols ∑ f_{j} f_{ij} +∑ f_{jk} f_{ijk} +… and explain the symbols in the denominator. (It would also be nice with a brief description of how the expression is composed from the regression argument in the SI.)
The changes are described in Point 6 of major points above.
4. Page 6, L108:”Note that these results hold for any fitness landscape”. This is not true, as the text also states a few lines later.
We agree that this statement can confuse readers and have removed it.
5. Page 6, L 120: Why does the sum scale as l?
The scaling with l follows from a central limit argument, as we are adding up l independent terms with arbitrary signs.
6. Page 8, L 162:”The theory additionally predicts.…” What exactly is the claim? Does the claim concern any double mutant such that the corresponding single mutations are beneficial?
We have modified this (see point 7 of major points above).
7. Page 8, L 171:”To test our analytical results” Sounds strange, simulations do not verify analytical results. What exactly is the purpose of the simulation (an illustration of the analytical results perhaps)?
We agree and have changed “test” to “illustrate”.
8. SI: It would be much easier to read the SI if it started with Part A followed by Part I. The reason is that Part I uses v_{i} that is defined in Part A.
We agree that the v_{i} is defined later in part A, but we feel that part A won’t be motivated well if it’s presented first. For this reason, we chose to keep the existing order and have highlighted further that v_{i} will be defined later.
9. SI: The notation makes the reading more difficult than necessary in a few places. For instance the symbols y_{i} and y_{i} are unrelated, whereas y_{i} and ξ_{i} are closely related. In addition, more informative notation could save time for a reader, such as y(x_{i} = 1), y(x_{i} = −1) rather than y~iand y^i.
We agree and have changed the notation for these quantities.
10. SI, L102: Since v~iis important, it seems reasonable with more explanations of the derivation. Properties of the Fourier expansions are used for cancelling out terms and other simplifications (which is also mentioned), but explicit arguments are not provided.
We have elaborated further on how this calculation is done (line 779 of RM).
11. Why is v~idefined as half of the negative slope? Later on 2v~ishows up in formulas. It would make more sense to define v~ias the slope, which would also make it easier to interpret some formulas in the main text.
v_{i} has an interpretation as the variance fraction and additionally, for the CN model we show that v_{i} is the fraction of pathways involving locus i. For these reasons, we think it is more convenient to refer to v_{i}, which is half the negative slope, rather than the slope. We have added a sentence in the text explaining our choice (lines 170172 in RM).
Reviewer #3 (Recommendations for the authors):
The results for the symmetric model are much simpler and more transparent – the slope of the regression is strictly negative, and has a natural interpretation in terms of variance fraction, whereas the corresponding expression (4) in the directed case looks intimidatingly complicated and obscure. I had a hard time understanding the difference between the two cases and the meaning of the statement that in the symmetric model "the fitness effects of both x_{i} = 1 \to +1 and x_{i} = +1 \to 1 are regressed against the background fitness". My interpretation of this is now the following:
To apply the symmetric model to experimental data, along with each mutation also the corresponding reversion would have to be included in the data set. Whereas the selection coefficient of the reversion is simply the negative of that of the mutation, it happens on a different background, and therefore the effect of including it in the regression is not trivial. In practice, including the reversions is of course not possible, and therefore the directed model is needed, although conceptually the symmetric model is more appealing.
If the authors agree with the above, I suggest that they add some explanation along these lines in the manuscript.
We agree with this interpretation and that the symmetric model is conceptually more appealing. We chose to include the directed mutation case as it is closer to how experimental data is often presented in practice (i.e., Figure 1c,d and Equation 1). We modified and added a few sentences (lines 162165, 174176 in RM) to make this clearer.
The following points concern the same issue:
1. In the application to experimental data, it is of course also not possible (or meaningful) to distinguish between the two directions x_{i} = 1 \to +1 and x_{i} = +1 \to 1. It is therefore a bit worrisome that (3,4) depend on the direction. I assume that in practice the data analysis is anyway restricted to the WE limit where the difference between the two direction does not matter. Is that correct? If so, it should be explicitly stated.
The sign convention indeed shouldn’t matter, irrespective of whether the landscape is in the WE limit. We modified and added a few sentences to make this clear. If the convention is reversed, we can check that the value of v~i remains the same. This is because the oddorder Fourier coefficients also change sign (lines 120124 of RM).
2. The description of how the model is applied to the data of Johnson et al. (Methods and Materials, lines 524 ff.) is not entirely clear. Why can the variance fraction be estimated from the Pearson correlation between y_{i} and y? And what does it mean (in line 534) that the slope of the regression is either 2v_{i} – 1 or 2v~i 1? If the directed model applies, the slope should always be 2v~i – 1. Related to this, in Figure 3a it seems 2v should be replaced by 2v~.
We thank the reviewer for spotting these errors (note that this is a misprint and the values of v_{i} shown in the Figures were computed correctly). The variance fraction is the Pearson correlation when data from both +1 > 1 and 1 > +1 is included. We have modified to address the three points above (lines 577579 and 582 of RM).
https://doi.org/10.7554/eLife.64740.sa2Article and author information
Author details
Funding
Simons Foundation (NSFSimons Center at Harvard #1764269)
 Gautam Reddy
Simons Foundation (376196)
 Michael M Desai
National Science Foundation (PHY1914916)
 Michael M Desai
National Institutes of Health (R01GM104239)
 Michael M Desai
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Sergey Kryazhimskiy, Andrew Murray, Milo Johnson, and members of the Desai lab for comments on the manuscript. GR was supported by the NSFSimons Center for Mathematical and Statistical Analysis of Biology at Harvard (award number #1764269) and the Harvard FAS Quantitative Biology Initiative. MMD acknowledges support from the Simons Foundation (grant 376196), NSF Grant PHY1914916, and NIH grant R01GM104239.
Senior and Reviewing Editor
 Naama Barkai, Weizmann Institute of Science, Israel
Reviewers
 Dmitri A Petrov, Stanford University, United States
 Kristina Crona, American University, United States
 Joachim Krug, University of Cologne, Germany
Publication history
 Received: November 9, 2020
 Accepted: March 26, 2021
 Accepted Manuscript published: March 29, 2021 (version 1)
 Version of Record published: April 20, 2021 (version 2)
Copyright
© 2021, Reddy and Desai
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,178
 Page views

 205
 Downloads

 3
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.