Global epistasis emerges from a generic model of a complex trait

  1. Gautam Reddy  Is a corresponding author
  2. Michael M Desai  Is a corresponding author
  1. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, United States
  2. Department of Organismic and Evolutionary Biology, Harvard University, United States
  3. Quantitative Biology Initiative, Harvard University, United States
  4. Department of Physics, Harvard University, United States


Epistasis between mutations can make adaptation contingent on evolutionary history. Yet despite widespread ‘microscopic’ epistasis between the mutations involved, microbial evolution experiments show consistent patterns of fitness increase between replicate lines. Recent work shows that this consistency is driven in part by global patterns of diminishing-returns and increasing-costs epistasis, which make mutations systematically less beneficial (or more deleterious) on fitter genetic backgrounds. However, the origin of this ‘global’ epistasis remains unknown. Here, we show that diminishing-returns and increasing-costs epistasis emerge generically as a consequence of pervasive microscopic epistasis. Our model predicts a specific quantitative relationship between the magnitude of global epistasis and the stochastic effects of microscopic epistasis, which we confirm by reanalyzing existing data. We further show that the distribution of fitness effects takes on a universal form when epistasis is widespread and introduce a novel fitness landscape model to show how phenotypic evolution can be repeatable despite sequence-level stochasticity.


Despite the idiosyncrasies of epistasis, a number of laboratory microbial evolution experiments show systematic patterns of convergent phenotypic evolution and declining adaptability. A striking example is provided by the Escherichia coli long-term evolution experiment (LTEE) (Figure 1a): 12 replicate populations that adapt in parallel show remarkably similar trajectories of fitness increase over time (Wiser et al., 2013; Lenski et al., 2015), despite stochasticity in the identity of fixed mutations and the underlying dynamics of molecular evolution (Tenaillon et al., 2016; Good et al., 2017). Similar consistent patterns of fitness evolution characterized by declining adaptability over time have also been observed in parallel yeast populations evolved from different genetic backgrounds and initial fitnesses (Kryazhimskiy et al., 2014; Figure 1b) and in other organisms (Elena and Lenski, 2003; Perfeito et al., 2014; Wünsche et al., 2017; Sanjuán et al., 2005; Couce and Tenaillon, 2015; Jerison et al., 2017; Schenk et al., 2013). Declining adaptability is thought to arise from diminishing-returns epistasis (Khan et al., 2011; Chou et al., 2011; Kryazhimskiy et al., 2014), where a global coupling induced by epistatic interactions systematically reduces the effect size of individual beneficial mutations on fitter backgrounds. Diminishing-returns manifests as a striking linear dependence of the fitness effect of a mutation on background fitness (Figure 1c). While diminishing-returns can be rationalized as the saturation of a trait close to a fitness peak, recent work shows a similar dependence on background fitness even for deleterious mutations, which become more costly on higher fitness backgrounds (Johnson et al., 2019). This suggests that fitter backgrounds are also less robust to deleterious effects (Figure 1d), a phenomenon that has been termed increasing-costs epistasis. The origin of the global coupling that results in these effects is unknown.

Declining adaptability and global epistasis in microbial evolution experiments.

(a) Convergent phenotypic evolution in the E. coli long-term evolution experiment: the fitness relative to the common ancestor of 11 independently adapting populations over 50,000 generations is shown (data from Wiser et al., 2013). The 12th population, Ara + 6, has limited data and is not shown. (b) Yeast strains with lower initial fitness adapt faster (data from Kryazhimskiy et al., 2014). The fitness gain after 250 (green) and 500 (orange) generations of 640 independently adapting populations with 64 different founders and 10 replicates of each founder. Mean and SE are computed over replicates. (c) Diminishing returns of specific beneficial mutations on fitter backgrounds for three knocked out genes (green, orange, and purple) (data from Kryazhimskiy et al., 2014). Control in pink. (d) Increasing costs of specific deleterious mutations on fitter backgrounds (data from Johnson et al., 2019). The fitness effect relative to the least fit background for the mean over 91 mutations (in red) and 5 of the 91 mutations is shown. Linear fits for the five specific mutations and the mean using dashed and solid lines respectively are shown.

Put together, these empirical observations suggest that the contributions to the fitness effect, si, of a mutation at a locus i in a given genetic background can be written as

(1) si=sadditive,i+sgenotype,i-ciy,

where sadditive,i is the additive effect of the mutation, sgenotype,i is its genotype-dependent epistatic contribution independent of the background fitness y (i.e., idiosyncratic epistasis), and ci quantifies the magnitude of global epistasis for locus i. Equation (1) reflects the observation that the strength of global epistasis depends on the specific mutation and applies independently of whether its additive effect is deleterious (increasing-costs) or beneficial (diminishing-returns). Over the course of adaptation in a fixed environment, global epistatic feedback on mutational effects can lead to a long-term decrease in adaptability. If this feedback dominates, Equation (1) suggests that the dependence of the fitness effect on evolutionary history is summarized entirely by the current fitness, and therefore results in predictable fitness evolution.

Here, we show that diminishing-returns and increasing-costs epistasis are a simple consequence of widespread epistasis (WE). This is consistent with recent work (Lyons et al., 2020) that proposes a similar argument to explain these phenomena. However, while the core idea is similar, we present here an alternative framework based on the Fourier analysis of fitness landscapes, which leads to new insights and quantitative predictions. In particular, our framework leads to novel predictions for the relationship between the magnitude of global epistasis and the stochastic effects of microscopic epistasis, which we confirm by reanalyzing existing data. Extending this framework, we further quantify how the distribution of fitness effects (DFEs) shifts as the organism adapts and how the fitness effect of a mutation depends on the sequence of mutations that have fixed over the course of adaptation (i.e., historical contingency). While specific historical relationships depend on the genetic architecture, we introduce a novel fitness landscape model with an intuitive architecture for which the entire history is summarized by the current fitness. Using this fitness landscape model, we investigate the long-term dynamics of adaptation and elucidate the architectural features that lead to predictable fitness evolution.


Diminishing-returns and increasing-costs epistasis

We begin by examining the most general way to express the relationship between genotype and fitness (i.e., to describe the fitness landscape). A map between a quantitative trait (such as fitness), y, and the underlying genotype can be expressed as a sum of combinations of biallelic loci x1,x2,,x that take on values xi=±1 (Hordijk and Stadler, 1998; Neher and Shraiman, 2011; Weinberger, 1991; Szendro et al., 2013; Weinreich et al., 2013):

(2) y=y¯+ifixi+i>jfijxixj+i>j>kfijkxixjxk+,

where y¯ is a constant that sets the overall scale of fitness. The symmetric convention xi=±1 for the two allelic variants is less often used than xi=0,1, but it is an equivalent formulation, which we employ here because it will prove more convenient for our purposes (see Poelwijk et al., 2016 for a discussion). The coefficients of terms linear in xi represent the additive contribution of each locus to the fitness (i.e., its fitness effect averaged across genotypes at all other loci), the higher-order terms quantify epistatic interactions of all orders, and y¯ is the average fitness across all possible genotypes. Importantly, Equation (2) makes apparent the idiosyncrasies induced by epistasis: a mutation at a locus with interacting partners has an effect composed of 2-1 contributions.

To explicitly compute the fitness effect of a mutation at locus i on a particular genetic background, we simply flip the sign of xi, keeping all other xj constant, and write down the difference in fitness that results. This fitness effect will generally involve a sum over a large number of terms involving the f’s in Equation (2). While this may suggest that an analysis of fitness effects via Equation (2) is intractable, the analysis in fact simplifies considerably if the locus has a significant number of independent interactions that contribute to the fitness (i.e., provided that the number of independent, nonzero epistatic terms associated to the locus is large). In this case, we show that the fitness effects of individual mutations decrease linearly with background fitness and the fluctuations around this linear trend are normally distributed. In other words, widespread independent idiosyncratic epistatic interactions lead to the observed patterns of diminishing-returns and increasing-costs epistasis.

We present a derivation of this result in the SI. Here, we explain the key intuition using a heuristic argument. The argument is based on a simple idea: for a well-adapted organism (y>y¯) with complex epistatic interactions, a mutation is more likely to disrupt rather than enhance fitness. To be quantitative, consider a highly simplified scenario where some number N of the f’s in Equation (2) are ±1 at random and the others are 0. In this case, the fitness of a given genotype is a sum of N+ and N- interactions that contribute positively and negatively to the trait, respectively, each with unit magnitude, so that y=y¯+N+-N-. When positive and negative interactions balance, the organism is in a ‘neutrally adapted’ state (yy¯). By selecting for positive interactions, adaptation generates a bias so that N+>N- and y>y¯. If locus i involved in a fraction vi of all of N=N++N- interactions is mutated, the effect of the mutation, on average, is to flip the sign of N+vi positive interactions and N-vi negative interactions. The new fitness is then yi=y-2N+vi+2N-vi=y¯+(1-2vi)(y-y¯) and thus si=yi-y=-2vi(y-y¯). The negative linear relation between the background fitness, y, and the fitness effect of the mutation, si, is immediately apparent and emerges as a systematic trend simply due to a sampling bias towards positive interactions. Of course, while this relation is true on average, it is possible that locus i affects more or less positive interactions due to sampling fluctuations. Provided only that N is large and the interactions are independent, these fluctuations are approximately Gaussian with magnitude Nvi(1-vi).

This basic argument holds beyond the simple model with unit interactions. In the more general case, if the mutation is directed from xi=-1+1, we show in the SI that its fitness effect, si, on a background of fitness y can be written as

(3) si=2fi(1v~i)additive2v~i(yy¯)global epistasis+ϵ~igenotype,


(4) v~i(jifij2+j>kifijk2+)-(jifjfij+j>kifjkfijk+)ji(fj-fij)2+j>ki(fjk-fijk)2+,

and ϵ~i is a genotype and locus-dependent term that is distributed across genotypes with mean zero and variance expressed in terms of the f’s from Equation (2) (see SI for details). In the following equation and similar ones henceforth, a summation such as j>kifijk2 is meant to denote a sum over pairs j,k, where each pair appears only once and no pair that includes index i appears. Symmetry of the f’s w.r.t. interchanged indices is also assumed (e.g., fijk=fjik). The numerator of v~i in Equation (4) is proportional to the covariance of fitness effects and background fitness, and the denominator is the variance of background fitness across genotypes. A similar equation for the case xi=+1-1 can be derived. The choice of +1-1 or -1+1 is simply a matter of convention. If the convention is reversed, the coefficients of odd-order in Equation (2), that is, fi,fijk,, should also switch signs. It can be easily checked that reversing the signs of these quantities in the expression for v~i above leads to the expression for v~i when xi=+1-1.

Note that in general v~i is not guaranteed to be positive and ϵ~i is arbitrary and determined by the genotype-fitness map. However, consistent patterns emerge when locus i has a large number of independent, nonzero epistatic terms and the additive effects f1,f2, of its interacting partners are not much larger than the epistatic terms (defined further below), which we call the WE limit. In the WE limit, ϵ~i is normally distributed across genotypes with variance proportional to v~i(1-v~i). This follows from the same reasoning as in our heuristic argument with unit interactions above (see SI for details). In addition, v~i is typically positive, giving rise to a negative linear trend (i.e., diminishing-returns and increasing-costs). We can see this by taking the third- and higher-order terms in Equation (4) to be zero, in which case v~i is positive if jifij2>jifjfij. This will typically be true in the WE limit because we expect jifij2 to scale with the number of interacting partners , while each term in jifjfij can be positive or negative and thus the sum scales as if the terms are independent. Thus when locus i has a large number of interacting partners, v~i is typically positive unless the magnitude of the additive terms (a) is much larger than the magnitude of the epistatic terms (e), ae. This argument is easily extended to the case when the third- and higher-order terms are nonzero (see SI); the upshot is that the bias towards v~i positive gets stronger with increasing epistasis.

The conditions for the WE limit are more likely to hold when the number of loci, , that affect the trait is large. Therefore, we expect to generically observe patterns of diminishing-returns and increasing-costs epistasis for a complex trait involving many loci. Importantly, whether we observe a negative linear trend does not depend on the magnitude of a locus’ epistatic interactions relative to its own additive effect, but rather relative to the additive effects of its interacting partners. If we are not in the WE limit, and instead the additive effects dominate (i.e., ael), then Equation (4) suggests that the slope of the linear trend can be either positive or negative. We will show further below that recent experimental data demonstrates that both scenarios can be relevant: some loci have ael while others have ael, with the former creating a bias towards the observed negative linear trends that characterize diminishing-returns and increasing-costs epistasis.

We note that Equation (3) immediately leads to testable quantitative predictions: in the WE limit, the distribution of the residuals, ϵ~i, obtained from regressing si and y is entirely determined by the slope of the regression, -2v~i. Specifically, we predict that these residuals (the deviations of individual genotype fitnesses from the overall diminishing-returns or increasing-costs trend) should be normally distributed with a variance proportional to v~i(1-v~i). However, this condition only applies if diminishing-returns arises from the WE limit. It does not hold if epistasis is negligible, if locus i interacts significantly with only a few other dominant loci, or if the epistatic terms are interrelated (e.g., when global epistasis arises from a nonlinearity applied to an unobserved additive trait; Starr and Thornton, 2016; Sailer and Harms, 2017; Otwinowski et al., 2018). The latter case may still lead to a negative linear trend, but the statistics of the residuals will differ from Equation (3) (see SI for a discussion).

It is convenient to subsequently work with the symmetric version of Equation (3), where the fitness effects of both xi=-1+1 and its reversion xi=+1-1 (whose fitness effect is negative of the former) are included in the regression against their respective background fitness. In this case, the additive term is averaged out, and we show (SI) that in the WE limit,

(5) si=-2vi(y-y¯)+2vi(1-vi)ηi,

where ηi depends on the genetic background and the locus, and is normally distributed with zero mean and variance V, and

(6) viViV=fi2+jifij2+kfk2+k>lfkl2+.

Here, V is the total genetic variance due to all loci (i.e., the variance in fitness across all possible genotypes) while Vi is the contribution to the total variance by the f’s involving locus i. We therefore refer to vi as the variance fraction (VF) of locus i. We show further below that for certain fitness landscapes vi can also be interpreted as the fraction of pathways affected by a locus. For these reasons, we focus on vi, which is half of the negative slope, rather than the slope. Note that the vi’s do not sum to one unless there is no epistasis (with epistasis, ivi>1, reflecting the fact that the variance contributed by different loci overlap). While the directed mutation case discussed previously is the relevant one when presenting experimental data (e.g., Figure 1c, d), it is conceptually simpler to work with the symmetric case. These two cases coincide and viv~i in the WE limit if the additive effect of a locus is small (i.e., fi2jifij2+j>kifijk2+).

Our results show that the VF vi plays an important role. It determines the slope of the negative relationship between the fitness effect and background fitness. At the same time, it determines the magnitude of the idiosyncratic fluctuations away from this trend. We also note that this slope can be used to experimentally probe the contribution of a locus to the trait (i.e., its VF) taking into account all orders of epistasis, which circumvents the estimation of the individual f’s in Equation (2). The theory additionally predicts that the slope obtained by regressing the sum of fitness effects of two mutations at loci i,j against background fitness is proportional to vij=vi+vj-2eij, where eij quantifies the magnitude of epistatic interactions of all orders between i and j (SI).

Importantly, while the fitness effects of individual mutations (and hence the DFEs) may change over the course of evolution due to epistasis, the distribution of variance fractions (DVF) across loci, P(v), is an invariant measure of the range of effect sizes available to the organism during adaptation. As we will see, this means that the DVF plays an important role in determining long-term adaptability.

Numerical results and experimental tests

To illustrate our analytical results, we first demonstrate that the effects described above are reproduced in numerical simulations. To do so, we numerically generated a genotype-phenotype map of the form in Equation (2), with =400 loci and an exponential DVF, P(v)=v¯-1e-v/v¯, where v¯=0.02 (Materials and methods). This DVF is shown in Figure 2a. Note that v¯1 corresponds to an epistatic landscape; v¯=8 chosen here thus corresponds to a model within the WE limit (note that v~ivi in this parameter range). Using this numerical landscape, we measured the fitness effect of mutations at 30 loci across 640 background genotypes with a range of fitnesses (Figure 2b). Our results recapitulate the predicted linear dependence on background fitness (Figure 1c, d), with a negative slope equal to twice the VF predicted from Equation (5). We further simulated the evolution of randomly generated genotypes similar to the experimental procedure used in Kryazhimskiy et al., 2014 (Figure 2c), finding that our results reproduce the patterns of declining adaptability observed in experiments (Figure 1b). Note that ∼10 mutations are fixed during this simulated evolution; declining adaptability here is not due to a finite-sites effect.

Global epistasis is recapitulated in a generic model of a complex trait and leads to testable predictions.

(a) The distribution of variance fractions over 400 loci for the simulated genotype-phenotype map. (b) The predicted linear relationship between fitness effect (relative to the fitness effect on the least fit background) and background fitness for the mean over 30 randomly chosen loci (red, solid line) and five loci (dashed lines in colors) is recapitulated. The slope of the linear fit for each locus is proportional to its variance fraction, v (slope = -2v). Mean and SE are over backgrounds of approximately equal fitness. See Materials and methods for more details. (c) The mean fitness gain after 25 (green) and 50 (orange) generations of simulated evolution of 768 independently adapting populations with 64 unique founders and 12 replicates each. Means and SEs are computed over the 12 replicates. Error bars are s.e.m. (d) The relationship predicted from theory between the residual variance from the linear fit for each locus and its slope is confirmed in simulations. (e) The mean fitness effect for single mutants at 30 loci and double mutants from all possible pairs of the 30 loci. The slope for the double mutants is predicted to be roughly twice that of single mutants. (f) The estimated variance fraction of a double mutant with mutations at two loci is predicted from theory and confirmed in simulations to be approximately the sum of the variance fractions for single mutations at the two loci. Sub-additivity is due to epistasis between the two loci. See Materials and methods for more details.

As described previously, Equation (5) implies a proportional relationship between the magnitude of global epistasis (quantified by the slope of the relationship between the fitness effect of a mutation and the background fitness) and the magnitude of microscopic epistasis (quantified by the residual variance around this linear trend); see also Figure 3a. We verify this relationship in simulations (Figure 2d). We predict that the slope obtained by regressing the sum of fitness effects of two mutations at loci i,j against background fitness is proportional to vij=vi+vj-2eij. We further assume that eij=O(v¯2) (specifically, eij=vivj for the genotype-phenotype map used for numerics). Since vi and vj are typically small for a complex trait, we expect near-additivity vijvi+vj and that any deviations are sub-additive, which is confirmed in simulations (Figure 2e, f).

Experimental observations from Johnson et al., 2019 are consistent with theoretical predictions.

(a) The fitness effect of one of the 91 mutations from Johnson et al., 2019 plotted against background fitness. (b) The distribution of the measured v~i (negative one-half of the slope from a) and variance fractions vi for the 91 insertion mutations. (c, d) v~i plotted against the additivity of interacting loci (AoIL) and the additivity of the mutated locus (see main text for definitions). The histograms are shown below the plots. The sign of the trend depends on the AoIL rather than the additivity of the mutated locus. (e) The measured variance of the residuals against the prediction v~i(1-v~i), shown here for the 91 mutations. The yellow circles correspond to the loci with AoIL lt0.5. The best-fit line (yellow dashed line) to these loci has R2=0.50 (R2=0.42 for all points). (f) The shape of the distribution of residuals pooled from all 91 mutations aligns well with the prediction from Equation (3). The variances of the two distributions are matched. Inset: same plot in log-linear scale. See Materials and methods for more details.

While testing the latter prediction on double mutants requires further experiments, we can immediately test the relationship between the slope and the distribution of residuals from existing experimental data. To do so, we reanalyzed the data from Johnson et al., 2019, which measured the fitness effect of 91 insertion mutants on about 145 backgrounds. These background strains were obtained by crossing two yeast strains that differed by 40,000 single-nucleotide polymorphisms (SNPs). Of these 40,000 loci, 40 have been identified as causal loci with currently available mapping resolution (Bloom et al., 2015). In Figure 3, we show the estimated v~i (negative one-half of the slope of the best-fit line) and the VF vi for each of the 91 mutations. These mutations were selected after screening for nonzero effect, and thus the DVF is biased upwards. The mean VF is v¯0.06. The wide range of vi observed in the data implies that the epistatic influence of loci varies greatly across loci, and we will show further below that this is crucial for maintaining a supply of beneficial mutations even when the organism is well-adapted to the environment.

Our theoretical results imply that we expect the linear relationship between background fitness and fitness effect to be negative if the additive effects of a locus’ interacting partners are not much larger than the epistatic terms. Specifically, we define the additivity of interacting loci (AoIL) for locus i as

(7) AoIL(i)|jifjfij+j>kifjkfijk+|(jifij2+j>kifijk2+)+|jifjfij+j>kifjkfijk+|,

which we show can be estimated from data (Materials and methods and SI). If the AoIL is less than half, Equation (4) implies that the linear trend is guaranteed to be negative. If instead the AoIL is greater than 0.5, the trend can be either positive or negative. The data shows a range of AoIL between 0 and 1 across loci. As predicted by our theory, we find that the loci with AoIL lt0.5 always show negative trends and the ones with AoIL gt0.5 show both negative and positive trends (Figure 3c). Importantly, the sign of the trend is determined by the AoIL and not by the additivity of the mutated locus, which we define as

(8) Additivity(i)fi2fi2+jifij2+j>kifijk2+.

The additivity across loci also has a wide range. However, small additivity does not necessarily imply a negative trend (Figure 3d).

We next used the data from Johnson et al., 2019 to analyze the relationship between the slope of the linear trend and the residual variance around this trend. We find that the experimental data confirms our theoretical prediction that the residual variance is proportional to v~i(1-v~i) if the AoIL is small (Figure 3e, R2=0.5 for loci with AoIL ¡ 0.5 and R2=0.42 for all loci). The Gaussian-distributed term in Equation (3) also predicts the shape of the distribution of the residuals given the VFs, which aligns well with the empirical distribution of the residuals (Figure 3f).

Together, these theoretical results and our reanalysis of experimental data show that linear patterns of global diminishing-returns and increasing-costs epistasis are a simple consequence of widespread epistatic interactions. The DVFs observed in data (Figure 3b) further imply that the epistatic influence of different loci on fitness can vary across a wide range. In what follows, we show that these two observations can be put together to make general predictions about the DFEes, and consequently the long-term dynamics of adaptation. The key ingredient that enables this analysis (including Equation (5)) is that in the WE limit fitness and fitness effects are jointly normal (with respect to a uniform distribution over all possible genotypes), which allows us to quantify complex dependencies between these variables in terms of pairwise covariances.

The distribution of fitness effects

Long-term adaptation is determined by the DFEs of possible mutations and the stochastic dynamical processes that lead to fixation. While Equation (5) represents the distribution of the fitness effects of a specific mutation at locus i over all genotypes in the population that have fitness y, we are instead interested in the DFE, where fitness effects are measured for all the mutations arising in the background of a particular genotype that has fitness y. For now we ignore the influence of evolutionary history on the DFE; we expand on that complication in the following section.

Examining the DFE over loci for a randomly chosen genotype of fitness y can be thought of as sampling the fitness effects s1,s2,,s from the conditional joint distribution P(s1,s2,,s|y), which generally depends on epistasis. If the number of independent, nonzero epistatic terms is large, then P(s1,s2,,s|y) is a multivariate normal distribution defined by the means and covariances of the +1 variables y,s1,s2,,s, which in turn can be computed in terms of the f’s from Equation (2). In particular, the conditional means and covariances are Meany(si)=2vi(yy¯), Covy(si,sj)=4V(eijvivj), where eij is the epistatic VF between loci i and j and eii=vi. This implies that the conditional correlation between fitness effects is (eij-vivj)/vivj(1-vi)(1-vj).

The DFE simplifies considerably if we make certain additional assumptions on the magnitude of epistatic interactions. If we assume the typical VF v¯ is small (i.e., v¯1) and also that eij is O(v¯2), then correlations are O(v¯) and thus negligible. Then, in a particular sample s1,s2,,s, we can think of each si as being drawn independently with mean -2vi(y-y¯) and variance 4viV. To compute the DFE, ρ(s|y), we first sample the VF from the DVF, P(v), and then sample a Gaussian random variable with the aforementioned mean and variance. This leads to the DFE

(9) ρ(s|y)=01𝑑v(2vV)-1P(v)φ(s+2v(y-y¯)2vV),

where φ is the standard normal pdf. Curiously, the correlations between si’s vanish when eij=vivj, in which case the above equation is exact and the DFE is determined entirely by the DVF. Further below, we introduce a specific fitness landscape model for which this relation does hold. Diminishing-returns is naturally incorporated in Equation (9): the mean of s is -2v¯(y-y¯), that is, the DFE shifts progressively towards deleterious values with increasing fitness.

Historical contingency in adaptive trajectories

A key unresolved question is the extent to which evolutionary history influences the DFE and the dynamics of adaptation (Agarwala and Fisher, 2019). That is, what does our theory say about historical contingency?

Suppose a clonal population of fitness y0 accumulates k successive mutations resulting in fitnesses y1,y2,,yk. By virtue of arising on the same ancestral background, the fitness gain of a new mutation, sk+1, is in general correlated with the full sequence of past fitnesses and the identity of the k mutations through its epistatic interactions with them. Based on these correlations, we use well-known properties of conditional normal distributions (Eaton, 1983) to write

(10) sk+1=i=0kwk+1,iyi+ϵ,

where the weights wk+1,i depend on the VF (vk+1) of the new mutation and its epistatic interactions with past mutations. Here, ϵ is the normally distributed residual that depends on the initial genotype and the weights (SI). Equation (10) is a generalization to a sequence of mutations of Equation (5), which we can think of as the special case where k=0.

To gain intuition, it is useful to first analyze Equation (10) when k=1 (i.e., to compute the effect of a second mutation conditional on the first). In this case, we show in the SI that

(11) s2-2v2(y1-y¯)+v1v2-e12v1s1+ϵ,

where s1=y1-y0 is the fitness effect due to mutation 1. The first term on the right-hand side is the dependence on the fitness of the immediate ancestor, similar to the corresponding term in Equation (5). The second term quantifies the influence of epistasis between loci 1 and 2 on s2. When e12=v1v2, dependence on s1 vanishes entirely and s2 depends only on y1. In contrast, if loci 1 and 2 do not interact, e12=0, and s2 is, on average, larger if the mutation at 1 is beneficial compared to when it is deleterious. This has an intuitive interpretation: diminishing-returns applies to the overall fitness and the mechanism through which it acts is epistasis. However, if mutations 1 and 2 do not interact, then the increase in fitness corresponding to mutation 1 does not actually reduce the effect of mutation 2 (as expected by diminishing-returns) so the expected effect of mutation 2 is larger. This analysis suggests that during adaptation, since selection favors mutations with stronger fitness effects on the current background, a mutation that interacts less with previous mutations is more likely to be selected.

To identify the conditions under which history plays a minimal role, we would like to examine when sk+1 depends only on the current fitness, yk, and is independent of both the past fitnesses and idiosyncratic epistasis. If this were true, then Equation (5) would apply for new mutations that arise through the course of a single evolutionary path (i.e., the fitness effect of a new mutation is ‘memoryless’ and depends only on its VF and the current fitness). Surprisingly, such a condition does exist. We show that this occurs when the magnitude of epistatic interactions between the new mutation and the k previous mutations, ek+1,1:k, satisfies a specific relation: ek+1,1:k=vk+1v1:k, where v1:k is the combined VF of the k previous mutations (SI). In general, this condition is not satisfied, implying that there will be historical contingency that can be analyzed using the framework above. Remarkably, it turns out that a fitness landscape model for which the condition is satisfied does exist and arises from certain intuitive assumptions on the organization of biological pathways and cellular processes. This fitness landscape model additionally serves as an example of a landscape where global epistasis can vary substantially across loci. We describe this model below.

The connectedness model

We introduce the ‘connectedness’ model (CN model, for short). In this model, each locus i is involved in a fraction μi of independent ‘pathways,’ where each pathway has epistatic interactions between all loci involved in that pathway (Figure 4a). The probability of an epistatic interaction between three loci (i,j,k) is then proportional to μiμjμk since this is the probability that these loci are involved in the same pathway. When the number of loci is large, we show that in this model vi=μi/(1+μi), and when is small, vi=μi/μ¯, where μ¯ is the average over all loci (SI). The CN model therefore has a specific interpretation: the outsized contribution to the fitness from certain loci (large vi) is due to their involvement in many different complex pathways (large μi) and not from an unusually large perturbative effect on a few pathways. The distribution, P(μ), across loci determines the DVF.

The distribution of fitness effect (DFE) and long-term adaptation dynamics predicted for the connectedness model.

(a) Schematic of the connectedness (CN) model, where each locus is associated with a fraction µ of pathways that contribute to the organism’s fitness. (b) An alternative model with modular organization, where sets of loci interact only within the pathways specific to a single module. (c) The DFE predicted from Equation (14) matches those obtained from simulated evolution of genotypes from the CN model. 128 randomly drawn genotypes (400 loci) with initial fitness y close to zero are evolved to y=2.5 and y=5, and the DFE is measured across loci and genotypes. We chose y¯=0 and V=1 so that y represents adaptedness. Insets: same plots in log-linear scale. Note that the number of beneficial mutations acquired during the simulated evolution (10-20) is much less than the total number of loci (400). (d) For a neutrally adapted organism, the theory predicts quick adaptation to a well-adapted state beyond which the adaptation dynamics are independent of the specific details of the genotype-fitness map. Shown here is the mean adaptation curve predicted under strong-selection-weak-mutation (SSWM) assumptions, which leads to a power-law growth of fitness with exponent 1/5 in the well-adapted regime (inset). (e) The number of fixed beneficial mutations under SSWM, which grows as a power-law with exponent 2/5 in the well-adapted regime (inset). The shaded region is the 95 confidence interval around the mean for (c) and (d). See Materials and methods and SI for more details.

Statistical fitness landscapes such as the NK model and the Rough Mt. Fuji model (Neidhart et al., 2013; Agarwala and Fisher, 2019; Kauffman and Weinberger, 1989; Stadler and Happel, 1999; Aita et al., 2000; Altenberg, 1997) are related to the CN model. Specifically, the CN model is a subclass of the broader class of generalized NK models (see Hwang et al., 2018 for a review). However, often-studied fitness landscape models have one important difference that distinguishes them and gives qualitatively different dynamics of adaptation (shown further below): in contrast to the CN model, classical fitness landscapes are typically ‘regular.’ That is, the VF of every locus is assumed to be the same (except the star neighborhood model, which has a bimodal DVF; Hwang et al., 2018).

The CN model is equivalent to a Gaussian fitness landscape with exponentially decaying correlations (SI). The CN model has tunable ruggedness, where the landscape transitions from additivity to maximal epistasis with increasing μ¯. Maximal epistasis corresponds to μi=1 (and hence vi=1/2) for all i. From Equation (5), this implies that the new fitness after a mutation occurs is independent of the previous fitness, consistent with the expectation from a House-of-Cards (HoC) model (Kauffman and Levin, 1987) (where genotypes have uncorrelated fitness). Regular fitness landscape models with exponentially decaying correlations have memoryless fitness effects under the restrictive assumption that every locus is equivalent (Agarwala and Fisher, 2019). We show that the dynamics of adaptation of the more general CN model are also memoryless, that is, the condition detailed in the previous section holds true (SI). Yet, as we show below, the predicted dynamics for the CN model are very different to those from a regular fitness landscape model.

We emphasize that the well-connectedness assumed for the CN model is not a requirement for Equation (5) to hold. However, how diminishing-returns influences the long-term dynamics of adaptation depends on the specific genetic architecture and the corresponding fitness landscape. Consider, for example, an alternative model of genetic networks organized in a modular structure (Figure 4b). In this model, each locus is part of a single module and interacts epistatically with other loci in that module to determine the fitness of that module; overall fitness is then determined as a function of the module fitnesses. In this case, the variance contributed by a locus is due to its additive contribution and from epistasis between loci restricted to its module. While the argument for diminishing-returns still applies to the fitness as a whole, it follows from the same argument that diminishing-returns should also apply to each module separately. Consequently, the dynamics of adaptation for the modular model are different from the CN model. For simplicity, we analyze the dynamics of adaptation for the CN model and postpone a discussion of how the dynamics differ for different models to subsequent work.

The dynamics of adaptation

We now examine the DFE that follows from Equation (5) and what that implies for long-term adaptation under the conditions for memoryless fitness effects. We henceforth assume a large number of loci with sparse epistasis (though the total number of nonzero epistatic terms is still large). This implies that 1, vi1 and v¯1; for simplicity, we also assume strong-selection-weak-mutation (SSWM) selection dynamics and s1,Ns1, where s are fitness effects and N is the population size. Under these conditions, a mutation sweeps and fixes in a population before another one arises. The probability of fixation of a beneficial mutation, pfix, is then proportional to its fitness effect (Haldane, 1927).

It is convenient to rescale fitnesses based on the total variance in fitness across all possible genotypes by defining z=V-1/2(y-y¯),σ=V-1/2s,ν=V-1/2η. Note that ν is normally distributed with zero mean and unit variance. Here, z has an intuitive interpretation as the ‘adaptedness’ of the organism. When the organism is neutrally adapted (|z|1), positive and negative epistatic contributions to the fitness are balanced and diminishing-returns is negligible. Diminishing-returns is relevant when the organism is well-adapted (z1). Below, we give the intuition behind our analysis, which is presented in full detail in the SI.

In the neutrally adapted regime, the linear negative feedback in Equation (5) is negligible and the DFE is determined by the distribution of v1/2ν. Loci with large v can lead to a DFE with a long tail. If v¯ is the typical VF of a locus, the fitness increases as znsv¯1/2, where ns is the number of substitutions. Since v¯ is a measure of overall epistasis, this implies that epistasis speeds adaptation in the neutrally adapted regime by allowing access to more influential beneficial mutations.

Fitness increases until the effect of the negative feedback cannot be neglected. From Equation (5), this happens when v¯zv¯1/2ν (i.e., when z2v¯-1). Intuitively, fitness begins to plateau when its accumulated benefit from substitutions is comparable to the scale of the total genetic variance (nsv¯1) and further improvements are due to rare positive fluctuations. In this well-adapted regime, diminishing-returns and increasing-costs epistasis strongly constrain the availability of beneficial mutations, whose effects can be quantified in this model: for a mutation to have a fitness effect σ, we require from Equation (5) that νσ/2v1/2+v1/2z, which has probability e-ν2/2. Beneficial effects of large σ arise when ν has a large positive deviation. The most likely v that leads to a particular σ is when ν is smallest (i.e., at v*σ/2z), in which case ν2σz, yielding a tail probability e-σz. Remarkably, the beneficial DFE in the well-adapted regime is quite generally an exponential distribution independent of the precise form of the DVF (unless it is singular). In particular, we show in the SI that for the DFE, ρ(σ|z),

(12) ρ(σ|z)ρ(-σ|z)=e-σz,

which depends solely on the adaptedness of the organism. The exponential form arises because of the Gaussianity of ν, but the argument can be easily extended to ν with non-Gaussian tails.

An exponential beneficial DFE has been previously proposed by Orr, 2003 but arises here due to a qualitatively different argument. Orr’s result instead follows from extreme value theory: suppose the fitness effects of loci (1) are sampled from a DFE ρ(σ) and F(σ)-σρ(σ)𝑑σ. Then, the probability that a beneficial mutation has at least a certain effect size σ is P(σbσ)=1-F(σ)1-F(0)lnF(σ)lnF(0), where the latter approximation holds when beneficial mutations are rare (i.e., 1-F(0) is small). A well-known result from extreme value theory (Gumbel, 2004; Majumdar et al., 2020) implies that for a large family of distributions ρ(σ) and for 1, we have -lnF(σ)e-kσ (for some constant k) and therefore P(σbσ)=e-kσ. This argument is consistent with our results, but does not yield the dependence of k on adaptedness and the rate of beneficial mutations without additional information about ρ(σ).

Under SSWM assumptions, from Equation (12), the typical effect size of a fixed mutation is σfixz-1, which typically has a VF,

(13) vfix*σfix/2z1/2z2.

The above relation makes precise the effects of increasing-costs epistasis on adaptation. As adaptation proceeds, the delicate balance of high fitness configurations constrains fixed beneficial mutations to have moderate VFs. A mutation of small VF is likely to confer small benefit and is lost to genetic drift, while one with a large VF is more likely to disrupt an established high fitness configuration.

This intuition is not captured in regular fitness landscape models, which assume statistically equivalent loci, that is, vi=v¯ for all i and P(v)=δ(v-v¯) is singular. From Equation (9), we see that this leads to a Gaussian DFE whose mean decreases linearly with increasing fitness, in contrast to the exponential DFE in our theory. The key difference is the lack of loci with intermediate effect, which drive adaptation in the well-adapted regime. As a consequence, the rate of beneficial mutations declines exponentially (Ube-v¯z2/2) and the fitness thus sharply plateaus at zv¯-1/2. In contrast, our theory predicts a much slower depletion of beneficial mutations, Ubz-2 (SI). The rate of adaptation is dz/dtUbpfixσfixz-4 (since pfixσfix), which leads to a slow but steady power-law gain in fitness, zt1/5. The rate of fixation of beneficial mutations is dns/dtUbpfixz-3t-3/5, which gives nst2/5.

We verify our analytical results using numerics. As before, we generated a genotype-phenotype map using the CN model with an exponential DVF, P(v)=v¯-1e-v/v¯ and =400 loci. The DFE can be calculated exactly by plugging in this P(v) in Equation (9):

(14) ρ(σ|z)=v¯-122v¯-1+z2e-σz/2-|σ|2v¯-1+z2/2.

We simulated the evolution of randomly generated genotypes from z=0 to z=2.5 and z=5, and the DFE across all loci was measured (we chose y¯=0,V=1 so that y=z,s=σ). The theoretical prediction for the DFE, Equation (14), closely aligns with the numerical results (Figure 4c).

Due to computational constraints, it is difficult to simulate evolution deep into the well-adapted regime. To compute the shape of adaptive trajectories and their variability, we instead simulated SSWM dynamics using the DFE directly from Equation (14), beginning from a neutrally adapted fitness (z=0). Typical trajectories (Figure 4d) show rapid adaptation to the well-adapted regime beyond which the fitness grows slowly as t1/5, as predicted from theory. The predictions for the number of fixed beneficial mutation are also recapitulated (Figure 4e).


Recent empirical studies have observed consistent patterns of diminishing-returns and increasing-costs epistasis. Our model gives a simple explanation for these observations. In particular, we showed that these patterns are generic consequences of widespread microscopic epistatic interactions. The intuition underlying this result is that a random mutation typically has a larger disruptive effect on the delicate balance of microscopic epistasis that underpins a fitter background. Our model predicts a quantitative relationship between the magnitudes of global epistasis (i.e., the negative slope of diminishing-returns and increasing-costs epistasis) and microscopic epistasis, which we confirmed using existing data (Figure 3).

A similar explanation for diminishing-returns and increasing-costs epistasis has been recently proposed by Lyons et al., 2020. While our core argument for diminishing-returns and increasing-costs epistasis is the same as in that work, our Fourier analysis framework dissects the features of the fitness landscape necessary to observe these phenomena in terms of experimentally measurable average effects (i.e., the f’s in Equation (2)). In particular, we show that the additivity of a locus’ interacting partners critically determines whether the trend is negative or unbiased. In addition, the Fourier analysis framework yields predictions for the DFEs, the historical influence of past mutations on the fitness effect of a newly mutated site, and motivates the proposed ‘connectedness’ fitness landscape model. The analysis of experimental data presented in Lyons et al. complements the experimental data considered here, lending further empirical support for the prevalence of epistasis and its importance in determining long-term adaptability.

Our model leads to other experimentally testable predictions. The most direct and accessible test of the theory is to measure the fitness for all possible combinations of mutations at 10–15 significant loci and compare (using Equation (6)) the magnitude of global epistasis to the measured fitness coefficients (the f’s). Additionally, we predict that the magnitude of global epistasis of a double mutant should be nearly the sum of magnitudes of the corresponding single mutants, and any deviations should be biased towards sub-additivity. Since the predictions involve measuring residual variance, experimental noise can be an important confounding factor.

The observation that diminishing-returns occurs as a ‘regression to the mean’ effect on certain fitness landscapes has been noted previously (Draghi and Plotkin, 2013; Greene and Crona, 2014). The theory developed here quantifies precisely when we should expect to observe these patterns. We emphasize that our key result, Equation (5), is a general statistical relation that holds if epistasis is widespread, irrespective of the specific genetic architecture and the corresponding fitness landscape. Weak epistasis with many loci is sufficient to observe noticeable patterns of global epistasis. However, the argument fails if the contribution of a locus is purely additive or when epistasis is limited to one or a handful of other loci. In the latter case, we expect the fitness effect of a mutation to be dominated by the allelic states of its partner loci, and thus take on a few discrete values. A few examples from Johnson et al., 2019 indeed exhibit this pattern (e.g., cases where the fitness effect of a specific mutation depends primarily on the allelic state at a single other locus).

We highlight a distinction between global epistasis discussed in this work and another form of global epistasis (also known as ‘nonspecific’ epistasis) typically used in protein evolution to describe nonspecific epistatic interactions due to a nearly additive trait transformed by a nonlinear function (Starr and Thornton, 2016; Otwinowski et al., 2018; Otwinowski, 2018; Sailer and Harms, 2017; Husain and Murugan, 2020). This nonlinear function creates systematic relationships between epistasis terms and breaks the condition of independent epistatic terms required for our arguments to apply. Specific nonlinearities such as an exponential function may indeed lead to a negative linear trend on average, but the structure of the residuals differs from the one in Equation (5) and observed in data.

A surprising empirical observation is that the negative linear relationship between fitness effect and ancestral fitness characteristic of global epistasis has different slopes for different loci. Our model identifies the negative slope as twice the fraction of variance contributed by a locus to the trait. To explain the wide range of VFs observed in data, we developed the CN model, a framework to think about the organization of cellular processes that can lead to loci of widely varying VFs. In the CN model, loci have a large VF due to their involvement in many different pathways rather than due to a large effect on a single pathway. The CN model can be viewed as a statistical fitness landscape where loci can have a range of VFs, specified by the DVFs. In the special case of every locus having the same VF, the CN model corresponds to a fitness landscape with tunable ruggedness and exponentially decaying correlations.

Extending our framework to incorporate adaptation, we showed that the DFE depends only on the current fitness, rather than the entire evolutionary history, under the intuitive assumptions behind the CN model. The theory therefore gives a simple explanation for why phenotypic evolution can be predictable, even while the specific mutations that underlie this evolution are highly stochastic.

Our framework has an implicit notion of ‘adaptedness’ without referencing a Gaussian-shaped phenotypic optimum, often assumed in models of adaptation (e.g., Fisher’s geometric model) (Fisher, 1930; Orr, 2005; Martin and Lenormand, 2006). Over the course of adaptation, the DFE shifts towards deleterious values, reflecting diminishing-returns, which naturally arises from our basic arguments. For a well-adapted organism, we show that the DFE for beneficial mutations takes on an exponential form and leads to universal adaptive dynamics. While an exponential DFE for beneficial mutations has been proposed previously based on extreme value theory (Orr, 2003), our result arises due to an entirely different argument: the tail of the beneficial DFE is determined by loci of intermediate size whose disruptive effect due to increasing-costs is small, yet whose effect size is large enough not to be lost due to genetic drift.

Our theory further predicts declining adaptability, with rapid adaptation in a neutrally adapted regime followed by much slower increases in fitness, resulting in power-law adaptive trajectories when the organism is well-adapted. This is consistent with observations from the E. coli LTEE (Wiser et al., 2013; Lenski et al., 2015). Our model predicts a quicker decline in the number of substitutions (nst2/5) compared to the near linear trend observed in the LTEE data (Good et al., 2017). However, the dynamics of fixation in the LTEE deviate strongly from SSWM assumptions. This may explain the discrepancy, although we note that existing theory has only analyzed the effects of clonal interference and other breakdowns in SSWM assumptions for a constant DFE and weak epistasis (Good et al., 2012; Schiffels et al., 2011). Further work will be required to understand how these effects interact with global epistasis. For example, we may expect that the effect of a highly beneficial mutation at a segregating locus is more likely to be attenuated due to interference from subsequent deleterious mutations, while a less-fit lineage has a larger pool of beneficial mutations and is thus more likely to ‘leapfrog’ over more-fit lineages.

Materials and methods

The code and data to generate the figures are available at Reddy, 2020.


Request a detailed protocol

We use a fitness landscape model with loci to generate the genotype-fitness map. Each locus is assigned a sparsity µ from P(μ), which is an exponential distribution with mean μ¯. Each of M independent pathways sample loci with each locus i having probability μi of being selected to a pathway. We choose =400,μ¯=0.02,M=500 so that μ¯=8 ensures significant epistasis. All loci in a pathway interact with each other, where additive and higher-order coefficient terms of all orders were drawn independently from a standard normal distribution. The total fitness is the sum of contributions from the M pathways. We normalize the coefficients so that the sum of squares of all coefficients is 1, that is, the total variance across genotypes is 1. The mean, y¯, is close to zero from our sampling procedure. The above procedure is a simple and efficient way to generate epistatic terms to order 20, beyond which the computational requirements are limited by the exponentially increasing demand. Note that the effects described in the paper were also observed with only pairwise and cubic epistatic terms.

The VFs shown in Figure 2a can be calculated numerically from the definition. From the theory, given our choice of P(μ), these should follow an exponential distribution with mean v¯μ¯/(1+μ¯). There may be deviations since M is finite whereas the calculations assume M. To generate Figure 2b, in order to get a range of background fitnesses, we first sample 128 random genotypes. These have fitnesses close to zero; in order to obtain a range of fitness values, we simulated the evolution of these 128 genotypes up to y=1,2,3,4,5 under SSWM assumptions to get 128×5=640 genotypes at roughly five fitness values. The fitness effect of applying a mutation (i.e., flipping its sign) is measured for 30 randomly chosen loci (which are kept fixed) over each of the 640 genotypes. This is shown for 5 of the 30 and for the mean over the 30 loci in Figure 2b.

To generate Figure 2c, we sampled 64 random genotypes and 12 replicates of each. The evolution of these 768 genotypes was simulated for a total of 50 generations with a mutation rate of 1 per generation. The mean fitness gain over the 12 replicates is plotted for each of the 64 founders against their initial fitness.

To generate Figure 2d, the residuals are measured using the same procedure as for the experimental data analysis described below for the initial 128 genotypes at y0 and the 30 loci with the largest VF.

Double mutants were created by mutating all pairs of the 30 randomly chosen loci on the 640 evolved genotypes. Their mean fitness effect was computed and plotted along with the mean fitness effect for single mutants, shown in Figure 2e. The VF of the pair of loci for the double mutant was estimated as before and compared to the sum of the estimated VFs of the corresponding single mutants. This is shown in Figure 2f.

To generate the plots in Figure 4c, we simulated the evolution of 128 randomly sampled genotypes to y=2.5 and y=5. The fitness effect of 200 randomly sampled loci was measured and the distribution is plotted.

Analysis of the data from Johnson et al.

Request a detailed protocol

The data from Johnson et al., 2019 consists of the fitness after the addition of 91 insertion mutations on each of 145 background genotypes. The fitness of a particular mutation at locus i can be modeled as

(15) yi=-ciy+bi+Residuali(g),

where yi,y are the mutant and background fitnesses, respectively, ci,bi are constants for each locus, and the residual Residuali(g) depends on the background genotype g.

We estimate the VF vi=(1-ρ^i)/2, where the Pearson correlation ρ^i=Corr(yiy,yyi), where the symbol ⊕ denotes that the mutant and background fitness datasets are concatenated. v~i is estimated as the negative one-half of the slope of the best linear fit of si=yi-y and y. The residuals for each of the 145 genotypes for each of the 91 mutations are simply

(16) Residuali(g)=(yi+ciy)-(yi+ciy)¯,

where the overline represents an average over the 145 genotypes, which is used as an estimate of the constant term and ci=2v~i-1. In Figure 3b, we plot the distribution of estimated vi and v~i. In Figure 3c, we compute the AoIL for each locus using Equation (7), which we show in the SI to be |Cov(si,yi+y)|/(|Cov(si,yi+y)|+Var(si)). In Figure 3d, we compute the additivity using Equation (8). The additive effect is fi=(yi-y)¯/2, and Var(si)/4 gives the sum of squares of the epistatic terms (SI). In Figure 3e, we compute the variance of the residuals across the 145 genotypes for each locus and plot it against the locus’ estimated v~i(1-v~i). In Figure 3f, we plot the distribution of residuals over all genotypes and loci. The prediction is that in the WE limit the distribution of residuals is determined by 2v~i(1-v~i)η, where η is a Gaussian random variable. We multiply v~i(1-v~i) for each locus with 10,000 i.i.d. standard normal random variables, pool the resulting numbers for all loci, and plot the predicted distribution in Figure 3f. The distributions are variance-matched. While Figure 3e shows that the variance of the residuals aligns with the theoretical prediction of being proportional to slope, Figure 3f shows that the data is also consistent with the predicted Gaussianity of the background-genotype-dependent contribution.

Appendix 1

Diminishing-returns and increasing-costs epistasis in a model of a complex trait

In the main text, we propose that the fitness effect after a mutation at locus i can be written as

(17) si=-2vi(y-y¯)+2vi(1-vi)ηi,0vi1.

Here, ηi is a genotype and locus-dependent contribution, which is distributed as a mean-zero Gaussian with variance equal to the total genetic variance, y¯ is a constant, and vi is the variance fraction, which is defined further below. Equation (17) corresponds to the symmetric case where the fitness effects of the mutations xi=-1+1 and xi=+1-1 are simultaneously regressed against their respective background fitness. The directed case when the mutation is specified to change from xi=-1+1 (or xi=+1-1) will be considered in the directed mutation section below.

We show that Equation (17) arises under certain conditions in a generic model of a complex trait with 1 biallelic loci. Essentially, we would like to compute the distribution of the new fitness, yi, across genotypes after a mutation at locus i given the current fitness y and the set of all parameters Θ of the model (i.e., P(yi|y,Θ), with si=yi-y). Using the chain rule of probability, we can write

(18) P(yi|y,Θ)=gP(yi|g,Θ)P(g|y,Θ),

where the sum is over all possible genotypes. While P(yi|g,Θ) is determined by the genotype-fitness map, P(g|y,Θ) is the crucial factor that gives weight only to the genotypes that yield the current fitness y. If the fitness is much larger than the mean fitness over all possible genotypes, Equation (18) implicitly ensures that weight is given to only those genotypic configurations that lead to such an unusually large fitness. We will analyze the case of a ‘microcanonical ensemble’ where every genotypic configuration that leads to a particular fitness is equally likely (with no linkage), that is, the prior P(g) across genotypes is uniform (see the section 'Maximum entropy interpretation' for a discussion).

It is difficult to directly evaluate the sum in Equation (18). In the following sections, we give a simple derivation but elaborated to highlight the key assumption that leads to Equation (17). In short, the negative correlation between si=yi-y and y implied by Equation (17) is a trivial consequence of yi and y having the same distribution w.r.t. P(g). ηi is in general arbitrary and determined by the genotype-fitness map. However, ηi is normally distributed if we make certain assumptions about the structure of epistasis in the genotype-fitness map. The emergence of the negative linear relationship for a directed mutation xi=-1+1 is subtler.

Fourier representation of the fitness function

The fitness y(g) for a genotype of length , g={x1,x2,,x}, where xi=±1, can be generally written as Poelwijk et al., 2016; Neher and Shraiman, 2011

(19) y(g)=y¯+ifixi+i>jfijxixj+i>j>kfijkxixjxk+

The symmetric choice of xi=±1 is chosen for mathematical convenience. In this form, the total variance contribution from the qth-order epistatic terms is i1>i2>iqfi1i2iq2, and the total genetic variance, defined as

(20) V12g(y(g)-y¯)2,

is the sum of variance contributions from orders q=1 to , that is, the sum of squares of all the f’s. The sum over the 2 genotypes is an expectation value assuming all genotypes are equally likely; we will denote this expectation using an overline hereafter. We use this expectation value as a proxy for empirical averages over the ‘background’ genotypes in a population. With a sufficient number of background genotypes, the empirical average should converge to this expectation value.

The representation in Equation (19) is a Fourier representation of the fitness function on the -dimensional hypercube and makes calculations much simpler. For instance, to get the terms from each order, we have

(21) y¯=y(g)¯,fi=xiy(g)¯,fij=xixjy(g)¯,

It is useful to define the variance contribution due to a particular locus i as (symmetry w.r.t. interchanging indices is used for each term throughout)

(22) Vifi2+jifij2+k<jifijk2+.

We also define the variance fraction of locus i,

(23) viViV,

which plays a key role in the model.

Derivation of Equation (17)

We would like to relate the fitness before and after locus i is flipped (i.e., xi-xi), denoted by y and yi, respectively. We have from Equation (19),

(24) yy¯=yi¯+ξi,
(25) yi-y¯=yi¯-ξi,

where yi¯ and ξi respectively contain all the terms not containing and containing locus i, that is,

(26) ξi=fixi+jifijxixj+k<jifijkxixjxk+.

We have yi¯=(y+yi)/2-y¯ and ξi=(y-yi)/2. Since yi¯ and ξi contain all the genotype dependence, we can write


From the properties of the Fourier representation in Equation (21), it is easy to see that the means are yi¯¯=0,ξi¯=0, the variances are yi¯2¯=VVi, ξi2¯=Vi, and the covariance is yi¯ξi¯=0.

The calculations so far have been exact. We now make the key assumption that yi¯ and ξi are both normal-distributed across genotypes. This assumption is similar to that of Fisher’s infinitesimal model, where the distribution of trait values across strains for a complex trait is argued to be normal-distributed since the trait value is due to infinitesimal independent contributions from many loci. While yi¯ is easily seen to be normal-distributed for large , an argument can be made for ξi only if locus i has a large number of independent, nonzero epistatic terms and the additive term fi is smaller in magnitude than the epistatic terms; specifically, we require that fi2jifij2+k<jifijk2+. If instead fi2jifij2+k<jifijk2+, then ξi is bimodal, where the two modes correspond to ξi±fi at xi=±1. For loci with pairwise and third-order epistasis, the number of pairwise and third-order epistatic terms scale and 2, respectively, which justifies the normality assumption for large even if individual epistatic terms are smaller in magnitude than the additive terms.

Under these assumptions and since yi¯ and ξi are linearly independent, we have

(30) P(yi|y,Θ)φ(y+yi-2y¯2V-Vi)φ(y-yi2Vi),

where φ is the standard normal pdf. Therefore, yi is normal-distributed across genotypes and from the above equation can be written as

(31) yi-y¯=(1-2vi)(y-y¯)+2vi(1-vi)ηi,

where the VF vi=Vi/V was defined previously and ηi¯=0,ηi2¯=V. This leads to the form in Equation (17),

(32) si=-2vi(y-y¯)+2vi(1-vi)ηi.

The above derivation was presented to clarify the basic assumptions. Simply computing the covariance between y and yi in Equations (24) and (25), we get (yy¯)(yiy¯)¯=yi¯2¯ξi2¯=V2Vi. The correlation is then 1-2vi. Equation (32) follows if additionally yi,y are jointly Gaussian, which is true if locus i has many independent, nonzero epistatic terms.

Directed mutation

Previously, we considered the symmetric flip xi-xi and averaged over all loci including i. Here, we consider the case when the mutation is specified to change either from xi=-1+1 or xi=+1-1. In this case, we should average over all loci except i.

We consider xi=-1+1 (the opposite case is similar). The fitness before the mutation is

(33) yi-=y¯-fi+ji(fj-fij)xj+j>ki(fjk-fijk)xjxk+,

and the fitness after the mutation is

(34) yi+=y¯+fi+ji(fj+fij)xj+j>ki(fjk+fijk)xjxk+,

so that

(35) si=yi+-yi-=2fi+2jifijxj+2j>kifijkxjxk+.

The tilde and hat are used here to distinguish the fitness from the yi defined previously for the symmetric case corresponding to the flip xi-xi. We can easily compute averages over genotypes using the Fourier orthogonality relations. To compute the expectation value of products of two quantities, say si and yi-, we simply compute a dot product where the components of the vectors are the coefficients of terms in the expansion above. For example, siyi-¯ is (y¯-fi)(2fi)+ji(fj-fij)(2fij)+j>ki(fjk-fijk)(2fijk)+. We have for the means, variances, and covariances, 

(36) s¯i=2fi,
(37) yi-¯=y¯-fi,
(38) (sis¯i)2¯=4jifij2+4j>kifijk2+,
(39) (yiyi¯)2¯=ji(fjfij)2+j>ki(fjkfijk)2+,
(40) (si-s¯i)(yi--yi-¯)¯=2ji(fj-fij)fij+2j>ki(fjk-fijk)fijk+
(41) =2(jifjfij+j>kifjkfijk+)(sis¯i)2¯2

The slope when si is regressed against yi- is (sis¯i)(yiyi¯)¯/(yiyi¯)2¯. We can define a ‘modified’ VF v~i as half the negative slope, 

(42) v~i(sis¯i)2¯/4(jifjfij+j>kifjkfijk+)ji(fjfij)2+j>ki(fjkfijk)2+,
(43) =(jifij2+j>kifijk2+)(jifjfij+j>kifjkfijk+)ji(fjfij)2+j>ki(fjkfijk)2+.

Writing the linear form based on this correlation, we get

(44) si=2fi-2v~i(yi--y¯+fi)+Kiηi,

where ηi is again normally distributed (in the WE limit) with zero mean and variance (yiyi¯)2¯ and Ki2=(sis¯i)2¯(yiyi¯)2¯4v~i2. Note that, unlike vi, v~i can be negative.

However, we argue that v~i is typically positive in the WE limit, which leads to a negative linear trend. The second term in the numerator on the right-hand side of Equation (42) has the same number of terms as (sis¯i)2¯, but these terms appear as products of Fourier coefficients that may have opposing signs. In particular, if jifij2>jifjfij, j>kifijk2>jifjkfijk, and so on, then v~i is guaranteed to be positive. If we denote the typical magnitude of qth-order epistasis terms as eq (e1 corresponds to additive effects), each of this relationships has the form eq+12>eqeq+1 when 1, that is, eq<eq+1. If the number of loci is sufficiently large, then these relationships will hold even if the typical magnitude of individual higher-order epistasis terms is smaller than the lower-order terms. We therefore expect that the second term in the numerator on the right-hand side of Equation (42) is smaller than (sis¯i)2¯2 when 1. A similar argument can be made for the cross terms fjfij,fjkfijk, once the squares in the denominator of the right-hand side of Equation (42) are expanded.

When 1, we can then write

(45) v~iVi-fi2V-fi2=vi-fi2/V1-fi2/Vvi-fi2/V.

Further, in the WE limit, Ki24v~i-4v~i2 so that the variance of Kiηi is v~i(1-v~i).

To estimate the ratio of the magnitudes of the second and first terms in the numerator on the right-hand side of Equation (42) from data, we use the expression for the covariance,

(46) (si-s¯i)(yi+-yi+¯)¯=2ji(fj+fij)fij+2j>ki(fjk+fijk)fijk+,

to get

(47) |jifjfij+j>kifjkfijk+|jifij2+j>kifijk2+=|Cov(si,yi-)+Cov(si,yi+)|Var(si)=|Cov(si,yi-+yi+)|Var(si)

Comments on the result

Fitness as a nonlinear function of an additive trait

The negative linear trend observed in data may arise due to the measured fitness being a nonlinear function of an unobserved additive trait. In this case, the epistasis terms are systematically related to each other and the independence assumptions used to derive Equation (32) break down. In short, we show that specific nonlinearities can indeed lead to a negative linear trend, but the statistics of the residuals observed in data make this possibility unlikely.

Suppose the fitness is y=ϕ(u), where ϕ is a nonlinear function, u=f0+ifixi is the unobserved additive trait, f0 is a constant, and fi are additive coefficients. For a linear trend, we require si=ϕ(ui)-ϕ(u)ϕ(u), where ui=u-2fixi for the flip xi-xi. For small fi relative to the other coefficients, we can Taylor expand ϕ(ui) and show that we require ϕ(u)eu to get a linear trend. This nonlinearity creates a linear trend with slope e-2fixi-1. For a negative linear trend, we require 2fixi>0. However, even if this condition is true, the relation si=(e-2fixi-1)y is exact and there are no residuals.

To introduce residuals, suppose instead that u=f0+ifixi+h, where h is a HoC term, that is, it is an independent Gaussian random variable (mean 0, variance σh2) across genotypes and repeatable across measurements. h introduces epistasis in the unobserved trait. We have si=(e-2fixi+hi-h-1)y, where hi is the HoC term after the mutation, so that the average fitness effect conditional on y is s¯i=(e-2fixi+σh2-1)y. The conditional variance of the residuals is (sis¯i)2¯=e4fixi+2σh2(e2σh21)y2. Note that the residual variance is no longer proportional to the slope and this variance increases as y2, which are both inconsistent with the data.

Maximum entropy interpretation

The expectation values are averages over all the genotypes assuming that every genotype of a particular fitness is equally likely, that is, the distribution over genotypes is uniform. This assumption is analogous to ensemble averages over a microcanonical ensemble in statistical physics, where one assumes that all the particle configurations that have a particular energy are equally likely. The experimental setting in Johnson et al., 2019 is similar. The background genotypes are generated from a cross between two strains, which due to recombination makes each locus have equal probability of being one of the two alleles. Closely linked loci may be considered together as blocks. Some of the loci are partially linked, which may lead to deviations from the predictions. The expressions derived above can be easily extended to the case with different background genotype statistics.

The uniform distribution has an information-theoretic interpretation as the distribution that has the maximum entropy (MaxEnt) given no additional knowledge of how the genotype was generated. Equation (32) can therefore be viewed as the MaxEnt prediction of the fitness effect if locus i is mutated conditioned on the current observed fitness y. A key idea that will be used throughout the paper is that when each locus i has a significant number of independent, nonzero epistatic terms, the distribution of fitness and fitness effects is jointly normal with respect to the uniform prior over genotypes. From well-known properties of multivariate normal distributions, the MaxEnt predictions of unobserved variables are multilinear forms of the observed variables. For example, the MaxEnt prediction for si given an observed sequence of past fitness is an autoregressive Gaussian process defined by the covariance between the unobserved and observed variables (see section 'History dependence' below).

Varying a subset of loci

In addition, the sums in Equation (19) are over loci involved in a trait. In reality, we may vary a subset P of all loci and take averages over only this subset. The derivation still follows through in this case; we can simply write the fitness in terms of effective parameters as

(48) y(g)=y~+iPf~ixi+i,jPf~ijxixj+,where
(49) y~=y¯+iPfixi+i,jPfijxixj+,
(50) f~i=fi+jPfijxj+j,kPfijkxjxk+,and so on.

Here, we are abusing notation — it is to be assumed that a coefficient with a particular combination of indices appears only once in the sums. The results from the previous section still apply w.r.t. these new effective parameters. For instance, the total variance and the variance due to locus iP are

(51) V~=iPf~i2+i,jPf~ij2+i,j,kPf~ijk2+,
(52) V~i=f~i2+jPf~ij2+j,kPf~ijk2+,

and the effective VF is V~i/V~.

Relationship to the fluctuation-dissipation relation

Equation (32) is analogous to the fluctuation-dissipation theorem in statistical physics (Kubo, 1966), which relates the response of a thermodynamic system to a perturbation. The relationship between the magnitude of macroscopic epistasis (the slope in Equation (32)) and the variance due to the background genotypes is analogous to the relationship between viscous drag and Brownian motion (Kubo, 1966). For Brownian motion, the normality arises due to numerous independent collisions of a particle with neighboring particles. In our case, natural selection acts as an external perturbation that pushes the system away from equilibrium (here y¯). Diminishing-returns naturally arises as the tendency of the system to revert to its entropically favored equilibrium state.

Variance fraction of double mutants

Using the arguments from the previous section, it is easy to show that the variance of a double mutant at loci i and j, Vij, is necessarily sub-additive. In particular, we have

(53) Vij=Vi+Vj-2Iij,where
(54) Iij=fij2+ki,jfijk2+

is the total epistatic variance between loci i and j. The correlation between the new fitness after a double mutation and the previous fitness is 1-2Vij/V1-2vij=1-2vi-2vj+2eij, where eijIij/V is the epistatic variance fraction between i and j.

Appendix 1—figure 1
The mean fitness trajectory and the mean number of substitutions predicted by the model in the strong-selection-weak-mutation regime.

(a) The scaled fitness vs. scaled time. Shown below is the fitness in log-scale to highlight the different scalings in the neutrally adapted and well-adapted regimes. The slopes of the dashed lines are shown. (b) The scaled number of substitutions vs. scaled time as in (a).

Appendix 2

Connectedness model

We introduce a ‘connectedness’ model (the CN model, for short), where each locus has a probability μi of being involved in any particular interaction. We can interpret μi as the fraction of independent pathways that involve locus i, where each pathway has epistatic interactions between all loci involved in that pathway. The number of independent pathways, M, is assumed to be very large. The probability of an epistatic interaction between, say loci i,j,k, is proportional to μiμjμk since this is the probability that these loci are involved in the same pathway. The magnitude of the interaction term is fijk2μiμjμk, where the proportionality is the magnitude of the perturbative effect of the mutations, which is assumed to be constant for all orders of interaction. We set this quantity to unity since it simply scales the fitness coefficients and does not affect subsequent results. The CN model leads to a specific interpretation (and hence its name): the outsized contribution to the variance from a particular locus is due to its involvement in many different complex pathways and not from an unusually large perturbative effect on a few pathways. For large , the CN model is specified by the distribution, P(μ), across loci. In particular, given P(μ), we can calculate the total genetic variance, V, and the variance due to locus i, Vi. We define μ¯01μP(μ)𝑑μ.

We calculate the expected total variance across statistical ensembles. Note that here the expectations are averages over ensembles where the parameters of the model are resampled, in contrast to the derivations presented in sections above, which were ensemble averages over equally likely genotypes. Since each pathway is independently sampled, expectations approximate the values in a single realization as M. All expectations are denoted .. We calculate the expected variance contribution from one pathway: since all pathways are statistically identical, the total variance from M pathways is simply M times the expected contribution from a single pathway. The contribution from the qth-order interaction between loci i1,i2,,iq is fi1i2iq2=n=1qμin. The expected total variance is

(55) V=ifi2+i>jfij2+i>j>kfijk2+,
(56) =iμi+i>jμiμj+i>j>kμiμjμk+,
(57) =i=1(1+μi)-1

The variance due to terms involving locus i is

(58) Vi=fi2+jifij2+j>kifijk2+,
(59) =μi+μijiμj+μij>kiμjμk+,
(60) =μiji(1+μj).

Therefore, we have

(61) vi=Vi/V=μiji(1+μj)j(1+μj)-1.

There are two qualitatively different regimes, μ¯1 and μ¯1. When μ¯1, each pathway typically contains a single locus and should lead to an additive model. In this limit, we can write from Equation (61),

(62) viμiμ¯,μ¯1,

which is consistent with the expectation that vi-1 for an additive model. In the opposite limit, μ¯1, we have

(63) viμi1+μi,μ¯1.

The CN model therefore leads to an intuitive interpretation of the VF as being determined by the sparsity of interactions of a locus. Equation (63) yields an upper bound at vi=0.5, at μi=1. From Equation (31), we see that this case is equivalent to the HoC model of maximal epistasis where the new fitness after a mutation is independent of the previous fitness. The DVF P(v) is directly determined by P(μ); the CN model can therefore be used as a generative model to generate fitness landscapes with arbitrary DVFs.

We can further calculate the epistatic variance between two loci in the CN model. The total epistatic variance Iij between loci i and j is

(64) Iij=fij2+ki,jfijk2+
(65) =μiμjki,j(1+μk).

In the limit μ¯1, the epistatic VF after dividing by V is then simply eij=μiμj/(1+μi)(1+μj)=vivj. Using Equation (53), we have

(66) vij=vi+vj-2vivj.

If vi’s are small, the CN model predicts near-additivity between the effects of two loci. This is not inconsistent with the strong epistasis assumption implicit in the limit μ¯1: though the total contribution of epistatic interactions to the genetic variance may be large, the epistatic variance between two specific loci can still be negligible. This is because the majority of epistatic variance is due to the combinatorially large number of higher-order epistatic terms whose individual effects themselves can be weak.

Relationship to statistical fitness landscapes

Statistical fitness landscapes such as the NK model and the Rough Mt. Fuji model are closely related to the CN model described above. The CN model falls under the broad class of generalized NK models (Hwang et al., 2018). In generalized NK models, epistasis is due to modules (or ‘pathways’) of K loci that interact epistatically with each other. The different NK models differ in how the loci are assigned to the modules and the interaction structure within the module. In the CN model, each locus has a locus-specific probability of being part of any module and the interaction structure within a module is all-to-all. The locus-specific probability gives rise to a highly non-regular model, that is, loci can have a wide range of contributions to the total variance. This feature gives rise to qualitatively different adaptation properties. We will show this further below in addition to showing that the CN model has a special memoryless property.

Well-studied fitness landscapes such as Kauffman’s NK model and the Rough Mt. Fuji landscape are regular, that is, every locus contributes equally to the variance. In other words, the DFE, which is determined by the DVF in our picture, instead comes from a single constant value v=v¯. The fluctuations of the fitness effects are solely due to the genotype-dependent term η, which is a Gaussian. In a later section, we show that the DFE in this case corresponds to a Gaussian with mean -2v¯(y-y¯). As adaptation proceeds and y increases, the DFE shifts to the deleterious side but retains its Gaussian shape. The adaptation properties that result from this DFE are quite different from those arising from our theory. Further, while the regular fitness landscape picture may lead to a good approximation of our results when the number of mutations is large, it does not capture the different magnitudes of diminishing-returns for different loci observed in experiments.

The landscape of the regular CN model

The CN model for the special case of homogeneous loci, μi=μ¯ for all i, is similar to the NK model but with one major difference: the number of loci in each pathway in the NK model is fixed at K loci, whereas in the CN model the typical pathway size is controlled by a continuous parameter μ¯. This introduces contributions at every epistatic order while effectively imposing sparsity on the contributions from higher-order interactions. We now show that the regular CN model has tunable ruggedness, that is, it transitions from an additive model to a model with maximal epistasis with increasing μ¯ and has exponentially decaying correlations for μ¯1.

The regular fitness landscapes often discussed are stationary Gaussian processes on a -dimensional hypercube. The regular CN model (i.e., μi=μ¯ for all i) also falls into this class as M. The key quantity that defines such a fitness landscape is the covariance function between the fitnesses of genotypes g and g, C(g,g)=C(d), where d|g-g| is the Hamming distance between two genotypes. We now compute the covariance C(d) for the regular CN model. Each order term in each pathway is independent and the covariances for each order add up over all pathways. We first calculate the expectation of the first-order term in a single pathway, which is

(67) i,ifixifixi=μ¯i,iδiixixi
(68) =μ¯g.g
(69) =μ¯(-2d).

Here, we have used fi2=μ¯, where the μ¯ comes from the probability of locus i being selected for the module as argued previously. The covariance is linear in the distance between genotypes, as one would expect from an additive model. Directly calculating the higher-order terms is more complicated because of the ordering restriction i>j for the second-order term and for higher orders. As noted previously (Stadler and Happel, 1999; Neidhart et al., 2013; Agarwala and Fisher, 2019), these can instead be calculated using combinatorics, which we will demonstrate with the second-order term. We have for the second-order covariance

(70) i>j,i>jfijxixjfijxixj=μ¯2i>j,i>jδiiδjjxixixjxj
(71) =μ¯2i>j(gg)i(gg)j,

where the element-wise product (gg)ixixi is 1 if xi=xi match and −1 otherwise. If d is the distance between g and g, then the element-wise product gg has d-1 terms and -d 1 terms. The term in the summation above is 1 if both (gg)i and (gg)j are chosen from the -d subset or both are chosen from d subset. This term is −1 if one of the two is chosen from the -d subset and the other from the d subset. The number of terms that are 1 s are therefore (-d2)(d0)+(-d0)(d2) and the number of terms that are −1 s are (-d1)(d1). This argument is easily extended to higher orders. The general qth-order contribution to the covariance is

(72) μ¯qk=0min(q,d)(-1)k(-dq-k)(dk).

It is easily verified that the first-order term matches. When d=0, we recover the binomial coefficients, as expected. The summation above is precisely the Krawtchouk polynomial 𝒦q(d;,2) (Olver et al., 2010), which we will denote by 𝒦q(,d). We therefore have

(73) C(d)=q=1dμ¯q𝒦q(,d).

The generating function of the Krawtchouk polynomials yields

(74) C(d)=(1+μ¯)-d(1-μ¯)d-1.

The above expression is consistent with intuition: when μ¯1, the covariance is linear in d, as expected for an additive model. In the opposite limit of μ¯1 and 0<d</2, the constant term 1 can be ignored, and we see that the covariance is proportional to e-λd, where λ=ln(1+μ¯1-μ¯). Epistasis is maximal when μ¯1, in which case λ and the covariance rapidly goes to zero with d. This is the HoC model of maximal epistasis.

The landscape of the general CN model

The landscape of the general CN model, where loci can have different μi, is no longer stationary but the correlation structure can still be calculated. The qth-order contribution to the covariance between genotypes g and g in the general case is

(75) i1>i2>>iqfi1i2iq2n=1q(xinxin)=i1>i2>>iqn=1qμinxinxin.

It is easy to see that when the contributions from all orders are added up, the covariance C(g,g) has a rather simple product form

(76) C(g,g)=i=1(1+μixixi)-1

The correlation c(g,g)=C(g,g)/C(g,g)C(g,g) is

(77) c(g,g)=i=1(1+μixixi)-1i=1(1+μi)-1.

The above relation is exact. When μi1, μ¯1, then 1±μie±μi and the 1’s in the numerator and denominator above can be ignored. We get

(78) c(g,g)e-2i:xixiμi.

When μi’s are equal, we recover the homogeneous case with exponentially decaying correlations.

Appendix 3


The distribution of fitness effects

Long-term adaptation is determined by the DFEs and the stochastic dynamical processes that lead to fixation. Before analyzing the properties of adaptive walks, we clarify what our previous analysis, which ultimately led to Equation (32), means for the DFE.

Equation (32) represents the distribution of the fitness effect of a specific mutation at locus i over all genotypes in the population that have fitness y. We are instead interested in the DFE, where fitness effects are measured for all the mutations of a particular genotype that has fitness y. The DFE for a particular genotype generally depends on idiosyncratic epistatic interactions between loci. In order to make this explicit, we turn to our analysis framework described previously. For notational convenience, we will assume y¯=0 and V=1 in this subsection and the next.

The DFE over loci for a particular genotype of a fitness y can be thought of as a sample, s1,s2,,s, from the conditional joint distribution P(s1,s2,,s|y). From our assumption of numerous, independent epistatic terms for each locus, this joint distribution is defined entirely in terms of the means and covariances of the +1 variables y,s1,s2,,s. Recall that the fitness effect of a mutation at locus i is si=yi-y=-2ξi as defined in Equation (26). As shown previously, we have siy¯=-2vi. We also have sisj¯=4eij with eii=vi. The means of all the variables are zero. Based on this covariance structure, we can compute P(s1,s2,,s|y) using the properties of the conditional distribution of a multivariate normal distribution. It is straightforward to show that conditional on the fitness y, the conditional means are given by Meany(si)=-2viy and the conditional covariances are Covy(si,sj)=4(eij-vivj). This relation makes clear that in general the DFE from a sample s1,s2,,sP(s1,s2,,s|y) depends on the epistatic interactions between all pairs of loci via eij. Note that Vary(si)=4vi(1-vi), which leads to Equation (32) for the marginal distribution P(si|y) of a particular locus.

The DFE simplifies considerably if we make certain additional assumptions on the nature of epistatic interactions. In particular, the conditional correlation between fitness effects is

(79) Corry(si,sj)=eij-vivjvivj(1-vi)(1-vj).

If we assume the typical VF, v¯ is small (i.e., v¯1) and also that eij is O(v¯2), then correlations are O(v¯) and thus negligible. Then, in a particular sample s1,s2,,s, we can think of each si as being drawn independently with mean -2viy and variance 4vi(1-vi). If is large, then this leads to a DFE

(80) ρ(s|y)=01𝑑v(2v(1-v))-1P(v)φ(s+2vy2v(1-v)),

where P(v) is the DVFs across the loci and φ is the standard normal pdf. Surprisingly, this relation becomes exact for the CN model, where we have shown that eij=vivj (Equation (66)) and therefore the correlations between si’s vanish. In this case, the DFE is thus determined entirely by the DVF, but we will show later that it has certain universal properties independent of even the DVF. Note that we derive the DFE starting from rather general assumptions on the organization of the genotype-phenotype map in contrast to past models that assume the DFE as a starting point.

Above, we have measured the DFE for the subset of genotypes that have fitness y without regard to their evolutionary history. Over the course of adaptation, mutations are fixed and certain fitness changes are observed. We would then like to measure the DFE for those genotypes that have undergone a particular adaptive trajectory. As we show below, the DFE again simplifies considerably if certain relations hold.

History dependence

Using an analysis similar to the one in the previous section, we quantify history dependence by calculating the correlations between the new fitness and the adaptive history conditional on the genotypes that have undergone a specific sequence of events in the past.

To be precise, suppose an initial clonal population of fitness y0 gains k successive mutations and the corresponding sequence of fitnesses is y1,y2,,yk. We would like to quantify how the fitness of a new mutation at locus k+1, yk+1, depends on past fitnesses and the idiosyncratic epistatic interactions between the previous k mutations and the new mutation. The correlation between any two fitnesses yi and yj (i<j) is given by 1-2vi+1:j, where vi+1:j is the VF of the loci i+1,i+2,,j (the subscript notation will be used throughout). In general, vi+1:j accounts for the epistatic interactions of all orders between these j-i loci and is expressible in terms of the coefficients of our original complex trait model in Equation (19). One can then write the covariance matrix between y0,y1,,yk+1. In block form, this is

(81) Σ=(Σ0:k,0:kΣ0:k,k+1Σk+1,0:kΣk+1,k+1).

The mean of yk+1 conditional on y0,y1,,yk is a linear weighted sum of the past fitnesses:

(82) yk+1¯=Σk+1,0:kΣ0:k,0:k1y,

where 𝒚 is a vector with elements y0,y1,,yk. In other words, yk+1 can be written as

(83) yk+1=i=0kwk+1,iyi+ϵ,

where ϵ is a mean-zero stochastic term that depends on the genotype of the initial population and whose variance can be calculated from Σ. To gain intuition, it is useful to explicitly calculate the case of k=1. In this case, the covariance matrix is

(84) Σ=(11-2v11-2v121-2v111-2v21-2v121-2v21).

We have

(85) Σ2,01Σ01,011=14v1(1v1)(12v1212v1)(12v112v111)
(86) =14v1(1-v1)(1-2v12-(1-2v1)(1-2v2)(1-2v2)-(1-2v1)(1-2v12)).

We therefore have

(87) y2¯=v1v2+v122v1v122v1(1v1)y1+v1+v22v1v2v122v1(1v1)y0.

The dependence on the past has complex dependencies on epistasis between loci 1 and 2 even in this highly simplified case. To identify what contributes to history dependence beyond just the most recent fitness, we rewrite Equation (87) as

(88) y2¯=(12v2)y1+(12v1)(v12v1v2+2v1v2)2v1(1v1)y1+v1+v22v1v2v122v1(1v1)y0.

From here, we observe that the dependence on v1 and y0 vanishes precisely when v12=v1+v2-2v1v2. This is not true for all landscapes. However, as noted previously, this is in fact true for the CN model (Equation (66)) when μ¯1. In this case, we get the simple relation

(89) y2=(1-2v2)y1+2v2(1-v2)η,

where the pre-factor in the second term comes from normalization and η is a Gaussian random variable.

From Equation (53), we have the general relation v12=v1+v2-2e12, where e12 is the epistatic VF between loci 1 and 2. We can then rewrite Equation (88) as

(90) y2¯(12v2)y1+v1v2e12v1s1,

for v11 and s1=y1-y0. An intuitive interpretation of this result is presented in the main text.

Sufficient condition for memoryless fitness gains

The k=1 case suggests that the relation for memoryless fitness gains (e12=v1v2) could in fact be true for all k under the CN model, which indeed turns out to be the case, as we show below. Motivated by the k=1 case, we would like to have wk+1,k=1-2vk+1 in Equation (83) and the rest of the weights equal to zero. If this is the case,

(91) yk+1=(1-2vk+1)yk+ϵ.

Multiplying both sides by yj and computing the correlations, we get the condition 1-2vj+1:k+1=(1-2vj+1:k)(1-2vk+1) for all j. Therefore, a sufficient condition for memoryless fitness gain is

(92) vj+1:k+1=vk+1+vj+1:k-2vk+1vj+1:k

for all k and for all j<k. We will now show that this is true for the CN model for j=0, that is, v1:k+1=vk+1+v1:k-2vk+1v1:k; the rest trivially follows. Let us first analyze what terms contribute towards V1:k (dividing by V gives v1:k). When loci 1 through k are mutated, their effect is to flip the signs of their coefficients in Equation (19). The ones that have changed sign contribute to the de-correlation between the fitnesses before and after the set of mutations, but the epistatic terms that have an even number of flips do not change and therefore their contribution has to be subtracted. V1:k therefore is the sum of squares of all the coefficients in Equation (19) whose loci have flipped an odd number of times. To keep track of indices, suppose i1,i2, are used to denote the indices of the k loci (which take values from 1 to k) and j1,j2, for the rest. Then,

(93) V1:k=i1=1kFi12+i1>i2>i3kFi1i2i32+,whereFi12=fi12+j11:kfi1j12+j1>j21:kfi1j1j22+,Fi1i2i32=fi1i2i32+j11:kfi1i2i3j12+j1>j21:kfi1i2i3j1j22+,

and so on. Now, when the k+1th locus is also flipped, to compute V1:k+1, we can add up the two variances V1:k and Vk+1 except for the cross terms that have an even number of sign flips and that include both the k+1th locus and the other k loci. These have to be subtracted twice because they appear both in V1:k and Vk+1. These terms are

(94) I1:k,k+1=i1=1kFi1k+12+i1>i2>i3kFi1i2i3k+12+,

where the Fs are defined in a similar fashion as in Equation (93) except the sums over js run from k+2 to instead of k+1 to . We get the general relation

(95) V1:k+1=Vk+1+V1:k-2I1:k,k+1.

For the CN model specifically, we have fi1i2ik2=j=1kμij. This implies

(96) I1:k,k+1=i1=1kFi1k+12+i1>i2>i3kFi1i2i3k+12+,=μk+1j=k+2(1+μj)(i1=1kμi1+i1>i2>i3kμi1μi2μi3+)=μk+1(j=k+2(1+μj))(i=1k(1+μi)1)

Performing a similar calculation for V1:k, we find

(97) V1:k=j=k+1(1+μj)(i1=1kμi1+i1>i2>i3kμi1μi2μi3+)=(1+μk+1)(j=k+2(1+μj))(i=1k(1+μi)1)

which gives

(98) I1:k,k+1=vk+1V1:k.

for μ1 since vk+1=μk+1/(1+μk+1). Dividing Equation (95) by V throughout concludes the derivation.


Under the conditions of memoryless fitness gains from the previous section, we can write

(99) σ=-2vz+2v(1-v)ν,

where variables have been rescaled as z=V-1/2(y-y¯),σ=V-1/2s,ν=V-1/2η. This equation suggests various forms for the DFE, ρ(σ|z), depending on the DVF, P(v).

z is an intuitive measure of ‘adaptedness’: (1) when z is negative, the organism is in the unlikely situation of being ‘negatively adapted’ to the environment. Beneficial mutations are much more likely than deleterious mutations and adaptation is dominated by loci that have a large v. (2) When |z|1, the organism is ‘neutrally adapted’. The number of beneficial and deleterious mutations is balanced. (3) When z1, the organism is ‘well-adapted,’ where the DFE is strongly skewed towards deleterious mutations.

We will analyze adaptation in the neutrally adapted and the well-adapted regimes. When the organism is negatively adapted, Equation (99) predicts that a few substitutions are sufficient to quickly reach the neutrally adapted state. In addition, we assume the VF of each locus is small, that is, v1 but the number of loci is large enough so that overall epistasis v¯1 (v¯ is the mean VF). We can then ignore the 1-v factor and rewrite Equation (99) as

(100) σ=-2vz+2vν.

The intuition behind the analytical results is discussed in the main text. We present the formal calculations here.

Analytical results for an exponential DVF

We begin by calculating the DFE, ρ(σ|z), for the specific case when the DVF is an exponential,

(101) P(v)=v¯-1e-v/v¯,

where v¯1 is the mean VF across loci. The true DVF likely has a large fraction of loci that have no effect; accounting for these loci simply scales the mutation rate and will be ignored in this analysis. The exponential DVF leads to analytical predictions for the shape of the average fitness trajectories under certain simplifying assumptions about the underlying selective forces.

From Equation (100), we have

(102) ρ(σ|z)=𝑑v𝑑νP(v)P(ν)δ(σ+2vz-2vν).

Here, ν is a standard normal random variable, which we integrate out, giving

(103) ρ(σ|z)=01𝑑v(2v)-1P(v)φ(σ+2vz2v),

where φ is the normal pdf. For P(v) given in Equation (101), this integral can be calculated exactly:

(104) ρ(σ|z)=v¯-122v¯-1+z2e-σz/2-|σ|2v¯-1+z2/2.

The resulting DFE is a double exponential with scale (z/2±2v¯-1+z2/2)-1. For |z|1 small, the DFE is symmetric around the origin, as expected, with a scale determined by the DVF. As z increases, the DFE skews towards deleterious effects. The typical magnitudes of beneficial and deleterious effects are not independent from the overall ratio of beneficial to deleterious mutations. The well-adapted regime is reached when z2 is comparable to v¯-1. To clearly delineate the two regimes, it is useful to define new variables x=zv¯/2 and λ=σ2/v¯. In the new variables, the DFE is

(105) ρ(λ|x)=141+x2e-λx+|λ|1+x22.

The neutrally adapted and well-adapted regimes then correspond to x1 and x1, respectively. The mean rate of adaptation in units of generations on the x scale is dx=dzv¯/2=σv¯/2=λv¯/2, where the expectation is taken over fixation probabilities and the DFE. We assume SSWM so that pfix(σ)2σ or pfix(λ)2λv¯/2 for positive λ and 0 otherwise. We find

(106) dx=NUv¯/2-𝑑λpfix(λ)ρ(λ|x)λ
(107) =NUv¯v¯/20𝑑λλ2ρ(λ|x),

where N is population size, and U is the effective mutation rate for loci with nonzero effect. Integrating over λ, we get

(108) dx=NU(2v¯)3/21+x2(x+1+x2)3.

Integrating over dx starting from an initial x0, we obtain an approximation to the mean fitness trajectory

(109) NU(2v¯)3/2ngen=x0x𝑑x1+x2(x+1+x2)3=T(x)-T(x0),
(110) where T(x)=4x55+5x33+x2+115(12x4+19x2+7)+x.

Note that this is only an approximation for small x since the typical fixed beneficial effect is a discrete jump. We show the result from Equation (109) in Appendix 1—figure 1a, where the dependencies on the equilibrium fitness y¯ and genetic variance V are highlighted. There are two independent parameters that determine the scale (v¯/2V) and location (y¯) of the fitness, and one parameter (NUngen) that determines the time scale. From Equation (109), the xt and xt1/5 scalings in the neutrally adapted (x1) and well-adapted regime (x1) respectively are apparent.

The number of substitutions, ns, as a function of x can also be calculated under the SSWM assumption. We get

(111) v¯ns=N(x)-N(x0),
(112) whereN(x)=x(x+x2+1)/4+sinh-1(x)/4,

which can be mapped onto time using Equation (109). The scalings are nsxt in the neutrally adapted regime and nsx2t2/5 in the well-adapted regime.

Asymptotics and general scaling results

We now derive the asymptotic properties of the DFE in the well-adapted regime (v¯1/2z1). Writing out the Gaussian pdf in Equation (80), we have

(113) ρ(σ|z)=01𝑑v(22πv)-1P(v)e-(σ+2vz)28v
(114) =e-σz/201𝑑v(22πv)-1P(v)e-σ28v-vz22
(115) =e-σz/201𝑑u(22πuy)-1P(u/z)e-z2(σ24u+u),

where the change of variables v=u/z is used. When z1, Laplace’s method can be used. The exponent is minimized when u=|σ|/2, which gives

(116) ρ(σ|z)(2z)-1P(|σ|/2z)e-z2(σ+|σ|).

The contribution towards a fitness effect σ at large z comes largely from loci with VF v|σ|/2z. The exponential form of the beneficial DFE is determined entirely by the Gaussian tails of the genotype-dependent term. The argument can be easily generalized to non-Gaussian tail probabilities using a similar calculation. Equation (116) implies that the ratio of the probabilities of beneficial and deleterious mutations is independent of the DVF as long as it has sufficient mass at v=|σ|/2z:

(117) ρ(σ|z)ρ(-σ|z)=e-σz.

Such a relationship has been hypothesized previously based on simulations of an additive finite-sites model and the form of pfix close to the high-interference limit that could result in a fitness plateau (Rice et al., 2015). Using our theory, we have shown that this result is indeed true independent of the DVF and under our core hypotheses of normality and memoryless fitness gains. If pfixeVTcσ in the high-interference limit, where Tc is the coalescent time, fitness should plateau when pfix(σ)ρ(σ|z)=pfix(-σ)ρ(-σ|z), which is at

(118) zplateau=2VTc.

The V appears to account for the rescaling σ=s/V.

We get Equation (116) only if P(v) has probability mass at v=|σ|/2z. This is not the case in the exponentially correlated fitness landscape model with homogeneous loci, that is, if, for instance, P(v)=δ(v-v¯), we instead get from Equation (80)

(119) ρ(σ|z)=φ(σ+2v¯z2v¯).

The DFE in this case is therefore a Gaussian with mean shifting towards the deleterious side. The ratio of beneficial to deleterious mutations goes rapidly to zero as φ(v¯z)/v¯z and adaptation sharply plateaus beyond z=v¯-1/2.

From Equation (116), various scaling results can be derived that apply independent of P(v) and the Gaussian assumption on ν. We retain the exponential form for convenience. The rate of beneficial mutations is

(120) Ub=U0𝑑σ(2z)-1P(|σ|/2z)e-σz
(121) =U2z20𝑑wP(w/2z2)e-w.

The integral has an effective upper cutoff at wO(1) and can be approximated as UbUP(0)/2z2z-2 under certain assumptions for P(v) for small v. The pre-factor P(0) suggests that the number of beneficial mutations depends on the number of small-effect loci. While strong epistatic loci drive adaptation in the neutrally adapted regime, adaptation in the well-adapted regime is instead driven by weakly epistatic loci.

From Equation (116), the typical size of a beneficial mutation is σz-1. Under SSWM, pfixσz-1. As argued previously, since Ubz-2, we get dz/dtz-4 and therefore we obtain the two scaling relations

(122) zt1/5,nst2/5,

which apply independently of the form of the DVF in the well-adapted regime.

Data availability

The code and data used to generate the figures are available at (copy archived at

The following previously published data sets were used
    1. Wiser MJ
    2. Ribeck N
    3. Lenski RE
    (2014) Dryad
    Data from: Long-term dynamics of adaptation in asexual populations.


  1. Book
    1. Altenberg L
    Nk Fitness Landscapes, in “The Handbook of Evolutionary Computation”
    Browse Books.
  2. Book
    1. Gumbel EJ
    Statistics of Extremes
    Courier Corporation.
  3. Book
    1. Haldane JBS
    (1927) A mathematical theory of natural and artificial selection, part v: selection and mutation
    In: Haldane J. B. S, editors. Mathematical Proceedings of the Cambridge Philosophical Society. Cambridge University Press. pp. 838–844.
  4. Book
    1. Olver FW
    2. Lozier DW
    3. Boisvert RF
    4. Clark CW
    NIST Handbook of Mathematical Functions Hardback and CD-ROM
    Cambridge university press.
    1. Orr HA
    The distribution of fitness effects among beneficial mutations
    Genetics 163:1519–1526.

Article and author information

Author details

  1. Gautam Reddy

    NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
    Conceptualization, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1276-9613
  2. Michael M Desai

    1. NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Harvard University, Cambridge, United States
    2. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
    3. Quantitative Biology Initiative, Harvard University, Cambridge, United States
    4. Department of Physics, Harvard University, Cambridge, United States
    Conceptualization, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9581-1150


Simons Foundation (NSF-Simons Center at Harvard #1764269)

  • Gautam Reddy

Simons Foundation (376196)

  • Michael M Desai

National Science Foundation (PHY-1914916)

  • Michael M Desai

National Institutes of Health (R01GM104239)

  • Michael M Desai

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


We thank Sergey Kryazhimskiy, Andrew Murray, Milo Johnson, and members of the Desai lab for comments on the manuscript. GR was supported by the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard (award number #1764269) and the Harvard FAS Quantitative Biology Initiative. MMD acknowledges support from the Simons Foundation (grant 376196), NSF Grant PHY-1914916, and NIH grant R01GM104239.

Version history

  1. Received: November 9, 2020
  2. Accepted: March 26, 2021
  3. Accepted Manuscript published: March 29, 2021 (version 1)
  4. Version of Record published: April 20, 2021 (version 2)


© 2021, Reddy and Desai

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 2,918
  • 467
  • 71

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Gautam Reddy
  2. Michael M Desai
Global epistasis emerges from a generic model of a complex trait
eLife 10:e64740.

Share this article

Further reading

    1. Ecology
    2. Evolutionary Biology
    Chunxiao Li, Tao Deng ... Shiqi Wang
    Research Article

    The long-trunked elephantids underwent a significant evolutionary stage characterized by an exceptionally elongated mandible. The initial elongation and subsequent regression of the long mandible, along with its co-evolution with the trunk, present an intriguing issue that remains incompletely understood. Through comparative functional and eco-morphological investigations, as well as feeding preference analysis, we reconstructed the feeding behavior of major groups of longirostrine elephantiforms. In the Platybelodon clade, the rapid evolutionary changes observed in the narial region, strongly correlated with mandible and tusk characteristics, suggest a crucial evolutionary transition where feeding function shifted from the mandible to the trunk, allowing proboscideans to expand their niches to more open regions. This functional shift further resulted in elephantids relying solely on their trunks for feeding. Our research provides insights into how unique environmental pressures shape the extreme evolution of organs, particularly in large mammals that developed various peculiar adaptations during the late Cenozoic global cooling trends.

    1. Evolutionary Biology
    Tian Yue, Yongbo Guo ... Bing Su
    Research Article

    Compared with lowlander migrants, native Tibetans have a higher reproductive success at high altitude though the underlying mechanism remains unclear. Here, we compared the transcriptome and histology of full-term placentas between native Tibetans and Han migrants. We found that the placental trophoblast shows the largest expression divergence between Tibetans and Han, and Tibetans show decreased immune response and endoplasmic reticulum stress. Remarkably, we detected a sex-biased expression divergence, where the male-infant placentas show a greater between-population difference than the female-infant placentas. The umbilical cord plays a key role in the sex-biased expression divergence, which is associated with the higher birth weight of the male newborns of Tibetans. We also identified adaptive histological changes in the male-infant placentas of Tibetans, including larger umbilical artery wall and umbilical artery intima and media, and fewer syncytial knots. These findings provide valuable insights into the sex-biased adaptation of human populations, with significant implications for medical and genetic studies of human reproduction.