Introduction

Mutations decide the fitness of an organism in an environment-dependent fashion. But, the effect of mutations also depends on the genetic background they are occurring in1. This phenomenon is referred to as epistasis. Hence, epistasis influences adaptation2,3. However, it is largely unpredictable, although a few statistical patterns based on macroscopic traits have been reported in the last few years410. One way to study genotype-phenotype relationship and patterns of epistasis is by using fitness landscapes1113.

Fitness landscape is a multidimensional surface, in which one dimension represents a fitness-related phenotype, and the others genotype. Hence, it serves as a genotype-phenotype map, whose shape, in a given environment, is a consequence of genetic interactions or epistasis1417. If the landscape has many peaks, its structure is called rugged1820. Populations navigating on a rugged landscape are likely to be trapped in local peaks, whose precise identity is dictated by chance and population’s starting point on the landscape2123. Alternatively, on smooth landscapes, populations starting from different points on the sequence space all converge to the same global peak2427. Thus, the structure of the landscape also dictates our ability to predict evolution, and this has wide-ranging implications.

Although fitness landscape was conceptualized to explain the relationship between a population’s genotype and fitness, it has evolved to explain the relationship between fitness and functional protein-coding sequences13,28. This effort to characterize fitness landscapes started by considering a handful of “important” biallelic sites in a protein27,2937. Although such landscapes could explain the pervasiveness of epistasis in protein sequences, they do not allow us to make wide-ranging statistical predictions of the characteristics of the fitness landscape38. However, with advancing high-throughput technologies, it has become possible to construct high-dimensional landscapes3945.

In this study, we use one such high-dimensional fitness landscape constructed by Papkou and coworkers to understand how epistasis operates in a 9-base pair region of the folA gene in E. coli40. folA encodes for dihydrofolate reductase (DHFR) and mutations in the gene are known to confer resistance to the antibiotic trimethoprim4650.

Via analysis of the folA landscape, we ask three specific questions (Supplement Figure S1): (1) How does the nature of epistasis between two given sites change as a function of the genetic background? (2) Are these changes dependent on the fitness or genotype? (3) Does a given mutation follow already known patterns of global epistasis? If yes, what does it depend on? Our results show that epistasis is “fluid” – i.e., the nature of epistasis two mutations exhibit is a function of the genetic background. We also show that only a small fraction of mutations follow global epistasis. In fact, mutations can be classified into two groups consisting of ones that show global epistasis and others (comprising of a majority) that do not. We also propose a novel way to estimate and predict distribution of fitness effects (DFE) of a given genotype.

Results

folA fitness landscape in E. coli

A fitness landscape of a 9-base pair region of folA was generated recently by Papkou and coworkers40. The landscape explored a 9-bp region of the gene that has been shown to be important for resistance to the antibiotic trimethoprim47,51. All 49 variants of folA were generated, and grown in media containing the antibiotic. Deep sequencing was used to quantify fitness, and relative fitness of ∼99.7% of all variants was obtained. A striking feature of the landscape reported by Papkou and coworkers was that despite being highly rugged with 514 peaks, a majority of the landscape had adaptive access to high fitness peaks. This was because, compared to lower peaks, high fitness peaks had a large basin of attraction.

Our analysis using this dataset shows that with increasing size of the landscape, the number of peaks increase; however, the density of the peaks decreases (Supplement Figure S2 and S3). With this increasing size of the landscape, however, the accessibility of the global peak decreases (Supplement Figure S4). Globally, only small regions of the landscape could be represented as a maximally rugged NK landscape (Supplement Figure S5).

By scanning all mutations on a 9-dimensional landscape, Papkou et. al. have created a dataset that allows us to ask specific questions about epistasis and its manifestations on a fitness landscape. The genotypes on the folA landscape have been divided into two categories by Papkou et. al. into functional (∼7% of all points) and non-functional (∼93% of all points). This distinction is based on a statistical segregation of the 49 points. Since this segregation does not have any functional basis, we call the two groups as “high fitness” and “low fitness”.

Nature of epistasis between two mutations is context dependent or “fluid” nature of epistasis

Epistasis between two mutations, A and B, can manifest as no, positive, negative, or sign epistasis52. While we know several examples of pairs of mutations exhibiting epistasis of each kind9,52, we do not know if or how often the nature of epistasis exhibited by a pair of mutations changes with the genomic background.

To answer this question, we pick a pair of mutations and compute the fraction of genomes in which these mutations exhibit (a) Positive epistasis (PE), (b) Negative epistasis (NE), (c) Sign epistasis, and (d) No epistasis (No). Sign epistasis was further classified into (i) Reciprocal Sign epistasis (RSE), (ii) Single Sign epistasis (SSE) and (iii) Other Sign epistasis (OSE) based on the number of paths restricted by Darwinian evolution (Figure 1A).

Nature of epistasis exhibited by two mutations is contingent on the genetic background.

(A) The nature of epistasis exhibited by two mutations (indicated by red circle and green square) in all 47 genetic backgrounds was quantified. Numbers f1 to f6 indicate the fraction of genomes in which the two mutations exhibit Positive epistasis (PE), Negative epistasis (NE), Reciprocal Sign epistasis (RSE), Single Sign epistasis (SSE), Other Sign epistasis (OSE), and No epistasis (No), and were calculated for this pair of mutations. The process was repeated for every possible pair of mutations. Distribution of the six fractions f1 to f6 in (B) high and (C) low fitness backgrounds is plotted.

For example, the mutation pair (G ◊ A at position 3 and T ◊ C at position 7) exhibits PE in 26.08%; NE in 34.36%; RSE in 0.67%; SSE in 2.31%; OSE in 5.00%; and No Epistasis in 31.57% of all high fitness backgrounds. The corresponding figures for low fitness backgrounds are: PE: 19.41%; NE: 22.61%; RSE: 7.65%; SSE: 5.71%; OSE: 13.16%; and No Epistasis: 30.92%. The exact numbers for all mutations pairs are provided in Supplement Data-File1.

We repeat this process for all possible pairs of mutations in the 9-base pair region of folA. The frequency distribution of the fraction of all genotypes where a pair exhibits each type of epistasis is shown in Figure 1B (for high fitness backgrounds) and Figure 1C (for low fitness backgrounds).

Barring a few pair of mutations which exhibit positive epistasis in all/nearly all high fitness backgrounds, nature of epistasis between a pair of mutations is strongly dependent on the genetic background (Figure 1 and Supplement Table 1 and Supplement Data-File2). In high fitness backgrounds, mutation pairs exhibit positive epistasis most frequently (median 41% of the genotypes), followed by negative epistasis (median 23%) and no epistasis (median 16%), with sign epistasis being relatively rare (Figure 1B). In low fitness backgrounds, mutation pairs exhibit no epistasis most frequently (median 30%), followed by negative epistasis (median 22%), positive epistasis (median 21%) and other sign epistasis (median 13%) (Figure 1C).

Because of this contingency of the nature of epistasis between two mutations on the genetic background, we propose that epistasis is “fluid”.

Functionally important sites cause switch in epistasis more frequently

It has previously been shown that both the secondary structure and location of a residue in a protein dictate the nature of epistasis exhibited by residues53. In Figure 1, we saw that epistasis changes with genomic background. But, are certain positions more robust to changes in the nature of epistasis than others? In other words, does the location of a mutation in the genetic background dictate the likelihood of epistasis change?

The effect of a locus X on changing the nature of epistasis between two mutations was quantified as shown in Figure 2A. As shown in Figure 2B and Figure 2C, upon the introduction of a single mutation at locus X, the nature of epistasis switches in more than 50% of cases. The contributions of different loci is different. Positions 4 and 5 on the landscape, which are critical for protein function, are more responsible for changing the nature of epistasis between two mutations, than other sites on the landscape (Supplement Table 2 and Supplement Table 3). This control of nature of epistasis between other sites by functionally important sites is likely an important factor controlling protein evolution.

Different sites on the landscape dictate change in epistasis differently.

(A) To determine the impact of mutation at locus X in changing the nature of epistasis between two mutations (indicated by a red circle and a green square), we took the following approach. For a particular pair of mutations, the six sites (indicated by N) were fixed, and the nature of epistasis recorded before and after a mutation at X. This process was repeated for all 46 backgrounds, and the fraction of genomes in which, upon introduction of a mutation at X, epistasis between the mutations (red circle and green square) changed, was noted. This gives us f, as indicated in the Figure. This process was repeated for all possible pairs of mutations (red circle and green square); giving a distribution of f for locus X. This distribution of f is shown in (B) for high fitness backgrounds and (C) for low fitness backgrounds. Functionally important sites (4 and 5) cause change in epistasis more frequently than others.

The switch of the nature of epistasis is most frequent to positive, negative, or no epistasis (Supplement Figure S6 and S7) (also see Supplement Data-File3). Switch to sign epistasis is relatively infrequent. Interestingly, in high fitness backgrounds, mutations at functionally important positions (4 and 5, on the landscape) cause a switch to sign-epistasis more frequently, as compared to a mutation at any of the other seven positions. This pattern is not seen in low fitness backgrounds.

The pervasiveness of the change in epistatic interactions is surprising, and makes prediction of evolutionary trajectories harder. However, we also note that change of nature of epistasis due to a mutation is not unique to folA. An analysis of a previously published five-point landscape of beta-lactamase gene29 shows that change in nature of epistasis in common in intramolecular landscapes (Supplement Figure S8). The above results further emphasize that epistasis is “fluid”, and that different sites control the switch of epistasis differently.

Synonymous mutations can cause change of nature of epistasis

While historically synonymous mutations were thought to be neutral54,55, several studies have demonstrated that they can have a wide range of effects on cellular fitness5659. However, their role in changing epistasis between two mutations has not been studied. To study this in the folA landscape, we note that any two mutations lie in either one or two codons in the folA sequence. Depending on the location of these two mutations, there is/are one/two codon(s) that remain unaffected by these mutations. We introduce all possible synonymous mutations in the unaffected codon(s) and ask how frequently does the nature of mutation change? Through this, we seek the likelihood that a synonymous mutation will change the nature of epistasis between two mutations (Figure 3A).

Introduction of a synonymous mutation changes the nature of epistasis between two interacting mutations.

(A) For a given sequence, two mutations (red circle and green square) could occur in either one or two codons, resulting in two or one unaltered codons (underlined in figure). All possible synonymous mutations were introduced in the unaltered codon(s) and change of nature of epistasis (if any) between the two mutations was noted. This was done for all backgrounds and all pairs of two mutations. Probability that a synonymous mutation at locus X leads to a change in nature of epistasis between two mutations in (B) high and (C) low fitness backgrounds is shown. Mutations resulting in TAA ↔ TGA and TAA — TAG are considered as synonymous mutations.

In our analysis, we consider TAA ↔ TGA and TAA ↔ TAG as synonymous mutations (see discussion). All three codons encode for a stop signal for translation.

Our results shows that synonymous changes can cause change in nature of epistasis frequently (Figures 3B and 3C). In high fitness backgrounds, the likelihood of change of nature of epistasis upon introduction of a synonymous mutation is ∼0.45 (Figure 3B). In low fitness backgrounds, this number is ∼0.75 (Figure 3C).

These results clearly demonstrate that change of nature of epistasis can take place via the acquisition of even a synonymous mutation, reaffirming “fluidity” in the nature of epistasis. This property of epistasis makes predicting evolution difficult. However, in recent years, epistasis has been shown to exhibit several statistical patterns, collectively termed as global epistasis1. We next check how these patterns hold on the folA landscape.

Mutations on the landscape exhibit diminishing returns and increasing costs

Global epistasis suggests that the fitness effect of a mutation is a decreasing function of the background fitness7,60 61 (Figure 4A). In Figure 4B, a point represents fitness effect of a single mutation (y-axis) against the fitness of the genotype in which the mutation is introduced (x-axis) on the folA landscape. This is done for all mutations in all backgrounds. The line in red indicates the linear fit between the two variables. In both high and low fitness backgrounds, the negative slope of the line indicates presence of global epistasis, although the correlation is not very strong (R2 = 0.1284 for high fitness backgrounds, & R2 = 0.1236 for low fitness backgrounds).

The folA fitness landscape exhibits weak patterns of global epistasis.

(A) Cartoon to illustrate statistical patterns of global epistasis. The beneficial effect of a mutation decreases with an increase in background fitness (Diminishing Returns Epistasis). Beyond a certain background fitness (Pivot Point), the mutation becomes deleterious and its deleterious effects increase as the fitness of the genetic background increases (Increasing Costs Epistasis). (B) Mutational effects vs. the background fitness show weak correlation for low fitness (R2 = 0.1236) (left of dotted line) and high fitness (R2 = 0.1284) (right of dotted line) backgrounds. The red lines show the best linear fits for the two groups.

Only a small fraction of mutations exhibit global epistasis/ “binary” nature of global epistasis

We next investigate the effect on fitness of each of the 108 (12 possible mutations at each of the 9 sites) mutations on all possible genotypes. Figure 5 shows that only a few mutations exhibit strong patterns of global epistasis. These mutations are primarily (14 out of 16) at positions (nucleotide 4 and 5) which are functionally most important for folA46. An overwhelming majority of mutations (77/108) exhibit no correlation (R2 < 0.2) between their fitness effect and the fitness of the background on which they occur (Supplement Figure S9 and Supplement Table S4).

A small fraction of mutations follow global epistasis.

Only 16 (14 of which are at position 4 and 5) (highlighted in red boxes) out of 108 possible mutations exhibit statistically significant patterns (R2 > 0.4) of global epistasis. See Supplement Table 1 for statistics for each mutation.

Therefore, mutations exhibit one of two states depending on whether they follow global epistasis, or not. This indicates the “binary” nature of mutations. In the case of folA landscape, only mutations at nucleotide positions critical for function exhibit global epistasis.

A recent work61 demonstrated that the growth rate at which any mutation switches from being beneficial to deleterious is conserved for all mutations. This growth rate, around which the nature of mutation changes from being beneficial to deleterious was referred to as the pivot growth rate. While the mechanistic origins of pivot growth rate are yet unknown, this phenomenon likely represents a deep fundamental “rule” of how a cell works. We test the existence of the pivot growth rate for each of the 108 mutations in the folA landscape. Most (∼80%) mutations pivot from being beneficial to deleterious at a growth rate -0.657 ± 0.0657 (Figure 6 and Supplement Table 4). The presence of a pivot growth rate despite poor statistical correlations between background fitness and the fitness effect of a mutation is a surprising observation.

Most mutations switch from being beneficial to deleterious at the pivot point.

Background fitness (y-axis) at which each of the 108 mutations (x-axis) switches from being beneficial to deleterious. The black solid line shows the average fitness (−0.657) (or pivot) at which a mutation changes from being beneficial to deleterious. Individual bars show the deviation from the mean pivot point for each mutation. More than 80% (86 out of 108) of the mutations exhibit the pivot fitness in the range -0.657 +/-0.0657. The dotted line in red indicates the growth rate used in Papkou et al40 to differentiate between high and low fitness variants.

Predicting DFE from phenotype

The “binary” and “fluid” nature of epistasis discussed in this work make prediction of evolutionary trajectories difficult. As shown in Figure 5, only functionally important sites exhibit global epistasis. For most mutations, fitness effects are idiosyncratic. Hence, in the absence of knowledge of the sites which exhibit global epistasis, predictions are likely going to be accurate. From the context of the “fluidity” of epistasis, in the absence of complete knowledge of the genetic background, it is barely possible to comment on nature of epistasis between two mutations. Thus, these two features of epistasis make evolution unpredictable. We now ask if there exists any statistical pattern at all, which would enable the prediction of evolution.

In this context, we define a quantity called “phenotypic DFE”, which represents the collective DFE of all genotypes exhibiting near identical fitness (Figure 7A). To compute the phenotypic DFE, we binned all genotypes in non-overlapping narrow fitness intervals (see methods). DFE of each genotype was computed by introducing all possible 27 (by introducing all three mutations at each of the 9 positions for a given genotype) mutations. DFE of all genotypes in one interval was averaged to obtain the phenotypic DFE. We next ask how robustly the DFE of a particular genotype of near identical fitness can be predicted solely from the phenotypic DFE.

“Phenotypic DFE” is a predictor of DFE.

(A) Backgrounds with near identical fitness were randomly distributed in two groups comprising 90% and 10% of all backgrounds. The first group was used to define a “phenotypic DFE” (a mean DFE of all genomes in the group). The phenotypic DFE was compared with individual DFEs in the 10% group, and the p-value distribution of this comparison was obtained. The process was repeated for genotypes with other fitness. (B) Phenotypic DFE of high fitness backgrounds exhibited two peaks. The first peak was at fitness effect ∼ 0. The second peak comprised of deleterious mutations, whose magnitude increased with increasing background fitness. (C) Phenotypic DFE of low fitness backgrounds exhibited a single peak, whose mean decreased with increasing background fitness. (D) Percent of DFEs in a fitness window with p-value < 0.05 when compared with the Phenotypic DFE. As fitness increases, Phenotypic DFE becomes a better predictor of DFE of a genotype.

To test this, we compute the phenotypic DFE from randomly sampled 90% of the genotypes with fitness between fo and fo + Δ. The DFE of the remaining 10% of the genotypes in a fitness window was computed separately, and each DFE was compared with the phenotypic DFE. The likelihood of equivalence of each genotype’s DFE with the corresponding phenotypic DFE was estimated as the p-value of the Mann-Whitney U test. This comparison gives us a distribution of p-values, for each background fitness.

The phenotypic DFE of high fitness backgrounds comprised of two peaks. The first peak corresponds to mutations with large deleterious effects, whose magnitude increases with increasing background fitness (Figures 7B). The second peak is roughly centered at fitness effect ∼ 0. For low fitness backgrounds, the DFE comprised of only one peak, whose mean decreased as the background fitness increased (Figure 7C).

Phenotypic DFE is better predicted, than fitness effects of individual mutations, by the background fitness of the genotype. The fraction of genomes for which the phenotypic DFE is statistically significantly different from the actual DFE reduces as the background fitness increases (Figure 7D). This effect can also be seen from the distribution of p-values of comparisons between phenotypic DFE and actual DFEs, for different background fitness (Supplement Figure 10).

Discussion

Genotype-phenotype mapping, especially of proteins, is of great interest in evolutionary biology, cell biology, and genetics11,46,6266. The underlying rules that dictate this mapping are a combination of individual mutation effects, epistatic interactions between the mutations, and between the mutations and the genetic background they occur in7,29,35,60,61,6770. The extent and ubiquity of epistatic interactions is of particular interest, because they are mostly unpredictable and have a direct effect on the shape of the fitness landscape, and consequently, adaptation34,38,7176. Therefore, in order to understand the statistical rules which govern epistasis on a landscape, we analyzed a recently published folA landscape which was constructed by quantifying fitness of more than 260,000 sequences in a 9-base pair region40. Mutations in these nine base pairs have been reported to be adaptive47,49.

Navigability of fitness landscapes decides the likelihood of a population to reach the global fitness maximum. Most fitness landscapes indicate that protein landscapes are rugged, and hence, protein evolution is constrained47,70,7779. But, fitness landscapes are constructed by considering a handful of sites, and there is theoretical evidence to suggest that mutations in other “dimensions” could enable populations to navigate valleys in the landscape11. Fisher suggested the same in his correspondence with Wright80.

A particularly confounding mode of epistasis is sign epistasis, which alters the qualitative nature of a mutational effect, and creates valleys in fitness landscapes. In the folA landscape, sign epistasis frequently changes to positive or negative epistasis, indicating “fluidity” in its effects. We also report that synonymous mutations can change the nature of epistasis between two existing mutations. Hence, even if valleys existed, they are not that difficult to navigate. Similar findings have been reported in the past - deleterious mutations making peaks accessible, and neutral mutations being adaptive81,82. Additionally, synonymous mutations are known to have fitness effects by changing protein amount (by altering mRNA stability or translation rates)56,8387 or structure8892. Our results provide a novel mechanism via which synonymous mutations are relevant, for driving evolutionary change and controlling disease states9396.

Interestingly, our analysis shows that the nature of epistasis between two mutations can change by simply changing the stop codon (TAA ↔ TGA or TAA ↔ TAG). This observation holds even when the stop codon is encoded by the first three bases of the 9-bp landscape, and the interacting mutations whose nature of epistasis changes are in the subsequent two codons (which are presumably not even translated). This indicates that the change in the nature of epistasis has likely not do with the protein synthesis but with the mRNA and its effect on cellular fitness.

Premature termination of translation has been known to destabilize the entire transcript97. In eukaryotes, elaborate mechanisms are present to deal with potentially toxic effects of truncated proteins98101. However, prokaryotes lack these mechanisms. In fact, a recent study shows that in E. coli under stress, despite a premature stop codon in the gene sequence, stop codon read-through rates may be as high as 80%, due to a high probability of a mismatch at a premature stop codon102. This is a likely explanation for change of epistasis (or change of fitness) due to alternate premature stop codons in folA, when E. coli is grown in antibiotic stress.

Neutral mutations can improve evolvability of a sequence, and hence aid the movement on a fitness landscape103108. However, we do not see any evolvability enhancing mutations in the 9-bp folA landscape (Supplement Figure S11).

Predicting evolution is a longstanding goal in the field, and global epistasis patterns offered a tool to predict epistatic effects. However, our analyses show that global epistasis often does not hold. Instead, we observe that sequences with similar fitness have similar DFEs, offering some predictability, based on a macroscopic trait, of the adaptive potential of a population109111. The means of these DFEs decreased linearly with an increase in background fitness, despite the mutations on this landscape not following global epistasis patterns (Supplement Figure S12).

We show that the navigability of a landscape can change via mutations in the same protein. Can it also change via mutations elsewhere on the genome? High-dimensional landscapes are necessary to answer this question. Additionally, increasing the dimensionality of the landscape is likely going to provide qualitatively new perspectives of adaptation.

Methods

Calculating Hamming Distance

To calculate the hamming distance between two sequences of equal length, we compare the sequences and count the number of dissimilar loci.

Construction of sequence spaces and landscapes

In order to construct an n-base pair sequence space from a parent p-base pair sequence space (p > n), we choose any pn loci in the parent sequence space and find all variants in the parent space in which the selected loci are fixed (these loci contain a same sequence). The set of these selected variants are assigned their one hamming distance neighbours to construct a new n-base sequence space.

By repeating this process over all combinations of selecting the loci to be fixed and the 4pn permutations of choosing the fixed sequence in each case, we are able to break the parent sequence space into n-base pair sequence spaces (Supplement Table 2).

For our study, we use a nine-base pair parent sequence space to generate n-base pair sequence spaces (1 ≤ n ≤ 9). Landscapes are generated by mapping fitness values of each sequence in a sequence space corresponding to the ones assigned in the empirical fitness landscape generated by Papkou et al. using a nine base pair gene.

In rare cases where fitness value was not known, we disregarded those variants in all studies.

Finding peaks in a fitness landscape

Number of Peaks in Landscape

To count the number of peaks in fitness landscapes, we find the number of variants that have a higher fitness value than all its one hamming distance neighbours in that landscape.

Peak Probability

To quantify the probability of encountering a fitness peak in any landscape, we find the ratio of number of peaks found to the total number of sequences in that fitness landscape.

Expected number of peaks

In our case (as we are dealing with 4 letter genome), the expected number of peaks in maximally rugged (uncorrelated) NK landscapes is found by for an n-base pair landscape20. The number of peaks predicted by NK n+1 landscapes is rounded off to nearest integer.

Fitness effect of mutations

To find the fitness effect of a mutation acting on a genotype, we start with a set of all genotypes in the empirical Papkou et al. fitness landscape which would result in a new sequence formed following the mutation. In all such genotypes, we find the background fitness fb and the fitness of resulting mutant fb. The fitness effect of this mutation in this background is determined as the fitness difference between the mutant and the background (s = fmfb).

Finding phenotypic DFE

We found the Distribution of Fitness Effects for variants lying in a small slice of fitness value (for our study, we kept the range of this slice to be 0.05). The DFE was constructed by analysing the frequency of fitness effects of all mutations acting on the backgrounds lying in the selected range.

Finding genotypic DFE

For any 9 length genotypic sequence, we found the 9loci × 3bases = 27 mutations that may result in a new sequence. We used the frequency of fitness effect of all these mutations to constitute the distribution of fitness effects for any genotype.

Non parametric tests

We found the p-value of the two parameter KS test and the Mann– Whitney U test using the python scipy.stats library.

Linear regression

We found the pivot point fitness of individual mutations via linear regression of background fitness and the selection coefficient of the mutation using the “LinearRegression” model from python sklearn.linear_model library.

Epistasis as function of genetic background

Classifying Epistasis

To quantify the epistasis present in a mutation pair acting on two differing genetic loci, we compute the fitness effect of the individual mutations on a genetic background and the cumulative fitness effect of the two mutations on the same genetic background. If the fitness effect of the individual mutations were s1 and s2, while the cumulative effect of the two mutations was s12, then we classify the epistasis into following categories,

  1. No Epistasis: If |s12 − (s1 + s2)| < 0.05 i.e. we assume no epistasis was present if the cumulative fitness effect of the two mutations was about the same as the two mutations acting independently on the genetic background.

  2. Sign Epistasis: If s12 × (s1 + s2) < 0 i.e. we classify sign epistasis if the cumulative effect of the two mutations lead to a different sign of fitness effect than the sum of individual fitness effects of the two mutations on the same background. For example, if the combined effect of the mutation pair was beneficial even though the sum of fitness effects of the two mutations on the same background was deleterious, and vice versa.

  3. We further classified this epistasis into three categories,

    1. Reciprocal Sign Epistasis: If s1 < 0, s2 < 0 and s12 > 0 i.e. if the cumulative effect of the two mutations lead the background to a higher fitness, but both shortest paths to this higher fitness point are blocked to darwinian evolution.

    2. Single Sign Epistasis: If exclusively s1 < 0 or s2 < 0 and s12 > 0 i.e. if exactly one of the two shortest paths leading background to higher fitness are blocked to darwinian evolution.

  4. Other Sign Epistasis: All other cases cases classified to sign epistasis.

  5. Positive Epistasis: If s12 > s1 + s2 i.e. if the combined fitness effect of the two mutations was more beneficial / less deleterious than the sum of individual effects of the mutation pair on the genetic background.

  6. Negative Epistasis: If s12 < s1 + s2 i.e. if the combined fitness effect of the two mutations was less beneficial / more deleterious than the sum of individual effects of the mutation pair on the genetic background.

Epistasis Change with Genetic Background

Having the epistasis dossier generated for all mutation pairs, we compiled all cases of Positive, Negative, Sign and No Epistasis. For each of these cases, we select the genetic backgrounds and their one mutant neighbours such that their differing mutation locus is unrelated to the loci involved in Epistasis (In our case, we can find (9 − 2)loci × 3bases = 21 such neighbours for each background).

We then check the nature of epistasis in each of these 21 neighbours on the same mutation pair, and quantify the number of these cases in which nature of epistasis changes.

Finding paths in a sequence space

Set of all variants

We start by listing all the mutations required to convert the starting sequence Ato the target sequence T. If ℎ denotes the minimum number of mutations required to change sequence A to T, then for each step s ∈ 1, ⋯, ℎ − 1, we list all variants in the sequence space which are at s hamming distance from the starting variant and ℎ − s hamming distance from the target sequence. The resulting set includes all variants involved at each step for shortest traversal from A to T.

Having the set of all variants at each step in traversal of sequence A to T, we recursively find all ℎ! permutations of paths such that each step only allows one base change while leading the sequence to the target in minimum number of steps.

Effect of neutral mutations on evolvability

To perform this study, we compiled all the (9loci × 4bases) = 36mutations that are possible among all sequences of folA landscape. We then listed all the backgrounds on which these mutations change the genotype, but showcase neutral fitness effect (magnitude of fitness effect < 0.05).

We then allow a second mutation which changes both the background and the mutant genotype. If the selection coefficient of the second mutation on the background is s1 and on the neutral mutant is s, then the relative increase in evolvability is quantified as: .

Consider a variant A and a neutral mutation X which results in a variant B (AB|sAB| < 0.05). For any mutation Y taking place on genotypes A and B forming A and B such that (AABB), we quantify the relative change in evolvability of A from mutation Y due to a neutral mutation X as .

Finding synonymous mutations

We identified the all synonymous codons, i.e. the codons that encode the same amino acid / termination function. For each codon ci, we then found the list of all synonymous codons which are at 1 Hamming distance from the codon ci.

Using this data, we were able to identify the set of synonymous mutations for each codon.

Change in mode of epistasis due to synonymous mutation in an extrinsic codon

For any background in DHFR gene, we tested all mutation pairs (A and B) and identified the type of epistasis exhibited. Since these two mutations can mutate a minimum of one and a maximum of two codons in the 9 base pair gene, at least one codon remains un-mutated. For the mutation pair, we found the un-mutated codon(s), and derived the set of all possible synonymous mutations on the codon(s) of the given background.

We noted the number of instances where introduction of a synonymous mutation (X) at a particular locus (on an un-mutated codon) does or does not change the nature of epistasis for the background and given mutation pair.

We did this analysis separately (all / high fitness / low fitness) backgrounds, their possible mutation pairs (A and B) and their respective synonymous mutations (X) to identify the overall probability of epistasis change due to synonymous mutation on each locus.

Codes

All codes used in this work and Supplement Data Files are available at: https://github.com/SainiSupreet/Ecoli-folA-DHFR. The “readme” file at the repository gives details of how to run codes.

Acknowledgements

We thank Christian Landry and Krishna Swamy for feedback on the manuscript.

Funding

This work was funded by a Wellcome Trust/DBT (India Alliance) grant (Award Number: IA/S/19/2/504632) to SS. NMR was funded by Prime Minister’s Research Fellowship (PMRF ID 1301163).

Additional files

Supplement figures and tables.