Abstract
How epistasis hinders or facilitates movement on fitness landscapes has been a longstanding question of interest. Several high-throughput experiments have demonstrated that, despite their idiosyncrasy, epistatic effects exhibit global statistical patterns. Recently, Papkou et al. constructed a fitness landscape for a 9-base region of the folA gene in E. coli, which encodes dihydrofolate reductase (DHFR), and demonstrated that despite being highly rugged, the landscape is highly navigable. In this work, using the folA landscape, we ask two questions: (1) How does the nature of epistatic interactions change as a function of the genetic background? (2) How predictable is epistasis within a gene? Our results show that epistasis is “fluid”: the nature of epistasis exhibited by a pair of mutations is strongly contingent on the genetic background. Mutations exhibit one of two binary “states”: a small fraction of mutations exhibit extremely strong patterns of global epistasis, while most do not. Despite these observations, we find that the distribution of fitness effects (DFE) of a genotype is highly predictable based on its fitness. These results offer a new perspective on how epistasis operates within a gene, and how it can be predicted.
Significance Statement
How a mutation changes organismal fitness depends on the genome in which it occurs. This phenomenon, known as epistasis, makes evolution unpredictable. Recent efforts to understand epistasis have identified statistical patterns in its manifestations. To study how epistasis operates in protein evolution, we analyze a recently reported landscape that quantifies the fitness of ∼260,000 sequences of an E. coli gene. We show two previously unknown properties of epistasis: it is “fluid” (epistasis between mutations is controlled via other epistatic interactions) and “binary” (only a few mutations exhibit statistical patterns; most do not). This work sheds new light on how epistasis manifests in gene sequences, with important consequences for protein and organismal evolution.
Introduction
Mutations determine the fitness of an organism in an environment-dependent fashion. However, the effect of a mutation also depends on the genetic background in which it occurs1. This phenomenon is referred to as epistasis, and it influences adaptation2,3. Epistasis is largely unpredictable, although a few statistical patterns based on macroscopic traits have been reported in the last few years4–10. One way to study the genotype-phenotype relationship and patterns of epistasis is by using fitness landscapes11–13.
A fitness landscape is a multidimensional surface in which one dimension represents a fitness-related phenotype and the others represent genotype. Hence, it serves as a genotype-phenotype map whose shape, in a given environment, is a consequence of genetic interactions, or epistasis14–17. If the landscape has many peaks, its structure is called rugged18–20. Populations navigating a rugged landscape are likely to be trapped on local peaks, whose precise identity is dictated by chance and by the population’s starting point on the landscape21–23. In contrast, on smooth landscapes, populations starting from different points in sequence space all converge to the same global peak24–27. Thus, the structure of the landscape also dictates our ability to predict evolution, which has wide-ranging implications.
Although the fitness landscape was conceptualized to explain the relationship between a population’s genotype and fitness, the concept has since been extended to the relationship between fitness and functional protein-coding sequences13,28. Efforts to characterize such landscapes started by considering a handful of “important” biallelic sites in a protein27,29–37. Although such landscapes could explain the pervasiveness of epistasis in protein sequences, they do not allow wide-ranging statistical predictions about the characteristics of the fitness landscape38. With advancing high-throughput technologies, however, it has become possible to construct high-dimensional landscapes39–45.
In this study, we use one such high-dimensional fitness landscape, constructed by Papkou and coworkers, to understand how epistasis operates in a 9-base pair region of the folA gene in E. coli40. folA encodes dihydrofolate reductase (DHFR), and mutations in the gene are known to confer resistance to the antibiotic trimethoprim46–50.
Via analysis of the folA landscape, we ask three specific questions (Supplement Figure S1): (1) How does the nature of epistasis between two given sites change as a function of the genetic background? (2) Are these changes dependent on fitness or on genotype? (3) Does a given mutation follow known patterns of global epistasis, and if so, what does this depend on? Our results show that epistasis is “fluid”, i.e., the nature of epistasis that two mutations exhibit is a function of the genetic background. We also show that only a small fraction of mutations follow global epistasis. In fact, mutations can be classified into two groups: those that show global epistasis and those (the majority) that do not. We also propose a novel way to estimate and predict the distribution of fitness effects (DFE) of a given genotype.
Results
folA fitness landscape in E. coli
A fitness landscape of a 9-base pair region of folA was generated recently by Papkou and coworkers40. This region has been shown to be important for resistance to the antibiotic trimethoprim47,51. All $4^9$ (= 262,144) variants of this region were generated and grown in media containing the antibiotic. Deep sequencing was used to quantify fitness, and the relative fitness of ∼99.7% of all variants was obtained. A striking feature of the landscape reported by Papkou and coworkers was that, despite being highly rugged with 514 peaks, most of the landscape had adaptive access to high fitness peaks. This was because high fitness peaks had larger basins of attraction than lower peaks.
Our analysis of this dataset shows that as the size of the landscape increases, the number of peaks increases but the density of peaks decreases (Supplement Figures S2 and S3). With increasing landscape size, the accessibility of the global peak also decreases (Supplement Figure S4). Globally, only small regions of the landscape can be represented as a maximally rugged NK landscape (Supplement Figure S5).
By scanning all mutations on a 9-dimensional landscape, Papkou et al. have created a dataset that allows us to ask specific questions about epistasis and its manifestations on a fitness landscape. Papkou et al. divided the genotypes on the folA landscape into two categories: functional (∼7% of all points) and non-functional (∼93% of all points). This distinction is based on a statistical segregation of the $4^9$ points. Since this segregation does not have any functional basis, we refer to the two groups as “high fitness” and “low fitness”.
The nature of epistasis between two mutations is context dependent: the “fluid” nature of epistasis
Epistasis between two mutations, A and B, can manifest as no, positive, negative, or sign epistasis52. While we know several examples of pairs of mutations exhibiting epistasis of each kind9,52, we do not know if or how often the nature of epistasis exhibited by a pair of mutations changes with the genomic background.
To answer this question, we pick a pair of mutations and compute the fraction of genetic backgrounds in which these mutations exhibit (a) positive epistasis (PE), (b) negative epistasis (NE), (c) sign epistasis, and (d) no epistasis (No). Sign epistasis was further classified into (i) reciprocal sign epistasis (RSE), (ii) single sign epistasis (SSE), and (iii) other sign epistasis (OSE), based on the number of mutational paths that are inaccessible to Darwinian evolution (Figure 1A).
For example, the mutation pair (G → A at position 3 and T → C at position 7) exhibits PE in 26.08%, NE in 34.36%, RSE in 0.67%, SSE in 2.31%, OSE in 5.00%, and no epistasis in 31.57% of all high fitness backgrounds. The corresponding figures for low fitness backgrounds are PE: 19.41%; NE: 22.61%; RSE: 7.65%; SSE: 5.71%; OSE: 13.16%; and no epistasis: 30.92%. The exact numbers for all mutation pairs are provided in Supplement Data-File1.
We repeat this process for all possible pairs of mutations in the 9-base pair region of folA. The frequency distribution of the fraction of all genotypes where a pair exhibits each type of epistasis is shown in Figure 1B (for high fitness backgrounds) and Figure 1C (for low fitness backgrounds).
Barring a few pairs of mutations that exhibit positive epistasis in all or nearly all high fitness backgrounds, the nature of epistasis between a pair of mutations is strongly dependent on the genetic background (Figure 1, Supplement Table 1 and Supplement Data-File2). In high fitness backgrounds, mutation pairs exhibit positive epistasis most frequently (median 41% of genotypes), followed by negative epistasis (median 23%) and no epistasis (median 16%), with sign epistasis being relatively rare (Figure 1B). In low fitness backgrounds, mutation pairs exhibit no epistasis most frequently (median 30%), followed by negative epistasis (median 22%), positive epistasis (median 21%) and other sign epistasis (median 13%) (Figure 1C).
Because the nature of epistasis between two mutations is contingent on the genetic background, we propose that epistasis is “fluid”.
Functionally important sites cause switches in epistasis more frequently
It has previously been shown that both the secondary structure and the location of a residue in a protein dictate the nature of epistasis it exhibits53. In Figure 1, we saw that epistasis changes with the genetic background. But are certain positions more robust to changes in the nature of epistasis than others? In other words, does the location of a mutation in the genetic background dictate the likelihood of a change in epistasis?
The effect of a locus X on changing the nature of epistasis between two mutations was quantified as shown in Figure 2A. As shown in Figures 2B and 2C, the introduction of a single mutation at locus X switches the nature of epistasis in more than 50% of cases. Different loci contribute differently: positions 4 and 5 on the landscape, which are critical for protein function, are more often responsible for changing the nature of epistasis between two mutations than other sites (Supplement Table 2 and Supplement Table 3). This control of the nature of epistasis between other sites by functionally important sites is likely an important factor in protein evolution.
Switches in the nature of epistasis most frequently result in positive, negative, or no epistasis (Supplement Figures S6 and S7; also see Supplement Data-File3). Switches to sign epistasis are relatively infrequent. Interestingly, in high fitness backgrounds, mutations at the functionally important positions (4 and 5 on the landscape) cause a switch to sign epistasis more frequently than a mutation at any of the other seven positions. This pattern is not seen in low fitness backgrounds.
The pervasiveness of changes in epistatic interactions is surprising and makes the prediction of evolutionary trajectories harder. However, a change in the nature of epistasis due to a mutation is not unique to folA. An analysis of a previously published five-point landscape of a beta-lactamase gene29 shows that such changes are common in intramolecular landscapes (Supplement Figure S8). These results further emphasize that epistasis is “fluid”, and that different sites control the switch of epistasis differently.
Synonymous mutations can cause change of nature of epistasis
While synonymous mutations were historically thought to be neutral54,55, several studies have demonstrated that they can have a wide range of effects on cellular fitness56–59. However, their role in changing epistasis between two mutations has not been studied. To study this in the folA landscape, we note that any two mutations lie in either one or two codons of the 9-base pair (three-codon) region. Depending on the location of these two mutations, one or two codons remain unaffected by them. We introduce all possible synonymous mutations in the unaffected codon(s) and ask how frequently the nature of epistasis changes. Through this, we estimate the likelihood that a synonymous mutation changes the nature of epistasis between two mutations (Figure 3A).
In our analysis, we consider TAA ↔ TGA and TAA ↔ TAG as synonymous mutations (see Discussion), since all three codons encode a stop signal for translation.
Our results show that synonymous changes frequently alter the nature of epistasis (Figures 3B and 3C). In high fitness backgrounds, the likelihood of a change in the nature of epistasis upon introduction of a synonymous mutation is ∼0.45 (Figure 3B). In low fitness backgrounds, this number is ∼0.75 (Figure 3C).
These results demonstrate that the nature of epistasis can change via the acquisition of even a synonymous mutation, reaffirming the “fluidity” of epistasis. This property makes predicting evolution difficult. However, in recent years, epistasis has been shown to exhibit several statistical patterns, collectively termed global epistasis1. We next check whether these patterns hold on the folA landscape.
Mutations on the landscape exhibit diminishing returns and increasing costs
Global epistasis suggests that the fitness effect of a mutation is a decreasing function of the background fitness7,60,61 (Figure 4A). In Figure 4B, each point represents the fitness effect of a single mutation (y-axis) against the fitness of the genotype in which the mutation is introduced (x-axis) on the folA landscape. This is done for all mutations in all backgrounds. The red line indicates the linear fit between the two variables. In both high and low fitness backgrounds, the negative slope of the line indicates the presence of global epistasis, although the correlation is weak ($R^2 = 0.1284$ for high fitness backgrounds and $R^2 = 0.1236$ for low fitness backgrounds).
Only a small fraction of mutations exhibit global epistasis: the “binary” nature of global epistasis
We next investigate the effect on fitness of each of the 108 mutations (12 possible base changes at each of the 9 sites) on all possible genotypes. Figure 5 shows that only a few mutations exhibit strong patterns of global epistasis. Most of these mutations (14 of 16) are at nucleotide positions 4 and 5, which are functionally the most important for folA46. An overwhelming majority of mutations (77/108) exhibit no correlation ($R^2 < 0.2$) between their fitness effect and the fitness of the background on which they occur (Supplement Figure S9 and Supplement Table S4).
Therefore, mutations exhibit one of two states depending on whether or not they follow global epistasis. This indicates the “binary” nature of mutations. In the folA landscape, only mutations at nucleotide positions critical for function exhibit global epistasis.
A recent study61 demonstrated that the growth rate at which a mutation switches from being beneficial to deleterious is conserved across mutations. This growth rate was referred to as the pivot growth rate. While the mechanistic origins of the pivot growth rate are still unknown, the phenomenon likely represents a deep, fundamental “rule” of how a cell works. We test for the existence of a pivot growth rate for each of the 108 mutations in the folA landscape. Most (∼80%) mutations pivot from being beneficial to deleterious at a growth rate of −0.657 ± 0.0657 (Figure 6 and Supplement Table 4). The presence of a pivot growth rate despite poor statistical correlations between background fitness and the fitness effect of a mutation is a surprising observation.
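As a minimal sketch of the pooled regression in Figure 4B, assume the landscape is a Python dict mapping each sequence string to its fitness (a hypothetical representation, not necessarily the one used in the authors' repository). Each single-nucleotide mutation on each background contributes one (background fitness, fitness effect) point; `numpy.polyfit` gives the least-squares line, and the squared Pearson correlation gives $R^2$. The optional `keep` predicate, also hypothetical, restricts the backgrounds, e.g. to high or low fitness ones.

```python
import numpy as np

ALPHABET = "ACGT"

def effect_points(landscape, keep=None, alphabet=ALPHABET):
    """(background fitness, fitness effect) for every single mutation on every background.

    `landscape` is a hypothetical dict {sequence: fitness}; `keep` optionally filters
    backgrounds by their fitness. Mutants with unknown fitness (absent keys) are skipped.
    """
    xs, ys = [], []
    for seq, f_b in landscape.items():
        if keep is not None and not keep(f_b):
            continue
        for i, base in enumerate(seq):
            for alt in alphabet:
                if alt == base:
                    continue
                mutant = seq[:i] + alt + seq[i + 1:]
                if mutant in landscape:
                    xs.append(f_b)
                    ys.append(landscape[mutant] - f_b)  # s = f_m - f_b
    return np.asarray(xs), np.asarray(ys)

def global_epistasis_fit(landscape, keep=None):
    """Least-squares slope and intercept of fitness effect on background fitness, plus R^2."""
    x, y = effect_points(landscape, keep)
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(slope), float(intercept), float(r2)

# Toy one-site landscape: fitter backgrounds see more negative mutational effects.
toy = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
slope, intercept, r2 = global_epistasis_fit(toy)
print(slope < 0)  # True: diminishing returns / increasing costs
```

A negative slope corresponds to the diminishing-returns and increasing-cost pattern; a low $R^2$, as reported above, means the pattern explains little of the variance.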
Predicting DFE from phenotype
The “binary” and “fluid” nature of epistasis discussed in this work makes the prediction of evolutionary trajectories difficult. As shown in Figure 5, only functionally important sites exhibit global epistasis; for most mutations, fitness effects are idiosyncratic. Hence, without knowing which sites exhibit global epistasis, predictions are unlikely to be accurate. Likewise, given the “fluidity” of epistasis, without complete knowledge of the genetic background it is hardly possible to predict the nature of epistasis between two mutations. Together, these two features of epistasis make evolution unpredictable. We now ask whether any statistical pattern exists at all that would enable the prediction of evolution.
In this context, we define a quantity called the “phenotypic DFE”, which represents the collective DFE of all genotypes with near-identical fitness (Figure 7A). To compute the phenotypic DFE, we binned all genotypes into non-overlapping narrow fitness intervals (see Methods). The DFE of each genotype was computed by introducing all 27 possible single mutations (the three alternative bases at each of the 9 positions). The DFEs of all genotypes in one interval were averaged to obtain the phenotypic DFE. We next ask how robustly the DFE of a particular genotype can be predicted solely from the phenotypic DFE of genotypes with near-identical fitness.
To test this, we compute the phenotypic DFE from a randomly sampled 90% of the genotypes with fitness between $f_0$ and $f_0 + \Delta$. The DFEs of the remaining 10% of the genotypes in the fitness window were computed separately, and each was compared with the phenotypic DFE. The similarity of each genotype’s DFE to the corresponding phenotypic DFE was assessed by the p-value of a Mann-Whitney U test. This comparison gives a distribution of p-values for each background fitness.
The phenotypic DFE of high fitness backgrounds consisted of two peaks. The first peak corresponds to mutations with large deleterious effects, whose magnitude increases with increasing background fitness (Figure 7B). The second peak is roughly centered at a fitness effect of ∼0. For low fitness backgrounds, the DFE consisted of only one peak, whose mean decreased as the background fitness increased (Figure 7C).
Background fitness predicts the DFE of a genotype better than it predicts the fitness effects of individual mutations. The fraction of genotypes for which the phenotypic DFE is statistically significantly different from the actual DFE decreases as the background fitness increases (Figure 7D). This effect can also be seen in the distribution of p-values of comparisons between phenotypic and actual DFEs for different background fitness values (Supplement Figure S10).
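The holdout comparison above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes the per-genotype DFEs of one fitness window are already available as lists of fitness effects, pools the 90% training genotypes into the phenotypic DFE (equivalent to averaging their normalized histograms when every genotype contributes the same number of effects, e.g. 27), and uses `scipy.stats.mannwhitneyu` for a two-sided test. The `seed` parameter is a hypothetical addition for reproducibility.

```python
import random

from scipy.stats import mannwhitneyu

def holdout_pvalues(window_dfes, train_frac=0.9, seed=0):
    """Compare held-out genotypic DFEs with a phenotypic DFE pooled from the rest.

    window_dfes: list of DFEs (each a list of fitness effects), one per genotype whose
    fitness lies in the same window [f0, f0 + delta).
    Returns one two-sided Mann-Whitney U p-value per held-out genotype.
    """
    order = list(range(len(window_dfes)))
    random.Random(seed).shuffle(order)
    n_train = int(round(train_frac * len(order)))
    train, held_out = order[:n_train], order[n_train:]
    # Phenotypic DFE: pooled fitness effects of the training genotypes.
    phenotypic = [s for i in train for s in window_dfes[i]]
    return [
        mannwhitneyu(window_dfes[i], phenotypic, alternative="two-sided").pvalue
        for i in held_out
    ]
```

Repeating this for every window yields one p-value distribution per background fitness; a small p-value flags a genotype whose DFE is poorly predicted by the phenotypic DFE.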
Discussion
Genotype-phenotype mapping, especially of proteins, is of great interest in evolutionary biology, cell biology, and genetics11,46,62–66. The underlying rules that dictate this mapping are a combination of individual mutation effects and epistatic interactions, both among mutations and between mutations and the genetic background in which they occur7,29,35,60,61,67–70. The extent and ubiquity of epistatic interactions are of particular interest, because they are mostly unpredictable and directly shape the fitness landscape and, consequently, adaptation34,38,71–76. Therefore, to understand the statistical rules that govern epistasis on a landscape, we analyzed a recently published folA landscape constructed by quantifying the fitness of more than 260,000 sequences in a 9-base pair region40. Mutations in these nine base pairs have been reported to be adaptive47,49.
The navigability of a fitness landscape determines the likelihood that a population reaches the global fitness maximum. Most empirical fitness landscapes indicate that protein landscapes are rugged and, hence, that protein evolution is constrained47,70,77–79. However, fitness landscapes are constructed by considering a handful of sites, and there is theoretical evidence suggesting that mutations in other “dimensions” could enable populations to navigate valleys in the landscape11. Fisher suggested the same in his correspondence with Wright80.
A particularly confounding mode of epistasis is sign epistasis, which alters the qualitative nature of a mutational effect and creates valleys in fitness landscapes. In the folA landscape, sign epistasis frequently changes to positive or negative epistasis, indicating “fluidity” in its effects. We also report that synonymous mutations can change the nature of epistasis between two existing mutations. Hence, even where valleys exist, they may not be that difficult to navigate. Similar findings have been reported in the past: deleterious mutations making peaks accessible, and neutral mutations being adaptive81,82. Additionally, synonymous mutations are known to have fitness effects by changing protein amount (by altering mRNA stability or translation rates)56,83–87 or structure88–92. Our results point to a novel mechanism by which synonymous mutations are relevant for driving evolutionary change and controlling disease states93–96.
Interestingly, our analysis shows that the nature of epistasis between two mutations can change simply by changing the stop codon (TAA ↔ TGA or TAA ↔ TAG). This observation holds even when the stop codon is encoded by the first three bases of the 9-bp landscape and the interacting mutations whose epistasis changes lie in the two subsequent codons (which are presumably not even translated). This indicates that the change in the nature of epistasis likely has to do not with protein synthesis but with the mRNA and its effect on cellular fitness.
Premature termination of translation is known to destabilize the entire transcript97. In eukaryotes, elaborate mechanisms are present to deal with the potentially toxic effects of truncated proteins98–101. However, prokaryotes lack these mechanisms. In fact, a recent study shows that in E. coli under stress, read-through rates at a premature stop codon may be as high as 80%, due to a high probability of a mismatch at premature stop codons102. This is a likely explanation for the change of epistasis (or of fitness) due to alternate premature stop codons in folA when E. coli is grown under antibiotic stress.
Neutral mutations can improve the evolvability of a sequence, and hence aid movement on a fitness landscape103–108. However, we do not see any evolvability-enhancing mutations in the 9-bp folA landscape (Supplement Figure S11).
Predicting evolution is a longstanding goal in the field, and global epistasis patterns offered a tool to predict epistatic effects. However, our analyses show that global epistasis often does not hold. Instead, we observe that sequences with similar fitness have similar DFEs, offering some predictability of the adaptive potential of a population based on a macroscopic trait109–111. The means of these DFEs decreased linearly with increasing background fitness, even though most mutations on this landscape do not follow global epistasis patterns (Supplement Figure S12).
We show that the navigability of a landscape can change via mutations in the same protein. Can it also change via mutations elsewhere in the genome? High-dimensional landscapes are necessary to answer this question. Additionally, increasing the dimensionality of landscapes is likely to provide qualitatively new perspectives on adaptation.
Methods
Calculating Hamming Distance
To calculate the Hamming distance between two sequences of equal length, we compare the sequences position by position and count the number of loci at which they differ.
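For reference, the Hamming distance described above is a short Python function; this sketch also rejects sequences of unequal length, a check the Methods imply but do not state.

```python
def hamming(a: str, b: str) -> int:
    """Number of loci at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("ATGCTA", "ATGCAA"))  # 1
```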
Construction of sequence spaces and landscapes
To construct an n-base pair sequence space from a parent p-base pair sequence space (p > n), we choose any p − n loci in the parent sequence space and find all variants in the parent space in which the selected loci are fixed (i.e., carry the same bases). Each of these selected variants is then connected to its one-Hamming-distance neighbours within the selected set to construct a new n-base pair sequence space.
By repeating this process over all combinations of loci to be fixed and all $4^{p-n}$ choices of the fixed sequence in each case, we break the parent sequence space into n-base pair sequence spaces (Supplement Table 2).
For our study, we use the nine-base pair parent sequence space to generate n-base pair sequence spaces (1 ≤ n ≤ 9). Landscapes are generated by mapping each sequence in a sequence space to its fitness value in the empirical fitness landscape generated by Papkou et al. for the nine-base pair region.
In the rare cases where the fitness value was not known, we disregarded those variants in all analyses.
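The decomposition above can be sketched as follows, again assuming a hypothetical dict {sequence: fitness}. Sequences are kept at full length p, with the fixed loci simply carrying their fixed bases, which leaves Hamming distances within a sub-landscape unchanged. Rather than filtering the whole parent space for every choice, the sketch builds each sub-landscape directly from the $4^n$ free-base combinations, so the total work is $\binom{p}{n} 4^p$ dictionary lookups.

```python
from itertools import combinations, product
from math import comb

ALPHABET = "ACGT"

def sub_landscapes(landscape, p, n, alphabet=ALPHABET):
    """Yield (fixed_loci, fixed_bases, sub) for every n-locus sub-landscape of a p-locus landscape.

    Variants absent from `landscape` (unknown fitness) are left out of each sub-landscape.
    """
    for fixed in combinations(range(p), p - n):
        free = [i for i in range(p) if i not in fixed]
        for fixed_bases in product(alphabet, repeat=p - n):
            sub = {}
            for free_bases in product(alphabet, repeat=n):
                seq = [""] * p
                for i, b in zip(fixed, fixed_bases):
                    seq[i] = b
                for i, b in zip(free, free_bases):
                    seq[i] = b
                s = "".join(seq)
                if s in landscape:
                    sub[s] = landscape[s]
            yield fixed, fixed_bases, sub

def n_sub_landscapes(p, n, a=len(ALPHABET)):
    """Number of sub-landscapes: C(p, p - n) * a**(p - n)."""
    return comb(p, p - n) * a ** (p - n)
```

For example, the 9-bp landscape splits into `n_sub_landscapes(9, 8)` = 9 × 4 = 36 eight-base pair sub-landscapes.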
Finding peaks in a fitness landscape
Number of Peaks in Landscape
To count the number of peaks in a fitness landscape, we find the number of variants that have a higher fitness value than all of their one-Hamming-distance neighbours in that landscape.
Peak Probability
To quantify the probability of encountering a fitness peak in any landscape, we find the ratio of number of peaks found to the total number of sequences in that fitness landscape.
Expected number of peaks
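In our case (a four-letter alphabet), the expected number of peaks in maximally rugged (uncorrelated) NK landscapes20 is $4^n/(3n+1)$ for an n-base pair landscape: each of the $4^n$ sequences has $3n$ one-mutant neighbours and, with uncorrelated fitness values, is the fittest of these $3n+1$ sequences with probability $1/(3n+1)$. The number of peaks predicted by the NK landscape is rounded off to the nearest integer.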
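Peak counting, peak probability and the maximally rugged expectation can be sketched together under the same hypothetical dict representation. Only neighbours present in the landscape are compared, so variants with unknown fitness are ignored as in the Methods; note that under this rule a variant with no known neighbours would count as a peak. Python's `round` breaks exact .5 ties to the nearest even integer, which may differ from other rounding conventions.

```python
ALPHABET = "ACGT"

def neighbours(seq, alphabet=ALPHABET):
    """All sequences at Hamming distance 1 from seq."""
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]

def count_peaks(landscape):
    """Variants fitter than every one-Hamming-distance neighbour present in the landscape."""
    return sum(
        all(f > landscape[nb] for nb in neighbours(s) if nb in landscape)
        for s, f in landscape.items()
    )

def peak_probability(landscape):
    """Fraction of variants in the landscape that are peaks."""
    return count_peaks(landscape) / len(landscape)

def expected_peaks_max_rugged(n, a=len(ALPHABET)):
    """Expected peaks in a maximally rugged landscape: a**n / ((a - 1) * n + 1), rounded."""
    return round(a ** n / ((a - 1) * n + 1))

print(expected_peaks_max_rugged(9))  # 9362
```

For the full 9-bp space, the maximally rugged expectation ($4^9/28 \approx 9362$) can be compared directly with the 514 peaks reported for the empirical landscape.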
Fitness effect of mutations
To find the fitness effect of a mutation, we start with the set of all genotypes in the empirical landscape of Papkou et al. on which the mutation results in a new sequence. For each such genotype, we find the background fitness $f_b$ and the fitness of the resulting mutant $f_m$. The fitness effect of this mutation in this background is the fitness difference between the mutant and the background ($s = f_m - f_b$).
Finding phenotypic DFE
We computed the distribution of fitness effects (DFE) for variants lying in a narrow slice of fitness values (in this study, the width of the slice was 0.05). The DFE was constructed from the frequencies of the fitness effects of all mutations acting on the backgrounds lying in the selected range.
Finding genotypic DFE
For any 9-base genotype, we considered the 9 loci × 3 bases = 27 single mutations that result in a new sequence. The frequencies of the fitness effects of all these mutations constitute the distribution of fitness effects of that genotype.
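A sketch of the genotypic and phenotypic DFEs under the hypothetical dict {sequence: fitness} representation. Here the phenotypic DFE pools the fitness effects of all genotypes whose fitness lies in the slice $[f_0, f_0 + 0.05)$; treating the slice as half-open is our assumption.

```python
ALPHABET = "ACGT"

def genotypic_dfe(landscape, genotype, alphabet=ALPHABET):
    """Fitness effects s = f_m - f_b of all single mutants of `genotype` (27 for 9 bp)."""
    f_b = landscape[genotype]
    effects = []
    for i, base in enumerate(genotype):
        for alt in alphabet:
            if alt == base:
                continue
            mutant = genotype[:i] + alt + genotype[i + 1:]
            if mutant in landscape:  # skip mutants with unknown fitness
                effects.append(landscape[mutant] - f_b)
    return effects

def phenotypic_dfe(landscape, f0, width=0.05):
    """Pooled fitness effects of all genotypes with fitness in [f0, f0 + width)."""
    return [
        s
        for g, f in landscape.items()
        if f0 <= f < f0 + width
        for s in genotypic_dfe(landscape, g)
    ]
```

A histogram of either list (e.g. with `numpy.histogram`) gives the frequency form used in the figures.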
Non parametric tests
We computed p-values of the two-sample Kolmogorov-Smirnov (KS) test and the Mann-Whitney U test using the Python scipy.stats library.
Linear regression
We found the pivot point fitness of individual mutations via linear regression of the selection coefficient of the mutation on background fitness, using the “LinearRegression” model from the Python sklearn.linear_model library; the pivot point is the background fitness at which the fitted selection coefficient is zero.
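A per-mutation version of the regression, returning the pivot, can be sketched as below. The Methods use scikit-learn's `LinearRegression`; this sketch swaps in `numpy.polyfit`, which fits the same ordinary least-squares line. A mutation is identified by a hypothetical (position, from_base, to_base) triple; with 9 positions and 12 ordered base changes per position this gives the 108 mutations analysed in the Results. The pivot is $-\text{intercept}/\text{slope}$, and $R^2 < 0.2$ is the threshold used there to call a mutation uncorrelated.

```python
import numpy as np

UNCORRELATED_R2 = 0.2  # threshold used in the Results

def mutation_fit(landscape, pos, frm, to):
    """Regress the fitness effect of mutation frm->to at `pos` on background fitness.

    Returns (slope, intercept, R^2, pivot); pivot = -intercept / slope is the background
    fitness at which the fitted effect crosses zero. Needs >= 2 distinct backgrounds.
    """
    xs, ys = [], []
    for g, f_b in landscape.items():
        if g[pos] != frm:
            continue  # mutation does not apply to this background
        mutant = g[:pos] + to + g[pos + 1:]
        if mutant in landscape:
            xs.append(f_b)
            ys.append(landscape[mutant] - f_b)
    x, y = np.asarray(xs), np.asarray(ys)
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    pivot = -intercept / slope if slope != 0 else float("nan")
    return float(slope), float(intercept), float(r2), float(pivot)
```

Mutations with `r2 < UNCORRELATED_R2` would fall in the "no global epistasis" group, while their pivots can still be compared across mutations.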
Epistasis as function of genetic background
Classifying Epistasis
To quantify the epistasis present in a mutation pair acting on two different genetic loci, we compute the fitness effects of the individual mutations on a genetic background and the combined fitness effect of the two mutations on the same background. If the fitness effects of the individual mutations are $s_1$ and $s_2$, and the combined effect of the two mutations is $s_{12}$, we classify epistasis into the following categories:
No Epistasis: if $|s_{12} - (s_1 + s_2)| < 0.05$, i.e., we assume no epistasis if the combined fitness effect of the two mutations is about the same as that expected if they acted independently on the genetic background.
Sign Epistasis: if $s_{12} \times (s_1 + s_2) < 0$, i.e., the combined effect of the two mutations has a different sign from the sum of their individual fitness effects on the same background. For example, the combined effect of the mutation pair is beneficial even though the sum of the individual effects is deleterious, or vice versa.
We further classified sign epistasis into three categories:
Reciprocal Sign Epistasis: if $s_1 < 0$, $s_2 < 0$ and $s_{12} > 0$, i.e., the two mutations together lead the background to higher fitness, but both shortest paths to this higher fitness point are blocked to Darwinian evolution.
Single Sign Epistasis: if exactly one of $s_1 < 0$ or $s_2 < 0$ holds and $s_{12} > 0$, i.e., exactly one of the two shortest paths leading the background to higher fitness is blocked to Darwinian evolution.
Other Sign Epistasis: all other cases classified as sign epistasis.
Positive Epistasis: if $s_{12} > s_1 + s_2$, i.e., the combined fitness effect of the two mutations is more beneficial (or less deleterious) than the sum of their individual effects on the genetic background.
Negative Epistasis: if $s_{12} < s_1 + s_2$, i.e., the combined fitness effect of the two mutations is less beneficial (or more deleterious) than the sum of their individual effects on the genetic background.
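The decision rules above can be written as a small function, together with the per-pair tally behind Figure 1. This is a sketch: the order in which the rules are checked (no epistasis first, then sign epistasis, then positive/negative) is our assumption, since the rules as stated can overlap. The landscape is assumed to be a dict {sequence: fitness}, and a mutation is written as a hypothetical (position, from_base, to_base) tuple.

```python
from collections import Counter

TOL = 0.05  # "no epistasis" tolerance from the Methods

def classify_epistasis(s1, s2, s12, tol=TOL):
    """Label epistasis from single-mutant effects s1, s2 and double-mutant effect s12.

    Rule order (no epistasis, then sign, then positive/negative) is an assumption.
    """
    expected = s1 + s2  # additive (no-epistasis) expectation
    if abs(s12 - expected) < tol:
        return "No"
    if s12 * expected < 0:  # combined effect has the opposite sign to the additive sum
        if s1 < 0 and s2 < 0 and s12 > 0:
            return "RSE"
        if (s1 < 0) != (s2 < 0) and s12 > 0:
            return "SSE"
        return "OSE"
    return "PE" if s12 > expected else "NE"

def mutate(seq, pos, base):
    return seq[:pos] + base + seq[pos + 1:]

def epistasis_fractions(landscape, mut_a, mut_b):
    """Fraction of applicable backgrounds showing each epistasis label for a mutation pair.

    mut_a, mut_b: hypothetical (position, from_base, to_base) tuples at different loci.
    """
    (pa, fa, ta), (pb, fb, tb) = mut_a, mut_b
    counts = Counter()
    for bg, w in landscape.items():
        if bg[pa] != fa or bg[pb] != fb:
            continue  # pair not applicable on this background
        ma, mb = mutate(bg, pa, ta), mutate(bg, pb, tb)
        mab = mutate(ma, pb, tb)
        if not all(s in landscape for s in (ma, mb, mab)):
            continue  # variants of unknown fitness are disregarded
        label = classify_epistasis(landscape[ma] - w, landscape[mb] - w, landscape[mab] - w)
        counts[label] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Restricting the loop to high or low fitness backgrounds gives the two panels of Figure 1 separately.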
Epistasis Change with Genetic Background
Having classified epistasis for all mutation pairs, we compiled all cases of Positive, Negative, Sign and No Epistasis. For each of these cases, we select the genetic background and its single-mutant neighbours in which the differing locus is not one of the two loci involved in the epistatic pair (in our case, there are (9 − 2) loci × 3 bases = 21 such neighbours for each background).
We then determine the nature of epistasis between the same mutation pair in each of these 21 neighbours and count the cases in which it changes.
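The switch count can be sketched as below. The function accepts any classifier mapping $(s_1, s_2, s_{12})$ to a label, for example an implementation of the rules under Classifying Epistasis; the toy example uses a simplified three-way classifier (no, positive, negative) purely for illustration. Mutations are hypothetical (position, from_base, to_base) tuples and the landscape a dict {sequence: fitness}; neighbours on which the pair's label is undefined (missing fitness) are skipped.

```python
from itertools import product

ALPHABET = "ACGT"

def mutate(seq, pos, base):
    return seq[:pos] + base + seq[pos + 1:]

def pair_label(landscape, bg, mut_a, mut_b, classify):
    """Epistasis label of the mutation pair on background bg (None if undefined)."""
    (pa, fa, ta), (pb, fb, tb) = mut_a, mut_b
    if bg[pa] != fa or bg[pb] != fb:
        return None
    ma, mb = mutate(bg, pa, ta), mutate(bg, pb, tb)
    mab = mutate(ma, pb, tb)
    if not all(s in landscape for s in (bg, ma, mb, mab)):
        return None
    w = landscape[bg]
    return classify(landscape[ma] - w, landscape[mb] - w, landscape[mab] - w)

def switch_fraction(landscape, bg, mut_a, mut_b, classify, alphabet=ALPHABET):
    """Fraction of bg's single mutants at loci X outside the pair (21 for 9 bp)
    in which the pair's epistasis label differs from its label on bg."""
    ref = pair_label(landscape, bg, mut_a, mut_b, classify)
    if ref is None:
        return None
    changed = total = 0
    for x in range(len(bg)):
        if x in (mut_a[0], mut_b[0]):
            continue
        for alt in alphabet:
            if alt == bg[x]:
                continue
            label = pair_label(landscape, mutate(bg, x, alt), mut_a, mut_b, classify)
            if label is not None:
                total += 1
                changed += label != ref
    return changed / total if total else None

def simple_classify(s1, s2, s12, tol=0.05):  # illustrative three-way classifier
    d = s12 - (s1 + s2)
    return "No" if abs(d) < tol else ("PE" if d > 0 else "NE")

# Toy 3-locus landscape: the pair at loci 0 and 1 interacts only when locus 2 carries "A".
toy = {
    "".join(s): 0.2 * (s[0] == "G") + 0.1 * (s[1] == "G")
    + 0.5 * (s[0] == "G" and s[1] == "G" and s[2] == "A")
    for s in product(ALPHABET, repeat=3)
}
frac = switch_fraction(toy, "AAA", (0, "A", "G"), (1, "A", "G"), simple_classify)
print(frac)  # 1.0
```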
Finding paths in a sequence space
Set of all variants
We start by listing all the mutations required to convert the starting sequence A to the target sequence T. If $h$ denotes the minimum number of mutations required to change sequence A to T, then for each step $s \in \{1, \dots, h-1\}$, we list all variants in the sequence space that are at Hamming distance $s$ from the starting variant and $h - s$ from the target sequence. The resulting set includes all variants involved at each step of a shortest traversal from A to T.
Having the set of all variants at each step of the traversal from A to T, we recursively find all $h!$ paths such that each step changes only one base and leads the sequence to the target in the minimum number of steps.
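Because every shortest path changes each of the $h$ differing loci exactly once, the $h!$ paths correspond to the orderings of those loci. The sketch below enumerates them with `itertools.permutations` instead of the step-by-step recursion described above; both yield the same set of paths. The `is_accessible` helper, which treats a path as open to Darwinian evolution only if fitness strictly increases at every step, is a hypothetical addition illustrating how such paths can be screened against a dict {sequence: fitness}.

```python
from itertools import permutations

def shortest_paths(a, t):
    """All h! shortest single-base-change paths from sequence a to sequence t."""
    if len(a) != len(t):
        raise ValueError("sequences must have equal length")
    diff = [i for i, (x, y) in enumerate(zip(a, t)) if x != y]  # the h differing loci
    paths = []
    for order in permutations(diff):
        seq, path = a, [a]
        for i in order:
            seq = seq[:i] + t[i] + seq[i + 1:]  # change one base toward the target
            path.append(seq)
        paths.append(path)
    return paths

def is_accessible(path, landscape):
    """Hypothetical criterion: every variant has known fitness and each step increases it."""
    if not all(s in landscape for s in path):
        return False
    return all(landscape[b] > landscape[a] for a, b in zip(path, path[1:]))
```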
Effect of neutral mutations on evolvability
To perform this study, we compiled all (9 loci × 4 bases) = 36 mutations that are possible among all sequences of the folA landscape. We then listed all backgrounds on which these mutations change the genotype but have a neutral fitness effect (magnitude of fitness effect < 0.05).
We then allow a second mutation which changes both the background and the mutant genotype. If the selection coefficient of the second mutation on the background is $s_1$ and on the neutral mutant is $s$, then the relative increase in evolvability is quantified as: .
Consider a variant A and a neutral mutation X which results in a variant B ($A \neq B$, $|s_{AB}| < 0.05$). For any mutation Y taking place on genotypes A and B, forming A′ and B′ such that $A \neq A'$ and $B \neq B'$, we quantify the relative change in evolvability of A from mutation Y due to a neutral mutation X as .
Finding synonymous mutations
We identified all synonymous codons, i.e., codons that encode the same amino acid or termination signal. For each codon $c_i$, we then found all synonymous codons at Hamming distance 1 from $c_i$.
Using this information, we identified the set of synonymous mutations for each codon.
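The synonymous-neighbour lookup can be sketched with the standard genetic code (NCBI translation table 1). Following the convention used in this work, stop codons are treated as synonymous with one another, so TAA has two synonymous neighbours (TGA and TAG), whereas TAG and TGA, which differ at two positions, are not neighbours of each other. The compact table encoding below is a convenience, not taken from the authors' code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_neighbours(codon):
    """Codons at Hamming distance 1 that encode the same amino acid ('*' = stop)."""
    aa = CODON_TABLE[codon]
    out = []
    for i, base in enumerate(codon):
        for alt in BASES:
            if alt == base:
                continue
            candidate = codon[:i] + alt + codon[i + 1:]
            if CODON_TABLE[candidate] == aa:
                out.append(candidate)
    return out

print(synonymous_neighbours("TAA"))  # ['TGA', 'TAG']
```

Applying `synonymous_neighbours` to each codon left untouched by a mutation pair gives the set of synonymous mutations X tested in the next step.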
Change in mode of epistasis due to synonymous mutation in an extrinsic codon
For every background in the folA (DHFR) landscape, we tested all mutation pairs (A and B) and identified the type of epistasis they exhibit. Because the two mutations alter at least one and at most two of the three codons in the 9-bp region, at least one codon remains unmutated. For each mutation pair, we identified the unmutated codon(s) and derived the set of all possible synonymous mutations on those codon(s) in the given background.
We recorded the number of instances in which introducing a synonymous mutation (X) at a particular locus of an unmutated codon does or does not change the type of epistasis for the given background and mutation pair.
We performed this analysis separately for all backgrounds, high-fitness backgrounds, and low-fitness backgrounds, together with their possible mutation pairs (A and B) and the respective synonymous mutations (X), to estimate the overall probability that a synonymous mutation at each locus changes the type of epistasis.
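The per-locus tally can be sketched as follows. This is illustrative only: the classification into no, positive, negative, sign, and reciprocal sign epistasis, and the noise cutoff `tol`, are assumptions based on a common scheme rather than necessarily the criteria used in this work. The landscape is again assumed to be a dict from 9-bp genotype strings to fitness, mutations are hypothetical `(position, base)` tuples with codon k covering positions 3k to 3k + 2, and the candidate synonymous mutations X are supplied by the caller (for instance, from the synonymous-codon lookup described above).

```python
from collections import defaultdict


def apply(genotype: str, mutation: tuple[int, str]) -> str:
    """Return genotype with the (position, base) mutation applied."""
    pos, base = mutation
    return genotype[:pos] + base + genotype[pos + 1:]


def epistasis_type(f00, f10, f01, f11, tol=0.05):
    """Classify a double-mutant cycle (assumed scheme; tol is a hypothetical cutoff)."""
    eps = f11 - f10 - f01 + f00
    if abs(eps) < tol:
        return "none"
    flip_a = (f10 - f00) * (f11 - f01) < 0  # A's effect changes sign on the B background
    flip_b = (f01 - f00) * (f11 - f10) < 0  # B's effect changes sign on the A background
    if flip_a and flip_b:
        return "reciprocal sign"
    if flip_a or flip_b:
        return "sign"
    return "positive" if eps > 0 else "negative"


def untouched_codons(mut_a, mut_b, n_codons=3):
    """Indices of codons that neither mutation A nor mutation B alters."""
    touched = {mut_a[0] // 3, mut_b[0] // 3}
    return [c for c in range(n_codons) if c not in touched]


def mode_changes(fitness, background, mut_a, mut_b, x) -> bool:
    """True if adding mutation x to the background changes the epistasis type of (A, B)."""
    def cycle(bg):
        ga, gb = apply(bg, mut_a), apply(bg, mut_b)
        return fitness[bg], fitness[ga], fitness[gb], fitness[apply(ga, mut_b)]

    return epistasis_type(*cycle(background)) != epistasis_type(*cycle(apply(background, x)))


def change_counts(fitness, background, mut_a, mut_b, candidates):
    """Per locus of X: [number of X that change the epistasis type, number of X tested]."""
    free = set(untouched_codons(mut_a, mut_b))
    counts = defaultdict(lambda: [0, 0])
    for x in candidates:
        if x[0] // 3 not in free:
            continue  # X must lie on a codon that neither A nor B mutates
        counts[x[0]][1] += 1
        if mode_changes(fitness, background, mut_a, mut_b, x):
            counts[x[0]][0] += 1
    return dict(counts)
```

Summing these per-locus counts over backgrounds (all, high-fitness, or low-fitness) and mutation pairs gives, for each locus, the fraction of synonymous mutations that change the type of epistasis.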
Code
All code used in this work, together with the Supplementary Data Files, is available at: https://github.com/SainiSupreet/Ecoli-folA-DHFR. The readme file in the repository explains how to run the code.
Acknowledgements
We thank Christian Landry and Krishna Swamy for feedback on the manuscript.
Funding
This work was funded by a Wellcome Trust/DBT (India Alliance) grant (Award Number: IA/S/19/2/504632) to SS. NMR was funded by Prime Minister’s Research Fellowship (PMRF ID 1301163).
References
- [1] Epistasis and evolution: recent advances and an outlook for prediction. BMC Biol 21. https://doi.org/10.1186/s12915-023-01585-3
- [2] Epistasis and Adaptation on Fitness Landscapes. Annual Review of Ecology, Evolution, and Systematics 53.
- [3] Impact of epistasis and pleiotropy on evolutionary adaptation. Proc Biol Sci 279:247–256. https://doi.org/10.1098/rspb.2011.0870
- [4] The impact of macroscopic epistasis on long-term evolutionary dynamics. Genetics 199:177–190. https://doi.org/10.1534/genetics.114.172460
- [5] Mutational robustness changes during long-term adaptation in laboratory budding yeast populations. Elife 11. https://doi.org/10.7554/eLife.76491
- [6] Diminishing-returns epistasis decreases adaptability along an evolutionary trajectory. Nat Ecol Evol 1. https://doi.org/10.1038/s41559-016-0061
- [7] Microbial evolution. Global epistasis makes adaptation predictable despite sequence-level stochasticity. Science 344:1519–1522. https://doi.org/10.1126/science.1250939
- [8] The simplicity of protein sequence-function relationships. Nat Commun 15. https://doi.org/10.1038/s41467-024-51895-5
- [9] Epistasis in protein evolution. Protein Sci 25:1204–1218. https://doi.org/10.1002/pro.2897
- [10] The causes of epistasis. Proc Biol Sci 278:3617–3624. https://doi.org/10.1098/rspb.2011.1537
- [11] The structure of genotype-phenotype maps makes fitness landscapes navigable. Nat Ecol Evol 6:1742–1752. https://doi.org/10.1038/s41559-022-01867-z
- [12] On the incongruence of genotype-phenotype and fitness landscapes. PLoS Comput Biol 18. https://doi.org/10.1371/journal.pcbi.1010524
- [13] Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15:480–490. https://doi.org/10.1038/nrg3744
- [14] Epistasis and the Structure of Fitness Landscapes: Are Experimental Fitness Landscapes Compatible with Fisher’s Geometric Model? Genetics 203:847–862. https://doi.org/10.1534/genetics.115.182691
- [15] The distribution of epistasis on simple fitness landscapes. Biol Lett 15. https://doi.org/10.1098/rsbl.2018.0881
- [16] Global epistasis on fitness landscapes. Philos Trans R Soc Lond B Biol Sci 378. https://doi.org/10.1098/rstb.2022.0053
- [17] The simplicity of protein sequence-function relationships. bioRxiv. https://doi.org/10.1101/2023.09.02.556057
- [18] Measuring ruggedness in fitness landscapes. Proc Natl Acad Sci U S A 112:7345–7346. https://doi.org/10.1073/pnas.1507916112
- [19] Rugged fitness landscapes minimize promiscuity in the evolution of transcriptional repressors. Cell Syst 15:374–387. https://doi.org/10.1016/j.cels.2024.03.002
- [20] Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol 128:11–45. https://doi.org/10.1016/s0022-5193(87)80029-2
- [21] Experimental rugged fitness landscape in protein sequence space. PLoS One 1. https://doi.org/10.1371/journal.pone.0000096
- [22] Adaptation in tunably rugged fitness landscapes: the rough Mount Fuji model. Genetics 198:699–721. https://doi.org/10.1534/genetics.114.167668
- [23] Evolutionary dynamics on rugged fitness landscapes: Exact dynamics and information theoretical aspects. Physical Review E 80.
- [24] Colloquium papers: Adaptive landscapes and protein evolution. Proc Natl Acad Sci U S A 107:1747–1751. https://doi.org/10.1073/pnas.0906192106
- [25] Evolutionary accessibility of mutational pathways. PLoS Comput Biol 7. https://doi.org/10.1371/journal.pcbi.1002134
- [26] Exploring the effect of sex on empirical fitness landscapes. Am Nat 174:S15–30. https://doi.org/10.1086/599081
- [27] Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. J Hered 101:S75–84. https://doi.org/10.1093/jhered/esq007
- [28] The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proceedings of the Sixth International Congress on Genetics 1:355–366.
- [29] Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312:111–114. https://doi.org/10.1126/science.1123539
- [30] Natural selection and the concept of a protein space. Nature 225:563–564. https://doi.org/10.1038/225563a0
- [31] Ancestral lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature 345:86–89. https://doi.org/10.1038/345086a0
- [32] Test of Interaction between Genetic Markers That Affect Fitness in Aspergillus Niger. Evolution 51:1499–1505. https://doi.org/10.1111/j.1558-5646.1997.tb01473.x
- [33] Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol 22:308–315. https://doi.org/10.1016/j.tree.2007.02.014
- [34] Experimental illumination of a fitness landscape. Proc Natl Acad Sci U S A 108:7896–7901. https://doi.org/10.1073/pnas.1016024108
- [35] Pervasive contingency and entrenchment in a billion years of Hsp90 evolution. Proc Natl Acad Sci U S A 115:4453–4458. https://doi.org/10.1073/pnas.1718133115
- [36] Local Fitness Landscapes Predict Yeast Evolutionary Dynamics in Directionally Changing Environments. Genetics 208:307–322. https://doi.org/10.1534/genetics.117.300519
- [37] Patterns of Epistasis between beneficial mutations in an antibiotic resistance gene. Mol Biol Evol 30:1779–1787. https://doi.org/10.1093/molbev/mst096
- [38] Pervasive epistasis exposes intramolecular networks in adaptive enzyme evolution. Nat Commun 14. https://doi.org/10.1038/s41467-023-44333-5
- [39] The fitness landscape of a tRNA gene. Science 352:837–840. https://doi.org/10.1126/science.aae0568
- [40] A rugged yet easily navigable fitness landscape. Science 382. https://doi.org/10.1126/science.adh3860
- [41] Comprehensive fitness landscape of SARS-CoV-2 M(pro) reveals insights into viral resistance mechanisms. Elife 11. https://doi.org/10.7554/eLife.77433
- [42] Adaptation in protein fitness landscapes is facilitated by indirect paths. Elife 5. https://doi.org/10.7554/eLife.16965
- [43] An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLoS Genet 15. https://doi.org/10.1371/journal.pgen.1008079
- [44] Fitness landscape of substrate-adaptive mutations in evolved amino acid-polyamine-organocation transporters. Elife 13. https://doi.org/10.7554/eLife.93971
- [45] Comprehensive fitness maps of Hsp90 show widespread environmental dependence. Elife 9. https://doi.org/10.7554/eLife.53810
- [46] Systems-level response to point mutations in a core metabolic enzyme modulates genotype-phenotype relationship. Cell Rep 11:645–656. https://doi.org/10.1016/j.celrep.2015.03.051
- [47] High-Order Epistasis in Catalytic Power of Dihydrofolate Reductase Gives Rise to a Rugged Fitness Landscape in the Presence of Trimethoprim Selection. Mol Biol Evol 36:1533–1550. https://doi.org/10.1093/molbev/msz086
- [48] Dihydrofolate reductase: x-ray structure of the binary complex with methotrexate. Science 197:452–455. https://doi.org/10.1126/science.17920
- [49] Insights into enzyme function from studies on mutants of dihydrofolate reductase. Science 239:1105–1110. https://doi.org/10.1126/science.3125607
- [50] Structure, dynamics, and catalytic function of dihydrofolate reductase. Annu Rev Biophys Biomol Struct 33:119–140. https://doi.org/10.1146/annurev.biophys.33.110502.133613
- [51] Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nat Genet 44:101–105. https://doi.org/10.1038/ng.1034
- [52] Epistasis--the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet 9:855–867. https://doi.org/10.1038/nrg2452
- [53] Pervasive Pairwise Intragenic Epistasis among Sequential Mutations in TEM-1 beta-Lactamase. J Mol Biol 431:1981–1992. https://doi.org/10.1016/j.jmb.2019.03.020
- [54] Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles. Genet Res 11:247–269. https://doi.org/10.1017/s0016672300011459
- [55] Non-Darwinian evolution. Science 164:788–798. https://doi.org/10.1126/science.164.3881.788
- [56] Good codons, bad transcript: large reductions in gene expression and fitness arising from synonymous mutations in a key enzyme. Mol Biol Evol 30:549–560. https://doi.org/10.1093/molbev/mss273
- [57] Synonymous mutations make dramatic contributions to fitness when growth is limited by a weak-link enzyme. PLoS Genet 14. https://doi.org/10.1371/journal.pgen.1007615
- [58] The distribution of fitness effects among synonymous mutations in a gene under directional selection. Elife 8. https://doi.org/10.7554/eLife.45952
- [59] Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitness. Proc Natl Acad Sci U S A 117:3528–3534. https://doi.org/10.1073/pnas.1907126117
- [60] Higher-fitness yeast genotypes are less robust to deleterious mutations. Science 366:490–493. https://doi.org/10.1126/science.aay4199
- [61] Environment-independent distribution of mutational effects emerges from microscopic epistasis. bioRxiv. https://doi.org/10.1101/2023.11.18.567655
- [62] Structural properties of genotype-phenotype maps. J R Soc Interface 14. https://doi.org/10.1098/rsif.2017.0275
- [63] Genotype-Phenotype Mapping Meets Single Cell Biology. Cell Syst 4:1–2. https://doi.org/10.1016/j.cels.2017.01.008
- [64] Genotype-phenotype mapping and the end of the ‘genes as blueprint’ metaphor. Philos Trans R Soc Lond B Biol Sci 365:557–566. https://doi.org/10.1098/rstb.2009.0241
- [65] From genes to phenotype: dynamical systems and evolvability. Genetica 84:5–11. https://doi.org/10.1007/BF00123979
- [66] The genetic landscape of a cell. Science 327:425–431. https://doi.org/10.1126/science.1180823
- [67] Unpredictability of the Fitness Effects of Antimicrobial Resistance Mutations Across Environments in Escherichia coli. Mol Biol Evol 41. https://doi.org/10.1093/molbev/msae086
- [68] Positive epistasis between disease-causing missense mutations and silent polymorphism with effect on mRNA translation velocity. Proc Natl Acad Sci U S A 118. https://doi.org/10.1073/pnas.2010612118
- [69] Negative epistasis between beneficial mutations in an evolving bacterial population. Science 332:1193–1196. https://doi.org/10.1126/science.1203801
- [70] Perspective: Sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59:1165–1174.
- [71] Molecular ensembles make evolution unpredictable. Proc Natl Acad Sci U S A 114:11938–11943. https://doi.org/10.1073/pnas.1711927114
- [72] Evolution: like any other science it is predictable. Philos Trans R Soc Lond B Biol Sci 365:133–145. https://doi.org/10.1098/rstb.2009.0154
- [73] Historical contingency and its biophysical basis in glucocorticoid receptor evolution. Nature 512:203–207. https://doi.org/10.1038/nature13410
- [74] How mutational epistasis impairs predictability in protein evolution and design. Protein Sci 25:1260–1272. https://doi.org/10.1002/pro.2876
- [75] Predicting evolution. Nat Ecol Evol 1. https://doi.org/10.1038/s41559-017-0077
- [76] Local fitness landscape of the green fluorescent protein. Nature 533:397–401. https://doi.org/10.1038/nature17995
- [77] ACE2 binding is an ancestral and evolvable trait of sarbecoviruses. Nature 603:913–918. https://doi.org/10.1038/s41586-022-04464-z
- [78] Epistatic drift causes gradual decay of predictability in protein evolution. Science 376:823–830. https://doi.org/10.1126/science.abn6895
- [79] Protein evolution on rugged landscapes. Proc Natl Acad Sci U S A 86:6191–6195. https://doi.org/10.1073/pnas.86.16.6191
- [80] Sewall Wright and Evolutionary Biology. University of Chicago Press.
- [81] Compensatory mutations potentiate constructive neutral evolution by gene duplication. bioRxiv. https://doi.org/10.1101/2024.02.12.579783
- [82] Identification of the potentiating mutations and synergistic epistasis that enabled the evolution of inter-species cooperation. PLoS One 12. https://doi.org/10.1371/journal.pone.0174345
- [83] A silent mutation in mabA confers isoniazid resistance on Mycobacterium tuberculosis. Mol Microbiol 91:538–547. https://doi.org/10.1111/mmi.12476
- [84] A Synonymous Mutation Upstream of the Gene Encoding a Weak-Link Enzyme Causes an Ultrasensitive Response in Growth Rate. J Bacteriol 198:2853–2863. https://doi.org/10.1128/JB.00262-16
- [85] Coding-sequence determinants of gene expression in Escherichia coli. Science 324:255–258. https://doi.org/10.1126/science.1170160
- [86] Causes and effects of N-terminal codon bias in bacterial genes. Science 342:475–479. https://doi.org/10.1126/science.1241934
- [87] Effects of Synonymous Mutations beyond Codon Bias: The Evidence for Adaptive Synonymous Substitutions from Microbial Evolution Experiments. Genome Biol Evol 13. https://doi.org/10.1093/gbe/evab141
- [88] How synonymous mutations alter enzyme structure and function over long timescales. Nat Chem 15:308–318. https://doi.org/10.1038/s41557-022-01091-z
- [89] Synonymous Mutations Can Alter Protein Dimerization Through Localized Interface Misfolding Involving Self-entanglements. J Mol Biol 436. https://doi.org/10.1016/j.jmb.2024.168487
- [90] The imprint of codons on protein structure. Biotechnol J 6:641–649. https://doi.org/10.1002/biot.201000329
- [91] Codon Usage Influences the Local Rate of Translation Elongation to Regulate Co-translational Protein Folding. Mol Cell 59:744–754. https://doi.org/10.1016/j.molcel.2015.07.018
- [92] Synonymous Codons Direct Cotranslational Folding toward Different Protein Conformations. Mol Cell 61:341–351. https://doi.org/10.1016/j.molcel.2016.01.008
- [93] Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691. https://doi.org/10.1038/nrg3051
- [94] Exposing synonymous mutations. Trends Genet 30:308–321. https://doi.org/10.1016/j.tig.2014.04.006
- [95] Synonymous mutations frequently act as driver mutations in human cancers. Cell 156:1324–1335. https://doi.org/10.1016/j.cell.2014.01.051
- [96] A pan-cancer analysis of synonymous mutations. Nat Commun 10. https://doi.org/10.1038/s41467-019-10489-2
- [97] Effect of premature termination of translation on mRNA stability depends on the site of ribosome release. Proc Natl Acad Sci U S A 84:4890–4894. https://doi.org/10.1073/pnas.84.14.4890
- [98] Premature Termination Codons Are Recognized in the Nucleus in A Reading-Frame Dependent Manner. Cell Discov 1. https://doi.org/10.1038/celldisc.2015.1
- [99] SMG-6 mRNA cleavage stalls ribosomes near premature stop codons in vivo. Nucleic Acids Res 50:8852–8866. https://doi.org/10.1093/nar/gkac681
- [100] Nonsense mRNA suppression via nonstop decay. Elife 7. https://doi.org/10.7554/eLife.33292
- [101] Premature termination codons in the DMD gene cause reduced local mRNA synthesis. Proc Natl Acad Sci U S A 117:16456–16464. https://doi.org/10.1073/pnas.1910456117
- [102] Environment modulates protein heterogeneity through transcriptional and translational stop codon readthrough. Nat Commun 15. https://doi.org/10.1038/s41467-024-48387-x
- [103] The causes of evolvability and their evolution. Nat Rev Genet 20:24–38. https://doi.org/10.1038/s41576-018-0069-z
- [104] Is evolvability evolvable? Nat Rev Genet 9:75–82. https://doi.org/10.1038/nrg2278
- [105] Mutator dynamics in sexual and asexual experimental populations of yeast. BMC Evol Biol 11. https://doi.org/10.1186/1471-2148-11-158
- [106] Neutral evolution of mutational robustness. Proc Natl Acad Sci U S A 96:9716–9720. https://doi.org/10.1073/pnas.96.17.9716
- [107] Mutators, population size, adaptive landscape and the adaptation of asexual populations of bacteria. Genetics 152:485–493. https://doi.org/10.1093/genetics/152.2.485
- [108] Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 14. https://doi.org/10.1038/s41467-023-39321-8
- [109] Distribution of fixed beneficial mutations and the rate of adaptation in asexual populations. Proc Natl Acad Sci U S A 109:4950–4955. https://doi.org/10.1073/pnas.1119910109
- [110] Deleterious passengers in adapting populations. Genetics 198:1183–1208. https://doi.org/10.1534/genetics.114.170233
- [111] The noisy edge of traveling waves. Proc Natl Acad Sci U S A 108:1783–1787. https://doi.org/10.1073/pnas.1013529108
Copyright
© 2025, Baheti et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.