Most phenotypes are determined by molecular systems composed of specifically interacting molecules. However, unlike for individual components, little is known about the distributions of mutational effects of molecular systems as a whole. We ask how the distribution of mutational effects of a transcriptional regulatory system differs from the distributions of its components, by first independently, and then simultaneously, mutating a transcription factor and the associated promoter it represses. We find that the system distribution exhibits increased phenotypic variation compared to individual component distributions, an effect arising from intermolecular epistasis between the transcription factor and its DNA-binding site. In large part, this epistasis can be qualitatively attributed to the structure of the transcriptional regulatory system and could therefore be a common feature in prokaryotes. Counter-intuitively, intermolecular epistasis can alleviate the constraints of individual components, thereby increasing the phenotypic variation on which selection could act and facilitating adaptive evolution.
DOI: https://doi.org/10.7554/eLife.28921.001
Distributions of mutational effects (DMEs) and the nature of the interactions among mutations (epistasis) critically determine evolutionary paths and outcomes (Eyre-Walker and Keightley, 2007; de Visser and Krug, 2014). DMEs are central to a range of fundamental questions in evolutionary biology (Halligan and Keightley, 2009), including the origins of novel traits (Soskine and Tawfik, 2010), the evolution of sex and recombination (Otto and Lenormand, 2002), and the maintenance of genetic variation (Charlesworth et al., 1995). In contrast to selective constraints, which act on the variation already present in a population, the biophysical laws and molecular mechanisms that define how a molecular system functions constrain access to phenotypic variation through mutation (Camps et al., 2007; Wagner, 2011), and in doing so determine the shape of the DME (Fontana and Buss, 1994).
Even though most phenotypes are determined by underlying molecular systems that consist of multiple specifically interacting molecular components, direct and systematic experimental estimates of DMEs have been limited to two extremes: either the whole organism, obtained in mutation accumulation studies (Halligan and Keightley, 2009), or individual components, such as proteins (Wang et al., 2002; Bershtein et al., 2006; Sarkisyan et al., 2016) and DNA-binding sites for transcription factors (Kinney et al., 2010; Shultzaberger et al., 2012; Yun et al., 2012; Metzger et al., 2015), determined through direct mutagenesis. Knowing only the effects of mutations in individual molecular components might be insufficient to understand how the whole system evolves, as recent studies of the interaction of mutations in two components of a molecular system uncovered pervasive intermolecular epistasis (Anderson et al., 2015; Podgornaia and Laub, 2015). Because of the large mutational space of proteins, these studies focused only on mutations in specific residues at the interface between the two interacting molecules. In contrast, addressing the more general question of how intermolecular epistasis shapes DMEs of molecular systems requires experimentally tackling the nearly prohibitive space of possible mutational combinations, which even for single components has been feasible only in rare cases (Sarkisyan et al., 2016). Here, using one of the best understood transcriptional regulation systems, we experimentally ask how the DME of a system differs from the DMEs of its constituent components.
To address this question, we used a simple gene regulatory system based on the canonical Lambda bacteriophage switch (Ptashne, 2011), consisting of three components: the σ70 RNA polymerase complex (together, we refer to them as RNAP), the transcription factor CI (trans-element), and the PR cis-element that contains the overlapping DNA-binding sites of the two proteins (Figure 1). These three molecular components interact to produce a quantitative phenotype: gene expression. Specifically, we used a genetic system in which a strong promoter PR controls the expression of a yellow fluorescent protein (yfp) and is repressed by the CI repressor, which we placed under the inducible promoter PTET (Figure 1B,C). This system exhibits high yfp expression in the absence of CI, where the level of expression is determined solely by RNAP binding. In the presence of CI, achieved by inducing the PTET promoter, the system is strongly repressed (Figure 2A). We find that, even in such a simple transcriptional regulatory system, the DME of the system differs unexpectedly from the individual component DMEs, and that most of this difference can be directly attributed to the genetic regulatory structure of the system.
To determine the DMEs of individual components in our system, we performed direct mutagenesis on the cis- and the trans-element independently. For each component, we created libraries with approximately 0.01, 0.04, and 0.07 mutations per nucleotide (low, intermediate, and high mutation probability libraries, respectively). Due to the different sizes of the two components (84 bp for cis and 714 bp for trans), mutants in cis-element libraries contained on average 1, 3, and 6 mutations, whereas mutants in trans-element libraries contained on average 7, 28, and 49 mutations, respectively. We did not mutate the σ70–RNAP complex due to its cell-wide pleiotropic effects. To assess the DMEs of the system, we created three additional ‘system mutant libraries’ by combining cis- and trans-element mutant libraries of the same mutation probability. Each library consisted of more than 30,000 uniquely transformed individuals, and we estimated the corresponding DME for each library by measuring fluorescence, that is, gene expression, of 1 million randomly sampled individuals by flow cytometry. We quantified the differences in DMEs in two ways. First, we compared the frequency of mutants in three biologically meaningful categories (Figure 2A): (i) mutants that are indistinguishable from the wildtype without the CI repressor (‘high expression phenotypes’); (ii) mutants indistinguishable from the wildtype with CI (‘no expression phenotypes’); and, importantly for our argument, (iii) mutants with expression levels that the wildtype cannot achieve (‘intermediate expression phenotypes’). Second, we calculated the Shannon entropy, which quantifies how uniformly the mutants cover the entire range of possible gene expression phenotypes.
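The two summary measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the functions `shannon_entropy` and `categorize`, the number of bins, the binning range, and the gate boundaries `no_max` and `high_min` are all hypothetical; in practice the gates would be derived from the wildtype distributions measured with and without CI.

```python
import numpy as np


def shannon_entropy(log10_expr, lo, hi, n_bins=50):
    """Shannon entropy (bits) of log10 expression values binned on [lo, hi].

    Higher values mean the mutants cover the phenotype range more uniformly.
    lo, hi and n_bins are hypothetical choices; values outside [lo, hi]
    are ignored by np.histogram.
    """
    counts, _ = np.histogram(log10_expr, bins=n_bins, range=(lo, hi))
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing (0 * log 0 taken as 0)
    return float(-(p * np.log2(p)).sum())


def categorize(log10_expr, no_max, high_min):
    """Fractions of cells in the 'no', 'intermediate' and 'high' categories.

    no_max:   hypothetical upper gate of the 'no expression' category
    high_min: hypothetical lower gate of the 'high expression' category
    Assumes no_max < high_min.
    """
    x = np.asarray(log10_expr, dtype=float)
    no = float(np.mean(x <= no_max))
    high = float(np.mean(x >= high_min))
    return {"no": no, "intermediate": 1.0 - no - high, "high": high}


# Hypothetical usage with simulated log10 fluorescence values
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(2.0, 0.1, 800), rng.normal(4.0, 0.1, 200)])
print(categorize(sample, no_max=2.5, high_min=3.5))
print(shannon_entropy(sample, lo=1.5, hi=4.5))
```

A fixed binning range is used so that entropies from different libraries are computed over the same phenotype range and remain comparable.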
The difference between the effects of mutations in the cis- (Figure 2B,C,D) and the trans-element (Figure 2E,F,G) is unambiguous. Most cis-element mutants have low expression in the presence of CI (Figure 2B,C,D). Such low expression can result from sufficient CI repressor binding, impaired RNAP binding, or both. The frequency of mutants with an intermediate expression phenotype in the presence of CI increases with the average number of mutations, a pattern that is also observed in cis-element libraries in the absence of CI (Figure 2—figure supplement 1B,C,D). In contrast, the effects of mutations in the trans-element have a distinctive bimodal distribution (Figure 2E,F,G). The two peaks in the distribution correspond to the expression levels of the wildtype population in the absence and presence of CI, revealing that the majority of CI mutants are either fully or close to fully functional, or completely inactive on the wildtype cis background. Furthermore, increasing the average number of mutations in the trans-element decreases the frequency of intermediate expression phenotypes. As the shape of the DME is determined by the underlying biophysical and mechanistic constraints that limit the phenotypic variation accessible through mutation, we conclude that the cis- and the trans-element have categorically different constraints. This categorical difference is best observed in a direct comparison between the high mutation cis (Figure 2D) and low mutation trans (Figure 2E) libraries, which have approximately the same average number of mutations.
To further demonstrate that this categorical difference is not an artifact of the different numbers of mutations introduced into each element, we show that the same general trend is evident when comparing the effects of 150 random single point mutations of known identity in each of the cis- and trans-elements (Figure 2—figure supplement 2A,B). These measurements were made at the population level, in a plate reader. Point mutations in the cis-element, when only their effect on RNAP binding is measured, show a high frequency of intermediate expression levels, in agreement with other studies of prokaryotic (Kinney et al., 2010) and eukaryotic (Shultzaberger et al., 2012) DNA-binding sites. Similarly, we find a bimodal distribution of single mutation effects in the trans-element CI, which has previously been reported for other transcription factors (Pakula et al., 1986; Markiewicz et al., 1994) and enzymes (Jacquier et al., 2013), and may be a common feature of proteins that are close to their optimum (Soskine and Tawfik, 2010; Bataillon and Bailey, 2014).
The DMEs of the system, in which both the cis- and the trans-element are mutated simultaneously, show a surprising pattern: a higher frequency of intermediate phenotypes than either of the individual component DMEs (Figure 2H,I,J). This pattern can also be observed in the library of 150 random system double mutants (Figure 2—figure supplement 2), each consisting of a unique combination of the previously described point mutations in cis and in trans (system vs. cis: D_KS = 0.39, p<0.0001; system vs. trans: D_KS = 0.66, p<0.0001). Furthermore, the frequency of intermediate phenotypes increased with the average number of mutations (Figure 2). At intermediate and high mutation probabilities, the Shannon entropy of the system DMEs was also greater than the entropy of either of its constituent components (Figure 2; Figure 2—source data 3), indicating that mutating the whole system gives rise to a more uniform coverage of the range of possible phenotypes. The difference between system DMEs in the absence (Figure 2—figure supplement 1H,I,J) and in the presence of the CI repressor (Figure 2H,I,J) indicates that a substantial portion of mutant CIs bind the mutated cis-element backgrounds and are thus functional repressors that specifically recognize operator mutants.
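The D_KS values above are two-sample Kolmogorov-Smirnov statistics: the maximum vertical distance between the empirical cumulative distributions of two samples. A minimal sketch using SciPy's `ks_2samp`; the wrapper `ks_distance` and the simulated data are hypothetical and only illustrate the computation.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_distance(system_log10, component_log10):
    """Two-sample K-S statistic D and its p-value between two DMEs.

    D is the largest gap between the two empirical CDFs (0 = identical,
    1 = non-overlapping samples).
    """
    res = ks_2samp(system_log10, component_log10)
    return float(res.statistic), float(res.pvalue)


# Hypothetical usage: simulated log10 expression of 150 system and 150 cis mutants
rng = np.random.default_rng(1)
system = rng.uniform(2.0, 4.5, 150)
cis = rng.normal(2.2, 0.3, 150)
d, p = ks_distance(system, cis)
print(f"D_KS = {d:.2f}, p = {p:.2g}")
```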
We tested whether the observed differences between the system and the component DMEs might arise from differences in the gene expression noise of mutants: if the system mutants had greater noise, this could also increase the frequency of intermediate phenotypes. However, this is unlikely because, for every mutation probability, we observed no difference in gene expression noise in our flow cytometry measurements (measured as the coefficient of variation) between 20 randomly selected system, cis, and trans mutants in the presence of CI (Figure 2—figure supplements 3, 4, and 5). Furthermore, gene expression noise was constant across all 180 random isolates (60 isolates from each of the three mutation probabilities), irrespective of their mutation probability (Figure 2—source data 1).
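The noise measure used here, the coefficient of variation (CV), is the standard deviation of single-cell fluorescence divided by its mean. A minimal sketch follows; the text does not name the test used to compare groups, so the Kruskal-Wallis test below is an assumed choice, and the simulated lognormal data are hypothetical.

```python
import numpy as np
from scipy.stats import kruskal


def coefficient_of_variation(fluorescence):
    """CV = sample standard deviation / mean of single-cell fluorescence."""
    x = np.asarray(fluorescence, dtype=float)
    return float(x.std(ddof=1) / x.mean())


# Hypothetical usage: one CV per isolate, 20 isolates per group
rng = np.random.default_rng(2)
groups = {
    name: [coefficient_of_variation(rng.lognormal(mean=5.0, sigma=0.4, size=5000))
           for _ in range(20)]
    for name in ("system", "cis", "trans")
}
# Assumed test: Kruskal-Wallis across the three groups of CVs
stat, p = kruskal(*groups.values())
print(f"H = {stat:.2f}, p = {p:.2f}")
```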
We wanted to understand whether the observed increase in the abundance of intermediate phenotypes when the whole system mutates (Figure 2) can be attributed to epistatic interactions between mutations in the cis- and the trans-element. Because we use log10-transformed expression as our phenotype of interest, we calculate epistasis between mutations in the two components of the system (what we call intermolecular epistasis) as the deviation from the additive prediction based on single component effects. The additive null prediction for interactions between mutations, when considering only three possible categories of mutational effects (‘no’, ‘intermediate’, and ‘high expression’), is shown in Table 1. The standard approach to extending these predictions to whole distributions requires a convolution of individual component DMEs (Orr, 2003; Lee et al., 2010). To obtain this ‘naïve’ null prediction, we convolved the observed trans-element DME (shown in Figure 2E,F,G) with the distribution showing how mutations in cis alter wildtype expression (for further details, see Materials and methods). Note that effectively no mutations in cis or trans decrease expression relative to the wildtype; as a result, the ‘naïve’ system prediction shows an increased frequency of mutants with high expression phenotypes (Figure 2H,I,J). We find that the ‘naïve’ convolution predictions, which are made without any knowledge of the genetic regulatory structure of the system, differ significantly from the observed system distributions (Figure 2H,I,J), suggesting the existence of intermolecular epistasis between the two components.
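Under an additive model on the log10 scale, the convolution of two DMEs is the distribution of all sums of one trans phenotype and one cis effect (cis mutant expression minus wildtype expression). The sketch below computes this empirical convolution directly; the function name `naive_convolution` and its arguments are hypothetical, and the exact procedure used in the study is described in Materials and methods. For samples of a million cells the all-pairs sum is too large, so one would subsample or convolve binned histograms (for example with `np.convolve`) instead.

```python
import numpy as np


def naive_convolution(trans_log10, cis_log10, wt_log10):
    """Additive ('naive') prediction of the system DME on the log10 scale.

    trans_log10: log10 expression of trans mutants on the wildtype cis
    cis_log10:   log10 expression of cis mutants with wildtype CI
    wt_log10:    log10 expression of the wildtype under the same condition
    Every trans phenotype is shifted by every cis effect, which is the
    empirical convolution of the two distributions.
    """
    trans = np.asarray(trans_log10, dtype=float)
    cis_shift = np.asarray(cis_log10, dtype=float) - wt_log10
    return np.add.outer(trans, cis_shift).ravel()


# Hypothetical usage with simulated samples
rng = np.random.default_rng(5)
trans = np.concatenate([rng.normal(2.0, 0.1, 300), rng.normal(4.0, 0.1, 100)])
cis = rng.normal(2.3, 0.4, 400)
pred = naive_convolution(trans, cis, wt_log10=2.0)
print(pred.size)  # 160000 (= 400 * 400 pairs)
```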
To further show that intermolecular epistasis between the cis- and the trans-element is a common feature of our system that shapes DMEs not only at elevated mutation rates but also when only a single point mutation is present in each of the components, we utilized the previously described 150 random system double mutants (Figure 2—figure supplement 2), which consist of a unique combination of a single point mutation in cis and a single point mutation in trans (Figure 3—figure supplement 1; Figure 3—source data 1). In a plate reader, and hence at the level of a monoclonal population, we measured expression levels of all 150 system double mutants, as well as their corresponding single mutants, in the presence of CI (Figure 2—figure supplement 2). From these measurements, we calculated epistasis as the deviation from the additive expectation based on wildtype-normalized single mutant effects. This definition of epistasis mirrors the convolution approach utilized for the analysis of DMEs in Figure 2.
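With wildtype-normalized effects on the log10 scale, the additive expectation for a double mutant is the sum of the two single-mutant effects, and epistasis is the observed double-mutant effect minus that sum: ε = log10(E_double/E_wt) − log10(E_cis/E_wt) − log10(E_trans/E_wt). A minimal sketch; the function name and example values are hypothetical, and deciding whether a deviation is significant would additionally require replicate measurements and a statistical test, which this sketch does not include.

```python
import math


def log_epistasis(e_wt, e_cis, e_trans, e_double):
    """Intermolecular epistasis on the log10 scale.

    Expression values are normalized to the wildtype, so the additive
    expectation for the double mutant is the sum of the single-mutant
    log10 effects. Positive: double mutant expresses more than expected;
    negative: less than expected.
    """
    expected = math.log10(e_cis / e_wt) + math.log10(e_trans / e_wt)
    observed = math.log10(e_double / e_wt)
    return observed - expected


# Hypothetical plate-reader values (arbitrary fluorescence units)
eps = log_epistasis(e_wt=120.0, e_cis=150.0, e_trans=130.0, e_double=900.0)
print("positive" if eps > 0 else "negative or none", round(eps, 2))
```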
Intermolecular epistasis was common between single point mutations in our system: 71 of 150 double mutants deviated significantly from their additive expectations (Figure 3, Figure 3—figure supplement 2). Intermolecular epistasis therefore affects expression levels in the system not only when a relatively large number of mutations accumulate, as shown in Figure 2, but also when a single point mutation is introduced in each of the components. Furthermore, 53 of these 71 mutants were in positive epistasis, meaning that the double mutant effect was higher than expected based on single mutant effects. Such positive epistasis contributes to the observed increase in the frequency of mutants with intermediate phenotypes in the system (Figure 2—figure supplement 2), as seemingly neutral trans mutations, which fully repress the wildtype cis, show elevated expression on mutated cis backgrounds. Negative epistasis, on the other hand, indicates a penetrant trans mutation, whose high expression on the wildtype cis is decreased on a mutated cis background. Such epistasis most frequently arises from loss-of-function trans mutations, for which the system expression level in the presence of CI is determined only by the effect of the cis mutation in the absence of CI. Consistent with this, we observed negative epistasis in 8 of 10 trans single point mutants that exhibited no measurable repression (Figure 3—source data 2). We did not identify any relationship between the presence of intermolecular epistasis and the physical location of mutations in the trans- (χ²(4) = 2.02; p = 0.73) or the cis-element (χ²(5) = 1.69; p = 0.89) (Figure 3—source data 1), indicating that even though mutations at some loci individually have a greater effect on expression level, they are not associated with any particular form of epistasis.
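The location analysis is a chi-square test of independence between epistasis (present or absent) and the region carrying the mutation; with five trans-element regions the test has 4 degrees of freedom, and with six cis-element regions it has 5. A minimal sketch using SciPy; the region counts below are hypothetical (chosen only so that 71 of 150 mutants are in epistasis), while `p_from_statistic` recovers the reported p-values from the reported statistics.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def location_test(table):
    """Chi-square test of independence between epistasis and mutation location.

    table: 2 x k counts; rows = (in epistasis, not in epistasis),
           columns = k regions of the element.
    Returns (statistic, degrees of freedom, p-value).
    """
    stat, p, dof, _expected = chi2_contingency(np.asarray(table))
    return float(stat), int(dof), float(p)


def p_from_statistic(stat, dof):
    """Upper-tail p-value of a chi-square statistic."""
    return float(chi2.sf(stat, dof))


# Hypothetical counts for five trans-element regions (not the study's data)
table = [[15, 14, 13, 16, 13],
         [16, 15, 17, 14, 17]]
print(location_test(table))
print(round(p_from_statistic(2.02, 4), 2))  # 0.73, the reported trans-element value
```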
In the system we study, the genetic regulatory structure (Figure 1) indicates that mutations in cis affect the binding of both RNAP and the repressor. A comparison to the ‘naïve’ convolutions performed without accounting for this regulatory structure (Figure 2) demonstrates that the presence of intermolecular epistasis prevents accurate prediction of the system DME from individual component DMEs. We wanted to understand whether these predictions could be improved by accounting for the effects of cis-element mutations on RNAP binding, which are measured in the absence of CI. In other words, we wanted to connect the basic knowledge of the regulatory structure of the system to the epistasis between mutations in the two components.
To do so experimentally, we combined the high mutation probability cis-element library and the low mutation probability trans-element library (Figure 4). We chose these two libraries because they have similar expected numbers of mutations (n ≈ 6), thereby minimizing a potential bias in their mutational effects arising from a difference in the number of mutations in the two components. We used FACS to partition the mutant libraries into the three phenotypic bins (no, intermediate, and high expression, as in Figure 2). In this manner, we partitioned two libraries: the high mutation cis-element library in the absence of CI (Figure 2—figure supplement 1D) and the low mutation trans-element library in the presence of CI (Figure 2E). We constructed nine new mutant libraries with all possible combinations of these partitions (Figure 5A–I). From these combination DMEs, measured in the presence of CI, we calculated the expected frequencies of mutants in each of the three categories by weighting each combination DME by the relative frequency of the original trans-element partition it was derived from. The predicted frequency in each category of the system DME is then the sum of the weighted counts in that category across all nine DMEs (see Materials and methods). These predicted frequencies did not differ from the observed ones (Table 2). Thus, only by experimentally accounting for the genetic regulatory structure of the system (Figure 1) can we accurately predict mutant frequencies in three biologically meaningful categories and qualitatively explain how the cis background alters the phenotypic effects of trans-element mutations, as detailed in the legend to Figure 5.
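The weighted-sum prediction can be sketched as a weighted average of the nine combination DMEs, each summarized by its (no, intermediate, high) frequencies. This is a hedged illustration: `predict_from_partitions` is a hypothetical helper, the weights are normalized to sum to one, and the partition frequencies and combination frequencies in the example are simulated. Following the text, each combination is weighted by the frequency of the trans-element partition it was derived from; the exact procedure is given in Materials and methods.

```python
import numpy as np


def predict_from_partitions(combo_freqs, weights):
    """Predicted (no, intermediate, high) frequencies of the system DME.

    combo_freqs: (n_combinations, 3) category frequencies measured in each
                 partition-combination library in the presence of CI
    weights:     (n_combinations,) relative weight of each combination,
                 normalized here so the prediction sums to 1
    """
    f = np.asarray(combo_freqs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ f


# Hypothetical frequencies of the three trans-element partitions (with CI)
p_trans = np.array([0.6, 0.1, 0.3])  # no / intermediate / high
# Rows ordered (cis partition i, trans partition j) with j varying fastest,
# so each row takes the weight of its trans partition
weights = np.tile(p_trans, 3)
combo_freqs = np.random.default_rng(3).dirichlet(np.ones(3), size=9)
print(predict_from_partitions(combo_freqs, weights))
```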
We also produced a mathematical prediction of the system DME that incorporated the knowledge of its genetic regulatory structure. To do so, in addition to incorporating the knowledge of the effects of cis mutations in the absence of CI, we also assumed that all trans-element mutations that have high expression (same expression as the wildtype in the absence of CI) are loss-of-function mutants, which do not bind any mutated cis. When convolving the two component distributions, the effects of such loss-of-function trans mutants are removed and replaced by the cis-element DME in the absence of CI. This approach, which accounts for the effects of cis mutations on RNAP binding, captures the frequencies of mutants in the three phenotypic categories (‘no’, ‘intermediate’, and ‘high’ expression) more accurately than the ‘naïve’ convolutions (Figure 4; Table 2; Figure 2—figure supplement 6). From a theoretical evolutionary perspective, it is the deviations from a simple additive model that have well-documented consequences for organismal evolution, as they determine the ruggedness of the adaptive landscape (de Visser and Krug, 2014). Here, we show how those deviations emerge from the underlying genetic regulatory structure, and hence how they might lead to better predictions of regulatory system DMEs.
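The structure-aware prediction differs from the naïve convolution in one step: trans mutants classed as loss-of-function no longer receive an additive cis shift, and their contribution is replaced by the cis-element phenotypes measured without CI. A minimal sketch under stated assumptions: `structured_convolution` and the gate `lof_threshold` are hypothetical, and the two cis samples are required to have equal length so that every trans mutant contributes the same number of predicted values.

```python
import numpy as np


def structured_convolution(trans_log10, cis_ci_log10, cis_noci_log10,
                           wt_ci_log10, lof_threshold):
    """System DME prediction that accounts for the regulatory structure.

    trans_log10:    log10 expression of trans mutants on wildtype cis (+CI)
    cis_ci_log10:   log10 expression of cis mutants with wildtype CI
    cis_noci_log10: log10 expression of cis mutants without CI
    wt_ci_log10:    log10 expression of the wildtype with CI
    lof_threshold:  hypothetical gate; trans mutants at or above it are
                    treated as loss-of-function (they bind no mutated cis)
    """
    if len(cis_ci_log10) != len(cis_noci_log10):
        raise ValueError("cis samples with and without CI must have equal length")
    trans = np.asarray(trans_log10, dtype=float)[:, None]
    cis_shift = np.asarray(cis_ci_log10, dtype=float)[None, :] - wt_ci_log10
    cis_free = np.asarray(cis_noci_log10, dtype=float)[None, :]
    lof = trans >= lof_threshold
    # Functional repressors: additive combination of the two effects.
    # Loss-of-function repressors: phenotype set by RNAP binding alone,
    # i.e. by the cis mutant expression measured in the absence of CI.
    return np.where(lof, cis_free, trans + cis_shift).ravel()


# Hypothetical usage: the second trans mutant is loss-of-function
print(structured_convolution([1.0, 4.0], [1.0, 2.0], [3.0, 3.5], 1.0, 3.5))
```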
While considering the effects of cis-element mutations on RNAP binding explains intermolecular epistasis to an extent that allows more accurate predictions of system DMEs, it might not explain all of it. To determine the extent to which intermolecular epistasis cannot be explained by accounting for the genetic regulatory structure of the system, we constructed a second library of 150 system double mutants. This time, instead of randomly combining point mutations in cis- and trans-elements, we combined point mutations with a specific phenotype. Namely, all cis-element point mutants exhibited high expression in the absence of CI, meaning that the binding of RNAP was not measurably impaired (Figure 6—figure supplement 1A). All point mutations in trans used to assemble the 150 double mutants exhibited no expression (Figure 6—figure supplement 1B), meaning that these mutants had a fully functional trans-element on the wildtype cis background. The system double mutant library made in this manner corresponds to the partition combination shown in Figure 5C. In such a library, in the presence of CI, the additive null model predicts that a double mutant would exhibit the same phenotype as its cis point mutant, when the corresponding trans mutant maintains the same binding properties as the wildtype CI. When this is true, the system double mutant would not be in significant epistasis. Conversely, the system double mutant in this library would be in significant epistasis only when the trans mutant binds the mutated cis differently than the wildtype trans does.
We found that 15 double mutants in this library are in significant epistasis (Figure 6). These mutants maintained a decreased yet substantial dynamic range between the two environments (absence and presence of CI), and hence were still functional regulators (Figure 6—source data 1). Furthermore, all 15 were in positive epistasis, indicating that the double mutant effect is greater than the additive expectation. In such mutants, trans mutations are phenotypically neutral on the wildtype cis, but not on a mutated cis background, meaning that the trans mutant binds the mutated cis less strongly than the wildtype CI does. The lack of mutants in negative epistasis in this library might be due to the strong wildtype binding of the CI repressor, which makes point mutations in trans that improve binding highly unlikely. When mutations in trans alter the binding properties of the repressor (but do not cause complete loss of function), intermolecular epistasis cannot be accounted for by the underlying structure of the genetic regulatory network. Interestingly, a disproportionate number of double mutants in positive epistasis carried a trans mutation in the linker region of CI (χ²(4) = 20.66; p < 0.0005), which connects the N-terminal DNA-binding domain with the C-terminal dimerization domain (Figure 6—figure supplement 2; Figure 6—source data 2). This contrasts with the random double mutant library (Figure 3), where we found no relationship between mutation location and epistasis.
In this study, we show that mutating a molecular system with the most common transcriptional regulatory structure in prokaryotes (Salgado et al., 2013), namely a repressible promoter, increases phenotypic variation beyond what can be achieved by mutating any of the individual components alone. We focused on the phenotypic effects, rather than the fitness effects of mutations, in order to minimize the complexity of the studied system and also to enable a more direct interpretation of the results, as fitness effects of mutations depend on a much larger, and often unknown, set of factors than simply their phenotypic effects. Doing so enables an interpretation of observed DMEs in the light of their underlying molecular mechanisms, as recently shown for a prokaryotic (Lagator et al., 2017) and a eukaryotic cis-regulatory element (White et al., 2016). Furthermore, while previous studies investigated the effects of specific mutations in the contact surface between two molecules (Anderson et al., 2015; Podgornaia and Laub, 2015), their local nature, imposed by the large mutational space of proteins, prevented conclusions about DMEs of molecular systems. Precisely because our experimental approach allowed us to overcome these obstacles, we were able to show how the regulatory network structure determines intermolecular epistasis, indicating a broad range of conditions under which the additive null models of interactions between mutations might be inaccurate.
The observed increase in the frequency of intermediate phenotypes arises, in large part, from intermolecular epistasis between the two components of the system, most of which can be attributed to the structure of the gene regulatory system. Both positive and negative epistasis contribute to this increase. When intermolecular epistasis is positive, mutations in cis expose the genetic variation hidden in originally phenotypically neutral trans mutations. This can happen in two ways: (i) through changes in the RNAP and repressor binding sites, which are accounted for by considering the genetic regulatory structure of the system (Figure 5); or (ii) when mutations in trans change the binding preference of the repressor, so that a mutation in cis decreases the binding of the mutated trans more than it decreases the binding of the wildtype repressor (Figure 6). Negative intermolecular epistasis, on the other hand, arises in two situations. (i) The trans mutations are penetrant, so that their effects (in particular, increased expression) are buffered by mutations in cis. This epistasis is frequently observed when trans mutations lead to a complete loss of binding (high expression phenotype), so that the system phenotype becomes the same in the presence and in the absence of CI (Figure 3). In other words, the system undergoes a qualitative transition from a three-component, regulated promoter to a two-component, constitutive promoter. Under these conditions, the system phenotype is determined only by the effect of cis mutations on RNAP binding and can thus be explained by considering the genetic regulatory structure of the system (Figure 5). (ii) Trans-element mutations alter the binding preference of the repressor, so that the mutated trans binds the mutated cis more strongly than the wildtype repressor does. We found no evidence for this type of epistasis in the library of 150 random double mutants, likely due to our use of the Lambda PR promoter, which binds the repressor very tightly.
In the studied system, we demonstrate that intermolecular epistasis is present even between single point mutations in the cis- and the trans-element (Figure 3), and not only when a larger number of mutations accumulates in each component (Figure 2). When considering interactions between point mutations (Figures 3 and 6), we found that positive epistasis contributes disproportionately to the increase in the frequency of intermediate phenotypes. The extent to which positive, as opposed to negative, epistasis drives the observed increase in intermediate phenotypes when a greater number of mutations are present in the components (Figure 2) cannot be addressed without knowing the effects of specific molecular interactions across a prohibitively large number of mutation combinations.
Epistasis has been shown to impose constraints on evolutionary paths by increasing the ruggedness of the adaptive landscape both in theoretical and experimental studies (Whitlock et al., 1995; Barton and Partridge, 2000; Weinreich et al., 2006; Poelwijk et al., 2011; Breen et al., 2012; Podgornaia and Laub, 2015). Our data suggest that, for transcriptional regulatory networks, mutating the system of specifically interacting components alleviates the biophysical and mechanistic constraints acting on individual components, and in doing so increases phenotypic variation accessible through mutation. This role of intermolecular epistasis as a facilitator rather than a constraint on evolution arises, in significant part, directly from the genetic structure of a repressible transcriptional regulatory system. As such, it might be a common feature in prokaryotes, where repression through direct binding site overlap with RNAP forms the most common type of transcriptional regulatory system organization (Salgado et al., 2013). In these and other similar systems, potential paths for protein evolution might be more abundant when the interacting DNA-binding site is also mutating, as mutations in the partner component can expose the genetic variation hidden in originally phenotypically neutral mutations. Such intermolecular epistasis could give rise to punctuated protein evolution (Fontana and Schuster, 1998) – long periods of phenotypic stasis during which a transcription factor accumulates neutral mutations, interrupted by rapid adaptive evolution facilitated by mutations in the DNA-binding site. Furthermore, a gene might be horizontally transferred with its cognate cis-element, but without its cognate regulator. Then, intermolecular epistasis between the transferred cis-element and the orthologous transcription factor may reveal previously unavailable phenotypes that can facilitate adaptation to new niches. 
Therefore, accumulating mutations in the entire system, as opposed to only in a single component, appears to facilitate evolution both by extending the neutral sequence space, and hence increasing diversity through drift (Lynch and Hagner, 2015), as well as by increasing the available phenotypic variability.
The right regulatory region of Lambda phage (OR) is responsible for the decision between lysis and lysogeny (Johnson et al., 1981). The regulatory region consists of two RNA polymerase (RNAP) binding sites, the promoters PR and PRM (not shown), and three CI/Cro transcription factor operators (OR1, OR2, and OR3) (Figure 1A). In the wildtype system, the strong promoter PR drives expression of the transcription factor Cro. The transcription factor CI represses the PR promoter by direct binding-site competition with RNAP.
We used a synthetic system based on the Lambda phage switch, in which we decoupled the cis- and trans-regulatory elements (Figure 1B,C). We removed cI and substituted cro with venus-yfp (Nagai et al., 2002) under the control of the PR promoter, followed by a T1 terminator sequence. The OR3 site was removed in order to remove the PRM promoter. Separated by a terminator sequence and 500 random base pairs, we placed cI under the control of PTET, an inducible promoter regulated by TetR (Lutz and Bujard, 1997), followed by a TL17 terminator sequence. In this way, the concentration of the CI transcription factor in the cell was under external control, achieved by addition of the inducer anhydrotetracycline (aTc). The entire cassette was inserted into the low-copy-number plasmid backbone pZS* carrying a kanamycin resistance gene (Lutz and Bujard, 1997). In this system, cI constitutes the trans-element, while the PR promoter together with the two CI operator sites OR1 and OR2 make up the cis-element.
The libraries of cis- and trans-element mutants were created using the GeneMorph II™ random mutagenesis kit (Agilent Technologies, Santa Clara, US). We created three mutant libraries with different average mutation probabilities (0.01, 0.04, and 0.07 mutations per nucleotide) for both the transcription factor (714 bp) and the PR cis-regulatory element (84 bp). Therefore, the average number of mutations per mutant was approximately 7, 28, or 49 in the trans-element libraries, and 1, 3, or 6 in the cis-element libraries, respectively. We applied the same per-nucleotide mutation probability, rather than the same absolute number of mutations, to equivalent cis- and trans-element libraries in order to more accurately represent the biological process of mutagenesis.
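The expected number of mutations per mutant is the per-nucleotide mutation probability multiplied by the element length. The sketch below also includes a Poisson probability for exactly k mutations, the null model later used to check the sequenced clones. This is a simplification: the wildtype fractions quoted in the text were estimated from the supplier's mutation-number distributions and need not match this Poisson approximation. The function names are hypothetical.

```python
import math


def expected_mutations(rate_per_nt, length_bp):
    """Mean number of mutations per mutant: per-nucleotide rate x length."""
    return rate_per_nt * length_bp


def poisson_pmf(k, mean):
    """Probability of exactly k mutations under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


for rate in (0.01, 0.04, 0.07):
    cis_mean = expected_mutations(rate, 84)     # cis-element, 84 bp
    trans_mean = expected_mutations(rate, 714)  # trans-element (cI), 714 bp
    print(f"rate {rate}: cis {cis_mean:.2f}, trans {trans_mean:.2f}, "
          f"Poisson P(0 | cis) = {poisson_pmf(0, cis_mean):.3f}")
```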
PCR products of mutagenesis reactions were ligated into the wildtype construct and transformed into One-Shot Top10 cells (Life Technologies, Carlsbad). This step was used to maximize library diversity, owing to the high competence of One-Shot Top10 cells. Following electroporation, cells were plated at low density on selective kanamycin plates to allow single colony formation and minimize resource competition, and grown overnight. Colonies were washed off the plates with chilled LB medium and collected. To ensure large coverage, we cloned mutagenized PCR products until we obtained at least 30,000 individual colonies (uniquely transformed individuals). Due to the stochastic nature of the mutagenesis protocol used (Agilent Technologies), the number of uniquely transformed individuals did not necessarily equal the number of different mutant genotypes, especially at low mutation rates. For example, at the low mutation rate, where a cis-element mutant carried on average only one mutation, some PCR mutagenesis products contained no mutations. In the trans-element, which is ~10-fold longer, almost all PCR mutagenesis products contained several mutations. Using the distributions of the number of mutations for a given mutagenesis rate provided by the supplier (which we verified by sequencing 40 mutants from each cis and trans library, see below), we estimated that approximately 34.2%, 3.5%, and 0.2% of the low, intermediate, and high mutation number cis-element libraries consisted of wildtype genotypes. The corresponding proportions of wildtype genotypes in the trans-element libraries were 0.07%, 10⁻¹³%, and 10⁻²²%.
Populations containing a mixture of mutants with a given number of mutations were used to isolate plasmids, which were then cloned into the modified MG1655 strain expressing the tetR gene from a constitutive PN25 promoter. We confirmed that the distributions of mutation numbers in 40 isolated individuals from each library conformed to the distributions provided by the supplier, by comparing the observed distribution of the number of mutations to the Poisson distribution based on the predicted mutation probability, using a Kolmogorov-Smirnov (K-S) test (Figure 2—source data 2). A power analysis showed that a sample size of 40 was sufficient to verify the predicted number of mutations, with power set at 0.80 and a detectable difference of ±0.5 mutations. Among the 240 sequenced mutants from the six libraries (6 × 40), we found no bias toward a specific type of mutation (transitions vs. transversions), and no overrepresentation of any particular point mutation or locus. The sequenced mutants also allowed us to estimate the proportion of re-ligated wildtype plasmid, in which the mutated region was not inserted and the wildtype used as the cloning template was instead ligated back (a common occurrence in any library created through standard restriction-digestion cloning). Based on the frequency of wildtype genotypes in the sequenced trans libraries (which, given the relatively large number of mutations, should contain no wildtype sequences), we estimate that < 5% of each library carries a re-ligated wildtype insert (Figure 2—source data 2). Because such a low frequency cannot be estimated precisely from 120 sequenced clones (3 × 40 trans library sequences), we do not account for it when considering the proportion of wildtype cells in each library. More importantly, this bias should be the same for all libraries and is therefore unlikely to alter our comparative analyses.
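The comparison of sequenced mutation counts with the predicted Poisson distribution can be sketched as below. The function names are ours, and the sketch computes only the K-S statistic D. We assume the usual asymptotic 5% critical value of about 1.36/√n (≈0.215 for n = 40), which is conservative when the reference distribution is discrete, as the Poisson is.

```python
import math
from collections import Counter
from typing import Sequence


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


def ks_statistic_vs_poisson(counts: Sequence[int], lam: float) -> float:
    """Largest gap between the empirical CDF of mutation counts and the Poisson(lam) CDF.

    Both CDFs are step functions that only jump at integers, so it is enough to
    compare them at k = 0, 1, ..., max(counts).
    """
    n = len(counts)
    tally = Counter(counts)
    cumulative = 0
    d = 0.0
    for k in range(max(counts) + 1):
        cumulative += tally[k]
        d = max(d, abs(cumulative / n - poisson_cdf(k, lam)))
    return d


def asymptotic_critical_value(n: int, c_alpha: float = 1.36) -> float:
    """Approximate 5% critical value for D (conservative for discrete references)."""
    return c_alpha / math.sqrt(n)
```

For a library, `lam` would be the element length times the per-nucleotide mutation probability, and `counts` the mutation numbers of the 40 sequenced clones.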
Finally, none of the 240 sequenced plasmids lacked an insert: every sequenced plasmid carried either the mutated or the wildtype insert.
To make the whole-system libraries, we excised the mutated cis-element from each cis library by restriction digestion and cloned it into the plasmid library already containing trans-element mutants with the corresponding mutation probability. System libraries created in this way likely have somewhat reduced diversity compared to either the cis or the trans libraries, as by chance some mutants present in the component libraries would not make it into the combined system library. By design, however, this potential loss of diversity affects cis and trans mutants equally and should not be inflated for one component relative to the other.
In this study, we set out to understand the phenotypic effects of mutations and to connect the DMEs to the epistasis between the two components. To achieve this, we investigated the DMEs of random mutant libraries, without knowing the identity of the individual mutants. We did not characterize individual mutants, because drawing conclusions about the effects of specific mutations would require analyzing an unachievably high number of mutants. The relatively large number of mutations we introduced, in particular in the trans-element, where each individual in the low mutation probability library carried on average seven point mutations, meant that we covered a very large mutational space. Discerning the effects of individual mutations from the interactions between them with adequate statistical power would therefore require characterizing an astronomical number of mutants. Sparse sampling of such a huge sequence space, which is the best we could achieve, would tell us nothing about how individual positions affect the overall DME.
To obtain the distributions of phenotypic effects of mutations in mutant libraries, we used flow cytometry/fluorescence-activated cell sorting (FACS) to measure expression levels of a yellow fluorescent protein. For all libraries, we measured gene expression both in the absence and in the presence of the transcription factor CI, controlled by the absence or presence of the inducer aTc, respectively. Throughout, we use log10 of expression level as our phenotype of interest. Mutant libraries, as well as the wildtype construct, were grown overnight in M9 minimal medium supplemented with 1% casamino acids, 2% glucose, and 30 μg/ml kanamycin, either without or with 8 ng/ml aTc. From this point on, the investigator was blinded to the identity of each processed library. Overnight populations were diluted 100-fold, grown for 2 hr, diluted a further 10-fold, and their fluorescence was measured in a BD FACSAria™ III cell sorter. Fluorescence of 500,000 cells was measured for each replicate of each library. Two independent replicates of each mutant library and of the monoclonal wildtype population were measured in this manner in each of the two growth conditions. All flow cytometry data were analyzed in FlowJo version 10.0.8r1, and measurements with extreme FSC-A and SSC-A values were excluded from the analyses. The two replicate measurements of each library exhibited the same distributions of fluorescence phenotypes (K-S test) and were pooled, giving one million counts for each library.
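A minimal sketch of the log10 transformation and of the replicate check before pooling is given below. It computes the two-sample K-S statistic directly and pools only when D is below the asymptotic critical value c(α)·√((n+m)/(nm)), with c = 1.358 for α = 0.05. Dropping non-positive events before taking the logarithm is our assumption; the text does not say how such events were handled.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def to_log10(fluorescence: Sequence[float]) -> List[float]:
    """log10-transform raw fluorescence; non-positive events are dropped (assumption)."""
    return [math.log10(x) for x in fluorescence if x > 0]


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions, so the supremum is reached at an observed value.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def pool_replicates(rep1: Sequence[float], rep2: Sequence[float],
                    c_alpha: float = 1.358) -> List[float]:
    """Pool two replicates unless the K-S test rejects equality at alpha = 0.05."""
    n, m = len(rep1), len(rep2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    if ks_two_sample(rep1, rep2) > critical:
        raise ValueError("replicate distributions differ; not pooling")
    return list(rep1) + list(rep2)
```

With two replicates of 500,000 events each, pooling yields the one million log10 values per library used for each DME.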
We verified that 1 million individual measurements, together with a library diversity of at least 30,000 mutants, accurately described the phenotypic distributions of possible mutations. To ensure that we captured a large proportion of all possible phenotypic effects, we subsampled progressively smaller numbers of measurements (n = 500,000; n = 250,000; n = 100,000; n = 50,000) from the cis- and trans-element mutant libraries in each relevant environment (both absence and presence of CI for cis-element libraries, and only presence of CI for trans-element libraries). We randomly subsampled each dataset 50 times and quantified the difference between each subsample and the corresponding full dataset using a K-S test; the distributions of phenotypes in the subsamples were not statistically different from the distribution of the full dataset (Figure 2—source data 4).
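The subsampling scheme can be sketched as follows. Each subsample would then be compared with the full dataset using a two-sample K-S test (for example `scipy.stats.ks_2samp`). The coverage function is an added back-of-the-envelope check that assumes, hypothetically, that all 30,000 mutants are equally abundant; it gives the expected fraction of distinct mutants represented at least once among n sampled cells.

```python
import random
from typing import Dict, List, Sequence

SUBSAMPLE_SIZES = (500_000, 250_000, 100_000, 50_000)


def draw_subsamples(data: Sequence[float], sizes: Sequence[int] = SUBSAMPLE_SIZES,
                    repeats: int = 50, seed: int = 0) -> Dict[int, List[List[float]]]:
    """Draw `repeats` random subsamples (without replacement) of each size."""
    rng = random.Random(seed)
    return {n: [rng.sample(data, n) for _ in range(repeats)] for n in sizes}


def expected_coverage(n_mutants: int, n_cells: int) -> float:
    """Expected fraction of equally abundant mutants sampled at least once (assumption)."""
    return 1.0 - (1.0 - 1.0 / n_mutants) ** n_cells
```

Under the equal-abundance assumption, one million cells cover essentially every mutant, whereas the smallest subsample (50,000 cells) is expected to include roughly 81% of them.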
Shannon entropy was used to estimate how uniformly each distribution is spread across the whole range of possible phenotypes. The range of possible phenotypes was defined by the minimum and maximum fluorescence measurements in the entire dataset, across all measured mutant library DMEs. Entropy was calculated as:

$$S = -\sum_k P_k \log\frac{P_k}{\Delta}$$

where $P_k$ is the frequency of fluorescence measurements in the $k$th bin, and $\Delta$ is the bin width, which was set to 0.05. In principle, entropy estimates depend on the bin width, so we explicitly checked that our conclusions do not depend on this particular choice. The error associated with each entropy estimate was calculated by standard bootstrapping. We performed a nonparametric permutation test to assess whether differences in entropy were significant.
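A minimal sketch of the entropy estimate, its bootstrap error, and the permutation test is given below. The natural logarithm is our assumption (the base only rescales S). The bin range [lo, hi] should be the global minimum and maximum across all libraries, and the numbers of bootstrap and permutation replicates are illustrative, not those used in the study.

```python
import math
import random
import statistics
from typing import List, Sequence


def bin_frequencies(values: Sequence[float], lo: float, hi: float,
                    width: float = 0.05) -> List[float]:
    """Relative frequencies P_k in bins of `width` spanning [lo, hi]."""
    n_bins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        k = min(max(int((v - lo) / width), 0), n_bins - 1)  # clamp edge values
        counts[k] += 1
    return [c / len(values) for c in counts]


def entropy(values: Sequence[float], lo: float, hi: float, width: float = 0.05) -> float:
    """S = -sum_k P_k log(P_k / width); empty bins contribute nothing."""
    return -sum(p * math.log(p / width)
                for p in bin_frequencies(values, lo, hi, width) if p > 0)


def bootstrap_se(values: Sequence[float], lo: float, hi: float, width: float = 0.05,
                 n_boot: int = 200, seed: int = 0) -> float:
    """Standard error of S from resampling `values` with replacement."""
    rng = random.Random(seed)
    boots = [entropy(rng.choices(values, k=len(values)), lo, hi, width)
             for _ in range(n_boot)]
    return statistics.stdev(boots)


def permutation_p_value(a: Sequence[float], b: Sequence[float], lo: float, hi: float,
                        width: float = 0.05, n_perm: int = 200, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in entropy between samples a and b."""
    rng = random.Random(seed)
    observed = abs(entropy(a, lo, hi, width) - entropy(b, lo, hi, width))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(entropy(pooled[:len(a)], lo, hi, width)
                   - entropy(pooled[len(a):], lo, hi, width))
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)
```

Dividing by Δ inside the logarithm makes S approximate the differential entropy of the underlying density, which is why a narrow distribution gets a negative value and a flatter one a higher value.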
We randomly isolated 20 cis, 20 trans, and 20 system mutants from each mutation probability library, giving 180 isolates. Power analysis (power.anova.test function in R) indicated that 20 samples per category were sufficient to detect differences in gene expression noise of 2%, at a significance level of 0.05 and power greater than 0.9. In a flow cytometer, we measured gene expression in the presence of CI in 100,000 individual counts, for two replicates of each mutant isolate as well as of the wildtype. We first confirmed with a K-S test that the two replicate distributions for each mutant were not significantly different. We then combined the two replicate measurements and randomly sampled, without replacement, 5000 reads from this common pool ten times (Figure 2—figure supplement 3, 4 and 5). Gene expression noise for each sub-sample of 5000 reads was estimated as the coefficient of variation, as in other studies of gene expression noise (Metzger et al., 2015). Note that gene expression noise measured in this manner has two sources: the heterogeneity of gene expression between individual cells, and the measurement error inherent in the flow cytometer. We assume that the measurement error is constant across all mutants, so that changes in the coefficient of variation indicate a difference in the heterogeneity of gene expression between genetically identical individuals. Using ANOVA (aov function in R, version 3.4.1), we tested for differences in gene expression noise among the 60 mutants of the same mutation probability, as well as among all 180 tested mutants. We performed this test separately for mutants with no expression (those fully contained in the ‘no expression’ category) and for all other mutants.
These two groups were treated separately because the flow cytometry fluorescence measurement does not respond to the same intracellular environment when the cell is producing a fluorescent protein as when it is not. As such, estimates of gene expression noise in these two categorically different types of intracellular environment are not directly comparable (Figure 2—source data 1). Note that the tests carried out across all three mutation probabilities, which included 95 mutants with no fluorescence and 85 mutants with fluorescence, found no differences in gene expression noise. The probability that mutants with significantly different noise exist in the library but go undetected at these sample sizes is less than 10⁻⁴ (calculated using the experimentally observed within- and between-group variance and a power of 0.9). For a library of 30,000 random mutants, this would mean that no more than 10 system mutants could have different gene expression noise. Such a small number of mutants could skew the observed system DMEs and increase the frequency of intermediate phenotypes only if they were strongly overrepresented in the library. This is unlikely, since all 180 isolated mutants had growth rates comparable to the wildtype; an increase in noise would therefore have to be hugely beneficial on a very short time scale (around 10 generations).
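The noise estimate described above (pooling two replicates, drawing ten random subsamples of 5000 reads without replacement, and taking the coefficient of variation of each) can be sketched as follows. Whether the CV is computed on linear or log10 fluorescence is not stated here, so the sketch simply uses whatever values it is given; the function name is hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def subsample_cv(rep1: Sequence[float], rep2: Sequence[float],
                 n_reads: int = 5000, n_draws: int = 10, seed: int = 0) -> List[float]:
    """Coefficient of variation (sample SD / mean) of repeated subsamples of pooled reads."""
    pooled = list(rep1) + list(rep2)
    rng = random.Random(seed)
    cvs = []
    for _ in range(n_draws):
        sub = rng.sample(pooled, n_reads)  # sampling without replacement
        cvs.append(statistics.stdev(sub) / statistics.mean(sub))
    return cvs
```

The ten CV values per isolate would then serve as the response variable in the ANOVA, with mutant identity as the factor (for example with R's `aov`).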
Consider the effects of mutations, $m_\text{cis}$ and $m_\text{trans}$, in the two components of a system. As our phenotype of interest is log10 of fluorescence level, we assume that, if two mutations are independent, their effects are additive: $m_\text{expected} = m_\text{cis} + m_\text{trans}$. A deviation from this additive expectation is termed epistasis: $\varepsilon = m_\text{observed} - m_\text{expected}$. Note that additivity on the log scale is equivalent to a multiplicative prediction on the linear scale. When considering a library of mutants in two components, their effects are represented by the corresponding DMEs, $f_\text{cis}(m)$ and $f_\text{trans}(m)$, where $m$ is the ‘true’ effect of a random mutation on the wildtype, in the absence of experimental noise and measurement error. If one could obtain such ‘true’ DMEs for mutant libraries, the additive null expectation would be a simple convolution of the two distributions: $f_\text{expected} = f_\text{cis} * f_\text{trans}$, where $*$ denotes convolution. Any deviation of the observed combined library DME ($f_\text{observed}$) from $f_\text{expected}$ indicates epistasis.
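For a single double mutant, the definition above reduces to simple arithmetic on log10 fluorescence. The sketch below (hypothetical function name, illustrative values) computes ε from the raw fluorescence of the wildtype, the two single mutants, and the double mutant, and shows that ε = 0 exactly when the double mutant's fold change equals the product of the single-mutant fold changes.

```python
import math


def epistasis(fl_double: float, fl_cis: float, fl_trans: float, fl_wt: float) -> float:
    """epsilon = m_observed - (m_cis + m_trans), with m = log10(fluorescence / wildtype)."""
    m_cis = math.log10(fl_cis / fl_wt)
    m_trans = math.log10(fl_trans / fl_wt)
    m_observed = math.log10(fl_double / fl_wt)
    return m_observed - (m_cis + m_trans)


# Illustrative numbers: the cis mutation doubles expression, the trans mutation halves it.
print(epistasis(fl_double=100.0, fl_cis=200.0, fl_trans=50.0, fl_wt=100.0))  # ~0: multiplicative
print(epistasis(fl_double=400.0, fl_cis=200.0, fl_trans=50.0, fl_wt=100.0))  # log10(4) > 0
```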
In a realistic setting, experimental noise and instrumental error prevent direct measurement of the ‘true’ underlying DME ($f$), so any measured DME ($F$) combines its ‘true’ DME $f$ with all sources of error. If the experimental noise and instrumental error do not change between mutants or across the dynamic range of the measurement, as is the case for our data (Figure 2—source data 1; Figure 3—source data 3), then the observed DME of a mutant library is the convolution of its ‘true’ underlying DME with the measured wildtype distribution ($F_\text{wt}$): $F = f * F_\text{wt}$. In other words, the ‘true’ DME ($f$) describes how mutations shift wildtype expression. It follows that the additive prediction for the combined library, in the absence of epistasis, is a convolution of three distributions:
$$F^{+}_\text{expected} = f^{+}_\text{expected} * F^{+}_\text{wt} = f^{+}_\text{cis} * f^{+}_\text{trans} * F^{+}_\text{wt}$$
where $f^{+}_\text{cis}$ and $f^{+}_\text{trans}$ are the ‘true’ distributions of the cis- and the trans-element, respectively, and the superscript ‘+’ indicates the presence of CI. The same equality holds in the absence of CI, but the analysis is then trivial: the trans library shows no difference from the corresponding wildtype $F^{-}_\text{wt}$, that is, $f^{-}_\text{trans}$ is the identity element for convolution (a delta distribution). For simplicity, we omit the superscript when discussing DMEs obtained in the presence of CI.
The additive prediction (additive in log10 of expression, i.e. multiplicative on the linear scale) can be rewritten in two equivalent forms:
$$F_\text{expected} = F_\text{cis} * f_\text{trans}$$
$$F_\text{expected} = f_\text{cis} * F_\text{trans}$$
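The equivalence of the two forms follows from the associativity and commutativity of convolution, given that each measured DME is its ‘true’ DME convolved with the wildtype distribution. A small numerical check with discrete histograms is sketched below. The three distributions are made-up toy values on a common bin width, and bin offsets are ignored, which is harmless here because both forms accumulate the same offsets.

```python
from typing import List, Sequence


def convolve(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Discrete convolution of two histograms defined on the same bin width."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Toy distributions (made up for illustration).
f_cis = [0.5, 0.3, 0.2]        # 'true' cis effects, in bins
f_trans = [0.1, 0.6, 0.3]      # 'true' trans effects, in bins
F_wt = [0.25, 0.5, 0.25]       # measured wildtype distribution

F_cis = convolve(f_cis, F_wt)      # measured cis DME = f_cis * F_wt
F_trans = convolve(f_trans, F_wt)  # measured trans DME = f_trans * F_wt

form_1 = convolve(F_cis, f_trans)  # F_cis * f_trans
form_2 = convolve(f_cis, F_trans)  # f_cis * F_trans
assert all(abs(a - b) < 1e-12 for a, b in zip(form_1, form_2))
```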
To obtain either of the underlying ‘true’ DMEs, we would need to deconvolve the wildtype distribution from one of the measured component library DMEs. However, although deconvolution is well understood and well behaved analytically, it is known to be highly unstable when applied to numerical datasets like ours. We would therefore need at least one of the component distributions in analytical form. Instead of fitting one of the measured DMEs to some analytical representation and then deconvolving, we directly ‘reverse engineered’ one of the component DMEs. We chose the cis DME in the presence of CI, as the simpler of the two. Concretely, we searched for the $f_\text{cis}$ whose convolution with $F_\text{wt}$ matches the observed $F_\text{cis}$ as closely as possible. We assumed $f_\text{cis}$ to belong to the gamma family, a relatively flexible family of curves often used to describe DMEs (Figure 2—figure supplement 7A). We optimized three parameters of $f_\text{cis}$ (shape, scale, and location) to minimize the squared differences between the observed cis-element DME ($F_\text{cis}$) and $f_\text{cis} * F_\text{wt}$ (Figure 2—figure supplement 7B). Note that the inferred ‘true’ cis DME indicates that effectively all mutations in cis are either neutral or increase expression. Consequently, convolving the ‘true’ cis distribution with any other distribution shifts it toward higher expression, regardless of where the original distribution is centered. This is in line with the usual treatment of single-mutation effects: if the effect of a mutation is positive, the additive model states that the effect remains positive (and the same), independently of the genetic background.
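The reverse-engineering step can be sketched as a least-squares fit of a shifted (three-parameter) gamma distribution. The sketch below discretizes the gamma density on the bin grid and uses a coarse grid search in place of whatever optimizer was actually used. The bin width, the parameter grids, and the alignment of $F_\text{cis}$ with $F_\text{wt}$ (index 0 of $f_\text{cis}$ meaning no change in expression) are all assumptions.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

WIDTH = 0.05  # assumed bin width on the log10 expression axis


def convolve(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Discrete convolution of two histograms on the same bin width."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def shifted_gamma_pmf(shape: float, scale: float, loc: float, n_bins: int,
                      width: float = WIDTH) -> List[float]:
    """Gamma(shape, scale) density shifted by `loc` and renormalised over n_bins.

    Index k is an expression increase of k bins; the density is sampled at the
    bin centre (k + 0.5) * width.
    """
    weights = []
    for k in range(n_bins):
        x = (k + 0.5) * width - loc
        if x <= 0:
            weights.append(0.0)
            continue
        log_pdf = ((shape - 1) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
        weights.append(math.exp(log_pdf))
    total = sum(weights)
    return [w / total for w in weights]


def fit_f_cis(F_cis: Sequence[float], F_wt: Sequence[float],
              shapes: Sequence[float], scales: Sequence[float],
              locs: Sequence[float]) -> Tuple[Tuple[float, float, float], float]:
    """Grid search for (shape, scale, loc) minimising sum((F_cis - f_cis * F_wt)^2)."""
    n_bins = len(F_cis) - len(F_wt) + 1
    best, best_sse = None, math.inf
    for shape, scale, loc in product(shapes, scales, locs):
        predicted = convolve(shifted_gamma_pmf(shape, scale, loc, n_bins), F_wt)
        sse = sum((o - p) ** 2 for o, p in zip(F_cis, predicted))
        if sse < best_sse:
            best, best_sse = (shape, scale, loc), sse
    return best, best_sse
```

Fitting the forward model in this way avoids ever dividing by the noisy wildtype spectrum, which is the source of instability in direct deconvolution.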
After obtaining the ‘true’ cis DME ($f_\text{cis}$), we convolved it with the observed trans DME to produce a naïve null model for the system DME in the absence of epistasis. Convolving without any adjustment places part of the predicted system DME beyond the highest experimentally recorded fluorescence levels (Figure 2—figure supplement 7C). Because we use one of the strongest known promoters, the Lambda PR, predictions of higher expression than the wildtype are not biologically meaningful, as no combination of component mutations could experimentally produce such high expression levels. To reflect this maximal biologically attainable expression level, we introduced a cutoff, effectively treating any mutant that would result in higher expression as having wildtype expression. In practice, we (i) removed the high-expression peak from the trans DME (as convolving those mutants gives rise to higher expression levels), (ii) convolved the remainder of the trans DME with $f_\text{cis}$, (iii) introduced a smooth cutoff at maximum expression levels in the convolved distribution, and (iv) added back the residual part of the high-expressing mutants. More specifically, in the first step we fitted a fraction α of the wildtype distribution in the absence of CI ($F^{-}_\text{wt}$) to minimize its squared difference from the right-hand part of the high-expression trans peak (Figure 2—figure supplement 7D,E). In this way, we obtained a smooth remainder to convolve with $f_\text{cis}$, which produces biologically realistic values. Finally, we added back the distribution of the wildtype in the absence of CI ($F^{-}_\text{wt}$) so as to obtain a properly normalized predicted distribution (Figure 2—figure supplement 7F).
Because we impose this limit on the highest biologically attainable expression level, convolving a hypothetical distribution consisting predominantly of high-expression phenotypes with the ‘true’ cis distribution ($f_\text{cis}$) would result in only high-expression phenotypes.
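The four-step naïve prediction described above can be sketched on binned distributions as follows. This is an illustration under stated assumptions, not the authors' code: α is the closed-form least-squares fit over the bins from `start` upward, negative remainders are clipped to zero, and a hard cutoff that reassigns overflow mass to the wildtype (no CI) distribution stands in for the smooth cutoff, whose exact form is not given here. All distributions share one expression grid, and index 0 of $f_\text{cis}$ is assumed to mean no change in expression.

```python
from typing import List, Sequence


def convolve(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Discrete convolution of two histograms on the same bin width."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def fit_alpha(F_trans: Sequence[float], F_wt_minus: Sequence[float], start: int) -> float:
    """Least-squares alpha minimising sum_k (F_trans[k] - alpha * F_wt_minus[k])^2, k >= start."""
    num = sum(F_trans[k] * F_wt_minus[k] for k in range(start, len(F_trans)))
    den = sum(F_wt_minus[k] ** 2 for k in range(start, len(F_trans)))
    return num / den


def naive_prediction(F_trans: Sequence[float], F_wt_minus: Sequence[float],
                     f_cis: Sequence[float], alpha: float, max_bin: int) -> List[float]:
    """Predicted system DME without epistasis, capped at bin `max_bin` (<= len(F_trans) - 1)."""
    n = len(F_trans)
    # (i) remove the high-expression peak, modelled as alpha * F_wt_minus
    remainder = [max(0.0, t - alpha * w) for t, w in zip(F_trans, F_wt_minus)]
    # (ii) convolve the remainder with the 'true' cis DME
    conv = convolve(remainder, f_cis)
    # (iii) hard cutoff: mass above max_bin is treated as wildtype (no CI) expression
    kept = conv[: max_bin + 1] + [0.0] * (n - max_bin - 1)
    overflow = sum(conv[max_bin + 1:])
    w_total = sum(F_wt_minus)
    # (iv) add back the removed peak (alpha * F_wt_minus) together with the overflow
    pred = [kept[k] + (alpha * w_total + overflow) * F_wt_minus[k] / w_total
            for k in range(n)]
    total = sum(pred)
    return [p / total for p in pred]  # renormalise (guards against clipping in step i)
```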
The prediction for the system DME obtained in this manner rests on only two assumptions: the additive assumption of no epistasis, and the limit to maximal attainable expression levels. As such, this naïve prediction explicitly disregards any information about the genetic regulatory structure of the system. For each mutation probability, we evaluated whether the predicted DME differs from the experimentally observed DME using Pearson’s Chi-squared test comparing the frequencies of mutants in the three expression categories (‘no’, ‘intermediate’, and ‘high’ expression).
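With three expression categories, a Pearson goodness-of-fit test of observed counts against predicted frequencies has two degrees of freedom, and for two degrees of freedom the chi-squared survival function is exactly exp(−χ²/2), so no statistics library is needed. The sketch assumes the goodness-of-fit form (observed counts vs. predicted proportions); a 2 × 3 contingency table of predicted vs. observed counts would also have two degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_three_categories(observed: Sequence[int],
                          predicted_freqs: Sequence[float]) -> Tuple[float, float]:
    """Pearson chi-squared statistic and p-value for 3 categories (df = 2)."""
    if len(observed) != 3 or len(predicted_freqs) != 3:
        raise ValueError("expected values for 'no', 'intermediate' and 'high' expression")
    n = sum(observed)
    expected = [n * p for p in predicted_freqs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-stat / 2.0)  # exact chi-squared survival function for df = 2
    return stat, p_value
```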
We created 150 mutants with a random point mutation in cis, and 150 mutants with a random point mutation in trans. Cis-element mutants were identified by Sanger sequencing of 400 randomly selected mutants from the cis-element library with low (0.01) mutation probability. To obtain 150 trans-element mutants, we repeated the random mutagenesis protocol on cI with a very low mutation rate (yielding approximately one mutation per kb). From this reaction, we randomly selected and sequenced 500 mutants in order to identify 150 that contained only a single point mutation. Then, we created a library of 150 double mutants, with one point mutation in the cis- and the other in the trans-element. These 150 double mutants were unique, as each one consisted of a unique pairing between cis and trans point mutations, so that no point mutations were found in more than one double mutant (Figure 3—source data 1).
In a plate reader, we measured the fluorescence of all 150 double mutants as well as of their corresponding single point mutants. The mutants, as well as the wildtype, were grown overnight in M9 minimal medium supplemented with casamino acids and 30 μg/ml kanamycin, either in the absence of CI or in the presence of CI (induced with 8 ng/ml aTc). The overnight populations were diluted 1000-fold, grown to an OD600 of approximately 0.05, and their fluorescence was measured in a Bio-Tek Synergy H1 plate reader. This procedure was replicated six times for each mutant. We performed a series of pairwise t-tests to determine which isolates had fluorescence significantly different from the wildtype. Using a K-S test, we tested whether the system double mutants had a higher frequency of intermediate phenotypes.
Consistent with the convolution-based analyses, we consider expression level as the log10 of fluorescence, so that epistasis is defined as a deviation from an additive model, $\varepsilon = m_\text{system} - (m_\text{cis} + m_\text{trans})$, where $m_\text{system}$ is the wildtype-relative fluorescence of a system double mutant, and $m_\text{cis}$ and $m_\text{trans}$ are the wildtype-relative fluorescence of the two corresponding single mutants. Epistasis was calculated in the presence of CI, since in the absence of CI all trans mutants exhibited wildtype expression and all system double mutants had the same expression as the cis mutant alone. To determine statistically which double mutants exhibited epistasis (i.e. $\varepsilon \neq 0$), we conducted a series of FDR-corrected t-tests. Errors were calculated from six replicates, using error propagation to account for the variance due to normalization by the wildtype. Variance was not significantly different between measured mutants (Figure 3—source data 2).
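A sketch of the per-double-mutant calculation is given below. Writing each m as a log10 ratio to the wildtype gives ε = L_sys − L_cis − L_trans + L_wt, where L is the mean log10 fluorescence of each genotype, so the wildtype enters only once. Assuming independent measurement errors (our assumption), the standard error follows by adding the four squared standard errors. The Benjamini-Hochberg step-up procedure is shown as one common FDR correction; the text does not say which FDR method was used.

```python
import math
from typing import List, Sequence, Tuple


def epistasis_with_se(L_sys: float, L_cis: float, L_trans: float, L_wt: float,
                      se_sys: float, se_cis: float, se_trans: float,
                      se_wt: float) -> Tuple[float, float]:
    """epsilon = (L_sys - L_wt) - [(L_cis - L_wt) + (L_trans - L_wt)] and its propagated SE."""
    eps = L_sys - L_cis - L_trans + L_wt
    se = math.sqrt(se_sys ** 2 + se_cis ** 2 + se_trans ** 2 + se_wt ** 2)
    return eps, se


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value is under its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected
```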
Estimates of epistasis based on population-level fluorescence measurements in a plate reader might not be equivalent to estimates obtained by flow cytometry, particularly if gene expression noise varied significantly between mutants. To confirm that this is not the case in our study, we randomly selected 30 double mutants that, based on plate reader measurements, were in significant positive epistasis, 10 double mutants in significant negative epistasis, and 20 mutants not in significant epistasis. We then measured fluorescence in 100,000 individual reads for two replicates of each isolate in a flow cytometer, in the absence and in the presence of CI (Figure 3—figure supplement 3, 4 and 5). First, we tested whether gene expression noise was the same in single and double mutants, using the same kind of analysis as described above, and found no differences between mutants (mutants with no expression: F(83, 747) = 0.891, p=0.59; mutants with expression: F(95, 855) = 1.332, p=0.174) (Figure 3—source data 3). We also confirmed that noise in the single/double mutants did not differ from gene expression noise in isolates from the low, intermediate, and high mutation probability libraries (mutants with no expression: F(166, 1494) = 0.765, p=0.746; mutants with expression: F(180, 1620) = 1.385, p=0.485). We then calculated epistasis in the presence of CI from flow cytometry measurements in the same manner as for plate reader measurements (Figure 3—source data 4). To evaluate the significance of the calculated deviations from the additive expectation (epistasis), we applied error propagation to the standard deviation obtained from the combined flow cytometry distributions of the two replicates, rather than to the variance between replicate means (since the measured means for each isolated mutant were nearly identical between replicates).
Linear regression between the estimates of epistasis from the two types of measurements shows that flow cytometry gives the same description of epistasis as the plate reader measurements (F(1, 58) = 350.5; p<0.0001) (Figure 3—figure supplement 6).
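The agreement between plate reader and flow cytometry estimates is summarized by an ordinary least-squares regression and its F statistic with (1, n − 2) degrees of freedom. A minimal sketch follows; the function name is ours, and it would be applied to the paired epistasis estimates of the 60 re-measured double mutants.

```python
from typing import Sequence, Tuple


def ols_f_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Slope, intercept and F statistic (df = 1, n - 2) of the regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    f_stat = (sst - sse) / (sse / (n - 2))  # regression mean square / residual mean square
    return slope, intercept, f_stat
```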
Because all 150 double mutants were sequenced, we could test whether epistasis was associated with the location of mutations. For the trans-element, we distinguished three locations: the N-terminal and C-terminal domains, and the linker region between them (Figure 3—source data 1). For the cis-element, mutations could be in one of the CI operator sites, in the RNAP contact positions (−10 and −35 regions), in positions that contact both proteins, or in positions that contact neither (Figure 3—source data 1). We then tested whether the presence of epistasis depended on the location of point mutations using Pearson’s Chi-squared test, treating epistasis as a binary variable: either significant epistasis was present or it was absent.
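The location test amounts to a Pearson chi-squared test on a contingency table of location × epistasis (present or absent). The sketch computes the statistic and degrees of freedom; the p-value can then be read from the chi-squared distribution (for example with `scipy.stats.chi2.sf(stat, df)`). The table in the usage line is invented for illustration.

```python
from typing import Sequence, Tuple


def chi2_contingency(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical table: rows = trans locations (N-terminal, linker, C-terminal),
# columns = (significant epistasis, no significant epistasis).
print(chi2_contingency([[12, 8], [5, 15], [9, 11]]))
```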
We wanted to explore whether accounting for the genetic regulatory structure of the studied system would improve our ability to predict the system DME from the DMEs of its components. To this end, we combined the low mutation probability trans-element library (Figure 2E) with the high mutation probability cis-element library (Figure 2—figure supplement 1D) and measured the DME of this combined library as described above. We combined these two libraries because they have approximately the same average number of mutations (seven for the trans- and six for the cis-element), allowing a comparison that is not influenced by the number of mutations in each of the two elements.
To experimentally predict the frequencies of phenotypes in each category of the low mutation probability trans-element + high mutation probability cis-element library, we partitioned the low mutation probability trans-element library in the presence of CI, and the high mutation probability cis-element library in the absence of CI. Using FACS, we sorted each library into three bins, corresponding to the no expression, intermediate, and high expression phenotype categories. We sorted a minimum of 500,000 individuals into each bin and grew them overnight in LB with 30 μg/ml kanamycin. We then measured the DME of each trans-element partition after overnight growth, which gave us a measure of sorting accuracy. We isolated the plasmids from all six partitioned populations (three cis and three trans) and cloned all possible combinations of cis- and trans-element partitions to make nine new mutant libraries. We obtained a DME for each of these libraries as described previously.
We then obtained a prediction for the frequency of phenotypes in each of the three categories for the system mutant library consisting of low mutation probability trans- and high mutation probability cis-elements. To do so, we weighted the frequencies of phenotypes in each category of each partition combination library in Figure 5 by the frequency of that partition in the original trans-element library (Figure 2E). For example, the no expression trans + no expression cis library yielded 93.4% of phenotypes in the ‘no expression’ category and 6.6% in the ‘intermediate phenotype’ category. These were weighted by 0.686, the frequency of phenotypes in the no expression trans-element category from which this particular partition combination library was derived (Figure 2E). The weighted frequencies in the three categories (‘no expression phenotypes’, ‘intermediate phenotypes’, and ‘high expression phenotypes’) were summed across all nine partition combination DMEs to obtain a prediction for the distribution of phenotypes in the whole system library. As a control to confirm that it is the presence of the cis mutants that leads to a more accurate prediction of frequencies in the three categories, we also used the frequencies of phenotypes based on sorting accuracy. These two predicted distributions (the experimental prediction based on partition libraries, and the prediction based on sorting accuracy) were compared to the actual distributions using Pearson’s Chi-squared test.
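The weighting scheme can be sketched as below. Each partition combination library contributes its category frequencies multiplied by its weight, and the contributions are summed. The final renormalization is our addition, a safeguard in case the weights do not sum to one, and the function names are hypothetical. The usage line reproduces the worked example quoted above.

```python
from typing import List, Sequence, Tuple

CATEGORIES = ("no expression", "intermediate", "high expression")


def weighted_contribution(weight: float, freqs: Sequence[float]) -> List[float]:
    """Category frequencies of one partition combination library, scaled by its weight."""
    return [weight * f for f in freqs]


def predicted_distribution(libraries: Sequence[Tuple[float, Sequence[float]]]) -> List[float]:
    """Sum weighted contributions over libraries and renormalise to frequencies."""
    totals = [0.0] * len(CATEGORIES)
    for weight, freqs in libraries:
        for c, value in enumerate(weighted_contribution(weight, freqs)):
            totals[c] += value
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Worked example from the text: 93.4% 'no expression' and 6.6% 'intermediate',
# weighted by 0.686 (frequency of the no expression trans partition).
print(weighted_contribution(0.686, [0.934, 0.066, 0.0]))  # roughly [0.641, 0.045, 0.0]
```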
We tested whether accounting for the genetic regulatory structure improved the naïve convolution-based prediction of the system DME. As in the experimental approach, we incorporated knowledge of the effects of cis mutations in the absence of CI ($F^{-}_\text{cis}$). In addition to the previous analysis, we assume that trans mutants with high expression phenotypes (namely, those with the same expression as the wildtype in the absence of CI) are loss-of-function mutants that do not bind any cis mutants. To incorporate this information into the convolution, we (i) removed the high expression peak from the trans DME, (ii) convolved the remainder of the trans DME with $f_\text{cis}$, (iii) introduced a cutoff, and then, (iv) instead of adding back the high-expression wildtype distribution in the absence of CI ($F^{-}_\text{wt}$), added the distribution of cis mutations in the absence of CI ($F^{-}_\text{cis}$). As in the naïve convolution, this distribution is added in proportion to the removed high-expression trans phenotypes, so that the whole distribution is normalized. We then evaluated the difference between the predicted DME and the observed system DME using Pearson’s Chi-squared test, as described previously. We did this for the three system libraries shown in Figure 2 and for the high mutation probability cis + low mutation probability trans library shown in Figure 4.
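The only change from the naïve model is in step (iv): the removed high-expression trans fraction is refilled with the cis DME measured without CI instead of the wildtype distribution without CI. The sketch below isolates that step. Here `kept` stands for the cut-off convolution produced by steps (i) to (iii), and the names and toy values are hypothetical.

```python
from typing import List, Sequence


def add_back(kept: Sequence[float], fill: Sequence[float], alpha: float) -> List[float]:
    """Step (iv): add alpha * fill (a normalised distribution) and renormalise."""
    pred = [k + alpha * f for k, f in zip(kept, fill)]
    total = sum(pred)
    return [p / total for p in pred]


kept = [0.2, 0.3, 0.0, 0.0]          # toy output of steps (i) to (iii)
F_wt_minus = [0.0, 0.0, 0.0, 1.0]    # toy wildtype distribution without CI
F_cis_minus = [0.0, 0.5, 0.5, 0.0]   # toy cis DME without CI

naive = add_back(kept, F_wt_minus, alpha=0.5)
structure_aware = add_back(kept, F_cis_minus, alpha=0.5)
```

In the toy example, the naïve refill puts all of the removed mass at the highest expression bin, while the structure-aware refill spreads it over intermediate bins, mirroring how loss-of-function CI exposes the cis mutants' own expression levels.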
While considering the effects of cis mutations on RNAP binding (and hence accounting for the genetic regulatory structure of the system) explained much of the intermolecular epistasis we observed in system DMEs, we wanted to evaluate the extent to which other mechanisms might contribute to epistasis between the cis- and the trans-element. To this end, we designed a library of 150 system double mutants by combining point mutations in cis- and trans-elements with specific phenotypes. Namely, we selected 150 trans mutants that exhibited full repression, and 150 cis-element mutants that exhibited high expression in the absence of CI (Figure 6—figure supplement 1). The system double mutant library made in this manner corresponds to the partition combination shown in Figure 5G. Note that not all 150 double mutants had a unique point mutation in the cis-element, since we could not identify 150 cis mutations that did not significantly affect expression levels in the absence of CI. We then measured fluorescence levels for all double mutants and their constituent single mutants, and from those measurements calculated epistasis as described above. Finally, we tested whether the presence of epistasis depended on the location of point mutations (Figure 6—figure supplement 2; Figure 6—source data 2) using Pearson’s Chi-squared test, as previously described.
Effects of new mutations on fitness: insights from models and data. Annals of the New York Academy of Sciences 1320:76–92. https://doi.org/10.1111/nyas.12460
Empirical fitness landscapes and the predictability of evolution. Nature Reviews Genetics 15:480–490. https://doi.org/10.1038/nrg3744
The distribution of fitness effects of new mutations. Nature Reviews Genetics 8:610–618. https://doi.org/10.1038/nrg2146
“The arrival of the fittest”: Toward a theory of biological organization. Bulletin of Mathematical Biology 56:1–64.
Spontaneous mutation accumulation studies in evolutionary genetics. Annual Review of Ecology, Evolution, and Systematics 40:151–172. https://doi.org/10.1146/annurev.ecolsys.39.110707.173437
Direct deconvolution density estimation of a mixture distribution motivated by mutation effects distribution. Journal of Nonparametric Statistics 22:1–22. https://doi.org/10.1080/10485250903085847
Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence. Journal of Molecular Biology 240:421–433. https://doi.org/10.1006/jmbi.1994.1458
Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. Journal of Theoretical Biology 272:141–144. https://doi.org/10.1016/j.jtbi.2010.12.015
Mutational effects and the evolution of new protein functions. Nature Reviews Genetics 11:572–582. https://doi.org/10.1038/nrg2808
Genotype networks shed light on evolutionary constraints. Trends in Ecology & Evolution 26:577–584. https://doi.org/10.1016/j.tree.2011.07.001
Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. Journal of Molecular Biology 320:85–95. https://doi.org/10.1016/S0022-2836(02)00400-X
Multiple Fitness Peaks and Epistasis. Annual Review of Ecology and Systematics 26:601–629. https://doi.org/10.1146/annurev.es.26.110195.003125
Patricia J Wittkopp, Reviewing Editor; University of Michigan, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Intermolecular epistasis increases phenotypic variation in a gene regulatory system" for consideration by eLife. Your article has been reviewed by three peer reviewers and the evaluation has been overseen by Patricia Wittkopp as the Reviewing Editor and Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has assembled a list of concerns for you to consider.
The creative approach to investigating an important question (interactions among genetic changes affecting cis- and trans-regulatory components of a regulatory network) was applauded, but concerns were raised by two of the reviewers about the null model used to infer the effects of epistasis. Following discussion, we all agreed that this is a major concern, since this test underlies the core observation and advance of the paper, and must be convincingly addressed. I should note that the remedy for this concern is not straightforward to us, and we anticipate that it will require a proof-of-principle experiment with a small number of mutants and/or a much more sophisticated mathematical/statistical treatment to better model the null hypothesis. We are concerned that sufficiently addressing this concern will take more than the 2 months typically allowed by eLife for revisions prior to publication and/or result in findings that are less novel, but have decided to extend an opportunity to try to address this concern in a response letter.
Instead of our usual policy of consolidating the remarks, I am attaching the full set of reviews because the concerns are explained in more detail there.
1) Revise the null model to better align with prior work and theory, possibly including some proof-of-principle tests of the strategy adopted.
2) Provide stronger support for the conclusion that the interactions are epistatic (non-additive).
The manuscript uses random mutants in a trans-acting repressor and its cis target and finds epistasis such that the combined cis and trans effects are not predicted by their effects alone. The work is conceptually novel and the data provide new insight into how regulatory systems might evolve. My main concern is what the expected phenotype distribution should look like in the absence of epistasis. In part this could be due to a lack of clarity in the definition of epistasis or the clarity of writing.
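Because the choice of null model decides whether a double mutant counts as epistatic, it can help to see the two common definitions side by side. The sketch below is a minimal illustration, not the authors' analysis: it scores one hypothetical pair of mutations (invented phenotype values, wild type = 1.0) against an additive null and a multiplicative null, the latter being equivalent to additivity on a log scale.

```python
import math

# Hypothetical phenotypes (relative expression, wild type = 1.0).
# These values are invented for illustration and are not data from the study.
WT, A, B, AB = 1.0, 0.5, 0.5, 0.25


def epistasis_additive(wt, a, b, ab):
    """Deviation from an additive null: expected ab = wt + (a - wt) + (b - wt)."""
    expected = a + b - wt
    return ab - expected


def epistasis_multiplicative(wt, a, b, ab):
    """Deviation from a multiplicative null: expected ab = wt * (a / wt) * (b / wt).
    Taking logs turns this into additivity on a log scale."""
    expected = a * b / wt
    return math.log(ab) - math.log(expected)


print(epistasis_additive(WT, A, B, AB))        # 0.25: epistatic under the additive null
print(epistasis_multiplicative(WT, A, B, AB))  # 0.0: not epistatic under the multiplicative null
```

The same double mutant is epistatic under one definition and not under the other, which is why the scale on which effects are assumed to combine needs to be stated explicitly.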
1) Epistasis is defined differently in different fields. In quantitative genetics (as in this study) it is a non-additive interaction. It appears the assumption is that the cis and trans components are additive on a log scale in the absence of epistasis. However, for fitness the assumption is often that epistasis is a deviation from multiplicative effects. The definition should be clearly stated.
2) Certain epistatic relationships are entirely expected. For example, in the absence of CI the phenotype of the cis + trans mutants should equal that of cis mutants regardless of trans effect since trans mutants are not expressed. This is nicely observed in Figure 2—figure supplement 1.
3) The expected phenotype distribution in the absence of epistasis is not clear and perhaps incorrect. The interpretation of Figure 2 is that there are more cells with an intermediate expression phenotype than expected. However, in a simple additive case one should expect panel D (cis) + panel G (trans) to produce mostly intermediate phenotypes, since the most common phenotype in panel D is low expression, the most common in panel G is high expression, and low + high = intermediate. That said, the authors recognize that high expression in panel G (trans) likely reflects non-functional CI. To account for this, they subtract out the high expression of panel G using WT in the absence of CI. This doesn't make sense. When non-functional CI is combined with cis mutations, one would expect to see the phenotype distribution of cis in the absence of CI, as shown in Figure 2—figure supplement 1D, which is much more uniform and has many intermediate phenotypes. By eye, this would generate an expected pattern close to the one observed, with a lot of intermediate phenotypes. The interpretation for the intermediate class is that mutant CI proteins are binding to mutant cis-elements. However, I don't think the authors have clearly shown that the increased frequency of the intermediate class is different from what one would expect in the absence of epistasis.
There is not an obvious solution to calculating the expected phenotype distribution. The options I see are given below. However, engaging a mathematical biologist or statistician to appropriately generate expected phenotype distributions may yield better options.
a) Average of cis and trans. Note that the Methods state that convolution of cis and trans yields phenotypes outside of the biological range. The average of two numbers can't be above both numbers, so the convolution must be cis + trans rather than (cis + trans)/2.
b) Use a categorical model where cis + non-functional trans (high) = cis in the absence of CI; cis + functional CI (low) = cis; and cis + intermediate trans = weighted average of cis effects with CI and cis effects without CI, with the weighting depending on whether the intermediate is closer to low or high expression.
Figure 4 also shows expectations in the absence of epistasis that don't make sense. The results show that 7/9 system libraries show prevalent epistasis because adding in the cis mutants alters the phenotypic effects of the trans mutants. The figure and interpretation now seem to use a different definition of epistasis – the one used in classical genetics. In Figure 4 the grey shows unmodified trans and orange shows trans which is modified by cis. Panel G is an example where low trans is combined with high cis mutants. Unmodified (grey) is low, modified (orange) is intermediate or high. The observed distribution is spread across low, intermediate and high expression. But this is exactly what one would expect under an additive model of low + high effects.
4) The convolution of libraries to get an expectation. A gamma distribution was used for the cis library. It would be better to use the actual empirical distribution through random sampling of cis effects and trans effects with replacement. What assumption was made about the trans distribution? It is bimodal so not easily fit to a standard distribution. Is it a gamma after subtraction?
5) Noise in expression should be mentioned, as it could also affect the statements about sorting accuracy.
6) Properties of the mutant library. How was the average mutation frequency (1%-7%) measured, and are mutations Poisson distributed? Simply using estimates provided by the mutagenesis kit is not a sufficient measure of library complexity. The manuscript states that 40 clones were Sanger sequenced. Is this 40 for each of the low, intermediate and high libraries? What is the observed average number of mutations? Simply stating that they conform to the expected distribution given by the kit is not sufficient; you should use the empirical estimate obtained from sequencing.
7) What is the frequency of plasmids with no insert from cloning, either for the CI protein or the cis element? Typically this is low, but clones are confirmed this way. In high throughput experiments there will always be some frequency of plasmids ligated without an insert.
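Points 3a and 4 above suggest building the no-epistasis expectation directly from the measured cis and trans libraries rather than from fitted distributions. A minimal, standard-library sketch of that idea follows. It assumes that the two mutation sets act independently and that their effects add on whatever scale they are recorded; the function names, bin cut-offs and example values are hypothetical and are not the authors' pipeline.

```python
import random
from collections import Counter


def resampled_null(cis_effects, trans_effects, n_draws=10_000, seed=0):
    """Expected 'system' effects under independence: draw one cis effect and one
    trans effect, each with replacement, from the empirical libraries and add them."""
    rng = random.Random(seed)
    return [rng.choice(cis_effects) + rng.choice(trans_effects) for _ in range(n_draws)]


def bin_fractions(values, low_cut, high_cut):
    """Fraction of values falling in low / intermediate / high expression bins."""
    labels = ("low" if v < low_cut else "intermediate" if v < high_cut else "high"
              for v in values)
    counts = Counter(labels)
    return {k: counts[k] / len(values) for k in ("low", "intermediate", "high")}


# Invented example libraries on an arbitrary additive scale; the trans library
# is deliberately bimodal, which a single gamma fit would not capture.
cis = [-3.0, -2.5, -2.5, -2.0, -1.0, 0.0, 0.0, 0.5]
trans = [-0.2, 0.0, 0.1, 2.5, 3.0, 3.0, 3.2]

null = resampled_null(cis, trans)
print(bin_fractions(null, low_cut=-1.0, high_cut=1.0))
```

Drawing directly from the observed values sidesteps the question of which parametric form fits the bimodal trans library; the trade-off is that the null can only contain sums of effects that were actually measured.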
Lagator et al. measure how mutations lead to phenotypic variation in gene expression at the systems level. This minimal system based on phage lambda contains the CI repressor and a constitutive promoter driving venus-yfp. Three types of mutagenesis libraries were created: the 'cis' library mutated the constitutive promoter, the 'trans' library mutated the protein coding sequence of the CI repressor, and the 'system' library combined both sets of mutations. For each library, flow cytometry was used to measure the Distribution of Mutational Effects (DME), the phenotypic variation in gene expression of the population. The main finding is that the quantitative shapes of the DME are different for the cis, trans, and combined libraries. In particular, there is an excess of constructs in the combined library with intermediate expression levels. The authors claim that epistasis between cis and trans mutations must be invoked to explain the intermediate-level phenotypes in the DME of the combined libraries. The authors further discuss how phenotypically neutral mutations may express phenotypes in combination with other mutations.
I may be missing something here, but I think the main result of this manuscript may be trivial. There is a straw man hypothesis in the text which says that "the intuitive expectation (is) that an increase in the number of mutations ought to result in an increase in non-functional ('no expression') phenotypes". I agree that an increased mutation will lead to more loss of function mutations, but in this system loss of function trans mutants in CI increase expression while loss of function cis mutations in the promoter either decrease expression through decreased polymerase binding or increase expression through decreased CI binding. We might very well expect the combined library to have more intermediate phenotypes as loss of function mutations that both increase and decrease expression average each other out. One need not necessarily invoke epistasis to explain the increase in intermediate phenotypes in the combined library.
I also disagree with the primary interpretation that there are many "neutral" cis mutations that then manifest phenotypically in combination with a trans mutation. This is one plausible interpretation. An opposite interpretation is that there are many highly penetrant trans mutations (21.7% in Figure 2E) and that in combination with a cis mutation the effects of these trans mutations are buffered. The 10% increase in intermediate phenotypes in Figure 2H almost exactly mirrors the 10% decrease in high expressing phenotypes. This suggests that a large fraction of intermediate phenotypes come from highly penetrant trans mutations being buffered by cis mutations, and not from silent cis mutations that interact with trans mutations. In other words the mass in the DME moves from the high expressing bin into the medium expressing bin, not from the low expressing bin into the medium bin.
This manuscript describes the DME for interacting cis- and trans-regulatory sequences in a well-defined regulatory system. The primary finding is that epistatic interactions between mutations in these two components produce a larger range of phenotypes than variation in either single component. On the one hand, this type of epistasis is perhaps required to emerge from the known interactions of CI and the cis-sequence in the system. On the other hand, the quantitative consequences of this epistasis have rarely been described in detail, and I think it is interesting to see how these interactions shape the phenotypic space explored. However, my enthusiasm for this work was reduced by the use of mutant alleles with multiple mutations and by the absence of any discussion of the identity of the mutations mediating the observed epistasis, which would have provided more insight into molecular mechanisms. In addition, how much does intramolecular epistasis contribute to the patterns reported? These questions are partly addressed by the analysis of 109 single point mutations in cis and 73 in trans, but the locations of these changes within CI and the promoter are not described. Looking at the identity of these mutations in more detail might provide some insight into the specific interactions between cis- and trans-acting factors that produced the intermediate expression phenotypes.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Intermolecular epistasis increases phenotypic variation in a gene regulatory system" for further consideration at eLife. Your revised article has been favorably evaluated by three peer reviewers, and the evaluation was overseen by Patricia Wittkopp as the Reviewing and Senior Editor. The following individual involved in review of your submission has agreed to reveal his identity: Justin Fay.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:
We appreciate the authors' response to the reviewers' comments and the inclusion of additional data addressing the concerns raised. For example, the definitions of epistasis and the methodology used to compute the naive DME are now clearer and easier to understand. We also appreciated the comparison to an empirically derived DME that accounts for our molecular knowledge of the components of the CI system in phage. We remain convinced this is an interesting dataset addressing an interesting question, but also remain concerned that the conclusions drawn depend on the assumptions of the model, some of which we think are not plausible. We also agree, however, that it is not clear what the "correct" set of assumptions should be, so we are supportive of publication despite these concerns.
In light of this uncertainty, we think a modification of the title and adjustment of the conclusions is appropriate. For example, we think the title should convey that the structure of regulatory circuits determines patterns of epistasis rather than regulatory circuits generate lots of unexpected epistasis.
In addition, we ask that the authors clarify their work further (no new data are needed). For example, two areas that seem fundamental to understanding the paper are: Does low + low = high expression under a naive model? If so, what does low + high equal under an additive model? I'm still not sure. Statements like "increasing the number of mutated components should introduce additional constraints, limiting the variation accessible through mutation" remain confusing. I am including the full comments from reviewer #1 below because they explain these remaining questions more fully.
In this resubmitted manuscript, the authors revised their analysis and included substantial additional data. Primarily, they measured expression from 150 point mutations along with their double mutants. Overall the manuscript is greatly improved: it is more clearly presented, and the individual single- and double-mutant assays provide much greater confidence in their main result, namely epistasis between cis and trans mutants such that mutant trans-elements can bind mutant cis-elements, generating expression patterns not expected from either mutant alone. However, as brought up in the initial review, the calculation of the double-mutant expectations is problematic. In part, this may be related to clarity/understanding, but it could also indicate a problem in how these expectations were calculated. The expectations that I find problematic occur in Figure 2, but also in the double mutants, Figure 3, along with Figure 2—figure supplement 2.
Overall, there are some strong indications of surprising epistasis. For me this came from looking at Figure 3—figure supplement 3 through Figure 3—figure supplement 5, showing both the doubles and singles. However, eyeballing it is not easy, and it would be much easier to read using bar graphs of single, single, double (observed) and expected. The examples of doubles with negative epistasis (Figure 3—figure supplement 3) seem to be quite small deviations, since they look simply like a combination of the two single mutants. However, the cases with positive epistasis are striking in that many show low + low = intermediate rather than low, which is what I believe the expectation to be.
While the examples are nice, the main analysis contains expectations that I don't find logical for the naive analysis. "Increasing the number of mutated components should introduce additional constraints, limiting the variation accessible through mutation": I disagree, given two sources of variation, combining them will increase variation beyond each individual component.
Central to the calculation of epistasis is the use of the convolution of cis + trans effects to derive an expectation for the system (doubles). This expectation is shown in Figure 2. The question is what do we expect when we combine a low (cis) with either a low (trans) or high (trans) expression mutant. The convolution predicts this will mostly be high with a small amount of intermediate and low. Under a simple additive model one would expect low + low = low, and low + high = intermediate, which is quite similar to what is found. What is not clear to me is whether this is a problem in calculating the convolution of three DMEs or the assumptions in applying the convolution to get an expected level of expression. I think there must be clarity and agreement on what the expectation of low + high should be from the cis and trans library.
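One way to make the "low + high" question explicit is to write the candidate naive expectations as functions. The sketch below shows the plain sum and average from option (a) of the first-round review next to a hypothetical reading of the categorical model in option (b). Expression is on an arbitrary scale, and `low` and `high` are assumed calibration values for expression with fully functional and with non-functional CI; the code illustrates the logic only and does not reproduce the published analysis.

```python
def option_a_sum(cis, trans):
    """Summing effects (convolution of cis and trans); can leave the observed range."""
    return cis + trans


def option_a_average(cis, trans):
    """Averaging effects; the result always lies between the two inputs."""
    return (cis + trans) / 2


def option_b_categorical(cis_with_ci, cis_without_ci, trans_alone, low, high):
    """Weight a cis mutant's phenotype with and without CI by how repressive the
    trans mutant is on its own. `low` and `high` are hypothetical calibration values."""
    w = (trans_alone - low) / (high - low)  # 0 = functional CI, 1 = non-functional CI
    w = min(max(w, 0.0), 1.0)               # clamp trans phenotypes outside [low, high]
    return (1.0 - w) * cis_with_ci + w * cis_without_ci


LOW, HIGH = 0.1, 1.0
print(option_b_categorical(0.2, 0.8, LOW, LOW, HIGH))   # 0.2: functional CI, cis phenotype as measured
print(option_b_categorical(0.2, 0.8, HIGH, LOW, HIGH))  # 0.8: non-functional CI, cis phenotype without CI
```

Under this categorical model, a cis mutant paired with a non-functional repressor simply shows its CI-free phenotype, so low (cis) + high (trans) is intermediate only if that CI-free phenotype is intermediate; which answer counts as "naive" depends on the assumption encoded in the weight `w`.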
The 150 single mutants show similar patterns to what I would expect based on an additive model.
71/150 deviate from the additive expectation. However, Figure 2—figure supplement 2 shows that most single mutants have no effect (i.e. low expression). Why then do most of the observed doubles have an effect in the range of 1-3 when their effect should be zero? These observations are at odds with one another.
If the argument is that high (trans) + low (cis) should give high expression because the repressor doesn't work, then this is exactly what one would expect if you include epistasis as a consequence of the way the regulatory system works, and so it is not really insightful. While this is a fine assumption to make later (non-naive), the simplest naive expectation needs to be understandable before making things more complicated.
Why didn't the positional information predict which mutations affect expression? One would expect that changes in binding sites for RNAP or CI would have quite different effects on expression.
https://doi.org/10.7554/eLife.28921.036
- Mato Lagator
- Jonathan P Bollback
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank N Barton, T Bergmiller, A Betancourt, K Bod’ova, C Igler, C Nizak, T Paixão, M Pleska, D Siekhaus, M Steinrueck, and G Tkačik for their invaluable comments on the manuscript. This work was supported by the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme (FP7/2007-2013) under REA grant agreement n°  to ML and European Research Council under the European Union's H2020 Programme (FP/2007–2013)/ERC Consolidator Grant [n. 648440] to JPB.
- Patricia J Wittkopp, Reviewing Editor, University of Michigan, United States
© 2017, Lagator et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.