Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme

Abstract

Protein mutational landscapes are shaped by the cellular environment, but key factors and their quantitative effects are often unknown. Here we show that Lon, a quality control protease naturally absent in common E. coli expression strains, drastically reshapes the mutational landscape of the metabolic enzyme dihydrofolate reductase (DHFR). Selection under conditions that resolve highly active mutants reveals that 23.3% of all single point mutations in DHFR are advantageous in the absence of Lon, but advantageous mutations are largely suppressed when Lon is reintroduced. Protein stability measurements demonstrate extensive activity-stability tradeoffs for the advantageous mutants and provide a mechanistic explanation for Lon’s widespread impact. Our findings suggest possibilities for tuning mutational landscapes by modulating the cellular environment, with implications for protein design and combatting antibiotic resistance.

Introduction

Natural protein sequences are constrained by pressures to maintain required structures and functions within a complex cellular environment. However, key cellular factors shaping protein sequences (such as interactions with cellular binding partners or with the proteostasis machinery) are often unknown. To characterize functional constraints, it has been useful to determine mutational landscapes of proteins, which we define here as the effects on growth of every possible single amino acid mutation in the protein, via deep mutational scanning (Boucher et al., 2016; Fowler and Fields, 2014). Deep mutational scanning studies have provided insights into evolution of new protein functions (McLaughlin et al., 2012; Stiffler et al., 2015; Wrenbeck et al., 2017), protein design (Tinberg et al., 2013; Whitehead et al., 2012), functional trade-offs (Klesmith et al., 2017; Steinberg and Ostermeier, 2016), and adaptation to altered environments (Hietpas et al., 2013). With a few exceptions (Bandaru et al., 2017; Hietpas et al., 2013; Jiang et al., 2013; Stiffler et al., 2015), however, these studies find a general tolerance to mutation for residues outside of active sites and binding interfaces (Araya et al., 2012; Boucher et al., 2016; Klesmith et al., 2017; Roscoe et al., 2013; Wrenbeck et al., 2017) that is often explained by the absence of key environmental constraints under the selection conditions (Bandaru et al., 2017; Jiang et al., 2013; Stiffler et al., 2015).

To study the impact of multiple constraints on mutational tolerance during selection, we chose E. coli dihydrofolate reductase (DHFR) as a model system. DHFR is an essential enzyme within folate metabolism that reduces dihydrofolate to tetrahydrofolate and is necessary for thymidine production. Using this activity as the basis for an in vivo selection assay (Reynolds et al., 2011), we aimed first to measure a mutational landscape for DHFR and then to determine how a change to the cellular environment might affect the landscape. Because DHFR is known to progress through multiple conformational states during catalysis (Boehr et al., 2006; Sawaya and Kraut, 1997; Figure 1—figure supplement 1), we expected the mutational landscape of DHFR to be constrained by the requirement to adopt these different conformations. Moreover, prior work had suggested DHFR is impacted by cellular constraints such as protein quality control (Bershtein et al., 2013) and the build-up of a toxic metabolic intermediate (Schober et al., 2019). We hence expected deep mutational scanning to reveal a highly constrained mutational landscape for DHFR that would contrast with the mutational tolerance observed in other systems.

Results

As the basis for our studies, we first sought to establish highly sensitive selection conditions for DHFR function that would be calibrated to DHFR enzymatic velocity (rate of DHF conversion per molecule of DHFR) and capable of resolving mutants with velocities near to or faster than wild-type. We anticipated that we would need to control DHFR protein expression (intracellular abundance) levels because two prior studies that modified the chromosomal DHFR gene had reported an overall high mutational tolerance under permissive selection conditions (Garst et al., 2017) and that DHFR abundance can be reduced to ~30% without a growth impact (Bershtein et al., 2013). We used an E. coli strain derived from ER2566 with the genes for DHFR and a downstream enzyme, thymidylate synthase, deleted in the genome and complemented on a pACYC-DUET plasmid with a weak ribosome binding site (see Materials and methods) that results in DHFR abundance at approximately 10% of the endogenous protein level (Figure 1—figure supplement 2, Figure 1—source data 1). To tightly control growth conditions, we performed selections in a turbidostat to maintain the culture in early log phase growth (Figure 1A, Figure 1—figure supplement 3A). To quantify the effects of DHFR mutations on growth, we calculated selection coefficients (Rubin et al., 2017) from the change in allele frequency over time by deep sequencing of timepoint samples determined in biological triplicate (Figure 1B). For a panel of 14 DHFR mutants, we confirmed that the selection coefficients obtained from deep mutational scanning correlated linearly with growth rates measured separately for the individual variants in a plate reader (Figure 1—figure supplement 3B, Figure 1—source data 2), as expected. Furthermore, under our controlled selection conditions, we observed a linear relationship between selection coefficient and in vitro velocity (Figure 1C) at cytosolic substrate concentrations (Bennett et al., 2009; Kwon et al., 2008) for these DHFR mutants (Figure 1—source data 3). These results confirm that selection coefficients between −1.5 and 1.0 in our experiment are correlated with DHFR enzymatic velocity over approximately 3 orders of magnitude, and that selection can resolve mutants with velocities higher than that of wild-type.
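
As a minimal sketch of how a selection coefficient of this kind could be computed, the snippet below regresses the log of the mutant-to-wild-type read-count ratio against time and takes the slope, so that the wild-type value is zero by construction. The log-ratio form, the function name `selection_coefficient`, and the toy counts are assumptions made for illustration; the scoring actually used follows Rubin et al., 2017 as described in Materials and methods.

```python
import math

def selection_coefficient(times, mutant_counts, wt_counts):
    """Slope of ln(mutant/WT) versus time by ordinary least squares.

    Assumption: frequencies are expressed relative to wild-type, so the
    wild-type slope is zero by construction. Counts must be positive.
    """
    if not (len(times) == len(mutant_counts) == len(wt_counts)):
        raise ValueError("inputs must have equal length")
    if len(times) < 2:
        raise ValueError("need at least two timepoints")
    y = [math.log(m / w) for m, w in zip(mutant_counts, wt_counts)]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    return sxy / sxx

# Hypothetical toy counts: the mutant doubles relative to WT per time unit.
times = [0.0, 1.0, 2.0, 3.0]
wt = [1000, 1000, 1000, 1000]
mutant = [100, 200, 400, 800]
s = selection_coefficient(times, mutant, wt)  # slope equals ln(2) for this toy series
```

In this toy series an enriched mutant gets a positive slope; a depleted one would get a negative slope, matching the red and blue classes in Figure 1B.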

Figure 1 (with 7 supplements)

*E. coli* DHFR deep mutational scanning uncovers many advantageous mutations.

(A) Turbidostat schematic. Recurring dilutions with fresh medium keep the culture optical density (OD600) below 0.075. (B) The selection coefficient for each mutant is the slope of the linear regression of allele frequency over time. The wild-type (squares) value is normalized to zero. Advantageous (red) mutations increase and disadvantageous (blue) mutations decrease in frequency. (C) Selection coefficients from deep mutational scanning as a function of enzymatic velocity for purified DHFR point mutants measured in vitro. Velocities at 20 µM DHF were calculated from Michaelis-Menten parameters. Error bars reflect the standard deviation from three biological replicates. (D) Histogram of selection coefficients. The wild-type value is indicated with a vertical black line. The median standard deviation over all mutations is the cut-off for WT-like behavior (Materials and methods, Figure 1—figure supplement 3, Figure 1—figure supplement 4) and is indicated with dashed lines. Mutations are colored as advantageous (red), disadvantageous (blue), WT-like (white), or null (grey). (E) Structural model of DHFR (PDB ID: 3QL3) with cross-section slices (**a–e**) indicated. The DHF substrate (green) and the NADPH cofactor (purple) are represented by spheres (yellow carbons and heteroatom coloring). An arrow indicates the perspective for each slice. (**a–e**) Five cross-section slices. Color scale indicates numbers of advantageous mutations at each position. Crosshatching indicates residues with >20% solvent accessible surface area.

Figure 1—source data 1. Soluble DHFR expression levels in molecules per cell measured from lysate activity assays as described in Materials and methods. The location of the DHFR gene is listed in parentheses in the first column. Expression values correspond to the cell strain in the column heading. https://cdn.elifesciences.org/articles/53476/elife-53476-fig1-data1-v2.xlsx
Figure 1—source data 2. Selection coefficients for –Lon selection (Figure 1—source data 1) compared to monoculture growth rates measured in a plate reader in ER2566 ∆folA/∆thyA (–Lon) as described in Materials and methods. For values listed as ND, no detectable change in OD was measured during a 30 hr growth period. https://cdn.elifesciences.org/articles/53476/elife-53476-fig1-data2-v2.xlsx
Figure 1—source data 3. Michaelis-Menten kinetics for the set of DHFR mutants (Fierke and Benkovic, 1989; Huang et al., 1994; Reynolds et al., 2011) used to calibrate the selection are reported together with the reference from which the values were taken. https://cdn.elifesciences.org/articles/53476/elife-53476-fig1-data3-v2.xlsx

We next analyzed the deep mutational scanning data for all possible DHFR single point mutants under the calibrated selection conditions (Figure 1D, Supplementary file 1). All pairwise replicates were related with a Pearson correlation R² value of 0.70 and the median standard deviation between replicates for all selection coefficients was 0.2 (Figure 1—figure supplement 3C–E). Using this value, we defined the selection coefficient interval of 0 ±0.2 as WT-like behavior. Within this interval, the standard deviation of the selection coefficients between replicates was not correlated with changes in selection coefficient (Figure 1—figure supplement 4A). Moreover, our WT-like threshold of 0.2 was greater than the value of 0.12 for the standard deviation for wild-type synonymous codons (Figure 1—figure supplement 4B). Based on these considerations, we defined DHFR mutations with selection coefficients of <−0.2 and >0.2 as disadvantageous and advantageous, respectively. Mutations that were depleted during overnight growth (under less stringent conditions using a supplemented growth medium, see Materials and methods) were assigned a null phenotype. As expected, mutations at DHFR positions that are known to be functionally important (M20, W22, D27, L28, F31, T35, M42, L54, R57, T113, G121, D122, and S148) were generally disadvantageous or null mutations (Figure 1—figure supplement 5). These results indicate that our selection assay is a sensitive reporter of functionally important residues and that our results are consistent with previous biochemical characterization of DHFR.
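
The classification rule described above can be sketched in a few lines. The function name `classify_mutation` and the `depleted_overnight` flag are hypothetical names introduced here; the ±0.2 cut-off is the median replicate standard deviation given in the text, and the treatment of values exactly at the cut-off as WT-like is an assumption.

```python
WT_LIKE_CUTOFF = 0.2  # median standard deviation between replicates (from the text)

def classify_mutation(selection_coefficient, depleted_overnight=False,
                      cutoff=WT_LIKE_CUTOFF):
    """Assign a phenotype class to one mutation.

    Mutations lost during the less stringent overnight growth are 'null'
    regardless of any measured selection coefficient.
    """
    if depleted_overnight:
        return "null"
    if selection_coefficient > cutoff:
        return "advantageous"
    if selection_coefficient < -cutoff:
        return "disadvantageous"
    return "WT-like"

# Hypothetical example values, not measurements from the study.
examples = {
    "strongly enriched": classify_mutation(0.47),
    "within noise": classify_mutation(0.05),
    "depleted": classify_mutation(-0.9),
    "lost overnight": classify_mutation(0.0, depleted_overnight=True),
}
```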

In previous deep mutational scanning experiments, stringent selection typically revealed many disadvantageous mutations (Garst et al., 2017; Jiang et al., 2013; Mavor et al., 2016; Mavor et al., 2018; Stiffler et al., 2015). In contrast, the most striking observation under our conditions is the large fraction of advantageous mutations (red, Figure 1D): 736 of 3161 possible variants were advantageous (23.3%), and wild-type DHFR only ranked 1203rd (although 467 of the 1202 higher-ranking variants fall into the WT-like interval). In direct measurements of individual growth rates under our selection conditions, the top two DHFR variants (W47L and L24V) led to increases in growth rate of 40% and 76%, respectively, when compared to wild-type DHFR (Figure 1—figure supplement 6). Advantageous mutations were widely distributed over 127 of the 159 positions of DHFR (Figure 1E). Furthermore, when we examined the DHFR structure, many of the advantageous mutations appeared to disrupt key side-chain interactions, for example by disrupting atomic packing interactions or surface salt-bridges (Figure 1—figure supplement 7).

To understand the origins of this counter-intuitive prevalence of advantageous mutations, we looked for cellular factors potentially affecting our mutational landscape. Our selection strain (Anton et al., 2016), like most standard expression strains of E. coli, is naturally deficient in Lon protease (Gur and Sauer, 2008) due to an insertion of IS186 in the lon promoter region (saiSree et al., 2001). Lon is a major component of protein quality control in E. coli (Powers et al., 2012; Sauer and Baker, 2011) responsible for degrading poorly folded proteins. Moreover, Lon had previously been implicated in degrading unstable DHFR variants in E. coli (Bershtein et al., 2013; Cho et al., 2015), and deleting Lon in an MG1655 strain of E. coli masked the deleterious impact of 2 destabilizing mutations out of a panel of 21 mutants tested in growth experiments at 30 °C (Bershtein et al., 2013). Although these 21 mutants were selected for minimal impacts on Michaelis-Menten kinetic parameters, we reasoned that the absence of Lon could be responsible for the large fraction of advantageous but potentially destabilizing mutations observed in our selection.

To test this prediction, we reintroduced chromosomal Lon expression under the control of a constitutive promoter in our selection strain, and repeated deep mutational scanning in biological triplicate (Supplementary file 2). We refer to the two regimes as +Lon and –Lon selection. The quality of +Lon selection was comparable to that of –Lon selection (Figure 2—figure supplement 1, Figure 2—figure supplement 2). Consistent with our hypothesis, the distribution of selection coefficients shifted towards more negative values in the +Lon selection, depleting positive selection coefficients and enriching for negative or null coefficients (Figure 2A). The number of advantageous mutations after reintroducing Lon decreased from 737 in –Lon selection to 384 in +Lon selection (Figure 2B), the mean selection coefficient for advantageous mutations decreased from 0.47 to 0.37, and the rank of the wild-type sequence increased by 341 to 864th (where 479 of the 863 higher-ranked variants are in the WT-like interval) (Figure 2—figure supplement 3). The median rank of the wild-type residue over all positions decreased from eight in –Lon selection to five in +Lon selection (Figure 2—figure supplement 4).

Figure 2 (with 5 supplements)

Lon protease expression reshapes the mutational landscape.

(A) Histogram of selection coefficients for mutations (top) in –Lon (grey) and +Lon selection (green). The difference of the histograms (bottom) is shown with grey indicating more mutants for –Lon selection and green indicating more mutants for +Lon selection. The threshold for classification for advantageous and disadvantageous mutations is as in Figure 1 and indicated with dashed lines. (B) Distribution of mutations classified by selection coefficients: 0.2 ≤ advantageous (adv.), 0.2 > WT like > –0.2, –0.2 ≥ disadvantageous (disadv.), null, and no data (a mutant was not detected in the library after transformation into the selection strain). Grey bars: –Lon selection; green bars: +Lon selection. (C) Distribution of sequence positions into the five mutational response categories: Beneficial, Tolerant, Mixed, Deleterious, Intolerant. Grey bars: –Lon selection; green bars: +Lon selection. (D) Heatmap of DHFR selection coefficients in the –Lon and +Lon strains, showing details of the distributions shown in C) (dotted border). Positions (rows) are grouped by their mutational response category for –Lon and +Lon as in C) and sorted by the wild-type amino acid. Amino acid residues (columns) are organized by physicochemical similarity and indicated by their one-letter amino acid code. An asterisk indicates a stop codon. Advantageous mutations are shown in shades of red, disadvantageous mutations in shades of blue, null mutations in grey and ‘No data’ as defined in B) in black. Wild-type amino acid residues are outlined in black.

To examine in more detail how the mutational response of individual residues changes between selection ±Lon, we used a K-means clustering algorithm (see Materials and methods) to group all DHFR sequence positions into five categories: positions where mutations were generally advantageous (Beneficial), generally WT-like (Tolerant), variably advantageous and disadvantageous (Mixed), generally disadvantageous (Restricted), and generally null (Intolerant). Grouping was performed separately for –Lon and +Lon selection (Figure 3—source data 1). Comparing the distributions of DHFR positions in –Lon and +Lon conditions illustrates the extensive reshaping of the mutational landscape by Lon (Figure 2C,D). For –Lon selection, 28 positions (17.6%) were classified as Beneficial, where nearly every mutation was preferred over the wild-type residue. In comparison, the number of Beneficial positions decreased to 10 in +Lon selection, with only three surface-exposed positions (E48, T68, D127) common between the two Beneficial sets. Simultaneously, the number of Restricted positions increased from 42 to 67 with the reintroduction of Lon into the selection strain (Figure 2C). These results support the conclusion that Lon activity broadly penalizes mutations, including a large subset of the advantageous mutations. Overall, the changes upon modulating Lon activity lead to a model in which upregulating Lon increases constraints on DHFR, and the mutational landscape changes from being permissive when Lon is absent to being more restricted when Lon is present (Figure 2D).
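
To illustrate the grouping step, here is a minimal K-means sketch (Lloyd's algorithm) in plain Python. The features used for clustering are not specified in this section, so the per-position feature vector below (fractions of advantageous, WT-like, and disadvantageous mutations) is an assumption, as are the function name `kmeans`, the deterministic initialization, and the toy values. The example uses three clusters for brevity; the analysis in the text uses five.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic start (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    previous = None
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if labels == previous:
            break  # assignments stopped changing
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        previous = labels
    return labels, centroids

# Hypothetical per-position features: (frac. advantageous, frac. WT-like, frac. disadvantageous)
positions = [
    (0.9, 0.1, 0.0),
    (0.1, 0.8, 0.1),
    (0.0, 0.1, 0.9),
    (0.8, 0.2, 0.0),
    (0.2, 0.7, 0.1),
    (0.1, 0.1, 0.8),
]
labels, centroids = kmeans(positions, k=3)
```

With real data, a library implementation with multiple random restarts would usually be preferable to this fixed initialization.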

To analyze the constraints imposed by Lon on the DHFR mutational landscape in structural detail, we defined a ∆selection coefficient for each amino acid residue at each position as the difference between the +Lon and –Lon selections (Figure 3A). The ∆selection coefficient values were most negative at positions in the Beneficial category and at positions with a native VILMWF or Y amino acid residue (Figure 3B, excludes Intolerant positions from –Lon selection); overall, mutations at positions with native hydrophobic residues are enriched for negative ∆selection coefficients (Figure 3—figure supplement 1A). Strikingly, the mean ∆selection coefficient was –0.71 for the 65 buried positions with <20% side-chain solvent accessible surface area, compared to –0.27 for the 79 exposed positions (Figure 3C, Figure 3—figure supplement 1B, Figure 3—source data 1). These results show that Lon has a broad impact on the mutational landscape throughout the DHFR structure but imposes particularly strong constraints in the DHFR core.
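
The ∆selection coefficient and the buried-versus-exposed comparison can be sketched as follows. The subtraction (+Lon minus –Lon) and the 20% solvent accessibility cut-off come from the text; the function names, the dictionary layout keyed by `(position, amino_acid)`, the choice to average per-position means, and all numbers are assumptions for illustration.

```python
def delta_selection(plus_lon, minus_lon):
    """Per-mutation ∆selection coefficient: +Lon value minus –Lon value."""
    return {m: plus_lon[m] - minus_lon[m] for m in plus_lon if m in minus_lon}

def mean_delta_by_burial(deltas, sasa_percent, cutoff=20.0):
    """Average the per-position mean ∆ for buried (< cutoff % SASA) and exposed positions."""
    per_position = {}
    for (position, _aa), d in deltas.items():
        per_position.setdefault(position, []).append(d)
    groups = {"buried": [], "exposed": []}
    for position, values in per_position.items():
        key = "buried" if sasa_percent[position] < cutoff else "exposed"
        groups[key].append(sum(values) / len(values))
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}

# Hypothetical selection coefficients keyed by (position, mutant amino acid).
minus_lon = {(5, "A"): 0.6, (5, "V"): 0.4, (48, "K"): 0.3}
plus_lon = {(5, "A"): -0.4, (5, "V"): -0.2, (48, "K"): 0.1}
sasa = {5: 3.0, 48: 65.0}  # hypothetical side-chain SASA in percent

deltas = delta_selection(plus_lon, minus_lon)
summary = mean_delta_by_burial(deltas, sasa)
```

A more negative mean for the buried group, as in this toy input, corresponds to the pattern reported for the DHFR core.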

Figure 3 (with 1 supplement)

Delta selection coefficients show Lon impact.

(A) Conceptual diagram of ∆selection coefficients, calculated as the +Lon selection coefficient minus the –Lon selection coefficient (see Materials and methods). (B) Heatmap of ∆selection coefficient values for all positions not classified as Intolerant. ∆selection coefficient values between –0.2 and 0.2 are shown in white; ∆selection coefficients >0.2 are in shades of red and ∆selection coefficients <–0.2 in shades of blue. Amino acid residues (columns) are organized by physicochemical similarity and indicated by their one-letter amino acid code. The mean ∆selection coefficient (avg) at each position is shown as a separate column and outlined with a light blue box. Positions (rows) are sorted by the wild-type amino acid and grouped by their mutational response category from the –Lon selection in Figure 2C,D. Positions with a native VILMWF or Y amino acid are indicated with an orange bar to the left. (C) Per-position mean ∆selection coefficient displayed on the structural model of DHFR. The five cross-section slices of the DHFR structure are displayed as in Figure 1E, and the color scale is as in B).

Figure 3—source data 1. Burial classification for DHFR positions from the Getarea server (Fraczkiewicz and Braun, 1998) as described in Materials and methods. https://cdn.elifesciences.org/articles/53476/elife-53476-fig3-data1-v2.xlsx

To determine why mutations in DHFR were advantageous in the absence of Lon but less so in its presence, we selected a subset of mutations for more detailed characterization in individual experiments. We considered all positions with more than one mutation in the top 100 most advantageous mutations for the –Lon condition. We describe these positions by their location in one of four structural regions that appear to be hot-spots for the top advantageous mutations (Figure 4A,B, Figure 4—figure supplement 1): 1) exchanges between hydrophobic residues at core positions, 2) disruptions of surface residues on the beta-sheet near the active site, 3) disruptions of polar interactions with the adenine ring of NADPH, or 4) mutations to the active site or M20 loop that controls access to the active site. At these positions, we selected strongly advantageous mutations. Where possible, we selected two mutations at the same position but with significantly differing Lon sensitivities such that the set had a range of ∆selection coefficients from −0.07 to −1.46, with the exception of L24V that had a positive ∆selection coefficient. We first confirmed that the selected advantageous mutations indeed had higher cytosolic DHFR activity (the total rate of conversion of DHF to THF) in ER2566 ∆folA/∆thyA (–Lon) lysates relative to the activity for WT DHFR (Figure 4—figure supplement 2), consistent with the deep mutational scanning results.

Figure 4 (with 9 supplements)

Advantageous mutations arise from an interplay of increased enzymatic velocity and increased abundance in the absence of Lon.

(A) DHFR structure with mutational hot-spots. For positions with two or more top 100 advantageous mutations in the absence of Lon, the beta carbon is depicted as a sphere scaled according to the number of top mutations. For mutants selected for in vitro characterization, the beta carbon is colored according to its location in the DHFR structure: core (purple), surface beta-sheet (gold), proximal to the adenine ring on NADPH (blue), or proximal to the active site and M20 loop (red). Positions for advantageous mutants from the calibration set are depicted in dark grey. (B) The structure from A) rotated 90° clockwise. (C) In vitro velocities of purified DHFR wild-type and point mutants measured at 20 µM DHF. Bars are colored in reference to the hot-spots in A). Error bars represent ±1 standard deviation from three independent experiments (Materials and methods). The dashed line represents the velocity of WT DHFR. (D) DHFR cellular abundance calculated from the lysate DHFR activity in Figure 4—figure supplement 2 and in vitro kinetics with purified enzyme (see Materials and methods). Error bars represent the cumulative percent error (standard deviation) from three independent experiments for velocity and three biological replicates for lysate activity. Data are shown in both the –Lon (light grey) and +Lon (green) conditions. The dashed line represents the WT expression level of DHFR in the –Lon background. Mutants are in the same order as in C) (see Figure 4—source data 2; four mutants were not measured). (E) Cellular abundance of DHFR vs. in vitro velocities of purified DHFR wild-type and point mutants measured at 20 µM DHF. Points are colored as in A). Error bars represent ±1 standard deviation from three independent experiments (Materials and methods). The dashed line represents WT-level DHFR activity, i.e. DHFR abundance/velocity pairs whose product is equivalent to [DHFR]_WT • velocity_WT. (F) Correlation between in vitro T_m values and in vivo ∆selection coefficients for DHFR wild-type and characterized mutants. Points are colored as in A). (G) ∆T_m values and ∆∆selection coefficient for mutations at the same position. Points representing comparison between mutants are numbered as follows: 1) D116I-M, 2) M42Y-F, 3) W30M-F, 4) I91G-A, 5) Q102W-L, 6) L62A-V, 7) I41A-V, 8) W47V-L.

Figure 4—source data 1. In vitro velocities for selected advantageous mutants, measured as described in Materials and methods at multiple concentrations of DHF, are reported with the standard deviation over three independent experiments. https://cdn.elifesciences.org/articles/53476/elife-53476-fig4-data1-v2.xlsx
Figure 4—source data 2. Soluble DHFR abundance levels in molecules per cell measured from lysate activity assays as described in Materials and methods. All values are for the SMT205 plasmid transformed into the cell strain in the column heading. NM, not measured. https://cdn.elifesciences.org/articles/53476/elife-53476-fig4-data2-v2.xlsx
Figure 4—source data 3. Apparent T_m values from thermal denaturation experiments monitored by CD signal at 225 nm are reported along with the ∆selection coefficient (Lon impact) value depicted in Figure 4F. https://cdn.elifesciences.org/articles/53476/elife-53476-fig4-data3-v2.xlsx

The lysate activity assay reports on both the enzymatic activity of a DHFR variant and its intracellular abundance, [DHFR] (Bershtein et al., 2015b; Dykhuizen et al., 1987). To separate the two contributions, we purified each of the DHFR variants and determined their enzymatic velocity in vitro using concentrations of DHF that are consistent with estimates of cytosolic DHF concentration based on mass spectrometry measurements (Kwon et al., 2008). At 20 µM DHF, 16 of the mutants had velocities equal to or up to three-fold higher than that of WT (Figure 4C, Figure 4—figure supplement 3, Figure 4—source data 1). In contrast, the other eight mutants had velocities as much as two-fold lower than that of WT at the same DHF concentration. These results show that the higher cytosolic DHFR activity of the advantageous mutations can only partially be explained by changes in the kinetic parameters for these mutants.
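
For orientation, the per-molecule velocity at a fixed substrate concentration follows from the standard Michaelis-Menten equation, v = k_cat [S] / (K_m + [S]). The sketch below evaluates it at 20 µM DHF; the function name `mm_velocity` and the kinetic parameters are hypothetical and are not values from the source data.

```python
def mm_velocity(kcat, km, substrate):
    """Per-enzyme Michaelis-Menten velocity, in the units of kcat.

    km and substrate must share the same concentration units (here µM).
    """
    return kcat * substrate / (km + substrate)

DHF_UM = 20.0  # DHF concentration used for the comparison in the text

# Hypothetical parameters: the mutant has a higher kcat but a weaker Km.
wt_velocity = mm_velocity(kcat=10.0, km=1.0, substrate=DHF_UM)
mutant_velocity = mm_velocity(kcat=30.0, km=20.0, substrate=DHF_UM)
fold_change = mutant_velocity / wt_velocity
```

Because the mutant's K_m in this toy case equals the substrate concentration, it runs at half of its k_cat, so a three-fold gain in k_cat yields a smaller gain in velocity at 20 µM DHF.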

We therefore examined the soluble intracellular abundance of these mutants. In the absence of Lon, we observed that mutant abundance levels varied from close-to-wild-type levels to a 20-fold increase over wild-type (Figure 4D, Figure 4—figure supplement 4, Figure 4—source data 2). Importantly, abundance decreased for most mutants in the presence of Lon (Figure 4—figure supplement 4), as expected, and these abundance decreases correspond to decreased selection coefficients (negative values in the ∆selection coefficients from Figure 3 that report on the Lon impact on selection (Figure 4—figure supplement 5)). Moreover, when considering both velocity and abundance, the expected total cellular DHFR activity ([DHFR] • velocity) is increased compared to wild-type for the majority of advantageous mutants (Figure 4E, Figure 4—figure supplement 6, points above the dashed line indicate expected cellular activity greater than wild-type). However, the expected total cellular DHFR activity is not a strong quantitative predictor of the advantageous mutants in –Lon selection (Figure 4—figure supplement 7, Figure 4—figure supplement 8). We attribute discrepancies at least in part to the difficulty of accurately quantifying rather small differences in activity and abundance, in addition to other potential complicating factors such as differential activity of cellular chaperones for different DHFR variants (Cho et al., 2015), and feedback regulation that could affect cellular concentrations of the substrate DHF (Bershtein et al., 2015a; Kwon et al., 2008). Nevertheless, our velocity and abundance measurements are in qualitative agreement with the in vivo selection. Taken together, these results suggest that increased selection coefficients arise from an interplay of effects of the mutations on cellular abundance and catalytic activity (Dykhuizen et al., 1987), and that each parameter alone is insufficient to explain the majority of the advantageous mutations. Moreover, Lon suppresses advantageous mutations at least in part by reducing their cellular abundance.
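
The decomposition used here can be written out as a short sketch: abundance is estimated by dividing total lysate activity by the per-molecule velocity, and the expected cellular activity is the product [DHFR] • velocity compared with the wild-type product. The function names and all numbers are hypothetical, and units are arbitrary but assumed consistent between mutant and wild-type.

```python
def abundance_from_lysate(lysate_activity, velocity):
    """Estimate [DHFR] as total lysate activity divided by per-molecule velocity."""
    return lysate_activity / velocity

def relative_cellular_activity(abundance, velocity, wt_abundance, wt_velocity):
    """Expected total activity ([DHFR] * velocity) relative to wild-type."""
    return (abundance * velocity) / (wt_abundance * wt_velocity)

# Hypothetical values: a mutant that is half as fast but four times as abundant.
wt_velocity, wt_lysate = 10.0, 1000.0
mut_velocity, mut_lysate = 5.0, 2000.0

wt_abundance = abundance_from_lysate(wt_lysate, wt_velocity)     # 100.0
mut_abundance = abundance_from_lysate(mut_lysate, mut_velocity)  # 400.0
relative = relative_cellular_activity(
    mut_abundance, mut_velocity, wt_abundance, wt_velocity
)  # 2.0
```

A value above 1 corresponds to a point above the WT-level line in Figure 4E, which is how a mutant with lower velocity can still be advantageous when its abundance rises enough.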

To test more directly whether advantageous mutations in DHFR destabilize the protein and whether this destabilization could explain the sensitivity to Lon expression, we measured apparent melting temperature (T_m) values from non-reversible thermal denaturation monitored by circular dichroism spectroscopy. We found that many of the advantageous mutations considerably destabilized the protein (Figure 4F, Figure 4—figure supplement 9, Figure 4—source data 3). Moreover and as expected, the ∆selection coefficients between +Lon and –Lon selection (Figure 3) are correlated with T_m (Figure 4F), except for mutations near the active site. Strikingly, when we compare different mutations at the same position, the change in ∆selection coefficients (i.e. Lon sensitivity) correlates with the change in T_m values (Figure 4G). These results indicate that many of the selected advantageous mutations are destabilizing, and that destabilization is correlated with Lon sensitivity. One possible explanation for the selection advantage of the subset of destabilizing mutations with increased k_cat (e.g. L24V, W30F/M, M42F/Y, H114V, D116I/M, E154V) is that these mutations promote breathing motions that accelerate product release, which is rate limiting for wild-type DHFR at neutral pH (Oyen et al., 2017) and for a hyperactive DHFR mutant with a 7-fold increase in k_cat (Iwakura et al., 2006).
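
The same-position comparison in Figure 4G can be sketched as paired differences followed by a Pearson correlation. The helper names, the placeholder mutant labels, and every number below are hypothetical; they only show the shape of the calculation, not measured T_m or selection values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_differences(pairs, tm, delta_s):
    """For each (a, b) pair at one position: (∆T_m, ∆∆selection coefficient)."""
    return [(tm[a] - tm[b], delta_s[a] - delta_s[b]) for a, b in pairs]

# Placeholder mutants at three positions (hypothetical values).
tm = {"p1_a": 45.0, "p1_b": 50.0, "p2_a": 40.0, "p2_b": 48.0,
      "p3_a": 47.0, "p3_b": 49.0}
delta_s = {"p1_a": -1.0, "p1_b": -0.5, "p2_a": -1.2, "p2_b": -0.4,
           "p3_a": -0.6, "p3_b": -0.4}
pairs = [("p1_a", "p1_b"), ("p2_a", "p2_b"), ("p3_a", "p3_b")]

diffs = paired_differences(pairs, tm, delta_s)
r = pearson_r([d[0] for d in diffs], [d[1] for d in diffs])
```

The toy values are constructed to be exactly proportional, so `r` is 1 up to rounding; real data would scatter around the trend.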

Taken together, our data indicate that the observed widespread changes in the mutational landscape of DHFR can be explained by a penalty for destabilizing mutations from Lon expression, leading to extensive activity-stability tradeoffs for advantageous mutations. The effect of these two selection pressures is directly observable in the structural arrangement of the mutational response categories (Figure 5, Figure 5—figure supplement 1). In –Lon conditions, mutational responses are arranged in shells around the hydride transfer site (Liu et al., 2013; Figure 5A, top), where the proportion of advantageous mutations increases with increasing distance (Figure 5B). This same spatial pattern also holds for +Lon selection (Figure 5A, bottom), but it is now superimposed with the additional pressure against destabilizing mutations such that there are no Beneficial positions in the core (Figure 5C, Figure 5—figure supplement 2). In contrast, the mutational responses as a function of distance to other DHFR sites (e.g. C5 of the NADPH adenine ring) do not show as strong a relationship (Figure 5—figure supplement 3). These findings illustrate how the contributions from two constraints – one structural (distance from hydride transfer) and one dependent on cellular context (Lon) – can be distinguished from structural patterns in the mutational landscape.
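
One way to tabulate the shell pattern is to bin positions by distance from the hydride transfer site and report the category composition of each bin. This is a minimal sketch: the 5 Å shell width, the normalization within each shell, the function name, and the toy distances and categories are all assumptions rather than the procedure used for Figure 5B,C.

```python
def category_percent_by_shell(distances, categories, shell_width=5.0):
    """Percent of positions in each response category within each distance shell.

    distances: position -> distance from the hydride transfer site
    categories: position -> mutational response category
    Returns {(lower, upper): {category: percent}} ordered by distance.
    """
    shells = {}
    for position, dist in distances.items():
        shell = int(dist // shell_width)
        counts = shells.setdefault(shell, {})
        cat = categories[position]
        counts[cat] = counts.get(cat, 0) + 1
    result = {}
    for shell, counts in sorted(shells.items()):
        total = sum(counts.values())
        lower = shell * shell_width
        result[(lower, lower + shell_width)] = {
            c: 100.0 * n / total for c, n in counts.items()
        }
    return result

# Hypothetical positions, distances, and categories.
distances = {1: 3.0, 2: 4.0, 3: 7.0, 4: 12.0, 5: 13.0}
categories = {1: "Intolerant", 2: "Intolerant", 3: "Mixed",
              4: "Beneficial", 5: "Tolerant"}
profile = category_percent_by_shell(distances, categories)
```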

Figure 5 (with 3 supplements)

Structural characterization of multiple constraints on the DHFR mutational landscape.

(A) Mutational response categories from –Lon selection (top, categories as in Figure 2C,D) and +Lon selection (bottom, categories as in Figure 2C,D) colored onto residues and displayed on slices as in Figure 1E. (B) Relationship between mutational response and distance from hydride transfer for –Lon selection. The percent of positions from each mutational response category is plotted as a function of distance from the site of hydride transfer. Each category is colored as in A) (top). (C) Relationship between mutational response and distance from hydride transfer for +Lon selection. Each category is colored as in A) (bottom).

Discussion

The naturally occurring insertion in the Lon promoter in our original selection strain, in combination with our stringent selection conditions, allowed the serendipitous discovery that advantageous mutations are remarkably prevalent throughout the DHFR structure but are also highly sensitive to Lon. The large fraction of advantageous mutations to DHFR appears to conflict with the fixation of the wild-type DHFR sequence during evolution. While Lon expression in our selection increases both the relative rank of the WT DHFR sequence (Figure 2—figure supplement 4) and the similarity between amino acid preferences from selection and from bacterial DHFR orthologues (Figure 2—figure supplement 5), there are still considerable differences: there are still 384 advantageous mutants that rank substantially better than the WT sequence even in the presence of Lon, and the amino acid preferences in the two selection experiments (±Lon) are more similar to each other than either is to the preferences from bacterial DHFR orthologues.

Considering these differences, we note several caveats in comparing our selection results to selection in evolution. First and most generally, screening DHFR variants under calibrated selection conditions (such as defined temperature, medium, and growth kept in early log phase) for a few generations is not expected to recapitulate the natural selection pressures on E. coli DHFR on evolutionary timescales. Second and more specifically, our selection conditions were intentionally engineered to be highly sensitive to mutations by dampening DHFR abundance to approximately 10% of the endogenous level (Figure 1—figure supplement 2). In contrast, endogenous DHFR is expected to be buffered from mutational impacts. Increasing DHFR activity or abundance in E. coli several-fold above that in wild-type strains does not increase fitness, and, conversely, reducing DHFR abundance in E. coli does not have an impact on growth until abundance is below 30% of the endogenous level (Bershtein et al., 2013; Bhattacharyya et al., 2016). Indeed, selection on mutations to the chromosomal DHFR gene did not reveal strong mutational impacts in the absence of the anti-folate drug trimethoprim (Garst et al., 2017). Third, chromosomal DHFR expression is modulated through a feedback mechanism (Bershtein et al., 2015a), and an interesting open question is how the distribution of fitness effects of DHFR mutations is shaped by such a regulatory element, which is absent in our selection system. Taken together, these mutational buffering effects likely explain why mutations that are advantageous in our selection are not prevalent in evolutionary DHFR sequences, and likely also explain why DHFR sequences do not vary between naturally occurring –Lon and +Lon strains of E. coli.
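The buffering argument can be made concrete with a toy calculation. The sketch below assumes a plateau in fitness at and above 30% of the endogenous DHFR level, as described above, and, purely for illustration, a linear decline below that threshold. The function name and the linear shape are assumptions, not a fitted model from this work.

```python
def relative_fitness(effective_dhfr, threshold=0.30):
    """Toy fitness as a function of effective DHFR level.

    effective_dhfr: abundance x activity, as a fraction of the endogenous
    wild-type level. Fitness is flat (1.0) at or above the threshold; the
    linear decline below the threshold is an illustrative assumption.
    """
    if effective_dhfr >= threshold:
        return 1.0
    return max(effective_dhfr, 0.0) / threshold


# Hypothetical mutation that doubles effective DHFR activity.
for label, baseline in [("endogenous level", 1.00), ("selection system", 0.10)]:
    wild_type = relative_fitness(baseline)
    mutant = relative_fitness(2 * baseline)
    print(f"{label}: WT fitness {wild_type:.2f}, mutant fitness {mutant:.2f}")
```

Under these assumptions the mutation is invisible to selection at the endogenous level, because both variants sit on the plateau, but it is clearly resolved at roughly 10% abundance. This matches the reasoning that stringent selection conditions expose advantageous mutations that buffering would otherwise hide.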

Nevertheless, our engineered selection conditions yielded considerable insights into constraints on mutational landscapes that are typically hidden from observation precisely because of buffering effects in natural contexts. The increase in the number of advantageous mutations in the absence of Lon shows that decreasing cellular constraints can substantially modulate the tolerance to mutation in a deep mutational scanning experiment. Because all B type E. coli strains (e.g. BL21) have the same natural Lon deficiency as our selection strain, our results could have implications for selection experiments performed in these strains over much longer time-scales such as the E. coli Long-Term Evolution Experiment (Tenaillon et al., 2016), or directed evolution strategies that often lead to mutations at positions distal to the active site.

Beyond experiments in B-type E. coli, we expect the fundamental principle of tuning trade-offs to play a role in other experimental systems. Prior work has illuminated the impact of chaperones on the effect of mutations, such as GroEL in bacteria (Tokuriki and Tawfik, 2009) and Hsp90 in eukaryotic cells, which has been shown to buffer the phenotypic impacts of deleterious mutations (Queitsch et al., 2002). Our results highlight an opposing key role for the protein quality control machinery in tuning in vivo mutational responses and lead to a model where protease activities add constraints to the mutational landscape and chaperones relieve them.

The ability to tune multiple constraints could provide a general way of controlling landscapes to drive genes into regions of sequence space that are highly responsive to external pressures. A concrete example of how this principle could be applied is in combinatorial antibiotics. Lon inactivation has been shown to increase resistance to antibiotics (Nicoloff and Andersson, 2013). Switching between compounds capable of inhibiting or activating Lon in combination with DHFR-targeting folate inhibitors such as trimethoprim could serve to variably promote destabilized resistance mutants when Lon is inhibited and then penalize those mutations when Lon is reactivated.

While the power in engineering individual gene sequences is well-recognized, we are only just beginning to explore the potential in engineering the general behavior of local sequence space. We anticipate that further study of tunable constraints will yield a new toolkit for fine control of the landscapes that guide movements through sequence space and enable unexplored engineering applications.

Materials and methods

All plasmid and primer sequences are listed in Appendix 1. Key plasmids were deposited in the Addgene plasmid repository (accession codes are listed in Appendix 1). All code and Python scripts are available at https://github.com/keleayon/2019_DHFR_Lon.git with key input files and example command lines (Thompson, 2020; copy archived at https://github.com/elifesciences-publications/2019_DHFR_Lon).

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Strain, strain background (Escherichia coli)	ER2566	New England Biolabs	Cat# C2566I	Chemically competent cells
Strain, strain background (Escherichia coli)	ER2566 ∆folA/∆thyA (–Lon)	Reynolds et al. Cell 2011		Chemically competent and electrocompetent cells
Strain, strain background (Escherichia coli)	ER2566 ∆folA/∆thyA (+Lon)	This work		Chemically competent and electrocompetent cells
Recombinant DNA reagent	SMT101 (plasmid)	This work		Dual expression of DHFR and TYMS, in vivo assays, chloramphenicol (35 µg/mL final concentration)
Recombinant DNA reagent	SMT201 (plasmid)	This work		SMT101 with TET promoter for TYMS, in vivo assays, chloramphenicol (35 µg/mL final concentration)
Recombinant DNA reagent	SMT205 (plasmid)	This work		SMT201 with mutated RBS for DHFR, in vivo assays, chloramphenicol (35 µg/mL final concentration)
Recombinant DNA reagent	SMT215 (plasmid)	This work		SMT205 with DHFR-FLAG-tag, western blot, chloramphenicol (35 µg/mL final concentration)
Recombinant DNA reagent	KR101/SMT301 (plasmid)	Reynolds et al. Cell 2011		His8-tag, heterologous expression, Ni-NTA purification, kanamycin (50 µg/mL final concentration)
Recombinant DNA reagent	pSIM6 (plasmid)	Blomfield et al., 1991		Lambda Red recombinase expression, temperature-sensitive promoter, ampicillin/carbenicillin (100 µg/mL final concentration)
Recombinant DNA reagent	pIB279 (plasmid)	Blomfield et al., 1991		KAN-SacB cassette for positive/negative selection, ampicillin/carbenicillin (100 µg/mL final concentration)
Sequence-based reagent	TetDuet1_sense	This work	Mutagenic PCR primer	ccgCTTAAGtcgaacagaaagtaatcgtattgtacatccctatc
Sequence-based reagent	TetDuet2_anti	This work	Mutagenic PCR primer	gatagggatgtcaatctctatcactgatagggatgtacaatacg
Sequence-based reagent	TetDuet3_sense	This work	Mutagenic PCR primer	agagattgacatccctatcagtgatagagatactgagcacatcag
Sequence-based reagent	TetDuet4_anti	This work	Mutagenic PCR primer	ctttaatgaattcggtcagtgcgtcctgctgatgtgctcagtatctc
Sequence-based reagent	TetDuet5_sense	This work	Mutagenic PCR primer	cactgaccgaattcattaaagaggagaaaggtaccatatggc
Sequence-based reagent	TetDuet_5flanking	This work	Mutagenic PCR primer	ccgcttaagtcgaacagaaag
Sequence-based reagent	TetDuet_3flanking	This work	Mutagenic PCR primer	cggagatctgccatatggtacc
Sequence-based reagent	WT_DHFR_pos2_fwd	This work	Mutagenic PCR primer	NNSAGTCTGATTGCGGCGTTAG
Sequence-based reagent	WT_DHFR_pos2_fwd2	This work	Mutagenic PCR primer	NNSAGTCTGATTGCGGCGTTAG
Sequence-based reagent	WT_DHFR_pos3_fwd	This work	Mutagenic PCR primer	NNSCTGATTGCGGCGTTAGCG
Sequence-based reagent	WT_DHFR_pos4_fwd	This work	Mutagenic PCR primer	NNSATTGCGGCGTTAGCGGTA
Sequence-based reagent	WT_DHFR_pos5_fwd	This work	Mutagenic PCR primer	NNSGCGGCGTTAGCGGTAGAT
Sequence-based reagent	WT_DHFR_pos6_fwd	This work	Mutagenic PCR primer	NNSGCGTTAGCGGTAGATCGC
Sequence-based reagent	WT_DHFR_pos7_fwd	This work	Mutagenic PCR primer	NNSTTAGCGGTAGATCGCGTTATC
Sequence-based reagent	WT_DHFR_pos8_fwd	This work	Mutagenic PCR primer	NNSGCGGTAGATCGCGTTATCG
Sequence-based reagent	WT_DHFR_pos8_fwd2	This work	Mutagenic PCR primer	NNSGCGGTAGATCGCGTTATCG
Sequence-based reagent	WT_DHFR_pos9_fwd	This work	Mutagenic PCR primer	NNSGTAGATCGCGTTATCGGCATG
Sequence-based reagent	WT_DHFR_pos10_fwd	This work	Mutagenic PCR primer	NNSGATCGCGTTATCGGCATGG
Sequence-based reagent	WT_DHFR_pos11_fwd	This work	Mutagenic PCR primer	NNSCGCGTTATCGGCATGGAAAA
Sequence-based reagent	WT_DHFR_pos12_fwd	This work	Mutagenic PCR primer	NNSGTTATCGGCATGGAAAACGC
Sequence-based reagent	WT_DHFR_pos13_fwd	This work	Mutagenic PCR primer	NNSATCGGCATGGAAAACGCC
Sequence-based reagent	WT_DHFR_pos14_fwd	This work	Mutagenic PCR primer	NNSGGCATGGAAAACGCCATG
Sequence-based reagent	WT_DHFR_pos15_fwd	This work	Mutagenic PCR primer	NNSATGGAAAACGCCATGCCG
Sequence-based reagent	WT_DHFR_pos16_fwd	This work	Mutagenic PCR primer	NNSGAAAACGCCATGCCGTGG
Sequence-based reagent	WT_DHFR_pos17_fwd	This work	Mutagenic PCR primer	NNSAACGCCATGCCGTGGAAC
Sequence-based reagent	WT_DHFR_pos18_fwd	This work	Mutagenic PCR primer	NNSGCCATGCCGTGGAACCTG
Sequence-based reagent	WT_DHFR_pos19_fwd	This work	Mutagenic PCR primer	NNSATGCCGTGGAACCTGCCT
Sequence-based reagent	WT_DHFR_pos20_fwd	This work	Mutagenic PCR primer	NNSCCGTGGAACCTGCCTGCC
Sequence-based reagent	WT_DHFR_pos21_fwd	This work	Mutagenic PCR primer	NNSTGGAACCTGCCTGCCGAT
Sequence-based reagent	WT_DHFR_pos22_fwd	This work	Mutagenic PCR primer	NNSAACCTGCCTGCCGATCTC
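Each WT_DHFR_pos primer listed above begins with the degenerate codon NNS (IUPAC codes: N = A, C, G or T; S = C or G). As a minimal sketch, assuming the standard genetic code, the following Python snippet enumerates the codons that NNS can encode and the amino acids they cover. The variable and function names are illustrative and are not taken from the authors' repository.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nns_codons():
    """Return every codon matching NNS (N = any base, S = C or G)."""
    return ["".join(codon) for codon in product("ACGT", "ACGT", "CG")]


codons = nns_codons()
amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

print(len(codons), "codons")            # 32 codons
print(len(amino_acids), "amino acids")  # 20 amino acids
print("stop codons:", stop_codons)      # stop codons: ['TAG']
```

Restricting the third base to C or G keeps all 20 amino acids accessible at each position while reducing the library from 64 to 32 codons and leaving a single stop codon (TAG).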


Author details

Samuel Thompson (for correspondence)
Yang Zhang
Christine Ingle
Kimberly A Reynolds
Tanja Kortemme (for correspondence)
