1. Biochemistry and Chemical Biology
  2. Computational and Systems Biology
Download icon

Comprehensive exploration of the translocation, stability and substrate recognition requirements in VIM-2 lactamase

  1. John Z Chen
  2. Douglas M Fowler
  3. Nobuhiko Tokuriki  Is a corresponding author
  1. Michael Smith Laboratories, University of British Columbia, Canada
  2. Department of Genome Sciences, University of Washington, United States
  3. Department of Bioengineering, University of Washington, United States
Research Article
  • Cited 0
  • Views 210
  • Annotations
Cite this article as: eLife 2020;9:e56707 doi: 10.7554/eLife.56707

Abstract

Metallo-β-lactamases (MBLs) degrade a broad spectrum of β-lactam antibiotics, and are a major disseminating source for multidrug resistant bacteria. Despite many biochemical studies in diverse MBLs, molecular understanding of the roles of residues in the enzyme’s stability and function, and especially substrate specificity, is lacking. Here, we employ deep mutational scanning (DMS) to generate comprehensive single amino acid variant data on a major clinical MBL, VIM-2, by measuring the effect of thousands of VIM-2 mutants on the degradation of three representative classes of β-lactams (ampicillin, cefotaxime, and meropenem) and at two different temperatures (25°C and 37°C). We revealed residues responsible for expression and translocation, and mutations that increase resistance and/or alter substrate specificity. The distribution of specificity-altering mutations unveiled distinct molecular recognition of the three substrates. Moreover, these function-altering mutations are frequently observed among naturally occurring variants, suggesting that the enzymes have continuously evolved to become more potent resistance genes.

Introduction

The rise of drug-resistant bacterial pathogens has been rapid and inevitable following the introduction of novel antibiotics to the clinic. Pathogens often acquired resistances through horizontal gene transfer (HGT) using mobile genetic elements carrying antibiotic resistance genes, such as plasmids or transposable elements (Wright, 2019; Surette and Wright, 2017; Codjoe and Donkor, 2017). Under constant selection pressure from antibiotic use, these resistance genes continuously evolve to improve their efficacy and alter and broaden their specificity to other classes of antibiotics (Surette and Wright, 2017; Livermore, 2012). Understanding the molecular mechanisms and evolutionary dynamics of antibiotic resistance genes is crucial to finding sustainable solutions against the future dissemination and evolution of antibiotic resistance, such as through predicting future evolution and aiding in antibiotic and inhibitor design (Crofts et al., 2017; Tehrani and Martin, 2018).

Metallo-β-lactamases (MBL), or class B β-lactamases, are one of the major sources for the spread of multi-drug resistance bacteria. MBLs are metal dependent hydrolytic enzymes that degrade a broad spectrum of the widely used β-lactam antibiotics, including ‘last resort’ antibiotics such as carbapenems (Bebrone, 2007). Plasmid borne MBLs, such as VIM, NDM, IMP, and SPM-types, have been particularly problematic as they can spread to different bacterial pathogens and have no clinically effective inhibitors (Nordmann and Poirel, 2002). All major MBLs have also been undergoing continual evolution; VIM-type MBLs have diversified up to 70 amino acid mutations (26% sequence difference) into over 50 isolated variants, and some variants seem to have developed new substrate specificity (Yan et al., 2001; Schneider et al., 2008; Bogaerts et al., 2012; Mojica et al., 2015; Martínez-García et al., 2018). Much effort has been made to characterize the molecular mechanisms and identify key residues in several major MBLs using diverse biochemical and structural approaches (Lauretti et al., 1999; Franceschini et al., 2000; Wommer et al., 2002; Jin et al., 2004; King and Strynadka, 2011; Baier and Tokuriki, 2014; Makena et al., 2016; González et al., 2016a; González et al., 2016b; Socha et al., 2019; Sun et al., 2018). However, the contributions of the majority of residues in these enzymes remain unexplored, and little is known of the molecular mechanisms governing substrate recognition.

One way to resolve these questions is through comprehensive, large-scale characterizations of mutations affecting the MBLs’ efficacy and specificity. Deep mutational scanning (DMS) is a recently developed method for the characterization of thousands of mutations within a protein using deep sequencing (Fowler et al., 2010; Fowler et al., 2014; Rubin et al., 2017). The resulting high-resolution and comprehensive mutational datasets provide invaluable information for deciphering a subset of mutations related to monogenetic disease (Starita et al., 2015; Weile et al., 2017; Matreyek et al., 2018), understanding evolutionary dynamics of proteins—including viral coat, antibiotic resistance genes and hormone receptors (Starr et al., 2017; Stiffler et al., 2015; Steinberg and Ostermeier, 2016; Lee et al., 2018)— as well as elucidating protein sequence-structure-function relationships (Araya et al., 2012; Firnberg et al., 2014; Thyagarajan and Bloom, 2014; Klesmith et al., 2017; Roscoe et al., 2013; Romero et al., 2015; Gray et al., 2017). In particular, conducting DMS on a protein of interest under varying conditions—for example against different substrates or in different environments—has further unveiled in-depth molecular details of a protein, such as residues contributing to substrate specificity (Melnikov et al., 2014; Wrenbeck et al., 2017) and protein–environment interactions (Mavor et al., 2016; Noda-García et al., 2019; Thompson et al., 2019).

In this work, we use DMS to characterize the functional behavior of all ~5600 single amino acid variants of VIM-2 against three classes of β-lactam antibiotics (ampicillin, cefotaxime, and meropenem) and at two different temperatures (25°C and 37°C), and gain deep insights into the molecular and evolutionary determinants of VIM-2’s behavior. We generate a series of comprehensive and high-quality datasets, and develop a global understanding of VIM-2 by identifying residues that are critical for its function, stability and/or substrate specificity. We also examine VIM-2’s signal peptide—an often overlooked feature despite its importance in expression and transport. Moreover, we use the data to assess the resistance characteristics and rationalize evolutionary outcomes of the clinically isolated natural variants of VIM-type genes, revealing that several mutations in the natural variants are functionally beneficial and lead to changes in substrate specificity.

Results and discussion

Deep mutational scanning of VIM-2 metallo-β-lactamase

DMS was conducted on a library of VIM-2 variants, each encoding a single amino acid substitution. The wild-type (wt) VIM-2 (UniProt ID: A4GRB6) was sub-cloned into a custom pIDR2 vector with a chloramphenicol resistance marker, where VIM-2 expression is controlled by the constitutive AmpR promoter (Supplementary file 1); an extra Gly was inserted at position two to facilitate cloning (See ‘Generation of a VIM-2 mutagenized library’ in methods), which was also further mutated and selected. We constructed the library of all possible single amino acid variants of wtVIM-2 through PCR based saturation mutagenesis, where each codon position is mutated to an ‘NNN’ codon using restriction-free (RF) cloning (Figure 1Avan den Ent and Löwe, 2006); there are 5607 possible variants in the library ((20 a.a. + stop codons)×267 positions). The plasmids of mutagenized codon libraries were pooled into seven groups—six groups of 39 codons (117 bp, 819 variants in each group) and 1 group of 33 (99 bp, 693 variants)—so the mutagenized region of each group can be covered by Illumina NextSeq deep sequencing. We estimated the mutation rate of our library construction by determining the full sequence of 87 variants by Sanger sequencing: only one nucleotide substitution was found outside the intended codon—which corresponds to a mutation rate of 1.4 × 10−5—while two other variants had a large insertion or deletion, which would be filtered out during the variant identification process (see ‘Variant identification and noise filtering’ in methods). Thus, we constructed a high quality variant library, comparable to other libraries constructed and deep sequenced in a similar manner (Melnikov et al., 2014).

Figure 1 with 4 supplements see all
Deep mutational scanning (DMS) overview.

(A) The workflow for DMS. All single amino acid variants are first generated using RF cloning, subsequently transformed into E. coli and then subject to selection for antibiotic resistance conferred to E. coli. The effects of selection (fitness score) were evaluated by deep sequencing and comparing the enrichment of each variant with and without selection. (B) Chemical structures of the antibiotics used in this study. (C) The dose-response growth curve of E. coli transformed with wtVIM-2 or an empty vector control for each antibiotic. Percent growth is calculated as OD600 selected / OD600 non-selected×100 after 6 hr of selection at 37°C. The vertical dashed lines indicate antibiotic concentrations at which selections were performed in this study. The dose-response curves of E. coli transformed with each of the seven library groups are in Figure 1—figure supplement 1. The number of variants represented in the mutated library is in Table 2. Other aspects of the cloning and data processing are described in Figure 1—figure supplement 2 (PCR cloning method), Figure 1—figure supplement 3 (deep sequencing error rates) and Figure 1—figure supplement 4 (improving variant identification by an error filtering process).

Cultures of E. coli cells—specifically, E. cloni 10G, chosen for their high transformation efficiency and lack of endA and recA—transformed with VIM-2 libraries (each group was treated separately) were subjected to antibiotic selection by incubating the culture at 37°C with LB media in the presence (selected) and the absence (non-selected) of three different classes of β-lactam antibiotics—ampicillin (AMP), a 3rd generation penicillin, cefotaxime (CTX), a 3rd generation cephalosporin and meropenem (MEM), a carbapenem (Figure 1B). To determine the selection conditions, the growth of E. coli cells harboring the plasmid encoding wtVIM-2 was examined at a range of antibiotic concentrations (1.0–1024 µg/mL AMP, 0.0625–64 µg/mL CTX, 0.002–2.0 µg/mL MEM) (Figure 1C and Figure 1—figure supplement 1). We chose to test the highest antibiotic concentration where wtVIM-2 can grow almost 100% relative to growth in media without β-lactam antibiotics, and at successive lower concentrations at 8-fold decrements where the range permits; selected conditions are 128, 16 and 2.0 µg/mL of AMP, 4.0 and 0.5 µg/mL CTX, and 0.031 µg/mL MEM (Figure 1C). The selection process for each antibiotic was conducted in duplicate on separate days. After selection, the plasmids were isolated, the mutagenized region of each group was amplified by PCR, and the amplicons were sequenced by the Illumina NextSeq 550 platform. The sequencing reads were error filtered, and the fitness score of each variant relative to wtVIM-2 was calculated using Equation (1). (see methods for ‘Deep sequencing and quality control’).

(1) fitness score=Log2(frequency of variant Selectedfrequency of variant Nonselectedfrequency of wt Selectedfrequency of wt Nonselected)

Where the frequency of a variant (or wt) is the deep sequencing read count of the variant divided by the total reads in the corresponding sample. Variants with frequencies below the threshold of deep sequencing errors that was estimated from the deep sequencing of wtVIM2 (see methods for ‘Variant identification and noise filtering’) were excluded during scoring. The non-selected library shows excellent coverage, with 5535 of 5607 (98.7%) variants present after filtering in at least one replicate while 97.8% are observed in both replicates (Table 2). For selected libraries, we calculate the fitness score for any variants present in at least one non-selected replicate then average the fitness scores between the two selection replicates (see Supplementary file 2A for all fitness scores).

Our DMS experiments show high replicability in all conditions tested. The R2 of a linear regression between variants observed in both replicates is 0.94 for selection with 128 µg/mL AMP, 0.91 for 4.0 µg/mL CTX and 0.85 for 0.031 µg/mL MEM (Figure 2 and Figure 2—figure supplement 1). As expected, variants with synonymous mutations have near neutral fitness and variants with nonsense mutations (stop codons) have the lowest fitness scores. At the highest concentration of each antibiotic, variants with stop codons have fitness scores centered around −4 and lower, thus a fitness score of −4 is considered the lowest score cut-off for downstream analyses and fitness scores below this cut-off are set to −4 (Figure 2). Like previous DMS studies with other proteins, the overall fitness distribution of all variants exhibit a bi-modal distribution with a peak at neutral fitness and a long tail stretching toward negative fitness to another peak at the cutoff of −4 (Figure 2Stiffler et al., 2015; Firnberg et al., 2014; Roscoe et al., 2013; Jacquier et al., 2013).

Figure 2 with 2 supplements see all
Quality control of DMS data and general mutational properties of VIM-2.

In the horizontal panels, data are shown for (A) 128 µg/mL AMP selection, (B) 4.0 µg/mL CTX selection and (C) 0.031 µg/mL MEM selection. The color legend in panel A) is shared by all panels. For each horizontal panel, the left plot shows correlation between fitness scores of all variants in the two replicates of DMS; replicate correlation of selection conditions not shown here are in Figure 2—figure supplement 1. The middle plot shows distribution of fitness effects for all variants separated into synonymous, missense and nonsense distributions, where the vertical grey lines indicate fitness score cut-offs used to classify fitness effects as positive, neutral or negative. The proportion of variants in each fitness effect category can be found in Figure 2—figure supplement 2. The right plot shows the relationship of DMS fitness scores with antibiotic resistance (EC50) of isolated variants measured in a dose-response curve; variants with resistance lower than the tested range could not be fitted for EC50, leading to EC50 values for 39 unique variants in AMP, 39 for CTX and 45 for MEM—some points are an average of the same codon or amino acid variant isolated multiple times. The filled rectangle in the background indicates the region of linear association between fitness scores and EC50. The text at the top of each plot indicates the Spearman rank-order correlation coefficient and the P-value of the correlation. Individual EC50 measurements can be found in Supplementary file 2B.

We confirmed the DMS fitness scores reflect the actual resistance level of variants (Figure 2 and Supplementary file 2B). We isolated 45 unique variants (61 unique codons), and determined the half maximal effective concentration (EC50) of E. coli culture harboring each variant for three antibiotics by measuring antibiotic dose-response curves. We fit the relationship using a sigmoidal function and identify a linear range of correlation for fitness scores within −3.1 to 0.1 for AMP (EC50 28–81 µg/mL), −2.7 to 0.6 for CTX (EC50 1.8–4.1 µg/mL) and −2.4 to 1.8 for MEM (EC50 0.012–0.066 µg/mL), which correspond to a 2.9, 2.3 and 5.5-fold range of EC50 values for AMP, CTX and MEM, respectively (Sebaugh and McCray, 2003). All fitness scores outside the linear range are still qualitatively consistent with EC50—where higher fitness scores correspond to higher EC50 values and lower fitness scores correspond to lower EC50 values—which is supported by a Spearman rank-order correlation between fitness and EC50 of at least 0.85 for selection using each antibiotic.

Global view of VIM-2 enzyme characteristics

The fitness scores for variants selected at 128 µg/mL AMP are shown in Figure 3 (See Figure 3—figure supplements 1 and 2 for CTX and MEM); the inserted Gly2 is omitted to match the numbering for wtVIM-2 (For Gly2 fitness scores, see Supplementary file 2A). At a glance, there are several interesting trends in the DMS data of VIM-2. Variants with Cys mutations are highly deleterious throughout the catalytic domain (positions 27–266) (Figure 3—figure supplement 3). As wtVIM-2 possesses only one Cys for metal binding, additional Cys may cause the formation of undesired disulfide bonds, leading to misfolding (Mehlhoff, 2020). Pro variants are also often deleterious, as this residue disrupts secondary structures (Stiffler et al., 2015; Firnberg et al., 2014; Gray et al., 2017). We found 112 positions (42% of all residues) are highly sensitive to mutations, where over 75% of missense variants (excluding synonymous and nonsense mutations) display a fitness score <−2.0. These positions are likely key requirements for catalytic activity or protein stability and folding in wtVIM-2. Indeed, these positions include all six active-site metal coordinating residues (His114, His116, Asp118, His179, Cys198, His240), as well as 3–4 residues adjacent to each metal binding residue in the amino acid sequence that are likely to play important roles in the metal configuration and enzymatic function. Additionally, 92% of positions with high mutational sensitivity—including all metal binding residues—are located in the core of the protein (accessible surface area, ASA, of residue <30%) and 63% are almost completely buried (ASA <5%), congruent with previous findings (Fowler et al., 2010; Stiffler et al., 2015; Thyagarajan and Bloom, 2014; Melnikov et al., 2014; Thompson et al., 2019; Kitzman et al., 2015Figure 4A). The association between mutational sensitivity and ASA is also evident at the level of individual variants, where the distribution of variants at positions with ASA <30% exhibits significantly lower fitness than the distribution of variants at positions with ASA ≥30% (two-tailed Mann-Whitney U test, P-value<0.001, Figure 4A).

Figure 3 with 3 supplements see all
Fitness of all VIM-2 single amino acid variants under 128 µg/mL AMP selection.

Each cell in the heat map represents the fitness score of a single amino acid variant. Synonymous variants are indicated by dark grey circles and variants that are not present in the library are in grey. The x-axis under the heat map indicates the wt residue and position (the six active site metal binding residues are highlighted as circles), while the y-axis indicates the variant residue at that position. The secondary structure of the wtVIM-2 crystal structure (PDB: 4bz3) is displayed below the heat map. Corresponding heat maps for variants under selection with CTX and MEM can be found in Figure 3—figure supplement 1 and Figure 3—figure supplement 2, respectively. A comparison between the distributions of variants with each mutated amino acid can be found in Figure 3—figure supplement 3. All fitness data used in the heat maps can be found in Supplementary file 2A.

Correlation of fitness with structural attributes.

Fitness scores are from DMS at 128 µg/mL AMP selection. (A) To the right, is the crystal structure of wtVIM-2 (PDB: 4bz3) colored by the average fitness of 20 amino acid mutations at each position. The inset to the left shows distributions of fitness scores for variants at positions with relative accessible surface area (ASA) <0.3 in wtVIM-2 or positions with ASA ≥0.3, with the number of variants in each distribution shown at the bottom and the results of a two-tailed Mann-Whitney U test between the two distributions at the top. (B) The correlation between accessible surface area and the average fitness of 20 amino acid mutations at each position. (C) The correlation between the changes in folding energy predicted by Rosetta and the DMS fitness scores for all variants.

To examine the distribution of fitness effects (DFE) of VIM-2 variants, we classified the 5291 nonsynonymous variants as having negative (<−0.7), neutral (−0.7 to 0.7) or positive (>0.7) fitness by performing Z-tests of each variant’s fitness scores against the fitness distribution of 244 synonymous variants (the null model distribution), adjusting for 5% false discovery rate using the Benjamini-Hochberg procedure. The DFE of VIM-2 variants is similar across all selection antibiotics at the highest screening concentration (128 µg/mL AMP, 4.0 µg/mL CTX, 0.31 µg/mL MEM), with ~65% of variants being negative, ~30% being neutral and ~5% being positive (Figure 2—figure supplement 2). The overall DFE also agrees with observations found in DMS of other enzymes, such as E. coli amidase, TEM-1 β-lactamase and levoglucosan kinase (Stiffler et al., 2015; Firnberg et al., 2014; Klesmith et al., 2017; Wrenbeck et al., 2017).

Next, in order to determine biophysical properties that explain the fitness scores, a linear model was constructed using the fitness score of variants as the response factor and various parameters such as ASA, ΔΔG predicted by Rosetta, change in amino acid volume and polarity, and the wt and variant amino acid states as predictors (Table 1, Supplementary file 3A and 3B for CTX and MEM models, Supplementary file 2C for data used in models). We first modeled each predictor alone with the response, then selected predictors that account for at least 10% of variance in the response (R2 >0.10) and modeled them in combinations of 2 or more as individual terms and as interacting terms. The final model accounted for the greatest amount of variance using the fewest predictors—that is optimized for adjusted R2—and combined four predictors (ASA, ΔΔG, wt and variant amino acid) capable of explaining 55% of the variation in fitness scores (adjusted R2 = 0.55). The ASA alone can explain 21% of fitness score variation (Table 1) and ASA of the wt amino acid alone show a significant correlation (R2 = 0.50) to the average fitness scores of the position, with mutations at exposed residues having less deleterious fitness effects on average (Figure 4B). The ΔΔG explains an additional 18% (Table 1) and there is overall correlation between ΔΔG and fitness score while individual predictions are relatively scattered, similar to previous findings that compared fitness to Rosetta folding energies or solubility scores (Figure 4CFirnberg et al., 2014; Klesmith et al., 2017). Knowing the wt and variant amino acid can further explain another 10% and 5% of variation respectively (Gray et al., 2017; Gray et al., 2018). Thus, the results indicate structure and biophysical factors can explain the majority of fitness score tendencies.

Table 1
Linear model output for DMS fitness scores under 128 µg/mL AMP selection
Predictor*Estimated effectAdjusted P-value (α = 0.05)Variance explained§
(Intercept)−1.73<0.001
ASA**2.83<0.00121%
Rosetta ΔΔG††−0.28<0.00118%
Starting(wt) residueC−1.53<0.00110%
D−1.33<0.001
E−0.53<0.001
G−0.92<0.001
H−1.09<0.001
I−0.50<0.001
L−0.43<0.001
N−0.62<0.001
R−0.350.001
S−0.230.018
V−0.36<0.001
W−1.35<0.001
Variant residueC−1.69<0.0015%
D−0.400.002
P−0.61<0.001
W−0.45<0.001
  1. *Each predictor indicates a class of wtVIM-2 derived values that were used as explanatory variables to model a linear relationship with the observed fitness score.

    †The estimated effect is the predicted change in fitness score away from the intercept with a one unit increase in a continuous predictor or a binary change in a categorical predictor.

  2. ‡P-values indicates whether a predictor makes a significant contribution to the change fitness score, and are adjusted for a false discovery rate of 5% using the Benjamini-Hochberg procedure.

    §The adjusted R2 of each predictor when correlated with fitness, which is a measure of how much variation in the fitness score can be explained by each predictor in the linear model.

  3. ¶The intercept is the average fitness of all variants where the continuous variable is 0 (ASA and Rosetta ΔΔG) and the wt or variant residue is Ala.

    **ASA ranges from 0.0 to 1.0.

  4. ††Rosetta ΔΔG ranges from −5.0 to 5.0 Rosetta energy units.

Table 2
Variants observed in each library group.
37°C*25°C*
 GroupNumber of positionsTotal possible variantsBoth replicates†At least one replicateBoth replicatesAt least one replicate
 139819808811808812
 239819812813811814
 339819803807802807
 439819802813809812
 539819801812804811
 639819775793789796
 733693682686678690
 Total26756075483553555015542
 % coverage97.8%98.7%98.1%98.8%
  1. *Non-selected libraries were grown, sequenced and filtered separately at 37°C and 25°C.

    †The observed number of variants that passed noise filtering in both sequencing replicates of the non-selected library.

  2. ‡The observed number of variants that passed noise filtering in at least one sequencing replicate of the non-selected library.

Codon and amino acid optimization in the signal peptide

The first 26 residues of VIM-2 has been identified as the signal peptide (Lauretti et al., 1999; Franceschini et al., 2000; Garcia-Saez et al., 2008), which is a sequence used to translocate the enzyme to the periplasm, then cleaved after transport (Oliver, 1985; Pugsley, 1993; Freudl, 2018; Paetzel and Peptidases, 2019). Our DMS data supports the known length of the signal peptide as mutations to Cys are much less deleterious before residue 26, suggesting these positions are not incorporated into the mature enzyme in the periplasm. In general, the signal peptide sequence has an amino terminal (N) region (residues 1–7) with one or more positive residues, a hydrophobic (H) region (residues 8–21) and a carboxy terminal (C) region (residues 22–26) that precedes the cleavage site containing a PXAXS motif (Figure 5AOliver, 1985; Pugsley, 1993; Freudl, 2018; Paetzel and Peptidases, 2019). The signal peptide is conserved at 17 of 25 positions (Met one excluded) across all VIM variants, and the remaining are binary differences between the conserved sequences of the VIM-1 and VIM-2 clades (Figure 5A, clades are defined by Figure 8—figure supplement 1).

Figure 5 with 3 supplements see all
Conservation patterns and fitness scores in the signal peptide.

(A) Sequence logo of the signal peptide region aligned across all VIM natural variants generated using WebLogo (https://weblogo.berkeley.edu/). Positions with two major naturally occurring residues are conserved differences between the VIM-1 and VIM-2 clades (clades are defined in Figure 8—figure supplement 1). (B) The distribution of fitness effects of all missense variants, separated into signal peptide variants and catalytic domain variants. The number of variants in each distribution are displayed in the legend below the distributions. The results of a two-tailed Mann-Whitney U test between the distributions are displayed above the distributions. (C) DMS fitness scores of all variants at each position of the signal peptide. Synonymous variants of wtVIM-2 and conserved variants observed in the VIM-1 clade are highlighted as labelled circles. Additional information on codon variant fitness in the signal peptide can be found in Figure 5—figure supplement 1 (replicate correlation of codon variant fitness), Figure 5—figure supplement 2 (heat map of codon variant fitness) and Figure 5—figure supplement 3 (correlation between RNA folding energy and codon variant fitness).

Mutations in the signal peptide are generally tolerated (67% of missense mutations are neutral with 128 µg/mL of AMP) and even beneficial (10% of missense mutations are positive), which is consistent with a previous DMS study with TEM-1 (Firnberg et al., 2014Figure 5B). Overall, the DFE for missense variants in the signal peptide is significantly more neutral than the DFE of missense variants in the catalytic domain (two-tailed Mann-Whitney U test, P-value<0.001, Figure 5B). In the N-terminal region, mutations to Lys3 are especially deleterious, likely due to the importance of a net positive charge in the N-region for efficient translocation (Oliver, 1985; Inouye et al., 1982; Iino et al., 1987). In contrast, Lys7 is tolerant to substitution—in fact, half of the natural VIM variants have a Ser at this position—suggesting that Lys3, rather than Lys7, is critical for translocation. In the H-region, residues Val10 through Ile16 are the most sensitive to mutation, especially when changed to a charged amino acid (Firnberg et al., 2014; Mehlhoff, 2020; Oliver, 1985). The C-region is most negatively affected by the mutation of Leu23 to Cys (fitness of −2.5) or Trp (−2.7), while 95% of variants are neutral or positive, including those in the PXAXS motif.

Interestingly, variants with evolutionarily conserved residues in the signal peptide often do not have the highest fitness, and we note both residue level and codon level dependencies in fitness (Figure 5C). Across the signal peptide, 10% of variants with missense mutations have positive fitness relative to wtVIM-2, with fitness ranging from 0.7 to 1.3. In terms of codon level fitness within synonymous variants, we find a small but statistically significant association between a mutated codon’s change in mRNA folding energy (including the 5’UTR and excluding the 3’UTR) and fitness within mutants close to the start codon, supporting previous findings that avoidance of secondary structure near the start codon is favored (Figure 5—figure supplement 3, Supplementary files 2D and 2E, see Materials and methods for ‘RNA folding energy calculation’) (Bhattacharyya et al., 2018; Bentele et al., 2013). We also found that 73 of 143 ‘codon dependent variants’—variants in which any pair of synonymous codon scores differ by more than 2.0—are within the signal peptide, which is also suggestive of the importance of codon choice near the start of the coding region (Kelsic et al., 2016Figure 5—figure supplement 2).

The less than optimal residue level fitness of wtVIM-2 may be because we employ E. coli as a host while natural VIM variants are often found in Pseudomonas (Jia et al., 2017), and/or the signal peptide is not selected to produce maximum expression in natural environments. It has been shown that different signal peptides produce variable expression levels and translocation rates for a given protein, both of which affect the final resistance, especially in different host organisms (Socha et al., 2019; Mehlhoff, 2020; Inouye et al., 1982; Iino et al., 1987; Brockmeier et al., 2006; Mathiesen et al., 2009; Hemmerich et al., 2016). Meanwhile, fitness variation at the codon level may be due to the presence of a different 5’UTR compared to the plasmid sequence found in clinical VIM-2 (Bhattacharyya et al., 2018; Kelsic et al., 2016). The inclusion of an extra Glycine at position two may affect the observed fitness, though this effect is expected to be small since signal peptides are highly variable in length and composition in the N-region (Oliver, 1985; Pugsley, 1993); 13/19 missense variants are neutral or positive at Gly2. Overall, the signal peptide is sensitive to changes in organismal and genetic context which would affect the outcome of horizontal gene transfer. Notably, the signal peptide is frequently mutated in naturally occurring VIM-type variants (see section on ‘Natural VIM variation’ below), suggesting changes in the signal peptide sequences may have played significant roles in dissemination of MBL genes to different hosts and adaptation to higher antibiotic concentrations.

Elucidation of the role of residues in the catalytic domain

We sought to further examine the functional and structural roles of residues in the catalytic domain (positions 27–266) of wtVIM-2. We compare fitness scores of missense variants between selection in 128 µg/mL and 16 µg/mL AMP, as fitness at different AMP concentrations reflect a residue’s degree of involvement in the enzyme’s stability, expression and/or catalytic activity. Selection was also performed at 25°C in addition to 37°C to examine temperature dependent mutational effects, highlighting residues involved in protein folding and stability; in general, lower temperatures are permissive to variants with poor folding and high aggregation propensity while having a uniform effect on catalytic rate. To assess the role of each residue, we classified all positions in the catalytic domain into four types: i) ‘tolerant’, if 75% of variants are neutral even in the most stringent condition with 128 µg/mL AMP at 37°C, ii) ‘essential’, if 75% of variants are highly deleterious even in the least stringent condition with 16 µg/mL AMP at 25°C, iii) ‘temperature dependent’, if the difference in the fitness score between 25°C and 37°C is more than 2.0 in either 128 or 16 µg/mL AMP, and iv) ‘residue dependent’, if variants are temperature independent (similar scores at the two temperature) and exhibit a range of fitness rather than being mostly neutral or negative (Figure 6A–B, Supplementary file 2F for all classifications, also see ‘Identification of critical residues and temperature dependence’ in the Materials and methods).

Figure 6 with 2 supplements see all
Distribution of mutational tolerance and temperature dependence of wtVIM-2 residues.

(A) Scatterplots comparing fitness scores under selection at 25°C and 37°C (128 µg/mL AMP). Variants within the classified positions are highlighted in dark blue while all variants are plotted in grey for reference. (B) Proportion of residues in the wtVIM-2 catalytic domain that have been classified into each behavioral category. (C–D) The wtVIM-2 crystal structure (PDB: 5yd7) is colored by the behavioral classifications, the active-site zinc ions are colored in lime green. (C) View of the spacial distribution of temperature dependence in wtVIM-2, with residues depicted as spheres. The upper row depicts all residues, while the lower row depicts only essential and temperature dependent residues. (D) Cartoon representation of wtVIM-2 with metal-binding residues shown as sticks. All classifications can be found in Supplementary file 2F. Additional analysis of temperature dependence can be found in Figure 6—figure supplement 1 (correlation of Rosetta ΔΔG with temperature dependence) and Figure 6—figure supplement 2 (proportion of sidechain-backbone hydrogen bonding by temperature dependence).

As expected, the 55 ‘tolerant’ positions are scattered around the surface of the protein and are mostly solvent exposed (80% of positions have greater than 30% ASA) (Figure 6C–D). The 20 ‘essential’ residues include all six metal binding residues and deeply buried residues (70% have less than 5% ASA), which form the central core of the protein (Figure 6C). This core is further expanded into a larger scaffold by the 93 ‘temperature dependent’ positions that are mostly buried in the structure (76% have less than 30% ASA) and largely hydrophobic—75% of the temperature dependent residues are non-polar (A, G, I, L, P, V) or aromatic residues (F, W, Y) compared to 58% for the entire catalytic domain. The 72 ‘residue dependent’ positions tend to be near the surface or at packing interfaces between α-helices and β-sheets (Figure 6D).

The fitness of variants at temperature and residue dependent positions show equally strong association with the Rosetta predicted ΔΔG, indicating both classes of residues have contributions to structural packing (Figure 6—figure supplement 1). However, the location and hydrogen bonding behavior of each class of residues suggest different functional roles. Essential and temperature dependent residues display an enrichment of sidechain-backbone h-bonding relative to the proportion of h-bonding residues in each class (Figure 6—figure supplement 2, Supplementary file 2G), suggesting—when combined with the formation of a hydrophobic core—these residues are largely involved in protein folding and stability (Roscoe et al., 2013; Thompson et al., 2019; Flynn, 2019). In contrast, ‘residue dependent’ positions are prominent in the two major active site loops (14/23 positions from 60 to 68 and 201–214) and on packing interfaces on these loops’ distal faces from the active site. The loop holding metal-binding residues His114, His116 and Asp118, and the helix positioning the loop into the active-site (positions 112–129) are also enriched in residue dependent positions (10/15 non-metal-binding positions), suggesting possible effects on metal and substrate positioning. Thus, many of the ‘residue dependent’ positions are likely to be involved in catalysis through direct or indirect substrate interactions, and also affect the overall shape of the active site.

Distinct recognition for different classes of β-lactam substrates

VIM-2 is known for its broad spectrum activity against all classes of β-lactam antibiotics except monobactams, but how residues achieve substrate recognition remains unknown. We examined mutations that alter substrate specificity to identify wt residues responsible for substrate recognition by comparing fitness scores between the three antibiotics (128 µg/mL AMP, 4.0 µg/mL CTX, 0.031 µg/mL MEM). First, we identified 29 ‘globally positive’ variants across 10 positions in the catalytic domain that increase fitness score to >1 in all antibiotics, which is more conservative than the cut-off of >0.7 for variants with positive fitness effects relative to wtVIM-2 and is above the upper fitness score range of the peak centered at neutral fitness in the DFEs (Figure 2Supplementary file 2H). Residues at positions 47, 55, 66, 68 and 205 each give rise to at least three globally positive variants (24 total) while 57, 65, 115, 180 and 201 each give rise to one; 9/10 positions with globally positive variants are near the active site, having at least one atom within 15 Å of the active site zinc ions (Figure 7A). Next, we compare fitness scores of different antibiotics in pairs, and identified variants with a change in fitness effect classifications (negative, neutral or positive) combined with a 2.0 fitness score difference between antibiotics. We identified 78 specificity altering variants across 25 positions, with 23/25 positions near the active site (Figure 7B, Table 3 and Supplementary file 2I for individual specificity variants, Figure 7—figure supplement 1 for fitness heat maps at specificity positions). We confirm the specificity by comparing the fitness scores with the log2(EC50 var /EC50 wt) of the variant in the three antibiotics (Figure 7—figure supplement 2). Of the 25 positions, five are shared by both specificity and globally positive variants, and specificity changes are enhanced by the positive fitness at three of these positions (68, 201 and 205). However, most changes in specificity are due to decreases of fitness in one or two substrates, and only three variants (R205H/I/V) maintain neutral or higher fitness in all antibiotics (Melnikov et al., 2014; Wrenbeck et al., 2017). When examining the roles of these positions, we find nine ‘residue dependent’ and one ‘tolerant’ position, as expected for positions that interact with substrate rather than affect folding (Dellus-Gur et al., 2013). However, the other 15 positions are ‘temperature dependent’ positions, suggesting residues that are involved in substrate specificity are also embedded in the protein core.

Figure 7 with 2 supplements see all
Visualization of VIM-2 specificity determining positions.

The wtVIM-2 crystal structure (PDB: 5yd7) is featured in all panels, with the active site Zn ions depicted as lime green spheres. (A) Positions with at least one globally positive mutation are highlighted as orange spheres, which are also found in Supplementary file 2H. (B) Positions classified as being responsible for specificity towards certain antibiotics in wtVIM-2 are color coded by antibiotic and highlighted as spheres; the positions are listed by specificity in Table 3. Individual variants classified as having changes in specificity are listed in Supplementary file 2I. Heat maps of specificity positions for fitness under selection in each antibiotic and fitness differences between antibiotics can be found in Figure 7—figure supplement 1. (C–E). Close-up views of the specificity residues in the active site with (C) hydrolyzed ampicillin (PDB: 4hl2), (D) cefuroxime (PDB: 4rl0) and (E) meropenem (PDB: 5n5i) from VIM-1 and NDM-1 structures that have been aligned to the wtVIM-2 structure using the active site residues. Residues are colored by the inferred substrate specificity as in B). Substrates are shown in stick and ball representation. Comparison of fitness changes with EC50 changes in individual variants are found in Figure 7—figure supplement 2.

Table 3
Inferred specificity of residues in wtVIM-2.
SpecificityPositions*
 AMP39, 55, 57, 63, 117, 119, 142, 143, 153, 196, 219, 222, 243, 248
 CTX62, 67, 68
 AMP or CTX60, 61, 202, 205, 210, 211, 216
 MEM201
  1. *Positions in bold are temperature dependent. The underlined position is tolerant. The unformatted positions are residue dependent.

Interestingly, there is a strong bias in specificity changes depending on mutations and their positions in the active site. In 21 of 25 positions, specificity variants decrease AMP fitness, while in 10 positions variants decrease CTX fitness and variants decrease MEM fitness at only one position (Figure 7B). This bias is maintained at the level of individual specificity variants, where 96% decrease the fitness in either AMP and/or CTX—29 only decrease AMP, 33 only decrease CTX and 10 decrease both—while only three variants decrease MEM. The residues specific to AMP—where mutations to the residue decrease AMP fitness, but are neutral for CTX and/or MEM—are spread around the active site, including the two active site loops (60–68 and 201–214) as well as residues in the protein scaffold. In contrast, the residues specific for CTX are restricted to the two active site loops (Mojica et al., 2015; Martínez-García et al., 2018; Moali et al., 2003; Merino et al., 2010; Leiros et al., 2014; Leiros et al., 2015); mutations in positions 62, 67 and 68 are fully specific to CTX, while positions 202, 205, 210 and 211 all have mixed specificity for CTX and AMP. To visualize residue-substrate interactions, we overlaid AMP, cefuroxime and MEM substrates in the active site of VIM-2 through alignment with VIM-1 and NDM-1 structures crystallized in complex with these substrates. Some substrate interactions are apparent from proximity, such as the packing of hydrophobic residues in loop 60–68 to the non-polar, aromatic substituents on the AMP (C6) and cefuroxime (C7) that is missing in most carbapenems (Figure 7C–D). However, Glu202 and Arg205 seem to be in better position to interact with the C2 substituent of MEM and are further from AMP or cefuroxime, yet both residues are specific for AMP and CTX. Moreover, many residues in the protein scaffold that are affecting AMP specificity do not directly interact with the substrates. We hypothesize that these distant residues may contribute to solvent related phenomenon—such as displacement of solvent and/or bridging solvent with ligand to affect substrate binding energy (Maurer and Oostenbrink, 2019; Spyrakis et al., 2017)—or alter protein dynamics to affect substrate specificity (González et al., 2016b; Petrović et al., 2018; Campbell et al., 2018; Campbell et al., 2016; Singh et al., 2015).

Although wtVIM-2 degrades all three β-lactams, our observations suggest that the enzyme interacts with each substrate in a different manner. AMP interacts with many residues around the active site and is the most sensitive to mutations, while CTX specificity relies exclusively on interactions with residues in the active site loops. Interestingly, MEM seems to rely on contacts shared with other antibiotics, which suggest that carbapenem resistance of VIM variants may have coevolved with other antibiotics.

Natural VIM variation favors neutral, adaptive and specificity mutations

Currently, 56 unique VIM-type MBL sequences (including wtVIM-2) have been found on plasmids in β-lactam resistant clinical isolates (Supplementary file 2J; Martínez-García et al., 2018; Jia et al., 2017). The DMS data of VIM-2 enable us to characterize these naturally occurring mutations comprehensively. We classified these sequences into four clades, represented by VIM-1 (between 25–29 mutations from VIM-2 each, 45 unique mutations total), VIM-2 (1–6 mutations, 31 total), VIM-7 (70 mutations), and VIM-13 (32–33 mutations, 34 total) (Figure 8A, Figure 8—figure supplement 1), with 131 unique point mutations relative to VIM-2 across 99 positions (Figure 8—figure supplement 2, Supplementary file 2K).

Figure 8 with 2 supplements see all
Behavior of natural VIM variants inferred from DMS fitness.

(A) Maximum likelihood phylogenetic tree of all natural VIM variants examined in this study, colored by major clades (a larger version of the tree is presented in Figure 8—figure supplement 1). The wtVIM-2 sequence is highlighted in white. (B) Distribution of fitness for all unique individual mutations found in VIM natural variants compared to all missense variants measured in DMS for all three antibiotics. (C) All residues mutated in the natural variants are shown in sphere representation and colored by mutational tolerance and temperature dependence. The pie chart on the left shows the proportion of natural variant positions in each classification. (D) wtVIM-2 residues that are both mutated in at least one natural variant and affect specificity are highlighted as sticks, colored by the clade(s) in which the residue is mutated. All positions mutated in natural VIM-type variants relative to wtVIM-2 can be found in Figure 8—figure supplement 2.

As expected, the fitness distribution of naturally occurring mutations in the VIM-type MBL for all antibiotics (128 µg/mL AMP, 4.0 µg/mL CTX, 0.031 µg/mL MEM) shows enrichment for neutral and positive mutations (Figure 8B). The trend suggests that natural variants have adapted to higher resistance as 31% of all ‘globally positive’ catalytic domain variants in DMS are also naturally occurring mutations, and 17% of all natural mutations have positive fitness effects for at least one antibiotic. At least 66% of variants have neutral fitness effects within each antibiotic. Natural mutations are disfavored in residues crucial for activity or stability (Figure 8C): Of the 99 mutated positions, only a small proportion of ‘essential’ (1/20 positions) and ‘temperature dependent’ (21/93) positions have been mutated while large proportions of the signal peptide (20/26) and ‘tolerant’ positions (32/55) have been mutated. Interestingly, 44% (11/25) of specificity altering positions have been mutated, which suggests that VIM variants may have changed their substrate specificity during evolution (Figure 8D).

However, 10% of mutations are still highly deleterious (fitness score <−2.0) in 128 µg/mL AMP (6.9% for 4.0 µg/mL CTX, 6.1% for 0.031 µg/mL MEM), indicating other factors that affect natural variation (Figure 8B). These 13 deleterious mutations are spread over 11 positions, where one position is in the signal peptide, and 10 are in the catalytic domain. The signal peptide mutation (K3Q) occurs only in VIM-7 and eliminates the Lys3 critical to translocation, but this is likely neutral as VIM-7 has a S6R mutation that replaces the positive charge. Two mutations are only deleterious to AMP (Q60H, A143T), thus altering substrate specificity. Furthermore, we suspect the neutral I185V mutation (all 55 natural variants have Val185 while our wtVIM-2 has Ile185) acts as a global suppressor (Brown et al., 2010; Huang and Palzkill, 1997), and permit the accumulation of four mutations that are deleterious in all antibiotics (T139A, T139I, V236G and V255A) as I185V is the only other mutation in these natural variants. The remaining six deleterious mutations are likely neutralized by specific intramolecular epistasis, or the background dependence of mutational effects (Starr and Thornton, 2016; Miton and Tokuriki, 2016; Breen et al., 2012; Sarkisyan et al., 2016; Pokusaeva et al., 2019), as these mutations occur in natural variants with at least 25 mutations relative to VIM-2. While such epistasis will hamper our ability to perfectly predict the effect of mutations, the VIM-2 dataset presented in this study contributes to further our understanding of MBL evolution, and can help orient our predictions concerning the emergence of future resistance.

Conclusion

In this work, we report the first comprehensive mutational analysis of an enzyme in the MBL, class B β-lactamase family, one of the most important enzyme families underlying the dissemination of multi-drug resistance to pathogens. We uncover a sensitivity to variation in the signal peptide of VIM-2, that may be due to codon dependent RNA folding or incompatibility with the host translocation system. Such findings highlight the importance of genome and host context in resistance gene compatibility (Socha et al., 2019; Bentele et al., 2013; Hemmerich et al., 2016). By performing DMS at various conditions, three different antibiotics, and two temperatures, we enhance the understanding of sequence-structure-function relationships by unveiling a set of mutations for protein stability, catalysis and substrate specificity of VIM-2.

We find VIM-2’s substrate specificity altering residues to be enriched near the active site, which enables us to elucidate the molecular basis of enzyme-substrate interactions. This finding is in contrast to a previous DMS study that tested multiple substrates: specificity-altering mutations of E. coli amidase (amiE) were distributed across the entire structure in a global mode of specificity determination (Wrenbeck et al., 2017). Thus, it is likely that each enzyme takes different mechanisms for recognizing diverse substrates. The monomeric MBLs have large, solvent-exposed active site clefts to recognize a wide range of substrates, while amiE which has a small, occluded active site and a hexameric quaternary structure that potentially favors controlling specificity through packing and subunit interactions (Wrenbeck et al., 2017). However, determinants of reaction enantioselectivity in 4-OT—another enzyme active as a hexamer—are concentrated in the active site, which further highlights unique behaviors in different enzymes (van der Meer et al., 2016). Regardless, understanding distinct mode of enzyme-substrate interactions will lead to design and development of new antibiotics and inhibitors to re-sensitize the MBL enzymes.

We have demonstrated that VIM-type variants have been continuously evolving by enhancing their resistance as well as altering their substrate specificity in nature. The study of natural variation also reinforces the observation that mutations found to be neutral or beneficial in an experimental setting tend to be enriched in nature as well (Lee et al., 2018; Flynn, 2019). The tendency for mutations at specificity positions to cause collateral sensitivity to AMP and CTX hints at some possibility for a combined treatment using carbapenems as the main antibiotic with penicillins or cephalosporins added as supplement. Though the possibility of a combined treatment will require extensive investigation using specific β-lactams used in the clinics, and for tests to be conducted on clinically isolated strains that often contain multiple sources of β-lactam resistance including other MBLs like NDM-1 and serine β-lactamases like KPC-1 (Monogue et al., 2018). Given that we also observe natural variants with globally positive mutations, it is doubtful that use of β-lactams alone will be a long-term solution and inclusion of antibiotics with different mechanisms and β-lactamase inhibitors will be required to sustain effective treatments, combined with general surveillance and mitigation of spread for resistant pathogens (Codjoe and Donkor, 2017; Crofts et al., 2017). It is likely that many new VIM variants will emerge in the future, and our results will provide a valuable basis to predict likely mutations and estimate the resistance of newly found variants.

Materials and methods

Key resources table
Reagent type
(species) or resource
DesignationSource or referenceIdentifiersAdditional information
Gene (Pseudomonas aeruginosa)wtVIM-2UniProtUniProtKB:A4GRB6
Strain, strain background (Escherichia coli)E. cloni 10GLucigen60061Electrocompetent cells
Recombinant DNA reagentpIDR2-wtVIM-2this paperPlasmid housing the wtVIM-2 sequence, maintained by the Tokuriki lab. See Supplementary file 1 for full sequence.
Sequence-based reagentprimers with Nextera transposase adapter sequencethis paperPrimers used to extract regions of the VIM-2 gene from variant libraries after a selection experiment, attaching the Nextera transposase adaptor sequence in the process. See Supplementary file 2L for all primer sequences.
Chemical compound, drugAmpicillinFisher ScientificBP1760
Chemical compound, drugCefotaximeFisher ScientificBP29511
Chemical compound, drugMeropenemSigma AldrichM2574
Commercial assay or kitNextSeq 500/550 High Output Kit (300 cycles)Illumina20024908
Software, algorithmDMS-FastQ-Processing scriptthis paperScript used to merge and quality filter paired end FastQ reads. Code available at https://github.com/johnchen93/DMS-FastQ-processing (Chen, 2020; copy archived at https://github.com/elifesciences-publications/DMS-FastQ-processing)

Materials

LB Broth, Miller (BP1426), ampicillin sodium salt (BP1760) and cefotaxime sodium salt (BP29511) were purchased from Fisher Scientific. Meropenem trihydrate (M2574) was purchased from Sigma-Aldrich (Millipore sigma). E. cloni 10G electrocompetent cells (60061) and E. cloni 10G chemically competent cells (60107) were purchased from Lucigen Corp. The KAPA HiFi PCR Kit (KK2102) was purchased from KAPA Biosystems Inc, the E.Z.N.A. Cycle Pure Kit was purchased from OMEGA Bio-tek Inc and the QIAprep Spin Miniprep Kit was purchased from Qiagen. The NextSeq 500/550 High Output Kit (300 cycles) (20024908) was purchased from Illumina Inc.

Generation of a VIM-2 mutagenized library with all possible single amino acid substitutions

Request a detailed protocol

The wild-type (wt) VIM-2 gene including its signal peptide sequence from Pseudomonas aeruginosa was synthesized (Bio Basic Inc) and subcloned into an in-house plasmid, pIDR2 (chloramphenicol resistance) (Supplementary file 1), under a constitutive AmpR promoter using Nco I and Xho I restriction enzymes (Fisher Scientific). The ATG codon in the Nco I site was used as the start codon. However, the cut site requires an extra G nucleotide to follow the start codon and an additional Gly (GGA codon) residue was inserted into the second position of the VIM-2 sequence; this extra Gly relative to wtVIM-2 will be labeled as G2 to distinguish it from position two in the wt sequence. The pIDR2 plasmid containing the wtVIM-2 gene will be referred to as pIDR2-wtVIM-2.

To generate all single amino acid variants, a library of codon mutants was made for each codon (267 positions) in the wtVIM-2 gene using restriction-free cloning (RF cloning) (van den Ent and Löwe, 2006Figure 1—figure supplement 2). We designed a forward primer for each codon position that contains a degenerate ‘NNN’ codon—using a MATLAB script used to design primers for the PFunkel method (Firnberg and Ostermeier, 2012)—and a single reverse primer, allowing a PCR to amplify part of the wtVIM-2 gene while incorporating the mutation. The PCR reaction to amplify part of the gene was done using a KAPA HiFi PCR Kit (Kapa Biosystems, Inc) for 30 cycles of amplification each with denaturation at 98°C for 20 s, annealing at 62°C for 15 s and extension at 72°C for 15 s; 1 ng of pIDR2-wtVIM-2 was used as template in a 20 µL reaction, with 1 µL each of the forward and reverse primer (10 µM). The first PCR products were purified using E.Z.N.A. Cycle Pure Kit (OMEGA Bio-tek, Inc). Afterwards, 10 µL of the first PCR product was then used as a primer to extend the entire plasmid, where the cycling conditions were identical to the first reaction except the extension time (90 s) and 1 ng of pIDR2-wtVIM-2 was freshly added as the template. Product from the second PCR was treated with Dpn I for one hour at 37°C to degrade the original wtVIM-2 plasmids, and then the amplified plasmids were purified and concentrated by the ethanol precipitation method. Subsequently, the purified plasmids were transformed into E. cloni 10G chemically competent cells (Lucigen Corp.) using the supplier’s recommended heat-shock transformation protocol and plated on LB-Cm (containing 25 µg/mL chloramphenicol) agar plates. We then counted the number of colony forming units (CFU) obtained after the transformation for every mutagenesis library. Using CASTER (Reetz and Carballeira, 2007) and GLUE (Firth and Patrick, 2008), we conservatively estimated that at least 700 CFU after transformation is needed to achieve 100% coverage of all 64 codon variants per position. If a transformation met the required CFU, all colonies were collected and the plasmids were purified using QIAprep Spin Miniprep Kit (QIAGEN N.V.), while those that did not were re-transformed or re-cloned until the count was met.

Antibiotic selection of the VIM-2 mutagenized library

Request a detailed protocol

Mutant libraries at individual codons were mixed into seven groups of 39 (33 for the last group) consecutive codons (see ‘Deep sequencing and quality control’). E. cloni 10G electrocompetent cells (Lucigen Corp.) were transformed with 1 ng of the plasmid DNA from each of the seven groups using the supplier’s recommended electro-transformation protocol and grown overnight in 10 mL LB-Cm shaking at 30°C. We plated 1/1000 of the transformed culture on LB-Cm agar plates to estimate total CFU after transformation. Using CASTER and GLUE, it was estimated that 20,000 CFU are needed to fully cover 2496 codon mutants (64 codons × 39 positions) and all groups transformed had at least 100,000 CFU. The transformed libraries were suspended in LB media and preserved in 1 mL aliquots at −80°C in LB with 25% final volume glycerol.

Antibiotic selection was conducted in duplicate on two separate days. For each experiment, the 1 mL glycerol stock from each group was thawed and grown in 10 mL LB-Cm shaking at 30°C for 16 hr, with optical density at 600 nm (OD600) of the cell culture reaching ~1.5. The cultures were then diluted by 1000 fold into fresh LB-Cm and grown at 37°C for 1.5 hr. After 1.5 hr of growth (OD600 of the culture is between 0.01 and 0.02), 960 µL of each culture was directly introduced to 40 µL of the antibiotics at 25 × concentration (final concentrations are 128, 16 and 2.0 µg/mL for AMP, 4.0 and 0.5 µg/mL for CTX, and 0.031 µg/mL for MEM, prepared in water) or water (no selection) into the wells of a 2.2 mL deep-well 96 well plate, and grown for 6 hr at 37°C. The cultures were also selected at the same AMP concentrations or grown without selection at 25°C. A culture of E. cloni 10G electrocompetent cells transformed with pIDR2-wtVIM-2 was also grown for 6 hr at 37°C.

After placing the cultures under selection, antibiotics and DNA from lysed cells were removed by washing the selected cultures three times using a Biomek 3000 (Beckman Coulter Inc) liquid handling robot. For each wash, the culture was centrifuged at 4000 RPM for 12 min, and the supernatant was removed using the Biomek. Subsequently, 800 µL of fresh LB was manually dispensed into the wells, the plate was sealed with plastic film, and the pellets were resuspended by vortexing. The resuspended cultures were centrifuged again for the next cycle of the wash. After the final wash, all cultures were propagated overnight shaking at 30°C in 1 mL of LB-Cm. Plasmid DNA was purified from the cultures using a QIAprep 96 Turbo Kit (QIAGEN N.V.).

Determination of half maximal effective concentration (EC50)

Request a detailed protocol

We isolated 12 variants from codon libraries at positions 55, 62, 67, 68 and 11 variants from codon libraries at positions 205, 209, 210, 211 by transforming the libraries into E. coli, plating on agar plates, and picking single colonies. The identity of each variant was obtained by Sanger sequencing. The variants are grown into glycerol stocks in a 2.2 mL 96 well deep well plate; two single colonies transformed with pIDR2-wtVIM-2 and empty vector were also placed on this plate as controls.

The variants in the plate were then placed under the same liquid culture selection procedure as the DMS selection experiments (see ‘Antibiotic selection of the VIM-2 mutagenized library’ above), up to the end of the 6 hr of selection where the cell growth (OD600) was measured. The range of selection is 1–1024 µg/mL for AMP, 0.0625–64 µg/mL for CTX and 0.002–2 µg/mL for MEM, separated in two fold increments. All variants were also grown without antibiotics as a growth control. We calculate the EC50 by fitting Equation (2) using the ‘curve_fit’ function of the ‘Scipy.optimize’ package.

(2) % growth=bottom+top-bottom1+EC50drug concentrationHill Coefficient

Initial estimates were 100% for top, 0% for bottom, −1.0 for the Hill coefficient and 1.0 for the EC50. In the case where a variant’s growth curve did not produce a successful fit based on the initial estimate, only the Hill coefficient and EC50 were adjusted until the fit was successful. Variants where the EC50 do not appear in the growth curve (stop codons, highly deleterious mutations and empty vector) could not be fitted and were excluded.

The DMS fitness scores (y-values) for each antibiotic were fitted against the EC50 (x-values) using a similar sigmoidal curve in Equation (3) with the ‘curve_fit’ function.

(3) DMS fitness score=bottom+top-bottom1+x0EC50Hill Coefficient

The initial estimates for top, bottom and Hill coefficient are 2.0,–4.0 and 1.0 respectively for all antibiotics. The initial estimate for x0, the inflection point of the curve, was 64 µg/mL for AMP, 4.0 µg/mL for CTX and 0.031 µg/mL for MEM.

The linear region of the sigmoidal curve for the DMS fitness scores was calculated using Equations (4-7), based on the final fitted values for each antibiotic (Sebaugh and McCray, 2003).

(4) Ylower=top+( bottom-top1+1/4.6805)
(5) Yupper=top+( bottom-top1+4.6805)
(6) Xlower=x0( bottom-YlowerYlower-top)1Hill Coefficient
(7) Xupper=x0( bottom-YupperYupper-top)1Hill Coefficient

Deep sequencing and quality control

Request a detailed protocol

We grouped individual codon libraries into seven groups of 39 consecutive codons (33 for the last group) so that all mutations are within a distance of 117 bp (99 bp for the last group), allowing 150 bp forward and reverse deep sequencing reads to generate full overlap of each group. PCR amplicons of each library group and wtVIM-2 were generated using primers that flank the 117 bp region of each group, where primers have a Nextera transposase adapter sequence (Illumina, Inc) in the 5’ overhangs. We used KAPA HiFi PCR Kit (Kapa Biosystems, Inc) for 15 cycles of amplification each with denaturation at 98°C for 20 s, annealing at 65°C for 5 s and no extension time. The amplicons were extended by PCR again to include the sample indices (i7 and i5) and flow cell binding sequence, then sequenced using a NextSeq 550 sequencing system (Illumina, Inc); all samples were sequenced in the same NextSeq run. The raw sequencing data can be found on the NCBI Sequencing Read Archive (SRA) (BioProject accession: PRJNA606894). Each group under each condition received between 400,000 to 1,000,000 reads, which is at least 160 reads per codon variant on average. There were five samples that received as low as 100,000 reads in one of the two replicates, but the correlation of fitness scores between both replicates still showed an R2 of at least 0.72 and the scores were retained.

To process the deep sequencing data, including merging paired-end reads, quality filtering, variant identification and fitness score calculation, we use a set of in-house Python scripts (https://github.com/johnchen93/DMS-FastQ-processing; Chen, 2020; copy archived at https://github.com/elifesciences-publications/DMS-FastQ-processing). Paired-end reads in FastQ format were first merged, where quality (Q) scores of matching read positions were combined using a posterior probability calculation to obtain posterior quality (Q) scores, measured on the Phred scale for sequencing quality (Edgar and Flyvbjerg, 2015). In the case of a base mismatch between the forward and reverse reads, the base was taken from the read with the higher Q score at the position.

Reads that had more than 20 base mismatches between the forward and reverse reads, or that had any bases with a posterior Q score less than 10 were discarded. It was found that above a posterior Q score cut-off of 10 or more, the average proportion of sequencing errors per position stabilized and no sizeable reduction of sequencing errors can be obtained by posterior Q score cut-offs (Figure 1—figure supplement 3A). Usually, 75–85% of all reads passed these filters. Additionally, the expected number of errors per read was calculated from adding the error rates calculated from the posterior Q Scores of every position in the entire read (Edgar and Flyvbjerg, 2015). Reads that had an expected number of error greater than one would also be discarded, however no reads exceeded this limit after the previous filters.

Variant identification and noise filtering

Request a detailed protocol

Once forward and reverse reads were merged and filtered by read quality, codon mutants were identified and counted, then aggregated into amino acid (or stop codon) variants. Codon mutations were identified by comparing to the wtVIM-2 DNA sequence as a reference. Since we only intended for single codon mutants in the library, any sequence with mutations in more than one codon was discarded, leading to retention of 80–90% of the filtered reads.

To exclude variants that may be due to sequencing errors alone, we estimated the expected frequency of each variant generated by sequencing errors and excluded variants that have less than 2 × the expected frequency in the non-selected library. Using sequencing data from wtVIM-2, we calculated the error rates that originates from culture growth, sample preparation (PCRs) and sequencing. The error rates at each position was calculated by dividing the errors observed by the total number of reads at that position (Figure 1—figure supplement 4D). The mean of the distribution of the positional errors was used as the estimate for error rates (0.072%) in all positions across the VIM-2 gene. The proportion of each type of nucleotide error (A > T, A > C, A > G, etc.) was calculated to estimate the likelihood of each type of nucleotide error given a starting nucleotide (Figure 1—figure supplement 3B).

We made the observations that 1) sequencing errors in wt sequences will generate single codon mutants, but errors in single codon variants are most likely to be turned into double codon mutants (Figure 1—figure supplement 4A) 2)~5–10% of reads in each library group were occupied by wt sequences while other variants are rarely higher than 0.5% (Figure 1—figure supplement 4B) and 3) single nucleotide sequencing errors are the most abundant type of errors affecting up to 10% of all reads, while higher numbers of sequencing errors are nearly negligible (Figure 1—figure supplement 4C). In summary, single nucleotide errors on wt sequences are the main source of single codon variants arising from sequencing errors. Thus, we first calculated the chance of each of the 64 codons to mutate into the nine adjacent codons by single nucleotide sequencing errors using Equation (8) (Figure 1—figure supplement 4E).

(8) chance of codon with substitution XY =error rate per position × frequency of substitution XY

For example, to gauge how often AAA gets mutated to GAA by chance, we multiplied the per position error rate by the proportion of G mutations when starting from an A, leading to 0.0719% × 70.7% = 0.000719×0.707 = 0.00051. This means for each 100,000 wt reads that has an AAA codon at a given position, we expect 51 GAA mutants on average that arise by chance at the same position. We calculated expected codon error frequencies from every codon, then summed the expected error frequencies of the codons mutant for each amino acid variant (Figure 1—figure supplement 4F). The error frequency is multiplied by the count of the wt reads in each non-selected library group to arrive at an expected error count for that group. Subsequently, we compared the observed count of amino acid (or codon) variants in the non-selected library to the expected count from errors alone and we accepted a variant as truly existing if the observed count is at least twice the expected count from errors. In addition, because our filtering method only accounts for the nine codons with a single mutation relative to the wt codon, we also applied a count cut-off of 5 for all variants to reduce noise by excluding very low count data.

Fitness score calculations

Request a detailed protocol

The fitness score of each variant was calculated according to Equation (1) (see main text). To calculate the fitness score of a given amino acid (or codon) variant, the read count of the variant was first normalized to frequencies within the non-selected or selected library group. Variants that exist in the non-selected library but disappear in the selected library were interpreted to have been removed by antibiotic selection, and were given a dummy count of 1 to emulate the minimum frequency observable for that variant. The frequency of the variant after selection was divided by the frequency of the same variant in the library grown without selection for an enrichment ratio; synonymous codon variants were also considered as variants rather than wt during scoring. The variant enrichment ratio was then normalized to the enrichment ratio of the wt. The final score was expressed in Log2 units, and scores were calculated separately across the seven groups and separately for each replicate.

When combining all data across the seven groups, we subtracted the mean fitness scores of all synonymous variants in each group from all variants of that group to center the mean fitness of synonymous variants at a fitness score of 0. To combine scores from replicates, we simply averaged the fitness scores across the two replicates, and take the single score if only one replicate contained the variant above noise in the non-selected library.

Fitness effect classification

Request a detailed protocol

To classify each amino acid variant as positive, neutral or negative for each of the three selection antibiotics, we use the score of each variant in a two tailed z-test on a normal distribution (null-model) with the same mean and standard deviation as our synonymous distribution (244 synonymous variants total). The P-values were then FDR corrected to an α of 0.05 using the Benjamini-Hochberg procedure; only nonsynonymous variants were tested and the total number of tests was 5291. The variants with scores that are significantly different from the synonymous distribution after FDR correction are then classified as ‘positive’ if their score is greater than the synonymous mean and ‘negative’ if the score is less, while the remaining variants are classified as ‘neutral’.

Linear model of DMS scores with various predictors

Request a detailed protocol

We generated a linear model in R using a combination of terms to try and find properties that best explain the behavior we see in the DMS fitness scores. Using the fitness score as a response, we tried using 1) wild-type (wt) amino acid 2) variant (var) amino acid 3) accessible surface area (ASA) of the residue calculated from the crystal structure of wt VIM-2 (PDB: 4bz3) using ASA view (Ahmad et al., 2004) 4) change in amino acid volume (Perkins, 1986) (Δvolume = volumevar volumewt) 5) change in amino acid polarity (hydrophathy index Kyte and Doolittle, 1982) (Δpolarity = Δpolarityvar – Δpolaritywt) 6) distance of the alpha carbon of each residue in the crystal structure to the active site water held between the Zn ions 7) Rosetta predicted stability change between the variant and the wt (ΔΔG = ΔGvar-ΔGwt) (also see ‘Rosetta ΔΔG Calculation’ below) and 8) BLOSUM62 score for the substitution from wt to variant. Only variants from positions observable in the crystal structure were modelled (positions 32 to 262), and synonymous variants were excluded.

All parameters were first modelled individually as predictors with DMS fitness score as the response, and the predictors with R2 higher than 0.10 are then modelled in combinations of two or more until the combination with the least predictors and the highest adjusted R2 was found. Predictors with R2 less than 0.10 are also retried in combination with the best predictors when optimizing for adjusted R2. Interaction between predictors were tested, but they did not improve adjusted R2 and were excluded for the sake of simplicity. The relative contribution of each term to the overall adjusted R2 were calculated using the R package ‘relaimpo’, using the ‘lmg’ method (Groemping, 2006).

The final equation of the linear model is shown in Equation (9) where the fitness score of a given variant is the additive combination of the model intercept β0 and the various properties and the coefficients of the properties (e.g. βASA and ASA) plus a random error term ε.

(9) FitnessScore=β0+βASA×ASA+βΔΔG×ΔΔG+βwt×wt+βvar×var+ϵ

The categorical predictors wt and var are simplified in the equation and each is actually a collection of terms in the model, where every amino acid is a single binary term represented by 0 or 1 such that 1 indicates the presence of the amino acid. For example, the variant Q60V has Q = 1 for wt and V = 1 for var, while all other wt and var amino acids are set to 0. Ala is not present as one of the estimates in either wt or var, because they are used in calculating the intercept; this is the mean fitness of all data points where either wt = 1 or var = 1 for alanine.

Rosetta δδg calculation

Request a detailed protocol

To estimate the effects of each VIM-2 variant on the stability of the protein, we used the Rosetta ‘ddg_monomer’ application to calculate the folding energy of a monomeric protein crystal structure. Rosetta was run on the Compute Canada server Cedar using a Rosetta 3.8 installation. Following the ‘ddg_monomer’ documentation, the VIM-2 structure (PDB: 4bz3) was first processed using ‘preminimize’ to pre-optimize the packing of the crystal structure and generate a constraints file. Then, all single amino acid variant structures and the wt structure at each position were simulated 50 times each using ‘ddg_monomer’, configured to protocol 16 as specified in Kellogg et al., 2011 while using Talaris 2014 as the scoring function. We store the simulated structures (variant and wt) as PDB files and scored them using the Rosetta ‘score’ function with Talaris 2014 weights to obtain the predicted ΔG in Rosetta Energy Units (REU). We average the predicted ΔG of all 50 replicates of each variant or wt. The ΔΔG is calculated using Equation (10) as the difference in average ΔG between variant and wt at the same position.

(10) ΔΔG=ΔGvarΔGwt

RNA folding energy calculation for single codon mutants

Request a detailed protocol

The RNA folding energy contribution of the 5’ UTR and signal peptide region is calculated according to a previously described method (Bhattacharyya et al., 2018). The ΔG of folding of the 5’ UTR and signal peptide is calculated using equation (11).

(11) ΔG1,118=ΔG1,841ΔG119,841

Each ΔG term is calculated using the NUPACK software package (Zadeh et al., 2011), using the ‘pfunc’ program which calculates the ΔG of all RNA secondary structures from the partition function. All folding energies were calculated using default conditions, with [Na+]=1 M and T = 37°C, and the results are in units of kcal mol−1. The subscripts for each ΔG term indicates the first and last nucleotide position in the transcript that is used for calculating the ΔG of folding, respectively. Thus, ΔG1,841 represents the calculated folding energy of the full transcript (including the 5’ UTR and coding region, but excluding the 3’ UTR), while ΔG119,841 is the folding energy of the transcript after the signal peptide. The interpretation of ΔG1,118 is that it is the folding energy contribution of the RNA transcript up to the end of the signal peptide, including energy from non-local interactions with downstream parts of the transcript but excluding folding energy of interactions exclusively within positions downstream of the signal peptide. The DNA sequence of the wtVIM-2 transcript used to calculate the folding energies is shown below, with positions 1–118 italicized and the translated signal peptide region underlined for clarity; the sequence is converted to RNA for calculation.

5’-CTGATAAATGCTTCAATAATATTGAAAAAGGAAGCCCATGGGATTCAAACTTTTGAGTAAGTTATTGGTCTATTTGACCGCGTCTATCATGGCTATTGCGAGCCCGCTCGCTTTTTCCGTAGATTCTAGCGGAGAATATCCGACAGTCAGCGAAATTCCGGTCGGGGAGGTCCGGCTTTACCAGATTGCCGATGGTGTTTGGTCGCATATCGCAACGCAGTCGTTTGATGGCGCAGTCTACCCGTCCAATGGTCTCATTGTCCGTGATGGTGATGAGTTGCTTTTGATTGATACAGCGTGGGGTGCGAAAAACACAGCGGCACTTCTCGCGGAGATTGAGAAGCAAATTGGACTTCCTGTAACGCGTGCAGTCTCCACGCACTTTCATGACGACCGCGTCGGCGGCGTTGATGTCCTTCGGGCGGCTGGGGTGGCAACGTACGCATCACCGTCGACACGCCGGCTAGCCGAGGTAGAGGGGAACGAGATTCCCACGCACTCTCTTGAAGGACTTTCATCGAGCGGGGACGCAGTGCGCTTCGGTCCAGTAGAACTCTTCTATCCTGGTGCTGCGCATTCGACCGACAACTTAATTGTGTACGTCCCGTCTGCGAGTGTGCTCTATGGTGGTTGTGCGATTTATGAGTTGTCACGCACGTCTGCGGGGAACGTGGCCGATGCCGATCTGGCTGAATGGCCCACCTCCATTGAGCGGATTCAACAACACTACCCGGAAGCACAGTTCGTCATTCCGGGGCACGGCCTGCCGGGCGGTCTTGACTTGCTCAAGCACACAACGAATGTTGTAAAAGCGCACACAAATCGCTCAGTCGTTGAGTAA-3’.

Equation (12) is used to calculate the ΔΔG of folding upon codon mutations in the signal peptide.

(12) ΔΔGsigpepRNA=(ΔG1,118varΔG1,118wt)/(kT)

ΔG1,118wt is the folding energy of the wtVIM-2 signal peptide sequence shown above, while ΔG1,118var is the folding energy of the signal peptide sequence with a single codon mutation. The difference in folding energy is normalized to the thermal energy factor kT, where k = 0.0019872041 kcal mol−1 K−1 and T = 310.15 K (37°C).

Identification of critical residues and temperature dependence

Request a detailed protocol

We classify the role of residues in wtVIM-2 by examining the DMS fitness scores of selection conducted at 128 µg/mL and 16 µg/mL AMP at 37°C and 25°C (four data sets), excluding signal peptide residues 1–26. Residues are classified as ‘essential’ when > 75% of variants are below a fitness score of −2.0 (halfway between zero and the lower score limit of −4) when selected at the least stringent condition of 16 µg/mL and 25°C. Residues are classified as ‘tolerant’ when > 75% of variants are above a fitness score of −1.0 (capturing the lower end of the peak centered at neutral fitness) when selected at the most stringent condition of 128 µg/mL and 37°C. Residues are classified as ‘temperature dependent’ when the fitness score of 25°C selection is higher than 37°C selection of the same AMP concentration by at least 2.0, for two or more variants at either 128 µg/mL or 16 µg/mL AMP. The ‘temperature dependent’ classifications overwrite ‘essential’ or ‘tolerant’ classifications (four occurrences total). Residues that do not fall into the three other classifications are defined as ‘residue dependent’.

Detecting hydrogen bonds in the wtVIM-2 crystal structure

Request a detailed protocol

Potential hydrogen bonding pairs in the wtVIM-2 crystal structure (PDB: 5yd7, chain A only) were extracted using the ‘Polarpairs’ script in PyMol (https://pymolwiki.org/index.php/Polarpairs). The script filters for pairs of h-bond donor and acceptor atoms within a defined distance and h-bond angle; we set the distance limit to be within 3.6 Å and the h-bond angle to be greater than 63°. The script returns atom indices, which were converted to PBD atom IDs using pymol’s built in ‘id_atom’ function. The atoms were then extracted from the 5yd7 pdb file using the atom IDs, and the atom’s name was used to determine if the h-bond was formed between backbone atoms only (‘N’ and ‘O’ atoms indicate backbone amides and carbonyls, respectively), between sidechain atoms only or between backbone and sidechain atoms. All extracted h-bonds can be found in Supplementary file 2G.

Analysis of specificity variants

Request a detailed protocol

We define variants with altered specificity by filtering for variants with a change in fitness effect classifications (positive, neutral and negative, defined in ‘Fitness effect classification’) as well as a fitness score difference of 2.0 between at least 2 of the three antibiotics being compared.

Specificity positions are visualized on the wtVIM-2 structure (PDB: 5yd7) using PyMol, with substrates overlaid from aligned MBL homolog structures (AMP - NDM-1(PDB:4hl2), cefuroxime – NDM-1(PDB:4rl0), MEM – VIM-1 (PDB:5n5i)). To overlay the substrates, the six metal binding residues (structure positions 114, 116, 118, 179, 198, 240 for VIM-1/VIM-2 and 120, 122, 124, 189, 208, 250 for NDM-1) and the two active-site Zn ions were selected from each structure, and the PyMol ‘align’ function was used with the VIM-2 active-site as the target object and the other structure’s active-site as the mobile object. When structures have more than one chain (PDB: 5yd7, 4hl2 and 4rl0), only chain A was used in the alignment. The protein portions of the homolog structures are hidden after alignment, to visualize just the substrates with the wtVIM-2 structure.

Collection of naturally occurring VIM variants

Request a detailed protocol

Amino acid sequences of naturally observed VIM variants were extracted by performing a BLASTP search of the NCBI non-redundant protein database (NCBI Resource Coordinators, 2018), using the protein sequence of our in-house VIM-2 with Gly removed from the 2nd position, identical to the VIM-2 discovered in Pseudomonas aeruginosa isolates (UniProt accession: A4GRB6). We retained all BLASTP results with at least 70% identity and >90% query coverage. We also retrieved protein sequences from the Comprehensive Antibiotic Resistance Database (CARD) (Jia et al., 2017). All sequences from both sources were merged and sequences that are exactly identical in length and sequence were combined, while sequences that are less than 250 or greater than 290 residues were excluded. Recombinant variants VIM-12 and VIM-25 were excluded from this analysis, as well as VIM-14 (UniProt accession: Q6GUL7) which is a member of the VIM-1 clade despite being labeled as both VIM-11 and VIM-14 (UniProt accession: A0SWU7) (both are in the VIM-2 clade).

To identify all mutations different between wtVIM-2 and the 55 other variants, a multiple sequence alignment (MSA) was constructed using the MUSCLE method in MEGA 7 (version 7.0.26) (Kumar et al., 2016). Equivalent positions bearing a different amino acid from VIM-2 were identified as mutations, while deletions are ignored (one in VIM-1, one in VIM-7, four in VIM-18). The MSA was also used to generate a maximum likelihood phylogenetic tree using MEGA 7 (default settings). The tree was used to identify separate VIM clades which were labeled using the VIM variant with the lowest number in the clade (VIM-1, VIM-2, VIM-7, VIM-13).

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
    Role of amino-terminal positive charge on signal peptide in staphylokinase export across the cytoplasmic membrane of Escherichia coli
    1. T Iino
    2. M Takahashi
    3. T Sako
    (1987)
    The Journal of Biological Chemistry 262:7412–7417.
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99
  100. 100

Decision letter

  1. Sarel Jacob Fleishman
    Reviewing Editor; Weizmann Institute of Science, Israel
  2. Michael A Marletta
    Senior Editor; University of California, Berkeley, United States
  3. Marc Ostermeier
    Reviewer; Johns Hopkins University, United States
  4. Timothy A Whitehead
    Reviewer; Michigan State University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This comprehensive study of mutational effects on enzyme specific activity and their relationship to temperature sheds light on whether deleterious mutations exert their effect due to changes in catalytic activity or protein folding and stability. The work also addresses questions related to the impact of the signal peptide sequence on fitness and demonstrates a correlation between computed stability scores and experimental fitness values of mutations in the protein core. This very broad study addresses important questions on the relationship between sequence, structure and function in a rigorous way.

Decision letter after peer review:

Thank you for submitting your article "Comprehensive exploration of the translocation, stability and substrate recognition requirements in VIM-2 lactamase" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Marletta as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Marc Ostermeier (Reviewer #1); Timothy A Whitehead (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address statistical analysis, clarity and presentation.

Summary:

Chen et al. sought to understand the role of each residue in the VIM-2 lactamase with respect to protein stability, activity and substrate selectivity. They generated a wealth of data by analyzing VIM-2 mutants under the selective pressures of three antibiotics across a range of concentrations. Of particular interest are the analysis of residues important for enzyme specific activity and how temperature impacts fitness. The temperature dependence analysis provides insight on whether deleterious mutations exert their effect due to changes in catalytic activity or protein folding/stability. The work also addresses questions related to the impact of the signal peptide sequence on fitness and demonstrates a correlation between computed stability scores and experimental fitness values of mutations in the protein core. The study is therefore very broad and addresses several important questions on the relationship between sequence, structure and function in a rigorous way.

The reviewers noted several areas where the statistical analysis and the presentation could be improved, as detailed below.

Essential revisions:

1) The finding that the signal peptide "is less than optimal" can arise from several reasons: the 5' UTR on the plasmid may be substantially different than that observed in clinical isolates (see recent work from Shakhnovich on how 5'UTR interactions impact fitness of coding sequences and from Ostermeier, Mehlhoff et al., 2020 on the fitness effects of TEM-1 BLA signal peptide in the absence of antibiotic selection). Hence, the authors may want to tone down the strength of their claim while suggesting alternatives or substantiate their claim through additional experiments or existing data.

2) Statistical analysis: In several points through the paper, the authors make claims on the strength of selection regarding specific mutations. These claims should be substantiated by a statistical measure because they may result from low sequence coverage in the unselected populations as is often the case in large-scale mutational screens such as this. In addition, the lack of uncertainty values for each fitness score limits the usefulness of this data set for other researchers, should they want to use values from individual mutations instead of looking at collective or global effects of mutations. Examples:

a) Subsection “Codon and amino acid optimization in the signal peptide”, second paragraph: The authors note that only mutations to Cys or Trp are deleterious at Leu23. Are the fitness values of Leu23Cys and Leu23Trp statistically different than neutral?

b) Subsection “Codon and amino acid optimization in the signal peptide”, last paragraph: The first sentence talks about Figure 5C and how the WT residues or conserved residues do not have the highest fitness. But the authors provide no statistical test to say whether these fitness scores are different than each other, and many of them fall within the range of -0.7 to 0.7 that are considered neutral. This same comment applies throughout this paragraph. Are differences in values that the authors cite supported by statistical tests? Such statistical analysis is necessary for the claims made. This is particularly true when comparing synonymous mutations because the sequencing counts are lower than when counts of all synonymous mutations are grouped.

c) Subsection “Distinct recognition for different classes of β-lactam substrates”, first paragraph: Statements on purported adaptive mutants at positions 47, etc. should be substantiated by a statistical test using the uncertainty in those fitness values or by performing EC50 assays, as was done to substantiate claims of changes in substrate specificity.

d) Supplementary Data: The authors provide the fitness scores but no measure of the uncertainty in those scores. Such an analysis of the uncertainty is necessary for several claims made throughout the paper.

e) The comparison to EC50 between deep sequencing results and individual clones should be supported by some statistical measure, possibly a nonparametric rank-order correlation. Additionally, their "EC50" should more likely be classified as an IC50, and the half-inhibitory concentration they are measuring is probably not very accurate.

3) Given that there is now a better understanding for allowed mutations within Vim2 and a fitness description, would the authors be able to suggest measures about treatment options? For instance co-administration or alteration of different antibiotics?

https://doi.org/10.7554/eLife.56707.sa1

Author response

Essential revisions:

1) The finding that the signal peptide "is less than optimal" can arise from several reasons: the 5' UTR on the plasmid may be substantially different than that observed in clinical isolates (see recent work from Shakhnovich on how 5'UTR interactions impact fitness of coding sequences and from Ostermeier, Mehlhoff et al., 2020 on the fitness effects of TEM-1 BLA signal peptide in the absence of antibiotic selection). Hence, the authors may want to tone down the strength of their claim while suggesting alternatives or substantiate their claim through additional experiments or existing data.

Indeed, we expect the 5’UTR to be a factor in the codon level fitness and we believe we have cited the work from Shakhnovich (PMID: 29883608); the RNA folding energy calculation in our study follows their methodology. We have softened the language in our claim and explicitly made mention to this caveat in the concluding paragraph (subsection “Codon and amino acid optimization in the signal peptide”) of this section to deliver a more accurate message, in which we now discuss the sensitivity of the signal peptide to change, instead of declaring an absolute lack of optimization.

The work from the Ostermeier group has also been integrated into the subsection “Global view of VIM-2 enzyme characteristics” (deleterious effects of Cys mutations) and in the second and last paragraphs of the subsection “Codon and amino acid optimization in the signal peptide” (disruption of the hydrophobic region by charged residues) and (effects on translocation).

2) Statistical analysis: In several points through the paper, the authors make claims on the strength of selection regarding specific mutations. These claims should be substantiated by a statistical measure because they may result from low sequence coverage in the unselected populations as is often the case in large-scale mutational screens such as this. In addition, the lack of uncertainty values for each fitness score limits the usefulness of this data set for other researchers, should they want to use values from individual mutations instead of looking at collective or global effects of mutations. Examples:

Comments 2a-c address a common theme in the lack of specific support for differences in fitness scores. In response, we have made more explicit reference to the classification of positive/neutral/negative fitness effects relative to wtVIM-2, which is supported by a z-test against the synonymous variants at 5% FDR correction using the Benjamini-Hochberg procedure. This classification scheme will be used whenever we wish to evaluate the effect of variants relative to wtVIM-2.

In addition, we have included extra statistical support for claims based on differences between distributions using two-tailed Mann-Whitney U tests:

i) The general inference of Cys and Pro variants displaying more deleterious effects than mutations to other amino acids, in the first paragraph of the subsection “Global view of VIM-2 enzyme characteristics”, is now supported by Mann-Whitney U tests between all unique pairs of missense variant distributions grouped by mutated amino acid (Figure 3—figure supplement 3).

ii) The claim that buried positions are more sensitive to mutation, in the aforementioned paragraph, has been supported by a test between the DFE of variants with ASA <0.3 and the DFE of variants with ASA ≥ 0.3. The distributions and test results have also been added to Figure 4A.

iii) The claim that the signal peptide is tolerant to mutations, in the second paragraph of the subsection “Codon and amino acid optimization in the signal peptide”, is now supported by a test between the DFEs of the signal peptide and the body of VIM-2. The discussion of tolerated signal peptide mutations has also been limited to missense variants to exclude stop codon variants and the related information has been updated in Figure 5B and in the aforementioned paragraph.

a) Subsection “Codon and amino acid optimization in the signal peptide”, second paragraph: The authors note that only mutations to Cys or Trp are deleterious at Leu23. Are the fitness values of Leu23Cys and Leu23Trp statistically different than neutral?

The mentioned mutations are the strongest of 5 negative variants out of 95 neutral or positive variants. We have noted the above in the second paragraph of the subsection “Codon and amino acid optimization in the signal peptide” and have provided the fitness scores of the L23C and L23W variants as support.

b) Subsection “Codon and amino acid optimization in the signal peptide”, last paragraph: The first sentence talks about Figure 5C and how the WT residues or conserved residues do not have the highest fitness. But the authors provide no statistical test to say whether these fitness scores are different than each other, and many of them fall within the range of -0.7 to 0.7 that are considered neutral. This same comment applies throughout this paragraph. Are differences in values that the authors cite supported by statistical tests? Such statistical analysis is necessary for the claims made. This is particularly true when comparing synonymous mutations because the sequencing counts are lower than when counts of all synonymous mutations are grouped.

The exact proportion of variants in the signal peptide with a positive effect relative to wtVIM-2 (10%) and the range of scores observed for these variants (0.7-1.3) have been noted in the third paragraph of the subsection “Codon and amino acid optimization in the signal peptide”.

We have removed the claim for increased and decreased fitness in individual codon variants, as we do not have enough power to conduct a statistical test using just two replicates.

We agree that the sentence describing the ‘codon dependent variants’ could be subject to improvement by a statistical test between synonymous codon mutants rather than a score difference cut-off. However, we only aim to note a trend where the majority of codon variation lies in the signal peptide and this trend would most likely remain similar even with a more stringent statistical test. We have weakened our statement to match the lack of statistical testing.

c) Subsection “Distinct recognition for different classes of β-lactam substrates”, first paragraph: Statements on purported adaptive mutants at positions 47, etc. should be substantiated by a statistical test using the uncertainty in those fitness values or by performing EC50 assays, as was done to substantiate claims of changes in substrate specificity.

As described above, variants with positive effects were supported by a statistical test against the synonymous distribution as variants with fitness scores > 0.7. In this claim, we took a more conservative cut-off of fitness score > 1 in all antibiotics, which was chosen to be above the upper fitness score range of the peak centered at neutral fitness in the DFE (Figure 2, center column); this rationale has been clarified in the first paragraph of the subsection “Distinct recognition for different classes of β-lactam substrates”. Furthermore, the strong correlation between fitness and EC50 of 39-45 variants in Figure 2 (right column) provides confidence that variants with high fitness scores also have increased resistance.

With regards to the term ‘adaptive’, we have further clarified that these mutants are those with positive fitness effects in the aforementioned paragraph. The term ‘adaptive’ has been modified to ‘positive’ to reflect the observed behavior in our dataset and avoid suggestions of evolutionary adaptation, which we agree would require further testing.

d) Supplementary Data: The authors provide the fitness scores but no measure of the uncertainty in those scores. Such an analysis of the uncertainty is necessary for several claims made throughout the paper.

As suggested, we included the sample standard deviation for both amino acid and codon variant fitness scores in Supplementary file 2A and D). As noted above, since most of our analysis involve relative differences to wtVIM-2, we’ve made an effort to adhere to the statistical test for positive (fitness >0.7), neutral (fitness from -0.7 to 0.7) or negative (fitness <-0.7) fitness effects relative to wtVIM-2 in those cases. For case where the classifications do not apply, such as the comparison of mutations at the codon level, we have made other adjustments in response to those comments.

e) The comparison to EC50 between deep sequencing results and individual clones should be supported by some statistical measure, possibly a nonparametric rank-order correlation. Additionally, their "EC50" should more likely be classified as an IC50, and the half-inhibitory concentration they are measuring is probably not very accurate.

As suggested, the comparisons between EC50 and fitness have been supported using a Spearman rank-order correlation. To accommodate a direct relation of rank between fitness and EC50 for each variant, Figure 2 has also been modified – where the correlation plots previously compared fitness to all uniquely measured EC50 values, fitness is now correlated to the average EC50 of each unique amino acid variant. This leads to some minor changes in the optimized sigmoidal fit parameters, which have been updated in the last paragraph of the subsection “Deep mutational scanning of VIM-2 metallo-β-lactamase”.

We prefer the use of EC50 over IC50 due to the bactericidal nature of the β-lactam antibiotics. Even though the mechanism of the β-lactam is inhibition of Penicillin-binding proteins (PBP), the response we are measuring to infer resistance is not an inhibition of growth but rather death due to PBP inhibition. As such, we believe the more general term of EC50 is more appropriate.

3) Given that there is now a better understanding for allowed mutations within Vim2 and a fitness description, would the authors be able to suggest measures about treatment options? For instance co-administration or alteration of different antibiotics?

We have extended the discussion in the last paragraph of the subsection “Conclusion” to include a suggestion for further investigation of combined usage of different classes of antibiotics. Ultimately, the situation involving multi-drug resistance is complex and MBLs such as VIM-2 are only a part of the entire problem. Hence, we also note various caveats and other solutions that need to be considered.

https://doi.org/10.7554/eLife.56707.sa2

Article and author information

Author details

  1. John Z Chen

    Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2628-5820
  2. Douglas M Fowler

    1. Department of Genome Sciences, University of Washington, Seattle, United States
    2. Department of Bioengineering, University of Washington, Seattle, United States
    Contribution
    Software, Validation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7614-1713
  3. Nobuhiko Tokuriki

    Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
    Contribution
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing - review and editing
    For correspondence
    tokuriki@msl.ubc.ca
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8235-1829

Funding

Canadian Institutes of Health Research (FDN-148437)

  • Nobuhiko Tokuriki

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank the members of the Tokuriki lab for comments on the manuscript. Canadian Institute of Health Research (CIHR) Foundation Grant to NT. NT is a Michael Smith Foundation of Health Research (MSFHR) career investigator.

Senior Editor

  1. Michael A Marletta, University of California, Berkeley, United States

Reviewing Editor

  1. Sarel Jacob Fleishman, Weizmann Institute of Science, Israel

Reviewers

  1. Marc Ostermeier, Johns Hopkins University, United States
  2. Timothy A Whitehead, Michigan State University, United States

Publication history

  1. Received: March 6, 2020
  2. Accepted: June 6, 2020
  3. Accepted Manuscript published: June 8, 2020 (version 1)
  4. Version of Record published: June 22, 2020 (version 2)

Copyright

© 2020, Chen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 210
    Page views
  • 34
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Biochemistry and Chemical Biology
    2. Plant Biology
    Pengxiang Fan et al.
    Research Article
    1. Biochemistry and Chemical Biology
    2. Cell Biology
    Senem Aykul et al.
    Research Article Updated