1. Biochemistry and Chemical Biology
  2. Structural Biology and Molecular Biophysics
Download icon

Uncovering the basis of protein-protein interaction specificity with a combinatorially complete library

  1. Thuy-Lan V Lite
  2. Robert A Grant
  3. Isabel Nocedal
  4. Megan L Littlehale
  5. Monica S Guo
  6. Michael T Laub  Is a corresponding author
  1. Department of Biology Massachusetts Institute of Technology, United States
  2. Howard Hughes Medical Institute Massachusetts Institute of Technology, United States
Research Article
  • Cited 0
  • Views 1,195
  • Annotations
Cite this article as: eLife 2020;9:e60924 doi: 10.7554/eLife.60924

Abstract

Protein-protein interaction specificity is often encoded at the primary sequence level. However, the contributions of individual residues to specificity are usually poorly understood and often obscured by mutational robustness, sequence degeneracy, and epistasis. Using bacterial toxin-antitoxin systems as a model, we screened a combinatorially complete library of antitoxin variants at three key positions against two toxins. This library enabled us to measure the effect of individual substitutions on specificity in hundreds of genetic backgrounds. These distributions allow inferences about the general nature of interface residues in promoting specificity. We find that positive and negative contributions to specificity are neither inherently coupled nor mutually exclusive. Further, a wild-type antitoxin appears optimized for specificity as no substitutions improve discrimination between cognate and non-cognate partners. By comparing crystal structures of paralogous complexes, we provide a rationale for our observations. Collectively, this work provides a generalizable approach to understanding the logic of molecular recognition.

Introduction

Protein-protein interactions underlie most cellular processes and are the basis of established and emerging therapies such as monoclonal antibodies (Weiner et al., 2010), chimeric antigen receptor T cells (Brentjens et al., 2013), and stapled peptide drugs (Walensky and Bird, 2014). To prevent non-specific and potentially detrimental interactions, proteins must discriminate between cognate and non-cognate interaction partners. This is often achieved via molecular recognition or non-covalent interactions between protein surfaces. However, the structural space of interfaces is degenerate, with nearly 90% of native interfaces having a close structural neighbor with a highly similar backbone geometry (Gao and Skolnick, 2010). This problem is particularly acute for cells that encode paralogous protein families, whose members can share significant structural and sequence similarity. Prior work has shown that the specificity of many paralogous protein families is determined by a subset of residues that typically map to the interface formed by a given protein and its binding partner(s) (Aakre et al., 2015; Ovchinnikov et al., 2014; Skerker et al., 2008).

Even when the specificity residues governing a given type of protein-protein interaction have been identified, it remains unclear how individual residues contribute to enforcing specificity. Collectively, specificity-determining residues must achieve two goals: (i) promote or stabilize the desired, cognate interaction and (ii) inhibit or destabilize all unwanted, non-cognate interactions, which we call positive and negative specificity elements, respectively (Schreiber and Keating, 2011). If a residue functions strictly as a positive element, substitutions at this site will destabilize the cognate interaction, but not affect non-cognate interactions (Figure 1A). Conversely, if a residue is strictly a negative element, substitutions will not attenuate the cognate interaction, but could produce cross-talk with non-cognate partners. For most protein complexes, it is not clear if individual residues serve primarily as positive or negative elements, or both (Figure 1A–B). Additionally, it is not clear if specificity residues are optimized for either role, or whether trade-offs arise that prevent optimization. Understanding how individual residues contribute to specificity is important for rationally engineering protein interaction specificity (Chen and Keating, 2012) and for understanding how paralogous protein complexes evolve (McClune and Laub, 2020).

Figure 1 with 1 supplement see all
Specificity-determining residues that dictate toxin-antitoxin interactions.

(A) Schematic representing how antitoxin mutations in specificity-determining residues might affect binding to the cognate toxin (blue) or the non-cognate toxin (gold). A wider gap in binding preference reflects greater specificity. The effects of mutating a specificity residue that serves a positive specificity role (a), a negative specificity role (b), or both roles (c) are shown. Mutations that represent possible trade-offs between binding the cognate toxin and discriminating against the non-cognate toxin are also shown (d–e). (B) Models for how positive and negative specificity determinants are distributed across the interface. Model 1: positive and negative design are accomplished by distinct interface residues. Model 2: individual residues can serve both roles. (C) Sequence alignment of three ParD-ParE systems from Mesorhizobium opportunistum. Coevolving residues are highlighted in purple (ParD) or orange (ParE). (D) Phylogenetic tree inferred from protein sequences of 15 highly conserved genes. Distribution of ParDE2 systems and ParDE3 systems are indicated in gold and blue, respectively. (E) Toxicity-rescue assay for wild-type ParD2, wild-type ParD3, or the ParD3 variant indicated, each co-expressed with either wild-type ParE2 or ParE3 toxin. The ParD3 variants harbor subsets, as noted, of the mutations D61I, K64L, and E80K. (F) Residues in ParD and ParE that strongly coevolve (probability score >0.95) with lines connecting covarying pairs. Residues are numbered according to their position in the alignments in panel (C). Positions selected for the ParD3 saturation mutagenesis library are indicated in red.

Dissecting the roles of interface residues has typically involved analyzing limited numbers of substitutions. A common strategy is to replace interface residues with alanines (Ashkenazi et al., 1990; Clackson and Wells, 1995; Kristensen et al., 1997; Zhang and Palzkill, 2004) or the corresponding residues from a homologous protein (Brasch et al., 2018; Bridgham et al., 2006; Cosmanescu et al., 2018; Sergeeva et al., 2020; Stiffler et al., 2007) and assay the effects on interaction specificity. In some cases, all possible single substitutions have been introduced, either at key positions or, more recently, at all positions within a protein in an approach called 'deep mutational scanning' (Fowler et al., 2010; Fowler and Fields, 2014; McLaughlin et al., 2012; Whitehead et al., 2012). However, single substitutions may not reveal the full impact of a given residue because the effect of a mutation can depend strongly on the context, or genetic background, in which it is introduced (Melamed et al., 2013; Natarajan et al., 2013; Podgornaia and Laub, 2013; Starita et al., 2013).

Considering the effect of a substitution in multiple genetic contexts—in other words, considering combinations of substitutions—can better illuminate the contributions of interface residues to specificity (Dutta et al., 2010; Jenson et al., 2018; Salinas and Ranganathan, 2018). Many studies have leveraged large, combinatorial libraries to identify protein variants that bind the desired target such as bacterial Cry toxins (Dong et al., 2020; Jiao et al., 2017), HIV antigens (Barbas et al., 1994; Burton et al., 1991), or cytokine receptors (Fairlie et al., 2004). However, these studies did not systematically examine the effects of substitutions at a given protein interface. Other combinatorial library studies have systematically dissected protein interactions but focused on a single interacting pair such as the proto-oncogenes Fos and Jun (Diss and Lehner, 2018), PDZ domains and their cognate ligands (Salinas and Ranganathan, 2018), and the two-component signaling proteins PhoP-PhoQ (Podgornaia and Laub, 2013). These studies did not consider non-cognate interactions, precluding conclusions about interaction specificity. Finally, other work has leveraged combinatorial libraries to identify the determinants of specificity for a cognate partner versus a closely-related, non-cognate partner in the context of Bcl-2 family proteins and their peptide ligands (Dutta et al., 2010; Jenson et al., 2017; Jenson et al., 2018) and for bacterial toxin-antitoxin systems (Aakre et al., 2015). However, these studies did not consider all combinations of substitutions. In sum, prior work has not systematically dissected protein interaction specificity by examining the effects of substitutions in a combinatorially complete manner.

Here we built and analyzed libraries containing all 8000 possible combinations of three key specificity residues of a protein complex. Our work focuses on the ParD-ParE family of toxin-antitoxin (TA) systems, which are often found on bacterial plasmids and chromosomes (Fraikin et al., 2020). Most TA systems include a stable toxin that can inhibit cell growth but that is normally bound and inhibited by a cognate antitoxin. ParE toxins likely restrict cell growth by targeting topoisomerases (Jiang et al., 2002; Yuan et al., 2010). Because ParE is inhibited by a cognate ParD antitoxin, cell growth is a convenient readout for binding. TA systems are typically co-operonic, and new paralogs frequently arise through operon duplication. Importantly, these paralogous systems are highly specific, meaning antitoxins typically inhibit only their cognate, co-operonic toxin (Aakre et al., 2015; Fiebig et al., 2010).

For two insulated ParD-ParE systems derived from a recent duplication, we systematically probed the contributions of three key residues to specificity by generating a combinatorially complete library at these three positions in the antitoxin ParD3. Using a bulk competition assay coupled with deep sequencing, we identified the degree to which all variants antagonize the cognate toxin (ParE3) and the non-cognate toxin (ParE2). We not only uncovered the effects of every possible single mutation in the wild-type background but a complete distribution of effects for that mutation in hundreds of genetic backgrounds. This approach reveals the fundamental contributions of a given residue to specificity, independent of genetic context. We find that the three interface positions are each positive elements that promote the cognate interaction, but only two positions serve as negative elements that block the non-cognate interaction. Thus, positive and negative contributions are neither inherently coupled nor mutually exclusive. Notably, when considering the full distribution of effects in hundreds of genetic backgrounds, we find no substitutions that improve the discrimination of cognate and non-cognate partners. These findings suggest that the residues in wild-type ParD3 are optimal with respect to promoting the insulation of these two ParD-ParE systems. By solving a co-crystal structure of ParD2-ParE2 and comparing it to an existing structure of ParD3-ParE3, we identify plausible mechanistic explanations for how each residue contributes to specificity. Altogether, our work provides a comprehensive library-based approach for systematically dissecting protein interaction specificity and a framework for understanding how paralogous systems avoid unwanted cross-talk.

Results

Recently-duplicated ParD-ParE systems use a minimal essential set of specificity residues

To examine insulation between paralogous toxin-antitoxin loci, we focused on two closely-related systems from the widespread ParD-ParE family. Although the first ParD-ParE system characterized was plasmid-borne (Roberts and Helinski, 1992), these TA systems are often found in multiple copies on bacterial chromosomes (Aakre et al., 2015; Leplae et al., 2011). We previously showed that ParD antitoxins are highly specific for their cognate ParE toxins, and that interaction specificity is controlled by a discrete set of coevolving residues that map to the protein-protein interface formed by ParD and ParE (Aakre et al., 2015). To further refine the set of residues that determine the interaction specificity of ParD and ParE, we searched for coevolving residues using GREMLIN (Ovchinnikov et al., 2014), as done previously, using a significantly stricter E-value cut-off (E < 10−20). This analysis revealed eight residues in ParD that strongly covary with 10 residues in ParE (Figure 1C).

These coevolving residues are likely to determine the interaction specificity of paralogous ParD-ParE systems. However, paralogs derived from a recent duplication event may use only a subset of these residues to maintain insulation. The previously-studied ParD2-ParE2 and ParD3-ParE3 systems from the α-proteobacterium Mesorhizobium opportunistum are highly similar, with 41% (antitoxins) and 42% (toxins) identity. To determine whether these systems arose from a recent duplication event, we first built a protein tree of ParE toxins in α-proteobacteria, which revealed that ParE2 and ParE3 are monophyletic and represent sister taxa (Figure 1—figure supplement 1; Source data 12). We then built a species tree for α-proteobacteria using a concatenated alignment of 15 conserved genes and implemented HMMER to identify ParE2 and ParE3 paralogs (Source data 34). This species tree revealed that ParE3 paralogs were more widely distributed, and that organisms that have a ParE2 ortholog usually carry a ParE3 ortholog (Figure 1D). Taken together, the protein and species trees suggest that ParD2-ParE2 and ParD3-ParE3 share a recent common ancestor, and that ParD2-ParE2 is the derived system produced by gene duplication.

Despite their close evolutionary relationship, the two systems do not cross-talk (Aakre et al., 2015 and Figure 1E). Of the 8 ParD specificity-determining residues identified in our coevolution analysis, four are identical between ParD3 and ParD2, and one residue is positively-charged in both (Figure 1C and F). Thus, ParD2 harbors only three non-conservative substitutions relative to ParD3 in its specificity-determining residues. Notably, swapping all ParD3 residues for the derived residues in ParD2 at these three positions (i.e. D61I/K64L/E80K, or DKE to ILK) was sufficient to rewire its specificity, producing an interaction with the non-cognate toxin ParE2 and eliminating the native interaction with ParE3 (Figure 1E). Replacing just one of these three residues in ParD3 with the ParD2 residue was sometimes sufficient to produce promiscuity (Figure 1E). The substitutions K64L and E80K in ParD3 each individually enabled ParD3 to interact with the non-cognate toxin ParE2, while retaining its interaction with ParE3. By contrast, the D61I substitution did not detectably change the interaction specificity of ParD3. Collectively, these results suggest that (i) the coevolving residues identified by GREMLIN dictate the interaction specificity of ParD-ParE complexes and (ii) these specificity-determining residues make different contributions to preventing cross-interactions between the paralogous systems ParD2-E2 and ParD3-E3.

Mapping fitness in a saturated interface mutant library

The targeted mutational studies above offer only limited insight into how individual interface positions contribute to the specificity of ParD-ParE interactions. To more systematically dissect the contributions made by each position in ParD3 to promoting an interaction with the cognate partner ParE3 and to excluding interaction with the non-cognate partner ParE2, we generated a saturation mutagenesis library of ParD3 at the three key interface positions (Figure 2A, Figure 2—figure supplement 1A). The resulting library thus has a theoretical diversity of 8000 variants. This library offers several advantages over the ParD3 library used previously (Aakre et al., 2015), which shares only 130 of 8000 (or 1.6%) of the variants examined here. First, we sought residues that specifically mediate insulation of the ParD3-ParE3 and ParD2-ParE2 systems, and thus focused on residues that are not conserved between the two systems. The prior library mutagenized a position (W60) that is identical between the two antitoxins and is essential for ParD3 to bind either toxin. Second, we generated a combinatorially complete library with full randomization at each position, whereas the previous library included only residues commonly seen at each position in native sequences.

Figure 2 with 2 supplements see all
Mapping variant fitness via a combinatorially complete library.

(A) Residues mutated in the saturation mutagenesis library, mapped onto the ParD3-ParE3 crystal structure (PDB: 5CEG). Salt bridges between ParD3 library positions and ParE3 are indicated. Note that ParD3-ParE3 forms a symmetric tetramer with two chains of each protein; for simplicity, only a single chain of each is shown. (B) Schematic of the ParD3 library experiment. (C) Fraction of single, double, and triple mutants that neutralize ParE3 or ParE2 (W ≥ 0.5). (D) Frequency of amino acids at each position for variants that neutralize ParE3, summarized as a sequence logo, as the fitness thresholds indicated. (E–F) Fitness of all single (E) and double (F) mutants against ParE3. (G–I) Same as in (D)-(F), but for ParE2.

First, to assess the ability of each variant in our library to neutralize the cognate toxin, we co-transformed the library into a strain carrying an arabinose-inducible copy of ParE3 on a plasmid (Figure 2B). The ParD3 variants are expressed from an IPTG-inducible promoter on a separate plasmid. As expected, expressing ParE3 arrests cell growth in conditions that do not induce the ParD3 library (Figure 2—figure supplement 1B). However, co-expressing the ParD3 library with the ParE3 toxin produced robust growth of the culture, similar to conditions in which cells express neither toxin nor antitoxin. This result implies that the library contains a population of ParD3 variants that retain the ability to antagonize ParE3. To identify these variants, we co-expressed the ParD3 library with ParE3 for 10 hr and then deep-sequenced the entire ParD3 gene before and after selection (Figure 2B). We obtained reliable measurements (pre-selection reads >25 and post-selection reads >0) for 7882 variants (Figure 2—figure supplement 1C). Enrichment of a given variant in this competitive growth assay should reflect its ability to neutralize the ParE3 toxin. We observed large changes in variant frequency over the course of the experiment. For instance, ParD3 DKE, the wild-type variant, was enriched 20-fold in the presence of ParE3. In contrast, the variant ParD3 ILK, which has 3 ParD2-like substitutions, was de-enriched 17-fold, and variants harboring stop codons were de-enriched 42-fold on average.

To quantitatively compare the behavior of individual variants over the course of the experiment, we calculated a raw fitness score for each variant, as previously described (Aakre et al., 2015; van Opijnen et al., 2009). Briefly, this raw fitness score (Wraw) reflects the log-fold expansion of each mutant relative to the remaining variants. We then scaled the raw fitness values by setting the mean fitness of variants encoding stop codons to W = 0 and the wild-type sequence DKE to W = 1. The fitness scores were highly reproducible between biological replicates (R2 = 0.96, Figure 2—figure supplement 1D). To validate our approach, we independently cloned four ParD3 variants from our library into an inducible vector and co-expressed each variant with ParE3 (Figure 2—figure supplement 1E–F). We found that the fitness values calculated qualitatively agree with the ability of each variant to rescue growth in vivo following toxin expression. Additionally, for the ~130 variants also queried in our previous work (Aakre et al., 2015), the relative fitness was highly reproducible between the two studies (R2 = 0.96; Figure 2—figure supplement 1G).

Pervasive degeneracy in toxin binding

Our library selection revealed a substantial pool of variants that antagonize the wild-type ParE3 toxin. At a threshold of W > 0.5, we identified 1847 variants out of 7882 (or 23.4%) that neutralize ParE3, including 55 single, 835 double, and 956 triple mutants (Figure 2C). The most common amino acid at each position was the wild-type residue—D61, K64, and E80 (Figure 2D). A similar pattern was also seen at higher fitness thresholds. Nearly all (55/57 = 96%) single substitutions were tolerated except proline substitutions at the first or second positions, variants PKE and DPE (Figure 2E). The majority of double mutants (835/1069 = 78.2%) also retained interaction with the cognate toxin. Double mutants that retained the wild-type residues D61 (DXX) or K64 (XKX) tended to have higher fitness than double mutants that retained E80 (XXE) (Figure 2F), suggesting that position 3 may provide the least binding energy to the native ParD3-ParE3 interaction. Triple mutants, which do not feature any of the wild-type ParD3 residues at the randomized positions, were the least likely to retain an interaction with the cognate toxin (956/6755 = 14.2%), but in absolute numbers comprised the biggest class of functional variants. Taken together, these results show that degeneracy at the ParD3-ParE3 interface is far greater than was previously appreciated (Aakre et al., 2015), with nearly a quarter of all variants retaining an ability to antagonize the toxin ParE3.

To identify variants that antagonize ParE2, a non-cognate toxin for ParD3, we repeated the competitive growth experiment but expressed the ParD3 library in cells producing ParE2 (Figure 2—figure supplement 1H–M). As before, we calculated a fitness score for each variant, this time scaling the raw values by setting the ParD2-like sequence (ILK) to W = 1. The fitness scores were again highly reproducible between biological replicates (R2 = 0.95) and predicted growth in the presence of ParE2 in an independent assay (Figure 2—figure supplement 1J–K). We identified 1510 variants that antagonize the ParE2 toxin (W > 0.5). The antitoxin variants that antagonize ParE2 toxicity often shared properties with the wild-type ParD2 antitoxin (Figure 2G). For instance, the most common amino acid at position 2 in the post-selection library was the wild-type ParD2 residue leucine, followed by isoleucine, which has similar chemical properties. Likewise, the most common amino acid at position 3 was the ParD2 residue lysine, followed by arginine, which is also positively-charged. In contrast, at library position 1, residues with a diverse range of structural and physical properties appear in approximately equal frequencies. As with ParE3, similar patterns were seen at higher fitness thresholds (Figure 2G).

Variants that improved binding to the non-cognate toxin ParE2 typically had 2 or three mutations relative to the parental, wild-type ParD3 (Figure 2C). Only 6 of 57 (10.5%) single mutant variants antagonized ParE2, and, notably, single mutations at position 1 were never sufficient to produce a ParD3-ParE2 interaction (Figure 2H). Single mutants that were functional for antagonizing ParE2 either substituted the wild-type ParD3 residue with leucine or isoleucine at position 2 or 3 (DLE, DIE, DKL, DKI) or reversed the charge of the wild-type residue at position 3 (DKK, DKR). Of the double mutants, 180 of 1069 (16.8%) antagonized ParE2; more than half of these (95) harbored mutations in both the second and third library positions while retaining the aspartate found at position 1 in the parental ParD3, that is, DXX double mutants (Figure 2I). This result further suggests that positions 2 and 3 are important for inhibiting interaction with the non-cognate toxin. Residues at position 2 that were incompatible with the ParD3-ParE2 interaction include all of the aromatic residues (F, H, Y, W), residues with poor helix-forming propensities (P, G), as well as serine and aspartate. As with ParE3 interactions, we find significant degeneracy, with 1510 variants (19.2% of the library) capable of interacting with ParE2. Our observation of extensive degeneracy was robust to the threshold used for defining productive interactions (Figure 2G).

Library variants show a range of abilities to discriminate between cognate and non-cognate partners

Thus far, we have estimated fitness for library variants against the cognate and non-cognate toxins independently. However, specificity in vivo involves simultaneously binding the correct partner while avoiding unwanted interactions. To visualize the extent to which each variant interacts with both toxins, we generated a scatterplot of ParD3 fitness when screened against each potential partner (Figure 3A). This analysis revealed a large pool of variants (666 at a threshold of W > 0.5) that interact promiscuously with both toxins (Figure 3B). This set represents 36% and 44% of total variants that bind ParE3 and ParE2, respectively. For the promiscuous variants, the residues at each position were similar to those that promote interactions with each toxin considered independently (compare Figure 3C with Figure 2D and G). For instance, the most frequent residues among promiscuous variants at position 3 are leucine and isoleucine. Both amino acids appear frequently at position 3 in the full set of variants that antagonize either toxin.

Figure 3 with 1 supplement see all
The sequence space of antitoxin specificity residues.

(A) Fitness of ParD3 variants against ParE2 and ParE3. Gold, specific for ParE2; blue, specific for ParE3; red, promiscuous for both toxins. Histograms of fitness values are depicted. (B) Venn diagram of ParD3 variants that neutralize ParE3, ParE2, or both. (C) Frequency of amino acids at each position for promiscuous variants, summarized as a sequence logo, at the fitness thresholds indicated. (D) Specificity gap for library variants, calculated as WParE3WParE2. The gap for the wild-type ParD3 variant (‘DKE,’ blue dashed line) and ParD2-like variant (‘ILK,’ gold dashed line) are shown. Variants with prolines are omitted.

As expected, the scatterplot showed that the wild-type variant DKE is specific for the cognate toxin ParE3 (Figure 3A). Interestingly, we observed a small set of variants with wild-type-like fitness for the cognate toxin (i.e., W ≈ 1) but with lower fitness against the non-cognate toxin. As depicted in Figure 1A, more specific variants will have a wider ‘gap’ in preference for the cognate versus the non-cognate partner. As a first pass to determine whether the wild-type variant is optimal for specificity, we calculated a specificity gap (WParE3WParE2) for every variant in the library. This analysis revealed a small fraction (2.6%) of the library (Figure 3D) with a wider specificity gap than the wild-type antitoxin. However, given the nature of our assay, it may be difficult to reliably compare highly fit variants, given that all variants that fully neutralize toxin are likely to have maximal apparent fitness. Thus, the measured fitness of the wild-type variant could be an underestimate of its ability to antagonize the cognate toxin. To probe this possibility, we built a linear model of variant fitness against the cognate toxin, using k-fold cross-validation with five folds (Figure 3—figure supplement 1A). Although a purely additive model of residue contributions was highly predictive of variant fitness (R2 = 0.89, SD between folds ±0.003; Figure 3—figure supplement 1B), the model was weakest for the fittest variants, likely due to diminishing returns for highly favorable residues, which may reflect a limitation of our assay or a non-linear relationship between the free energy of binding and the fitness values measured. We observed a similar effect in a linear model for variant fitness against ParE2 (Figure 3—figure supplement 1C–D).

The success of the linear model in predicting variant fitness implies that individual substitutions are largely independent, that is, epistasis in our library is relatively limited. To better quantify epistasis, we plotted the difference between the fitness of each variant predicted assuming independence and the observed fitness of that variant. We considered all possible double and triple mutants relative to the wild type (Figure 3—figure supplement 1E–F). In each case, the distribution of the differences calculated was tightly centered around 0, further suggesting that, on average, there is minimal epistasis or interdependency of mutations. This finding contrasts with the epistasis observed in other proteins (Olson et al., 2014; Podgornaia and Laub, 2015; Salinas and Ranganathan, 2018), and may reflect the fact that each position varied in our ParD3 library interacts with a non-overlapping set of residues in ParE (Figure 1; also see Figure 5 below).

Leveraging a distribution of fitness effects to measure contributions to specificity

We sought a method to better capture the impact of each interface residue on specificity. Introducing individual mutations offers limited information about the role of a residue in determining specificity, particularly due to the intrinsic degeneracy of the interface and the inability to distinguish between high fitness variants. For example, given that ParD3 D61 forms a salt bridge with ParE3 R58 in the native ParD3-ParE3 interaction (Figure 2A), mutations at position 61 (library position 1) should weaken the cognate interaction. However, ParD3 D61I neutralizes wild-type ParE3 in vivo (Figures 1E and 2E) to a degree indistinguishable from wild-type ParD3 in our competition experiment, likely reflecting the robustness of the cognate interface in the context of our assay. A similar effect could, in principle, impact the assessment of how individual mutations affect non-cognate interactions.

To overcome these limitations, we leveraged the inherent redundancy in our saturation mutagenesis library, namely that each substitution occurs in the context of ~400 unique backgrounds. For example, the D61I mutation occurs in our library with all possible combinations of residues at positions 64 and 80 (Figure 4A). We calculated the value ΔW for each substitution in each possible background, for example, for D61I in the context where positions 64 and 80 are alanine, ΔW = WDAA - WIAA. To understand the role of each antitoxin residue, both in terms of mediating cognate toxin binding and preventing non-cognate toxin binding, we can measure and assess the distribution of fitness effects (ΔW values) for mutating each residue in each possible background. For this and all further analyses, we do not consider variants containing prolines because all three library positions reside in α-helices, and prolines tend to break or kink helices; these substitutions could trivially affect antitoxin stability rather than molecular recognition. Thus, we ultimately considered ~361 library contexts per substitution.

Figure 4 with 3 supplements see all
Systematic dissection of interface substitutions.

(A) Example illustrating how each substitution occurs in hundreds of contexts in the saturation mutagenesis library. (B) Distribution of fitness effects for the substitutions indicated (each replacing a specificity residue in ParD3 with the corresponding residue in ParD2) in all possible contexts, fitted to skew normal distributions. The effect on the cognate interaction is shown in blue, and the effect on the non-cognate interaction is shown in gold. Mean (gray), median (purple), and mode (red) are indicated. Triangles denote the effect of each substitution in the wild-type background (DKE). (C–D) Box plots showing the distribution of fitness effects for each substitution. For each substitution, the effects on fitness against ParE3 (C, blue) and ParE2 (D, gold) are shown. The mode of the fitted skew normal distribution is indicated in red. Dashed lines represent the threshold for a mode not expected by chance (<5% FDR based on 1000 permutation tests). Outliers (1.5 * IQR) are hidden for clarity but were included in all quantitative analyses. (E) Diagram illustrating how each substitution affects fitness for both ParE3 and ParE2. Starting fitness of the wild-type antitoxin is arbitrarily set to W = 1 for ParE3 and W = −1 for ParE2 (black dots). Blue and gold points represent shifts in fitness for variants when tested against ParE3 and ParE2, respectively, based on the mode effect size measured in (C) and (D).

We first used this approach to examine the fitness effects of substituting each ParD3 residue with the corresponding residue in ParD2, that is, D61I, K64L, and E80K substitutions. If an antitoxin residue generally serves as a negative specificity element, we predict that a substitution in that residue will have an overall positive distribution of fitness effects when the library is queried against the non-cognate ParE2 (Figure 1A). If an antitoxin residue generally serves as a positive element, we predict that its substitution will produce an overall negative distribution of effects when the library is queried against ParE3. Residues could, in principle, serve as both positive and negative elements. To summarize the fitness effects of a given substitution in the 361 different possible contexts, we fit each distribution to a skew normal distribution and assessed the mode as a measure of central tendency (Figure 4B; Figure 4—figure supplements 12).

We found that all three library positions serve as positive elements that promote the cognate interaction (Figure 4B). Swapping the ParD3 residue with the ParD2 residue had a typical fitness cost, or produce a change in the mode, of −0.63, –0.71, and −0.39 at positions 1, 2, and 3, respectively, when the library was tested against ParE3. These distributions were almost universally negative, and although there was substantial overall variability, each had relatively narrow interquartile ranges (0.17–0.25). Thus, swapping each ParD3 specificity residue for the corresponding residue in ParD2 has a generally detrimental effect on fitness. Importantly, however, each of these substitutions in the wild-type background (i.e., DKE → IKE, DLE, and DKK) was approximately 0 (Figure 4B), demonstrating the importance of using the entire saturation mutagenesis library to characterize the nature of a given substitution.

Although all three positions promote interaction with the cognate partner, only two serve as negative elements that help prevent interaction with the non-cognate toxin ParE2. At position 1, swapping the ParD3 residue with the corresponding ParD2 residue (D61I) produced a distribution of fitness changes with a mode of −0.01 (statistically indistinguishable from 0) with an extremely narrow interquartile range of 0.11 (Figure 4B, left). By contrast, the K64L and E80K mutations had distributions with modes of 0.57 and 0.50, respectively (Figure 4B, middle and right). Taken together, these analyses indicate that position 1 functions primarily as a positive element, whereas positions 2 and 3 have both positive and negative roles in ParD-ParE specificity.

Systematically estimating the effects of all interface substitutions on specificity

The analyses just presented only considered substituting a residue in ParD3 with the corresponding residue in ParD2. Next, we wondered whether the effects documented (Figure 4B) were idiosyncratic or generalizable to all substitutions. In other words, do all amino acid substitutions at position 1 (D61) hinder the cognate interaction, and do all substitutions at positions 2 and 3 (K64 or E80) have reciprocal effects on the cognate and non-cognate interactions? Generally, is each ParD3 interface position ‘optimized’ for specificity? To answer these questions, we measured the distribution of fitness effects, again by considering all ~361 contexts in the library, for each amino acid substitution at each library position (Figure 4C–D).

We observed distinct patterns of effects at each library position. At position 1, we found that mutating 61D to any residue hindered ParD3-ParE3 binding (Figure 4C). Substituting glutamate for aspartate at this position was the least unfavorable mutation when considering the entire distribution of effects (Figure 4C), as expected given that ParD3 61D likely forms a salt bridge with ParE3 58R (Figures 1D and 2A). With respect to the non-cognate interaction, every distribution had a mode close to or indistinguishable from 0 (Figure 4D). In no case did a substitution tend to improve binding to ParE2. However, eight substitutions (to C, F, G, T, W, S, Y, or M) each tended to further destabilize the ParD3-ParE2 interaction. Interestingly, substituting ParD3 position 1 with isoleucine, the corresponding residue at this position in ParD2, did not confer a noticeable fitness advantage relative to other substitutions in terms of promoting an interaction with ParE2. More generally, these results suggest that the ParD3 residue 61D is likely not important for discriminating against the non-cognate toxin. We conclude that position 1 in the antitoxin primarily supports the cognate interaction with ParE3.

At position 2, the distributions of fitness effects for all substitutions were, as for position 1, nearly universally negative in terms of interacting with the cognate ParE3 (Figure 4C). With respect to interacting with the non-cognate toxin ParE2 however, there were striking differences among the different substitutions (Figure 4D). The substitutions from lysine to five different residues (L, I, R, V, M) each significantly improved ParD3 binding to ParE2. For the substitutions of K with L, I, or R, the entire distributions were ≥0 indicating that these substitutions promoted an interaction with ParE2 in all ~361 contexts. By contrast, substitutions to 11 amino acids generally decreased the fitness of ParD3 variants versus ParE2. These findings imply that K64 is a negative design element, but that lysine is not the optimal residue for discriminating against ParE2 (discussed further below). Notably, performing this analysis using the 130 genetic backgrounds found in our previous library (Aakre et al., 2015) did not reliably capture the effects of each substitution (Figure 4—figure supplement 3).

Finally, for position 3, the distributions of fitness effects with respect to the cognate interaction were again generally negative, as with positions 1 and 2 (Figure 4C). However, the typical values and the lowest values of the distributions were notably less severe for position 3, indicating that substitutions at this position are generally more forgiving with respect to maintaining the interaction with the cognate toxin. Conversely, nearly all substitutions at position 3, in virtually all contexts, led to increased interactions between ParD3 and ParE2, consistent with this position in ParD3 helping to enforce specificity by preventing an interaction with ParE2.

Taken all together, our results and analyses provide a comprehensive map of how substitutions impact ParD-ParE interaction specificity and demonstrate that the three positions each play a distinct role in enforcing this specificity. To summarize these results, and to explore how each substitution simultaneously affects the cognate and non-cognate interactions, we plotted the typical (i.e., mode) effects of each substitution on ParD3 fitness versus. ParE3 and ParE2 (Figure 4E). We arbitrarily set the starting values of the wild-type variant DKE to 1 for ParE3 and −1 for ParE2. Increasing fitness versus the cognate toxin or decreasing fitness versus the non-cognate toxin would each increase overall specificity (Figure 1A). Notably, the wild-type residue at each position is optimal for ParE3 binding, such that no substitutions—considering the distribution of fitness effects—improve the fitness of ParD3 queried against this toxin. We do find many substitutions in ParD3 that can further destabilize the non-cognate interaction. However, every such substitution has a substantial cost in fitness when queried against the cognate ParE3. Thus, we conclude that the wild-type ParD3 may provide optimal specificity with respect to ParE3 and ParE2. One limitation of this approach is that each type of substitution is compared to the wild-type amino acid only in its wild-type context (e.g., AXX mutations are compared to DKE), which may overestimate the favorability of the wild-type residue. However, performing a similar analysis in which fitness distributions for each amino acid at each position were compared to the fitness distribution for the wild-type residue (e.g. AXX compared to DXX) produces similar results and conclusions (Figure 4—figure supplement 3E–G).

Structural basis of positive and negative interaction elements

Our saturation mutagenesis results produced several hypotheses about the contributions of individual residues and residue combinations to ParD-ParE interaction specificity. To probe these hypotheses further and to understand how positive and negative elements of specificity function at a structural level, we sought to compare the packing of interface residues in co-crystal structures of ParD3-ParE3 and ParD2-ParE2. For simplicity in comparisons, we use residue position numbers based on the alignment in Figure 1C for all analyses.

In the ParD3-ParE3 co-crystal structure, which we solved previously (Aakre et al., 2015), library position 1 (D61) and position 2 (K64) both reside in α-helix-2 of the antitoxin in a hydrophobic side chain pocket (Figure 5A). D61 forms two salt bridges with toxin R59, while K64 forms salt bridges with toxin E75 and/or E89. These inter-chain salt bridges flank the antitoxin residue W60, permitting extensive hydrophobic contacts between the large aromatic residue and the aliphatic side chains. Toxin residues A62, L73, and V91 also form hydrophobic contacts with the antitoxin residue W60. Previous work has shown that the burial of tryptophan residues is highly energetically favorable and helps drive protein-protein interactions (Cunningham and Wells, 1989). Thus, we posit that D61 and K64 serve as positive specificity elements by both forming direct inter-protein contacts with the toxin and decreasing solvent accessibility of W60. Substitutions at positions 1 and 2 were nearly uniformly (i.e., in all ~361 contexts) detrimental to interacting with ParE3 likely because they disrupt the salt bridges normally formed by D61 and K64, perturb the hydrophobic pocket surrounding W60, or both. The third library position, E80, resides on the antitoxin α-helix-3 and forms a salt bridge with toxin residue K12. Mutations at this position are generally less detrimental than mutations at the other positions, potentially due to the absence of an equivalent hydrophobic binding pocket.

Figure 5 with 1 supplement see all
Comparing the interfaces of paralogous toxin-antitoxin complexes.

(A) The interface between ParD3 and ParE3 (PDB: 5CEG). Top inset, library positions 1 and 2, with neighboring residues. Bottom inset, library position 3. Salt bridges are indicated by dashed lines. (B) Same as in (A), illustrating the corresponding regions of ParD2-ParE2 (PDB: 6X0A).

To compare the binding interfaces formed by the paralogous interacting pairs, we initially solved a 2.4 Å co-crystal structure of ParD2-ParE2 (Figure 5—figure supplement 1A; also see Source data 56). The toxins and the C-terminal half of the antitoxins, which contains all toxin-binding residues, were remarkably similar between the two cognate complexes (rmsd = 0.812 Å). However, we were unable to resolve ParD2 residues 26–42 (Figure 5—figure supplement 1A–C), likely because this region does not have any protein-protein contacts and adopts multiple conformations. We speculate that ParD2 residue 47P (which is 47E in the ParD3 structure) is helix-breaking and contributes to the instability of this region. Although our model for the toxin and toxin-binding region of the antitoxin (residues 43–89) were an excellent fit to the electron density (Figure 5—figure supplement 1C), noisy electron density for the N-terminal portion of the antitoxin prevented adequate refinement of the entire structure (see Materials and methods). To produce a second model for the ParD2-ParE2 interface, we initially pursued truncated versions of ParD2 in complex with ParE2 but could not obtain crystals. Ultimately, we produced a second co-crystal structure to 2.9 Å resolution of ParE2 in complex with a ParD2 variant, ParD2*, modified to have the N-terminal 48 residues of ParD3 (PDB: 6X0A; Figure 5—figure supplement 1D; Table S1). In the original ParD3-ParE3 structure, this region of ParD3 is crystallographically well-behaved and forms extensive contacts with a second antitoxin chain. Importantly, both structural and coevolutionary data suggest that there are no contacts between ParE2 and residues 1–45 of the antitoxin, and the modified structure was indistinguishable from the original structure in the overlapping regions (Figure 5—figure supplement 1E–F).

The ParD2*-ParE2 structure reveals a similar hydrophobic binding pocket to ParD3-ParE3 surrounding W60 and involving toxin residues A62, L73, and V91 (Figure 5B). However, the interactions corresponding to the library positions are markedly different between the paralogous interacting pairs. ParD2 residue 61 (corresponding to position 1 in our library) is isoleucine, versus aspartate in ParD3. Unlike ParD3 D61, which forms a salt bridge with R59 in ParE3, ParD2 I61 does not form a salt bridge. Substitutions of ParD3 residue 61 do not improve binding to ParE2, likely because ParE2 G59 provides flexibility to this region and space to accommodate almost any residue. ParD2 residue 64, which corresponds to the second library position, is leucine, and interacts hydrophobically with W60 (distance = 4.2 Å). Unlike ParD3 K64, which interacts with two glutamate residues, ParD2 L64 does not contact the toxin. Instead, ParE2 E89 forms an intra-chain salt bridge with R75. The hydrophobic interactions between L64 and W60 in the ParD2*-ParE2 structure may explain why ParD3 K64 mutations to the hydrophobic residues leucine, isoleucine, and methionine generally improve ParD3-ParE2 binding (Figure 4D). Intriguingly, arginine mutations at this position also improved ParD3-ParE2 binding. We speculate that, due to its longer side chain, arginine may be more effectively neutralized than lysine by ParE2 E89. On α-helix-3 in ParD2*-ParE2, K80 forms a salt bridge with E12, analogous to the salt bridge formed by ParD3-ParE3 with E80-K12 (Figure 5A). Importantly, the charges of the residues are flipped between the complexes, providing a simple mechanism for how ParD3 excludes ParE2 at this position.

Discussion

A combinatorially complete library reveals substitution effects across hundreds of contexts

Paralogous interacting proteins face the challenge of maintaining the desired interaction while avoiding unwanted cross-talk. This specificity is often accomplished via molecular recognition using a discrete set of amino acids at the protein-protein interface. These specificity-determining residues must inherently stabilize the cognate interaction (positive elements), destabilize the non-cognate interaction (negative elements), or both. Determining which of these roles individual specificity residues play is a difficult problem that can be confounded by epistasis and the degeneracy of the interface, which can mask both beneficial and detrimental effects of individual substitutions. Prior work has shown that examining a substitution across multiple genetic contexts can improve estimates of its effects on an interface. For instance, one study found that including additional mutational data (using a library of 2–6 amino acids at four different interface positions) improved the performance of a position weight matrix to predict binding of a BH3 peptide to either of two Bcl-2 family members (Dutta et al., 2010). We expanded on this approach by screening a combinatorially complete library of ~203 interface variants using ParD-ParE toxin-antitoxin systems as a model. In this library, each substitution occurs in ~202 different backgrounds, which led to relatively narrowly-distributed, unimodal distributions of fitness effects in each case (Figure 4C–D). Based on these distributions, we determined the typical effect of mutating a given specificity residue to every other amino acid. Notably, most substitutions in the wild-type background had little or no effect on either the cognate or non-cognate interaction, which underscores the value of our approach for elucidating the contribution of individual substitutions to specificity (Figure 2E and H; Figure 4).

For the ParD3 antitoxin, we found that all three library positions are positive elements, as all substitutions at each position tended to destabilize interaction with the cognate toxin, ParE3 (Figure 4C). Additionally, each wild-type residue appears optimal for this role, suggesting that there is normally a high fitness cost of inadequately neutralized toxin. With regard to the non-cognate interaction, we found that positions 2 and 3—but not position 1—serve as negative elements in determining specificity (Figure 4D). In other words, the wild-type residues at positions 2 and 3 in the ParD3 antitoxin help inhibit an interaction with ParE2, with many (position 2) or nearly all (position 3) substitutions increasing the interaction of ParD3 with ParE2. These results argue against the notion suggested previously for other paralogous interacting proteins (Hochberg et al., 2018; Sergeeva et al., 2020) that it is difficult to introduce mutations in one protein that improves binding to the cognate partner without also improving binding to the non-cognate partner.

Examining comprehensive, combinatorial libraries also enabled us to ask whether specificity is optimal, at least with respect to the three positions randomized in our library. Overall, ParD3 specificity appears to be optimized with respect to antagonizing the cognate toxin ParE3 and avoiding the non-cognate toxin ParE2 (Figure 4E). When considering the distribution of fitness effects, no substitutions further stabilized the cognate interaction. Several substitutions improved discrimination against the non-cognate ParE2 (e.g., substitutions to D, W, and Y at position 2 – see Figure 4D), but each substitution led to diminished interaction with the cognate toxin. Whether ParD3 is globally optimized for specificity with respect to all other homologous ParE toxins—for instance, the more distantly related ParE1—remains to be tested. Specificity may not be globally optimized, as was shown recently for paralogous bacterial two-component signaling proteins (McClune et al., 2019).

Importantly, our assessment of an interface affinity and discrimination is derived from an in vivo fitness assay, as opposed to an in vitro binding assay. This was partly a necessity, as it is technically difficult to purify large quantities of many toxins given their inherent toxicity, and to purify antitoxins, which are often unfolded in the absence of their cognate toxins (Cherny and Gazit, 2004; De Jonge et al., 2009). Binding assays have been reported for some TA systems (e.g., Brown et al., 2013), but this approach is not scalable to assaying many antitoxin variants in parallel, as done here. Further, even precise quantitative measurements of binding affinity may not predict whether two systems will be insulated in vivo because of non-linear relationships between affinity in vitro and fitness in vivo. In this regard, the use of an in vivo assay is advantageous relative to in vitro affinity measurements as it may better reflect what matters to a cell. However, our in vivo fitness assays are unlikely to be as sensitive as what occurs over long time-scales in nature with large effective population sizes.

We also note that our in vivo assays rely on the induced expression of toxins and antitoxins in a heterologous host, and so may not accurately reflect their concentrations in their native cellular contexts. Nevertheless, the relative behavior of ParD3 variants measured here is likely to apply in whatever expression context they arise (Figure 2—figure supplement 2). Finally, we note that in principle some mutations could affect protein abundance rather than toxin affinity, although the positions mutated are all solvent-exposed residues that are unlikely to affect protein folding and stability.

Structural basis of protein interaction specificity

Our results demonstrate that each position examined in ParD3 made qualitatively different contributions to specificity. To better understand the patterns documented in our library approach and to develop a structural basis for them, we produced a co-crystal structure of the ParD2-ParE2 complex and compared it to the ParD3-ParE3 structure we produced previously (Aakre et al., 2015). Although these systems do not cross-talk in vivo (Figure 1E), the complexes share highly similar secondary and tertiary structures. This overall structural similarity underscores the importance of just a few interface residues in dictating interaction specificity, as has been observed for many other systems (e.g., Brasch et al., 2018; Bridgham et al., 2006; Cosmanescu et al., 2018).

The first position, ParD3 residue D61, is optimal for cognate toxin binding but does not help discriminate against the non-cognate toxin. In the ParD3-ParE3 structure, ParD3 D61 forms a salt bridge with R59 in ParE3. However, in the ParD2-ParE2 structure, there is no connection between the corresponding two residues. In fact, the arginine in ParE3 is replaced by a glycine in ParE2, G59, which likely confers local flexibility and tolerance for diverse amino acids at position 61 in the antitoxin.

At the second position, ParD3 residue K64 is both a positive and negative element with respect to specificity. This lysine is optimal for the cognate interaction but not for non-cognate discrimination. Some mutations would better discriminate against the non-cognate interaction (Figure 4D), although each would come at the expense of cognate binding (Figure 4C). In the ParD3-ParE3 structure, ParD3 K64 forms an inter-chain salt bridge with E75 and E89 in the toxin; the aliphatic portion of the lysine side-chain also contributes to the hydrophobic packing of a conserved tryptophan, W60. Because ParD2 encodes a leucine at position 64, there is no inter-chain salt bridge; instead, there is an intra-chain salt bridge between toxin residues R75 and E89. This provides a structural explanation for why hydrophobic residues (e.g., L, I, V, and M) at ParD3 position 64, which are less likely to interfere with this salt bridge, are more compatible with binding to ParE2. Positively-charged residues (R, K) at position 64 in the antitoxin are also well-tolerated, with respect to binding ParE2, as they may be neutralized by E89 in the toxin.

Finally, the ParD3 residue E80, which corresponds to the third library position, offers maximal interaction with the cognate ParE3 and near-maximal discrimination against ParE2. The only residue which offered significantly more discrimination was aspartate, which shares its negative charge. ParD3 residue E80 forms a salt bridge with K12 in ParE3, while ParD2 K80 forms a salt bridge with E10 in ParE2. In other words, both complexes form salt bridges, but with reversed charges. Opposite salt bridges provide a logical strategy for two paralogous complexes to simultaneously maximize both affinity and discrimination.

Our combined genetic and structural data allow us to predict how ParD3 discriminates against other paralogous toxins. We hypothesize that ParD3 maintains insulation from ParE1 using a different subset of interfacial residues than it uses for ParE2. For instance, both ParD2-ParE2 and ParD3-ParE3 encode a tryptophan at antitoxin position 60 and a leucine at toxin position 73, which lie in a hydrophobic pocket with similar architecture in both cognate complexes (Figure 5A–B). However, ParD1-ParE1 encodes isoleucine at antitoxin position 60 and phenylalanine at toxin position 73 (Figure 1C). Thus, we speculate that the ParD1-ParE1 complex also forms a hydrophobic pocket but with a fundamentally different architecture than the ParD2-ParE2 and ParD3-ParE3 complexes. Steric clash between the residues of this hydrophobic pocket could serve as a common mechanism of insulating ParD3-ParE3 and ParD2-ParE2 systems from ParD1-ParE1. The use of different subsets of residues to discriminate against different partners has been observed in other systems (Sergeeva et al., 2020; Zhang and Palzkill, 2004).

Concluding remarks

Our work demonstrates the novel application of a high-throughput in vivo approach to determining the contributions of individual residues to interaction specificity. Understanding specificity in natural proteins can inform the design of proteins and synthetic signaling pathways. Our results reinforce the notion that negative interaction elements are necessary to maintain insulation between two paralogous toxin-antitoxin systems. Previous work similarly found that explicit negative design is required to prevent undesired interactions between paralogous proteins (Grigoryan et al., 2009). Elucidating the basis of interaction specificity can also provide insight into how paralogous proteins have become insulated during evolution. For toxin-antitoxin systems, as with many interacting proteins, new systems emerge through duplication and divergence, with the paralogous pairs becoming insulated through the forces of drift and selection. Mapping the basis of specificity, as done here, will enable efforts to reconstruct the phylogenetic history of such duplication-divergence events and to elucidate how insulation proceeds through a series of stepwise mutations.

Materials and methods

ParD-ParE coevolution and phylogeny

Request a detailed protocol

ParD-ParE specificity residues, those residues showing strongest coevolution between proteins, were identified using GREMLIN (http://gremlin.bakerlab.org) using Mesorhizobium opportunistum ParD3 and ParE3 as input. We performed eight iterations with an E-value cutoff of 1 × 10−20 and isolated residue pairings with a probability score >0.95. To construct a ParE protein tree, we first used HMMER (http://hmmer.org) to identify and align all α-proteobacteria homologs of M. opportunistum ParE2 in the ProGenomes Database. We then built a protein tree from all homologs using a trimmed alignment (positions represented in <50% of sequences were removed) in FastTree 2 (Price et al., 2010). The protein tree allowed us to categorize the resulting homologs as orthologs of ParE2, ParE3, or neither based on membership in either the ParE2 or ParE3 sister taxa. A species tree was generated for a concatenated alignment of 15 highly conserved single-copy bacterial genes (atpD, dnaA, rpsK, rpsD, ruvA, leuS, yeaZ, rpsF, rplI, radA, tsf, pyrH, yhhF, coaD, frr). HMMER was used to identify and align orthologs of these genes in α-proteobacteria species in the ProGenomes Database. The concatenated alignment was manually trimmed to remove positions represented in <50% of sequences and positions with <25% conservation, and a tree was generated using FastTree two for relevant species.

Bacterial strains and media

Request a detailed protocol

Escherichia coli strains were grown in an M9L medium (1× M9 salts, 2 mM MgSO4, 100 uM CaCl2, 10% v/v LB) supplemented with 0.1% casamino acids and 0.4% glycerol at 37°C, unless otherwise indicated. ParE toxins were cloned between SacI and HindIII sites in the arabinose-inducible pBAD33 vector, and ParD antitoxins were cloned between SacI and HindIII sites in the IPTG-inducible pEXT20 vector. Plasmids were co-transformed into E. coli TOP10 cells and maintained with chloramphenicol (30 μg/mL) and carbenicillin (100 μg/mL). Single colonies were grown overnight at 30°C in an M9L medium supplemented with 0.4% glucose and antibiotics. The following day, cultures were washed in media without glucose, serially diluted, and spotted onto M9L plates supplemented with antibiotics and 0.4% glucose (toxin repression), 0.2% arabinose (toxin induction), or 0.2% arabinose and 100 μM IPTG (toxin and antitoxin induction). After 1 day, toxin-antitoxin interactions produced robust growth on the M9L plates with 0.2% arabinose and 100 μM IPTG. We observed no intermediate growth phenotypes for strains in (Figure 1E).

Library construction

Request a detailed protocol

Residues targeted in the randomized library were selected using GREMLIN, followed by identification of specificity residues that differ between ParD3 and ParD2. The ParD3 saturation mutagenesis library was assembled using two rounds of PCR with the KAPA HiFi HotStart ReadyMix PCR Kit (Roche). The first round of PCR replaced the targeted residues with NNK codons and produced a linear fragment with 5' and 3' BtgI sites. To decrease the frequency of wild-type ParD3 in the pooled library, we used a mutated, non-functional gene (ML3304) with a deletion between codons 64–80 and frameshifted between residues 80–93 as the template. The fragment was then re-circularized via digestion with BtgI followed by ligation with T4 DNA ligase. To ensure strand complementarity, we performed a second round of PCR amplifying the entire ParD3 gene. The amplicon was then sub-cloned into pEXT20 between the SacI and HindIII sites and transformed into One Shot TOP10 Electrocomp E. coli (Invitrogen). Cultures were recovered overnight at 30°C in LB containing 0.4% glucose and carbenicillin to produce ML3310. The yield was ~106 total colonies (theoretical library size = 8×103 at the codon level, 3.3 × 104 at the nucleotide level). Cells harboring the plasmid library were then electroporated with a dialyzed plasmid containing an arabinose-inducible toxin. The transformations were recovered 1 hr in SOC media, followed by overnight growth at 30°C in LB supplemented with 0.4% glucose and antibiotics.

Library selection and analysis

Request a detailed protocol

To assess the ability of each library variant to antagonize a given ParE toxin, we performed competitive growth assays in liquid cultures. To this end, 50 mL of an overnight culture of the library was washed in 50 mL M9L without glucose and diluted to an OD600 of 0.03 in 200 mL M9L supplemented with antibiotics. To induce antitoxin expression, 100 μM IPTG was added and cells were grown at 37°C with shaking for 90 min. Toxin expression was then induced with 0.2% arabinose. Optical density was monitored every 30 min between t = 0–5 hr and every 60 min between t = 5–10 hr. To prevent cells from entering the stationary phase, cells were diluted 1:10 in fresh media 150 min after induction. To identify changes in variant frequency over the course of the experiment, plasmids were extracted (Zymogen) from 50 mL samples collected at t = 0 and t = 10 hr. The entire ParD3 gene was used as a template for PCR (12 cycles) with custom barcoded primers containing Illumina flowcell adaptor sequences. To increase the diversity of the amplicon pool, the PCR also incorporated 4–6 random nucleotides to the 5' end of each ParD3 gene. Samples were then pooled and sequenced on a single lane of a NextSeq flow cell.

Multiplexed reads were sorted based on an exact match to a six-letter barcode. Reads were then filtered to remove sequences that (a) do not encode an ‘NNK’ codon in any of the three targeted positions, or that (b) contain any non-synonymous mutation in the remaining 91 codons. The frequency of each variant in each sample was then counted and normalized by the total number of reads per sample. Next, we assigned a raw fitness score to each variant as described previously (Aakre et al., 2015; van Opijnen et al., 2009) and transformed those fitness values such that the median W value for variants encoding a stop codon = 0, and the W value for the wild-type or predicted best sequence = 1 (DKE for ParE3, ILK for ParE2).

Linear models were built using k-fold cross-validation with five folds, which allows each variant to serve as part of the training and test sets, and permits error estimation for the model coefficients. Models assumed additive relationships between residues. To summarize the fitness effects of substitutions, we fit each distribution to a skew normal distribution, which generalizes the normal distribution to allow for non-zero skewness. We posit that this probability distribution best summarizes the data because our distributions of mutation effects skew toward 0 change in fitness, likely due to the inherent sensitivity limits of the assay for changes at either end of the fitness distribution. Indeed, we find that variants most resistant to change in fitness tend to have the highest and lowest fitness scores (Figure 4—figure supplement 1A–B). We report the mode of the skew normal distribution as our measure of central tendency for the effect of each substitution, but note that it correlates extremely well with other measures such as mean and median (Figure 4—figure supplement 1C–D,F–G) as well as the coefficients from our linear model (Figure 4—figure supplement 2). To determine whether the modes for each distribution deviate significantly from 0, we generated a null distribution by shuffling the variant names in our data and again measuring the modes of fitness distributions. We call an effect significant if its mode lies below the 5% or above the 95% percentile of this null distribution.

Crystallization of ParD2-ParE2 and chimeric ParD2*-ParE2

Request a detailed protocol

To purify the native ParD2-ParE2 complex, the genes were cloned co-operonically under a single IPTG-inducible promoter in the pETDuet vector between the NcoI and HindIII sites (ML3297). Protein expression was induced in T7 Express cells (NEB) with 0.2 mM IPTG at 16°C. Cells were harvested 16–24 hr after induction. Frozen cell pellets were resuspended in Tris-buffered saline (50 mM Tris-HCl [pH 7.6], 150 mM NaCl) with 1 mM PMSF, and lysed via three passages through an LM20 Microfluidizer (Analytik). The precipitate was pelleted at 30,000 g for >1 hr and cleared lysate was purified on a Ni2+-NTA column followed by gel filtration on a Superdex 200 10/300 (GE Healthcare Life Sciences) equilibrated with storage buffer (10 mM Tris-HCl pH 8.0, 150 mM NaCl, 200 mM imidazole). Fractions containing ParD2-ParE2 were identified via SDS-PAGE.

Initial sitting-drop, vapor-diffusion crystallization screens were set up with a Phoenix drop-setting robot (Art Robbins Instruments). Hits were optimized in hanging-drops. For screening, we mixed 0.15 μL of protein (7.5 mg/mL) with 0.15 μL reservoir solution from commercial crystallization kits. Crystals appeared within ~3 d in condition 77 of the Protein Complex Suite (Qiagen) and ~12 hr in condition 59 of the PegRx HT kit (Hampton). Optimized crystals were obtained in 2 μL drops (1 μL protein + 1 μL well solution) at 18°C in two conditions. Crystals for Br/SAD data collection were grown using 0.1 M HEPES (pH 7.40), 0.85 M KCl, 1 M AmSO4, 0.35 M KBr as the well solution. Native crystals were grown using 1.93 M AmSO4, 0.1 M BIS-TRIS pH 6.50, and 2–3% w/v PEG MME 550. Crystals were cryo-protected in well solution variants substituting 1.5 M LiSO4 (Br/SAD) or 2.0 M LiSO4 (Native) for AmSO4.

X-ray diffraction data were collected on the 24-ID-C beamline at APS (NE-CAT). A 2.8 Å Br/SAD data set was collected at a wavelength of 0.920 Å, and a native data set was collected to 2.4 Å resolution at 0.979 Å. Data were indexed, integrated, and scaled using XDS (Kabsch, 2010). The two data sets belong to the same space group (I4122) but have slightly different unit cells, making phasing by SIRAS not feasible. The PHENIX tool XTriage estimated the anomalous signal in the SAD data extended to 4.3 Å (Adams et al., 2010). The AutoSol function of PHENIX was used to locate bromide sites, calculate SAD phases, and refine them with density modification (Terwilliger et al., 2009). The resulting density modified map was sufficiently clear to identify a bundle of three alpha-helices which were built in Coot (Emsley and Cowtan, 2004) as poly-alanine and input back into AutoSol for phase recombination with the SAD phases. In subsequent rounds of phase recombination and density modification, a nearly complete model of the complex was built and refined in PHENIX. Throughout this process, the structure of the ParD3-ParE3 complex (PDB: 5CEG) served as a guide to identify noisy features. The resulting model was used to phase the higher-resolution native data set by molecular replacement using PHASER (McCoy et al., 2007). The rounds of final refinement in PHENIX and Coot used the 2.4 Å native data with only model-based phases. The electron density allowed unambiguous side chain assignments for all of the ParE2 toxin and the C-terminal half of the ParD2 antitoxin (residues 43–89), which contains the entire toxin-binding region. There was a significant amount of electron density that must correspond to the N-terminus of the antitoxin but it was mostly uninterpretable except for a noisy helical density feature that appears to be antitoxin residues 12–25. There are no protein-protein crystal contacts in this region and the protein seems to adopt multiple conformations. Given the high symmetry of the apparent space group of the crystal (I4122), the possible lower symmetries (P422, I4, I222) were explored in an attempt to account for this density without success. We believe that the high final Rfree for the refinement of this structure reflects our inability to model these noisy features corresponding to the flexible N-terminus of the antitoxin.

To build a second model for the ParD2-ParE2 interface, we co-crystallized ParE2 in complex with ParD2 modified to have the rigid, N-terminal 45 amino acids of ParD3 (ParD2*; ML3305). Importantly, neither ParD2 nor ParD3 contain any predicted toxin-binding residues in this N-terminal region. ParD2*-ParE2 was purified using the same general approach as ParD2-ParE2 and was stored in a buffer of 10 mM Tris-HCl pH 8.0 and 150 mM NaCl. Initial crystallization screening was carried out as before using 8 mg/mL of ParD2*-ParE2. The best crystal grew in condition 55 of the PegRx HT kit (0.2 M ammonium acetate, 0.1 M sodium citrate tribasic dihydrate pH 5.5, 24% v/v polyethylene glycol 400). Diffraction data was collected on the 24-ID-E beamline at APS (NE-CAT) to 2.9 Å resolution and indexed, integrated, and scaled using XDS. The structure was solved using molecular replacement with the native ParD2-ParE2 structure, and the model was further refined iteratively using PHENIX and Coot.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
    Coot: model-building tools for molecular graphics
    1. P Emsley
    2. K Cowtan
    (2004)
    Acta Crystallographica. Section D, Biological Crystallography 60:2126–2132.
    https://doi.org/10.1107/S0907444904019158
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
    XDS
    1. W Kabsch
    (2010)
    Acta Crystallographica. Section D, Biological Crystallography 66:125–132.
    https://doi.org/10.1107/S0907444909047337
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59

Decision letter

  1. Christian R Landry
    Reviewing Editor; Université Laval, Canada
  2. Olga Boudker
    Senior Editor; Weill Cornell Medicine, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

A cell is a very crowded environment that is prone to interaction promiscuity among proteins. Natural selection, therefore, acts to maintain interactions among cognate proteins but also to prevent the ones that may have deleterious consequences. Here, Lite and colleagues examine the contribution of single amino acid substitutions and their combinations to protein-protein interaction specificity using a toxin-antitoxin system. The results provide important insight into how protein-protein interactions evolve and achieve specificity in the context of gene duplication where duplicated proteins initially have the same interaction partners but eventually evolve to have their specific cognate partners.

Decision letter after peer review:

Thank you for submitting your article "The genetic landscape of protein-protein interaction specificity" for consideration by eLife. Your article has been reviewed by Olga Boudker as the Senior Editor, a Reviewing Editor, and three reviewers. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

Lite and colleagues examine the contribution of single amino acid substitutions and their combinations to protein-protein interactions using a toxin-antitoxin system. The results provide an important insight into how protein-protein interactions evolve and achieve specificity in the context of gene duplication where duplicated proteins initially have the same interaction partners but eventually evolve to have their specific cognate partners. The reviewers appreciated the quality and importance of the work. They raised several points that would need to be addressed.

1) The novelty of the present study relative to previous work by the same authors and others is not obvious. The context in which the study is anchored could be broader to better demonstrate the implications of the work. Some previous studies that relate to the present work are not discussed.

2) There are some issues regarding the way fitness is estimated across backgrounds and regarding the analysis of epistasis.

3) Many analyses are based on precise cut offs. It would be important to show that the conclusions are robust to the choice of cut off values. A more quantitative assessment of the results rather than one based on cut offs could be used.

4) Issues related to data availability were raised.

5) Protein abundance of the various mutants does not seem to be taken into account and this may be a confounding factor in the measurement of protein-protein interactions. Ideally this would be addressed experimentally but it should at least be discussed if experiments are not easily feasible in the current context or if such data is not already available.

I am leaving the individual reports appended below because they are largely non-redundant and some of the details could be useful for preparing the revisions.

Reviewer #1:

Lite and colleagues describe the contributions of individual residues on protein-protein interaction specificity in the parDE3 toxin-antitoxin system. How interaction specificity arises in PPIs, particularly after gene duplication, is an important problem in evolutionary. They use bulk competition assays, coupled with deep sequencing to quantify the degree to which mutant antitoxins can detoxify the presence of the cognate interaction partner ParE3 and a homologous, orthogonal toxin ParE2. Novel findings include a quantitative exploration of the degeneracy of the cognate parDE3 toxin-antitoxin interface towards change in the targeted residues, an assessment of the relative contributions of each selected residue towards antitoxin binding and selectivity, a structural basis for the interface differences between parDE2 and parDE3 and proof that not all positive elements serve as specificity-determining residues. The data presented in this study should be of interest to biochemists, evolutionary biologists, and geneticists, so is appropriate for eLife's general audience.

Lite et al., characterize the detoxifcation-capabilities of an antitoxin library consisting of 8000 variants, in an approach similar to a previously published study from the same authors (Aarke et al., 2015). While the new library has the advantage of being combinatorically complete, it is smaller than the library from the cited study and 2/3 of the investigated residues in this work were already mutated in the preceding study (albeit with only 13 / 10 of possible amino acids) There is an interesting difference between the previous work, which only used naturally occurring states, whereas this study uses all possible mutations. I think it would be very interesting if the authors could comment on how this might relate to the different results between the studies.

The Results section of this study makes it sound as if the specificity-switching residues were first identified herein. Overall, the similarity to previously published data and experimental setup alongside a decrease in total library size compared to previous studies, that investigate the same model system, make the novelty of this work a little less obvious. Perhaps the authors could emphasize again conceptually (apart from saturation) what new approach this study adds, and why it is crucial for our understanding of specificity.

The assay employed in this work is not capable of discriminating good antitoxins from great antitoxins (as discussed in subsection “Specificity arises from the discrimination between cognate and non-cognate partners”). To address this, the authors measure the contributions to specificity of all single amino acid changes in all possible genetic backgrounds. Such a background-independent analysis leverages the completeness of the library (and I think was first employed here, Salinas and Ranganathan, 2018) to get an average effect for each mutation, though I am a little unsure I understand that it is appropriate in this case. The authors conclude that the naturally occurring specificity-determining residues are "optimal with respect to promoting the insulation" (Introduction). It seems to me that the authors compare the fitness of each mutation in the context of all genetic backgrounds to the wild-type state in its wild-type context and not in the context of all genetic backgrounds. It is thus to be expected that the apparent fitness of antitoxins that retain the cognate amino acid in a given position (e.g. ParD3 retains a D in position 1) is <1 if they are analyzed in a background-dependent manner (e.g. DXX). This is because many of the backgrounds will contain deleterious states. So the comparison of a background averaged fitness to the fitness of the WT state in a single background seems inappropriate to me to show that the WT sequence is optimal.

Reviewer #2:

In this work, Li et al., perform an exceptionally comprehensive assessment of how individual mutations contribute to recognition specificity amongst paralogous complexes. The authors take a bacterial anti-toxin, ParD3, and construct a combinatorial saturation mutagenesis library of three positions that physically interface with the toxin, ParE3. They then measure the effects of these mutations on: (1) binding of the cognate partner, ParE3 and (2) binding of a non-cognate partner, the paralog ParE2. The authors also compare the interactions at these three positions in the crystal structures of the ParE3/ParD3 complex (previously published) and the ParE2/ParD2 (which they solved in this work).

The central result of the paper is that individual positions can act as both positive and negative elements for interaction specificity, and that "positive and negative contributions are neither inherently coupled nor mutually exclusive". That is, rather than using distinct sets of positions to either enhance cognate interactions or discourage non-cognate binding, specificity can be (but is not always) encoded in an overlapping set of residues. This has implications for both the engineering and evolution of protein interactions. The experiments appear to be of high technical quality, and we expect the results will be of interest to a diverse scientific audience in biophysics, structural biology, protein engineering and molecular evolution. We recommend publication in eLife.

Essential revisions:

1) In Figure 1B, the authors lay out two different models of how positions at a physical interface contribute to the formation of a specific protein-protein interaction. Their data demonstrate that both models apply within a single protein. Much of the impact of the paper seems to depend on how surprising (or not) this finding is, and what the consequences of this finding might be both for evolving and engineering complexes. Thus, it is necessary for the authors to provide more context that can help the reader clearly assess the impact of this work. Specifically, can more be said about why and when one model would be favored over another? What is the evolutionary implication for individual residues in a protein to have negative and/or positive roles in identifying cognate interactions? Why is it surprising that residues can carry out dual roles in recognition and discrimination in a binding partner?

2) The linear model (Figure 3—figure supplement 1, Figure 4—figure supplement 2) indicates that the effects of mutations at the anti-toxin/toxin interface can be considered near-independently. This suggests that there is little epistasis (or coupling) between positions. However, can the authors perform a more thorough analysis of second and third order epistasis? Do they see that epistasis is centered at zero for the average interaction between mutations across a pair of sites? Given that the authors have all of the data, a thorough analysis of epistasis would further support their linear model and show that the contributions of each position studied in ParD3 are independent from each other. This seems especially interesting given the spatial proximity of positions 61 and 64.

3) As the authors point out, the in vivo assays rely on induced expression of toxins and antitoxins in a heterologous host. They claim that "the relative behavior of ParD3 variants measured here is likely to apply in whatever context they arise". Can the authors show that the induction conditions do not strongly affect their measurements? One way to demonstrate this is to take a small panel of ~10-20 ParD3 mutants that have a broad range of effects on growth rate across both ParE3 and ParE2 backgrounds. The growth rate effects of mutants in this "sub-library" can then be measured across varying concentrations of IPTG (for induction of ParD3) and arabinose (induction of ParE2/E3). Does the rank ordering of the mutant growth rates effects change with induction level, or are the results indeed qualitatively similar?

Reviewer #3:

This is an interesting study that uses a combinatorially complete deep mutagenesis strategy to identify determinants of the specificity of protein-protein interactions between bacterial toxins and antitoxins using a paralogy pair as a model system. I enjoyed the manuscript: it addresses a general and important question with a good experimental design and using an elegant model system. The clarification of negative and positive contributions to specificity is conceptually interesting.

Essential revisions:

1) Many of the analyses use an arbitrary cut-off of interaction vs no interaction (W>0.5). Whilst this simplifies communication and analyses, it is important to: (1) demonstrate that the conclusions are robust to the choice of this arbitrary cut-off; and (2) to stress that this reduction in fitness is huge and in natural populations much smaller changes in fitness are likely to be selected against, especially in microbes with large effective population sizes. How does imposing a much higher fitness cut-off alter the authors' conclusions?

2) In general, and related to point [1], I would prefer to see a more quantitative treatment of the data. Defining proteins as interacting or not interacting is a bit clunky given that binding is actually a fully quantitative trait and the data here is, at least by design, also quantitative. Many of the questions addressed could be answered quantitatively rather than by using a binary categorisation of binding vs non-binding.

3) What about protein abundance? The authors don't quantify the effects of the combinatorial mutations on protein abundance, so some of the non-specific effects on binding are likely to be due to changes in concentration of the protein not the binding affinity.

4) Relationship to conclusions in previous publications from the same group: previous manuscripts from the same lab have focussed on the finding that mutational effects in protein interaction interfaces change in the presence of additional mutations in the same protein (Podgornaia and Laub, 2015 i.e. the importance of pairwise and higher order epistasis). How, quantitatively, does the current dataset compare to this previous dataset on a different system and also to previous toxin-anti toxin mutagenesis datasets? Are mutational effects less background dependent in the toxin-anti-toxin system or similarly so?

5) "Although a purely additive model of residue contributions was highly predictive of variant fitness (R2 = 0.89, SD between folds + 0.003; Figure 3—figure supplement 1B), the model was weakest for the most fit variants, likely due to diminishing returns for highly favorable residues." This is fully expected- mutations should have effects that are additive for free energy changes (ddG binding) but not for changes in protein concentration because of the non-linear relationship between the amount of protein bound to an interaction partner and the free energy of binding (dG).

6) Data processing / quality control. We tried to download the raw sequencing data from SRA using the referee token to perform some basic quality control, but it seems only possible to obtain the summary tables not the actual sequence reads (this may be an issue in general with private SRA entries). Would it be possible to get access to the raw data e.g. by making the entry fully public prior to publication for reviewing purposes?

We have noticed major issues with quite a few deep mutational scanning data analyses, including in published papers. These include inappropriate filtering such that the analysed datasets consist partly (or in some cases, largely) of sequencing errors not real variants which can seriously alter conclusions. A second issue is underestimating sampling errors due to over-sequencing. For the design of the library used here and the filtering applied, this seem unlikely to be the case. But it would still be good to run some basic checks prior to publication.

7) Introduction/ citation of prior work. An obvious missing citation given the similarity of questions, strategy, title and journal is: Diss et al., 2018. In addition, there have now been quite a few papers published that use a similar combinatorially complete deep mutagenesis design published on different molecules (proteins, RNAs) and molecular processes and very few of them are cited here. Also, the statement in the introduction that 'prior work has not used combinatorially complete libraries to systematically dissect interface residues' is a bit misleading given previous work from the Laub lab.

8) It would be useful to more explicitly state in the text the (rational for the) previous ParD mutagenesis library design in the previous publication by the same lab.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Uncovering the basis of protein-protein interaction specificity with a combinatorially complete library" for further consideration by eLife. Your revised article has been evaluated by Olga Boudker (Senior Editor), a Reviewing Editor and one of the original reviewers.

The manuscript has been improved but there are some minor remaining issues that need to be addressed before formal acceptance, as outlined below:

I would include some of the panels from Figure 2—figure supplement 2 in the main figure – as the interaction strength increases, the degeneracy of the interface decreases. I think this is an interesting point.

Figure 4C, D. It looks like, if the changes in mutational effects across genetic backgrounds are significant, that there is more sign epistasis (switches from beneficial to detrimental mutational effects) for interaction with the non-cognate partner. Is this true? Any ideas why this might be?

Figure 3—figure supplement 1D. There is a (weak) sigmoidal relationship between the fitness scores predicted by the linear model and the actual fitness scores which suggests there is global (non-specific) epistasis in this system. This may be because there is a non-linear relationship between changes in free energy and the phenotype being measured (=~binding), as may be expected from thermodynamics. And/or perhaps because of an upper or lower bound on the measurement range.

https://doi.org/10.7554/eLife.60924.sa1

Author response

Reviewer #1:

Lite and colleagues describe the contributions of individual residues on protein-protein interaction specificity in the parDE3 toxin-antitoxin system. How interaction specificity arises in PPIs, particularly after gene duplication, is an important problem in evolutionary. They use bulk competition assays, coupled with deep sequencing to quantify the degree to which mutant antitoxins can detoxify the presence of the cognate interaction partner ParE3 and a homologous, orthogonal toxin ParE2. Novel findings include a quantitative exploration of the degeneracy of the cognate parDE3 toxin-antitoxin interface towards change in the targeted residues, an assessment of the relative contributions of each selected residue towards antitoxin binding and selectivity, a structural basis for the interface differences between parDE2 and parDE3 and proof that not all positive elements serve as specificity-determining residues. The data presented in this study should be of interest to biochemists, evolutionary biologists, and geneticists, so is appropriate for eLife's general audience.

Lite et al., characterize the detoxifcation-capabilities of an antitoxin library consisting of 8000 variants, in an approach similar to a previously published study from the same authors (Aarke et al., 2015). While the new library has the advantage of being combinatorically complete, it is smaller than the library from the cited study and 2/3 of the investigated residues in this work were already mutated in the preceding study (albeit with only 13 / 10 of possible amino acids) There is an interesting difference between the previous work, which only used naturally occurring states, whereas this study uses all possible mutations. I think it would be very interesting if the authors could comment on how this might relate to the different results between the studies.

The Results section of this study makes it sound as if the specificity-switching residues were first identified herein. Overall, the similarity to previously published data and experimental setup alongside a decrease in total library size compared to previous studies, that investigate the same model system, make the novelty of this work a little less obvious. Perhaps the authors could emphasize again conceptually (apart from saturation) what new approach this study adds, and why it is crucial for our understanding of specificity.

We have now substantially revised the Introduction to provide more context for our study. In particular, we now better describe a series of prior deep-mutational scanning studies and how the work at hand represents a conceptual advance over these prior studies and how the design of our experiment enables new insights into protein interaction specificity, not just binding of a cognate partner.

We also now emphasize the difference between the library constructed here and that constructed previously. The new library is (i) focused on a more appropriate set of interface residues and (ii) is combinatorially complete, which ultimately enables much more powerful and rigorous conclusions. To expand on this last point: in principle, the prior ParD3 library could be used to perform some of the same analyses performed here that investigate how individual substitutions impact interaction specificity in multiple genetic contexts. However, such analyses were not reported in the prior study of ParD3 and would be based on a much more limited number of contexts for virtually all substitutions. We have now run the same analyses presented in Figure 4C-E, but using only the subset of the library that overlaps with what we used previously in Aakre et al., 2015. A direct comparison of the modes calculated for each library clearly demonstrates that the prior library would not have produced the same results as those generated here using the combinatorially complete library. This analysis is now included in Figure 4—figure supplement 3A-D.

The assay employed in this work is not capable of discriminating good antitoxins from great antitoxins (as discussed in subsection “Specificity arises from the discrimination between cognate and non-cognate partners”). To address this, the authors measure the contributions to specificity of all single amino acid changes in all possible genetic backgrounds. Such a background-independent analysis leverages the completeness of the library (and I think was first employed here, Salinas and Ranganathan, 2018) to get an average effect for each mutation, though I am a little unsure I understand that it is appropriate in this case. The authors conclude that the naturally occurring specificity-determining residues are "optimal with respect to promoting the insulation" (Introduction). It seems to me that the authors compare the fitness of each mutation in the context of all genetic backgrounds to the wild-type state in its wild-type context and not in the context of all genetic backgrounds. It is thus to be expected that the apparent fitness of antitoxins that retain the cognate amino acid in a given position (e.g. ParD3 retains a D in position 1) is <1 if they are analyzed in a background-dependent manner (e.g. DXX). This is because many of the backgrounds will contain deleterious states. So the comparison of a background averaged fitness to the fitness of the WT state in a single background seems inappropriate to me to show that the WT sequence is optimal.

We have now directly addressed this concern by performing a similar analysis to Figure 4, but comparing fitness distributions (W, not ∆W) for each amino acid at each position to the fitness distribution for the corresponding wild-type residue. In other words, we compare AXX, CXX, EXX, etc. distributions to the distribution featuring the wild-type residue at position 1, i.e. DXX. This analysis (see revised Figure 4—figure supplement 3E-G) produced highly similar results to those shown in Figure 4, supporting the notion that the wild-type sequence is optimal.

Reviewer #2:

In this work, Li et al., perform an exceptionally comprehensive assessment of how individual mutations contribute to recognition specificity amongst paralogous complexes. The authors take a bacterial anti-toxin, ParD3, and construct a combinatorial saturation mutagenesis library of three positions that physically interface with the toxin, ParE3. They then measure the effects of these mutations on: (1) binding of the cognate partner, ParE3 and (2) binding of a non-cognate partner, the paralog ParE2. The authors also compare the interactions at these three positions in the crystal structures of the ParE3/ParD3 complex (previously published) and the ParE2/ParD2 (which they solved in this work).

The central result of the paper is that individual positions can act as both positive and negative elements for interaction specificity, and that "positive and negative contributions are neither inherently coupled nor mutually exclusive". That is, rather than using distinct sets of positions to either enhance cognate interactions or discourage non-cognate binding, specificity can be (but is not always) encoded in an overlapping set of residues. This has implications for both the engineering and evolution of protein interactions. The experiments appear to be of high technical quality, and we expect the results will be of interest to a diverse scientific audience in biophysics, structural biology, protein engineering and molecular evolution. We recommend publication in eLife.

Essential revisions:

1) In Figure 1B, the authors lay out two different models of how positions at a physical interface contribute to the formation of a specific protein-protein interaction. Their data demonstrate that both models apply within a single protein. Much of the impact of the paper seems to depend on how surprising (or not) this finding is, and what the consequences of this finding might be both for evolving and engineering complexes. Thus, it is necessary for the authors to provide more context that can help the reader clearly assess the impact of this work. Specifically, can more be said about why and when one model would be favored over another? What is the evolutionary implication for individual residues in a protein to have negative and/or positive roles in identifying cognate interactions? Why is it surprising that residues can carry out dual roles in recognition and discrimination in a binding partner?

As we note in the Introduction, it is not yet clear for virtually all protein complexes how specificity is determined and whether individual residues play positive or negative roles, or both. Thus, we did not approach this study with an a priori notion of what would constitute 'surprising' results and we did not want to write the Introduction in too leading a way. Instead, we wanted to indicate that we simply don't know the answer for most protein complexes as there have been exceedingly few studies like ours using combinatorially complete libraries that can properly address this question. As noted above in response to reviewer 1, we have now revised the Introduction to better lay out what prior studies have shown and how our combinatorially complete library approach differs, enabling new insights into protein interaction specificity. We hope these revisions now provide the context this reviewer was seeking. Additionally, based on this reviewer's comment we have also amended the Introduction to highlight the fact that systematically dissecting interaction specificity has implications both for understanding protein evolution and for protein engineering efforts.

2) The linear model (Figure 3—figure supplement 1, Figure 4—figure supplement 2) indicates that the effects of mutations at the anti-toxin/toxin interface can be considered near-independently. This suggests that there is little epistasis (or coupling) between positions. However, can the authors perform a more thorough analysis of second and third order epistasis? Do they see that epistasis is centered at zero for the average interaction between mutations across a pair of sites? Given that the authors have all of the data, a thorough analysis of epistasis would further support their linear model and show that the contributions of each position studied in ParD3 are independent from each other. This seems especially interesting given the spatial proximity of positions 61 and 64.

We have now examined epistasis in more detail, following an approach similar to that presented in Salinas and Ranganathan, (2018). We calculated the difference between the fitness of each variant predicted assuming independence and the observed fitness. We then plotted these distributions for all possible double mutants (broken down by position pair) and all possible triple mutants. As expected given the success of our linear model, the distribution of differences in each case is centered around 0 indicating that, on average, the two positions behave largely independently, i.e. there is minimal epistasis. This new analysis is presented in subsection “Library variants show a range of abilities to discriminate between cognate and non-cognate partners” and in Figure 3—figure supplement 1E-F of the revised manuscript. We also note in the text that the lack of epistasis in our library contrasts with that observed previously by Salinas and Ranganathan in PDZ-peptide interactions and by us in the context of two-component signaling proteins. However, the library constructed here involves three positions in ParD that interact with non-overlapping sets of residues in ParE (even for positions 61 and 64 noted by the reviewer), potentially explaining their independence.

3) As the authors point out, the in vivo assays rely on induced expression of toxins and antitoxins in a heterologous host. They claim that "the relative behavior of ParD3 variants measured here is likely to apply in whatever context they arise". Can the authors show that the induction conditions do not strongly affect their measurements? One way to demonstrate this is to take a small panel of ~10-20 ParD3 mutants that have a broad range of effects on growth rate across both ParE3 and ParE2 backgrounds. The growth rate effects of mutants in this "sub-library" can then be measured across varying concentrations of IPTG (for induction of ParD3) and arabinose (induction of ParE2/E3). Does the rank ordering of the mutant growth rates effects change with induction level, or are the results indeed qualitatively similar?

We have now performed this analysis. Given restrictions on lab time during the pandemic and the first author’s return to medical school (she is an MD-PhD student), we used a smaller (4 antitoxin variants) panel than suggested by the reviewer as this subset of clones already existed (Figure 1—figure supplement 1E) and did not require de novo construction/cloning. Using this panel, we plated strains harboring each antitoxin variant and either the ParE3 or ParE2 toxin on varying concentrations of IPTG (the antitoxin inducer). As is clear in the new Figure 2—figure supplement 3 the rank ordering of mutant growth rates is unaffected by induction level.

Reviewer #3:

This is an interesting study that uses a combinatorially complete deep mutagenesis strategy to identify determinants of the specificity of protein-protein interactions between bacterial toxins and antitoxins using a paralogy pair as a model system. I enjoyed the manuscript: it addresses a general and important question with a good experimental design and using an elegant model system. The clarification of negative and positive contributions to specificity is conceptually interesting.

Essential revisions:

1) Many of the analyses use an arbitrary cut-off of interaction vs no interaction (W>0.5). Whilst this simplifies communication and analyses, it is important to: (1) demonstrate that the conclusions are robust to the choice of this arbitrary cut-off; and (2) to stress that this reduction in fitness is huge and in natural populations much smaller changes in fitness are likely to be selected against, especially in microbes with large effective population sizes. How does imposing a much higher fitness cut-off alter the authors' conclusions?

There are actually relatively few analyses in the paper that rely on an arbitrary cut-off. For instance, the data in Figure 4 involve the presentation of the full distribution of fitness effects, with no thresholds or cut-offs used beyond the exclusion of proline substitutions that may trivially disrupt protein structure. We do introduce thresholds in the analyses shown in Figure 2D, Figure 2G, and Figure 3B-C in which we present sequence logos for ParD variants that are ParE3-specific, ParE2-specific, or promiscuous. However, we also included sequence logos for each category at several different thresholds (0.5, 0.7, and 0.9) in Figure 2—figure supplement 2. We included these for precisely the reason raised by the reviewer, namely to show that the conclusions drawn are robust to the choice of threshold. We have now added statements (see subsection “Pervasive degeneracy in toxin binding” the revised manuscript) drawing the reader's attention to Figure 2—figure supplement 2 and have stressed (see the Discussion section) the point that large population sizes for microbes can lead to selection on fitness differences that are difficult to detect in a laboratory setting.

2) In general, and related to point [1], I would prefer to see a more quantitative treatment of the data. Defining proteins as interacting or not interacting is a bit clunky given that binding is actually a fully quantitative trait and the data here is, at least by design, also quantitative. Many of the questions addressed could be answered quantitatively rather than by using a binary categorisation of binding vs non-binding.

Please see the immediately preceding response on the same topic.

3) What about protein abundance? The authors don't quantify the effects of the combinatorial mutations on protein abundance, so some of the non-specific effects on binding are likely to be due to changes in concentration of the protein not the binding affinity.

Each of the positions mutated in our library are solvent-exposed residues that show high variability in naturally occurring ParD homologs. Thus, we think it is unlikely that the mutations introduced affect protein folding/stability and, consequently, abundance, but we cannot formally rule this out. There are no antibodies to ParD available and epitope-tagging ParD can interfere with function. Thus, we have not been able to experimentally address this issue, but we do now explicitly discuss it in the revised manuscript (see the Discussion section).

4) Relationship to conclusions in previous publications from the same group: previous manuscripts from the same lab have focussed on the finding that mutational effects in protein interaction interfaces change in the presence of additional mutations in the same protein (Podgornaia and Laub, 2015 i.e. the importance of pairwise and higher order epistasis). How, quantitatively, does the current dataset compare to this previous dataset on a different system and also to previous toxin-anti toxin mutagenesis datasets? Are mutational effects less background dependent in the toxin-anti-toxin system or similarly so?

Please see our response to reviewer 2, point #2 on the same issue.

5) "Although a purely additive model of residue contributions was highly predictive of variant fitness (R2 = 0.89, SD between folds + 0.003; Figure 3—figure supplement 1B), the model was weakest for the most fit variants, likely due to diminishing returns for highly favorable residues." This is fully expected- mutations should have effects that are additive for free energy changes (ddG binding) but not for changes in protein concentration because of the non-linear relationship between the amount of protein bound to an interaction partner and the free energy of binding (dG).

We agree and have added a statement in subsection “Library variants show a range of abilities to discriminate between cognate and non-cognate partners” indicating that our model is consistent with mutations having strictly additive effects on binding.

6) Data processing / quality control. We tried to download the raw sequencing data from SRA using the referee token to perform some basic quality control, but it seems only possible to obtain the summary tables not the actual sequence reads (this may be an issue in general with private SRA entries). Would it be possible to get access to the raw data e.g. by making the entry fully public prior to publication for reviewing purposes?

We have noticed major issues with quite a few deep mutational scanning data analyses, including in published papers. These include inappropriate filtering such that the analysed datasets consist partly (or in some cases, largely) of sequencing errors not real variants which can seriously alter conclusions. A second issue is underestimating sampling errors due to over-sequencing. For the design of the library used here and the filtering applied, this seem unlikely to be the case. But it would still be good to run some basic checks prior to publication.

We have confirmed that the SRA data is complete and available. To avoid any further issues, we have made everything publicly available in advance of publication – please see https://www.ncbi.nlm.nih.gov/sra?term=SRP270411. If the reviewers wish to download the raw data and find any issues in their own quality control, we are certainly happy to hear about it and address it. Please note that we detected contamination in one post-selection library sample with the wild-type ParD3 gene (<0.4% of aligned reads for that sample), which was filtered out. This wild-type sequence was likely introduced during library preparation because (1) the wild-type plasmid was not detected in the pre-selection sample, (2) the wild-type plasmid was not used in the construction of the library (see Supplementary file 2), and (3) the wild-type plasmid could not have been created during library construction because it does not strictly use NNK codons at the mutated positions. The exact filtering steps we used (i.e., requiring NNK codons at mutated positions and 0 mutations at all remaining positions) are indicated in the Materials and methods section.

7] Introduction/ citation of prior work. An obvious missing citation given the similarity of questions, strategy, title and journal is: Diss et al., 2018. In addition, there have now been quite a few papers published that use a similar combinatorially complete deep mutagenesis design published on different molecules (proteins, RNAs) and molecular processes and very few of them are cited here. Also, the statement in the introduction that 'prior work has not used combinatorially complete libraries to systematically dissect interface residues' is a bit misleading given previous work from the Laub lab.

As noted above, we have now expanded and revised the Introduction quite substantially to better cite and discuss prior deep mutational scans, including the Diss and Lehner study noted by the reviewer and our own lab's work, with an emphasis on how our study builds off but differs from those prior studies.

8] It would be useful to more explicitly state in the text the (rational for the) previous ParD mutagenesis library design in the previous publication by the same lab.

We now provide additional discussion in the text (see the Results section) of the rationale for the previous ParD library, emphasizing the modifications made in this work and their advantages.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

I would include some of the panels from Figure 2—figure supplement 2 in the main figure – as the interaction strength increases, the degeneracy of the interface decreases. I think this is an interesting point.

We have now moved all of the panels from former Figure 2—figure supplement 2 into the main figure.

Figure 4C,D. It looks like, if the changes in mutational effects across genetic backgrounds are significant, that there is more sign epistasis (switches from beneficial to detrimental mutational effects) for interaction with the non-cognate partner. Is this true? Any ideas why this might be?

We agree with the reviewer's point that the distributions in Figure 4C-D can extend above and below 0, especially for the non-cognate interaction, which suggests that a mutation can have either beneficial or detrimental effects depending on the exact context. If they had been bimodal with a peak above 0 and one below 0, we think it would have been important to comment on such sign epistasis. However, because these distributions are unimodal, we think it would be overly speculative to comment in depth on the issue of sign epistasis. Furthermore, for 4D panel 1 (first library position), where this phenomenon is most common, the distributions of fitness effects are tightly centered around 0. The complete data are, of course, provided with the paper and readers will be able to dissect this aspect of our data, if they so desire.

Figure 3—figure supplement 1D. There is a (weak) sigmoidal relationship between the fitness scores predicted by the linear model and the actual fitness scores which suggests there is global (non-specific) epistasis in this system. This may be because there is a non-linear relationship between changes in free energy and the phenotype being measured (=~binding), as may be expected from thermodynamics. And/or perhaps because of an upper or lower bound on the measurement range.

We agree and had already commented on this in the Discussion section. We have now added a similar comment in the Results section when Figure 3—figure supplement 1D is first cited.

https://doi.org/10.7554/eLife.60924.sa2

Article and author information

Author details

  1. Thuy-Lan V Lite

    Department of Biology Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2743-4231
  2. Robert A Grant

    Department of Biology Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
  3. Isabel Nocedal

    Department of Biology Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4706-1113
  4. Megan L Littlehale

    Department of Biology Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Investigation, Methodology
    Competing interests
    No competing interests declared
  5. Monica S Guo

    Department of Biology Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Formal analysis
    Competing interests
    No competing interests declared
  6. Michael T Laub

    1. Department of Biology Massachusetts Institute of Technology, Cambridge, United States
    2. Howard Hughes Medical Institute Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Visualization, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    laub@mit.edu
    Competing interests
    Reviewing editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8288-7607

Funding

National Institutes of Health (T32GM007753)

  • Thuy-Lan V Lite

Howard Hughes Medical Institute

  • Michael T Laub

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank D Ding, C McClune, A Keating, and R Gaudet for comments on the manuscript. The project described was supported by award T32GM007753 from the National Institute of General Medical Sciences. M.T.L. is an Investigator of the Howard Hughes Medical Institute. This research made use of NE-CAT beamlines (P30 GM124165), a Pilatus detector (RR029205), and an Eiger detector (OD021527) at the APS (DE-AC02-06CH11357).

Senior Editor

  1. Olga Boudker, Weill Cornell Medicine, United States

Reviewing Editor

  1. Christian R Landry, Université Laval, Canada

Publication history

  1. Received: July 10, 2020
  2. Accepted: October 26, 2020
  3. Accepted Manuscript published: October 27, 2020 (version 1)
  4. Version of Record published: November 16, 2020 (version 2)

Copyright

© 2020, Lite et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,195
    Page views
  • 270
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Biochemistry and Chemical Biology
    Donghyuk Shin et al.
    Research Article Updated

    Legionella pneumophila causes a severe pneumonia known as Legionnaires’ disease. During the infection, Legionella injects more than 300 effector proteins into host cells. Among them are enzymes involved in altering the host-ubiquitination system. Here, we identified two LegionellaOTU (ovarian tumor)-like deubiquitinases (LOT-DUBs; LotB [Lpg1621/Ceg23] and LotC [Lpg2529]). The crystal structure of the LotC catalytic core (LotC14-310) was determined at 2.4 Å. Unlike the classical OTU-family, the LOT-family shows an extended helical lobe between the Cys-loop and the variable loop, which defines them as a unique class of OTU-DUBs. LotB has an additional ubiquitin-binding site (S1’), which enables the specific cleavage of Lys63-linked polyubiquitin chains. By contrast, LotC only contains the S1 site and cleaves different species of ubiquitin chains. MS analysis of LotB and LotC identified different categories of host-interacting proteins and substrates. Together, our results provide new structural insights into bacterial OTU-DUBs and indicate distinct roles in host–pathogen interactions.

    1. Biochemistry and Chemical Biology
    2. Microbiology and Infectious Disease
    Qi Yang et al.
    Research Article Updated

    The Spike protein of SARS-CoV-2, its receptor-binding domain (RBD), and its primary receptor ACE2 are extensively glycosylated. The impact of this post-translational modification on viral entry is yet unestablished. We expressed different glycoforms of the Spike-protein and ACE2 in CRISPR-Cas9 glycoengineered cells, and developed corresponding SARS-CoV-2 pseudovirus. We observed that N- and O-glycans had only minor contribution to Spike-ACE2 binding. However, these carbohydrates played a major role in regulating viral entry. Blocking N-glycan biosynthesis at the oligomannose stage using both genetic approaches and the small molecule kifunensine dramatically reduced viral entry into ACE2 expressing HEK293T cells. Blocking O-glycan elaboration also partially blocked viral entry. Mechanistic studies suggest multiple roles for glycans during viral entry. Among them, inhibition of N-glycan biosynthesis enhanced Spike-protein proteolysis. This could reduce RBD presentation on virus, lowering binding to host ACE2 and decreasing viral entry. Overall, chemical inhibitors of glycosylation may be evaluated for COVID-19.