1. Biochemistry and Chemical Biology
  2. Computational and Systems Biology
Download icon

Coevolution-based inference of amino acid interactions underlying protein function

  1. Victor H Salinas
  2. Rama Ranganathan  Is a corresponding author
  1. UT Southwestern Medical Center, United States
  2. The University of Chicago, United States
Research Article
  • Cited 26
  • Views 5,872
  • Annotations
Cite this article as: eLife 2018;7:e34300 doi: 10.7554/eLife.34300

Abstract

Protein function arises from a poorly understood pattern of energetic interactions between amino acid residues. Sequence-based strategies for deducing this pattern have been proposed, but lack of benchmark data has limited experimental verification. Here, we extend deep-mutation technologies to enable measurement of many thousands of pairwise amino acid couplings in several homologs of a protein family – a deep coupling scan (DCS). The data show that cooperative interactions between residues are loaded in a sparse, evolutionarily conserved, spatially contiguous network of amino acids. The pattern of amino acid coupling is quantitatively captured in the coevolution of amino acid positions, especially as indicated by the statistical coupling analysis (SCA), providing experimental confirmation of the key tenets of this method. This work exposes the collective nature of physical constraints on protein function and clarifies its link with sequence analysis, enabling a general practical approach for understanding the structural basis for protein function.

https://doi.org/10.7554/eLife.34300.001

Introduction

The basic biological properties of proteins -- structure, function, and evolvability -- arise from the pattern of energetic interactions between amino acid residues (Anfinsen, 1973; Gregoret and Sauer, 1993; Luque et al., 2002; Starr and Thornton, 2016; Weinreich et al., 2006). This pattern represents the foundation for defining how proteins work, for engineering new activities, and for understanding their origin through the process of evolution. However, the problem of deducing this pattern is extraordinarily difficult. Amino acids act heterogeneously and cooperatively in contributing to protein fitness, properties that are not simple, intuitive functions of the positions of atoms in atomic structures (Alber et al., 1987). Indeed, the marginal stability of proteins and the subtlety of the fundamental forces make it so that many degenerate patterns of energetic interactions could be consistent with observed protein structures. The lack of knowledge of this pattern has precluded effective mechanistic models for the relationship between protein structure and function.

In principle, an experimental approach for deducing the pattern of interactions between amino acid residues is the thermodynamic double mutant cycle (Carter et al., 1984; Hidalgo and MacKinnon, 1995; Horovitz and Fersht, 1990) (TDMC, Figure 1A). In this method, the energetic coupling between two residues in a protein is probed by studying the effect of mutations at those positions, both singly and in combination. The idea is that if mutations x and y at positions i and j, respectively, act independently, the effect of the double mutation (ΔGijxy) must be the sum of the effects of each single mutant (ΔGix+ΔGjy). Thus, one can compute a coupling free energy between the two mutations (ΔΔGijxy) as:

(1) ΔΔGijxy=(ΔGix+ΔGjy)ΔGijxy,
Figure 1 with 2 supplements see all
A deep coupling scan (DCS) for the PDZ binding pocket.

(A), The thermodynamic double mutant cycle (TDMC), a formalism for studying the energetic coupling of pairs of mutations in a protein. Given two mutations (x at postion i and y at position j), the coupling free energy between them is defined as the extent to which the effect of the double mutation (ΔGijxy) is different from the summed effect of the mutations taken individually (ΔGix+ΔGjy), a measure of the interaction (or epistasis) between the two mutations (see Equation 1, main text). (B), Structural overlay of the five PDZ homologs used in this study (PSD95pdz3 (1BE9, white), PSD95pdz2 (1QLC, orange), ZO1pdz (2RRM, yellow), Shank3pdz (5IZU, gray), and Syntrophinpdz (1Z86, blue)), emphasizing the conserved αβ-fold architecture of these sequence-diverse proteins (33% average identity, Table 1).Structural elements discussed in this work are indicated. (C), The nine-amino acid α2-helix, which forms one wall of the ligand-binding site. (D–E), The distribution of experimentally determined binding free energies, ΔGbind, for all single mutations (D, 855/855) and nearly all double mutations (E, 56,694/64,980) in the α2-helix for the 5 PDZ homologs, with the affinity of wild-type PSD95pdz3 indicated (wt). The red lines indicate the independently validated range of the assay (Figure 1—figure supplement 1); essentially all measurements fall within this range. These data comprise the basis for a deep analysis of conserved thermodynamic coupling in the PDZ family.

https://doi.org/10.7554/eLife.34300.002

the difference between the effect predicted by the independent effects of the underlying single mutations and that of the actual double mutant. ΔΔGijxy is typically proposed as an estimate for the degree of cooperativity between positions i and j.

However, there are serious conceptual and technical issues with the usage of the TDMC formalism for deducing the energetic architecture of proteins. First, ΔΔGijxy is not the coupling between the amino acids present in the wild-type protein (the 'native interaction'). It is instead the energetic coupling due to mutation, a value that depends in complex and unknown ways on the specific choice of mutations made (Faiman and Horovitz, 1996). Second, global application of the TDMC method requires a scale of work matched to the combinatorial complexity of all potential interactions between amino acid positions under study. For even a small protein interaction module such as the PDZ domain (~100 residues, Figure 1B) (Hung and Sheng, 2002), a complete pairwise analysis comprising all possible amino acid substitutions at each position involves making and quantitatively measuring the equilibrium energetic effect of nearly two million mutations. Finally, even if these two technical issues were resolved, it is unclear how to go beyond the idiosyncrasies of one particular model system to the general, system-independent constraints that underlie protein structure, function, and evolvability.

Recent technical advances in massive-scale mutagenesis of proteins open up new strategies to address all these issues. In the PDZ domain, a bacterial two-hybrid (BTH) assay for ligand-binding coupled to next-generation sequencing enables high-throughput, robust, quantitative measurement of many thousands of mutations in a single experiment – a 'deep mutational scan' (Fowler and Fields, 2014; McLaughlin et al., 2012; Raman et al., 2016). Parameters of the BTH assay are tuned such that the binding free energy between each PDZ variant x and cognate ligand (ΔGbindx) is quantitatively reported by its enrichment relative to wild-type before and after selection (ΔEx, Figure 1—figure supplement 1 and Materials and methods). This relationship enables extension of single mutational scanning to very large-scale double mutant cycle analyses – a 'deep coupling scan' (DCS) study (Olson et al., 2014). Indeed, the throughput of DCS is so high that it enables the study of double mutant cycles in several homologs of a protein family in a single experiment. Thus, DCS provides a first opportunity to deeply map the pattern and evolutionary conservation of interactions between amino acid residues in proteins, a strategy to reveal the fundamental constraints contributing to protein function.

Here, we apply DCS to several homologs of the PDZ domain family. The data show how to estimate native couplings from mutagenesis, and demonstrate the existence of an evolutionarily conserved network of cooperative amino acid interactions associated with ligand binding. We then use these data as a benchmark to test the predictive power of sequence-based coevolution methods, which if verified, would represent a general and scalable approach for defining the amino acid constraints underlying protein structure and function. We show that with different formulations, coevolution can indeed provide effective estimates of both structural contacts and cooperative functional interactions between residues. This work establishes a path towards a unified practical approach for understanding the design of natural proteins.

Results

A deep coupling scan in the PDZ family

To develop basic principles for high-throughput analysis of amino acid couplings, we focused on a region of the binding pocket of the PDZ domain, a protein-interaction module that has served as a powerful model system for studying protein energetics (Lockless and Ranganathan, 1999; McLaughlin et al., 2012). PDZ domains are mixed αβ folds that typically recognize C-terminal peptide ligands in a binding groove formed between the α2 and β2 structural elements (Figure 1B). We created a library of all possible single and double mutations in the nine-residue α2 helix of five sequence-diverged PDZ homologs (PSD95pdz3, PSD95pdz2, Shank3PDZ, SyntrophinPDZ, and Zo-1PDZ, Figure 1C) (36 position pairs × 5 homologs, with 171 single + 12,996 double mutations + wild-type per homolog = 65,840 total variants) and measured the effect of every variant on binding its cognate ligand (Figure 1D–E and Table 1). Independent trials of this experiment show excellent reproducibility (Figure 1—figure supplement 2), and propagation of errors suggests an average experimental error in determining binding free energies of ~0.3 kcal/mol. Filtering for sequencing quality and counting statistics, we were able to practically collect 56,694 double mutant cycles (87% of total) for the α2 helix for all five homologs, with an average of 315 cycles per position pair per homolog (Table 1). Thus, we can (1) analyze the distributions of double mutant cycle coupling energies for nearly all pairs of mutations in the α2 helix and (2) study the divergence and conservation of these couplings over the five homologs.

Table 1
Summary of data collection.

For each PDZ homolog, we indicate the target ligand, the wild-type affinity, the top-hit sequence identity within the ensemble of homologs, and the assay/sequencing statistics.

https://doi.org/10.7554/eLife.34300.005
PDZ homologLigandLigand sequenceAffinityTop ID% (PDZ)No. of single
mutants
(out of 171)
No. of double mutants
(out of 12,996)
Mean cycles/position pair
(out of 361)
Sequence readsunsel/sel
PSD95pdz3CRIPTTKNYKQTSV0.8 μM
(McLaughlin et al., 2012)
41.0 (Syntrophinpdz)17111,53132012,190,079/14,358,962
PSD95pdz2NMDAR2AKMPSIESDV3.6 μM
(Stiffler et al., 2006)
40.5 (PSD95pdz3)17112,0723359,402,209/14,965,473
Shank3pdzDlgap1/2/3YIPEAQTRL0.2 μM
(Stiffler et al., 2006)
25.0 (PSD95pdz2)17110,45429017,232,329/6,429,999
SyntrophinpdzScn5a (Nav1.5)PDRDRESIV1.6 μM
(Stiffler et al., 2006)
41.0 (PSD95pdz3)17110,7572988,227,200/15,248,680
ZO-1pdzClaudin8SIYSKSQYV4.6 μM
(Zhang et al., 2006)
37.5 (PSD95pdz3)171118803305,365,041/11,523,044

We first addressed the problem of how to estimate native coupling energies from mutant cycle data. In general, the effect of a mutation at any site in a protein is a complex perturbation of the elementary forces acting between atoms, with a net effect that depends on the residue eliminated, the residue introduced, and on any associated propagated structural effects. Thus, the distribution of thermodynamic couplings at any pair of positions over many mutation pairs could in principle be arbitrary and difficult to interpret. However, we find surprising simplicity in the histograms of coupling energies. In general, the data follow single or double-Gaussian distributions (Figures 2 and 3, Figure 2—figure supplements 15, and see Materials and methods), with most distributions centered close to zero and with just a few position pairs displaying two distinct populations. In general, every mutation is associated with the full range of coupling energies, and the distributions of couplings are not immediately obvious from known chemical properties of amino acids or secondary structure propensities (Figure 3—figure supplement 1). For example, mutations to glycine and proline might be expected to disrupt the α2 helix, and cause global large couplings with every other mutation, but in fact we find that these substitutions show a broad range of coupling energies not unlike other mutations. The data suggest that as an ensemble, mutations act as random perturbations to the native state of proteins, with the population-weighted mean of the distribution of coupling energies for each position pair (Figures 23, dashed lines) providing the best empirical estimate of the native interaction between amino acids through mutagenesis.

Figure 2 with 5 supplements see all
Distributions of pairwise thermodynamic couplings in a single PDZ homolog (PSD95pdz3).

Each subplot shows the distribution of coupling free energies (ΔΔG, see Equation 1, main text) for all measured mutants at one pair of positions in the α2-helix (numbering per Figure 1C) in PSD95pdz3. The distributions are fit to single or double Gaussians, using the Bayes Information Criterion to justify choice of model, and the position of zero coupling is indicated by the solid line and circle above. Population-weighted mean values are represented by dashed lines. The data are remarkably well defined by the fitted models. Most position pairs have distributions centered close to zero, with only eight pairs comprising all pairwise couplings between positions 1, 4, 5, and 8, and 1-2, 1-9 showing deviations. For these pairs, distributions of mutational coupling follow either a single mode (1-2, 1-5) or two modes with one centered at zero (1-4, 1-8, 1-9, 4-5, 4-8, 5-8); population-weighted mean values for these pairs are indicated in red. Figure 2—figure supplement 14 show similar data for each of the other homologs taken individually.

https://doi.org/10.7554/eLife.34300.006
Figure 3 with 1 supplement see all
Homolog-averaged pairwise thermodynamic couplings in the PDZ domain.

Each subplot shows the distribution of coupling free energies (ΔΔG, see Equation 1, main text) for all measured mutants at one pair of positions in the α2-helix, but here averaged over the five homologs. As in Figure 2, the distributions are fit to single or double Gaussians, using the Bayes Information Criterion to justify choice of model. The position of zero coupling is indicated by the solid line and circle above and population-weighted mean values are represented by dashed lines. Averaging over homologs reveals the conserved pattern of couplings; now, only six pairs comprising all pairwise couplings between positions 1, 4, 5, and 8 show deviations from zero. For these pairs, distributions of mutational coupling follow either a single mode (1-4, 1-5) or two modes with one centered at zero (1-8, 4-5, 4-8, 5-8); population-weighted mean values for these pairs are indicated in red.

https://doi.org/10.7554/eLife.34300.012

Two technical points are worth noting. First, the spread of the distributions is large, generally exceeding the estimated magnitude of the native interactions (Figures 23). This means (1) that traditional mutant cycle studies carried out with specific choices of mutations are more likely to just reflect the choice of mutations rather than the native interaction, and (2) that the only way to obtain good estimates of the native interaction between residues is to average over the effect of many double mutant cycles per position pair. The lack of such averaging could lead to considerable variation in the interpretation of mutant cycle data (Chi et al., 2008; Faiman and Horovitz, 1996; Lockless and Ranganathan, 1999). Second, we find that the BTH/sequencing approach displays such good reproducibility that it is possible to detect coupling energies with an accuracy that is on par with the best biochemical assays. For example, the average standard deviation in mean coupling energies for position pairs over four independent experimental replicates in PSD95pdz3 is ~0.06 kcal/mol. Thus, we can map native amino acid interactions with high-throughput without sacrificing quality.

A model for distributions of thermodynamic mutant cycle couplings

The uni/bi-modal character of distributions of thermodynamic mutant cycle couplings is striking in two respects. First is the generality. The same distribution shapes are found in all the individual PDZ homologs tested (Figure 2 and Figure 2—figure supplements 14), the average over homologs (Figure 3), and even for DCS in an unrelated protein (GB1, Figure 2—figure supplement 5 [Olson et al., 2014]). Second, the distribution shapes seem to be defined more by position, rather than by the character of mutations. For example, with a few exceptions, the same position-pairs in every PDZ homolog display mean coupling energies close to zero and the same few position pairs display bimodal or non-zero means (compare Figure 2 and Figure 2—figure supplements 14, and see Figure 3). The sparse, position-specific character of bimodal distributions is also in the GB1 protein (Figure 2—figure supplement 5). These results imply a mechanism for the distributions of thermodynamic couplings in proteins that goes beyond local biophysical characteristics of the PDZ α2 helix or the average chemical properties of amino acids.

A simple mechanistic model for mutant cycle distributions is that the observed free energy of ligand binding arises from a cooperative internal equilibrium between two distinct conformational states of a protein (labeled 0 and 1, Figure 4A), with just a few sites defining this equilibrium. The basic idea is that any chemical reaction Kx (here, binding) that is coupled to such an internal configurational equilibrium Kc by a constant α will show an apparent equilibrium constant Kxapp that is a distinct function of each of these three parameters. Specifically, Kxapp depends linearly on Kx (Figure 4B), displays a saturating relationship with non-trivial values of α (that is, for α >> 1) (Figure 4D), and depending on the degree of internal cooperativity, can show a sigmoidal or even ultrasensitive response to changes in Kc (Figure 4C). The key to the bimodality lies in the nonlinearity of the relation between Kxapp and the internal equilibrium Kc. With the wild-type value of Kc set near to the non-linear region (that is, Kc~1) and even without any intrinsic coupling in Kx and α, it is straightforward to see that mutations perturbing only Kx and α will generate distributions of thermodynamic couplings centered at zero (Figure 4E), but perturbations in Kc can evoke bimodal distributions with one mode centered at zero (Figure 4F) or a single distribution centered at a non-zero value (Figure 4G). With slight variations in the wild-type value of Kc between homologs, this model can account for all the observed distributions of pairwise thermodynamic coupling reported here. In addition, the sparse, position-specific character of bimodal or non-zero couplings arises from the constraint that only a few cooperative positions in the protein control the internal conformational equilibrium (Kc).

A basic model for observed distributions of double mutant cycle coupling energies.

(A), A schematic representation of two coupled equilibria in a protein molecule – a reaction with equilibrium constant Kx corresponding to function (here, binding), an internal two-state conformational equilibrium defined by Kc, and a coupling parameter α linking the two. The equation at right shows the general analytic solution for how the apparent equilibrium constant Kxapp depends on these three parameters, and panels (B–D) show graphs of these relationships over a relevant range of values. Note that Kx (and αKx) are defined as dissociation constants, and Kc0F/1F and αKc0B/1B. (B–D), Kxapp shows a linear dependence on Kx, a saturating relationship with α, and a sigmoidal relationship with Kc. For a range of Kc, Kxapp ranges between Kx and αKx, the two extreme limits set by the reaction diagram in panel (A). (E–G), distributions of coupling energies for simulations in which we choose a set of 'wild-type' values of Kx, Kc, and α (red dots, panels B–D) and consider mutations that cause random Gaussian perturbations of Kx and α, but either small or large perturbations of Kc (indicated in panel C). If all mutations cause small effects in Kc, we obtain unimodal distributions centered at zero coupling energy (E), and if all mutations cause large effects in Kc, we obtain unimodal distributions centered at a non-zero coupling energy (G). However, if mutations cause a mix of small and large effects on Kc, we obtain bimodal distributions with one mode centered at zero (F). These three types recapitulate all the observed distributions for all PDZ homologs (main Figure 2 and Figure 1—figure supplement 14), for the GB1 protein (Figure 1—figure supplement 5), and for the average over homologs (Figure 3). Note that higher order cooperativity between amino acids specifying Kc (a plausible scenario), would further steepen the relationship shown in panel (C) and would cause the all-or-nothing character of mutations with regard to Kc with even less distinction between large and small perturbations. This model is not intended as a proof of mechanism for the observed distributions, but instead provides a logical scheme that explains the observations in light of known two-state allosteric equilibria is some PDZ domains (Mishra et al., 2007; Raman et al., 2016).

https://doi.org/10.7554/eLife.34300.014

We note that the model is intended at this stage as a hypothesis rather than proof of mechanism. Nevertheless, we note that a cooperative two-state internal equilibrium involving the α2 helix has been experimentally observed in a PDZ domain, and is part of an allosteric regulatory mechanism controlling ligand binding (Mishra et al., 2007). Specifically, in the Drosophila InaD protein, redox-dependent regulation of Kc in one PDZ domain switches the conformation of the ligand binding pocket and controls the dynamics of visual signaling (Helms, 2011; Mishra et al., 2007). The findings here of bimodality in mutational couplings in diverse PDZ homologs and in the GB1 protein suggests that a two-state internal equilibrium may be a common feature of many proteins. If so, the residues defining Kc may represent the mechanistic basis for classic thermodynamic concept of allosteric regulation in proteins through modulation of two-state conformational equilibria (Cui and Karplus, 2008; Monod et al., 1965; Volkman et al., 2001).

Idiosyncrasy and conservation in functional couplings in the PDZ domain

What do the data tell us about the overall pattern of amino acid interactions? Figure 5A–E show heat maps of the estimated native coupling energies between all pairs of amino acids within the α2 helix for each PDZ homolog. The data demonstrate both idiosyncrasy and conservation of amino acid couplings in paralogs of a protein family. For example, helix positions 3–4 show moderate couplings in two of the domains (PSD95pdz3 and SyntrophinPDZ, Figure 5A and D) but not in the other homologs. Similarly, coupling between positions 7–8 is shared by PSD95pdz3, PSD95pdz2, and Zo1PDZ (Figures 5A, B and E) but not in the other two homologs. In contrast, all pairwise interactions between positions 1, 4, 5, and 8 show a systematic pattern of energetic coupling in all homologs tested. Thus, each PDZ domain displays variations in the pattern and strength of amino acid energetic couplings, but also includes a set of evolutionarily conserved couplings at a few positions. We take the conserved couplings to represent the most fundamental constraints underlying PDZ function, with the homolog-specific couplings indicating more specialized or even serendipitous couplings.

Conservation and idiosyncrasy in the pattern of energetic couplings over PDZ homologs.

(A–E), Matrices of mutation averaged pairwise thermodynamic couplings for the α2-helix in each PDZ homolog. The color scale is chosen to represent the full range of measured energetic couplings. The data show that some couplings are specific to individual homologs or shared by a subset of homologs, but that couplings between positions 1, 4, 5, and 8 are conserved over homologs. (F), the pattern of direct tertiary contacts between amino acid positions in the PDZ α2 helix. By convention (Morcos et al., 2011), trivial contacts between residues with sequence distance less than three are not shown. (G), The homolog and mutation averaged couplings (corresponding to Figure 3), displaying the conserved interactions between amino acids in the PDZ α2-helix. (H), Two views of the α2-helix, with the four interacting positions in the homolog-averaged dataset shown in transparent surface representation, and ligand in yellow stick bonds. These include three positions in direct contact with ligand (1, 5, 8) and one allosteric position buried in the core of the protein (4).

https://doi.org/10.7554/eLife.34300.015

To isolate the fundamental couplings, we averaged all the double mutant cycle data over all mutations and over the five PDZ homologs tested (Figure 3), resulting in a matrix of evolutionarily conserved pairwise thermodynamic couplings (Figure 5G). This analysis reinforces the result that positions 1, 4, 5, and 8 comprise a cooperative network of functional residues in the PDZ domain family, and the remainder, even if in direct contact with each other or with ligand, contribute less and interact idiosyncratically or not at all. The conserved couplings form a chain of physically contiguous residues in the tertiary structure that both contact (1, 5, 8) and do not contact (4) the ligand (Figure 5H). Interestingly, position 4 is part of the distributed allosteric regulatory mechanism in the InaD PDZ domain discussed above (Mishra et al., 2007), providing a biological role for its energetic connectivity with binding pocket residues. Overall, the pattern of couplings does not just recapitulate all tertiary contacts between residues (compare Figure 5F with Figure 5G) or the pattern of internal backbone hydrogen bonds that define this secondary structure element. Instead, conserved amino acid interactions in the PDZ α2 helix are organized into a spatially inhomogeneous, cooperative network that underlies ligand binding and allosteric coupling.

The salient point that emerges from these data is that the pattern of direct contacts that define the protein structure and the pattern of cooperative amino acid interactions that define protein function are not the same. Both coexist and are relevant, but represent distinct aspects of the energetic architecture of proteins.

Coevolution-based inference of functional couplings

This result begins to expose the complex energetic couplings underlying protein function, but also highlights the massive scale of experiments required to deduce this information for even a few amino acid positions. How then can we practically generalize this analysis to deduce all amino acid interactions in a protein, and for many different proteins? There are potential strategies for pushing deep mutational coupling to larger scale, but quantitative assays such as the BTH are difficult to develop, mutation libraries grow exponentially with protein size, and the averaging over homologs will always be laborious, expensive, and incomplete. In addition, the cooperative action of amino acids could contribute both positive (McLaughlin et al., 2012) and negative design (Noivirt-Brik et al., 2009) features in proteins, and it is often not easy to create high-throughput assays for measuring all aspects of proteins that make up function.

A different approach is suggested by understanding the rules learned in this experimental study for discovering relevant energetic interactions within proteins. The bottom line is the need to apply two kinds of averaging. Averaging over many mutations provides an estimate of native interaction energies between positions, and averaging the mutational effects over an ensemble of homologs separates the idiosyncrasies of individual proteins from that which is conserved in the protein family. Interestingly, these same rules also comprise the philosophical basis for a class of methods for estimating amino acid couplings through statistical analysis of protein sequences. The central premise is that the relevant energetic coupling of two residues in a protein should be reflected in the correlated evolution (coevolution) of those positions in sequences comprising a protein family (Göbel et al., 1994; Lockless and Ranganathan, 1999; Neher, 1994; Weigt et al., 2009). Statistical coevolution also represents a kind of combined averaging over mutations and homologs, and if experimentally verified, would (unlike deep mutational studies) represent a scalable and general approach for learning the architecture of amino acid interactions underlying function in a protein. The data collected here provides the first benchmark data to deeply test the predictive power of coevolution-based methods.

One approach for coevolution is the statistical coupling analysis (SCA), a method based on measuring the conservation-weighted correlation of positions in a multiple sequence alignment, with the idea that these represent the relevant couplings (Halabi et al., 2009; Lockless and Ranganathan, 1999). In the PDZ domain family (~1600 sequences, pySCA6.0 [Rivoire et al., 2016]), SCA reveals a sparse internal organization in which most positions evolve in a nearly independent manner and a few (~20%) are engaged in a pattern of mutual coevolution (Halabi et al., 2009; Lockless and Ranganathan, 1999; Rivoire et al., 2016). In this case, the coevolving positions are simply defined by the top eigenmode (or principal component) of the SCA coevolution matrix. Extracting the corresponding coevolution pattern for just the α2 helix (Figure 6), we find that coevolution as defined by SCA strongly predicts the homolog-averaged experimental couplings collected here in a manner robust to both alignment size and method of construction (r2=0.82-0.77, p=10-14-10-12 by F-test, indicating the significance of the correlation coefficient, Figures 6B–D and Figure 6—figure supplement 1). The predictions also hold for individual homologs (Figure 6—figure supplement 2), consistent with the premise that the essential physical constraints underlying function are deeply conserved. Importantly, the goodness of prediction depends strongly on both of the basic tenets that underlie the SCA method – conservation-weighting (Figure 6E–F) and correlation (Figure 6G–H) (Rivoire et al., 2016).

Figure 6 with 2 supplements see all
Coevolution-based inference of energetic couplings - SCA.

(A), Coevolution of sequence positions corresponding to the top eigenmode of the SCA matrix, derived from an alignment of 1656 eukaryotic PDZ domains (the 'Poole' alignment). The data show that a subset of positions coevolve within the PDZ α2-helix. (B–D), The relationship between experimental homology-averaged energetic couplings (ΔΔG) and SCA-based coevolution computed for three different alignments that differ in size and method of construction. The p-values give the significance of the coefficient of determination (r2) by the F-test. (E–H), The basic calculation in SCA is to compute a conservation-weighted correlation matrix C~ijab=ϕiaϕjbfijab-fiafjb, where fIa and fijab represent the frequency and joint frequencies of amino acids a and b at positions i and j, respectively, in a multiple sequence alignment. The term fijab-fiafjb gives the correlation of amino acids at each pair of positions, and ϕ represents a weighting function for each amino acid at each position that is related to its conservation (Halabi et al., 2009; Rivoire et al., 2016). We compared the relationship of the experimental energetic couplings (ΔΔG) with measures of coevolution that leave out the conservation weights (E–F), or that leave out the correlations (G–H). The analysis shows that both terms contribute to predicting native energetic couplings between amino acids.

https://doi.org/10.7554/eLife.34300.016

A basic result of the SCA method is that groups of coevolving positions form physically connected networks of amino acids (termed protein ‘sectors’) that link the main functional site to distantly positioned allosteric sites (Halabi et al., 2009; Lockless and Ranganathan, 1999; Süel et al., 2003). Indeed, in the PDZ domain, the protein sector represents a chain of amino acids the links the β2-β3 loop with the α1-β4 surface through the binding pocket and the buried α1 helix (spheres and surface, Figure 7). The α1-β4 surface is a known binding site for allosteric modifiers (Peterson et al., 2004), and the β2-β3 loop contains positions where mutations can enable adaptation to new ligand specificities (Raman et al., 2016). The four positions experimentally identified here as a cooperative unit (1, 4, 5, 8, red spheres, Figure 7) represent the portion of the α2 helix that is contained in the protein sector. Thus, these data argue that the sector correctly identifies the amino acids engaged in cooperative interactions, but more importantly implies that these positions are just a part of a more global cooperative unit within the PDZ domain that mediates allosteric communication.

Homolog-averaged thermodynamic couplings and protein sectors.

Analysis of the top eigenmodes of the SCA coevolution matrix exposes groups of coevolving amino acids that empirically are found to form physically contiguous networks in the tertiary structure, often connecting the main functional site to remote allosteric sites (Halabi et al., 2009; Lockless and Ranganathan, 1999; Rivoire et al., 2016; Süel et al., 2003). In the PDZ family, the protein sector (shown as CPK spheres and transparent surface on three rotations of a representative structure, PDB 1BE9) connects the ligand binding pocket to two known allosteric sites, one in the α1-β4 loop (Peterson et al., 2004) and the other in the β2-β3 loop (Raman et al., 2016); the ligand is shown in yellow stick bonds. The homolog-averaged thermodynamic couplings in the α2 helix (positions 1, 4, 5, and 8) precisely correspond to the portion of the PDZ sector contributed by the secondary structure element. The selective cooperative action of these residues is consistent with the idea that the sector represents a global collective mode in the PDZ structure associated with function, embedded within a more independent environment.

https://doi.org/10.7554/eLife.34300.019

Another approach for amino acid coevolution is direct contact analysis (DCA, [Marks et al., 2011; Morcos et al., 2011]), a method developed for the prediction of tertiary contacts in protein structures. DCA uses classical methods in statistical physics to deduce a matrix of minimal pairwise couplings between positions (Jij, Figure 8A) that can account for the observed correlations between amino acids in a protein alignment, with the hypothesis that the strong couplings in Jij will be direct contacts in the tertiary structure. Indeed, studies convincingly demonstrate that the top L/2 (where L is the length of the protein) couplings are highly enriched in direct structural contacts (Anishchenko et al., 2017). Consistent with this, this method successfully identifies direct contacts in the PDZ α2 helix (Figure 8A, compare heat map to white and black circles) to an extent that agrees with the reported work. However, DCA model makes predictions of functional energetic couplings between mutations (Figure 8B) that are weakly or not at all related to the homolog-averaged experimental data (r2=0.33-0.05, p=10-3-0.09 by F-test, Figure 8C–E). Interestingly, the best predictive power comes from one moderately-sized structure-based sequence alignment (Figure 8C) rather than from the largest publicly available alignments (Figure 8D–E). These results are similar or poorer for prediction of couplings in individual domains (Figure 8—figure supplement 2). Due to inclusion of many unconserved correlations, DCA is quite sensitive to alignment size, with random sub-samplings of the best performing alignment producing models with variable quality in terms of predicting the data (Figure 8—figure supplement 1).

Figure 8 with 2 supplements see all
Coevolution-based inference of energetic couplings - DCA.

(A), The matrix of direct couplings (Jij) from the DCA method, with tertiary contacts in the PDZ structure (1BE9) indicated by white or black circles. By convention (Morcos et al., 2011), trivial contacts between residues with sequence distance less than three are not shown. The data show that all top direct couplings identified by DCA are indeed tertiary structural contacts. (B), The DCA method involves the inference of a statistical energy function E(σ) that for each sequence σ, is parameterized by a set of intrinsic constraints on amino acids (hi) and pairwise interactions between amino acids (Jij). These parameters are optimized to reproduce the observed alignment frequencies and pairwise correlations. Using the model, the matrix shows mutation- and homolog-averaged energetic couplings, computed precisely as for the experimental data; see Materials and methods for details. (C–E), The relationship between experimental (ΔΔG) and DCA-inferred (ΔΔE) couplings in the PDZ α2-helix, for three PDZ alignments that differ in size and method of construction. The p-values give the significance of the coefficient of determination (r2) by the F-test. (F), The relationship between experimental and DCA-inferred couplings from Jij in which top couplings defining contacts are preserved and all non-contact couplings are randomly scrambled. The DCA model used for this analysis is from the Poole alignment, as in panel D. The data show that pairwise couplings in the DCA model between non-contacting positions contribute significantly to prediction of protein function.

https://doi.org/10.7554/eLife.34300.020

The top couplings in the Jij matrix identify local structural contacts between amino acids, but do these direct couplings also underlie the partial ability of DCA to account for functional couplings? To test this, we chose the best-case alignment (the 'Poole' alignment, Figure 8C), and made an edited DCA model in which only the top L/2 pairwise couplings in Jij that define tertiary contacts are retained and the remaining weaker non-contacting couplings are randomly scrambled. While the full model shows moderate association with experimental data (r2=0.33, p=10-3 by F-test, Figure 8C), the edited model shows predictions that are now unrelated to the experimental data (r2=0.02, p=0.21 by F-test, Figure 8F). Thus, the many non-contact pairwise couplings in the DCA model, which represent noise from the point of view of structure prediction, contribute significantly to prediction of function. A similar result has been noted in the DCA-based prediction of protein-protein interactions, where the quality of prediction depends on many weak couplings between residues not making contacts at the interface (A.F. Bitbol and N. Wingreen, personal communication).

A recent study has shown that for the strong couplings in the DCA model (the top L/2 terms in Jij, Figure 8B), cases of apparently non-contacting residues are often resolved as true contacts by one of three explanations: (a) they are contacts induced by oligomerization, (b) they are contacts in other conformational states or homologous structures, or (c) they are artifacts due to misalignment of repeat regions (Anishchenko et al., 2017). With the presumption that all the weaker remaining terms in Jij are irrelevant, this result has been interpreted to mean that all the evolutionary constraint in protein structures is in direct physical contacts, with allosteric mechanisms not contributing to coevolution (Anishchenko et al., 2017). In one sense, the data shown here are fully consistent with the results of this previous study; the top terms in Jij are indeed enriched in contacts (Figure 8A), and in fact do not correspond to the experimental energetic couplings (Figure 8F), including long-range ones such as 1-8 (Figure 5G). But, the finding that the weak, non-contacting couplings in Jij contribute to predicting the experimental data argue that the origin of evolutionary constraints is not strictly in direct physical interactions of amino acids. Furthermore, the finding that the SCA coevolution matrix provides an improved prediction of experimental couplings (Figure 8B–D), including long-range ones (e.g. 1-8, Figure 6A) argues that information about allosteric energetic interactions are contained in the statistics of alignments and are therefore part of the total evolutionary constraint.

Taken together, these findings provide a set of important clues for now extending the statistical physics approach to produce models for proteins that accurately predict interactions that define both local structural contacts and the global, collective actions of residues that underlie function.

Discussion

Defining the pattern of cooperative interactions between amino acids is essential for understanding the evolutionary design of protein structure and function. Here, we use very high-throughput next-generation sequencing based mutagenesis to experimentally probe the pattern of functional interactions between residues. We show that averaging thermodynamic couplings over many pairs of mutations provides an estimate of the native interactions between amino acids, and exposes an architecture in which most pairs of amino acids are uncoupled and a few significantly interact to make a cooperative network underlying function. Further averaging over homologs refines the pattern of cooperativity, revealing an evolutionarily conserved network of cooperative amino acid interactions that includes both direct and allosteric influences on ligand binding. This pattern is distinct from the pattern of local contacts between residues that defines secondary structure elements and the tertiary structure, indicating that a full understanding of proteins requires inference of both direct local structural contacts and the network of cooperative interactions that underlies function.

While the DCS method represents a productive extension of ‘deep mutagenesis’ methods to probe second-order cooperative interactions between amino acids, the combinatorial complexity of cooperative amino acid interactions is so vast that no experiment can exhaustively probe the global pattern of amino acid interactions within proteins. In this regard, we suggest that DCS serves mainly to provide a critical benchmark to explore other strategies that have the generality and scalability to learn the global pattern. Such a strategy is statistical coevolution, the concept that the relevant energetic interactions between amino acids contributing to structure and function should be reported in the correlations of amino acid outcomes at pairs of positions in a large sampling of homologous sequences comprising a protein family. In fact, we show that two different approaches for coevolution – DCA and SCA – effectively report the experimentally determined pattern of structural contacts and functional couplings, respectively. While prediction of structural contacts are easily verified by comparison with published protein structures (Kamisetty et al., 2013; Marks et al., 2012), datasets for evaluating the prediction of protein function have been limited (Lockless and Ranganathan, 1999; McLaughlin et al., 2012; Teşileanu et al., 2015). In this regard, DCS represents a necessary step for collecting the kind of data to refine and test models for protein function.

The finding that SCA can effectively predict the conserved thermodynamic couplings allows us to propose a deeper hypothesis about the meaning and role of protein sectors (Halabi et al., 2009). The coupled equilibrium model described above (Figure 4A) postulates the existence of a cooperative two-state internal equilibrium Kc within proteins, where only perturbations of Kc can generate non-zero mutational couplings. Since significant conserved couplings in the α2 helix are exclusively within positions 1, 4, 5, and 8 and since these positions are contained within the protein sector, it is logical to propose that the sector represents the structural unit underlying Kc – a distributed cooperative amino acid network through which allosteric effects can be transmitted by modulation of the internal conformational equilibrium. Consistent with this, introduction of new molecular interactions at sector edges has been shown to be a route to engineering new allosteric control in protein molecules (Lee et al., 2008; Reynolds et al., 2011). In future work, it will be interesting to rigorously test the hypothesis that the sector underlies Kc through global or sector-directed DCS experiments.

Overall, our findings clarify the current state of sequence-based inference of protein structure and function (Figliuzzi et al., 2016; Hopf et al., 2017). DCA successfully predicts contacts in protein structures in the top couplings, but in its current form, does not appear to capture the cooperative constraints that underlie protein function well. In contrast, SCA does not predict direct structural contacts well, but instead seems to accurately capture the energetic couplings that contribute to protein function. As explained previously, these two approaches sample different parts of the information contained in a sequence alignment (Cocco et al., 2013; Rivoire, 2013), and therefore are not mutually incompatible. These results highlight the need to unify the mathematical principles of contact prediction and SCA-based energetic predictions towards a more complete model of information content in protein sequences.

In summary, the collection of functional data for some 56,000 mutations in a sampling of PDZ homologs demonstrates an evolutionarily conserved pattern of amino acid cooperativity underlying function. This pattern is well-estimated by statistical coevolution based methods, suggesting a powerful and (given the scale of experiments necessary) uniquely practical approach for mapping the architecture of couplings between amino acids. Indeed, the remarkable implication is that with a sufficient ensemble of sequences comprising the evolutionary history of a protein family and the further technical advancements suggested above, the pattern of relevant amino acid interactions can be inferred without any experiments.

Materials and methods

Library generation

Request a detailed protocol

For each PDZ homolog, a library of all single and pairwise mutations in the α2-helix was generated using a set of 36 mutagenic forward primers (50-60mers, IDT), each with two codons randomized as NNS (IUPAC code). Each primer was used in a separate inverse PCR reaction with a constant reverse primer, a PZS22 plasmid containing the wildtype PDZ variant as template, and Q5 polymerase (NEB). The primers amplify the entire plasmid, introducing mutations only on one strand, a strategy that reduces library bias and over-representation of the wildtype allele that occurs with methods such as overlap-extension PCR (Jain and Varadarajan, 2014). Both forward and reverse primers are designed with BsaI sites in the 5’ region, permitting scarless unimolecular ligation of the PCR products. All 36 PCR products per PDZ homolog are quantified by Qubit and Nanodrop (Thermo Fisher Scientific), mixed in an equimolar ratio, and used for a one-pot digestion-ligation reaction to make the library of all single and double mutants (Engler and Marillonnet, 2013).

The bacterial two-hybrid assay

Request a detailed protocol

The bacterial two-hybrid assay is based on the triple-plasmid system reported in Raman et al. (Raman et al., 2016) (Figure 1—figure supplement 1). The PDZ variants are expressed as fusions with the λcI DNA-binding domain under control of a lac promoter (in PZS22 plasmid, low-copy SC101 origin, trimethioprim (Tm) resistance), the PDZ ligand is expressed as a fusion with the RNA polymerase α-subunit under control of a tet promoter (in PZA31 plasmid, low-copy p15A origin, kanamycin (Kan) resistance), and the reporter gene is cat (coding for the enzyme chloramphenicol acetyltransferase), encoded by the pZE1RM plasmid (medium-copy number ColE1 origin and ampicillin (Amp) resistance).

Libraries of PDZ variants are transformed into electrocompetent pZE1RM+pZA31+ MC4100Z1 cells that harbor chromosomal copies of the lac repressor lacIq and the tet repressor TetR (Tan et al., 2009). After recovery in SOC medium, the culture is used to inoculate 50 mL of LB media (1:50 dilution) supplemented with 50 μg/mL Amp, 40 μg/mL Kan, and 20 μg/mL Tm and incubated overnight at 37°C with shaking. After ~12 hr, a 1:1000 dilution of the culture is made into fresh LB media with the three antibiotics and allowed to grow at 37°C to bring the cells into exponential growth (OD600 = 0.1), at which point another 1:100 dilution is made into LB supplemented with the three antibiotics and 50 ng/mL of doxycycline hydrochloride (dox) to induce expression of the PDZ ligand fusion. Cells are incubated at 25°C for 2 log-orders of growth (~6.7 doublings) to allow protein expression to reach steady-state (Poelwijk et al., 2011). Growth at 25°C appears to represent an optimum in maximizing dynamic range whole also focusing assay sensitivity to binding affinity rather than protein stability. After induction, cells are 1:100 diluted into fresh LB media supplemented with all antibiotics and dox, and also chloramphenicol at a final concentration of 150 μg/mL for selection. Cells are grown at 25°C and harvested at OD600 = 0.1, and plasmids are purified. The region covering the α2 helix is PCR amplified and Truseq barcodes and sequencing adapters (Illumina) are appended in two sequential PCR reactions. Truseq barcodes permit multiplexing different experiments in a single sequencing run.

Deep sequencing

Request a detailed protocol

To analyze allele distributions, samples are combined and sequenced on either the Illumina Miseq or Hiseq2500 instruments (University of Texas Southwestern Medical Center Genomics and Microarray Core) and subsequently de-multiplexed, with allele counts extracted from sequencing files using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and BioPython (Cock et al., 2009) and converted to frequencies before and after selection in Matlab or Python.

To relate sequencing data to binding free energies, we compute the relative frequency of each allele x after (s) and before (u) selection: zx=fsx/fux. Since selection has the property that frequencies of alleles will exponentially diverge as a function of time, we take the logarithm after normalizing zx over all alleles to define the 'relative enrichment': ΔEx=lnzx/izi. Since selection in the BTH is designed to be proportional to the fraction bound of each PDZ variant to the target ligand (McLaughlin et al., 2012; Raman et al., 2016), we have that ΔEx=aln(fbx)+C) (Equation 2), where in the pseudo first-order limit fbx=L/L+Kdx. L (the free ligand concentration in vivo), a, and C are free parameters determined by fitting ΔEx measured for a library of 45 mutants of PSD95 with known equilibrium binding constants (Fig. S1B). This 'standard curve' for the BTH shows excellent dynamic range in reporting binding free energies over the full range of values for essentially all mutants in all homologs (Figure 1D–E). To determine binding energies for data acquired in different sequencing experiments, we use the known binding affinity of the wild-type alleles to apply a correction to ΔEx scores to match the values determined for the standard curve experiment. Given the fitted parameters and a set of corrected ΔEx scores, we compute equilibrium binding constants and corresponding free energies using Equation 2.

Data analysis and statistical comparisons

Request a detailed protocol

Distributions of thermodynamic coupling energies for each pair of positions in each homolog were plotted and fitted to single or double Gaussian models using the Gaussian mixture modeling tools in MATLAB (Mathworks Inc.). For each position pair, the choice between these two models was made by selecting the model with the minimum Bayes Information Criterion (BIC), which appropriately penalizes more complex models in accounting for the data. For correlation analyses comparing the experimental data with coevolution-based predictions (Figures 6 and 8), linear models were fit in MATLAB. The significance of the fitted Pearson’s coefficient of determination (r2) is given by the F-test, with the null hypothesis that there is no correlation.

Statistical coupling analysis (SCA)

Request a detailed protocol

SCA was performed using pySCA 6.0 as recently described (Rivoire et al., 2016) using two manually adjusted, structure-based alignments of 240 ('Lockless') or 1689 ('Poole') eukaryotic PDZ domains, or using a publicly available alignment from PFAM (9610 seqs, [Finn et al., 2016]). Briefly, SCA involves computing a conservation weighted correlation matrix C~ijab=ϕiaϕjbfijab-fiafjb, where fia and fijab represent the frequency and joint frequencies of amino acids a and b at positions i and j, respectively, and ϕ=Df=lnf(1-q)q(1-f), the gradient of the Kullback-Leibler entropy D describing the degree of conservation of amino acids (Halabi et al., 2009). C~ijab is reduced to a positional coevolution matrix C~ij by taking the Frobenius norm over amino acid pairs for each (ij), and C~ij is subject to eigenvalue decomposition. Coevolving positions are hierarchically organized into one or more collective modes (protein sectors, [Halabi et al., 2009]) in the top eigenmodes of the SCA positional coevolution matrix, with lower modes indistinguishable from noise due to limited sampling (Halabi et al., 2009). For PDZ, a protein with a single hierarchical sector, we consider here just the top eigenmode, permitting calculation of cleaned coevolution matrix C^=v1λ1v1T, where λ1 is the top eigenvalue and v1 is the first eigenvector. The portion of C^ corresponding to the α2 helix is shown in Figure 4D.

Direct coupling analysis (DCA)

Request a detailed protocol

DCA calculations were carried out for alignments of 1689 ('Poole'), 9610 (PFAM), or 102410 ('Hopf', [Hopf et al., 2017]) PDZ domains using the pseudolikelihood maximization approach reported in (Hopf et al., 2017), resulting in intrinsic constraints (hi) for each amino acid and pairwise couplings (Jij) for each amino acid pair at positions i and j. These parameters define a statistical energy for any given amino acid sequence σ=(σ1σL): E(σ)=ihiσi+i<jJijσiσj. As described (Hopf et al., 2017), we use these parameters to compute the energetic effect of single mutants and pairwise coupling of mutation pairs (Equation 1, main text), starting from the sequences of each homolog. Just as for the experimental data, histograms of all amino acid couplings for every pair of positions were fit to either single or double Gaussian distributions, and mean values used for comparisons with the experimental data for individual homologs (Figure 8—figure supplement 2). For homolog averaged couplings (Figure 4G), coupling energies for each amino acid pair were averaged over homologs, and then used to make histograms. For Figure 4H, we defined a DCA model in which top couplings in Jij were defined using a cutoff as in (Ovchinnikov et al., 2014), and all couplings below the cutoff were randomly scrambled. The resulting model parameters were used for computing the energetic effect of single and double mutations, as above.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
    Multi-Scale Structure and Dynamics of Visual Signaling in Drosophila Photoreceptor Cells
    1. SJ Helms
    (2011)
    Dallas, TX: The University of Texas Southwestern Medical Center.
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51

Decision letter

  1. Nir Ben-Tal
    Reviewing Editor; Tel Aviv University, Israel
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Developmental Biology, Germany

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Inferring amino acid interactions underlying protein function" for consideration by eLife. Your article has been reviewed by three peer reviewers, evaluation has been overseen by a Reviewing Editor and Detlef Weigel as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Nir Ben-Tal (Reviewer #1); Amnon Horovitz (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The manuscript presents deep sequencing study of the ligand-binding helix of 5 homologous PDZ domains. The data is analyzed, using double-mutant thermodynamics couplings, to estimate energetic couplings between the 9 amino acid positions of the helix. The results are correlated with calculated evolutionary couplings using both the SCA method, developed by the authors, and the DCA method used to infer direct couplings. The conclusion is that while DCA detects direct couplings, useful within the context of structure prediction, SCA appears to detect 'functional', allosteric couplings.

Opinion:

Assuming that it was done well, the deep sequencing data itself, regardless of the analysis and interpretations, can be very handy and useful for other studies. Its interpretation using the double-mutant cycle is insightful (but raises several open questions), and the comparison to the evolutionary couplings is questionable.

Essential revisions:

The energetic couplings:

1) A single Gaussian appears to fit well to many (perhaps most) of the energetic couplings in Figure 2, but the description in the main text (subsection “A deep coupling scan in the PDZ family”, second paragraph) refers to double-Gaussian. The energetic couplings of the rest of the PDZ domains (S3-S6) has even fewer double-Gaussians. Thus, the description is confusing.

2) The proposed explanation for the double-Gaussian are two conformations, which is surprising for a helix. Which conformations would these be? The model in S8 is too abstract and does not explain.

3) The analysis considers all the mutation types to be equal but that's not the case. Mutations to glycine or proline, for example, are more likely to disrupt the helix structure whereas mutations to alanine are more likely to be non-disruptive. Given that a helix was studied here, special consideration should be given to helix disrupting mutations. Some re-analysis according to mutation types and discussion of this issue is needed. Perhaps, the bimodal distributions (double-Gaussians) are due to mutation types?

4) Given that this study focuses on an α-helix, certain couplings (both energetic and evolutionary) such as between i,i+3,4 are expected. These couplings may have little to do with the PDZ fold and simply reflect helix properties and solvent exposure. A previous analysis of correlated mutations in the helices of many unrelated proteins did indeed reveal enrichment at positions i,i+3,4 (Noivirt et al. PEDS 2005). Such couplings are detected essentially only along one face of the helix. On the one hand we wonder why it is missing in others. On the other hand, the observed coupling should be discussed in view of this anticipation. That is, maybe the 1-4 and 1-5 couplings (Figure 2) simply reflect an α-helix.

5) What is the mechanics of energy transduction between the amino acids? Which forces are involved? This is particularly interesting for amino acids that are not in direct contact with each other.

6) The mechanistic and biological implications of the couplings should be discussed further. Why is allostery needed in PDZ domains? Why is it not completely shared by all 5 homologues? How does the specific coupling observed for each homologue serve its unique function?

7) Why limiting the analysis to a single helix? It would have been much more insightful to cover the whole domain. Especially given that it's so small.

8) The deep sequencing data should be made publicly available and easily accessible.

Coevolution:

9) We have some concern about the correlations in Figure 5. To what extent are the differences meaningful given that most of the data points are clustered together and the differences between plots appear to be due to a few outliers?

10) With so many homologous sequences, it should be possible to examine the robustness of the results using k-fold tests. For example, how does evolutionary coupling computed using (randomly chosen) half of the taxa compare to the values obtained with the other half? And how do the couplings in each set correlate with the energetic couplings?

11) Regarding the previous point, it would be insightful to examine the robustness of the two methods used to estimate coevolution. Both DCA, used to detect direct couplings, and SCA, used to suggest 'functional couplings'. Hopefully both will be equally robust in k-fold tests.

12) "Extracting the coevolution pattern in the top eigenmode for just the α2 helix (Figure 5C), we find that coevolution as defined by SCA in fact nearly quantitatively recapitulates the homolog averaged experimental couplings collected here (𝑟2 = 0.82, 𝑝 = 10;<= by F-test, Figure 5D)." r2 = 0.82 is fair correlation at most. This sentence should be tuned down considerably.

13) The statement in the sentence before the Discussion that non-contact couplings in the DCA model represent noise is at odds with Anishchenko et al., 2017. This discrepancy requires some discussion.

14) When comparing the two coevolution methods the authors should make the most of each. For some reason they use a very small alignment in DCA. About one tenth of all possible sequences.

15) DCA (and a full evolutionary study) takes into account both direct and indirect couplings between all residues. SCA on the other hand, takes into account only couplings between amino acid pairs. Thus, the energetic couplings work, via the double-mutant cycle, is more similar in spirit to SCA. The authors should refer to this point when correlating the two evolutionary coupling methods to the energetic coupling analysis.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Coevolution-based inference of amino acid interactions underlying protein function" for further consideration at eLife. Your revised article has been favorably evaluated by Detlef Weigel (Senior Editor), a Reviewing Editor, and outside reviewer.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

This paper is important in two regards. First, it describes a high-throughput approach to mutant cycle analyses. Second, it shows the relative strengths and weaknesses of DCA and SCA. Nevertheless, there still are some concerns after the revision. The main one is that the equation in Figure 4 seems wrong. This may be a typo but if not, then the analysis in the paper based on this equation is also wrong. The correct form should be:

Kxapp = Kx(1+Kc*α)/(1 + Kc)

Other comments:

1) Averaging over mutation types is better than some arbitrary choice but, under favorable circumstances, mutations to alanine are preferred as a reference because with this substitution interactions are mostly removed without new ones being introduced.

2) In previous work (Cell 2009), the authors attributed the top eigenmode to evolutionary noise but not in this paper. This needs explaining.

3) We still think that the coincidence of the pattern of couplings observed in this paper with i, i+3,4 periodicity in helices suggests that maybe they reflect secondary structure and not allostery.

4) It should be noted that correlated mutations between distant residues can also reflect negative design (Noivirt PLoS Comp. Biol. 2009).

https://doi.org/10.7554/eLife.34300.028

Author response

Essential revisions:

The energetic couplings:

1) A single Gaussian appears to fit well to many (perhaps most) of the energetic couplings in Figure 2, but the description in the main text (subsection “A deep coupling scan in the PDZ family”, second paragraph) refers to double-Gaussian. The energetic couplings of the rest of the PDZ domains (S3-S6) has even fewer double-Gaussians. Thus, the description is confusing.

In fitting distributions, model selection was carried out by statistical criteria, following the Bayes Information Criterion (BIC), which correctly penalizes models with more parameters and provides a rigorous basis to choose single or double Gaussian models. We have added a section in the Materials and methods to describe the fitting process.

2) The proposed explanation for the double-Gaussian are two conformations, which is surprising for a helix. Which conformations would these be? The model in S8 is too abstract and does not explain.

We agree that the description of the model was too cursory in the text and buried in the supplementary information. In revision, we have now moved the model to a main figure and have added a main text section, which provides clearer explanations and connections to known biological properties of PDZ domains (subsection “A model for distributions of thermodynamic mutant cycle couplings”). We also have added a paragraph in the Discussion section to establish the connections between the model and the main conclusions of the paper (second paragraph).

But, it’s important to make a more general point. In agreement with the reviewers, a main conclusion from our data is that the two-state-like behavior of the internal conformational equilibrium is not well described by known average chemical properties of secondary structure or amino acids. Instead, we show that the four positions in the helix that are engaged in the internal equilibrium are actually part of a globally correlated coevolving mode that spans the full protein structure, connecting several secondary structure elements and mediating allosteric communication in PDZ domains (new Figure 7). Thus, we propose that the unit of protein structure from which the two-state behavior emerges is the collective mode, not any particular structural element. This re-formulation of basic functional units of proteins from secondary structure elements to global collective modes is one important feature of these data, and we have tried in revision to make this key point clearer.

3) The analysis considers all the mutation types to be equal but that's not the case. Mutations to glycine or proline, for example, are more likely to disrupt the helix structure whereas mutations to alanine are more likely to be non-disruptive. Given that a helix was studied here, special consideration should be given to helix disrupting mutations. Some re-analysis according to mutation types and discussion of this issue is needed. Perhaps, the bimodal distributions (double-Gaussians) are due to mutation types?

We fully sympathize with the reviewers’ intuition about the character of mutations, but in fact the data indicate that there is no simple relationship between mutation types or helical propensities and thermodynamic couplings due to mutations. In revision, we show this directly with new figures that plot the distribution of coupling energies for every one of the 20 amino acid substitutions (Figure 3—figure supplement 1A). We also show that mutations to Gly or Pro do not show obvious evidence for greater couplings with other mutations than do Ala mutations (Figure 3—figure supplement 1B-D). These findings may at first glance be surprising, but they simply reinforce

the sense that functional couplings seem to be more about the heterogeneous character of positions rather than the heterogeneous character of amino acids. We do not argue that mutations are all equal; instead the data support the argument that over the full ensemble of substitutions, mutations act as Gaussian perturbations around the native coupling between amino acids, with only the slight twist that for some positions, the perturbations influence a nonlinear internal equilibrium. Thus, the average value over a large ensemble of pairwise mutations provides the best estimate of the native coupling from mutation experiments. In revision, we have made these points more directly.

4) Given that this study focuses on an α-helix, certain couplings (both energetic and evolutionary) such as between i,i+3,4 are expected. These couplings may have little to do with the PDZ fold and simply reflect helix properties and solvent exposure. A previous analysis of correlated mutations in the helices of many unrelated proteins did indeed reveal enrichment at positions i,i+3,4 (Noivirt et al. PEDS 2005). Such couplings are detected essentially only along one face of the helix. On the one hand we wonder why it is missing in others. On the other hand, the observed coupling should be discussed in view of this anticipation. That is, maybe the 1-4 and 1-5 couplings (Figure 2) simply reflect an α-helix.

While some experimental couplings are consistent with the classic periodicity of hydrogen-bonding that defines an α-helix, others do not (e.g. 1-8. 4-5, and many specific couplings in individual domains). In addition, and as noted by the reviewers, there are many other i+3/i+4 couplings that are zero. As argued in the paper, the pattern of couplings is more consistent with perturbation of a collectively acting group of amino acids that emerge from the tertiary structure (Figure 7) than anything to do with the characteristics of the α2 helix as an isolated element. The reviewers are correct that our study is in the helix, but the helix is embedded in the full protein, and thus the pattern of energetic couplings must be thought of as a global property of the protein, not a local property of secondary structure. In revision, we make these points clearer.

5) What is the mechanics of energy transduction between the amino acids? Which forces are involved? This is particularly interesting for amino acids that are not in direct contact with each other.

Well, this is an excellent question, but our study here is about the pattern of thermodynamic couplings, and not at all about the underlying mechanism. Nevertheless, given the data, we can speculate that the dense network of thermodynamic couplings between positions 1, 4, 5, and 8 likely indicate that they act as a collective mechanical unit within an otherwise weakly coupled environment. Such an arrangement would selectively transmit forces within the unit, underlying the long-range couplings we observe here. Independent physical experiments are underway to test this model and directly expose the mechanical basis for long-range coupling.

6) The mechanistic and biological implications of the couplings should be discussed further. Why is allostery needed in PDZ domains? Why is it not completely shared by all 5 homologues? How does the specific coupling observed for each homologue serve its unique function?

We agree, and have now provided much better descriptions of the known biological roles of allosteric communication and regulation in PDZ domains. Indeed, there is excellent evidence for long-range allosteric control in PDZ domains that corresponds to the collective mode (new Figure 7), and for the involvement of the very residues in the α2 helix that we identify as experimentally coupled in this work.

7) Why limiting the analysis to a single helix? It would have been much more insightful to cover the whole domain. Especially given that it's so small.

There are two answers to this question. First, a full analysis of all double mutants in the PDZ domain would involve ~ 2 x 106 mutations per protein, a scale that may be possible, but is hardly trivial while maintaining experimental quality and homolog averaging. This is beyond the reasonable scope now. But the second answer is more important. Regardless of any proposed experiment, all these so-called “deep mutational” studies ultimately lack scalability and generality; indeed, they are technically difficult to optimize, they are very expensive, and they will always be limited in scope. Thus, the primary value of the experimental data here is to provide benchmark data to evaluate other approaches that can give us real power in generalized analyses of epistatic interactions within proteins. In that regard, we respectfully suggest that the study we carry out here provides exactly the kind of high-quality experimental data over homologs that we need. The support for this statement is evident in (new) Figures 6 and 8, which show that conserved experimental couplings can indeed be predicted with high statistical significance from statistical genomic methods (Figure 6B-D). Based on these findings, the new Figure 7 shows the full prediction for the pattern of conserved couplings in the whole PDZ domain, an analysis that can now be done for many proteins.

8) The deep sequencing data should be made publicly available and easily accessible.

Our laboratory is fully committed to open sharing of data, and indeed, all the data will be made available to the scientific community. Indeed, we are using the Dryad database to distribute the full dataset.

Coevolution:

9) We have some concern about the correlations in Figure 5. To what extent are the differences meaningful given that most of the data points are clustered together and the differences between plots appear to be due to a few outliers?

The Pearson’s coefficient of determination (conventionally called the 𝑟") provides a measure of linear correlation between two variables, but does not give the significance of this correlation. Indeed, is the correlation different from no correlation, given scatter in the data and number of measurements? To address this, we report the significance of the correlation in the p-value of the F-test, with the null hypothesis that the correlation is zero. Thus, we answer in the manuscript precisely what the reviewers are asking…the meaningfulness of correlations and of differences in correlation over coevolution methods. We now include the F-test p-values in the relevant figure panels.

In revision, we have also made these arguments more explicit and have added further analyses to show how the significance of the correlation depends on various parameters – alignment size, composition, and mathematical details of the both the SCA and DCA methods (new Figures 6 and 8, and many edits in the main text). For three independent alignments, SCA shows significant correlation with the experimental data (𝑝 = 10-14𝑡𝑜 10-12 by F-test), while DCA shows weaker or no significant correlation (𝑝 =. 001 𝑡𝑜. 09), where the conventional 𝑝 = 0.05 or below is used as a threshold for significance. In revision, we also now explain the likely explanations for these differences in predictive power, which should stimulate the scientific community to unify and improve the statistical coevolution methods.

10) With so many homologous sequences, it should be possible to examine the robustness of the results using k-fold tests. For example, how does evolutionary coupling computed using (randomly chosen) half of the taxa compare to the values obtained with the other half? And how do the couplings in each set correlate with the energetic couplings?

11) Regarding the previous point, it would be insightful to examine the robustness of the two methods used to estimate coevolution. Both DCA, used to detect direct couplings, and SCA, used to suggest 'functional couplings'. Hopefully both will be equally robust in k-fold tests.

We agree that it is valuable to report the robustness of the methods to sequencing sampling. In revision, we show that SCA is highly robust to both choice of alignments that vary in size and method of construction (new Figures 6B-D), and random sampling of the alignment (Figure 6—figure supplement 1), and we show that DCA is less robust to alignment choice (Figures 8C-E) and random samplings of the best-performing alignment (Figure 8—figure supplement 1). The results are not surprising; SCA focuses on conserved correlations and is therefore robust to size and errors in alignments, and DCA focuses on all correlations and is sensitive to alignment composition and errors.

12) "Extracting the coevolution pattern in the top eigenmode for just the α2 helix (Figure 5C), we find that coevolution as defined by SCA in fact nearly quantitatively recapitulates the homolog averaged experimental couplings collected here (𝑟2 = 0.82, 𝑝 = 10;<= by F-test, Figure 5D)." r2 = 0.82 is fair correlation at most. This sentence should be tuned down considerably.

As noted above, the significance of any correlation is quantitatively given in the F-test p-value, but there is no doubt that words used to express this result can be a matter of individual judgement. We have toned down the language used in this regard to be simply reflective of just what the numbers say.

13) The statement in the sentence before the Discussion that non-contact couplings in the DCA model represent noise is at odds with Anishchenko et al., 2017. This discrepancy requires some discussion.

This matter is confusing and requires some explanation and clarity. We thank the reviewers for pointing it out, especially the need to directly address the issue in the paper. Anishchenko et al. show that for the top L/2 DCA couplings in the 𝐽12 matrix (where 𝐿 is the length of the protein), cases of apparently non-contacting residues are often resolved as true contacts by one of three explanations: (a) they are contacts induced by oligomerization, (b) they are contacts in other conformational states of the same protein, or (c) they are contacts in homologous members of the family. In our work reported here, we are talking about all couplings in the 𝐽12 matrix (not just the top ones), most of which are truly not-contacting in the tertiary structure. Thus, there is no discrepancy with the specific findings of Anishchenko et al., except perhaps to point out that the weak non-contacting couplings in the 𝐽12 matrix are not noise. They do contribute to function prediction (compare new Figures 8C and 8F), even if they do not contribute to structure prediction. This does argue for a revision of the main conclusions of Anishchenko et al. that evolutionary constraints in protein sequences are strictly in direct contacts. In revision, we have significantly edited the main text (both Results and conclusions) to clarify these points and more importantly, we explicitly provide a hypothesis for what the next steps in refining the DCA-like approach might be.

14) When comparing the two coevolution methods the authors should make the most of each. For some reason they use a very small alignment in DCA. About one tenth of all possible sequences.

Consistent with the reviewers’ request, we now report data for three different alignments in the case of both SCA and DCA, designed to give both methods an optimal chance to explain the functional data. The analysis shows that SCA is (expectedly) robust to alignment size and composition (new Figure 6B-D), while DCA is (expectedly) not (Figure 8C-E). An interesting finding is that DCA has the best performance with the manually adjusted, structure-based alignment we originally reported (Figure 8C, 1656 seqs, 𝑟2 = 0.34, 𝑝 =.001), and shows much poorer performance with larger, publicly available alignments (Figure 8D-E, with 9610 and 102410 seqs, respectively; 𝑟2 = 0.11 − 0.05, 𝑝 =.02 −. 09).

15) DCA (and a full evolutionary study) takes into account both direct and indirect couplings between all residues. SCA on the other hand, takes into account only couplings between amino acid pairs. Thus, the energetic couplings work, via the double-mutant cycle, is more similar in spirit to SCA. The authors should refer to this point when correlating the two evolutionary coupling methods to the energetic coupling analysis.

These assertions require correction and clarification, and we thank the reviewers for giving us the opportunity to do so. DCA makes an energy model (a sequence “Hamiltonian”) that is defined by intrinsic and pairwise interactions of amino acids (new Figure 8B) and tries to account for all the observed frequencies and pairwise correlations of amino acids in the alignment with these parameters. SCA makes no explicit assumption of model, and mathematically decomposes a conservation-weighted correlation matrix to discover clusters of coevolving amino acids. Since pairwise correlations in an alignment could arise from higher order interactions (or couplings) between amino acids, SCA is implicitly open to models with higher than pairwise interactions in the Hamiltonian, while DCA (as currently formulated) is not. We now more explicitly spell out these ideas in the revised manuscript.

As for mutant cycles, the mathematical relationship of thermodynamic coupling to sequence coevolution is not trivial and is an active area of investigation (see Poelwijk et al. PLoS Comp. Biol. 12, e10004771). What we can say is that homolog-averaged thermodynamic couplings are analogous to couplings in the sequence Hamiltonian, and therefore the comparisons with experimental data directly test the energy function inferred by the DCA formalism. Indeed, the prediction of mutational effects using the sequence Hamiltonian is precisely what the DCA community is proposing now (e.g. Hopf et al., 2017). What is less obvious is the mathematical relationship of energetic couplings and the top modes of the SCA coevolution matrix. Nevertheless, the fact that these top modes predict energetic couplings well provides an important clue for now formally unifying SCA and DCA to produce models for proteins that accurately estimate both structural contacts and functional effects. We expect that the work presented here will contribute to such an outcome.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript has been improved […] Nevertheless, there still are some concerns after the revision. The main one is that the equation in Figure 4 seems wrong. This may be a typo but if not, then the analysis in the paper based on this equation is also wrong. The correct form should be:

Kxapp = Kx(1+Kc*α)/(1 + Kc)

Well, the equation is not wrong – it just depends on the way that one writes the definitions of the equilibrium constants. In our analysis (Figure 4A), the reactions 𝐾x and 𝛼𝐾x are defined as dissociation constants (𝐾 x= [0𝐹][𝐿]/[0𝐵] and 𝛼𝐾x= [1𝐹][𝐿]/[1𝐵]) and the internal configurational equilibrium is defined in the following way: 𝐾c= 0𝐹/1𝐹 and 𝛼𝐾c= 0𝐵/1𝐵. If one uses these definitions, the apparent binding affinity is given by:

Kxapp=Kx1+Kc1α+Kc

and the fraction of bound protein is given by 𝑓B = [𝐿]/([𝐿] + Kxapp). This is what is given in the paper and in Figure 4A. But, as long as one maintains detailed balance, one is free to define the equilibrium constants differently. In that case, the equation for Kxapp will look different, but no insights or conclusions will differ. For example, if one defines 𝐾x and 𝛼𝐾x as association constants (𝐾x = [0𝐵]/[0𝐹][𝐿] and 𝛼 𝐾x= [1𝐵]/[1𝐹][𝐿]) and the internal configurational equilibrium is defined in the opposite way: 𝐾c= 1𝐹/0𝐹 and 𝛼𝐾c= 1𝐵0𝐵, then the equation for Kxapp will be:

Kxapp=Kx1+αKc1+Kc,

where now the fraction of bound protein is given by 𝑓B= [𝐿]/[L]+1Kxapp, since the apparent dissociation constant is the inverse of the apparent association constant. This is the equation indicated by the reviewers. A quick check of the expected limits of the reaction equilibria at the extreme conditions of 𝐾cvery large or very small will show that both these equations, with equilibrium constants defined in their own ways, give exactly the same results and insights.

The bottom line is that the equation and analysis in Figure 4 are correct. However, we thank the reviewers for raising our awareness to the importance of defining our conventions for equilibrium constants in deriving formulae. We have edited the figure legend of Figure 4 to make the definitions explicit.

Other comments:

1) Averaging over mutation types is better than some arbitrary choice but, under favorable circumstances, mutations to alanine are preferred as a reference because with this substitution interactions are mostly removed without new ones being introduced.

We fully understand and sympathize with the long-held principle that if one were forced to make a single mutation to estimate the native coupling energy between two positions, alanine mutations are preferred. But, what our work shows with considerable data is that while alanine mutations might be preferred over other substitutions for single thermodynamic cycles, the best estimator of native couplings is nevertheless to average over all possible combinations of mutations at a pair of sites. This is justified by the nature of the empirical distributions and it sensibly eliminates the random vagaries of couplings estimated by specific pairs of mutations. We show that if one can manage to do it, the average over the ensemble is better experiment that any particular choice of individual mutations.

2) In previous work (Cell 2009), the authors attributed the top eigenmode to evolutionary noise but not in this paper. This needs explaining.

In Halabi et al., 2009, the main result was the finding of multiple near independent sectors. For this purpose, the top eigenmode is irrelevant since it contains the coherent correlations of all sequence positions. Thus, independent sectors were represented in the next few eigenmodes, namely two to four. While it is true that the top mode contains phylogenetic noise (as do all modes to some extent), it also clearly contains significant signal. Accordingly, in Halabi et al., we did not remove the top mode; we simply ignored it. In follow up papers (including a methods focused paper – Rivoire et al., 2016, we have explained our advances in understanding of how to handle the analysis of the SCA matrix in cases of single or multiple sectors. The best approach is use analytical methods such as independent component analysis to study the independence (or not) of the coevolution pattern. We reference this study explicitly in this work and since we do not report any new advance in SCA here, we hope that our citations will suffice to explain the current practice of the SCA method.

3) We still think that the coincidence of the pattern of couplings observed in this paper with i, i+3,4 periodicity in helices suggests that maybe they reflect secondary structure and not allostery.

Well, the pattern of experimental couplings shows an exact agreement with the known allosteric mode represented by the sector in PDZ domains (Figure 7) and shows partial agreement with helical periodicity with notable major exceptions (e.g. the large 1-8 coupling, and the fact that the majority of i, i+3/i+4 couplings are zero). We believe the interpretation of these data can be left to the scientific community with the publication of these data.

To be clear, we are not challenging the basic principles of secondary structure, but we strongly feel that the secondary structure centric view of energetic interactions that stabilize protein states has caused all of us to be less open that we should about alternative structural models that might have greater explanatory power. With the collection and publication of datasets such as these, we might begin to build a basis for rigorously assessing these views.

4) It should be noted that correlated mutations between distant residues can also reflect negative design (Noivirt PLoS Comp. Biol. 2009).

We have added a sentence in the subsection “Coevolution-based inference of functional couplings” to reflect this point and have added the suggested citation.

https://doi.org/10.7554/eLife.34300.029

Article and author information

Author details

  1. Victor H Salinas

    Green Center for Systems Biology, UT Southwestern Medical Center, Dallas, United States
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Validation, Visualization, Writing—review and editing
    Competing interests
    No competing interests declared
  2. Rama Ranganathan

    1. Center for Physics of Evolving Systems, Biochemistry and Molecular Biology, The University of Chicago, Chicago, United States
    2. Institute for Molecular Engineering, The University of Chicago, Chicago, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    ranganathanr@uchicago.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5463-8956

Funding

Welch Foundation (I-1366)

  • Rama Ranganathan

National Institutes of Health (RO1GM12345)

  • Rama Ranganathan

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank FJ Poelwijk, MA Stiffler, C Nizak, O Rivoire, M Weigt, R Monasson, and S Cocco for discussions and technical advice, and members of the Ranganathan laboratory for critical review of the manuscript. We also thank the High Performance Computing Group (BioHPC) and the Genomics Core at UT Southwestern for providing computational resources and sequencing, respectively. This work was supported by NIH Grant RO1GM12345 (to RR), a Robert A Welch Foundation Grant I-1366 (to RR), and the Green Center for Systems Biology at UT Southwestern Medical Center. VS was supported in part through a pre-doctoral fellowship (NIGMS T32 GM008203).

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Developmental Biology, Germany

Reviewing Editor

  1. Nir Ben-Tal, Tel Aviv University, Israel

Publication history

  1. Received: December 12, 2017
  2. Accepted: July 18, 2018
  3. Accepted Manuscript published: July 19, 2018 (version 1)
  4. Accepted Manuscript updated: July 20, 2018 (version 2)
  5. Version of Record published: August 30, 2018 (version 3)

Copyright

© 2018, Salinas et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 5,872
    Page views
  • 945
    Downloads
  • 26
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Biochemistry and Chemical Biology
    2. Microbiology and Infectious Disease
    CJ Cambier et al.
    Research Article

    Several virulence lipids populate the outer cell wall of pathogenic mycobacteria (Jackson, 2014). Phthiocerol dimycocerosate (PDIM), one of the most abundant outer membrane lipids (Anderson, 1929), plays important roles in both defending against host antimicrobial programs (Camacho et al., 2001; Cox et al., 1999; Murry et al., 2009) and in evading these programs altogether (Cambier et al., 2014a; Rousseau et al., 2004). Immediately following infection, mycobacteria rely on PDIM to evade Myd88-dependent recruitment of microbicidal monocytes which can clear infection (Cambier et al., 2014b). To circumvent the limitations in using genetics to understand virulence lipids, we developed a chemical approach to track PDIM during Mycobacterium marinum infection of zebrafish. We found that PDIM's methyl-branched lipid tails enabled it to spread into host epithelial membranes to prevent immune activation. Additionally, PDIM's affinity for cholesterol promoted this phenotype; treatment of zebrafish with statins, cholesterol synthesis inhibitors, decreased spreading and provided protection from infection. This work establishes that interactions between host and pathogen lipids influence mycobacterial infectivity and suggests the use of statins as tuberculosis preventive therapy by inhibiting PDIM spread.

    1. Biochemistry and Chemical Biology
    2. Neuroscience
    Damien Lemoine et al.
    Research Article Updated

    Glutamate delta (GluD) receptors belong to the ionotropic glutamate receptor family, yet they don’t bind glutamate and are considered orphan. Progress in defining the ion channel function of GluDs in neurons has been hindered by a lack of pharmacological tools. Here, we used a chemo-genetic approach to engineer specific and photo-reversible pharmacology in GluD2 receptor. We incorporated a cysteine mutation in the cavity located above the putative ion channel pore, for site-specific conjugation with a photoswitchable pore blocker. In the constitutively open GluD2 Lurcher mutant, current could be rapidly and reversibly decreased with light. We then transposed the cysteine mutation to the native receptor, to demonstrate with high pharmacological specificity that metabotropic glutamate receptor signaling triggers opening of GluD2. Our results assess the functional relevance of GluD2 ion channel and introduce an optogenetic tool that will provide a novel and powerful means for probing GluD2 ionotropic contribution to neuronal physiology.