Ligand binding remodels protein side-chain conformational heterogeneity

  1. Stephanie A Wankowicz
  2. Saulo H de Oliveira
  3. Daniel W Hogan
  4. Henry van den Bedem
  5. James S Fraser  Is a corresponding author
  1. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, United States
  2. Biophysics Graduate Program, University of California San Francisco, United States
  3. Atomwise Inc., United States

Abstract

While protein conformational heterogeneity plays an important role in many aspects of biological function, including ligand binding, its impact has been difficult to quantify. Macromolecular X-ray diffraction is commonly interpreted with a static structure, but it can provide information on both the anharmonic and harmonic contributions to conformational heterogeneity. Here, through multiconformer modeling of time- and space-averaged electron density, we measure conformational heterogeneity of 743 stringently matched pairs of crystallographic datasets that reflect unbound/apo and ligand-bound/holo states. When comparing the conformational heterogeneity of side chains, we observe that when binding site residues become more rigid upon ligand binding, distant residues tend to become more flexible, especially in non-solvent-exposed regions. Among ligand properties, we observe increased protein flexibility as the number of hydrogen bonds decreases and relative hydrophobicity increases. Across a series of 13 inhibitor-bound structures of CDK2, we find that conformational heterogeneity is correlated with inhibitor features and identify how conformational changes propagate differences in conformational heterogeneity away from the binding site. Collectively, our findings agree with models emerging from nuclear magnetic resonance studies suggesting that residual side-chain entropy can modulate affinity and point to the need to integrate both static conformational changes and conformational heterogeneity in models of ligand binding.

Editor's evaluation

This work attempts to extract information about protein thermodynamics from X-ray crystallography data, which is a challenging problem. The heterogeneous pattern of order parameter changes in response to ligand binding implies that the approach is identifying new information. This work offers insights into ligand binding affinity and specificity mechanisms, suggesting that distal (allosteric) perturbations represent a possible avenue to modulate protein function.

https://doi.org/10.7554/eLife.74114.sa0

eLife digest

Proteins are the workhorses of our cells. They are large molecules that ‘fold’ into specific, often highly complex, three-dimensional configurations. These structures are not static, but rather dynamic and flexible. In other words, proteins can shift between different three-dimensional shapes to perform their tasks within the cell.

To perform their roles, many proteins have to bind to small molecule ligands. Many ligands are drugs, which means that their effectiveness depends on their ability to bind to and impact the proteins involved in the disease they are treating.

When a ligand binds to a protein, it can reshape the protein. For example, certain conformations of the protein, which were difficult for the protein to be in on its own, may become more stable when the ligand binds. Additionally, upon ligand binding, some parts of the protein may move relative to each other. Previous studies have shown that these movements can affect the interaction between ligand and protein. However, these studies only examined a small number of proteins. Therefore, Wankowicz et al. set out to determine, in greater detail, what happens to protein flexibility upon ligand binding.

First, a pipeline was created to model alternative configurations of the protein both with and without ligands attached. These models measured flexibility within protein structures. The models revealed that when ligands bind to proteins, the flexibility of different regions of the protein changes – and does so in a consistent way. Proteins that become more rigid in the region interacting with their ligands become less rigid in other, distant regions, and vice versa. In other words, the rest of the protein is able to compensate for any changes in flexibility caused by ligand binding, which may contribute to how well a ligand binds to a protein.

This study demonstrates the ability of ligands to affect the entire structure of the proteins they bind to, and therefore sheds new light on the role of proteins’ innate conformational flexibility during this process. These results will contribute to our understanding of how the ligands and proteins involved in different cellular processes interact with each other – and, potentially, how these interactions can be manipulated.

Introduction

Ligand binding is essential for many protein functions, including enzyme catalysis, receptor activation, and drug response (Mobley and Dill, 2009). Ligand binding reshapes the protein conformational ensemble between the ligand-bound (holo) and unbound (apo) states, stabilizing some conformations and destabilizing others (Boehr et al., 2009). Despite the dynamic nature of proteins, when comparing structures, often only static conformational changes are considered. However, differences due to ligand binding can range from large, collective movements, such as a loop closure over the binding pocket, to small, local fluctuations of side chains (Gutteridge and Thornton, 2005). Differences in binding affinity and specificity are most often attributed to the enthalpic portion of binding free energy, including visualized interactions between the receptor and ligand. On the other hand, conformational heterogeneity, especially side-chain fluctuations, can also contribute energetically to the binding affinity by modulating entropy (Wand and Sharp, 2018; Tzeng and Kalodimos, 2012). While the individual fluctuation of residues is small, they can add up to significantly contribute to the entropic portion of binding free energy. Previous work examining a diverse set of protein complexes calculated that protein conformational entropy can contribute between –2 (favoring) and 4 (disfavoring) kcal/mol to binding free energy (Caro et al., 2021; Caro et al., 2017). A holistic understanding of the origins of binding would ideally explore both enthalpic and entropic energetic contributions to binding affinity (Zhou and Gilson, 2009).

Side-chain conformational heterogeneity, including jumps between and variation within rotameric conformations, measured by nuclear magnetic resonance (NMR) relaxation studies has been linked to entropy (Caro et al., 2021; Frederick et al., 2007). In principle, complementary information could be accessed by other structural methods. Structural information from X-ray crystallography or cryo-electron microscopy (CryoEM) typically produces a single set of structural coordinates. However, the underlying density maps are created from thousands-to-millions of protein molecules and averaged in both time and space through the crystal lattice or electron microscope particle stack (Woldeyes et al., 2014; Cheng et al., 2015). When averaged in a single-density map, conformational heterogeneity across these copies can manifest as ‘anharmonic disorder,’ which can be modeled using multiple alternative conformations, or ‘harmonic disorder,’ which can be modeled by B-factors/atomic displacement parameters (Figure 1A). Molecular dynamics experiments have demonstrated that if alternative conformations are not modeled correctly and consistently, then B-factors take on values that are not representative of the underlying conformational heterogeneity (Kuzmanic et al., 2014; Kuriyan et al., 1986). Moreover, B-factors incorporate many effects, including the biases and restraints of the refinement programs, modeling errors, crystal lattice defects, and occupancy changes of atoms. Therefore, consistently modeling X-ray structures as multiconformer models, with alternative side-chain and backbone conformations, along with B-factors, may better complement the view emerging from NMR and improve our understanding of the energetics of binding (van den Bedem and Fraser, 2015).

Figure 1 with 6 supplements see all
Representing structural data as multiconformer models.

(A) The gray outlines represent snapshots of the true underlying ensemble of the phenylalanine residue. The orange stick represents the residue modeled as a single conformer. The teal sticks represent the residue modeled as alternative conformers. The single conformer accounts for all heterogeneity in the B-factor, increasing the B-factor and reducing our ability to determine harmonic versus anharmonic motion. When a residue is modeled using alternative conformers, this heterogeneity is divided between harmonic heterogeneity, captured by the B-factors of each alternative conformation and the anharmonic heterogeneity, captured by spread in coordinates between the alternative conformations. (B) To quantify the conformational heterogeneity of each residue, we used multi-conformer order parameters (Fenwick et al., 2014), which are the products of the ortho order parameter that captures the harmonic or B-factor portion of each conformation and the angular order parameter that captures the anharmonic portion or the displacement between alternative conformers. These are multiplied to produce the final order parameter (Materials and methods). (C) The change in the number of alternative conformers (holo-apo) in binding site residues. In the re-refined dataset (orange), the majority of structures have the same number of alternative conformers in the binding site, with the second most popular category gaining alternative conformers in the holo structure. In the qFit dataset (teal), the majority of structures lose an alternative conformer in the holo structure, with the second most common category being gaining an alternative conformer. (D) The differences in B-factors (holo-apo) in the re-refined (orange) and qFit (teal) datasets. Overall, there was no significant difference in B-factors between holo and apo structures in both the re-refined and qFit datasets.

Here, we examine how protein side-chain conformational heterogeneity changes upon ligand binding by assembling a large, high-quality dataset of matched holo and apo X-ray crystallography structures. To integrate both harmonic and anharmonic disorders, we use a consistent multiconformer modeling procedure, qFit (Riley et al., 2021a), and crystallographic order parameters (Fenwick et al., 2014). We test the hypothesis that ligand binding narrows the conformational ensemble, resulting in a decrease in heterogeneity of side chains in the holo structure compared with the apo structure. Our analysis reveals complex patterns of conformational heterogeneity that vary between and within proteins upon ligand binding. Specifically, in proteins where binding site residues become more rigid upon ligand binding, distant residues tend to become less rigid. This observation suggests that both natural and artificial ligands can modulate the natural composition of the protein conformational heterogeneity across the entire receptor to modulate the free energy of binding.

Results

Assembling the dataset

To assess the differences in conformational heterogeneity upon ligand binding, we identified high-quality, high-resolution (2 Å resolution or better) X-ray crystallography datasets from the PDB (Berman et al., 2000). We classified structures as holo if they had a ligand with 10 or more heavy atoms, excluding common crystallographic additives (Supplementary file 1, Figure 1—figure supplement 1A). Structures without ligands, excluding common crystallographic additives, were classified as apo (Figure 1—figure supplement 1A). We identified apo/holo matched pairs by requiring the same sequence and near-isomorphous crystallographic parameters. Furthermore, we required the resolution difference between holo and apo pairs to be 0.1 Å or less, selecting representative apo structures to minimize the difference in resolution (Figure 1—figure supplement 1B). This stringently matched ligand holo-apo dataset contained 1205 pairs (Supplementary file 2). We also used identical selection criteria to create a control dataset of 293 apo-apo pairs, taken from the set of apo/holo pairs (Supplementary file 3).

Re-refining and qFit modeling of apo/holo pairs

To minimize biases resulting from different model refinement protocols, we re-refined all structures using the deposited structure factors and phenix.refine (Liebschner et al., 2019). The majority of structures in our re-refined dataset had less than 2% of residues modeled with alternative conformations, likely reflecting undermodeling of conformational heterogeneity represented in the PDB, based on prior literature (Lang et al., 2010). To more consistently assess conformational heterogeneity, we rebuilt all structures using qFit, an automated multiconformer modeling algorithm (Keedy et al., 2015; Riley et al., 2021a) with subsequent refinement using phenix.refine (Liebschner et al., 2019). While qFit has biases, running all models through a consistent protocol will avoid manual biases that could creep into the holo or apo structures specifically. Additionally, by re-building each model as a multiconformer model, we were able to better distinguish the contributions of harmonic and anharmonic conformational heterogeneity across the structure (Figure 1A and B). All models went through additional quality control, removing structures that resulted in large increases in R-free at each refinement step, high clashscores, or large root mean squared difference (RMSD) between the pairs (Materials and methods, Figure 1—figure supplement 2). This procedure resulted in 743 pairs. Due to apo datasets serving as the reference state for multiple ligand-bound structures, our dataset consists of 743 unique holo structures and 432 unique apo structures.

Properties of the apo/holo pairs

The median resolution across our dataset was 1.6 Å with a small trend towards improved (higher) resolution in the apo structure (0.01 Å, median improvement [holo-apo]; p=3.8 × 10–20, Wilcoxon signed-rank test; Figure 1—figure supplement 3A and B). Across the dataset, 546 unique ligands were present in the structures, with 134 of these (e.g., NAG, AMP) appearing in multiple structures (Figure 1—figure supplement 4A). The median number of ligand heavy atoms was 19, with only 10 very large ligands (>50 heavy atoms, e.g., atazanavir; Figure 1—figure supplement 4B). The proteins in the dataset represent 315 unique UniProt IDs, with a bias towards enzymes that have been used for model systems for structural biology, including endothiopepsin (n = 73 pairs), lysozyme (n = 62 pairs), trypsin (n = 48 pairs), and carbonic anhydrase 2 (n = 46 pairs; Figure 1—figure supplement 4C and D).

Conformational heterogeneity across the re-refined and qFit dataset

To determine the differences in conformational heterogeneity upon ligand binding in both the re-refined and qFit models, we assessed four commonly used metrics: the number of alternative conformers, B-factors (atomic displacement parameter), root mean square fluctuations (RMSF), and rotamer changes.

Number of alternative conformations

Alternative conformations were modeled at low frequency in the re-refined dataset compared to the qFit modeled structures (1.7% vs. 47.8% of residues). In the re-refined dataset, there is a bias to increased modeling of alternative conformations in the holo dataset (50.5% gain vs. 29.8% loss), whereas more even representation was observed in the qFit dataset (44.3% gain vs. 54.8% loss; Figure 1—figure supplement 5A). These results suggest that the trend of increased side-chain conformational heterogeneity in PDB deposited structures may have its origin in human bias with more careful human attention to careful model building of binding site residues in holo structures.

We next focused our analysis on binding site residues, defined as any residue with a heavy atom within 5 Å of any ligand heavy atom. In the re-refined dataset, 23.9% of the matched pairs had a gain in alternative conformations in the holo model compared to only 19.3% losing an alternative conformer in the holo model, suggesting, counterintuitively, that ligand binding increases local side-chain mobility (Figure 1C). However, in the qFit dataset, holo models tend to lose alternative conformations in the binding site residues (39.7% gain vs. 51.5% loss; Figure 1C).

B-factors

Next, we explored the harmonic contribution to conformational heterogeneity as modeled by B-factors on a pairwise, residue-by-residue basis. Across all residues in the re-refined dataset, B-factors were slightly higher in holo models (0.31 Å2, median difference [holo-apo]; p=4.4 × 10–208, Wilcoxon signed-rank test; Figure 1—figure supplement 5B). In the qFit dataset, similar to the re-refined structures, holo residues had slightly higher B-factors (0.34 Å2, median difference [holo-apo]; p=5.6 × 10–264, Wilcoxon signed-rank test; Figure 1—figure supplement 6A). Of note, the B-factors in the qFit dataset are slightly smaller than the re-refined dataset (13.41 Å2 vs. 13.94 Å2, average B-factors), reflecting the tendency for alternative conformation effects to be modeled as increased B-factors. When examining the binding site residues, there was no significant difference in B-factors between the holo and apo models in both the re-refined (0.01 Å2, median difference in B-factors; p=0.34, Wilcoxon signed-rank test; Figure 1D) and qFit datasets (0.06 Å2, median difference in B-factors; p=0.7, Wilcoxon signed-rank test; Figure 1D, Figure 1—figure supplement 6B). The lack of change in B-factors close to ligands between the holo and apo models indicates that changes between the holo and apo B-factors are driven by signals distant from the binding site.

Conformational differences incorporating alternative conformations

Because of the low number of alternative conformers in the re-refined dataset, we only explored the anharmonic differences for side chains between the holo and apo models in the qFit dataset. First, to determine the extent of conformational change of alternative conformations, we compared the rotameric distribution of side chains. Side-chain rotamer changes between apo and holo structures have been reported to be very prevalent in single-conformer models, with 90% of binding sites having at least one residue changing rotamers upon ligand binding (Gaudreault et al., 2012). To accommodate multiconformer models, we assigned all conformations to distinct rotamers using phenix.rotalyze. We classified each residue as having ‘no change’ in rotamers if the set of rotamer assignments matched the holo and apo residue. In binding sites, we also observed that ‘no change’ was the most common outcome for residues (78.6%; Figure 2A). In the second largest category, ‘distinct,’ the holo and apo residues shared no rotamer assignments (15.5% of residues; Figure 2B).

Figure 2 with 1 supplement see all
Examples of rotamer changes between apo (purple) and holo (green) binding site residues.

(A) Example residues for: ‘no change’ in rotamer status, accounting for 78.7% of binding site residues; (B) ‘distinct’ rotamers, accounting for 14.9% of binding site residues; (C) ‘remodeled-holo loss,’ accounting for 2.6% of binding site residues; and (D) ‘remodeled-holo gain,’ accounting for 3.8% of binding site residues. (E) The percentage of residues in the binding site that have the same rotamer status in the holo and apo structures. The black line highlights the 11% of pairs that had the same rotamer status for all binding site residues. (F) Paired galectin-3 apo (purple; PDB: 5NFC) and holo (green; PDB: 4JC1, ligand: thiodigalactoside) multiconformer models with no changes in rotamer status in any binding site residues. (G) Paired transthyretin apo (purple; PDB: 1ZCR) and holo (green; PDB: 3CFN, ligand: 1-anilino-8-naphthalene) multiconformer models with six out of nine residues with remodeled or different rotamer status in the binding site residues. Residues with rotamer changes are shown as sticks. Residues with no change in rotamer status are shown as lines.

A more complicated situation occurs when some, but not all, of the rotamer assignments are shared across apo and holo residue. We classified 2.6% of residues as ‘remodeled-holo loss’ (Figure 2C) if distinct, additional rotameric conformations were populated in the apo residue only and 3.8% of residues as ‘remodeled-holo gain’ (Figure 2D) if distinct, additional rotameric conformations were populated in the holo residue only. These results suggest a counterintuitive interpretation of binding site residues increasing their conformational heterogeneity upon ligand binding. However, a major potential confounder is that holo structures reflect an ensemble average of two compositional states (apo and holo) with alternative conformations representing the apo state at reduced occupancy, which we examined by subsetting the ligands based on relative B-factors (see below). A potential for a third category of remodeling, where both apo and holo residues share at least one conformation and each have at least one additional conformation, did not occur in our dataset.

Across apo-holo pairs, there was a large range of the percentages of binding site residues with the same rotamer classification in the pairs (23.2–100.0%), indicating that side-chain remodeling can be quite variable (Figure 2E). We found that 11% of binding sites had all residues classified as ‘same’ between pairs, consistent with a previous study that used single-conformer models (Gaudreault et al., 2012). As an example of such a ‘pre-organized’ binding site is galectin-3 bound to thiodigalactoside (PDB: 5NFC, 4JC1; Figure 2F). In contrast, 67% of binding site residues have a rotamer status difference in transthyretin (PDB: 1CZR, 3CFN; Figure 2G), including a rotamer change in Leu101 to avoid a clash with the ligand.

To compare the magnitude of fluctuations between alternative conformations, we calculated RMSF for all residues. This analysis suggested that, on average, apo residues have slightly greater conformational heterogeneity than holo residues (–0.006 Å, mean difference of RMSF(holo-apo); p=3.7 × 10–8, Wilcoxon signed-rank test; Figure 2—figure supplement 1A). This trend was somewhat stronger in binding site residues (–0.02 Å, mean difference of RMSF(holo-apo); p=4.5 × 10–29, Wilcoxon signed-rank test; Figure 2—figure supplement 1B). Our RMSF results suggest that, on average, there is a slight decrease in heterogeneity upon ligand binding and that this reduction is most prevalent at residues distant from the binding site.

Collectively, these results do not conform to a simple model. There is a large amount of variability in the response across datasets and the median responses reveal only small biases. Nonetheless, considering those average responses, upon binding a ligand, the RMSF analysis suggests decreases in heterogeneity at the binding site, whereas the rotamer comparison has a slight bias to increased heterogeneity at the binding site, and B-factors only change at distant sites. One interpretation is that heterogeneity is reduced in binding site residues by a small number of anharmonic conformational changes, as observed by the RMSF reduction, paired with an increase in harmonic fluctuations far away, as observed by an increase in the B-factors. However, it is difficult to interpret these changes separately as conformational heterogeneity is a combination of both harmonic and anharmonic motion and there is potential degeneracy in modeling alternative conformations, even with qFit (van den Bedem et al., 2009). Therefore, we moved to using an integrated measurement of order parameters that can account for these complications (Fenwick et al., 2014).

Order parameters integrate both harmonic and anharmonic conformational heterogeneity

To integrate the anharmonic fluctuations between alternative conformers with the harmonic fluctuations modeled by B-factors (Kuzmanic et al., 2014), we used a crystallographic order parameter (Figure 1A; Fenwick et al., 2014). Order parameters allow us to capture the conformational entropy both within and between side-chain rotamer wells. While order parameters are traditionally used in NMR or molecular dynamic simulations, they can be calculated for multiconformer X-ray models and, in some cases, show reasonable agreement with solution measures (Fenwick et al., 2014). We focused on the order parameters of the first torsion angle (χ1) of every side chain for all residues except for glycine and proline. Order parameters are measured on a scale of 0–1, with 1 representing a fully rigid residue and 0 representing a fully flexible residue. Below, we analyze the differences in normalized order parameters between paired residues (Materials and methods, Figure 3—figure supplement 1).

As an additional control, we compared our apo/holo dataset to a dataset of apo/apo pairs. In examining the differences in order parameters, both in the apo/holo pairs and the apo/apo pairs, there are no large differences in conformational heterogeneity, as indicated by a median difference in order parameters of approximately 0. However, in the apo/holo pairs there is a much wider range of order parameter differences, indicating that ligand binding impacts conformational heterogeneity beyond experimental variability (p=3.4 × 10–17, individual Mann–Whitney U test; Figure 3A). We also explored the different binding site cutoff values, ranging from 2 to 10 Å, observing that the tighter the binding site definition, the more drastic the difference in order parameters between holo and apo pairs (Figure 3—figure supplement 2).

Figure 3 with 5 supplements see all
Ligand binding alters conformational heterogeneity patterns.

(A) Across all residues, the distribution of order parameter changes is much wider in holo-apo pairs compared to apo-apo pairs (p=3.4 × 10–17, individual Mann–Whitney U test); however, there is no median difference in order parameters upon ligand binding (median difference: 0 for both), indicating that ligands have varying impacts across different proteins. (B) The distribution of the average differences of order parameters in binding site residues compared to the average differences in a control dataset made up of the same number, type, and solvent exposure of amino acids. Comparing the apo/holo structures, on average binding site residues got more rigid upon binding. The median difference in order parameters was 0.03 for the binding site residues compared to 0 for the control dataset (p=3.4 × 10–7, individual Mann–Whitney U test). (C) The relationship of the difference in order parameters between the holo and apo residues in binding site residues versus the residual order parameter in distant, non-solvent-exposed residues. We observed a negative trend (slope = −0.44), indicating that structures that had a loss of heterogeneity in the binding site (right on the x-axis) had a relative gain in heterogeneity in residues distant from the binding site that were not solvent exposed (top on the y-axis). (D) We explore this trend in a structure of human ATAD2 bromodomain (PDB: 5A5N). Residues are colored by the differences between the average binding site order parameter minus the order parameter for each residue. Blue residues are less dynamic than the average binding site residue, and red residues are more dynamic than the average binding site residue. Binding site residues are represented by sticks, and distant, non-solvent-exposed alpha carbons are represented by spheres. The ligand ((2S)-2,6-diacetamido-N-methylhexanamide) is colored in teal.

Next, to examine whether different regions of the protein were driving this higher variability, we compared the differences in order parameters among binding site residues, within 5 Å of any ligand heavy atom, compared to a control dataset that matched the number of, type and solvent exposure within the protein for each binding site residue. In binding site residues, the holo structures had a slightly, but significantly, increased order parameters, suggesting reduced conformational heterogeneity compared to the control dataset (0.034 vs. 0, median difference [holo-apo] order parameter; p=3.4 × 10–7, individual Mann–Whitney U test; Figure 3B). While there is a larger range of responses, this indicates that, in general, binding site residues become more rigid upon ligand binding.

Spatial distribution of conformational heterogeneity changes

Based on the large range of order parameter differences we observed across the protein, along with the decrease in heterogeneity localized to binding site residues, we next explored the relationship between changes in heterogeneity in binding site residues and the rest of the protein. The difference in order parameters between the holo and apo models was correlated in both the binding site and distant residues (Figure 3—figure supplement 3A), indicating that ligand binding generally caused global changes to flexibility. Given the average rigidification of the binding site residues (Figure 3B, Figure 3—figure supplement 3B), these results predict a general trend of decreased conformational heterogeneity in the ligand binding site would be associated with a relative increase in conformational heterogeneity at distant sites in the protein. This pattern suggests that the residual change in heterogeneity (the difference between the average order parameter of the distant residues and the average order parameter of the binding site residues) should be inversely related to the change in the binding site residues: more rigidified binding sites will have more flexible than expected distant sites, and vice versa. Therefore, we explored the relationship between binding site residues and distant residues, defined as those more than 10 Å away from any heavy atom in the ligand. Indeed, on a protein-by-protein basis, the relationship between binding site residues and residual changes at distant sites follows this trend (Figure 3—figure supplement 3C). We also explored if protein size impacts our results, but did not observe any trend between protein size and order parameter correlation (Figure 3—figure supplement 3E). Consistent with studies suggesting significant residual conformational heterogeneity in folded buried residues (Wong and Daggett, 1998) and the potential for those buried residues to change heterogeneity upon ligand binding (Moorman et al., 2012), this trend is even stronger in residues that were more than 10 Å away from any heavy atom in the ligand and less than 20% solvent exposed (slope = −0.44, r2 = 0.46; p=5.1 × 10–50, two-sided t-test; Figure 3C). This indicates that proteins that lose conformational heterogeneity in the binding site are associated with a relative increase in conformational heterogeneity in distant, non-solvent-exposed residues.

There are three likely origins of this effect. First, this may reflect a feature of the distribution of order parameters around the mean value within each protein. Second, this may reflect a topological feature of protein packing, whereby packing optimization of certain areas of a protein decreases the optimization of other parts of the protein (Bromberg and Dill, 1994). Third, this may reflect the stabilization of certain conformations in a ligand-bound protein. As a control for these effects, we compared the residual order parameter differences between the buried, non-solvent-exposed residues and the binding site residues in apo-apo pairs. Globally the trends were similar, but weaker in both correlation and magnitude (slope = −0.28, r2 = 0.20; p=1.8 × 10–34, two-sided t-test; Figure 3—figure supplement 3D). The difference in the slope between the holo-apo and apo-apo dataset was further compared using a bootstrap analysis, demonstrating that the mean slope of the holo-apo is more than 2 standard deviations away from the apo-apo slope, representing the robustness in differences between the two slopes (p=0.0, z-test; Figure 3—figure supplement 3F). Therefore, we interpret the trend we observe as mainly based on protein topology, specifically that proteins have areas where there are less efficiently packed alternative conformers, likely to enable entropic compensation across the protein during various functions, including ligand binding. We interpret that the stronger signal we observed in the holo-apo dataset is due to the ligand perturbation, which is also reflected in the median rigidification of binding site residues (Figure 3B). We hypothesize that we are observing this innate protein property being used, specifically optimizing the binding site residues to bind a ligand, while decreasing the optimization elsewhere in the protein.

As an example to visualize this trend, we mapped the change in order parameters onto the structure of the human ATAD2 bromodomain (PDB ID: 5A5N). In ATAD2, the binding site residues rigidify upon ligand binding, whereas the majority of distant residues are more heterogeneous compared to the binding site residues (Figure 3D). Specifically, this difference is greatest between binding residues and non-solvent-exposed residues, as previously observed in lysozyme; however, there was only weak residue to residue correlation (Figure 3—figure supplement 4A and B; Moorman et al., 2012). However, as in the global analysis, the ATAD2 example demonstrates there is a large range of changes in binding site order parameters, consistent with NMR examples that show a heterogeneous response both close to and distant from ligands (Caro et al., 2021).

Hydrogen bond patterns change upon ligand binding

We next investigated changes in protein side-chain hydrogen bonds upon ligand binding. Here, we applied HBplus (McDonald and Thornton, 1994) to identify hydrogen bonds for each side-chain alternative conformation (Materials and methods). We examined the occupancy-weighted hydrogen bonds in binding site residues using a hydrogen bond cutoff of 3.2 Å. Overall, we observed the creation of 0.06 hydrogen bonds per residue in holo binding sites (Figure 3—figure supplement 3), which translates to 10% of structures gaining one full hydrogen bond in the holo structure. This is likely indicative of stable binding sites in holo structures. This follows a trend observed previously where upon ligand binding, hydrogen bonds to water molecules decrease, but hydrogen bonds to other protein atoms increase (Gaudreault et al., 2012).

Ligand properties influence conformational heterogeneity

Next, we investigated how ligand properties impact the conformational heterogeneity of binding site residues. For ligand properties dictated by the size of the ligand (number of rotatable bonds and number of hydrogen bonds), we normalized these metrics by the molecular weight of the ligand. For each property, we compared the highest and lowest quartiles by both the absolute order parameters of the holo structure and the order parameter differences between holo and apo pairs. No significant associations existed when comparing the differences between holo and apo order parameters, but the characteristics of the holo binding site and the rotamer changes were correlated with ligand properties in several cases.

We hypothesized that ligand properties associated with increased ligand dynamics, including more rotatable bonds, higher lipophilicity (logP), fewer hydrogen bonds, and more heavy atoms would be associated with increased conformational heterogeneity (an increase in absolute order parameters or a smaller difference between the apo and holo order parameters; Wicker and Cooper, 2015). While molecules with fewer rotatable bonds (lower quartile: <2 [n = 134] vs. upper quartile: >6 [n = 134]) were indeed associated with more rigid binding sites (lower quartile: 0.83 vs. upper quartile: 0.81, individual Mann–Whitney U test), this was not significant. However, higher numbers of rotatable bonds were associated with a lower number of same rotamers between the apo and holo binding site residues (88% vs. 80%, percentage same rotamer; p=6.0 × 10–6, individual Mann–Whitney U test; Figure 4—figure supplement 1). Increased lipophilicity (logP, upper quartile: <0.04 [n = 134] vs. lower quartile: >2.69 [n = 134]) was significantly associated with a more flexible binding site (0.79 vs. 0.84, median order parameters; p=7.5 × 10–6, individual Mann–Whitney U test; Figure 4A). Previous studies have indicated that increased lipophilicity generates more nonspecific binding interactions (Olsson et al., 2008). Larger compounds (upper quartile: >26 heavy atoms [n = 134] vs. lower quartile: <13 heavy atoms [n = 134]) are also associated with more flexible binding sites (0.83 vs. 0.79, median order parameter; p=0.0001, individual Mann–Whitney U test; Figure 4B). Large compounds, thus a larger ligand surface area, are associated with more nonspecific binding interactions, which is compatible with increased protein conformational heterogeneity. Finally, more total hydrogen bonds per heavy atom (upper quartile: >0.47 [n = 134] vs. lower quartile: <0.25 [n = 134]) are associated with more rigid binding sites (0.84 vs. 0.79, median order parameter; p=5.9 × 10–5, individual Mann–Whitney U test; Figure 4C). This trend holds even when examining hydrogen bond donors or acceptors separately.

Figure 4 with 1 supplement see all
Ligand properties impact binding site order parameters.

(A) Ligands with higher logP value (maroon), indicative of more greasy or hydrophobic ligands, versus ligands with a lower logP value (gold), had lower in order parameters in the binding site residues (0.78 vs. 0.84, median order parameter; p=7.5 × 10–6, independent Mann–Whitney U test) (example ligands: low logP: 5-phospho-d-arabinohyroamic acid; high logP: ethyl 2-amino-1,3-benzothiazole-6-carboxylate). (B) Ligands with relatively higher molecular weight (maroon) had higher-order parameters compared to those with lower molecular weight (gold; 0.79 vs. 0.83, median order parameter; p=0.0001, independent Mann–Whitney U test) (example ligands: high number of heavy atoms: (2S)-2-(3-hydroxy-3-oxopropyl)–6-[[[2-[(4-methoxyphenyl)methylcarbamoyl]phenyl]methyl-methyl-amino]methyl]-2,3-dihydro-1,4-benzodioxine-5-carboxylic acid; low number of heavy atoms: 4-carbamimidamidobutanoic acid). (C) Ligands with relatively higher hydrogen bonds per heavy atom (maroon) had higher-order parameters compared to those with lower molecular weight (gold; 0.84 vs. 0.79, median order parameter; p=5.9 × 10–5, independent Mann–Whitney U test) (example ligands: low hydrogen bond: 4-sulfamoyl-N-(2,2,3,3,4,4,5,5,6,6,6-undecafluorohexyl) benzamide; high hydrogen bond: phosphoaminophosphonic acid-adenylate ester). (D) Binding site order parameters were lower in ligands with partial occupancy (light pink; 0.79, median order parameter) and multiconformer ligands adding to full occupancy (salmon; 0.80, median order parameter) compared to single-conformer ligands with full occupancy (dark red; 0.83, median order parameter; p=4.9 × 10–8, independent Mann–Whitney U test). (E) In fully occupied ligands, ligands in the top quartile of ligand B-factors, controlled for by the mean alpha carbon B-factor, had lower binding site order parameters (salmon; 0.79, median order parameter) compared to ligands in the bottom quartile (dark red; 0.85, median order parameter; p=1.6 × 10–11, independent Mann–Whitney U test).

From these results, an intuitive general picture emerges where more specific, directional interactions, such as hydrogen bonds (Bissantz et al., 2010), are more likely to lock the corresponding protein residue in place, thus creating more rigid binding site residues (Majewski et al., 2019). Whereas the more nonspecific interactions are correlated with more flexible binding site residues. There is also a wide range of deviation from this general picture, likely reflecting that natural and artificial optimization of ligands is based on free energy, not any specific thermodynamic component or interaction type. These trends emphasize the need to monitor both the impacts of ligands on specific interactions with the protein along with conformational heterogeneity of the protein. Additionally, these results suggest that specific interactions can be tuned to rigidify a binding site. Paired with our findings of the relationship between order parameters in binding site and distant residues, ligand impacts are likely propagated throughout the protein. Ligands with more specific interactions, thus a less flexible binding site, will likely have a corresponding increase in conformational heterogeneity distant from the binding site.

Reduced ligand occupancy and conformational heterogeneity

One potential confounder for quantifying the change in conformational heterogeneity of binding site residues is that the ligands may not be fully occupied in the crystal. There were 193 structures with ligands with alternative conformations or partially occupied ligands in our datasets (Figure 4D). Of these 193, 125 ligands had less than full occupancy, whereas 68 had alternative conformations that amounted to full occupancy. The vast majority of ligands (n = 425) were modeled originally with full occupancy. Fully occupied ligands were associated with more rigid binding sites than partially occupied ligands or ligands with alternative conformers (0.84 vs. 0.79, mean order parameters of binding site residues; p=2.9 × 10–7, individual Mann–Whitney U test; Figure 4D). There was no difference observed between the partially occupied ligands and ligands with alternative conformers (p=0.15, individual Mann–Whitney U test). We also explored if partially occupied ligands were associated with more rotamer changes between holo and apo pairs, but no significant difference existed (80% vs. 85%, median percentage of the same rotamer; Figure 4—figure supplement 1B).

While the scattering contributions of B-factor and occupancy changes are subtle (but distinct), most models likely include true occupancy changes as elevated B-factors. We observed a wide range of average ligand B-factors and, as expected, a lack of correlation between the ligand B-factors and ligand occupancy (van Zundert et al., 2018; Bhat, 1989; Carugo, 1999). As a proxy for likely partially occupied ligands, we normalized the ligand B-factor by the mean C-alpha B-factor to identify ligands with higher B-factors than expected (Figure 4—figure supplement 1C). We examined the outer two quartiles of the normalized ligand B-factors (>0.016 vs. <0.005, median normalized B-factor). In these ‘likely partially occupied’ ligands, we observed greater conformational heterogeneity (0.86 vs. 0.80, mean order parameter; p=1.6 × 10–11; individual Mann–Whitney U test, Figure 4E). In structures with modeled partially occupied ligands and likely partially occupied ligands, we learned that binding site residues tend to have more apparent conformational heterogeneity, likely due to a combination of compositional and conformational heterogeneity.

Conformational heterogeneity for multiple ligands to CDK2

To better understand our findings in the context of multiple, diverse ligands binding to a single receptor, we examined cyclin-dependent kinase 2 (CDK2), a cyclin kinase family that regulates the G1 to S transition in the cell cycle. Our dataset contains 13 protein inhibitor complexes, including both type I and type II inhibitors, all of which share the same apo model (PDB ID: 1PW2). We hierarchically clustered the residues and ligands by difference in order parameters between the holo and apo models, identifying three distinct clusters of residues. The first cluster (blue, Figure 5A and B), consisting of 13 residues, is rigidified upon ligand binding. This cluster included residues scattered throughout both the N- and C-lobes of CDK2 that rigidified upon ligand binding. Two notable residues in this cluster, Glu127 and Val18, contact the inhibitors. Upon ligand binding, Val18 transitions from multiple conformers to a single conformation. Glu127 has a similar conformation in the apo and type II structures of two distinct alternative side-chain rotamers, whereas in the type I inhibitor structure, the alternative conformers cluster around the same rotamer (Figure 5—figure supplement 1A and B).

Figure 5 with 2 supplements see all
Conformational change and heterogeneity in CDK2.

(A) The clustermap of all residues in the 13 CDK2 protein/ligand pairs. Red values indicate a negative difference (holo-apo) in order parameters, indicating that the holo structures have more heterogeneity compared to the apo. Blue values indicate positive differences, indicating that the apo structures have more heterogeneity compared to the holo. We highlighted three important clusters, the left red cluster, middle salmon cluster, and right blue cluster. (B) A representative structure (PDB: 3QTW) is shown with each residue colored by the difference in order parameter, corresponding to the same coloring scheme as the clustermap. The three distinct clusters (dark red, salmon, blue) are shown in spheres. (C) Many of the key differences between type I inhibitor (PDB: 3QTW) and type II inhibitor (PDB: 1PXI) are located in the DFG motif, P-loop, and activation loop. The type II inhibitor structure is colored in gray, and the type I inhibitor is colored as the difference in order parameters between the type I inhibitor and type II inhibitor structures. Red signifies a more dynamic region in the type I inhibitor structure, and blue signifies a less dynamic region in the type I inhibitor structure. Changes in the DFG motif, propagates changes, both structural and in dynamics, in the P-loop (highlighted by Tyr15), which propagates even larger changes in the activation loop between the two inhibitors, including changes in conformation of Thr161, the phosphorylation site of CDK2. (D) Threonine 161, the phosphorylation site for CDK2. We looked at the supporting density for specific residues between the apo (PDB: 1PW2, purple), type II (PDB: 1PXI, teal), and type I (PDB: 3QTW, salmon) inhibitors. 2Fo-Fc electron density is shown at 1 sigma. Occupancies of the alternative conformers are labeled with the corresponding color. The apo structure has multiple conformations, whereas the type I model only has one, and the type II model has two very similar conformations, but these are in different rotamer states compared to the apo.

The second cluster (salmon, Figure 5A and B) consists of 14 residues that increase flexibility upon ligand binding. The majority of these residues connect the P-loop and the activation loop (Figure 5B). The electron density is very weak for many of these residues in most of the holo structures, driving their modeling in multiple conformations and elevated B-factors (Figure 5—figure supplement 1C). We also observed that many of these residues had side chain to side chain hydrogen bonds that were lost upon ligand binding (Figure 5—figure supplement 2A). The third cluster (dark red, Figure 5A and B) comprises five residues that became more flexible in all, but two holo datasets, which are the only type II inhibitors in the dataset. These were all located on the activation loop of the kinase (Figure 5C). As type II inhibitors, the two molecules (PDB: 1PXI [ligand: CK1] and PDB: 3QQL [ligand: X03]) bind the DFG out of conformation present in the apo dataset (PDB: 1PW2) and do not have as drastic a rigidifying effect as the type I inhibitors. Notably, these two inhibitors were also smaller than the type I inhibitors and the reduced contacts may also drive some of this effect. We also observed that the hydrogen bonds gained in the holo structure are inhibitor specific (Figure 5—figure supplement 2B and C).

The multiconformer models also provide a structural rationale for these changes. The differences in DFG conformation change the contacts with the P-loop, which allow for greater side-chain flexibility in the ‘up’ form compatible with type I inhibitors. The interface between the P-loop and the activation loop is weaker and residues such as Tyr155 adopt multiple conformations. At the base of the activation loop, Thr161, a critical phosphorylation site, changes conformation, with a rigidifying effect common to both type I and II inhibitors (Figure 5D, Figure 5—figure supplement 2D). The conformation of Thr161 found in the type II inhibitors overlaps, with one of the conformations populated in the multiconformer apo model. In contrast, the type I inhibitors adopt a distinct conformation. This case study highlights how modeling information present in the density can reveal changes beyond those in single-conformer structures.

Discussion

By creating a large dataset of stringent matched pairs of apo and holo multiconformer models, we identified a pattern of conformational heterogeneity consistent with smaller-scale studies of individual proteins (Caro et al., 2021). We observed that individual proteins greatly varied in amount and direction of change of conformational heterogeneity, as observed in previous studies (Caro et al., 2017). In general, we found that binding site residues tend to become more rigid upon ligand binding. But similar to the entire protein, there was a large range of effects, including many sites becoming more flexible when bound to a ligand. The trends suggest that disorder-order transitions between binding site residues and distant residues are common and potentially a selected property of many proteins (Kim et al., 2017). Specifically, our data suggests that some of the entropy lost by the rigidification incurred by binding site residues upon ligand binding can be compensated with an increase in disorder in distant residues. This finding generalizes that the phenomenon has been observed in single-protein analyses with NMR and MD simulation (Wang et al., 2019; Gohlke et al., 2004; Moorman et al., 2012). Both theoretical and experimental analyses suggest that the relationship between local packing optimization and small voids that permit alternative conformations will be key to predictably mapping this relationship (Caro et al., 2021; Bromberg and Dill, 1994). Using temperature or pressure as perturbations during X-ray data collection can help to further map the connection between packing ‘quality’ and side-chain conformational heterogeneity in greater detail (Caro et al., 2021). While NMR order parameter studies only take into account movement that is shorter than the tumbling time for the protein (Hoffmann et al., 2021; Gagné et al., 1998), our results are insensitive to timescale. In addition, it is quite likely that our use of cryo-cooled structures causes an underestimate of the heterogeneity occurring in these datasets (Fraser et al., 2011), and may potentially bias our results by locking in certain populations of the protein ensemble. This effect may particularly impact areas of the protein and surrounding solvent that go from a preorganized, low-energy state to a more dynamic state as observed in galectin-3 and Barnase (Diehl et al., 2010; Caro et al., 2021). This study can also serve as a template to investigate other perturbations, including mutations, pressure, or temperature.

We observed a complex interplay between conformational change and dynamics in our analysis of 13 inhibitor-bound datasets of the kinase CDK2, in the same crystal form and space group. The ability to explore one protein with multiple ligands highlights the utility of crystal systems amenable to isomorphous soaking or co-crystallization (Steuber et al., 2006). We identified differences in conformational heterogeneity between type I and type II inhibitors that can be classified along with well-known changes, such as differences in the DFG motif. Tuning distal site dynamics may be a viable strategy for modulating the affinity of kinase inhibitors and affects the pattern of protein-protein interactions on distal surfaces, which is of critical importance in CDK inhibitor development (Jhaveri et al., 2021; Wood and Endicott, 2018).

We note that our work is not sensitive to many facets of the complex changes associated with ligand binding (Mobley and Dill, 2009). Our stringent resolution matching criterion may also render us blind to the most severe effects on conformational heterogeneity, whereby ligand binding causes a more widespread change, leading to a loss or gain of diffraction power. In addition, water molecules play an important role in ligand binding, both in the release of ordered water molecules contributing to binding through entropy and in forming specific interactions (Breiten et al., 2013; Verteramo et al., 2019). Additionally, ligand conformational heterogeneity has been highlighted by several recent studies (van Zundert et al., 2018; Jain et al., 2020; Caldararu et al., 2021). Another caveat in our analysis is the limitations of qFit modeling for modeling extensive backbone heterogeneity into weak electron density. Ensemble modeling methods, which leverage molecular dynamics for sampling and use a different model representation, may be more appropriate for examining these systems (Burnley et al., 2012; Eshun-Wilson et al., 2019). Future work, integrating the conformational heterogeneity of the protein, ligand, and water molecules, will create better predictions and explanations of the energetics of binding. In addition, this would allow us to interpret the impact of specific interactions and alterations on both the entropy and enthalpy of all components of the system.

Our study, as well as previous NMR studies (Frederick et al., 2007; Caro et al., 2017), only leverages a limited set of side-chain dihedral angles. However, comparisons with molecular dynamics simulations suggest that small sets of side-chain dihedrals alone may be representative of the overall changes in dynamics of the system (Wand and Sharp, 2018; Kasinath et al., 2013; Chatfield and Wong, 2000). What is the thermodynamic impact of restricting side-chain conformational heterogeneity? Protein folding studies and theory indicate that restricting the rotamer of even a single side chain can incur an entropic penalty of binding of ~0.5 kcal/mol (Doig and Sternberg, 1995). While we observe many such restrictions in binding sites due to specific interactions with ligands, our data point to corresponding changes away from the binding site that help balance this cost. Overall, the median increase in rigidity we observe in binding site residues (0.03 order parameter increase) would create an energetic penalty of approximately ~0.1–0.5 kcal/mol, based off of the entropy meter calculated in Caro et al., 2017 and Caro et al., 2021, with outliers having even larger thermodynamic consequences. Given the constraints of maintaining a folded conformational ensemble upon ligand binding, it is likely that ligand binding generally acts to restrict degrees of freedom locally and that protein topological constraints favor increased motion in distal regions (Bromberg and Dill, 1994). This overall effect likely manifests because optimizing affinity is desirable for medicinal chemistry and the selective pressures experienced by many proteins. Such optimization is insensitive as to whether the free energy is driven enthalpically or entropically. However, given the attention paid to designing and optimizing enthalpic interactions, there is likely unleveraged potential in optimizing the entropic component as well. As more sophisticated models of conformational heterogeneity are created and validated (Rosenbaum et al., 2021), the strategy of rationally tuning conformational heterogeneity to improve binding affinity may be an attainable design strategy.

Materials and methods

Dataset

Request a detailed protocol

Our dataset was compiled using a snapshot of the PDB (Berman, 2002) in September 2019, containing 156,187 structures. We then selected structures that had a resolution better or equal to 2 Å (n = 64,557). We also excluded any structure that contained nucleic acids (n = 2280) or covalently bound ligands (n = 1030). We identified holo structures (n = 30,530), defined as those that contained at least one ligand, defined as any HETATM residue with 10 or more heavy atoms, excluding common crystallographic additives.

To create apo/holo pairs, we took each holo structure and compared it to each potential apo structure (n = 30,717), defined as structures without a ligand bound. A pair was defined according to the following criteria:

  • Same space group.

  • Exact sequence or exact sequence after removing the first or last five base pairs of either structure.

  • A resolution difference between the two structures less than 0.1 Å.

  • Dimensions of unit cells do not differ by more than 1 Å

  • Angles of the unit cells do not differ by more than 1°.

This gave us 15,214 pairs. We then subsetted this list down to provide only one apo structure per holo structure, based on the smallest resolution difference. This produced a final pair set of 1205 with 1143 unique structures (Supplementary file 2).

We also created a pairset with 458 unique apo/apo pairs using the same criteria as the apo/holo pairset (Supplementary file 3).

Refinement

Request a detailed protocol

We re-refined all structures using phenix.refine (https://www.phenix-online.org/documentation/reference/refinement.html). This was done using Phenix version 1.17.1–3660. We performed anisotropic refinement on all pairs where both PDBs had a resolution better than 1.5 Å. All other refinement was run isotropically. Refinement used the following parameters:

  • Refine strategy: individual sites + individual adp + occupancies.

  • Number of macro cycles: eight.

  • NQH flips: true.

  • Optimize xyz weight: true.

  • Optimize adp weight: true.

  • Hydrogen refine: riding.

We removed 102 structures because of incompatibility with our re-refinement pipeline due to breaks in the protein chain or ligand incompatibility. We removed 88 structures where the R-free increased by >2.5% compared to the value reported in the PDB header (Supplementary file 1; Figure 1—figure supplement 2D).

Running qFit

Request a detailed protocol

qFit-3.0 (Riley et al., 2021a; version 3.2.0) was run using a composite omit map and the re-refined structure on the default parameters (https://github.com/ExcitedStates/qfit-3.0/). We ran qFit on Amazon Web Services (AWS). We used an autoscaling cluster of images controlled by the scheduler via ParallelCluser. Please see the qFit GitHub for a script that will install qFit on AWS’s default OS image using Conda to install its dependencies.

After qFit, we reran refinement as suggested by qFit-3.0. Briefly, this involves three rounds of refinement. The first refines coordinates only, the second goes through a cyclical round of refinement until the majority of low-occupancy conformers are removed, and the third refinement polishes the structure, including hydrogen. The script used for post-qFit refinement can be found at https://github.com/ExcitedStates/qfit-3.0/blob/master/scripts/post/qfit_final_refine_xray.sh. We removed 100 structures because of incompatibility with refinement after qFit rebuilding.

Quality control

Request a detailed protocol

From our original dataset (n = 1205 pairs), we removed 28 apo structures that had a crystallographic additive or amino acid in the binding site that partially overlaid with the holo structure. We set a minimum ligand occupancy threshold of 0.15, which did not remove any pairs from our dataset. Chains were renamed according to the difference in distance between the two chains. We also renumbered each chain based on the apo structure. We then superimposed the two structures using the PyMOL align function. We measured the alpha carbon RMSD between the two structures as well as the difference in just binding site residues. Structures were removed if the mean RMSD of the entire structure was greater than 1 Å or if the mean RMSD in the binding site residues was greater than 0.5 Å. We removed two pairs based on these RMSD criteria.

We also assessed the difference in R-free values for each refinement step (before/after qFit). If the post-refinement R-free value was 2.5% larger than the pre-refinement R-value, then the structure was removed (n = 85,77 structures removed; Figure 1—figure supplement 2A and B). Additionally, we compared the final R-free values between apo and holo pairs, removing pairs with R-free values with more than a 5% difference (n = 16 pairs removed; Figure 1—figure supplement 2C). We ran the clashscore function out of MolProbity (Williams et al., 2018) to identify severe clashes in our dataset. We removed any structures with a clashscore greater than 15, removing 52 structures. After filtering out pairs that failed our quality checks, our dataset contained 743 matched apo/holo pairs.

Alternative conformations

Request a detailed protocol

Side chains were considered alternative conformers if there was at least one atom that was modeled with an alternative conformer. Our re-refinement procedure changes the occupancy, coordinates, and B-factors of these conformations, but it does not add or delete conformations.

B-factors

Request a detailed protocol

B-factors were assessed on a residue basis by averaging the B-factors of all heavy atoms for each residue. For residues with multiple conformations, we took the mean B-factor for all heavy atoms in each side chain, weighted by occupancy. For structures modeled anisotropically, we used the isotropic equivalent B-factor from Phenix.

Root mean squared fluctuation

Request a detailed protocol

RMSF was chosen over root mean squared deviation as many alternative conformers were predicted to have the same occupancy, thus making it difficult to define which was the main conformer. RMSF was measured for each residue based on all side-chain heavy atoms. RMSF finds the geometric center of each atom in all alternative conformers. It then takes the distance between the geometric mean of each conformer’s side-chain heavy atoms and the overall geometric center. It then takes the squared mean of all of those distances, weighted by occupancy.

Order parameters

Request a detailed protocol

Order parameters were measured for each residue (except proline and glycine) by taking into account both the angle of alternative conformers (s2angle), by measuring the chi1 angle, and the B-factors of alpha or beta carbons along with an attached hydrogen(s2ortho) (Fenwick et al., 2014). To account for differences in B-factors as resolution changes, we investigated the correlation between order parameters in 32 apo lysozyme structures ranging in resolution from 1.1 to 2 Å. We optimized the s2ortho parameter by looking for the normalization that would provide a slope closest to 1 and have the smallest root mean squared error (Figure 3—figure supplement 1A and B, Supplementary file 4). We normalized the s2ortho portion using the following equation:

s2orthonormalized=s2ortho*Bfactoralpha carbon/10(resolution)

The final order parameter reported in the article is

s2calc = s2orthonormalized*s2ang

Rotamer analysis

Request a detailed protocol

Rotamers were determined using phenix.rotalyze (Williams et al., 2018) with manually relaxing the outlier criteria to 0.1%. Each alternative conformation has its own rotamer state. Rotamers were compared on a residue-by-residue basis between the holo and apo, taking into account each rotamer state for each alternative conformation. Residues were classified as ‘no change’ if rotamers matched across the apo and holo residue, ‘distinct’ if the matched residue shared no rotamer assignments. Residues were classified as ‘remodeled-holo loss’ if distinct, additional rotameric conformations were populated in the apo residue only, and ‘remodeled-holo gain’ if distinct, additional rotameric conformations were populated in the holo residue only.

Hydrogen bond analysis

Request a detailed protocol

To assess for the changes in hydrogen bonding across all pairs in our study, we applied HBplus (McDonald and Thornton, 1994) to every multiconfomer structure. HBplus identifies hydrogen bonds when the distance between the hydrogen and acceptor is less than 3.2 Å, with a maximum angle of 90°. Since HBplus, nor any other software program we could identify, looks at hydrogen bonds in reference to alternative conformers, we split up each multiconformer PDB by alternative conformation. For example, the altA PDB contained all atoms that had an alternative conformer A as well as all atoms with no alternative conformation.

We then examined all of the hydrogen bonds for each PDB in binding site residues. We only considered hydrogen bonds between side chains or between side chains and the main chain. Hydrogen bonds were weighted based on the lowest occupancy of the acceptor or donor atom. We then controlled for the number of residues in the binding site.

Solvent-exposed surface area

Request a detailed protocol

We calculated the relative accessible surface area (RASA) using Define Secondary Structure of Proteins (DSSP) (Kabsch and Sander, 1983) with the Tien et al., 2013 definition of Max accessible surface area (MaxASA). Residues with a RASA of ≥20% were considered solvent exposed (Wu et al., 2017).

Ligand analysis

Request a detailed protocol

We obtained the ligand properties using RDkit (version 3/2/2021) by importing SDF files of each ligand in our dataset. To account for the multiple hypothesis testing, we applied a Bonferroni correction, with an alpha of 0.05, as we were testing 10 hypotheses, leaving us with a corrected significance value of 0.005.

Occupancy of the ligands was taken directly from the PDB file and corresponds to the ligand occupancy from the deposited structure. Ligand B-factors were normalized using the mean alpha carbon B-factor of all residues in the structure.

If there were multiple ligands of interest in a structure, we looked at the properties of the ligand and surrounding protein residues in chain A or in the lowest alphabetical chain.

Protein-type analysis

Request a detailed protocol

Protein names and enzyme names were extracted from UniProt Consortium, 2015. Names and properties were connected using PDB IDs.

Ringer analysis

Request a detailed protocol

Indvidual residues in the CDK2 structures were run through Ringer using mmtbx.ringer. Outputs from the csv file were then plotted using Matplotlib.

Statistics

Paired Wilcoxon test was used for all matched properties (comparing holo vs. apo matched residues or structures). Individual Mann–Whitney U test was used for all non-match properties, including ligand properties. Two-sided t-test was used to compare the significance of the slopes.

Code

Request a detailed protocol

Code can be found in the following repositories:

Data availability

Refined models are available here: https://doi.org/10.5281/zenodo.6474333. Code can be found in the following repositories: https://github.com/fraser-lab/Apo_Holo_Analysis (copy archived at swh:1:rev:c92e50c121624b2e4ce440586225b8d7c48dfe38) and https://github.com/ExcitedStates/qfit-3.0.

The following data sets were generated
    1. Wankowicz SA
    2. de Oliveira SHP
    3. Hogan DW
    4. van den Bedem H
    5. Fraser JS
    (2021) Zenodo
    Ligand binding remodels protein side chain conformational heterogeneity.
    https://doi.org/10.5281/zenodo.6474333

References

Decision letter

  1. Axel T Brunger
    Reviewing Editor; Stanford University School of Medicine, Howard Hughes Medical Institute, United States
  2. Volker Dötsch
    Senior Editor; Goethe University, Germany

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Ligand binding remodels protein side chain conformational heterogeneity" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Volker Dötsch as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. A discussion of the forces that dominate ligand binding affinities in the context of the analysis of these matching pairs would be desirable. In particular, H-bond networks in the apo and holo forms inferred from the structural coordinate should be compared and included in the structural characterizations of rearrangements due to ligand binding. Moreover, NMR and neutron diffraction (doi: 10.1021/acs.biochem.5b00387; doi: 10.1126/sciadv.aav0482) suggest a global rearrangement of the H-bond network follows ligand binding, at least in the case of kinases. Please analyze and discuss H-bond networks.

2. What is the ligand affinity distribution and is there a correlation of affinity with the effects observed? We realize that ligand affinity is not easily accessible in the literature and may require collection "by hand" and could be an unreasonable task for the entire database of structures. However, a reasonable subset of complexes could be examined from this point of view. For example, for affinities for the Galectin system cover a wide range.

3. The "no net contribution" claim for conformational entropy is at odds with Caro et al. 2017. Possible reasons are incomplete representation of solution behavior at cryogenic temperatures and the inability of the approach used here to monitor the three "classes" of rotamer motion sensed by NMR and MD. Please comment and discuss.

4. The choice of defining the active site as all heavy atoms within 5 Å of a ligand heavy atom seems reasonable, but given the importance of this definition to the result, some exploration {plus minus} 5 Å seems warranted. For a given active site distance definition, what fraction of the residues included are considered direct contacts with substrate? How does this influence rigidity? Presumably, a plot of distance threshold vs. number of ligand contacts made over the dataset could help justify this. A related issue concerns the use of different non-binding site definitions. Several terms are mentioned-distant-binding, and far-binding-and they seem to be used interchangeably in the text and in figure labels/captions. No criteria are explicitly mentioned in one case, so this is presumably all residues outside of 5 Å (line 318, Supplementary Figure 7B), while a 10 Å, <20% solvent exposure criteria is invoked later (line 323, Figure 3C). It would be helpful to clean up this terminology and expand on the latter criteria. To what degree does the effect observed depend on this distance cutoff? It appears as though Supplementary Figure 7C and Figure 3C are quite similar-how sensitive is the difference in slopes between the two to this cutoff? Concerning the sensitivity of the linear fits, are the slopes and correlations sensitive to the exclusion of randomly chosen holo-apo pairs? Additionally, are any proteins small or oddly shaped enough to either be effectively excluded by this criteria and, more generally, how does protein size correlate with the observed effect? Please investigate the sensitivity of the conclusions to the particular selections and criteria.

5. The discussion (186-197) regarding analysis of the redistribution of rotamers and the later connection to entropy somewhat glosses over the issues of resolving not only major alternate rotamers (the focus here) but also motion within a rotamer well with or without occasional crossing into another rotamer carries significant entropy and that changes between these latter classes can contribute significant to binding entropy (Frederick et al. 2007; Caro et al. 2017; Rajeshwar et al. J. Phys. Chem. B 2021, 125, 9641). Please comment and discuss.

6. The observation (269-275) that the net change in "conformational heterogeneity" is near zero is in conflict to a room temperature observation in solution that changes in fast motion (conformational entropy) can strongly oppose, be negligible, or strongly favor the thermodynamics of ligand binding (Caro et al. 2017). Please comment and discuss.

7. The section describing conformational change and heterogeneity in CDK2 is appreciated. However, the authors allude to agreement with the work of Kim et al. 2017 on PKA but show no quantitative comparison with it (or other relatively recent work on PKA). More generally, the authors note that the results of this work are consistent with studies in various specific systems, but no direct comparisons are made. For example, Moorman et al. 2012 cited by the authors described the rigidification of lysozyme in the presence of the ligand based on NMR experiments. Figure 6 from that paper is directly analogous to the work presented in this manuscript. According to Supplementary Figure 4C of this manuscript, lysozyme is the second most represented protein in the author's dataset, but no comparisons to the work of Moorman et al. (2014) who provided a fitted fall off distance dependence for the binding of substrate (inhibitor) binding to Ras Cdc42 and also showed the presence of "sidedness" and vectoral (channeled) transmission of dynamical perturbation. Please comment and discuss.

8. While very detailed comparisons of order parameters from NMR experiments and crystallography data have been made previously in Fenwick et al. 2013 and are perhaps outside of the scope of this paper, some demonstration of agreement between the new results presented here and existing work would be much appreciated. More generally, a number of previous studies have explored the idea of propagated structural change in response to not only ligand binding but other perturbations such as pressure, temperature, saturation mutagenesis, etc.; are the spatial patterns reported here consistent with results from more orthogonal approaches in any of the proteins in the dataset? Please comment and discuss.

9. On the interface between ligand and protein, the statement (389-391) "an intuitive general picture emerges where more specific interactions, such as hydrogen bonds, are correlated with more rigid binding site residues, whereas the more non-specific interactions are correlated with more flexible binding site residues." needs some qualification. Why is a VDW contact considered "non-specific"? Do the authors imply that buried polar interactions are more stable/rigid than simple non-polar VDW interactions? Complementarity of motion in small molecule ligands is difficult to probe, as noted, but has been examined in protein-protein interactions, particularly in the calmodulin complexes (Marlow et al. Nat Chem Biol 6, 352). Please comment and discuss.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Ligand binding remodels protein side chain conformational heterogeneity" for further consideration by eLife. Your revised article has been evaluated by Volker Dötsch (Senior Editor) and a Reviewing Editor.

The manuscript has been much improved but there are some remaining issues that need to be addressed, as outlined below:

The authors explore the effect of different active site definitions on the change in binding site order parameters upon ligand binding. While the authors originally reported this response at a fixed cutoff of 5 Å, they have included new results exploring the change at integer-valued distances from 2-10 Å. This analysis reveals a trend perhaps consistent with expectation, where the median change in order parameter starts high with very tight active site definitions and falls off with increasingly large active sites. In the absence of any surprising features on this curve, the 5 Å cutoff seems reasonable. Please include this figure in the final manuscript (perhaps as a supplemental figure).

The authors evaluated the robustness of the linear regressions between order parameters of non-solvent exposed residues versus average order parameters of binding site residues for holo-apo and apo-apo pairs via bootstrap analysis. F3S2E clearly shows that the mean slopes are robust to exclusion of data in either case. This is a compelling result that greatly improves the paper. Perhaps this result should be emphasized further in the paper.

The authors demonstrated that the majority of the proteins under consideration appear to have an active site composed of a fraction of the total residues less than half of the total number of residues total, and that the number of outliers in this comparison (i.e., with more active site residues than distant residues) is small and thus unlikely to bias the result. The question "how does protein size correlate with the observed effect" is left unanswered and may be scientifically interesting, but perhaps the analysis performed is sufficient to demonstrate that there are at least no pathologies associated with active site definitions. Please comment.

Regarding the comparison to the PKA data from Kim et al. 2017 in point 7, the authors note that a direct comparison is challenging, due to the specifics of the different experiments. The authors clarify that they mean that the overall nature of the changes upon ligand binding are similar (i.e., rigidification throughout the protein). Please clarify the word "pattern" here, as it suggests something more residue-specific, although the overall statement seems fair now.

The addition of F3S2F in point 7 is welcome, although side-by-side comparison with Moorman et al. 2012 F6 is challenging to interpret. Moorman et al. identify a specific subset of contiguous residues in lysozyme which rigidify upon ligand binding, and several that relax. Do the authors intend to imply that the same residues rigidify with ligand binding as in the previous study, or are they saying that there is some more general spatial pattern that is similar? Is there no quantitative comparison that can be made? Explicitly, do the ω-class residues that show collective rigidification upon ligand binding (A11, V29, I55, L83, A90, V92, M105, A107) also rigidify based on the analysis presented here? Do the peripheral residues that relax upon ligand binding (L25, T69) also relax in the analysis presented here? It is very difficult to make such an assessment by staring at F3S2F. There may not be a matching crystal structure for chitotriose, but one might expect some overlap in the residues which rigidify or relax the most for other similar ligands. Furthermore, the question about a similar comparison with Moorman et al. 2014 for Ras Cdc42 still needs to be addressed. Please comment and discuss.

Please provide some form of quantitative comparison between the Moorman et al. 2012 results and the results here for lysozyme.

https://doi.org/10.7554/eLife.74114.sa1

Author response

Essential revisions:

1. A discussion of the forces that dominate ligand binding affinities in the context of the analysis of these matching pairs would be desirable. In particular, H-bond networks in the apo and holo forms inferred from the structural coordinate should be compared and included in the structural characterizations of rearrangements due to ligand binding. Moreover, NMR and neutron diffraction (doi: 10.1021/acs.biochem.5b00387; doi: 10.1126/sciadv.aav0482) suggest a global rearrangement of the H-bond network follows ligand binding, at least in the case of kinases. Please analyze and discuss H-bond networks.

To assess for the changes in hydrogen bonding across all pairs in our study, we applied HBplus (McDonald, I. K., and Thornton, J. M. (1994). Satisfying hydrogen bonding potential in proteins. Journal of molecular biology, 238(5), 777-793.) to every multiconfomer structure. HBplus identifies hydrogen bonds when the distance between the hydrogen and acceptor are less than 2.5 Å, with a maximum angle of 90 degrees. Since HBplus, nor any other software program we could identify, does not look at hydrogen bonds in reference to alternative conformers, we split up each multiconformer PDB by alternative conformation into separate single conformer PDB files. For example, the alt A PDB contained all atoms that had an alternative conformer A as well as all atoms with no alternative conformation, the alt B PDB contained all atoms contained all atoms that had an alternative conformer B as well as all atoms with no alternative conformations, etc.

We then examined all of the hydrogen bonds for each PDB across the entire PDB as well as in just binding site residues. We only considered hydrogen bonds between side chains or between side chains and the main chain. Hydrogen bonds were weighted based on the lowest occupancy of the acceptor or donor atom. We then controlled for the number of residues in the binding site. Figure 3- Supplement Figure 3 shows the distribution of the difference of hydrogen bonds between holo and apo structures (A). We observed a net gain of 0.06 hydrogen bonds per residue in holo binding sites, with ~10% of structures gaining one full hydrogen bond in the holo structure. This is indicative of a trend towards more conformationally stable binding sites in holo structures.

As an example of an outlier pair, we looked at (Holo: 2PLL, Apo: 2PHA) that had 2.2 more hydrogen bonds (averaged by occupancy) in the holo compared to the apo. We identified the three residue pairs that had hydrogen bonds in the holo that were not in the apo structure.

All of these were due to some of the conformations of one or both of the side chains having a dramatically different confirmation in the holo (purple) compared to the apo (green). In the first pair of residues (B), we observe W118 gains a hydrogen bond with H122 due to a conformation that is not present in the apo ensemble. In the second pair of residues (C), K64 in the apo structure does not make any hydrogen bonds with S133, but the last chi angle of K64 is more constrained in the holo ensemble. In the third pair of residues (D), alt A and B in H97 of the apo structure have a much different conformation from H97 in the holo structure. We have included the following paragraph in the paper (line 451-460) and included Figure 3- Supplement Figure 3.

The two kinase papers mentioned by the reviewers use HDX and Neutron crystallography to show that there are differences in the strengths of various hydrogen bonds. The largest conclusions from that are that: (1) the backbone hydrogen bonds in the DFG region change depending on activation status; (2) that outside the DFG, most H-bonds get stronger in the “catalytic” complex relative to the “apo” complex. Direct comparison to our data is complicated: PKA is quite diverged from CDK-2 and we are examining inhibited complexes, not models of the “catalytic” complex. Our analysis of hydrogen bonds in CDK-2 reveals a complex pattern. Many residues lose hydrogen bonds, particularly in loop regions including the activation loop (A) and there is a difference in hydrogen bonds gained that depends greatly on the inhibitor.

We have added the following to the text of the paper in lines (635-637 and 643-645):

‘We also observed that many of these residues had sidechain to sidechain hydrogen bonds that were lost upon ligand binding (Figure 5—figure supplement 2A).’

‘We also observed that hydrogen bonds gained in the holo structure are inhibitor specific (Figure 5—figure supplement 2B/C).’

We have also added Figure 5—figure supplement 2.

2. What is the ligand affinity distribution and is there a correlation of affinity with the effects observed? We realize that ligand affinity is not easily accessible in the literature and may require collection "by hand" and could be an unreasonable task for the entire database of structures. However, a reasonable subset of complexes could be examined from this point of view. For example, for affinities for the Galectin system cover a wide range.

To examine the binding affinities of the ligands in our dataset, we connected the PDB codes to ChEMBL (https://www.ebi.ac.uk/chembl/) to obtain binding data. From here, we only examined Ki as this was the most abundant type of binding information we had. We had 105 ligand-protein pairs with binding data. The range of affinities varied greatly from 0.02 nM to 486000 nM. We correlated the log(ki) values with both the binding site order parameters for the holo protein as well as the difference in binding site order parameters between the apo and holo protein. We did not observe any correlation here (see Author response image 1).

Author response image 1
The relationship between order parameters and binding affinity.

(A) The difference in average binding site order parameters (holo-apo) are not correlated with the inhibitory constant (Ki) of binding. (B) The average order parameter of the holo binding site residues are not correlated with the inhibitory constant (Ki) of binding.

We also examined binding affinities of CDK2 and Trypsin, of which we were able to obtain binding data from six and 10 protein-ligand pairs, respectively. The binding data from each of these analyses were obtained from the original paper and used the same concentration of natural ligand to determine the ki. Again, we were not able to detect any correlation between our metrics of conformational heterogeneity and the binding affinity. These results are not unexpected given the diversity of ligands we have in our dataset – and in our view – do not detract from the idea that altering protein conformational heterogeneity in response to changes within a ligand series is expected to alter binding thermodynamics. We would expect the residual order parameter to be most correlated in the context of a congeneric series of ligands, not the diverse ligands that are reported in the literature.

Author response image 2
We looked at the correlation between order parameters and the inhibitor constant (Ki) of binding for the five CDK2 protein-inhibitor complexes we had data for.

We did not observe any trends in this data. (A) The difference in average binding site order parameters (holo-apo) are not correlated with the inhibitory constant (Ki) of binding. (B) The average order parameter of the holo binding site residues are not correlated with the inhibitory constant (Ki) of binding. (C) The relationship between the residual order parameter (Distant Residues – Binding Site Residues).

Author response image 3
We looked at the correlation between order parameters and the inhibitor constant (Ki) of binding for the 10 trypsin protein-inhibitor complexes we had data for.

We did not observe any trends in this data. (A) The difference in average binding site order parameters (holo-apo) are not correlated with the inhibitory constant (Ki) of binding. (B) The average order parameter of the holo binding site residues are not correlated with the inhibitory constant (Ki) of binding. (C) The relationship between the residual order parameter (Distant Residues – Binding Site Residues).

3. The "no net contribution" claim for conformational entropy is at odds with Caro et al. 2017. Possible reasons are incomplete representation of solution behavior at cryogenic temperatures and the inability of the approach used here to monitor the three "classes" of rotamer motion sensed by NMR and MD. Please comment and discuss.

We concluded that on average, across our 743 pairs, there was no net trend in whether conformational entropy increased or decreased upon ligand binding. However, the spread away from the average trend for individual proteins suggests that there is a large distribution of changes in conformational entropy for individual proteins which is in line with Caro et al., which states ‘Conformational entropy can highly disfavor, have no effect on, or strongly favor association and it is often a large determinant of the thermodynamics of binding’.

We have updated the text in the discussion to clarify this (lines 662-664).

‘We observed that individual proteins greatly varied in amount and direction of change of conformational heterogeneity, as observed in previous studies (Caro et al., 2017).’

4. The choice of defining the active site as all heavy atoms within 5 Å of a ligand heavy atom seems reasonable, but given the importance of this definition to the result, some exploration {plus minus} 5 Å seems warranted. For a given active site distance definition, what fraction of the residues included are considered direct contacts with substrate? How does this influence rigidity? Presumably, a plot of distance threshold vs. number of ligand contacts made over the dataset could help justify this.

Note, we decided on the 5Å cut off before exploring our results. So, while this analysis shows an even stronger trend at other distances, we will keep the focus of the analysis on the 5Å cut off in the text.

To explore the difference in distance threshold, we looked at the difference in binding site order parameters (holo-apo) at nine different cutoffs (2-10Å). We see a general trend of the smaller the distance threshold, the more pronounced the increased in order parameters from apo to holo (represented by a positive value). See Figure 3—figure supplement 2.

A related issue concerns the use of different non-binding site definitions. Several terms are mentioned-distant-binding, and far-binding-and they seem to be used interchangeably in the text and in figure labels/captions.

We have gone through the paper and revised the language related to the non-binding site definitions to make sure they are uniformly labeled with “distant residues” versus “binding site residues”.

No criteria are explicitly mentioned in one case, so this is presumably all residues outside of 5 Å (line 318, Supplementary Figure 7B), while a 10 Å, <20% solvent exposure criteria is invoked later (line 323, Figure 3C). It would be helpful to clean up this terminology and expand on the latter criteria. To what degree does the effect observed depend on this distance cutoff?

We also included the following sentence in the Results section clarifying which residues were included in Figure 3- Supplement Figure 2 (line 409-410).

‘Therefore, we explored the relationship between binding site residues and distant residues, defined as those more than 10Å away from any heavy atom in the ligand.’

It appears as though Supplementary Figure 7C and Figure 3C are quite similar-how sensitive is the difference in slopes between the two to this cutoff? Concerning the sensitivity of the linear fits, are the slopes and correlations sensitive to the exclusion of randomly chosen holo-apo pairs?

Author response image 4 shows the two plots on top of each other demonstrating that the slope of the holo versus apo is steeper (-0.44 versus -0.28). This emphasizes our conclusion that the ligand perturbs a naturally occurring phenomenon seen in apo proteins alone.

Author response image 4
We plotted the distribution of the average difference in order parameters in binding site residues (holo-apo/apo-apo) versus the residual order parameter distant residues versus binding site residue to highlight the slight difference in their slopes (-0.28, apo/apo versus -0.44 apo/holo).

Additionally, we have calculated the slope of the holo versus apo line in a bootstrap manner. Here we have calculated the slope 10,000 times based on a random set of 743 datapoints, allowing for replacement. We also did the same analysis with the apo-apo dataset. Figure 3- Supplement Figure 3 shows a histogram of the holo-apo bootstrapped slope values versus the apo-apo bootstrapped slope values.

The average holo-apo slope over the 10,000 bootstrapped samples was -0.44, with a standard deviation of 0.029. The average apo-apo slope over the 10,000 bootstrapped samples was -0.29, with a standard deviation of 0.072. While there is some overlap in the bootstrapped values, the apo-apo slope mean value is more than two standard deviations away from the holo-apo mean slope value. Comparing this using a z-test, the z-value was -191.26 with a p-value of 0.0.

Additionally, are any proteins small or oddly shaped enough to either be effectively excluded by this criteria and, more generally, how does protein size correlate with the observed effect? Please investigate the sensitivity of the conclusions to the particular selections and criteria.

To assess for the differences in the number of residues considered in the binding site versus in the distant, non-solvent exposed, we calculated the ratio of the binding site residues versus the distant non-solvent exposed residues. The majority of PDBs had more residues in the distant non-solvent (represented as those with a ratio under 1 in the Author response image 5). There were a few outliers with very few distant non-solvent exposed residues. These represented both small proteins and PDBs with the ligand buried deep into the center of the protein (example: 2RCT, ligand in salmon, Autor response image 5). We then checked the δ OP in the outliers (>2 ratio between binding/distant residue), which are colored in red. They all fall within the distribution we observed from other samples.

Author response image 5

(A)The distribution of the ratio of binding versus distant residues in all structures in our dataset (B) The distribution of the relationship between the difference in binding site order parameters and distant order parameters. The outliers (>2 residue ratio are highlighted in red). (C) An example of a structure (PDB ID: 2RCT) with very few distant residues, one of the outliers in our dataset.

5. The discussion (186-197) regarding analysis of the redistribution of rotamers and the later connection to entropy somewhat glosses over the issues of resolving not only major alternate rotamers (the focus here) but also motion within a rotamer well with or without occasional crossing into another rotamer carries significant entropy and that changes between these latter classes can contribute significant to binding entropy (Frederick et al. 2007; Caro et al. 2017; Rajeshwar et al. J. Phys. Chem. B 2021, 125, 9641). Please comment and discuss.

We agree that there is significant entropy within rotamer wells, which we emphasize in the discussion of B-factors and the incorporation of B-factors in the crystallographic order parameter calculation. Since the bulk of the results focus on order parameters which combine the motion between and within rotamer wells, we added the following section to the introduction of order parameters to further drive this message home (lines 316-317).

‘Order parameters allow us to capture the conformational entropy both within and between side chain rotamer wells.’

6. The observation (269-275) that the net change in "conformational heterogeneity" is near zero is in conflict to a room temperature observation in solution that changes in fast motion (conformational entropy) can strongly oppose, be negligible, or strongly favor the thermodynamics of ligand binding (Caro et al. 2017). Please comment and discuss.

As discussed in point 3 above, we concluded that on average, across our 743 pairs, there was no trend in whether conformational entropy increased or decreased upon ligand binding for the entire dataset. However, across the pairs we observed a large distribution of changes in conformational entropy which is in line with Caro et al., which states ‘Conformational entropy can highly disfavor, have no effect on, or strongly favor association and it is often a large determinant of the thermodynamics of binding’.

We have updated the text in the discussion to clarify this (lines 662-664).

‘We observed that individual proteins greatly varied in amount and direction of change of conformational heterogeneity, as observed in previous studies (Caro et al., 2017).’

7. The section describing conformational change and heterogeneity in CDK2 is appreciated. However, the authors allude to agreement with the work of Kim et al. 2017 on PKA but show no quantitative comparison with it (or other relatively recent work on PKA). More generally, the authors note that the results of this work are consistent with studies in various specific systems, but no direct comparisons are made.

We wanted to make the connection between our findings of the dispersion and connectedness of changes observed in Kim et al., however, due to sequence differences and the fact that Kim et al. only measured certain residues, we are unable to compare these quantitatively. We have clarified that the consistency is due to the widespread nature of changes in conformational dynamics in kinases systems as a function of ligand state. We have amended the following sentence in lines 580-583.

‘This dispersed pattern is similar to the trend of rigidification that is observed by NMR in PKA upon substrate binding, suggesting that changes in conformational dynamics in kinases systems are structurally dispersed as a function of ligand state (Kim et al. 2017).’

For example, Moorman et al. 2012 cited by the authors described the rigidification of lysozyme in the presence of the ligand based on NMR experiments. Figure 6 from that paper is directly analogous to the work presented in this manuscript. According to Supplementary Figure 4C of this manuscript, lysozyme is the second most represented protein in the author's dataset, but no comparisons to the work of Moorman et al. (2014) who provided a fitted fall off distance dependence for the binding of substrate (inhibitor) binding to Ras Cdc42 and also showed the presence of "sidedness" and vectoral (channeled) transmission of dynamical perturbation. Please comment and discuss.

To compare the overall trend of differences in order parameters in a lysozyme bound to chitotriose (as observed in Moorman et al. 2012), we compared the difference in order parameters between holo and apo structure of 4XEN, which is bound to acetylchitotetraose. We observed a similar trend to Moorman et al. 2012, where there was a central core of residues that become more rigid upon ligand binding (blue).

8. While very detailed comparisons of order parameters from NMR experiments and crystallography data have been made previously in Fenwick et al. 2013 and are perhaps outside of the scope of this paper, some demonstration of agreement between the new results presented here and existing work would be much appreciated. More generally, a number of previous studies have explored the idea of propagated structural change in response to not only ligand binding but other perturbations such as pressure, temperature, saturation mutagenesis, etc.; are the spatial patterns reported here consistent with results from more orthogonal approaches in any of the proteins in the dataset? Please comment and discuss.

While we agree that these comparisons are needed, we believe they are beyond the scope of this paper. However, given the importance of this analysis, we have added to the discussion of how we would approach this (lines 686-687).

‘This study can also serve as a template to investigate other perturbations including mutations, pressure, or temperature.’

9. On the interface between ligand and protein, the statement (389-391) "an intuitive general picture emerges where more specific interactions, such as hydrogen bonds, are correlated with more rigid binding site residues, whereas the more non-specific interactions are correlated with more flexible binding site residues." needs some qualification. Why is a VDW contact considered "non-specific"? Do the authors imply that buried polar interactions are more stable/rigid than simple non-polar VDW interactions? Complementarity of motion in small molecule ligands is difficult to probe, as noted, but has been examined in protein-protein interactions, particularly in the calmodulin complexes (Marlow et al. Nat Chem Biol 6, 352). Please comment and discuss.

Hydrogen bonds have much stricter distance and angular dependencies compared to van der Waal interactions, making them more stable and rigid. We have updated the text to reflect this thinking, pointing to computational chemistry literature that comments more about the difference between van der Waal interactions and hydrogen bonding in conformational stability in molecular design (lines 528-532).

‘From these results, an intuitive general picture emerges where more specific, directional interactions, such as hydrogen bonds (Bissantz et al. 2010), are more likely to lock the corresponding protein residue in place, thus creating more rigid binding site residues(Majewski et al. 2019). Whereas the more non-specific interactions are correlated with more flexible binding site residues.’

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been much improved but there are some remaining issues that need to be addressed, as outlined below:

The authors explore the effect of different active site definitions on the change in binding site order parameters upon ligand binding. While the authors originally reported this response at a fixed cutoff of 5 Å, they have included new results exploring the change at integer-valued distances from 2-10 Å. This analysis reveals a trend perhaps consistent with expectation, where the median change in order parameter starts high with very tight active site definitions and falls off with increasingly large active sites. In the absence of any surprising features on this curve, the 5 Å cutoff seems reasonable. Please include this figure in the final manuscript (perhaps as a supplemental figure).

We agree that this figure should be included in the supplement. It is now included as Figure 3- Supplementary Figure 2. Additionally, the following is in the text at line 331-333:

We also explored the different binding site cut off values, ranging from 2-10Å observing that the tighter the binding site definition, the more drastic the difference in order parameters between holo and apo pairs (Figure 3- Supplement Figure 2).

The authors evaluated the robustness of the linear regressions between order parameters of non-solvent exposed residues versus average order parameters of binding site residues for holo-apo and apo-apo pairs via bootstrap analysis. F3S2E clearly shows that the mean slopes are robust to exclusion of data in either case. This is a compelling result that greatly improves the paper. Perhaps this result should be emphasized further in the paper.

We agree that this should be included in the paper. We have added the following in the manuscript on line 450-454:

The difference in the slope between the holo-apo and apo-apo dataset was further compared using a bootstrap analysis, demonstrating that the mean slope of the holo-apo is more than two standard deviations away from the apo-apo slope, representing the robustness in differences between the two slopes(p=0.0, z-test; Figure 3—figure supplement 3E).

The authors demonstrated that the majority of the proteins under consideration appear to have an active site composed of a fraction of the total residues less than half of the total number of residues total, and that the number of outliers in this comparison (i.e., with more active site residues than distant residues) is small and thus unlikely to bias the result. The question "how does protein size correlate with the observed effect" is left unanswered and may be scientifically interesting, but perhaps the analysis performed is sufficient to demonstrate that there are at least no pathologies associated with active site definitions. Please comment.

To analyze the impact that protein size has on the relationship between the difference in order parameters in binding site residues versus the residual order parameters in distant residues, we binned proteins based on the number of residues.

We colored each point on our original binding site residues versus the residual order parameters in distant residues based on the protein size and did not observe any clustering. We have included these results in Figure 3- Supplemental Figure 4A.

The following line was added to the manuscript on lines 430-432:

We also explored if protein size impacts our results, but did not observe any trend between protein size and order parameter correlation (Figure 3—figure supplement 3E)

Regarding the comparison to the PKA data from Kim et al. 2017 in point 7, the authors note that a direct comparison is challenging, due to the specifics of the different experiments. The authors clarify that they mean that the overall nature of the changes upon ligand binding are similar (i.e., rigidification throughout the protein). Please clarify the word "pattern" here, as it suggests something more residue-specific, although the overall statement seems fair now.

We believe that the line that discusses the comparison in the patterns observed in PKA and our kinases series adequately states that the trends are similar but not specific residues, as the residue between these proteins are different.

‘This dispersed pattern is similar to the trend of rigidification that is observed by NMR in PKA upon substrate binding, suggesting that changes in conformational dynamics in kinases systems are structurally dispersed as a function of ligand state (Kim et al. 2017)

The addition of F3S2F in point 7 is welcome, although side-by-side comparison with Moorman et al. 2012 F6 is challenging to interpret. Moorman et al. identify a specific subset of contiguous residues in lysozyme which rigidify upon ligand binding, and several that relax. Do the authors intend to imply that the same residues rigidify with ligand binding as in the previous study, or are they saying that there is some more general spatial pattern that is similar? Is there no quantitative comparison that can be made? Explicitly, do the ω-class residues that show collective rigidification upon ligand binding (A11, V29, I55, L83, A90, V92, M105, A107) also rigidify based on the analysis presented here? Do the peripheral residues that relax upon ligand binding (L25, T69) also relax in the analysis presented here? It is very difficult to make such an assessment by staring at F3S2F. There may not be a matching crystal structure for chitotriose, but one might expect some overlap in the residues which rigidify or relax the most for other similar ligands. Please comment and discuss.

Please provide some form of quantitative comparison between the Moorman et al. 2012 results and the results here for lysozyme.

To compare our results with Moorman 2012, we looked at the difference in order parameters (holo-apo) from an X-ray lysozyme structure (PDB ID: 4XEN and 4WM2), and from the data deposited with Moorman 2012 (BMRB accession codes 18304 and 18305). We compared the carbon order parameters from Moorman 2012, taking the first chi angle when multiple were measured. Only 4 out of the 8 residues that became completely rigid upon ligand binding in Moorman et al. 2012 [maroon] however the difference in order parameters we observed were much smaller (range: -0.21 to 0.23). Both L25 and T69 [orange] become more flexible upon ligand binding, however, again we see a much smaller difference (-0.09 and -0.113, respectively). Overall, we observed a weak correlation between these two datasets (slope=0.32). These results are summarized in Figure 3- Supplement Figure 4B and Supplementary Table 5.

While we agree it is good to look at general trends, the differences in timescale (NMR is sensitive only to fast motions), ligands, and crystal contacts/solution state make it difficult to compare specific residues from this paper to our investigation here.

The following was added to the manuscript at lines 469-470:

‘Specifically, this difference is greatest between binding residues and non-solvent exposed residues, as previously observed in lysozyme, however there was only weak residue to residue correlation (Figure 3—figure supplement 4A, Figure Supplement 4B) (Moorman, Valentine, and Wand 2012).’

Furthermore, the question about a similar comparison with Moorman et al. 2014 for Ras Cdc42 still needs to be addressed.

Moorman et al. 2014 studied protein-protein interactions (Cdc42Hs and the binding domain of the PAK3 kinase (PDB46)). We did not study nor infer any results or conclusions related to protein-protein interactions. Protein-protein interactions are something we are interested in studying in the future and will examine this case as part of those studies.

https://doi.org/10.7554/eLife.74114.sa2

Article and author information

Author details

  1. Stephanie A Wankowicz

    1. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
    2. Biophysics Graduate Program, University of California San Francisco, San Francisco, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4225-7459
  2. Saulo H de Oliveira

    Atomwise Inc., San Francisco, United States
    Contribution
    Conceptualization, Investigation, Methodology, Software, Writing – review and editing
    Competing interests
    is an employee of Atomwise Inc
  3. Daniel W Hogan

    Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
    Contribution
    Investigation, Resources, Software, Writing – review and editing
    Competing interests
    No competing interests declared
  4. Henry van den Bedem

    1. Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
    2. Atomwise Inc., San Francisco, United States
    Contribution
    Conceptualization, Project administration, Writing – review and editing
    Competing interests
    is an employee of Atomwise Inc
  5. James S Fraser

    Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
    Contribution
    Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review and editing
    For correspondence
    jfraser@fraserlab.com
    Competing interests
    has equity, has received consulting fees, and has sponsored research agreements with Relay Therapeutics
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5080-2859

Funding

National Science Foundation (GRFP 2034836)

  • Stephanie A Wankowicz

National Institutes of Health (GM123159)

  • James S Fraser

National Institutes of Health (GM124149)

  • James S Fraser

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank James Holton, Tony Capra, and Dan Herschlag for helpful comments and suggestions. JSF was supported by NIH GM123159 and GM124149, and a Sanghvi-Agarwal Innovation Award. SAW was supported by NSF GRFP 2034836.

Senior Editor

  1. Volker Dötsch, Goethe University, Germany

Reviewing Editor

  1. Axel T Brunger, Stanford University School of Medicine, Howard Hughes Medical Institute, United States

Publication history

  1. Preprint posted: September 21, 2021 (view preprint)
  2. Received: September 22, 2021
  3. Accepted: March 18, 2022
  4. Accepted Manuscript published: March 21, 2022 (version 1)
  5. Version of Record published: May 9, 2022 (version 2)

Copyright

© 2022, Wankowicz et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,019
    Page views
  • 373
    Downloads
  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Stephanie A Wankowicz
  2. Saulo H de Oliveira
  3. Daniel W Hogan
  4. Henry van den Bedem
  5. James S Fraser
(2022)
Ligand binding remodels protein side-chain conformational heterogeneity
eLife 11:e74114.
https://doi.org/10.7554/eLife.74114

Further reading

    1. Structural Biology and Molecular Biophysics
    Nicola Galvanetto, Zhongjie Ye ... Vincent Torre
    Research Article Updated

    Single-molecule force spectroscopy (SMFS) uses the cantilever tip of an atomic force microscopy (AFM) to apply a force able to unfold a single protein. The obtained force-distance curve encodes the unfolding pathway, and from its analysis it is possible to characterize the folded domains. SMFS has been mostly used to study the unfolding of purified proteins, in solution or reconstituted in a lipid bilayer. Here, we describe a pipeline for analyzing membrane proteins based on SMFS, which involves the isolation of the plasma membrane of single cells and the harvesting of force-distance curves directly from it. We characterized and identified the embedded membrane proteins combining, within a Bayesian framework, the information of the shape of the obtained curves, with the information from mass spectrometry and proteomic databases. The pipeline was tested with purified/reconstituted proteins and applied to five cell types where we classified the unfolding of their most abundant membrane proteins. We validated our pipeline by overexpressing four constructs, and this allowed us to gather structural insights of the identified proteins, revealing variable elements in the loop regions. Our results set the basis for the investigation of the unfolding of membrane proteins in situ, and for performing proteomics from a membrane fragment.

    1. Biochemistry and Chemical Biology
    2. Structural Biology and Molecular Biophysics
    Hemant N Goswami, Jay Rai ... Hong Li
    Research Article

    Cas7-11 is a Type III-E CRISPR Cas effector that confers programmable RNA cleavage and has potential applications in RNA interference. Cas7-11 encodes a single polypeptide containing four Cas7- and one Cas11-like segments that obscures the distinction between the multi-subunit Class 1 and the single-subunit Class-2 CRISPR-Cas systems. We report a cryo-EM structure of the active Cas7-11 from Desulfonema ishimotonii (DiCas7-11) that reveals the molecular basis for RNA processing and interference activities. DiCas7-11 arranges its Cas7- and Cas11-like domains in an extended form that resembles the backbone made up by four Cas7 and one Cas11 subunits in the multi-subunit enzymes. Unlike the multi-subunit enzymes, however, the backbone of DiCas7-11 contains evolutionarily different Cas7 and Cas11 domains, giving rise to their unique functionality. The first Cas7-like domain nearly engulfs the last 15 direct repeat nucleotides in processing and recognition of the CRISPR RNA, and its free-standing fragment retains most of the activity. Both the second and the third Cas7-like domains mediate target RNA cleavage in a metal-dependent manner. The structure and mutational data indicate that the long variable insertion to the fourth Cas7 domain has little impact to RNA processing or targeting, suggesting the possibility for engineering a compact and programmable RNA interference tool.