Binding Entropy Can Be Predicted by Crystallographic Ensembles

Charlotte A Miller; Stephanie A Wankowicz

doi:10.7554/eLife.111298.1

eLife Assessment

This study provides a useful demonstration that, at least for the systems examined, aspects of the entropic contribution to protein-ligand binding can be inferred directly from crystallographic data. In doing so, it strengthens a view of crystal structures as heterogeneous ensembles that are amenable to statistical-mechanical analysis rather than purely static models. The analytical approaches are carefully developed and transparently discussed, with thoughtful consideration of both successful and less effective methods, lending solid support to the central conclusions. However, because the analysis is based on a relatively small and narrowly sampled set of protein-ligand complexes, the generality of these findings remains speculative and will require broader validation.

https://doi.org/10.7554/eLife.111298.1.sa2

Significance of findings

useful: Findings that have focused importance and scope

Strength of evidence

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Protein-ligand binding is governed by free energy, comprising both enthalpic and entropic contributions. Yet structural interpretations of binding thermodynamics have predominantly focused on enthalpic interactions, largely neglecting entropy because it is difficult to quantify from static structural models. Here, we developed multiconformer ensemble models to analyze high-resolution X-ray crystallography structures and estimate both protein and solvent conformational entropies. These ensemble models successfully predicted experimental binding entropies measured by isothermal titration calorimetry for over 70 protein-ligand pairs across 12 proteins, revealing a strong linear correlation. Protein entropy, estimated using crystallographic order parameters that capture both harmonic and anharmonic motion, correlates linearly with experimental binding entropy. Incorporating resolution-corrected differences in water-molecule counts substantially improves predictions, demonstrating that protein and solvent contributions must be considered jointly. Analysis of water-protein hydrogen bonding networks partially explains entropic differences across complexes. These results establish that crystallographic ensembles can quantify binding entropy, enabling explicit entropic considerations in structure-based studies of molecular recognition for both functional analysis and drug design.

Introduction

Protein–ligand interactions orchestrate enzyme function and signaling throughout biology. Much of modern medicine harnesses these interactions to manipulate proteins with synthetic ligands, thereby altering disease pathways. We have long sought to understand the physical principles underlying protein-ligand interactions. However, our static structural models, and thus our physical interpretations, have historically emphasized enthalpic interactions [1]. Yet binding affinity is determined by free energy, which comprises both enthalpic and entropic contributions. The entropic component of free energy has received far less attention, primarily because entropy cannot be directly visualized or quantified from static structural models, instead requiring ensemble representations [2,3]. Quantifying binding entropy directly from structural data would enable systematic entropic optimization and help reveal the structural determinants of binding thermodynamics.

Explanations of the entropic component of ligand binding often focus on solvent molecule release [4,5]. Yet accumulating evidence indicates that protein entropy is also a major determinant of binding entropy, with contributions differing substantially between proteins and between different ligands binding to the same protein [2,6–11]. Protein conformational entropy arises from the many degrees of freedom proteins possess, much of which is derived from side-chain sampling within and between rotamer states [12–15]. This conformational multiplicity, along with solvent entropy, creates an entropic reservoir that is perturbed upon ligand binding, contributing to both protein stability and binding thermodynamics [6,12,16–21].

Developing quantitative metrics to estimate protein entropy has been challenging. Protein entropy encompasses both conformational entropy, which reflects anharmonic transitions between discrete rotamer states, and harmonic vibrational entropy, which reflects fluctuations within individual rotamer states. Accurate entropy calculations require the complete enumeration of the partition function over all conformational states, weighted by their Boltzmann factors [22]. However, this is computationally intractable due to the high dimensionality of conformational space, leading to various quantitative approximations. In molecular dynamics (MD), quasi-harmonic analyses of trajectories approximate entropy from covariance matrices but fail to capture anharmonic entropic contributions [23,24]. Further, many MD simulations do not span biologically relevant timescales [21]. Nuclear Magnetic Resonance (NMR) relaxation techniques provide site-specific measurements of disorder in methyl or amide groups on picosecond timescales [25,26]. These measurements capture probability distributions of spatial coordinates and have been correlated with entropy [7,14,27], but they face challenges of throughput, timescale, and structural degeneracy.

X-ray crystallography and cryo-electron microscopy (cryo-EM) collect data from millions of protein copies, capturing many conformational states, and therefore potentially contain significant information on the multiplicity of states and, thus, entropy. However, structural models typically represent only a single state, discarding information about the multiplicity of states that drives entropy [28]. We previously demonstrated algorithmic advances that enable the modeling of high-resolution X-ray crystallography data as multiconformer ensembles [28,29]. These ensembles explicitly capture anharmonic heterogeneity, including differences in rotamer states, through alternative conformations, as well as harmonic heterogeneity through B-factors, and thereby fit the experimental data better [29]. Critically, these models enable quantitative estimates of protein conformational entropy via crystallographic order parameters [30], yielding values that agree with NMR relaxation order parameters and thus capture the spatial distribution of side chains.

Previous studies identified a quantifiable relationship between binding entropy and protein conformational entropy as measured by NMR order parameters [30,31], leading to the development of the ‘entropy meter’ [7]. Recently, we demonstrated that within a single system, estimates of protein and solvent entropy derived from crystallographic ensembles correlated with binding entropy [8]. We also showed that protein-solvent hydrogen-bonding networks correlate with binding entropy, linking protein conformational entropy to solvent entropy through coordinated, system-level redistribution. However, whether these findings generalize beyond individual systems remains an open question.

Here, we establish that crystallographic ensembles provide quantitative entropy estimates that correlate with binding entropy across multiple systems. We show that accounting for both protein and solvent entropy is key to predicting binding entropy. These structure-based entropy metrics enable integration of this critical free energy component into both the intuition and quantification of binding thermodynamics.

Results

Dataset

We collected thermodynamic binding measurements from isothermal titration calorimetry (ITC) from our previous study and from 14 datasets representing 12 different proteins (SARS-CoV-2 macrodomain [Mac1], Cyclin-dependent kinase 2 [CDK2], Thermolysin, Major Urinary Protein 1 [MUP-1], Bromodomain-containing protein 4 [BRD4], tRNA Guanine Transglycosylase [TGT], human Carbonic Anhydrase II [HCAII], Farnesyl pyrophosphate synthase [FPS], Galectin, Transthyretin [TTR], and Bovine Trypsin [Trypsin]), totaling 75 unique protein-ligand interactions (Supplementary Table 1) [8,9,32–41]. For protein-ligand pairs with multiple ITC measurements, we averaged the measurements for subsequent analysis. For each ITC measurement, we required a corresponding structure containing the same protein-ligand pair, along with an apo structure with the same sequence and space group, a resolution within 0.3 Å, and unit cell dimensions and angles within 10% (Supplementary Table 2). While we examined additional protein-ligand pairs, most notably from the extensive work on HIV protease and Thrombin [42–45], these did not meet the stringent crystallographic criteria (all datasets examined but omitted are listed in Supplementary Table 3). All structures had resolutions better than 2.1 Å and had deposited structure factors (Supplementary Figure 1A/B). We remodeled all structures using a pipeline combining qFit multiconformer modeling and Phenix refinement (Methods) [29,46,47]. Multiconformer models were required to have an R_free below 0.25 (Supplementary Figure 1C/D). Three pairs were removed due to refinement errors.
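The apo/bound pairing criteria above can be expressed as a simple filter. The following is a minimal Python sketch, not the authors' pipeline: the `Structure` container and `is_matched_pair` function are hypothetical names, the 10% unit-cell tolerance is assumed to apply to each of the six cell parameters relative to the apo value, and the 2.1 Å cutoff is assumed to apply to both structures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Structure:
    """Hypothetical metadata container for one crystal structure."""
    sequence: str
    space_group: str
    resolution: float  # Å
    cell: Tuple[float, float, float, float, float, float]  # a, b, c, alpha, beta, gamma


def is_matched_pair(apo: Structure, bound: Structure,
                    max_resolution: float = 2.1,
                    max_res_diff: float = 0.3,
                    max_cell_frac: float = 0.10) -> bool:
    """Return True if an apo/bound pair passes the pairing criteria sketched from the text."""
    # Both structures must be better (numerically lower) than the resolution cutoff.
    if apo.resolution >= max_resolution or bound.resolution >= max_resolution:
        return False
    # Identical sequence and space group are required.
    if apo.sequence != bound.sequence or apo.space_group != bound.space_group:
        return False
    # Resolutions must agree within the allowed difference.
    if abs(apo.resolution - bound.resolution) > max_res_diff:
        return False
    # Every cell length and angle must agree within the fractional tolerance (relative to apo).
    for apo_val, bound_val in zip(apo.cell, bound.cell):
        if abs(apo_val - bound_val) > max_cell_frac * apo_val:
            return False
    return True
```

In practice the metadata would be read from each deposited entry; here it is filled in by hand only to illustrate the logic.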

Our dataset comprised proteins with bound chains ranging from 115 to 316 residues (median = 258; Supplementary Figure 2A). All bound and unbound structure pairs exhibited minimal conformational changes, with alpha carbon root mean squared deviations (RMSD) below 1.75 Å (Supplementary Figure 2B). This confirmed the absence of large conformational changes, as previously investigated by NMR [31]. Ligands ranged in molecular weight from 102.46 to 461.50 Da (median = 284.68), with hydrogen bond acceptor counts of 1–7 (median = 4) and hydrogen bond donor counts of 0–5 (median = 2; Supplementary Figure 3A).

Binding Entropy Linearly Correlates with Protein Conformational Entropy

We first aimed to determine whether protein entropy, estimated from crystallographic ensembles, correlated with binding entropy (Figure 1A). To estimate protein entropy from multiconformer structures, we calculated crystallographic order parameters, which capture the first side-chain torsion angle (χ1) of each alternative conformation, weighted by its B-factor. We computed them for all residues except glycine and proline (Methods) [48]. As previously done [7], we calculated the change in protein entropy as the average per-residue difference in order parameter multiplied by the number of residues in the structure. When we compared these structural estimates to binding entropy measurements from ITC, we observed a linear relationship, but our structural estimates systematically overestimated entropy changes by approximately fourfold. After applying a correction factor of 0.25, the structural estimates aligned much more closely with experimental values along a 1-to-1 line. Proteins showing increased conformational entropy in the bound state exhibited more entropically driven binding (Slope = 0.85, R² = 0.59; Figure 1C; Supplementary Table 4). We also explored the correlation between protein conformational entropy and the individual ITC measurements and observed that a significant correlation persisted, though with greater scatter around the median trend line, as expected given experimental variation among replicate measurements (Supplementary Figure 4). This correlation indicates that binding entropy increases by approximately 0.71 kcal/mol·K per unit of protein conformational entropy. The magnitude of these changes provides insight into their structural distribution. Given that the proteins examined have a median length of 258 residues and that the maximum conformational entropy change per residue is 0.25 kcal/mol·K (with a median closer to 0.05 kcal/mol·K), the observed entropy changes likely arise from tens to hundreds of residues shifting their conformational sampling.

Figure 1. A. We obtained binding thermodynamic parameters by ITC from 14 datasets, comprising 72 protein-ligand pairs. Using multiconformer models, we calculated estimates of protein and solvent conformational entropies to identify correlations between ITC binding entropy and structural estimates of entropy. B. Distribution of the thermodynamic binding contributions across all protein-ligand pairs ordered by entropy (n = 72). C. Correlation between protein conformational entropy as measured by crystallographic multiconformer structures and ITC-measured binding entropy. Each dot represents an individual protein-ligand complex, with each color representing a different protein (slope = 0.85, R² = 0.59). Higher values of binding entropy or protein conformational entropy indicate more unfavorable binding entropy.
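The scaling and fitting step described in this subsection amounts to multiplying the raw structural estimates by the 0.25 correction factor and performing an ordinary least-squares fit against ITC binding entropy. A minimal Python sketch under that assumption follows; `fit_line` and `scaled_fit` are hypothetical helper names, and R² is computed as 1 − SS_res/SS_tot.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ≈ intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination from residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def scaled_fit(raw_structural, itc_entropy, correction=0.25):
    """Scale raw structural entropy estimates by the correction factor, then fit against ITC values."""
    scaled = [correction * value for value in raw_structural]
    return fit_line(scaled, itc_entropy)
```

A slope near 1 after scaling corresponds to the 1-to-1 agreement described above; the inputs would be one structural estimate and one (averaged) ITC value per protein-ligand pair.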

The strong correlation between protein conformational entropy and binding entropy, observed without explicitly accounting for solvent, likely arises for two reasons. First, we and others have observed that protein and solvent entropic contributions tend to covary [4,8]: structural rearrangements that alter protein conformational entropy often simultaneously reorganize interfacial and proximal water networks. Second, for the protein-ligand pairs studied here, protein conformational entropy may simply have a larger effect than solvent entropy [4].

We also noted that protein-ligand pairs from the same protein tended to cluster within a relatively narrow range of both structural and binding entropy values, with TGT being an exception [35,49]. We suspect that this limited dynamic range is driven by the ligand series used in each system, as we obtained ITC data from papers that often explore closely related ligands bound to the same protein, leading to similar binding thermodynamics. It is also possible that this pattern reflects intrinsic thermodynamic preferences of individual proteins, with some systems favoring more entropically driven binding mechanisms and others relying more on enthalpic stabilization. Further binding-entropy analysis across a broader range of ligands is needed to determine whether this hypothesis is true.

Solvent Entropy Estimates are not Strongly Correlated with Binding Entropy

Having observed a strong linear correlation between protein entropy and binding entropy, we next investigated whether incorporating estimates of solvent entropy from crystallographic structures could further refine these predictions, as binding entropic effects arise from the collective behavior of protein, solvent, and ligand [50–52]. We first estimated changes in solvent-accessible surface area (SASA) between bound and unbound structures (Methods). Across all protein-ligand pairs, bound structures exhibited, on average, slightly greater solvent-accessible surface area (Supplementary Figure 5). This increase was primarily driven by apolar SASA, which has been hypothesized to enhance solvent entropy by displacing ordered water molecules from apolar surfaces into the bulk solvent [53]. However, we observed no strong correlation between any SASA measure and binding entropy (total ΔSASA: R² = 0.0058; apolar ΔSASA: R² = 0.017; polar ΔSASA: R² = 0.0027; Supplementary Figure 6). Previous work has linked solvent-derived heat capacity estimates to entropy estimates using the relationship ΔCp = −0.26 ΔSASA_apolar + 0.45 ΔSASA_polar [7,54], prompting us to explore whether we could identify a similar relationship. However, we found no strong correlation between changes in heat capacity and binding entropy (R² = 0.066; Figure 2A).

Figure 2. A. Estimated differences in heat capacity versus binding entropy (Slope = 0.020, R² = 0.066). B. Difference in the number of bound water molecules, corrected for resolution, versus binding entropy (Slope = -0.021, R² = 0.11).
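For reference, the heat-capacity relation used in this subsection can be evaluated per complex and compared with binding entropy as in the following minimal Python sketch. The coefficients are copied as written in the relation in the text and exposed as parameters so they can be changed; `delta_heat_capacity` and `r_squared` are hypothetical helper names, and R² is taken as the squared Pearson correlation, which equals the R² of a single-predictor least-squares fit.

```python
def delta_heat_capacity(d_sasa_apolar, d_sasa_polar, c_apolar=-0.26, c_polar=0.45):
    """Evaluate the linear ΔCp relation from bound-minus-apo SASA changes (coefficients as written in the text)."""
    return c_apolar * d_sasa_apolar + c_polar * d_sasa_polar


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Applied to one ΔCp value and one ITC binding entropy per complex, `r_squared` gives the kind of statistic reported for Figure 2A.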

We reasoned that SASA measures alone may be insufficient, as they only indicate whether a residue is solvent-exposed, without capturing the complex thermodynamic behavior of water, particularly its hydrogen-bonding capacity or ‘iceberg’-like interactions at nonpolar surfaces [50,51,55,56]. Therefore, we analyzed changes in the number of water molecules bound per residue, defined as water molecules within 3.2 Å of each residue (Methods). Across all protein-ligand pairs, most residues showed no change in bound water molecules (60.7%), while 22.8% gained and 16.5% lost water molecules (Supplementary Figure 7A). Notably, the mean difference in water molecule counts correlated with resolution differences between apo and bound structures, with the higher-resolution structure exhibiting more bound water molecules (R² = 0.38; Supplementary Figure 7B). To address this confounding factor, we performed residual analysis to regress out resolution effects (Methods; Supplementary Figure 8). After correction, the distribution reversed, with 56.7% of residues losing water molecules and 43.3% gaining them. We then determined whether the differences in the number of water molecules, controlling for resolution differences, would correlate with binding entropy. While this slightly improved the correlation with binding entropy relative to estimates of heat capacity, it still had little power to explain binding entropy (Figure 2B; R² = 0.11).

Binding Entropy can be Predicted from Protein and Solvent Entropy Estimates from Crystallographic Ensembles

We then tested whether estimates of solvent entropy, combined with protein entropy, improved the prediction of binding entropy. We first used SASA-derived values, including heat capacity; however, we did not observe any improvement over protein conformational entropy estimates alone (R² = 0.58; Figure 3A; Supplementary Figure 9). Next, we evaluated whether changes in water-molecule counts improved binding entropy predictions from crystallographic ensembles. Using a linear model, we found that resolution-corrected water counts improved prediction accuracy (R² = 0.69; TΔS = -1.44 + 0.64 × Protein Conformational Entropy - 3.3 × Difference in Number of Waters (Corrected); Figure 3B). Uncorrected differences in water counts yielded a slightly better fit (R² = 0.704; Supplementary Figure 10). The consistent trends observed across proteins and ligands demonstrate that protein conformational entropy and water counts estimated from high-resolution crystallographic ensembles can provide semi-quantitative estimates of binding entropy.

Figure 3. A. Correlation between binding entropy and values predicted using estimates of heat capacity (R² = 0.59; TΔS = -1.044 + 0.891 × [Protein Entropy] + 1.097 × [change in heat capacity]). B. Correlation between binding entropy and values predicted using the change in number of bound water molecules corrected for resolution (R² = 0.69; TΔS = -1.45 + 0.64 × [Protein Entropy] - 3.8 × [Number of Waters]). C. Protein conformational entropy and solvent differences between apo TGT (PDB: 4PUN) and two ligand-bound structures with dramatically different changes in binding entropy. Both ligands pay an entropic penalty; 4Q8T pays a lower penalty, derived from a smaller reduction in side-chain alternative conformers (visible as color differences in the stick representations in the first row) and a smaller difference in the number of water molecules in the bound structure. 4PUK has more water molecules bound per residue after resolution correction, further explaining its larger entropic penalty.
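The two-predictor model in this subsection (binding entropy regressed on protein conformational entropy and the resolution-corrected water-count difference) is an ordinary least-squares fit with an intercept. A minimal pure-Python sketch, assuming one value of each quantity per protein-ligand pair, solves the 2×2 normal equations on mean-centered data; `fit_two_predictors` is a hypothetical helper, not the authors' code.

```python
def fit_two_predictors(x1, x2, y):
    """OLS fit y ≈ b0 + b1*x1 + b2*x2; returns (b0, b1, b2, r2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Center every variable so the intercept drops out of the normal equations.
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    # Solve [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y] by the 2x2 inverse.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_res = sum((v - (b0 + b1 * a + b2 * b)) ** 2 for a, b, v in zip(x1, x2, y))
    ss_tot = sum(v * v for v in cy)
    return b0, b1, b2, 1.0 - ss_res / ss_tot
```

With x1 as protein conformational entropy and x2 as the corrected water-count difference, the returned coefficients take the form of the TΔS equation quoted above; the predictors must not be collinear, or `det` becomes zero.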

The most dramatic differences in binding entropy within a single protein were observed for TGT, with binding entropies ranging from 0.95 to 13.34 kcal/mol [35,49]. All of these TGT binding entropies impose a penalty on free energy, though to varying degrees. We aimed to visually investigate what drove these differences. In the apo state, nearly every residue exhibited alternative conformations, predominantly involving the backbone. Upon ligand binding, both complexes (4Q8T and 4PUK) showed substantial reductions in conformational heterogeneity, contributing to the entropic penalty of binding. Notably, 4PUK exhibited a larger estimated protein entropy change (11.31 vs. 5.91 kcal/mol·K) and a correspondingly larger experimental entropic penalty (13.34 kcal/mol). This complex also showed increased water occupancy (0.018 additional resolution-corrected waters per residue), whereas 4Q8T exhibited net water release (0.17 fewer resolution-corrected waters per residue). These results demonstrate that both protein conformational restriction and differential solvent reorganization contribute to the distinct binding thermodynamics of these two ligands binding the same protein (Figure 3C).

Hydrogen Bond Networks Show Moderate Correlation with Binding Entropy

We previously found that decreased residue packing and fewer protein–solvent hydrogen bonds in the bound state are associated with decreased binding entropy [8]. We aimed to assess whether these patterns generalize across different proteins, excluding the Mac1 protein-ligand pairs from this analysis because they were the source of our initial observations. To investigate the structural basis for entropic differences, we built hydrogen-bond networks, with nodes corresponding to hydrogen-bond-forming residues, and computed graph-theoretic metrics that capture network connectivity, including the number of nodes and edges (Methods). When examining protein-only features, we observed weak correlations between structural metrics and binding entropy, with the number of nodes (R² = 0.073) and edges (R² = 0.050) showing minimal predictive power (Supplementary Figure 11). Given that local packing density restricts conformational sampling and correlates with residue-level entropy in folding studies [13], we tested whether packing metrics would correlate with binding entropy. We quantified packing by counting neighboring residues within 5.0 Å of each residue’s heavy atoms. Packing density showed only a weak correlation with experimental binding entropy (R² = 0.13; Supplementary Figure 12), suggesting that the entropic penalties of binding are likely not driven primarily by static structural crowding.

However, incorporating water molecules into the hydrogen-bond network analysis (treating both protein residues and water molecules as potential hydrogen-bonding nodes) revealed stronger correlations with binding entropy (nodes [R² = 0.38] and edges [R² = 0.24]; Supplementary Figure 13). This indicates that binding events that generate more extensive hydrogen-bond networks, encompassing both protein and water, tend to be less entropically driven. The notable improvement observed when water is included in hydrogen-bonding networks further supports the idea that protein and solvent jointly mediate the entropic cost of binding. These features may also help differentiate water patterns driven by distinct thermodynamic processes. For example, a more extensive hydrogen-bonding network involving solvent and protein would more strongly restrict solvent translation than a weaker network, thereby reducing translational entropy [57]. While the correlations remain moderate, the substantial increase in predictive power over protein-only metrics underscores the need to consider the entire protein-water system when analyzing binding thermodynamics.

Discussion

We demonstrate that binding entropy can be predicted from crystallographic ensembles across diverse protein-ligand systems, enabling its quantification in structure-based drug design. We observe that accounting for both protein and solvent entropy is key to predicting binding entropy, whose contribution to binding varies widely across complexes. This variability demonstrates that explicit entropy quantification is essential for effective design. Our approach addresses this gap by estimating both contributions simultaneously using metrics that are transferable across globular proteins.

Exact entropy calculations require enumerating the partition function over all conformational states, which is computationally intractable due to high dimensionality and rugged energy landscapes. The crystallographic order parameter approach provides a practical alternative by measuring side-chain angular disorder as a proxy for conformational entropy [25,48]. Drawing on the Lipari-Szabo formalism [25], which relates order parameters to restricted angular motion, we capture local structural heterogeneity that reflects the system’s protein entropy. Further, our solvent analyses showed that metrics based on explicitly placed solvent yielded a large improvement over implicit heat-capacity estimates [58]. Water molecules resolved in crystal structures represent those with sufficient occupancy and order to be detected, likely corresponding to waters that form stable, enthalpically favorable hydrogen bonds with the protein. The loss of water molecules upon ligand binding increases their rotational and translational entropy, contributing favorably to binding entropy. Conversely, the gain of ordered waters might stabilize specific conformational states, coupling solvent organization to protein entropy [51,59,60]. The improved binding-entropy predictions achieved by combining protein conformational entropy with water counts, compared with using either metric alone, indicate that these contributions are at least partially independent. At the same time, we found that changes in solvent-protein hydrogen-bonding networks contribute to binding entropy variability beyond what protein conformational changes alone explain, suggesting that protein and solvent entropy changes may reinforce each other [51,59].

The strong linear correlation with binding entropy validates our approach as an “entropy meter” for crystallography, extending NMR-based methods to the more accessible crystallographic domain [7]. However, several limitations warrant improvement in the future. Current metrics focus on single-residue angular motion rather than whole-side-chain descriptors and do not account for collective motions between residues or between protein and solvent. Improved protein conformational entropy metrics could be more directly grounded in statistical mechanics and could help illuminate correlations between protein and solvent entropy. Further, crystallographic ensemble models currently do not fully capture the conformational ensembles in the diffraction data, even before accounting for crystallographic artifacts. This limitation is particularly acute for water molecules, where resolution-dependent effects obscure the accurate quantification of solvent entropy [25,48,61]. Although our residual regression approach corrects for systematic resolution biases, it assumes uniform impact across all waters, an assumption that may fail if resolution differences correlate with genuine changes in structural heterogeneity rather than purely technical limitations. These challenges underscore the broader need for improvements in ensemble modeling and solvent modeling methods. Finally, our dataset is dominated by closely related ligand series targeting the same proteins, potentially overweighting certain protein families and limiting the chemical diversity explored. Expanding this analysis to include more structurally diverse proteins and ligands would further test the generalizability of our findings.

Making entropy quantifiable from high-resolution crystallographic data enables thermodynamically balanced ligand optimization rather than enthalpy-focused design. We reveal that the dynamic ensemble, captured through multiconformer modeling, provides a practical means to quantify entropic contributions in structure-activity relationship campaigns, integrating this critical yet often overlooked free-energy component into rational drug design workflows. Our findings establish a foundation for entropically informed drug design, making the vision of designing ligands with optimized free energy landscapes, rather than enthalpic interactions alone, increasingly achievable.

Methods

Creation of multiconformer ensemble models

All structures were initially downloaded from PDB-REDO [62] and remodeled using qFit (version 2025.2) [29,47,63]. The qFit structures were then refined with phenix.refine (version 1.20) [47].

Crystallographic Order Parameters

Crystallographic order parameter values were computed for each residue (excluding glycine and proline) as a proxy for conformational entropy [48]. These values integrate both side-chain and backbone flexibility using two components: (1) the angle of alternative conformers (s2angle), calculated from χ1 dihedral distributions, and (2) atomic displacement data derived from normalized B-factors of alpha or beta carbons bonded to a hydrogen (s2ortho) [48]. The s2ortho term was corrected for resolution-dependent variation in B-factors using the following normalization:

The final s2calc value was then obtained by combining both components multiplicatively: s2calc = s2angle × s2ortho.

This combined metric captures both thermal motion and conformational variability, offering a robust estimate of per-residue rigidity. Values range from 0 (highly flexible) to 1 (fully rigid).
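The angular component can be illustrated with the standard Lipari-Szabo-style expression for an order parameter computed from a discrete set of weighted bond vectors, S² = Σ_a Σ_b w_a w_b P2(cos θ_ab) with P2(x) = (3x² − 1)/2, which is equivalent to the familiar (3/2)Σ_ij⟨μ_i μ_j⟩² − 1/2 form. This is a generic Python sketch, not the implementation of [48]: which bond vector is used (for example, one tied to χ1), how weights are derived from alternative-conformer occupancies, and the B-factor normalization for s2ortho (not reproduced here) are all assumptions. The product of the two components follows the multiplicative combination described above.

```python
import math


def order_parameter(vectors, weights):
    """Lipari-Szabo-style S² for a set of bond vectors with (occupancy) weights.

    S² = Σ_a Σ_b w_a w_b P2(cos θ_ab), with P2(x) = (3x² - 1) / 2 and weights normalized to sum to 1.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    # Normalize each vector so that dot products are cosines.
    units = []
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v))
        units.append([c / norm for c in v])
    s2 = 0.0
    for wa, ua in zip(w, units):
        for wb, ub in zip(w, units):
            cos_ab = sum(p * q for p, q in zip(ua, ub))
            s2 += wa * wb * (1.5 * cos_ab ** 2 - 0.5)
    return s2


def combined_order_parameter(s2_angle, s2_ortho):
    """Combine angular and B-factor-derived components multiplicatively."""
    return s2_angle * s2_ortho
```

A single conformer gives S² = 1 (fully rigid), whereas two equally occupied conformers with perpendicular bond vectors give S² = 0.25, illustrating how alternative conformations lower the order parameter.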

The change in protein conformational entropy was estimated as 0.25 × (number of residues) × (average per-residue difference in crystallographic order parameter between bound and apo structures).
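The estimate above can be sketched in a few lines of Python. Assumptions, all labeled in the code: order parameters are supplied as hypothetical dicts keyed by residue identifier, only residues present in both structures are compared (so the residue count is the number of paired residues), and the difference is taken as bound minus apo, so a positive value means rigidification on binding, which corresponds to the unfavorable direction in the Figure 1 convention.

```python
def protein_entropy_change(s2_apo, s2_bound, scale=0.25):
    """Estimate the protein conformational entropy change from per-residue order parameters.

    s2_apo, s2_bound: dicts mapping a residue identifier to its order parameter
    (0 = flexible, 1 = rigid). Only residues present in both structures are compared.
    """
    shared = set(s2_apo) & set(s2_bound)
    # Assumed sign convention: bound minus apo (positive = more rigid when bound).
    diffs = [s2_bound[res] - s2_apo[res] for res in shared]
    mean_diff = sum(diffs) / len(diffs)
    # Correction factor × number of residues × average per-residue difference.
    return scale * len(shared) * mean_diff
```

Because the number of residues multiplies the mean difference, the result equals the scaled sum of per-residue differences; writing it as N × mean mirrors the description in the text.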

Solvent Accessible Surface Area

For each atom, we computed the SASA using FreeSASA [64]. We then partitioned the surface area into polar and apolar components based on atom type. When calculating differences, we used continuous information.
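Once per-atom SASA values are available, the polar/apolar partition and the bound-minus-apo differences reduce to simple sums. The following Python sketch assumes a hypothetical input of (element, area) pairs and treats nitrogen and oxygen as polar and all other atoms as apolar; this element-based rule is an illustrative assumption, not necessarily the atom-type classification used here, and the per-atom areas themselves would come from FreeSASA.

```python
POLAR_ELEMENTS = {"N", "O"}  # assumed polar/apolar split by element; all other atoms count as apolar


def split_sasa(atoms):
    """Sum per-atom SASA values (element, area in Å²) into (polar, apolar) totals."""
    polar = apolar = 0.0
    for element, area in atoms:
        if element.upper() in POLAR_ELEMENTS:
            polar += area
        else:
            apolar += area
    return polar, apolar


def delta_sasa(apo_atoms, bound_atoms):
    """Bound-minus-apo changes in polar, apolar, and total SASA."""
    apo_polar, apo_apolar = split_sasa(apo_atoms)
    bound_polar, bound_apolar = split_sasa(bound_atoms)
    return {
        "polar": bound_polar - apo_polar,
        "apolar": bound_apolar - apo_apolar,
        "total": (bound_polar + bound_apolar) - (apo_polar + apo_apolar),
    }
```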

Water Molecule Analysis

We quantified local solvation by counting the number of water molecules within 3.2 Å of each residue in the structure. For each PDB file, all non-hydrogen atoms of protein residues were compared against the oxygen atoms of water molecules, and any water whose oxygen atom lay within 3.2 Å of any heavy atom in a given residue was counted once. This analysis generated residue-level water contact counts, providing a quantitative measure of local hydration and solvent accessibility across the protein surface.
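The per-residue water count described above can be sketched in plain Python. It assumes coordinates have already been parsed from the PDB file into a hypothetical dict mapping residue identifiers to heavy-atom coordinates, plus a list of water oxygen coordinates; comparing squared distances avoids a square root without changing the result.

```python
def waters_per_residue(residues, water_oxygens, cutoff=3.2):
    """Count waters whose oxygen lies within `cutoff` Å of any heavy atom of each residue.

    residues: dict mapping residue id -> list of heavy-atom (x, y, z) tuples.
    water_oxygens: list of water oxygen (x, y, z) tuples.
    Each water is counted at most once per residue.
    """
    cutoff_sq = cutoff * cutoff
    counts = {}
    for res_id, atoms in residues.items():
        n = 0
        for w in water_oxygens:
            # any() stops at the first atom in range, so a water is counted once per residue.
            if any(sum((a - b) ** 2 for a, b in zip(atom, w)) <= cutoff_sq for atom in atoms):
                n += 1
        counts[res_id] = n
    return counts
```

Per-residue differences between bound and apo structures would then be taken over residues present in both models.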

Water Molecule Resolution Correction

To remove systematic bias in water counts arising from differences in crystallographic resolution, we applied a resolution correction to all water count differences using linear residualization. The change in mean water count upon binding (bound minus apo) was modeled as a linear function of the resolution difference between paired structures. The expected contribution due solely to resolution was estimated by linear regression and subtracted from the observed water count change, yielding a resolution-corrected water count difference for each structure pair.
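A minimal Python sketch of the linear residualization follows, assuming one bound-minus-apo water-count change and one bound-minus-apo resolution difference per structure pair. It subtracts the full fitted line (intercept plus slope term), which is the standard definition of regression residuals; whether the intercept is also removed in the authors' correction is an assumption here. `resolution_corrected` is a hypothetical name.

```python
def resolution_corrected(delta_waters, delta_resolution):
    """Residualize water-count changes against resolution differences via a least-squares line."""
    n = len(delta_waters)
    mean_x = sum(delta_resolution) / n
    mean_y = sum(delta_waters) / n
    sxx = sum((x - mean_x) ** 2 for x in delta_resolution)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delta_resolution, delta_waters))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Subtract the resolution-explained part; what remains is the corrected difference.
    return [y - (intercept + slope * x) for x, y in zip(delta_resolution, delta_waters)]
```

Subtracting only `slope * x` instead would keep the overall mean shift in water counts; the two choices differ only by a constant offset shared by all pairs.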

Hydrogen Bonds

Hydrogen bonds were identified if the donor–acceptor distance was ≤3.5 Å or the hydrogen–acceptor distance was ≤2.6 Å, and the donor–hydrogen–acceptor angle was ≥120°. Networks were constructed as undirected graphs with residues as nodes and hydrogen bonds as edges, and were analyzed both with and without water molecules.
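The geometric criterion above can be written as a small predicate. This Python sketch assumes explicit hydrogen positions are available and reads the rule as "(either distance condition) and (angle condition)"; `angle_deg` and `is_hbond` are hypothetical helper names, and `math.dist` / multi-argument `math.hypot` require Python 3.8 or later.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    cos_theta = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def is_hbond(donor, hydrogen, acceptor, max_da=3.5, max_ha=2.6, min_angle=120.0):
    """Apply the distance-or-distance, and-angle hydrogen-bond criterion."""
    distance_ok = math.dist(donor, acceptor) <= max_da or math.dist(hydrogen, acceptor) <= max_ha
    return distance_ok and angle_deg(donor, hydrogen, acceptor) >= min_angle
```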

Network metrics included: basic properties (node/edge counts, density, degree distribution), connectivity (component analysis, clustering coefficients, transitivity), path metrics (average shortest path, diameter, radius), and centrality measures (betweenness, closeness, degree, eigenvector). Water-specific metrics quantified water-mediated versus direct protein–protein interactions. Per-residue metrics captured local connectivity and centrality within the hydrogen bonding network. All analyses were performed separately for networks that included and excluded water molecules to distinguish direct and water-mediated interactions.
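A few of the basic network metrics listed above (node and edge counts, density, degree, and connected components) can be computed from a list of hydrogen-bonded residue pairs without external libraries. This Python sketch assumes hypothetical node labels in which water molecules start with "HOH", drops intra-residue bonds (self-loops), and collapses repeated bonds between the same pair into one edge; the path and centrality metrics are omitted.

```python
from collections import defaultdict, deque


def build_network(hbonds, include_water=True, water_prefix="HOH"):
    """Build an undirected adjacency map from (node_a, node_b) hydrogen-bond pairs."""
    adjacency = defaultdict(set)
    for a, b in hbonds:
        involves_water = a.startswith(water_prefix) or b.startswith(water_prefix)
        if (not include_water and involves_water) or a == b:
            continue  # skip water contacts when excluded, and intra-residue bonds
        adjacency[a].add(b)  # sets collapse repeated bonds between the same pair
        adjacency[b].add(a)
    return adjacency


def network_metrics(adjacency):
    """Node/edge counts, density, per-node degree, and number of connected components."""
    n_nodes = len(adjacency)
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    density = 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    # Breadth-first search to count connected components.
    seen, components = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return {"nodes": n_nodes, "edges": n_edges, "density": density,
            "degree": {node: len(nbrs) for node, nbrs in adjacency.items()},
            "components": components}
```

Running the same bond list with `include_water=True` and `include_water=False` mirrors the with-water and protein-only analyses; in the protein-only graph, residues linked only through water drop out entirely.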

Residue Packing

Local packing density was quantified for each residue by counting neighboring residues within 5.0 Å using heavy-atom distances (excluding hydrogens and water). For each residue, we computed the number of neighboring residues, contact density (neighbors normalized by the number of heavy atoms in the residue), and average B-factor and occupancy values.
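The neighbor count and contact density described above can be sketched as follows, assuming a hypothetical dict mapping residue identifiers to heavy-atom coordinates with hydrogens and waters already removed. This brute-force version compares all atom pairs, which is fine for illustration; a spatial index would be preferable for full structures. The B-factor and occupancy averages are omitted.

```python
def _within(a, b, cutoff_sq):
    """True if points a and b are within the cutoff (compared as squared distances)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) <= cutoff_sq


def packing_counts(residues, cutoff=5.0):
    """Neighbor count and contact density (neighbors per heavy atom) for each residue.

    residues: dict mapping residue id -> list of heavy-atom (x, y, z) tuples (no H, no water).
    """
    cutoff_sq = cutoff * cutoff
    result = {}
    for res_id, atoms in residues.items():
        # A neighbor is any other residue with at least one heavy atom within the cutoff.
        neighbors = sum(
            1 for other_id, other_atoms in residues.items()
            if other_id != res_id
            and any(_within(a, b, cutoff_sq) for a in atoms for b in other_atoms)
        )
        result[res_id] = {"neighbors": neighbors, "contact_density": neighbors / len(atoms)}
    return result
```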

Ligand Properties

Ligand properties were computed with RDKit from ligand SDF files extracted from the PDB structures.

Data

All multiconformer models are available in the Zenodo repository: https://zenodo.org/records/18209769

Code

https://github.com/ExcitedStates/qfit-3.0

https://github.com/Wankowicz-Lab/ensemble_bioinformatic_toolkit

Data availability

Data sources are listed in Supplementary Table 1. All PDB structures and thermodynamic data used here were previously published.

Supplementary Figures

A. Resolution distribution across all PDBs (Median: 1.30 Å [IQR: 1.08 to 1.60 Å]). B. Resolution differences between paired PDBs (Median: 0.06 Å [IQR: -0.06 to 0.15 Å]). C. Rfree distribution across all PDBs (Median: 0.16 [IQR: 0.15 to 0.19]). D. Rfree differences between paired PDBs (Median: 0.010 [IQR: -0.010 to 0.027]).

A. Distribution of protein size across the dataset (number of residues). B. Alpha-carbon RMSD between bound and unbound PDB pairs (Median: 0.27 Å [IQR: 0.19 to 0.46 Å]).

Ligand information
A. Molecular weight distribution across the dataset (Median: 284.68 Da [102.46–461.50 Da]). B. Hydrogen bond acceptor count (Median: 4 [1–7]). C. Hydrogen bond donor count (Median: 2 [0–5]).

Correlation between ITC-measured binding entropy and protein conformational entropy metrics.
Each point represents the mean binding entropy from ITC experiments, with error bars indicating the range of all measurements. Linear regression parameters: slope = 0.88, intercept = 1.63, R² = 0.61.

Distribution of SASA changes across structure pairs
A. Total SASA changes (Median: 0.06, Mean: 0.05, Std: 0.62). B. Apolar SASA changes (Median: 0.16, Mean: -0.03, Std: 0.84). C. Polar SASA changes (Median: 0.07, Mean: 0.03, Std: 0.74).

Correlation of SASA changes with binding entropy
A. Total ΔSASA: R²=0.0058. B. Apolar ΔSASA: R²=0.0174. C. Polar ΔSASA: R²=0.0027.

A. Distribution of the average number of waters gained or lost between bound and unbound structures. B. Correlation between resolution differences and mean water count change per residue (R²=0.53, Slope=-1.232).

A. Distribution of the average number of waters gained or lost between bound and unbound structures, corrected for resolution differences between bound and apo structures. B. Correlation between resolution differences and resolution-corrected mean water count change per residue (R²=0, Slope=0).

Contribution of solvent-accessible surface area (SASA) components, together with protein conformational entropy, to experimental binding entropy.
A. Model including total SASA (R² = 0.588). B. Model including apolar SASA (R² = 0.589). C. Model including polar SASA (R² = 0.587).

Linear model relating uncorrected protein conformational entropy and uncorrected water count to experimental binding entropy (R² = 0.681).

Hydrogen bond network, protein only
A. Correlation between the number of nodes and experimental binding entropy (R²=0.0734). B. Correlation between the number of edges and experimental binding entropy (R²=0.0489).

Packing and entropy estimates
A. Correlation between the change in packing and protein conformational entropy estimates (R²=0.1976). B. Correlation between the average number of waters gained or lost between bound and unbound structures and protein structural entropy estimates (R²=0.1318).

Hydrogen bond network, protein and solvent
A. Correlation between the number of nodes and experimental binding entropy (R²=0.3765). B. Correlation between the number of edges and experimental binding entropy (R²=0.2423).

Additional files

Supplementary Table 1

Supplementary Table 2

Supplementary Table 3

Supplementary Table 4

Additional information

Funding

American Cancer Society


Article and author information

Author information

Charlotte A Miller
Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, United States
Stephanie A Wankowicz
Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, United States, Departments of Biochemistry & Computer Science, Vanderbilt University, Nashville, United States, Center for AI in Protein Dynamics, Vanderbilt University, Nashville, United States, Center for Structural Biology, Vanderbilt University, Nashville, United States
ORCID iD: 0000-0002-4225-7459
- For correspondence: stephanie@wankowiczlab.com

Author Notes

Competing interests: No competing interests declared

Version history

Preprint posted: January 20, 2026
Sent for peer review: March 21, 2026
Reviewed Preprint version 1: May 12, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.111298. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
