Abstract
With the advent of AlphaFold, protein structure prediction has attained remarkable accuracy. These achievements resulted from a focus on single static structures. The next frontier in this field involves enhancing our ability to model conformational ensembles, not just the ground states of proteins. Notably, deposited structures result from interpretation of density maps, which are derived from either X-ray crystallography or cryogenic electron microscopy (cryo-EM). These maps represent ensemble averages, reflecting molecules in multiple conformations. Here, we present the latest developments in qFit, an automated computational approach to model protein conformational heterogeneity into density maps. We present algorithmic advancements to qFit, validated by improved Rfree and geometry metrics across a broad and diverse set of proteins. Automated multiconformer modeling holds significant promise for interpreting experimental structural biology data and for generating novel hypotheses linking macromolecular conformational dynamics to function.
Introduction
Macromolecular X-ray crystallography and single-particle cryogenic electron microscopy (cryo-EM) can provide valuable information on macromolecular conformational ensembles. These experiments cannot capture all conformations present in solution, as many would disrupt the ability to obtain crystals or align classifiable particles1. However, careful modeling of high-resolution X-ray crystallography and cryo-EM experiments can reveal widespread conformational heterogeneity, particularly for protein side chains and local backbone regions2, 3. Discrete conformational heterogeneity of such small magnitude is significant for many biological functions, including macromolecular binding, catalysis, and allostery4–6.
While the underlying data from X-ray diffraction or cryo-EM experiments contain information on temporal and spatial averages of tens of thousands to billions of protein copies, conventional structural modeling and refinement procedures fail to capture much of this valuable information. Most depositions in the Protein Data Bank reflect only an averaged, single ground state set of atomic coordinates7, ignoring weak but potentially biologically rich signals encoding alternative conformations sampled by distinct copies of the protein in the experiment.
Ideally, we would accurately model the complete ensemble of protein conformations reflected in experimental data8. The two ways to model the conformational heterogeneity present in the sample are to create ensembles or to use alternative conformations (multiconformers)9. The PDB “ensemble” format encodes multiple complete copies of the entire system in different models within a single file. Ensemble refinement approaches are implemented in phenix.ensemble_refinement and Vagabond10, 11. In contrast, multiconformers extend the conventional single structure model by encoding each individual conformation using a distinct “alternative location indicator (altloc)” within a single model. Altlocs are assigned distinct letters and can range from single atoms to a large number of connected or non-connected residues. Refinement and validation programs treat atoms sharing the same altloc as having the ability to interact with each other and with atoms lacking an altloc. In contrast, atoms with different altlocs cannot interact. By representing the underlying heterogeneity through discrete conformations with labeled altlocs, multiconformer models encode the distribution of states that contribute to the density map. Further, when compared to ensemble methods that yield multiple complete copies of a protein11–13, multiconformer models are more interpretable and easier to modify in interactive modeling software such as Coot14.
However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling15–17 in X-ray crystallography, and errors in particle alignment and classification, poor modeling of beam-induced motion, and imperfect detective quantum efficiency (DQE) of the detector in cryo-EM18. These factors make visually distinguishing signals in Coot14 or other visualization software very difficult, especially when genuine low-occupancy signals overlap. Additionally, in X-ray crystallography, this process is iterative. Each time a new alternative conformation is placed, it can impact the entire electron density map, often requiring adjustments to previously modeled regions. The difficulty of this process can lead to burnout and human bias, where parts of the protein are carefully modeled as multiconformers, whereas other regions remain modeled as single conformers. Despite these complications, multiconformer modeling can be implemented manually or using software such as FLEXR19 or qFit, as described below.
To enable more routine and impartial multiconformer modeling, we have previously developed qFit20–22. This program leverages the ensemble-rich experimental data from density maps that are better than 2.0 angstroms (Å) resolution to automatically generate parsimonious multiconformer models20, 21. As input, qFit takes a refined single-conformer structure and a high-resolution X-ray or cryo-EM map; it then leverages powerful optimization algorithms to identify alternative protein20, 21 or ligand23 conformations.
Here, we present updates to qFit including algorithmic improvements to protein conformation selection based on the Bayesian information criterion (BIC), B-factor sampling, and updated cryo-EM scoring. Collectively, these advances enable the unsupervised generation of multiconformer models that routinely improve Rfree over single-conformer X-ray structures derived from high-resolution (better than 2.0 Å) data and improve model geometry metrics across a diverse test set. We further demonstrate that qFit can identify alternative side-chain conformations in high-resolution cryo-EM datasets. With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models, in both individual cases and large structural bioinformatics projects5, to derive ensemble-function insights.
Results
Overview of qFit protein algorithm
qFit protein is a tool that automatically identifies alternative conformations based on a high-resolution map (generally better than ∼2 Å) and a well-refined single-conformer structure (generally Rfree below 20%). For X-ray maps, we recommend using a composite omit map as input to minimize model bias24. For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more experimental.
Since our previous paper, we have made substantial improvements to the code both algorithmically (e.g., scoring is improved by BIC, and sampling of B-factors is now included) and computationally (improving the efficiency and reliability of the code).
All code and associated documentation can be found in the qFit GitHub repository (https://github.com/ExcitedStates/qfit-3.0). The version of qFit associated with this paper is 2023.1.
A – qFit residue
For each residue, qFit samples backbone conformations, dihedral angles, and B-factors. Using mixed-integer quadratic programming (MIQP) optimization and the Bayesian information criterion (BIC), we select a parsimonious multiconformer for each residue (Figure 1A). The details of each component of this procedure are outlined below. The sampling and scoring of residues can be run in parallel using Python multiprocessing.
A.1 – Backbone sampling
The qFit process begins with sampling backbone conformations (Figure 1A.1). For each residue, we perform a collective translation of backbone atom (N, C, Cα, O) coordinates. If the model has anisotropic B-factors, this translation is guided by the anisotropic B-factors of the Cβ. Alternatively, if anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 Å along each coordinate, extending to 0.3 Å, resulting in 9 to 81 distinct backbone conformations for further analysis. For Gly and Ala, this is the only sampling that occurs.
A.2 – Aromatic angle sampling
For aromatic residues (His, Tyr, Phe, Trp), qFit takes the conformations from the backbone step (above) and builds part of the side chain out to Cγ (prior to the aromatic ring) based on the input model coordinates (Figure 1A.2). Then, we alter the Cα-Cβ-Cγ angle (“the aromatic angle”) in steps of +/-3.75°, extending to +/-7.5°, creating 5 partial side-chain conformations per backbone conformation. For non-aromatic residues, there is no sampling of this angle. These conformers provide variability in the placement of the aromatic ring prior to dihedral angle sampling.
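As a rough illustration of the angle grid described above, the following sketch enumerates the offsets applied to the Cα-Cβ-Cγ angle. The function names and the example input angle are hypothetical and are not taken from the qFit code base; only the step size (3.75°) and maximum offset (7.5°) come from the text.

```python
def aromatic_angle_offsets(step=3.75, max_offset=7.5):
    """Return symmetric angle offsets in degrees, including the zero offset."""
    n_steps = int(round(max_offset / step))
    return [i * step for i in range(-n_steps, n_steps + 1)]


def sample_aromatic_angles(input_angle, step=3.75, max_offset=7.5):
    """Return the candidate Calpha-Cbeta-Cgamma angles around the input value."""
    return [input_angle + d for d in aromatic_angle_offsets(step, max_offset)]


if __name__ == "__main__":
    print(aromatic_angle_offsets())  # [-7.5, -3.75, 0.0, 3.75, 7.5]
    # 114.0 degrees is an arbitrary illustrative input angle.
    print(len(sample_aromatic_angles(114.0)))  # 5
```

Each backbone conformation would be paired with each of these five angles, which is why the number of partial side-chain conformers is five times the number of backbone conformers.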
A.3 – Dihedral angle sampling
The following steps occur for each χ dihedral angle for every residue (Figure 1A.3). For the first dihedral angle (χ1), the input is the backbone conformations (or for aromatic residues the backbone and “aromatic angle” conformers described above). We exhaustively sample around the χ1 dihedral angle by enumerating a conformation every 10° between rotamers. For proline, we sample the exo and endo conformations of the pyrrolidine ring, by +/-60° in steps of 10°. We then eliminate conformations that clash with other parts of the same sampled conformation (based on hard spheres) or are redundant (using an all-atom RMSD threshold of 0.01 Å).
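The redundancy filter mentioned above can be sketched as a greedy pass that keeps a conformer only if it differs from every previously kept conformer by more than the RMSD threshold. This is a minimal illustration under stated assumptions: conformers are represented as plain lists of (x, y, z) tuples in matching atom order, and the greedy ordering is our choice rather than a documented property of qFit.

```python
import math


def rmsd(conf_a, conf_b):
    """All-atom RMSD between two conformers with identical atom ordering."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def remove_redundant(conformers, threshold=0.01):
    """Keep conformers that are more than `threshold` (in Angstrom) from all kept ones."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) > threshold for other in kept):
            kept.append(conf)
    return kept


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    b = [(0.001, 0.0, 0.0), (1.501, 0.0, 0.0)]  # nearly identical to a
    c = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]      # clearly distinct
    print(len(remove_redundant([a, b, c])))  # 2
```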
These sampled conformations are then subjected to a quadratic programming (QP) optimization, which identifies the set of conformations whose weighted calculated density best fits the experimental electron density. The output of QP typically yields 5 to 15 conformations that best explain the density.
Next, qFit samples the B-factors of the conformers. The input atomic B-factors are multiplied by a factor ranging from 0.5 to 1.5 in increments of 0.2. The resulting 50 to 150 conformation/B-factor combinations are subjected to a mixed-integer quadratic programming (MIQP) optimization. The MIQP algorithm incorporates two additional constraints relative to QP: a cardinality term, which limits the maximum number of conformations to five, and a threshold term, which stipulates that no individual conformation can have an occupancy weight below 0.2. In qFit, MIQP then outputs up to 5 conformations.
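To make the cardinality and threshold constraints concrete, the sketch below replaces the MIQP solver with a brute-force search over a coarse occupancy grid. It is only a stand-in for illustration: qFit uses a real optimizer, and the grid step, the constraint that occupancies sum to at most 1, and all function names here are assumptions. Densities are represented as short lists of voxel values.

```python
from itertools import product


def rss(target, candidates, weights):
    """Residual sum of squares between the target and the weighted candidate sum."""
    total = 0.0
    for i, observed in enumerate(target):
        model = sum(w * cand[i] for w, cand in zip(weights, candidates))
        total += (observed - model) ** 2
    return total


def select_conformers(target, candidates, max_conformers=5,
                      min_occupancy=0.2, step=0.1):
    """Brute-force stand-in for MIQP on a coarse occupancy grid.

    Enforces the two constraints described in the text: at most
    `max_conformers` non-zero weights, each at least `min_occupancy`.
    """
    n_levels = int(round(1.0 / step))
    levels = [round(i * step, 10) for i in range(n_levels + 1)]
    allowed = [w for w in levels if w == 0.0 or w >= min_occupancy - 1e-9]
    best = None
    for weights in product(allowed, repeat=len(candidates)):
        n_used = sum(1 for w in weights if w > 0)
        if n_used == 0 or n_used > max_conformers:
            continue
        if sum(weights) > 1.0 + 1e-9:  # assumed: total occupancy cannot exceed 1
            continue
        score = rss(target, candidates, weights)
        if best is None or score < best[0]:
            best = (score, weights)
    return best


if __name__ == "__main__":
    c1, c2, c3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    # A 0.9/0.1 mixture: the 0.1 component falls below the occupancy threshold.
    print(select_conformers([0.9, 0.1, 0.0], [c1, c2, c3]))
```

The exhaustive search scales exponentially with the number of candidates, which is why a dedicated MIQP solver is needed for the conformer counts described above.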
For residues with subsequent dihedral angles, the conformations selected by the MIQP procedure at χ(n-1) serve as the starting conformers for sampling the χ(n) angle. For residues with only one dihedral angle (Ser, Cys, Thr, Val, Pro), we proceed directly to scoring χ1.
A.4 – Final qFit residue Scoring
Upon reaching the terminal dihedral angle, we perform the optimization steps outlined above (QP/MIQP), but instead of relying only on the optimization algorithm to decide on the number of conformations to output, we also consider the model complexity (Figure 1A.4). qFit runs the MIQP step 5 times with a cardinality term ranging from 1 to 5.
Taking each output, we calculate the Bayesian information criterion (BIC). The BIC provides a numerical value of the tradeoff between the difference between the calculated and experimental density (residual sum of squares) and the number of parameters (k). The number of parameters (k) is defined by the following: number of conformers * number of atoms * 4 (representing the x, y, z coordinates and B-factor). A heuristic scaling factor of 0.95 accounts for the fact that the coordinate parameters are not independent due to chemical constraints between atoms during sampling.
qFit then outputs the set of conformations with the lowest BIC value, concluding the qFit residue routine.
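The model-selection step can be sketched as follows. The text does not give the exact BIC expression, so this example assumes the common Gaussian-error form, BIC = n·ln(RSS/n) + k·ln(n), with n taken as the number of density values being fit and the 0.95 scaling factor applied directly to k. Both choices, along with the function names and the toy RSS values, are assumptions for illustration.

```python
import math


def n_parameters(n_conformers, n_atoms, scale=0.95):
    """k = conformers * atoms * 4 (x, y, z, B), scaled by the heuristic factor."""
    return scale * n_conformers * n_atoms * 4


def bic(rss, n_obs, k):
    """Assumed Gaussian-error form of the BIC: n*ln(RSS/n) + k*ln(n)."""
    return n_obs * math.log(rss / n_obs) + k * math.log(n_obs)


def best_cardinality(rss_by_cardinality, n_atoms, n_obs):
    """Return the cardinality with the lowest BIC, plus all BIC values."""
    scores = {
        card: bic(rss, n_obs, n_parameters(card, n_atoms))
        for card, rss in rss_by_cardinality.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy residuals from five hypothetical MIQP runs (cardinality 1 to 5).
    toy_rss = {1: 10.0, 2: 2.0, 3: 1.95, 4: 1.94, 5: 1.94}
    best, scores = best_cardinality(toy_rss, n_atoms=12, n_obs=500)
    print(best)  # 2
```

In this toy case the second conformer reduces the residual substantially, while further conformers barely change it, so the parameter penalty favors two conformers.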
B. Connecting residues together into a multiconformer model
After the sampling and scoring of each individual residue, qFit considers the entire protein together. First, we use MIQP and BIC to select the best fitting conformations among connected residues, ensuring that neighboring backbone conformations have the same occupancy. Second, we label the alternative conformers while being aware of clashes. This labeling step is not parallelized.
B.1 – qFit segment
After identifying the optimal conformations for each residue in parallel, qFit reconnects the backbone atoms (Figure 1B.1). Moving linearly along the protein, we identify ‘segments’ of residues with multiple backbone conformations, delimited on each end by a residue with a single backbone conformation. The main reason for this step is to find a harmonious set of occupancies for adjacent residues in a segment. Within each segment, qFit creates fragments of three residues, enumerating all possible combinations of conformations in those residues, and selects the final combination of conformations and their relative occupancies using the optimization algorithms outlined above. The BIC is modified for qFit segment such that k equals the number of conformations. qFit then moves along the segment, enumerating and selecting optimal combinations of fragment conformations until reaching the end of the segment.
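The segment and fragment bookkeeping described above can be sketched in a few lines. The input representation (a list of backbone conformer counts per residue, and lists of conformer labels per residue) and the function names are hypothetical; the subsequent occupancy optimization is not shown.

```python
from itertools import product


def find_segments(n_backbone_confs):
    """Return inclusive (start, end) index pairs for runs of residues
    that have more than one backbone conformation."""
    segments, start = [], None
    for i, n in enumerate(n_backbone_confs):
        if n > 1 and start is None:
            start = i
        elif n <= 1 and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(n_backbone_confs) - 1))
    return segments


def fragment_combinations(conformer_labels):
    """Enumerate every combination of conformers across a fragment's residues."""
    return list(product(*conformer_labels))


if __name__ == "__main__":
    print(find_segments([1, 2, 2, 3, 1, 1, 2, 1]))  # [(1, 3), (6, 6)]
    fragment = [["a", "b"], ["a", "b"], ["a", "b", "c"]]
    print(len(fragment_combinations(fragment)))  # 12
```

Limiting fragments to three residues keeps this enumeration small: with at most five conformers per residue, a fragment has at most 125 combinations.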
B.2 – qFit relabel
Next, qFit determines the correct altloc labeling (A, B, C, D, E) of coupled alternative conformers using Monte Carlo optimization with a simple steric model to prevent spatially adjacent conformers from sterically clashing (Figure 1B.2). There is also an option (‘qFit segment only’) to input a multiconformer model and run only the qFit segment and relabel procedures. This procedure can be especially helpful after manually adding or deleting conformations in Coot14. Running ‘qFit segment only’ will adjust the occupancy of the remaining conformations and correct the labeling of alternative conformations.
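A toy version of the relabeling idea is sketched below: altloc labels within a residue are swapped at random, and a swap is accepted by a Metropolis criterion on the number of clashing conformer pairs that share a label. The clash representation, the temperature, the swap move, and the function names are all assumptions made for illustration and do not describe the steric model or move set used by qFit.

```python
import math
import random


def clash_energy(labels, clashes):
    """Count clashing conformer pairs that carry the same altloc label.

    labels[r][c] is the altloc of conformer c in residue r; `clashes` is a
    set of ((r1, c1), (r2, c2)) pairs that overlap sterically.
    """
    return sum(
        1 for (r1, c1), (r2, c2) in clashes if labels[r1][c1] == labels[r2][c2]
    )


def relabel(labels, clashes, n_steps=500, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over within-residue label swaps."""
    rng = random.Random(seed)
    current = [list(row) for row in labels]
    energy = clash_energy(current, clashes)
    best, best_energy = [list(row) for row in current], energy
    movable = [r for r, row in enumerate(current) if len(row) > 1]
    for _ in range(n_steps):
        if not movable or best_energy == 0:
            break
        r = rng.choice(movable)
        i, j = rng.sample(range(len(current[r])), 2)
        current[r][i], current[r][j] = current[r][j], current[r][i]
        new_energy = clash_energy(current, clashes)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best, best_energy = [list(row) for row in current], energy
        else:
            # Rejected move: undo the swap.
            current[r][i], current[r][j] = current[r][j], current[r][i]
    return best, best_energy


if __name__ == "__main__":
    labels = [["A", "B"], ["A", "B"]]
    clashes = {((0, 0), (1, 0)), ((0, 1), (1, 1))}
    print(relabel(labels, clashes))
```

Because atoms with different altlocs are treated as non-interacting, moving two clashing conformers onto different labels removes the clash without changing any coordinates.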
B.3 – qFit refinement
The raw output of qFit (a multiconformer model) should then be refined. We provide scripts for a refinement procedure with Phenix25, where we iteratively refine the occupancy, coordinates, and B-factors, removing conformations with occupancies under 10%. Once the model is stable (has no conformations with occupancies less than 10%), we perform a final round of refinement which optimizes the placements of ordered water molecules (Methods). This refinement protocol outputs a final ‘qFit model’. This model can then be examined and edited in Coot14 or other visualization software.
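The control flow of the iterative refine-and-prune loop can be sketched as below. The actual refinement is performed by Phenix through the provided scripts; here the `refine` argument is a caller-supplied placeholder, and `renormalize` is a hypothetical stand-in that merely rescales occupancies so the example runs on its own. Residue identifiers and occupancy values are toy data.

```python
def prune_until_stable(occupancies, refine, cutoff=0.10, max_rounds=20):
    """Alternate refinement and pruning until no conformer is below `cutoff`.

    `occupancies` maps (residue_id, altloc) to occupancy; `refine` is any
    callable that returns an updated mapping (a placeholder for a real
    refinement round).
    """
    current = dict(occupancies)
    for _ in range(max_rounds):
        current = refine(current)
        low = [key for key, occ in current.items() if occ < cutoff]
        if not low:
            return current
        for key in low:
            del current[key]
    return current


def renormalize(occupancies):
    """Hypothetical stand-in for refinement: rescale each residue to sum to 1."""
    totals = {}
    for (residue, _altloc), occ in occupancies.items():
        totals[residue] = totals.get(residue, 0.0) + occ
    return {key: occ / totals[key[0]] for key, occ in occupancies.items()}


if __name__ == "__main__":
    start = {
        ("res1", "A"): 0.55, ("res1", "B"): 0.40, ("res1", "C"): 0.05,
        ("res2", "A"): 1.00,
    }
    print(sorted(prune_until_stable(start, renormalize)))
```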
qFit improves overall fit to data relative to deposited structures
To evaluate the impact of qFit algorithmic and code improvements, we collated a dataset of single-chain, unliganded, high-resolution (1.2-1.5 Å) protein X-ray crystallography structures from the PDB26. We clustered these structures at a sequence identity threshold of 30% and selected the highest-resolution structure. Finally, we ensured that the datasets ran without error through the qFit pipeline, including refinement with Phenix, resulting in 144 diverse structures (Supplementary Figure 1).
Each deposited structure was initially re-refined using phenix.refine (Methods) to eliminate differences from the original refinement protocols. The resulting re-refined model, which we refer to as the ‘deposited model’, was used as the input for qFit. Next, we ran qFit protein using the default parameters and refinement protocol to produce the ‘qFit model’.
To evaluate the crystallographic modeling differences between the deposited and qFit models, we compared the Rfree values as an indicator of overall model/data agreement. The qFit model has a lower (improved) Rfree value for 73% (105/144) of structures (Figure 2A; Supplementary Figure 2A; Supplementary Table 1). On average, there is an absolute decrease of Rfree value by 0.6% (median deposited models Rfree: 18.1%, median qFit models Rfree: 17.5%), which is in line with theoretical expectations for the increase in model complexity created by qFit27, 28. Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling. An additional check on overfitting comes from monitoring R-gap, calculated as the difference between Rwork and Rfree. qFit models have similar R-gap values compared to deposited models (mean: 3.0% for both models). Collectively, these results indicate that qFit improves the quality of most models without overfitting (Supplementary Figure 2B).
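The summary statistics used in this comparison are simple to compute once Rwork and Rfree are tabulated per structure. The sketch below assumes R-gap is taken as Rfree minus Rwork and uses made-up values; the tuple layout and function names are illustrative only.

```python
from statistics import mean, median


def r_gap(r_work, r_free):
    """R-gap, assumed here to be Rfree minus Rwork (in percentage points)."""
    return r_free - r_work


def summarize(models):
    """Summarize (dep_rwork, dep_rfree, qfit_rwork, qfit_rfree) tuples in percent."""
    improved = sum(1 for _, dep_free, _, q_free in models if q_free < dep_free)
    return {
        "fraction_improved": improved / len(models),
        "median_delta_rfree": median(q - d for _, d, _, q in models),
        "mean_gap_deposited": mean(r_gap(w, f) for w, f, _, _ in models),
        "mean_gap_qfit": mean(r_gap(w, f) for _, _, w, f in models),
    }


if __name__ == "__main__":
    # Toy values only; not taken from the benchmark dataset.
    toy = [(15.0, 18.0, 14.5, 17.4), (16.0, 19.0, 15.8, 19.2), (17.0, 20.0, 16.0, 19.0)]
    print(summarize(toy))
```

A model whose Rfree drops while its R-gap stays roughly constant, as reported above, is improving its fit to data held out of refinement rather than only to the working set.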
Despite this general trend of improved models, 25% of the qFit models have worse Rfree than the deposited models (n=36). The majority of these structures had a deposited model Rfree of over 20%. These high Rfree values are notable because our re-refinement procedure generally improved Rfree relative to the originally deposited model (Supplementary Figure 2C). Since qFit builds off of the input structure and the map quality relies on model phases, accurately detecting alternative conformers depends heavily on the agreement between input model and data. This trend reinforced the idea that poor modeling in a deposited model, which serves as input to qFit, will result in poor performance of qFit. It further suggests that qFit is best employed at a late stage of modeling, after the single-structure model is of sufficient quality that it would be deposited in the PDB.
As an example of how qFit can uncover previously unnoticed conformational heterogeneity, we examined differences in conformations in the deposited versus qFit models of the Pyrococcus horikoshii fibrillarin pre-rRNA processing protein (PDB: 1G8A)29. We focused on the residues adjacent to the RNA binding motif. Among these residues, qFit identified well-justified alternative conformations for residues Leu58, Phe69, and Met175, including new rotamers for Leu58 and Met175, that were not present in the deposited model (Figure 2B). Beyond detecting alternative conformers in each of these residues, the qFit labeling process identified potential coupled motions between the alternative conformers. For example, when Leu58 is in the ‘up’ position (altloc B), Phe69 is also in the ‘up’ position (altloc B). It is possible that this coupled motion plays a role in RNA binding, a hypothesis that may merit further investigation.
qFit recovers alternative conformations of deposited models and discovers new ones
As qFit mainly alters structures by adding alternative conformations, we examined the differences in the number of alternative conformations between the deposited models and qFit models. Only 2.9% of residues in the deposited models were multiconformers (two or more alternative conformations, n=970). In contrast, 40.7% (n=11,049) of residues in the qFit models were multiconformers (Figure 2C). The vast majority (92.5%) of multiconformer residues in the qFit models have only two alternative conformations; only 2.4% of residues have more than two alternative conformations.
Alternative conformations created by qFit come in a few main varieties. First and most obvious are alternative conformations that represent drastic changes in coordinates, most commonly in the form of rotameric changes. Most alternative conformations found in deposited models fall into this category. Second are more subtle changes in side-chain and backbone coordinates to represent heterogeneity within a rotameric state. This behavior is exemplified by the Tyr residue in Figure 1A. Third are even more subtle changes in coordinates to avoid strain because of the alternative conformations of neighboring residues30. This category is essentially imperceptible to visual inspection, as the atom centers are nearly superimposable, but is important to avoid outlier bond geometry because of adjacent residues having larger displacements.
To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer as a string (Methods)31, 32. We classified the agreement between the deposited and qFit models into 5 categories (Figure 2D, Supplementary Figure 3). The first category contains residues that have the same rotameric states in both models. This category entails most single-conformer and multiconformer residues with agreement between the two models. Moreover, residues that have multiple conformations in the same rotamer in the qFit model (for the reasons described above) generally populated the same rotamer as found in single-conformer residues in the deposited models. Overall, this category, “Consistent”, represents 93.7% of residues (n=42,626) in the dataset.
The second and third categories deal with imbalance in alternative conformations that populate distinct rotamers. Since the original premise of qFit was to discover unmodeled alternative conformations, it is unsurprising that many residues in qFit models populate additional rotameric states that are absent in the deposited model. This category, “Additional Rotamer(s) in qFit model”, represents 2.38% of residues (n=1,082). In contrast, only 27 residues (0.06% of the dataset) are classified in the converse category, “Additional Rotamer(s) in deposited model”.
The final two categories cover disagreements in rotamer assignments. There are many cases where we observe only partial agreement between alternative conformers modeled in both the deposited and qFit models. These multiconformer residues share at least one common rotamer, but also populate alternative rotamers that are distinct between the two models. This behavior generally occurs in longer residues where subtle differences at higher χ angles lead to distinct rotameric assignments. This category, “Consistent & Different Rotamers”, represents 0.82% of residues (n=373). The final category, “Different”, covers both multiconformer and single-conformer residues where there are no shared rotamer states between the two models. One contributor to this category is the same as for the “Consistent & Different Rotamers” category: differences in terminal χ angles in weak density lead to distinct rotamer assignments. Another contributor is single conformers, generally in the deposited model, modeled into density that qFit interprets as multiconformer. Often the rotamer modeled by the single conformer fits an “average” rather than the two distinct minima fit by the multiconformer model. “Different” rotamer assignments represent 3.04% of residues (n=1,384). While the analyses above include all residues, focusing on residues that were modeled as multiconformers in the deposited models (n=970) reveals a large increase in the “Different” and “Consistent & Different Rotamers” categories, to 14.88% (n=144) and 27.68% (n=268) of residues, respectively. This increase highlights the sensitivity of the rotamer assignments and motivates benchmarking qFit on “true positive” synthetic data in addition to deposited multiconformers.
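The five agreement categories map naturally onto set relations between the rotamer names of the two models. The sketch below is our reading of the category definitions, not the analysis script itself; it assumes each residue is summarized by the collection of rotamer name strings (for example, the names reported per conformer by phenix.rotalyze) found in each model.

```python
def classify_rotamers(deposited, qfit):
    """Classify one residue from the rotamer names seen in each model."""
    dep, qf = set(deposited), set(qfit)
    if dep == qf:
        return "Consistent"
    if not dep & qf:
        return "Different"
    if dep < qf:  # deposited rotamers are a proper subset of qFit rotamers
        return "Additional Rotamer(s) in qFit model"
    if qf < dep:
        return "Additional Rotamer(s) in deposited model"
    return "Consistent & Different Rotamers"


if __name__ == "__main__":
    # Illustrative leucine rotamer names.
    print(classify_rotamers(["mt"], ["mt", "mt"]))        # Consistent
    print(classify_rotamers(["mt"], ["mt", "tp"]))        # Additional Rotamer(s) in qFit model
    print(classify_rotamers(["mt", "tp"], ["mt", "pp"]))  # Consistent & Different Rotamers
```

Using sets means that several conformers within the same rotamer well collapse to a single entry, matching the description of the “Consistent” category above.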
Collectively, these analyses revealed that qFit identifies the majority of deposited alternative conformations and discovers new ones. Discrepancies between manually modeled and qFit alternative conformations predominantly result from weak density at terminal χ angles. When considered with the improvements in Rfree, these results indicate that qFit is detecting more of the true underlying conformational heterogeneity that exists in crystallographic data.
qFit improves multiple side-chain model geometry metrics
Although qFit improves the agreement of model to data by the addition of alternative conformations, we questioned whether this improvement comes at the cost of degrading model geometry. On one hand, the absence of geometric constraints in qFit backbone residue sampling and the connections made during qFit segment may result in worse geometry. On the other hand, placing additional alternative conformers may alleviate strain in the model that can result from fitting a single conformer into density that should be supported by multiple conformers10, 33.
To validate geometry, we used MolProbity to evaluate the deposited and qFit models. MolProbity compares input models with idealized values and then provides component scores for various geometric and steric features that are summarized in an overall “MolProbity score”34. Component scores that examine all atoms (bond angle/length, clashscore) or side-chain atoms (rotamers) account for all alternative conformers. In contrast, scores that evaluate the backbone (Ramachandran, Cβ deviations) are reported for single-conformer residues or using only altloc A for multiconformer residues. Therefore, the overall MolProbity score includes some of the contributions of alternative conformations, but also misses the potential impact on some other aspects. In the future, we aim to explore updated metrics that consider all alternative conformations.
Compared to deposited models, qFit models had equivalent MolProbity scores (1.27 median deposited versus 1.31 median qFit, p=0.66 from two-sided t-test; Figure 3A), which indicated that qFit does not sacrifice geometry for better density fitting. To further understand which parts of the model geometry differed (if any) between the deposited and qFit models, we explored the individual component scores and observed that several improved in the qFit models. These included considerable improvements in bond lengths and angles (RMSD from idealized values for bonds: 0.010 Å median deposited versus 0.007 Å median qFit, p=0.0038 from two-sided t-test; RMSD from idealized values for angles: 1.30° median deposited versus 0.93° median qFit, p=8.8e-20 from two-sided t-test; Figure 3B,C). We suspect that the primary factor behind this improvement was the incorporation of multiconformers, rather than straining a single conformer, to explain the density. To visualize these differences, we examined Met189 from PDB: 1V8F. In the deposited model this residue has an Sδ-Cε bond length of 1.596 Å, which is significantly shorter than the idealized length of 1.791 +/-0.025 Å34. qFit adds an additional conformation, both explaining previously unmodeled density and bringing the Sδ-Cε bond lengths to 1.790 Å (altloc A) and 1.794 Å (altloc B), much closer to the expected value (Figure 3E). This multiconformer residue with improved geometry is consistent with the hypothesis that qFit is alleviating strained geometry by modeling multiple conformations.
Additionally, qFit models have improved clashscores (2.50 median deposited, 2.27 median qFit, p=0.0028 from two-sided t-test; Figure 3D). We hypothesized that this was due to a mixture of modeling of alternative conformers and improved fit of single-conformer residues which are re-sampled and refined during the qFit procedure. We looked at the qFit modeling differences in a cluster of Met and Leu residues in PDB: 6HEQ, which had one of the largest changes in clashscores between the deposited and qFit models. We observed that qFit fixes the positioning of Met83, preventing the clash with both conformers of Leu81 (Figure 3F).
However, qFit models tended to have worse rotamer scores, likely due to our extensive sampling. Further, we observed a slight, but not significant, decrease in favored Ramachandran values, which may indicate the need to improve qFit segment procedures. It is difficult to evaluate if this is worse than deposited models in residues with backbone alternative conformers, as this score only accounts for alternative conformer A (Supplementary Figure 4). However, both backbone and rotamer sampling represent areas of potential qFit improvements. Overall, while there is a mix of geometry metrics that improve or deteriorate in qFit models, the negligible difference in overall MolProbity scores suggests that qFit does not deteriorate the model geometry in pursuit of model/data agreement.
Simulated data indicate qFit is appropriate for high-resolution data
In the previous sections, we established that qFit has the potential to improve Rfree and some geometry metrics relative to deposited structures. However, the vast majority of the residues in these deposited structures are modeled exclusively as single conformers. This homogeneity in single-conformation models limited our ability to assess how well qFit can recapitulate existing alternative conformers across a wide resolution range. To address this question, we generated artificial structure factors using an ultra-high-resolution structure (0.77 Å) of the SARS-CoV-2 Nsp3 macrodomain (PDB: 7KR0)35. This model had a high proportion of residues (47%) manually modeled as alternative conformations and did not employ qFit, making it an ideal comparison structure. We refer to this structure as the “ground truth 7KR0 model”, and evaluated how well its alternative conformations were recapitulated by qFit as resolution was artificially worsened across synthetic datasets.
To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors from 0.8 to 3.0 Å resolution (in increments of 0.1 Å). We then added random noise to the structure factors that increased as resolution worsened (Methods; Supplementary Figure 5A and 5B). To create a single-conformer model appropriate for input to qFit, we removed all alternative conformations from the ground truth model, maintaining all single conformations and altloc A. Next, we refined this single-conformer model against the synthetic datasets. Finally, we used the refined single-conformer model as input for qFit.
To evaluate the fidelity of qFit in recapitulating the ground truth 7KR0 model, we categorized each residue across two axes: positive/negative and true/false. First, we define the positive/negative axis, with positive being modeled as multiconformer and negative being modeled as single-conformer in the output qFit model. Next, we define the true/false axis as agreement of the output qFit model with the ground truth 7KR0 model in terms of multiconformer/single-conformer status. Because many residues in both the ground truth and qFit models have alternative conformations that nearly overlap each other, we categorize residues as multiconformer only if they possess at least two alternative conformers with a side-chain heavy-atom root-mean-square deviation (RMSD) greater than 0.5 Å. From this cutoff, 50 out of the 169 residues (30%) in the ground truth model are classified as multiconformers.
We used a similar 0.5 Å cutoff for defining agreement between multiconformer residues in categorizing a residue as a true positive, false positive, false negative, or true negative (Methods and Figure 4A). A “true positive” has agreement between multiconformers across ground truth and qFit models; a “true negative” has agreement between single conformers across ground truth and qFit models. Generally, a “false positive” has extra or distinct conformations in the qFit model; a “false negative” has at least one alternative conformation in the ground truth model that is not present in the qFit model, or discordant single-conformer conformations.
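One way to operationalize these four categories is sketched below. The exact matching rules are given in the Methods and are not reproduced here, so this is a simplified interpretation: a conformer is considered "present" in the other model if some conformer there lies within the 0.5 Å cutoff, and the caller is assumed to pass only side-chain heavy-atom coordinates in matching order. Function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(conf_a, conf_b):
    """RMSD between two conformers given as equal-length lists of (x, y, z)."""
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def is_multiconformer(conformers, cutoff=0.5):
    """True if at least two conformers differ by more than the cutoff."""
    return any(rmsd(a, b) > cutoff for a, b in combinations(conformers, 2))


def _present(conf, others, cutoff):
    return any(rmsd(conf, other) <= cutoff for other in others)


def classify_residue(truth, qfit, cutoff=0.5):
    """Return 'TP', 'FP', 'FN', or 'TN' under the simplified rules above."""
    truth_multi = is_multiconformer(truth, cutoff)
    qfit_multi = is_multiconformer(qfit, cutoff)
    qfit_extra = any(not _present(c, truth, cutoff) for c in qfit)
    truth_missed = any(not _present(c, qfit, cutoff) for c in truth)
    if qfit_multi:  # "positive": qFit models a multiconformer
        if truth_multi and not qfit_extra and not truth_missed:
            return "TP"
        return "FP" if (qfit_extra or not truth_multi) else "FN"
    # "negative": qFit models a single conformer
    if not truth_multi and not truth_missed:
        return "TN"
    return "FN"


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
    c = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]
    print(classify_residue([a, b], [a, c]))  # FP
```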
We observed that qFit is consistently strong at capturing single-conformer residues (true negatives) across resolutions; however, the power to detect alternative conformations (true positives) is limited beyond resolutions of ∼1.8-2.0 Å (Figure 4B; Supplementary Figure 5C). This behavior is exemplified by Glu114, which is multiconformer in the ground truth model (Figure 4C). At high resolution (1.0 Å), qFit correctly models the alternative conformation and this residue is categorized as a true positive. However, as resolution gets worse, qFit begins to mismodel this residue. At 1.8 Å resolution, qFit still models two alternative conformations and has a good fit to density; however, the secondary conformer has an RMSD greater than 0.5 Å away from the ground truth model; consequently this residue is now categorized as a false positive. Finally, at 2.8 Å resolution, qFit only models a single conformer, moving the residue to the false negative category.
Next, we tested the ability of qFit to detect alternative conformations over a larger, more diverse dataset. We generated artificial structure factors for the qFit models with improved Rfree values over the deposited values from the previous sections (n=110). Although this dataset is more diverse, it has a notable weakness relative to the 7KR0 dataset test: the 7KR0 alternative conformations were modeled manually, whereas the larger dataset has alternative conformations modeled by qFit. Therefore, this second synthetic dataset assesses convergence of the qFit models across resolution.
Using these qFit models as ground truth models, we generated structure factors, performed refinement of single-conformer models, and ran qFit over the resolution range of 1.0 to 3.0 Å (Supplementary Figure 5A). We observed a similar fall-off of true positive detection around 2.0 Å (Supplementary Figure 5D). Importantly, this dataset indicates that qFit still models single conformers well at lower resolutions. We also observe a trend of increased false positive/negative rates for longer residues that are just outside the 0.5 Å cutoff (Supplementary Figure 6). We did not observe a relationship between input model Rfree and the number of correctly modeled conformers, but it is difficult to tell whether our synthetic noise procedures properly capture the dependence of qFit performance on input model/data agreement (Supplementary Figure 7A/B).
We then assessed the agreement between individual conformers and the map. To do this, we used the Q-score36, which compares the map profile of an atom with an ideal Gaussian distribution that would be observed if the atom perfectly fits into the density. Across the test dataset, residues that qFit models as single conformers have an almost equivalent Q-score to the ground truth model even at lower resolutions (Figure 4D). The primary alternative conformations in qFit models (occupancy between 0.5 and 1.0) and lower-occupancy alternative conformations (occupancy <0.5) display Q-scores that are very close to the equivalent “ground truth model” alternative conformations until a resolution of about 1.8 Å. At lower resolutions there is a dramatic fall-off in model/map agreement for these alternative conformers. These trends were also observed with the 7KR0 dataset (Supplementary Figure 7C). Overall, these analyses on both the 7KR0 and larger synthetic datasets confirm that qFit will best detect alternative conformations with high-resolution (1.8-2.0 Å or better) data.
qFit models alternative conformers in cryo-EM density maps
As single-particle cryo-EM is increasingly producing high-resolution (better than 2 Å) reconstructions where alternative conformers can be detected37, 38, we wanted to improve and test the ability of qFit to model alternative conformations guided by cryo-EM maps. While a previous version of qFit introduced EM21, we had not optimized the approach to work with EM maps and models. qFit can now be run in ‘EM mode’ which uses electron structure factors, improves the treatment of solvent background levels, and reduces the default maximum number of alternative conformations (cardinality) (Methods).
To benchmark our ability to model alternative conformations in high-resolution cryo-EM structures, we initially gathered a dataset of 22 structures with a depositor-provided resolution better than 2 Å (Fourier Shell Correlation (FSC) at 0.143). However, only 8 of these structures have a resolution better than 2 Å (FSC at 0.143) when calculated by the Electron Microscopy Data Bank (EMDB)39. Some of the original 22 structures did not have FSC curves in EMDB (n=6) due to a lack of data, and others had an EMDB-calculated resolution worse than 2 Å (n=8) (Supplementary Table 2). The absence of standardized maps for determining cryo-EM structure resolution complicated our selection of structures for qFit analysis.
We downloaded the eight models with resolution better than 2 Å from the PDB and their corresponding maps from EMDB. Using the default parameters of phenix.autosharpen, we sharpened all maps and re-refined each structure (phenix.real_space_refine) against its sharpened map. qFit was run with the ‘EM’ flag and the output model was refined using the qFit real space refinement script (Methods).
Across the first asymmetric unit of the 8 models, 6.97% (n=61) of residues had at least two alternative conformers in the deposited structure, compared with 39.6% (n=266) in the qFit model. To determine if qFit could recapitulate the modeling of alternative conformers from deposited structures, we compared the high-resolution apoferritin deposited model (PDB: 7A4M, resolution: 1.22 Å) with the qFit model using the same criteria outlined in the resolution dependence section above (RMSD within 0.5 Å). qFit correctly models 78% of residues in the first asymmetric unit. This includes Arg22, which has two alternative conformations in the deposited model. qFit was able to recapitulate both alternative conformations (Figure 5A), highlighting that qFit can detect manually modeled alternative conformations in cryo-EM maps. In addition, qFit detected several unmodeled alternative conformers that were visually confirmed (Figure 5B-D).
As with the X-ray models, we wanted to determine how qFit changes the model geometry. Similar to the X-ray models, we observed that qFit improves bond lengths and angles and Cβ deviations. qFit does increase (worsen) the MolProbity clashscore of most structures, primarily due to increased clashes with waters, likely because their positions are not reset in phenix.real_space_refine (Supplementary Figure 8).
While we have made significant progress in modeling alternative conformations in cryo-EM data, the lack of consistent map handling, validation, and metrics with cryo-EM structures and maps is a major impediment to further development. Even among this select group of structures, there were varying levels of experimental and computational map details on EMDB and in manuscripts37, 38, 40, including information on masking, handling of bulk solvent, and local resolution. Our approach depends on sampling and scoring based on resolution. While there is an accepted formula for calculating resolution (FSC at 0.143), the maps used to calculate it can vary, leading to differences in resolution as we observed between the deposited versus EMDB-calculated resolution. Further, resolution can vary across a single model, and metrics for such local resolutions are not always widely available. Additionally, the handling of background bulk solvent values varies widely, from masking to flattening these values. The EM community will be able to benefit from improved ensemble modeling as efforts to standardize the storage of raw, meta, and processed data continue to improve.
Discussion
Structural biology plays a vital role in understanding the complex connection between protein structure and function. However, since proteins exist as ensembles, structural biology modeling approaches need to adapt accordingly. X-ray crystallography and cryo-EM data hold significant information on these ensembles that is often ignored. qFit offers a solution by leveraging powerful optimization algorithms to transform well-modeled single or static models into multiconformer models. Here we demonstrate that qFit can uncover widespread conformational heterogeneity that better represents the true underlying conformational ensemble data as demonstrated by lower Rfree values. Further, we determine that qFit can reliably pick up on alternative conformers that were modeled manually, highlighting that qFit could be used as a tool to significantly speed up modeling of high-resolution structures.
This automation in modeling is needed especially in light of advances in data collection automation and fast detectors. These tools have revolutionized the field of X-ray crystallography, enabling high-temperature datasets, time-resolved experiments, and high-throughput data collection41, 42, 43, 44, 45. With the ability to capture different conformations, there is a growing demand for methods that can detect protein alternative conformers to extract as much biological information as possible. This is highlighted in massive ligand-soaking campaigns35, 46–48, where there are often hundreds of structures with different ligands to parse. qFit provides a key tool to help extract the most out of these structures by improving the models and providing a better jumping-off point to determine how ligand binding impacts the protein. However, our data here shows that not only does qFit need a high-resolution map to be able to detect signal from noise, it also requires a very well-modeled structure as input.
While both throughput and resolution are currently lower for cryo-EM, recent high-resolution maps have observable conformational heterogeneity37, 40. Current classification approaches do not allow sorting based on signals as small as alternative side-chain conformations, necessitating approaches like qFit for modeling49–51. We see great potential in combining qFit with classification approaches to understand conformational heterogeneity at different scales. In the future, qFit can likely be applied more widely to EM maps in regions with high local resolution52. In addition, we will also incorporate modeling of nucleic acids, with an emphasis on automating refinement of alternative base positions in high-resolution ribosome structures in future work53, 54.
However, we encountered many difficulties in applying qFit to EM data relative to the more established X-ray data. In particular, there are still disparities in how maps are sharpened and how masks are used to exclude noise or lower experimental signals, such as solvent55, 56, making it very challenging to evaluate whether models, especially multiconformer or ensemble models, have improved fit to the data. We suggest strengthening guidelines for reporting computational processing, and improving validation tools to gauge agreement between models and cryo-EM maps55–57.
We envision many other future improvements that will further enhance the quality and accuracy of multiconformer models for both X-ray crystallography and cryo-EM. Simulations have demonstrated that subpar modeling of the macromolecule(s) and surrounding solvent is a major potential avenue to further reduce R-factors27, 28. To accurately account for water molecules in multiconformer models, partially occupied water molecules must be identified and labeled in connection with protein atoms. Automated detection and refinement of partial-occupancy waters should help further reduce Rfree15 and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.
Additionally, while qFit models have improved geometry in some respects relative to single-conformer models, we still have room for improvement for fixing rotamer outliers and backbone metrics (Ramachandran and Cβ deviations). The geometry improvements are likely mostly due to single-conformer models having strained conformations that fit the “mean” conformation rather than multiple partially overlapping conformations. Further gains in both accuracy and geometry quality will emerge with better sampling of backbone conformations20. Such improvements are important because splitting the backbone, where appropriate, can result in detection of biologically important alternative conformations58–63. Notably, the recently described FLEXR approach, which leverages Ringer to detect density peaks contributed by alternative conformations and Coot to model alternative side chains into density peaks, illustrates that many gains can be made with side-chain-focused modeling alone19. However, further improvements to backbone modeling, including larger-scale motions such as alternative loop conformations64 or coordinated larger-scale shifts of secondary-structural elements65, 66 will likely yield even higher quality multiconformer models.
Lastly, the experimental and computational advancements in structural biology have increased the focus on ensemble-based models10, 13, 21, 50, 51. But the current data format for structural models (PDB, mmCIF) does not allow for more complex representation of ensembles. To appropriately capture the many aspects of ensembles, we would ideally like to have multiple nested ensembles representing both larger and local conformational changes, or to be able to show how two different backbone conformations can each be “parents” to different side-chain conformations. Currently, neither the PDB nor CIF format allows for this type of representation67–69.
In summary, qFit drastically reduces the time and effort required to create multiconformer models from X-ray and cryo-EM data, thereby lowering the barrier to generating new hypotheses about the relationship between conformational ensembles and biological function4, 5, 70, 71. Additionally, qFit can provide key data to bridge to the next frontier of structure prediction. While AlphaFold has achieved stunning success in predicting protein structure by training against single-conformation models72, future improvements to structure prediction might be gained by more accurately modeling the extent of conformational heterogeneity72.
Methods
Generating the qFit test set
To test the impact of algorithmic changes in qFit, we created a dataset of 144 high-resolution (1.2-1.5 Å) X-ray crystallography structures deposited in the PDB (Supplementary Table 1). These were single-chain protein structures (in the asymmetric unit and at the level of biological assembly) and contained no ligands or mutations. The maximum sequence identity between any two structures was set as 30%. Based on CATH classification31, the resultant entries represented 24 space groups and 72 folds (Supplementary Table 1). All these structures were re-refined as described in “Initial refinement protocol”. These re-refined models are referred to as deposited models. To create multiconformer models, we input the re-refined structures in qFit protein, followed by the post qFit refinement protocol. These multiconformer models are referred to as qFit models.
Initial refinement protocol
All structures from the PDB were re-refined using phenix.refine with the following parameters.
The re-refined models were used as the input for subsequent qFit models.
Running qFit
For this analysis qFit was run using the following command from qFit version 2023.1.
X-ray:
qfit_protein composite_omit_map.mtz -l 2FOFCWT,PH2FOFCWT rerefine_pdb.pdb
Cryo-EM:
qfit_protein sharpened_map.ccp4 rerefine_cryo-EM.pdb -r <resolution> -em -n 10 -s 5
qFit Improved Features
Bayesian information criterion (BIC)
BIC was implemented in the final selection of residue and segment conformations. BIC is defined as the real space residual correlation coefficient penalized by the number of parameters (k):
In qFit residue, k is defined as:
In qFit segment, k is defined as:
BIC is calculated for each candidate cardinality (1-5). We then choose the set of conformations with the lowest BIC as the final conformations for the residue or segment under consideration.
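The selection step can be sketched as follows. The exact qFit expressions for BIC and k are not reproduced in this section, so the sketch substitutes the textbook Gaussian-error form of BIC, n·ln(RSS/n) + k·ln(n), purely as a stand-in; the function names and the input layout are hypothetical.

```python
import math


def bic(rss, n_points, k):
    """Textbook Gaussian-error BIC (a stand-in, not necessarily qFit's formula)."""
    return n_points * math.log(rss / n_points) + k * math.log(n_points)


def select_cardinality(candidates, n_points):
    """Pick the cardinality with the lowest BIC.

    `candidates` maps cardinality -> (residual sum of squares, number of parameters k).
    """
    return min(
        candidates,
        key=lambda c: bic(candidates[c][0], n_points, candidates[c][1]),
    )
```

The point of the sketch is the trade-off: a larger set of conformations is only kept when the improvement in fit outweighs the penalty for the additional parameters.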
B-factor sampling
To sample B-factors along with atomic coordinates at each step of qFit residue, we first perform one round of quadratic programming to reduce the number of conformations. For all remaining conformations, the input B-factor of each atom in the residue is multiplied by 0.5-1.5 in increments of 0.2. All conformations with sampled B-factors and coordinates are inputs for mixed integer quadratic programming.
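As a small illustration of the sampling grid described above, the following sketch enumerates the multipliers from 0.5 to 1.5 in steps of 0.2 and applies each to the input B-factors. The function names are hypothetical, and the sketch does not reproduce how qFit passes these candidates on to the optimizer.

```python
import numpy as np


def bfactor_multipliers(start=0.5, stop=1.5, step=0.2):
    """Multipliers 0.5, 0.7, ..., 1.5 (rounded to avoid float drift)."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(n)]


def sample_bfactors(input_bfactors):
    """Return one scaled copy of the per-atom B-factors per multiplier."""
    b = np.asarray(input_bfactors, dtype=float)
    return [b * m for m in bfactor_multipliers()]
```

Each retained conformation therefore yields six candidate B-factor sets.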
Iterative optimization algorithm with non-convex solutions
Due to our exhaustive sampling, there are times when the MIQP optimization algorithm fails to find a non-convex solution. To address this limitation, we have implemented a procedure that iteratively removes solutions one-by-one based on the two solutions with the closest root-mean-square deviation (RMSD) until MIQP identifies a solution.
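The pruning loop can be sketched as below. The `solve` callable is a hypothetical stand-in for the MIQP step and is assumed to return `None` on failure; which member of the closest pair is discarded is also an assumption of the sketch (here, the later one in the list).

```python
import numpy as np


def rmsd(a, b):
    """RMSD between two (n_atoms, 3) coordinate arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def prune_until_solved(conformers, solve):
    """Drop one conformer of the closest pair until `solve` succeeds."""
    confs = list(conformers)
    while confs:
        solution = solve(confs)
        if solution is not None:
            return confs, solution
        if len(confs) < 2:
            break
        # Locate the pair with the smallest RMSD and remove its second member.
        best = None
        for i in range(len(confs)):
            for j in range(i + 1, len(confs)):
                d = rmsd(confs[i], confs[j])
                if best is None or d < best[0]:
                    best = (d, j)
        del confs[best[1]]
    raise RuntimeError("no solution found after pruning all conformers")
```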
Parallelization of large maps
Often, cryo-EM maps are very large and reach memory limits using Python multiprocessing. Multiprocessing is used to model multiple residues independently in parallel. We have now implemented a new scheme to divide the density map into portions centered around each residue of interest and feed those portions of the map into our parallelization.
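The idea of handing each worker only the part of the map around its residue can be sketched with plain array slicing. The sketch assumes an orthogonal grid stored as a 3D array indexed (x, y, z) with a uniform voxel size; the function name and the `padding` parameter are hypothetical and do not describe the qFit implementation.

```python
import numpy as np


def extract_submap(density, origin, voxel_size, center, padding):
    """Cut a box of half-width `padding` (Å) around `center` (Å) from `density`.

    Returns the sub-array and the Cartesian origin of its first voxel.
    """
    origin = np.asarray(origin, dtype=float)
    center = np.asarray(center, dtype=float)
    lo = np.floor((center - padding - origin) / voxel_size).astype(int)
    hi = np.ceil((center + padding - origin) / voxel_size).astype(int) + 1
    # Keep the box inside the map bounds.
    lo = np.clip(lo, 0, density.shape)
    hi = np.clip(hi, 0, density.shape)
    sub = density[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    new_origin = origin + lo * voxel_size
    return sub, new_origin
```

Passing only `sub` and `new_origin` to each worker process keeps the per-process memory footprint proportional to the box size rather than to the full map.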
Occupancy constraints
To help refine segments (i.e. sets of residues with alternative conformations flanked by residues with only a single conformation) during X-ray refinement, we now output a restraint file at the end of the qFit protein run for X-ray refinement. This restraint file enables “group occupancy refinement” for residues in a segment with the same alternative conformation. In group occupancy refinement all residues within the group are refined to the same occupancy, reducing the number of free parameters to fit.
Finalizing qFit models with iterative refinement
We iteratively run 5 macrocycles of refinement followed by a script that removes any conformations with occupancy less than 0.1. This script also renormalizes the occupancies of any remaining conformations in that segment, ensuring that the occupancy sums to 1. This procedure ends when no conformations have a refined occupancy of less than 0.1 or after 50 total rounds of refinement (whichever comes first). After, we do one final refinement where we release the occupancy constraints on the segments, turn on automated solvent picking, and optimize B-factors (specified as ADP parameters in Phenix) and coordinate weights.
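The cull-and-renormalize loop can be sketched as follows. Occupancies are represented as a dictionary from altloc label to occupancy for one segment, and `refine` is a hypothetical stand-in for the refinement macrocycles; neither reflects the actual script interface.

```python
def prune_low_occupancy(occupancies, cutoff=0.1):
    """Drop conformers below `cutoff` and rescale the rest to sum to 1."""
    kept = {alt: occ for alt, occ in occupancies.items() if occ >= cutoff}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {alt: occ / total for alt, occ in kept.items()}


def finalize(occupancies, refine, cutoff=0.1, max_rounds=50):
    """Alternate refinement and pruning until nothing falls below `cutoff`."""
    for _ in range(max_rounds):
        occupancies = refine(occupancies)  # stand-in for refinement macrocycles
        if all(occ >= cutoff for occ in occupancies.values()):
            break
        occupancies = prune_low_occupancy(occupancies, cutoff)
    return occupancies
```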
Cryo-EM
To improve the detection of alternative conformations in cryo-EM structures, we made some key updates to part of the qFit algorithm. All of these updates to the algorithm will turn on with the -em flag. First, we now use electron scattering factors when calculating the modeled electron density. Second, we have removed bulk solvent electron density values (set at 0.3 in X-ray qFit protein). We also restricted the cardinality to be 0.3 (compared to 0.2 in X-ray qFit protein) to reduce misplaced conformations.
Q-score
We implemented the option for users to use Q-scores to determine if qFit should be run on a residue or not. This option is off by default. To use this option, generate Q-scores with mapq.py, available as part of the Q-score command-line interface (https://github.com/gregdp/mapq). qFit takes in a text file of Q-scores through the --qscore option of qfit_protein. By default, all residues with a Q-score of less than 0.7 are not modeled as multiconformers, but are considered in qFit segment. Users can also adjust this level by using the --qscore_cutoff option of qfit_protein.
qFit-segment-only runs
qFit can be used as a tool along with iterative model building and refinement. If a user manually removes or adds additional conformations using Coot14 or similar software, this can disrupt the occupancy sum of the residue and the connectivity of the backbone. To alleviate such problems, we developed an option (qfit_protein --only-segment) to facilitate manual model adjustment after running qFit. This procedure generates connected backbones with consistent occupancies for coupled neighboring conformers.
For example, suppose residue N has four alternative backbone conformations (A, B, C, D) and residue N+1 has two alternative conformations (A, B). In that case, this procedure will create C and D conformers for residue N+1 by duplicating its A and B conformers. This duplication continues until we reach the end of a segment so that all backbones have the same number of alternative conformations (A, B, C, D) and are, therefore, properly connected. Subsequent crystallographic refinement of this model (see “Post-qFit refinement script” above) will cause the duplicated conformations to diverge slightly, and will behave as expected without introducing geometry errors.
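The duplication rule in this example can be sketched as below. Each residue is represented as a dictionary from altloc label to a conformer object, and missing labels are filled by cycling through the residue's existing conformers in label order. That fill order is an assumption of the sketch, as is the function name; occupancy bookkeeping is left out.

```python
def complete_segment(segment):
    """Give every residue in a segment the full set of altloc labels.

    `segment` is a list of dicts (altloc -> conformer) in residue order.
    """
    all_altlocs = sorted({alt for residue in segment for alt in residue})
    completed = []
    for residue in segment:
        existing = sorted(residue)
        filled = dict(residue)
        missing = [alt for alt in all_altlocs if alt not in residue]
        for i, alt in enumerate(missing):
            # Duplicate existing conformers in turn: C copies A, D copies B, ...
            filled[alt] = residue[existing[i % len(existing)]]
        completed.append(filled)
    return completed
```

For the case in the text, residue N+1 ends up with C as a copy of A and D as a copy of B.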
Metrics
Scripts for all metrics can be found in the scripts folder in the qFit GitHub repository (https://github.com/ExcitedStates/qfit-3.0). Our scripts for running qFit protein on an SGE-based server and all scripts for figures can be found here: https://github.com/fraser-lab/qFit_biological_testset/tree/main.
R values
R-values were obtained after the final round of refinement for the re-refined deposited models (deposited_rerefine.sh) and for the qFit models after the iterative refinement script (qfit_final_xray_refine.sh).
RMSF
RMSF was measured for each residue based on all side-chain heavy atoms. First, the center of all conformations is identified. We then calculate the square root of the mean of the squared distances between the center of each individual conformation and the center of all conformations. Each distance is then weighted by occupancy (RMSF.py).
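One possible reading of this calculation is sketched below; it is not a transcription of RMSF.py. It assumes the overall center is the unweighted mean of the per-conformer centers and that the occupancy weighting is applied to the squared distances before taking the square root.

```python
import numpy as np


def residue_rmsf(conformers, occupancies):
    """Occupancy-weighted RMSF of conformer centers about their common center.

    `conformers` is a list of (n_atoms, 3) side-chain heavy-atom coordinate arrays.
    """
    centers = np.array([np.mean(np.asarray(c, dtype=float), axis=0) for c in conformers])
    overall = centers.mean(axis=0)
    weights = np.asarray(occupancies, dtype=float)
    weights = weights / weights.sum()
    sq_dist = np.sum((centers - overall) ** 2, axis=1)
    return float(np.sqrt(np.sum(weights * sq_dist)))
```

A single-conformer residue gives an RMSF of zero under this definition.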
B-Factors
For each residue, we calculated an occupancy-weighted B-factor (each heavy-atom B-factor is weighted by its occupancy). We obtain the average heavy-atom B-factor for every conformation and multiply it by the occupancy.
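A minimal sketch of this weighting, assuming that the per-conformation products are summed to give the residue value (the summation step is an assumption, as is the function name):

```python
def occupancy_weighted_bfactor(conformers):
    """Sum over conformers of occupancy times mean heavy-atom B-factor.

    `conformers` is a list of (occupancy, [heavy-atom B-factors]) pairs.
    """
    return sum(occ * (sum(bfactors) / len(bfactors)) for occ, bfactors in conformers)
```

For example, conformers with occupancies 0.6 and 0.4 and mean B-factors of 15 and 40 Å² give 0.6 × 15 + 0.4 × 40 = 25 Å².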
Rotamers
The rotamer name for each alternative conformation was determined by phenix.rotalyze34 while manually relaxing the outlier criteria to 0.1%. Each alternative conformation is given its rotamer. Rotamers were compared on a residue-by-residue basis. To compare rotamers, we only consider the first two χ dihedral angles. Each residue was classified into four categories: same, additional rotamer in qFit model, additional rotamer in the deposited model, or different.
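The four-way classification can be sketched with set comparisons. The inputs are assumed to be rotamer names already reduced to the first two χ angles; how that reduction is performed, and how edge cases are resolved in the actual scripts, is not covered by the sketch.

```python
def compare_rotamers(deposited, qfit):
    """Classify a residue by comparing its rotamer sets in the two models."""
    dep, qf = set(deposited), set(qfit)
    if dep == qf:
        return "same"
    if dep < qf:  # qFit has every deposited rotamer plus at least one more
        return "additional rotamer in qFit model"
    if qf < dep:  # deposited has every qFit rotamer plus at least one more
        return "additional rotamer in deposited model"
    return "different"
```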
Generating synthetic data for resolution dependence
To generate artificial electron density data at increasingly poorer resolutions, we first increased the B-factors of all atoms of the ground truth model by 1 Å² for every 0.1 Å reduction in resolution and placed the models in a P1 box. We randomly shook the coordinates using the shake argument in phenix.pdbtools with root-mean-square error of shaking given as 0.2 * desired resolution of synthetic data. We generated structure factors (Fshake) for each of these shaken models from 0.8 Å to 3.0 Å in increments of 0.1 Å using the phenix.fmodel command-line function (with bulk solvent parameters k_sol=0.4, b_sol=45, and 5% R-free flags). We then added noise to the structure factors as follows:
Fnoisy = Fshake + (sqrt(Fshake) * random number from normal distribution * resolution of model * 0.5).
The scaling factors of 0.2 and 0.5 for shake RMSD and noise addition were determined by trying out different values and settling on the values that gave the most reasonable R-factors over the resolution range after refining the model against the generated structure factors. The addition of noise to Fshake was done using the sftools command in CCP4. Then, the ground truth model with adjusted B-factors was stripped of alternative conformations (if any) at every residue position. The resulting single-conformer model was refined with the Fnoisy structure factors (Supplementary Figure 5A).
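The noise model can be re-expressed in NumPy as a sketch. The authors applied it with sftools in CCP4, so the code below only illustrates the arithmetic of the Fnoisy formula and the shake RMSD scaling; the function names are hypothetical.

```python
import numpy as np


def shake_rmsd(resolution, scale=0.2):
    """Coordinate shake RMSD (Å) used for a given target resolution (Å)."""
    return scale * resolution


def add_noise(f_shake, resolution, scale=0.5, rng=None):
    """Fnoisy = Fshake + sqrt(Fshake) * N(0, 1) * resolution * scale."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f_shake, dtype=float)
    return f + np.sqrt(f) * rng.standard_normal(f.shape) * resolution * scale
```

Because the noise term scales with both sqrt(Fshake) and the target resolution, lower-resolution synthetic datasets receive proportionally noisier amplitudes.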
The final refined model was given as input to qFit and the composite omit map was obtained for the Fnoisy structure factors. The multiconformer model given by qFit was refined with phenix.refine as explained in the post-qFit refinement script section. Since there is some randomness involved in simulating noise in the synthetic datasets, we repeated the synthetic data generation ten times at each resolution and all the steps following that including qFit modeling and post-qFit refinement for each of these noisy synthetic maps. The same steps of data synthesis were followed for the larger qFit test dataset containing 110 models, except that one set of structure factors was generated for each model at each resolution instead of ten as in the 7KR0 dataset.
True/False Positive/Negative matrix for synthetic data
True positive residues were those with at least two alternative conformations and an RMSD of less than 0.5 Å between the ground truth and qFit model conformations (for example, qFit model altloc A has an RMSD of less than 0.5 Å to ground truth model altloc A or B, and qFit model altloc B has an RMSD of less than 0.5 Å to the other ground truth model altloc A or B) (Figure 4A). A false positive residue has at least two alternative conformations in the qFit model, but fewer conformations in the ground truth model (Figure 4A). Alternatively, for a false positive residue, if the ground truth model residue is also multiconformer, then the RMSD between at least one of the conformations of qFit residue and ground truth residue is more than 0.5 Å (Figure 4A). A true negative residue is when both the ground truth and qFit model have a single conformer and they have an RMSD of less than 0.5 Å (Figure 4A). A false negative residue is when the qFit model has a single conformer but the ground truth model has more than one alternative conformer or both models have a single conformer but they have an RMSD greater than 0.5 Å (Figure 4A).
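These rules can be sketched as a small classifier. It assumes each model's residue is given as a list of distinct conformers (coordinate arrays in matching atom order, with near-overlapping alternatives already collapsed). The handling of a ground truth residue with more conformers than the qFit residue is one interpretation of the definitions, treated here as a false negative.

```python
from itertools import permutations

import numpy as np


def rmsd(a, b):
    """RMSD between two (n_atoms, 3) coordinate arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def classify(truth, qfit, cutoff=0.5):
    """Return 'TP', 'FP', 'FN', or 'TN' for one residue."""
    truth_multi, qfit_multi = len(truth) > 1, len(qfit) > 1
    if not truth_multi and not qfit_multi:
        return "TN" if rmsd(truth[0], qfit[0]) < cutoff else "FN"
    if not qfit_multi:
        return "FN"
    # qFit is multiconformer: each qFit conformer must match a distinct
    # ground truth conformer within the cutoff.
    matched = any(
        all(rmsd(qfit[i], truth[p]) < cutoff for i, p in enumerate(perm))
        for perm in permutations(range(len(truth)), len(qfit))
    )
    if not matched:
        return "FP"
    return "FN" if len(truth) > len(qfit) else "TP"
```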
Acknowledgements
This work was supported by: a National Institutes of Health (NIH) grant GM145238 and Chan Zuckerberg Initiative Open Software grant to JSF and NIH R35 GM133769 to DAK. We thank Christopher Williams and Vincent Chen for help with interpretations of MolProbity score ideal side-chain geometry.
Supplementary Figures
References
- 1. Single-Particle Cryo-EM at Crystallographic Resolution. Cell 161:450–457
- 2. Structural heterogeneity in protein crystals. Biochemistry 25:5018–5027
- 3. Achieving better-than-3-Å resolution by single-particle cryo-EM at 200 keV. Nat. Methods 14:1075–1078
- 4. An expanded allosteric network in PTP1B by multitemperature crystallography, fragment screening, and covalent tethering. Elife 7
- 5. Ligand binding remodels protein side-chain conformational heterogeneity. Elife 11
- 6. Ensemble-function relationships to dissect mechanisms of enzyme catalysis. Sci Adv 8
- 7. Is one solution good enough? Nat. Struct. Mol. Biol 13:184–5
- 8. What Will Computational Modeling Approaches Have to Say in the Era of Atomistic Cryo-EM Data? J. Chem. Inf. Model 60:2410–2412
- 9. E pluribus unum, no more: from one crystal, many conformations. Curr. Opin. Struct. Biol 28:56–62
- 10. Vagabond: bond-based parametrization reduces overfitting for refinement of proteins. Acta Crystallogr D Struct Biol 77:424–437
- 11. Improving sampling of crystallographic disorder in ensemble refinement. Acta Crystallogr D Struct Biol 77:1357–1364
- 12. Modelling dynamics in protein crystal structures by ensemble refinement. Elife 1
- 13. A method for intuitively extracting macromolecular dynamics from structural disorder. Nat. Commun 12
- 14. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr 66:486–501
- 15. The solvent component of macromolecular crystals. Acta Crystallogr. D Biol. Crystallogr 71:1023–1038
- 16. XDS. Acta Crystallogr. D Biol. Crystallogr 66:125–132
- 17. Linking crystallographic model and data quality. Science 336:1030–1033
- 18. How Good Can Single-Particle Cryo-EM Become? What Remains Before It Approaches Its Physical Limits? Annu. Rev. Biophys 48:45–61
- 19. FLEXR: automated multi-conformer model building using electron-density map sampling. Acta Crystallogr D Struct Biol 79:354–367
- 20. Exposing Hidden Alternative Backbone Conformations in X-ray Crystallography Using qFit. PLoS Comput. Biol 11
- 21. qFit 3: Protein and ligand multiconformer modeling for X-ray crystallographic and single-particle cryo-EM density maps. Protein Sci 30:270–285
- 22. Modeling discrete heterogeneity in X-ray diffraction data by fitting multi-conformers. Acta Crystallogr. D Biol. Crystallogr 65:1107–1117
- 23. qFit-ligand Reveals Widespread Conformational Heterogeneity of Drug-Like Molecules in X-Ray Electron Density Maps. J. Med. Chem 61:11183–11198
- 24. Iterative-build OMIT maps: map improvement by iterative model building and refinement without model bias. Acta Crystallogr. D Biol. Crystallogr 64:515–524
- 25. Towards automated crystallographic structure refinement with phenix.refine. Acta Crystallogr. D Biol. Crystallogr 68:352–367
- 26. The Protein Data Bank. Nucleic Acids Res 28:235–242
- 27. The R-factor gap in macromolecular crystallography: an untapped potential for insights on accurate structures. FEBS J 281:4046–4060
- 28. Why protein R-factors are so large: a self-consistent analysis. Proteins 46:345–354
- 29. Fibrillarin from Archaea to human. Biol. Cell 107:159–174
- 30. [No title]. https://phenix-online.org/phenixwebsite_static/mainsite/files/newsletter/CCN_2023_01.pdf#page=2.
- 31. CATH--a hierarchic classification of protein domain structures. Structure 5:1093–1108
- 32. The penultimate rotamer library. Proteins 40:389–408
- 33. Alternate conformations always want to spread
- 34. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci 27:293–315
- 35. Fragment binding to the Nsp3 macrodomain of SARS-CoV-2 identified through crystallographic screening and computational docking. Sci Adv 7
- 36. Measurement of atom resolvability in cryo-EM maps with Q-scores. Nat. Methods 17:328–334
- 37. Single-particle cryo-EM at atomic resolution. Nature 587:152–156
- 38. Adeno-Associated Virus (AAV-DJ)-Cryo-EM Structure at 1.56 Å Resolution. Viruses 12
- 39. Evolution of standardization and dissemination of cryo-EM structures and data jointly by the community, PDB, and EMDB. J. Biol. Chem 296
- 40. Atomic-resolution protein structure determination by cryo-EM. Nature 587:157–161
- 41. Room-temperature crystallography reveals altered binding of small-molecule fragments to PTP1B. https://doi.org/10.1101/2022.11.02.514751
- 42. The temperature-dependent conformational ensemble of SARS-CoV-2 main protease (Mpro). https://doi.org/10.1101/2021.05.03.437411
- 43. Mapping Protein Dynamics at High-Resolution with Temperature-Jump X-ray Crystallography. https://doi.org/10.1101/2022.06.10.495662
- 44. Mix-and-inject XFEL crystallography reveals gated conformational dynamics during enzyme catalysis. Proc. Natl. Acad. Sci. U. S. A 116:25634–25640
- 45. The mechanisms of catalysis and ligand binding for the SARS-CoV-2 NSP3 macrodomain from neutron and x-ray diffraction at room temperature. Sci Adv 8
- 46. Iterative computational design and crystallographic screening identifies potent inhibitors targeting the Nsp3 macrodomain of SARS-CoV-2. Proc. Natl. Acad. Sci. U. S. A 120
- 47. Crystallographic and electrophilic fragment screening of the SARS-CoV-2 main protease. Nat. Commun 11
- 48. X-ray screening identifies active site and allosteric inhibitors of SARS-CoV-2 main protease. Science 372:642–646
- 49. CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18:176–185
- 50. Deep learning-based mixed-dimensional Gaussian mixture model for characterizing variability in cryo-EM. Nat. Methods 18:930–936
- 51. Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN. Nat. Protoc 18:319–339
- 52. Residue-wise local quality estimation for protein models from cryo-EM maps. Nat. Methods 19:1116–1125
- 53. Synthetic group A streptogramin antibiotics that overcome Vat resistance. Nature 586:145–150
- 54. The translating bacterial ribosome at 1.55 Å resolution generated by cryo-EM imaging services. Nat. Commun 14
- 55. Validation analysis of EMDB entries. Acta Crystallogr D Struct Biol 78:542–552
- 56. Cryo-EM model validation recommendations based on outcomes of the 2019 EMDataResource challenge. Nat. Methods 18:156–164
- 57. Electron microscopy holdings of the Protein Data Bank: the impact of the resolution revolution, new validation tools, and implications for the future. Biophys. Rev 14:1281–1301
- 58. 3rd, Richardson, D. C. & Richardson, J. S. The backrub motion: how protein backbone shrugs when a sidechain dances. Structure 14:265–274
- 59. A simple model of backbone flexibility improves modeling of side-chain conformational variability. J. Mol. Biol 380:757–774
- 60. Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction. J. Mol. Biol 380:742–756
- 61. The role of local backrub motions in evolved and designed mutations. PLoS Comput. Biol 8
- 62. Algorithm for backrub motions in protein design. Bioinformatics 24:i196–204
- 63. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins 50:437–450
- 64. Flexibility and Design: Conformational Heterogeneity along the Evolutionary Trajectory of a Redesigned Ubiquitin. Structure 25:739–749
- 65. Multiscale conformational heterogeneity in staphylococcal protein a: possible determinant of functional plasticity. Structure 22:1467–1477
- 66. Accessing protein conformational ensembles using room-temperature X-ray crystallography. Proc. Natl. Acad. Sci. U. S. A 108:16247–16252
- 67. Integration of software tools for integrative modeling of biomolecular systems. J. Struct. Biol 214
- 68. Proper modelling of ligand binding requires an ensemble of bound and unbound states. Acta Crystallogr D Struct Biol 73:256–266
- 69. ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models. J. Mol. Biol 168021
- 70. Rescue of conformational dynamics in enzyme catalysis by directed evolution. Nat. Commun 9
- 71. Temporal and spatial resolution of distal protein motions that activate hydrogen tunneling in soybean lipoxygenase. Proc. Natl. Acad. Sci. U. S. A 120
- 72. Protein structure prediction has reached the single-structure frontier. Nat. Methods 20:170–173
Copyright
© 2023, Wankowicz et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.