Introduction

Macromolecular X-ray crystallography and single-particle electron microscopy (cryo-EM) can provide valuable information on macromolecular conformational ensembles. These experiments cannot capture all conformations present in solution, as many would disrupt the ability to obtain crystals or align classifiable particles1. However, careful modeling of high-resolution X-ray crystallography and cryo-EM experiments can reveal widespread conformational heterogeneity, particularly for protein side chains and local backbone regions2, 3. Discrete conformational heterogeneity of such small magnitude is significant for many biological functions, including macromolecular binding, catalysis, and allostery4–6.

While the underlying data from X-ray diffraction or cryo-EM experiments contain information on temporal and spatial averages of tens of thousands to billions of protein copies, conventional structural modeling and refinement procedures fail to capture much of this valuable information. Most depositions in the Protein Data Bank reflect only an averaged, single ground state set of atomic coordinates7, ignoring weak but potentially biologically rich signals encoding alternative conformations sampled by distinct copies of the protein in the experiment.

Ideally, we would accurately model the complete ensemble of protein conformations reflected in experimental data8. The two ways to model the conformational heterogeneity present in the sample are to create ensembles or to use alternative conformations (multiconformers)9. The PDB “ensemble” format encodes multiple complete copies of the entire system in different models within a single file. Ensemble refinement approaches are implemented in phenix.ensemble_refinement and Vagabond10, 11. In contrast, multiconformers extend the conventional single structure model by encoding each individual conformation using a distinct “alternative location indicator (altloc)” within a single model. Altlocs are assigned distinct letters and can range from single atoms to a large number of connected or non-connected residues. Refinement and validation programs treat atoms sharing the same altloc as having the ability to interact with each other and with atoms lacking an altloc. In contrast, atoms with different altlocs cannot interact. By representing the underlying heterogeneity through discrete conformations with labeled altlocs, multiconformer models encode the distribution of states that contribute to the density map. Further, when compared to ensemble methods that yield multiple complete copies of a protein11–13, multiconformer models are more interpretable and easier to modify in interactive modeling software such as Coot14.

However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling15–17 in X-ray crystallography, and errors in particle alignment and classification, poor modeling of beam-induced motion, and imperfect detector quantum efficiency (DQE) in cryo-EM18. These factors make visually distinguishing signals in Coot14 or other visualization software very difficult, especially when genuine low-occupancy signals overlap. Additionally, in X-ray crystallography, this process is iterative: each time a new alternative conformation is placed, it can affect the entire electron density map, often requiring adjustments to previously modeled regions. The difficulty of this process can lead to burnout and human bias, where parts of the protein are carefully modeled as multiconformers while other regions remain modeled as single conformers. Despite these complications, multiconformer modeling can be implemented manually or using software such as FLEXR19 or qFit, as described below.

To enable more routine and impartial multiconformer modeling, we have previously developed qFit20–22. This program leverages the ensemble-rich experimental data from density maps that are better than 2.0 angstroms (Å) resolution to automatically generate parsimonious multiconformer models20, 21. qFit takes as input a refined single-conformer structure and a high-resolution X-ray or cryo-EM map, and then uses optimization algorithms to identify alternative protein20, 21 or ligand23 conformations.

Here, we present updates to qFit including algorithmic improvements to protein conformation selection based on Bayesian information criteria (BIC), B-factor sampling, and updated cryo-EM scoring. Collectively, these advances enable the unsupervised generation of multiconformer models that routinely improve Rfree over single-conformer X-ray structures derived from high-resolution (better than 2.0 Å) data and improve model geometry metrics across a diverse test set. We further demonstrate that qFit can identify alternative side-chain conformations in high-resolution cryo-EM datasets. With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models, in both individual cases and large structural bioinformatics projects5, to derive ensemble-function insights.

Results

Overview of qFit protein algorithm

qFit protein is a tool that automatically identifies alternative conformations based on a high-resolution map (generally better than ∼2 Å) and a well-refined single-conformer structure (generally Rfree below 20%). For X-ray maps, we recommend using a composite omit map as input to minimize model bias24. For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more experimental.

Since our previous paper, we have made substantial improvements to the code, both algorithmically (e.g., scoring is improved by BIC and sampling of B-factors is now included) and computationally (improving the efficiency and reliability of the code).

All code and associated documentation can be found in the qFit GitHub repository (https://github.com/ExcitedStates/qfit-3.0). The version of qFit associated with this paper is 2023.1.

A – qFit residue

For each residue, qFit samples backbone conformations, dihedral angles, and B-factors. Using mixed-integer quadratic programming (MIQP) optimization and the Bayesian information criterion (BIC), we select a parsimonious multiconformer for each residue (Figure 1A). The details of each component of this procedure are outlined below. The sampling and scoring of residues can be run in parallel using Python multiprocessing.

A.1 – Backbone sampling

The qFit process begins with sampling backbone conformations (Figure 1A.1). For each residue, we perform a collective translation of backbone atom (N, C, Cα, O) coordinates. If the model has anisotropic B-factors, this translation is guided by the anisotropic B-factors of the Cβ. Alternatively, if anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 Å along each coordinate, extending to 0.3 Å, resulting in 9 to 81 distinct backbone conformations for further analysis. For Gly and Ala, this is the only sampling that occurs.

A.2 – Aromatic angle sampling

For aromatic residues (His, Tyr, Phe, Trp), qFit takes the conformations from the backbone step (above) and builds part of the side chain out to Cγ (prior to the aromatic ring) based on the input model coordinates (Figure 1A.2). Then, we alter the Cα-Cβ-Cγ angle (“the aromatic angle”) in steps of +/-3.75°, extending to +/-7.5°, creating 5 partial side-chain conformations per backbone conformation. For non-aromatic residues, there is no sampling of this angle. These conformers provide variability in the placement of the aromatic ring prior to dihedral angle sampling.
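The five angle offsets implied by this description can be enumerated in a few lines. The following is a minimal sketch, not the qFit implementation; the function name and its parameters are hypothetical and chosen only for illustration.

```python
def aromatic_angle_offsets(step=3.75, max_offset=7.5):
    """Return symmetric offsets (degrees) applied to the Cα-Cβ-Cγ angle.

    Hypothetical helper: the defaults follow the step size and range
    quoted in the text, not the qFit source code.
    """
    n_steps = round(max_offset / step)
    return [i * step for i in range(-n_steps, n_steps + 1)]


offsets = aromatic_angle_offsets()
print(offsets)  # [-7.5, -3.75, 0.0, 3.75, 7.5]
```

Each offset, including 0°, corresponds to one partial side-chain conformation per backbone conformation.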

A.3 – Dihedral angle sampling

The following steps occur for each χ dihedral angle for every residue (Figure 1A.3). For the first dihedral angle (χ1), the input is the backbone conformations (or for aromatic residues the backbone and “aromatic angle” conformers described above). We exhaustively sample around the χ1 dihedral angle by enumerating a conformation every 10° between rotamers. For proline, we sample the exo and endo conformations of the pyrrolidine ring, by +/-60° in steps of 10°. We then eliminate conformations that clash with other parts of the same sampled conformation (based on hard spheres) or are redundant (using an all-atom RMSD threshold of 0.01 Å).
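The redundancy filter can be illustrated with a short sketch. It assumes that conformers are given as equal-length lists of (x, y, z) tuples in a common coordinate frame (no superposition), and that filtering is greedy in input order; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def rmsd(conf_a, conf_b):
    """All-atom RMSD (Å) between two conformers given as lists of (x, y, z)."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    sq = sum((a - b) ** 2
             for atom_a, atom_b in zip(conf_a, conf_b)
             for a, b in zip(atom_a, atom_b))
    return math.sqrt(sq / len(conf_a))


def remove_redundant(conformers, threshold=0.01):
    """Keep a conformer only if it differs from every conformer kept so far."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) >= threshold for other in kept):
            kept.append(conf)
    return kept


conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.001), (1.0, 0.0, 0.001)],  # 0.001 Å from the first: redundant
    [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],      # 0.5 Å from the first: kept
]
unique = remove_redundant(conformers)
print(len(unique))  # 2
```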

These sampled conformations are then subjected to a quadratic programming (QP) optimization, which identifies the set of conformations whose weighted calculated density best fits the experimental electron density. The output of QP typically yields 5 to 15 conformations that best explain the density.
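To make the idea of fitting weighted calculated density concrete, the sketch below solves a toy non-negative least-squares problem by projected gradient descent. This is a stand-in technique chosen so the example needs no external solver; it is not the QP solver used by qFit, and it enforces only non-negative weights (an assumption, since the full constraint set is not given here). The density vectors are made-up numbers.

```python
def fit_weights(densities, target, n_iter=2000):
    """Minimize ||sum_i w_i * d_i - target||^2 subject to w_i >= 0.

    densities: list of per-conformer calculated density vectors.
    target: observed density vector on the same voxels.
    """
    n = len(densities)
    # Safe step size: 2 * (sum of squared entries) bounds the Lipschitz
    # constant of the gradient.
    step = 1.0 / (2.0 * sum(v * v for d in densities for v in d))
    w = [0.0] * n
    for _ in range(n_iter):
        model = [sum(w[i] * densities[i][j] for i in range(n))
                 for j in range(len(target))]
        resid = [m - t for m, t in zip(model, target)]
        grad = [2.0 * sum(r * v for r, v in zip(resid, densities[i]))
                for i in range(n)]
        # Gradient step followed by projection onto w >= 0.
        w = [max(0.0, wi - step * gi) for wi, gi in zip(w, grad)]
    return w


# Toy data: the target is exactly 0.7 * conformer 1 + 0.3 * conformer 2.
densities = [
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]
target = [0.7, 0.5, 0.3, 0.0]
weights = fit_weights(densities, target)
print([round(w, 3) for w in weights])
```

In this toy case the recovered weights are approximately 0.7, 0.3, and 0, so the third conformer would be discarded as not contributing to the density.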

Next, qFit samples the B-factors of the conformers. The input atomic B-factors are multiplied by a factor ranging from 0.5 to 1.5 in increments of 0.2. The resulting 50 to 150 conformation/B-factor combinations are subjected to a mixed-integer quadratic programming (MIQP) optimization. The MIQP algorithm incorporates two additional constraints relative to QP: a cardinality term, which limits the maximum number of conformations to five, and a threshold term, which stipulates that no individual conformation can have an occupancy weight below 0.2. In qFit, MIQP then outputs up to 5 conformations.
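The two additional MIQP constraints can be read as a feasibility test on a candidate occupancy vector. The sketch below is only such a test, not a solver; the function name is hypothetical and the defaults mirror the values quoted above.

```python
def satisfies_miqp_constraints(occupancies, cardinality=5, threshold=0.2):
    """Check the cardinality and minimum-occupancy constraints.

    A conformer counts as selected when its occupancy weight is non-zero.
    """
    selected = [w for w in occupancies if w > 0.0]
    return len(selected) <= cardinality and all(w >= threshold for w in selected)


print(satisfies_miqp_constraints([0.6, 0.4, 0.0, 0.0]))  # True
print(satisfies_miqp_constraints([0.9, 0.1]))            # False: 0.1 is below 0.2
```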

For residues with subsequent dihedral angles, the conformations selected by the MIQP procedure at χ(n-1) serve as the starting conformers for sampling the χ(n) angle. For residues with only one dihedral angle (Ser, Cys, Thr, Val, Pro), we proceed directly to scoring χ1.

A.4 – Final qFit residue Scoring

Upon reaching the terminal dihedral angle, we perform the optimization steps outlined above (QP/MIQP), but instead of relying only on the optimization algorithm to decide on the number of conformations to output, we also consider the model complexity (Figure 1A.4). qFit runs the MIQP step 5 times with a cardinality term ranging from 1 to 5.

Taking each output, we calculate the Bayesian information criterion (BIC). The BIC provides a numerical value of the tradeoff between the difference between the calculated and experimental density (residual sum of squares) and the number of parameters (k). The number of parameters (k) is defined by the following: number of conformers * number of atoms * 4 (representing the x, y, z coordinates and B-factor). A heuristic scaling factor of 0.95 accounts for the fact that the coordinate parameters are not independent due to chemical constraints between atoms during sampling.

qFit then outputs the set of conformations with the lowest BIC value, concluding the qFit residue routine.
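The selection step can be sketched as follows. The sketch assumes the common least-squares form of the BIC, n·ln(RSS/n) + k·ln(n), with n taken as the number of density voxels, and assumes that the 0.95 factor multiplies k; neither detail is spelled out above, so both are labeled assumptions. The RSS values are invented for illustration.

```python
import math


def bic(rss, n_voxels, n_conformers, n_atoms, scale=0.95):
    """BIC under the assumed form n*ln(RSS/n) + k*ln(n).

    k = scale * n_conformers * n_atoms * 4 (x, y, z, B-factor per atom);
    applying the 0.95 scaling to k is an assumption.
    """
    k = scale * n_conformers * n_atoms * 4
    return n_voxels * math.log(rss / n_voxels) + k * math.log(n_voxels)


def select_by_bic(candidates, n_voxels, n_atoms):
    """candidates: (n_conformers, rss) pairs, one per MIQP cardinality run."""
    return min(candidates, key=lambda c: bic(c[1], n_voxels, c[0], n_atoms))


# Illustrative numbers only: adding a third conformer barely lowers the RSS.
candidates = [(1, 40.0), (2, 20.0), (3, 19.5)]
best = select_by_bic(candidates, n_voxels=2000, n_atoms=12)
print(best)  # (2, 20.0)
```

Here the third conformer reduces the residual too little to offset its 48 extra parameters, so the two-conformer solution has the lowest BIC.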

B. Connecting residues together into a multiconformer model

After the sampling and scoring of each individual residue, qFit considers the entire protein together. First, we use MIQP and BIC to select the best fitting conformations among connected residues, ensuring that neighboring backbone conformations have the same occupancy. Second, we label the alternative conformers while being aware of clashes. This labeling step is not parallelized.

B.1 – qFit segment

After identifying the optimal conformations for each residue in parallel, qFit reconnects the backbone atoms (Figure 1B.1). Moving linearly along the protein, we identify ‘segments’ of residues with multiple backbone conformations, delimited on each end by a residue with a single backbone conformation. The main reason for this step is to find a harmonious set of occupancies for adjacent residues in a segment. Within each segment, qFit creates fragments of three residues, enumerating all possible combinations of conformations in those residues, and selects the final combination of conformations and their relative occupancies using the optimization algorithms outlined above. The BIC is modified for qFit segment such that k equals the number of conformations. qFit then moves along the segment, enumerating and selecting optimal combinations of fragment conformations until reaching the end of the segment.
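Segment identification and fragment enumeration can be sketched as below. The input representation (a list of backbone conformation counts per residue, and lists of altloc labels per residue) and the function names are hypothetical.

```python
from itertools import product


def find_segments(n_backbone_confs):
    """Return inclusive (start, end) index pairs for runs of residues
    that have more than one backbone conformation."""
    segments, start = [], None
    for i, n in enumerate(n_backbone_confs):
        if n > 1 and start is None:
            start = i
        elif n <= 1 and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(n_backbone_confs) - 1))
    return segments


def fragment_combinations(conformers_per_residue):
    """Enumerate every combination of one conformer per residue."""
    return list(product(*conformers_per_residue))


print(find_segments([1, 2, 2, 1, 1, 3, 1]))  # [(1, 2), (5, 5)]
combos = fragment_combinations([["A", "B"], ["A", "B"]])
print(len(combos))  # 4
```

Two adjacent residues with two conformers each give four candidate combinations, which are then scored and pruned by the optimization step.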

B.2 – qFit relabel

Next, qFit determines the correct altloc labeling (A, B, C, D, E) of coupled alternative conformers using Monte Carlo optimization with a simple steric model to prevent spatially adjacent conformers from sterically clashing (Figure 1B.2). There is also an option (‘qFit segment only’) to input a multiconformer model and run only the qFit segment and relabel procedures. This procedure can be especially helpful after manually adding or deleting conformations in Coot14. Running ‘qFit segment only’ will adjust the occupancy of the remaining conformations and correct the labeling of alternative conformations.
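A toy version of label optimization is sketched below. It is not the qFit steric model or move set, which are not detailed here. It assumes residues with exactly two conformers, a precomputed list of clashing conformer pairs, a cost that counts clashes between conformers sharing a label, and Metropolis acceptance; all names are hypothetical.

```python
import math
import random


def clash_cost(swapped, clashes):
    """Count clashing conformer pairs that currently share an altloc label.

    swapped[r] is 1 if the A/B labels of residue r are exchanged.
    clashes holds (res_i, conf_i, res_j, conf_j) with conformer index 0 or 1.
    """
    cost = 0
    for res_i, conf_i, res_j, conf_j in clashes:
        if conf_i ^ swapped[res_i] == conf_j ^ swapped[res_j]:
            cost += 1
    return cost


def relabel(n_residues, clashes, n_steps=1000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over label swaps; returns the best state seen."""
    rng = random.Random(seed)
    state = [0] * n_residues
    cost = clash_cost(state, clashes)
    best_state, best_cost = state[:], cost
    for _ in range(n_steps):
        r = rng.randrange(n_residues)
        state[r] ^= 1  # propose swapping the labels of residue r
        new_cost = clash_cost(state, clashes)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best_state, best_cost = state[:], cost
        else:
            state[r] ^= 1  # reject: undo the swap
    return best_state, best_cost


# Two residues whose second conformers (initially both labeled B) clash.
clashes = [(0, 1, 1, 1)]
state, cost = relabel(2, clashes)
print(cost)  # 0
```

Swapping the labels of either residue places the clashing conformers in different altlocs, after which they are no longer treated as interacting.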

B.3 – qFit refinement

The raw output of qFit (a multiconformer model) should then be refined. We provide scripts for a refinement procedure with Phenix25, where we iteratively refine the occupancy, coordinates, and B-factors, removing conformations with occupancies under 10%. Once the model is stable (has no conformations with occupancies less than 10%), we perform a final round of refinement which optimizes the placements of ordered water molecules (Methods). This refinement protocol outputs a final ‘qFit model’. This model can then be examined and edited in Coot14 or other visualization software.

Figure 1. Programmatic flow of qFit protein algorithm.

A. qFit residue algorithm, demonstrated by Tyr118 in the E46Q mutant structure of the photoactive yellow protein from Halorhodospira halophila (PDB: 1OTA). The 2mFo-DFc composite omit density map contoured at 1 σ is shown as a blue mesh.

A.1. Backbone sampling: For each residue, qFit performs a collective translation of backbone atom (N, C, Cα, O) coordinates.

A.2. Aromatic angle sampling: For aromatic residues (His, Tyr, Phe, Trp), qFit takes the conformations from the backbone step and samples the Cα-Cβ-Cγ angle.

A.3. Dihedral angle sampling: Since Tyr has two χ angles, qFit starts by taking the output conformers from the aromatic angle sampling step and exhaustively samples the χ1 angle, scoring the best conformations based on QP/B-factor/MIQP scoring. qFit then uses these best conformations as input to sample the remaining χ angles in the Tyr residue. Since the only angle left to be sampled is the χ2 angle, qFit rotates about the terminal ring of the Tyr and then scores the conformations that best fit the density.

A.4. Final qFit Residue Scoring: Once we reach the terminal ring (all sampling steps have occurred), we perform QP and B-factor sampling, followed by MIQP with BIC selection. MIQP with BIC selection removes a redundant overlapping conformation, resulting in two distinct conformations of this Tyr residue. This model is then output as the residue multiconformer.

B. qFit segment algorithm, demonstrated by Tyr118 in PDB: 1OTA. After identifying all optimal conformations for each individual residue, qFit works to connect the protein back together.

B.1 qFit segment: Moving linearly along the protein, qFit identifies ‘segments’ of residues with multiple backbone conformations. Here, Ser117 (i) and Tyr118 (i+1) have multiple backbone conformations. qFit segment enumerates each possible combination of alternate conformations between these two residues, creating four possible combinations. The optimal combination of conformations is then determined by the QP/MIQP scoring, leading to one combination being culled.

B.2 qFit relabel: qFit utilizes Monte Carlo optimization with a steric model to assign altloc labels to spatially coupled alternative conformers. In this example, Ser117 and neighboring Gln32 initially have clashing altloc B conformers. However, relabeling swaps the A and B labels of the Gln32 to relieve this clash.

B.3 qFit refinement: We then refine the occupancy, coordinates, and B-factors of the raw qFit output file to produce a final qFit model.

qFit improves overall fit to data relative to deposited structures

To evaluate the impact of qFit algorithmic and code improvements, we collated a dataset of single-chain, unliganded, high-resolution (1.2-1.5 Å) protein X-ray crystallography structures from the PDB26. We clustered these structures at a sequence identity threshold of 30% and selected the highest-resolution structure. Finally, we ensured that the datasets ran without error through the qFit pipeline, including refinement with Phenix, resulting in 144 diverse structures (Supplementary Figure 1).

Each deposited structure was initially re-refined using phenix.refine (Methods) to eliminate differences from the original refinement protocols. The resulting re-refined model, which we refer to as the ‘deposited model’, was used as the input for qFit. Next, we ran qFit protein using the default parameters and refinement protocol to produce the ‘qFit model’.

To evaluate the crystallographic modeling differences between the deposited and qFit models, we compared the Rfree values as an indicator of overall model/data agreement. The qFit model has a lower (improved) Rfree value for 73% (105/144) of structures (Figure 2A; Supplementary Figure 2A; Supplementary Table 1). On average, there is an absolute decrease of Rfree value by 0.6% (median deposited models Rfree: 18.1%, median qFit models Rfree: 17.5%), which is in line with theoretical expectations for the increase in model complexity created by qFit27, 28. Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling. An additional check on overfitting comes from monitoring R-gap, calculated as the difference between Rwork and Rfree. qFit models have similar R-gap values compared to deposited models (mean: 3.0% for both models). Collectively, these results indicate that qFit improves the quality of most models without overfitting (Supplementary Figure 2B).

Despite this general trend of improved models, 25% of the qFit models have worse Rfree than the deposited models (n=36). The majority of these structures had a deposited model Rfree of over 20%. These high Rfree values are notable because our re-refinement procedure generally improved Rfree relative to the originally deposited model (Supplementary Figure 2C). Since qFit builds off of the input structure and the map quality relies on model phases, accurately detecting alternative conformers depends heavily on the agreement between input model and data. This trend reinforced the idea that poor modeling in a deposited model, which serves as input to qFit, will result in poor performance of qFit. It further suggests that qFit is best employed at a late stage of modeling, after the single-structure model is of sufficient quality that it would be deposited in the PDB.

As an example of how qFit can uncover previously unnoticed conformational heterogeneity, we examined differences in conformations in the deposited versus qFit models of the Pyrococcus horikoshii fibrillarin pre-rRNA processing protein (PDB: 1G8A)29. We focused on the residues adjacent to the RNA binding motif. Among these residues, qFit identified well-justified alternative conformations for residues Leu58, Phe69, and Met175, including new rotamers for Leu58 and Met175, that were not present in the deposited model (Figure 2B). Beyond detecting alternative conformers in each of these residues, the qFit labeling process identified potential coupled motions between the alternative conformers. For example, when Leu58 is in the ‘up’ position (altloc B), Phe69 is also in the ‘up’ position (altloc B). It is possible that this coupled motion plays a role in RNA binding, a hypothesis that may merit further investigation.

qFit recovers alternative conformations of deposited models and discovers new ones

As qFit mainly alters structures by adding alternative conformations, we examined the differences in the number of alternative conformations between the deposited models and qFit models. Only 2.9% of residues in the deposited models were multiconformers (two or more alternative conformations, n=970). In contrast, 40.7% (n=11,049) of residues in the qFit models were multiconformers (Figure 2C). The vast majority (92.5%) of multiconformer residues in the qFit models have only two alternative conformations; only 2.4% of residues have more than two alternative conformations.

Alternative conformations created by qFit come in a few main varieties. First and most obvious are alternative conformations that represent drastic changes in coordinates, most commonly in the form of rotameric changes. Most alternative conformations found in deposited models fall into this category. Second are more subtle changes in side-chain and backbone coordinates that represent heterogeneity within a rotameric state. This behavior is exemplified by the Tyr residue in Figure 1A. Third are even more subtle changes in coordinates that avoid strain caused by the alternative conformations of neighboring residues30. This category is essentially imperceptible to visual inspection, as the atom centers are nearly superimposable, but is important for avoiding outlier bond geometry when adjacent residues have larger displacements.

To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer as a string (Methods)31, 32. We classified the agreement between the deposited and qFit models into 5 categories (Figure 2D, Supplementary Figure 3). The first category contains residues that have the same rotameric states in both models. This category entails most single-conformer and multiconformer residues with agreement between the two models. Moreover, residues that have multiple conformations in the same rotamer in the qFit model (for the reasons described above) generally populated the same rotamer as found in single-conformer residues in the deposited models. Overall this category, “Consistent”, represents 93.7% of residues (n=42,626) in the dataset.

The second and third categories deal with imbalance in alternative conformations that populate distinct rotamers. Since the original premise of qFit was to discover unmodeled alternative conformations, it is unsurprising that many residues in qFit models populate additional rotameric states that are absent in the deposited model. This category, “Additional Rotamer(s) in qFit model”, represents 2.38% of residues (n=1,082). In contrast, only two residues (0.06% of the dataset) are classified in the converse category, “Additional Rotamer(s) in deposited model”.

The final two categories cover disagreements in rotamer assignments. There are many cases where we observe only partial agreement between alternative conformers modeled in both the deposited and qFit models. These multiconformer residues share at least one common rotamer, but also populate alternative rotamers that are distinct between the two models. This behavior generally occurs in longer residues where subtle differences at higher χ angles lead to distinct rotameric assignments. This category, “Consistent & Different Rotamers”, represents 0.82% of residues (n=373). The final category, “Different”, covers both multiconformer and single-conformer residues where there are no shared rotamer states between the two models. One cause is the same as for the “Consistent & Different Rotamers” category: differences in terminal χ angles in weak density lead to distinct rotamer assignments. Another contributor to this category is single conformers, generally in the deposited model, modeled into density that qFit interprets as multiconformer. Often the rotamer modeled by the single conformer fits an “average” rather than the two distinct minima fit by the multiconformer model. “Different” rotamer assignments represent 3.04% of residues (n=1,384). While the analyses above include all residues, focusing on residues that were modeled as multiconformers in the deposited models (n=970) reveals a large increase in the “Different” and “Consistent & Different Rotamers” categories, to 14.88% (n=144) and 27.68% (n=268) of residues, respectively. This increase highlights the sensitivity of the rotamer assignments and motivates benchmarking qFit on “true positive” synthetic data in addition to deposited multiconformers.
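The five categories can be expressed as set comparisons on the rotamer names assigned to each residue in the two models. The sketch below is one reading of the category definitions, not the analysis script used for the paper; the function name is hypothetical and the rotamer strings are illustrative.

```python
def classify_rotamers(deposited, qfit):
    """Assign a residue to one of the five agreement categories.

    deposited, qfit: iterables of rotamer names, one per conformer.
    """
    dep, qf = set(deposited), set(qfit)
    if dep == qf:
        return "Consistent"
    if not dep & qf:
        return "Different"
    if dep < qf:
        return "Additional Rotamer(s) in qFit model"
    if qf < dep:
        return "Additional Rotamer(s) in deposited model"
    return "Consistent & Different Rotamers"


# Two qFit conformers within the same rotamer well still count as consistent.
print(classify_rotamers(["mt"], ["mt", "mt"]))        # Consistent
print(classify_rotamers(["mt"], ["mt", "tp"]))        # Additional Rotamer(s) in qFit model
print(classify_rotamers(["mt", "tp"], ["mt", "pp"]))  # Consistent & Different Rotamers
print(classify_rotamers(["mt"], ["tp"]))              # Different
```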

Collectively, these analyses revealed that qFit identifies the majority of deposited alternative conformations and discovers new ones. Discrepancies between manually modeled and qFit alternative conformations predominantly result from weak density at terminal χ angles. When considered with the improvements in Rfree, these results indicate that qFit is detecting more of the true underlying conformational heterogeneity that exists in crystallographic data.

Figure 2. Multiconformer models created by qFit are better models than deposited single-conformer models.

A. The distribution of Rfree value in deposited models versus qFit models. The qFit Rfree values improve in 73% of structures.

B. qFit identifies new alternative conformations adjacent to the RNA binding motif in the Pyrococcus horikoshii fibrillarin pre-rRNA processing protein (PDB: 1G8A). (Left) qFit multiconformer model with the region in the right panel highlighted in green and the adjacent RNA binding motif highlighted in red. Key domains in the fibrillarin protein are also annotated in blue. (Right) Comparison of the deposited versus qFit model in a region with several conformationally heterogeneous residues. qFit identified new rotamers for Leu58 (tp) and Met175 (ttp and mtp)32 and significantly different alternative conformations within the original rotameric well for Phe69.

C. The differences in the number of alternative conformations per residue in deposited models versus qFit models. qFit adds at least one additional alternative conformation in 31.7% of residues (n=9,998).

D. The distribution of rotamer assignment agreement between the deposited and qFit models for different (sub)sets of residues. (Left) All residues (n=42,626). (Right) Only residues with alternative conformations in the deposited model (n=970).

qFit improves multiple side-chain model geometry metrics

Although qFit improves the agreement of model to data by the addition of alternative conformations, we questioned whether this improvement comes at the cost of degrading model geometry. On one hand, the absence of geometric constraints in qFit backbone residue sampling and the connections made during qFit segment may result in worse geometry. On the other hand, placing additional alternative conformers may alleviate strain in the model that can result from fitting a single conformer into density that should be supported by multiple conformers10, 33.

To validate geometry, we used MolProbity to evaluate the deposited and qFit models. MolProbity compares input models with idealized values and then provides component scores for various geometric and steric features that are summarized in an overall “MolProbity score”34. Component scores that examine all atoms (bond angle/length, clashscore) or side-chain atoms (rotamers) account for all alternative conformers. In contrast, scores that evaluate the backbone (Ramachandran, Cβ deviations) are reported for single-conformer residues or using only altloc A for multiconformer residues. Therefore, the overall MolProbity score includes some of the contributions of alternative conformations, but also misses the potential impact on some other aspects. In the future, we aim to explore updated metrics that consider all alternative conformations.

Compared to deposited models, qFit models had equivalent MolProbity scores (1.27 median deposited versus 1.31 median qFit, p=0.66 from two-sided t-test; Figure 3A), which indicated that qFit does not worsen geometry for better density fitting. To further understand which parts of the model geometry were different (if any) between the deposited and qFit models, we explored the individual component scores, and observed multiple component scores that improved in the qFit models. This included considerable improvements in bond lengths and angles in the qFit models (RMSD from idealized values for bonds: 0.010 Å median deposited versus 0.007 Å median qFit, p=0.0038 from two-sided t-test; RMSD from idealized values for angles: 1.30° median deposited versus 0.93° median qFit, p=8.8e-20 from two-sided t-test; Figure 3B,C). We suspect that the primary factor behind this improvement was the incorporation of multiconformers, rather than straining a single conformer, to explain the density. To visualize these differences, we examined Met189 from PDB: 1V8F. In the deposited model this residue has a Sδ-Cε bond length of 1.596 Å, which is substantially shorter than the idealized length of 1.791 +/-0.025 Å34. qFit adds an additional conformation, both explaining previously unmodeled density and bringing the Sδ-Cε bond lengths to 1.790 Å (altloc A) and 1.794 Å (altloc B), much closer to the expected value (Figure 3E). This multiconformer residue with improved geometry is consistent with the hypothesis that qFit is alleviating strained geometry by modeling multiple conformations.

Additionally, qFit models have improved clashscores (2.50 median deposited, 2.27 median qFit, p=0.0028 from two-sided t-test; Figure 3D). We hypothesized that this was due to a mixture of modeling of alternative conformers and improved fit of single-conformer residues which are re-sampled and refined during the qFit procedure. We looked at the qFit modeling differences in a cluster of Met and Leu residues in PDB: 6HEQ, which had one of the largest changes in clashscores between the deposited and qFit models. We observed that qFit fixes the positioning of Met83, preventing the clash with both conformers of Leu81 (Figure 3F).

However, qFit models tended to have worse rotamer scores, likely due to our extensive sampling. Further, we observed a slight, but not significant, decrease in favored Ramachandran values, which may indicate the need to improve qFit segment procedures. It is difficult to evaluate whether this is worse than deposited models in residues with backbone alternative conformers, as this score only accounts for alternative conformer A (Supplementary Figure 4). However, both backbone and rotamer sampling represent areas of potential qFit improvement. Overall, while there is a mix of geometry metrics that improve or deteriorate in qFit models, the negligible difference in overall MolProbity scores suggests that qFit does not deteriorate the model geometry in pursuit of model/data agreement.

Figure 3. qFit improves some geometry metrics compared to deposited structures.

A. Model MolProbity score (deposited model: 1.27 (median) [0.94-0.16] (interquartile range), qFit model: 1.31 (median) [1.06-1.58] (interquartile range)), p-value=0.66 from two-sided t-test.

B. Model averaged RMSD (°) of idealized versus model angles (deposited model: 1.30 [1.14-1.57], qFit model: 0.93 [0.81-1.11]), p-value=8.8e-20 from two-sided t-test.

C. Model averaged RMSD (Å) of idealized versus model bonds (deposited model: 0.010 [0.0070-0.015], qFit model: 0.0070 [0.0050-0.010]), p-value=0.0038 from two-sided t-test.

D. Clashscore per model (deposited model: 2.50 [1.30-5.92], qFit model: 2.27 [1.31-3.73]), p-value=0.0028 from two-sided t-test.

E. Example of qFit (right, blue and magenta) fixing bond length by appropriately modeling in a second conformation. Meshes represent density at 1 σ. Met189 from deposited structure (PDB: 1VF8; left, green) has a Sδ-Cε bond length of 1.596 Å (7.8σ from idealized length of 1.791 Å)34. qFit models in two alternative conformations, filling in unmodeled density, and fixing the Sδ-Cε bond length (bond length: 1.790 Å for alternative conformation A and 1.794 Å for alternative conformation B).

F. Example of qFit (right, blue and magenta) fixing a clash between Met83 and Leu81 from deposited structure (PDB: 6HEQ). Meshes represent density at 1 σ. In the deposited model (left, green), Met83 is not correctly fitted into density and is clashing with Leu81 (closest contact: 3.0 Å). qFit corrects this by improving the fit of Met83, leading to the closest contact being 3.8 Å.

Simulated data indicate qFit is appropriate for high-resolution data

In the previous sections, we established that qFit has the potential to improve Rfree and some geometry metrics relative to deposited structures. However, the vast majority of the residues in these deposited structures are modeled exclusively as single conformers. This homogeneity in single-conformation models limited our ability to assess how well qFit can recapitulate existing alternative conformers across a wide resolution range. To address this question, we generated artificial structure factors using an ultra-high-resolution structure (0.77 Å) of the SARS-CoV-2 Nsp3 macrodomain (PDB: 7KR0)35. This model had a high proportion of residues (47%) manually modeled as alternative conformations and did not employ qFit, making it an ideal comparison structure. We refer to this structure as the “ground truth 7KR0 model”, and evaluated how well its alternative conformations were recapitulated by qFit as resolution was artificially worsened across synthetic datasets.

To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors from 0.8 to 3.0 Å resolution (in increments of 0.1 Å). We then added random noise to the structure factors that increased as resolution worsened (Methods; Supplementary Figure 5A and 5B). To create a single-conformer model appropriate for input to qFit, we removed all alternative conformations from the ground truth model, maintaining all single conformations and altloc A. Next, we refined this single-conformer model against the synthetic datasets. Finally, we used the refined single-conformer model as input for qFit.

To evaluate the fidelity of qFit in recapitulating the ground truth 7KR0 model, we categorized each residue across two axes: positive/negative and true/false. First, we define the positive/negative axis, with positive being modeled as multiconformer and negative being modeled as single-conformer in the outputted qFit model. Next, we define the true/false axis as agreement of the outputted qFit model with the ground truth 7KR0 model in terms of multiconformer/single-conformer status. Because many residues in both the ground truth and qFit models have alternative conformations that nearly overlap each other, we categorize residues as multiconformer only if they possess at least two alternative conformers with a side-chain heavy-atom root-mean-square deviation (RMSD) greater than 0.5 Å. From this cutoff, 50 out of the 169 residues (30%) in the ground truth model are classified as multiconformers.

We used a similar 0.5 Å cutoff for defining agreement between multiconformer residues in categorizing a residue as a true positive, false positive, false negative, or true negative (Methods and Figure 4A). A “true positive” has agreement between multiconformers across ground truth and qFit models; a “true negative” has agreement between single conformers across ground truth and qFit models. Generally, a “false positive” has extra or distinct conformations in the qFit model; a “false negative” has at least one alternative conformation in the ground truth model that is not present in the qFit model, or discordant single-conformer conformations.

We observed that qFit is consistently strong at capturing single-conformer residues (true negatives) across resolutions; however, the power to detect alternative conformations (true positives) is limited beyond resolutions of ∼1.8-2.0 Å (Figure 4B; Supplementary Figure 5C). This behavior is exemplified by Glu114, which is multiconformer in the ground truth model (Figure 4C). At high resolution (1.0 Å), qFit correctly models the alternative conformation and this residue is categorized as a true positive. However, as resolution gets worse, qFit begins to mismodel this residue. At 1.8 Å resolution, qFit still models two alternative conformations and has a good fit to density; however, the secondary conformer has an RMSD greater than 0.5 Å away from the ground truth model; consequently this residue is now categorized as a false positive. Finally, at 2.8 Å resolution, qFit only models a single conformer, moving the residue to the false negative category.

Next, we tested the ability of qFit to detect alternative conformations over a larger, more diverse dataset. We generated artificial structure factors for the qFit models with improved Rfree values over the deposited values from the previous sections (n=110). Although this dataset is more diverse, it has a notable weakness relative to the 7KR0 dataset test: the 7KR0 alternative conformations were modeled manually, whereas the larger dataset has alternative conformations modeled by qFit. Therefore, this second synthetic dataset assesses convergence of the qFit models across resolution.

Using these qFit models as ground truth models, we generated structure factors, performed refinement of single-conformer models, and ran qFit over the resolution range of 1.0 to 3.0 Å (Supplementary Figure 5A). We observed a similar fall-off of true positive detection around 2.0 Å (Supplementary Figure 5D). Importantly, this dataset indicates that qFit still models single conformers well at lower resolutions. We also observe a trend of increased false positive/negative rates for longer residues that are just outside the 0.5 Å cutoff (Supplementary Figure 6). We did not observe a relationship between input model Rfree and the number of correctly modeled conformers, but it is difficult to tell whether our synthetic noise procedures properly capture the dependence of qFit performance on input model/data agreement (Supplementary Figure 7A/B).

We then assessed the agreement between individual conformers and the map. To do this, we used the Q-score36, which compares the map profile of an atom with an ideal Gaussian distribution that would be observed if the atom perfectly fits into the density. Across the test dataset, residues that qFit models as single conformers have an almost equivalent Q-score to the ground truth model even at lower resolutions (Figure 4D). The primary alternative conformations in qFit models (occupancy between 0.5 and 1.0) and lower-occupancy alternative conformations (occupancy <0.5) display Q-scores that are very close to the equivalent “ground truth model” alternative conformations until a resolution of about 1.8 Å. At lower resolutions there is a dramatic fall-off in model/map agreement for these alternative conformers. These trends were also observed with the 7KR0 dataset (Supplementary Figure 7C). Overall, these analyses on both the 7KR0 and larger synthetic datasets confirm that qFit will best detect alternative conformations with high-resolution (1.8-2.0 Å or better) data.

qFit performs best at high resolution of input dataset

A. Ground truth model residues are shown as green and yellow sticks; qFit model residues are shown as magenta, cyan, and gray. Meshes represent density at 1 σ.

  • True positive – Residue is multiconformer in qFit model with RMSD < 0.5 Å from ground truth residue. qFit models two distinct alternate conformations which recapitulate the ground truth residue’s alternate conformations. Hence this is a true positive.

  • False positive – Residue is multiconformer in qFit model with RMSD > 0.5 Å from ground truth residue. The example on the left has two alternate conformations in the ground truth. qFit models only one of them correctly. The example on the right is a single-conformation residue in ground truth but qFit models three alternate conformations. Hence these two are false positives.

  • True negative – Residue is single-conformer in qFit model with RMSD < 0.5 Å from ground truth residue. Both ground truth model and qFit model have one distinct conformation and they align well. Hence this is a true negative.

  • False negative – Residue is single-conformer in qFit model with RMSD > 0.5 Å from ground truth residue. The example on the left has two alternative conformations in the ground truth residue but only one conformation in the qFit residue. In the example on the right, the single conformer modeled by qFit does not align with the ground truth single conformer. Hence these two are false negatives.

B. Proportion of all residues in the qFit models of 7KR0 that are modeled as true positives (blue), true negatives (orange), false positives (green), and false negatives (red) as a function of resolution of input synthetic data from the 7KR0 dataset.

C. Glu114 in the 7KR0 dataset modeled by qFit (cyan and magenta) compared to the ground truth structure (green and yellow) at different synthetic resolutions. Meshes represent density at 1 σ.

D. The fraction of residues in the qFit models of the qFit test dataset with a Q-score within 0.01 of that of the ground truth model as a function of resolution. In multiconformer residues, the Q-score for every alternative conformation is calculated separately. Q-scores of residues or conformers that fall in the same occupancy range are compared. Occupancies of conformers were binned into three classes – occupancy equal to 1 (blue), 0.5 <= occupancy < 1 (orange) and occupancy < 0.5 (green).

qFit models alternative conformers in cryo-EM density maps

As single-particle cryo-EM is increasingly producing high-resolution (better than 2 Å) reconstructions where alternative conformers can be detected37, 38, we wanted to improve and test the ability of qFit to model alternative conformations guided by cryo-EM maps. While a previous version of qFit introduced support for EM maps21, we had not optimized the approach to work with EM maps and models. qFit can now be run in ‘EM mode’ which uses electron structure factors, improves the treatment of solvent background levels, and reduces the default maximum number of alternative conformations (cardinality) (Methods).

To benchmark our ability to model alternative conformations in high-resolution cryo-EM structures, we initially gathered a dataset of 22 structures with a depositor-provided resolution better than 2 Å (Fourier Shell Correlation (FSC) at 0.143). However, only 8 of these structures have a resolution better than 2 Å (FSC at 0.143) when calculated by the Electron Microscopy Data Bank (EMDB)39. Some of the original 22 structures did not have FSC curves in EMDB (n=6) due to a lack of data, and others had an EMDB calculated resolution worse than 2 Å (n=8) (Supplementary Table 2). The absence of standardized maps for determining cryo-EM structure resolution complicated our selection of structures for qFit analysis.

We downloaded the eight models with resolution better than 2 Å from the PDB and their corresponding maps from EMDB. Using the default parameters of phenix.autosharpen, we sharpened all maps and re-refined each structure (phenix.real_space_refine) against its sharpened map. qFit was run with the ‘EM’ flag and the output model was refined using the qFit real space refinement script (Methods).

Across the first asymmetric unit of the 8 models, 6.97% (n=61) of residues in the deposited models had at least two alternative conformers, compared with 39.6% (n=266) in the qFit models. To determine if qFit could recapitulate the modeling of alternative conformers from deposited structures, we compared the high-resolution apoferritin deposited model (PDB: 7A4M, resolution: 1.22 Å) with the qFit model using the same criteria outlined in the resolution dependence section above (RMSD within 0.5 Å). qFit correctly models 78% of residues in the first asymmetric unit. This includes Arg22, which has two alternative conformations in the deposited model. qFit was able to recapitulate both alternative conformations (Figure 5A), highlighting that qFit can detect manually modeled alternative conformations in cryo-EM maps. In addition, qFit detected several unmodeled alternative conformers that were visually confirmed (Figure 5B-D).

As with the X-ray models, we wanted to determine how qFit changes the model geometry. Similar to the X-ray models, we observed that qFit improves bond lengths and angles and Cβ deviations. qFit does increase (worsen) the MolProbity clashscore of most structures, primarily due to increased clashes with waters, likely because their positions are not reset in phenix.real_space_refine (Supplementary Figure 8).

While we have made significant progress in modeling alternative conformations in cryo-EM data, the lack of consistent map handling, validation, and metrics with cryo-EM structures and maps is a major impediment to further development. Even among this select group of structures, there were varying levels of experimental and computational map details on EMDB and in manuscripts37, 38, 40, including information on masking, handling of bulk solvent, and local resolution. Our approach depends on sampling and scoring based on resolution. While there is an accepted formula for calculating resolution (FSC at 0.143), the maps used to calculate it can vary, leading to differences in resolution as we observed between the deposited versus EMDB calculated resolution. Further, resolution can vary across a single model, and metrics for such local resolutions are not always widely available. Additionally, the handling of background bulk solvent values varies widely, from masking to flattening these values. The EM community will be able to benefit from improved ensemble modeling as efforts to standardize the storage of raw, meta, and processed data continue to improve.

qFit identifies alternative conformations in high-resolution cryo-EM models.

Meshes represent density at 1 σ, with blue volumes representing density at 0.5 σ. Green and yellow sticks represent deposited conformation(s). Cyan and magenta sticks represent qFit conformations. Occupancy is labeled based on each conformer.

A. qFit recapitulated the deposited alternative conformations of Arg22 (chain A) in apoferritin (PDB: 7A4M, resolution: 1.22 Å).

B. qFit identified a previously unmodeled alternative conformation of Glu14 (chain A) in apoferritin (PDB: 7A4M, resolution: 1.22 Å).

C. qFit identified a previously unmodeled alternative conformation of Lys49 (chain A) in a different structure of apoferritin (PDB: 6Z9E, resolution: 1.55 Å).

D. qFit identified a previously unmodeled alternative conformation of Gln403 (chain A) in adeno-associated virus (PDB: 7KFR, resolution: 1.56 Å).

Discussion

Structural biology plays a vital role in understanding the complex connection between protein structure and function. However, since proteins exist as ensembles, structural biology modeling approaches need to adapt accordingly. X-ray crystallography and cryo-EM data hold significant information on these ensembles that is often ignored. qFit offers a solution by leveraging powerful optimization algorithms to transform well-modeled single or static models into multiconformer models. Here we demonstrate that qFit can uncover widespread conformational heterogeneity that better represents the true underlying conformational ensemble data, as demonstrated by lower Rfree values. Further, we determine that qFit can reliably pick up on alternative conformers that were modeled manually, highlighting that qFit could be used as a tool to significantly speed up modeling of high-resolution structures.

This automation in modeling is needed especially in light of advances in data collection automation and fast detectors. These tools have revolutionized the field of X-ray crystallography, enabling high-temperature datasets, time-resolved experiments, and high-throughput data collection41, 42, 43, 44, 45. With the ability to capture different conformations, there is a growing demand for methods that can detect protein alternative conformers to extract as much biological information as possible. This is highlighted in massive ligand-soaking campaigns35, 46–48, where there are often hundreds of structures with different ligands to parse. qFit provides a key tool to help extract the most out of these structures by improving the models and providing a better jumping-off point to determine how ligand binding impacts the protein. However, our data here show that not only does qFit need a high-resolution map to be able to detect signal from noise, it also requires a very well-modeled structure as input.

While both throughput and resolution are currently lower for cryo-EM, recent high-resolution maps have observable conformational heterogeneity37, 40. Current classification approaches do not allow sorting based on signals as small as alternative side-chain conformations, necessitating approaches like qFit for modeling49–51. We see great potential in combining qFit with classification approaches to understand conformational heterogeneity at different scales. In the future, qFit can likely be applied more widely to EM maps in regions with high local resolution52. In addition, we will also incorporate modeling of nucleic acids, with an emphasis on automating refinement of alternative base positions in high-resolution ribosome structures in future work53, 54.

However, we encountered many difficulties in applying qFit to EM data relative to the more established X-ray data. In particular, there are still disparities in how maps are sharpened and how masks are used to exclude noise or lower experimental signals, such as solvent55, 56, making it very challenging to evaluate whether models, especially multiconformer or ensemble models, have improved fit to the data. We suggest strengthening guidelines for reporting computational processing, and improving validation tools to gauge agreement between models and cryo-EM maps55–57.

We envision many other future improvements that will further enhance the quality and accuracy of multiconformer models for both X-ray crystallography and cryo-EM. Simulations have demonstrated that subpar modeling of the macromolecule(s) and surrounding solvent is a major potential avenue to further reduce R-factors27, 28. To accurately account for water molecules in multiconformer models, partially occupied water molecules must be identified and labeled in connection with protein atoms. Automated detection and refinement of partial-occupancy waters should help further reduce Rfree15 and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.

Additionally, while qFit models have improved geometry in some respects relative to single-conformer models, we still have room for improvement for fixing rotamer outliers and backbone metrics (Ramachandran and Cβ deviations). The geometry improvements are likely mostly due to single-conformer models having strained conformations that fit the “mean” conformation rather than multiple partially overlapping conformations. Further gains in both accuracy and geometry quality will emerge with better sampling of backbone conformations20. Such improvements are important because splitting the backbone, where appropriate, can result in detection of biologically important alternative conformations58–63. Notably, the recently described FLEXR approach, which leverages Ringer to detect density peaks contributed by alternative conformations and Coot to model alternative side chains into density peaks, illustrates that many gains can be made with side-chain-focused modeling alone19. However, further improvements to backbone modeling, including larger-scale motions such as alternative loop conformations64 or coordinated larger-scale shifts of secondary-structural elements65, 66, will likely yield even higher quality multiconformer models.

Lastly, the experimental and computational advancements in structural biology have increased the focus on ensemble-based models10, 13, 21, 50, 51. But the current data formats for structural models (PDB, mmCIF) do not allow for more complex representation of ensembles. To appropriately capture the many aspects of ensembles, we would ideally like to have multiple nested ensembles representing both larger and local conformational changes, or to be able to show how two different backbone conformations can each be “parents” to different side-chain conformations. Currently, neither the PDB nor CIF format allows for this type of representation67–69.

In summary, qFit drastically reduces the time and effort required to create multiconformer models from X-ray and cryo-EM data, thereby lowering the barrier to generating new hypotheses about the relationship between conformational ensembles and biological function4, 5, 70, 71. Additionally, qFit can provide key data to bridge to the next frontier of structure prediction. While AlphaFold has achieved stunning success in predicting protein structure by training against single-conformation models72, future improvements to structure prediction might be gained by more accurately modeling the extent of conformational heterogeneity72.

Methods

Generating the qFit test set

To test the impact of algorithmic changes in qFit, we created a dataset of 144 high-resolution (1.2-1.5 Å) X-ray crystallography structures deposited in the PDB (Supplementary Table 1). These were single-chain protein structures (in the asymmetric unit and at the level of biological assembly) and contained no ligands or mutations. The maximum sequence identity between any two structures was set as 30%. Based on CATH classification31, the resultant entries represented 24 space groups and 72 folds (Supplementary Table 1). All these structures were re-refined as described in “Initial refinement protocol”. These re-refined models are referred to as deposited models. To create multiconformer models, we input the re-refined structures in qFit protein, followed by the post qFit refinement protocol. These multiconformer models are referred to as qFit models.

Initial refinement protocol

All structures from the PDB were re-refined using phenix.refine with the following parameters.

The re-refined models were used as the input for subsequent qFit models.

Running qFit

For this analysis qFit was run using the following command from qFit version 2023.1.

X-ray:

qfit_protein composite_omit_map.mtz -l 2FOFCWT,PH2FOFCWT rerefine_pdb.pdb

Cryo-EM:

qfit_protein sharpened_map.ccp4 rerefine_cryo-EM.pdb -r <resolution> -em -n 10 -s 5

qFit Improved Features

Bayesian information criterion (BIC)

BIC was implemented in the final selection of residue and segment conformations. BIC is defined as the real space residual correlation coefficient penalized by the number of parameters (k):

In qFit residue, k is defined as:

In qFit segment, k is defined as:

BIC is calculated for each candidate cardinality (1-5). We then choose the set of conformations with the lowest BIC as the final conformations for the residue or segment under consideration.
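The selection step can be illustrated with a minimal sketch. The qFit expressions for BIC and k are not reproduced in this text, so the code below substitutes the textbook least-squares form, BIC = n ln(RSS/n) + k ln(n), purely as an assumed stand-in; the function names, residuals, and parameter counts are hypothetical.

```python
import math

def bic(rss, n_points, k):
    """Textbook least-squares BIC (assumed stand-in, not qFit's exact formula)."""
    return n_points * math.log(rss / n_points) + k * math.log(n_points)

def select_cardinality(candidates, n_points):
    """candidates maps cardinality -> (residual sum of squares, parameter count k).

    Returns the cardinality with the lowest BIC and all scores.
    """
    scores = {c: bic(rss, n_points, k) for c, (rss, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical residuals: a second conformer helps a lot, a third barely at all.
candidates = {1: (50.0, 40), 2: (20.0, 80), 3: (19.5, 120)}
best, scores = select_cardinality(candidates, n_points=500)
```

In this toy case the third conformer lowers the residual only marginally, so the parameter penalty outweighs the gain and two conformers are selected.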

B-factor sampling

To sample B-factors along with atomic coordinates at each step of qFit residue, we first perform one round of quadratic programming to reduce the number of conformations. For all remaining conformations, the input B-factor of each atom in the residue is multiplied by 0.5-1.5 in increments of 0.2. All conformations with sampled B-factors and coordinates are inputs for mixed integer quadratic programming.
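As a small illustration of the sampling grid described above, the following sketch enumerates the six multipliers (0.5 to 1.5 in steps of 0.2) and applies each to the input B-factors of a residue. The function names are hypothetical and this is not the qFit implementation itself.

```python
def bfactor_multipliers(start=0.5, stop=1.5, step=0.2):
    """Multipliers from start to stop (inclusive) in increments of step."""
    n = int(round((stop - start) / step)) + 1
    # Rounding removes floating-point drift such as 1.1000000000000001.
    return [round(start + i * step, 10) for i in range(n)]

def sample_bfactors(atom_bfactors):
    """Return one scaled copy of the per-atom B-factors for each multiplier."""
    return [[b * m for b in atom_bfactors] for m in bfactor_multipliers()]

# Each retained conformation would be paired with each of these B-factor sets.
sampled = sample_bfactors([10.0, 20.0])
```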

Iterative optimization algorithm with non-convex solutions

Due to our exhaustive sampling, there are times when the MIQP optimization algorithm fails to find a non-convex solution. To address this limitation, we have implemented a procedure that iteratively removes solutions one-by-one based on the two solutions with the closest root-mean-square deviation (RMSD) until MIQP identifies a solution.
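A minimal sketch of this fallback loop is given below. The `solver` callable is a hypothetical stand-in for the MIQP step (assumed to return None on failure), and which member of the closest pair is discarded is not specified in the text; here the second one is dropped arbitrarily.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """RMSD between two conformers given as equal-length lists of (x, y, z)."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def solve_with_pruning(conformers, solver):
    """Retry `solver`, removing one conformer of the closest-RMSD pair per failure."""
    confs = list(conformers)
    while confs:
        solution = solver(confs)
        if solution is not None or len(confs) == 1:
            return solution, confs
        # Find the two most similar conformers and drop one of them (assumption:
        # the second of the pair).
        i, j = min(combinations(range(len(confs)), 2),
                   key=lambda ij: rmsd(confs[ij[0]], confs[ij[1]]))
        del confs[j]
    return None, confs

# Toy example: one-atom conformers and a solver that only copes with two inputs.
a, b, c = [(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(5.0, 0.0, 0.0)]
solution, kept = solve_with_pruning([a, b, c],
                                    lambda cs: "ok" if len(cs) <= 2 else None)
```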

Parallelization of large maps

Often, cryo-EM maps are very large and reach memory limits using Python multiprocessing. Multiprocessing is used to model multiple residues independently in parallel. We have now implemented a new scheme to divide the density map into portions centered around each residue of interest and feed those portions of the map into our parallelization.

Occupancy constraints

To help refine segments (i.e. sets of residues with alternative conformations flanked by residues with only a single conformation) during X-ray refinement, we now output a restraint file at the end of the qFit protein run. This restraint file enables “group occupancy refinement” for residues in a segment with the same alternative conformation. In group occupancy refinement, all residues within the group are refined to the same occupancy, reducing the number of free parameters to fit.

Finalizing qFit models with iterative refinement

We iteratively run 5 macrocycles of refinement followed by a script that removes any conformations with occupancy less than 0.1. This script also renormalizes the occupancies of any remaining conformations in that segment, ensuring that the occupancy sums to 1. This procedure ends when no conformations have a refined occupancy of less than 0.1 or after 50 total rounds of refinement (whichever comes first). After, we do one final refinement where we release the occupancy constraints on the segments, turn on automated solvent picking, and optimize B-factors (specified as ADP parameters in Phenix) and coordinate weights.
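The cull-and-renormalize loop can be sketched as follows. The `refine` callable is a hypothetical placeholder for the five refinement macrocycles (the actual procedure calls Phenix), and occupancies are represented as a simple dictionary for one segment.

```python
def cull_and_renormalize(occupancies, cutoff=0.1):
    """Drop conformers below the cutoff and rescale the rest to sum to 1."""
    kept = {alt: occ for alt, occ in occupancies.items() if occ >= cutoff}
    total = sum(kept.values())
    return {alt: occ / total for alt, occ in kept.items()} if total else {}

def iterate(occupancies, refine, cutoff=0.1, max_rounds=50):
    """Alternate refinement and culling until nothing falls below the cutoff."""
    for round_number in range(1, max_rounds + 1):
        occupancies = refine(occupancies)  # hypothetical stand-in for refinement
        if all(occ >= cutoff for occ in occupancies.values()):
            break
        occupancies = cull_and_renormalize(occupancies, cutoff)
    return occupancies, round_number

# Toy run in which "refinement" leaves occupancies unchanged.
final, rounds = iterate({"A": 0.6, "B": 0.35, "C": 0.05}, refine=lambda o: o)
```

In this toy run, conformer C is removed after the first round and the loop stops in the second round, once all remaining occupancies are at least 0.1.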

Cryo-EM

To improve the detection of alternative conformations in cryo-EM structures, we made some key updates to part of the qFit algorithm. All of these updates to the algorithm will turn on with the -em flag. First, we now use electron scattering factors when calculating the modeled electron density. Second, we have removed bulk solvent electron density values (set at 0.3 in X-ray qFit protein). We also restricted the maximum cardinality to 3 (compared to 5 in X-ray qFit protein) to reduce misplaced conformations.

Q-score

We implemented the option for users to use Q-scores to determine if qFit should be run on a residue or not. This option is off by default. To use this option, generate Q-scores with mapq.py, available as part of the Q-score command-line interface (https://github.com/gregdp/mapq). qFit takes in a text file of Q-scores through the --qscore option of qfit_protein. By default, all residues with a Q-score of less than 0.7 are not modeled as multiconformers, but are considered in qFit segment. Users can also adjust this level by using the --qscore_cutoff option of qfit_protein.

qFit-segment-only runs

qFit can be used as a tool along with iterative model building and refinement. If a user manually removes or adds additional conformations using Coot14 or similar software, this can disrupt the occupancy sum of the residue and the connectivity of the backbone. To alleviate such problems, we developed an option (qfit_protein --only-segment) to facilitate manual model adjustment after running qFit. This procedure generates connected backbones with consistent occupancies for coupled neighboring conformers.

For example, suppose residue N has four alternative backbone conformations (A, B, C, D) and residue N+1 has two alternative conformations (A, B). In that case, this procedure will create C and D conformers for residue N+1 by duplicating its A and B conformers. This duplication continues until we reach the end of a segment so that all backbones have the same number of alternative conformations (A, B, C, D) and are, therefore, properly connected. Subsequent crystallographic refinement of this model (see “Post-qFit refinement script” above) will cause the duplicated conformations to diverge slightly, and will behave as expected without introducing geometry errors.
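The duplication in this example can be sketched with a toy data structure in which each residue is a dictionary from altloc label to conformer. This is only an illustration of the labeling logic under that assumed representation; it does not handle occupancies or coordinates, and the function name is hypothetical.

```python
from itertools import cycle, islice

def equalize_altlocs(segment):
    """Give every residue in a segment the same set of altloc labels.

    segment: list of {altloc: conformer} dictionaries, one per residue.
    Missing labels are filled by cycling through the residue's own conformers.
    """
    labels = sorted({alt for residue in segment for alt in residue})
    equalized = []
    for residue in segment:
        existing = [residue[alt] for alt in sorted(residue)]
        filled = dict(zip(labels, islice(cycle(existing), len(labels))))
        filled.update(residue)  # original conformers keep their own labels
        equalized.append(filled)
    return equalized

# Residue N has four backbone conformers, residue N+1 only two.
segment = [{"A": "N_a", "B": "N_b", "C": "N_c", "D": "N_d"},
           {"A": "N1_a", "B": "N1_b"}]
result = equalize_altlocs(segment)
```

Here residue N+1 gains C and D labels that are copies of its A and B conformers, matching the example above.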

Metrics

Scripts for all metrics can be found in the scripts folder in the qFit GitHub repository (https://github.com/ExcitedStates/qfit-3.0). Our scripts for running qFit protein on an SGE-based server and all scripts for figures can be found here: https://github.com/fraser-lab/qFit_biological_testset/tree/main.

R values

R-values were obtained after the final round of refinement for the re-refined deposited models (deposited_rerefine.sh) and for the qFit models after the iterative refinement script (qfit_final_xray_refine.sh).

RMSF

RMSF was measured for each residue based on all side-chain heavy atoms. First, the center of all conformations is identified. We then calculate the square root of the mean of the squared distances between the center of each individual conformation and the center of all conformations. Each distance is weighted by occupancy (RMSF.py).
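One plausible reading of this calculation is sketched below; it is not the RMSF.py script itself. It assumes the overall center is the unweighted mean of the per-conformer centers and that squared distances are averaged with occupancy weights.

```python
import math

def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def residue_rmsf(conformers):
    """conformers: list of (occupancy, [(x, y, z), ...]) for side-chain heavy atoms."""
    centers = [centroid(atoms) for _, atoms in conformers]
    overall = centroid(centers)  # assumption: unweighted mean of conformer centers
    total_occ = sum(occ for occ, _ in conformers)
    weighted = sum(occ * math.dist(center, overall) ** 2
                   for (occ, _), center in zip(conformers, centers))
    return math.sqrt(weighted / total_occ)

# Two equally occupied conformers whose centers are 2 Å apart.
example = residue_rmsf([(0.5, [(0.0, 0.0, 0.0)]), (0.5, [(2.0, 0.0, 0.0)])])
```

Under these assumptions a single-conformer residue has an RMSF of zero, and the two-conformer example evaluates to 1 Å.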

B-Factors

For each residue, we calculated an occupancy-weighted B-factor (each heavy-atom B-factor is weighted by its occupancy). We obtain the average heavy-atom B-factor for every conformation and multiply it by the occupancy.
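A short sketch of this weighting follows. Summing the occupancy-scaled averages over conformers to obtain one value per residue is an assumption, and the function name is hypothetical.

```python
def occupancy_weighted_bfactor(conformers):
    """conformers: list of (occupancy, [heavy-atom B-factors]) for one residue."""
    return sum(occ * (sum(bfactors) / len(bfactors))
               for occ, bfactors in conformers)

# Conformer A (occupancy 0.6) averages 15 Å²; conformer B (0.4) averages 30 Å².
value = occupancy_weighted_bfactor([(0.6, [10.0, 20.0]), (0.4, [30.0, 30.0])])
```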

Rotamers

The rotamer name for each alternative conformation was determined by phenix.rotalyze34 while manually relaxing the outlier criteria to 0.1%. Each alternative conformation is given its rotamer. Rotamers were compared on a residue-by-residue basis. To compare rotamers, we only consider the first two χ dihedral angles. Each residue was classified into four categories: same, additional rotamer in qFit model, additional rotamer in the deposited model, or different.

Generating synthetic data for resolution dependence

To generate artificial electron density data at increasingly poorer resolutions, we first increased the B-factors of all atoms of the ground truth model by 1 Å² for every 0.1 Å reduction in resolution and placed the models in a P1 box. We randomly shook the coordinates using the shake argument in phenix.pdbtools, with the root-mean-square error of shaking given as 0.2 * desired resolution of synthetic data. We generated structure factors (Fshake) for each of these shaken models from 0.8 Å to 3.0 Å in increments of 0.1 Å using the phenix.fmodel command-line function (with bulk solvent parameters k_sol=0.4, b_sol=45, and 5% R-free flags). We then added noise to the structure factors as follows:

Fnoisy = Fshake + (sqrt(Fshake) * random number from normal distribution * resolution of model * 0.5).

The scaling factors of 0.2 and 0.5 for shake RMSD and noise addition were determined by testing different values and selecting those that gave the most reasonable R-factors over the resolution range after refining the model against the generated structure factors. The addition of noise to Fshake was done using the sftools command in CCP4. Then, the ground truth model with adjusted B-factors was stripped of alternative conformations (if any) at every residue position. The resulting single-conformer model was refined with the Fnoisy structure factors (Supplementary Figure 5A).
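The noise formula and the resolution grid can be illustrated in a few lines of Python. This is only a sketch of the arithmetic; the actual noise addition was performed with sftools, and the function and variable names here are hypothetical.

```python
import math
import random

# Synthetic resolutions from 0.8 to 3.0 Å in steps of 0.1 Å.
resolutions = [round(0.8 + 0.1 * i, 1) for i in range(23)]

def shake_rmsd(resolution, scale=0.2):
    """Coordinate shake RMSD used for a given synthetic resolution."""
    return scale * resolution

def add_noise(f_shake, resolution, scale=0.5, rng=None):
    """Fnoisy = Fshake + sqrt(Fshake) * N(0, 1) * resolution * scale."""
    rng = rng or random.Random()
    return [f + math.sqrt(f) * rng.gauss(0.0, 1.0) * resolution * scale
            for f in f_shake]

# Hypothetical amplitudes at 2.0 Å; a seeded generator makes the run repeatable.
f_noisy = add_noise([100.0, 25.0, 4.0], resolution=2.0, rng=random.Random(0))
```

Because the noise term scales with resolution, the same amplitudes receive proportionally larger perturbations in the lower-resolution datasets.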

The final refined model was given as input to qFit and the composite omit map was obtained for the Fnoisy structure factors. The multiconformer model given by qFit was refined with phenix.refine as explained in the post-qFit refinement script section. Since there is some randomness involved in simulating noise in the synthetic datasets, we repeated the synthetic data generation ten times at each resolution and all the steps following that including qFit modeling and post-qFit refinement for each of these noisy synthetic maps. The same steps of data synthesis were followed for the larger qFit test dataset containing 110 models, except that one set of structure factors was generated for each model at each resolution instead of ten as in the 7KR0 dataset.

True/False Positive/Negative matrix for synthetic data

True positive residues were those with at least two alternative conformations and an RMSD of less than 0.5 Å between the ground truth and qFit model conformations (for example, qFit model altloc A has an RMSD of less than 0.5 Å to ground truth model altloc A or B, and qFit model altloc B has an RMSD of less than 0.5 Å to the other ground truth model altloc A or B) (Figure 4A). A false positive residue has at least two alternative conformations in the qFit model, but fewer conformations in the ground truth model (Figure 4A). Alternatively, for a false positive residue, if the ground truth model residue is also multiconformer, then the RMSD between at least one of the conformations of qFit residue and ground truth residue is more than 0.5 Å (Figure 4A). A true negative residue is when both the ground truth and qFit model have a single conformer and they have an RMSD of less than 0.5 Å (Figure 4A). A false negative residue is when the qFit model has a single conformer but the ground truth model has more than one alternative conformer or both models have a single conformer but they have an RMSD greater than 0.5 Å (Figure 4A).
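The categories above can be expressed as a small decision function. This sketch assumes that nearly overlapping conformers have already been merged (so list length reflects multiconformer status), takes the RMSD function as an argument, and treats any unmatched conformer in a multiconformer qFit residue as a false positive; these are simplifying assumptions rather than the exact analysis script.

```python
def classify_residue(truth_confs, qfit_confs, rmsd, cutoff=0.5):
    """Return 'TP', 'FP', 'TN' or 'FN' for one residue.

    truth_confs, qfit_confs: lists of conformers; rmsd: callable on two conformers.
    """
    def all_matched(first, second):
        return all(any(rmsd(a, b) < cutoff for b in second) for a in first)

    truth_multi = len(truth_confs) > 1
    if len(qfit_confs) > 1:  # positive: qFit models a multiconformer
        agrees = (truth_multi
                  and all_matched(qfit_confs, truth_confs)
                  and all_matched(truth_confs, qfit_confs))
        return "TP" if agrees else "FP"
    # negative: qFit models a single conformer
    agrees = (not truth_multi) and rmsd(truth_confs[0], qfit_confs[0]) < cutoff
    return "TN" if agrees else "FN"

# Toy one-dimensional "conformers" with absolute difference as the distance.
distance = lambda a, b: abs(a - b)
labels = [
    classify_residue([0.0, 2.0], [0.1, 2.1], distance),  # both conformers recovered
    classify_residue([0.0], [0.1, 2.0], distance),       # extra qFit conformer
    classify_residue([0.0], [0.2], distance),            # matching single conformers
    classify_residue([0.0, 2.0], [0.1], distance),       # missed alternative conformer
]
```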

Acknowledgements

This work was supported by National Institutes of Health (NIH) grant GM145238 and a Chan Zuckerberg Initiative Open Software grant to JSF, and by NIH R35 GM133769 to DAK. We thank Christopher Williams and Vincent Chen for help with interpretations of MolProbity score ideal side-chain geometry.

Supplementary Figures

Flow Diagram of the selection of the test set PDBs.

Rfree and R-gap distributions.

A. Distribution of the difference in Rfree between deposited and qFit models. The median difference in Rfree is 0.6%. Median Rfree of deposited models: 18.1%; median Rfree of qFit models: 17.5%.

B. Distribution of R-gap values between deposited and qFit models (median deposited model: 3.0%, median qFit model: 3.0%).

C. Distribution of Rfree values in PDB deposited models versus re-refined deposited models. In the manuscript, deposited models refer to the re-refined deposited models.

Examples of rotamer state categories.

Meshes represent density at 1 σ. Green and yellow sticks represent deposited conformer(s). Blue and magenta sticks represent qFit conformers.

A. Same: The entire set of rotamers identified in the deposited and qFit models are the same (PDB: 1BN6, His199).

B. Additional rotamer(s) in the qFit model: Deposited and qFit models share at least one rotamer, and at least one additional rotamer was identified in the qFit model (PDB: 3CX2, Glu165).

C. Additional rotamer(s) in the deposited model: Deposited and qFit models share at least one rotamer, and at least one additional rotamer was identified in the deposited model (PDB: 4P48, Ser6).

D. Consistent and Different: Deposited and qFit models share at least one rotamer, and at least one unique additional rotamer was identified in both the deposited model and the qFit model (PDB: 3HP4, Arg81).

E. Different: The rotamers in the deposited and qFit models are all different (PDB: 1BN6, Glu110).

Deposited versus qFit model geometry.

A. Number of Cβ deviations (>0.25 Å) per model: deposited model: 0.0 median [interquartile range: 0.0-0.0], qFit model: 0.0 median [interquartile range: 0.0-0.0], p-value=0.32, two-sided t-test.

B. Number of rotamer outliers per model: deposited model: 0.94 [0.00-2.12], qFit model: 1.79 [0.98-2.81], p-value=0.00074, two-sided t-test.

C. Percent of Ramachandran favored per model: deposited model: 97.70 [96.90-98.93], qFit model: 97.74 [96.81-98.68], p-value=0.15, two-sided t-test.

D. Percent of Ramachandran outliers per model: deposited model: 0.0 [0.0-0.0], qFit model: 0.0 [0.00-0.25], p-value=0.14, two-sided t-test.

A. Protocol for generating synthetic structure factors at various resolutions starting from the ground truth model. For the 7KR0 dataset, all the steps starting from random shaking of coordinates were done 10 times for each resolution. For the larger test dataset, all steps were only done once.

B. A visualization of synthetic maps generated for the models at varying resolution. The loss in detail of density is clearly visible with worsening resolution.

C. Proportion of all residues in qFit models which have been modeled as multiconformers in the 7KR0 dataset, as a function of resolution. The shaded region around the line indicates the spread across 10 runs at every resolution step.

D. Proportion of all residues in the qFit models of the qFit test dataset that are classified as true positives (blue), true negatives (orange), false positives (green) and false negatives (red), as a function of the resolution of the input data. The shaded region around the lines indicates the spread across the qFit test dataset, which consists of 110 proteins.

A. The distribution of RMSD between qFit residues and corresponding ground truth residues (qFit test set) whenever the RMSD is higher than the 0.5 Å cutoff resulting in the qFit residues being classified as false positives.

B. The propensity of each residue type to be modeled with high RMSD from the ground truth (qFit test set), resulting in being classified as false positive. This propensity of a residue type x is calculated as the ratio between (i) proportion of residue type x among all the residues with a high RMSD and (ii) proportion of residue type x in the entire dataset.

C. The distribution of RMSD between qFit residues and corresponding ground truth residues (qFit test set) whenever the RMSD is higher than the 0.5 Å cutoff, resulting in the qFit residues being classified as false negatives.

D. The propensity of each residue type to be modeled with high RMSD from the ground truth (qFit test set), resulting in being classified as false negative.
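The propensity ratio defined in panel B can be illustrated with a short Python sketch. The function name and the input format (flat lists of three-letter residue names) are assumptions made for this example and are not taken from the analysis code used in this work.

```python
from collections import Counter


def residue_propensity(high_rmsd_residues, all_residues):
    """Propensity of each residue type to appear among high-RMSD residues.

    Computed as (proportion of type x among high-RMSD residues) divided by
    (proportion of type x in the entire dataset). Hypothetical helper.
    """
    if not high_rmsd_residues or not all_residues:
        raise ValueError("both residue lists must be non-empty")
    high_counts = Counter(high_rmsd_residues)
    all_counts = Counter(all_residues)
    n_high = len(high_rmsd_residues)
    n_all = len(all_residues)
    return {
        res: (high_counts[res] / n_high) / (all_counts[res] / n_all)
        for res in all_counts
    }
```

A value above 1 means the residue type is over-represented among the high-RMSD residues relative to its overall frequency, and a value below 1 means it is under-represented.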

A. Rwork (blue) and Rfree (orange) distribution of the input model from the qFit test dataset. These correspond to the models obtained after refining against Fnoisy structure factors (see Supplementary Figure 5A). The shaded region around the lines indicates the spread across the qFit test dataset.

B. Fraction of correctly modeled qFit residues (true positives + true negatives) as a function of input model Rfree for all structures in the qFit test dataset at 1.6 Å resolution (input Rfree range: 0.17 to 0.25, n=110).

C. The fraction of residues in the qFit models of the 7KR0 dataset with a Q-score within 0.01 of that of the ground truth model, as a function of resolution. In multiconformer residues, the Q-score of every alternative conformer is calculated separately. Q-scores are compared between residues or conformations with matching occupancy ranges. Occupancies of conformations were binned into three classes: occupancy equal to 1 (blue), 0.5 <= occupancy < 1 (orange) and occupancy < 0.5 (green).

MolProbity of deposited versus qFit models.

A. MolProbity score: deposited model: 1.49 (median) [1.40-1.61] (interquartile range), qFit model: 1.59 (median) [1.39-1.92] (interquartile range)

B. Model average of RMSD of model bond length from idealized bond length (Å): deposited model: 0.00 [0.00-0.01], qFit model: 0.00 [0.00-0.00]

C. Model average of RMSD of model bond angle from idealized bond angle (°): deposited model: 0.00 [0.00-0.11], qFit model: 0.00 [0.00-0.01]

D. Structure Clashscore: deposited model: 3.15 [2.74-4.39], qFit model: 8.45 [3.22-10.17]

E. Number of Cβ deviations (>0.25 Å) per model: deposited model: 0.02 [0.00-0.02], qFit model: 0.00 [0.00-0.00]

F. Number of rotamer outliers per model: deposited model: 2.0 [2.0-2.0], qFit model: 2.0 [1.0-3.0]

G. Percent of Ramachandran favored per model: deposited model: 97.6 [96.9-98.9], qFit model: 98.3 [96.7-98.7]

H. Percent of Ramachandran outliers per model: deposited model: 0.0 [0.0-0.0], qFit model: 0.0 [0.0-0.0]