1. Structural Biology and Molecular Biophysics
Download icon

Automated cryo-EM structure refinement using correlation-driven molecular dynamics

  1. Maxim Igaev  Is a corresponding author
  2. Carsten Kutzner
  3. Lars V Bock
  4. Andrea C Vaiana  Is a corresponding author
  5. Helmut Grubmüller  Is a corresponding author
  1. Max Planck Institute for Biophysical Chemistry, Germany
Tools and Resources
  • Cited 1
  • Views 2,457
  • Annotations
Cite this article as: eLife 2019;8:e43542 doi: 10.7554/eLife.43542

Abstract

We present a correlation-driven molecular dynamics (CDMD) method for automated refinement of atomistic models into cryo-electron microscopy (cryo-EM) maps at resolutions ranging from near-atomic to subnanometer. It utilizes a chemically accurate force field and thermodynamic sampling to improve the real-space correlation between the modeled structure and the cryo-EM map. Our framework employs a gradual increase in resolution and map-model agreement as well as simulated annealing, and allows fully automated refinement without manual intervention or any additional rotamer- and backbone-specific restraints. Using multiple challenging systems covering a wide range of map resolutions, system sizes, starting model geometries and distances from the target state, we assess the quality of generated models in terms of both model accuracy and potential of overfitting. To provide an objective comparison, we apply several well-established methods across all examples and demonstrate that CDMD performs best in most cases.

https://doi.org/10.7554/eLife.43542.001

Introduction

State-of-the-art cryo-electron microscopy (cryo-EM) allows biomolecules to be resolved in different functional states and at near-atomic resolution previously achieved only by X-ray crystallography (Zhang and Nogales, 2015). A new generation of electron detectors and the development of sample motion correction algorithms have been the major contributors to this rapid progress (Bai et al., 2015). Cryo-EM maps determined to resolutions better than 4 Å have become standard with several examples claimed to break the 2Å-resolution barrier (Frank, 2017). Yet, benefiting from the recent advances in image classification (Frank and Ourmazd, 2016; Nakane et al., 2018), cryo-EM now provides ensembles of low- to high-resolution reconstructions describing differently populated conformational states of vitrified molecular complexes. However, methods for deriving accurate atomistic models from cryo-EM maps lag behind this resolution revolution (Saibil, 2017). The increasing amount of molecular detail requires the development of new methodologies and software to accurately and timely interpret experimental densities.

For de novo building of polypeptide/nucleotide chains many methods are available (Emsley and Cowtan, 2004; Langer et al., 2008; Terwilliger et al., 2008; Adams et al., 2010; Burnley et al., 2017), including tools for subsequent refinement of hand-built or homology models into densities (Chen et al., 2003; Fabiola and Chapman, 2005; Schröder et al., 2007; Topf et al., 2008; Brown et al., 2015; Lopéz-Blanco and Chacón, 2013; Wu et al., 2013; DiMaio et al., 2015; Wang et al., 2016a; Kovalevskiy et al., 2018). In these methods, stereochemical properties and local electrostatics are not fully enforced, nor is a proper description of large-scale concerted motions guaranteed. When applied to maps with medium (local) resolutions or lower, they tend to produce errors such as atomic clashes, improper bond lengths and angles, and poor agreement between the density and the corresponding model (Hooft et al., 1996; Neumann et al., 2018). As a result, multiple rotamer, backbone and secondary structure restraints, manual intervention on the system during refinement, and subsequent re-refinement are often required to preserve stereochemical plausibility and avoid overfitting.

As an alternative, several molecular dynamics (MD)-based refinement methods have been developed (Brünger et al., 1987; Brunger and Adams, 2002; Trabuco et al., 2008; Orzechowski and Tama, 2008; Fenn et al., 2011; McGreevy et al., 2016; Miyashita et al., 2017; Wang et al., 2018). A more comprehensive overview of how MD simulations can be used to assist structure refinement is given elsewhere (Kirmizialtin et al., 2015). Here, the refinement rests on a chemically accurate force field and thermodynamic sampling of conformations in addition to guiding the model into the density, as opposed to minimizing a simple target function. These MD-based methods have markedly higher computational requirements which are usually met by today’s efficient parallel and GPU implementations. High-resolution maps, however, pose a challenge to MD-based refinement (McGreevy et al., 2016). Frequently, parts of the system get trapped due to inefficient conformational sampling in rugged density regions commonly present in such maps. Despite recent improvements (McGreevy et al., 2016; Miyashita et al., 2017), the obtained models still strongly depend on local map resolution and the choice of a particular refinement scheme or force field. Also, systematic, force-field driven deviations in model geometry have been reported (Wang et al., 2018).

It is therefore fair to say that no available cryo-EM refinement method simultaneously meets the three challenges posed by the current resolution revolution: resolution-independent density fitting, stereochemical accuracy, and automation. Here, we have developed and implemented such an automated yet accurate method for structure refinement and validation, denoted correlation-driven molecular dynamics (CDMD), by combining a previously published methodology for low-resolution maps (Orzechowski and Tama, 2008) with continuously adaptive resolution and simulated annealing (Brünger et al., 1987; Brunger and Adams, 2002). In all the presented application examples, we challenge CDMD to refine distant starting models of diverse quality against cryo-EM densities at various resolutions (2.6–7 Å). Performance is assessed using commonly defined measures such as the radius of convergence, model geometry and overfitting. A full analysis of the results and a discussion of the practical conclusions in the general context of structure refinement are presented.

Results and discussion

Basic concept of CDMD

Figure 1 summarizes the early approach proposed by Orzechowski and Tama (2008) for refinement of crystal structures into low-resolution electron densities. First, a simulated map ρsim(𝐫) at a given resolution is calculated from the atomistic model and then compared with the experimental density ρexp(𝐫) in real space. The agreement between ρsim(𝐫) and ρexp(𝐫) is measured by the correlation coefficient, c.c., which is included into a global biasing potential Vfit. This potential is added to the force field Vff for subsequent refinement MD simulations. In this form, Vfit takes the full electron density information into account, including amplitudes and phases as well as negative densities which, by convention, reflect hydrophobic areas in the map.

Approach: using correlation-based refinement in MD simulation to steer the atomic positions of a macromolecule such that they optimally fit a cryo-EM map.

The molecule is subjected to a global biasing potential Vfit in addition to the MD force field Vff. The forces resulting from Vfit act on every atom to enhance the real-space correlation coefficient c.c. between the cryo-EM density (green) and the density calculated from the current atomic positions (blue). The first step (1) is to generate a simulated density by convoluting the atomic positions with a three-dimensional Gaussian function of width σ (Orzechowski and Tama, 2008). The two maps are correlated (2), and the biasing forces are calculated. These forces are then added to the standard MD force field (3), and new atomic positions are evaluated (4). Steps (1–4) are repeated, yielding a structure that correlates better with the cryo-EM map than the starting structure.

https://doi.org/10.7554/eLife.43542.002

To overcome sampling issues in rugged density regions, which are particularly severe given high resolutions and fine grids offered by cryo-EM today, we have extended the method to additionally allow for adaptive resolution and simulated annealing (Figure 2; see also Figure 2—figure supplement 1 and Materials and methods for a detailed description of the designed protocol). The complete framework has been embedded into the GPU-accelerated GROMACS MD suite (Abraham et al., 2015). The refinement process is controlled by only three parameters which are varied during the MD run. First, to avoid local minima, the resolution of ρsim(𝐫) is gradually increased from very low to the maximum available from the cryo-EM data. This increase allows the structure to adapt globally and only then locally to the experimental density during refinement. In contrast to previous approaches (Singharoy et al., 2016; Wang et al., 2018), only the simulated density is continuously resampled so that the full experimental density is used throughout the refinement. Although the correlation coefficient (Figure 1) is invariant with respect to whether ρexp(𝐫) or ρsim(𝐫) is blurred, changing the resolution of ρsim(𝐫) only is computationally much cheaper for the biasing potential of choice, which facilitates automation. Second, the relative weight of Vfit over Vff is gradually increased via a force constant k. It has been shown previously that gradually increasing k improves the refinement outcome for low-resolution maps (Miyashita et al., 2017). For high-resolution maps, higher force constants at late refinement stages are required to ensure an accurate density fit at the side chain level which could otherwise be distorted by thermal fluctuations. Third, high-temperature simulated annealing (Brünger et al., 1987; Brunger and Adams, 2002) is used to enhance local map-model agreement (e.g. side chain rotamers), while keeping well-refined parts of the model unchanged.

Figure 2 with 1 supplement see all
Schematic representation of the proposed continuous refinement protocol: (1) a low temperature optimization phase, where Vfit is monotonously increased by increasing the force constant k (columns a–d), followed by (2) simulated annealing (columns e, f).

The local effect of the protocol is exemplified in the upper row for a one-dimensional single-atom case. Simulated densities shown in the middle row were generated using the atomic structure of a tubulin dimer (PDB ID: 3JAT; Zhang and Nogales, 2015).

https://doi.org/10.7554/eLife.43542.003

Refinement

In this section, we describe the results for eight test cases at various resolutions (2.6–7.1 Å) covering a wide range of initial structural errors and deviations from the target state. The refined models are validated by comparing them with previously published structures. We were additionally encouraged by the reviewers to use the following alternative approaches for the high- and medium-resolution cases to compare our results to: Phenix real space refinement (Adams et al., 2010; Afonine et al., 2018), Rosetta (DiMaio et al., 2015; Wang et al., 2016a) and Refmac (Brown et al., 2015; Kovalevskiy et al., 2018). All methods are applied straightforwardly to the same far-away starting models (without manual corrections) and using protocols recommended by the developing research groups and/or described in recent literature. While we are aware that most of these methods have not been originally designed or further optimized to refine distant starting models and/or DNA/RNA, such a comparison further supports the results of this work. In the last example, we specifically compare our model with that generated by MDFF, a method originally designed for flexible fitting of distant high-resolution structures into low-resolution densities (Trabuco et al., 2008; Trabuco et al., 2009). Starting model preparation, refinement simulation setups, and validation protocols are described in Materials and methods.

Throughout the rest of the work, the following important notions will be used. (1) The deposited reference (or simply reference) is the atomistic model that has been generated using and deposited together with a particular cryo-EM data set in the Electron Microscopy Data Bank (EMDB). Importantly, for all cases, we use starting structures that are much farther away from the target state defined by the map than those originally used to produce the deposited references. (2) The control structure(s) (or simply control(s)) is a higher quality atomistic model of the same biomolecule captured in the same conformational state as the reference that has been derived from a higher resolution data set (usually X-ray). (3) The true structure is a hypothetical and practically unreachable structure that one would obtain from the ‘perfect’ data set. We generally assume that a control structure is closer to the true structure than ones derived from lower resolution cryo-EM data sets. (4) Any refinement with many parameters (here, atomic coordinates) may be prone to overfitting, that is refinement of the atomic coordinates into noise or map regions already occupied by the refined structure, if the strength of the biasing term is not selected appropriately. It is therefore important to refine the model against only a subset of data (training map) and to cross-validate it with another independent subset (validation map), making sure that the refinement does not lead to an overinterpretation of the training map at the cost of losing agreement with the validation map or stereochemical quality (see also Figure 2—figure supplement 1).

Aldolase: good starting model against a high-resolution map

We first evaluated how well CDMD refines good starting models into high-resolution maps. Specifically, we asked if, in the absence of any other source of atomistic knowledge than the MD force field, the chosen biasing potential (Figure 1) per se introduces steric clashes and disrupts backbone and side chain geometries. The absence of such artifacts is an important requirement for the CDMD application at medium to low resolutions for which side chains and the secondary structure may not be sufficiently described by the density alone. To this end, our method was applied to fit the X-ray structure of a rabbit muscle aldolase (PDB ID: 6ALD; Choi et al., 1999) to the recently deposited data set (EMD ID: 8743, PDB ID: 5VY5; Herzik et al., 2017). The starting structure was subjected to MD simulation at T = 300 K to increase the deviation from the target state (RMSD ≈6.6 Å; see Figure 3a). Higher resolution X-ray structures were used to validate the results (PDB IDs: 3BV4 at 1.7 Å (Sherawat et al., 2008); 1ZAH at 1.8 Å (St-Jean et al., 2005); 1ADO at 1.9 Å (Blom and Sygusch, 1997)).

Figure 3 with 2 supplements see all
Refining a distant starting model into a high-resolution map: rabbit muscle aldolase at 2.6 Å.

(a) RMSD (Cα atoms) between the starting and the reference model (5VY5) showing the extent of rearrangements during refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our refined model (green) with the full map (top) and stereochemical quality for the three models assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of our model with the full map. (d) Representative region of the protein interior (chain A) showing the closeness of our model and the reference to higher resolution control X-ray structures in terms of RMSD. Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.005

Following the standard approach to avoid overfitting (Amunts et al., 2014; Brown et al., 2015), we first refined the starting model against a training map reconstructed from half of the raw cryo-EM images (half-map), while simultaneously cross-validating against a validation map (the other half-map) (Figure 3—figure supplement 1a,b). The target force constant was set to 5 × 105 kJ mol-1 which was roughly 10% of the total system energy and was a good initial guess. Both real-space correlation (c.c.) with the training map and root mean square deviation (RMSD) from the reference model (RMSDref) were monitored and showed a continuous improvement throughout the refinement (Figure 3—figure supplement 1a, bottom). Already during the half-map refinement the fitted model converged to a structure with RMSDref ≈2 Å. To assess whether the refined model is overfitted, we monitored the Fourier Shell Correlation (FSC; Harauz and van Heel, 1986) between the model and the training map (FSCtrain, shown in Figure 3—figure supplement 1b, top). The FSC between the model and the validation map (FSCval) was calculated simultaneously (Figure 3—figure supplement 1b, bottom). No large differences were observed between FSCtrain and FSCval at any point during the half-map refinement, confirming the absence of overfitting. Driving the force constant beyond 5 × 105 kJ mol-1 led to gradually increasing deviations between FSCtrain and FSCval until the refinement simulation became unstable due to the biasing potential being too strong (15–20% of the total system energy). Finally, we performed additional refinement including simulated annealing against the full map to account for features not present in the training map, which brought RMSDref further down to ∼1.8 Å. The force constant was no longer increased.

Figure 3b (top) compares the starting, the deposited and our model in terms of FSC with respect to the full map (FSCfull). Stereochemical quality statistics for the three models are summarized in Figure 3b (bottom). Examples of secondary structure elements in the respective density parts are shown in Figure 3c. CDMD refinement converged quickly to a state being very close to the deposited reference and showed a significant improvement of the refined model both in geometry and reciprocal-space correlation relative to the starting structure (Figure 3b and Figure 3—figure supplement 2). Only Rosetta was able to converge as close to the reference as our method, yielding a structure with good geometry, while neither Phenix nor Refmac was. Refinement with Phenix produced reasonable geometry but showed poor convergence of solvent-exposed flexible loops, which contributed to the larger RMSDref. Refinement with Refmac did not place those protein regions into the density that showed the largest RMSDref prior to refinement (see Figure 3a) and produced rather poor geometry. Stereochemical quality statistics for the models shown in Figure 3—figure supplement 2 are summarized in Table 1.

Table 1
Refinement statistics for the aldolase system.
https://doi.org/10.7554/eLife.43542.029
CDMDPhenixRosettaRefmac
FSCavg (full map)0.7740.7040.7900.778
EMRinger4.482.644.901.31
Bond lengths (Å)0.0220.0060.0220.012
Bond angles (°)2.221.301.762.90
MolProbity1.151.490.613.57
All-atom clashscore0.243.820.2831.6
Ramachandran statistics:
Favored (%)96.41100.098.1793.33
Allowed (%)2.790.01.765.06
Outliers (%)0.810.00.071.61
Poor rotamers (%)2.612.610.0932.6
CaBLAM flagged (%)9.010.18.3213.3

When comparing our model with the deposited reference, we noticed that, despite both showing similar reciprocal correlation with the full map and having similar EMRinger scores, the latter contained almost an order of magnitude more steric clashes, whereas the rotamer and Ramachandran statistics were poorer for the former. We assume that this relationship between the number of steric clashes and the rotamer/backbone quality may be explained by the way CDMD and those methods relying on knowledge-based structural restraints (e.g. Rosetta or Phenix) optimize the fit to density. We postpone a more detailed discussion on this observation to the end of the Results section. To determine which of the two models was in fact closer to the ‘true structure’, we compared both with the higher-quality control structures (Figure 3d). Neither model showed a considerably smaller RMSD value to any of the control structures, suggesting that both models might be equally good approximations of the ‘true structure’. Overall, the aldolase results suggest that CDMD is capable of correcting structural errors and converges quickly to the target state while maintaining good geometry at all refinement stages.

Tubulin: accuracy and convergence at medium resolutions

We next applied CDMD to a medium-resolution map (4.1 Å) of a GDP-tubulin dimer in the straight microtubule-like conformation derived from an asymmetric helical reconstruction of a complete, kinesin-decorated microtubule lattice (provided by courtesy of Zhang and Nogales, 2015). No reference structure was available for this map. However, two high-resolution GDP-tubulin structures (PDB ID: 3JAS at 3.3 Å and PDB ID: 6DPV at 3.5 Å) in the same conformational state were available (Zhang et al., 2015; Zhang et al., 2018). We used these structures as controls to confirm the method’s accuracy and convergence for medium-resolution maps. A curved structure of GDP-tubulin (PDB ID: 4ZOL; Wang et al., 2016b) after ∼3 μs of MD simulation in explicit solvent (Igaev and Grubmüller, 2018) was used as the starting model (RMSD6DPV ≈ 3.7 Å; Cα atom deviations from the target straight state are shown in Figure 4a).

Figure 4 with 1 supplement see all
Refinement of a curved tubulin dimer into a map of the straight, microtubule-like state.

(a) RMSD (Cα atoms) between the starting and the control model (6DPV) showing the extent of rearrangements between the solution (curved) and the microtubule-like (straight) tubulin conformation. The α-subunits of both models were aligned for the RMSD calculation. (b) Reciprocal-space agreement of the starting (black dashed), the control (gray and dark gray) and our model (cyan) with the full map (top) and stereochemical quality for the four models assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of our model with the full map. (d) Same as in c but showing the closeness of our model to higher resolution control structures in terms of RMSD. Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.008

Figure 4b,c,d compares our model with the two high-resolution control structures as well as with the starting model. Despite the higher accuracy of the control structures, our model reached the same reciprocal-space correlation with the full map across the whole range of spatial frequencies (Figure 4b, top). This is particularly remarkable because the high-resolution features were not as pronounced in the 4.1 Å map as they were in the 3.3 Å and 3.5 Å maps used for generating the control structures. The good correlation did not result from overfitting or geometry violations as confirmed by the cross-validation procedure (as in Figure 3—figure supplement 1b, data not shown) and stereochemical quality assessment (Figure 4b, bottom). Our model was very similar in stereochemical quality to the control structures except for the ∼10-fold smaller number of steric clashes and a higher fraction of Ramachandran outliers. The higher EMRinger score for our model indicated improved local side chain correlations and backbone placement.

As in the aldolase case, CDMD refinement quickly converged to the straight state that was very similar in RMSD to the control structures (Figure 4—figure supplement 1). Among all the methods tested, only Phenix produced a structure with comparable geometry and map agreement as our model. Rosetta and Refmac showed poorer convergence in more flexible loop regions, with the Rosetta model having a much higher number of steric clashes. Stereochemical quality statistics for the models shown in Figure 4—figure supplement 1 are summarized in Table 2. Altogether, these results demonstrate the accuracy and good convergence of CDMD at medium resolutions, namely its ability to provide plausible local geometries in cases where this information is not fully present in the map and the refinement has to partially rely on the force field.

Table 2
Refinement statistics for the tubulin system.
https://doi.org/10.7554/eLife.43542.030
CDMDPhenixRosettaRefmac
FSCavg (full map)0.7560.7840.7030.743
EMRinger1.791.070.920.72
Bond lengths (Å)0.0220.0100.0210.014
Bond angles (°)2.301.572.792.15
MolProbity1.421.582.302.58
All-atom clashscore0.4611.724.913.8
Ramachandran statistics:
Favored (%)93.2899.4193.7593.16
Allowed (%)5.310.594.726.01
Outliers (%)1.420.01.530.83
Poor rotamers (%)2.630.730.144.38
CaBLAM flagged (%)14.116.316.89.5

TRPV1: highly heterogeneous local resolution

Our next test system was the vanilloid receptor 1 (TRPV1) resolved at 3.2 Å (EMD ID: 5778, PDB ID: 3J5P; Liao et al., 2013) for which the local resolution ranged from 2.5 to 7 Å, such that no standard non-MD refinement algorithm could be applied to fit a distant model uniformly well to all map regions. The large difference in the local resolution is due to a rigid transmembrane (TM) domain and flexible, cytosolic ankyrin repeat domains (ARDs) (Liao et al., 2013; Gao et al., 2016). To further challenge our method, we generated a distant starting model with poor geometry by subjecting the reference structure (3J5P) to MD simulation in explicit solvent at T = 300 K. The overall RMSDref for the selected starting model was ∼5 Å, with local deviations reaching up to ∼20 Å in the ARD region and the upper part of the TM domain (Figure 5a). No higher resolution TRPV1 structures were available for additional model validation. It should also be noted that residues 111–198 of the ARDs were only poorly reflected in the density. We nevertheless included these parts in the refinement procedure because the correlation-driven potential (Figure 1) does not depend on the local density amplitude and would still refine low-resolution features (overall shape, orientation, etc.).

Figure 5 with 2 supplements see all
Refinement of a poor structure of TRPV1 in a distant conformation into a map with highly heterogeneous local resolution.

(a) RMSD (Cα atoms) between the starting and the reference model (3J5P) showing the extent of rearrangements the TRPV1 structure undergoes during refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model (purple) with the full map (top) and stereochemical quality of the models assessed by EMRinger and MolProbity (bottom). Note that the starting model was derived from the deposited structure by subjecting the latter to MD at T = 300 K. (c) Representative secondary structure elements showing local agreement between of our model with the full map.

https://doi.org/10.7554/eLife.43542.010

Figure 5b compares our model with the deposited reference structure as well as with the starting model. Unlike in the above cases, our model correlated better with the full map at spatial frequencies below 0.1 Å-1 (better rigid-body positioning of the domains) but worse in the 0.2–0.3 Å-1 range (Figure 5b, top). We hence asked whether this difference in the higher frequency range might have resulted from the deposited structure (PDB ID: 3J5P) having conflicting geometries or from our model being underfitted due to a low force constant (here, k = 4 × 105 kJ mol-1). Neither the standard cross-validation procedure (as in Figure 3—figure supplement 1, data not shown) nor subsequent re-refinement of the starting model using higher force constants (4.5 and 5 × 105 kJ mol-1) revealed any over- or underfitting, which suggests that the seemingly better correlation of the deposited model at higher spatial frequencies is likely a result of overfitting. This was further supported by the model quality assessment (Figure 5b, bottom), where our model was systematically better in all quality statistics except for a moderate fraction of Ramachandran outliers.

To determine the radius of convergence for the methods under study, we performed comparative modeling using the same distant starting structure as above (Figure 5—figure supplement 1). CDMD refinement converged to a structure with good geometry and RMSDref ≈2.5 Å, while the other methods did not. The convergence was good in the well-resolved TM region across all methods, but only Refmac was able to place the flexible ARD regions back into the density (RMSDref ≈2.8 Å), however, at the price of severe geometry violations. Phenix and Rosetta yielded accurate models but were unable to refine solvent-exposed flexible loops in the ARD region. Stereochemical quality statistics for the models shown in Figure 5—figure supplement 1 are summarized in Table 3.

Table 3
Refinement statistics for the TRPV1 system.
https://doi.org/10.7554/eLife.43542.031
CDMDPhenixRosettaRefmacCDMD (TM domain)
FSCavg (full map)0.6320.6390.6260.5300.684
EMRinger1.281.080.980.3482.12
Bond lengths (Å)0.0220.0080.0210.0120.024
Bond angles (°)2.141.502.182.812.25
MolProbity1.231.901.393.551.50
All-atom clashscore0.218.402.0329.00.29
Ramachandran statistics:
Favored (%)93.3799.2493.7591.2990.89
Allowed (%)5.970.765.267.017.59
Outliers (%)0.660.00.991.701.52
Poor rotamers (%)1.953.900.1126.93.01
CaBLAM flagged (%)15.923.220.517.914.8

Finally, to assess to what extent the CDMD refinement depends on the local map resolution, we performed two additional refinement runs using different but similarly distant starting structures (RMSDref ≈5 Å in both cases). We found that all three refinements produce very similar models but the structural variability was larger in the ARDs, whereas the refinements converged to almost the same structure in the TM domain, consistent with the local resolution in these regions (Figure 5—figure supplement 2). In summary, the TRPV1 refinements demonstrate that CDMD has a large and robust radius of convergence and is able to significantly improve poor starting models irrespective of local resolution and within a single refinement run.

TRPV1: comparison with Rosetta and ReMDFF

The TRPV1 system has been previously used to benchmark the Rosetta (Wang et al., 2016a) and ReMDFF (Singharoy et al., 2016; Wang et al., 2018) algorithms, which offers an additional comparison to these methods. However, the Rosetta model (PDB ID: 3J9J; Barad et al., 2015) contained only the TM region, whereas only one TRPV1 monomer was refined with ReMDFF (courtesy by Wang et al., 2018). For a direct comparison, we cut off the flexible ARDs in the starting TRPV1 model (Figure 5a) to match the sequence of the deposited Rosetta model and re-refined it using a higher force constant (k = 5 × 105 kJ mol-1), as more structural details and less structural variability were expected for the high-resolution TM region (Figure 6a,b). For the ReMDFF comparison (Figure 6c,d), no additional refinement was performed, and the monomer showing the best map-model agreement and model geometry from the three independently refined TRPV1 structures was chosen for further analysis (Figure 5—figure supplement 2).

Comparison of our TRPV1 model with those previously refined using Rosetta (a, b) and ReMDFF (c, d).

Overlays of our model (pink and violet ribbon) with the Rosetta (left, gray ribbon) and ReMDFF (right, gray ribbon) models are shown in (a) and (c), respectively. Reciprocal-space agreement with the full map (top) and stereochemical quality for the four models assessed by EMRinger and MolProbity (bottom) are shown in (b) for the Rosetta model and in (d) for the ReMDFF model.

https://doi.org/10.7554/eLife.43542.013

As shown in Figure 6b, our model for the TM region compared well with that generated by Rosetta both in terms of reciprocal-space correlation and model quality. In the ReMDFF comparison, our model outperformed the ReMDFF model both in reciprocal-space correlation and model geometry (Figure 6d). This was mainly due to better map-model agreement of our model in the low-resolution ARD region (Figure 6c, bottom insert), which was confirmed by the higher reciprocal correlation values at low spatial frequencies (Figure 6d, top). We also observed much poorer rotamer and Ramachandran distributions for the ReMDFF model (Figure 6d, bottom) despite the fact that, like our method, ReMDFF employs an accurate atomistic force field.

N-ethylmaleimide sensitive factor (NSF): comparison with Phenix

Further to the previous section, we compared how CDMD performs on a system that has been recently refined using the newest Phenix protocol (Afonine et al., 2018) with optimized refinement parameters (White et al., 2018). To this end, the structure of an ATP-bound NSF (PDB ID: 3J94; Zhao et al., 2015) was refined into the ∼3.9 Å map of a NSF complex (EMD ID: 9102, class 1; White et al., 2018). To enable a direct comparison, the starting structure was completed to match the deposited Phenix model (see Materials and methods). The starting structure was also subjected to MD simulation at T = 300 K to increase the deviation from the target state (Figure 7a). Higher-resolution X-ray structures were used to validate the results (PDB IDs: 1NSF at 1.9 Å (Yu et al., 1998); 1D2N at 1.75 Å (Lenzen et al., 1998)). No half-maps were available for this test case, which precluded the half-map-based cross-validation and, hence, determining the optimal force constant. We therefore performed seven independent refinement runs starting from the same distant structure (Figure 7a) and using a wide range of force constants (2–8 × 105 kJ mol-1). The model with the best balance between map-model agreement and geometry was then used for further analysis (see Materials and methods for details on refinements against full maps only).

Figure 7 with 1 supplement see all
Refinement of a substrate-free NSF complex in a distant conformation into a medium-resolution map at 3.9 Å.

(a) RMSD (Cα atoms) between the starting and the reference model (6MDO) showing the extent of rearrangements the NSF structure undergoes during refinement. (b) Reciprocal-space agreement with the full map for the starting (black dashed), the reference (gray) and the set of final models (red gradient) refined using a wide range of target force constants (top) and stereochemical quality assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of the model refined at k = 5 × 105 kJ mol-1 with the map. (d) ATP binding pocket of the D2 domain (chain A) showing the closeness of our model and the reference to higher-resolution control X-ray structures in terms of RMSD. Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.014

Figure 7b compares our refined NSF models with the deposited reference as well as with the starting model in terms of map-model agreement and geometry. Figure 7c additionally shows local agreement of our model with the full map. The map-model agreement improved gradually as the target force constant increased, but the final models correlated almost equally well with the full map (Figure 7b, top, red curves). Because the half-maps were not available, it was challenging to judge which of the obtained models suffered from under- or overfitting. We therefore assessed the stereochemistry of the models (Figure 7b, bottom), while simultaneously comparing them to the deposited reference structure (PDB ID: 6MDO). One would expect that, as the force constant becomes too high, the map-model agreement would increase at the expense of geometry violations (atomic clashes or rotamer and Ramachandran outliers), whereas for insufficiently strong force constants, the model stereochemistry should be less dependent on the biasing force, suggesting that the ‘optimal’ force constant should fall within the chosen range (2–8 × 105 kJ mol-1). Indeed, this was the case for the NSF refinement: while the models at k = 2–6 × 105 kJ mol-1 showed similar geometry, increasing the force constant beyond 6 × 105 kJ mol-1 led to a drastic increase in the number of rotamer outliers comparable with that of the starting model, potentially indicating overfitting. We also noticed that the models at 2 and 3 × 105 kJ mol-1 had lower EMRinger scores, suggesting that the biasing force was insufficient to improve the local side chain-map correlation and potentially indicating underfitting. This was further supported by the weaker reciprocal correlation for these two models as compared to both reference and other models at k> 3 × 105 kJ mol-1 (Figure 7b, top). We hence concluded that the optimal force constant should be in the range of 4–5 × 105 kJ mol-1, which was comparable to what we used to refine the TRPV1 system (Figure 5) being approximately of the same size. We hence used the structure refined at 5 × 105 kJ mol-1 for further analysis.

Comparison of our NSF model with the deposited reference revealed that it had a substantially lower clashscore (Figure 7b, bottom), which might be attributed to the use of an accurate force field that improved poorly defined regions. To determine whether this difference also indicated that our NSF model was closer to the ‘true structure’, we compared both models with the higher quality X-ray structures in terms of RMSD (Figure 7d). As the control structures contained only a single protomer of the D2 ring, we truncated our model and the reference to match each of the control structures prior to RMSD calculation. Our NSF model showed systematically lower RMSD values to both controls (roughly a 0.2 Å difference, which is detectable given the small size of the compared structures). Visual inspection revealed that this was mainly explained by the difference in the ATP binding pockets and that our model more closely resembled the control structures in this region. Particularly, we noticed that either the adenosine group of ATP was rotated by 90° or 180° relative to that of the controls (3 of 6 NSF protomers in 6MDO) or the phosphate tail of ATP deviated from that of the controls (all protomers in 6MDO). In contrast, our model reproduced the structure of the ATP binding pocket equally well across the entire D2 hexamer. We speculate that a proper treatment of electrostatics is important for correct ligand placement in the absence of sufficient density information.

Finally, we assessed the convergence of the NSF refinement across the different methods (Figure 7—figure supplement 1). Stereochemical quality statistics for the models are summarized in Table 4. Unlike in the above test cases, none of the alternative refinement methods was able to approach the reference structure demonstrating only ∼0.5 Å of improvement in terms of RMSDref as compared to ∼2 Å of improvement by CDMD. Interestingly, the convergence was better in the D2 domain for all alternative methods than in the D1 domain, where the split region between protomers A and F and the SNAP-25A 17 N-terminal residues were only poorly modelled by Phenix, Rosetta and Refmac. We speculate that, for the alternative methods, the hexagonal symmetry of the D2 domain translated into global symmetry restraints improves the refinement of this part of the NSF complex relative to the D1 domain, where the symmetry is broken by the split region. Here, we again emphasize that CDMD does not need symmetry restraints (or any other geometry-based restraints) and relies solely on the interatomic interactions defined by the force field.

Table 4
Refinement statistics for the NSF system.
https://doi.org/10.7554/eLife.43542.032
CDMDPhenixRosettaRefmac
FSCavg (full map)0.7650.7060.7480.479
EMRinger1.770.901.560.11
Bond lengths (Å)0.0220.0070.0210.020
Bond angles (°)2.301.522.173.29
MolProbity1.381.381.423.78
All-atom clashscore0.136.933.2647.1
Ramachandran statistics:
Favored (%)92.3798.8695.6789.53
Allowed (%)6.711.033.737.03
Outliers (%)0.920.110.603.44
Poor rotamers (%)2.950.410.0825.8
CaBLAM flagged (%)13.120.514.922.1

Nucleosome: non-rigid-body transition in a protein-DNA complex

We next sought to test how CDMD would perform on a system (a) for which a non-trivial concerted motion is expected (i.e. which cannot be described by a set of rigid body translations or rotations), and (b) that also includes DNA whose correct stereochemistry is notoriously hard to maintain using modern force fields (Galindo-Murillo et al., 2016; Ivani et al., 2016; Dans et al., 2017). To this end, we refined the recently published low-resolution structure of a ‘breathing’ nucleosome complex (PDB ID: 6ESI; Bilokapic et al., 2018) into the 3.7 Å map of the canonical nucleosome state (EMD ID: 3947, PDB ID: 6ESF; Bilokapic et al., 2018). The major rearrangements between the two states include a contraction along the symmetry axis coupled to a concomitant expansion in the perpendicular direction and a shift by ∼1 base pair in the DNA due to rearrangements in the histone-DNA interface. To further challenge our refinement method, we increased the RMSD of the starting structure from the target state by subjecting it to MD simulation in explicit solvent at T = 300 K, which, in addition to the above deviations, led to overall DNA buckling and slipping off the histone octamer (RMSD ≈5 Å with local deviations reaching ∼15 Å; see Figure 8a). The success of refining such a starting structure, therefore, ultimately hinges on an accurate treatment of concerted molecular motions. We used the following higher-quality X-ray structures of the canonical nucleosome state to validate the results: 1KX3 at 1.9 Å (Davey et al., 2002) and 5OMX at 2.3 Å (Frouws et al., 2018) for the histone part only (due to different DNA sequences), and 5MLU at 2.8 Å (Makde et al., 2010) for the DNA part.

Figure 8 with 2 supplements see all
Refinement of a distant nucleosome structure into a medium-resolution map of the canonical nucleosome state.

(a) RMSD (DNA and protein backbone) between the starting and the reference model (6ESF) showing the extent of rearrangements during the refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model (yellow) with the full map (top) and stereochemical quality assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of our model with the full map. (d) Representative region next to the dyad DNA showing the closeness of our model to higher-resolution control structures in terms of RMSD (only protein non-hydrogen atoms). Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.016

Figure 8b compares our model with the deposited reference in terms of map-model agreement and geometry. Figure 8c additionally shows local agreement of our model with the full map. Both models correlated almost equally well with the full map (Figure 8b, top), with the deposited model correlating slightly better at spatial frequencies above 0.2 Å-1, which again raised the question of underfitting as in the case of the TRPV1 refinement (see previous section). We therefore performed a series of nucleosome refinements using force constants ranging from 2 to 4.5 × 105 kJ mol-1 (k = 3 × 105 kJ mol-1 was used in Figure 8), while simultaneously assessing the model geometry (Figure 8—figure supplement 1). The goodness-of-fit slowly improved as the force constant increased, but the refined structure never reached the same extent of reciprocal-space correlation as the deposited reference, suggesting that the deposited reference structure might be slightly overfitting the full map. This conclusion was also supported by the model quality statistics (Figure 8b, bottom) which showed a systematic improvement of our model relative to both starting and deposited reference structure, except for a moderate fraction of Ramachandran outliers and a slightly lower EMRinger score.

To validate these results, we compared our model and the deposited reference with the higher resolution control structures in terms of RMSD (Figure 8d, only protein non-hydrogen atoms). As in the aldolase case, neither model had systematically lower RMSD values to both control structures, suggesting that both models are good representations of the ‘true structure’. The DNA part was very similar across all models: RMSDcontrol ≈1.4 Å for both our model and the reference, and RMSD less than 0.9 Å between the compared models themselves (all DNA non-hydrogen atoms). It should be noted that the DNA accuracy in the deposited reference (refined with Phenix) was due to the use of base-pairing and base-stacking restraints and a much closer starting structure. On the contrary, the high level of achieved DNA quality in our model is particularly remarkable because (a) no geometry restraints were used, and (b) the CHARMM force field family used to refine the nucleosome (see Materials and methods) is rather inaccurate in reproducing correct nucleotide base pairing when used in plain MD simulations (Galindo-Murillo et al., 2016). We hence conclude that the ‘soft’ (global) fitting potential (Figure 1) with a well-adjusted weight is a substantial factor in maintaining the modelled structure’s accuracy, even in cases where neither the resolution nor the structural knowledge (e.g. a force field) are satisfactory.

Finally, the convergence of our and the alternative methods was assessed using the same distant starting structure (Figure 8a). Stereochemical quality statistics for the generated models are summarized in Table 5. Among all the methods tested, only CDMD was able to achieve a model accuracy and map-model agreement similar to the deposited reference. Although the histone part was refined similarly well by all of the methods, the DNA part was, if anything, only poorly refined (Refmac) and in most cases disintegrated (Phenix and Rosetta). The poor DNA quality of the Rosetta model is likely due to the internal Rosetta energy function that was originally designed for protein modeling (Alford et al., 2017), and a generalization for RNA/DNA is currently being developed. The fact that the Phenix refinement, in which DNA restraints were constantly switched on, was unable to converge to the reference structure might indicate that the starting structure is out of this method’s radius of convergence. For Refmac, the optimal geometry weights necessary to obtain good agreement for the DNA and the histone part differed by 2–3 orders of magnitude such that a refinement of the entire nucleosome structure in a single run was impossible without severe geometry violations.

Table 5
Refinement statistics for the nucleosome system.
https://doi.org/10.7554/eLife.43542.033
CDMDPhenixRosettaRefmac
FSCavg (full map)0.6590.6760.5720.399
EMRinger1.201.500.880.76
Bond lengths (Å)0.0220.0080.0190.018
Bond angles (°)1.821.102.603.38
MolProbity0.651.582.603.88
All-atom clashscore0.18.8491.763.5
Ramachandran statistics:
Favored (%)97.3199.8797.0488.84
Allowed (%)1.880.131.758.60
Outliers (%)0.810.01.212.55
Poor rotamers (%)0.511.360.1622.62
CaBLAM flagged (%)5.68.29.415.3

Ribosome: refinement of a large protein-RNA complex

As a second example of a mixed nucleotide-protein system, we refined a large bacterial 70S ribosome/SelB complex in the codon reading (CR state; PDB ID: 5LZC) into the recently published 3.4 Å map (EMD ID: 4124, PDB ID: 5LZD; Fischer et al., 2016) of the GTPase-activated state (GA). Similar to the nucleosome test case, the chosen ribosome system is a nucleic-acid-protein complex (here, RNA) which undergoes non-trivial rearrangements in switching from CR to GA to form the codon-anticodon base pairs (local deviations reaching ∼19 Å in the A-site tRNA and SelB elongation factor; Figure 9a and Figure 9—video 1). Besides these challenges, the refinement of such a large complex is a multiscale problem where local errors introduced by local adjustments accumulate and may propagate far beyond the refined location due to the large system size, if not resolved globally. With currently available refinement tools, several algorithms need to be applied iteratively to obtain a good quality model (Fischer et al., 2016). The ribosome refinement case, therefore, served to test if the method is able to handle local and global structural changes simultaneously while gradually improving the map-model agreement. We validated our results using the high-resolution X-ray structure of an Escherichia coli ribosome (PDB ID: 4YBB at 2.1 Å; Noeske et al., 2015) and the high-resolution X-ray structure of a SelB mRNA-binding domain (PDB ID: 2PJP at 2.3 Å; Soler et al., 2007).

Figure 9 with 2 supplements see all
Refinement of a ribosome complex in the CR state into a 3.4 Å map of the GA state.

(a) RMSD (RNA and protein backbone) between the starting (CR) and the final (GA) model showing the extent of rearrangements during the refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model (orange) with the full map (top) and stereochemical quality for the three models assessed by EMRinger and MolProbity (bottom). (c) Representative regions showing local agreement between our model and the map (the codon-anticodon region and a RNA stem-loop in the 30S subunit). (d, e) Representative regions in the ribosomal exit tunnel (constriction site formed by L4 and L22 protein chains is shown) and in the SelB-mRNA contact interface, both demonstrating the closeness of our model to higher resolution control structures in terms of RMSD. Some protein residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.019

Figure 9b compares our model with the reference structure in terms of map-model agreement and geometry. Figure 9c additionally shows local agreement of our model with the full map. Both models agreed well with the full map and with each other (RMSD ≈1.3 Å for all non-hydrogen atoms). The geometry scores for both models were largely consistent, except for the better Ramachandran statistics and clashscore in our model vs. the lower number of rotamer outliers in the deposited reference. Although this refinement case was not as challenging in terms of radius of convergence as the above examples (RMSDref ≈2.5 Å for the starting structure after equilibration), CDMD was able to produce a good quality model despite the system size and the straightforwardness of the CDMD refinement, compared to the multi-method approach used in the original study (Fischer et al., 2016). We additionally verified our model’s quality by comparing it and the reference with the higher resolution control structures (Figure 9d,e). As the control Escherichia coli ribosome structure (4YBB) was in a different translation state, we extracted only the 23S rRNA together with proteins L4 and L22 of the large 50S subunit (also shown in part in Figure 9d). From the control structure of the SelB mRNA-binding domain (2PJP) only the protein part was extracted (Figure 9e). Both reference and CDMD model were in good agreement with the 23S control structure showing RMSD values of ∼3 Å, which is small given the size of the compared parts. The agreement was worse for the much smaller SelB part, where both models deviated from the control by ∼2.7 Å in terms of RMSD. This can be explained by the lower local resolution of the cryo-EM density in this region. None of the compared models showed significantly smaller or larger RMSDref values from the controls, suggesting that both models are equally good approximations of the ‘true structure’.

To assess the radius of convergence of CDMD and the other alternative methods, we used the same starting structure as in Figure 9a (see Figure 9—figure supplement 1). Refinement statistics for the generated models are summarized in Table 6. Similarly to the NSF case, none of the tested alternative methods was able to achieve a map-model agreement and accuracy consistent with the reference or CDMD model. While Phenix performed relatively well in the ribosomal core domains, yielding a structure with good geometry, its convergence was poor for solvent-accessible parts and for the SelB factor (the region with the largest RMSDref in Figure 9a). As a result, the Phenix model had a larger RMSD from the reference than the starting structure itself. Refinement with Rosetta yielded a rather poor model due to its energy function’s unsuitability for DNA/RNA modeling. Also in this case the final RMSDref value was larger than the starting one. Refmac was the only alternative method tested that refined the starting structure toward the reference, resulting in a final RMSDref being smaller than the starting one. However, as in the Phenix refinement, it was unable to refine regions of the starting model that deviated strongly from the reference. Overall, these comparative modeling results confirm the necessity for a multi-method method approach when refining large systems such as the ribosome with current non-MD methods (Fischer et al., 2016). On the other hand, they also suggest that CDMD may drastically reduce the effort in such a refinement.

Table 6
Refinement statistics for the ribosome system.
https://doi.org/10.7554/eLife.43542.034
CDMDPhenixRosettaRefmac
FSCavg (full map)0.7660.8070.5490.724
EMRinger1.561.870.240.50
Bond lengths (Å)0.0170.0160.0620.010
Bond angles (°)2.011.416.881.83
MolProbity1.611.423.172.69
All-atom clashscore0.287.65152.99.34
Ramachandran statistics:
Favored (%)94.0999.2590.8492.06
Allowed (%)5.100.755.786.69
Outliers (%)0.810.03.381.25
Poor rotamers (%)6.231.010.338.69
CaBLAM flagged (%)14.019.023.716.8
RNA backbone0.540.410.3810.38

CorA magnesium transporter: low-resolution map fitting

Having shown that CDMD provides plausible model geometries consistent with the map even when the cryo-EM density contains only partial structural information, we proceeded to a more challenging case where only low-resolution information was available. To this end, we refined the medium-resolution structure (PDB ID: 3JCF; Matthies et al., 2016) of a magnesium transporter CorA in the closed, symmetric Mg2+-bound state into a 7.1 Å subclass map (EMD ID: 6552; PDB ID: 3JCG, backbone model only) of one of the Mg2+-free, asymmetric open states. Unlike in the above cases, the starting structure was equilibrated in explicit solvent but not further subjected to long-term MD at T= 300K as the RMSD from the target state already reached ∼25 Å for some parts of the channel (Figure 10a). A series of full map refinements was performed using a constant σ value of 0.6 nm and a set of low force constants (0.5, 1.0, 1.5 and 2.0 × 105 kJ mol-1), which was consistent with the low map resolution. We found the values between 1.0 and 2.0 × 105 kJ mol-1 to be optimal for the refinement because they enforced sufficient map-model agreement while preserving secondary structure and maintaining reasonable model geometry. Increasing the force constant beyond 2.0 × 105 kJ mol-1 did not result in better map-model agreement, and for much higher values (comparable with those in, e.g., Figure 3; data not shown), the model quality dropped significantly, suggesting overfitting of the map. We additionally compared our results with CorA models produced by MDFF, a well-established tool originally designed for exactly such refinement cases (see Materials and methods for the MDFF protocol).

Figure 10 with 3 supplements see all
Refinement of a CorA magnesium transporter in the symmetric closed state into a low-resolution 7.1 Å map of the asymmetric open state.

(a) RMSD (Cα atoms) between the starting (closed) and the reference (open) model showing the extent of rearrangements in the cytosolic part of the channel during refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model refined with k= 1.0 × 105 kJ mol-1 (sea green) with the full map (top) and stereochemical quality for the three models assessed by EMRinger and MolProbity (bottom). The FSC curves were calculated using the backbone atoms only, whereas the full-atom models were used for geometry analysis. No EMRinger scores were calculated. (c) Same as in (b) but for the two MDFF models refined with (blue) and without (dark blue) secondary structure, chirality and cis peptide bond restraints. (d) Overlay of our full-atom model (see green) with full map.

https://doi.org/10.7554/eLife.43542.022

A direct comparison to the deposited reference (3JCG) was complicated by the fact that all side chains were removed in the original study due to the lack of density to assign their conformations (Matthies et al., 2016). Therefore, all side chains in our and the MDFF models were exempted from the FSC curve calculations (Figure 10b,c, top), but we used the all-atom models to assess the stereochemistry (Figure 10b,c, bottom). Figure 10d additionally shows an overlay of our model with the map. Both our model and the one refined with MDFF correlated similarly well with the low-resolution map, with the latter correlating better at spatial frequencies above 0.1 Å-1 (irrespective of whether or not secondary structure restraints were enabled). However, the quality of the MDFF models was systematically lower than that of our models (Figure 10b,c, bottom and Figure 10—figure supplement 1), although each model had almost zero steric clashes (due to the use of an atomistic force field). The quality of the MDFF model decreased as the restraints were released (Figure 10c and Figure 10—figure supplement 1, blue vs. dark blue). Stereochemical quality statistics for all CDMD and MDFF models shown in Figure 10 are summarized in Table 7.

Table 7
Refinement statistics for the CorA system.
https://doi.org/10.7554/eLife.43542.035
CDMDMDFF (restraints)MDFF (no restraints)
FSCavg (full map)0.6730.7110.713
Bond lengths (Å)0.0170.0210.020
Bond angles (°)2.112.342.30
MolProbity1.151.311.48
All-atom clashscore0.070.00.0
Ramachandran statistics:
Favored (%)94.7793.0190.09
Allowed (%)4.015.297.36
Outliers (%)1.221.702.55
Poor rotamers (%)2.183.023.72
CaBLAM flagged (%)11.113.318.1

Does the CorA model refined with CDMD represent the asymmetric open conformation more accurately? To answer this question, a comparison with high-resolution structures of the asymmetric open state is required which were not available for the CorA system. Nevertheless, there is experimental and simulation evidence that removal of the regulatory Mg2+ ions from the symmetric CorA conformation leads to large-scale rearrangements of the intracellular stalk helices that increase the diameter of the gating pore, hence triggering ion conduction (Chakrabarti et al., 2010; Dalmas et al., 2014). Although pore opening is hard to infer by inspecting the cryo-EM map due to its low resolution, it is conceivable that an accurate refinement of the symmetric CorA structure with well-balanced contributions from both cryo-EM map and molecular force field would induce such structural changes. We therefore compared the structure of the gating pathway in all of the models analyzed in Figure 10b,c. Specifically, we analyzed the structure and biophysical properties of the CorA gating pathway using the CHAP program (Rao et al., 2017; Klesse, 2019). This analysis yielded a surface representation of the gating pathway as well as its radius at every position along the pathway (Figure 10—figure supplement 2 and Figure 10—figure supplement 3). The fact that we used explicit solvent in all of the simulations also allowed us to estimate the solvent density in the pore. The gating pore of our CorA model was, on average, wider along the pathway than that of the closed symmetric structure. The average solvent density in the gating pore of our CorA model was lower, consistent with the increased pore volume. Surprisingly, the gating pore in the MDFF model was rather curled and significantly smaller in volume than that of the closed symmetric structure, despite the intracellular stalk helices adopting an asymmetric conformation very similar to our model. It was mainly due to the gate closure in the central part of the transmembrane domain (pore radii reaching 0.5 Å, see Figure 10—figure supplement 3) that led to a complete permeation block as indicated by zero number density in this region (no water molecules were present). We speculate that this conformation of the gating pore in the MDFF model would lead to even weaker conductance than in the case of the closed symmetric channel in free MD simulations. Although a higher-resolution control is still required to validate the open structure produced by CDMD, the fact that the gating pore opens and is on average more solvated in the CDMD fitting simulations is in agreement with previous biochemical and simulation data (Chakrabarti et al., 2010; Dalmas et al., 2014).

Remarkably, MDFF was unable to induce gate opening. To test whether it is the secondary structure restraints (an option strongly recommended for low-resolution refinement with MDFF) that prevented the torque generated by rearranging the intracellular domains from propagating into the gate region, we switched off all the restraints and re-refined the starting structure. This restraint-free MDFF setup did not lead to gate opening either (Figure 10—figure supplement 2, rightmost structure; Figure 10—figure supplement 3, bottom right plot). We note that the restraint-free MDFF setup used here was identical to our refinement setup, except for the biasing potential form. We therefore speculate that it is the ‘hard’ MDFF biasing potential (each density grid point is an attractor) that makes the refined model less compliant and restricted only to the highest-density regions. On the contrary, the ‘soft’ correlation-based potential (global map-model overlap is maximized) allows more internal flexibility while maintaining a comparable degree of map-model agreement (Figure 10b,c, top, and Figure 10—figure supplement 1). When coupled to a stereochemically accurate treatment (atomistic force field), our refinement protocol might thus be useful in modeling biologically relevant transitions.

Force field influence and model accuracy

We now return to the question of why in many of the above test cases the remarkable reduction of steric clashes by CDMD often comes at the cost of increased rotamer and/or Ramachandran outliers, as compared to the deposited reference models (Figure 3, Figure 4, Figure 6a,b, Figure 7, Figure 9). It should be first noted that, by construction, the refinement methods used to produce the deposited models (COOT, Phenix, Rosetta, Refmac) often enforce accurate rotamer and Ramachandran distributions, whereas, also by construction, force-field-based refinement guarantees almost zero steric clashes. The question therefore narrows down to whether the structural errors observed in our models (a) arise due to imperfections of the force fields in use (e.g., inaccurate backbone dihedral, rotamer torsion angles or atomic partial charges), (b) are enforced by the cryo-EM data (real or noise), or (c) are a synergetic combination of these effects. It is conceivable that in scenario (a), outliers of certain amino acid types would be systematically and significantly overrepresented in the total outlier pool, both before and after structure refinement. In contrast, a low to moderate number of outliers being rather homogeneously distributed across the amino acid alphabet would point to scenario (b). These unspecific outliers might, in turn, be contributed by poorly determined map regions or be naturally occurring outliers in densely packed protein interiors.

To test this idea, we performed a per-residue analysis of the rotamer and Ramachandran outliers detected in our models (Figure 11, left; see Materials and methods). For each of the above non-ribosome test cases, the final structures and the ones pre-equilibrated in explicit solvent prior to refinement were analyzed, and the resulting statistics were merged. The ribosome case was analyzed separately because a different force field family was used (Figure 11, right). We first asked if our refinement procedure (Figure 1 and Figure 2) per se improves the rotamer and Ramachandran distributions. Indeed, there was an overall decrease both in the average per-residue outlier propensity and in the total outlier occurrence (Figure 11, gray vs. blue or red dots), irrespective of the force field. These data indicate that a considerable fraction of structural conflicts is resolved by the refinement and support scenario (b) above: that the cryo-EM density improves the geometry of initially poorer models leaving a certain number of residue-unspecific outliers.

Per-residue outlier analysis.

Per-residue rotamer and Ramachandran outlier propensities calculated from the non-ribosome structures (left panels; CHARMM22*/CHARMM36m force fields were used (Piana et al., 2011Huang et al., 2017) and the ribosome structure (right panels; AMBER12sb was used; Lindorff-Larsen et al., 2010) after pre-equilibration with the force field (gray dots) and after the actual refinement with our approach (blue and red dots, respectively).

https://doi.org/10.7554/eLife.43542.026

However, mainly for the rotamer distributions, we observed that those peaked at certain amino acid types more frequently, both before and after structure refinement. In particular, for the non-ribosome systems refined using CHARMM22*/CHARMM36m (Piana et al., 2011; Huang et al., 2017), aspartates, leucines, tryptophans and valines contributed most to the overall outlier occurrences, which, however, remained rather low (outlier propensities of ∼0.04 or lower). AMBER12sb (Lindorff-Larsen et al., 2010) force field used for the ribosome refinement performed similarly to CHARMM22*/CHARMM36m for all rotamer types except for threonines and valines which systematically disagreed with their knowledge-based conformations (an order of magnitude higher outlier propensities, ∼0.33 and ∼0.2, respectively). The presence of systematic structure errors, therefore, supports scenario (a) as these errors are residue- and force-field-specific and not corrected by the refinement.

Thus, there are several lines of evidence that the structural errors detected in our models are likely to be a combination of cryo-EM data-specific outliers and force field artifacts (scenario (c) above). It should be noted that the quality of a refined structure is only as good as the experimentally determined map. Low determinacy of 3D classification and poor alignment of cryo-EM particles may be related to the non-ideal nature of structure refinement. Furthermore, conformational heterogeneity and ensemble averaging misinterpreted as a single structure can be another source of structural errors in which case an ensemble refinement may be required (Rice et al., 1998; Burnley et al., 2012; Herzik et al., 2019; Bonomi et al., 2018). In our present study, we have concentrated on single-structure refinement and have implicitly assumed that each map stems from a unique conformational state of the biological complex in question. This is a good approximation for high-resolution maps (e.g. Figure 3). For medium- and low-resolution maps, the best representation might not be a single structure but rather a weighted set of structures. The TRPV1 system (Figure 5) is one example of such a case in our study. Here, three independent refinements produced three structures with very similar conformations in the high-resolution TM region and with much more variability in the low-resolution ARD domains (Figure 5—figure supplement 2). The subject of ensemble refinement, however, lies beyond the scope of this study.

Correction of the force-field outliers, in turn, is tightly linked to the general force field development. Still undefined, apparently, is the way a reduction of steric clashes should be reconciled with an improvement of other stereochemical measures (e.g. rotamer and Ramachandran statistics, Cβ deviations, chirality errors, etc.), as these might currently represent conflicting optimization goals. On the one hand, it is well known what the expectations are for such errors (Bower et al., 1997; Shapovalov and Dunbrack, 2011; Renfrew et al., 2008; Towse et al., 2016). On the other hand, incorporation of this knowledge into the force field development is challenging as it fundamentally differs from the way a rotameric library is constructed (Towse et al., 2016). Even without solutions to the above problems, our results clearly imply that the correlation-driven MD refinement is able to generate models with a quality consistent with well-established methods given the map quality.

Conclusions

We have established and validated the accuracy and convergence of CDMD by applying it to a diverse set of test cases (Figure 3-Figure 10) covering the most typical refinement challenges posed by modern cryo-EM: starting models in conformational states distant from the target map, poor starting model geometry, highly heterogeneous local map resolution, multiscale fitting where local adjustments have to be balanced by global optimization, low- or medium-resolution maps where high-resolution features have to be drawn elsewhere, or combinations thereof. Our results demonstrate that all these challenges can be handled by CDMD without additional backbone- and rotamer-specific restraints or manual intervention. Yet, our method has a much higher radius of convergence than current non-MD methods, which is largely independent of the map resolution and which is comparable to that of the most recent MDFF versions (Singharoy et al., 2016; Wang et al., 2018), namely, at least 6–7 Å in terms of the overall RMSD from the target state and at least 25 Å in terms of local deviations. We, once again, emphasize that some of the tested alternative approaches are not entirely optimal for the presented refinement cases, mainly due to the starting structures we used being too distant and/or containing RNA/DNA. The comparisons with the alternative methods, however, clearly illustrate the challenge of reconciling local refinement accuracy and large convergence radius, in which a proper treatment of concerted molecular motions plays a major role. Finally, the CorA case demonstrates that CDMD is capable of capturing biologically relevant conformational changes, thus underlining the advantage of using the correlation-based biasing potential for low-resolution refinements.

The dramatic recent advances in image collection and processing have resulted in cryo-EM densities being generated at an unprecedented pace, with the above challenges tightly interlaced. CDMD largely overcomes the shortcomings of the refinement methods available so far, namely, the poor convergence and accuracy when applied to maps with medium/low resolution (non-MD methods) or high resolution (MD-based flexible methods). Integrating well-established techniques, for example, the correlation biasing potential (Orzechowski and Tama, 2008) or simulated annealing (Brünger et al., 1987; Brunger and Adams, 2002), into an optimized protocol involving a chemically accurate force field, adaptive resolution and the half-map-based validation scheme has finally allowed to overcome these shortcomings in an automated fashion. We note that our method can still be used for quick (re-)refinement of close starting models against high-resolution maps (for example, Figure 6a,b) or as a tool for flexible fitting of distant atomic models into low-resolution maps (Figure 10), which underlines its flexibility and universality. Also, our implementation based on the highly scalable, GPU-accelerated GROMACS engine provides a good trade-off between refinement time and hardware cost (see the benchmark section in Materials and methods). In combination with modern de novo chain building tools, it provides a fully automated and human-bias-free framework for quantitative interpretation of modern cryo-EM data.

Materials and methods

Preparing maps and starting structures

Unless differently specified, VMD (Humphrey et al., 1996) and UCSF Chimera (Pettersen et al., 2004) were used to perform all map and structure manipulations. We used the previously published data sets specified in Table 8. All maps used in this study and the corresponding deposited reference models were publicly available for download from the EMDB or kindly provided by the corresponding authors.

Table 8
Data sets used in this study.
https://doi.org/10.7554/eLife.43542.027
SystemResolution, ÅEMDBDeposited PDBMethodCitation
Aldolase2.687435VY3Rosetta/PhenixHerzik et al. (2017)
Tubulin4.1n/a*3JAS, 6DPVCOOT/RefmacZhang and Nogales (2015)
TRPV13.3 (2.5–7)57783J5PCOOTLiao et al. (2013)
TRPV1 (TM)<3.357783J9JRosettaBarad et al. (2015)
TRPV1 (mono)3.3 (2.5–7)5778n/aReMDFFWang et al. (2018)
NSF3.991026MDOCOOT/Phenix§White et al. (2018)
Nucleosome3.739476ESFCOOT/PhenixBilokapic et al. (2018)
CorA7.165523JCGCOOT/PhenixMatthies et al. (2016)
70S Ribosome3.441245LZDCOOT/Rosetta/PhenixFischer et al. (2016)
  1. *Map segment derived from an asymmetric, 14-protofilament, kinesin-decorated microtubule reconstruction (provided by courtesy of R. Zhang).

    TM region of TRPV1 has a much higher resolution than ARDs.

  2. Provided by courtesy of S. Wang.

    §Optimized Phenix protocol was used (Afonine et al., 2018; White et al., 2018) that differed from that used for the other systems (Adams et al., 2010).

The maps were trimmed using an orthorhombic box to reduce the system size for further simulation. The trimmed maps were then converted into the CCP4 format for GROMACS compatibility using the em2em software (http://www.imagescience.de/em2em.html). X-ray or cryo-EM structures of an aldolase (PDB ID: 6ALD; Choi et al., 1999), a GDP-bound tubulin after ∼3 μs of MD simulation (PDB ID: 4ZOL; Wang et al., 2016b; Igaev and Grubmüller, 2018), a TRPV1 channel (PDB ID: 3J5P; Liao et al., 2013), a NSF complex (PDB ID: 3J94; Zhao et al., 2015), a nucleosome (PDB ID: 6ESI; Bilokapic et al., 2018), a CorA channel in the symmetric state (PDB ID: 3JCF; Matthies et al., 2016), and a 70S ribosome in the CR state (PDB ID: 5LZC; Fischer et al., 2016) were used as starting models.

The aldolase and tubulin starting structures were prepared as described in Herzik et al. (2017) and Igaev and Grubmüller (2018), respectively. For the TRPV1 starting structure, the missing non-terminal loops (residues 503–507) were modeled in using MODELLER version 9.17 (Fiser et al., 2000). The regions with unassigned side chains (UNK residues 720–763) were substituted by those from PDB ID: 5IRZ (Gao et al., 2016). All missing non-terminal residues in the starting NSF structure were modelled in using MODELLER. Two of four ATP molecules (chains A and E) were turned into ADP, consistent with the reference (PDB ID: 6MDO). The SNAP-25A 17 N-terminal residues were copied from PDB ID: 6MDP (White et al., 2018) and rigid-body fitted into the density. In the nucleosome starting model (PDB ID: 6ESI), a part of DNA was missing due to unwinding. This part, as well as short terminal tails of the histone complex, were copied from PDB ID: 3LZ1 (Vasudevan et al., 2010) to match the chain sequences in the deposited reference model (PDB ID: 6ESF). For the CorA starting structure, the Mg2+ ions and the N-terminal tails (residues 1–18 are not reflected in the density) were removed. The preparation of the 70S ribosome starting model is described in a separate section below.

We emphasize that having a complete starting model is not a requirement for our method, and that models with missing regions can still be refined. The starting model completion as described above was done to match the structure of the deposited models and to facilitate a fair comparison. Parts of the models for which there was no clear experimental density (e.g. residues 38–46 in tubulin’s α-subunit or residues 111–198 of TRPV1) were not subjected to the density fitting potential and removed later on during structure analysis. It is however preferable to have complete models if plain MD simulations are to be run with the refined structures.

Simulation setup and refinement of non-ribosomal systems

The non-ribosomal simulation systems consisted of the atomic model pre-aligned with the full map in a triclinic box of TIP3P water molecules (CHARMM-modified) with 1.5–2 nm padding and 0.15 M KCl. All bond lengths were constrained with the LINCS algorithm and virtual sites were used for hydrogens, allowing a 4-fs integration time step. CHARMM36m force field (Huang et al., 2017) was used in the CorA refinement for consistency with MDFF, while CHARMM22* force field (Piana et al., 2011) was used in all other cases. The simulation parameters are described elsewhere (Igaev and Grubmüller, 2018). All files as well as a step-by-step guide necessary for setting up the aldolase refinement (Figure 3) are available at https://www.mpibpc.mpg.de/grubmueller/densityfitting.

The starting models were subjected to the above MD setup followed by steepest-descent energy minimization, a short equilibration in NVT ensemble at T = 100 K with position restraints on the solutes’ heavy atoms (spring constant kposres = 1000 kJ mol-1 nm-1, simulation time 1 ns), and a 50-ns equilibration in NPT ensemble at T = 300 K to relax the systems and to increase the deviation from the target state before refinement against one of the half-maps (training map) as sketched in Figure 2—figure supplement 1. Structures with the largest deviation from the deposited reference models were selected, and continuous refinement against the training map was applied in three different incarnations: (i) T = 100 K for 50 ns, (ii) T = 150 K for 30 ns, and (iii) T = 200 K for 20 ns. Here, higher temperature protocols facilitated thermodynamic sampling, allowing the system to reach good map-model agreement within shorter simulations times. However, we observed that lower temperature protocols were less prone to overfitting when applied to very noisy half-maps or in combination with unusually high force constants. If neither of these conditions was true, all protocols performed similarly well.

The starting values for σ and k were set to 0.6 nm and 0.5 × 105 kJ mol-1. After 3 ns of refinement with σstart and kstart, both parameters were linearly ramped to their target values while keeping the pressure at 1 atm. The target value σstop was determined by correlating the deposited reference model with the respective half- and full map at σ values ranging from 0.6 to 0.1 nm with a 0.01 nm increment, and the value (usually 0.15–0.25 nm) yielding the highest correlation was taken as σstop. Except in the cases of CorA and NSF, target values of kstop between 3 and 5 × 105 kJ mol-1 were optimal, as suggested by the half-map cross-validation. These values were found to not cause overfitting for the given set of test systems and yet to be sufficiently strong to guarantee good map-model agreement. We do not recommend using higher kstop values for systems of this size and resolution (10–40 × 103 atoms, high to medium resolution) as those could cause frequent numerical instabilities due to the density-based forces overriding those from the atomistic force field. For the CorA channel, we used σ = 0.6 nm and k = 1 × 105 kJ mol-1 throughout the refinement simulation against the full map because no high-resolution features were available and only large-scale concerted motions had to be accomplished. For the NSF system, a wide range of target force constants (2–8 × 105 kJ mol-1) was tested to identify the optimal density bias (described later in Materials and methods).

After cross-validation with the second half-map (discussed later in Materials and methods), a short 20 ns refinement run was performed with the full reconstruction in order to account for high-resolution features not present in both half-maps (Figure 2—figure supplement 1). It consisted of a 15 ns simulated annealing step and additional 5 ns to obtain the final refined set of structures, while σstop and kstop were kept constant. Two different annealing schemes were tested: rapid heating of the system to 300 K and 500 K within 1 ns and 3 ns, respectively, followed by a slow cooling down phase. No significant advantage of one scheme over the other was observed. Extending the cooling phase to 20 ns did not have any further improving effect either. Hence, we used the 300 K annealing scheme for all systems.

Simulation setup and refinement of 70S ribosome

The structure of the 70S ribosome in a distant conformational state (codon reading, PDB ID: 5LZC) was used as a starting model. Additional Mg2+ ions were placed according to their positions in the structure of the GTPase-activated state (Fischer et al., 2016). The structure was protonated, solvated, and ions were added as described previously (Arenz et al., 2016). The system, was then pre-equilibrated in three steps: first, steepest-descent energy minimization; second, equilibration of the solvent with position restraints on non-solvent heavy atoms (spring constant kposres = 1000 kJ mol-1 nm-1, simulation time 50 ns); and finally, linear decrease of the position restraints to zero during 20 ns. For the refinement, the waters and ions within a distance of 25 Å from any solute atom of the pre-equilibrated structure were kept, resulting in a water-ion shell around the ribosome. Additional solvent and ions were introduced to fill a triclinic box with ∼1.2 nm padding around the solute. The AMBERFF12sb force field was used (Lindorff-Larsen et al., 2010); further simulation parameters can be found elsewhere (Fischer et al., 2016).

The prepared ribosome system was first run at T = 300 K and p = 1 atm for 5 ns with position restraints on the ribosome’s non-hydrogen atoms to equilibrate the extra solvent added during the preparation phase. The restraints were then released, the temperature was set to 100 K, and a refinement run was performed for 50 ns using the training map resampled on a sparser grid (reduced to 247 × 247 × 247 grid points) to speedup calculations for the large-scale transitions (e.g. tRNA and mRNA or ribosomal platform motion). The parameters were linearly changed from σstart = 0.6 nm and kstart = 4.5 × 105 kJ mol-1 to σstop = 0.2 nm and kstop = 3.6 × 106 kJ mol-1, while keeping the temperature and pressure at 100 K and 1 atm, respectively. Such high values of the force constant were chosen given the particularly large system size and the different force field used (AMBERFF12sb vs. CHARMM22*/CHARMM36m) which we found to decrease the overall system compliance. We then performed a short refinement run on the original training map (400 × 400 × 400 grid points) for 5 ns to account for high-resolution features not present in the reduced training map. After cross-validation with the second half-map, the 70S ribosome was additionally refined against the full reconstruction (involving simulated annealing) as described in the previous section.

Cross-validation and stereochemical quality

We used a half-map-based validation approach similar to that proposed for refinement in REFMAC (Amunts et al., 2014; Brown et al., 2015) and the recent automated Rosetta protocol (Wang et al., 2016a). However, unlike in the Rosetta scheme, the cross-validation is used in our work to determine a range of k values that the model can tolerate without being overfitted, but not to score the refined ensemble and preselect models for further refinement steps.

In practice, overfitting was assessed by monitoring training map-model and validation map-model FSC, FSCtrain and FSCval, simultaneously every 5 ns of the half-map refinement. To this end, structure snapshots were extracted from the refinement trajectory and stripped of all waters, ions, hydrogens, and virtual sites. The periodic box was adapted to match that of the half-map, and the structures were finally converted into simulated densities sampled on the same grid using a custom GROMACS tool gmx map and the respective σstop. If there was a strong divergence between FSCtrain and FSCval at the last stage of the half-map refinement – which we observed only for unusually high kstopvalues or very noisy half-maps – a structure at a lower force constant from the same trajectory that did not feature hallmarks of overfitting was used for the full map refinement (see Figure 2—figure supplement 1). The force constant was not further increased during the full map refinement as there was no longer a validation map to control overfitting.

The average structure from the last 5 ns of the full map refinement was used as the final model to calculate FSCfull and for comparison with the deposited reference. All FSC calculations were done with the PDBe Fourier Shell Correlation Server (https://www.ebi.ac.uk/pdbe/emdb/validation/fsc/). The quality of the final average structure was assessed by EMRinger (Barad et al., 2015), MolProbity (Chen et al., 2010) and CaBLAM (Richardson et al., 2018) implemented in PHENIX version 1.14 (Afonine et al., 2018). X-ray scattering positions were consistently used to place hydrogen atoms with MolProbity (default option) for all models analyzed in this study. Using nuclear positions to place hydrogens led to a increase in the number of steric clashes by 15–20%. This number, however, never exceeded 0.4–0.5 for the models refined with our method or MDFF, as those had almost zero steric clashes by construction. Average FSC (FSCavg) was calculated by integrating the FSC curves over the resolution shells starting from 1/100 Å-1 to 1/r where r is the full map resolution according to the 0.143 criterion (Equation 2 in Brown et al., 2015).

We note that the purpose of the half-map validation discribed above (see also Figure 2—figure supplement 1) is to find the optimal range of force constants that do not cause overfitting and yet guarantee good map-model agreement (e.g., in terms of FSC). For many cryo-EM reconstructions, half-maps are not always deposited, which, of course, does not preclude the use of our method. In such cases, it is possible to perform a series of independent refinements against the deposited full map using a range of target force constants and to select a final model(s) with the best map-model agreement and geometry (see the NSF refinement example). It is, however, strongly advised not to omit the half-map cross-validation because: (a) it is a computationally much less expensive strategy – one half-map refinement vs. 5–7 independent full map refinements per system, and (b) it gives more confidence in the final model’s accuracy.

Comparative refinement with alternative methods

Comparative refinement with Phenix, Rosetta, Refmac, and MDFF was performed using the most recent versions of the respective software packages. More specifically, we used the latest official Phenix release 1.14–3260 (Afonine et al., 2018) in combination with the recently proposed optimized refinement protocol (White et al., 2018). The Phenix refinement involved three steps with 15 cycles each: (a) rigid-body placement of individual chains, (b) refinement without secondary structure or Ramachandran restraints (global minimization, local grid search done in each cycle, simulated annealing done only once, and ADP refinement), and (c) refinement with secondary structure and Ramachandran restraints enabled. The oldfield target function was used, and different values of plot_cutoff (between 0.1 and 0.5) and weight_scale (between 0 and 10) were tested to obtain better structures in terms of CCmask, Ramachandran outliers and CaBLAM scores. Non-crystallographic symmetry (NCS) restraints were enabled where possible.

To produce Rosetta models, we used the latest Rosetta build 2018.48.60516 in combination with the automated fitting protocol (Wang et al., 2016a) following the most recent density fitting tutorial (http://faculty.washington.edu/dimaio/files/rosetta_density_tutorial_aug18.pdf). For each case presented in the study, ∼100 Rosetta models were independently refined against one of the half-maps and validated using the second half-map. The structure possessing the best scores (Rosetta, MolProbity, EMRinger and |FSCtrainFSCval|) was further refined against the full map to produce the final Rosetta model. NCS restraints were enabled where possible. The density fitting weight was set to 35 except for the aldolase refinement, where it was set to 45 in accordance with the much higher map resolution.

Refmac5 from the latest version of the CCP4 suite (v. 7.0.067) optimized to work with cryo-EM densities (Brown et al., 2015; Kovalevskiy et al., 2018) was used to produce Refmac models. Jelly-body restraints on the current interatomic distances and local NCS restraints were enabled to improve the radius of convergence. To identify the optimal weight between geometry and experimental data, we performed multiple refinement runs with weights starting at 0.0001 and doubling at every next run until the value of 0.1 was reached. The optimal weight was determined using the half-map validation (Brown et al., 2015) and by assessing the model geometry with MolProbity, which was typically on the order of 0.001–0.01 for protein systems.

NAMD 2.13 (Phillips et al., 2005) was compiled with Intel Compiler 18.0, Intel MPI 2018 and without CUDA support. The recent MDFF refinement protocol for low-resolution cryo-EM maps was adopted from Gurel et al. (2017). Briefly, MDFF was performed in explicit solvent and in the presence of 150 mM KCl, using CHARMM36m parameters (Huang et al., 2017), and was run in three steps: 2000 steps of energy minimization with the fitting potential disabled, 10–50 ns of MD simulation at T= 200 K using a low force constant until the real-space correlation converged (several GSCALE values in the range between 0.05 and 0.5 were tested), followed by another round of energy minimization (2000 steps) with a much higher force constant (GSCALE = 10). Only backbone atoms were allowed to feel the fitting potential due to the lack of side chain density information, and all the other atoms were subjected only to molecular dynamics. We additionally tested two versions of the above protocol: with secondary structure, chirality and cis peptide bond restraints being enabled to minimize the risk of overfitting and without such restraints for consistency with our method. The optimal GSCALE value for the CorA system was ∼0.1. Higher values led to strong structure distortions even in the presence of restraints. The value GSCALE = 0.05 was too low to induce the asymmetric channel opening.

Per-residue outlier propensity

The outlier propensities shown in Figure 11 were calculated using the following reasoning. We seek to calculate the probability that a particular residue is an outlier given its amino acid type, P(out|aa). Invoking Bayes’ theorem, we obtain:

(1) P(out|aa)=P(aa|out)P(out)P(aa),

where P(aa) reflects the amino acid occurrence probability, P(out) represents the overall probability of outlier occurrence, and P(aa|out) is the probability that an outlier has a particular amino acid type. Denoting the total number of amino acids and outliers as Ntot=Naa and ntot=naa, respectively, where Naa and naa are their per-residue components, we find that the sought per-residue propensity simply reflects the number of outliers for this particular residue type scaled according to its occurrence in the amino acid sequence, that is:

(2) P(out|aa)=naaNaa.

Code availability

For the results presented in this work, we used GROMACS version 5.0.7 to which our density fitting module was added. A tar archive of this version as well as the latest 2018 release with density fitting can be downloaded at https://www.mpibpc.mpg.de/grubmueller/densityfitting. For the benchmarks shown in Table 9, we used the latest 2018 version that we also recommend for general usage as it is faster by a factor of 1.3–1.8 (depending on the hardware used) as compared to the 5.0.7 version. We also provide installation instructions and a tutorial to set up and run the aldolase refinement simulation (Figure 3) which can be downloaded using the above link.

Table 9
Performance of the refinement benchmarks for σ = 0.2 nm on two hardware configurations.
https://doi.org/10.7554/eLife.43542.028
SystemD (×106)N (×106)W (ns/d)S (ns/d)TS (days, short)TS (days, long)
Aldolase1.30.1133.439.91.01.8
Tubulin0.20.0987.0110.90.40.6
TRPV11.90.3716.721.52.03.3
Nucleosome*0.60.1720.729.91.32.3
NSF1.30.3618.524.91.62.8
CorA1.20.3125.429.71.32.4
70S Ribosome64.03.101.42.119.1n/a
  1. D = number of cryo-EM density grid points, N = number of atoms including water and ions, W = 6 core workstation with one Intel E5-1650v4 @ 3.6 GHz CPU and one NVIDIA GTX 980 GPU, S = 24 core server node with two Intel Gold 6146 @ 3.2 GHz CPUs and two GTX 1080Ti GPUs. TS = total run time for the short (40 ns) and long (70 ns) protocols in days using the S hardware configuration (see Figure 2—figure supplement 1). The run times do not include setting up the simulated systems or batch queuing times.

    *2-fs time step was used.

Benchmarks

GROMACS 2018 was compiled with GCC 6.4, AVX2-256 SIMD instructions, CUDA 9.1, and OpenMP support using its built-in thread-MPI library. Calculations of the Coulomb and Van der Waals potentials and forces were offloaded to the GPU. Irrespective of the number of atoms in the simulated system, refinement performance was mainly dictated by the number of grid points in the density map, which explains the order of magnitude difference in timings between the tubulin (smallest) and the ribosome (largest) system. The highest performance was recorded on a single 24-core server node equipped with two consumer-class GPUs. In the non-ribosome cases, the complete refinement (half-map, full map, simulated annealing) took 0.4–2 days and 0.6–3.3 days on the faster hardware for the shortest and the longest protocols, respectively (Figure 2—figure supplement 1), respectively. The ribosome refinement took approximately 19 days using the shortest protocol. Currently, the density code used to calculate the correlation coefficient between the experimental and the simulated map poses a bottleneck. However, our preliminary code optimization for multiple GPU node usage showed almost a three fold performance improvement, bringing the refinement time for the ribosome system down to ∼6 days when running on 8 GPU nodes. Moreover, we expect an additional gain in performance by a factor of 2 due to a less computationally expensive representation of the simulated map, yielding ∼3.5 days for the ribosome refinement using the same number of GPU nodes.

Structure availability

All refined structures presented in this work are provided as supplementary material.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
    Coot: model-building tools for molecular graphics
    1. P Emsley
    2. K Cowtan
    (2004)
    Acta Crystallographica. Section D, Biological Crystallography, 60, 10.1107/S0907444904019158, 15572765.
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
    Exact filters for general geometry three dimensional reconstruction
    1. G Harauz
    2. M van Heel
    (1986)
    Optik 78:146–156.
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90

Decision letter

  1. Axel T Brunger
    Reviewing Editor; Stanford University, United States
  2. John Kuriyan
    Senior Editor; University of California, Berkeley, United States
  3. Axel T Brunger
    Reviewer; Stanford University, United States
  4. James S Fraser
    Reviewer; University of California, San Francisco, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for submitting your work entitled "Automated correlation-based structure refinement for high-resolution cryo-EM maps of large biomolecular complexes" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: James S Fraser (Reviewer #3).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

Grubmuller et al., use molecular dynamics refinement against a real-space map-correlation coefficient in combination with a chemically-accurate force field in the presence of explicit solvent. In order to better overcome local minima, the resolution of the model map was gradually increased starting at very low resolution to the maximum available resolution of the EM data. Moreover, high-temperature simulated annealing is used. Although each of these approaches individually are not new, this work describes a new implementation that combines all these approaches. However, there are many shortcomings that preclude publication of the present work in eLife as outlined in detail below. Briefly, the main concerns are the lack of comparison to other existing methods, unsubstantiated claims of model accuracy, the lack of other test cases (both at very high and medium-low resolution), and the potential of overfitting.

Reviewer #1:

Grubmuller et al., use molecular dynamics refinement against a real-space map-correlation coefficient in combination with a chemically-accurate force field in the presence of explicit solvent. In order to better overcome local minima, the resolution of the model map was gradually increased starting at very low resolution to the maximum available resolution of the EM data. Moreover, high-temperature simulated annealing is used. Although each of these approaches individually are not new, this work describes a new implementation that combines all these approaches. The method should be a useful new addition to the growing arsenal of tools for the refinement of three-dimensional models against maps obtained by single particle cryoEM.

Please comment on optimization of the resolution ramp, relative weight (force constant), and simulated annealing temperature vs. simulation time. The Materials and methods section suggests that somewhat different refinement protocols were used for each of the four example systems. What rationale was used to design these different protocols? More generally, please provide a guideline for selecting the various parameters.

There is a lack of a comparison to methods that are implemented in other programs, such as Phenix or Refmac. In particular, Phenix does have a simulated annealing option, although it does not use a full force field.

Afonine et al., 2018, showed that judicious use of restraints improves the quality of real-space refinement against medium/low cryo-EM density maps. The presence of such additional restraints may be beneficial during the low-resolution stages of the refinement. Please comment.

As an additional criterion for the performance of the refinements, please calculate the CaBLAM validation scores for the deposited structures and the re-refined structures. In particular, the CaBLAM scores (Richardson, J. S., et al. Acta Crystallographica Section D74, 132-142. (2018), doi:10.1107/S2059798317009834; implemented in Phenix) asses the geometry of peptide bonds and the geometry of Ca atoms.

Reviewer #2:

This well-written manuscript describes methods for the refinement of atomic models against cryo-EM reconstructions. The basis of the method is the inclusion of a biasing term in a molecular dynamics potential that increases the overlap of the model with the experimental map. This approach is not new, and has been used in programs such as MDFF. The authors do describe a new approach to gradually increasing the resolution of the data in refinement, and the application of simulated annealing protocols. However, the limited novelty of the work, and other considerations significantly reduce my enthusiasm. Some important points for consideration:

In the Abstract the authors make several claims for the work:

"The method utilizes a chemically accurate force field and exhaustive thermodynamic sampling while efficiently improving the cross-correlation between the modeled structure and the cryo-EM map. For several test systems of different size with resolutions 2.5-3.4 Å, it generates stereochemically accurate models from cryo-EM data with a low risk of overfitting. The presented automated framework allows efficient refinement of large molecular complexes such as the ribosome without manual intervention or any additional ad hoc restraints, providing a good balance between stereochemical accuracy, refinement time, and hardware requirements."

There are several problems:

- While it is true that the models are improved in some ways after the refinement protocol, in general the rotamer conformations are significantly poorer (outside the realm of what is acceptable for a protein structure), and the Ramachandran distributions also poorer. Therefore the claim of "stereochemically accurate models" is overreaching.

- "Efficient refinement" clearly means different things to different people. For the systems studied the methods described took between 2 and 25 days running on a multiple core system with 2 GPUs. For many researchers this would be considered very long run times, especially when there are other methods that produce comparable results in much shorter times.

- A minor quibble, but the phrase "exhaustive thermodynamic sampling", suggests just that. However, the protocols described are unlikely to be exhaustive. They employ simulated annealing methods which do expand the conformational space that is sampled, but not exhaustively.

There are other issues that need to be considered:

- The authors apply their method to 4 test cases, including a large ribosome structure. However, this is a limited sampling of models, and in particular doesn't include models with very poor starting geometries. All 4 test cases are of high quality to begin with – as judged by the EMRinger score, Molprobity score and other presented criteria. The manuscript would be much more persuasive if it presented results for a larger number of structures – it should be noted that other researchers have applied automated refinement methods to all suitable cryo-EM structures available from the Protein Data Bank. In particular it would be helpful to see results for models with significant initial structural errors.

- The work, as presented, focuses on higher resolution cryo-EM data (3Å or higher is stated in the Abstract). There is definitely a need for accurate refinement methods at these resolutions. However, the quality of starting models at these resolutions is typically very high, and often don't present significant challenges for refinement algorithms. In addition, it is worth noting that, as of a few weeks ago, there are 107 structures from 3Å or better cryo-EM data, 808 structures between 3Å and 4Å resolution, and 515 structures between 4Å and 5Å resolution. The 3.5Å to 5Å resolution range often presents some serious challenges for both model building and subsequent refinement. The application of the methods to 3.1Å and 3.5Å test cases in the manuscript is applauded. However, the manuscript would have been more exciting if it presented a more thorough analysis of the application of the method at resolutions worse that 3.5Å.

- The authors could consider test cases where there is some higher resolution "gold standard" available to validate the results of refinement with lower resolution data. This is, of course, non-trivial, but very helpful in confirming the accuracy and radius of convergence of the refinement method.

- As noted above, much of what is described has been introduced in earlier works, so the novelty is limited. The approach to a gradual increase in data resolution is interesting – blurring of the model density only. However, the authors don't demonstrate that this is an improvement over previous methods.

- Simulated annealing has been already applied in other programs for model to map fitting (DEN, MDFF). This is also used, as an option, in the phenix.real_space_refine program.

- Other approaches to model refinement have been implemented, with the goal of stereochemical quality and computational efficiency. In particular the phenix.real_space_refine program is not mentioned in the manuscript. This program has been used for the successful refinement of many cryo-EM models.

- In the description of the proteasome 3.1A refinement it is stated that "The improved quality statistics for our model (Figure 4A, top-right) now indicate that the conformations of these loops were refined more accurately". How do the authors know this? The model may have improved overall, while the confirmation of these particular loops may be worse after refinement.

Reviewer #3:

The major goal of this paper is to use molecule dynamics simulations to improve the fit and geometry of structures refined against high resolution (better than 4A) maps from single particle cryoEM. The principle contribution is to combine advances (classic work on simulated annealing, MDFF, and recent work from the Tama group) with improved MD and map calculation codes. The manuscript is beautifully formatted and the figures are very clear. The software is benchmarked against a very limited set of targets, focusing on experimental data and previously known crystal structures, which mimics its potential application in the real world. The ribosome dataset seems to be the only one somewhat challenging case, but even that consists largely of a rigid body rotation with few internal degrees of freedom changing during the low resolution part of the search. Given the long run time of the method, and the lack of comparisons against existing software (REFMAC, MDFF, phenix.real_space_refine), it is unclear how widely used this package will become. The major weaknesses of the manuscript are lack of benchmarking against these methods and whether the radius of convergence will hold for poorer starting models (e.g. structures based on homology modeling, etc.).

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for submitting your article "Automated cryo-EM structure refinement using correlation-driven molecular dynamics" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Axel Brunger as the Reviewing Editor, and the evaluation has been overseen by John Kuriyan as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: James S Fraser (Reviewer #2).

The Reviewing Editor and the reviewers believe that this paper would be of considerable general interest, provided that the validity of the method can be demonstrated in a convincing way. They continue to feel, however, that the demonstration that the method performs better than alternative methods is still not convincing. After discussion, we are prepared to consider a revised manuscript that addresses the concerns raised by reviewers. The eventual acceptance of this paper hinges on a satisfactory resolution of these issues. We recognize that the revisions would require a substantial amount of new work. If this is not possible, you might choose to submit this work elsewhere in its present form.

Summary:

In this revised manuscript, the authors have addressed many, but not all, of the concerns that were raised. The use of a "complete" empirical force field (rather than the simplified force fields commonly used in macromolecular refinement packages) seems to contribute to the differences when compared with other methods, along with the use of simulated annealing, and ramping up the resolution. While each of these individual approaches is not new, the authors suggest that the combination of them produces better quality EM structures. If this claim can be substantiated, it would represent a welcome new tool in the arsenal of refinement tools for EM structures. However, we are not convinced that the current manuscript definitely demonstrates such superiority, and major revisions are required for further consideration of this manuscript as outlined in the following.

Critical points to address:

1) Overall, while the manuscript has been improved by inclusion of multiple test-cases of different quality maps and comparisons to alternative (standard) methods. However, distinct alternative approaches were used for each test case and not applied across all examples. For example, MDFF was initially developed primarily for lower (~5-10 A) resolution cases, but is not used for the CorA example here (perhaps the strongest new result in the paper). Please consistently use a common set of alternative (standard) methods for all cases, and, additionally, use methods such as MDFF for the low resolution cases.

2) NSF test case, subsection “N-ethylmaleimide sensitive factor (NSF): comparison with Phenix”: for the re-refinement of the recent NSF/SNAP/SNARE EM single particle structure by White et al., 2018 (Figure 7 in the paper), the most significant advantage appears to be the substantially lower clash score compared to phenix refinement. The authors seem to be able to do better (especially for the clash score), without extensive grid searching (at least as far as we can tell). However, the clashscore itself does not indicate if their refined models are closer to the true structure. We had previously asked for the CABLAM analysis that should be included. Nevertheless, ultimately, the question arises if the re-refined structure is closer to the true structure of the 20S complex. Since there is no higher resolution structure of the entire complex available, the authors could test the quality of their refinement with the D2 domain of NSF by comparing it with the available high-resolution crystal structures.

Other important points to address:

1) NSF test case: "Finally we again point out the strong anti-correlation between the number of steric clashes and rotamer/Ramachandran outliers in the deposited reference and our models (discussed later in the Results section).” Which anti-correlation does this statement refer to?

2) Figure 7: the deposited 6MDO structure (referred to as the "reference" structure) actually has rather reasonable geometry. The improvement by the author's method is primarily a reduction of the clash score (Figure 7B). Please comment and speculate about the reasons (e.g., use of an accurate force field might clean up poorly determined regions?). Note that the Ramachandran statistics also slightly improves, contrary to the suggested anti-correlation (see previous point).

3) Figure 7A: Please calculate the heat map but compared to 6MDO instead of 3J94 as well since 6MDO and 3J94 are not equivalent models (e.g., they just pasted the SNAP-25 N terminal residues into 3J94, in which those residues do not exist in 3J94).

4) Figure 7A: it appears that their method mostly fixed the geometry of the solvent exposed parts of the model, so likely the ATP binding pocket is pretty much the same? These differences could possibly also arise from changes in rigid body positions of the domains in the hexamer. Please comment.

5) Figure 7: clarify how the various parameters for their refinement were chosen. Was there a lot of trial and error involved?

6) "Furthermore, the NSF refinement demonstrates the usefulness of the half-map-based cross-validation procedure as, for a refinement against a full map only, much more refinement runs have to done to identify the optimal map bias". Where is it is shown that more refinement runs have to be performed without access to half maps?

7) The question of whether a single model or an ensemble is more appropriate is worth discussing. For example, the three TRPV1 structures produced by independent refinement vary across the model (Figure 5—figure supplement 2), and may suggest that an ensemble refinement may be suitable. Moreover, the discussion of "ideal geometry and over/underfitting" in the last paragraph of the subsection “Force-field dependence of the model geometry” completely ignores the effect of conformational heterogeneity that may be related to the non-ideal nature of refining a single conformation against an EM map. Ensemble averaging misinterpreted as a single structure is a possible reason. See Rice, Shamoo and Brünger, 1998, and Burnley et al., 2012 for crystallographic multi-start and ensemble refinements, respectively, and Herzik, Fraser and Lander et al., 2018 and Bonomi, Pellarin and Vendruscolo, et al.2018 for EM structure ensemble refinement. On the other hand, low determinacy of 3D classification and refinement of the particles may be another reason for the non-ideal nature of the refinement. These two issues are likely convoluted and represent a major challenge for the interpretation of single particle EM data. The authors are not expected to perform ensemble refinements, but rather, a discussion of this point is requested.

8) Note that the use of an empirical force field for crystallographic refinement was first suggested by Brunger et al., 1987, and re-investigated in more recent papers, e.g., Fenn et al., 2011. A discussion of this point would be appropriate.

9) How were the MolProbity statistics calculated? There are some subtleties that might need to be taken into consideration. By default Molprobity uses bond lengths for hydrogen atoms that are centered on the X-ray scattering position (electron cloud), as opposed to the nuclear position of the atom – i.e. the former is shorter than the latter (see Protein Sci. 2018, 27:293-315). This can have some consequences depending on the workflow. If MolProbity is given a model containing hydrogen atoms whose positions have been defined using nuclear positions the clash score may be reported as high. Alternatively, if a model is optimized using nuclear position hydrogen atoms, and these are removed, MolProbity automatically adds hydrogen atoms at the electron cloud position, resulting in systematically low clash scores. The authors should consider their protocol in light of this, and make it clear what was done in the Materials and methods section.

10) Is a complete model a requirement for the method? In the Materials and methods section it appears that any missing residues in the test structures were "filled in" using homology modeling or other approaches. Is this mandatory? If so it needs to be clearly communicated in the paper. The creation of parts of models for which there is no clear experimental data to support that model are questionable.

11) The authors make an interesting argument about clashes and side- and mainchain outliers. However, arguing that a very low clash score and a significant number of geometry outliers is reasonable stretches credulity. It is well known from high resolution structures what the expectations are for outliers. It is also observed that at lower resolution it becomes increasingly hard to determine correct rotamer and mainchain conformations based on the experimental data. The authors might be better off to emphasize that they are able to create models with a quality consistent with many other methods at this resolution, often better than the starting models, and accept that there is room for improvement.

12) The authors should be clear what they mean when they use the phrase "overfitting" in this context. There is often confusion about this term's meaning – e.g. fitting of noise, model correctness, determination of the optimal protocol.

13) It would seem appropriate to mention the work of Sanbonmatsu and colleagues: Kirmizialtin et al., 2015.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Automated cryo-EM structure refinement using correlation-driven molecular dynamics" for further consideration at eLife. Your revised article has been favorably evaluated by John Kuriyan (Senior Editor), a Reviewing Editor, and three reviewers.

We thank you for addressing many of our concerns, in particular with the new discussion sections, the CorA analysis, and the comparisons for all refinements. The application to NSF shows that the method can obtain a model that is closer to the true structure as assessed by the agreement with the high-resolution crystal structures of the NSF D2 domain. In particular, the improved placement of the nucleotide is impressive. However, there are some remaining minor issues that need to be addressed before acceptance, as outlined below:

1. Please provide the CABLAM and other statistics (provided in the supplemental files as. tex files) as proper tables.

2. You are correct in their assertion that the CDMD procedure has a larger radius of convergence than the Refmac, Phenix and Rosetta procedures. However, while it is very much appreciated that making use of other programs can be challenging, it is probable that some of the protocols used are not optimal for the particular cases being studied. For example, in the case of phenix.real_space_refine the default protocols are appropriate for refinement of models close to correct, i.e. not significantly displaced from the cryo-EM volume (globally or locally). Moreover, the protocol outlined in White et al., 2018 helps extend the radius of convergence. However, for models that have more significant starting errors, morphing is an important option for increasing the radius of convergence. It can also be necessary to run more refinement cycles to increase the convergence radius. Please comment in the Discussion.

3. The benchmark information in Table 2 is likely to be confusing for many readers (the timings provided in the Benchmark text are very useful). Would it be possible to provide statistics in the form of total run times for refinements in a table form?

https://doi.org/10.7554/eLife.43542.038

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Many thanks for handling our manuscript and for the constructive feedback. It has helped us to completely rewrite and re-structure our manuscript. As you will see from our detailed replies to the reviewers’ comments below, we have now addressed all major issues, with a particular emphasis on the two main ones: (1) As requested, we now provide a thorough assessment of our method using a much larger and more diverse set of test cases, covering the most typical refinement challenges posed by modern cryo-EM: poor starting models, low- and medium-resolution maps, highly heterogeneous local map resolutions, or combinations thereof. (2) Concerning the lack of comparison to other methods in the original manuscript, we now carefully analyze and discuss in detail the differences between our results and previously published structures refined with other widely used methods, in terms of both model accuracy and potential of overfitting. For some test cases, we now present a direct comparison to the most recent versions of the well-established methods (Rosetta, ReMDFF and Phenix). Overall, we are convinced that the manuscript now meets the high standards of eLife and provides a substantial contribution to the field.

Reviewer #1:

[…] Please comment on optimization of the resolution ramp, relative weight (force constant), and simulated annealing temperature vs. simulation time. The Materials and methods section suggests that somewhat different refinement protocols were used for each of the four example systems. What rationale was used to design these different protocols? More generally, please provide a guideline for selecting the various parameters.

Indeed, the rationale behind the protocols and the choice of the refinement parameters (resolution and force constant ramps, simulated annealing temperature etc.) was not sufficiently explained in the original manuscript. We have now added this information as well as a schematic representation of the refinement and validation protocols in a supplementary figure (Figure 2—figure supplement 1). The Materials and methods section now also contains guidelines for selecting the various parameters and protocol durations. The tutorial containing instructions and files necessary to set up the aldolase refinement (Figure 3) additionally explains technical aspects.

There is a lack of a comparison to methods that are implemented in other programs, such as Phenix or Refmac. In particular, Phenix does have a simulated annealing option, although it does not use a full force field.

The manuscript has been completely revised to include (a) more test systems to challenge the method in different, orthogonal ways (Figures 3-10), (b) a more detailed analysis and discussion of the obtained models in the context of the previously published structures, and (c) direct comparisons to other widely used refinement methods (see Figure 5 for a COOT comparison, Figure 6A, B for a Rosetta comparison, Figure 6C, D for a ReMDFF comparison, and Figure 7 for a Phenix comparison). More specific changes are described further below.

Afonine et al., 2018, showed that judicious use of restraints improves the quality of real-space refinement against medium/low cryo-EM density maps. The presence of such additional restraints may be beneficial during the low-resolution stages of the refinement. Please comment.

While it is true that rotamer, Ramachandran and secondary structure restraints used in some refinement algorithms have been shown to improve the quality of fitted structures in some cases, we prefer not to mix force field parameters and knowledge-based restraints. We think that modern atomistic force fields should already contain all necessary information to maintain accurate model geometries and, therefore, that the addition of knowledge-based restraints may create conflicting optimization goals. Therefore, our approach, by construction, does not contain such restraints. As demonstrated by the structure quality assessment for all of the refined systems, our approach also does not require those. The use of the “soft” (global) correlation biasing potential further reduces the risk of structural deteriorations.

Of course, our implementation technically allows one to introduce such restraints, which are already part of the GROMACS functionality.

Reviewer #2:

[…] In the Abstract the authors make several claims for the work:

"The method utilizes a chemically accurate force field and exhaustive thermodynamic sampling while efficiently improving the cross-correlation between the modeled structure and the cryo-EM map. For several test systems of different size with resolutions 2.5-3.4 Å, it generates stereochemically accurate models from cryo-EM data with a low risk of overfitting. The presented automated framework allows efficient refinement of large molecular complexes such as the ribosome without manual intervention or any additional ad hoc restraints, providing a good balance between stereochemical accuracy, refinement time, and hardware requirements."

There are several problems:

- While it is true that the models are improved in some ways after the refinement protocol, in general the rotamer conformations are significantly poorer (outside the realm of what is acceptable for a protein structure), and the Ramachandran distributions also poorer. Therefore the claim of "stereochemically accurate models" is overreaching.

It is true that, for some of our models, the Ramachandran distributions were poorer after refinement as compared to the deposited reference structures. It would be nevertheless highly surprising if it were not the case. While the deposited structures (except for the ReMDFF model) were built with structural restraints enforced and, therefore, by construction, did not violate the “ideal” rotamer and Ramachandran distributions, this came at the cost of significant steric clashes (in many cases tens per thousand atoms or more). In contrast, the force field in our approach avoided steric clashes at the cost of, in our view, tolerable rotamer and Ramachandran outliers. Furthermore, the above situation is far from being the general rule – in fact, we saw improvements in rotamer geometries for the TRPV1 (Figure 5, Figure 6C, D) and nucleosome (Figure 8) systems which came together with the drastic reduction in steric clashes, and the rotamer quality of the tubulin system (Figure 4) was comparable with that of the higher-resolution control structures. The TRPV1 (Figure 6C, D) and ribosome (Figure 9) cases, in turn, demonstrated that increased Ramachandran outliers are not a strict trend and can be reduced by our method.

We have now added a new section at the end of the Results, where we specifically summarize – using now a much larger set of test cases – the structure quality statistics aggregated from all the test cases and thoroughly discuss the “acceptability” of rotamer and Ramachandran outliers in the general context of structure refinement.

- "Efficient refinement" clearly means different things to different people. For the systems studied the methods described took between 2 and 25 days running on a multiple core system with 2 GPUs. For many researchers this would be considered very long run times, especially when there are other methods that produce comparable results in much shorter times.

- Other approaches to model refinement have been implemented, with the goal of stereochemical quality and computational efficiency. In particular the phenix.real_space_refine program is not mentioned in the manuscript. This program has been used for the successful refinement of many cryo-EM models.

The reviewer notes that the computational efficiency of our refinement method is below that of several other methods. Although these methods do achieve shorter run times, we respectfully disagree with the statement that the obtained results are comparable. Whereas these methods perform similarly well for maps at near-atomic resolution (Figure 3 and Figure 6A, B), the additional hard test cases we now show that our method performs better, in particular at low and spatially heterogeneous local resolutions. Concerning the run times, for small to medium-sized systems (Figure 3-8 and Figure 10), these range from 0.5 to 3 days, which we would consider acceptable given the higher quality of the resulting models and the method’s universality. For large systems (Figure 9), the refinement performance is mainly dictated by the large number of grid points in the density map, which is indeed the main bottleneck in the current implementation of the density calculation code. Preliminary code optimization for multiple GPU node usage showed almost a 3-fold performance improvement, thus bringing the refinement time for the ribosome system down to 6-7 days when running on 8 GPU-nodes. Moreover, we expect an additional gain in performance by a factor of 2 due to a less computationally expensive representation of the simulated map, eventually yielding ~3.5 days for the ribosome refinement using the same number of GPU nodes. For a fully automated method, we think this is acceptable.

- A minor quibble, but the phrase "exhaustive thermodynamic sampling", suggests just that. However, the protocols described are unlikely to be exhaustive. They employ simulated annealing methods which do expand the conformational space that is sampled, but not exhaustively.

We agree with the reviewer and have removed this phrasing in the revised manuscript.

There are other issues that need to be considered:

- The authors apply their method to 4 test cases, including a large ribosome structure. However, this is a limited sampling of models, and in particular doesn't include models with very poor starting geometries. All 4 test cases are of high quality to begin with – as judged by the EMRinger score, Molprobity score and other presented criteria. The manuscript would be much more persuasive if it presented results for a larger number of structures – it should be noted that other researchers have applied automated refinement methods to all suitable cryo-EM structures available from the Protein Data Bank. In particular it would be helpful to see results for models with significant initial structural errors.

We agree with the reviewer that the previous set of test cases was limited and did not entirely reflect the full spectrum of refinement challenges posed by modern cryo-EM. In the revised manuscript, following the reviewer’s advice, we have substantially extended the set of refinement cases to include more challenging systems. We particularly added cases for which the starting structures had significant initial errors.

- The work, as presented, focuses on higher resolution cryo-EM data (3Å or higher is stated in the Abstract). There is definitely a need for accurate refinement methods at these resolutions. However, the quality of starting models at these resolutions is typically very high, and often don't present significant challenges for refinement algorithms. In addition, it is worth noting that, as of a few weeks ago, there are 107 structures from 3Å or better cryo-EM data, 808 structures between 3Å and 4Å resolution, and 515 structures between 4Å and 5Å resolution. The 3.5Å to 5Å resolution range often presents some serious challenges for both model building and subsequent refinement. The application of the methods to 3.1Å and 3.5Å test cases in the manuscript is applauded. However, the manuscript would have been more exciting if it presented a more thorough analysis of the application of the method at resolutions worse that 3.5Å.

Again, we agree that considering only high-resolution cases (~3.5 Å or higher) was insufficient to assess the method’s convergence and accuracy. In the new, extended set of test cases, a much wider range of resolutions (2.6-7.1 Å) is covered.

- The authors could consider test cases where there is some higher resolution "gold standard" available to validate the results of refinement with lower resolution data. This is, of course, non-trivial, but very helpful in confirming the accuracy and radius of convergence of the refinement method.

We thank the reviewer for this suggestion and have now included such a test case (Figure 4).

- As noted above, much of what is described has been introduced in earlier works, so the novelty is limited. The approach to a gradual increase in data resolution is interesting – blurring of the model density only. However, the authors don't demonstrate that this is an improvement over previous methods.

- Simulated annealing has been already applied in other programs for model to map fitting (DEN, MDFF). This is also used, as an option, in the phenix.real_space_refine program.

- Other approaches to model refinement have been implemented, with the goal of stereochemical quality and computational efficiency. In particular the phenix.real_space_refine program is not mentioned in the manuscript. This program has been used for the successful refinement of many cryoEM models.

Although many individual components of the proposed refinement method are, indeed, not new (e.g., correlation biasing potential or simulated annealing), the actual novelty is the integration of these components into an automated protocol involving a chemically accurate force field, the "soft" (nonlocal) correlation biasing potential coupled to adaptive resolution, and the half-map-based validation scheme, and we demonstrate, using the extended set of refinement cases, that this is indeed an improvement over previous methods. Unlike other methods, this integration allows fully automated refinement into density maps with heterogeneous local resolutions, as typical of modern cryo-EM maps. We now also explicitly compare our results to the deposited reference models and discuss in the context of other refinement methods used, including the most recent version of phenix.real_space_refine (Afonine et al., 2018) with optimized target functions (White et al., 2018).

- In the description of the proteasome 3.1A refinement it is stated that "The improved quality statistics for our model (Figure 4A, top-right) now indicate that the conformations of these loops were refined more accurately". How do the authors know this? The model may have improved overall, while the confirmation of these particular loops may be worse after refinement.

We have now removed the proteasome refinement case from the revised manuscript, as it was very similar to the aldolase refinement (Figure 3) both in terms of map resolution and starting structure quality.

Reviewer #3:

The major goal of this paper is to use molecule dynamics simulations to improve the fit and geometry of structures refined against high resolution (better than 4A) maps from single particle cryoEM. The principle contribution is to combine advances (classic work on simulated annealing, MDFF, and recent work from the Tama group) with improved MD and map calculation codes. The manuscript is beautifully formatted and the figures are very clear. The software is benchmarked against a very limited set of targets, focusing on experimental data and previously known crystal structures, which mimics its potential application in the real world. The ribosome dataset seems to be the only one somewhat challenging case, but even that consists largely of a rigid body rotation with few internal degrees of freedom changing during the low resolution part of the search. Given the long run time of the method, and the lack of comparisons against existing software (REFMAC, MDFF, phenix.real_space_refine), it is unclear how widely used this package will become. The major weaknesses of the manuscript are lack of benchmarking against these methods and whether the radius of convergence will hold for poorer starting models (e.g. structures based on homology modeling, etc.).

We believe that all of the concerns raised by reviewer #3 have now been addressed, as described above. The fact that there was such a big overlap between these concerns and those of the other reviewers brought us to pay particular attention to (a) substantially expanding the set of challenging test cases and (b) comparing against existing methods (Phenix, Rosetta, MDFF, Refmac, COOT).

[Editors' note: the author responses to the re-review follow.]

Critical points to address:

1) Overall, while the manuscript has been improved by inclusion of multiple test-cases of different quality maps and comparisons to alternative (standard) methods. However, distinct alternative approaches were used for each test case and not applied across all examples. For example, MDFF was initially developed primarily for lower (~5-10 A) resolution cases, but is not used for the CorA example here (perhaps the strongest new result in the paper). Please consistently use a common set of alternative (standard) methods for all cases, and, additionally, use methods such as MDFF for the low resolution cases.

We are glad the reviewers agree that the inclusion of more test cases has improved the original version of our manuscript. Further encouraged by the reviewers, we have added a full, “all-structures-by-all-methods” comparison using the same far-away starting structures across all test cases. Considering that many of the tested methods have not been originally optimized to work with distant starting models or DNA/RNA, we think that, taken together, these comparisons strengthen our main conclusions. These new comparisons indeed demonstrate that the correlation-driven MD refinement has a much larger radius of convergence than current non-MD methods. As requested, we also provide a comparative refinement of the CorA system (Figure 10) using MDFF, which has helped to gain a clearer picture.

The manuscript has been modified as follows:

- Phenix, Rosetta and Refmac have been chosen as alternative methods for comparison for the test cases in Figures 3-9. The radius-of-convergence analysis is shown in the supplementary figures corresponding to Figures 3-9. The results of the comparative refinements are additionally discussed as separate paragraphs in each of the Results subsections. A separate subsection in Materials and methods is dedicated to the description of the Phenix/Rosetta/Refmac refinement protocols.

- MDFF has been applied to the CorA system (Figure 10), and the corresponding refinement protocol is described in Materials and methods. For consistency with MDFF, we have re-refined the CorA starting structure with our method but using a different force field (CHARMM36m), as CHARMM22* used in Figures 3-9 has no version compatible with NAMD. This has also allowed a fairer comparison of our method with MDFF.

2) NSF test case, subsection “N-ethylmaleimide sensitive factor (NSF): comparison with Phenix”: for the re-refinement of the recent NSF/SNAP/SNARE EM single particle structure by White et al., 2018 (Figure 7 in the paper), the most significant advantage appears to be the substantially lower clash score compared to phenix refinement. The authors seem to be able to do better (especially for the clash score), without extensive grid searching (at least as far as we can tell). However, the clashscore itself does not indicate if their refined models are closer to the true structure. We had previously asked for the CABLAM analysis that should be included. Nevertheless, ultimately, the question arises if the re-refined structure is closer to the true structure of the 20S complex. Since there is no higher resolution structure of the entire complex available, the authors could test the quality of their refinement with the D2 domain of NSF by comparing it with the available high-resolution crystal structures.

We agree with the reviewers that an improvement (however pronounced it is) in any of the model quality metrics relatively to the published reference model does not per se indicate that the resulting model is closer to the “true structure”, and that a higher-quality control structure would be required for validation (ideally derived from a higher-resolution data set, e.g., X-ray). We therefore welcomed the suggestion by the reviewers and have validated our refined models in those test cases where higher resolution structures of either the entire refined system or parts of it were available (including NSF). The only exception was TRPV1 for which we could not find a high-resolution crystal structure of the channel or close homolog structures. The results of the validations are shown in panels D of Figures 39 and are now discussed in separate paragraphs in each of the Results subsections. Also following the request to calculate CaBLAM scores, all quality metrics including CaBLAM are now summarized in separate supplementary tables to each main text figure.

Regarding the grid searching mentioned by the reviewers, indeed, we did not perform any extensive parameter search as in the original study by White et al. The major challenge in this particular refinement case was the absence of the half-maps which could have drastically simplified the choice of the optimal force constant. As the half-maps were not available, we did a scan of the force constant space by performing seven independent refinements (instead of just one like in, e.g., Figure 3) with different target force constants (Figure 7B). The optimal force constant for the NSF system turned out to be close to the initial guess – 4-5 × 105 kJ / mol – which was comparable with the one we used to refine the TRPV1 system. Other than that, there was no trial and error involved. This has now been explained more clearly in the main text.

Other important points to address:

1) NSF test case: "Finally we again point out the strong anti-correlation between the number of steric clashes and rotamer/Ramachandran outliers in the deposited reference and our models (discussed later in the Results section).” Which anti-correlation does this statement refer to?

This statement refers to the reduction of steric clashes by our method that often comes at the price of increased rotamer and/or Ramachandran outliers, if one compares the refined models with the deposited ones. Examples are, e.g., Figure 3b (bottom, gray vs. green), Figure 4B (bottom, gray and dark gray vs. cyan), Figure 6b (bottom, gray vs. pink, rotamers), Figure 9B (bottom, gray vs. orange, rotamers). We have now included a more detailed discussion of this observation in the last subsection of the Results (“Force field influence and model accuracy”). Additionally, we now avoid the repetitive use of this statement in the main text, mentioning it only once when describing the results of the aldolase refinement.

2) Figure 7: the deposited 6MDO structure (referred to as the "reference" structure) actually has rather reasonable geometry. The improvement by the author's method is primarily a reduction of the clash score (Figure 7B). Please comment and speculate about the reasons (e.g., use of an accurate force field might clean up poorly determined regions?). Note that the Ramachandran statistics also slightly improves, contrary to the suggested anti-correlation (see previous point).

As requested, we have added a paragraph to the section dedicated to the NSF refinement (Figure 7) discussing the difference between the reference structure (6MDO) and our model in more detail. See also our response in point (4) below.

3) Figure 7A: Please calculate the heat map but compared to 6MDO instead of 3J94 as well since 6MDO and 3J94 are not equivalent models (e.g., they just pasted the SNAP-25 N terminal residues into 3J94, in which those residues do not exist in 3J94).

For each refinement case (Figures 3-10), panel (A) in the corresponding figures shows the per-residue (or per-nucleotide) RMSD between the distant starting structure and the deposited reference. The same applies to Figure 7A where the heat map shows the per-residue RMSD between 6MDO and the distant starting structure. This was not sufficiently clearly stated in the previous manuscript version and is now explicitly pointed out in the respective figure legends.

4) Figure 7A: it appears that their method mostly fixed the geometry of the solvent exposed parts of the model, so likely the ATP binding pocket is pretty much the same? These differences could possibly also arise from changes in rigid body positions of the domains in the hexamer. Please comment.

While it is true that the solvent exposed parts of the NSF model were improved in our model, this was not the only difference between our model and 6MDO. In fact, the validation using higher-resolution X-ray structures (referred to as “controls” in our work), as suggested by the reviewers in Critical point (2), has helped identify the ATP binding pocket as yet another region in the NSF protomers that also differs considerably between the two models. This validation has revealed that either the adenosine group of ATP is rotated by 90° or 180° (3 of 6 NSF protomers in 6MDO) or the phosphate tail of ATP deviates considerably from that of the controls (all protomers in 6MDO). In contrast, our model reproduced the structure of the ATP binding pocket equally well across the entire D2 hexamer. As mentioned above, we have added a paragraph to the NSF section discussing these aspects in more detail.

With respect to global rigid-body positioning of the domains in the hexamer, we have not found any difference between 6MDO and our model, which is also supported by the very similar reciprocal correlation at low spatial frequencies (Figure 7B, top).

5) Figure 7: clarify how the various parameters for their refinement were chosen. Was there a lot of trial and error involved?

See our response to Critical point (2). The improved versions of the sections “Simulation setup and refinement” and “Cross-validation and stereochemical quality” in Materials and methods in combination with the protocol summary in Figure 2—figure supplement 1 now provide a guide for how the refinement parameters were chosen. In particular, we have added a paragraph to the section “Cross-validation and stereochemical quality” that specifically elucidates how to proceed in refinement situations in which the half-maps are not available.

In some cases, for example, several half-map refinements at different temperatures (see Figure 2—figure supplement 1) were performed to test the method’s robustness. In other cases (e.g., the nucleosome refinement), independent refinements with different force constants were done to test the underfitting hypothesis. Overall, very little trial and error was required – the fact that the whole refinement procedure is fully automated really paid off.

6) "Furthermore, the NSF refinement demonstrates the usefulness of the half-map-based cross-validation procedure as, for a refinement against a full map only, much more refinement runs have to done to identify the optimal map bias". Where is it is shown that more refinement runs have to be performed without access to half maps?

See our response to point (5). We have now clarified this issue in the revised manuscript by explicitly stating why and how the absence of half-maps affected the NSF refinement in the first paragraph of the corresponding section of the main text.

7) The question of whether a single model or an ensemble is more appropriate is worth discussing. For example, the three TRPV1 structures produced by independent refinement vary across the model (Figure 5—figure supplement 2), and may suggest that an ensemble refinement may be suitable. Moreover, the discussion of "ideal geometry and over/underfitting" in the last paragraph of the subsection “Force-field dependence of the model geometry” completely ignores the effect of conformational heterogeneity that may be related to the non-ideal nature of refining a single conformation against an EM map. Ensemble averaging misinterpreted as a single structure is a possible reason. See Rice, Shamoo and Brünger, 1998, and Burnley et al., 2012 for crystallographic multi-start and ensemble refinements, respectively, and Herzik, Fraser and Lander, 2018 and Bonomi, Pellarin and Vendruscolo, 2018 for EM structure ensemble refinement. On the other hand, low determinacy of 3D classification and refinement of the particles may be another reason for the non-ideal nature of the refinement. These two issues are likely convoluted and represent a major challenge for the interpretation of single particle EM data. The authors are not expected to perform ensemble refinements, but rather, a discussion of this point is requested.

We agree and thank the reviewers for pointing out these important issues and now discuss them in more detail in the last two paragraphs of the section “Force field influence and model accuracy”. Briefly, we now explicitly distinguish between the two sources of errors in our refined models: (a) conformational heterogeneity and poor 3D classification convoluted in cryo-EM data and (b) the imperfect nature of atomistic force fields. Each point is discussed in a separate paragraph. Besides citing the above literature on ensemble refinement, we also explicitly state that the present work is limited to single-structure refinements.

8) Note that the use of an empirical force field for crystallographic refinement was first suggested by Brunger et al., 1987, and re-investigated in more recent papers, e.g., Fenn et al., 2011. A discussion of this point would be appropriate.

We thank the reviewers for pointing this out. The mentioned studies as well as a comprehensive review on the use of molecular dynamics in cryo-EM refinement by Sanbonmatsu and coworkers are now cited and acknowledged.

9) How were the MolProbity statistics calculated? There are some subtleties that might need to be taken into consideration. By default Molprobity uses bond lengths for hydrogen atoms that are centered on the X-ray scattering position (electron cloud), as opposed to the nuclear position of the atom – i.e. the former is shorter than the latter (see Protein Sci. 2018, 27:293-315). This can have some consequences depending on the workflow. If MolProbity is given a model containing hydrogen atoms whose positions have been defined using nuclear positions the clash score may be reported as high. Alternatively, if a model is optimized using nuclear position hydrogen atoms, and these are removed, MolProbity automatically adds hydrogen atoms at the electron cloud position, resulting in systematically low clash scores. The authors should consider their protocol in light of this, and make it clear what was done in the Materials and methods section.

This information, indeed, was not provided in the previous manuscript version. Throughout the study, we consistently used X-ray scattering positions (electron cloud, default option) for all structures. When using the nuclear position option in MolProbity, we noticed that the number of steric clashes increased by 15-20%. This, however, had almost no effect on the MolProbity scores of the structures refined with our method or MDFF (CorA in Figure 10), as those had almost zero steric clashes by construction, independently of the hydrogen placement option used. We have now made this clear and have added all required details in the section “Cross-validation and stereochemical quality” in Materials and methods.

10) Is a complete model a requirement for the method? In the Materials and methods section it appears that any missing residues in the test structures were "filled in" using homology modeling or other approaches. Is this mandatory? If so it needs to be clearly communicated in the paper. The creation of parts of models for which there is no clear experimental data to support that model are questionable.

The short answer is no. While it is true that missing parts in the starting structures were filled in using MODELLER or by copying them from other structures (unrelated to both refinement and validation), this is generally not required for the method’s application. The filled-in parts in the starting structures can be split into two groups: (a) parts that are missing in the starting structures and reflected in the density in which case they are allowed to feel the correlation biasing potential, and (b) missing parts that are not reflected in the density and, hence, are subjected only to the atomistic force field. The latter were added for completeness but were later excluded in all subsequent analyses.

A new paragraph at the end of the section “Preparing maps and starting structures” in Materials and methods now clearly communicates whether or not starting structures need to be complete prior to refinement with our method.

11) The authors make an interesting argument about clashes and side- and mainchain outliers. However, arguing that a very low clash score and a significant number of geometry outliers is reasonable stretches credulity. It is well known from high resolution structures what the expectations are for outliers. It is also observed that at lower resolution it becomes increasingly hard to determine correct rotamer and mainchain conformations based on the experimental data. The authors might be better off to emphasize that they are able to create models with a quality consistent with many other methods at this resolution, often better than the starting models, and accept that there is room for improvement.

Considering the discussions in Critical point (2) and point (7), we fully agree with this statement and have rewritten the last paragraph of the section “Force field influence and model accuracy” in the Results part accordingly.

12) The authors should be clear what they mean when they use the phrase "overfitting" in this context. There is often confusion about this term's meaning – e.g. fitting of noise, model correctness, determination of the optimal protocol.

We agree and now provide a statement at the beginning of the subsection “Refinement” about how the term “overfitting” should be understood in the structure refinement context. In fact, the concept of overfitting validation is already incorporated in Figure 2—figure supplement 1 describing our refinement protocol.

13) It would seem appropriate to mention the work of Sanbonmatsu and colleagues: Kirmizialtin et al., 2015.

The work is cited and acknowledged in the Introduction.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

1. Please provide the CABLAM and other statistics (provided in the supplemental files as. tex files) as proper tables.

We were previously asked by the eLife editorial support to provide all supplementary tables as separate editable files and include only legends for them in the main article file. For this resubmission, we have compiled the supplementary tables into a single. pdf file and appended it to the end of the main article file (after supplementary figures), as requested. The tables are still available as editable. tex files.

2. You are correct in their assertion that the CDMD procedure has a larger radius of convergence than the Refmac, Phenix and Rosetta procedures. However, while it is very much appreciated that making use of other programs can be challenging, it is probable that some of the protocols used are not optimal for the particular cases being studied. For example, in the case of phenix.real_space_refine the default protocols are appropriate for refinement of models close to correct, i.e. not significantly displaced from the cryo-EM volume (globally or locally).

It is, indeed, true that the tested alternative protocols are not entirely optimal for the presented refinement cases. This is mostly due to the distant starting structures we used to test CDMD. Expectedly, the alternative methods applied to the same distant structures yielded poorer results. Although they were somewhat better than we initially anticipated: for example, the aldolase model generated by Rosetta looked promising, but given the fact that Rosetta was also used to generate the reference aldolase model (5VY3), this result was not surprising. As mentioned in the previous response letters, this was the reason why we refrained from the complete “all-to-all” comparisons in the previous manuscript version. However, we are now convinced that this comparison is necessary and clearly illustrates the challenge of reconciling local refinement accuracy and large convergence radius, in which a proper treatment of concerted molecular motions plays a major role.

Moreover, the protocol outlined in White et al., 2018 helps extend the radius of convergence. However, for models that have more significant starting errors, morphing is an important option for increasing the radius of convergence. It can also be necessary to run more refinement cycles to increase the convergence radius. Please comment in the Discussion.

In all of our Phenix refinements, we switched off the morphing option because we were not expecting the deforming procedure implemented in Phenix (Terwilliger et al., 2012; iterative displacement and smoothing of atom positions in the vicinity each residue to maximize the map-model overlap) to improve the radius of convergence, exactly for the reason mentioned above. To further support this, we have repeated the Phenix refinement of NSF, but this time letting Phenix apply morphing in each macrocycle (this structure is attached to this resubmission; NSF_PHENIX_MORPH.pdb). We have not seen any significant improvements relative to the protocol without morphing in terms of RMSD to the reference structure, and for some solvent-exposed loops, the agreement with the map became even worse.

Concerning the number of refinement cycles, 15 macrocycles used in our Phenix protocols were largely sufficient. In fact, the correlation coefficient (CCmask) stopped increasing already after 7th-8th cycle in all of the refinements.

We have added a remark on the above issue in the Conclusion section.

3. The benchmark information in Table 2 is likely to be confusing for many readers (the timings provided in the Benchmark text are very useful). Would it be possible to provide statistics in the form of total run times for refinements in a table form?

Thank you for this suggestion. We agree that providing only the MD simulation performance might be confusing and requires further recalculations to obtain the total run times. We have now added two columns to Table 2 showing the total refinement run times for the short and long protocol versions.

https://doi.org/10.7554/eLife.43542.039

Article and author information

Author details

  1. Maxim Igaev

    Department of Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
    Contribution
    Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Project administration
    For correspondence
    migaev@mpibpc.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8781-1604
  2. Carsten Kutzner

    Department of Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
    Contribution
    Software, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Lars V Bock

    Department of Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
    Contribution
    Data curation, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Andrea C Vaiana

    Department of Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing—original draft, Project administration
    For correspondence
    Andrea.Vaiana@mpibpc.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8865-0651
  5. Helmut Grubmüller

    Department of Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
    Contribution
    Conceptualization, Resources, Supervision, Methodology, Writing—review and editing
    For correspondence
    hgrubmu@gwdg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3270-3144

Funding

Max-Planck-Gesellschaft (Open-access funding)

  • Maxim Igaev
  • Carsten Kutzner
  • Lars V Bock
  • Andrea C Vaiana
  • Helmut Grubmüller

Deutsche Forschungsgemeinschaft (IG 109/1-1)

  • Maxim Igaev

Deutsche Forschungsgemeinschaft (FOR 1805)

  • Andrea C Vaiana

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the Max Planck Society and the German Research Foundation via the research grant No. IG 109/1–1 (M Igaev) and FOR 1805 (AC Vaiana). The authors acknowledge computer time provided by the North-German Supercomputing Alliance (HLRN; located in Hannover and Berlin, Germany) as well as by the the Max Planck Computing and Data Facility and the Leibniz Supercomputing Centre (both located in Garching, Germany). The authors thank Holger Stark, Ashwin Chari, Niels Fischer and Wojciech Kopec (Max Planck Institute for Biophysical Chemistry, Göttingen, Germany), and Christian Blau (Stockholm University, Sweden) for insightful discussions. The authors also thank Mario Halic (LMU Munich, Germany), Rui Zhang (Washington University in St. Louis, USA) and Steven Wang (University of Illinois, Urbana-Champaign, USA) for kindly providing the nucleosome half-maps, the microtubule cryo-EM reconstructions, and the ReMDFF reference model of TRPV1, respectively.

Senior Editor

  1. John Kuriyan, University of California, Berkeley, United States

Reviewing Editor

  1. Axel T Brunger, Stanford University, United States

Reviewers

  1. Axel T Brunger, Stanford University, United States
  2. James S Fraser, University of California, San Francisco, United States

Publication history

  1. Received: November 9, 2018
  2. Accepted: March 3, 2019
  3. Accepted Manuscript published: March 4, 2019 (version 1)
  4. Version of Record published: March 19, 2019 (version 2)

Copyright

© 2019, Igaev et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,457
    Page views
  • 470
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Neuroscience
    2. Structural Biology and Molecular Biophysics
    Wenlei Ye et al.
    Research Article
    1. Cell Biology
    2. Structural Biology and Molecular Biophysics
    Florian Ullrich et al.
    Research Article