Automated cryo-EM structure refinement using correlation-driven molecular dynamics

  1. Maxim Igaev  Is a corresponding author
  2. Carsten Kutzner
  3. Lars V Bock
  4. Andrea C Vaiana  Is a corresponding author
  5. Helmut Grubmüller  Is a corresponding author
  1. Max Planck Institute for Biophysical Chemistry, Germany
11 figures, 9 tables and 1 additional file

Figures

Figure 1
Approach: using correlation-based refinement in MD simulation to steer the atomic positions of a macromolecule such that they optimally fit a cryo-EM map.

The molecule is subjected to a global biasing potential Vfit in addition to the MD force field Vff. The forces resulting from Vfit act on every atom to enhance the real-space correlation coefficient c.c. between the cryo-EM density (green) and the density calculated from the current atomic positions (blue). The first step (1) is to generate a simulated density by convolving the atomic positions with a three-dimensional Gaussian function of width σ (Orzechowski and Tama, 2008). The two maps are correlated (2), and the biasing forces are calculated. These forces are then added to the standard MD force field (3), and new atomic positions are evaluated (4). Steps (1–4) are repeated, yielding a structure that correlates better with the cryo-EM map than the starting structure.
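Steps (1–3) of this loop can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Pearson form of c.c., the functional form Vfit = 0.5·k·(1 − c.c.), the finite-difference forces and all function names are assumptions made for illustration. Step (4), the MD integration with Vff, is left out.

```python
import numpy as np

def simulated_density(positions, axes, sigma):
    """Step 1: sum one isotropic 3D Gaussian of width sigma per atom on the grid."""
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    rho = np.zeros(gx.shape)
    for x, y, z in positions:
        r2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        rho += np.exp(-r2 / (2.0 * sigma ** 2))
    return rho

def correlation(rho_sim, rho_exp):
    """Step 2: real-space correlation coefficient (Pearson form, an assumption)."""
    a = rho_sim - rho_sim.mean()
    b = rho_exp - rho_exp.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

def v_fit(positions, axes, sigma, rho_exp, k):
    """Biasing potential; the form 0.5*k*(1 - c.c.) is assumed for illustration."""
    cc = correlation(simulated_density(positions, axes, sigma), rho_exp)
    return 0.5 * k * (1.0 - cc)

def fit_forces(positions, axes, sigma, rho_exp, k, h=1e-4):
    """Step 3: biasing forces as -dVfit/dx, here by central finite differences."""
    forces = np.zeros_like(positions, dtype=float)
    for i in range(positions.shape[0]):
        for d in range(3):
            plus, minus = positions.copy(), positions.copy()
            plus[i, d] += h
            minus[i, d] -= h
            forces[i, d] = -(v_fit(plus, axes, sigma, rho_exp, k)
                             - v_fit(minus, axes, sigma, rho_exp, k)) / (2.0 * h)
    return forces
```

In an MD engine these forces would be added to the force-field forces at every step. Analytical gradients would replace the finite differences, which are used here only to keep the sketch short.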

https://doi.org/10.7554/eLife.43542.002
Figure 2 with 1 supplement
Schematic representation of the proposed continuous refinement protocol: (1) a low temperature optimization phase, where Vfit is monotonically increased by increasing the force constant k (columns a–d), followed by (2) simulated annealing (columns e, f).

The local effect of the protocol is exemplified in the upper row for a one-dimensional single-atom case. Simulated densities shown in the middle row were generated using the atomic structure of a tubulin dimer (PDB ID: 3JAT; Zhang and Nogales, 2015).

https://doi.org/10.7554/eLife.43542.003
Figure 2—figure supplement 1
Detailed scheme of the proposed continuous refinement protocol subdivided into five stages.

In stage 1 (yellow), the starting structure is subjected to initial equilibration in explicit solvent. Equilibration at T = 300 K for 50 ns is needed to drive the starting structure further away from the target state, posing an additional refinement challenge. In real applications, a short equilibration run at T = 100 K for 5–10 ns should be sufficient. Stages 2 and 3 (gray and green) include the refinement against one of the half-maps (training map) followed by cross-validation by means of Fourier Shell Correlation (FSC) against the other half-map (validation map). In case no overfitting of the training map is observed, the final structure from the half-map refinement is passed to stage 4 (cyan), where it is refined against the full map to account for high-resolution features not present in both half-maps. Finally, in stage 5 (purple), the average structure from the last 5 ns of the full map refinement is subjected to geometry and goodness-of-fit assessment. Note that the procedure is automated such that stages 1–4 can be run without human intervention. Pausing the refinement to inspect intermediate results or introduce changes to the protocol should be possible at any stage. See Materials and methods for more details.
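The FSC cross-validation in stages 2 and 3 can be sketched as follows. This is a generic shell-wise FSC for two cubic maps on the same grid, not the authors' code, and the function name and integer shell binning are assumptions. In the protocol, one curve would be computed between the model-derived map and the training half-map and another against the validation half-map. A widening gap between the two curves would suggest overfitting.

```python
import numpy as np

def fourier_shell_correlation(map_a, map_b):
    """FSC per integer-radius Fourier shell, up to (but excluding) Nyquist."""
    if map_a.ndim != 3 or map_a.shape != map_b.shape or len(set(map_a.shape)) != 1:
        raise ValueError("expected two cubic 3D maps of identical shape")
    n = map_a.shape[0]
    fa, fb = np.fft.fftn(map_a), np.fft.fftn(map_b)
    # Integer frequency index along each axis (0, 1, ..., -2, -1).
    idx = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky, kz = np.meshgrid(idx, idx, idx, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int).ravel()
    # Accumulate cross-term and both power terms per shell.
    cross = np.bincount(shell, weights=(fa * np.conj(fb)).real.ravel())
    pow_a = np.bincount(shell, weights=(np.abs(fa) ** 2).ravel())
    pow_b = np.bincount(shell, weights=(np.abs(fb) ** 2).ravel())
    n_shells = n // 2
    denom = np.sqrt(pow_a[:n_shells] * pow_b[:n_shells])
    # Shells without signal get FSC = 0 instead of a division by zero.
    return cross[:n_shells] / np.where(denom > 0.0, denom, 1.0)
```

Converting shell indices to spatial frequency requires the voxel size, which this sketch does not model.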

https://doi.org/10.7554/eLife.43542.004
Figure 3 with 2 supplements
Refining a distant starting model into a high-resolution map: rabbit muscle aldolase at 2.6 Å.

(a) RMSD (Cα atoms) between the starting and the reference model (5VY5) showing the extent of rearrangements during refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our refined model (green) with the full map (top) and stereochemical quality for the three models assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of our model with the full map. (d) Representative region of the protein interior (chain A) showing the closeness of our model and the reference to higher resolution control X-ray structures in terms of RMSD. Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.005
Figure 3—figure supplement 1
Extension of Figure 3 showing the time evolution of various characteristics during refinement.

(a) The simulated map resolution σ and force constant k were linearly ramped from 0.6 nm and 0.5 × 10⁵ kJ mol⁻¹ to the target values of 0.2 nm and 5 × 10⁵ kJ mol⁻¹, respectively. At the bottom, the time evolution of c.c. with the training and the full map and RMSD to the reference structure are shown. (b) FSCtrain and FSCval sampled every 5 ns. Cross-validation showed no signs of overfitting (black dashed lines).
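The linear ramp of σ and k in panel (a) amounts to a clamped linear interpolation between the start and target values. A minimal sketch follows. The ramp duration `t_ramp_ns` is a hypothetical parameter, since the caption does not state it.

```python
def linear_ramp(t_ns, t_ramp_ns, start, target):
    """Linearly interpolate from start to target over t_ramp_ns, then hold."""
    frac = min(max(t_ns / t_ramp_ns, 0.0), 1.0)
    return start + frac * (target - start)

def schedule(t_ns, t_ramp_ns):
    """Return (sigma in nm, k in kJ/mol) at time t_ns, using the caption's end points."""
    sigma = linear_ramp(t_ns, t_ramp_ns, 0.6, 0.2)
    k = linear_ramp(t_ns, t_ramp_ns, 0.5e5, 5.0e5)
    return sigma, k
```

Halfway through the ramp this gives σ = 0.4 nm and k = 2.75 × 10⁵ kJ mol⁻¹. After the ramp ends both stay at their target values.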

https://doi.org/10.7554/eLife.43542.006
Figure 3—figure supplement 2
Extension of Figure 3 showing the comparison of the radii of convergence across different refinement methods for the aldolase system and using the same distant starting structure.

RMSD to the reference structure for all methods used is shown in the upper left plot. Overlays between the reference (gray) and the refined structures are depicted as ribbons (colors as in the RMSD plot).

https://doi.org/10.7554/eLife.43542.007
Figure 4 with 1 supplement
Refinement of a curved tubulin dimer into a map of the straight, microtubule-like state.

(a) RMSD (Cα atoms) between the starting and the control model (6DPV) showing the extent of rearrangements between the solution (curved) and the microtubule-like (straight) tubulin conformation. The α-subunits of both models were aligned for the RMSD calculation. (b) Reciprocal-space agreement of the starting (black dashed), the control (gray and dark gray) and our model (cyan) with the full map (top) and stereochemical quality for the four models assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of our model with the full map. (d) Same as in c but showing the closeness of our model to higher resolution control structures in terms of RMSD. Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.008
Figure 4—figure supplement 1
Extension of Figure 4 showing the comparison of the radii of convergence across different refinement methods for the tubulin system and using the same distant starting structure.

RMSD to the reference structure for all methods used is shown in the upper left plot. Overlays between the control (6DPV, gray) and the refined structures are depicted as ribbons (colors as in the RMSD plot).

https://doi.org/10.7554/eLife.43542.009
Figure 5 with 2 supplements
Refinement of a poor structure of TRPV1 in a distant conformation into a map with highly heterogeneous local resolution.

(a) RMSD (Cα atoms) between the starting and the reference model (3J5P) showing the extent of rearrangements the TRPV1 structure undergoes during refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model (purple) with the full map (top) and stereochemical quality of the models assessed by EMRinger and MolProbity (bottom). Note that the starting model was derived from the deposited structure by subjecting the latter to MD at T = 300 K. (c) Representative secondary structure elements showing local agreement of our model with the full map.

https://doi.org/10.7554/eLife.43542.010
Figure 5—figure supplement 1
Extension of Figure 5 showing the comparison of the radii of convergence across different refinement methods for the TRPV1 system and using the same distant starting structure.

RMSD to the reference structure for all methods used is shown in the upper left plot. Overlays between the reference (3J5P, gray) and the refined structures are depicted as ribbons (colors as in the RMSD plot).

https://doi.org/10.7554/eLife.43542.011
Figure 5—figure supplement 2
Convergence of the TRPV1 refinement both in the higher resolution TM region and in the lower resolution ARD region assessed by means of three independent refinement runs using different but similarly distant starting structures.

Only one TRPV1 monomer and no side chains are shown for clarity.

https://doi.org/10.7554/eLife.43542.012
Figure 6
Comparison of our TRPV1 model with those previously refined using Rosetta (a, b) and ReMDFF (c, d).

Overlays of our model (pink and violet ribbon) with the Rosetta (left, gray ribbon) and ReMDFF (right, gray ribbon) models are shown in (a) and (c), respectively. Reciprocal-space agreement with the full map (top) and stereochemical quality for the four models assessed by EMRinger and MolProbity (bottom) are shown in (b) for the Rosetta model and in (d) for the ReMDFF model.

https://doi.org/10.7554/eLife.43542.013
Figure 7 with 1 supplement
Refinement of a substrate-free NSF complex in a distant conformation into a medium-resolution map at 3.9 Å.

(a) RMSD (Cα atoms) between the starting and the reference model (6MDO) showing the extent of rearrangements the NSF structure undergoes during refinement. (b) Reciprocal-space agreement with the full map for the starting (black dashed), the reference (gray) and the set of final models (red gradient) refined using a wide range of target force constants (top) and stereochemical quality assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of the model refined at k = 5 × 10⁵ kJ mol⁻¹ with the map. (d) ATP binding pocket of the D2 domain (chain A) showing the closeness of our model and the reference to higher-resolution control X-ray structures in terms of RMSD. Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.014
Figure 7—figure supplement 1
Extension of Figure 7 showing the comparison of the radii of convergence across different refinement methods for the NSF system and using the same distant starting structure.

RMSD to the reference structure for all methods used is shown in the upper left plot. Overlays between the reference (6MDO, gray) and the refined structures are depicted as ribbons (colors as in the RMSD plot).

https://doi.org/10.7554/eLife.43542.015
Figure 8 with 2 supplements
Refinement of a distant nucleosome structure into a medium-resolution map of the canonical nucleosome state.

(a) RMSD (DNA and protein backbone) between the starting and the reference model (6ESF) showing the extent of rearrangements during the refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model (yellow) with the full map (top) and stereochemical quality assessed by EMRinger and MolProbity (bottom). (c) Representative secondary structure elements showing local agreement of our model with the full map. (d) Representative region next to the dyad DNA showing the closeness of our model to higher-resolution control structures in terms of RMSD (only protein non-hydrogen atoms). Some residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.016
Figure 8—figure supplement 1
Extension of Figure 8b showing the reciprocal-space agreement and stereochemical quality for nucleosome models independently refined using force constants ranging from 2 × 10⁵ to 4.5 × 10⁵ kJ mol⁻¹.

The structure shown in Figure 8 was refined using k = 3 × 10⁵ kJ mol⁻¹.

https://doi.org/10.7554/eLife.43542.017
Figure 8—figure supplement 2
Extension of Figure 8 showing the comparison of the radii of convergence across different refinement methods for the nucleosome system and using the same distant starting structure.

RMSD to the reference structure for all methods used is shown in the upper left plot. Overlays between the reference (6ESF, gray) and the refined structures are depicted as ribbons (colors as in the RMSD plot).

https://doi.org/10.7554/eLife.43542.018
Figure 9 with 2 supplements
Refinement of a ribosome complex in the CR state into a 3.4 Å map of the GA state.

(a) RMSD (RNA and protein backbone) between the starting (CR) and the final (GA) model showing the extent of rearrangements during the refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model (orange) with the full map (top) and stereochemical quality for the three models assessed by EMRinger and MolProbity (bottom). (c) Representative regions showing local agreement between our model and the map (the codon-anticodon region and an RNA stem-loop in the 30S subunit). (d, e) Representative regions in the ribosomal exit tunnel (constriction site formed by L4 and L22 protein chains is shown) and in the SelB-mRNA contact interface, both demonstrating the closeness of our model to higher resolution control structures in terms of RMSD. Some protein residues are explicitly labeled.

https://doi.org/10.7554/eLife.43542.019
Figure 9—figure supplement 1
Extension of Figure 9 showing the comparison of the radii of convergence across different refinement methods for the ribosome system and using the same distant starting structure.

RMSD to the reference structure for all methods used is shown in the upper left plot. Overlays between the reference (5LZD, gray) and the refined structures are depicted as ribbons (colors as in the RMSD plot).

https://doi.org/10.7554/eLife.43542.020
Figure 9—video 1
Refinement trajectory for the 70S ribosome.
https://doi.org/10.7554/eLife.43542.021
Figure 10 with 3 supplements
Refinement of a CorA magnesium transporter in the symmetric closed state into a low-resolution 7.1 Å map of the asymmetric open state.

(a) RMSD (Cα atoms) between the starting (closed) and the reference (open) model showing the extent of rearrangements in the cytosolic part of the channel during refinement. (b) Reciprocal-space agreement of the starting (black dashed), the reference (gray) and our model refined with k = 1.0 × 10⁵ kJ mol⁻¹ (sea green) with the full map (top) and stereochemical quality for the three models assessed by EMRinger and MolProbity (bottom). The FSC curves were calculated using the backbone atoms only, whereas the full-atom models were used for geometry analysis. No EMRinger scores were calculated. (c) Same as in (b) but for the two MDFF models refined with (blue) and without (dark blue) secondary structure, chirality and cis peptide bond restraints. (d) Overlay of our full-atom model (sea green) with the full map.

https://doi.org/10.7554/eLife.43542.022
Figure 10—figure supplement 1
Extension of Figure 10 showing map-model agreement vs. rotamer or Ramachandran outliers for all CDMD and MDFF refinements.

Average FSC values (FSCavg) were calculated as described in Materials and methods. Each dot represents the result of a single independent refinement. MDFF force constants ranged from 0.05 to 0.5 (see Materials and methods for the MDFF protocol). Arrows indicate the trend in FSCavg, rotamer and Ramachandran outliers as the force constant increases.

https://doi.org/10.7554/eLife.43542.023
Figure 10—figure supplement 2
Extension of Figure 10 showing the structure of the gating pore.

The pore transmembrane helices (residues 281–312) are shown for clarity.

https://doi.org/10.7554/eLife.43542.024
Figure 10—figure supplement 3
Extension of Figure 10 showing how the pore radius changes along the nonlinear gating pathway shown in Figure 10—figure supplement 2 (bottom).

Residues facing the gating pathway are shown as dots color-coded by their relative hydrophobicities.

https://doi.org/10.7554/eLife.43542.025
Figure 11
Per-residue outlier analysis.

Per-residue rotamer and Ramachandran outlier propensities calculated from the non-ribosome structures (left panels; CHARMM22*/CHARMM36m force fields were used; Piana et al., 2011; Huang et al., 2017) and the ribosome structure (right panels; AMBER12sb was used; Lindorff-Larsen et al., 2010) after pre-equilibration with the force field (gray dots) and after the actual refinement with our approach (blue and red dots, respectively).

https://doi.org/10.7554/eLife.43542.026

Tables

Table 1
Refinement statistics for the aldolase system.
https://doi.org/10.7554/eLife.43542.029
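| Metric | CDMD | Phenix | Rosetta | Refmac |
|---|---|---|---|---|
| FSCavg (full map) | 0.774 | 0.704 | 0.790 | 0.778 |
| EMRinger | 4.48 | 2.64 | 4.90 | 1.31 |
| Bond lengths (Å) | 0.022 | 0.006 | 0.022 | 0.012 |
| Bond angles (°) | 2.22 | 1.30 | 1.76 | 2.90 |
| MolProbity | 1.15 | 1.49 | 0.61 | 3.57 |
| All-atom clashscore | 0.24 | 3.82 | 0.28 | 31.6 |
| Ramachandran statistics: | | | | |
| Favored (%) | 96.41 | 100.0 | 98.17 | 93.33 |
| Allowed (%) | 2.79 | 0.0 | 1.76 | 5.06 |
| Outliers (%) | 0.81 | 0.0 | 0.07 | 1.61 |
| Poor rotamers (%) | 2.61 | 2.61 | 0.09 | 32.6 |
| CaBLAM flagged (%) | 9.0 | 10.1 | 8.32 | 13.3 |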
Table 2
Refinement statistics for the tubulin system.
https://doi.org/10.7554/eLife.43542.030
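| Metric | CDMD | Phenix | Rosetta | Refmac |
|---|---|---|---|---|
| FSCavg (full map) | 0.756 | 0.784 | 0.703 | 0.743 |
| EMRinger | 1.79 | 1.07 | 0.92 | 0.72 |
| Bond lengths (Å) | 0.022 | 0.010 | 0.021 | 0.014 |
| Bond angles (°) | 2.30 | 1.57 | 2.79 | 2.15 |
| MolProbity | 1.42 | 1.58 | 2.30 | 2.58 |
| All-atom clashscore | 0.46 | 11.7 | 24.9 | 13.8 |
| Ramachandran statistics: | | | | |
| Favored (%) | 93.28 | 99.41 | 93.75 | 93.16 |
| Allowed (%) | 5.31 | 0.59 | 4.72 | 6.01 |
| Outliers (%) | 1.42 | 0.0 | 1.53 | 0.83 |
| Poor rotamers (%) | 2.63 | 0.73 | 0.14 | 4.38 |
| CaBLAM flagged (%) | 14.1 | 16.3 | 16.8 | 9.5 |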
Table 3
Refinement statistics for the TRPV1 system.
https://doi.org/10.7554/eLife.43542.031
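| Metric | CDMD | Phenix | Rosetta | Refmac | CDMD (TM domain) |
|---|---|---|---|---|---|
| FSCavg (full map) | 0.632 | 0.639 | 0.626 | 0.530 | 0.684 |
| EMRinger | 1.28 | 1.08 | 0.98 | 0.348 | 2.12 |
| Bond lengths (Å) | 0.022 | 0.008 | 0.021 | 0.012 | 0.024 |
| Bond angles (°) | 2.14 | 1.50 | 2.18 | 2.81 | 2.25 |
| MolProbity | 1.23 | 1.90 | 1.39 | 3.55 | 1.50 |
| All-atom clashscore | 0.21 | 8.40 | 2.03 | 29.0 | 0.29 |
| Ramachandran statistics: | | | | | |
| Favored (%) | 93.37 | 99.24 | 93.75 | 91.29 | 90.89 |
| Allowed (%) | 5.97 | 0.76 | 5.26 | 7.01 | 7.59 |
| Outliers (%) | 0.66 | 0.0 | 0.99 | 1.70 | 1.52 |
| Poor rotamers (%) | 1.95 | 3.90 | 0.11 | 26.9 | 3.01 |
| CaBLAM flagged (%) | 15.9 | 23.2 | 20.5 | 17.9 | 14.8 |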
Table 4
Refinement statistics for the NSF system.
https://doi.org/10.7554/eLife.43542.032
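| Metric | CDMD | Phenix | Rosetta | Refmac |
|---|---|---|---|---|
| FSCavg (full map) | 0.765 | 0.706 | 0.748 | 0.479 |
| EMRinger | 1.77 | 0.90 | 1.56 | 0.11 |
| Bond lengths (Å) | 0.022 | 0.007 | 0.021 | 0.020 |
| Bond angles (°) | 2.30 | 1.52 | 2.17 | 3.29 |
| MolProbity | 1.38 | 1.38 | 1.42 | 3.78 |
| All-atom clashscore | 0.13 | 6.93 | 3.26 | 47.1 |
| Ramachandran statistics: | | | | |
| Favored (%) | 92.37 | 98.86 | 95.67 | 89.53 |
| Allowed (%) | 6.71 | 1.03 | 3.73 | 7.03 |
| Outliers (%) | 0.92 | 0.11 | 0.60 | 3.44 |
| Poor rotamers (%) | 2.95 | 0.41 | 0.08 | 25.8 |
| CaBLAM flagged (%) | 13.1 | 20.5 | 14.9 | 22.1 |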
Table 5
Refinement statistics for the nucleosome system.
https://doi.org/10.7554/eLife.43542.033
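| Metric | CDMD | Phenix | Rosetta | Refmac |
|---|---|---|---|---|
| FSCavg (full map) | 0.659 | 0.676 | 0.572 | 0.399 |
| EMRinger | 1.20 | 1.50 | 0.88 | 0.76 |
| Bond lengths (Å) | 0.022 | 0.008 | 0.019 | 0.018 |
| Bond angles (°) | 1.82 | 1.10 | 2.60 | 3.38 |
| MolProbity | 0.65 | 1.58 | 2.60 | 3.88 |
| All-atom clashscore | 0.1 | 8.84 | 91.7 | 63.5 |
| Ramachandran statistics: | | | | |
| Favored (%) | 97.31 | 99.87 | 97.04 | 88.84 |
| Allowed (%) | 1.88 | 0.13 | 1.75 | 8.60 |
| Outliers (%) | 0.81 | 0.0 | 1.21 | 2.55 |
| Poor rotamers (%) | 0.51 | 1.36 | 0.16 | 22.62 |
| CaBLAM flagged (%) | 5.6 | 8.2 | 9.4 | 15.3 |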
Table 6
Refinement statistics for the ribosome system.
https://doi.org/10.7554/eLife.43542.034
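| Metric | CDMD | Phenix | Rosetta | Refmac |
|---|---|---|---|---|
| FSCavg (full map) | 0.766 | 0.807 | 0.549 | 0.724 |
| EMRinger | 1.56 | 1.87 | 0.24 | 0.50 |
| Bond lengths (Å) | 0.017 | 0.016 | 0.062 | 0.010 |
| Bond angles (°) | 2.01 | 1.41 | 6.88 | 1.83 |
| MolProbity | 1.61 | 1.42 | 3.17 | 2.69 |
| All-atom clashscore | 0.28 | 7.65 | 152.9 | 9.34 |
| Ramachandran statistics: | | | | |
| Favored (%) | 94.09 | 99.25 | 90.84 | 92.06 |
| Allowed (%) | 5.10 | 0.75 | 5.78 | 6.69 |
| Outliers (%) | 0.81 | 0.0 | 3.38 | 1.25 |
| Poor rotamers (%) | 6.23 | 1.01 | 0.33 | 8.69 |
| CaBLAM flagged (%) | 14.0 | 19.0 | 23.7 | 16.8 |
| RNA backbone | 0.54 | 0.41 | 0.381 | 0.38 |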
Table 7
Refinement statistics for the CorA system.
https://doi.org/10.7554/eLife.43542.035
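| Metric | CDMD | MDFF (restraints) | MDFF (no restraints) |
|---|---|---|---|
| FSCavg (full map) | 0.673 | 0.711 | 0.713 |
| Bond lengths (Å) | 0.017 | 0.021 | 0.020 |
| Bond angles (°) | 2.11 | 2.34 | 2.30 |
| MolProbity | 1.15 | 1.31 | 1.48 |
| All-atom clashscore | 0.07 | 0.0 | 0.0 |
| Ramachandran statistics: | | | |
| Favored (%) | 94.77 | 93.01 | 90.09 |
| Allowed (%) | 4.01 | 5.29 | 7.36 |
| Outliers (%) | 1.22 | 1.70 | 2.55 |
| Poor rotamers (%) | 2.18 | 3.02 | 3.72 |
| CaBLAM flagged (%) | 11.1 | 13.3 | 18.1 |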
Table 8
Data sets used in this study.
https://doi.org/10.7554/eLife.43542.027
| System | Resolution (Å) | EMDB | Deposited PDB | Method | Citation |
|---|---|---|---|---|---|
| Aldolase | 2.6 | 8743 | 5VY3 | Rosetta/Phenix | Herzik et al. (2017) |
| Tubulin | 4.1 | n/a* | 3JAS, 6DPV | COOT/Refmac | Zhang and Nogales (2015) |
| TRPV1 | 3.3 (2.5–7) | 5778 | 3J5P | COOT | Liao et al. (2013) |
| TRPV1 (TM) | <3.3 | 5778 | 3J9J | Rosetta | Barad et al. (2015) |
| TRPV1 (mono) | 3.3 (2.5–7) | 5778 | n/a | ReMDFF | Wang et al. (2018) |
| NSF | 3.9 | 9102 | 6MDO | COOT/Phenix§ | White et al. (2018) |
| Nucleosome | 3.7 | 3947 | 6ESF | COOT/Phenix | Bilokapic et al. (2018) |
| CorA | 7.1 | 6552 | 3JCG | COOT/Phenix | Matthies et al. (2016) |
| 70S Ribosome | 3.4 | 4124 | 5LZD | COOT/Rosetta/Phenix | Fischer et al. (2016) |

*Map segment derived from an asymmetric, 14-protofilament, kinesin-decorated microtubule reconstruction (provided by courtesy of R. Zhang).
TM region of TRPV1 has a much higher resolution than ARDs.
Provided by courtesy of S. Wang.
§Optimized Phenix protocol was used (Afonine et al., 2018; White et al., 2018) that differed from that used for the other systems (Adams et al., 2010).

Table 9
Performance of the refinement benchmarks for σ = 0.2 nm on two hardware configurations.
https://doi.org/10.7554/eLife.43542.028
| System | D (×10⁶) | N (×10⁶) | W (ns/d) | S (ns/d) | TS (days, short) | TS (days, long) |
|---|---|---|---|---|---|---|
| Aldolase | 1.3 | 0.11 | 33.4 | 39.9 | 1.0 | 1.8 |
| Tubulin | 0.2 | 0.09 | 87.0 | 110.9 | 0.4 | 0.6 |
| TRPV1 | 1.9 | 0.37 | 16.7 | 21.5 | 2.0 | 3.3 |
| Nucleosome* | 0.6 | 0.17 | 20.7 | 29.9 | 1.3 | 2.3 |
| NSF | 1.3 | 0.36 | 18.5 | 24.9 | 1.6 | 2.8 |
| CorA | 1.2 | 0.31 | 25.4 | 29.7 | 1.3 | 2.4 |
| 70S Ribosome | 64.0 | 3.10 | 1.4 | 2.1 | 19.1 | n/a |

D = number of cryo-EM density grid points, N = number of atoms including water and ions, W = 6 core workstation with one Intel E5-1650v4 @ 3.6 GHz CPU and one NVIDIA GTX 980 GPU, S = 24 core server node with two Intel Gold 6146 @ 3.2 GHz CPUs and two GTX 1080Ti GPUs. TS = total run time for the short (40 ns) and long (70 ns) protocols in days using the S hardware configuration (see Figure 2—figure supplement 1). The run times do not include setting up the simulated systems or batch queuing times.
*2-fs time step was used.
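As a rough plausibility check, the TS columns can be compared with the protocol length divided by the throughput on configuration S. The sketch below assumes this simple division. The tabulated values are presumably measured, so small deviations are expected: for TRPV1, 40 ns at 21.5 ns/d gives about 1.9 days, whereas the table lists 2.0.

```python
def run_time_days(protocol_ns, rate_ns_per_day):
    """Estimated wall-clock days for a protocol of the given simulated length."""
    if rate_ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return protocol_ns / rate_ns_per_day

# Aldolase on configuration S (39.9 ns/d), short (40 ns) and long (70 ns) protocols.
aldolase_short = run_time_days(40.0, 39.9)
aldolase_long = run_time_days(70.0, 39.9)
```

For aldolase the estimates round to 1.0 and 1.8 days, matching the table.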

Additional files

Cite this article

  1. Maxim Igaev
  2. Carsten Kutzner
  3. Lars V Bock
  4. Andrea C Vaiana
  5. Helmut Grubmüller
(2019)
Automated cryo-EM structure refinement using correlation-driven molecular dynamics
eLife 8:e43542.
https://doi.org/10.7554/eLife.43542