A Bayesian approach to single-particle electron cryo-tomography in RELION-4.0

  1. Jasenko Zivanov
  2. Joaquín Otón
  3. Zunlong Ke
  4. Andriko von Kügelgen
  5. Euan Pyle
  6. Kun Qu
  7. Dustin Morado
  8. Daniel Castaño-Díez
  9. Giulia Zanetti
  10. Tanmay AM Bharat
  11. John AG Briggs (corresponding author)
  12. Sjors HW Scheres (corresponding author)
  1. MRC Laboratory of Molecular Biology, United Kingdom
  2. Laboratory of Biomedical Imaging (LIB), Switzerland
  3. BioEM lab, Biozentrum, University of Basel, Switzerland
  4. ALBA Synchrotron, Spain
  5. Max Planck Institute of Biochemistry, Germany
  6. Sir William Dunn School of Pathology, University of Oxford, United Kingdom
  7. Institute of Structural and Molecular Biology, Birkbeck College, United Kingdom
  8. Instituto Biofisika, Spain

Abstract

We present a new approach for macromolecular structure determination from multiple particles in electron cryo-tomography (cryo-ET) data sets. Whereas existing subtomogram averaging approaches are based on 3D data models, we propose to optimise a regularised likelihood target that approximates a function of the 2D experimental images. In addition, analogous to Bayesian polishing and contrast transfer function (CTF) refinement in single-particle analysis, we describe approaches that exploit the increased signal-to-noise ratio in the averaged structure to optimise tilt-series alignments, beam-induced motions of the particles throughout the tilt-series acquisition, defoci of the individual particles, as well as higher-order optical aberrations of the microscope. Implementation of our approaches in the open-source software package RELION aims to facilitate their general use, particularly for those researchers who are already familiar with its single-particle analysis tools. We illustrate for three applications that our approaches allow structure determination from cryo-ET data to resolutions sufficient for de novo atomic modelling.

Editor's evaluation

Single-particle tomography (SPT) is a useful method to determine the structure of proteins imaged in situ. This important work presents an easy-to-use tool for SPT that approximates the use of 2D tomographic projections using a ‘pseudo-subtomogram’ data structure, chosen to facilitate implementation within the existing RELION codebase. The examples shown provide solid support for the claims about the efficacy of the approach.

https://doi.org/10.7554/eLife.83724.sa0

Introduction

In recent years, electron cryo-microscopy (cryo-EM) has allowed the 3D imaging of an increasing number of biological macromolecules at resolutions sufficient for de novo atomic modelling. This development, originally driven by advances in detector technology, was further facilitated by novel, robust image processing algorithms. In single-particle analysis, images of multiple copies of isolated macromolecular complexes, or particles, that are suspended in random orientations in a thin layer of vitreous water are combined in a 3D reconstruction. Nowadays, many aspects of single-particle analysis workflows can be performed with only minimal human supervision, for example, the detection, extraction, and initial classification of particles in the images (Zivanov et al., 2018; Bepler et al., 2019; Wagner et al., 2019), 3D reconstruction (Zivanov et al., 2018; Punjani et al., 2017), as well as refinement of the optical parameters (Zivanov et al., 2018; Zivanov et al., 2020; Punjani et al., 2017; Tegunov et al., 2021) and per-particle tracking of electron beam-induced motion (Zheng et al., 2017; Zivanov et al., 2019). Many of the algorithms that underlie these modern methods are built on solid statistical foundations that require few tunable parameters. This decreases the need for operator expertise and provides objectivity, as well as robustness, in obtaining optimal structures.

The single-particle approach is, however, limited to investigating isolated protein complexes that are purified to relative homogeneity. To examine these complexes in their crowded physiological environment, electron cryo-tomography (cryo-ET) may be used instead. In the tomographic approach, the sample is tilted multiple times during image acquisition, yielding a so-called tilt series of images from which a 3D tomogram can be computed. In the same manner as single-particle analysis, repeated occurrences of particles in those tomograms can then be aligned and averaged to obtain higher-resolution reconstructions. This process is referred to as subtomogram averaging. Unlike in single-particle analysis, labs use many different tools for subtomogram averaging (e.g. Kremer et al., 1996; Nickell et al., 2005; Heumann et al., 2011; Castaño-Díez et al., 2012; Hrabe et al., 2012; Chen et al., 2013; Galaz-Montoya et al., 2016; Sanchez et al., 2019; Chen et al., 2019; Jiménez de la Morena et al., 2022), and many of these tools require considerable expertise from the operator, often to tune parameters that arise from heuristics in the underlying algorithms. This not only creates a barrier for new scientists entering the field, but can also lead to the calculation of suboptimal structures.

Compared to single-particle analysis, subtomogram averaging faces several unique challenges. In addition to estimating the position and orientation of each particle, the algorithm also has to consider the geometry of the tilt series. Typically, this is solved through a set of preprocessing steps that include estimation of contrast transfer function (CTF) parameters and alignment of the tilt series, followed by the reconstruction of, often inconveniently large, tomograms for the entire field of view. Smaller subtomograms, centred around selected particles, are then extracted from the tomograms and used in a separate process of subtomogram alignment and averaging. The separation between tomogram reconstruction and subtomogram averaging can lead to an accumulation of errors, because errors in the CTF estimation or tilt-series alignments are hard to correct. In addition, because the sample cannot be rotated 180° within the microscope, the subtomograms contain empty regions in Fourier space, the so-called missing wedge, which are difficult to deal with in subtomogram averaging (e.g. see Schmid and Booth, 2008; Förster et al., 2008; Bartesaghi et al., 2008; Frank et al., 2012).

A fundamental problem with subtomogram averaging as described above is that it transforms the original 2D image data into 3D subtomograms, which are then used as a substitute for experimental data in the alignment algorithm. RELION-2 introduced the concept of a 3D CTF to describe the transfer of information from the 2D images to the subtomograms, which dealt to some extent with the missing wedge and the loss of information through interpolations in the reconstruction algorithm (Bharat and Scheres, 2016). A drawback of the 3D CTF approach is that it does not deal correctly with the lower resolution regions of Fourier space, where information from different tilt images overlaps. A statistically more attractive approach would be to formulate the optimisation target function directly as a function of the actual 2D images that are measured in the microscope. This has been proposed in an approach called constrained single-particle cryo-ET (Bartesaghi et al., 2012), where individually boxed particles from the tilt-series images are processed as in single-particle analysis, but their relative orientations are kept fixed. A similar approach was also implemented in the program emClarity (Himes and Zhang, 2018). To deal with unknowns in the relative orientations of the particles from the tilt-series images, as well as their CTFs, the program M recently introduced new optimisation approaches that compare reference projections against the 2D particle images (Tegunov et al., 2021). M relies on RELION for alignment and classification of 3D subtomograms that are recalculated from the optimised parameters in M. Nevertheless, this iterative approach allows subtomogram averaging to resolutions that approach those observed for single-particle analysis, even for particles in complex cellular environments (Tegunov et al., 2021).

Here, we describe a new approach to subtomogram averaging in RELION-4.0 that optimises a regularised likelihood function that approximates the direct use of the 2D images of the tilt series. In order to do so at acceptable computational and implementation costs, we have altered the main refinement program in RELION-4.0 to work with so-called pseudo-subtomograms: explicitly constructed sets of 3D data arrays that contain sums of CTF pre-multiplied 2D tilt-series images, together with auxiliary arrays that contain the corresponding sum of squared CTFs and how often each 3D voxel has been observed. Pseudo-subtomograms no longer aim to represent the actual scattering potential of the underlying particles, in the way that conventional subtomograms would. Instead, they represent a convenient way to implement an approximation to the 2D approach within the existing RELION code. Evaluation of the pseudo-subtomograms by RELION-4.0 approximates the likelihood of observing a hypothetical particle in the images of the entire tilt series, given the current model. Using that likelihood as a metric, operations equivalent to those in single-particle analysis can now be performed on tomographic data, for example, 3D initial model generation, 3D classification, or high-resolution refinement. In addition, we describe new methods for optimising parameters of the tilt series that exploit the increased signal-to-noise ratio in the average structure. Besides optimisation of the tilt-series alignment itself, we also describe methods analogous to CTF refinement (Zivanov et al., 2018; Zivanov et al., 2020) for refining descriptors of the optical properties (defocus, astigmatism, and higher-order aberrations) and a method akin to Bayesian polishing (Zivanov et al., 2019) to model beam-induced particle motion throughout the tilt series. Once all these parameters have been optimised, new pseudo-subtomograms can be constructed and the alignment can be repeated. 
The resulting iterative image processing workflow is similar to existing approaches for single-particle analysis in RELION.

Methods

Particle alignment and averaging

RELION performs maximum a posteriori estimation to find the set of model parameters Θ that maximise the probability of observing the experimental images X. Using Bayes’ theorem, we define a regularised likelihood optimisation target function as

(1) $P(\Theta \mid X) \propto P(X \mid \Theta)\, P(\Theta),$

where P(Θ) expresses prior information about the model, that is, that the reconstructed map has limited power in Fourier space, and P(X|Θ) is the likelihood of observing the data given the model. A marginalised likelihood function is used, where one integrates over the unknown alignments ϕ of each individual particle. For simplicity, these integrals are omitted from the notations used in this article.

The data model assumes independent Gaussian noise on the Fourier components of the cryo-EM images of individual particles p. We therefore write the negative log-likelihood of observing a particle p in a hypothetical alignment ϕ as a sum over a grid of 2D Fourier pixels $j \in \mathbb{R}^2$:

(2) $-\log P(X^p \mid \phi) \propto \sum_j \frac{\left|X^p_j - \mathrm{CTF}^p(j)\, V_j^{(p)}\right|^2}{2\sigma_j^2},$

where $X^p$ is the Fourier transform of the experimental particle image, $\mathrm{CTF}^p$ its contrast transfer function, $V_j^{(p)}$ denotes the 2D slice out of the 3D Fourier transform of the known map $V$ corresponding to the view of the particle, and $\sigma_j^2$ is the noise variance in the frequency band of $j$. The slice $V_j^{(p)}$ is given by

(3) $V_j^{(p)} = \exp\!\left(i\, t_p \cdot j\right)\, V(A_p j)$

for a 2D vector $t_p$ and a $3 \times 2$ matrix $A_p$ that respectively encapsulate the particle's position and orientation; the evaluation of $V(A_p j)$ is achieved through linear interpolation.
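As an illustration of this data model, the Python/numpy sketch below samples a 2D slice $V_j^{(p)}$ out of a 3D Fourier volume by trilinear interpolation and applies the translational phase factor of Equation 3. The array conventions, phase convention, and function name are simplifying assumptions for illustration, not RELION's actual (C++) implementation:

```python
import numpy as np

def extract_slice(V, A, t):
    """Sample the 2D central slice V_j = exp(i t.j) V(A j) from a 3D Fourier
    volume V by trilinear interpolation (cf. Equation 3).
    V: complex 3D array with the Fourier origin at its centre;
    A: 3x2 orientation matrix; t: 2D translation vector."""
    n = V.shape[0]
    c = n // 2                       # array centre treated as the Fourier origin
    out = np.zeros((n, n), dtype=complex)
    for jy in range(-c, n - c):
        for jx in range(-c, n - c):
            j = np.array([jx, jy], dtype=float)
            p = A @ j + c            # 3D position of this 2D pixel in the volume
            i0 = np.floor(p).astype(int)
            f = p - i0               # fractional offsets for interpolation
            if np.any(i0 < 0) or np.any(i0 + 1 >= n):
                continue             # pixel falls outside the measured volume
            val = 0.0 + 0.0j
            # trilinear interpolation over the 8 surrounding voxels
            for dz in (0, 1):
                for dy in (0, 1):
                    for dx in (0, 1):
                        w = ((f[0] if dx else 1 - f[0])
                             * (f[1] if dy else 1 - f[1])
                             * (f[2] if dz else 1 - f[2]))
                        val += w * V[i0[2] + dz, i0[1] + dy, i0[0] + dx]
            out[jy + c, jx + c] = np.exp(1j * (t @ j)) * val
    return out
```

In practice only the Fourier pixels inside the resolution limit would be visited, but the sampling logic is the same.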

In tomography, our aim is to approximate that same likelihood on tilt-series data. The equivalent is a sum over the pixels of the relevant regions of all images f from the tilt series:

(4) $-\log P(X^p \mid \phi) \propto \sum_{f,j} \frac{\left|X^p_{fj} - \mathrm{CTF}^p_f(j)\, V_{fj}^{(p)}\right|^2}{2\sigma_j^2}.$

We model the shifts and rotations as compositions of per-particle and per-image components:

(5) $t_{pf} = A_f^{\mathsf{T}}\, T_{pf} + t_f$
(6) $A_{pf} = R_p A_f,$

where we keep the per-particle rotation component, $R_p$, identical for all images in the tilt series, and only vary $A_f$, the rotational alignment of the tilt-series images. In turn, the tilt-series alignment $A_f$ is shared among all particles in a given tilt image. The per-particle part of the translation is modelled as a 3D vector, $T_{pf} \in \mathbb{R}^3$, that can vary over different tilt images f. This contrasts with single-particle analysis, where beam-induced motion of the particle can be corrected for as a preprocessing step (Li et al., 2013; Scheres, 2014; Zheng et al., 2017; Zivanov et al., 2019), so that each particle is associated with a single 2D translation in a motion-corrected image.
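Assuming $A_f$ is stored as a 3×2 matrix whose transpose projects 3D tomogram coordinates onto tilt image f, the composition of the per-particle and per-image components (Equations 5 and 6) can be sketched as follows (function and variable names are illustrative, not RELION's):

```python
import numpy as np

def particle_projection(R_p, A_f, T_pf, t_f):
    """Compose per-particle and per-tilt alignments (cf. Equations 5-6).
    R_p:  3x3 per-particle rotation, shared by all tilt images;
    A_f:  3x2 rotational alignment of tilt image f;
    T_pf: 3D per-particle shift in tilt f; t_f: 2D shift of tilt image f.
    Returns the composed orientation A_pf (3x2) and 2D translation t_pf."""
    A_pf = R_p @ A_f            # pixel j in tilt f -> 3D map coordinate
    t_pf = A_f.T @ T_pf + t_f   # projected particle position plus image shift
    return A_pf, t_pf
```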

For our pseudo-subtomogram approach, we now approximate the sum over 2D pixels j and tilt images f in Equation 4 by a sum over 3D voxels k in the pseudo-subtomogram:

(7) $-\log P(X^p \mid \phi) \approx \sum_k \frac{\left|D^p_k - W^p_k\, V(R_p k)\right|^2}{2 M^p_k\, \sigma_k^2}.$

Here, the data term Dp, the weight term Wp, and the multiplicity term Mp are 3D arrays in the Fourier domain. Together, they constitute a pseudo-subtomogram. They are constructed as follows:

(8) $D^p_k = \sum_{f,j} l(A_{pf}\, j - k)\; \mathrm{CTF}^p_f(j)\; X^p_{fj}$
(9) $W^p_k = \sum_{f,j} l(A_{pf}\, j - k)\; \left|\mathrm{CTF}^p_f(j)\right|^2$
(10) $M^p_k = \sum_{f,j} l(A_{pf}\, j - k),$

where $l(\cdot)$ is the linear-interpolation kernel used for forward mapping, that is, each 2D Fourier pixel $j$ is projected into 3D Fourier space, updating the eight closest voxels.
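The construction in Equations 8–10 can be sketched as a forward-mapping loop in Python/numpy. This illustrates the bookkeeping only; the names and array conventions are assumptions, and in RELION the tilt images are first cropped around each particle:

```python
import numpy as np

def insert_image(D, W, M, X, ctf, A):
    """Forward-map one tilt image into the pseudo-subtomogram arrays
    (cf. Equations 8-10). D: complex data term; W: summed CTF^2 weights;
    M: multiplicity; X, ctf: 2D image and CTF arrays; A: 3x2 tilt matrix.
    Each 2D pixel j updates the 8 voxels surrounding A j by trilinear weights."""
    n = D.shape[0]
    c = n // 2                      # array centre treated as the Fourier origin
    for jy in range(-c, n - c):
        for jx in range(-c, n - c):
            j = np.array([jx, jy], dtype=float)
            p = A @ j + c           # 3D target position of this 2D pixel
            i0 = np.floor(p).astype(int)
            f = p - i0
            if np.any(i0 < 0) or np.any(i0 + 1 >= n):
                continue            # pixel projects outside the 3D box
            x = X[jy + c, jx + c]
            cf = ctf[jy + c, jx + c]
            for dz in (0, 1):
                for dy in (0, 1):
                    for dx in (0, 1):
                        w = ((f[0] if dx else 1 - f[0])
                             * (f[1] if dy else 1 - f[1])
                             * (f[2] if dz else 1 - f[2]))
                        z, y, xx = i0[2] + dz, i0[1] + dy, i0[0] + dx
                        D[z, y, xx] += w * cf * x        # CTF pre-multiplied data
                        W[z, y, xx] += w * cf * cf       # summed CTF^2
                        M[z, y, xx] += w                 # multiplicity
```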

Apart from the pre-multiplication of the images with their CTFs, Equation 7 aims to be equivalent to Equation 4. The variance $\sigma_k^2$ is equivalent to $\sigma_j^2$, the power of the noise in the individual Fourier components of the 2D images.

We then optimise Equation 1 by expectation-maximisation (Dempster et al., 1977), using Equation 7 to construct the likelihood function and using a prior $P(\Theta) \propto \exp\!\left(-\sum_k \frac{|V_k|^2}{2\tau_k^2}\right)$, based on the expected frequency-dependent power of the signal $\tau_k^2$. This leads to the following iterative algorithm:

(11) $V_k^{(n+1)} = \dfrac{\sum_p D^p(R_p k)\,/\,\bar{\sigma}_k^{2\,(n+1)}}{\sum_p W^p(R_p k)\,/\,\bar{\sigma}_k^{2\,(n+1)} + 1\,/\,\bar{\tau}_k^{2\,(n+1)}}$
(12) $\bar{\sigma}_k^{2\,(n+1)} = \dfrac{\sum_p \sum_{k' \in S_k} \left|D^p_{k'} - W^p_{k'}\, V^{(n)}(R_p k')\right|^2}{2 \sum_p \sum_{k' \in S_k} M^p_{k'}}$
(13) $\bar{\tau}_k^{2\,(n+1)} = \dfrac{\left|V_k^{(n)}\right|^2}{2} \cdot \dfrac{\sum_p \sum_{k' \in S_k} W^p_{k'}}{\sum_p \sum_{k' \in S_k} M^p_{k'}},$

where $(n)$ denotes the iteration; the divisions by $\bar{\tau}_k^2$ and $\bar{\sigma}_k^2$ in Equation 11 are evaluated element-wise; and the bars indicate averages over hollow spheres of radius $|k|$ and thickness 1, described by $S_k$. The ratio of the terms containing $W$ and $M$ in Equation 13 corrects the estimate of the signal power of the CTF-corrected map $V$ by the average $\mathrm{CTF}^2$, to account for the fact that the likelihood in Equation 7 is calculated for CTF pre-multiplied images.
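For the special case in which all particles share the identity orientation ($R_p = I$), so that no rotational resampling is needed, the map update of Equation 11 reduces to a Wiener-like filter. The sketch below is illustrative only (names are hypothetical, and the variances would come from the updates in Equations 12 and 13):

```python
import numpy as np

def update_map(D_list, W_list, sigma2, tau2):
    """One simplified map update (cf. Equation 11) for the identity-rotation
    case. D_list/W_list: per-particle data and weight arrays; sigma2: per-voxel
    noise variance; tau2: per-voxel expected signal power (the regulariser)."""
    num = sum(D / sigma2 for D in D_list)               # data, noise-weighted
    den = sum(W / sigma2 for W in W_list) + 1.0 / tau2  # weights plus prior term
    return num / den
```

Note how a weak prior (large tau2) leaves the straight CTF-weighted average essentially untouched, while a strong prior (small tau2) damps poorly measured voxels towards zero.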

Pre-oriented priors

Many proteins are organised in imperfect 2D arrays inside the tomograms, for example, inside membranes or as part of capsid-like structures. Often, the individual protein molecules inside these arrays exhibit limited rotational freedom with respect to the surface normal of the array, although they may be able to rotate freely around that normal. This knowledge is often exploited in subtomogram averaging approaches through local orientational searches, for example, see Förster et al., 2005. This not only accelerates the refinement, as fewer orientations need to be evaluated, it also makes it possible to solve more challenging structures because fewer solutions are allowed. In RELION, local orientational searches are implemented as Gaussian priors on the Cartesian translations and on the three Euler angles that describe rotations (Scheres, 2012). One advantage of using pseudo-subtomogram alignment is that the coordinate system of the pseudo-subtomograms themselves can be chosen arbitrarily. By default, pseudo-subtomograms are created in the same orientation as the tomogram, but the user can choose to orient them in a more meaningful way. For example, by constructing the pseudo-subtomograms with their Z-axis parallel to the 2D array, using a rotational prior of approximately 90° on the tilt angle will limit the amount of rocking of the particles inside the array, while avoiding singularities in the definition of the Euler angles that occur when the tilt angle is close to 0°.
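A minimal sketch of such a Gaussian rotational prior on the tilt angle, expressed as a negative log-probability as in RELION's local searches (the function name and the default search width here are hypothetical, not RELION defaults):

```python
def tilt_angle_prior(tilt_deg, mu_deg=90.0, sigma_deg=10.0):
    """Negative log of a Gaussian prior on the tilt Euler angle, centred at
    90 degrees so that particles lie roughly parallel to the 2D array.
    sigma_deg controls how much 'rocking' within the array is tolerated."""
    d = tilt_deg - mu_deg
    return 0.5 * (d / sigma_deg) ** 2
```

During alignment, this penalty is simply added to the negative log-likelihood of each candidate orientation, so orientations far from the array normal are effectively excluded from the search.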

Tilt-series refinement

Averaging over multiple particles leads to an increased signal-to-noise ratio in the estimated density map V. We also implemented procedures that exploit V for subsequent re-estimation of parameters that describe the tilt series. These procedures do not require pseudo-subtomograms, but are performed by comparing projections of the density maps directly with the (Fourier) pixels of 2D boxes that are extracted from the tilt-series images, with a sufficient size to hold the CTF-delocalised signal. The various tilt-series parameters are then estimated by minimising the negative log-likelihood as defined in Equation 4, that is, the sum over noise-weighted square differences between the prediction and the observation.

The tilt-series properties that can be refined fall into two broad categories: optical and geometrical. The optical refinement concerns the different parameters of the CTF, while the geometrical refinement aims to optimise the alignment of the tilt series, as well as the beam-induced motion of the individual particles. Both sets of algorithms are closely related to the corresponding single-particle algorithms in RELION: optical-aberration refinement (Zivanov et al., 2018; Zivanov et al., 2020) and Bayesian polishing (Zivanov et al., 2019), respectively. In spite of this similarity, the models that are optimised differ significantly from those in single-particle analysis. Details of the implementation of the optical and geometrical refinement algorithms are given in Appendix 1. We also note that Bayesian polishing in single-particle analysis describes particle motion between individual movie frames. Although our approach for tomography can also consider movie frames, the current implementation applies the same regularisation to particle motion between movie frames within a tilt image as to motion between the movie frames of different tilt images. Because preliminary tests showed limited benefits from considering movie frames in this manner, only the functionality to model particle motion between the tilt-series images was exposed on the GUI.

CTF refinement for tomographic data in RELION-4.0 includes optimisation of scale factors that model frequency-dependent radiation damage, defocus, astigmatism, and higher-order symmetrical and antisymmetrical aberrations. Although individual particles within a field of view are at distinct defoci in the tilt-series images, their relative defoci are known from the geometry of the tilt series and the known 3D positions of the particles in the tomogram. Therefore, one can efficiently perform defocus estimation in a single pass, considering all particles in a tilt-series image simultaneously. In order to do so, we modified procedures that were developed for higher-order aberration estimation in single-particle analysis (Zivanov et al., 2020), where the information from all particles in each tilt-series image is condensed into two images that are used to estimate a common phase shift (see Appendix 1).

Similar procedures can also be used to model higher-order symmetrical and antisymmetrical aberrations in the tomographic data. Analogously to our single-particle approach, they are modelled using Zernike polynomials and estimated in the same way. Because the higher-order aberrations are often only a limiting factor at relatively high spatial frequencies, a large number of particles are needed to estimate them reliably. Optimally, higher-order aberrations would thus be estimated globally, over the entire data set, and only for cases that yield high-resolution averages. If aberrations change during data collection, data sets may be split into optics groups, for which aberrations are estimated separately. Typically, the third-order antisymmetrical aberrations are the most important ones, that is, trefoil and axial coma, which can both be caused by a tilted beam. The resolution gains that these optimisations will yield depend on the microscope (mis)alignment. Provided alignment has been performed reasonably well, higher-order aberration correction will probably be most useful for reconstructions that extend beyond 3 Å resolution.

The geometric alignment includes both (rigid) rotational and translational re-alignment of the tilt-series images, as well as the modelling of beam-induced motion of individual particles throughout the tilt series. For the latter, we neglect rotations of the particles, and only model beam-induced translations. By doing so, we can precompute the likelihood of each particle being in each position around its original one, and then look for an alignment that simultaneously maximises the sum of those likelihoods over all tilt-series images and all particles, as well as a prior that ensures spatially coherent motion. This allows us to evaluate the likelihood of a hypothetical particle position by looking up a single interpolated value in an image. In this formulation, the problem becomes equivalent to the Bayesian polishing approach that we originally developed for single-particle analysis, except for the inclusion of a third spatial dimension for the motion.
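The precomputation step can be sketched as follows: for one particle in one tilt image, an FFT-based cross-correlation turns the squared-difference cost of every trial shift into a single image, after which evaluating a hypothetical particle position is just an interpolated lookup. This is a simplified 2D illustration (no noise weighting, circular boundary conditions, hypothetical names):

```python
import numpy as np

def precompute_cc(obs, pred):
    """Precompute the squared-difference cost of every integer trial shift
    between an observed particle image and its prediction, via FFT
    cross-correlation: |obs - shift(pred)|^2 = |obs|^2 + |pred|^2 - 2 <obs, shift(pred)>.
    The motion search then only needs lookups into this cost image."""
    cc = np.fft.ifft2(np.fft.fft2(obs) * np.conj(np.fft.fft2(pred))).real
    const = (obs ** 2).sum() + (pred ** 2).sum()
    return const - 2.0 * cc       # cost[dy, dx], indices modulo image size

def lookup(cost, dy, dx):
    """Bilinear lookup of the precomputed cost at a fractional shift (dy, dx)."""
    n, m = cost.shape
    y0, x0 = int(np.floor(dy)) % n, int(np.floor(dx)) % m
    fy, fx = dy - np.floor(dy), dx - np.floor(dx)
    y1, x1 = (y0 + 1) % n, (x0 + 1) % m
    return ((1 - fy) * (1 - fx) * cost[y0, x0] + (1 - fy) * fx * cost[y0, x1]
            + fy * (1 - fx) * cost[y1, x0] + fy * fx * cost[y1, x1])
```

In the actual motion refinement, one such cost landscape per particle per tilt image is combined with a spatial-smoothness prior, and the sum over all images and particles is minimised jointly.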

Results

We tested our approach on three test cases. Appendix 2—table 1 provides experimental details for each of the data sets; Appendix 2—table 2 provides details on the image processing.

HIV-1 immature capsid

We tested the workflow above on the cryo-ET data set that was used to determine the structure of the immature capsid lattice and spacer peptide 1 (CA-SP1) regions of the Gag polyprotein from human immunodeficiency virus 1 (HIV-1) (Schur et al., 2016) (EMPIAR-10164). We used the same subset of five tomograms that were also used to assess the NovaCTF (Turoňová et al., 2017), emClarity (Himes and Zhang, 2018), and Warp (Tegunov and Cramer, 2019) programs. Introducing 3D CTF correction, and using the alignment parameters from the original analysis by Schur et al., NovaCTF reported a resolution of 3.9 Å (Turoňová et al., 2017). The Warp program introduced local and global motion correction in the tilt-series images, as well as optimisation of CTF parameters. The combination of Warp and subtomogram alignment and averaging in RELION-3 led to a resolution of 3.8 Å. A recent application of emClarity led to a reconstruction to 3.3 Å resolution (Ni et al., 2022).

We used tilt-series projections after movie frame alignment from the original analysis (Schur et al., 2016), without any other preprocessing steps, along with the tilt-series alignments, performed with the IMOD package (Kremer et al., 1996), and CTF parameter estimates from CTFFIND4 (Rohou and Grigorieff, 2015). We used 12,910 particles from the five-tomogram subset, reconstructed an initial reference map using the original published particle alignments, and filtered it to 5 Å. A first alignment in 3D auto-refine, followed by averaging of the initial pseudo-subtomograms, led to a resolution of 3.6 Å. This average was then used for a full cycle of pseudo-subtomogram improvement and realignment. We first applied CTF refinement to optimise the defoci of all particles, which improved the resolution only marginally. Subsequent optimisation of the tilt-series geometry, including modelling of local particle motion, improved the resolution to 3.5 Å. Finally, realignment of newly generated pseudo-subtomograms led to a resolution of 3.4 Å. A second cycle of these three steps yielded 3.3 Å, and a third cycle converged at 3.2 Å (Figure 1a). Geometrical refinement was performed with estimation of local particle motion; considering image deformations did not yield additional improvement. In the first cycle, where improvements in both CTFs and geometry are most obvious, the order in which those optimisations were applied did not alter the final result for this data set. These data and results are also distributed as part of the subtomogram averaging tutorial in RELION-4.0, as described on https://relion.readthedocs.io/en/release-4.0/. Figure 1—figure supplement 1 shows the improvement in map quality during the iterative refinement process; Figure 1—figure supplement 2 shows a comparison with the 3.3 Å map from emClarity.

Figure 1 with 3 supplements.
Subtomogram averaging of the HIV-1 immature capsid.

(a) Fourier Shell Correlation (FSC) for resolution estimation of iteratively improved reconstructions using the new RELION-4.0 workflow. (b) Representative region of reconstructed density in the final map. (c) The same density as in (b), together with the published atomic model 5L93, which has not been additionally refined in the density.

Analysis of the complete data set generated a structure at 3.0 Å resolution (Figure 1—figure supplement 1), which is the same resolution obtained using the M and RELION-3 workflow (Tegunov et al., 2021; Figure 1—figure supplement 3), and is likely limited by flexibility and asymmetry in the CA hexamer.

Caulobacter crescentus S-layer

We also applied our approach to thin cellular appendages of C. crescentus bacteria known as stalks, which have previously been imaged using cryo-ET (Bharat et al., 2017). The cell body and cell stalks of C. crescentus cells are covered by a nearly hexagonal, paracrystalline array known as the surface layer (S-layer) (Smit et al., 1992). The structure of the S-layer was solved using a combination of X-ray crystallography, cryo-EM single-particle analysis, and subtomogram averaging, revealing how the S-layer is attached to bacterial cells by an abundant glycolipid called lipopolysaccharide (LPS) (Bharat et al., 2017; von Kügelgen et al., 2020). Previously, cryo-ET of the S-layer, using 110 tilt series collected with a dose-symmetric scheme, yielded 51,866 hexamers of the S-layer. This study used a subtomogram averaging approach based on constrained cross-correlation, implemented in the AV3 MATLAB suite (Förster and Hegerl, 2007) and specifically optimised for the analysis of macromolecules arranged in a lattice (Wan et al., 2017). A 7.4 Å reconstruction of the S-layer was obtained, in which alpha-helices were resolved (Bharat et al., 2017). This reconstruction was improved by application of NovaCTF (Turoňová et al., 2017), leading to a 4.8 Å reconstruction, in which large amino acid residue side chains were resolved (von Kügelgen et al., 2020). Moreover, density for an LPS molecule was observed near the putative LPS-binding residues of the S-layer, in agreement with a cryo-EM single-particle structure of an in vitro reconstituted complex (von Kügelgen et al., 2020). We used the tilt series after movie frame alignment from the initial analysis (Bharat et al., 2017), along with the tilt-series alignments performed within IMOD (Kremer et al., 1996), CTF parameters from CTFFIND4 (Rohou and Grigorieff, 2015), and the Euler angle assignments and subtomogram coordinates from the original analysis. 
These parameters were imported into RELION-4.0, followed by multiple cycles of pseudo-subtomogram generation and refinement, analogous to the immature HIV-1 data set described above, leading to a 5.6 Å reconstruction of the S-layer hexamer (Figure 2a). Next, we defined a mask around the central pore of the S-layer, corresponding to the inner domain bound to LPS, to perform focused refinements. Another cycle of pseudo-subtomogram reconstruction, CTF refinement, and refinement within the new mask improved the resolution to 4.4 Å. Accounting for per-particle motions with additional cycles of pseudo-subtomogram improvements and refinements increased the resolution of the central pore to 4.0 Å, and the inner domain of the S-layer to 3.7 Å. Further 3D classification without alignments identified a subset of 42,990 subtomograms that gave a 3.5 Å resolution reconstruction of the inner S-layer.

Figure 2 with 1 supplement.
Subtomogram averaging of the C. crescentus S-layer from cell stalks.

(a) FSC for resolution estimation of iteratively improved reconstructions using the new RELION-4.0 workflow, tested on the S-layer inner domain. (b) Densities for the previously identified lipopolysaccharide (LPS) (cyan and orange) and Ca2+ ions (green) in prior electron cryo-microscopy (cryo-EM) single-particle analyses are resolved. (c, d) The final map shows two densities for bound LPS O-antigen chains. Panel (c) shows only the S-layer protein as blue ribbon and (d) shows LPS O-antigen as orange and cyan sugars corresponding to the N-acetyl-perosamine and mannose moieties, respectively.

The 3.5 Å map is in excellent agreement with the single-particle structure of the in vitro reconstituted complex, including the LPS binding site (von Kügelgen et al., 2020; see Figure 2—figure supplement 1). Furthermore, divalent metal ions, known to be tightly bound to the inner S-layer (Matthew, 2021), are resolved (Figure 2b). Surprisingly, at lower isosurface contour levels, we also observed a second LPS binding site (Figure 2c and d). The size and shape of this density agree with the structure of the LPS O-antigen, illustrating how improved subtomogram averaging in RELION-4.0 can help uncover new biology.

Coat protein complex II

Finally, we applied our approach to the Saccharomyces cerevisiae coat protein complex II (COPII), which mediates the transport of newly synthesised proteins from the endoplasmic reticulum (ER) to the Golgi apparatus as part of the secretory pathway. COPII is formed by five proteins that assemble sequentially on the ER membrane to induce remodelling of the bilayer into coated carriers in a process known as COPII budding, while simultaneously selectively recruiting cargo into these budding membrane carriers. COPII budding can be reconstituted in vitro from purified proteins and artificial membranes, to form small, spherical vesicles, or long, straight tubes. Cryo-ET has previously been used to visualise the architecture of COPII on reconstituted tubules (Hutchings et al., 2018; Hutchings et al., 2021). The coat assembles into two concentric layers; the inner layer forms a pseudo helical lattice, which has previously been solved to 4.6 Å resolution using Dynamo-based subtomogram averaging protocols (Castaño-Díez et al., 2012).

We used the tilt series after movie alignment from the initial analysis (Hutchings et al., 2021), along with the tilt-series alignments performed in Dynamo (Castaño-Díez et al., 2012) and CTF parameters from CTFFIND4 (Rohou and Grigorieff, 2015). COPII-coated tubes were manually traced in the resulting tomograms, and particles were extracted by randomly oversampling their surface, with approximate initial orientations assigned based on the cylindrical geometry. Dynamo was used for initial alignment of 8× binned subtomograms to define the centre of the particles and the directionality of individual tubes. We then imported the particle coordinates for processing in RELION-4.0 using 3D refinement at 4× and 2× binning. Since we expect inner coat subunits to arrange in a lattice, we cleaned the data set by excluding any subtomograms that did not conform to the expected geometrical relationship with their neighbouring particles. A first 3D refinement of the unbinned data set gave a map at 4.4 Å resolution, which was further improved to 4.2 Å and 4.0 Å by tilt-series frame alignment and CTF refinement, respectively. Two further rounds of 3D refinement, followed by tilt-series frame alignment and CTF refinement, yielded a final map with a resolution of 3.8 Å (Figure 3).

Figure 3. Subtomogram averaging of the COPII inner layer.

(a) FSC for resolution estimation of iteratively improved reconstructions using the new RELION-4.0 workflow, tested on the COPII inner layer. (b) Reconstructed density for the inner layer. (c) Zoomed-in region of the final map (in transparent grey) with the refined atomic model (blue).

At this resolution, most side chains are visible in the map, enabling us to build and refine an atomic model. The improved model will allow the design of point mutants to precisely disrupt interfaces between coat subunits and to test their effects on COPII budding.

Discussion

We formulate the problem of averaging over multiple identical particles in tomographic tilt series in an empirical Bayesian framework that is based on a statistical model approximating one of the two-dimensional experimental images. The Bayesian framework has proven effective in reducing the number of tunable parameters and in obtaining high-quality reconstructions from single-particle data (Fernandez-Leiro and Scheres, 2016). The two-dimensional data model describes the experimental images better than alternative approaches that use 3D reconstructed subtomograms as an intermediate. One example of a problem with the intermediate 3D data model is the need for missing wedge correction, which arises because each subtomogram is reconstructed from tilt images that sample its 3D Fourier transform incompletely. Artefacts related to suboptimal missing wedge correction may affect both alignment and classification of particles. By using an approximation to the 2D data model, missing wedge correction is no longer required. Instead, the problem approaches that of single-particle analysis, where projections from different orientations and of different structural states are sorted out simultaneously. Provided the 3D Fourier transforms of the distinct classes are fully sampled through the orientation distribution of the raw particles, likelihood optimisation techniques have been highly successful in tackling this problem in single-particle analysis (Scheres et al., 2007; Fernandez-Leiro and Scheres, 2016).

In practice, the implementation in RELION-4.0 does not use stacks of 2D projection images as input for the refinement program that performs alignment and classification. Instead, the concept of 3D pseudo-subtomograms is introduced, where the tilt-series images are Fourier transformed, pre-multiplied with their CTF, and inserted as a slice into a 3D Fourier volume according to the best current estimates for the tilt-series geometry. The use of 3D pseudo-subtomograms allowed reusing existing code for subtomogram averaging in RELION, while input stacks of 2D images would have required significant software development efforts. Nevertheless, in the future we might still choose to implement a true 2D version of the code, which would be more efficient, both in terms of processing time and disk storage requirements. In cases where the number of tilt images is small in comparison to the box size, fewer Fourier pixels need to be examined in a stack of 2D images than in a pseudo-subtomogram, with a corresponding decrease in processing time. Moreover, the likelihood calculation from the 3D pseudo-subtomogram approach requires separate storage of the accumulated squares of the CTFs, and the corresponding multiplicity terms. In contrast, in the 2D approach, only the 2D images need to be stored, as CTF parameters can be calculated on-the-fly and there is no need for a multiplicity term, giving a corresponding decrease in storage requirements. However, if one were to collect tilt series with very fine angular increments or in a continuous manner (Chreifi et al., 2019), then the current implementation may still be preferable.
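The construction of a pseudo-subtomogram described above can be sketched in a few lines of numpy. This is an illustrative simplification, not the RELION implementation: the function name and data layout are our own, and a simple nearest-neighbour slice insertion stands in for the actual gridding scheme:

```python
import numpy as np

def insert_tilt_images(images, ctfs, rotations, box):
    """Build a pseudo-subtomogram: accumulate CTF-premultiplied 2D Fourier
    slices of the tilt images into a 3D Fourier volume, together with the
    summed squared CTFs and a multiplicity (sampling) term.
    Nearest-neighbour insertion and all names are illustrative."""
    data = np.zeros((box, box, box), dtype=complex)   # accumulated CTF-weighted data
    weight = np.zeros((box, box, box))                # accumulated squared CTFs
    mult = np.zeros((box, box, box))                  # how often each voxel was hit
    half = box // 2
    # 2D Fourier-space grid of one tilt image
    u, v = np.meshgrid(np.arange(box) - half, np.arange(box) - half, indexing='ij')
    for img, ctf, R in zip(images, ctfs, rotations):
        F = np.fft.fftshift(np.fft.fft2(img))         # 2D FT of the tilt image
        # the tilt image is a central slice; map each 2D frequency (u, v, 0)
        # into the 3D grid according to the tilt geometry R
        coords = R.T @ np.stack([u.ravel(), v.ravel(), np.zeros(box * box)])
        idx = np.rint(coords).astype(int) + half      # nearest-neighbour insertion
        ok = np.all((idx >= 0) & (idx < box), axis=0)
        i, j, k = idx[:, ok]
        # np.add.at accumulates correctly even when voxels are hit repeatedly
        np.add.at(data, (i, j, k), (ctf.ravel() * F.ravel())[ok])
        np.add.at(weight, (i, j, k), (ctf.ravel() ** 2)[ok])
        np.add.at(mult, (i, j, k), 1.0)
    return data, weight, mult
```

With an identity rotation, the inserted slice simply reproduces the 2D Fourier transform of the tilt image in the central plane of the volume; the weight and multiplicity volumes are the separately stored terms mentioned in the text.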

Besides the alignment and classification of individual particles, the methods described here also deal with re-estimation of parameters that describe the optical and geometrical features of the tilt series. As soon as an initial average structure has been obtained, its increased signal-to-noise ratio can be exploited to determine these parameters more accurately than what is possible from the raw tilt-series images alone. The implementations in RELION-4.0 again follow those previously implemented for single-particle analysis, where CTF refinement is used for re-estimation of the tilt-series images CTFs, and algorithms akin to Bayesian polishing are used to re-estimate the tilt-series alignment, as well as the movement of individual particles throughout the tilt-series acquisition process. As better tilt-series parameters will allow better pseudo-subtomograms, particle alignment and classification are iterated with the optimisation of the tilt-series parameters.

Similar tilt-series and CTF optimisation approaches have been implemented in the program M (Tegunov et al., 2021). Compared to M, RELION-4.0 uses computationally more efficient algorithms, whereas M uses GPUs to accelerate its calculations. In both tomography and SPA, RELION-4.0 only models beam-induced translations of the particles, whereas M also models beam-induced rotations. Since SPA routinely reaches 2 Å resolutions without modelling beam-induced rotations, we assumed that the effect of rotations of individual particles throughout the tilt series is not large enough to warrant their correction at typical tomography resolutions. In cases where the data do allow for better than 2 Å resolutions, M could still be used to correct for beam-induced rotations in a postprocessing step, following alignment and classification of the individual particles in RELION. It is likely that adaptation of M to function with the pseudo-subtomograms proposed here would lead to increased synergy between the two programs. In the meantime, external tools to convert M parameters to RELION-4.0 are already available (https://github.com/joton/reliontomotools; Zivanov, 2022; copy archived at swh:1:rev:bfa43828876ceb77bed0c7eb72f794c79c9de5e6).

Besides the reduction in tunable parameters that is characteristic of the Bayesian approach, its uptake by researchers who are new to the field is further facilitated through the implementation of a graphical user interface. This interface is already widely used for single-particle analysis and has been extended for the processing of tomographic data in RELION-4.0. Apart from the calculations that will be familiar to users of single-particle analysis, for example, 3D classification, 3D initial model generation, and 3D auto-refinement, the new interface also provides convenient access to the tomography-specific versions of CTF refinement and Bayesian polishing, as well as preprocessing operations to calculate the pseudo-subtomograms. However, tilt-series alignment, tomogram reconstruction, and particle picking are not yet part of the RELION workflow. Efforts to also implement solutions for those steps in a single tomography processing pipeline are ongoing and will be part of future RELION releases. Meanwhile, current import procedures rely on specific preprocessing operations in IMOD (Kremer et al., 1996), and particle coordinate conversion tools to use in RELION-4.0 are available for a range of third-party software packages (Pyle et al., 2022). To further facilitate the uptake of this new software by the community, we have provided an online tutorial that uses the publicly available HIV-1 immature capsid data set to describe and illustrate all steps necessary to obtain the results described in Figure 1.

In summary, we introduce new methods for subtomogram averaging to resolutions that are sufficient for de novo atomic modelling and increase the accessibility of this emerging technique. We envision that our methods will allow more researchers to calculate better structures from tomographic data, which will aid the next revolution in structural biology, where macromolecular complexes are imaged, not in isolation, but in their biologically relevant environment.

Appendix 1

CTF refinement

CTF refinement in RELION-4.0 optimises the following parameters: scale, defocus, astigmatism, and higher-order (even and odd) optical aberrations. Since, save for the difference in defocus, the same CTF needs to be valid for an entire micrograph of particles, similar optimisations can be applied as in our single-particle algorithms. All the relevant information is first consolidated into a minimal form using linear transformations, and the final, typically non-linear, optimisation is then performed on that minimal form.

We formulate the CTF for tilt-series frame f of particle p as follows:

(14) CTF_pf(j) = α_f τ_f(j) sin(γ_pf(j)) exp(iρ_f(j)),

where α_f describes the overall scaling factor, τ_f(j) the empirical radiation-damage model as defined by Grant and Grigorieff, 2015, γ_pf(j) the symmetrical phase-delay component, and ρ_f(j) the antisymmetrical one. Note that only γ varies between particles. This is because it contains the quadratic defocus term that depends on the position of the particle. The phase delays are parametrised the same way as in single-particle analysis in RELION-3 – as a combination of explicitly named low-order terms and higher-order Zernike polynomials:

(15) γ_pf(j) = πλ jᵀ D_pf j + (π/2) C_s λ³ |j|⁴ + χ_f + Z_nᵐ(j),
(16) D_pf = [ δz_p + a₁ , a₂ ; a₂ , δz_p − a₁ ]

As before, the astigmatic-defocus matrix D_pf is decomposed into a defocus term δz_p and two linear astigmatism terms, a₁ and a₂, while C_s describes the spherical aberration of the microscope, χ_f a constant phase offset (owing to amplitude contrast and a phase plate, if one is used), λ is the wavelength of the electron, and Z_nᵐ are the higher-order even Zernike terms. One key difference to single-particle analysis is that the defocus term δz_p is no longer a free parameter for each particle, but instead depends on the already known 3D position of the particle. Therefore, in tomography, the defocus term is only estimated once per tilt image, and all the particles contribute to that estimate.
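As a concrete illustration, the symmetric phase delay of Equations 15 and 16 can be evaluated as follows. The function name and units are our own, and the higher-order Zernike terms are omitted for brevity:

```python
import numpy as np

def symmetric_phase_delay(jx, jy, defocus, a1, a2, cs, lam, chi0):
    """Symmetric CTF phase delay of Eqs. 15/16 (Zernike terms omitted):
    pi * lambda * j^T D j  +  pi/2 * Cs * lambda^3 * |j|^4  +  chi0,
    where D is the astigmatic-defocus matrix built from the particle
    defocus and two linear astigmatism terms. Illustrative sketch."""
    D = np.array([[defocus + a1, a2],
                  [a2, defocus - a1]])
    j = np.stack([jx, jy])                             # spatial frequency vector
    quad = np.einsum('i...,ij,j...->...', j, D, j)     # quadratic form j^T D j
    return (np.pi * lam * quad
            + 0.5 * np.pi * cs * lam ** 3 * (jx ** 2 + jy ** 2) ** 2
            + chi0)
```

Because the defocus δz_p is determined by the known 3D particle position, only the per-tilt-image terms remain free in the fit.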

The scaling factor α_f is estimated by computing the following two sums for each micrograph and dividing them (the † symbol indicates complex conjugation):

(17) α_f = G_f / H_f
(18) G_f = Σ_{p,j} (1/σ²_|j|) Re( X†_pfj CTF′_pf(j) V_fj(p) )
(19) H_f = Σ_{p,j} (1/σ²_|j|) | CTF′_pf(j) V_fj(p) |².

Note that the CTF′ used in Equations 18 and 19 is missing its scale factor:

(20) CTF′_pf(j) = τ_f(j) sin(γ_pf(j)) exp(iρ_f(j)).
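A minimal sketch of this estimate, assuming all quantities of one tilt image have been collected into flat arrays (names are illustrative):

```python
import numpy as np

def frame_scale(X, V, ctf_noscale, sigma2):
    """Per-tilt-image scale estimate alpha_f = G_f / H_f (Eqs. 17-19).
    X: observed particle images in Fourier space (complex),
    V: CTF-free predictions (complex),
    ctf_noscale: the CTF without its scale factor (Eq. 20),
    sigma2: noise power per pixel. Illustrative sketch; in the program
    these sums run over all particles of a tilt image."""
    pred = ctf_noscale * V                             # CTF'-modulated prediction
    G = np.sum(np.real(np.conj(X) * pred) / sigma2)    # Eq. 18
    H = np.sum(np.abs(pred) ** 2 / sigma2)             # Eq. 19
    return G / H
```

Because the scale enters the model linearly, the ratio G_f/H_f is the exact least-squares estimate of α_f.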

Alternatively, we also allow the user to fit the parameters of Lambert's extinction model to the data instead, assuming perfectly flat samples of constant thickness. In that case, the CTF scale in image f of tomogram t is expressed as a function of the beam luminance α₀, sample normal n_t, and optical sample thickness κ_t:

(21) α_tf(α₀, κ_t, n_t) = α₀ κ_t^(1/|n_tᵀ z_f|)

If this option is used, then the CTF scales of all the tilt series in the data set are estimated together. The beam luminance α₀ is modelled globally, while the sample thickness and normal are allowed to differ among tomograms, but not between the images of a tilt series. The vector z_f points in the viewing direction of tilt image f. Note that this model does not allow for separating the geometrical sample thickness from its extinction factor, so we can only estimate the product of the two. Also, the ice normal is required to be perpendicular to the estimated tilt axis of the tilt series, since its component pointing in the direction of the axis is indistinguishable from an increase in ice thickness or opacity. This global optimisation is performed using the sums G_tf and H_tf computed in Equations 18 and 19, where the subscript t indicates tilt series t. It is done by finding a global value of α₀ and values of κ_t and n_t ∈ S² for all tomograms that produce α_tf minimising the following quantity, and thus maximising the overall likelihood in Equation 4:

(22) Σ_{t,f} ( H_tf α_tf(α₀, κ_t, n_t) − G_tf )².
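This global fit can be sketched with a generic least-squares solver. In this sketch the sample normals are held fixed for simplicity (the program optimises them as well), scipy's optimiser stands in for whatever scheme is actually used, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_extinction(G, H, normals, view_dirs):
    """Fit the Lambert extinction model of Eq. 21 by minimising Eq. 22,
    sum_{t,f} (H_tf * alpha_tf - G_tf)^2, for one global beam luminance
    alpha0 and one optical thickness kappa_t per tomogram.
    G, H: (T, F) per-image sums; normals: (T, 3); view_dirs: (T, F, 3)."""
    T = G.shape[0]

    def alpha(params):
        a0, kappas = params[0], params[1:]
        cos = np.abs(np.einsum('ti,tfi->tf', normals, view_dirs))  # |n_t . z_f|
        return a0 * kappas[:, None] ** (1.0 / cos)                 # Eq. 21

    def residuals(params):
        return (H * alpha(params) - G).ravel()                     # Eq. 22

    x0 = np.concatenate([[1.0], np.full(T, 0.5)])
    res = least_squares(residuals, x0, bounds=(1e-6, np.inf))
    return res.x[0], res.x[1:]
```

Since α₀ is shared by all tomograms while each κ_t only scales its own tilt series, the varying path length 1/|n_tᵀ z_f| across the tilt range is what makes the two separable.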

To perform defocus estimation efficiently, we apply the optimisations we originally developed for the estimation of higher-order aberrations in single-particle analysis (Zivanov et al., 2020). It allows us to determine a collective offset to γ for a large set of particles that all have different values of γ. Specifically, it allows the change to the log-likelihood arising from changing the value of γ at any Fourier pixel to be expressed as a pair of 2D images, independently of the number of particles. Therefore, each pixel of each particle only needs to be considered once. After that, the log-likelihood can be evaluated by iterating over the pixels of a single image.

In single-particle analysis, this approach is used to estimate the higher-order aberrations that are shared among all the particles in a data set. In tomography, we also use this approach to condense the information from all the particles in a tilt image (all of which exhibit slightly different defoci), into two such images, and to then determine the optimal change to γ efficiently using a nonlinear algorithm.

The two condensed images R and t̂ that we compute are the same as the ones in single-particle analysis, except for the inclusion of the noise power σ². The definitions are repeated here for the sake of completeness. Note that each pixel j of R contains a real symmetrical 2×2 matrix and each pixel of t̂ a 2D vector:

(23) R_fj = Σ_p (1/σ_j²) |Ṽ_fj(p)|² d_pfj d_pfjᵀ
(24) t̂_fj = R_fj⁻¹ Σ_p (1/σ_j²) Re( X†_pfj Ṽ_fj(p) ) d_pfj,

where d_pfj ∈ ℝ² describes the point on the unit circle corresponding to the initial phase angle γ⁽⁰⁾, which is given by the initial CTF parameters:

(25) d_pfj = [ cos(γ⁽⁰⁾_pf(j)) , sin(γ⁽⁰⁾_pf(j)) ]ᵀ

The predicted 2D images Ṽ_f(p) contain the effect of the initial CTFs, except for the symmetrical aberration:

(26) Ṽ_fj(p) = α_f τ_f(j) exp(iρ_f(j)) V_fj(p)
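A sketch of how the information of many particles is condensed into the per-pixel statistics of Equations 23–25. Names are illustrative, and a real implementation would guard against ill-conditioned R:

```python
import numpy as np

def condense(X, V_tilde, gamma0, sigma2):
    """Condensed statistics of Eqs. 23-25 for one tilt image: per pixel,
    a 2x2 matrix R_j and a 2-vector t_hat_j, built from P particles so
    that the likelihood change from any phase change at pixel j can later
    be evaluated without revisiting the particles.
    X, V_tilde: (P, N) complex observations and predictions;
    gamma0: (P, N) initial phase angles; sigma2: (N,) noise power."""
    d = np.stack([np.cos(gamma0), np.sin(gamma0)], axis=-1)   # (P, N, 2), Eq. 25
    w = np.abs(V_tilde) ** 2 / sigma2                         # (P, N)
    R = np.einsum('pn,pni,pnj->nij', w, d, d)                 # sum of w * d d^T
    rhs = np.einsum('pn,pni->ni',
                    np.real(np.conj(X) * V_tilde) / sigma2, d)
    t_hat = np.einsum('nij,nj->ni', np.linalg.inv(R), rhs)    # Eq. 24
    return R, t_hat
```

Each pixel of each particle is visited once; afterwards, any candidate phase change can be scored by a loop over the pixels of a single image, as described in the text.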

The vector-valued condensed image t̂_f describes the most likely phase shift γ for each pixel j, expressed as a point on a circle, while the matrix-valued one, R_f, describes the anisotropic weight of that information. With these two condensed images computed for a given tilt image f, the change to the likelihood defined in Equation 4 resulting from a change to the phase delay γ at any pixel can be expressed as a quadratic form. Therefore, we look for a change δD to the astigmatic-defocus matrices D_pf which produces phase delays that minimise that quadratic form:

(27) C_f = Σ_j e_j(δD)ᵀ R_j e_j(δD),

where the per-pixel error e_j(δD) is given by the deviation from the optimal phase shift t̂_j:

(28) e_j(δD) = [ cos(δγ) − Re(t̂_j) , sin(δγ) − Im(t̂_j) ]ᵀ
(29) δγ = jᵀ δD j

As an alternative to fitting D independently for each tilt-series image, our program also allows the user to apply an L2 regulariser to the δD_f of the different images in the same series. In that case, the sum in Equation 27 runs over all the pixels j of all the tilt-series frames f. This helps to stabilise the CTFs of the higher tilts, but it risks impairing the estimates of the CTFs of the more important lower tilts. Formally, this is done by minimising the following cost:

(30) C_glob = Σ_{f,j} e_fj(D_f)ᵀ R_fj e_fj(D_f) + λ Σ_f |D_f − D̂|².

Since the early tilt-series images carry more information than the later ones, their values in R are typically significantly greater. Therefore, using this formulation, they automatically assume a greater weight in the estimation. The optimal weight for the regulariser itself, λ, cannot be measured from the data; it depends on how reproducible the defocus values are for each of the tilt-series images, which in turn depends on the microscope setup, such as the stability of the stage.

Geometric refinement

Analogously to Bayesian polishing, the log-likelihood of a particle being observed at a position s is given by twice its cross-correlation with the predicted image:

(31) log(P(X|s)) = 2 CC(s)
(32) CC = IFT( w X† V(P) )
(33) w_j = 1/σ_j²,

where IFT stands for the inverse Fourier transform.

To keep the problem differentiable, the cross-correlation CC is always accessed using cubic interpolation. After the inverse Fourier transformation, each such cross-correlation table is cropped to a smaller size to make storing many such tables feasible, and the memory throughput efficient. The size of the tables can be controlled by the user, and should be set to the maximal positional error expected in the data set.
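The cropped cross-correlation tables of Equations 31–33 can be sketched as follows; the sign convention of the conjugation and the cropping layout are illustrative choices:

```python
import numpy as np

def cc_table(X, V, sigma2, crop):
    """Per-particle cross-correlation table of Eqs. 31-33: the inverse
    Fourier transform of the noise-weighted product of observed image X
    and (conjugated) prediction V, cropped around zero shift so that many
    such tables can be kept in memory. Twice its value at shift s is the
    log-likelihood of the particle sitting at s. Illustrative sketch."""
    cc = np.fft.ifft2(X * np.conj(V) / sigma2).real
    # bring the wrapped-around negative shifts next to the positive ones,
    # then keep a (2*crop+1)^2 window around zero shift
    cc = np.roll(cc, (crop, crop), axis=(0, 1))
    return cc[:2 * crop + 1, :2 * crop + 1]
```

The peak of the returned table indicates the most likely 2D shift of the particle, provided the true shift lies within the cropped window.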

The geometrical model that is optimised this way projects points s ∈ ℝ³ in the tomogram to 2D positions p_f ∈ ℝ² in each tilt image:

(34) p_f = W(l_f)
(35) l_f = J_f [s; 1]

The initial linear projection l_f is obtained by multiplying s with a 2×4 matrix J_f, and then optionally shifted by the non-linear image distortion W. The cost C_align that is being minimised consists of the sum over all (negative) cross-correlation values of all particles in all images plus all regularisers for all regularised parameters:

(36) C_align = − Σ_{p,f} CC( p_f − p_f⁽⁰⁾ ) + (sum of all regularisers)

Although our framework supports arbitrary projection matrices J_f, our optimisation algorithm only looks for orthogonal rotations to the initial projection matrix. This is achieved by parametrising that rotation using Tait–Bryan angles, not Euler angles. The disadvantage of Euler angles is that they are gimbal locked in the initial configuration where all three angles are zero, that is, the first and third angles refer to the same axis. The rigid alignment of the tilt image is never regularised because we do not assume any prior information on it.

The distortion field W can take different forms. We have implemented models that express the distortion using the Fourier basis, a cubic spline basis, and an affine linear one. The intended purpose of these deformations is to model distortions of the image that arise at the image forming lenses at the bottom of the optical system. An imperfect calibration of these lenses is likely to go unnoticed as long as the microscope is only used for single-particle analysis because the same particle is never observed at starkly different positions during the collection of a single-particle movie. In tomography, a given particle may appear at any image position in any tilt image, so arbitrary deformations to the 2D image become relevant. We expect these deformations to be stable over time, so the intended purpose of the deformation field W is to model only one such deformation per tilt series. Optionally, we also allow the user to instead model a different deformation for each tilt image, but we have not encountered any data sets where this has produced an improvement. The deformation fields are optionally regularised by penalising the squared coefficients of the respective model. This limits the extent of deformation, and it forces the system to explain changes in position through particle motion, rather than image deformations.

The quantity that we do expect to change during the collection of the tilt series is instead the 3D position of the particle, s_f. Analogously to Bayesian polishing, we model this change as motion over time. The position in image f is given by the sum over the per-tilt-series-frame velocities v_f ∈ ℝ³ up to that point. Note that the velocity vector v_f refers to the motion between images f and f+1:

(37) s_f = s_0 + Σ_{f′ < f} v_{f′}

It is important to note that the tilt images are implicitly assumed to be ordered chronologically. In practice, this is usually not given, so the images are reordered by the program based on the cumulative radiation dose of each image.

As in Bayesian polishing, the motion vectors themselves are expressed in a collective coordinate system for all the particles. This allows the spatial smoothness of a hypothetical motion field to be evaluated and used as a prior. For a more detailed derivation, we refer to the paper on Bayesian polishing for single-particle analysis (Zivanov et al., 2019). The formal details will be given in the following for the sake of completeness and to highlight differences to the original formulation.

The collective coordinate system for particle motion is obtained through a low-rank approximation of a Gaussian process. This is done by constructing and then diagonalising the P×P covariance matrix S for a set of initial particle positions (where P is the number of particles). The entries of S contain the value of the following square-exponential covariance function for each pair of particles p and q:

(38) S_{p,q} = σ_V² exp( −|s_p − s_q|² / σ_D ).

Optionally, the user can instead also use the original formulation without the square inside the exponential:

(39) S_{p,q} = σ_V² exp( −|s_p − s_q| / σ_D ).

The former option forces particles in immediate proximity to move more similarly, but it allows for a greater discrepancy at greater distances. Both the single-particle and the tomography implementations allow the user to choose either function, but the default has changed from the latter to the former in tomography. This choice was motivated by both empirical observations and the fact that the square-exponential kernel produces fewer meaningful deformation components, which speeds up the optimisation for tomograms with hundreds of particles.

We perform a singular-value decomposition of the covariance matrix S,

(40) S = U Λ Wᵀ,

which allows us to construct the coordinate system as follows:

(41) b_i = √λ_i w_i,

where λ_i is the ith singular value and w_i ∈ ℝ^P the corresponding singular vector. Basis vectors with small λ_i are discarded here to speed up the optimisation. Let P′ represent the number of remaining basis vectors. In this coordinate system, the set of all particle velocities in a tilt image, V_f ∈ (ℝ³)^P, can be expressed as V_f = B Q_f, where Q_f is a P′×3 coefficient matrix that encodes the velocity in three spatial dimensions. Note that the same basis B is shared between all three dimensions and all tilt-series frames. In this coordinate system, the negative log-likelihood of a configuration of particle velocities is given by the squared Frobenius norm of the coefficient matrix, ‖Q_f‖², that is, the sum of the squares of its entries. Therefore, the motion regulariser takes a simple form:

(42) R_motion = Σ_f ‖Q_f‖².

The acceleration regulariser that would penalise changes in velocity from one tilt-series frame to the next in single-particle analysis has been omitted from tomography. This is because, unlike a single-particle movie, the tilt images are not collected in one continuous exposure. Since they are being exposed individually, there is no reason to assume that the particle velocities will be continuous between them. Two further differences to the original Bayesian polishing are hidden in the notation: the covariance is now based on the 3D distances between the particles, and the coefficient matrix Q contains three columns instead of two.
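Putting Equations 38–41 together, the low-rank motion coordinate system can be sketched as follows. This is an illustrative reading: we assume the basis vectors are scaled by the square roots of the singular values, so that unit coefficients correspond to the prior, and use a fixed truncation in place of the program's singular-value criterion:

```python
import numpy as np

def motion_basis(positions, sigma_v, sigma_d, keep):
    """Low-rank motion basis of Eqs. 38-41: build the square-exponential
    covariance between the 3D particle positions, diagonalise it, and keep
    the 'keep' strongest basis vectors b_i = sqrt(lambda_i) * w_i
    (columns of the returned matrix B). Illustrative sketch."""
    diff = positions[:, None, :] - positions[None, :, :]
    S = sigma_v ** 2 * np.exp(-np.sum(diff ** 2, axis=-1) / sigma_d)  # Eq. 38
    U, lam, Wt = np.linalg.svd(S)                  # S = U diag(lam) W^T, Eq. 40
    B = np.sqrt(lam[:keep]) * Wt[:keep].T          # scaled singular vectors
    return B

def apply_motion(B, Q):
    """Velocities of all P particles in one tilt image: V_f = B Q_f,
    a P x 3 array; the regulariser penalises the squares of Q's entries
    as in Eq. 42."""
    return B @ Q
```

With no truncation, B Bᵀ reproduces the covariance S exactly; discarding components with small λ_i trades a little prior fidelity for a much smaller optimisation problem.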

As in the original Bayesian polishing approach, the complete alignment of the tilt series is performed by finding parameters that minimise Calign from Equation 36 using L-BFGS (Liu and Nocedal, 1989). The set of parameters that are being optimised always includes the three Tait–Bryan angles for each tilt image and the set of initial particle positions. The latter are essential because all the information we have about their 3D positions is derived from the tilt images themselves, so changing the alignment requires the particles to be able to shift to more likely positions. Estimating the image deformations and particle motion is optional. If they are being estimated, then a set of deformation coefficients is fitted either to each tilt image or to each tilt series, while a set of motion coefficients is fitted to each image transition.

In addition to this local, L-BFGS-based refinement, we also offer two methods to align only the 2D shifts of all tilt images globally. This means that instead of trying to obtain the optimal alignment through small changes to the initial one, we instead look for the best possible image shift overall, keeping all other parameters constant. This is helpful when individual tilt images are so badly aligned that a local optimisation cannot converge to the globally optimal position. Note that the initially assumed angles are rarely as incorrect as the image shifts since the angles can be controlled more effectively through the experimental setup.

There are two variants to this method. If the sample contains few particles per tomogram, then the best results are obtained by predicting an entire micrograph and computing its cross-correlation with the original one. The maximum value in that large cross-correlation image then indicates the optimal image shift. This approach can in theory deal with arbitrarily large misalignments. If the sample is dense, however, then this whole-micrograph approach can fail. In that case, better results are obtained by instead adding up the small, per-particle cross-correlation images defined in Equation 31, and finding the maximum in that sum. This latter approach can only correct for misalignments smaller than half the box size of the particle, and it often produces inferior results on samples with few particles per tomogram.
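The whole-micrograph variant can be sketched as a single cross-correlation followed by a peak search; plain, unweighted correlation is used here for illustration:

```python
import numpy as np

def global_shift(observed, predicted):
    """Global 2D alignment of one tilt image: cross-correlate the predicted
    micrograph with the observed one and return the integer shift that
    maximises the correlation, as in the whole-micrograph variant described
    in the text. Illustrative sketch."""
    cc = np.fft.ifft2(np.fft.fft2(observed) * np.conj(np.fft.fft2(predicted))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # map wrapped-around indices to signed shifts
    return tuple(p - s if p > s // 2 else p for p, s in zip(peak, cc.shape))
```

Because the full correlation image is searched, this variant can recover arbitrarily large misalignments, at the price of being confused by dense samples, as noted above.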

Appendix 2

Appendix 2—table 1
Electron cryo-tomography (Cryo-ET) data collection, refinement, and validation statistics.
                                   HIV-1 Gag                        S-layer inner domain                   COPII inner coat
                                   (EMD-16207/EMD-16209)            (EMD-16183; PDB 8BQE)                  (EMD-15949; PDB 8BSH)

Data collection
Microscope                         Titan Krios                      Titan Krios                            Titan Krios
Detector                           K2 (Gatan)                       K2 (Gatan)                             K2 (Gatan)
Software                           SerialEM (Mastronarde, 2003)     SerialEM (Mastronarde, 2003)           SerialEM (Mastronarde, 2003)
Voltage (kV)                       300                              300                                    300
Slit width (eV)                    20                               20                                     20
Defocus range (μm)                 −1.5 to −5.0                     −1.5 to −5.0                           −1.5 to −4.5
Pixel size (Å)                     1.35                             1.35                                   1.33
Total exposure (e−/Å²)             120–145                          ~140                                   ~120
Exposure per tilt (e−/Å²)          3.0–3.5                          3.4                                    2.9
Total number of tilts              41                               41                                     41
Frames per tilt movie              8–10                             10                                     10
Tilt increment (°)                 ±3                               ±3                                     ±3
Tilt-series scheme                 Dose-symmetrical                 Dose-symmetrical                       Dose-symmetrical
Tilt range (°)                     ±60                              ±60                                    ±60
Tilt series (no.)                  5/43                             110                                    137

Data processing
Tilt-series alignment software     IMOD (Kremer et al., 1996)       IMOD (Kremer et al., 1996)             Dynamo (Castaño-Díez et al., 2012)
CTF estimation software            CTFPLOTTER (Xiong et al., 2009)  CTFFIND4 (Rohou and Grigorieff, 2015)  CTFFIND4 (Rohou and Grigorieff, 2015)
Particle images (no.)              12,910/144,275                   42,990                                 106,533
Pre-cropped box size (pix)         512                              600                                    512
Final box size (pix)               192                              180                                    196
Pixel size of final rec. (Å)       1.35                             1.35                                   1.33
Symmetry imposed                   C6                               C6                                     C1
Map resolution (Å)                 3.2/3.0                          3.5                                    3.8
FSC threshold                      0.143                            0.143                                  0.143
Map resolution range (Å)           3.2–4.3/3.0–3.5                  3.5–4.8                                3.8–7.2
Map sharpening B factor (Å²)       −85/−95                          −75                                    −106

Model refinement
Initial model used (PDB code)      –                                6T72                                   6GNI
Software                           –                                PHENIX (Afonine et al., 2018)          Isolde (Croll, 2018) and PHENIX (Afonine et al., 2018)
Model resolution (Å)               –                                3.6                                    4.0
FSC threshold                      –                                0.5                                    0.5
Model composition
  Non-hydrogen atoms (no.)         –                                11,274                                 13,635
  Protein residues (no.)           –                                1452                                   1729
R.m.s. deviations
  Bond lengths (Å)                 –                                0.001                                  0.004
  Bond angles (°)                  –                                0.322                                  0.858
Validation
  MolProbity score                 –                                1.13                                   2.01
  Clashscore                       –                                3.42                                   14.71
  Poor rotamers (%)                –                                0                                      0.66
  Cβ outliers (%)                  –                                0                                      0.00
  CABLAM outliers (%)              –                                0.84                                   2.33
Ramachandran plot
  Favoured (%)                     –                                98.8                                   95.1
  Allowed (%)                      –                                1.2                                    4.8
  Disallowed (%)                   –                                0                                      0.1
Appendix 2—table 2
Computational costs and hardware specifics.
                                   HIV-1 Gag                    S-layer inner domain           COPII inner coat

Tilt series (no.)                  5/43                         110                            137
Final particle images (no.)        12,910/144,275               42,990                         106,533
Pre-cropped box size (pix)         512                          600                            512
Final box size (pix)               192                          180                            196

Computational costs
Pseudo-subtomogram generation
  Compute time                     21 min/40 min                34 min                         67 min
  Number of CPU nodes              1/1                          1                              1
  Disk space                       343 GB/3.8 TB                777 GB                         3.1 TB
Refine3D
  Compute time                     18 hr*/33 hr                 14 hr                          57 hr
  Number of GPU nodes              1/1                          1                              1
CTF refinement
  Compute time                     15 min*/35 min               2 hr                           2 hr
  Number of CPU nodes              1/1                          1                              1
  Disk space                       32 MB/247 MB                 621 MB                         673 MB
Frame alignment
  Compute time                     2 hr*/12 hr                  2 hr                           6 hr
  Number of CPU nodes              1/1                          1                              1
  Disk space                       383 MB/4.1 GB                1.9 GB                         2.2 GB

Hardware specifics
CPU nodes
  CPU model                        2× Intel Xeon E5-2698 v4     2× Intel Xeon 6258R            1× AMD EPYC 7H12
  CPU memory                       512 GB                       754 GB                         256 GB
GPU nodes
  CPU model                        2× Intel Xeon Silver 4116    2× Intel Xeon E5-2667 v4       1× AMD EPYC 7H12
  CPU memory                       384 GB                       256 GB                         256 GB
  GPU model                        2× Nvidia Quadro RTX 5000    4× Nvidia GeForce GTX 1080 Ti  4× Nvidia RTX A6000

* These calculations were performed using the same hardware as for the S-layer inner domain.

Data availability

We have only used previously published cryo-EM data sets for testing our software. Reconstructed maps and atomic models generated in this study have been submitted to the EMDB and PDB, with entry codes as indicated in Table 1.

The following previously published data sets were used
    1. Schur FK
    2. Obr M
    3. Hagen WJ
    4. Wan W
    5. Jakobi AJ
    6. Kirkpatrick JM
    7. Sachse C
    8. Kräusslich HG
    9. Briggs JA
    (2018) EMPIAR
    ID EMPIAR-10164. Cryo-electron tomography of immature HIV-1 dMACANC VLPs.

References

Decision letter

  1. Edward H Egelman
    Reviewing Editor; University of Virginia, United States
  2. Volker Dötsch
    Senior Editor; Goethe University, Germany
  3. Alberto Bartesaghi
    Reviewer; Duke University, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "A Bayesian approach to single-particle electron cryo-tomography in RELION-4.0" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Volker Dötsch as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Alberto Bartesaghi (Reviewer #1).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

(1) Half the citations in the manuscript are self-references (16 out of 32!), and important/relevant work on SPT was left out and should be referenced: Himes et al., 2018 (https://doi.org/10.1038/s41592-018-0167-z), Chen et al., 2019 (https://doi.org/10.1038/s41592-019-0591-8), and Sanchez et al., 2019 (https://doi.org/10.23919/EUSIPCO.2019.8903041). Also, with the exception of their own sub-volume averaging approach [4], the vast body of work on missing-wedge compensation was not referenced in the introduction.

(2) In the section on "Orientation priors", they state that one advantage of the proposed approach is that "the coordinate system of the pseudo subtomograms themselves can be chosen arbitrarily" and that "this not only accelerates the refinement" but also "makes it possible to solve more challenging structures because fewer solutions are allowed". This advantage, however, is certainly not specific to the "pseudo-subtomogram" data representation proposed here. Indeed, this strategy was originally proposed in the context of single-particle cryo-EM by Jiang et al., 2001 (https://doi.org/10.1006/jsbi.2001.4376) and was first used for processing cryo-ET data by Forster et al., 2005 (https://doi.org/10.1073/pnas.040917810), followed by many others. This point should be contextualized and the appropriate references included.

(3) Results on the HIV-1 Gag benchmark dataset (EMPIAR-10164) have two components: (1) processing of a "standard" subset consisting of 5 tilt-series, and (2) processing of the entire dataset (43 tilt-series). As far as I know, the best reconstruction obtained using the five tilt-series is a 3.3 Å map obtained using emClarity (https://www.ebi.ac.uk/emdb/EMD-13354). Since only lower resolution results obtained with NovaCTF (3.9 Å) and Warp (3.8 Å) are cited in the text, my guess is that the authors were not aware of this newer result. Given that the resolution of the emClarity map (3.3 Å) is similar to the one presented here (3.2 Å), it would be useful to compare their reconstruction against emClarity's EMD-13354 (both in terms of map features and FSC).

(4) When processing the entire dataset, a 3.0 Å resolution map was obtained which matches the resolution obtained by Warp/M [24]. This result, however, is "not shown" (page 5). Since this is the only benchmark dataset analyzed in this study and given that many other packages have analyzed the same data, this map should be presented together with the corresponding FSC curve, data processing details, and comparisons made against M's 3.0 Å EMD-11655 map.

(5) No experiments are presented to validate the ability of the approach to correct for higher-order aberrations. Since "higher-order aberration correction will probably be most useful for reconstructions that extend beyond 3 Å resolution", why didn't they validate their approach using one of the sub-3 Å cryo-ET datasets available on EMPIAR (e.g., EMPIAR-10491)?

(6) It would be useful to include a table with a summary of all the relevant data collection/data processing parameters (e.g., pixel size, detector, number of tilt-series and sub-volumes, symmetry, resolution, number of refinement rounds, etc.) for all datasets analyzed in this study.

(7) Only the 3.8 Å map for the COPII inner coat has been deposited in the EMDB. All other maps and related metadata need to be deposited as well.

(8) Computational complexity is a very important topic in the context of SPT, but no details are included in the text other than the following statement: "Compared to M, RELION-4.0 uses computationally more efficient algorithms that do not require the computational power of a GPU". Does this imply that computationally inefficient algorithms require the use of GPUs? If the authors want to comment on computational complexity (which I think would be very important to do), actual numbers need to be presented, such as running times, compute resources used, storage, etc.

(9) Regarding the use of a "true 2D approach" vs. "pseudo-volumes" (page 6), the following claim is made: "For example, the current implementation could be used to process pairs of tilted images, but a 2D implementation would be more efficient". Does this refer to efficiency in computing terms, storage, or both? Also, the authors should comment on the storage requirements of their 3D approach. Since the pseudo-tomograms have three components (Eqs. 8-10), does this mean that they require 3x more storage compared to traditional subtomogram averaging? This would be an important practical consideration for prospective users of this tool and should be adequately discussed.

(10) In general, the use of the term "particle polishing" could lead to misinterpretations since it means different things in SPA and tomography. In SPA, "polishing" refers to the analysis of intermediate video frames (which also exist in tomography), but the present study doesn't consider them at all. Also, I noticed that the term "frames" is used interchangeably to refer to SPA "video frames" and "particle tilts" in tomography, which adds to the confusion. For example, the term "frame alignment" is mentioned twice on page 6, but I believe the authors are really referring to "tilt image alignment" instead. This should be clarified and the terms "frame" and "tilts" should be used consistently throughout the manuscript.

(11) For the Caulobacter crescentus S-layer reconstruction, the new cryo-ET map corresponding to the inner domain of the S-layer seems to have even better resolution than the SPA reconstruction reported in the original work [26] (3.5 Å vs. 3.7 Å for EMD-10389). Since the authors say these maps are in "excellent agreement", a quantitative comparison should be included. Was the second LPS binding site visible in the 3.7 Å SPA map? In Figure 2d, the fit between LPS and the cryo-ET density (right-hand side) seems to be of lower quality (i.e., no density for parts of the LPS model is seen). Without having access to the maps, it is not possible to properly evaluate these results.

(12) The paper is well-written and clear, but the description of the new concepts and evaluation of the method is somewhat minimalistic. While the following additions may not be absolutely necessary, they would facilitate better understanding by a broader user community.

(i) A graph describing the workflow as implemented in RELION, e.g., as a flow chart, would increase the readability of the paper.

(ii) A figure visually describing the pseudosubtomograms would make the concept clearer to the reader.

(iii) A more in-depth description of the input files (alignment files from imod, particle lists, and orientation from Dynamo/elsewhere) needed to perform alignments and averaging in RELION using pseudosubtomograms would be beneficial to the community. This could be incorporated into a flowchart as suggested in (i).

(13) Along the same lines, the authors do not include any description of the particle-picking procedures or the software requirements. This seems to be important as the authors state "(particles) relative defoci are known from the geometry of the tilt series and the known 3D positions of the particles in the tomogram".

(14) In section 2.2 it is briefly discussed that rotational priors can be used to constrain orientational searches. As this is often very useful for particle alignment in cryo-ET, this feature and its implementation should be described in more detail.

(15) The progress in map quality is only shown by improvements to resolution in the FSC for the test datasets. It would be beneficial to also show the initial and intermediary maps for at least one of the test cases so that the reader can better assess the improvement in map quality. Overall maps should be shown to evaluate performance on the "whole particle" level, given considerations of flexibility (especially in cases where focused refinement was used). Corresponding local resolution maps would be informative in such cases.

(16) For the immature HIV capsid data, the FSC curves are only shown for the small subset of data. As it is an important point that the RELION pseudosubtomogram approach can reach the same resolution as previous software such as M, the data should also be shown for the full dataset (authors write "data not shown").

(17) In general, a proper comparison to the previous approaches/results, maybe by providing the maps for comparison in the figures, would be helpful for the reader to evaluate which approach to choose, based on their expectations and biological system.

(18) In section 4 it is briefly discussed that particle rotation is not modeled in the per-particle motion correction. They draw parallels to single-particle analysis and conclude that this might be an issue at resolutions better than 2 Å. However, could the authors comment on whether the difference in dose and time frame of data collection, as well as the thickness of the sample, may affect the assumption that this is only an issue at higher resolution?

(19) Many subtomogram averages are resolved at a more moderate resolution (e.g. 8-20 Å) than the maps presented in this study for various reasons. It would be beneficial for the reader if the authors could comment on/discuss which optical and geometrical parameters can be confidently modeled at a lower resolution and whether they improve resolution.

(20) Could the authors comment on whether it is possible to extract the refined optical and geometrical parameters for a set of particles from a tilt series, and apply them to a different set of particles from the same tilt series (or to the full tilt series to obtain a better quality tomogram for new particle picking).

(21) Along the same line: would it be possible to refine two distinct species (e.g. two distinct conformations of the same protein or two completely unrelated proteins) together in a multi-species refinement of optical and geometrical parameters? As the overall tilt series alignment and imaging parameters are the same, this might aid in refinement for lower abundance.

(22) On a more conceptual level, the reasoning to separate the SPA/Cryo-ET workflows completely in the software package remains unclear to me and it would be valuable if the authors could briefly discuss this.

(23) Can the authors comment on the statement "The data model assumes independent Gaussian noise on the Fourier components of the cryo-EM images of individual particles" with respect to the more complex in situ data, where noise is expected to be more structured?

https://doi.org/10.7554/eLife.83724.sa1

Author response

Essential revisions:

(1) Half the citations in the manuscript are self-references (16 out of 32!), and important/relevant work on SPT was left out and should be referenced: Himes et al., 2018 (https://doi.org/10.1038/s41592-018-0167-z), Chen et al., 2019 (https://doi.org/10.1038/s41592-019-0591-8), and Sanchez et al., 2019 (https://doi.org/10.23919/EUSIPCO.2019.8903041). Also, with the exception of their own sub-volume averaging approach [4], the vast body of work on missing-wedge compensation was not referenced in the introduction.

We have included the references above, and multiple references to alternative approaches for subtomogram averaging and approaches for missing-wedge correction. The revised manuscript now contains 53 references.

(2) In the section on "Orientation priors", they state that one advantage of the proposed approach is that "the coordinate system of the pseudo subtomograms themselves can be chosen arbitrarily" and that "this not only accelerates the refinement" but also "makes it possible to solve more challenging structures because fewer solutions are allowed". This advantage, however, is certainly not specific to the "pseudo-subtomogram" data representation proposed here. Indeed, this strategy was originally proposed in the context of single-particle cryo-EM by Jiang et al., 2001 (https://doi.org/10.1006/jsbi.2001.4376) and was first used for processing cryo-ET data by Förster et al., 2005 (https://doi.org/10.1073/pnas.040917810), followed by many others. This point should be contextualized and the appropriate references included.

This is a misunderstanding. We are not claiming that performing local orientational searches is new in our approach. Only the option to arbitrarily pre-orient the pseudo-subtomograms is new. This makes expressing local angular searches in terms of the Euler angles more straightforward. We have completely rewritten this section to reflect this more clearly; it now reads:

“Pre-oriented priors

Many proteins are organised in imperfect 2D arrays inside the tomograms, for example inside membranes or as part of capsid-like structures. Often, the individual protein molecules inside these arrays exhibit limited rotational freedom with respect to the surface normal of the array, although they may be able to rotate freely around that normal. This knowledge is often exploited in subtomogram averaging approaches through local orientational searches, e.g. see Förster et al., 2005. This not only accelerates the refinement, as fewer orientations need to be evaluated, but it also makes it possible to solve more challenging structures because fewer solutions are allowed. In RELION, local orientational searches are implemented as Gaussian priors on the Cartesian translations and on the three Euler angles that describe rotations (Scheres, 2012). One advantage of using pseudo subtomogram alignment is that the coordinate system of the pseudo subtomograms themselves can be chosen arbitrarily. By default, pseudo subtomograms are created in the same orientation as the tomogram, but the user can choose to orient them in a more meaningful way. For example, by constructing the pseudo subtomograms with their Z-axis parallel to the 2D array, using a rotational prior of approximately 90° on the tilt angle will limit the amount of rocking of the particles inside the array, while avoiding singularities in the definition of the Euler angles that occur when the tilt angle is close to 0°.”
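To illustrate how such a Gaussian rotational prior behaves, here is a minimal sketch. This is not RELION's implementation; the 90° centre follows the quoted text, while the 10° standard deviation is a hypothetical value chosen for illustration.

```python
import math

def tilt_angle_prior(tilt_deg, mu_deg=90.0, sigma_deg=10.0):
    """Unnormalised Gaussian prior on the tilt Euler angle.

    Centring the prior at 90 degrees keeps the particle Z-axis in the
    plane of the 2D array and stays away from the Euler-angle
    singularity at a tilt of 0 degrees. sigma_deg (hypothetical here)
    controls how much 'rocking' of the particles is tolerated.
    """
    d = tilt_deg - mu_deg
    return math.exp(-0.5 * (d / sigma_deg) ** 2)

print(tilt_angle_prior(90.0))  # 1.0: the preferred orientation
print(tilt_angle_prior(60.0))  # ~0.011: strong rocking is heavily penalised
```

In a refinement, such a prior would weight the likelihood of each candidate orientation, so orientations far from the array normal contribute negligibly and the effective search space shrinks.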

(3) Results on the HIV-1 Gag benchmark dataset (EMPIAR-10164) have two components: (1) processing of a "standard" subset consisting of 5 tilt-series, and (2) processing of the entire dataset (43 tilt-series). As far as I know, the best reconstruction obtained using the five tilt-series is a 3.3 Å map obtained using emClarity (https://www.ebi.ac.uk/emdb/EMD-13354). Since only lower resolution results obtained with NovaCTF (3.9 Å) and Warp (3.8 Å) are cited in the text, my guess is that the authors were not aware of this newer result. Given that the resolution of the emClarity map (3.3 Å) is similar to the one presented here (3.2 Å), it would be useful to compare their reconstruction against emClarity's EMD-13354 (both in terms of map features and FSC).

We have included a reference to the new emClarity map in the Results section on the 5-tomogram HIV data set. A comparison between our map and the emClarity map is now shown in Figure 1 —figure supplement 2. The RELION map is better than the emClarity map, which used a different, more optimistic mask for its original resolution estimation.

(4) When processing the entire dataset, a 3.0 Å resolution map was obtained which matches the resolution obtained by Warp/M [24]. This result, however, is "not shown" (page 5). Since this is the only benchmark dataset analyzed in this study and given that many other packages have analyzed the same data, this map should be presented together with the corresponding FSC curve, data processing details, and comparisons made against M's 3.0 Å EMD-11655 map.

We now show these results and a comparison with the map from M in Figure 1 —figure supplement 3. The two maps are very similar.

(5) No experiments are presented to validate the ability of the approach to correct for higher-order aberrations. Since "higher-order aberration correction will probably be most useful for reconstructions that extend beyond 3 Å resolution", why didn't they validate their approach using one of the sub-3 Å cryo-ET datasets available on EMPIAR (e.g., EMPIAR-10491)?

Higher-order aberration correction has now been used on a sub-2 Å resolution apo-ferritin test data set from a different collaborator. “Unfortunately”, their scope was so well-aligned that not even at this resolution did aberration correction lead to an improvement. Therefore, and because these collaborators are planning on writing their own publication about these results, we decided not to include these data in our revision.

We merely describe aberration correction for completeness and to make our potential readers aware of the available tools, as we do not intend to write a separate paper on this functionality. We also note that the application of fast, multi-shot data acquisition techniques for tomography that use beam shifts (like FISE and BISECT) may lead to wider applicability in the future, or at least for those who do not align their microscope optimally…

(6) It would be useful to include a table with a summary of all the relevant data collection/data processing parameters (e.g., pixel size, detector, number of tilt-series and sub-volumes, symmetry, resolution, number of refinement rounds, etc.) for all datasets analyzed in this study.

We now include data acquisition details for each data set in Table 1.

(7) Only the 3.8 Å map for the COPII inner coat has been deposited in the EMDB. All other maps and related metadata need to be deposited as well.

The 5-tomo (EMD-16207) and 43-tomo (EMD-16209) maps of the HIV capsid lattice and the map and model for the Caulobacter crescentus S-layer (EMD-16183, PDB 8bqe) have now been submitted. We have also submitted the model for the new COP-II map (PDB 8bsh).

(8) Computational complexity is a very important topic in the context of SPT, but no details are included in the text other than the following statement: "Compared to M, RELION-4.0 uses computationally more efficient algorithms that do not require the computational power of a GPU". Does this imply that computationally inefficient algorithms require the use of GPUs? If the authors want to comment on computational complexity (which I think would be very important to do), actual numbers need to be presented, such as running times, compute resources used, storage, etc.

We did not imply this. We have rephrased the statement about GPU to:

“Compared to M, RELION-4.0 uses computationally more efficient algorithms; M uses GPUs to accelerate the calculations.”

To give better insights into the computational costs, we now include details of image processing requirements for all three data sets in Table 2.

(9) Regarding the use of a "true 2D approach" vs. "pseudo-volumes" (page 6), the following claim is made: "For example, the current implementation could be used to process pairs of tilted images, but a 2D implementation would be more efficient". Does this refer to efficiency in computing terms, storage, or both? Also, the authors should comment on the storage requirements of their 3D approach. Since the pseudo-tomograms have three components (Eqs. 8-10), does this mean that they require 3x more storage compared to traditional subtomogram averaging? This would be an important practical consideration for prospective users of this tool and should be adequately discussed.

Yes, this is a good point: it would affect storage too, as the CTF² and multiplicity volumes take up space as well. To explain this better, the revised text now reads:

“The use of 3D pseudo-subtomograms allowed re-using existing code for subtomogram averaging in RELION, while input stacks of 2D images would have required significant software development efforts. Nevertheless, in the future we might still choose to implement a true 2D version of the code, which would be more efficient, both in terms of processing time and disk storage requirements. In cases where the number of tilt images is small in comparison to the box size, fewer Fourier pixels need to be examined in a stack of 2D images than in a pseudo-subtomogram, with a corresponding decrease in processing time. Moreover, the likelihood calculation from the 3D pseudo-subtomogram approach requires separate storage of the accumulated squares of the CTFs, and the corresponding multiplicity terms. In contrast, in the 2D approach, only the 2D images need to be stored, as CTF parameters can be calculated on-the-fly and there is no need for a multiplicity term, giving a corresponding decrease in storage requirements. However, if one were to collect tilt series with very fine angular increments or in a continuous manner, then the current implementation may still be preferable.”
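To make the storage argument concrete, a back-of-the-envelope comparison follows. This is a sketch with illustrative numbers only; the box size and tilt count are hypothetical and not taken from the datasets in the paper.

```python
def pseudo_subtomogram_values(box_size):
    """A pseudo-subtomogram stores three box-sized volumes:
    the data term, the accumulated CTF^2, and the multiplicity."""
    return 3 * box_size ** 3

def tilt_stack_values(box_size, n_tilts):
    """A 2D implementation would store one image per tilt;
    CTFs can be recomputed on the fly, so no extra volumes are needed."""
    return n_tilts * box_size ** 2

box, n_tilts = 128, 41  # hypothetical box size and number of tilt images
ratio = pseudo_subtomogram_values(box) / tilt_stack_values(box, n_tilts)
print(round(ratio, 1))  # 9.4: the 3D representation holds ~9x more values
```

The ratio simplifies to 3·box_size/n_tilts, so it grows with the box size; this is why the 2D representation would win except for tilt series collected with very fine angular increments or continuously.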

(10) In general, the use of the term "particle polishing" could lead to misinterpretations since it means different things in SPA and tomography. In SPA, "polishing" refers to the analysis of intermediate video frames (which also exist in tomography), but the present study doesn't consider them at all. Also, I noticed that the term "frames" is used interchangeably to refer to SPA "video frames" and "particle tilts" in tomography, which adds to the confusion. For example, the term "frame alignment" is mentioned twice on page 6, but I believe the authors are really referring to "tilt image alignment" instead. This should be clarified and the terms "frame" and "tilts" should be used consistently throughout the manuscript.

One complication here is that the terms “frame alignment” and “polishing” are already mentioned on the RELION-4.0 GUI and in its documentation. Still, we recognize the ambiguity of the word frame in our manuscript. Therefore, in all instances of the word “frame”, we explicitly refer to either “video frame” or “tilt series frame” in the revised version.

That said, the actual program that performs the “frame alignment” can, in fact, also work with video frames from the tilt series images. One problem here is that the motions of video frames within one tilt image are not regularized separately from those of the video frames in the other tilt images. This may explain why preliminary tests with the individual video frames did not yield worthwhile improvements. In the end, we decided not to expose the video-frame functionality to the end-user. We now explicitly mention in the Methods section:

“We also note that Bayesian Polishing in SPA describes particle motions between individual video frames. Although our approach for tomography can also consider video frames, the current implementation uses the same regularization of particle motions between video frames within each tilt image as between the video frames from other tilt images. Because preliminary tests showed limited benefits in considering the video frames in this manner, only the functionality to model particle motions between the tilt series images was exposed on the GUI.”

(11) For the Caulobacter crescentus S-layer reconstruction, the new cryo-ET map corresponding to the inner domain of the S-layer seems to have even better resolution than the SPA reconstruction reported in the original work [26] (3.5 Å vs. 3.7 Å for EMD-10389). Since the authors say these maps are in "excellent agreement", a quantitative comparison should be included. Was the second LPS binding site visible in the 3.7 Å SPA map? In Figure 2d, the fit between LPS and the cryo-ET density (right-hand side) seems to be of lower quality (i.e., no density for parts of the LPS model is seen). Without having access to the maps, it is not possible to properly evaluate these results.

All maps have now been submitted to the EMDB (also see point 7) and we have included a comparison with the previous 3.7 Å map in Figure 2 —figure supplement 1. Please note that the 3.7 Å cryo-EM single-particle map was produced from data collected from an in vitro reconstituted specimen, rather than from the S-layer on native membranes on cells. The second LPS binding site was never observed in vitro, illustrating why in situ imaging is vital for obtaining physiologically relevant insights.

(12) The paper is well-written and clear, but the description of the new concepts and evaluation of the method is somewhat minimalistic. While the following additions may not be absolutely necessary, they would facilitate better understanding by a broader user community.

(i) A graph describing the workflow as implemented in RELION, e.g., as a flow chart, would increase the readability of the paper.

(ii) A figure visually describing the pseudosubtomograms would make the concept clearer to the reader.

(iii) A more in-depth description of the input files (alignment files from imod, particle lists, and orientation from Dynamo/elsewhere) needed to perform alignments and averaging in RELION using pseudosubtomograms would be beneficial to the community. This could be incorporated into a flowchart as suggested in (i).

(13) Along the same lines, the authors do not include any description of the particle-picking procedures or the software requirements. This seems to be important as the authors state "(particles) relative defoci are known from the geometry of the tilt series and the known 3D positions of the particles in the tomogram".

This paper describes the 3D pseudo-subtomogram approach with its correspondingly modified refinement algorithm, as well as the new functionality to perform CTF refinement and tilt series frame alignment in RELION-4.0. Unfortunately, RELION-4.0 does not yet provide a smooth pipeline for the entire processing procedure that would be captured well in a flow chart. Much improved import procedures and a smoother pipeline are the topic of ongoing work, which will be implemented in RELION-4.1. We have modified the Discussion to reflect this more clearly:

“However, tilt series alignment, tomogram reconstruction and particle picking are not yet part of the RELION workflow. Efforts to also implement solutions for those steps in a single tomography processing pipeline are ongoing and will be part of future RELION releases. Meanwhile, current import procedures rely on specific pre-processing operations in IMOD (Kremer et al., 1996), and particle coordinate conversion tools to use in RELION-4.0 are available for a range of third-party software packages (Pyle et al., 2022).”

(14) In section 2.2 it is briefly discussed that rotational priors can be used to constrain orientational searches. As this is often very useful for particle alignment in cryo-ET, this feature and its implementation should be described in more detail.

Performing local angular searches is done using RELION’s standard Gaussian-shaped priors on the Euler angles, which have been described previously. A reference to the corresponding paper has been inserted. Please also see our response to point (2).

(15) The progress in map quality is only shown by improvements to resolution in the FSC for the test datasets. It would be beneficial to also show the initial and intermediary maps for at least one of the test cases so that the reader can better assess the improvement in map quality. Overall maps should be shown to evaluate performance on the "whole particle" level, given considerations of flexibility (especially in cases where focused refinement was used). Corresponding local resolution maps would be informative in such cases.

We have inserted maps describing the stepwise improvement for the HIV immature capsid data set in Figure 1 —figure supplement 1.

(16) For the immature HIV capsid data, the FSC curves are only shown for the small subset of data. As it is an important point that the RELION pseudosubtomogram approach can reach the same resolution as previous software such as M, the data should also be shown for the full dataset (authors write "data not shown").

This has been done. See point (4).

(17) In general, a proper comparison to the previous approaches/results, maybe by providing the maps for comparison in the figures, would be helpful for the reader to evaluate which approach to choose, based on their expectations and biological system.

See points (3), (4) and (11).

(18) In section 4 it is briefly discussed that particle rotation is not modeled in the per-particle motion correction. They draw parallels to single-particle analysis and conclude that this might be an issue at resolutions better than 2 Å. However, could the authors comment on whether the difference in dose and time frame of data collection, as well as the thickness of the sample, may affect the assumption that this is only an issue at higher resolution?

The lower dose in tilt series images and the thicker samples will only make it more difficult to also model the rotations, as both will decrease signal-to-noise ratios in the images. The only advantage of tomographic data is that the distinct tilts provide complementary views of the same structure, which might benefit the rotational alignments.

(19) Many subtomogram averages are resolved at a more moderate resolution (e.g. 8-20 Å) than the maps presented in this study for various reasons. It would be beneficial for the reader if the authors could comment on/discuss which optical and geometrical parameters can be confidently modeled at a lower resolution and whether they improve resolution.

One would expect that shifts of the particles (during subtomogram alignment, but also during tilt series frame alignment) would still be reasonably well defined at such low-to-medium resolutions. Defocus refinement would probably be more difficult.

(20) Could the authors comment on whether it is possible to extract the refined optical and geometrical parameters for a set of particles from a tilt series, and apply them to a different set of particles from the same tilt series (or to the full tilt series to obtain a better quality tomogram for new particle picking).

This is not yet possible, but we have plans to implement this functionality in the future.

(21) Along the same line: would it be possible to refine two distinct species (e.g. two distinct conformations of the same protein or two completely unrelated proteins) together in a multi-species refinement of optical and geometrical parameters? As the overall tilt series alignment and imaging parameters are the same, this might aid in refinement for lower abundance.

Not yet; also see point (20).

(22) On a more conceptual level, the reasoning to separate the SPA/Cryo-ET workflows completely in the software package remains unclear to me and it would be valuable if the authors could briefly discuss this.

The workflows are not separated completely. In fact, their implementation in RELION-4.0 is very much intertwined. This will aid newcomers to the field of tomography, who may already be familiar with SPA processing in RELION. The paper already makes this point.

(23) Can the authors comment on the statement "The data model assumes independent Gaussian noise on the Fourier components of the cryo-EM images of individual particles" with respect to the more complex in situ data, where noise is expected to be more structured?

In neither SPA nor tomography is this assumption perfect. In both modalities, neighboring particles will introduce noise that is localized in real-space, and for which correlations thus exist in Fourier space. However, taking such correlations into account would be computationally difficult and expensive.
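Under the independence assumption, the log-likelihood factorises into a sum of per-frequency terms, which is what keeps the computation cheap; a minimal sketch (the notation is illustrative, not the paper's code):

```python
def log_likelihood(observed, predicted, sigma2):
    """Log-likelihood (up to a constant) of observed Fourier components
    given predicted ones, under independent Gaussian noise with
    per-frequency variance sigma2. Structured in situ noise would
    introduce correlations between frequencies, requiring a full
    covariance matrix and far more computation."""
    return sum(-abs(o - p) ** 2 / (2.0 * s)
               for o, p, s in zip(observed, predicted, sigma2))

# Two Fourier components, noise variance 1.0 at each frequency:
print(log_likelihood([1 + 0j, 2 + 0j], [0j, 0j], [1.0, 1.0]))  # -2.5
```

With correlated noise, the sum over independent terms would be replaced by a quadratic form involving the inverse noise covariance, which no longer decomposes per frequency.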

https://doi.org/10.7554/eLife.83724.sa2

Article and author information

Author details

  1. Jasenko Zivanov

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. Laboratory of Biomedical Imaging (LIB), Lausanne, Switzerland
    3. BioEM lab, Biozentrum, University of Basel, Basel, Switzerland
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-8407-0759
  2. Joaquín Otón

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. ALBA Synchrotron, Barcelona, Spain
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-2195-4730
  3. Zunlong Ke

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-8408-850X
  4. Andriko von Kügelgen

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-0017-2414
  5. Euan Pyle

    Institute of Structural and Molecular Biology, Birkbeck College, London, United Kingdom
    Contribution
    Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4633-4917
  6. Kun Qu

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Contribution
    Formal analysis, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Dustin Morado

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Software, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Daniel Castaño-Díez

    1. BioEM lab, Biozentrum, University of Basel, Basel, Switzerland
    2. Instituto Biofisika, Leioa, Spain
    Contribution
    Supervision, Funding acquisition, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Giulia Zanetti

    Institute of Structural and Molecular Biology, Birkbeck College, London, United Kingdom
    Contribution
    Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing – original draft, Project administration, Writing – review and editing
    Competing interests
    Reviewing editor, eLife
    ORCID iD: 0000-0003-1905-0342
  10. Tanmay AM Bharat

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
    Contribution
    Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing – original draft, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-0168-0277
  11. John AG Briggs

    1. MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    2. Max Planck Institute of Biochemistry, Martinsried, Germany
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    briggs@biochem.mpg.de
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-3990-6910
  12. Sjors HW Scheres

    MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    scheres@mrc-lmb.cam.ac.uk
    Competing interests
    Reviewing editor, eLife
    ORCID iD: 0000-0002-0462-6540

Funding

UK Research and Innovation (MC_UP_A025_1013)

  • Sjors HW Scheres

UK Research and Innovation (MC_UP_1201/16)

  • John AG Briggs

European Research Council (ERC-CoG-2014 grant 648432)

  • John AG Briggs

European Research Council (ERC-StG-2019 grant 852915)

  • Giulia Zanetti

Swiss National Science Foundation (205321_179041/1)

  • Daniel Castaño-Díez

UK Research and Innovation (BBSRC grant BB/T002670/1)

  • Giulia Zanetti

European Research Council (ERC-AdG-2015 grant 692726)

  • Jasenko Zivanov

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to the MRC-LMB EM facility for help with data acquisition and to Jake Grimmett, Toby Darling, and Ivan Clayson for help with high-performance computing. This work was funded by the UK Research and Innovation (UKRI) Medical Research Council (MC_UP_A025_1013 to SHWS and MC_UP_1201/16 to JAGB); the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC-CoG-2014, grant 648432, MEMBRANEFUSION, to JAGB, and ERC-StG-2019, grant 852915, CRYTOCOP, to GZ); the Swiss National Science Foundation (grant 205321_179041/1 to DC-D); the Max Planck Society (to JAGB); and the UKRI Biotechnology and Biological Sciences Research Council (grant BB/T002670/1 to GZ). TAMB is a recipient of a Sir Henry Dale Fellowship, jointly funded by the Wellcome Trust and the Royal Society (202231/Z/16/Z). JZ was partially funded by the European Union’s Horizon 2020 research and innovation programme (ERC-AdG-2015, grant 692726, GlobalBioIm, to Michael Unser). TAMB thanks the Vallee Research Foundation, the Leverhulme Trust, and the Lister Institute of Preventive Medicine for support.

Senior Editor

  1. Volker Dötsch, Goethe University, Germany

Reviewing Editor

  1. Edward H Egelman, University of Virginia, United States

Reviewer

  1. Alberto Bartesaghi, Duke University, United States

Publication history

  1. Preprint posted: February 28, 2022 (view preprint)
  2. Received: September 26, 2022
  3. Accepted: December 4, 2022
  4. Accepted Manuscript published: December 5, 2022 (version 1)
  5. Accepted Manuscript updated: December 6, 2022 (version 2)
  6. Version of Record published: January 5, 2023 (version 3)

Copyright

© 2022, Zivanov et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

  1. Jasenko Zivanov
  2. Joaquín Otón
  3. Zunlong Ke
  4. Andriko von Kügelgen
  5. Euan Pyle
  6. Kun Qu
  7. Dustin Morado
  8. Daniel Castaño-Díez
  9. Giulia Zanetti
  10. Tanmay AM Bharat
  11. John AG Briggs
  12. Sjors HW Scheres
(2022)
A Bayesian approach to single-particle electron cryo-tomography in RELION-4.0
eLife 11:e83724.
https://doi.org/10.7554/eLife.83724