In situ single particle classification reveals distinct 60S maturation intermediates in cells

  1. Bronwyn A Lucas (corresponding author)
  2. Kexin Zhang
  3. Sarah Loerch
  4. Nikolaus Grigorieff (corresponding author)

  1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, United States
  2. Howard Hughes Medical Institute, Janelia Research Campus, United States

Abstract

Previously, we showed that high-resolution template matching can localize ribosomes in two-dimensional electron cryo-microscopy (cryo-EM) images of untilted Mycoplasma pneumoniae cells with high precision (Lucas et al., 2021). Here, we show that comparing the signal-to-noise ratio (SNR) observed with 2DTM using different templates relative to the same cellular target can correct for local variation in noise and differentiate related complexes in focused ion beam (FIB)-milled cell sections. We use a maximum likelihood approach to define the probability of each particle belonging to each class, thereby establishing a statistic to describe the confidence of our classification. We apply this method in two contexts to locate and classify related intermediate states of 60S ribosome biogenesis in the Saccharomyces cerevisiae cell nucleus. In the first, we separate the nuclear pre-60S population from the cytoplasmic mature 60S population, using the subcellular localization to validate assignment. In the second, we show that relative 2DTM SNRs can be used to separate mixed populations of nuclear pre-60S that are not visually separable. 2DTM can distinguish related molecular populations without the need to generate 3D reconstructions from the data to be classified, permitting classification even when only a few target particles exist in a cell.

Editor's evaluation

This paper explores the use of 2D high-resolution template-matching (2DTM) to locate and discriminate highly similar macromolecules within cryo-EM images of focused ion beam-milled cells. It demonstrates that differences in the 2DTM signal-to-noise ratios for located targets against multiple search templates can effectively segregate a mixed population of similar structures, as well as present a formal analysis strategy for probabilistic assignment of species within the mixed population. Because the identification of distinct structural states of macromolecular complexes inside the cell is a fundamental problem in 3D visual proteomics, this paper will be of broad interest to both structural and cell biologists.

https://doi.org/10.7554/eLife.79272.sa0

Introduction

Locating and characterizing molecules in cells is an important goal of molecular, structural, and cell biology. Cryogenic electron microscopy (cryo-EM) enables simultaneous visualization of all cellular molecules in their native cellular environment while preserving high-resolution molecular architecture. Therefore, cryo-EM holds the promise of delivering an atomistic view of the cell. However, realizing this promise is limited by the high density of molecules in a cell, and the low signal-to-noise ratio of cryo-EM micrographs, making it difficult to identify molecules of interest (Lučič et al., 2013). To address this, electron cryo-tomography (cryo-ET) can be used to build 3D maps of cellular structures in their native context (in situ) by reconstructing tomograms from a series of images of tilted 2D samples (Kürner et al., 2005; Lučič et al., 2013; Mahamid et al., 2016). In a tomogram, molecules overlapping in any given view can be separated and large molecular assemblies (particles) with distinctive shapes can be identified. Once identified, subtomogram averaging can yield in situ molecular structures at <4 Å resolution (Himes and Zhang, 2018; Tegunov et al., 2021). However, since the effective resolution of a raw tomogram is below 15–20 Å (Vilas et al., 2020), identification of specific targets in tomograms is limited to abundant particles that are sufficiently distinct at this resolution to be identified.

Many potential cell biological applications require accurate categorization of individual molecule identity at a specific subcellular localization. Examples are the characterization of the spatial organization of a biosynthetic process such as ribosome biogenesis, and the assignment of molecular identities in small volumes such as synapses and vesicles. 3D classification of subtomograms can differentiate between structural states (Himes and Zhang, 2018; Xue et al., 2021). However, the assignment of states is unreliable for similar structures that can only be distinguished using high-resolution detail, and statistical approaches to quantitatively assess classification results are lacking. Machine learning has been employed for particle classification in tomograms, but currently only performs as well as a human operator (Moebel et al., 2021). While machine learning algorithms perform better than 3D template matching at molecule localization in tomograms, classification remains challenging for all algorithms (Gubins et al., 2020). In situ molecule classification, therefore, remains a major challenge.

We recently described an alternate method to locate particles that may improve structural classification in cells. By using 2D cryo-EM images, rather than tomograms, and fine-grained, high-resolution template matching (2DTM), specific particles can be located in cells with high precision using their atomic structures (Lucas et al., 2021; Rickgauer et al., 2020; Rickgauer et al., 2017). 2DTM uses molecular models, from in vitro structure determination or in silico structure prediction (e.g. AlphaFold2; Jumper et al., 2021), to generate a 3D density. This 3D density (hereafter referred to as the template) is then used to calculate millions of 2D projections representing different orientations of the molecule. A pixel-wise cross-correlation of the 2D projections with a high-resolution 2D cryo-EM image is performed, yielding a 2DTM signal-to-noise ratio (SNR) of the best matching projection at every pixel location (Rickgauer et al., 2017). The 2DTM SNR values are subjected to a significance test by comparison to a Gaussian noise distribution following a matched filter, which is used to establish a threshold allowing a given number of false positives (Lucas et al., 2021; Rickgauer et al., 2017). In the following, we refer to targets passing this test as significant targets (Lucas et al., 2021; Rickgauer et al., 2017).
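The significance threshold described above can be illustrated with a short calculation. The sketch below assumes that every cross-correlation value in a search is an independent draw from a standard normal distribution, and solves for the SNR at which the expected number of noise peaks equals a chosen false-positive count. The function name and the example search-space size are hypothetical and chosen only for illustration; they are not taken from the 2DTM software.

```python
from statistics import NormalDist


def snr_threshold(n_correlations: float, n_false_positives: float = 1.0) -> float:
    """SNR above which roughly n_false_positives noise peaks are expected.

    Assumes each correlation value is an independent standard-normal draw.
    """
    tail_probability = n_false_positives / n_correlations
    # P(Z > t) = tail_probability  <=>  t = -inv_cdf(tail_probability)
    return -NormalDist().inv_cdf(tail_probability)


# Hypothetical search size (pixels x orientations x defocus steps).
example_n = 4.8e14
print(f"threshold = {snr_threshold(example_n):.2f}")
```

Under this assumption, a larger search space raises the threshold, and tolerating more false positives lowers it.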

The 2DTM SNR is proportional to template mass and negatively affected by non-matching elements between template and target (Lucas et al., 2021; Rickgauer et al., 2020; Rickgauer et al., 2017). We have shown that a template generated from a Bacillus subtilis 50S large ribosomal subunit was able to detect 50S in 2D cryo-EM images of Mycoplasma pneumoniae cells, but with a lower average 2DTM SNR compared to a M. pneumoniae 50S template (Lucas et al., 2021). This demonstrated that (1) 2DTM using partially matching templates can be sufficiently sensitive to yield significant targets and (2) the mean 2DTM SNR of detected targets provides a read-out of the relative similarity between different templates and populations of particle species.

In this study, we investigate whether the ratio of 2DTM SNRs obtained using different templates can be used to identify the template that more closely resembles each cellular target, and thereby classify single particles in cells. As a model system, we chose to examine the late stages of 60S ribosomal subunit biogenesis in the yeast Saccharomyces cerevisiae because (1) intermediates are of a similar size and share significant structure with one another, making them difficult to separate at low resolution, (2) molecular models spanning multiple late intermediate states have recently been described, and (3) the maturation events that occur before and after nuclear export have been characterized. Subcellular localization can thereby validate the assignment of intermediate and mature states.

We show that 2DTM can locate and distinguish nuclear intermediates of 60S maturation in 2D cryo-EM images of focused ion beam (FIB)-milled yeast cells. We confirm that 2DTM can distinguish predefined 60S populations separated by subcellular localization and identify compositional differences between them. We apply a maximum likelihood-based approach to identify two sub-populations of nuclear intermediates that were not otherwise separable and provide a confidence of single particle classification. We show that using this approach, we can observe a shift in the nuclear pre-60S intermediate population to a more mature intermediate after inhibiting Crm1-mediated nuclear export. This study demonstrates that different particle populations in cells can be modelled with a Gaussian distribution of their relative 2DTM SNR ratios to effectively distinguish related complexes and identify population changes in cells.

Results

2DTM identifies 60S in biologically relevant locations and orientations in FIB-milled lamellae

2DTM has been used to detect mammalian ribosomes in thin extensions of adherent cells (Rickgauer et al., 2020), and bacterial ribosomes in Mycoplasma pneumoniae cells (Lucas et al., 2021), both of which are sufficiently thin to permit imaging by transmission EM (TEM). Since most eukaryotic cells are too thick to image by TEM, FIB-milling is used to generate thin, electron-transparent lamellae of cryogenically frozen cells (Marko et al., 2007; Rigort et al., 2012; Villa et al., 2013).

To evaluate the utility of 2DTM to locate molecules in FIB-milled lamellae, we collected 28 2D cryo-EM images of the nuclear periphery of 7 lamellae generated from actively growing Saccharomyces cerevisiae cells (Figure 1, Figure 1—figure supplement 1A-B; Table 1). We identified 4363 large ribosomal subunits by 2DTM using a template generated from a model representing the mature 60S (PDB: 6Q8Y) (Tesina et al., 2019; Figure 1A–C). The peaks corresponding to significant detections were clearly distinguishable from background (Figure 1D and E, Figure 1—figure supplement 1C), enabling precise localization of mature 60S in the cell.

Figure 1 (with 3 supplements)
Detection of cytoplasmic mature 60S and mitochondrial ribosomes in 2D images of FIB-milled yeast lamella.

(A) Cryo-EM like density generated using the atomic coordinates of PDB: 6Q8Y that correspond with the mature 60S. (B) TEM image of the nuclear periphery from a FIB-milled yeast lamella. Yellow arrows indicate low-resolution features in the cytoplasm that may indicate the presence of ribosomes. Blue arrows indicate regions of similar size and contrast in the nucleoplasm. NE: nuclear envelope; NPC: nuclear pore complex. (C) Cryo-EM micrograph of yeast nuclear periphery from FIB-milled lamella with the results from a 2DTM search using the mature 60S template. Significant targets are indicated by mapping the template in the best matching locations and orientations (shown in yellow). The red box indicates the regions highlighted in (E) and (F). Scale bar = 50 nm. (D) 3D surface representation showing the pixel-wise 2DTM SNRs in the 50x50 pixel region of the normalized maximum intensity projection (MIP) indicated by the yellow box in (E). Each square represents 10x10 pixels. Colors represent the SNR value of each pixel as indicated by the scale bar below. (E) Normalized MIP showing the results of 2DTM using the template in (A) in the region of (C) indicated in red. (F) 3D slab indicating the locations and orientations of mature 60S-detected targets in the indicated region of (C). The red polypeptide indicates the location of the polypeptide exit tunnel on each 60S. (G) Bar chart indicating the number of mature 60S-detected targets identified in the indicated subcellular compartments in 28 images of the nuclear periphery. (H) Plot showing the density of mature 60S in the regions of the images corresponding to the nucleus (blue) or cytoplasm (red). Each dot represents a different image. The solid bar indicates the median. (I) 10 Å filtered 3D reconstruction calculated from 3991 60S subunits at the locations and orientations detected in 28 images, showing clear density for the 40S small subunit. 
The molecular model of the 60S used to generate the template in (A) is shown in yellow.

Table 1
Estimated thickness and calculated defocus per image.
Lamella #  Image file                     Est thickness (nm)  Defocus 1 (Å)  Defocus 2 (Å)  Angle
1          24_Mar11_13.30.44_1_0.mrc      179                 5577           5434           7.9
1          25_Mar11_13.32.53_3_0.mrc      170                 5943           5636           53.07
2          50_Mar11_15.04.14_36_0.mrc     132                 3626           3298           23.33
2          51_Mar11_15.07.03_38_0.mrc     109                 3944           3863           16.91
2          52_Mar11_15.10.54_40_0.mrc     117                 6491           6192           10.89
2          53_Mar11_15.21.33_42_0.mrc     205                 4906           4610           5.32
2          54_Mar11_15.24.09_44_0.mrc     205                 7701           6828           –76.32
2          55_Mar11_15.26.04_46_0.mrc     212                 5141           4794           23.45
2          56_Mar11_15.33.15_48_0.mrc     170                 7240           7093           82.03
2          57_Mar11_15.36.27_50_0.mrc     178                 5969           5807           –7.87
2          58_Mar11_15.38.03_52_0.mrc     186                 6445           6182           –75.84
3          115_Mar12_10.39.05_93_0.mrc    136                 3436           3230           –22.72
3          118_Mar12_10.46.25_99_0.mrc    141                 3291           3200           39.71
4          131_Mar12_11.32.55_127_0.mrc   114                 4532           4377           15.85
4          133_Mar12_11.37.56_131_0.mrc   130                 3252           2979           7.24
4          135_Mar12_11.42.53_135_0.mrc   93                  3041           2977           –76.83
4          138_Mar12_11.50.12_141_0.mrc   106                 3497           3404           79.97
5          141_Mar12_11.57.31_147_0.mrc   161                 6153           5600           –33.24
5          143_Mar12_12.02.15_151_0.mrc   152                 4552           4457           –36.2
5          146_Mar12_12.19.02_157_0.mrc   98                  3902           3868           61.58
5          147_Mar12_12.21.27_159_0.mrc   103                 4305           4066           16.81
5          148_Mar12_12.23.52_161_0.mrc   73                  3728           3604           –36.33
5          149_Mar12_12.26.15_163_0.mrc   78                  3696           3573           0.37
5          150_Mar12_12.28.45_165_0.mrc   98                  3723           3607           1.6
5          151_Mar12_12.31.16_167_0.mrc   114                 3771           3648           54.64
6          168_Mar12_13.11.12_199_0.mrc   147                 3157           3040           –57.66
6          171_Mar12_13.18.43_205_0.mrc   90                  2309           2102           31.47
7          6 A_Mar11_14.59.49_34_0.mrc    126                 5210           4391           2.1

To assess the specificity of 60S detection, we identified regions of the images corresponding to the cytoplasm, nucleus and vacuole by visual inspection. Consistent with the expected high specificity of 2DTM, we did not observe any significant mature 60S-detected targets in regions of the image corresponding to the vacuole (Figure 1C and G). In contrast, 229 mature 60S-detected targets localized to the nucleus, representing ~5% of all mature 60S-detected targets in these images, well above the expected one false positive per image (Figure 1C–G). We found that the values of the normalized maximum intensity projections (MIPs) have distributions that are similar in shape and in the location of their maxima across the regions of the image corresponding to the nucleus, cytoplasm, and vacuole. This indicates that the probability of a false positive is also similar in these different compartments (Figure 1—figure supplement 2A-E). The slight differences in the MIP distributions between the compartments likely result from differences in their composition. We conclude that false detections are not more likely in visibly darker or denser regions of the image.

In regions of the images corresponding to the cytoplasm we observe a median density of ~6500 60S/μm³, which, assuming an average cell volume of ~42 μm³ of which ~65% is cytoplasm, corresponds to a total of ~180,000 60S/cell (Figure 1H). This is consistent with prior estimates of 187,000 ± 56,000 ribosomes per yeast cell based on rRNA concentration (von der Haar, 2008). The median density in the nucleus was ~30-fold lower at ~200 60S/μm³ (Figure 1H). This agrees with the previously observed lower density of 60S particles in the nucleus (e.g. Delavoie et al., 2019), but likely only reflects a subset of the nuclear 60S population.
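The per-cell estimate follows directly from the quoted density and the assumed cell geometry. The short calculation below reproduces the arithmetic using only the values stated in the paragraph above; the variable names are illustrative.

```python
# Values quoted in the text above.
density_per_um3 = 6500         # median cytoplasmic 60S density (60S per cubic micrometre)
cell_volume_um3 = 42           # assumed average yeast cell volume (cubic micrometres)
cytoplasm_fraction = 0.65      # assumed cytoplasmic share of the cell volume
nuclear_density_per_um3 = 200  # median nuclear 60S density (60S per cubic micrometre)

cytoplasm_volume_um3 = cell_volume_um3 * cytoplasm_fraction
subunits_per_cell = density_per_um3 * cytoplasm_volume_um3
fold_difference = density_per_um3 / nuclear_density_per_um3

print(f"cytoplasmic 60S per cell: {subunits_per_cell:.0f}")
print(f"cytoplasm / nucleus density: {fold_difference:.1f}-fold")
```

The result, roughly 177,000 subunits, rounds to the ~180,000 60S/cell quoted above and falls within the range of the rRNA-based estimate.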

Beyond the subcellular distribution of mature 60S-detected targets, we also confirmed that 2DTM identified specific 60S in biologically relevant locations and orientations. The nuclear envelope (NE) is contiguous with the endoplasmic reticulum and a known site for co-translational transport of transmembrane and secretory proteins, while the vacuole is not known to be a site of translation. We found that mature 60S-detected targets were oriented with their polypeptide exit tunnels facing the cytoplasmic surface of the NE but were depleted from within ~20 nm of the vacuole (Figure 1C and F). This indicates that the orientation of 60S identified by 2DTM is unlikely to be an artefact introduced by features of the membrane in the image.

To confirm that the targets identified with the mature 60S template reflect ribosomes, we generated a 3D reconstruction using the locations and orientations of 3991 significant mature 60S-detected targets using standard single particle approaches as described previously (Lucas et al., 2021). In addition to the 60S, the 10 Å-filtered reconstruction shows density reminiscent of the 40S small ribosomal subunit (Figure 1I). This is consistent with many of the mature 60S detected targets representing a population of 80S ribosomes. Local resolution estimation shows that the resolution of the 40S is lower relative to the 60S (Figure 1—figure supplement 3B). This follows the expected positional heterogeneity of the 40S relative to the 60S when capturing ribosomes in a range of translation states (Freitas et al., 2021; Korostelev, 2022) and prior results using 50S Mycoplasma pneumoniae targets (Lucas et al., 2021). We conclude that 2DTM-identified locations and orientations in 2D cryo-EM images of FIB-milled lamellae reflect biologically relevant locations and orientations of ribosomes in the cell.

Relative 2DTM SNRs enable single particle classification in situ

The nuclear envelope (NE) creates a physical barrier that separates premature 60S in the nucleus from mature 60S in the cytoplasm and is easily distinguishable in many 2D images by its characteristic double membrane and by the more granular appearance of the cytoplasm vs the nucleus (e.g. Figure 1B). Our observation of a substantial population of mature 60S-detected targets in the nucleus, but not in the vacuole (Figure 1C and G), suggests that the nuclear 60S may result from cross-detection of nuclear precursors, which share part of their structure with mature 60S and therefore also produce significant correlations (Figure 2A). As a first step to differentiate between related 60S intermediates, we located precursor 60S by 2DTM searches using a template generated from a late nuclear intermediate (LN 60S, PDB: 6N8J) (Zhou et al., 2019; Figure 2A and B), and annotated each target by its subcellular localization. The LN 60S was chosen because it represents the most mature nuclear intermediate for which there is a structure, and which retains ribosome biogenesis factors (RBFs) that are removed during nuclear and early cytoplasmic processing (Figure 2A). Thus, we expect that (1) the similarities between the mature 60S and LN 60S structures will result in cross-detection of the respective other complex and (2) the cytoplasmic population will more closely resemble the mature 60S and the nuclear population will more closely resemble the LN 60S, resulting in a higher mature 60S / LN 60S 2DTM SNR ratio in the cytoplasm than in the nucleus. In the 28 images of the nucleus and nuclear periphery we located 1651 significant LN 60S-detected targets, of which 1382 (~84%) were cytoplasmic and 268 (~16%) were nuclear, a three-fold greater nuclear proportion than for the mature 60S-detected targets (Figure 2—figure supplement 1A). We identified more cytoplasmic than nuclear targets in 2DTM searches with both mature and LN 60S templates because (1) the cytoplasm represented a larger area of our images and (2) the concentration of 60S is expected to be higher in the cytoplasm relative to the nucleus. Only one of the significant LN 60S-detected targets localized to the vacuole, which is below the expected false positive rate and further indicates the specificity of 2DTM.

Figure 2 (with 1 supplement)
2DTM SNRs differentiate cytoplasmic mature 60S from nuclear pre-60S in 2D images of FIB-milled yeast lamella.

(A) Diagram showing the compositional changes that accompany the maturation from the late nuclear (LN) 60S (PDB: 6N8J), shown in blue, to the mature 60S (PDB: 6Q8Y), shown in yellow, in the cytoplasm. (B) Cryo-EM micrograph of yeast nuclear periphery from FIB-milled lamella with the results from a 2DTM search using the LN 60S template. Significant targets are indicated by mapping a projection of the template in the best matching locations and orientations (shown in blue). Scale bar = 50 nm. (C) As in (B), showing the results from a 2DTM search of the indicated image using the mature 60S as a template (yellow). (D) 3D surface representation showing the results of a 2DTM search with the LN 60S template in the 50x50 pixel region of the image in (B) highlighted in red. Each square represents 10x10 pixels. Colors represent the SNR value of each pixel as indicated by the scale bar below. (E) As in (D) showing the results of a 2DTM search with the mature 60S template in the corresponding region of the image in (C) highlighted in red. (F) As in (D) showing the results of a 2DTM search with the LN 60S template in the region of (B) highlighted in blue. (G) As in (D) showing the results of a 2DTM search with the mature 60S template in the corresponding region of the image in (C) highlighted in blue. (H) Diagram indicating the number of mature 60S (yellow) and LN 60S (blue)-detected targets identified in 2DTM searches of 28 images of the nuclear periphery. The overlap of the Venn diagram indicates the number of targets identified in both searches. (I) Bar chart indicating the number of targets detected by the mature 60S (yellow), the LN 60S (blue), and by both (black) in regions of the images corresponding to the nucleus or cytoplasm. (J) Plot showing the log2 2DTM SNR ratios for LN and mature 60S-detected targets grouped by subcellular compartment. Each dot indicates a 60S detected in both searches. ****: p<0.0001. 
(K) Image showing the identified targets color-coded by the best-matching template (blue: LN 60S, yellow: mature 60S) as determined by the higher 2DTM SNR at each overlapping location. Scale bar = 50 nm. (L) Pie chart indicating the proportion of all nuclear (left) and cytoplasmic (right) 60S targets that more closely resemble the mature 60S (yellow) or LN 60S (blue) template, respectively, as determined by the highest 2DTM SNR at each identified location and orientation.

As expected from the similarity between the mature and LN 60S templates, the locations of many of the targets identified in the two searches overlap (Figure 2B and C). We aligned the two sets of coordinates using the program align_coordinates (Lucas et al., 2021). Approximately one third of the mature 60S-detected targets overlapped with LN 60S-detected targets while 92% of the LN 60S-detected targets overlapped with mature 60S-detected targets (Figure 2H). Combining the results of both searches, 30% of the nuclear targets were LN 60S-detected only, compared to only 0.5% of the cytoplasmic targets (Figure 2I, Figure 2—figure supplement 1A).

Consistent with their expected localizations, the median log2(mature 60S / LN 60S 2DTM SNR) values of targets identified with both templates were significantly higher for the cytoplasmic population than the nuclear population (p<0.0001, K-S test) (Figure 2D–G and J). We classified each target as LN or mature 60S according to the highest 2DTM SNR (Figure 2K). Of the population detected with both mature and LN 60S templates, 94% of the 1361 cytoplasmic targets have a closer match (higher SNR) with the mature 60S and 88% of the 171 nuclear targets have a closer match with the LN 60S (Figure 2J). Combining all 60S-detected targets, the nuclear 60S targets are now more clearly distinguished from the cytoplasmic population, with 98% of the cytoplasmic targets annotated as mature 60S and 60% of the nuclear targets annotated as pre-60S (Figure 2K and L). The ~40% of nuclear targets that more closely resemble the mature 60S likely reflect nuclear intermediates different from the LN 60S (see below) and thus do not perfectly match either template. We conclude that comparing 2DTM SNRs can effectively differentiate populations of related particles in situ.
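The classification rule used here, assigning each target to the template that gives the higher 2DTM SNR, is equivalent to thresholding the log2 SNR ratio at 0. A minimal sketch follows; the function names and the SNR pairs are hypothetical, and the handling of exact ties (assigned to LN 60S below) is an arbitrary choice made for illustration.

```python
import math


def log2_snr_ratio(snr_mature: float, snr_ln: float) -> float:
    """log2(mature 60S SNR / LN 60S SNR) for one target found in both searches."""
    return math.log2(snr_mature / snr_ln)


def classify_by_higher_snr(snr_mature: float, snr_ln: float) -> str:
    """Label a target by the template with the higher 2DTM SNR."""
    return "mature 60S" if log2_snr_ratio(snr_mature, snr_ln) > 0 else "LN 60S"


# Hypothetical (mature SNR, LN SNR) pairs, not measured values.
example_targets = [(12.4, 9.8), (8.6, 9.1), (10.0, 10.0)]
for snr_mature, snr_ln in example_targets:
    ratio = log2_snr_ratio(snr_mature, snr_ln)
    print(f"{ratio:+.3f} -> {classify_by_higher_snr(snr_mature, snr_ln)}")
```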

Defining a confidence metric for single particle classification in situ

To gain an understanding of cell biology at molecular resolution it is necessary to be able to confidently assign particle identity to individual targets. We show above that the nuclear and cytoplasmic 60S populations were significantly different with respect to their relative similarity to the LN and mature 60S (Figure 2). We also show that classifying targets by their highest 2DTM SNR effectively separates the nuclear from the cytoplasmic population (Figure 2). However, a single threshold does not fully capture the differences between the nuclear and cytoplasmic populations and for an individual particle the confidence of classification is unclear.

To assign a confidence to the class assignments of detected particles, we developed a likelihood-based approach to infer the probability of a particle deriving from one of a given number of populations. We sought to classify each of the 1531 LN and mature 60S-detected targets without prior knowledge of their subcellular localization. We restricted our analysis to the targets that were detected by both templates to limit the contribution from noise. We made the initial simplifying assumptions that: (1) each 60S identified more closely reflects either LN or mature 60S, that is, the number of classes needed to describe all detected targets is two; (2) the nuclear targets more closely resemble the LN 60S and the cytoplasmic targets more closely resemble the mature 60S. We therefore define the prior probability that a randomly selected detected target belongs to a specific population according to the number of targets detected in the nucleus and cytoplasm, respectively (Figure 2J, Figure 2—figure supplement 1A):

P(targets in class 1) = P(Nucleus) = 0.11 and,

P(targets in class 2) = P(Cytoplasm) = 0.89.
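These priors are consistent with the counts of targets detected by both templates that are quoted earlier in the text (171 nuclear and 1361 cytoplasmic). The short check below recomputes them; the variable names are illustrative.

```python
# Counts of targets detected with both templates, as quoted in the text.
n_nuclear = 171
n_cytoplasmic = 1361

n_total = n_nuclear + n_cytoplasmic
prior_class1 = n_nuclear / n_total      # P(Nucleus)
prior_class2 = n_cytoplasmic / n_total  # P(Cytoplasm)

print(f"P(class 1) = {prior_class1:.2f}, P(class 2) = {prior_class2:.2f}")
```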

We used a maximum-likelihood approach to model the log2(mature / LN 60S 2DTM SNR) values as a mixture of two Gaussians (Figure 3A, R² = 0.993). The fit suggests a major population with a mean of 0.336, indicating that it more closely reflected the mature 60S, and a smaller population with a mean of –0.026, that slightly more closely reflected the LN 60S (Figure 3A). The means of the two fitted populations match the means of the cytoplasmic (0.335) and nuclear (–0.026) populations when considered independently (Figure 3B), indicating that our approach effectively separates the nuclear from the cytoplasmic populations. In this case, initializing the prior using the nuclear and cytoplasmic probabilities improved the fit and the agreement with the nuclear and cytoplasmic populations relative to no prior, which gave means of –0.038 (class 1) and 0.239 (class 2) (Figure 3A vs Figure 3—figure supplement 1A,B).
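A two-Gaussian mixture of this kind can be fit in several ways. The sketch below uses expectation-maximization (EM), a common route to a maximum-likelihood mixture fit; the text does not state which optimizer was used, so EM is an assumption here, as are the function names and the synthetic data that stand in for measured log2 ratios. The initial mixing weights are seeded with the nuclear and cytoplasmic priors given above.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, init_weights=(0.11, 0.89), n_iter=200):
    """Fit a two-component 1D Gaussian mixture by expectation-maximization.

    init_weights seeds the mixing proportions; they are refined during fitting.
    """
    xs = sorted(values)
    n = len(xs)
    w = list(init_weights)
    # Starting means: the quantile at the centre of each component's share.
    mu = [xs[int(n * w[0] / 2)], xs[int(n * (w[0] + w[1] / 2))]]
    mean_all = sum(xs) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in xs) / n)
    sigma = [sd_all, sd_all]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return w, mu, sigma


# Synthetic log2 ratios standing in for real measurements (hypothetical).
rng = random.Random(0)
synthetic = [rng.gauss(-0.03, 0.08) for _ in range(150)]
synthetic += [rng.gauss(0.34, 0.08) for _ in range(1350)]
weights, means, sigmas = fit_two_gaussians(synthetic)
print([round(m, 2) for m in means], [round(v, 2) for v in weights])
```

Seeding the weights with informative priors mainly affects where the fit starts; as the text notes, the choice of prior changed the fitted means in the real data.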

Figure 3 (with 2 supplements)
Relative probability of detecting mature or LN 60S.

(A) Histogram showing the distribution of the log2(mature 60S / LN 60S 2DTM SNR) values for each LN and mature 60S-detected target fit with two Gaussians indicating populations 1 (blue dashed line) and 2 (red dashed line). The black line indicates the sum of the two Gaussians, R² = 0.993. (B) Histogram showing the log2(mature 60S / LN 60S 2DTM SNR) values for each nuclear (blue) or cytoplasmic (red) LN and mature 60S-detected target. The two subcellular populations are plotted independently. Solid lines indicate Gaussian fits. (C) Line graph showing the probability that a given target belongs to the LN 60S population (blue line) or the mature 60S population (red line), as a function of log2 2DTM SNR ratio. (D) Line graph showing the fraction of nuclear (blue) and cytoplasmic (red) targets classified as LN 60S, at the indicated confidence intervals determined using Equation 6. (E) Heat map showing the probability of each LN and mature 60S-detected target belonging to either the LN or mature 60S populations. Each row indicates a detected target, and the rows are sorted by their subcellular distribution. The targets assigned to the mature 60S population are indicated in yellow and the targets assigned to the LN 60S population are indicated in blue.

Using the Gaussian distribution model (see Materials and methods), we calculate the probability that a LN and mature 60S-detected target with a given log2(mature / LN 60S SNR) value derives from class 1 or class 2 via Bayes rule (Figure 3B–C). This analysis could easily be extended to cases where more than two templates are used in the search (see Materials and methods). A confidence threshold of 95% assigns ~18% of the nuclear targets and only ~0.2% of the cytoplasmic targets to class 1 (Figure 3C). Defining a threshold at 50% classifies ~61% of the nuclear targets as class 1 (nuclear) and 96% of the cytoplasmic targets as class 2 (cytoplasmic) (Figure 3C), consistent with the values determined using a threshold log2 value of 0 (Figure 2). The relative probability of each detected 60S belonging to either class can be readily visualized (Figure 3D). This shows that the 2DTM SNR ratio can effectively delineate populations of particles in cells based on their relative similarity to similar templates with a specified confidence for each particle assignment.
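The Bayes-rule step can be sketched as follows. The means and priors are the values quoted in the text; the standard deviations of the two fitted Gaussians are not given in this section, so the value of 0.1 used for both classes is a placeholder assumption, and the numbers this sketch produces are therefore illustrative rather than a reproduction of the reported fractions. Function names are hypothetical.

```python
import math

# Priors and means quoted in the text.
PRIOR_CLASS1, PRIOR_CLASS2 = 0.11, 0.89
MEAN_CLASS1, MEAN_CLASS2 = -0.026, 0.336
# Placeholder standard deviations (assumed; not stated in this section).
SIGMA_CLASS1 = SIGMA_CLASS2 = 0.1


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_class1(log2_ratio):
    """P(class 1 | log2 ratio) from Bayes' rule with two Gaussian likelihoods."""
    a = PRIOR_CLASS1 * normal_pdf(log2_ratio, MEAN_CLASS1, SIGMA_CLASS1)
    b = PRIOR_CLASS2 * normal_pdf(log2_ratio, MEAN_CLASS2, SIGMA_CLASS2)
    return a / (a + b)


def assign(log2_ratio, confidence=0.95):
    """Assign a class only if its posterior reaches the confidence threshold."""
    p1 = posterior_class1(log2_ratio)
    if p1 >= confidence:
        return "class 1"
    if 1.0 - p1 >= confidence:
        return "class 2"
    return "unassigned"


for value in (-0.2, 0.1, 0.336):
    print(f"{value:+.3f}: P(class 1) = {posterior_class1(value):.3f}, {assign(value)}")
```

Extending this to more than two templates would amount to adding one prior-weighted likelihood term per class to the denominator.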

Relative similarity to alternate templates reveals population identity

We show above that different 60S populations in the cell can be separated by comparing their relative similarity to alternate templates (Figure 2). We also show that these populations can be identified from a mixed population by fitting Gaussians, allowing for the assignment of states with a given probability (Figure 3). We sought to investigate factors affecting the assignment of states.

Overfitting the template to background features (noise) could bias the classification of states. The bias in the observed 2DTM SNRs will depend on the degrees of freedom of aligning a template to a detected target. Assuming an error of not more than ±1 pixel in the x,y plane and ±1 angular step in the three search angles, each of these five parameters can take three values, giving up to 3^5 = 243 possible ways to align a template to a detected target and allowing the template to partially align itself to noise. This leads to an average apparent noise SNR ($\mathrm{SNR}_n$) of $\sqrt{2\ln 243} = 3.31$ (Grigorieff, 2000). In this rough calculation, we ignore the defocus as an additional degree of freedom because the signal for neighboring defocus search steps is highly correlated. Since the apparent signal from the aligned noise adds coherently with the true signal of the detected target, the observed $\mathrm{SNR}_o = \mathrm{SNR}_s + \mathrm{SNR}_n$, with $\mathrm{SNR}_s$ the SNR generated by the signal. For the SNR threshold of 7.85 used in our study, this means that $\mathrm{SNR}_s$ might only be 4.54, about 40% lower, a substantial difference. For higher observed SNRs, the percentage noise bias will be smaller. When several similar templates are used, as in the present study, the average noise bias in the observed 2DTM SNRs will approximately be the same, substantially reducing its effect on the SNR ratios and classification results. For two templates, the SNR ratio is given by

(1) $\frac{\mathrm{SNR}_{o,1}}{\mathrm{SNR}_{o,2}} = \frac{\mathrm{SNR}_{s,1} + \mathrm{SNR}_{n,1}}{\mathrm{SNR}_{s,2} + \mathrm{SNR}_{n,2}}$

which will be approximately invariant with $\mathrm{SNR}_n$ for $\mathrm{SNR}_s \gg \mathrm{SNR}_n$, that is, independent of the background level in the image. For SNR values closer to the detection limit, this ratio will exhibit a larger variance and be biased towards 1 (log2 values will be biased towards 0), making it more difficult to unambiguously assign target identity.
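The numbers in this argument can be checked with a few lines. The sketch below recomputes the apparent noise SNR and the signal SNR at the detection threshold, then evaluates Equation 1 for a hypothetical true ratio of 1.25 under the simplifying assumption that both templates share the same noise term; the signal values in the loop are illustrative only.

```python
import math

# Five alignment parameters (x, y, three angles), each at -1, 0 or +1 step.
n_alignments = 3 ** 5
snr_noise = math.sqrt(2.0 * math.log(n_alignments))

snr_threshold = 7.85
snr_signal_at_threshold = snr_threshold - snr_noise
print(f"alignments = {n_alignments}, SNR_n = {snr_noise:.2f}, "
      f"SNR_s at threshold = {snr_signal_at_threshold:.2f}")


def observed_ratio(snr_s1, snr_s2, snr_n):
    """Equation 1, assuming the same apparent noise SNR for both templates."""
    return (snr_s1 + snr_n) / (snr_s2 + snr_n)


true_ratio = 1.25  # hypothetical signal ratio between two templates
for snr_s2 in (5.0, 10.0, 20.0, 40.0):
    ratio = observed_ratio(true_ratio * snr_s2, snr_s2, snr_noise)
    print(f"SNR_s2 = {snr_s2:5.1f}: observed ratio = {ratio:.3f}, "
          f"log2 = {math.log2(ratio):.3f}")
```

As the signal grows relative to the noise term, the observed ratio approaches the true ratio; near the detection limit it is pulled towards 1.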

Accordingly, we note above that the log2(mature / LN 60S 2DTM SNR) values of the cytoplasmic 60S population could be fit by a single Gaussian and clearly differentiated from the nuclear 60S population (Figure 3). This is striking because the 60S detections were combined from multiple images with differing SNRs. Moreover, the mean log2(2DTM SNR ratio) of the top 20% of targets was only ~10% different from the bottom 20% of targets based on their LN 60S 2DTM SNR (Figure 4A–B), consistent with the ratio of SNRs not depending strongly on the SNR values.

Simulations reveal identity of cytoplasmic 60S population.

(A) Scatter dot plot showing the 2DTM SNR of all cytoplasmic LN 60S and mature 60S-detected targets, grouped into five bins of 272 targets by the 2DTM SNR of the LN 60S-detected target, each representing 20% of the cytoplasmic targets. (B) Gaussian fits to histograms of the log2(mature 60S / LN 60S 2DTM SNR) representing the bins shown in (A), where bin 1 represents the bottom 20%, and bin 5 represents the top 20% of cytoplasmic targets by their LN 60S 2DTM SNR. (C) Scatterplot showing the 2DTM SNRs using the indicated templates to search simulated images of LN 60S (left) or mature 60S (right). (D) Histogram of the log2(mature 60S / LN 60S 2DTM SNR) values resulting from particle-wise comparison of the values in the simulation shown in (C). Solid lines indicate Gaussian fits for two distinct populations.

To test this further, we generated simulated images of LN and mature 60S in ice alone, without overlapping proteins and other density (Figure 4C). We found that the mean log2 value of the mature 60S population in the simulation was 0.337 (Figure 4D), which agrees with the observed mean log2(mature / LN 60S 2DTM SNR) value of the cytoplasmic population (p=0.867, unpaired t-test). This result indicates that the cytoplasmic population closely matches the mature 60S template.

Our simulation suggests that, above a certain 2DTM SNR, the mean log2 values can be predicted solely based on the templates, and that a deviation from this ratio indicates that the detected targets have a significant mismatch with either template. In contrast to the cytoplasmic population, the nuclear population deviates from the predicted log2(mature / LN 60S 2DTM SNR) (Figure 4D). This indicates that targets in this population deviate significantly from both the mature 60S and the LN 60S templates, and provides further evidence that the nuclear population likely contains a more complex mix of maturation states (see below).

Ribosome biogenesis factors differentiate nuclear from cytoplasmic 60S

The nuclear and cytoplasmic 60S populations differ with respect to their relative similarity to the LN and mature 60S templates (Figure 2). Classification based on their log2 2DTM SNR ratios (Figure 3) and comparison with predicted ratios for the mature and LN 60S templates (Figure 4) identified the vast majority of the cytoplasmic targets as mature 60S. However, the mean log2 values of the nuclear 60S population were close to 0, indicating that the nuclear 60S population is distinct from both the LN and mature 60S templates. To investigate this further, we assessed the features of the two templates that distinguish the nuclear from the cytoplasmic populations.

Most of the mass difference between the LN and mature 60S templates results from proteins in the LN 60S that are absent in the mature 60S (Figure 5A–C). Notable exceptions are the proteins on the P-stalk which are present only on the mature 60S (Figure 5A–C, Figure 3A). Additionally, several rRNA helices on the intersubunit interface are in different conformations, specifically the L1 stalk, helix 38 and helix 89, which undergo conformational changes during maturation (Figure 5C). To identify which of these features distinguish nuclear from cytoplasmic 60S, we investigated the relative dependence of the 2DTM SNRs on the rRNA and proteins of the LN 60S template. We generated truncated LN 60S templates containing either rRNA or protein only and calculated the change in the 2DTM SNR for each template at each target relative to the full-length template (Figure 5D). The rRNA contributed 1.5 and 1.8-fold more to the 2DTM SNR of the nuclear and cytoplasmic targets, respectively, despite comprising only 1.25-fold more of the template mass (1004 and 800 kDa, respectively), than the proteins (Figure 5D). Indeed, 60% of the cytoplasmic targets and 34% of the nuclear targets were no longer significant when searching with the proteins alone. Comparing the nuclear and cytoplasmic populations shows that the 2DTM SNR of the LN 60S-detected cytoplasmic targets is less affected by the removal of the LN 60S proteins and more strongly affected by the removal of the rRNA from the template density (Figure 5D). This shows that the LN 60S proteins contribute positively to the SNR of the nuclear targets and negatively to the cytoplasmic targets and therefore differentiate the nuclear from the cytoplasmic 60S population.

Figure 5 with 1 supplement
Classification of cytoplasmic mature 60S and nuclear pre-60S by 2DTM corresponds with biologically relevant differences in the templates.

(A) The LN 60S (blue) and mature 60S (yellow) 2DTM templates aligned in UCSF Chimera. (B) LN 60S with difference map calculated using UCSF Chimera showing the density in the LN 60S template that is not present in the mature 60S template (red, transparent). (C) As in (B), showing the mature 60S with density that is not in common with the LN 60S template (red, transparent). (D) Boxplots showing the change in 2DTM SNR when only RNA (left) or protein (right) components of the LN 60S template are included, relative to the full-length template for each significant target. The targets are grouped by their subcellular localization. (E) Upper: LN 60S template with all ribosome biogenesis factors (RBFs) indicated in red. Lower: Boxplot showing the change in the 2DTM SNR of the nuclear (blue) and cytoplasmic (red) targets when all RBFs are removed, relative to the full-length LN 60S template in untreated cells, and when Crm1-mediated nuclear export is inhibited by treating Crm1 (T539C) cells with Leptomycin B (LepB). Box width indicates the interquartile range, the central line indicates the median and the whiskers indicate the range of 95% of the targets. (F) As in (E), for RBF Nog1. (G) As in (E), for RBF Nog2. ****: p<0.0001, ns: not significant (p>0.05).

Since the LN 60S represents a late intermediate of 60S maturation in which the rRNA is almost fully folded, RBF proteins on the LN 60S account for most of the difference with the mature 60S by mass (Figure 5A–D). To confirm that the SNR difference of nuclear LN 60S-detected targets and cytoplasmic mature 60S-detected targets is primarily due to the RBF proteins, we removed the RBFs from the LN 60S template and recalculated the SNR for each target. The removal increased the 2DTM SNR of the cytoplasmic targets, while decreasing the 2DTM SNR of the nuclear targets (Figure 5E), making the SNR values more similar. This is consistent with the nuclear population having these RBFs and the cytoplasmic population lacking the RBFs. We conclude that the differentiation of detected targets using the observed 2DTM SNRs reflects biologically relevant differences between them.

Nog2-lacking intermediates accumulate after inhibition of nuclear export

The two largest RBFs on the LN 60S are Nog1 and Nog2, together accounting for ~50% of the RBF mass (Figure 5F and G). During 60S maturation, Nog2 removal is required to permit binding of the nuclear export adaptor Nmd3 and Crm1-dependent export, and therefore Nog2 removal precedes nuclear export (Ho et al., 2000; Matsuo et al., 2014). In contrast, Nog1 is removed only upon export to the cytoplasm (Pertschy et al., 2007). In cells with active nuclear export, we find that removal of either Nog1 or Nog2 differentiates the nuclear from the cytoplasmic populations (Figure 5F and G, untreated cells). As a further test of differentiating targets by 2DTM, we inhibited Crm1-mediated export by treating Leptomycin B (LepB)-sensitive Crm1 (T539C) cells (Neville and Rosbash, 1999) with LepB and located 60S targets with LN 60S and mature 60S templates in eight images of FIB-milled lamellae. To assess the relative occupancy of Nog1 and Nog2 after Crm1 inhibition, we measured the change in 2DTM SNR after removal of all RBFs, and Nog1 or Nog2 alone. Consistent with LepB inhibiting export of pre-60S from the nucleus, we detected a higher density of pre-60S in the nucleus than in cells with active Crm1 (Figure 5—figure supplement 1A, Figure 6F). When nuclear export is inhibited, all RBFs (Figure 5E) and Nog1 alone (Figure 5F) still differentiate the nuclear from the cytoplasmic populations. In contrast, the occupancy of Nog2 is no longer significantly different between the nuclear and cytoplasmic populations (Figure 5G). This is consistent with a model in which, when Crm1-mediated export is active, nuclear intermediates are rapidly exported after removal of Nog2, depleting the Nog2-lacking population from the nucleus. In the presence of a Crm1 inhibitor, the late, export-competent nuclear intermediate lacking Nog2 can no longer be exported and therefore accumulates. Since Nog1 is only removed after export, inhibition of export did not change the occupancy of Nog1 on the maturing 60S. This demonstrates that comparing 2DTM SNRs is sufficiently sensitive to assess the changes in the occupancy of individual proteins on 60S complexes in situ.

Figure 6 with 2 supplements
Classification of nuclear targets by relative similarity to early or late nuclear intermediates.

(A) Venn diagram showing the number of significant targets detected in 2DTM searches with the indicated templates. Overlap indicates targets identified in two or more searches. (B) Venn diagrams showing the number of significant targets detected in 2DTM searches with the indicated templates in the nucleus (left) and cytoplasm (right). (C) Scatterplot showing the log2(EN 60S / LN 60S 2DTM SNR) values relative to the log2(mature 60S / LN 60S 2DTM SNR) values for each EN, LN and mature 60S-detected target. Ellipses indicate the fits of three Gaussians and each concentric ellipse indicates one standard deviation from the mean. Each target is colored according to its most likely class membership. (D) Heat map showing the probability of each of the targets examined in (C) belonging to one of the populations, EN, LN or mature 60S. Targets are grouped by their subcellular localization, followed by their classification as EN 60S (purple), LN 60S (light blue), or mature 60S (yellow). (E) Cryo-EM micrograph of the yeast nuclear periphery from a FIB-milled lamella shown in Figure 1, displaying the results of 2DTM searches, colored by their classification as mature 60S (yellow), LN 60S (blue) or EN 60S (purple) based on their relative 2DTM SNRs. (F) Boxplot showing the nuclear density of EN 60S (purple), LN 60S (light blue), and mature 60S (yellow) detected targets before classification in the indicated cells. (G) Bar chart showing the proportion of the LN 60S-detected targets in the indicated cells that are classified as LN 60S (blue), mature 60S (yellow), or EN 60S (purple).

Classification of nuclear pre-60S intermediates

Ribosome biogenesis is a highly efficient molecular assembly line, and multiple intermediate states co-exist in the cell (Warner, 1999). Therefore, the nuclear population of pre-60S is unlikely to represent a single intermediate population. Accordingly, the distribution of the mature 60S / LN 60S SNR ratios of nuclear mature and LN 60S-detected targets fits a single Gaussian more poorly than the cytoplasmic targets (Figure 2J), and the mean log2 value is close to 0, suggesting that additional nuclear populations, distinct from either template, were identified with both 60S templates. To test this prediction and investigate the nuclear pre-60S population further, we generated a third template corresponding to an earlier nuclear intermediate (EN 60S). EN 60S (PDB: 3JCT) retains internal transcribed spacer 2 (ITS2) RNA and associated proteins and has 5S rRNP in a premature state rotated 180° relative to the LN and mature 60S (Figure 6A; Wu et al., 2016). We identified 679 significant EN 60S-detected targets, of which 545 (~80%) were also identified with the LN 60S template, and 489 (72%) were also identified with the mature 60S. All of the 489 EN 60S-detected targets identified with the mature 60S were also identified with the LN 60S (Figure 6A). 289 (43%) of the EN 60S-detected targets localized to regions of the images corresponding to the nucleus, similar to the 268 nuclear LN 60S-detected targets, while only 390 were cytoplasmic, >3-fold fewer than located with the LN 60S template, consistent with the EN 60S representing a less mature nuclear intermediate (Figure 6B). The number and localization of targets identified with 2DTM are consistent with their sequence in the maturation pathway, progressing from EN 60S to LN 60S in the nucleus to mature 60S in the cytoplasm.

Cross-detection of targets by different templates can be used to detect heterogeneity in target populations. When examining the SNR ratios of targets identified by both EN and LN 60S, the cytoplasmic targets display a distribution that is consistent with a single population that more closely resembles the LN 60S template (Figure 6—figure supplement 1C, red). The distribution of nuclear targets, however, was consistent with at least two populations (Figure 6—figure supplement 1C, blue), each of which is distinct from the cytoplasmic population. A similar pattern was observed when comparing the relative similarity of each identified 60S to the EN 60S or mature 60S (Figure 6—figure supplement 1D). This indicated the presence of at least two nuclear populations that differ with respect to their relative similarity to the EN, LN and mature 60S templates.

We next sought to classify the EN, LN and mature 60S-detected targets based on their relative similarity to the three 60S templates. For each target we calculated the log2(mature 60S / LN 60S SNR) and log2(EN 60S / LN 60S SNR) values. We used these values to classify each target based on the relative similarity to the three templates using the maximum-likelihood approach discussed above (Figure 6C). We found that, consistent with their expected subcellular distributions, targets assigned to the mature 60S population represented 315 (85%) of the cytoplasmic targets and only 1 (<1%) of the nuclear targets detected by all three templates (Figure 6D). In contrast, the EN 60S population represents 83 (70%) of the nuclear population and only 4 (~1%) of the cytoplasmic population detected with all three templates (Figure 6D). The LN 60S population was roughly evenly distributed between the nucleus and the cytoplasm, consistent with this structure representing a late maturation intermediate (Figure 6D). These results can be visualized by annotating each detected 60S target with the template it most closely resembles (Figure 6E, Figure 6—figure supplement 2).
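The class-membership calculation can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes that three two-dimensional Gaussians have already been fitted to the (log2(mature / LN), log2(EN / LN)) values, and every mean, covariance and mixing weight below is a made-up placeholder. Given those parameters, the probability that a target belongs to each class is the weighted Gaussian density normalized over all classes.

```python
import math
from dataclasses import dataclass

@dataclass
class Gaussian2D:
    name: str
    weight: float   # mixing proportion of the class
    mean: tuple     # (mean log2(mature/LN), mean log2(EN/LN))
    cov: tuple      # ((sxx, sxy), (sxy, syy))

    def pdf(self, x: float, y: float) -> float:
        (sxx, sxy), (_, syy) = self.cov
        det = sxx * syy - sxy * sxy
        dx, dy = x - self.mean[0], y - self.mean[1]
        # Squared Mahalanobis distance via the explicit inverse of the 2x2 covariance.
        m2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        return math.exp(-0.5 * m2) / (2.0 * math.pi * math.sqrt(det))

def class_probabilities(x: float, y: float, classes) -> dict:
    """Relative probability of one target belonging to each fitted class."""
    weighted = [c.weight * c.pdf(x, y) for c in classes]
    total = sum(weighted)
    return {c.name: w / total for c, w in zip(classes, weighted)}

# Placeholder parameters for illustration only (not fitted values from the paper).
classes = [
    Gaussian2D("EN 60S", 0.3, (-0.05, 0.25), ((0.01, 0.0), (0.0, 0.01))),
    Gaussian2D("LN 60S", 0.3, (0.00, -0.10), ((0.01, 0.0), (0.0, 0.01))),
    Gaussian2D("mature 60S", 0.4, (0.34, -0.20), ((0.01, 0.0), (0.0, 0.01))),
]

probs = class_probabilities(0.30, -0.15, classes)
best = max(probs, key=probs.get)
print(best, {name: round(p, 3) for name, p in probs.items()})
```

Reporting the full probability vector, rather than only the most likely class, is what allows a per-target confidence to be attached to each assignment.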

The NE provides a convenient visual control for the classification of targets as LN / EN 60S or mature 60S (e.g., Figure 1). However, there are no clear features in the nucleoplasm that would enable visual separation of different populations of nuclear intermediates and thereby confirm their classification. To validate our classification of the nuclear pre-60S populations, we identified conditions wherein the relative occupancy of the two states would be expected to change. We show above that inhibiting Crm1-mediated export results in accumulation of nuclear intermediates that lack Nog2 (Figure 5). In cells with active Crm1, there are similar numbers of nuclear EN, LN and mature 60S-detected targets (Figure 6F) and 57% of the nuclear 60S targets are assigned to the EN 60S population (Figure 6G). After inhibition of Crm1-mediated export, the EN 60S population is mostly depleted, while the density of nuclear LN and mature 60S-detected targets increases (Figure 6F), and >90% of targets are assigned to the LN 60S population (Figure 6G). This confirms that 2DTM SNR ratios can be used to effectively classify mixed populations of particles in cells.

Discussion

The immense potential for cryo-EM to reveal the molecular detail of biological processes in cells is currently largely unrealized. One of the major bottlenecks is the lack of reliable, quantitative methods to locate and characterize molecules in cells. Here, we describe the application of 2DTM to in situ particle classification. By considering the relative 2DTM SNRs of alternate templates at a single location and orientation, we separate 60S precursors in the nucleus from mature 60S in the cytoplasm. We also show that a maximum likelihood approach effectively classifies a mixed population of nuclear pre-60S into at least two maturation states with a specified confidence for each particle. We show that 2DTM can be used to probe the composition of complexes in situ by modifying 2DTM templates. In this study we extend the utility of 2DTM beyond a binary indicator of detection to provide a quantitative assessment of particle identity.

2DTM enables specific molecule localization in the dense interior of cells

Cryo-FIB-milled eukaryotic cells are sufficiently well preserved to allow imaging with cryo-ET (Mahamid et al., 2016) and subtomogram averaging to yield 3D reconstructions at resolutions of >~12 Å (e.g., Schaffer et al., 2019). However, before the present work it was unclear if the milling preserves the high-resolution signal in these samples sufficiently well to allow for particle detection with 2DTM. Our results clearly show that FIB-milling is compatible with molecule localization by 2DTM. This expands the application of 2DTM to previously inaccessible cell types and further demonstrates the utility of 2DTM for in situ structural biology.

In many images, 60S subunits detected by 2DTM also generate low-resolution contrast in the cytoplasm that is readily visible (Figure 1B, yellow arrows). In the nucleoplasm, the similar density of RNA and DNA impedes the visual identification of all but a few pre-60S (Figure 1B, blue arrows). However, the reduced low-resolution contrast does not preclude effective detection of pre-60S with 2DTM. This is in contrast to particle localization in tomograms, wherein detection depends more strongly on low-resolution contrast and recognizable shapes.

The ability to distinguish particles in crowded molecular environments is a major advantage of 2DTM relative to cryo-ET, which currently suffers from strong attenuation of high-resolution signal (large B-factors) in the raw tomogram (Schur et al., 2016). 2DTM may enable localization of molecules in other dense environments such as liquid-liquid phase separated granules, which remains challenging for cryo-ET despite success in some cases (Erdmann et al., 2021). Our results confirm that 2DTM is an effective method to localize molecules in dense regions of the cell even when the molecules cannot be distinguished by eye.

2DTM enables single particle classification in situ

In previous work we and others have demonstrated that, when comparing populations of molecules, the average 2DTM SNRs reflect the relative similarity of different templates to the target populations (Lucas et al., 2021; Rickgauer et al., 2020). In this study, we extend this observation to show that the relative 2DTM SNRs of aligned templates at a specific location and orientation can be used to calculate the relative probabilities of a target belonging to a specific particle population.

Of the nuclear targets identified with the mature 60S, ~50% were also detected with the EN 60S, all of which were also detected with the LN 60S (Figure 6B). When calculating the relative similarity to the three 60S templates, the EN 60S and mature 60S populations were clearly distinct, with mean 2DTM SNR ratios more than three standard deviations apart (Figure 6C). The maximum likelihood estimation of Gaussian distributions enables quantitative classification even when particle populations are less distinct, by yielding relative probabilities for each detected target belonging to one of a given number of populations (e.g. Figures 3 and 6). The observed shift in the nuclear population towards a more mature intermediate after inhibition of nuclear export provides a biological control that validates our assignment of states.

In this study, we effectively classify at least three populations of 60S maturation states from a population of <500 molecules (Figure 6). This means that given sufficient abundance of the target, it will be possible to distinguish populations based on data from a single image (Figure 2—figure supplement 1D). This contrasts with more traditional (reference-free) methods used to classify subtomograms and single particles, which require hundreds to thousands of particles to generate the class averages needed for particle assignment. 2DTM allows single molecule classification from fewer images, and therefore enables more information to be extracted from images collected from cells and purified samples (single-particle cryo-EM).

Confidence metric for single particle classification in situ

Calculating the confidence in class assignment of individual particles will aid interpretation of the results of 2DTM in situ. One major difference between in situ cryo-EM and single-particle cryo-EM is the type of biological information that is obtained. In single-particle cryo-EM, the goal is to generate high-resolution maps and establish the arrangement of atoms within a complex in different functional states, and to use this information to discern its molecular mechanism. In this case, B-factors and other metrics can be used to indicate uncertainty about an atomic coordinate, which aids interpretation of the model built into the map. In the cell, each individual instance of a complex may be in a different context relative to other similar molecules. For example, particles might be in different subcellular compartments such as the nucleus or cytoplasm or, as a more extreme example, a single particle within a nuclear pore exists in a very different context than particles in the nucleoplasm. For structural cell biology applications, therefore, it is useful to define a metric to establish the confidence of single particle classification. In this study, we show that a maximum likelihood approach using Gaussian fits to log2 2DTM SNR ratios of alternate templates at a specific subcellular location and orientation can be used to calculate the relative probability of a single particle deriving from one of a given number of classes. This provides a quantitative metric to establish confidence in the assignment of single particles that will aid in the biological interpretation of cellular cryo-EM maps.

2DTM templates as computational molecular probes

A major challenge in biological cryo-EM is the retrieval of detailed structural information of inherently flexible and heterogeneous macromolecules from noisy images collected at low dose to limit radiation damage. In single particle cryo-EM, this problem is addressed by averaging images of thousands of purified molecules to identify different structural states at high resolution. By averaging images of many identical copies of a particle, novel structures can be discovered, and this is a clear strength of this approach. However, since most complexes have a low abundance in the cell, the utility of this approach for in situ structural biology is limited to the most abundant complexes.

2DTM presents an alternate approach to using the signal in noisy images to gain insight into the structural states of molecules. In this approach, a noise-free template represents a hypothesis that a particle of a given conformational and compositional state is present in the image, and this hypothesis can be tested by searching the image with the template, independent of how many particles the image contains. We demonstrate that by generating modified templates representing different hypotheses, we can directly assess the compositional and conformational states of ribosomal subunits in cells.

Provided the templates have similar molecular mass and shape and are aligned with each other, probing with multiple templates requires only a single initial exhaustive search with one of the templates. This can be followed by a simple evaluation of the cross-correlation coefficient for each additional template at locations and orientations of the detected targets in the initial search (Figure 5), thereby avoiding time-consuming searches for all templates. In future studies, this approach could be extended to assess the relative similarity of a target with respect to a library of alternate structures. Alternate templates could be generated in multiple ways, depending on the biological hypothesis being tested. To reveal compositional heterogeneity in situ, alternate structures could be generated that lack specific subunits of interest as shown in Figure 5. Additionally, to interrogate in situ conformational heterogeneity, templates could be generated from time points of molecular dynamics simulations.
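The idea of rescoring additional templates only at already-detected locations and orientations can be sketched with a toy example. This is not the cisTEM implementation and does not compute a 2DTM SNR: it uses a plain zero-mean normalized cross-correlation coefficient as a stand-in, and the flattened pixel lists, the `ncc` and `rescore` helpers, and the template names are all hypothetical. It assumes each alternate template has already been projected at the orientation found in the initial search and cropped to the same pixels as the image patch.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-length pixel lists."""
    n = len(patch)
    mean_p = sum(patch) / n
    mean_t = sum(template) / n
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, template))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in patch)
        * sum((t - mean_t) ** 2 for t in template)
    )
    return num / den

def rescore(patch, projections):
    """Score every alternate template at one detected location and orientation.

    projections maps a template name to its flattened projection, rendered at
    the orientation found by the single exhaustive search.
    """
    return {name: ncc(patch, proj) for name, proj in projections.items()}

# Toy data: the "full" template has density in the last three pixels,
# the "truncated" template lacks it (for example, a removed subunit).
projections = {
    "full": [0, 1, 2, 1, 0, 3, 4, 3],
    "truncated": [0, 1, 2, 1, 0, 0, 0, 0],
}
# Image patch: the full projection plus a small deterministic perturbation.
patch = [0.1, 0.9, 2.1, 0.9, 0.1, 2.9, 4.1, 2.9]

scores = rescore(patch, projections)
print(max(scores, key=scores.get), scores)
```

Because no new search over positions and orientations is needed, the cost of adding a template in such a scheme grows with the number of detected targets rather than with the size of the search space.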

Our finding that, unlike the 2DTM SNR, the mean log2 of SNR ratios of a population is not strongly dependent on image SNR, allowed us to determine whether a given model matches the data. This approach could vastly streamline validation of models from in vitro or in silico experiments using relatively few images. Hypothetically, if models covering the full conformational and compositional space could be generated, the likelihood of a particular structure could be calculated for individual molecules.

Overfitting and classification

Assuming additive Gaussian noise, we estimated that overfitting contributes substantially to the variance in the observed 2DTM SNRs from an individual search. The effect of overfitting is proportionally greater for lower 2DTM SNRs and less for higher 2DTM SNRs, making higher 2DTM SNRs more reliable. However, while overfitting can also affect the classification of targets using their log2(SNR ratios), the effect is substantially less. Overfitting will introduce some bias in the log2 values that may affect classification. Misclassification may occur when the signal SNRs are similar. At high 2DTM SNRs, the proportional difference in the log2 values will be small and is unlikely to affect the classification. At low SNRs close to the threshold, overfitting will have a greater effect, making classification of this population more challenging. However, collectively these factors will be captured by modelling the populations with Gaussians. By using a Gaussian model to derive the probability of class membership, rather than using a single threshold, we account for the bias in the log2 values.

Addressing additional potential sources of error

In our study, we used the physical separation of nuclear and cytoplasmic 60S populations to develop and test in situ classification of targets by 2DTM. We found that there are several requirements to permit classification of related molecules by 2DTM. First, the molecular models must be aligned relative to one another, resulting in a correlation peak at the same pixel in the image. SNR values resulting from global searches with different templates may be lowered by imperfect, off-grid rotational matches, potentially affecting 2DTM SNR ratios and hence target classification. Differences in model quality may also affect the 2DTM SNR ratios, masking other differences of interest. In this study, the mature 60S template was generated using the atomic coordinates of the large subunit of the ribosome built into a map with an overall resolution of 3.1 Å (PDB: 6Q8Y) (Tesina et al., 2019). The large subunit of the ribosome is structurally less variable than the small subunit and local resolution estimates suggest that parts of the LSU map extend to ~2.5 Å (Tesina et al., 2019). The maps used to build the EN 60S and LN 60S subunits were reconstructed at 3.08 Å and at 3.5 Å resolution, respectively. The accuracy of the atomic coordinates of a model will depend on the resolution of the underlying density map. Moreover, the greater number of mature ribosome structures, relative to maturation intermediate structures, may provide more confidence in the atomic coordinates of the mature 60S. The putative higher accuracy of the mature 60S model, relative to the LN 60S model, may partially explain the closer agreement of predicted and observed SNR ratios of the cytoplasmic population, compared to the nuclear population. We expect that more accurate coordinates will result in higher 2DTM SNR values, which may affect target classification.

False positive detections could skew the classification results by introducing false additional populations. In our analysis we use a Gaussian noise model to apply a threshold that permits a single false positive per image (Lucas et al., 2021; Rickgauer et al., 2017), although this is likely to be an overestimate (Lucas et al., 2021). Since 28 images were searched, we expect up to 28 false positives of the 4363 mature 60S-detected targets (0.6%). Since we limited our analysis to targets detected by multiple templates, the false positive rate within these targets will be even lower. Assuming the worst-case scenario of 28 false-positives in the 489 targets detected by all three templates, this would represent a false-positive rate of 6% for the smallest class of EN 60S-detected targets. We therefore do not expect that false positives play a major role in the assignment of 60S states. For low-abundance complexes with a detection of only a few tens of targets, the relative proportion of false positives would be higher, and the detection threshold would therefore need to be adjusted to lower the false-positive rate. Future improvements to the 2DTM algorithm, such as integration of correlation peaks over all search parameters and peak profile fitting could reduce template overfitting and improve the separation of the signal and noise distributions (McDonough and Whalen, 1995) and the identification and classification of smaller or less abundant complexes.
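The worst-case false-positive fractions quoted above follow directly from the numbers in the text and can be reproduced with a few lines. The only assumption is the stated threshold criterion of at most one false positive per image.

```python
# Worst-case false-positive fractions, using the counts given in the text.
images_searched = 28
false_positives_per_image = 1  # threshold permits at most one per image
expected_false_positives = images_searched * false_positives_per_image

def worst_case_fraction(n_targets: int) -> float:
    """Fraction of targets that could be false positives in the worst case."""
    return expected_false_positives / n_targets

mature_detected = 4363      # mature 60S-detected targets
all_three_templates = 489   # targets detected by all three templates

print(f"mature 60S-detected: {worst_case_fraction(mature_detected):.1%}")
print(f"detected by all three templates: {worst_case_fraction(all_three_templates):.1%}")
```

The second figure is an upper bound: it places every expected false positive in the smallest group of targets, which is unlikely because a false positive would need to pass the threshold with all three templates.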

The classification of structurally similar targets could be further improved by identifying and controlling the factors that affect the distribution of observed 2DTM SNR ratios for a given set of templates. Ideally, the mean ratio of SNR values for a set of templates and given target depends only on the structural differences between the templates, while the distribution of observed ratios is solely a function of the noise and background in the images and target orientation. However, factors such as sample thickness, radiation or FIB-milling damage, beam-induced motion, charging and movie frame alignment errors due to sample deformation all result in loss of high-resolution signal. This makes the 2DTM SNR ratios less sensitive to structural differences in the templates, biases their log2 values towards 0 and increases their variation, because the increased relative noise results in different degrees of overfitting. The cell is a highly heterogeneous environment, and systematic local differences in noise, such as between subcellular compartments, may also affect classification. In this case, context-specific biological controls are needed to validate the assignment of states. Further research is required to account for these factors and reduce the variance in 2DTM SNR ratios, thereby enabling classification of targets with smaller structural differences.

Additional intermediate populations

In the present study, we only considered three alternate 60S templates. We note that the Gaussian fits to the 2DTM SNRs of mature 60S and LN 60S-detected nuclear targets are imperfect, potentially indicating additional pre-60S populations (Figure 2—figure supplement 1C). Further examination of the observed 2DTM SNR ratios revealed the presence of at least one additional pre-60S population (Figure 6). The observed shift towards the more mature LN 60S intermediate when nuclear export was inhibited is evidence for detection of a biologically relevant pre-60S population (Figure 6G). We also observed a small population of cytoplasmic 60S targets with higher SNR values against the LN 60S template than against the mature 60S (Figure 6D). 60S maturation intermediates exit the nucleus in an immature form and complete maturation in the cytoplasm. Whether the cytoplasmic 60S with higher SNR values against the LN 60S template represent cytoplasmic intermediates or reflect the limits of our classification strategy requires further investigation. Future work using additional templates representing other intermediates of 60S maturation will reveal further details about the spatiotemporal organization of pre-60S intermediates in cells.

In this study, we identified an EN 60S population of nuclear 60S with the 5S rRNP in a premature state rotated 180° relative to the mature 60S, consistent with structures determined in vitro (Leidig et al., 2014). The presence of this complex during maturation in vivo has been difficult to establish. Our observation that this population accounts for more than half of the 60S identified in the nucleus argues that this is an on-pathway assembly intermediate. We also identified a nuclear LN 60S population. This population reflects a late intermediate that has already undergone 5S rotation and ITS2 removal, implying a temporal lag between 5S rotation and/or ITS2 removal and subsequent export from the nucleus. To test these possibilities more thoroughly, future studies establishing the flux through the assembly pathway are needed. By freezing cells at different time points after inhibition of specific maturation steps, 2DTM could be used to study the kinetics of assembly and the flux through the assembly pathway.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Strain, strain background (Saccharomyces cerevisiae) | BY4741 | ATCC | S288C |
Strain (Saccharomyces cerevisiae) | Crm1 (T539C) | Neville and Rosbash, 1999 | MNY8 |
Software, algorithm | cisTEM | Grant et al., 2018; Lucas et al., 2021 | https://cistem.org/development |

Yeast cell culture and plunge freezing

Colonies of Saccharomyces cerevisiae strains BY4741 (ATCC) or Crm1 (T539C) (MNY8; Neville and Rosbash, 1999; a gift from Michael Rosbash, Brandeis) were inoculated in 20 mL of YPD, diluted 1/5 and grown overnight at 30 °C to an OD600 of ~0.5–1. The cells were then diluted to 10,000 cells/mL and 3 µL was applied to a 2/1 or 2/2 Quantifoil 200 mesh Cu grid, allowed to rest for 15 s, back-side blotted for 8 s at 27 °C and 95% humidity, and plunge frozen in liquid ethane at –184 °C using a Leica EM GP2 plunger. Frozen grids were stored in liquid nitrogen until FIB-milled. When indicated, Crm1 (T539C) cells were additionally incubated at 30 °C with shaking in the presence of 200 nM Leptomycin B (Cell Signaling Technologies) for 30 min before applying to grids and plunge freezing.

FIB milling

Grids were transferred to an Aquilos cryo-FIB SEM, sputter coated with metallic Pt for 15 s, then coated with organo-Pt for 10 s and milled in a series of sequential milling steps using a 30 kV Ga+ beam with the following protocol: rough milling 1, 0.1 nA; rough milling 2, 50 pA; lamella polishing, 10 or 30 pA at a stage tilt of 15° (milling angle of 8°).

Cryo-EM data collection

Lamellae were imaged using a Titan Krios 300 keV cryo-TEM (Thermo Fisher) equipped with a K3 direct detector (Gatan) and an energy filter (Gatan) with a slit width of 20 eV, at a magnification of 81,000×, corresponding to a sample pixel size of 1.06 Å, and with a 100 μm objective aperture. A defocus of –0.5 μm was targeted using an adjacent sacrificial area and the autofocus function in SerialEM (Mastronarde, 2005). Movies were collected at an exposure rate of 1 e⁻/Å²/frame to a total dose of 30 e⁻/Å².
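The exposure settings above fix two derived quantities that are useful when reproducing the collection scheme. The short calculation below is only a sketch: it reads the exposure values as e⁻/Å² and assumes the quoted 1.06 Å sample pixel size applies to the stored movie frames.

```python
# Values quoted in the text; exposure is read as electrons per square angstrom.
exposure_per_frame = 1.0   # e-/A^2 per frame
total_exposure = 30.0      # e-/A^2 per movie
pixel_size = 1.06          # A per pixel (sample pixel size)

# Number of frames needed to accumulate the total exposure.
n_frames = total_exposure / exposure_per_frame

# Electrons accumulated in one 1.06 A x 1.06 A pixel over the whole movie.
electrons_per_pixel = total_exposure * pixel_size ** 2

print(f"frames per movie: {n_frames:.0f}")
print(f"total exposure per pixel: {electrons_per_pixel:.1f} e-")
```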

Image processing

Images were processed using cisTEM (Grant et al., 2018) as described previously (Lucas et al., 2021), and using sample tilt determination implemented in a modified version of CTFFIND4 (Lucas et al., 2021; Rohou and Grigorieff, 2015) to estimate sample defocus and to account for the ~8° tilt of the lamella introduced during FIB-milling. Images of 3D densities and 2DTM results were prepared in ChimeraX (Pettersen et al., 2021).

2DTM

The molecular models noted in the text were aligned to one another to have the same origin using their 28S rRNA using the MatchMaker function in UCSF Chimera (Meng et al., 2006; Pettersen et al., 2004) and 2DTM templates were generated by simulating 3D densities (Himes and Grigorieff, 2021). 2DTM was performed using the program match_template in the cisTEM GUI (Lucas et al., 2021) using the default parameters. Significant targets were defined as those exceeding a threshold of 7.85, which allows a single false positive per image, assuming no correlation between the rotational or defocus searches. Prior comparisons with 3DTM results suggest that this threshold is conservative and that the true false positive rate is likely lower (Lucas et al., 2021). The coordinates were refined using the program refine_template (Lucas et al., 2021) in rotational steps of 0.1° and a defocus range of 200 Å with a 10 Å step.
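The link between the threshold of 7.85 and one expected false positive per image can be checked with a one-sided Gaussian tail. This is a minimal sketch, assuming that pure-noise 2DTM SNR values follow a standard normal distribution and that all search trials (positions, orientations and defocus planes) are independent; the number of trials is not stated in the text, so it is derived here rather than supplied.

```python
import math

def gaussian_tail(threshold):
    """One-sided tail probability P(Z > threshold) for a standard normal score."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))

threshold = 7.85                      # 2DTM SNR threshold quoted in the text
p_single = gaussian_tail(threshold)   # chance that one pure-noise trial exceeds it
implied_trials = 1.0 / p_single       # independent trials giving one expected false positive

print(f"P(SNR > {threshold}) per trial: {p_single:.2e}")
print(f"independent trials for one expected false positive: {implied_trials:.2e}")
```

Under these assumptions, raising the threshold lowers the expected number of false positives per image, at the cost of discarding weaker true detections.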

3D reconstruction using mature 60S 2DTM coordinates

We used the program prepare_stack_matchtemplate (Lucas et al., 2021) to generate a particle stack using the locations and orientations of the significant mature 60S-detected targets after refinement as described above. We then used cisTEM to generate a 3D reconstruction from 3991 mature 60S targets detected in 28 images of the nuclear periphery, only including targets with a 2DTM SNR of >8. The reconstruction had a nominal resolution of 3.5 Å using a Fourier Shell Correlation (FSC) threshold of 0.143 and a mask radius of 175 Å (Figure 1—figure supplement 3A; Rosenthal and Henderson, 2003), a value that is expected to overestimate the resolution due to overfitting (Grigorieff, 2000; Lucas et al., 2021). To best capture the density in the 40S, we low-pass filtered the reconstruction to 10 Å, representing an FSC of 0.9. Local resolution calculations were performed using local_resolution in Phenix (Liebschner et al., 2019) with default parameters, extending to 3.5 Å.
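For readers unfamiliar with how a nominal resolution follows from an FSC threshold, the sketch below computes an unmasked FSC curve between two half-volumes and reports the first shell that falls below 0.143. It is a generic NumPy illustration, not the cisTEM implementation; the synthetic volumes, the noise level and the helper names `fsc_curve` and `resolution_at` are hypothetical.

```python
import numpy as np

def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic volumes, one value per integer shell."""
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    freq = np.fft.fftfreq(n) * n                      # integer frequency indices
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    n_shells = n // 2
    keep = shell < n_shells                           # ignore shells beyond Nyquist
    cross = (f1 * np.conj(f2)).real.ravel()
    num = np.bincount(shell[keep], weights=cross[keep], minlength=n_shells)
    p1 = np.bincount(shell[keep], weights=(np.abs(f1) ** 2).ravel()[keep], minlength=n_shells)
    p2 = np.bincount(shell[keep], weights=(np.abs(f2) ** 2).ravel()[keep], minlength=n_shells)
    return num / np.sqrt(p1 * p2)

def resolution_at(fsc, box_size, pixel_size, threshold=0.143):
    """Resolution (in A) of the first non-zero shell where the FSC drops below the threshold."""
    below = np.nonzero(fsc < threshold)[0]
    below = below[below > 0]
    if below.size == 0:
        return 2.0 * pixel_size                       # never drops: report Nyquist
    return box_size * pixel_size / below[0]

# Synthetic test data: a shared low-pass filtered signal plus independent noise.
rng = np.random.default_rng(0)
n, pixel_size = 32, 1.06
freq = np.fft.fftfreq(n) * n
kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
lowpass = np.exp(-(kx**2 + ky**2 + kz**2) / (2.0 * 3.0**2))
signal = np.fft.ifftn(np.fft.fftn(rng.normal(size=(n, n, n))) * lowpass).real
half1 = signal + rng.normal(scale=0.2, size=signal.shape)
half2 = signal + rng.normal(scale=0.2, size=signal.shape)

fsc = fsc_curve(half1, half2)
print(f"nominal resolution at FSC = 0.143: {resolution_at(fsc, n, pixel_size):.2f} A")
```

Because both half-volumes in a 2DTM-derived reconstruction are aligned to the same template, correlated noise can inflate the FSC, which is why the text treats the 3.5 Å value as an overestimate.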

Calculating 2DTM SNR values and ratios of SNR values

Targets identified in two or more searches with aligned templates were identified using the program align_coordinates (Lucas et al., 2021). The 2DTM SNRs of targets identified in two or more searches were compared by taking the log2 of the SNR ratio. The log2 was used in place of the direct ratio because the shape of the distribution is independent of the order of comparison, except for a mirror around 0, while the distribution of the direct ratios shows more complicated behavior. Histograms of both the log2 values and direct ratios of the cytoplasmic 60S population have approximately Gaussian distributions with fits characterized by the coefficient of determination R² = 0.993 and R² = 0.991, respectively. To calculate the change in the 2DTM SNR with modified templates, the program refine_template (Lucas et al., 2021) was used to calculate 2DTM SNRs for additional templates using the locations and orientations from a previous exhaustive search with an initial template, without performing a rotational search by specifying the rotational step as 360°. To obtain consistent ratios of 2DTM SNRs, the 2DTM SNR values for both the initial template and the additional templates were calculated.
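The symmetry argument for using log2 ratios can be illustrated with a few numbers. The SNR values below are hypothetical and serve only to show that swapping the order of comparison mirrors the log2 values around 0, whereas the direct ratios are related by a reciprocal.

```python
import numpy as np

# Hypothetical 2DTM SNRs for the same five targets from two aligned template searches.
snr_template_a = np.array([9.1, 10.4, 8.3, 12.0, 9.7])
snr_template_b = np.array([8.2, 11.0, 8.9, 10.1, 9.7])

log2_ab = np.log2(snr_template_a / snr_template_b)
log2_ba = np.log2(snr_template_b / snr_template_a)

# Swapping the order of comparison only mirrors the log2 values around 0.
mirrored = np.allclose(log2_ab, -log2_ba)

# The direct ratios are reciprocals of each other, so their distribution changes shape.
ratio_ab = snr_template_a / snr_template_b
ratio_ba = snr_template_b / snr_template_a
reciprocal = np.allclose(ratio_ab * ratio_ba, 1.0)

print("log2 ratios (a/b):", np.round(log2_ab, 3))
print("mirror symmetry holds:", mirrored, "| reciprocal relation holds:", reciprocal)
```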

Calculating histograms from different subcellular regions

Subregions of the image in Figure 1B, corresponding to the nucleus, cytoplasm and vacuole, were identified as indicated in Figure 1—figure supplement 2. Care was taken to avoid the edges of the image and regions of the image corresponding to membranes. The corresponding regions of the normalized MIP resulting from a 2DTM search using the mature 60S template were extracted using the clip resize function in IMOD (Kremer et al., 1996), defining a box of 1000 × 1000 pixels. Image histograms were calculated from the flattened array of pixel values, read using the Python library mrcfile (Burnley et al., 2017), using only the 2DTM SNR values below the calculated threshold.
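The histogram step can be sketched as follows. This is an illustrative outline rather than the script used in the study: the function name `subthreshold_histogram` and the bin width are assumptions, and a synthetic array stands in for a MIP subregion, which in practice would presumably be read with `mrcfile.open(path).data`.

```python
import numpy as np

def subthreshold_histogram(mip_region, threshold, bin_width=0.05):
    """Histogram of the flattened pixel values that lie below the detection threshold."""
    values = np.asarray(mip_region, dtype=float).ravel()   # flatten the 2D region
    values = values[values < threshold]                    # drop significant detections
    span = values.max() - values.min()
    n_bins = max(1, int(np.ceil(span / bin_width)))        # bins of roughly bin_width
    counts, edges = np.histogram(values, bins=n_bins)
    return counts, edges

# Synthetic stand-in for a normalized MIP subregion (hypothetical data).
rng = np.random.default_rng(1)
region = rng.normal(size=(200, 200))
region[0, :5] = 9.0                                        # five "detections" above threshold

counts, edges = subthreshold_histogram(region, threshold=7.85)
print(f"{counts.sum()} background pixels in {counts.size} bins")
```

Comparing such histograms between the nucleus, cytoplasm and vacuole shows whether the background score distribution, and hence the expected false-positive rate, differs between compartments.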

Calculating relative probabilities

Histograms of the calculated log2 2DTM SNR ratios were generated (bin width 0.05) and Gaussians were fitted using GaussianMixture in sklearn (Pedregosa et al., 2011). Based on the shape of the histogram, we model the log2 2DTM SNR ratios as a mixture of K multivariate Gaussian components when K templates are used in the search. We fit Gaussians to the log2 SNR ratios of any two selected templates. Each target i is then associated with K-1 such SNR ratios, collected in the vector Xi. For example, for K=4, we can define the following:

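(2) $$X_i = \begin{bmatrix} \log_2\left(\mathrm{SNR}_{i,k=1}/\mathrm{SNR}_{i,k=2}\right) \\ \log_2\left(\mathrm{SNR}_{i,k=1}/\mathrm{SNR}_{i,k=3}\right) \\ \log_2\left(\mathrm{SNR}_{i,k=1}/\mathrm{SNR}_{i,k=4}\right) \end{bmatrix}$$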

For particles belonging to the same population (class), the log2 SNR ratio can be described by the multivariate Gaussian probability density function (PDF):

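(3) $$P(X_i \mid \Theta_k, z_i = k) = \mathcal{N}(M_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}} \exp\left(-\frac{(X_i - M_k)^T \Sigma_k^{-1} (X_i - M_k)}{2}\right)$$
(4) $$P(z_i = k) = \pi_k$$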

where $X_i$ is a vector of K-1 log2 SNR ratios (so that $d = K-1$), $z_i$ indicates the identity of the target ($k = 1, 2, \ldots, K$), and $\Theta_k = \{M_k, \Sigma_k, \pi_k\}$ is the set of parameters comprising the mean and covariance of the Gaussian PDF $\mathcal{N}$ and the prior probability $\pi_k$ that a detected target belongs to class k. The total joint likelihood for N detected targets is then

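(5) $$L(\Theta; X) = P(X \mid \Theta) = \prod_{i=1}^{N} P(X_i \mid \Theta) = \prod_{i=1}^{N} \sum_{j=1}^{K} \pi_j\, \mathcal{N}(M_j, \Sigma_j)$$

with $\Theta = \{\Theta_1, \Theta_2, \ldots, \Theta_K\}$ and $X = \{X_1, X_2, \ldots, X_N\}$.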

We use an expectation-maximization (EM) algorithm to iteratively calculate the maximum likelihood estimates of the model parameters where the E-step calculates the posterior probability via Bayes rule,

(6) $$P(z_i = k \mid X_i, \Theta) = \frac{\pi_k\, \mathcal{N}(M_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(M_j, \Sigma_j)}$$

and the M-step updates the model parameters for each class,

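(7) $$\pi_k = \frac{\sum_{i=1}^{N} P(z_i = k \mid X_i, \Theta)}{N}$$
(8) $$M_k = \frac{\sum_{i=1}^{N} X_i\, P(z_i = k \mid X_i, \Theta)}{\sum_{i=1}^{N} P(z_i = k \mid X_i, \Theta)}$$
(9) $$\Sigma_k = \frac{\sum_{i=1}^{N} P(z_i = k \mid X_i, \Theta)\,(X_i - M_k)(X_i - M_k)^T}{\sum_{i=1}^{N} P(z_i = k \mid X_i, \Theta)}$$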
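The E-step and M-step in equations (6) to (9) can be written out directly in NumPy. The sketch below is an illustration of the update rules rather than the code used in the study, which relied on GaussianMixture in sklearn; the function names, the simulated log2 SNR ratios for K=3 templates and the starting means are all hypothetical.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density of equation (3), evaluated for each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    expo = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def em_gmm(X, init_means, n_iter=100):
    """Fit a K-component Gaussian mixture by EM, starting from the given means."""
    N, d = X.shape
    means = np.array(init_means, dtype=float)
    K = len(means)
    covs = np.array([np.cov(X.T).reshape(d, d) for _ in range(K)])  # start from the pooled covariance
    weights = np.full(K, 1.0 / K)                                   # equal initial priors
    for _ in range(n_iter):
        # E-step, equation (6): posterior probability of each class for each target.
        dens = np.stack([weights[k] * gaussian_pdf(X, means[k], covs[k])
                         for k in range(K)], axis=1)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step, equations (7)-(9): update priors, means and covariances.
        Nk = post.sum(axis=0)
        weights = Nk / N
        means = (post.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (post[:, k, None] * diff).T @ diff / Nk[k]
    return weights, means, covs, post   # post comes from the final E-step

# Simulated log2 SNR ratio vectors for K=3 templates (d = K-1 = 2); hypothetical values.
rng = np.random.default_rng(0)
true_means = np.array([[0.30, 0.30], [-0.20, 0.10], [0.10, -0.30]])
X = np.concatenate([rng.normal(m, 0.05, size=(300, 2)) for m in true_means])
init_means = [[0.2, 0.2], [-0.1, 0.0], [0.0, -0.2]]

weights, means, covs, post = em_gmm(X, init_means)
print("estimated priors:", np.round(weights, 3))
print("estimated means:\n", np.round(means, 3))
```

Each row of `post` gives the probability that a target belongs to each class, which is the statistic used to express the confidence of a classification.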

Prior probabilities (π) can be set by subjective assessment based on the experiment, or set to 1/K so that all classes have equal probability. For example, to determine the relative probability that an LN 60S-detected nuclear target belongs to the LN 60S or EN 60S class, we assume that their relative frequencies are the same and therefore the prior probability of the two intermediates in the nucleus is equal: P(LN 60S)=P(EN 60S)=0.5.
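The effect of the chosen priors on the assignment of a single target can be seen by evaluating equation (6) with fixed priors. The example below is a one-dimensional sketch; the Gaussian means and standard deviations are hypothetical placeholders, not fitted values from this study.

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def class_posteriors(x, params, priors):
    """Posterior probability of each class from equation (6), using fixed priors."""
    joint = {name: priors[name] * normal_pdf(x, *params[name]) for name in params}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

# Hypothetical Gaussian fits (mean, sd) for log2(SNR_LN / SNR_EN); not values from the study.
params = {"LN 60S": (0.10, 0.05), "EN 60S": (-0.10, 0.05)}
equal_priors = {"LN 60S": 0.5, "EN 60S": 0.5}
skewed_priors = {"LN 60S": 0.1, "EN 60S": 0.9}

post_mid = class_posteriors(0.0, params, equal_priors)      # target midway between the classes
post_ln = class_posteriors(0.05, params, equal_priors)      # target closer to the LN 60S mean
post_skew = class_posteriors(0.05, params, skewed_priors)   # same target, unequal priors

print("midway, equal priors:", post_mid)
print("x = 0.05, equal priors:", post_ln)
print("x = 0.05, skewed priors:", post_skew)
```

Targets near the boundary between two populations are the most sensitive to the prior, whereas targets close to one class mean are assigned with high probability under either choice.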

Simulations

We used the cisTEM program simulate (Himes and Grigorieff, 2021) to simulate images of 200 randomly oriented LN or mature 60S using a pixel size of 1.06 Å, an ice thickness of 100 nm and an exposure rate of 1 e⁻/Å² to a total dose of 30 e⁻/Å², matching the experimental data. 2DTM was performed as described above.

Data availability

Micrographs, templates and scaled maximum intensity projections (MIPs) in this study have been deposited to EMPIAR and are accessible with the following public access code: EMPIAR-10998. A compiled snapshot of the code used for 2D template matching and tilted CTF estimation is available to download from https://cistem.org/development.

The following data sets were generated
    Lucas B, Zhang K, Loerch S, Grigorieff N (2022) EBI, ID EMPIAR-10998. In situ single particle classification reveals distinct 60S maturation intermediates in cells.

References

    Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12:2825–2830.

Decision letter

  1. Sjors HW Scheres
    Reviewing Editor; MRC Laboratory of Molecular Biology, United Kingdom
  2. Anna Akhmanova
    Senior Editor; Utrecht University, Netherlands
  3. Sjors HW Scheres
    Reviewer; MRC Laboratory of Molecular Biology, United Kingdom
  4. Thomas G Laughlin
    Reviewer; University of California, San Diego, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "In situ single particle classification reveals distinct 60S maturation intermediates in cells" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Sjors HW Scheres as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Anna Akhmanova as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Thomas G Laughlin (Reviewer #3).

Given the close relationship between this paper and eLife paper https://elifesciences.org/articles/68946, I propose to change the current submission to a Research Advance, directly linking the two. Could you please do so when you re-submit your revised paper?

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

(1) One major concern that needs to be addressed more openly in a revised version of the paper is that of potential model bias of incorrect references. The observation on page 18 that the reconstruction from the picked particles was estimated by FSC to be 3.5A, yet the resulting map had to be filtered at 10A to limit the noise, is a strong indication that model bias does play a role in the identification of particles. This bias must affect the measured SNR scores, and thus the probabilistic metrics presented. It also suggests that part of the identified picks may in fact not be true 60S ribosomes, but false positives. This would then affect the conclusions drawn. If the authors disagree, they should set out clear arguments for their case. Also, they should discuss how potential overfitting or model bias would affect their new metrics for particle classification in the discussion. Currently, the only reference to the dangers of overfitting is on page 16, merely referring to their previous paper.

Further questions to consider in this light are:

(a) It is hard to tell from figure 1H how good the density for the 40S is, but from the figure it looks as though it is lower quality than the 60S. Unless there is substantial variation in the relative positions of 40S and 60S, then this is possibly explained by some amount of reference bias? Can the author comment on this? Related: is the FSC in Fig1Suppl1D unmasked (including 40S)?

(b) In figure 2, the assignment of ribosomes in the cytoplasm is quite convincing, however the template matching results are more ambiguous for the nuclear ribosomes. The authors propose this is because in the nucleus there are multiple states (later revisited in Figure 5). But can they exclude part of the ambiguity is due to the nucleus being a more 'dense' environment, generating more noisy data?

Can the authors exclude false positives have been picked particularly in the nucleus? One good control would be to generate reconstructions of ribosomes picked from the nucleus using either template and assessing the 40S density.

(c) Ln 167: What is the expected false-positive rate per image based on the chosen SNR threshold? One per image?

(2) The authors have been wise in selecting the ribosome as a test case. Probably, because of its size and RNA content, for many instances of this complex the SNR is sufficiently high for detection/classification. However, if less careful authors would choose a smaller target, what would happen? What would be the pitfalls and how could they be avoided? This needs a more open discussion in the paper, in order to obtain better insights into the general usefulness of the methodology.

(3) There is no information on the thickness of the lamellae. This would be useful information, and if a range of thicknesses was used, whether there is any correlation between thickness and the ability of 2DTM to discriminate between classes of ribosomes? Does the defocus/Z-refinement of the templates indicate a minimal apparent thickness?

(4) Also, assuming a thickness between 100 and 200 nm, this would mean between a quarter and one eighth of ribosomes would be 'cut' by the milling. Have the authors considered whether and how this might affect their analysis?

(5) Regarding the ML approach to establish probability of matched particles to belong to a certain class. In the case reported here priors were assigned based on location. For Figure 5, priors for classification of the two nuclear populations are assigned assuming equal distribution. It is not clear how important these decisions are on the outcome of the classification. In the vast majority of cases there will be no spatial distinction between particles with similar structures, nor will we have any idea about their relative frequencies. How applicable is this methodology to cases other than nuclear versus cytoplasmic ribosomes? I wonder whether a plot like the one in 3B could be derived from in silico data, where the relative abundance is exactly known, and then used to derive probabilities for 'real' data?

(6) Figure 4D could be interpreted in the following way: when the template only contains RNA, the SNR decreases more for nuclear and less for cytoplasmic particles, meaning that the protein part of the LN template contributed less to the 'discovery' of cytoplasmic ribosomes than nuclear. Which makes sense. However, I am not sure I understand the meaning of the experiment on the right side of the panel: when there is only protein (of the LN template), the SNR generally decreased indicating the RNA played an important role in detecting particles – but it decreases more for cytoplasmic ribosomes. This seems to me to suggest that the LN RNA contributed more to detecting the cytoplasmic ribosomes than the nuclear ones, which I struggle to understand. In other words: I agree with lines 245-247 (quote: "the LN60S proteins contribute more to the SNR of the nuclear targets than the cytoplasmic targets and are therefore more effective at differentiating the nuclear from the cytoplasmic 60S populations"), but if that is true then one would be equally able to state that: "the LN60S RNA contributes more to the SNR of the cytoplasmic targets than the nuclear targets and are therefore more effective at differentiating the nuclear from the cytoplasmic 60S populations, but in the wrong way around". Can the authors please explain this?

(7) Lines 428-435. I find this concept interesting, however it would be a challenge to find the most 'comprehensive' template without trying all the available ones. For example, in the case discussed here, if the idea was applied using the mature 60S template the results would have been possibly quite similar, however using the LN 60S template would have significantly skewed the analysis.

(8) Lines 459-464. The idea of refining atomic models against 2D images of cellular samples would be quite revolutionary, however to implement it there would be more required than just "addressing potential of overfitting" and "detecting and quantifying errors in the refined model". I suggest to remove the paragraph.

(9) Ln 110: How many lamellae were the 28 images acquired from?

(10) Figure 2 —figure supplement 1: What is being indicated by the different height lines under panel D? Could this be indicated in the legend.

Reviewer #1 (Recommendations for the authors):

Page 8: The sentences about a threshold defining so many percent of targets were not entirely clear to me. Perhaps they could be reworded?

Page 10: the first sentence seems rather far-reaching for the data presented, and might be wrongly interpreted by less careful users of the technique.

Page 14: "A major challenge … limit radiation damage": a verb is missing?

Figure 1E (and others) just look very black. Perhaps a colour figure would be easier to print/look at? It would also be insightful to have a colour scale bar next to it.

Page 18 mentions a modified version of CTFFIND4. Is this openly available?

Reviewer #2 (Recommendations for the authors):

1. It is hard to tell from figure 1H how good the density for the 40S is, but from the figure it looks as though it is lower quality than the 60S.

Unless there is substantial variation in the relative positions of 40S and 60S, then this is possibly explained by some amount of reference bias? Can the author comment on this?

Related: is the FSC in Fig1Suppl1D unmasked (including 40S)?

2. There is no information on the thickness of the lamellae. This would be useful information, and if a range of thicknesses was used, whether there is any correlation between thickness and the ability of 2DTM to discriminate between classes of ribosomes?

3. Also, assuming a thickness between 100 and 200 nm, this would mean between a quarter and one eighth of ribosomes would be 'cut' by the milling.

Have the authors considered whether and how this might affect their analysis?

4. In figure 2, the assignment of ribosomes in the cytoplasm is quite convincing, however the template matching results are more ambiguous for the nuclear ribosomes. The authors propose this is because in the nucleus there are multiple states (later revisited in Figure 5). But can they exclude part of the ambiguity is due to the nucleus being a more 'dense' environment, generating more noisy data?

Can the authors exclude false positives have been picked particularly in the nucleus?

One good control would be to generate reconstructions of ribosomes picked from the nucleus using either template and assessing the 40S density.

5. Regarding the ML approach to establish probability of matched particles to belong to a certain class. In the case reported here priors were assigned based on location. For Figure 5, priors for classification of the two nuclear populations are assigned assuming equal distribution. It is not clear how important these decisions are on the outcome of the classification. In the vast majority of cases there will be no spatial distinction between particles with similar structures, nor will we have any idea about their relative frequencies. How applicable is this methodology to cases other than nuclear versus cytoplasmic ribosomes? I wonder whether a plot like the one in 3B could be derived from in silico data, where the relative abundance is exactly known, and then used to derive probabilities for 'real' data?

6. Figure 4D could be interpreted in the following way: when the template only contains RNA, the SNR decreases more for nuclear and less for cytoplasmic particles, meaning that the protein part of the LN template contributed less to the 'discovery' of cytoplasmic ribosomes than nuclear. Which makes sense. However, I am not sure I understand the meaning of the experiment on the right side of the panel: when there is only protein (of the LN template), the SNR generally decreased indicating the RNA played an important role in detecting particles – but it decreases more for cytoplasmic ribosomes. This seems to me to suggest that the LN RNA contributed more to detecting the cytoplasmic ribosomes than the nuclear ones, which I struggle to understand.

In other words: I agree with lines 245-247 (quote: "the LN60S proteins contribute more to the SNR of the nuclear targets than the cytoplasmic targets and are therefore more effective at differentiating the nuclear from the cytoplasmic 60S populations"), but if that is true then one would be equally able to state that: "the LN60S RNA contributes more to the SNR of the cytoplasmic targets than the nuclear targets and are therefore more effective at differentiating the nuclear from the cytoplasmic 60S populations, but in the wrong way around". Can the authors please explain this?

7. Lines 428-435. I find this concept interesting, however it would be a challenge to find the most 'comprehensive' template without trying all the available ones. For example, in the case discussed here, if the idea was applied using the mature 60S template the results would have been possibly quite similar, however using the LN 60S template would have significantly skewed the analysis.

8. Lines 459-464. The idea of refining atomic models against 2D images of cellular samples would be quite revolutionary, however to implement it there would be more required than just "addressing potential of overfitting" and "detecting and quantifying errors in the refined model". I suggest to remove the paragraph.

Reviewer #3 (Recommendations for the authors):

Comments and questions:

Ln 110: How many lamellae were the 28 images acquired from?

Can the authors comment on the thicknesses of lamellae images? Does the defocus/Z-refinement of the templates indicate a minimal apparent thickness?

Ln 167: What is the expected false-positive rate per image based on the chosen SNR threshold? One per image?

Figure 2 —figure supplement 1: What is being indicated by the different height lines under panel D? Could this be indicated in the legend.

I commend the authors for deposition of all requisite material to reproduce the study to EMPIAR. In addition, could the authors elaborate more on the Cryo-EM data collection for reference for others intending to use this approach for their samples of interest (e.g., target defoci, filter slit-width, number of frames, etc.).

https://doi.org/10.7554/eLife.79272.sa1

Author response

Essential revisions:

(1) One major concern that needs to be addressed more openly in a revised version of the paper is that of potential model bias of incorrect references. The observation on page 18 that the reconstruction from the picked particles was estimated by FSC to be 3.5A, yet the resulting map had to be filtered at 10A to limit the noise, is a strong indication that model bias does play a role in the identification of particles. This bias must affect the measured SNR scores, and thus the probabilistic metrics presented. It also suggests that part of the identified picks may in fact not be true 60S ribosomes, but false positives. This would then affect the conclusions drawn. If the authors disagree, they should set out clear arguments for their case. Also, they should discuss how potential overfitting or model bias would affect their new metrics for particle classification in the discussion. Currently, the only reference to the dangers of overfitting is on page 16, merely referring to their previous paper.

We thank the reviewers for encouraging us to elaborate on this point. We agree that template bias and the potential for false positives are both important considerations. We have updated the manuscript to include further discussion of how these factors could affect our analysis as well as additional control experiments.

As the reviewers point out, our reconstruction suffers from template bias that results from partial alignment of the background with the template. This may indeed affect the observed SNR values of detected targets. However, this will not change the rate of false positive detections, which depends on the distribution of correlation coefficients expected for pure noise, and the chosen threshold, which was set to one false positive per search in our experiments. For the reconstruction, we only allowed targets above an SNR threshold of 8, which would allow five false positives of the 3991 targets (0.1%). We therefore do not expect that false positives affect the reconstruction. Consistently, we have shown previously that we are able to recover features in reconstructions using 2DTM coordinates that do not derive from the template (Rickgauer et al., 2017, Lucas et al., 2021) and that we do not reproduce template features in the reconstruction that are not in the data (Lucas et al., 2021). This indicates that reconstructions using 2DTM coordinates are not dominated by template bias.

We have included a more thorough discussion of template overfitting to noise and how it affects the results under the subheading “Relative similarity to alternate templates reveals population identity”. We further elaborate on these points in the discussion on page 18, lines 535-547 and page 19, lines 569-584.

Further questions to consider in this light are:

(a) It is hard to tell from figure 1H how good the density for the 40S is, but from the figure it looks as though it is lower quality than the 60S. Unless there is substantial variation in the relative positions of 40S and 60S, then this is possibly explained by some amount of reference bias? Can the author comment on this? Related: is the FSC in Fig1Suppl1D unmasked (including 40S)?

Indeed, there is substantial heterogeneity in the position of the 40S relative to the 60S. The text has been updated to reflect this point (page 5, lines 150-156). Additional discussion has been included in the manuscript: page 18, lines 534-546. The Materials and methods have been updated to include the mask radius on page 22, line 679.

(b) In figure 2, the assignment of ribosomes in the cytoplasm is quite convincing, however the template matching results are more ambiguous for the nuclear ribosomes. The authors propose this is because in the nucleus there are multiple states (later revisited in Figure 5). But can they exclude part of the ambiguity is due to the nucleus being a more 'dense' environment, generating more noisy data?

Can the authors exclude false positives have been picked particularly in the nucleus? One good control would be to generate reconstructions of ribosomes picked from the nucleus using either template and assessing the 40S density.

Differences in background in different parts of the image may affect observed 2DTM SNR values, and thus particle classification (see above). However, due to the whitening and normalization procedures, the false-positive rates remain approximately constant across an image. We now include Figure 1—figure supplement 2 to show that the histogram of normalized MIP values is comparable in the nucleus and the vacuole relative to the cytoplasm, and if anything, shifted slightly to the left. Therefore, we would expect that, relative to the cytoplasm, the probability of a false positive in the nucleus is similar or slightly lower. This is also visible in Figure 2D-G, which shows little difference in the background in the nucleus relative to the cytoplasm and between the two templates in the small area shown, and in Figure 3B, which shows that the standard deviations of the log2(2DTM SNR ratios) in the nucleus and the cytoplasm are similar, indicating similar noise levels. Consistently, the vacuole is much darker in many images than the nucleus. However, of the 4601 independent 60S-detected targets in the searches with the three different templates, only one was found in the vacuole (0.02%). We have included a discussion of this point in the manuscript on page 4, line 125 – page 5, line 131 and in the discussion on page 19, lines 569-584.

We thank the reviewers for the suggested control experiment to detect 40S bound to 60S targets detected in the nucleus. However, nuclear maturation of 60S and 40S occurs separately, and therefore we would not expect pre-60S to be associated with 40S. As a control, we showed in the original manuscript that inhibiting 60S export depletes targets detected with the EN 60S template.

(c) Ln 167: What is the expected false-positive rate per image based on the chosen SNR threshold? One per image?

We have updated the Materials and methods to include this information.

(2) The authors have been wise in selecting the ribosome as a test case. Probably, because of its size and RNA content, for many instances of this complex the SNR is sufficiently high for detection/classification. However, if less careful authors would choose a smaller target, what would happen? What would be the pitfalls and how could they be avoided? This needs a more open discussion in the paper, in order to obtain better insights into the general usefulness of the methodology.

Thank you for raising this issue. We have included a discussion of this point in the manuscript on page 19, lines 578-584.

(3) There is no information on the thickness of the lamellae. This would be useful information, and if a range of thicknesses was used, whether there is any correlation between thickness and the ability of 2DTM to discriminate between classes of ribosomes? Does the defocus/Z-refinement of the templates indicate a minimal apparent thickness?

We have updated the manuscript to include a table describing the image-wise thickness estimates and defocus.

(4) Also, assuming a thickness between 100 and 200 nm, this would mean between a quarter and one eighth of ribosomes would be 'cut' by the milling. Have the authors considered whether and how this might affect their analysis?

Any factors that decrease the agreement between the template and the target would reduce the 2DTM SNR and increase the standard deviation of the log2(2DTM SNR ratios), making classification more challenging (e.g., Figure 3—figure supplement 2). We do not expect that any cut or partial ribosomes are detected. While the sample may contain partial ribosomes on the surface of the lamella, interaction with the 30 kV Ga+ beam would likely cause significant damage and degradation of their structure, making them unlikely to be detected. Damage further inside the sample would also be expected to lower the 2DTM SNRs by varying degrees, again increasing the standard deviation of the log2(2DTM SNR ratios).
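The estimate in the question can be reproduced with a simple geometric sketch. It assumes that particle centers are uniformly distributed through the depth of the lamella and that a particle is cut when its center lies within one radius of either milled surface. The 25 nm diameter is an assumed round number for a ribosome and is not stated in the question or the manuscript.

```python
def cut_fraction(thickness_nm, diameter_nm):
    """Fraction of particles intersected by a lamella surface.

    A particle is cut if its center lies within diameter/2 of either
    surface, so the affected depth is 2 * (diameter / 2) = diameter.
    """
    if thickness_nm <= 0 or diameter_nm <= 0:
        raise ValueError("thickness and diameter must be positive")
    return min(1.0, diameter_nm / thickness_nm)


ASSUMED_DIAMETER_NM = 25.0  # assumption for illustration only

for thickness in (100.0, 200.0):
    frac = cut_fraction(thickness, ASSUMED_DIAMETER_NM)
    print(f"{thickness:.0f} nm lamella: {frac:.3f} of particles cut")
```

With these inputs the fractions are one quarter and one eighth, matching the range quoted in the question.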

(5) Regarding the ML approach to establish probability of matched particles to belong to a certain class. In the case reported here priors were assigned based on location. For Figure 5, priors for classification of the two nuclear populations are assigned assuming equal distribution. It is not clear how important these decisions are on the outcome of the classification. In the vast majority of cases there will be no spatial distinction between particles with similar structures, nor will we have any idea about their relative frequencies. How applicable is this methodology to cases other than nuclear versus cytoplasmic ribosomes? I wonder whether a plot like the one in 3B could be derived from in silico data, where the relative abundance is exactly known, and then used to derive probabilities for 'real' data?

The ability to distinguish the nuclear from the cytoplasmic populations provides a convenient control for testing our approach in situ. We agree that in most cases the subcellular localization will not distinguish classes. This is the justification for the experiment in Figure 5, wherein we attempt to distinguish a mixed population of nuclear intermediates without prior information about their spatial distribution. In this case, we use a 50:50 prior for the two nuclear intermediates, effectively assuming no prior information. We were able to identify changes to the relative occupancy of nuclear 60S populations after treatment with a nuclear export inhibitor. We conclude that this strategy can be used to classify mixed populations with different relative occupancy.
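The role of the prior can be illustrated with a short sketch. It assumes that the log2(2DTM SNR ratio) of each class is described by a Gaussian and combines the class likelihoods with the priors using Bayes' rule. The class names, means and standard deviations below are hypothetical placeholders, not values fitted in the manuscript.

```python
from statistics import NormalDist


def class_posteriors(x, classes):
    """Posterior probability of each class for one log2 SNR ratio `x`.

    `classes` maps a class name to (mean, sd, prior) of a Gaussian model.
    """
    weighted = {
        name: prior * NormalDist(mean, sd).pdf(x)
        for name, (mean, sd, prior) in classes.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("x has zero likelihood under every class")
    return {name: w / total for name, w in weighted.items()}


# Hypothetical two-class model with a 50:50 prior (no prior information).
model = {
    "intermediate_1": (-0.2, 0.1, 0.5),
    "intermediate_2": (0.2, 0.1, 0.5),
}
print(class_posteriors(0.05, model))
```

With equal priors the assignment depends only on the likelihoods, so a particle midway between the two means receives a probability of 0.5 for each class; changing the priors shifts that decision boundary toward the less abundant class.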

The purpose of this experiment was to show that we could distinguish populations in cells that differ with respect to their relative similarity to different templates. As the reviewers point out, an in silico experiment could be used to show that log2(2DTM SNR ratios) can be used to separate related states. Indeed, we show in the new Figure 4 that provided sufficient targets are selected, the mean log2(mature 60S/ LN60S 2DTM SNR) is stable across a wide range of 2DTM SNRs. We have described how this can be used to validate the identity of a population by comparison with population means from simulations. However, extrapolating from distributions fit to simulated data requires accurately simulating the in situ noise. We show in the new Figure 4 that differences in background alter the standard deviation of the fitted populations and therefore affect classification, making direct extrapolation of probabilities unreliable. We suggest, therefore, that to accurately capture the noise, populations will need to be identified by fitting distributions to the data in each experiment. In the case wherein the populations are mixed, we included biological controls to show that we could detect expected changes to populations in the cell.
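To illustrate why the standard deviation matters, the sketch below considers the simplified case of two Gaussian populations with equal priors and a common standard deviation. In that case the optimal decision boundary is the midpoint between the means, and the misclassification rate is the normal tail probability beyond half the separation. The means and standard deviations are hypothetical.

```python
from statistics import NormalDist


def misclassification_rate(mean_a, mean_b, sd):
    """Error rate for two equal-prior Gaussians with a shared SD.

    Each population is misassigned with probability Phi(-|delta| / (2 * sd)),
    where delta is the separation between the two means.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return NormalDist().cdf(-abs(mean_a - mean_b) / (2.0 * sd))


# Same separation of means, two hypothetical noise levels.
for sd in (0.1, 0.2):
    rate = misclassification_rate(0.0, 0.4, sd)
    print(f"sd={sd}: misclassification rate {rate:.4f}")
```

Doubling the standard deviation at a fixed separation raises the error rate from about 2% to about 16% in this example, which is why probabilities derived under one noise level do not transfer directly to data with a different background.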

(6) Figure 4D could be interpreted in the following way: when the template only contains RNA, the SNR decreases more for nuclear and less for cytoplasmic particles, meaning that the protein part of the LN template contributed less to the 'discovery' of cytoplasmic ribosomes than nuclear. Which makes sense. However, I am not sure I understand the meaning of the experiment on the right side of the panel: when there is only protein (of the LN template), the SNR generally decreased indicating the RNA played an important role in detecting particles – but it decreases more for cytoplasmic ribosomes. This seems to me to suggest that the LN RNA contributed more to detecting the cytoplasmic ribosomes than the nuclear ones, which I struggle to understand. In other words: I agree with lines 245-247 (quote: "the LN60S proteins contribute more to the SNR of the nuclear targets than the cytoplasmic targets and are therefore more effective at differentiating the nuclear from the cytoplasmic 60S populations"), but if that is true then one would be equally able to state that: "the LN60S RNA contributes more to the SNR of the cytoplasmic targets than the nuclear targets and are therefore more effective at differentiating the nuclear from the cytoplasmic 60S populations, but in the wrong way around". Can the authors please explain this?

Thank you for highlighting this point. Indeed, this result seems counterintuitive at first. We have updated our description of this experiment in the text.

(7) Lines 428-435. I find this concept interesting, however it would be a challenge to find the most 'comprehensive' template without trying all the available ones. For example, in the case discussed here, if the idea was applied using the mature 60S template the results would have been possibly quite similar, however using the LN 60S template would have significantly skewed the analysis.

Indeed, using the LN 60S as the template for the “discovery” search would likely miss mature 60S targets, thereby biasing the number of detected targets towards the LN 60S population. If the SNR values for the templates are too different, a full search may have to be performed for all of them to avoid such a bias.

(8) Lines 459-464. The idea of refining atomic models against 2D images of cellular samples would be quite revolutionary, however to implement it there would be more required than just "addressing potential of overfitting" and "detecting and quantifying errors in the refined model". I suggest to remove the paragraph.

We have removed this paragraph.

(9) Ln 110: How many lamellae were the 28 images acquired from?

We have included this information in the text on page 4, line 113. We have additionally included a table of image characteristics, including estimated thickness and defocus calculated using CTFFIND.

(10) Figure 2—figure supplement 1: What is being indicated by the different height lines under panel D? Could this be indicated in the legend?

The lines indicate images from the same lamella. We have included this information in the figure legend.

https://doi.org/10.7554/eLife.79272.sa2

Article and author information

Author details

  1. Bronwyn A Lucas

    1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, United States
    2. Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing – review and editing
    For correspondence
    bronwyn.lucas@umassmed.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9162-0421
  2. Kexin Zhang

    1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, United States
    2. Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, United States
    Contribution
    Software, Formal analysis, Validation, Visualization, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Sarah Loerch

    Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, United States
    Present address
    Department of Chemistry and Biochemistry, University of California, Santa Cruz, United States
    Contribution
    Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1731-516X
  4. Nikolaus Grigorieff

    1. RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, United States
    2. Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, United States
    Contribution
    Conceptualization, Software, Formal analysis, Supervision, Funding acquisition, Methodology, Writing - original draft, Project administration, Writing – review and editing
    For correspondence
    niko@grigorieff.org
    Competing interests
    Reviewing editor, eLife
    ORCID: 0000-0002-1506-909X

Funding

Chan Zuckerberg Initiative (2021-234617)

  • Bronwyn A Lucas
  • Nikolaus Grigorieff

Howard Hughes Medical Institute

  • Nikolaus Grigorieff

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to Michael Rosbash (Brandeis) for providing MNY8 cells, Xiaowei Zhao for help trouble-shooting FIB-milling, the cryo-EM facility at Janelia Research Campus where this data was collected and Peter Rickgauer, Tim Grant and Ben Himes for helpful comments and suggestions. This project has been made possible in part by grant number 2021–234617 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation awarded to NG and BL.

Senior Editor

  1. Anna Akhmanova, Utrecht University, Netherlands

Reviewing Editor

  1. Sjors HW Scheres, MRC Laboratory of Molecular Biology, United Kingdom

Reviewers

  1. Sjors HW Scheres, MRC Laboratory of Molecular Biology, United Kingdom
  2. Thomas G Laughlin, University of California, San Diego, United States

Publication history

  1. Received: April 5, 2022
  2. Preprint posted: April 10, 2022
  3. Accepted: August 24, 2022
  4. Accepted Manuscript published: August 25, 2022 (version 1)
  5. Version of Record published: September 5, 2022 (version 2)

Copyright

© 2022, Lucas et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Bronwyn A Lucas
  2. Kexin Zhang
  3. Sarah Loerch
  4. Nikolaus Grigorieff
(2022)
In situ single particle classification reveals distinct 60S maturation intermediates in cells
eLife 11:e79272.
https://doi.org/10.7554/eLife.79272