Automated systematic evaluation of cryo-EM specimens with SmartScope

  1. Jonathan Bouvette
  2. Qinwen Huang
  3. Amanda A Riccio
  4. William C Copeland
  5. Alberto Bartesaghi  Is a corresponding author
  6. Mario J Borgnia  Is a corresponding author
  1. Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, United States
  2. Department of Computer Science, Duke University, United States
  3. Department of Electrical and Computer Engineering, Duke University, United States
  4. Department of Biochemistry, Duke University School of Medicine, United States

Abstract

Finding the conditions to stabilize a macromolecular target for imaging remains the most critical barrier to determining its structure by cryo-electron microscopy (cryo-EM). While automation has significantly increased the speed of data collection, specimens are still screened manually, a laborious and subjective task that often determines the success of a project. Here, we present SmartScope, the first framework to streamline, standardize, and automate specimen evaluation in cryo-EM. SmartScope employs deep-learning-based object detection to identify and classify features suitable for imaging, allowing it to perform thorough specimen screening in a fully automated manner. A web interface provides remote control over the automated operation of the microscope in real time and access to images and annotation tools. Manual annotations can be used to re-train the feature recognition models, leading to improvements in performance. Our automated tool for systematic evaluation of specimens streamlines structure determination and lowers the barrier of adoption for cryo-EM.

Editor's evaluation

This paper describes a new software tool: SmartScope, for automated screening of cryo-EM grids. SmartScope can also perform automated data collection on suitable grids, including with beam-image shifts and tilted stage geometries. If it works in practice as advertised in the paper, then it will be a highly useful tool for the field, especially if other groups would also contribute to its open-source and modular code.

https://doi.org/10.7554/eLife.80047.sa0

Introduction

Over the past decade, advances in hardware and software have improved the resolution and throughput of single particle analysis (SPA), establishing cryo-EM as a method of choice in structural biology. However, optimizing specimens for high-resolution cryo-EM imaging remains a significant barrier (Weissenberger et al., 2021). The ideal specimen for solving a structure is a single layer of randomly oriented macromolecular complexes embedded into a thin slab of vitreous ice. During specimen preparation, interactions with the air-water interface facilitated by the confinement into a thin layer of buffer can destabilize protein complexes leading to denaturation and aggregation or force the molecules into a ‘preferred orientation’ (Noble et al., 2018). In addition, vitrification methods typically yield variations in ice thickness across the grid. These artifacts can severely limit the quality of specimens and are typically addressed through an optimization process in which several parameters are varied to increase the stability and mono-dispersity of the target macromolecule (Passmore and Russo, 2016). Evaluating each combination of parameters involves comprehensive sampling of one or more grids using a cryo-EM. Testing all combinations is impractical because the number grows significantly with the inclusion of each parameter. Instead, an iterative search is performed in which a limited number of parameters are evaluated, and new conditions are selected based on the results.

The goal of specimen screening is to learn as much as possible from each condition, often taking advantage of the heterogeneous landscape of each grid to extract valuable information about the behavior of the macromolecule of interest. This process involves selecting areas for evaluation, adjusting the positioning and optical conditions of the microscope, and recording images at multiple magnifications. The lowest magnifications are used to assess the overall quality of the vitrification process, the number of potential areas amenable to sampling at higher resolution and some macroscopic indicators of sample instability such as aggregation. Higher magnification images provide direct information about the macromolecules of interest, such as particle integrity, distribution, affinity for the substrate, density, heterogeneity and orientation, as well as the quality of the ice, and the resolution limit of the images. This makes manual specimen screening a time-consuming activity with a steep learning curve in which the implicit subjectivity in the selection of areas can lead to suboptimal sampling, resulting in missing information. The quality of the results, the speed, and even the integrity of the instrument, all depend on the experience and skills of the operator.

Existing software for automated cryo-EM is not designed to provide a thorough sampling of each grid. Instead, packages are optimized for acquiring a large number of high-quality images of a pre-selected set of targets (Mastronarde, 2005; Suloway et al., 2005). Although all data collection packages preserve lower magnification images and the associated stage positions, they are not designed to facilitate virtual navigation of the grid, with the exception of Leginon’s companion software, Appion (Lander et al., 2009), which provides offline access to the results via a web user interface (WebUI). However, none of the existing packages is optimized for screening, nor do they provide a web-based solution for controlling the microscope during specimen evaluation.

Here, we present SmartScope, a web-based, highly available expert system capable of performing unsupervised screening of specimens and automated data collection for cryo-EM. SmartScope uses pretrained generalized deep learning (DL) models for feature detection and selection to maximize sampling and provide information to guide specimen optimization. By combining automation, machine learning, and remote control, we aim to increase the efficiency of the screening process, reducing costs and dramatically increasing availability. Finally, SmartScope is designed as a modular framework, facilitating the addition of new algorithms for area selection and navigation that can further improve targeting performance.

Results

The complexity of a screening workflow depends on several factors including the instrument used and the type of specimen. Here, we describe the extended operation of a microscope furnished with an autoloader device and loaded with frozen hydrated targets for SPA, which are prepared on a micropatterned holey substrate or continuous carbon (Figure 1 and Figure 1—figure supplement 1). All the interactions with the software are carried out using the SmartScope WebUI, which also allows users to monitor progress and control the workflow with little to no training in cryo-EM. Further, SmartScope permits simultaneous access from multiple remote devices, greatly facilitating collaborative work.

Figure 1 with 3 supplements
Overview of the SmartScope framework.

(A) Workflow for unsupervised grid navigation and imaging. SmartScope handles specimen exchange, atlas acquisition, and the identification, classification, and selection of regions of interest (ROIs). It then visits the selected regions and identifies and selects targets of interest (TOIs), which are acquired at higher magnification and preprocessed. (B) Detailed steps in ROI selection. After detection and classification, ROIs are also clustered into groups; the example shows clustering by size. The ROIs are then queried based on their class, and ROIs from different clusters are selected. (C) Detailed steps in TOI selection. Shown here is hole detection followed by median intensity clustering. Holes are then grouped by image-shift radius, and groups from each cluster are selected for imaging.

Initialization

After a cassette is inserted in the autoloader, a session is initialized by providing the list of grids to be evaluated along with a series of parameters applicable to all of them (Appendix 1—table 1). SmartScope then initiates a connection to SerialEM (Mastronarde, 2005) via its Python API to issue commands to the microscope. This connection is locked to prevent the simultaneous execution of multiple workflows. For instruments equipped with automated loading systems, grids are loaded sequentially into the column and subjected to the operations described in the sections below.

Grid analysis

For each grid loaded, a series of low magnification images are acquired and stitched together to generate a grid map or ‘Atlas’ which is analyzed by SmartScope’s DL driven window detector and classifier (Figure 1A). Windows deemed suboptimal for imaging due to physical damage or heavy contamination are excluded from further analysis. The remaining ‘good’ windows are reclassified and clustered based on a selectable criterion (e.g. areas suitable for imaging). Representatives of each window cluster are added to a list of regions of interest (ROIs) with the goal of adequately sampling the diversity of imageable areas (Figure 1B). The program then proceeds to visit and select imaging targets from the ROIs in this list, which can be modified via the WebUI at any time before the grid is completed.
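
The selection of representative windows can be illustrated with a minimal sketch. It assumes each detected window is a dictionary with hypothetical `id`, `label`, and `area` keys, and it stands in for the clustering step with simple equal-count size bins; neither the data layout nor the binning rule is taken from SmartScope itself.

```python
import random

def select_rois(windows, n_groups=3, per_group=1, seed=0):
    """Pick representatives from size-based groups of 'good' windows.

    windows: list of dicts with hypothetical keys 'id', 'label', 'area'.
    Equal-count bins over the sorted areas stand in for a real clustering step.
    """
    rng = random.Random(seed)
    good = sorted((w for w in windows if w["label"] == "good"),
                  key=lambda w: w["area"])
    if not good:
        return []
    n_groups = min(n_groups, len(good))
    # Split the size-sorted list into contiguous, near-equal bins.
    bounds = [i * len(good) // n_groups for i in range(n_groups + 1)]
    selected = []
    for lo, hi in zip(bounds, bounds[1:]):
        group = good[lo:hi]
        selected.extend(rng.sample(group, min(per_group, len(group))))
    return [w["id"] for w in selected]
```

Sampling one window per bin, rather than the top-scoring windows overall, reflects the stated goal of covering the diversity of imageable areas.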

Selection of targets

The stage is moved to the next ROI, brought to eucentric height, and imaged at a magnification that ensures complete coverage of the area (Figure 1A). The next step is to identify targets of interest (TOIs) based on a programmable criterion that depends on the specimen. For example, SmartScope’s DL based hole finder is used to detect holes in frozen hydrated SPA specimens. The current algorithm classifies the holes based on their average signal intensity (a proxy for ice thickness) and clusters them into a selectable number of groups. By default, the group containing the darkest targets is rejected as not suitable for imaging. To maximize diversity, holes are selected from the different clusters and added to a TOI list (Figure 1C). As with ROIs, this selection can be modified at any time during imaging of the grid. A protocol for selecting negative stain TOIs is also available and plug-ins for other types of specimens may be incorporated in the future.
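
The intensity-based hole selection described above can be sketched as follows. This is an illustrative approximation only: it assumes one intensity value per detected hole and uses a plain one-dimensional k-means, which may differ from the clustering SmartScope actually applies. The function names are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iters=50):
    """Plain 1D k-means; returns a cluster index per value (0 = darkest)."""
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly across the intensity range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        new = [
            mean([v for v, l in zip(values, labels) if l == c]) if c in labels else centres[c]
            for c in range(k)
        ]
        if new == centres:
            break
        centres = new
    return labels

def select_holes(intensities, k=3, per_cluster=1):
    """Return indices of holes picked from every cluster except the darkest."""
    labels = kmeans_1d(intensities, k)
    chosen = []
    for c in range(1, k):  # cluster 0 holds the darkest holes and is skipped
        members = [i for i, l in enumerate(labels) if l == c]
        chosen.extend(members[:per_cluster])
    return chosen
```

With intensities `[10, 12, 11, 50, 52, 90, 95, 91]` and three clusters, the three darkest holes are rejected and one hole is taken from each of the two brighter groups.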

Selected TOIs are visited sequentially by moving the stage to their predicted coordinates. A series of images, at a magnification that encompasses the TOI and surrounding area, are used to recenter the imaging area on the target (a hole in the substrate for SPA). These intermediate resolution images are stored in the database as they often provide valuable information about the specimen, such as affinity of the macromolecules for the support material, aggregation, denaturation, etc. Autofocus and drift stabilization procedures are then performed before acquiring high-magnification images of the target. An optional random offset from the center of each hole can be specified to capture images at different distances from the edge of the substrate. The newly acquired images are processed using the routine alignframes in IMOD (Kremer et al., 1996) and the program CTFFIND4 (Rohou and Grigorieff, 2015) to facilitate assessment of data quality. This cycle is repeated until all TOIs are imaged; then the workflow proceeds to the next ROI.
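
One possible way to draw the optional random offset is shown below. It is a sketch under the assumption that the offset is sampled uniformly over a disc around the hole center; the text does not specify the distribution, and `max_offset_um` is a hypothetical parameter name.

```python
import math
import random

def random_offset(max_offset_um, rng=random):
    """Sample an (x, y) offset uniformly inside a disc of radius max_offset_um."""
    # The square root makes the area density uniform instead of crowding the center.
    r = max_offset_um * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(phi), r * math.sin(phi)
```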

After all selected ROIs for the grid are finished, SmartScope automatically switches to the next grid. This behavior may be modified by selecting the ‘pause between grids’ option on the session menu. Pausing allows for the selection of additional ROIs at the end of the cycle, providing better control of unattended sessions or when evaluating unusual specimens where automated sampling may not be satisfactory.

Accessing and annotating results

SmartScope systematically documents the results and facilitates their analysis. During collection, all images and their related metadata are stored in a consistent data structure. To display and interact with these data, SmartScope implements an intuitive WebUI that tracks the imaging process in real time. Moreover, it enables remote interaction with a running session, such as modifying area selection, changing labels and acquisition parameters, and taking notes about the specimen, all without interrupting the acquisition workflow.

After a session is over, SmartScope can automatically copy the data to long-term or object storage. The data remains available through the WebUI and allows users to make additional annotations. Other tools, such as micrograph curation and exporting of metadata as star files are also available.

Tools for exhaustive screening and high-throughput data collection

SmartScope can also perform high-throughput data collection. A session can be initialized in data collection mode or changed from screening to data collection by setting the number of TOI to sample to zero. This will select all the available TOIs for imaging.

To achieve high throughput, SmartScope uses beam-image shift (BIS) for multi-hole imaging (Cheng et al., 2018). BIS can be used during screening for more exhaustive sampling, allowing for exploratory data collections that can provide enough images to carry out 2D classification or initial 3D reconstruction. The BIS grouping in SmartScope uses an algorithm that groups the holes within a given radius (Figure 1—figure supplement 2). Only the targets that are labeled as ‘good’ and are in the included clusters are used for the grouping. The algorithm attempts to maximize the coverage of targets while minimizing the total number of groups. To maximize the speed of data collection, a minimal group size can also be specified to prevent the algorithm from assigning small groups of holes. The BIS radius and minimal group size are specified at the start of each session and can be modified during the session.
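
The trade-off between coverage and number of groups can be illustrated with a greedy heuristic. The sketch below is not SmartScope's algorithm: it assumes hole positions are given as `(x, y)` tuples in micrometers, restricts group centers to hole positions, and repeatedly takes the center that covers the most ungrouped holes.

```python
import math

def group_by_image_shift(holes, radius, min_group_size=1):
    """Greedy radius grouping of holes given as (x, y) positions.

    Returns a list of (center_index, member_indices). Holes that can only
    form groups smaller than min_group_size are left ungrouped.
    """
    remaining = set(range(len(holes)))
    groups = []
    while remaining:
        best_center, best_members = None, []
        for c in sorted(remaining):
            members = [i for i in sorted(remaining)
                       if math.dist(holes[c], holes[i]) <= radius]
            if len(members) > len(best_members):
                best_center, best_members = c, members
        if len(best_members) < min_group_size:
            break  # every group that could still be formed is too small
        groups.append((best_center, best_members))
        remaining -= set(best_members)
    return groups
```

A greedy pass like this does not guarantee the minimum number of groups, but it shows how the radius and the minimal group size interact: isolated holes are dropped once `min_group_size` exceeds one.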

One way to alleviate the commonly occurring problem of orientation bias in single-particle cryo-EM is to collect data on a tilted specimen (Tan et al., 2017). SmartScope can perform BIS data collection on tilted specimens, where the position and defocus of the targets are corrected using geometrical tilt constraints with a throughput that is similar to regular non-tilted data collection. Moreover, the tilt angle can be seamlessly changed at any point during the acquisition process. Combined with the integrated in-line data processing, this allows the data collection strategy to be adapted on the fly based on newly acquired knowledge.
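
The geometry behind such a correction can be sketched with basic trigonometry. The example assumes the tilt axis runs along x and that target offsets are measured in the plane of the grid relative to the group center; the axis orientation and sign conventions of a real instrument are calibration-dependent and are not taken from the source.

```python
import math

def tilt_corrected_target(dx_um, dy_um, tilt_deg):
    """Return (dx, dy_projected, dz) for an in-plane offset on a tilted grid.

    Assumes the tilt axis is along x, so only the y component is affected.
    """
    t = math.radians(tilt_deg)
    dy_projected = dy_um * math.cos(t)  # foreshortening perpendicular to the axis
    dz = dy_um * math.sin(t)            # height change, i.e. the defocus offset
    return dx_um, dy_projected, dz
```

For example, a hole 2 µm from the tilt axis on a grid tilted to 30° sits about 1 µm above or below the center, which is comparable to the defocus range used for collection and therefore cannot be ignored.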

The combination of automated operations such as feature detection routines, multi-hole imaging, and tilted data collection capabilities, makes SmartScope a powerful tool that can accelerate screening of cryo-EM samples and achieve high-throughput data collection.

Asynchronous imaging and processing

SmartScope was designed to maximize microscope efficiency and to remove much of the idling time in the imaging process. A common source of idling is the processing time required for rendering the frame averages or calculating the CTF fits. To minimize the impact of this, the microscope’s imaging process and the image processing routines run as parallel processes. The newly acquired images or movies are queued up for analysis and processed sequentially on a separate thread. For the atlas and windows, processing includes detecting, classifying, and selecting targets. For high-magnification TOIs, it involves frame alignment when fractions are saved, and CTF estimation. This allows the microscope to immediately acquire the next target while the images are being analyzed (Figure 1—figure supplement 1).
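
The decoupling of acquisition from processing follows a standard producer-consumer pattern. The sketch below uses Python's standard `queue` and `threading` modules; `acquire` and `process` are hypothetical callables standing in for the microscope call and the frame alignment or CTF estimation step.

```python
import queue
import threading

def run_session(targets, acquire, process):
    """Acquire on the calling thread while a worker thread processes images."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: acquisition has finished
                break
            results.append(process(item))

    thread = threading.Thread(target=worker)
    thread.start()
    for target in targets:
        # put() returns immediately, so the next target is not delayed by processing.
        pending.put(acquire(target))
    pending.put(None)
    thread.join()
    return results
```

Because a single worker drains a first-in, first-out queue, results come back in acquisition order, matching the sequential processing described above.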

Installation and configuration

SmartScope bundles a web server for the WebUI, a database server and the core package necessary to run the main imaging workflow. It relies on the database to store and query essential metadata (Figure 1—figure supplement 3). To simplify the deployment and orchestration of these services, we created a Docker image that should be compatible with most Linux systems.

The application can be installed on a single workstation that will handle the execution of all the services. It can also be installed in a master-worker configuration where, for example, one computer handles the web server and the database (master), while the main workflow is executed on a workstation that is connected to the microscope network (worker). The minimal requirements are that all the systems can access the database that holds the metadata for the session and targets, and the filesystem where the images are saved.

A long-term storage area that holds the data from previous sessions can be specified. Both mounted network drives and object stores can be used to store the data. This allows data to be cleared from the main local drives, leaving space for ongoing sessions while keeping older data accessible through the WebUI.

After installation, administrators can log in to the WebUI management portal, where microscope and detector information needs to be added to allow connection to the instruments. Multiple microscopes can be installed on a single instance of SmartScope, serving as a central hub for microscope access. To access the server, each user has an account and groups are created by an administrator. Users can only access the data from the groups they belong to.

In SerialEM, a settings file containing the magnification for low-dose imaging needs to be prepared for each microscope and detector. The conditions should be set as follows: the full square image bound to the Search preset, using a magnification that shows the entire square; the fine hole re-alignment condition, at low SA magnification, bound to the View preset; and the data acquisition conditions bound to the Record preset, with the acquisition and dose fractionation settings bound to the Preview preset. Because different hardware combinations require different settings, we included a table with the settings used on a Talos Arctica equipped with a K2 detector and a Titan Krios G4 equipped with a K3 detector and BioContinuum energy filter as guidelines (Appendix 1—table 2).

Automated object detection and classification

SmartScope identifies and classifies ROIs and TOIs suitable for cryo-EM imaging using DL approaches. At the atlas level, areas suitable for imaging appear as ‘windows,’ commonly shaped as squares, through the metallic grid in which the support layer is intact and not blocked by thick ice or large contaminants. Windows are automatically detected and classified using a pretrained Region-based Convolutional Neural Network (Girshick, 2015) that identifies the ‘good’ windows with 80% precision (Figure 2A, Figure 2—figure supplement 1A, Appendix 1—table 3), thus providing information that can effectively guide the instrument to avoid undesirable regions of the grid.

Figure 2 with 1 supplement
Deep-learning-based feature recognition for autonomous grid navigation.

Sample images show the performance of the square (A) and hole (B) detectors applied to gold (left) and carbon (right) grids. (A) Automatic detection of squares and classification into six different classes: small, cracked, dry, contaminated, good, and partial (white scale bars are 100 μm). Representative examples of squares assigned to each class and corresponding detection precision values are shown (bottom panel). (B) Hole detection performance on representative square images extracted from gold and carbon grids. The hole detector implements a classification step to filter out contaminants (shown in yellow) and increases hole detection precision (shown as pink circles) (white scale bars are 10 μm).

Selected windows are then acquired at a higher magnification where TOIs can take various shapes depending on whether the modality is single particle cryo-EM, tomography, or negative stain. In single particle cryo-EM, these TOIs usually show as holes in the substrate and are difficult to detect with traditional image processing tools due to the low contrast, especially when carbon mesh grids are used. We implemented a robust hole detector for frozen specimens based on the You-Only-Look-Once (Redmon et al., 2016) object detection architecture (Figure 2B and Figure 2—figure supplement 1B). To prevent the network from incorrectly picking dark ice contaminants, we also added a classification step to separate holes from contaminants and were able to correctly identify 89% of the holes.

Screening mode statistics

In screening mode, the average time required for exchanging specimens and acquiring a partial atlas covering at least 25% of the grid surface is 7.8 min (Figure 3, Appendix 1—table 4 and Appendix 1—table 5). The median sampling time for a specimen is 21 min, yielding a median of 9.0 high-magnification images of holes sampled from 3.0 different windows. Each day, our screening microscope thoroughly screens an average of 16 specimens and performs data collection for approximately 16 hr.

Figure 3
SmartScope’s screening mode statistics.

Screening rates with and without beam-image shift (BIS) were 1.0 and 0.7 holes per minute, respectively (RANSAC regression). The red arrow indicates the time of specimen loading and start of atlas acquisition. The dashed blue line represents the median session duration (21.6 min) and the median number of high-magnification images (9.0) obtained per specimen during screening mode.
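
For readers unfamiliar with RANSAC regression, the following minimal sketch shows how a rate (slope) can be estimated robustly from time-versus-count data that contains outliers. It is a generic illustration written for this purpose, with hypothetical parameter names and thresholds, and is not the code used to produce the figure.

```python
import random

def ransac_line(xs, ys, n_trials=200, tol=1.0, seed=0):
    """Fit y = slope * x + intercept with a minimal RANSAC loop."""
    rng = random.Random(seed)
    pts = list(zip(xs, ys))
    best_inliers = []
    for _ in range(n_trials):
        (x1, y1), (x2, y2) = rng.sample(pts, 2)
        if x1 == x2:
            continue  # a vertical pair cannot define a rate
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in pts
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    # Refit by ordinary least squares on the consensus set only.
    n = len(best_inliers)
    mx = sum(x for x, _ in best_inliers) / n
    my = sum(y for _, y in best_inliers) / n
    sxx = sum((x - mx) ** 2 for x, _ in best_inliers)
    slope = sum((x - mx) * (y - my) for x, y in best_inliers) / sxx
    return slope, my - slope * mx
```

The slope of the fitted line is the screening rate in holes per minute; points far from the consensus line, such as sessions that were paused, do not influence the estimate.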

Data collection mode statistics

SmartScope offers a convenient option to set up, track data collection and to label and annotate exposures. In data collection mode, the microscope continuously acquires areas and finds targets using operator assistance only to fine tune the selection to specific needs, significantly reducing setup time as compared to our manual workflow. The median data collection setup time, from specimen loading to the start of high-magnification acquisition, is 32 min with our K2 detector (Appendix 1—table 6). As an example, we used SmartScope to determine a 3.4 Å map of the 55 kDa homodimer accessory subunit of the human mitochondrial DNA polymerase (Young and Copeland, 2013) (EMD-25764, Figure 4, Figure 4—figure supplement 1, Figure 4—figure supplement 2). For this specimen, we collected 4327 micrographs and seamlessly tilted to 30° for the last third of the dataset to improve the angular sampling of the protein (Table 1).

Figure 4 with 2 supplements
Acquisition of POLG2 dataset using SmartScope.

(A) Atlas of the specimen (left), typical micrograph (center) with some particles picked (purple circles) and 2D classes of POLG2 (right). (B) Resulting map of POLG2 colored by local resolution (left) and example of an alpha helix with atomic model fit into the density (right). (C) Masked Fourier-shell correlation curve between half-maps showing a resolution of 3.4 Å.

Table 1
Cryo-EM data acquisition parameters and statistics.
POLG2 (EMD-25764)

Parameter                         No tilt         Tilted          Combined
Hardware
  Microscope                      Talos Arctica (Thermo Fisher)
  Detector                        K2 summit (Gatan Inc)
Data collection and processing
  Magnification                   45,000
  Voltage (kV)                    200
  Electron exposure (e-/Å2)       50
  Defocus range (µm)              1.2–1.8         1.4–1.6         1.2–1.8
  Tilt angle (°)                  0               30
  Pixel size (Å/pixel)            0.932
  Movie No.                       3029 (70%)      1282 (30%)      4311
  Symmetry imposed                C2
  Final particles                 99,189 (78%)    28,641 (22%)    127,860
  Map resolution, unmasked (Å)    3.7             4.3             3.7
  Map resolution, masked (Å)      3.5             3.9             3.4
  FSC threshold                   0.143
Data collection statistics
  Setup time* (min)               60
  Throughput (movies/hr)          117.9
  Throughput last hour (movies)   130

* Data collection times are calculated as the time needed from grid loading to the collection of the first 50 high-magnification targets minus the time required for these 50 targets to be acquired.

Discussion

SmartScope is the first package specifically designed to assist, document, and automate specimen evaluation during the process of optimizing samples for cryo-EM. The program delivers a unique user experience through a WebUI that provides live remote supervision and control of the screening process using a standard web browser. The same WebUI facilitates analysis of results at any time during and after a session. Multiple users can access the same live or stored session simultaneously and multiple instruments can be controlled from the same server. Automated navigation routines provide control of the microscope without granting full access to functions that may compromise the integrity of the instrument. SmartScope uses fast and robust AI-driven feature recognition algorithms to fully automate the cryo-EM imaging workflow. The steps of target identification, object classification, and clustering offer a powerful way to sample a wide variety of areas during screening and help determine the next steps in specimen optimization. This enables complete unsupervised execution of a screening workflow as well as supervised exploration with minimal user training.

SmartScope can also collect data in a semi-supervised or fully automated way. The areas automatically selected by SmartScope can be modified interactively or programmatically without interrupting the process of data collection. This maximizes the use of the microscope and offers the possibility of integrating feedback from in-line data processing workflows to adaptively improve image quality during acquisition, without user intervention. Unsupervised multi-specimen screening and short exploratory data collection sessions can be scheduled to run overnight, offering new ways of using the microscope.

The interface to the microscope hardware in SmartScope is currently achieved through SerialEM, which provides abstraction interfaces to the main microscope and detector manufacturers as well as being open source and well supported. However, the integration is designed so that interfaces to other software (e.g. Leginon, Digital Micrograph, SmartEPU) can also be added to SmartScope in the future.

SmartScope has a modular design where new object detection and classification algorithms can be added as plugins, allowing integration of existing object detection, and area selection programs for cryo-EM (Fan et al., 2022; Kim et al., 2021; Rheinberger et al., 2021; Schorb et al., 2019; Xu et al., 2020; Yokoyama et al., 2020; Yonekura et al., 2021). Additionally, the ability to use multiple feature detection and clustering methods at different magnification levels enables the creation of customized protocols for specific applications. This provides flexibility to optimize the selection of areas on a wider variety of targets in cryo-EM, such as virions, filaments, and cells. Finally, as we gather more data about difficult specimens and edge cases, we envision the establishment of a globally accessible ‘virtual microscopist’ server capable of improving itself through periodic re-training based on voluntarily submitted labeled datasets.

SmartScope has proven to be an extremely valuable tool in our facility. It has streamlined bookkeeping, which in turn resulted in better decision making for specimen optimization. It has also maximized microscope usage by eliminating idling time and reducing setup and screening times. SmartScope facilitates data and instrumentation access as well as collaboration by easing access to cryo-EM technologies and improving the way cryo-EM experiments are carried out. With specimen screening as a primary focus, SmartScope addresses an important limiting step in cryo-EM.

Materials and methods

Cryo-EM

All the data presented in this study were acquired on a Talos Arctica (Thermo Fisher Scientific) operating at 200 kV and equipped with a K2 direct-electron detector (Gatan Inc). SmartScope was also tested on a Ceta CMOS detector and a Titan Krios (Thermo Fisher Scientific) equipped with a K3 detector and BioQuantum energy filter (Gatan Inc). The statistics were derived from data acquired exclusively with the K2 detector. For microscope and detector control, SmartScope uses SerialEM 4.0 through the Python API library (Mastronarde, 2005).

Square finder

To localize and classify square windows, a Faster R-CNN-based framework (Girshick, 2015) that uses a ResNet50 architecture as the feature extraction backbone was adopted. It incorporates a feature pyramid network for identification of objects at different magnification levels. In addition, since most features have approximately equal width and height, the bounding boxes were constrained to have aspect ratios within the 0.8–1.2 range. To improve robustness and stability of the model, data augmentation was applied to the training data, including zoom-in/zoom-out, rotation, contrast adjustments, and flipping. The degree of augmentation for the contrast intensity was limited to the 0.8–1.2 range. To compensate for label imbalance, random oversampling was added during training. Squares are classified into six different classes: good (suitable for imaging), small (thick ice), contaminated, cracked, fractioned, and broken. The low-magnification feature detector was trained using a total of 26 atlases from both carbon and gold mesh grids acquired on Ceta (Thermo Fisher Scientific) and K2 (Gatan Inc) detectors. Each atlas contains around 50–100 squares on average. The original atlases, usually having widths and heights greater than 10,000 pixels, were downsampled to 2048 × 2048 pixels to reduce memory requirements. The framework is implemented using the Python library Detectron2 (Wu et al., 2019). Training the detector takes around 2 hr when running on an NVIDIA TITAN V GPU card with 32 GB of RAM. The pre-trained weights are then used for fast real-time square detection during screening, which can evaluate each atlas image in under a second.
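
Two of the training choices above, the aspect-ratio constraint and random oversampling, can be illustrated in a few lines of plain Python. This is a sketch only: it applies the 0.8–1.2 constraint as a filter on `(x1, y1, x2, y2)` boxes, whereas the actual framework may enforce it elsewhere (for example in anchor generation), and the function names are hypothetical.

```python
import random

def keep_squareish(boxes, lo=0.8, hi=1.2):
    """Keep (x1, y1, x2, y2) boxes whose width/height ratio lies in [lo, hi]."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if h > 0 and lo <= w / h <= hi:
            kept.append((x1, y1, x2, y2))
    return kept

def oversample(samples, seed=0):
    """Random oversampling of (item, label) pairs.

    Minority-class samples are duplicated at random until every class
    has as many samples as the largest class.
    """
    rng = random.Random(seed)
    by_label = {}
    for item, label in samples:
        by_label.setdefault(label, []).append((item, label))
    target = max(len(group) for group in by_label.values())
    out = []
    for group in by_label.values():
        out.extend(group)
        out.extend(rng.choices(group, k=target - len(group)))
    return out
```

Oversampling matters here because rare classes such as cracked squares would otherwise contribute very few examples per training epoch compared with the common classes.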

Hole finder

To identify holes in all grid types and contrast levels, a deep neural-network architecture based on the YOLOv5 model was adopted (Jocher et al., 2020; Redmon et al., 2016). We used the Cross Stage Partial Network (CSPNet) (Wang et al., 2019) as the feature extraction backbone and standard convolutional layers as detection layers. Since holes have circular shapes, the aspect ratios of the bounding boxes were also constrained. To further facilitate training, instead of using arbitrary numbers for anchor bounding box generation, clustering algorithms were applied to the ground truth boxes from the training dataset to find the most commonly occurring sizes, and these sizes were used to determine the anchor bounding box sizes. Data augmentation was applied during training, including contrast/brightness adjustment, rotation/translation, zoom-in/zoom-out, and cropping. To deal with small contamination areas that can be incorrectly detected as holes, an additional ‘contaminants’ class is used to filter out such areas. Training of the hole finder was done using 36 square images acquired on Ceta (Thermo Fisher Scientific) and K2 (Gatan Inc) detectors and took 1.5 hr when running on an NVIDIA TITAN V GPU card with 32 GB of RAM. Inference takes less than a second. For memory efficiency, each square was resized to 1280 × 1280 pixels.
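
Deriving anchor sizes from ground truth boxes can be sketched as a small k-means over `(width, height)` pairs. The example below is an assumption-laden illustration: it uses Euclidean distance and a deterministic initialization, while the clustering actually used for training is not specified in the text and may use a different distance measure.

```python
import math

def anchor_sizes(box_sizes, k=3, iters=100):
    """k-means on (width, height) pairs; returns k anchor sizes sorted by area."""
    pts = sorted(box_sizes, key=lambda wh: wh[0] * wh[1])
    # Deterministic start: evenly spaced picks from the area-sorted list.
    centres = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            j = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[j].append(p)
        new = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centres[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centres:
            break
        centres = new
    return sorted(centres, key=lambda wh: wh[0] * wh[1])
```

Because holes on a given grid type have a narrow size distribution at a fixed magnification, a handful of data-derived anchors tends to match the targets more closely than generic defaults.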

POLG2 purification

Protein was expressed and purified essentially as described (Young et al., 2015) with the following exception: Triton-X was removed from all steps following lysis. Following Ni purification, pooled protein containing His6-POLG2, as determined by SDS-PAGE, was injected onto a monoS column. Protein was eluted from the monoS column in a linear gradient from 5 to 50% Buffer B (25 mM HEPES, 1 M NaCl, 10% glycerol, 1 mM EDTA, 1 mM TCEP). Peak fractions eluted around 310 mM NaCl. Fractions were checked for purity, combined, and concentrated using an Amicon concentrator (Millipore) to 28 µM. Protein was flash frozen and stored at –80 °C.

His-tagged POLG2 (4.1 µM monomer) was incubated in a 1:1 molar ratio with FORK1 DNA as previously described (6). The oligonucleotides (1) DCRANDOM-44 (ACTTGAATGCGGCTTAGTATGCATTGTAAAACGACGGCCAGTGC), (2) TSTEM (GCACTGGCCGTCGTTTTACGGTCGTGACTGGGAAAACCCTGGCG), and (3) U25 (CGCCAGGGTTTTCCCAGTCACGACC) were all purchased from IDT. Protein in a final buffer of 20 mM HEPES pH 8, 1.5 mM Tris pH 7.5, 30 mM KCl, 50 mM imidazole pH 8, 0.3 mM EDTA, 225 mM NaCl, 1.5% glycerol, 1 mM TCEP was incubated on ice for approximately 30 min before grid application.

Cryo-EM specimen preparation

UltrAuFoil R1.2/3 (Quantifoil Micro Tools GmbH) grids were glow-discharged on both sides for 30 s at 15 mA using a Pelco Easiglow. 3 µL of the final buffer was deposited on the back of the grid and 3 µL of POLG2 sample was deposited on the front side of the grid. Excess sample was blotted for 4 s with blotting force –1 using a Vitrobot Mark IV (Thermo Fisher Scientific), with the chamber set at 12 °C and 95% humidity.

Data collection of POLG2 with SmartScope

Data was collected on a Talos Arctica (Thermo Fisher Scientific) operating at 200 kV and equipped with a Gatan K2 direct electron detector (Gatan Inc). Data collection was set up in SmartScope with a 6 × 6 tile atlas, an image-shift grouping radius of 4 µm, a minimum group size of four holes, a rolling target defocus of –1.2 to –1.8 µm, and a drift settling threshold of 1 Å/s. A total of 4311 60-frame movies were collected at 0.932 Å/pixel with a total dose of 54 e⁻/Å². Of these, 3029 movies were collected at a 0° tilt angle and 1282 movies at a 30° tilt angle. Data was collected at a rate of 120 movies per hour.
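
As a quick consistency check on the figures quoted above, the following sketch recomputes the movie total, the dose per frame, and an approximate acquisition time. It uses only numbers from this paragraph. The variable names are illustrative, and the time estimate assumes that the quoted rate of 120 movies per hour held for the whole session, which the text does not state.

```python
# Values quoted in the data collection paragraph.
movies_untilted = 3029
movies_tilted = 1282
frames_per_movie = 60
total_dose = 54.0          # electrons per square angstrom, per movie
movies_per_hour = 120      # assumed constant for this estimate

total_movies = movies_untilted + movies_tilted
dose_per_frame = total_dose / frames_per_movie
hours = total_movies / movies_per_hour
tilted_fraction = movies_tilted / total_movies

print(f"total movies: {total_movies}")
print(f"dose per frame: {dose_per_frame:.2f} e-/A^2")
print(f"approx. acquisition time: {hours:.1f} h")
print(f"tilted fraction: {tilted_fraction:.1%}")
```

The two tilt subsets sum to the reported 4311 movies, and the dose works out to 0.9 e⁻/Å² per frame.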

Cryo-EM data processing and refinement

The POLG2 dataset was processed using cryoSPARC (Punjani et al., 2017) as detailed in Figure 4—figure supplement 1C. Final maps were sharpened using DeepEMhancer (Sanchez-Garcia et al., 2021). An atomic model from PDB ID: 2G4C (Fan et al., 2006) was fit into the map using Chimera (Pettersen et al., 2004).

Data availability

Trained models used to obtain the results shown in Figures 1 and 2 are available to download from 10.5281/zenodo.6842025. The Jupyter notebook used to aggregate the statistics in Figure 3 and Appendix 1—Tables 2–5 is part of the code repository (Bouvette et al., 2022a). Cryo-EM density maps of POLG2 collected using SmartScope have been deposited in the Electron Microscopy Data Bank (EMDB) with accession code EMD-25764. Square and hole images and corresponding labels used for training the ML models are available at 10.5281/zenodo.6814642 (square finder) and 10.5281/zenodo.6814652 (hole finder).

Code availability

The source code for SmartScope is distributed under the open source BSD 3-clause license and is available at https://github.com/NIEHS/SmartScope (copy archived at swh:1:rev:9e58e2a2b278ca65156390175d393819fbb16a3b, Bouvette et al., 2022a). The AI algorithms are available as a standalone package at https://gitlab.cs.duke.edu/bartesaghilab/smartscopeAI (copy archived at swh:1:rev:43b29ae8c333a94463e0a4d9ecb97a5d5b6adf92; Bouvette et al., 2022b).

Appendix 1

SmartScope: Framework for unsupervised cryo-EM imaging

Appendix 1—table 1
Input parameters for a SmartScope session.

Parameter names and their description. This is presented as a form on the web interface. These parameters can also be updated during the imaging process.

| Parameter name | Description |
| --- | --- |
| General session parameters | |
| Session name | Name of the microscopy session |
| Group | Group name of the microscopy session. Usually the principal investigator’s name |
| Microscope | Microscope being used for the session |
| Detector | Detector being used for the session |
| Collection parameters | |
| Atlas X | Number of tiles for the Atlas acquisition in the X axis (default: 3) |
| Atlas Y | Number of tiles for the Atlas acquisition in the Y axis (default: 3) |
| Square X | Number of tiles for the window acquisition in the X axis (default: 1) |
| Square Y | Number of tiles for the window acquisition in the Y axis (default: 1) |
| Squares num | Number of windows to be selected for high-magnification imaging (default: 3) |
| Holes per square | Number of targets per window to be selected for higher-magnification imaging. If 0 is entered, all targets will be selected and data collection mode will be enabled (default: 3) |
| BIS max distance | Beam-Image shift grouping radius in microns (default: 0) |
| Min BIS group size | Smallest Beam-Image shift group size to be considered (default: 1) |
| Target defocus min | Lower end of the defocus range for rolling defocus in microns (default: –2) |
| Target defocus max | Higher end of the defocus range for rolling defocus in microns (default: –2) |
| Defocus step | Step by which the defocus is varied between each target group (default: 0) |
| Drift crit | Drift threshold to be met during the drift settling procedure before proceeding with high-magnification imaging. Use –1 to disable (default: –1) |
| Tilt angle | Tilt angle to use for high-magnification imaging. Works with BIS enabled (default: 0) |
| Save frames | Whether to save the movie frames or return the aligned sum (default: Saving enabled) |
| Zeroloss delay | Time delay in hours for zero loss peak refinement. Only useful if the microscope has an energy filter. Use –1 to deactivate (default: –1) |
| Offset targeting | Enable random targeting off-center to sample the ice gradient and carbon mesh particles. Automatically disabled in data collection mode (default: enabled) |
| Offset distance | Override the random offset by an absolute value in microns. Can be used in data collection mode. Use –1 to disable (default: disabled) |
| Autoloader (1 per grid) | |
| Name | Name of the grid |
| Position | Position in the autoloader |
| Hole type | Grid hole spacing type (e.g. R1.2/1.3) |
| Mesh size | Grid mesh size and spacing (e.g. 300) |
| Mesh material | Grid mesh material (e.g. carbon or gold) |

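
The interplay of the Target defocus min, Target defocus max, and Defocus step parameters listed above can be sketched as a small generator that yields one target defocus per target group. This is only an illustration under stated assumptions: the function name `rolling_defocus` is hypothetical, the wrap-around behavior at the end of the range is assumed rather than documented here, and the 0.2 µm step in the example is an invented value (the POLG2 session above reports only the –1.2 to –1.8 µm range).

```python
from itertools import cycle, islice


def rolling_defocus(defocus_min, defocus_max, step):
    """Yield target defocus values (in microns), one per target group.

    Assumed behavior: walk from defocus_min to defocus_max in increments of
    `step`, then start over. A step of 0 (the table default) keeps the
    defocus fixed at defocus_min.
    """
    if step == 0 or defocus_min == defocus_max:
        return cycle([defocus_min])
    # Work in whole numbers of steps to avoid floating-point drift.
    n_steps = int(round(abs(defocus_max - defocus_min) / abs(step)))
    direction = 1 if defocus_max > defocus_min else -1
    values = [round(defocus_min + direction * i * abs(step), 6)
              for i in range(n_steps + 1)]
    return cycle(values)


# Range from the POLG2 session; the 0.2 micron step is a hypothetical value.
targets = list(islice(rolling_defocus(-1.2, -1.8, 0.2), 6))
print(targets)

# Table defaults (min = max = -2, step = 0) give a constant defocus.
defaults = list(islice(rolling_defocus(-2, -2, 0), 3))
print(defaults)
```

With the defaults, every target group receives the same defocus, which is consistent with rolling defocus being effectively off until a non-zero step is set.
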
Appendix 1—table 2
Examples of SerialEM settings for SmartScope usage.

These settings serve as guidelines and will need to be adapted for different hardware combinations. Updated versions of this table can be found at github.com/NIEHS/SmartScope.

|  | Example 1 | Example 2 |
| --- | --- | --- |
| Instrument | | |
| Microscope | Talos Arctica | Titan Krios G4 |
| Detector | Gatan K2 Summit | Gatan K3 |
| Energy filter | - | Gatan BioContinuum |
| Low Dose Presets | | |
| Search | | |
| Magnification | 210 | 580 |
| Pixel size (Å/pixel) | 196 | 152 |
| Mode | Linear | Counting |
| View | | |
| Magnification | 2600 | 8700 |
| Pixel size (Å/pixel) | 16.1 | 10.1 |
| Mode | Linear | Counting |
| Focus/record | | |
| Magnification* | 36,000 | 81,000 |
| Pixel size (Å/pixel)* | 1.19 | 1.08 |
| Mode | Counting | Counting |
| Full grid montage presets | | |
| Magnification | 62 | 135 |
| Pixel size (Å/pixel) | 644 | 654 |
| Mode | Linear | Counting |

* These are the presets that are used for screening. They can be changed to suit the requirements of the specimen.

Appendix 1—table 3
Average precision obtained for each type of grid square.
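| Feature type | Small | Cracked | Dry | Contaminated | Good | Partial |
| --- | --- | --- | --- | --- | --- | --- |
| Without augmentation | 65.7% | 71.6% | 77.9% | 46.7% | 77.7% | 45.0% |
| With augmentation | 73.4% | 76.5% | 79.4% | 50.41% | 81.2% | 45.5% |
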
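
To make the effect of augmentation easier to compare across classes, the per-class average precision values in the table above can be summarized as follows. The unweighted mean over the six classes and the per-class gains are derived quantities computed here for illustration only; they are not reported in the table, and the variable names are arbitrary.

```python
# Per-class average precision (%) copied from the table above.
classes = ["Small", "Cracked", "Dry", "Contaminated", "Good", "Partial"]
ap_without = [65.7, 71.6, 77.9, 46.7, 77.7, 45.0]
ap_with = [73.4, 76.5, 79.4, 50.41, 81.2, 45.5]

# Unweighted mean over classes (a derived summary, not a reported value).
map_without = sum(ap_without) / len(ap_without)
map_with = sum(ap_with) / len(ap_with)

# Per-class gain from augmentation, in percentage points.
gains = {c: round(w - wo, 2) for c, wo, w in zip(classes, ap_without, ap_with)}

print(f"mean AP without augmentation: {map_without:.1f}%")
print(f"mean AP with augmentation:    {map_with:.1f}%")
print(gains)
```

Every class improves with augmentation, with the largest gain for small squares and the smallest for partial squares.
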
Appendix 1—table 4
SmartScope screening mode statistics.

All grids were collected using a K2 detector. The default parameters for screening mode are a 9-tile atlas, 3 squares and 3 holes per square.

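| Total specimens = 981 | min | max | mean | median | standard deviation |
| --- | --- | --- | --- | --- | --- |
| Squares sampled | 0 | 7 | 2.5 | 3.0 | 1.3 |
| Holes sampled | 0 | 33 | 8.0 | 9.0 | 5.0 |
| Time spent on specimen (min) | 3.0 | 56.3 | 21.0 | 20.9 | 8.4 |
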
Appendix 1—table 5
SmartScope screening mode statistics excluding specimens with no usable or visible squares.

All grids were collected using a K2 detector. The default parameters for screening mode are a 9-tile atlas, 3 squares and 3 holes per square.

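| Total specimens = 772 | min | max | mean | median | standard deviation |
| --- | --- | --- | --- | --- | --- |
| Squares sampled | 1 | 7 | 3.0 | 3.0 | 0.9 |
| Holes sampled | 1 | 33 | 9.4 | 9.0 | 4.0 |
| Time spent on specimen (min) | 9.1 | 56.3 | 23.0 | 21.7 | 6.7 |
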
Appendix 1—table 6
SmartScope data collection mode statistics.

Start times are at the start of specimen loading in the column. All grids were imaged using a K2 detector.

| Total specimens = 58 | min | max | median | mean ± SD |
| --- | --- | --- | --- | --- |
| Total micrographs | 565 | 8333 | 1626 | 2155 ± 1502 |
| Holes per hour | 51 | 141 | 100 | 100 ± 22 |
| Data collection setup time (min)* | 6 | 80 | 32 | 34 ± 17 |
| Setup time per 1000 micrographs (min) | 4 | 55 | 17 | 19 ± 11 |

* Data collection setup times are calculated as the time from grid loading to the collection of the first 50 high-magnification targets, minus the time required to acquire these 50 targets.
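
The setup-time definition in the footnote above can be written out as a short calculation. This is a minimal sketch under stated assumptions: the function names are hypothetical, the time needed to acquire the 50 targets is estimated here from a steady acquisition rate (the footnote does not say how that term was measured), and the example session values are invented rather than taken from the dataset.

```python
def setup_time_minutes(minutes_to_nth_target, holes_per_hour, n_targets=50):
    """Setup time following the table footnote.

    minutes_to_nth_target: elapsed minutes from grid loading until the
        n-th high-magnification target was collected.
    holes_per_hour: steady-state acquisition rate, used here (an assumption)
        to estimate how long the n targets themselves took to acquire.
    """
    acquisition_minutes = n_targets / holes_per_hour * 60.0
    return minutes_to_nth_target - acquisition_minutes


def setup_per_1000(setup_minutes, total_micrographs):
    """Normalize the setup time to 1000 collected micrographs."""
    return setup_minutes / total_micrographs * 1000.0


# Hypothetical session, not taken from the dataset.
setup = setup_time_minutes(minutes_to_nth_target=62.0, holes_per_hour=100.0)
normalized = setup_per_1000(setup, total_micrographs=2000)
print(f"setup time: {setup:.1f} min")
print(f"setup time per 1000 micrographs: {normalized:.1f} min")
```

Subtracting the acquisition time isolates the overhead of loading, atlas acquisition, and target selection, so the figure can be compared across sessions that ran at different acquisition rates.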

Data availability

SmartScope is open source software. Source code and installation instructions are available from https://github.com/NIEHS/SmartScope (copy archived at swh:1:rev:9e58e2a2b278ca65156390175d393819fbb16a3b). Cryo-EM density maps of POLG2 collected using SmartScope have been deposited in the Electron Microscopy Data Bank (EMDB) with accession code EMD-25764.

The following data sets were generated
  1. Bouvette J, Huang Q, Bartesaghi A, Borgnia MJ (2022) Zenodo. Automated systematic evaluation of cryo-EM specimens with SmartScope - Training data for square detector. https://doi.org/10.5281/zenodo.6814642
  2. Bouvette J, Huang Q, Bartesaghi A, Borgnia MJ (2022) Zenodo. Automated systematic evaluation of cryo-EM specimens with SmartScope - Training data for hole detector. https://doi.org/10.5281/zenodo.6814652
  3. Bouvette J, Huang Q, Bartesaghi A, Borgnia MJ (2022) Zenodo. Automated systematic evaluation of cryo-EM specimens with SmartScope - Trained models for square and hole detectors. https://doi.org/10.5281/zenodo.6842025
  4. Riccio AA, Bouvette J, Borgnia MJ, Copeland WC (2022) Electron Microscopy Data Bank. ID EMD-25764. human DNA polymerase gamma accessory subunit, POLG2.

References

  1. Girshick R (2015) Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2015.169
  2. Jocher GR, Stoken A, Borovec J, Claramunt ER, Changyu L, hopesala DP, NanoCode, ChristopherSTAN (2020) ultralytics/yolov5, version v3.0. Software.
  3. Passmore LA, Russo CJ (2016) Specimen Preparation for High-Resolution Cryo-EM. In: Passmore LA, editors. Methods in Enzymology, The Resolution Revolution: Recent Advances In CryoEM. Academic Press. pp. 51–86. https://doi.org/10.1016/bs.mie.2016.04.011
  4. Xu H, Timm DE, Elhabian SY (2020) Attention-Guided Quality Assessment for Automated Cryo-EM Grid Screening. In: Martel AL, Abolmaesumi P, Stoyanov D, Mateus D, Zuluaga MA, Zhou SK, Racoceanu D, Joskowicz L, editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Lecture Notes in Computer Science. Cham: Springer International Publishing. pp. 56–65. https://doi.org/10.1007/978-3-030-59722-1_6

Decision letter

  1. Sjors HW Scheres
    Reviewing Editor; MRC Laboratory of Molecular Biology, United Kingdom
  2. Kenton J Swartz
    Senior Editor; National Institute of Neurological Disorders and Stroke, National Institutes of Health, United States
  3. Sjors HW Scheres
    Reviewer; MRC Laboratory of Molecular Biology, United Kingdom
  4. Arjen J Jakobi
    Reviewer; Delft University of Technology, Netherlands

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Automated systematic evaluation of cryo-EM specimens with SmartScope" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Sjors HW Scheres as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by a Reviewing Editor and Kenton Swartz as the Senior Editor. The following individual involved in the review of your submission has agreed to reveal their identity: Arjen J Jakobi (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

All three reviewers agreed that the software described in this paper would be a highly useful addition to the field and are in support of publication.

1) A table summarising the data acquisition parameters and data collection statistics should be provided.

Strong encouragements:

1) The reviewers praise the authors for their intention to make the code available to the community and encourage them to do so as soon as possible.

2) Besides providing the source of the smartScope program on github, the authors are also encouraged to provide the labelled raw training data for the convolutional neural networks. Public repositories like Zenodo or EMPIAR could be used.

3) The 3 reviewers each make useful suggestions to improve the manuscript below. Although these are not essential for acceptance, the authors are encouraged to give this serious consideration.

Reviewer #1 (Recommendations for the authors):

All Supplementary Figures should become supplements to one of the main figures.

Reviewer #2 (Recommendations for the authors):

The paper is in my opinion very well structured with clear figures and procedure steps. In general, I really enjoyed reading it and I am willing to test it soon. Of course, a work of this type passes the real test when it starts being used by the community. I believe that SmartScope will be used as there are no other fully automated solutions for grid screening.

From the description in the text, the software appears to be quite solid and the authors have pretty much answered in the text itself any questions that were coming to my mind while reading. In short, this manuscript is very detailed, easy to follow in a logical mind flow, and takes into consideration all aspects of specimen screening and data acquisition. From my side, the article is publishable as it is.

Some more detailed comments:

Line 143: SmartFlow is bound to serialEM, will it be possible to couple it to EPU for example, which is widely used, or other software used for the collection of diffraction data? How dependent is on serialEM 4.0? Will SmartScope be maintained as serialEM develops?

Line 153: Is there a minimal magnification that has to be used for the window clustering to work properly? Can the authors include some comments/guidelines on this?

Line 175: Do I understand correctly that some images are also taken in the carbon, or at the carbon/ice interface to evaluate the distribution of particles?

Line 179: I would simplify the procedure to gain time, at least for relatively short exposure times. Probably not in all cases frames alignment and CTF estimation are necessary, to view the stability of the new microscopes equipped with autoloader. However, it is probably better to have these options while running the automatic screening e.g. overnight.

Line 221: Out of full completeness the authors might comment on how their algorithm to maximise target coverage while minimising numbers of groups compares to the same procedure in EPU. There the BIS applied can be changed. Is that possible in SmartScope?

The fact that SmartScope can perform BIS collection on tilted specimens in a correct way is great.

For the evaluation of the network organisation part, I have asked the opinion of an expert colleague, who commented:

Line 237: I don't quite understand what the authors mean with: "bundles a webserver and the main imaging workflow". I assume the webserver is for interacting with the program and the "main imaging workflow" is what runs on the worker if installed on separate computers, but it would be better to make it a bit more clear.

The Singularity container image is convenient and good that it's versatile enough to be installed on separate systems.

Line 245: Why do both the web server and the worker need to have access to a shared disk? Does the worker write and the web server only reads the results to display to the user? What kind of information is stored in the database?

248: The object store can be accessed via the Amazon S3 API. It's a nice feature for exporting data to "the cloud". Maybe the author should clarify a bit what are the full advantages that one can get from such a setup?

Supplementary 43: "communicate with each other using Socket or SSH connections" Do they mean HTTP? In the graph, there is no mention of SSH connections, only HTTP.

Reviewer #3 (Recommendations for the authors):

Points for consideration:

– The introduction describes many important concepts and limitations in the preparation of cryo-EM specimens and in screening/data collection, but only sparsely cites, and if, very general reviews on these topics. The reader may benefit if the authors would more specifically refer to the excellent primary literature on these topics.

– In addition to their YOLO-based hole finder, the authors may consider mentioning that the open and modular approach of their software allows integration with approaches such as e.g. virtual maps in Py-EM, which could extend the reach of their method to more sophisticated screening/targeting scenarios and different sample supports.

– The micrograph pre-processing routines implemented in SmartScope currently involve frame alignment and CTF estimation. The authors may consider including the possibility of image denoising workflows such as those implemented in data pre-processing pipelines (Warp, Scipion, SPHIRE, CryoSparc Live, …), which can be useful tools for rapid selection of suitable imaging areas, in particular if particles are small.

– The authors comment on the screening mode statistics from the operation of SmartScope in their facility. From their facility projects, are there statistics available on how many cases the screening procedure followed by automated, targeted data collection lead to successful structure determination, and how this compares to manual screening and subjective targeting on the same sample? This could be insightful.

– The authors mention that data collection of human mitochondrial DNA polymerase involved tilting of the specimen in a subset of the dataset to improve angular sampling. It would be useful to show angular distribution plots for the data excluding and including the tilted images to illustrate this was required and how automation by SmartScope can help detect and mitigate such problems on the fly.

– A table summarising the data acquisition parameters and data collection statistics should be provided.

– It is laudable that the authors make their software and models publicly available. It would be more useful to have the repository public open at the moment of preprint posting so as to give reviewers the possibility to also screen review code. The data availability statement contains a remark of disclosure of data upon reasonable request. It is unclear what this statement means, and which requests are considered reasonable; these data can be useful for other academic projects following related but complementary approaches, as well as comparison and benchmarking. I encourage the authors to make raw ML data available through public repositories such as Zenodo, and to deposit raw micrographs at EMPIAR.

https://doi.org/10.7554/eLife.80047.sa1

Author response

Essential revisions:

All three reviewers agreed that the software described in this paper would be a highly useful addition to the field and are in support of publication.

We appreciate the support of the reviewers.

1) A table summarising the data acquisition parameters and data collection statistics should be provided.

Thank you for pointing this out. We have added this information in the new Table 1.

Strong encouragements:

1) The reviewers praise the authors for their intention to make the code available to the community and encourage them to do so as soon as possible.

Thank you. The code will be made available immediately when the manuscript becomes live. The Code availability section now states that the code will be available and gives the corresponding GitHub URL.

2) Besides providing the source of the smartScope program on github, the authors are also encouraged to provide the labelled raw training data for the convolutional neural networks. Public repositories like Zenodo or EMPIAR could be used.

Thanks for pointing this out, this was our intention from the beginning, which we considered implicit in committing to the publication of the code. The raw training data is now available on Zenodo. The data availability section of the manuscript has been modified to refer explicitly to the training datasets as well as to the trained model.

3) The 3 reviewers each make useful suggestions to improve the manuscript below. Although these are not essential for acceptance, the authors are encouraged to give this serious consideration.

We appreciate the suggestions and we have introduced changes guided by them.

Reviewer #1 (Recommendations for the authors):

All Supplementary Figures should become supplements to one of the main figures.

We have reorganized the figures as requested.

Reviewer #2 (Recommendations for the authors):

The paper is in my opinion very well structured with clear figures and procedure steps. In general, I really enjoyed reading it and I am willing to test it soon. Of course, a work of this type passes the real test when it starts being used by the community. I believe that SmartScope will be used as there are no other fully automated solutions for grid screening.

From the description in the text, the software appears to be quite solid and the authors have pretty much answered in the text itself any questions that were coming to my mind while reading. In short, this manuscript is very detailed, easy to follow in a logical mind flow, and takes into consideration all aspects of specimen screening and data acquisition. From my side, the article is publishable as it is.

Some more detailed comments:

Line 143: SmartFlow is bound to serialEM, will it be possible to couple it to EPU for example, which is widely used, or other software used for the collection of diffraction data? How dependent is on serialEM 4.0? Will SmartScope be maintained as serialEM develops?

Historically, SerialEM has not removed features, so we expect that all the current scripting commands within SerialEM 4.0 will exist for the foreseeable future. We plan to make SmartScope a long-term solution to which more features and updates will be added.

Line 153: Is there a minimal magnification that has to be used for the window clustering to work properly? Can the authors include some comments/guidelines on this?

We added a supplementary table 2 with the magnification/pixel sizes that have been used for acquiring the squares. The hole finder was also successfully tested on the low-mag atlas but is not part of the software yet.

Line 175: Do I understand correctly that some images are also taken in the carbon, or at the carbon/ice interface to evaluate the distribution of particles?

Intermediate magnification images always include carbon areas (for carbon grids, naturally). The version that we tested for this version of the manuscript did not contemplate taking images of the interface at higher magnification. In a newer version, now available, random exploration of the hole area, including the water-carbon interface, can be achieved by specifying an “offset” parameter from the center of the hole; we have amended the manuscript to reference this parameter. Future versions may include “smarter” exploration based on the analysis of intermediate resolution pictures.

Line 179: I would simplify the procedure to gain time, at least for relatively short exposure times. Probably not in all cases frames alignment and CTF estimation are necessary, to view the stability of the new microscopes equipped with autoloader. However, it is probably better to have these options while running the automatic screening e.g. overnight.

Frame alignment is not essential and will occur only when the exposure mode is set in SerialEM to save the frames. However, since these processes occur asynchronously from acquisition, movie alignment does not affect the speed of imaging.

Line 221: Out of full completeness the authors might comment on how their algorithm to maximise target coverage while minimising numbers of groups compares to the same procedure in EPU. There the BIS applied can be changed. Is that possible in SmartScope?

SmartScope groups the targets according to a BIS radius given in the acquisition settings, which can be changed at any point during the session. A sentence was added to the manuscript to clarify this point. Interestingly, newer versions of EPU seem to have eliminated the user’s ability to modify this parameter in the GUI. The BIS (AFIS) radius in EPU can still be changed through a parameter hidden in the global configuration files, but this requires restarting the software. A comparison with the algorithm used in EPU is difficult because the code for EPU is not available.

The fact that SmartScope can perform BIS collection on tilted specimens in a correct way is great.

For the evaluation of the network organisation part, I have asked the opinion of an expert colleague, who commented:

Line 237: I don't quite understand what the authors mean with: "bundles a webserver and the main imaging workflow". I assume the webserver is for interacting with the program and the "main imaging workflow" is what runs on the worker if installed on separate computers, but it would be better to make it a bit more clear.

We have reworked this section to improve clarity.

The Singularity container image is convenient and good that it's versatile enough to be installed on separate systems.

Line 245: Why do both the web server and the worker need to have access to a shared disk? Does the worker write and the web server only reads the results to display to the user? What kind of information is stored in the database?

Only metadata about the relationship between the images and their targets is stored in the database. All the images are served directly from disk. This helps in keeping the size of the database small.

248: The object store can be accessed via the Amazon S3 API. It's a nice feature for exporting data to "the cloud". Maybe the author should clarify a bit what are the full advantages that one can get from such a setup?

This feature makes it possible to set up a mirror instance in the cloud to provide access to users who, for security reasons, may not have access to the resources behind a firewall. At this point, this instance is distinct from the system that allows navigation of the specimen in real time.

Supplementary 43: "communicate with each other using Socket or SSH connections" Do they mean HTTP? In the graph, there is no mention of SSH connections, only HTTP.

SSH has been added to the figure to match the description.

Reviewer #3 (Recommendations for the authors):

Points for consideration:

– The introduction describes many important concepts and limitations in the preparation of cryo-EM specimens and in screening/data collection, but only sparsely cites, and if, very general reviews on these topics. The reader may benefit if the authors would more specifically refer to the excellent primary literature on these topics.

Heeding the request of Reviewer #1, we have significantly shortened the introduction, removing most of the discussion on the difficulties of specimen preparation. Instead, we have introduced a couple of key references.

– In addition to their YOLO-based hole finder, the authors may consider mentioning that the open and modular approach of their software allows integration with approaches such as e.g. virtual maps in Py-EM, which could extend the reach of their method to more sophisticated screening/targeting scenarios and different sample supports.

We have added a citation to Py-EM to the paragraph in which we discuss modularity.

– The micrograph pre-processing routines implemented in SmartScope currently involve frame alignment and CTF estimation. The authors may consider including the possibility of image denoising workflows such as those implemented in data pre-processing pipelines (Warp, Scipion, SPHIRE, CryoSparc Live, …), which can be useful tools for rapid selection of suitable imaging areas, in particular if particles are small.

This is indeed a feature that we’re interested in adding to replace the current preprocessing when necessary. All the pieces are in place to build different wrappers around other packages to fetch the processing data. We didn’t focus on these yet because they didn’t seem quite as urgent for processing a few screening images. However, it becomes more interesting when thinking about data collection.

– The authors comment on the screening mode statistics from the operation of SmartScope in their facility. From their facility projects, are there statistics available on how many cases the screening procedure followed by automated, targeted data collection lead to successful structure determination, and how this compares to manual screening and subjective targeting on the same sample? This could be insightful.

We did not run a control comparison between SmartScope and manual operation. One of the main reasons is that, in the case of data collection, humans are still the ones deciding which are the squares to collect.

– The authors mention that data collection of human mitochondrial DNA polymerase involved tilting of the specimen in a subset of the dataset to improve angular sampling. It would be useful to show angular distribution plots for the data excluding and including the tilted images to illustrate this was required and how automation by SmartScope can help detect and mitigate such problems on the fly.

The intent was not to provide a solution to a problem with this particular sample but rather to demonstrate that the quality of tilted data approximates that of untilted images. As it turns out, tilting was not required for POLG2. Because it was our first time collecting data on this sample, a subset of the data was collected with tilt to ensure a better angular distribution. We added a breakdown of the tilted vs non-tilted resolutions and angular distributions in Table 1 and Figure 1—figure supplement 2. The particles from the tilted data alone reached a resolution of 3.9 Å and, although their inclusion did not improve the overall final resolution of the combined dataset, the distribution plot indicates improvements in angular sampling.

– A table summarising the data acquisition parameters and data collection statistics should be provided.

This information was added in the new Table 1.

– It is laudable that the authors make their software and models publicly available. It would be more useful to have the repository public open at the moment of preprint posting so as to give reviewers the possibility to also screen review code. The data availability statement contains a remark of disclosure of data upon reasonable request. It is unclear what this statement means, and which requests are considered reasonable; these data can be useful for other academic projects following related but complementary approaches, as well as comparison and benchmarking. I encourage the authors to make raw ML data available through public repositories such as Zenodo, and to deposit raw micrographs at EMPIAR.

We decided to go through a closed beta phase to address questions of ease of installation, documentation and testing at other facilities. Adoption of new software relies on making sure these points are addressed before release. The code will be made available via GitHub upon publication, and the training data and trained models will be made available on Zenodo. We have made changes to the data and code availability sections to reflect these changes and reported the datasets in the manuscript submission system. We agree with the reviewer that it would be a good feature for the journal to provide a means of ensuring access to private repositories by anonymous reviewers.

https://doi.org/10.7554/eLife.80047.sa2

Article and author information

Author details

  1. Jonathan Bouvette

    Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, United States
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing, Development and implementation of SmartScope
    Contributed equally with
    Qinwen Huang
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-3550-5319
  2. Qinwen Huang

    Department of Computer Science, Duke University, Durham, United States
    Contribution
    Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing, Developed and implemented the AI detection and classification for SmartScope
    Contributed equally with
    Jonathan Bouvette
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7082-5257
  3. Amanda A Riccio

    Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, United States
    Contribution
    Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing, PolG2 project including cloning and purification of the protein
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7782-0363
  4. William C Copeland

    Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, United States
    Contribution
    Resources, Supervision, Funding acquisition, Methodology, PolG2 project including conceptualization
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0359-0953
  5. Alberto Bartesaghi

    1. Department of Computer Science, Duke University, Durham, United States
    2. Department of Electrical and Computer Engineering, Duke University, Durham, United States
    3. Department of Biochemistry, Duke University School of Medicine, Durham, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing – original draft, Project administration, Writing – review and editing, Developed and implemented the AI detection and classification for SmartScope
    For correspondence
    alberto.bartesaghi@duke.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7360-1523
  6. Mario J Borgnia

    Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing – original draft, Project administration, Writing – review and editing, Development and implementation of SmartScope
    For correspondence
    mborgnia@nih.gov
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9159-1413

Funding

National Institute of Environmental Health Sciences (ZIC ES103326)

  • Mario J Borgnia

National Institute of Environmental Health Sciences (ZIA ES103341)

  • Mario J Borgnia

National Institute of Environmental Health Sciences (Z01 ES065078)

  • William C Copeland

Chan Zuckerberg Initiative (2021-234602)

  • Alberto Bartesaghi

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported in part by the Intramural Research Program of the NIH; National Institute of Environmental Health Sciences (ZIC ES103326 and ZIA ES103341 to M.J.B., and Z01 ES065078 to W.C.C.), and a Visual Proteomics Imaging grant from the Chan Zuckerberg Initiative (2021-234602 to A.B.). This work utilized cloud computational resources and services accessed through the NIH STRIDES Initiative (https://cloud.nih.gov) and computational resources offered by Duke Research Computing (http://rc.duke.edu). We thank Tracy Futhey, Charley Kneifel, Katie Kilroy, Mike Newton, Victor Orlikowski, Tom Milledge, and David Lane from the Duke Office of Information Technology and Research Computing for assistance with the computing environment. We also thank David Fargo, John Grovenstein, and Chris Stone from the NIEHS Office of Scientific Computing for allocating computational resources to this project, and especially Gregory Stamper for assistance in setting up computing environments. We also thank Dr. Joshua Strauss from the UNC cryo-EM Core and Dr. Nilakshee Bhattacharya from the Duke Shared Material Instrumentation Facility (SMIF) for testing early versions of SmartScope. Molecular graphics and analyses were performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH P41-GM103311. We thank Robin E Stanley and Bradley P Klemm for critical review of the manuscript.

Senior Editor

  1. Kenton J Swartz, National Institute of Neurological Disorders and Stroke, National Institutes of Health, United States

Reviewing Editor

  1. Sjors HW Scheres, MRC Laboratory of Molecular Biology, United Kingdom

Reviewers

  1. Sjors HW Scheres, MRC Laboratory of Molecular Biology, United Kingdom
  2. Arjen J Jakobi, Delft University of Technology, Netherlands

Publication history

  1. Preprint posted: May 6, 2022
  2. Received: May 6, 2022
  3. Accepted: July 21, 2022
  4. Version of Record published: August 23, 2022 (version 1)

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Cite this article

Jonathan Bouvette, Qinwen Huang, Amanda A Riccio, William C Copeland, Alberto Bartesaghi, Mario J Borgnia (2022)
Automated systematic evaluation of cryo-EM specimens with SmartScope
eLife 11:e80047.
https://doi.org/10.7554/eLife.80047
