Overview of cryoRL approach.

(A) Traditional data collection involves sequentially inspecting the cryo-EM sample atlas, the squares, and the holes found within patches to obtain high-magnification micrographs. (B) Overview of the cryoRL approach. (1) Preparation: low-magnification square- and patch-level survey images are collected over an area of interest. (2) Data collection: a deep Q-network is trained with hole assessments (from the "Deep Regressor") and the grid image hierarchy as inputs. The network learns to output an action based on the current state of the grid.
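The caption describes a deep Q-network that maps the current grid state to an action. As a hedged illustration only, the sketch below assumes that an action is the choice of the next hole to visit, that the network has already produced one Q-value per candidate hole, and that training uses the standard DQN temporal-difference target. The helper names `select_action` and `td_target` and the parameters `epsilon` and `gamma` are hypothetical and do not come from the cryoRL code.

```python
import numpy as np


def select_action(q_values, visited, epsilon=0.0, rng=None):
    """Return the index of the next hole to visit (hypothetical helper).

    Assumes one Q-value per candidate hole. Holes already visited are masked
    out; with probability `epsilon` a random unvisited hole is picked instead
    of the greedy one (epsilon-greedy exploration).
    """
    if rng is None:
        rng = np.random.default_rng()
    q = np.asarray(q_values, dtype=float).copy()
    mask = np.asarray(visited, dtype=bool)
    candidates = np.flatnonzero(~mask)
    if candidates.size == 0:
        raise ValueError("no unvisited holes left")
    if rng.random() < epsilon:
        return int(rng.choice(candidates))
    q[mask] = -np.inf  # never re-select a visited hole
    return int(np.argmax(q))


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Standard DQN target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    if done:
        return float(reward)
    return float(reward) + gamma * float(np.max(next_q_values))
```

Masking visited holes with `-inf` keeps the greedy choice restricted to holes that could still yield new micrographs.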

cryoRL collects high quality cryo-EM data.

(A) Overview of the cryo-EM grid used for systematic data collection. Yellow squares indicate regions where almost every hole was collected. (B) Comparison of CTFMaxRes values for all micrographs (gray) vs. cryoRL (yellow). (C) Comparison of cryoRL, Predict-Rank, and the snowball baseline. (D) Example square showing the path of cryoRL. Holes in the cryoRL trajectory are shaded. (E) Zoom-in on the panel in (D). (F) & (G) cryoRL trajectory overlaid on ground-truth CTF values per hole.

Transferred models enable effective data collection by cryoRL.

(A) Schematic of the apoferritin dataset. (B) CTFMaxRes histogram for cryoRL vs. all images. (C) Overall performance of cryoRL, Predict-Rank, and the snowball baseline. (D) Schematic of the RNA polymerase II (RNAPII) dataset. (E) CTFMaxRes histogram for cryoRL vs. all images. (F) Overall performance of cryoRL, Predict-Rank, and the snowball baseline.

CryoRL outperforms humans.

(A) Schematic of the AldolaseAu dataset. (B) Number of micrographs with CTFMaxRes < 6 Å over the data collection time for cryoRL and human test subjects. (C) Overall performance of cryoRL, Predict-Rank, the snowball baseline, and human test subjects. (D) Statistics of the data collection behaviors of cryoRL and human test subjects. (E) Pairwise Jaccard similarity coefficients between the sets of micrographs collected by cryoRL and by human test subjects.
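Panels (B) and (E) rest on two simple quantities: a running count of micrographs below the 6 Å cutoff, and the Jaccard coefficient |A ∩ B| / |A ∪ B| between two sets of collected micrographs. A minimal sketch, assuming each trajectory is available as an ordered list of per-micrograph CTFMaxRes values (for the count) or as a collection of micrograph identifiers (for the similarity); the function names are hypothetical.

```python
from itertools import combinations


def cumulative_good(ctf_max_res, cutoff=6.0):
    """Running number of micrographs with CTFMaxRes strictly below `cutoff`,
    in the order they were collected."""
    total, counts = 0, []
    for res in ctf_max_res:
        total += int(res < cutoff)
        counts.append(total)
    return counts


def count_good(ctf_max_res, cutoff=6.0):
    """Total number of micrographs with CTFMaxRes strictly below `cutoff`."""
    return sum(1 for res in ctf_max_res if res < cutoff)


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections of micrograph IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def pairwise_jaccard(trajectories):
    """Jaccard coefficient for every pair of named trajectories,
    e.g. {"cryoRL": [...], "subject1": [...]}."""
    names = sorted(trajectories)
    return {(i, j): jaccard(trajectories[i], trajectories[j])
            for i, j in combinations(names, 2)}
```

For example, two trajectories that share two of four distinct micrographs have a Jaccard coefficient of 0.5.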

CryoRL data collection trajectories produce consistent reconstruction output.

(A) Standard workflow for the analysis of all cryoRL and human datasets from the AldolaseAu simulator data. (B) Fourier shell correlation (FSC) curves for ten cryoRL (left) or human (right) trajectories after analysis using the workflow in (A). (C) Comparison of the resolution at FSC = 0.143 between cryoRL and human final reconstructions from the trajectories. Error bars show S.D.
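The FSC@0.143 resolution in (C) is conventionally read off where an FSC curve first drops below 0.143. A minimal sketch, assuming the curve is sampled at increasing spatial frequencies in 1/Å and using linear interpolation between the two bracketing samples; this is a common convention, not necessarily the exact procedure of the processing software used in the workflow. The function name is hypothetical.

```python
import numpy as np


def fsc_resolution(freqs, fsc, threshold=0.143):
    """Resolution (Å) at which the FSC curve first falls below `threshold`.

    `freqs` are increasing spatial frequencies in 1/Å; `fsc` holds the
    correlation at each frequency.
    """
    freqs = np.asarray(freqs, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.flatnonzero(fsc < threshold)
    if below.size == 0:
        # Curve never crosses: resolution is limited by the highest sampled frequency.
        return 1.0 / freqs[-1]
    i = int(below[0])
    if i == 0:
        raise ValueError("FSC is below the threshold at the first frequency")
    f0, f1 = freqs[i - 1], freqs[i]
    c0, c1 = fsc[i - 1], fsc[i]
    # Linear interpolation of the frequency where the curve hits the threshold.
    f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
    return 1.0 / f_cross
```

The mean and standard deviation of these per-trajectory values would then give the bars and error bars compared in (C).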

Cryo-EM grids are complicated data landscapes.

(A) Overview of the cryo-EM grid used for systematic data collection. Yellow squares indicate regions where almost every hole was collected. Regions a and b are shaded and zoomed in. (B) Zoom-in on the shaded regions a and b in (A), with ground-truth CTFMaxRes values overlaid. (C) Zoom-in on the selected squares in (B) (top), with ground-truth CTFMaxRes values overlaid (bottom). (B) and (C) share the same color bar. Note that in real data collection, ground-truth CTFMaxRes values are unknown before the holes are visited.

Performance of the regressors on the test datasets.

(A) Performance of the regressor on the Aldolase dataset. The regressor was trained on a subset of the Aldolase dataset and tested on the rest of the dataset. (B) Performance of the general regressor on the Apoferritin dataset. (C) Performance of the regressor (fine-tuned from the general regressor) on the RNAPII dataset. (D) Performance of the general regressor on the AldolaseAu dataset.

Confusion matrices of the regressors on the test datasets.

(A) Confusion matrix of the regressor result on the Aldolase dataset at a cutoff of 6 Å. The regressor was trained on a subset of the Aldolase dataset and tested on the rest of the dataset. (B) Confusion matrix of the general regressor result on the Apoferritin dataset at a cutoff of 6 Å. (C) Confusion matrix of the regressor (fine-tuned from the general regressor) result on the RNAPII dataset at a cutoff of 6 Å. (D) Confusion matrix of the general regressor result on the AldolaseAu dataset at a cutoff of 6 Å.
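Each confusion matrix binarizes predicted and ground-truth CTFMaxRes at 6 Å. A minimal sketch, assuming a hole counts as "positive" (good) when its CTFMaxRes is strictly below the cutoff, and that predicted and ground-truth values are paired per hole; the function name and the row/column layout (rows = ground truth, columns = prediction) are illustrative choices, not taken from the paper.

```python
import numpy as np


def confusion_at_cutoff(true_res, pred_res, cutoff=6.0):
    """2x2 confusion matrix [[TP, FN], [FP, TN]] for "good" holes (< cutoff).

    Rows are ground truth (good, bad); columns are prediction (good, bad).
    """
    truth_good = np.asarray(true_res, dtype=float) < cutoff
    pred_good = np.asarray(pred_res, dtype=float) < cutoff
    tp = int(np.sum(truth_good & pred_good))
    fn = int(np.sum(truth_good & ~pred_good))
    fp = int(np.sum(~truth_good & pred_good))
    tn = int(np.sum(~truth_good & ~pred_good))
    return np.array([[tp, fn], [fp, tn]])
```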

Performance of each human test subject on the AldolaseAu dataset.

Detailed performance of each human test subject in the human study, compared with the performance of the baseline and cryoRL. Different grayscale colors indicate CTFMaxRes below 4 Å, between 4 Å and 5 Å, between 5 Å and 6 Å, and above 6 Å, as in the main text figures.
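The four grayscale categories can be reproduced by binning each micrograph's CTFMaxRes. A minimal sketch, assuming values exactly on a boundary go to the higher bin (so 4.0 Å falls in "between 4 and 5 Å" and 6.0 Å in "above 6 Å", consistent with the strict "< 6 Å" cutoff used elsewhere); the label strings and function names are hypothetical.

```python
import bisect

EDGES = (4.0, 5.0, 6.0)  # bin boundaries in Å
LABELS = ("below 4 Å", "between 4 and 5 Å", "between 5 and 6 Å", "above 6 Å")


def ctf_bin(res):
    """Category label for one CTFMaxRes value; boundary values go to the higher bin."""
    return LABELS[bisect.bisect_right(EDGES, res)]


def bin_counts(ctf_max_res):
    """Number of micrographs in each of the four categories (zeros included)."""
    counts = {label: 0 for label in LABELS}
    for res in ctf_max_res:
        counts[ctf_bin(res)] += 1
    return counts
```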

Sharpened 3D reconstructions of AldolaseAu from human and cryoRL trajectories.

Reconstructions from human subjects (gray) and cryoRL (orange).

Input features to the deep Q-network.

Time cost table used in this study.

Detailed information on the four experiments presented in this paper.