
CEM500K, a large-scale heterogeneous unlabeled cellular electron microscopy image dataset for deep learning

  1. Ryan Conrad
  2. Kedar Narayan (corresponding author)
  1. Center for Molecular Microscopy, Center for Cancer Research, National Cancer Institute, National Institutes of Health, United States
  2. Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, United States
Tools and Resources
Cite this article as: eLife 2021;10:e65894 doi: 10.7554/eLife.65894
11 figures, 2 tables and 6 additional files


Figure 1
Preparation of a deep learning-appropriate 2D EM image dataset rich with relevant and unique features.

(a) Percent distribution of collated experiments grouped by imaging technique: TEM, transmission electron microscopy; SEM, scanning electron microscopy. (b) Distribution of imaging plane pixel spacings in nm for volumes in the 3D corpus. (c) Percent distribution of collated experiments by organism and tissue origin. (d) Schematic of our workflow: 2D electron microscopy (EM) image stacks (top left) or 3D EM image volumes sliced into 2D cross-sections (top right) were cropped into patches of 224 × 224 pixels, comprising CEMraw. Nearly identical patches excepting a single exemplar were eliminated to generate CEMdedup. Uninformative patches were culled to form CEM500K.
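The cropping and deduplication steps in panel (d) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the caption does not say how near-identical patches were detected, so the per-pixel `average_hash`, the Hamming-distance cutoff `max_distance`, and the choice to drop partial edge tiles are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def crop_patches(image: Image, size: int = 224) -> List[Image]:
    """Split an image into non-overlapping size x size patches.

    Assumption: edge regions that do not fill a whole patch are dropped.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def average_hash(patch: Image) -> Tuple[bool, ...]:
    """Hypothetical simplified hash: one bit per pixel (brighter than the mean?)."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


def deduplicate(patches: List[Image], max_distance: int = 0) -> List[Image]:
    """Greedy dedup: keep a patch only if its hash differs from every kept
    exemplar by more than max_distance bits (Hamming distance)."""
    kept, kept_hashes = [], []
    for patch in patches:
        h = average_hash(patch)
        if all(sum(a != b for a, b in zip(h, other)) > max_distance
               for other in kept_hashes):
            kept.append(patch)
            kept_hashes.append(h)
    return kept
```

For example, `deduplicate(crop_patches(image))` would return one exemplar per group of patches whose hashes match; a larger `max_distance` treats more patches as near-duplicates.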

Figure 1—source data 1

Details of imaging technique, organism, tissue type and imaging plane pixel spacing in collated imaging experiments.

Figure 2
CEM500K pre-training improves the transferability of learned features.

(a) Example images and colored label maps from each of the six publicly available benchmark datasets, clockwise from top left: Kasthuri++, UroCell, CREMI Synaptic Clefts, Guay, Perez, and Lucchi++. The All Mitochondria benchmark is a superset of these benchmarks and is not depicted. (b) Schematic of our pre-training, transfer, and evaluation workflow. Gray blocks denote trainable models with randomly initialized parameters; the blue block denotes a model with frozen pre-trained parameters. (c) Baseline Intersection-over-Union (IoU) scores for each benchmark achieved by skipping MoCoV2 pre-training. Randomly initialized parameters in ResNet50 layers were transferred directly to UNet-ResNet50 and frozen during training. (d) Measured percent difference in IoU scores between models pre-trained on CEMraw vs. CEM500K (red) and on CEMdedup vs. CEM500K (blue). (e) Measured percent difference in IoU scores of a model pre-trained on CEM500K over a model pre-trained on the mouse brain (Bloss) dataset. Benchmark datasets composed exclusively of electron microscopy (EM) images of mouse brain tissue are highlighted.
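For readers unfamiliar with the metric, IoU for a single binary class can be computed as below. This is a generic sketch, not the evaluation code used in the article; the function name `iou`, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection-over-Union of two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# One pixel is shared and three pixels are labeled in at least one mask.
score = iou([1, 1, 0, 0], [1, 0, 1, 0])
```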

Figure 2—source data 1

IoU scores achieved with different datasets used for pre-training.

Figure 3
Features learned from CEM500K pre-training are more robust to image transformations and encode for semantically meaningful objects with greater selectivity.

(a) Mean firing rates calculated between feature vectors of images distorted by (i) rotation, (ii) Gaussian blur, (iii) Gaussian noise, (iv) brightness, (v) contrast, (vi) scale. Dashed black lines show the range of augmentations used for CEM500K + MoCoV2 during pre-training. For transforms in the top row, the undistorted images occur at x = 0; bottom row, at x = 1. (b) Evaluation of features corresponding to ER (left), mitochondria (middle), and nucleus (right). For each organelle, the panels show: input image and ground truth label map (top row), heatmap of CEM500K-moco activations of the 32 filters most correlated with the organelle and CEM500K-moco binary mask created by thresholding the mean response at 0.3 (middle row), IN-moco activations and IN-moco binary mask (bottom row). Also included are point-biserial correlation coefficient (rpb) values and Intersection-over-Union scores (IoUs) for each response and segmentation. All feature responses are rescaled to range [0, 1]. (c) Heatmap of occlusion analysis showing the region in each occluded image most important for forming a match with a corresponding reference image. All magnitudes are rescaled to range [0, 1].
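The rescaling, thresholding, and correlation steps named in panel (b) can be illustrated with a small sketch. It assumes the mean filter response has already been flattened into a list of per-pixel values; the helper names are hypothetical, and min-max scaling is an assumed reading of "rescaled to range [0, 1]". The point-biserial coefficient uses the standard textbook formula.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def rescale(values: Sequence[float]) -> List[float]:
    """Min-max rescale to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def threshold_mask(response: Sequence[float], threshold: float = 0.3) -> List[int]:
    """Binary mask: 1 where the rescaled response exceeds the threshold."""
    return [1 if r > threshold else 0 for r in response]


def point_biserial(response: Sequence[float], labels: Sequence[int]) -> float:
    """r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), with s_n the population
    standard deviation of the response over all pixels."""
    inside = [r for r, l in zip(response, labels) if l]
    outside = [r for r, l in zip(response, labels) if not l]
    n1, n0, n = len(inside), len(outside), len(response)
    spread = pstdev(response)
    if n1 == 0 or n0 == 0 or spread == 0:
        return 0.0  # Assumption: undefined cases are reported as zero.
    return (mean(inside) - mean(outside)) / spread * math.sqrt(n1 * n0 / n ** 2)


response = rescale([0.0, 2.0, 8.0, 10.0])   # toy mean filter response
labels = [0, 0, 1, 1]                        # toy ground truth label map
mask = threshold_mask(response)              # thresholded at 0.3
r_pb = point_biserial(response, labels)
```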

Figure 4
Models pre-trained on CEM500K yield superior segmentation quality and training speed on all segmentation benchmarks.

(a) Plot of percent difference in segmentation performance between pre-trained models and a randomly initialized model. (b) Example segmentations on the UroCell benchmark in 3D (top) and 2D (bottom). The black arrows show the location of the same mitochondrion in 2D and in 3D. (c) Example segmentations from all 2D-only benchmark datasets. The red arrow marks a false negative in ground truth segmentation detected by the CEM500K-moco pre-trained model. (d) Top: average IoU scores as a percent of the average IoU after 10,000 training iterations. Bottom: absolute average IoU scores over a range of training iteration lengths.

Figure 4—source data 1

IoU scores for different pre-training protocols.

Figure 4—source data 2

IoU scores for different training iterations by pre-training protocol.

Appendix 1—figure 1
Deduplication and image filtering.

(a) Breakdown of fractions (top) and representative examples (bottom) of patches labeled ‘uninformative’ by a trained deep learning (DL) model based on defect (as determined by a human annotator). (b) Receiver operating characteristic curve for the DL model classifier and a Random Forest classifier evaluated on a holdout test set of 2000 manually labeled patches (1000 informative and 1000 uninformative).
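As a reminder of what the curve in panel (b) plots, the sketch below builds receiver operating characteristic points and the area under them from classifier scores. It is a generic illustration with made-up toy scores, not the evaluation code or data behind the figure; treating label 1 as the positive class is an assumption.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) at each distinct score threshold."""
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores: both positive patches outrank both negative patches.
perfect = auc(roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```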

Appendix 1—figure 2
Randomly selected images from CEMraw, CEMdedup, and CEM500K.
Appendix 1—figure 3
Schematics of the MoCoV2 algorithm and UNet-ResNet50 model architecture.

(a) Shows a single step in the MoCoV2 algorithm. A batch of images is copied; images in each copy of the batch are independently and randomly transformed and then shuffled into a random order (the first batch is called the query and the second is called the key). Query and key are encoded by two different models, the encoder and momentum encoder, respectively. The encoded key is appended to the queue. Dot products of every image in the query with every image in the queue measure similarity. The similarity between an image in the query and its match from the key is the signal that informs parameter updates. More details in He et al., 2019. (b) Detailed schematic of the UNet-ResNet50 architecture.
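The step described in (a) can be sketched in plain Python for a single query vector. This is a simplified illustration rather than the authors' implementation: the cross-entropy (InfoNCE-style) form of the loss follows the general MoCo formulation, and the `temperature`, `momentum`, and `max_size` values are illustrative assumptions not taken from this caption. The matching key is scored separately from the older keys in the queue, which is equivalent to appending it and identifying its position.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(query, key, queue, temperature: float = 0.2) -> float:
    """Loss for one query: its matching key is the positive and every
    vector in the queue is a negative. Lower loss means the query is
    more similar to its key than to the queue entries."""
    logits = [dot(query, key) / temperature]
    logits += [dot(query, negative) / temperature for negative in queue]
    peak = max(logits)  # subtract the peak for numerical stability
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


def momentum_update(key_params, query_params, momentum: float = 0.999) -> List[float]:
    """Momentum encoder parameters trail the encoder parameters slowly."""
    return [momentum * k + (1 - momentum) * q
            for k, q in zip(key_params, query_params)]


def enqueue(queue: list, keys: list, max_size: int) -> list:
    """Append newly encoded keys and discard the oldest beyond max_size."""
    return (queue + keys)[-max_size:]


matched = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
mismatched = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Here `matched` is small because the query agrees with its key, while `mismatched` is large because a queue entry is more similar to the query than the key is.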

Appendix 1—figure 4
Randomly selected images from the Bloss et al., 2018 pre-training dataset.
Appendix 1—figure 5
Visual comparison of results on the UroCell benchmark.

The ground truth and Authors’ Best Results are taken from the original UroCell publication (Žerovnik Mekuč et al., 2020). The results from the CEM500K-moco pre-trained model have been colorized to approximately match the originals; 2D label maps were not included in the UroCell paper.

Appendix 1—figure 6
Images from source electron microscopy (EM) volumes are unequally represented in the subsets of CEM.

The line at 45° shows the expected curve for perfect equality between all source volumes (i.e. each volume would contribute the same number of images to CEMraw, CEMdedup, or CEM500K). Gini coefficients measure the area between the Lorenz Curves and the line of perfect equality, with 0 meaning perfect equality and 1 meaning perfect inequality. For each subset of cellular electron microscopy (CEM), approximately 20% of the source 3D volumes account for 80% of all the 2D patches.
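The Lorenz curve and Gini coefficient described here can be computed from per-volume patch counts as in the sketch below. The counts are made-up toy numbers, not values from the dataset. The coefficient is taken as twice the area between the equality line and the Lorenz curve (the usual normalization, which gives the 0 to 1 range the caption describes), and `top_share` is a hypothetical helper for the "20% of volumes, 80% of patches" style of summary.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(counts: Sequence[float]) -> List[Tuple[float, float]]:
    """Cumulative share of patches vs. cumulative share of volumes,
    with volumes ordered from smallest to largest contribution."""
    ordered = sorted(counts)
    total = sum(ordered)
    points, running = [(0.0, 0.0)], 0.0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / len(ordered), running / total))
    return points


def gini(counts: Sequence[float]) -> float:
    """Twice the area between the 45-degree line and the Lorenz curve."""
    points = lorenz_curve(counts)
    area_under = sum((x1 - x0) * (y0 + y1) / 2
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area_under


def top_share(counts: Sequence[float], fraction: float = 0.2) -> float:
    """Share of all patches contributed by the largest `fraction` of volumes."""
    ordered = sorted(counts, reverse=True)
    k = max(1, round(fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


toy_counts = [80, 5, 5, 5, 5]  # hypothetical patches per source volume
```

With a finite number of volumes n, the largest value this estimate can reach is 1 - 1/n, when a single volume contributes every patch.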

Appendix 1—figure 7
Plot showing the percent of random crops from an image that will be 100% uninformative based on the percent of the image that is informative.


Table 1
Comparison of segmentation Intersection-over-Union (IoU) results on benchmark datasets for randomly initialized models and for models pre-trained with MoCoV2 on the Bloss dataset, CEMraw, CEMdedup, and CEM500K.

* denotes benchmarks that exclusively contain electron microscopy (EM) images from mouse brain tissue. The best result for each benchmark is highlighted in bold and underlined.

Benchmark               Random Init. (No Pre-training)   Bloss et al., 2018   CEMraw   CEMdedup   CEM500K
All Mitochondria        0.306                            0.694                0.719    0.722      0.745
CREMI Synaptic Clefts   0.000                            0.242                0.254    0.259      0.265
*Average Mouse Brain    0.730                            0.893                0.883    0.890      0.893
Average Other           0.216                            0.489                0.499    0.518      0.536
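To make the size of these gaps concrete, a relative comparison can be computed from the All Mitochondria row. The formula below, 100 × (score − reference) / reference, is an assumed definition of "percent difference"; the captions do not spell out the exact formula or its direction.

```python
def percent_difference(score: float, reference: float) -> float:
    """Relative change of `score` with respect to `reference`, in percent."""
    return 100.0 * (score - reference) / reference


# IoU values copied from the All Mitochondria row of Table 1.
all_mitochondria = {
    "Random Init.": 0.306,
    "Bloss": 0.694,
    "CEMraw": 0.719,
    "CEMdedup": 0.722,
    "CEM500K": 0.745,
}

gain_over_raw = percent_difference(all_mitochondria["CEM500K"], all_mitochondria["CEMraw"])
gain_over_random = percent_difference(all_mitochondria["CEM500K"], all_mitochondria["Random Init."])
```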
Table 2
Comparison of segmentation IoU scores for different weight initialization methods versus the best results on each benchmark as reported in the publication presenting the segmentation task.

All IoU scores are the average of five independent runs. References listed after the benchmark names indicate the sources for Reported IoU scores.

Benchmark                          Training Iterations   Random Init.   IN-super   IN-moco   CEM500K-moco   Reported
All Mitochondria                   10000                 0.587          0.653      0.653     0.770
CREMI Synaptic Clefts              5000                  0.000          0.196      0.226     0.254
Guay (Guay et al., 2020)           1000                  0.308          0.275      0.300     0.429          0.417
Kasthuri++ (Casser et al., 2018)   10000                 0.905          0.908      0.911     0.915          0.845
Lucchi++ (Casser et al., 2018)     10000                 0.894          0.865      0.892     0.895          0.888
Perez (Perez et al., 2014)         2500                  0.672          0.886      0.883     0.901          0.821

Additional files

Source data 1

Details of image datasets acquired from external sources.

Source data 2

Zipped folder containing .xl files for Figures 1, 2 and 4 source data.

Supplementary file 1

Details of benchmarks used in this paper.

Supplementary file 2

IoU scores for pre-training with CEM500K after removing benchmark data from pre-training dataset.

Supplementary file 3

IoU scores on Guay benchmark using different hyperparameter choices.

Transparent reporting form
