Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud

  1. Michael A Cianfrocco (corresponding author)
  2. Andres E Leschziner
  1. Harvard University, United States
  2. Harvard Medical School, United States
5 figures, 1 table and 3 additional files

Figures

Figure 1
Workflow for analyzing cryo-EM data on Amazon's cloud computing infrastructure.

After collecting cryo-EM data (Step 1), particles are extracted from the micrographs and prepared for further analysis (Step 2). After logging into an ‘instance’ (Step 3), data are uploaded to a storage server (elastic block storage) (Step 4). At this point, STARcluster can be configured to launch a cluster of 2–30 instances on which the storage volume holding the data is mounted (Step 5). A detailed protocol can be found at an accompanying Google site: http://goo.gl/AIwZJz.

https://doi.org/10.7554/eLife.06664.003
Figure 2 with 1 supplement
Global availability of Amazon r3.8xlarge spot instances.

Shown is the average percentage of time during which the current spot price of the r3.8xlarge instance type was below the queried price. The data are averaged over all of Amazon's regions worldwide (except for SA-East-1, which does not offer r3.8xlarge instances). Spot instance prices were collected over a 90-day period from 1 January 2015 to 1 April 2015; the average is shown ± s.e. Source data: Figure 2—source data 1.
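The quantity plotted here can be illustrated with a minimal sketch. It assumes a time-weighted calculation over a price history in which each price holds until the next change; the function name, the sample timestamps and the sample prices are hypothetical and are not taken from the source data, and the authors' exact procedure may differ.

```python
from datetime import datetime


def fraction_of_time_below(history, bid, end):
    """Return the fraction of time the spot price was strictly below `bid`.

    `history` is a list of (timestamp, price) pairs sorted by time.
    Each price is assumed to hold until the next entry (or until `end`).
    """
    total = 0.0
    below = 0.0
    # Pair every price change with the time of the following change.
    for (start, price), (stop, _) in zip(history, history[1:] + [(end, None)]):
        span = (stop - start).total_seconds()
        total += span
        if price < bid:
            below += span
    return below / total if total else 0.0


# Hypothetical price history for one availability zone.
example_history = [
    (datetime(2015, 1, 1), 0.30),
    (datetime(2015, 1, 2), 0.50),
    (datetime(2015, 1, 4), 0.32),
]
example_end = datetime(2015, 1, 5)
print(fraction_of_time_below(example_history, 0.35, example_end))
```

Averaging this fraction over availability zones and regions would give one point of the curve for each queried price.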

https://doi.org/10.7554/eLife.06664.004
Figure 2—source data 1

Global spot instance price data from 1 January 2015 to 1 April 2015.

https://doi.org/10.7554/eLife.06664.005
Figure 2—figure supplement 1
Availability of virtual machines within regions at specified spot instance prices.

For each Amazon region (excluding SA-East-1, which does not offer r3.8xlarge instances), r3.8xlarge spot instance prices were retrieved for each availability zone; separate availability zones are shown as separate data points for a given spot instance price. (Note: each region can have a different number of availability zones.) From the spot instance prices, the percentage of time that the spot price spent below the specified price was calculated. The average value is shown as a solid black line. Source data: Figure 2—source data 1.

https://doi.org/10.7554/eLife.06664.006
Figure 3
Cryo-EM structure of 80S ribosome at an overall resolution of 4.6 Å.

(A) Overall view of 80S reconstruction filtered to 4.6 Å while applying a negative B-factor of −116 Å². (B) Gold standard FSC curve. (C) Selected regions from the 60S subunit. Cryo-EM maps were visualized with UCSF Chimera (Pettersen et al., 2004). Source data: Dryad Digital Repository dataset (http://datadryad.org/review?doi=doi:10.5061/dryad.9mb54) (Cianfrocco and Leschziner).

https://doi.org/10.7554/eLife.06664.007
Figure 4
Relion performance on STARcluster configurations of Amazon instances.

(A) Processing times (minutes) for Relion to perform 3D classification or 3D refinement on the 80S ribosome dataset. (B) Speedup for each cluster size relative to a single CPU (black line), shown alongside the performance estimate for a perfectly parallel cluster from Amdahl's Law (curve labeled ‘Theoretical limit’). For cluster sizes ≤ 64 CPUs, Relion exhibits near-perfect scaling on STARcluster configurations; for cluster sizes > 64 CPUs, Relion's performance reaches a maximum at 256 CPUs for both 3D classification and 3D refinement. (C) Speedup/Cost is plotted against cluster size, where Speedup/Cost is defined as the observed speedup divided by the cost associated with Amazon's pricing at $0.35/hr/16 CPUs. (D) Average STARcluster boot-up time (± s.d.) was measured for clusters of increasing size (n = 5). Source data: Figure 4—source data 1.
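The quantities in panels B and C can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the function names are hypothetical, the cost is assumed to be the run time multiplied by the $0.35/hr/16 CPUs rate from the caption, and any rounding of billed time to whole hours is ignored.

```python
def amdahl_speedup(n_cpus, parallel_fraction=1.0):
    """Amdahl's Law: S(N) = 1 / ((1 - p) + p / N).

    With p = 1 (a perfectly parallel job) this reduces to S(N) = N.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


def observed_speedup(minutes_single_cpu, minutes_cluster):
    """Speedup of a cluster run relative to a single-CPU run."""
    return minutes_single_cpu / minutes_cluster


def run_cost(n_cpus, minutes, price_per_hour_per_16_cpus=0.35):
    """Assumed cost model: 16-CPU units x hourly price x run time in hours."""
    units = n_cpus / 16
    return units * price_per_hour_per_16_cpus * minutes / 60.0


def speedup_per_cost(minutes_single_cpu, minutes_cluster, n_cpus):
    """Observed speedup divided by the assumed cost of the cluster run."""
    speedup = observed_speedup(minutes_single_cpu, minutes_cluster)
    return speedup / run_cost(n_cpus, minutes_cluster)


print(amdahl_speedup(64))  # perfectly parallel limit for 64 CPUs
```

Measured processing times from Figure 4—source data 1 would be passed in as `minutes_single_cpu` and `minutes_cluster`; no values from that file are reproduced here.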

https://doi.org/10.7554/eLife.06664.008
Figure 4—source data 1

Performance analysis statistics for Relion 3D classification and 3D refinement on STARcluster configurations.

https://doi.org/10.7554/eLife.06664.009
Author response image 1

Percentage of instances below the bid price over the last 90 days. Shown are the percentages of r3.8xlarge instances that are below the spot instance price across different regions and zones.

https://doi.org/10.7554/eLife.06664.015

Tables

Author response table 1

Cost of 80S ribosome 3D classification and refinement on a 128 CPU STARcluster configuration at increasing spot instance prices.

https://doi.org/10.7554/eLife.06664.016

Spot instance bid price                               $0.35    $0.45    $0.55    $0.65
80S ribosome 3D classification and refinement cost    $28.89   $37.14   $45.39   $53.64
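The table values rise by the same amount, $8.25, for each $0.10 increase in bid price, which the short check below makes explicit. One possible reading, stated here as an assumption and not as the authors' calculation, is that the total cost scales linearly with the hourly price over a fixed amount of billed time; the function name is hypothetical.

```python
# Values copied from Author response table 1.
bid_prices = [0.35, 0.45, 0.55, 0.65]
total_costs = [28.89, 37.14, 45.39, 53.64]


def cost_slopes(prices, costs):
    """Return the change in total cost per dollar of bid price
    between each pair of adjacent table columns."""
    pairs = list(zip(prices, costs))
    return [
        (c2 - c1) / (p2 - p1)
        for (p1, c1), (p2, c2) in zip(pairs, pairs[1:])
    ]


slopes = cost_slopes(bid_prices, total_costs)
# Each slope is about 82.5, i.e. $8.25 of total cost per $0.10 of bid price.
print([round(s, 2) for s in slopes])
```

If the bid price is charged per instance-hour, a slope of about 82.5 would correspond to roughly 82.5 billed instance-hours for the full classification and refinement run.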

Additional files

Supplementary file 1

Step-by-step tutorial describing how to use Amazon's EC2 environment to analyze cryo-EM data.

https://doi.org/10.7554/eLife.06664.010
Supplementary file 2

Comparison of estimated processing times and costs for recent near-atomic cryo-EM structures on Amazon's EC2. Source data: Supplementary file 3.

https://doi.org/10.7554/eLife.06664.011
Supplementary file 3

Source data for tables in Supplementary file 2.

https://doi.org/10.7554/eLife.06664.012

Cite this article

Michael A Cianfrocco, Andres E Leschziner (2015)
Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud
eLife 4:e06664.
https://doi.org/10.7554/eLife.06664