Enabling X-ray free electron laser crystallography for challenging biological systems from a limited number of crystals

  1. Monarin Uervirojnangkoorn
  2. Oliver B Zeldin
  3. Artem Y Lyubimov
  4. Johan Hattne
  5. Aaron S Brewster
  6. Nicholas K Sauter
  7. Axel T Brunger  Is a corresponding author
  8. William I Weis  Is a corresponding author
  1. Stanford University, United States
  2. Janelia Research Campus, United States
  3. Lawrence Berkeley National Laboratory, United States
  4. Howard Hughes Medical Institute, Stanford University, United States
14 figures and 4 tables

Figures

Geometry of the diffraction experiment and calculation of the Ewald-offset distance, rh.

(A) A reciprocal lattice point intersects the Ewald sphere. The inset shows the coordinate system used in cctbx.xfel and prime. The vector S0 represents the direction of the incident beam (–z-axis) and forms the radius of the Ewald sphere of length 1/λ. The reciprocal lattice point i is expressed in reciprocal lab coordinates using Equation 5 as represented by the vector xi. The Ewald-offset distance, rh, is the difference between the distance from the Ewald-sphere center to the reciprocal lattice point (length of Si) and 1/λ. The inset shows the definition of the crystal rotation axes; they are applied in the following order: θz, θy, θx. (B) Shown is the volume of a reciprocal lattice point with radius rs. The offset rh defines the Ewald-offset correction Eocarea, which is the ratio between the area intersecting the Ewald sphere, Ap, and the area at the center of the volume, As.
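The definition of rh in panel (A) can be sketched in a few lines. This is a minimal illustration, assuming S0 = (0, 0, -1/λ) and Si = xi + S0 as described in the caption; the function name and the example coordinates are hypothetical and are not taken from cctbx.xfel or prime.

```python
import math

def ewald_offset(x_i, wavelength):
    """Return r_h = |S_i| - 1/wavelength for a reciprocal lattice point.

    x_i        : (x, y, z) in reciprocal lab coordinates, in 1/Angstrom
    wavelength : X-ray wavelength in Angstrom
    """
    inv_lambda = 1.0 / wavelength
    s0 = (0.0, 0.0, -inv_lambda)                  # incident beam along -z
    s_i = tuple(a + b for a, b in zip(x_i, s0))   # Ewald-sphere center -> lattice point
    return math.sqrt(sum(c * c for c in s_i)) - inv_lambda

# A point lying exactly on the sphere has zero offset.
on_sphere = ewald_offset((1.0, 0.0, 1.0), wavelength=1.0)
# Moving the same point radially outward gives a positive offset.
outside = ewald_offset((1.002, 0.0, 1.0), wavelength=1.0)
```

A positive value means the lattice point lies outside the Ewald sphere, and a negative value means it lies inside.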

https://doi.org/10.7554/eLife.05421.003
Post-refinement protocol.

The flowchart illustrates the iterative post-refinement protocol, broken up into ‘microcycles’ that refine groups of parameters iteratively (blue boxes), and ‘macrocycles’. At the beginning of the first macrocycle, a reference diffraction data set is generated. At the end of each macrocycle, the reference diffraction data set is updated. Both the micro- and macrocycles terminate either when the refinement converges or when a user-specified maximum number of cycles is reached.
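The control flow of the protocol can be sketched as two nested loops. This is only a schematic under stated assumptions: `merge` and `refine_group` are hypothetical callables, the reference is reduced to a single number so that the example runs, and the convergence tests and default cycle limits are illustrative choices rather than the criteria used in prime. The order of the parameter groups follows the myoglobin example (SF, CO, RR, UC).

```python
GROUPS = ("SF", "CO", "RR", "UC")  # scale factors, orientation, reflection radius, unit cell

def post_refine(frames, merge, refine_group, max_macro=5, max_micro=3, tol=1e-6):
    """Hypothetical driver for the macrocycle/microcycle scheme.

    merge(frames)                  -> reference (a float in this toy version)
    refine_group(frame, g, ref)    -> target value after refining group g in place
    """
    reference = merge(frames)                  # initial reference data set
    cycles_run = 0
    for _ in range(max_macro):
        cycles_run += 1
        for frame in frames:
            previous = None
            for _ in range(max_micro):         # microcycles over parameter groups
                for group in GROUPS:
                    target = refine_group(frame, group, reference)
                if previous is not None and abs(previous - target) < tol:
                    break                      # microcycles converged for this frame
                previous = target
        updated = merge(frames)                # update reference at end of macrocycle
        converged = abs(updated - reference) < tol
        reference = updated
        if converged:
            break
    return reference, cycles_run

# Toy stand-ins: each frame carries one scale value; the reference is their mean.
def toy_merge(frames):
    return sum(f["scale"] for f in frames) / len(frames)

def toy_refine(frame, group, reference):
    if group == "SF":                          # only the scale is adjusted in this toy
        frame["scale"] += 0.5 * (reference - frame["scale"])
    return (frame["scale"] - reference) ** 2

frames = [{"scale": 0.5}, {"scale": 1.0}, {"scale": 1.5}]
reference, n_macro = post_refine(frames, toy_merge, toy_refine)
```

In a real implementation the reference would be a merged set of intensities and each group would be refined by least-squares minimization; the toy functions only exercise the loop structure.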

https://doi.org/10.7554/eLife.05421.004
Post-refinement during the first macrocycle of post-refinement for myoglobin.

Shown are the values of the refined parameters and target functions during the first macrocycle of post-refinement for a representative diffraction image of the myoglobin XFEL diffraction data set. The iterative post-refinement included SF (scale factors), CO (crystal orientation), RR (reflection radius parameters), and UC (unit-cell dimensions) for three microcycles.

https://doi.org/10.7554/eLife.05421.009
Convergence of post-refinement after five macrocycles for myoglobin.

The plots illustrate the convergence of post-refined parameters, target functions, and quality indicators during post-refinement over five macrocycles. A subset of 100 randomly selected diffraction images from the myoglobin XFEL diffraction data was used. For each specified target function and refined parameter, changes are plotted relative to the previous macrocycle, whereas the quality metric CC1/2 is shown as absolute numbers. The changes in post-refined parameters and target functions are shown as ‘box plots’. The bottom and top of the blue box are the first (Q1) and third (Q3) quartiles. The red line inside the box is the second quartile (Q2; median). The black whiskers extending vertically from the box indicate the range of the quantity within 1.5 times the interquartile range (Q3–Q1). The plus signs indicate any values beyond this range.
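The box-plot quantities described above can be computed as follows. This is a small sketch, assuming linear interpolation for the quartiles and whiskers that stop at the most extreme data value inside the 1.5 × IQR fences; the caption does not state which conventions the plotting software used, and the sample data are invented.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outliers for a 1.5 x IQR box plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),    # lowest value within the fences
        "whisker_high": max(inside),   # highest value within the fences
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 100])
```

For this sample the value 100 falls outside the upper fence and would be drawn as a plus sign.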

https://doi.org/10.7554/eLife.05421.010
Merging statistics for myoglobin.

(A) Percent completeness and (B) average number of observations plotted as a function of resolution for the myoglobin XFEL diffraction data set consisting of all 757 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. (C) CC1/2 for the averaged merged, mean-intensity scaled with partiality correction, and post-refined myoglobin diffraction data sets consisting of 100 and 757 diffraction images.

https://doi.org/10.7554/eLife.05421.011
Impact of post-refinement and number of images on electron density and model quality for myoglobin.

(A) Difference Fourier (mFo-DFc) omit maps around the heme group (which was omitted from molecular replacement and atomic model refinement) for the averaged merged, the mean-scaled partiality-corrected merged, and the post-refined myoglobin XFEL diffraction data sets consisting of all 757 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. The maps are contoured at 2.5 σ. (B) A plot of crystallographic Rwork and Rfree values vs resolution after atomic model refinement using the specified myoglobin diffraction data sets with inclusion of the heme group, SO4, and water molecules.

https://doi.org/10.7554/eLife.05421.012
Quality of synchrotron vs. post-refined XFEL diffraction data sets for myoglobin.

Difference Fourier (mFo-DFc) omit maps at 1.35 Å around the heme group (which was omitted from molecular replacement and model refinement), generated from (A) the synchrotron diffraction data and corresponding model with PDB ID 1JW8 (for comparison, all reflections past 1.35 Å resolution were excluded) and (B) the post-refined myoglobin XFEL diffraction data set using all 757 diffraction images (Table 1). The maps are contoured at 2.5 σ.

https://doi.org/10.7554/eLife.05421.013
Impact of post-refinement on the hydrogenase diffraction data set.

(A) Difference Fourier (mFo-DFc) omit maps of one of the four Fe-S clusters (which were omitted in molecular replacement and atomic model refinement) for the averaged merged and the post-refined hydrogenase XFEL diffraction data sets consisting of all 177 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. The maps are contoured at 3 σ. (B) Crystallographic R and Rfree values vs resolution after atomic model refinement using the specified diffraction data sets with inclusion of the three Fe-S clusters and water molecules.

https://doi.org/10.7554/eLife.05421.014
Impact of post-refinement on the anomalous signal in the thermolysin diffraction dataset.

Anomalous difference Fourier maps for the averaged merged (A, C) or the post-refined (B, D) thermolysin XFEL diffraction data sets consisting of all 12,692 diffraction images (A, B; Table 1) and a randomly selected subset of 2000 diffraction images (C, D). The anomalous difference Fourier maps were computed using phases from the thermolysin atomic model (but excluding zinc and calcium ions), refined separately against each diffraction data set. All maps are contoured at 3 σ; the peak heights for the two zinc ions are indicated.

https://doi.org/10.7554/eLife.05421.015
Impact of post-refinement on the quality of electron density maps and models of thermolysin.

(A) Difference Fourier (mFo-DFc) maps revealing a Leu–Lys dipeptide near the zinc site for the averaged merged and the post-refined thermolysin XFEL diffraction data sets consisting of all 12,692 diffraction images (Table 1) and a randomly selected subset of 2000 diffraction images, respectively. The maps are contoured at 3 σ. (B) Crystallographic R and Rfree values vs resolution after atomic model refinement using the specified diffraction data sets, with inclusion of the two zinc ions, the calcium ions, and the Leu–Lys dipeptide.

https://doi.org/10.7554/eLife.05421.016
Convergence of structure refinements for the post-refined thermolysin XFEL data set at 2.6 Å resolution, using increasing numbers of diffraction images.

(A) Average number of observations per unique hkl. (B) CC1/2 for merged subsets using 2000–12,000 images (100% completeness for all subsets). (C) Peak height (σ) in the omit map for the largest peak. (D) Rwork and Rfree after refining the thermolysin model without zinc and calcium ions against the corresponding post-refined diffraction data sets.

https://doi.org/10.7554/eLife.05421.017
Distribution of the Ewald sphere offset rh.

The histogram shows the distribution of rh calculated after post-refinement for myoglobin using 757 diffraction images. The number of observations after applying the reflection selection criteria for merging and outlier rejections for this 1.35 Å data set is 1,136,447 (∼96% of the total observed reflections). The standard deviation is 0.0016 1/Å or approximately 0.12° (when calculated with the mean of the energy distribution).
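One way to relate the quoted standard deviation in Å−1 to an angle is to divide the offset by the Ewald-sphere radius 1/λ and read the ratio as a small angle in radians. The sketch below follows that reading. The wavelength of 1.3 Å is an assumption chosen because it reproduces the quoted value of about 0.12°; the mean wavelength is not stated in this section, and the conversion actually used by the authors may differ.

```python
import math

def offset_to_degrees(offset_inv_angstrom, wavelength_angstrom):
    """Small-angle conversion: angle (rad) ~ offset / (1 / wavelength)."""
    return math.degrees(offset_inv_angstrom * wavelength_angstrom)

ASSUMED_WAVELENGTH = 1.3  # Angstrom; assumed, not given in this section
sigma_deg = offset_to_degrees(0.0016, ASSUMED_WAVELENGTH)

# The caption's counts also imply the approximate total number of observed reflections.
approx_total_observed = 1_136_447 / 0.96
```

Under this assumption the total number of observed reflections comes out at roughly 1.18 million, consistent with the stated fraction of about 96%.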

https://doi.org/10.7554/eLife.05421.018
The Ewald-offset correction function.

(A) Ewald-offset correction Eoc (Equation 14) viewed as a function of the reciprocal-lattice radius (rs) and the offset distance (rh). (B) A slice through Eoc at rs = 0.003, comparing Eoc (Equation 14) and Eocarea (Equation 11).
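The area-based correction in panel (B) can be sketched from the geometric definition Eocarea = Ap/As given with the diffraction geometry figure. The formula below assumes that the Ewald sphere cuts the spherical reciprocal-lattice volume as a flat plane at distance rh from its center, so that Ap = π(rs² − rh²) and As = πrs². The function name and sample offsets are illustrative, and Equation 14 is not sketched because it is not reproduced in this section.

```python
def eoc_area(r_h, r_s):
    """Area ratio A_p / A_s for a sphere of radius r_s cut at offset r_h.

    Returns 0.0 when the cut misses the sphere (|r_h| >= r_s).
    """
    if abs(r_h) >= r_s:
        return 0.0
    return 1.0 - (r_h / r_s) ** 2

R_S = 0.003  # reciprocal-lattice-point radius used for the slice in panel (B)
slice_values = [(r_h, eoc_area(r_h, R_S)) for r_h in (0.0, 0.0015, 0.003)]
```

The ratio is 1 when the reflection is centered on the Ewald sphere and falls to 0 as the offset approaches rs.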

https://doi.org/10.7554/eLife.05421.019
Geometry of the incident and diffracted beam for polarization correction.

The diagram shows a reflection on a plane formed by its reciprocal-space vector and the -z-axis at angle ϕ. This reflection is affected by the polarization of the incoming primary beam in both the horizontal (x) and vertical (y) directions.

https://doi.org/10.7554/eLife.05421.020

Tables

Table 1

XFEL diffraction data sets used in this study

https://doi.org/10.7554/eLife.05421.005
| | Myoglobin | Clostridium pasteurianum hydrogenase | Thermolysin |
| --- | --- | --- | --- |
| Space group | P6 | P42212 | P6122 |
| Resolution used (Å) | 20.0–1.35 | 45.0–1.60 | 50.0–2.10 |
| Unit cell dimensions (Å) | a = b = 90.8, c = 45.6 | a = b = 111.2, c = 103.8 | a = b = 92.7, c = 130.5 |
| No. of unique reflections | 46,555 | 85,273 | 19,995 |
| No. of images* indexed | 757 | 177 | 12,692 |
| No. of images with spots to resolution used | 307 | 75 | 1957 |
| Average no. of spots on an image (to resolution used) | 1628 | 3640 | 352 |
| Energy spectrum | SASE | SASE | SASE |
| Detector | Rayonix MX325HE | Rayonix MX325HE | CSPAD |
| Sample delivery method | Fixed target | Fixed target | Electrospun jet |
  1. *: This is the number of images indexed using the cctbx.xfel program; in the case of thermolysin, it is the number of images indexed for one of the two wavelengths.

  2. SASE: self-amplified spontaneous emission.

  3. CSPAD: Cornell-SLAC pixel array detector.

Table 2

Statistics of post-refinement and atomic model refinement for myoglobin

https://doi.org/10.7554/eLife.05421.006
| | 100 images | 757 images |
| --- | --- | --- |
| Resolution^a (Å) | 20.0–1.35 (1.40–1.35) | 20.0–1.35 (1.40–1.35) |
| Completeness^a (%) | 80.0 (22.2) | 97.7 (79.8) |
| Average no. observations per unique hkl^a | 4.0 (1.2) | 25.7 (2.0) |

| | 100: Averaged-merged | 100: Mean-scaled partiality-corrected | 100: Post-refined | 757: Averaged-merged | 757: Mean-scaled partiality-corrected | 757: Post-refined |
| --- | --- | --- | --- | --- | --- | --- |
| Post-refinement parameters^b | | | | | | |
| Linear scale factor G0 | 1.00 (0.00) | 2.79 (5.02) | 1.00 (1.04) | 1.00 (0.00) | 2.19 (3.83) | 0.89 (1.07) |
| B | 0.0 (0.0) | 0.0 (0.0) | 3.2 (7.8) | 0.0 (0.0) | 0.0 (0.0) | 6.2 (8.3) |
| γ0 (Å−1) | NA | 0.00135 (0.00028) | 0.00128 (0.00022) | NA | 0.00147 (0.00042) | 0.00132 (0.00034) |
| γy (Å−1) | NA | 0.00 (0.00) | 0.00007 (0.00080) | NA | 0.00 (0.00) | 0.00007 (0.00009) |
| γx (Å−1) | NA | 0.00 (0.00) | 0.00010 (0.00011) | NA | 0.00 (0.00) | 0.00008 (0.00010) |
| γe (Å−1) | NA | 0.00200 (0.00) | 0.00344 (0.00266) | NA | 0.00200 (0.00) | 0.00423 (0.00323) |
| Unit cell a (Å) | 90.4 (0.4) | 90.4 (0.4) | 90.5 (0.4) | 90.4 (0.4) | 90.4 (0.4) | 90.5 (0.3) |
| Unit cell c (Å) | 45.3 (0.4) | 45.3 (0.4) | 45.3 (0.3) | 45.3 (0.3) | 45.3 (0.3) | 45.3 (0.3) |
| Average Tpr, start/end | NA | NA | 19.39 (7.68)/7.17 (3.38) | NA | NA | 19.83 (7.54)/6.02 (2.59) |
| Average Txy (mm2), start/end | NA | NA | 169.74 (132.56)/132.02 (104.08) | NA | NA | 170.66 (144.52)/133.42 (109.58) |
| CC1/2 (%) | 81.3 | 79.6 | 86.5 | 91.8 | 95.7 | 98.2 |
| Molecular replacement scores^c | | | | | | |
| LLG | 2837 | 5043 | 5291 | 8264 | 8364 | 9320 |
| TFZ | 10.5 | 13.0 | 13.4 | 13.7 | 13.8 | 14.0 |
| Structure-refinement parameters | | | | | | |
| R (%) | 39.4 | 28.0 | 23.5 | 21.1 | 20.3 | 17.8 |
| Rfree (%) | 42.1 | 29.4 | 24.8 | 23.1 | 22.5 | 19.7 |
| Bond r.m.s.d. | 0.006 | 0.006 | 0.004 | 0.006 | 0.006 | 0.006 |
| Angle r.m.s.d. | 1.14 | 0.98 | 0.79 | 1.03 | 1.35 | 0.86 |
| Ramachandran favored (%) | 98.0 | 98.0 | 98.0 | 98.0 | 98.0 | 98.0 |
| Ramachandran outliers (%) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
  1. a: Values in parentheses correspond to the highest resolution shell.

  2. b: Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.

  3. c: Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function Z-score (TFZ).

Table 3

Statistics of post-refinement and atomic model refinement for hydrogenase

https://doi.org/10.7554/eLife.05421.007
| | 100 images | 177 images |
| --- | --- | --- |
| Resolution^a (Å) | 45.0–1.60 (1.66–1.60) | 45.0–1.60 (1.66–1.60) |
| Completeness^a (%) | 83.0 (47.7) | 91.2 (63.5) |
| Average no. observations per unique hkl^a | 4.4 (1.7) | 7.13 (2.3) |

| | 100: Averaged-merged | 100: Post-refined | 177: Averaged-merged | 177: Post-refined |
| --- | --- | --- | --- | --- |
| Post-refinement parameters^b | | | | |
| Linear scale factor G0 | 1.00 (0.00) | 0.56 (1.27) | 1.00 (0.00) | 0.53 (1.22) |
| B | 0.0 (0.0) | 10.0 (7.0) | 0.0 (0.0) | 10.5 (6.9) |
| γ0 (Å−1) | NA | 0.00132 (0.00042) | NA | 0.00126 (0.00041) |
| γy (Å−1) | NA | 0.00002 (0.00004) | NA | 0.00002 (0.00004) |
| γx (Å−1) | NA | 0.00008 (0.00009) | NA | 0.00008 (0.00011) |
| γe (Å−1) | NA | 0.00269 (0.00138) | NA | 0.00288 (0.00160) |
| Unit cell a (Å) | 110.1 (0.4) | 110.4 (0.3) | 110.1 (0.4) | 110.3 (0.4) |
| Unit cell c (Å) | 103.1 (0.4) | 103.1 (0.2) | 103.0 (0.4) | 103.0 (0.2) |
| Average Tpr, start/end | NA | 28.20 (10.86)/5.92 (2.35) | NA | 26.47 (12.70)/5.22 (2.72) |
| Average Txy (mm2), start/end | NA | 623.36 (314.57)/381.23 (198.44) | NA | 564.30 (267.45)/372.28 (202.28) |
| CC1/2 (%) | 62.0 | 77.3 | 71.7 | 84.8 |
| Molecular replacement scores^c | | | | |
| LLG | 53,352 | 9612 | 7229 | 11774 |
| TFZ | 69.2 | 75.9 | 75.0 | 79.0 |
| Structure-refinement parameters | | | | |
| R (%) | 33.4 | 25.3 | 29.1 | 22.0 |
| Rfree (%) | 36.7 | 28.9 | 31.3 | 25.0 |
| Bond r.m.s.d. | 0.006 | 0.007 | 0.007 | 0.007 |
| Angle r.m.s.d. | 1.43 | 1.50 | 1.68 | 1.97 |
| Ramachandran favored (%) | 96.3 | 97.0 | 97.0 | 96.7 |
| Ramachandran outliers (%) | 0.0 | 0.0 | 0.0 | 0.0 |
  1. a: Values in parentheses correspond to the highest resolution shell.

  2. b: Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.

  3. c: Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function Z-score (TFZ).

Table 4

Statistics of post-refinement and atomic model refinement for thermolysin

https://doi.org/10.7554/eLife.05421.008
| | 2000 images | 12,692 images |
| --- | --- | --- |
| Resolution^a (Å) | 50.0–2.10 (2.18–2.10) | 50.0–2.10 (2.18–2.10) |
| Completeness^a (%) | 81.3 (24.3) | 96.5 (74.8) |
| Average no. observations per unique hkl^a | 32.8 (1.2) | 176.6 (2.4) |

| | 2000: Averaged-merged | 2000: Post-refined | 12,692: Averaged-merged | 12,692: Post-refined |
| --- | --- | --- | --- | --- |
| Post-refinement parameters^b | | | | |
| Linear scale factor G0 | 1.00 (0.00) | 1.65 (1.66) | 1.00 (0.00) | 2.26 (75.12) |
| B | 0.0 (0.0) | 23.0 (33.8) | 0.0 (0.0) | 30.1 (59.8) |
| γ0 (Å−1) | NA | 0.00052 (0.00040) | NA | 0.00051 (0.00039) |
| γy (Å−1) | NA | 0.00001 (0.00003) | NA | 0.00001 (0.00003) |
| γx (Å−1) | NA | 0.00002 (0.00004) | NA | 0.00002 (0.00004) |
| γe (Å−1) | NA | 0.00110 (0.00129) | NA | 0.00103 (0.00128) |
| Unit cell a (Å) | 92.9 (0.3) | 92.9 (0.2) | 92.9 (0.3) | 92.9 (0.3) |
| Unit cell c (Å) | 130.5 (0.5) | 130.4 (0.4) | 130.5 (0.5) | 130.4 (0.4) |
| Average Tpr, start/end | NA | 1.15 (0.49)/0.55 (0.23) | NA | 1.15 (0.52)/0.28 (0.13) |
| Average Txy (mm2), start/end | NA | 168.13 (117.29)/167.72 (106.14) | NA | 169.01 (122.20)/170.00 (122.57) |
| CC1/2 (%) | 77.7 | 93.5 | 94.3 | 98.8 |
| Molecular replacement scores^c | | | | |
| LLG | 3590 | 4491 | 5477 | 6022 |
| TFZ | 8.9 | 9.7 | 24.1 | 24.6 |
| Structure-refinement parameters | | | | |
| R (%) | 25.2 | 19.5 | 20.7 | 18.4 |
| Rfree (%) | 29.1 | 24.0 | 23.9 | 21.1 |
| Bond r.m.s.d. | 0.004 | 0.002 | 0.002 | 0.002 |
| Angle r.m.s.d. | 0.75 | 0.58 | 0.59 | 0.62 |
| Ramachandran favored (%) | 95.9 | 94.6 | 95.2 | 94.9 |
| Ramachandran outliers (%) | 0.0 | 0.0 | 0.0 | 0.0 |
| Zinc peak height, Zn(1) (σ) | 14.0 | 16.0 | 14.3 | 20.9 |
| Zinc peak height, Zn(2) (σ) | 3.6 | 5.1 | 7.7 | 7.1 |
| Average peak height for calcium ions (σ) | 9.7 | 11.3 | 14.2 | 16.1 |
  1. a: Values in parentheses correspond to the highest resolution shell.

  2. b: Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.

  3. c: Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function Z-score (TFZ).

Cite this article

  1. Monarin Uervirojnangkoorn
  2. Oliver B Zeldin
  3. Artem Y Lyubimov
  4. Johan Hattne
  5. Aaron S Brewster
  6. Nicholas K Sauter
  7. Axel T Brunger
  8. William I Weis
(2015)
Enabling X-ray free electron laser crystallography for challenging biological systems from a limited number of crystals
eLife 4:e05421.
https://doi.org/10.7554/eLife.05421