Enabling X-ray free electron laser crystallography for challenging biological systems from a limited number of crystals
Figures

Geometry of the diffraction experiment and calculation of the Ewald-offset distance, rh.
(A) A reciprocal lattice point intersects the Ewald sphere. The inset shows the coordinate system used in cctbx.xfel and prime. The vector S0 represents the direction of the incident beam (–z-axis) and forms the radius of the Ewald sphere of length 1/λ. The reciprocal lattice point i is expressed in reciprocal lab coordinates using Equation 5 as represented by the vector xi. The Ewald-offset distance, rh, is the difference between the distance from the Ewald-sphere center to the reciprocal lattice point (length of Si) and 1/λ. The inset shows the definition of the crystal rotation axes; they are applied in the following order: θz, θy, θx. (B) Shown is the volume of a reciprocal lattice point with radius rs. The offset rh defines the Ewald-offset correction Eocarea, which is the ratio between the area intersecting the Ewald sphere, Ap, and the area at the center of the volume, As.
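The geometry in panels (A) and (B) can be illustrated with a minimal sketch. It assumes that Si = xi + S0 with S0 = (0, 0, –1/λ), and that the reciprocal lattice point is a sphere of radius rs cut by a locally flat Ewald sphere, so that Ap = π(rs² – rh²) and As = πrs². The function names are hypothetical and are not taken from cctbx.xfel or prime.

```python
import math

def ewald_offset(x_i, wavelength):
    """Return r_h = |S_i| - 1/lambda for a reciprocal lattice point x_i (in 1/Å).

    Assumption of this sketch: the incident beam travels along -z, so
    S0 = (0, 0, -1/lambda), and S_i = x_i + S0.
    """
    inv_lambda = 1.0 / wavelength
    s0 = (0.0, 0.0, -inv_lambda)
    s_i = tuple(x + s for x, s in zip(x_i, s0))
    return math.sqrt(sum(c * c for c in s_i)) - inv_lambda

def eoc_area(r_h, r_s):
    """Area ratio A_p / A_s for a spherical spot of radius r_s cut at offset r_h."""
    if abs(r_h) >= r_s:
        return 0.0  # the Ewald sphere misses the spot volume entirely
    return 1.0 - (r_h / r_s) ** 2
```

Under these assumptions a reflection sitting exactly on the Ewald sphere has rh = 0 and a correction of 1, and the correction falls to 0 once |rh| reaches rs.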

Post-refinement protocol.
The flowchart illustrates the iterative post-refinement protocol, broken up into ‘microcycles’ that refine groups of parameters iteratively (blue boxes), and ‘macrocycles’. At the beginning of the first macrocycle, a reference diffraction data set is generated. At the end of each macrocycle, the reference diffraction data set is updated. Both the micro- and macrocycles terminate either when the refinement converges or when a user-specified maximum number of cycles is reached.
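The control flow of this protocol can be sketched as two nested loops. This is only an illustration of the loop structure described in the legend: the callables `build_reference` and `refine_group`, the convergence test on the change in target value, and the default limits are all assumptions of the sketch, not the prime implementation.

```python
def post_refine(frames, build_reference, refine_group,
                groups=("SF", "CO", "RR", "UC"),
                max_macro=5, max_micro=3, tol=1e-3):
    """Skeleton of macrocycles wrapping per-image microcycles.

    frames          -- list of per-image parameter objects (hypothetical)
    build_reference -- callable(frames) -> reference data set (hypothetical)
    refine_group    -- callable(frame, group, reference) -> (frame, target)
    """
    frames = list(frames)
    reference = build_reference(frames)          # start of the first macrocycle
    prev_total = float("inf")
    for macro in range(max_macro):
        total = 0.0
        for i, frame in enumerate(frames):
            prev_target = float("inf")
            for _ in range(max_micro):
                for group in groups:             # one parameter group at a time
                    frame, target = refine_group(frame, group, reference)
                if abs(prev_target - target) < tol:
                    break                        # microcycles converged
                prev_target = target
            frames[i] = frame
            total += target
        reference = build_reference(frames)      # updated at the end of each macrocycle
        if abs(prev_total - total) < tol:
            break                                # macrocycles converged
        prev_total = total
    return frames, reference, macro + 1
```

The group labels follow the abbreviations used in the figure legends (SF, CO, RR, UC); any per-image refinement routine with the assumed signature could be plugged in.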

Post-refinement during the first macrocycle for myoglobin.
Shown are the values of the refined parameters and target functions during the first macrocycle of post-refinement for a representative diffraction image of the myoglobin XFEL diffraction data set. The iterative post-refinement included SF (scale factors), CO (crystal orientation), RR (reflection radius parameters), and UC (unit-cell dimensions) for three microcycles.

Convergence of post-refinement after five macrocycles for myoglobin.
The plots illustrate the convergence of post-refined parameters, target functions, and quality indicators during post-refinement over five macrocycles. A subset of 100 randomly selected diffraction images from the myoglobin XFEL diffraction data was used. For each specified target function and refined parameter, changes are plotted relative to the previous macrocycle, whereas the quality metric CC1/2 is shown as absolute numbers. The changes in post-refined parameters and target functions are shown as ‘box plots’. The bottom and top of the blue box are the first (Q1) and third (Q3) quartiles. The red line inside the box is the second quartile (Q2; median). The black whiskers extending vertically from the box indicate the range of the quantity within 1.5 times the interquartile range (Q3–Q1) of the box. The plus signs indicate any values beyond this range.
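For readers unfamiliar with the box-plot convention, the quantities named in the legend can be computed as in the following sketch. It assumes the common convention that whiskers end at the most extreme data point lying within 1.5 × (Q3–Q1) of the box; the function name and the choice of quartile interpolation are illustrative.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),    # last point inside the lower fence
        "whisker_high": max(inside),   # last point inside the upper fence
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }
```

Points returned under `outliers` correspond to the plus signs in the plots.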

Merging statistics for myoglobin.
(A) Percent completeness and (B) average number of observations plotted as a function of resolution for the myoglobin XFEL diffraction data set consisting of all 757 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. (C) CC1/2 for the averaged merged, mean-intensity scaled with partiality correction, and post-refined myoglobin diffraction data sets consisting of 100 and 757 diffraction images.
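The two statistics plotted here can be sketched as follows. CC1/2 is taken in its usual sense, the Pearson correlation between intensities merged from two random halves of the data; splitting per reflection, the input layout (a dictionary from hkl to a list of intensities), and the function names are assumptions of this sketch, and the splitting used for the published values may differ.

```python
import math
import random

def completeness(n_observed_unique, n_possible_unique):
    """Percent of the theoretically possible unique reflections that were observed."""
    return 100.0 * n_observed_unique / n_possible_unique

def cc_half(observations, seed=0):
    """Pearson correlation between two random half-set means per reflection.

    observations -- dict mapping an hkl tuple to a list of measured intensities.
    """
    rng = random.Random(seed)
    half1, half2 = [], []
    for values in observations.values():
        if len(values) < 2:
            continue                      # need one observation for each half
        shuffled = list(values)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half1.append(sum(shuffled[:mid]) / mid)
        half2.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    m1, m2 = sum(half1) / len(half1), sum(half2) / len(half2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(half1, half2))
    s11 = sum((a - m1) ** 2 for a in half1)
    s22 = sum((b - m2) ** 2 for b in half2)
    return s12 / math.sqrt(s11 * s22)
```

Reflections observed only once cannot contribute to CC1/2, which is one reason the statistic degrades in shells where the average number of observations approaches 1.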

Impact of post-refinement and number of images on electron density and model quality for myoglobin.
(A) Difference Fourier (mFo-DFc) omit maps around the heme group (which was omitted from molecular replacement and atomic model refinement) for the averaged merged, the mean-scaled partiality-corrected merged, and the post-refined myoglobin XFEL diffraction data sets consisting of all 757 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. The maps are contoured at 2.5 σ. (B) A plot of crystallographic Rwork and Rfree values vs resolution after atomic model refinement using the specified myoglobin diffraction data sets with inclusion of the heme group, SO4, and water molecules.

Quality of synchrotron vs. post-refined XFEL diffraction data sets for myoglobin.
Difference Fourier (mFo-DFc) omit maps at 1.35 Å around the heme group (which was omitted from molecular replacement and model refinement), generated from (A) the synchrotron diffraction data and corresponding model with PDB ID 1JW8 (for comparison, all reflections past 1.35 Å resolution were excluded) and (B) the post-refined myoglobin XFEL diffraction data set using all 757 diffraction images (Table 1). The maps are contoured at 2.5 σ.

Impact of post-refinement on the hydrogenase diffraction data set.
(A) Difference Fourier (mFo-DFc) omit maps of one of the four Fe-S clusters (which were omitted in molecular replacement and atomic model refinement) for the averaged merged and the post-refined hydrogenase XFEL diffraction data sets consisting of all 177 diffraction images (Table 1) and a randomly selected subset of 100 diffraction images. The maps are contoured at 3 σ. (B) Crystallographic R and Rfree values vs resolution after atomic model refinement using the specified diffraction data sets with inclusion of the three Fe-S clusters and water molecules.

Impact of post-refinement on the anomalous signal in the thermolysin diffraction dataset.
Anomalous difference Fourier maps for the averaged merged (A, C) or the post-refined (B, D) thermolysin XFEL diffraction data sets consisting of all 12,692 diffraction images (A, B; Table 1) and a randomly selected subset of 2000 diffraction images (C, D). The anomalous difference Fourier maps were computed using phases from the thermolysin atomic model (but excluding zinc and calcium ions), refined separately against each diffraction data set. All maps are contoured at 3 σ; the peak heights for the two zinc ions are indicated.

Impact of post-refinement on the quality of electron density maps and models of thermolysin.
(A) Difference Fourier (mFo-DFc) maps revealing a Leu–Lys dipeptide near the zinc site for the averaged merged and the post-refined thermolysin XFEL diffraction data sets consisting of all 12,692 diffraction images (Table 1) and a randomly selected subset of 2000 diffraction images. The maps are contoured at 3 σ. (B) Crystallographic R and Rfree values vs resolution after atomic model refinement using the specified diffraction data sets, with inclusion of the two zinc ions, the calcium ions, and the Leu–Lys dipeptide.

Convergence of structure refinements for the post-refined thermolysin XFEL data set at 2.6 Å resolution, using increasing numbers of diffraction images.
(A) Average number of observations per unique hkl. (B) CC1/2 for merged subsets using 2000–12,000 images (100% completeness for all subsets). (C) Peak height (σ) in the omit map for the largest peak. (D) Rwork and Rfree after refining the thermolysin model without zinc and calcium ions against the corresponding post-refined diffraction data sets.

Distribution of the Ewald sphere offset rh.
The histogram shows the distribution of rh calculated after post-refinement for myoglobin using 757 diffraction images. The number of observations after applying the reflection selection criteria for merging and outlier rejections for this 1.35 Å data set is 1,136,447 (∼96% of the total observed reflections). The standard deviation is 0.0016 1/Å or approximately 0.12° (when calculated with the mean of the energy distribution).
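One plausible reading of the conversion from 0.0016 1/Å to roughly 0.12° is the small angle that the offset subtends at the center of the Ewald sphere, that is rh divided by the sphere radius 1/λ. The sketch below follows that reading; the wavelength of 1.3 Å used in the usage note is an assumed illustrative value and is not stated in this section.

```python
import math

def offset_to_degrees(r_h, wavelength):
    """Angle (degrees) subtended at the Ewald-sphere center by an offset r_h.

    r_h is in 1/Å and wavelength in Å; the sphere radius is 1/wavelength,
    so the tangent of the angle is r_h * wavelength (an assumption of this sketch).
    """
    return math.degrees(math.atan(r_h * wavelength))
```

With the assumed wavelength, `offset_to_degrees(0.0016, 1.3)` gives about 0.12°, in line with the figure quoted in the legend.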

The Ewald-offset correction function.
(A) Ewald-offset correction Eoc (Equation 14) viewed as a function of the reciprocal-lattice radius (rs) and the offset distance (rh). (B) A slice through Eoc at rs = 0.003, comparing Eoc (Equation 14) and Eocarea (Equation 11).

Geometry of the incident and diffracted beam for polarization correction.
The diagram shows a reflection on a plane formed by its reciprocal-space vector and the -z-axis at angle . This reflection is affected by the polarization of the incoming primary beam in both the horizontal (x) and vertical (y) directions.
Tables
XFEL diffraction data sets used in this study
Parameter | Myoglobin | Clostridium pasteurianum hydrogenase | Thermolysin |
---|---|---|---|
Space group | P6 | P4₂2₁2 | P6₁22 |
Resolution used (Å) | 20.0–1.35 | 45.0–1.60 | 50.0–2.10 |
Unit cell dimensions (Å) | a = b = 90.8, c = 45.6 | a = b = 111.2, c = 103.8 | a = b = 92.7, c = 130.5 |
No. of unique reflections | 46,555 | 85,273 | 19,995 |
No. of images* indexed | 757 | 177 | 12,692 |
No. of images with spots to resolution used | 307 | 75 | 1957 |
Average no. of spots on an image (to resolution used) | 1628 | 3640 | 352 |
Energy spectrum | SASE† | SASE† | SASE† |
Detector | Rayonix MX325HE | Rayonix MX325HE | CSPAD‡ |
Sample delivery method | fixed target | fixed target | Electrospun jet |
* This is the number of images indexed using the cctbx.xfel program; in the case of thermolysin, it is the number of images indexed for one of the two wavelengths.
† SASE: self-amplified spontaneous emission.
‡ CSPAD: Cornell-SLAC pixel array detector.
Statistics of post-refinement and atomic model refinement for myoglobin
No. images | 100 | | | 757 | | |
Resolutionᵃ (Å) | 20.0–1.35 (1.40–1.35) | | | 20.0–1.35 (1.40–1.35) | | |
Completenessᵃ (%) | 80.0 (22.2) | | | 97.7 (79.8) | | |
Average no. observations per unique hklᵃ | 4.0 (1.2) | | | 25.7 (2.0) | | |
Data set | Averaged merged | Mean-scaled partiality corrected | Post-refined | Averaged merged | Mean-scaled partiality corrected | Post-refined |
Post-refinement parametersᵇ | ||||||
Linear scale factor G0 | 1.00 (0.00) | 2.79 (5.02) | 1.00 (1.04) | 1.00 (0.00) | 2.19 (3.83) | 0.89 (1.07) |
B | 0.0 (0.0) | 0.0 (0.0) | 3.2 (7.8) | 0.0 (0.0) | 0.0 (0.0) | 6.2 (8.3) |
γ0 (Å⁻¹) | NA | 0.00135 (0.00028) | 0.00128 (0.00022) | NA | 0.00147 (0.00042) | 0.00132 (0.00034) |
γy (Å⁻¹) | NA | 0.00 (0.00) | 0.00007 (0.00080) | NA | 0.00 (0.00) | 0.00007 (0.00009) |
γx (Å⁻¹) | NA | 0.00 (0.00) | 0.00010 (0.00011) | NA | 0.00 (0.00) | 0.00008 (0.00010) |
γe (Å⁻¹) | NA | 0.00200 (0.00) | 0.00344 (0.00266) | NA | 0.00200 (0.00) | 0.00423 (0.00323) |
Unit cell | ||||||
a (Å) | 90.4 (0.4) | 90.4 (0.4) | 90.5 (0.4) | 90.4 (0.4) | 90.4 (0.4) | 90.5 (0.3) |
c (Å) | 45.3 (0.4) | 45.3 (0.4) | 45.3 (0.3) | 45.3 (0.3) | 45.3 (0.3) | 45.3 (0.3) |
Average Tpr Start/End | NA | NA | 19.39 (7.68)/7.17 (3.38) | NA | NA | 19.83 (7.54)/6.02 (2.59) |
Average Txy (mm²) Start/End | NA | NA | 169.74 (132.56)/132.02 (104.08) | NA | NA | 170.66 (144.52)/133.42 (109.58) |
CC1/2 (%) | 81.3 | 79.6 | 86.5 | 91.8 | 95.7 | 98.2 |
Molecular replacement scoresᶜ | ||||||
LLG | 2837 | 5043 | 5291 | 8264 | 8364 | 9320 |
TFZ | 10.5 | 13.0 | 13.4 | 13.7 | 13.8 | 14.0 |
Structure-refinement parameters | ||||||
R (%) | 39.4 | 28.0 | 23.5 | 21.1 | 20.3 | 17.8 |
Rfree (%) | 42.1 | 29.4 | 24.8 | 23.1 | 22.5 | 19.7 |
Bond r.m.s.d. | 0.006 | 0.006 | 0.004 | 0.006 | 0.006 | 0.006 |
Angle r.m.s.d. | 1.14 | 0.98 | 0.79 | 1.03 | 1.35 | 0.86 |
Ramachandran statistics | ||||||
Favored (%) | 98.0 | 98.0 | 98.0 | 98.0 | 98.0 | 98.0 |
Outliers (%) | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ᵃ Values in parentheses correspond to the highest-resolution shell.
ᵇ Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.
ᶜ Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function (TFZ).
Statistics of post-refinement and atomic model refinement for hydrogenase
No. images | 100 | | 177 | |
Resolutionᵃ (Å) | 45.0–1.60 (1.66–1.60) | | 45.0–1.60 (1.66–1.60) | |
Completenessᵃ (%) | 83.0 (47.7) | | 91.2 (63.5) | |
Average no. observations per unique hklᵃ | 4.4 (1.7) | | 7.13 (2.3) | |
Data set | Averaged merged | Post-refined | Averaged merged | Post-refined |
Post-refinement parametersᵇ | ||||
Linear scale factor G0 | 1.00 (0.00) | 0.56 (1.27) | 1.00 (0.00) | 0.53 (1.22) |
B | 0.0 (0.0) | 10.0 (7.0) | 0.0 (0.0) | 10.5 (6.9) |
γ0 (Å⁻¹) | NA | 0.00132 (0.00042) | NA | 0.00126 (0.00041) |
γy (Å⁻¹) | NA | 0.00002 (0.00004) | NA | 0.00002 (0.00004) |
γx (Å⁻¹) | NA | 0.00008 (0.00009) | NA | 0.00008 (0.00011) |
γe (Å⁻¹) | NA | 0.00269 (0.00138) | NA | 0.00288 (0.00160) |
Unit cell | ||||
a (Å) | 110.1 (0.4) | 110.4 (0.3) | 110.1 (0.4) | 110.3 (0.4) |
c (Å) | 103.1 (0.4) | 103.1 (0.2) | 103.0 (0.4) | 103.0 (0.2) |
Average Tpr Start/End | NA | 28.20 (10.86)/5.92 (2.35) | NA | 26.47 (12.70)/5.22 (2.72) |
Average Txy (mm²) Start/End | NA | 623.36 (314.57)/381.23 (198.44) | NA | 564.30 (267.45)/372.28 (202.28) |
CC1/2 (%) | 62.0 | 77.3 | 71.7 | 84.8 |
Molecular replacement scoresᶜ | ||||
LLG | 53,352 | 9612 | 7229 | 11,774 |
TFZ | 69.2 | 75.9 | 75.0 | 79.0 |
Structure-refinement parameters | ||||
R (%) | 33.4 | 25.3 | 29.1 | 22.0 |
Rfree (%) | 36.7 | 28.9 | 31.3 | 25.0 |
Bond r.m.s.d. | 0.006 | 0.007 | 0.007 | 0.007 |
Angle r.m.s.d. | 1.43 | 1.50 | 1.68 | 1.97 |
Ramachandran statistics | ||||
Favored (%) | 96.3 | 97.0 | 97.0 | 96.7 |
Outliers (%) | 0.0 | 0.0 | 0.0 | 0.0 |
ᵃ Values in parentheses correspond to the highest-resolution shell.
ᵇ Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.
ᶜ Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function (TFZ).
Statistics of post-refinement and atomic model refinement for thermolysin
No. images | 2000 | | 12,692 | |
Resolutionᵃ (Å) | 50.0–2.10 (2.18–2.10) | | 50.0–2.10 (2.18–2.10) | |
Completenessᵃ (%) | 81.3 (24.3) | | 96.5 (74.8) | |
Average no. observations per unique hklᵃ | 32.8 (1.2) | | 176.6 (2.4) | |
Data set | Averaged merged | Post-refined | Averaged merged | Post-refined |
Post-refinement parametersᵇ | ||||
Linear scale factor G0 | 1.00 (0.00) | 1.65 (1.66) | 1.00 (0.00) | 2.26 (75.12) |
B | 0.0 (0.0) | 23.0 (33.8) | 0.0 (0.0) | 30.1 (59.8) |
γ0 (Å⁻¹) | NA | 0.00052 (0.00040) | NA | 0.00051 (0.00039) |
γy (Å⁻¹) | NA | 0.00001 (0.00003) | NA | 0.00001 (0.00003) |
γx (Å⁻¹) | NA | 0.00002 (0.00004) | NA | 0.00002 (0.00004) |
γe (Å⁻¹) | NA | 0.00110 (0.00129) | NA | 0.00103 (0.00128) |
Unit cell | ||||
a (Å) | 92.9 (0.3) | 92.9 (0.2) | 92.9 (0.3) | 92.9 (0.3) |
c (Å) | 130.5 (0.5) | 130.4 (0.4) | 130.5 (0.5) | 130.4 (0.4) |
Average Tpr Start/End | NA | 1.15 (0.49)/0.55 (0.23) | NA | 1.15 (0.52)/0.28 (0.13) |
Average Txy (mm²) Start/End | NA | 168.13 (117.29)/167.72 (106.14) | NA | 169.01 (122.20)/170.00 (122.57) |
CC1/2 (%) | 77.7 | 93.5 | 94.3 | 98.8 |
Molecular replacement scoresᶜ | ||||
LLG | 3590 | 4491 | 5477 | 6022 |
TFZ | 8.9 | 9.7 | 24.1 | 24.6 |
Structure-refinement parameters | ||||
R (%) | 25.2 | 19.5 | 20.7 | 18.4 |
Rfree (%) | 29.1 | 24.0 | 23.9 | 21.1 |
Bond r.m.s.d. | 0.004 | 0.002 | 0.002 | 0.002 |
Angle r.m.s.d. | 0.75 | 0.58 | 0.59 | 0.62 |
Ramachandran statistics | ||||
Favored (%) | 95.9 | 94.6 | 95.2 | 94.9 |
Outliers (%) | 0.0 | 0.0 | 0.0 | 0.0 |
Zinc peak height | ||||
Zn(1) (σ) | 14.0 | 16.0 | 14.3 | 20.9 |
Zn(2) (σ) | 3.6 | 5.1 | 7.7 | 7.1 |
Average peak height for calcium ions (σ) | 9.7 | 11.3 | 14.2 | 16.1 |
ᵃ Values in parentheses correspond to the highest-resolution shell.
ᵇ Post-refined parameters are shown as the mean value, with the standard deviation in parentheses.
ᶜ Molecular replacement scores reported by Phaser (McCoy et al., 2007): log-likelihood gain (LLG) and translation function (TFZ).