Accurate prediction of CDR-H3 loop structures of antibodies with deep learning

  1. Hedi Chen
  2. Xiaoyu Fan
  3. Shuqian Zhu
  4. Yuchan Pei
  5. Xiaochun Zhang
  6. Xiaonan Zhang
  7. Lihang Liu
  8. Feng Qian (corresponding author)
  9. Boxue Tian (corresponding author)
  1. MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, China
  2. Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, China
  3. Department of Natural Language Processing, Baidu International Technology (Shenzhen) Co Ltd, China
12 figures, 10 tables and 1 additional file

Figures

Figure 1
Accuracy of AF2 on antibody modeling.

(a) Schematic for CDR heavy chain loops. (b) The CDR lengths of monoclonal antibodies (mAbs) (n = 47) and nanobodies (Nbs) (n = 78). The error bars represent the standard deviation of the data. (c) …

Figure 2 with 1 supplement
Accuracy of AF2 on different antibody regions.

(a) The performance of AlphaFold2 in DB1 relative to other methods after superimposing Fv backbones. (b) The performance of H3-OPT in DB1 relative to other methods after superimposing VH backbones. …
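
The panel descriptions above refer to accuracy measured after superimposing one region (Fv or VH backbones). A minimal sketch of that kind of measurement follows, assuming NumPy and using the standard Kabsch superposition: the structures are aligned on one set of atoms and the RMSD is then taken over a different set, such as the CDR-H3 loop. The function names, the index arrays, and the choice of atoms are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch fit: return (R, mobile_centroid, target_centroid) such that
    (x - mobile_centroid) @ R.T + target_centroid maps mobile onto target."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mc).T @ (target - tc)        # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mc, tc


def loop_rmsd(pred, ref, align_idx, loop_idx):
    """RMSD over loop_idx after superimposing pred onto ref using align_idx.

    pred, ref: [N, 3] arrays of matching atoms (hypothetical inputs).
    """
    R, mc, tc = superpose(pred[align_idx], ref[align_idx])
    moved = (pred - mc) @ R.T + tc
    diff = moved[loop_idx] - ref[loop_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Aligning on the framework and scoring only the loop keeps rigid-body differences in the framework from masking or inflating the loop error.
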

Figure 2—figure supplement 1
Local accuracy of AlphaFold2 prediction.

(a) Side-by-side comparison of backbone and CDR3 heavy atom root mean square deviations (RMSDs) for DeepAb and AlphaFold2 in DB1. (b) Side-by-side comparison of backbone and CDR3 heavy atom RMSDs …

Figure 3
Molecular dynamics (MD) generated conformations for benchmark target 7N0R.

(a) Comparison of CDR-H3 loops of MD (gray), AF2 (pink), and experimentally determined structure (cyan). (b) Root mean square fluctuation (RMSF) of antibody residues during simulation. CDR-H3 loop …

Figure 4
H3-OPT architecture.

(a) Schematic for dataset preparation. Structures were screened from the SAbDab database based on resolution and sequence identity. Clustering of the filtered, high-resolution structures yielded …

Figure 5
Template module and ablation studies.

(a) Side-by-side comparison of Cα-RMSDs of AF2 and IgFold for Sub1 (n = 52); color scale for data points reflects CDR3 length. AF2 outperformed IgFold for targets left of the dashed diagonal; IgFold …

Figure 6
PLM-based structure prediction module (PSPM).

(a) Side-by-side comparison of Cα-RMSDs for AF2 and IgFold, IgFold and H3-OPT in the Sub2 (n = 46) and Sub3 (n = 33) test sets, respectively. (b) Comparison of prediction accuracy between AF2 and …

Figure 7 with 1 supplement
Accuracy of CDR-H3 loop prediction by H3-OPT.

(a) The performance of H3-OPT in the test set (n_mAbs = 119, n_Nbs = 12) relative to other methods. The RMSD of H3-OPT was significantly lower than that of other existing methods (p<0.001). (b) The …

Figure 7—figure supplement 1
Comparison of accuracy between AF2, H3-OPT, and tFold-Ab methods using the CAMEO 2022 benchmark dataset (Leemann et al., 2023).

The x-axis represents different targets, and y-axis represents Cα-RMSD values. RMSD, root mean square deviation.

Figure 8 with 1 supplement
Analysis of surface patches.

(a) Analysis of surface amino acids for predicted H3 loops. Y-axis represents average number of surface residues for H3 loops (n = 131). The surface residues of AF2 models are significantly higher …

Figure 8—figure supplement 1
Solvent-accessible surface area (SASA) analysis of predicted H3 loops.

The values represent the difference in SASA between H3 structures predicted by AF2 or H3-OPT and experimentally determined structures. Positive values indicate that the predicted structures have …

Figure 9 with 1 supplement
Accuracy of H3-OPT predictions of antibody–antigen interactions.

(a) Performance of H3-OPT in binding site prediction. Comparison of prediction accuracy between H3-OPT and AF2 for antibody–antigen binding sites (n = 27). Box represents interquartile range (IQR); …

Figure 9—figure supplement 1
RMSD_backbone during production runs.
Author response image 1
Author response image 2
Author response image 3

Tables

Table 1
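Root mean square deviation (RMSD) results of the PM6D3-level re-ranking method on 14 antibodies sharing the same CDR-H3.

| PDB ID | Ranked 0 RMSD | Lowest energy RMSD | Lowest RMSD | ΔRMSD* |
|---|---|---|---|---|
| 4kmt | 1.06 | 1.14 | 1.05 | –0.08 |
| 5i19 | 2.16 | 1.91 | 1.77 | 0.25 |
| 5i1l | 3.80 | 3.20 | 3.19 | 0.60 |
| 5i17 | 2.86 | 3.71 | 2.86 | –0.85 |
| 5i1d | 2.10 | 2.10 | 2.02 | 0.00 |
| 5i1c | 2.43 | 1.66 | 1.45 | 0.77 |
| 5i1a | 2.37 | 0.85 | 0.59 | 1.52 |
| 5i1i | 3.72 | 3.51 | 3.51 | 0.21 |
| 5i15 | 2.16 | 1.94 | 1.35 | 0.22 |
| 5i16 | 3.19 | 1.70 | 1.39 | 1.49 |
| 5i18 | 2.88 | 2.88 | 2.88 | 0.00 |
| 5i1e | 1.62 | 1.13 | 0.92 | 0.49 |
| 5i1g | 2.08 | 2.08 | 2.00 | 0.00 |
| 5i1h | 1.58 | 1.84 | 1.32 | –0.26 |
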
  * ΔRMSD was calculated by subtracting the RMSD of the predicted model from the RMSD of the Ranked_0 model.
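
As a worked illustration of the footnote, the short sketch below recomputes ΔRMSD from the Ranked 0 and lowest-energy columns of Table 1. It assumes the "predicted model" in the footnote is the lowest-energy model, which is consistent with the tabulated values; the variable and function names are illustrative only.

```python
# (PDB ID, Ranked 0 RMSD, lowest-energy RMSD), copied from Table 1.
rows = [
    ("4kmt", 1.06, 1.14), ("5i19", 2.16, 1.91), ("5i1l", 3.80, 3.20),
    ("5i17", 2.86, 3.71), ("5i1d", 2.10, 2.10), ("5i1c", 2.43, 1.66),
    ("5i1a", 2.37, 0.85), ("5i1i", 3.72, 3.51), ("5i15", 2.16, 1.94),
    ("5i16", 3.19, 1.70), ("5i18", 2.88, 2.88), ("5i1e", 1.62, 1.13),
    ("5i1g", 2.08, 2.08), ("5i1h", 1.58, 1.84),
]


def delta_rmsd(ranked0, reranked):
    """Positive values mean the re-ranked model is closer to experiment."""
    return round(ranked0 - reranked, 2)


deltas = {pdb: delta_rmsd(r0, low_e) for pdb, r0, low_e in rows}
improved = [pdb for pdb, d in deltas.items() if d > 0]
worsened = [pdb for pdb, d in deltas.items() if d < 0]

for pdb, d in deltas.items():
    print(f"{pdb}: {d:+.2f}")
print(f"improved: {len(improved)}, worsened: {len(worsened)}, "
      f"unchanged: {len(rows) - len(improved) - len(worsened)}")
```
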

Table 2
Accuracy of quantum mechanics (QM)-based re-ranking methods.
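
| Method | Freeze terminal Cα | CDR* | Phase | Ranked 0 RMSD | Lowest energy RMSD | Lowest RMSD | ΔRMSD |
|---|---|---|---|---|---|---|---|
| PM6D3 | Y | H3 | Gas | 2.64 | 2.76 | 2.16 | –0.12 |
| PM6D3 | N | H3 | Gas | 2.53 | 2.67 | 2.03 | –0.14 |
| PM6D3 | N | H1, H2, H3 | Gas | 2.50 | 2.64 | 2.00 | –0.14 |
| B3LYP | N | H3 | Gas | 2.66 | 2.87 | 2.30 | –0.21 |
| B3LYP | N | H3 | Water | 2.66 | 2.68 | 2.30 | –0.02 |
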
  1. RMSD = root mean square deviation.

  * CDR indicates which loop's energy was used to re-rank the AF2 models.

Table 3
Accuracy of quantum mechanics (QM)-based optimization methods.
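
| Method | Freeze terminal Cα | Structure generation method | Phase | Ranked 0 RMSD | Lowest energy RMSD / optimized RMSD | Lowest RMSD | ΔRMSD |
|---|---|---|---|---|---|---|---|
| PM6D3 | Y | / | Gas | 1.69 | 1.74/1.87 | 1.37 | –0.05 |
| B3LYP | N | / | Gas | 1.63 | 1.65/2.55 | 1.38 | –0.02 |
| B3LYP | N | / | Water | 1.63 | 1.58/2.25 | 1.38 | 0.05 |
| B3LYP | N | Boltzmann | Gas | 1.56 | 2.05 | 1.28 | –0.49 |
| B3LYP | N | Boltzmann | Water | 1.56 | 1.81 | 1.28 | –0.25 |
| B3LYP | N | Boltzmann, minimized | Gas | 1.56 | 1.96 | 1.28 | –0.40 |
| B3LYP | N | Boltzmann, minimized | Water | 1.56 | 1.84 | 1.28 | –0.28 |
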
  1. RMSD = root mean square deviation.

Table 4
The accuracy of molecular dynamics (MD)-based CDR-H3 loop optimization in the 10 worst cases of AF2.
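
| PDB ID | Cα-RMSD (Ranked_0) | Cα-RMSD (MD_opt) | ΔCα-RMSD |
|---|---|---|---|
| 7n0r | 10.92 | 5.62 ± 0.97 | 5.30 |
| 3juy | 6.37 | 5.71 ± 0.23 | 0.66 |
| 5y80 | 6.61 | 7.59 ± 0.47 | –0.98 |
| 7a4t | 6.19 | 7.48 ± 0.29 | –1.29 |
| 4nzr | 6.57 | 7.73 ± 0.26 | –1.16 |
| 6xzu | 7.45 | 6.34 ± 0.94 | 1.11 |
| 6x05 | 6.32 | 7.48 ± 0.63 | –1.16 |
| 3c08 | 6.68 | 7.01 ± 0.11 | –0.33 |
| 4z9k | 9.04 | 8.01 ± 0.37 | 1.03 |
| 6oca | 7.61 | 8.01 ± 0.34 | –0.40 |
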
  1. RMSD = root mean square deviation.

Table 5
Performance of H3-OPT with different protein language models (PLMs).

| Model | RMSD (Å) |
|---|---|
| H3-OPT | 2.24 ± 1.05 |
| AF2 | 2.85 ± 0.69 |
| ESM2 | 2.31 ± 1.13 |
| Without PLM | 2.41 ± 1.26 |
| AntiBERTy | 2.49 ± 1.42 |
| ProtTrans-T5 | 2.40 ± 1.28 |

  1. RMSD = root mean square deviation.

Table 6
Comparison of binding affinities obtained from molecular dynamics (MD) simulations using AF2 and H3-OPT.
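
| PDB ID | AF2 MM/GBSA (kcal/mol) | AF2 MM/PBSA (kcal/mol) | AF2 RMSD (Å) | H3-OPT MM/GBSA (kcal/mol) | H3-OPT MM/PBSA (kcal/mol) | H3-OPT RMSD (Å) | AF2 \|ΔMM/GBSA\|* (kcal/mol) | AF2 \|ΔMM/PBSA\| (kcal/mol) | H3-OPT \|ΔMM/GBSA\| (kcal/mol) | H3-OPT \|ΔMM/PBSA\| (kcal/mol) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2ghw | –29.20 | –33.36 | 2.7 | –14.70 | –21.36 | 3.0 | 8.63 | 2.42 | 23.13 | 14.42 |
| 2yc1 | –38.85 | –37.73 | 2.3 | –43.80 | –48.72 | 1.5 | 6.80 | 18.67 | 1.85 | 7.68 |
| 3l95 | –29.59 | –53.35 | 2.5 | –47.22 | –68.86 | 2.5 | 23.60 | 11.44 | 5.97 | 4.07 |
| 3u30 | –37.31 | –42.41 | 2.6 | –44.94 | –50.07 | 2.5 | 9.64 | 2.18 | 2.01 | 5.48 |
| 4cni | –36.96 | –42.93 | 1.0 | –31.92 | –40.39 | 1.3 | 8.54 | 7.44 | 3.50 | 4.89 |
| 4nbz | –36.59 | –43.79 | 1.9 | –59.61 | –54.23 | 0.6 | 10.17 | 3.30 | 12.85 | 13.74 |
| 4xnq | –13.55 | –17.47 | 2.7 | –31.51 | –30.94 | 0.5 | 15.40 | 12.32 | 2.57 | 1.15 |
| 4ydl | –52.51 | –74.25 | 4.8 | –49.17 | –73.57 | 3.6 | 6.82 | 6.53 | 10.15 | 7.21 |
| 5e5m | –59.72 | –71.15 | 3.0 | –41.29 | –53.70 | 7.3 | 0.50 | 5.79 | 18.93 | 11.66 |
| 5f7y | –61.76 | –69.46 | 2.7 | –60.33 | –69.43 | 1.4 | 3.95 | 6.38 | 5.38 | 6.41 |
| 6kyz | –12.66 | –20.32 | 4.0 | –9.36 | –17.13 | 3.7 | 17.63 | 17.21 | 20.93 | 20.40 |
| 6o9h | –39.53 | –43.45 | 2.8 | –52.27 | –57.51 | 0.6 | 10.45 | 13.31 | 2.29 | 0.74 |
| 6pyd | –45.87 | –58.71 | 1.0 | –35.75 | –45.28 | 1.1 | 6.29 | 13.50 | 3.83 | 0.06 |
| 6u9s | –36.54 | –48.66 | 1.0 | –39.79 | –44.80 | 1.3 | 14.35 | 10.42 | 11.11 | 14.28 |
| Average | / | / | 2.6 | / | / | 2.4 | 10.20 | 9.35 | 8.89 | 8.01 |
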
  * ΔMM/GBSA (or ΔMM/PBSA) was calculated by subtracting the MM/GBSA (or MM/PBSA) of the predicted model from the MM/GBSA (or MM/PBSA) of the experimental structure.
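
To make the summary row of Table 6 concrete, the sketch below recomputes the column averages of the absolute binding-energy deviations from the per-target values listed in the table. The `abs_delta` helper only restates the footnote formula; the experimental energies themselves are not listed in the table, so the helper is shown for illustration and all names are assumptions.

```python
from statistics import mean

# Per target, copied from Table 6 (kcal/mol):
# (AF2 |dGBSA|, AF2 |dPBSA|, H3-OPT |dGBSA|, H3-OPT |dPBSA|)
abs_deltas = {
    "2ghw": (8.63, 2.42, 23.13, 14.42),
    "2yc1": (6.80, 18.67, 1.85, 7.68),
    "3l95": (23.60, 11.44, 5.97, 4.07),
    "3u30": (9.64, 2.18, 2.01, 5.48),
    "4cni": (8.54, 7.44, 3.50, 4.89),
    "4nbz": (10.17, 3.30, 12.85, 13.74),
    "4xnq": (15.40, 12.32, 2.57, 1.15),
    "4ydl": (6.82, 6.53, 10.15, 7.21),
    "5e5m": (0.50, 5.79, 18.93, 11.66),
    "5f7y": (3.95, 6.38, 5.38, 6.41),
    "6kyz": (17.63, 17.21, 20.93, 20.40),
    "6o9h": (10.45, 13.31, 2.29, 0.74),
    "6pyd": (6.29, 13.50, 3.83, 0.06),
    "6u9s": (14.35, 10.42, 11.11, 14.28),
}


def abs_delta(experimental, predicted):
    """|experimental - predicted| binding energy, per the table footnote."""
    return abs(experimental - predicted)


# Column-wise averages over the 14 targets, rounded as in the table.
averages = [round(mean(col), 2) for col in zip(*abs_deltas.values())]

# Targets where the H3-OPT model deviates less than the AF2 model (MM/GBSA).
h3opt_closer_gbsa = [pdb for pdb, v in abs_deltas.items() if v[2] < v[0]]

print(averages)
print(h3opt_closer_gbsa)
```
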

Table 7
Features of the model.

Nres is the number of residues (Jumper et al., 2021).


| Feature and shape | Description |
|---|---|
| Amino acid type [Nres, 21] | One-hot representation of the input amino acid sequence (20 amino acids plus unknown). |
| 3D coordinates [Nres, 3] | Cα coordinates of all AlphaFold2-predicted residues. |
| Backbone torsion angles [Nres, 6] | Sine and cosine encoding of the three predicted backbone torsion angles. |
| Torsion angles mask [Nres, 3] | A mask indicating whether the angle was present in the predicted structure. |
| H3 residue mask [Nres, 1] | A mask indicating whether the residue is located in the H3 loop. |
| Pairwise distances [Nres, Nres, 39] | One-hot representation of the distances between residue Cα atoms. Distances from 3.25 Å to 50.75 Å are divided equally into 38 bins; the last bin contains any larger distances. |
| Pairwise amino acid type [Nres, Nres, 21] | One-hot representation of the input amino acid sequence. |

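
A minimal NumPy sketch of three of the encodings described in Table 7 is given below: the 21-class amino acid one-hot, the sine/cosine torsion encoding, and the 39-bin pairwise Cα distance one-hot. The function names, the residue alphabet ordering, and the handling of distances shorter than 3.25 Å (clipped into the first bin here) are assumptions made for illustration; the table does not specify them.

```python
import numpy as np

# Assumed ordering of the 20 standard residues; index 20 is "unknown".
AA = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_sequence(seq):
    """[Nres, 21] one-hot of the amino acid sequence."""
    idx = np.array([AA.index(a) if a in AA else 20 for a in seq])
    return np.eye(21, dtype=np.float32)[idx]


def torsion_features(angles):
    """[Nres, 3] torsion angles in radians -> [Nres, 6] sin/cos encoding."""
    angles = np.asarray(angles, dtype=np.float32)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def pairwise_distance_one_hot(ca):
    """[Nres, 3] C-alpha coordinates -> [Nres, Nres, 39] one-hot distance bins.

    Bins 0..37 split 3.25-50.75 A into 38 equal 1.25 A intervals;
    bin 38 collects any larger distance. Distances below 3.25 A
    (including the zero diagonal) are clipped into bin 0 (assumption).
    """
    ca = np.asarray(ca, dtype=np.float32)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    edges = np.linspace(3.25, 50.75, 39)
    idx = np.clip(np.digitize(dist, edges) - 1, 0, 38)
    return np.eye(39, dtype=np.float32)[idx]
```

For example, two Cα atoms 3.8 Å apart fall into bin 0, and a pair 100 Å apart falls into the overflow bin 38.
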
Table 8
Hyperparameters for H3-OPT models.
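
| Model | 2 | 5 | 1 | 3 | Best |
|---|---|---|---|---|---|
| Initial learning rate | 1e–4 | 5e–4 | 1e–3 | 5e–4 | 1e–4 |
| Hidden layers | 64 | 64 | 64 | 64 | 64 |
| Number of iterations of Evoformer-like layer | 6 | 6 | 4 | 4 | 4 |
| Average RMSD (Å) | 2.42 | 2.36 | 2.35 | 2.33 | 2.24 |
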
  1. RMSD = root mean square deviation.

Table 9
Average Cα-RMSDs of our test set under different confidence cutoffs.

| Cutoff | Cα-RMSD (Å) |
|---|---|
| 0.70 | 2.46 |
| 0.75 | 2.30 |
| 0.80 | 2.24 |
| 0.85 | 2.17 |
| 0.90 | 2.29 |
| 0.95 | 2.28 |

  1. RMSD = root mean square deviation.

Author response table 1
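
| PDB ID | AF2 MM/GBSA (kcal/mol) | AF2 MM/PBSA (kcal/mol) | AF2 RMSD_Cα (Å) | H3-OPT MM/GBSA (kcal/mol) | H3-OPT MM/PBSA (kcal/mol) | H3-OPT RMSD (Å) | AF2 \|ΔMM/GBSA\|* (kcal/mol) | AF2 \|ΔMM/PBSA\| (kcal/mol) | H3-OPT \|ΔMM/GBSA\| (kcal/mol) | H3-OPT \|ΔMM/PBSA\| (kcal/mol) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2ghw | –29.20 | –33.36 | 2.7 | –14.70 | –21.36 | 3.0 | 8.63 | 2.42 | 23.13 | 14.42 |
| 2yc1 | –38.85 | –37.73 | 2.3 | –43.80 | –48.72 | 1.5 | 6.80 | 18.67 | 1.85 | 7.68 |
| 3l95 | –29.59 | –53.35 | 2.5 | –47.22 | –68.86 | 2.5 | 23.60 | 11.44 | 5.97 | 4.07 |
| 3u30 | –37.31 | –42.41 | 2.6 | –44.94 | –50.07 | 2.5 | 9.64 | 2.18 | 2.01 | 5.48 |
| 4cni | –36.96 | –42.93 | 1.0 | –31.92 | –40.39 | 1.3 | 8.54 | 7.44 | 3.50 | 4.89 |
| 4nbz | –36.59 | –43.79 | 1.9 | –59.61 | –54.23 | 0.6 | 10.17 | 3.30 | 12.85 | 13.74 |
| 4xnq | –13.55 | –17.47 | 2.7 | –31.51 | –30.94 | 0.5 | 15.40 | 12.32 | 2.57 | 1.15 |
| 4ydl | –52.51 | –74.25 | 4.8 | –49.17 | –73.57 | 3.6 | 6.82 | 6.53 | 10.15 | 7.21 |
| 5e5m | –59.72 | –71.15 | 3.0 | –41.29 | –53.70 | 7.3 | 0.50 | 5.79 | 18.93 | 11.66 |
| 5f7y | –61.76 | –69.46 | 2.7 | –60.33 | –69.43 | 1.4 | 3.95 | 6.38 | 5.38 | 6.41 |
| 6kyz | –12.66 | –20.32 | 4.0 | –9.36 | –17.13 | 3.7 | 17.63 | 17.21 | 20.93 | 20.40 |
| 6o9h | –39.53 | –43.45 | 2.8 | –52.27 | –57.51 | 0.6 | 10.45 | 13.31 | 2.29 | 0.74 |
| 6pyd | –45.87 | –58.71 | 1.0 | –35.75 | –45.28 | 1.1 | 6.29 | 13.50 | 3.83 | 0.06 |
| 6u9s | –36.54 | –48.66 | 1.0 | –39.79 | –44.80 | 1.3 | 14.35 | 10.42 | 11.11 | 14.28 |
| Average | / | / | 2.6 | / | / | 2.4 | 10.20 | 9.35 | 8.89 | 8.01 |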
