| H3-OPT architecture

a, Schematic of dataset preparation. Structures were screened from the SAbDab database based on resolution and sequence identity. Clustering of the filtered, high-resolution structures yielded three datasets for training (n = 1021), validation (n = 134), and testing (n = 131). b, The H3-OPT workflow comprises two modules. The template module determines whether to use the PSPM, while the PSPM module optimizes the AF2 input structures. c, The template module retains AF2-predicted loops when the confidence score is greater than 0.8 and grafts CDR-H3 loops onto AF2 models for structures with an available template. d, In the PSPM, the network extracts residue-level information and pairwise residue representations from the AF2-predicted models; these are subsequently updated using weight-sharing blocks and concatenated with sequence representations from ESM2. The resulting representations are used to predict the 3D coordinates of the H3 loops.
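The routing described in panels b and c can be illustrated with a minimal sketch. The 0.8 cutoff comes from the caption; the order in which the checks are applied, the 0 to 1 confidence scale, and the names `H3Input`, `template_id`, and `route_h3` are assumptions made for illustration only and are not taken from the H3-OPT code.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_CUTOFF = 0.8  # threshold stated in panel c


@dataclass
class H3Input:
    """Hypothetical container for one target (names are illustrative)."""
    af2_confidence: float        # AF2 confidence score for the CDR-H3 loop (assumed 0-1 scale)
    template_id: Optional[str]   # identifier of an available CDR-H3 template, or None


def route_h3(inp: H3Input) -> str:
    """Return which branch handles the loop.

    Assumed order: confidence check first, then template lookup,
    otherwise fall through to the PSPM.
    """
    if inp.af2_confidence > CONFIDENCE_CUTOFF:
        return "keep_af2"        # retain the AF2-predicted loop
    if inp.template_id is not None:
        return "graft_template"  # graft the template loop onto the AF2 model
    return "pspm"                # optimize the AF2 loop with the PSPM
```

For example, `route_h3(H3Input(0.65, None))` returns `"pspm"`, because the loop is neither high-confidence nor covered by a template.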

| Template module and ablation studies

a, Side-by-side comparison of Cα-RMSDs of AF2 and IgFold for Sub1 (n = 52); the color scale of the data points reflects CDR3 length. AF2 outperformed IgFold for targets left of the dashed diagonal; IgFold outperformed AF2 for targets right of the dashed diagonal. b, Correlation between AF2 confidence score and amino acid sequence length of CDR-H3 loops. Data point color indicates the Cα-RMSD value for that target. The correlation coefficient between confidence score and CDR-H3 loop length is −0.5921. c, Accuracy of H3-OPT in the three subgroups of the test set. ΔCα-RMSDs were calculated by subtracting the RMSD of AF2 from that of H3-OPT. AF2 had higher accuracy for targets above the dashed line; H3-OPT had higher accuracy for targets below the dashed line. d, Differences in H3-OPT accuracy without the template module. In this ablation, only the PSPM is used. e, Differences in H3-OPT accuracy without the CBM. In this ablation, the input loop is optimized by the TGM and PSPM. There are thirty targets in our database with identical CDR-H3 templates. f, Differences in H3-OPT accuracy without the TGM. In this ablation, the input loop is optimized by the CBM and PSPM.
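The ΔCα-RMSD in panel c can be sketched as follows. The sign convention (H3-OPT minus AF2) is the one stated in the caption; the function names are hypothetical, and the RMSD helper assumes the predicted and native Cα coordinates are already superposed and listed in the same residue order, which the caption does not specify.

```python
import math


def ca_rmsd(pred, native):
    """Cα-RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superposed; no alignment is done here.
    """
    if len(pred) != len(native) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_sum = sum(
        (p - q) ** 2
        for atom_p, atom_q in zip(pred, native)
        for p, q in zip(atom_p, atom_q)
    )
    return math.sqrt(sq_sum / len(pred))


def delta_ca_rmsd(h3opt_rmsd, af2_rmsd):
    """ΔCα-RMSD = RMSD(H3-OPT) - RMSD(AF2); negative values favor H3-OPT."""
    return h3opt_rmsd - af2_rmsd


# Toy coordinates, not data from the paper.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
h3opt_model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
af2_model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
delta = delta_ca_rmsd(ca_rmsd(h3opt_model, native), ca_rmsd(af2_model, native))
```

With these toy coordinates the H3-OPT model lies closer to the native loop, so `delta` is negative and the target would fall below the dashed line.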

| PSPM module

a, Side-by-side comparisons of Cα-RMSDs for AF2 versus IgFold and IgFold versus H3-OPT in the Sub2 (n = 46) and Sub3 (n = 33) test sets, respectively. b, Comparison of prediction accuracy between AF2 and H3-OPT for Sub2 and Sub3 targets. RMSDs, TM-scores, and GDT-scores were used to quantitatively assess the similarity between predicted and experimental structures. c, Comparison of prediction accuracy between AF2 and H3-OPT using six metrics (RMSD, RMSDbackbone, RMSDsidechain, TM-score, GDT-TS score, and GDT-HA score). Radar plots show the mean value of each metric for each method on Sub2 and Sub3 targets.

Performance of H3-OPT with different PLMs

| Accuracy of CDR-H3 loop prediction by H3-OPT

a, Performance of H3-OPT on the test set (nmAbs = 119, nNbs = 12) relative to other methods. The RMSD of H3-OPT was significantly lower than that of the other existing methods (p < 0.001). b, Performance of H3-OPT in structural predictions for the three subgroups of the test set (n = 52, 46, and 33). c, H3-OPT structural predictions for three anti-VEGF nanobodies (PDB IDs: 8IIU, 8IJZ, 8IJS).

| Analysis of surface patches

a, Analysis of surface amino acids for predicted H3 loops. The y-axis represents the average number of surface residues for H3 loops (n = 131). b, Histogram of surface patches with different properties for H3 loops predicted by H3-OPT or AF2 and for experimentally solved H3 loops. Error bars show standard deviations. c, Solvent-accessible surface area (SASA) analysis of predicted H3 loops. Values represent the difference in SASA between H3 structures predicted by AF2 or H3-OPT and the experimentally determined H3 structures. d, Comparison of the charged surface patches between H3-OPT and AF2 for target PDB ID: 5U3P. The surface maps compare the surface electrostatic potential of the CDR-H3 loop predicted by H3-OPT or AF2 with that of the native structure. Darker shading indicates a greater difference in electrostatic potential.

Comparison of binding affinities obtained from MD simulations using AF2 and H3-OPT

| Accuracy of H3-OPT predictions of antibody-antigen interactions

a, Performance of H3-OPT in binding site prediction. Comparison of prediction accuracy between H3-OPT and AF2 for antibody-antigen binding sites (n = 27). The box represents the interquartile range (IQR); the horizontal line in the center of the box shows the median. b, Comparison of the mean squared errors of residue pairs between H3-OPT and AF2 under different distance thresholds. The x-axis represents the experimentally determined distance between pairs of contacting residues at the binding site in the native structure. The y-axis shows the mean squared errors of H3-OPT and AF2. c, Heatmaps of the frequency of pairwise residue-residue contacts across antibody-antigen interfaces. This analysis compares the contact frequency of H3 loops predicted by AF2 or H3-OPT with that of the native structure. Darker shading indicates a greater difference in contact frequency. d, The predicted H3 loops of two targets interacting with antigens (PDB: 2YC1, 6O9H). The epitopes are highlighted in red and antibody chains are shown in green. H3-OPT could identify the epitopes of different antigens that form the complementary binding interface(s) for the CDR-H3 of antibodies.
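The per-threshold error in panel b can be sketched as below. This is only one plausible reading of the caption: it assumes each threshold selects all residue pairs whose native distance is at or below that threshold (cumulative rather than binned), and that the error is taken between predicted and native pair distances. The function name and the example numbers are hypothetical.

```python
def mse_by_threshold(native_dists, pred_dists, thresholds):
    """Mean squared error of residue-pair distances under each distance threshold.

    native_dists / pred_dists: paired distances (same order, same units).
    Returns {threshold: MSE or None if no pair falls under the threshold}.
    """
    if len(native_dists) != len(pred_dists):
        raise ValueError("distance lists must have equal length")
    result = {}
    for t in thresholds:
        # Select pairs by their native distance (assumed cumulative cutoff).
        sq_errors = [
            (p - n) ** 2
            for n, p in zip(native_dists, pred_dists)
            if n <= t
        ]
        result[t] = sum(sq_errors) / len(sq_errors) if sq_errors else None
    return result


# Toy distances, not data from the paper.
native = [3.0, 5.0, 9.0]
predicted = [4.0, 5.0, 7.0]
mse = mse_by_threshold(native, predicted, [2.0, 4.0, 6.0, 10.0])
```

Running the same function once with AF2 distances and once with H3-OPT distances would give the two curves compared in the panel, under the assumptions stated above.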

Input features to the model. Nres is the number of residues (ref. 29).

Hyperparameters for H3-OPT models

Average Cα-RMSDs of our test set under different confidence cutoffs

| Solvent-accessible surface area analysis of predicted H3 loops

The values represent the difference in SASA between H3 structures predicted by AF2 or H3-OPT and experimentally determined structures. Positive values indicate that the predicted structures have more exposed surface area compared to the native structures; negative values indicate less exposed surface area.

| Comparison of accuracy between AF2, H3-OPT, and tFold-Ab methods using the CAMEO 2022 benchmark dataset

The x-axis represents different targets; y-axis represents Cα-RMSD values.