Structural diversity of 557 previously unreported ligands co-folded with SARS-CoV-2 Mac1 (NSP3 macrodomain 1).

a) Global superposition of all 557 ligands within the Mac1 binding pocket, highlighting residues involved in key interactions. b) The affinity distribution of 202 molecules determined by HTRF (dose-response curves at 8 concentration points). c) Principal Component Analysis on pairwise MCS% and TC, illustrating the chemical diversity of the new molecules. d) The highest MCS% vs. ECFP4 TC values in the training set for each Mac1 compound. At right are examples of the low-similarity, scaffold-hop, and high-similarity distributions, respectively.

Co-folding outperforms DOCK3.7 in predicting ligand poses for > 500 ligands in the Mac1 binding site.

a) Histogram distributions of ligand heavy-atom RMSD (L-RMSD) between predicted and crystallographic poses obtained with AF3, Chai-1, Boltz-2 and DOCK3.7, with the success rate defined as L-RMSD < 2 Å. b) Alternate conformations of the Mac1 binding pocket (twisted backbone and open state) and the ability of AF3 to capture these conformational changes (residues in green are from PDB ID: 5SQW, and residues in pink have altered conformations). c) AF3 pose recovery (L-RMSD) compared with ECFP4 TC and MCS%, highlighting accurate poses for test molecules with poor similarity to the closest molecule in the training set.
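The success-rate definition in panel a can be illustrated with a minimal sketch. It is not the pipeline used for the figure: it assumes predicted and crystallographic heavy atoms are already matched one-to-one in the same receptor frame, it ignores symmetry-equivalent atom mappings, and the function names and coordinates are hypothetical.

```python
import math

def ligand_rmsd(pred, ref):
    """Heavy-atom RMSD (in Å) between two equally ordered coordinate lists."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum squared deviations over every atom and every Cartesian component.
    sq = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq / len(pred))

def success_rate(rmsds, cutoff=2.0):
    """Fraction of poses whose RMSD is strictly below the cutoff."""
    return sum(r < cutoff for r in rmsds) / len(rmsds)

# Toy two-atom ligand: every predicted atom is displaced by 1 Å along y.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pred = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]
rmsd = ligand_rmsd(pred, ref)

# Hypothetical L-RMSD values for four ligands; two fall below 2 Å.
rate = success_rate([rmsd, 3.2, 0.8, 2.0])
```

With a strict "<" comparison, a pose at exactly 2.0 Å counts as a failure, matching the "RMSD < 2 Å" wording of the legend.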

Complementarity between co-folding confidence metrics and ligand-pose accuracy.

a) Receiver Operating Characteristic (ROC) curves for affinity/energy and confidence scores in AlphaFold3, Chai-1 and Boltz-2, with success defined as L-RMSD < 2 Å. b) Chemical structures of a compound that was challenging to model with all docking and co-folding methods (mac-x4091) and one that was well modelled with all methods (mac-x3927). c) Overlay of co-folded (AlphaFold3, Chai-1, Boltz-2) and docked ligand poses on the ground truth, for hits with similar IC50 but different structures. Hydrogen bonds with co-folded and docked ligands are shown in red, and those with crystal ligands are shown in yellow.

Co-folding scores and pose accuracy correlate with Mac1 binding affinity for 202 newly synthesized Mac1 ligands.

a) AF3 ligand pLDDT (L-pLDDT) and Boltz-2 predicted pIC50 correlate with experimental pIC50 more strongly than DOCK energy scores. b) Lower co-folded ligand RMSDs are associated with higher affinity, while docked pose RMSDs are not. Pearson correlation coefficients and Mean Absolute Errors (MAE) are shown. The baseline MAE was 0.68 pIC50 units (or 0.93 kcal/mol), measured against a fixed average pIC50 (pIC50 = 4.95). Friedman tests with Conover post-hoc comparisons and Bonferroni corrections are used to compare MAE between methods.
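The equivalence quoted between 0.68 log units and 0.93 kcal/mol is consistent with the standard relation ΔG = RT ln(10) × ΔpIC50. The sketch below reproduces that arithmetic and the idea of a fixed-average baseline. The temperature (298.15 K) is an assumption, since the legend does not state one, and the function names and toy values are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)

def log_units_to_kcal(delta_log_units, temperature_k=298.15):
    """Convert a difference in pIC50 (log10 units) to kcal/mol.

    Assumes the free-energy difference scales as RT*ln(10) per log unit;
    the default temperature is an assumption, not taken from the source.
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(10) * delta_log_units

def baseline_mae(measured):
    """MAE obtained when every value is predicted as the mean of the data."""
    mean = sum(measured) / len(measured)
    return sum(abs(v - mean) for v in measured) / len(measured)

baseline_kcal = log_units_to_kcal(0.68)  # about 0.93 kcal/mol

# Toy pIC50 values, only to show how the baseline is formed.
toy_baseline = baseline_mae([4.0, 5.0, 6.0])
```

A method's MAE is informative only when it falls below this baseline, because the baseline needs no model at all.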

Separation of known active hits from docked false positives among initially screened docked molecules.

The leftmost panel shows affinity scores from Boltz-2 (Boltz-2 pIC50), the middle panel AF3 L-pLDDT, and the rightmost panel the absolute value of DOCK3.7 energies (kcal/mol). Docked hit lists from a) σ2, b) Dopamine D4, c) AmpC are grouped into known hits and non-hits, and score distributions are shown as violin plots. The Kolmogorov-Smirnov test and its statistic (KS stat) are used to indicate meaningful separation of the two distributions. Asterisks indicate a KS-test p-value < 0.001. ROC curves show the ability of each scoring method to distinguish known actives within the docked list.
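The KS statistic is the largest vertical gap between the empirical cumulative distributions of the two groups. The following dependency-free sketch computes only that statistic, not the p-value; the scores are invented for illustration. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past all values <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical scores for known hits and non-hits from one docked list.
known_hits = [7.1, 6.8, 6.5, 6.9]
non_hits = [5.0, 5.4, 6.6, 5.2]
ks = ks_statistic(known_hits, non_hits)
```

A value near 1 means the two score distributions barely overlap, while a value near 0 means the score does not separate hits from non-hits.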

Distribution of pairwise similarity for Mac1 ligands.

a) ECFP4 Tanimoto Coefficients (TC), with lines indicating values for a “scaffold hop” or random distributions. b) Maximum Common Substructure (MCS) similarity values.
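As a reminder of what the TC measures, here is a minimal sketch of the Tanimoto coefficient on fingerprints represented as sets of "on" bit indices. Generating real ECFP4 fingerprints requires a cheminformatics toolkit and is not shown; the ligand names and bit sets below are invented.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Hypothetical fingerprints given as sets of on-bit indices.
fingerprints = {
    "lig_a": {1, 2, 3, 4},
    "lig_b": {3, 4, 5, 6},
    "lig_c": {7, 8},
}

# All unique ligand pairs, which is what a pairwise distribution is built from.
pairwise = {
    (m, n): tanimoto(fingerprints[m], fingerprints[n])
    for m, n in combinations(sorted(fingerprints), 2)
}
```

For N ligands this yields N(N-1)/2 values, so the 557 Mac1 ligands give 154,846 unique pairs.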

Benchmarked datasets are clustered based on TC and MCS% to predict the number of scaffold series.

Principal Component Analysis (PCA) plots compare molecules pairwise in the docked hit lists of a) σ2, b) D4, c) AmpC using two similarity metrics: TC (first column) and MCS% (second column). Red points indicate docked false positives in the hit list, and blue points indicate known actives. Mac1 clustering plots are shown in Fig. 1; an asterisk indicates that only known ligands are available for the Mac1 set. The number of scaffolds, based on cluster heads, is determined by either TC > 0.35 or MCS > 35%. The table summarizes the clustering analysis of the datasets used.
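The legend does not name the clustering algorithm. One simple way to obtain cluster heads from a similarity threshold is leader-style clustering, sketched below purely as an illustration. The algorithm choice, the function names and the toy fingerprints are all assumptions; only the TC > 0.35 threshold comes from the legend.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient on fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def leader_cluster(fps, threshold=0.35):
    """Assign each molecule to the first cluster head it resembles.

    A molecule joins the first existing head with similarity > threshold;
    otherwise it becomes a new head. The result depends on input order.
    Returns a dict mapping head index -> list of member indices.
    """
    clusters = {}
    for idx, fp in enumerate(fps):
        for head in clusters:
            if tanimoto(fp, fps[head]) > threshold:
                clusters[head].append(idx)
                break
        else:
            clusters[idx] = [idx]  # no similar head found: start a new cluster
    return clusters

# Hypothetical fingerprints forming two obvious scaffold series.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}]
clusters = leader_cluster(fps)
n_scaffolds = len(clusters)
```

The same loop works for MCS% by swapping the similarity function and using a threshold of 35.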

Detailed predictions of alternate residue conformations and interactions in co-folded poses.

a) Mis-prediction of alternate conformations of the binding pocket, including the twisted backbone (Phe156 RMSD > 1.5 Å) and the open structure (Gly130 RMSD, Ala129 RMSD > 3 Å), did not produce false-positive or false-negative pose predictions. The density distribution plot shows the RMSD between residues in PDB ID: 5SQW and those in the 557 complexes obtained from crystallography; this RMSD was used as the filter to define alternate conformations. b) Hydrogen bonds between ligands and residues within 5 Å were counted for crystal and for AF3 co-folded structures. Interactions with 4 hotspot residues (D22, I23, F156, D157) were compared, and confusion matrices show absolute counts of true positives, true negatives, false positives and false negatives. Matthews Correlation Coefficients are calculated for each residue.
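For reference, the Matthews Correlation Coefficient follows directly from the four confusion-matrix counts. The sketch below treats "a hydrogen bond to this residue is present in the crystal structure" as the true label and "present in the co-folded structure" as the prediction. The boolean lists are invented, and returning 0 for an empty margin is a common convention rather than something stated in the legend.

```python
import math

def confusion_counts(observed, predicted):
    """Return (tp, tn, fp, fn) for two equally long lists of booleans."""
    tp = sum(o and p for o, p in zip(observed, predicted))
    tn = sum((not o) and (not p) for o, p in zip(observed, predicted))
    fp = sum((not o) and p for o, p in zip(observed, predicted))
    fn = sum(o and (not p) for o, p in zip(observed, predicted))
    return tp, tn, fp, fn

def matthews_cc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical H-bond presence for one hotspot residue across five ligands.
crystal = [True, True, False, False, True]
cofolded = [True, False, False, True, True]
counts = confusion_counts(crystal, cofolded)
mcc = matthews_cc(*counts)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when a residue forms hydrogen bonds in only a small fraction of complexes.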

Correlations between Mac1 pose recovery and co-folding/docking scoring metrics.

a) Docked pose RMSD is compared with DOCK scores (in kcal/mol), b) Chai-1 RMSD is compared with interface predicted TM-score (ipTM), and c) Boltz-2 pose RMSD is compared with predicted Boltz-2 pIC50 affinity score. AF3 RMSD is compared with three possible scoring metrics from AF3: d) Ligand-specific pLDDT (L-pLDDT), e) Ligand-specific PAE (L-PAE), f) minimum PAE (mPAE).

Pose accuracy and co-folding/docking scores are compared for different similarity bins.

a) Ligand RMSDs of co-folded and docked poses relative to the ground truth are compared across Tanimoto Coefficient (TC) bins (< 0.2, 0.2-0.4, 0.4-0.6, > 0.6). b) Scores for each method (AF3 L-pLDDT, Chai-1 ipTM, Boltz-2 pIC50, DOCK3.7 energies) are compared across TC bins. c) Ligand RMSD across Maximum Common Substructure (MCS%) bins (< 40, 40-60, 60-80, > 80). d) Scores for each method are compared across MCS% bins. TC and MCS% of new molecules to the training set were calculated for each model; for DOCK3.7 (the only non-co-folding method), the similarity was calculated against the AF3 training set.

Correlation between AF3 pose-prediction error and errors from other co-folding and docking methods for 557 Mac1 compounds.

a) AF3-ground truth (GT) L-RMSD vs. AF3-Chai-1 L-RMSD. b) AF3-ground truth L-RMSD vs. AF3-Boltz-2 L-RMSD. c) AF3-ground truth L-RMSD vs. AF3-DOCK L-RMSD. Ligands with affinity data (n = 202) are marked in green, those with a high center-of-mass (COM) distance in red, and molecules with no affinity data in grey. Pearson correlation coefficients and regression lines are shown.

Pose recovery by methods of co-folding and docking against the ground-truth pose.

a) AF3 vs. Chai-1, b) AF3 vs. DOCK3.7, c) Boltz-2 vs. Chai-1, d) Chai-1 vs. DOCK3.7, e) Boltz-2 vs. AF3, f) Boltz-2 vs. DOCK3.7. The molecules indicated (mac-x4091, mac-x3927) are the example ligands highlighted in Fig. 3, and all L-RMSD values quoted here are measured against the ground-truth pose.

Actives from the benchmarked datasets are compared with resolved structures used to train AF3, Chai-1 and Boltz-2.

Actives from AmpC (n = 247), D4 (n = 205), σ2 (n = 201) and Mac1 (n = 557) were compared with the corresponding training sets, to indicate how similar the actives were to the structures used to train the co-folding models, based on a) Tanimoto coefficients and b) Maximum Common Substructures (%). AF3 and Chai-1 have similar training cutoff dates (indicated as Mac1_af3), whereas Boltz-2 was trained on more recent PDB entries and is labelled as Mac1_boltz. The exact PDB IDs used for each model system are listed in Supplementary Table 6.

Discriminative power of co-folding and docking scores, assessed by correlation against experimentally measured pKi across different targets.

a) σ2 (n = 201 actives), b) D4 (n = 205 actives), c) AmpC (n = 247 actives). The leftmost panel shows Boltz-2 pIC50 affinities, the middle panel AF3 L-pLDDT, and the rightmost panel DOCK scores. All scores are compared against measured pKi values. Mean Absolute Errors and Pearson correlation coefficients are calculated. Blue lines show the regression line after correction, black dotted lines show the line before correction, and green horizontal lines show the baseline at the average measured pKi value. Friedman tests with Conover post-hoc comparisons and Bonferroni corrections compare MAE between methods for each target system.
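The legend does not specify how the correction is performed. One plausible reading is an ordinary least-squares rescaling of each raw score onto the pKi scale before the MAE is computed. The sketch below illustrates that reading only; the fitting procedure, function names and numbers are assumptions rather than the authors' method.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mae(predicted, measured):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)

# Hypothetical raw scores and measured pKi values for four actives.
scores = [1.0, 2.0, 3.0, 4.0]
measured_pki = [5.0, 5.5, 6.0, 6.5]

slope, intercept = fit_line(scores, measured_pki)
corrected = [slope * s + intercept for s in scores]

mae_raw = mae(scores, measured_pki)            # before correction
mae_corrected = mae(corrected, measured_pki)   # after linear correction
mean_pki = sum(measured_pki) / len(measured_pki)
mae_baseline = mae([mean_pki] * len(measured_pki), measured_pki)
```

Comparing the corrected MAE with the mean-value baseline shows whether a score carries ranking information beyond simply predicting the average pKi.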

Different methods of assigning pKi values to docked false positives in the σ2, D4 and AmpC docked lists.

Mean absolute errors between the measured pKi and Boltz-2 pIC50 affinity scores when a) all non-hits are assigned a pKi = 2 × pKi,threshold; b) non-hits are randomly assigned a pKi value between pKi,threshold and 1. The grey dotted line shows the fit before linear correction, the red line shows the regression after linear correction, and the blue line shows the baseline obtained when all values are predicted at the average measured pKi. Baseline MAE is quoted only for b), since the error could be misinterpreted when all non-binders’ pKi values are fixed.

Hit rate curves for co-folding and docking scores for the three experimental benchmark datasets.

Hit rate curves are plotted over a rolling window (window = 100 for AmpC and σ2, 50 for D4) after ranking with pProp from the docked hit lists, for three different targets: a) σ2 (201 actives, 305 non-binders), b) Dopamine D4 (205 actives, 336 non-binders), c) AmpC β-lactamase (247 actives, 1,046 non-binders).
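A rolling-window hit rate can be sketched as follows: molecules are sorted by a score, and the fraction of actives is computed within each consecutive window of fixed size. This is an illustrative sketch only. It assumes that higher scores rank first, and the scores, labels and function name are hypothetical.

```python
def rolling_hit_rate(scores, is_active, window):
    """Fraction of actives in each sliding window of the score-ranked list."""
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and the number of molecules")
    # Rank molecules from best to worst score (assumption: higher is better).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [1 if is_active[i] else 0 for i in order]
    running = sum(labels[:window])
    rates = [running / window]
    for k in range(window, len(labels)):
        # Slide by one: add the entering molecule, drop the leaving one.
        running += labels[k] - labels[k - window]
        rates.append(running / window)
    return rates

# Toy ranked list of six molecules with a window of three.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
is_active = [True, True, False, True, False, False]
rates = rolling_hit_rate(scores, is_active, window=3)
```

A curve that starts high and decays indicates that the score concentrates actives near the top of the ranked list.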

Correlation between DOCK scores and some co-folding scores (AF3 L-pLDDT, Boltz-2 pIC50).

Blue points indicate known actives and red points are non-binders for each target system: a) σ2, b) D4, c) AmpC and d) Mac1. Note that the Mac1 dataset comprises only actives.