A schematic of the AF2RAVE-Glide workflow: (i) decoy structures generated by reduced MSA AF2, (ii) regular space clustering and unbiased MD simulations starting from the cluster centers, (iii) a State Predictive Information Bottleneck model (SPIB, a RAVE variant) to learn reaction coordinates from the unbiased MD, (iv) enhanced sampling runs to calculate the free energy landscape, (v) identification of holo-like structures among the decoys in metastable states based on Boltzmann rank, followed by Glide or induced-fit docking (IFD) on the holo-like structures for ligands targeting metastable states. The decoy structure set and the learnt SPIB coordinates are transferable to homologous systems.

A) Overlay of the NMR (nuclear magnetic resonance) structures of Abl1 kinase, comparing the activation loop (A-loop) in the active DFG-in state (red, PDB: 6XR6) with that in the classical DFG-out state (blue, PDB: 6XRG). Type I inhibitors target the active DFG-in state, in which the DFG motif adopts the DFG-in configuration and the A-loop is “extended”, while type II inhibitors target the classical DFG-out state, in which the DFG motif adopts the DFG-out configuration and the A-loop is “folded”. The distance between the CB atoms of residue N98 (grey bead) and residue R162 (red/blue bead) in Abl1 kinase serves as an order parameter for the location of the A-loop. The dashed black box highlights the different configurations of the DFG motif in these two states. B) The Dunbrack definition of the DFG motif configuration is employed here. The Dunbrack space is spanned by two order parameters: D1 = dist(F158-CZ, M66-CA) and D2 = dist(F158-CZ, K47-CA). C, D) The docking poses with the lowest ligand RMSD for 4 known kinase inhibitors targeting the AF2 structure of Abl1 or DDR1 kinase, generated by Glide XP. Co-crystallized structures are shown as light-cyan cartoons (protein), green sticks (ligand) and magenta sticks (DFG motif). Docking poses are shown as light-gray cartoons (protein), gray sticks (ligand) and blue sticks (DFG motif). In contrast to type I inhibitors, type II inhibitors fail to dock into the AF2 structures of protein kinases.
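The order parameters named in this caption are plain interatomic distances, so they can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' analysis code: the dictionary layout, the function names and the coordinates are assumptions made up for the example, and only the atom pairs come from the caption.

```python
import math

def dunbrack_coordinates(atoms):
    """Return (D1, D2) for one structure.

    `atoms` is a hypothetical mapping from (residue, atom name) to xyz
    coordinates in angstrom; the atom pairs follow the caption.
    """
    d1 = math.dist(atoms[("F158", "CZ")], atoms[("M66", "CA")])
    d2 = math.dist(atoms[("F158", "CZ")], atoms[("K47", "CA")])
    return d1, d2

def a_loop_distance(atoms):
    """CB-CB distance between N98 and R162, used to locate the A-loop."""
    return math.dist(atoms[("N98", "CB")], atoms[("R162", "CB")])

# Made-up coordinates, chosen only so the distances are easy to check by hand.
example = {
    ("F158", "CZ"): (0.0, 0.0, 0.0),
    ("M66", "CA"): (3.0, 4.0, 0.0),
    ("K47", "CA"): (0.0, 6.0, 8.0),
    ("N98", "CB"): (1.0, 1.0, 1.0),
    ("R162", "CB"): (1.0, 1.0, 13.0),
}
d1, d2 = dunbrack_coordinates(example)
loop = a_loop_distance(example)
```

In practice the coordinates would be read from a structure file or trajectory frame; the residue numbers above are the Abl1 numbering given in the caption and would differ for other kinases.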

A) Upper panel: distribution of the A-loop location for the reduced MSA AF2 structures of Abl1 kinase. 4 out of 1198 structures are A-loop folded. Lower panel: k-means clustering of the 4 A-loop folded Abl1 structures in Dunbrack space, with the number of clusters (n_cluster) set to 2. The structures in the cluster closest to the classical DFG-out state (red circles) fail to dock with type II inhibitors using induced-fit docking (IFD). B) Upper panel: distribution of the A-loop location for the reduced MSA AF2 structures of DDR1 kinase. 124 out of 1147 structures are A-loop folded. Lower panel: k-means clustering of the 124 A-loop folded DDR1 structures in Dunbrack space, with n_cluster set to 5. Among the 15 structures in the cluster closest to the classical DFG-out state (red circles), one structure (the IFD winner, highlighted by a red circle filled with green) docks successfully with type II inhibitors, reaching a ligand RMSD < 2 Å with IFD or with IFD-trim, an extended-sampling version of IFD. C) The docking poses with the lowest ligand RMSD for 2 type II kinase inhibitors targeting the DDR1 kinase IFD winner structure, generated by IFD or IFD-trim. The color code is the same as in Fig. 2.

A, B) The unbiased MD trajectories of DDR1 are projected onto the learnt SPIB latent space. In plot A, the colors of the sample points represent the A-loop location, while in plot B they depict the Dunbrack DFG state. The first SPIB coordinate, σ1, correlates with the A-loop location, and the second SPIB coordinate, σ2, correlates with the configuration of the DFG motif. C) The reduced MSA AF2 structures of DDR1 are projected onto the latent space. Sample points are color-coded based on the A-loop location. Light green stars highlight the 15 classical DFG-out structures selected based on prior information in Fig. 3. D) Free energy profile in the A-loop folded region of the latent space, calculated from umbrella sampling simulations. The 15 classical DFG-out structures from reduced MSA AF2 are shown as red crosses and circles (circles: structures with free energy less than 1 kJ/mol). The IFD winner structure is emphasized using a red circle filled with red. The embedded table shows the lowest ligand RMSD among the IFD poses of each rMSA AF2 structure with ponatinib. The IFD winner is among the 2 structures selected by AF2RAVE (PMF < 1 kJ/mol).

A) The reduced MSA AF2 structures of Abl1 are projected onto the latent space. Sample points are color-coded based on the A-loop location. Light green stars highlight the 30 AF2-template Abl1 structures modelled from the 15 DDR1 classical DFG-out structures. B) Upper panel: the distribution of ligand RMSD for the IFD poses of Abl1 structures and two type II ligands. Lower panel: IFD poses with the lowest ligand RMSD for Abl1 AF2-template structures and two type II ligands. The color code is the same as in Fig. 2. C) Free energy profile in the latent space, calculated from unbiased MD simulations. The 30 Abl1 classical DFG-out structures from AF2-template are shown as red crosses and circles (circles: structures with free energy less than 1 kJ/mol). The IFD winner structures are emphasized using red circles filled with red. D) The table shows the lowest ligand RMSD among the IFD poses of the AF2-template structures with two type II inhibitors. All four IFD winners are among the 8 structures selected by AF2RAVE (PMF < 1 kJ/mol).
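The selection by free energy described in this caption can be illustrated with a simple histogram estimate, F = -k_B T ln(p / p_max), followed by ranking the candidate structures by the free energy of the bin they fall into. This is a sketch under stated assumptions: the temperature of 300 K, the discrete bin labels and the helper names are hypothetical, and the estimator used in the paper may differ. Only the 1 kJ/mol cutoff comes from the caption.

```python
import math
from collections import Counter

GAS_CONSTANT_KJ = 0.008314462618  # kJ/(mol K)

def free_energy_by_bin(bin_labels, temperature=300.0):
    """Histogram free energy per bin in kJ/mol, relative to the most populated bin."""
    counts = Counter(bin_labels)
    max_count = max(counts.values())
    kt = GAS_CONSTANT_KJ * temperature
    return {b: -kt * math.log(c / max_count) for b, c in counts.items()}

def select_candidates(candidate_bins, free_energy, cutoff=1.0):
    """Rank candidates by free energy and keep those below `cutoff` (kJ/mol).

    Candidates that fall into unsampled bins are dropped.
    """
    ranked = sorted(
        (free_energy[b], name)
        for name, b in candidate_bins.items()
        if b in free_energy
    )
    return [name for f, name in ranked if f < cutoff]

# Toy data: latent-space bins visited by MD frames (labels are made up).
frames = ["a"] * 100 + ["b"] * 80 + ["c"] * 10
profile = free_energy_by_bin(frames)

# Toy candidate structures and the bin each one projects into.
candidates = {"s1": "b", "s2": "c", "s3": "a"}
selected = select_candidates(candidates, profile)
```

With these toy counts, the candidates in the two well-populated bins pass the cutoff and the one in the sparsely sampled bin does not.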

Comparing the IFD performance of various structure generation methods for docking type II kinase inhibitors

The AF2 pLDDT rank is plotted against the CA RMSD from the AF2 structure (the one with the highest pLDDT) for each structure in the rMSA AF2 ensemble for Abl1, DDR1 or Src kinase. An RMSD cutoff of 7 Å (dashed black line) is applied to filter out unphysical structures with a large RMSD from the native structure. Each rMSA AF2 ensemble consists of 1280 structures, 640 each for MSA depths of 8:16 (red) and 16:32 (blue).

A) The rMSA AF2 ensemble for DDR1 is projected onto the Dunbrack space. Sample points are color-coded based on the CA RMSD from the AF2 structure with the highest pLDDT. Regular space cluster centers are marked by blue hexagons. For each DFG type (in, inter or out), the two cluster centers with the lowest CA RMSD are selected as AF2RAVE initial structures. B) To account for the underrepresented A-loop folded configurations, an extra regular space clustering is conducted only for the A-loop folded structures in the rMSA AF2 ensemble. The color code, notation and selection of initial structures are the same as in plot A. Combining the AF2RAVE initial structures from plots A and B gives 12 initial structures in total.
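Regular space clustering, as commonly defined, picks cluster centers so that no two centers are closer than a minimum distance. The sketch below shows that greedy rule on 2D points; the cutoff `d_min`, the function name and the toy points are assumptions for illustration, since the caption does not state the cutoff or implementation used.

```python
import math

def regular_space_centers(points, d_min):
    """Greedy regular space clustering.

    A point becomes a new center if it lies at least `d_min` away from
    every center chosen so far. Returns the indices of the centers.
    """
    centers = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[c]) >= d_min for c in centers):
            centers.append(i)
    return centers

# Toy 2D points standing in for structures projected onto two order parameters.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (0.0, 4.0)]
centers = regular_space_centers(points, d_min=2.0)
```

Because the rule only enforces a minimum spacing, sparsely populated regions still receive centers, which is why a separate pass over the A-loop folded subset can recover underrepresented configurations.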

A) Distributions from different umbrella sampling windows in the latent space. B) The distribution overlap graph for all the umbrella sampling windows. The mean of each distribution is shown as a blue dot. Each distribution’s 2D histogram is flattened into a 1D vector, and the cosine similarity between two distributions is indicated by the width and color of the edge connecting the respective dots. Windows from the A-loop folded region do not overlap well with windows from the A-loop extended region, while windows inside the A-loop folded region (the left part of the graph) are well connected and are used for the local PMF calculation in Figure 4D.
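The overlap measure described in this caption (flatten each window's 2D histogram and take the cosine similarity) can be sketched as follows. The grid range, the number of bins and the sample values are illustrative assumptions, not the settings used for the figure.

```python
import math

def histogram2d(samples, x_range, y_range, bins):
    """Count 2D samples on a bins x bins grid; samples outside the range are ignored."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for x, y in samples:
        if not (x_lo <= x < x_hi and y_lo <= y < y_hi):
            continue
        i = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        grid[i][j] += 1
    return grid

def cosine_similarity(hist_a, hist_b):
    """Cosine similarity between two 2D histograms after flattening to 1D."""
    a = [v for row in hist_a for v in row]
    b = [v for row in hist_b for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

# Toy samples from three hypothetical windows in a unit-square latent space.
window_a = [(0.1, 0.1), (0.2, 0.1), (0.6, 0.6)]
window_b = [(0.15, 0.2), (0.7, 0.7)]
window_c = [(0.9, 0.1)]
hists = [histogram2d(w, (0.0, 1.0), (0.0, 1.0), bins=2)
         for w in (window_a, window_b, window_c)]
sim_ab = cosine_similarity(hists[0], hists[1])
sim_ac = cosine_similarity(hists[0], hists[2])
```

All windows must share the same grid for the flattened vectors to be comparable. A similarity of zero, as for windows a and c here, means the two windows never visit a common bin.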

DDR1 PMF calculated with all the umbrella sampling windows. Hanson et al. 4 found the A-loop folded DFG-out state to be more stable than the A-loop folded DFG-in/inter state for DDR1; Vani et al. 1 reported that the A-loop extended DFG-out state is more stable than the A-loop extended DFG-in/inter state for DDR1. Although our umbrella sampling setup is not sufficient to sample the A-loop movement, the observed relative stability is consistent with the findings of Hanson et al. and Vani et al.

A) One representative frame with a broken αC helix from the Abl1 umbrella sampling trajectories. The backbone of the αC helix is shown as cyan sticks, while the DFG motif is shown as orange sticks. B, C) The distribution of the fraction of frames with a broken αC helix in each umbrella sampling window for Abl1 or DDR1.

Abl1 PMF calculated from umbrella sampling after discarding windows with a broken αC helix. The four holo-like structures (IFD winners) rank within the top six by PMF value.

PMF values and Boltzmann ranks of candidate structures fluctuate with the selection of the umbrella sampling windows and the simulation length of the umbrella sampling trajectories, as demonstrated for the DDR1 system.

The plot shows the number of gaps at each position of the multiple sequence alignment (MSA) generated by MMseqs2 (using ColabFold 5) for different kinases. The non-gap count describes the coverage of each position in the MSA. The presence of residue positions with gap counts higher than 40 per cent of the total sequence in DDR1 implies that it has fewer conserved regions than Abl1 and Src kinase. This characteristic of the DDR1 MSA enables the rMSA AF2 protocol to generate multiple conformations for DDR1, including the classical DFG-out conformation, by initializing it at various states. However, the highly conserved nature of Abl1 and Src makes it challenging for rMSA AF2 to initialize at a state that can lead to a classical DFG-out conformation. Therefore, we used the AlphaFold template protocol to overcome this initialization issue with rMSA AF2.
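Per-position gap counts of the kind plotted here can be computed directly from aligned sequences. The sketch below assumes the MSA is already loaded as equal-length strings with `-` marking gaps, and it reads the 40 per cent threshold as a fraction of the number of sequences in the alignment; both the toy alignment and that reading are assumptions.

```python
def gap_counts(msa):
    """Number of gap characters ('-') at each alignment column."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(seq[i] == "-" for seq in msa) for i in range(length)]

def poorly_covered_positions(msa, threshold=0.4):
    """Columns whose gap fraction exceeds `threshold` (assumed: fraction of sequences)."""
    counts = gap_counts(msa)
    return [i for i, c in enumerate(counts) if c / len(msa) > threshold]

# Toy alignment, made up for illustration.
toy_msa = ["MKV-LA", "MK--LA", "M-V-LA", "MKVQLA", "MKV-L-"]
gaps = gap_counts(toy_msa)
coverage = [len(toy_msa) - g for g in gaps]  # non-gap count per column
sparse = poorly_covered_positions(toy_msa)
```

Here only the fourth column (index 3) exceeds the threshold, since four of the five toy sequences have a gap there.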

A) The AF2-template (tAF2) structure of Src kinase is superimposed on its template structure (the classical DFG-out IFD winner from the DDR1 rMSA AF2 ensemble). The tAF2 structure of Src is shown as a light-orange cartoon (protein) and yellow sticks (DFG motif), while the DDR1 template is shown as a light-gray cartoon (protein) and blue sticks (DFG motif). B) The AF2-template structure of Src kinase is superimposed on the Src/imatinib co-crystallized structure (PDB 2OIQ). The crystal structure is shown as a light-cyan cartoon (protein), green sticks (ligand) and magenta sticks (DFG motif).

A) The distribution overlap graph for all the unbiased MD trajectories starting from the 15 classical DFG-out structures in the DDR1 rMSA AF2 ensemble. B) The distribution overlap graph for all the unbiased MD trajectories starting from the 30 Abl1 tAF2 structures in the classical DFG-out state. The color code is the same as in Figure S3.

Free energy profile for DDR1 in the latent space, calculated from unbiased MD simulations. The 15 DDR1 classical DFG-out structures in the rMSA AF2 ensemble are shown as red crosses and circles (circles: top 5 structures ranked by free energy value). The IFD winner structure is emphasized using a red circle filled with red.

The AF2 pLDDT rank is plotted against the CA RMSD from the AF2 structure for each structure in the AF2-cluster ensemble for Abl1, DDR1 or SrcK. An RMSD cutoff of 10 Å (dashed black line) is applied to filter out unphysical structures with a large RMSD from the native structure. After the RMSD filter, 197 out of 362 structures remain for Abl1, 134 out of 251 structures remain for DDR1, and 93 out of 355 structures remain for SrcK.

The projection of A) the rMSA AF2 ensemble or B) the AF2-cluster ensemble onto the AF2RAVE latent space for SrcK. The classical DFG-out SrcK structure generated from AF2-template in Fig. S9 is shown as a green star. The color code shows the A-loop location.

The projection of A-loop folded structures from the rMSA AF2 ensemble or the AF2-cluster ensemble on the AF2RAVE PMF for Abl1 or DDR1.

The distributions of ligand RMSDs for Glide XP docking poses of type I (upper panel) and type II (lower panel) inhibitors against DDR1. Results from cross-docking against 4 crystal holo structures, docking against the AF2 structure, and docking against the 15 classical DFG-out structures in the rMSA AF2 ensemble are shown in green, blue, and red, respectively.

Ligand RMSDs are plotted against the docking scores for the IFD poses of type II inhibitors (ponatinib and imatinib) against the AF2 structure (blue) or the classical DFG-out structures in the rMSA AF2 ensembles (red). A) IFD results for Abl1. B) IFD results for DDR1. The pose with the lowest ligand RMSD from each input structure is marked by a hexagon.
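The quantities plotted in these docking figures can be illustrated with a short sketch: a ligand RMSD computed over matched atoms, and the choice of the lowest-RMSD pose per input structure. It assumes the docked pose and the reference ligand are already in the same receptor frame with atoms in the same order, and it ignores ligand symmetry; the function names, scores and coordinates are made up. The 2 Å success threshold is the one quoted for the IFD winner elsewhere in these captions.

```python
import math

def ligand_rmsd(pose, reference):
    """RMSD over matched atoms, without re-superposing the ligand."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must have the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pose, reference))
    return math.sqrt(squared / len(pose))

def best_pose_per_structure(poses):
    """Map each structure id to the (score, rmsd) of its lowest-RMSD pose.

    `poses` is a list of (structure_id, docking_score, rmsd) tuples.
    """
    best = {}
    for structure_id, score, rmsd in poses:
        if structure_id not in best or rmsd < best[structure_id][1]:
            best[structure_id] = (score, rmsd)
    return best

# Toy coordinates: the pose is the reference shifted by 1 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rmsd = ligand_rmsd(pose, reference)

# Toy docking results: (structure id, docking score, ligand RMSD in angstrom).
results = [("s1", -9.5, 4.2), ("s1", -8.1, 1.6), ("s2", -10.2, 6.3)]
best = best_pose_per_structure(results)
winners = [sid for sid, (_, r) in best.items() if r < 2.0]
```

Note that the lowest-RMSD pose of structure s1 is not its best-scoring pose in this toy data, which is the kind of mismatch an RMSD-versus-score scatter plot makes visible.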

A) Comparison of the DFG motif of DDR1 in its co-crystallized structure with imatinib (PDB 4BKJ), its IFD winner structure and its AF2 structure. B, C) In the IFD winner structure, the Phe residue of the DFG motif must rotate to avoid steric clashes with imatinib. Proteins from the crystal structure are shown as cyan cartoons, while all other proteins are shown as grey cartoons. D) Ligand RMSDs are plotted against the docking scores for the IFD-trim poses of type II inhibitors (ponatinib and imatinib) against the IFD winner structure in the DDR1 rMSA AF2 ensemble. The pose with the lowest ligand RMSD is marked by a hexagon.

Ligand RMSDs are plotted against the docking scores for the IFD poses of type II inhibitors (ponatinib and imatinib) against the Abl1 tAF2 structures. The pose with the lowest ligand RMSD from each input structure is marked by a hexagon.

Ligand RMSDs are plotted against the docking scores for the IFD/IFD-trim poses of type II inhibitors (ponatinib and imatinib) against the SrcK tAF2 structure. The pose with the lowest ligand RMSD from each input structure is marked by a hexagon.

Ligand RMSDs are plotted against the DiffDock confidence scores for the DiffDock poses of type II inhibitors (ponatinib and imatinib) against the DDR1 AF2 structure (blue) or the classical DFG-out structures in the DDR1 rMSA AF2 ensemble (red). The pose with the lowest ligand RMSD from each input structure is marked by a hexagon.

The confidence score of the DiffDock pose aligns with the AF2RAVE PMF values. The DiffDock confidence score of the pose with the lowest ligand RMSD (marked in red/bold) from each classical DFG-out structure in the DDR1 rMSA AF2 ensemble is compared with the AF2RAVE PMF value of the corresponding structure (marked in red/bold).