Evolutionary divergence in the conformational landscapes of tyrosine vs serine/threonine kinases
Abstract
Inactive conformations of protein kinase catalytic domains where the DFG motif has a “DFG-out” orientation and the activation loop is folded present a druggable binding pocket that is targeted by FDA-approved ‘type-II inhibitors’ in the treatment of cancers. Tyrosine kinases (TKs) typically show strong binding affinity with a wide spectrum of type-II inhibitors while serine/threonine kinases (STKs) usually bind more weakly which we suggest here is due to differences in the folded to extended conformational equilibrium of the activation loop between TKs vs. STKs. To investigate this, we use sequence covariation analysis with a Potts Hamiltonian statistical energy model to guide absolute binding free-energy molecular dynamics simulations of 74 protein-ligand complexes. Using the calculated binding free energies together with experimental values, we estimated free-energy costs for the large-scale (~17–20 Å) conformational change of the activation loop by an indirect approach, circumventing the very challenging problem of simulating the conformational change directly. We also used the Potts statistical potential to thread large sequence ensembles over active and inactive kinase states. The structure-based and sequence-based analyses are consistent; together they suggest TKs evolved to have free-energy penalties for the classical ‘folded activation loop’ DFG-out conformation relative to the active conformation, that is, on average, 4–6 kcal/mol smaller than the corresponding values for STKs. Potts statistical energy analysis suggests a molecular basis for this observation, wherein the activation loops of TKs are more weakly ‘anchored’ against the catalytic loop motif in the active conformation and form more stable substrate-mimicking interactions in the inactive conformation. These results provide insights into the molecular basis for the divergent functional properties of TKs and STKs, and have pharmacological implications for the target selectivity of type-II inhibitors.
Editor's evaluation
This important paper provides a convincing mechanism for the relative binding specificity of Type II inhibitors to kinases. The combination of a sequence-derived Potts model with experimental dissociation constants and calculated free energies of binding to the DFG-out state is highly compelling and goes beyond the current state-of-the-art. Given the importance of kinases in pathophysiological processes, the results will be of interest to a broad audience and, in addition, the combination of computational methods can be applicable to a wide variety of other biophysical processes that involve conformational rearrangements.
https://doi.org/10.7554/eLife.83368.sa0Introduction
The human genome contains approximately 500 eukaryotic protein kinases which coordinate signaling networks in cells by catalyzing the transfer of a phosphate group from ATP to serine, threonine, or tyrosine residues (Manning, 1995; Modi and Dunbrack, 2019a). The GO (gene ontology) database identifies 351 (~70%) of these enzymes as serine/threonine kinases (STKs) and 90 (~18%) as tyrosine kinases (TKs). STKs are an ancient class of protein kinases that predate the divergence of the three domains of life (bacteria, archaea, eukaryote) (Stancik et al., 2018), whereas TKs are a more recent evolutionary innovation, having diverged from STKs about 600 million years ago during early metazoan evolution (Miller, 2012; Sebé-Pedrós et al., 2016). Kinases are important therapeutic targets in a large number of human pathologies and cancers. Both TKs and STKs share a striking degree of structural similarity in their catalytic domains, owing to evolutionary selective pressures that preserve their catalytic function; in particular, the location and structure of the ATP binding site are highly conserved which raises significant challenges for the design of small-molecule ATP-competitive inhibitors that are both potent for their intended target(s) and have low off-target activity for unintended kinase targets. The latter is referred to as the ‘selectivity’ of an inhibitor, a property which is difficult to predict and control but is nonetheless very important for developing drugs with minimal harmful side effects.
A particular class of ATP-competitive kinase inhibitors which were proposed to have a high potential for selectivity are called ‘type-II inhibitors’ which only bind when the kinase adopts an inactive ‘DFG-out’ conformation. ‘DFG’ (Asp-Phe-Gly) refers to a conserved catalytic motif located at the N-terminus of an ~20 residue-long ‘activation loop’ that is highly flexible and controls the activation state of the kinase and the structure of the substrate binding surface. The precise arrangement of catalytic residues and the structural organization of large regulatory elements, such as the activation loop and nearby ‘αC-helix’, are strongly coupled to the conformation of the DFG motif and the DFG-1 residue preceding it, which is well described by regions on the Ramachandran map occupied by the Asp, Phe, and DFG-1 residues (beta-turn, right-handed alpha-helix, left-handed alpha-helix) and the χ1 rotamer state of the DFG-Phe sidechain (trans, gauche-minus, gauche-plus). Recently, Dunbrack and co-workers identified eight major conformational states in the Protein Data Bank (PDB) based on these metrics (Modi and Dunbrack, 2019b). The most common state, which is evolutionarily conserved in all kinases, corresponds to the active ‘DFG-in’ conformation. In this conformation all structural requirements for catalysis are typically met, e.g., a complete hydrophobic spine, a salt bridge between the conserved β3-Lys and αC-Glu residues, and an extended activation loop which forms the substrate binding surface. Inactive kinases in the PDB are most frequently seen in an ‘Src-like inactive’ conformation where the DFG is ‘in’, but the αC-helix is swung outward, breaking the β3-Lys → αC-Glu salt bridge and disassembling the hydrophobic spine. Disassembly of the hydrophobic spine caused by αC-helix rotation increases the cavity volume around the DFG-Phe residue, allowing it to pass through the Src-like inactive conformation and completely ‘flip’ from DFG-in to DFG-out (Levinson et al., 2006; Shan et al., 2013). The classical DFG-out conformation, targeted by type-II inhibitors, displays a highly reorganized activation loop that is folded away from the αC-helix, projecting toward solvent or forming stable secondary structure and substrate-mimicking interactions. We refer to these states of the activation loop collectively as ‘folded’, to describe its ~17 Å reorganization relative to the active ‘extended’ conformation, wherein the substrate binding surface has been ‘folded up’ toward the kinase N-terminal lobe and away from the αC-helix.
In both TKs and STKs, the activation loop undergoes this large-scale conformational change when the DFG motif flips from the active ‘DFG-in’ conformation to the classical DFG-out conformation. The DFG flip swaps the positions of DFG-Phe and DFG-Asp, opening a hydrophobic ‘back pocket’ that is connected to the conserved ATP binding site through the ‘gatekeeper’ residue. Type-II inhibitors typically have a long chemical fragment that allows them to bind across the gatekeeper and form interactions with residues in the back pocket. In contrast, type-I inhibitors (the majority of kinase drugs) occupy the ATP pocket but not the back pocket and can bind to either DFG-in or DFG-out. For these reasons, it has been proposed that type-II inhibition holds greater potential for the design of highly selective drugs (Vijayan et al., 2015; Davis et al., 2011; Anastassiadis et al., 2011); it has been shown that different kinase sequences have different propensities to adopt DFG-out in the absence of inhibitor (Haldane et al., 2016; Hari et al., 2013), and the DFG-out back pocket has been suggested to have a lesser degree of sequence and structural homology between kinases (Liu and Gray, 2006). However, the notion that type-II inhibitors developed to-date are more selective than type-I inhibitors has been brought into question (Zhao et al., 2014; Klaeger et al., 2017), suggesting that further consideration of the energetic contributions described above is required.
In order to fully exploit the target-selective potential of type-II inhibitors it is necessary to understand the underlying sequence-dependent principles that control the conformational preferences of their kinase targets, and the extent to which this has been diversified by evolution. This can, in principle, be directly approached using free-energy simulations to estimate the reorganization free-energy required for different kinases to adopt DFG-out, although this is computationally very expensive and of uncertain reliability for conformational changes involving large-scale loop reorganizations, such as the ~17 Å ‘folding’ of the activation loop that accompanies the transition from active DFG-in to the inactive, classical DFG-out state. To accommodate this limitation, we employ modern sequence-based computational methods to characterize the conformational selection process over the entire kinome and combine the sequence-based results with structure-based free energy simulations with the goal of identifying evolutionarily divergent features of the energy landscape that control the preference of individual kinases for the active (DFG-in) vs inactive (DFG-out ‘folded activation loop’) states. To this end, we report evidence that TK catalytic domains have a molecular evolutionary bias that shifts their conformational equilibrium toward the inactive ‘folded activation loop’ DFG-out state in the absence of activation signals. In contrast, STKs as a class have a more stable active conformation which is favored over the DFG-out state due to sequence constraints in the absence of other signals.
As described below, our analysis of a previously published kinome-wide assay suggests that TKs have properties which privilege the binding of type-II inhibitors in comparison to STKs, which leads us to hypothesize an evolutionary divergence in their conformational energy landscapes. To investigate this, we used a Potts Hamiltonian statistical energy model derived from residue-residue covariation in a multiple sequence alignment (MSA) of protein kinase sequences to probe the active DFG-in ↔ classical DFG-out conformational equilibrium as previously described (Haldane et al., 2016). Using an approach that involves ‘threading’ a large number of kinase sequences onto ensembles of active DFG-in and classical DFG-out structures from the PDB and scoring them using the Potts Hamiltonian, we are able to view the evolutionary divergence in TK and STK conformational landscapes. This calculation only probes the free-energy difference between the active DFG-in and classical DFG-out conformations, and by construction does not consider alternative conformations (e.g. ‘Src-like inactive’) that might be important for analyzing the type-II binding pathway. As discussed below, the Potts calculations from this two-state model correlate well with the free-energy cost to adopt the classical DFG-out conformation.
To validate our results, we used the Potts statistical energy threading calculations to guide target selection for a set of more computationally intensive free-energy simulations. These simulations use type-II inhibitors as tools to probe kinase targets that have already reorganized to DFG-out, allowing us to estimate the free-energy of reorganization () as the excess between the absolute binding free-energy (ABFE) calculated from simulations and the standard binding free-energy measured experimentally in vitro, which already includes the cost to reorganize. Although our methods avoid sampling the conformational change directly, we show how important structural determinants of the conformational change can be identified by analyzing residue-pair contributions to the Potts threading calculations, enabling us to reason about the molecular evolutionary basis for the differences in conformational behavior observed for TKs and STKs.
Results
Insights into the sequence-dependent conformational free-energy landscape
The binding of type-II inhibitors is achieved once a protein kinase has reorganized to the DFG-out with activation loop folded conformation (classical DFG-out). We sought initial insight into the conformational equilibrium from type-II binding data available publicly in the form of literature-reported dissociation constants (Kd). From the binding assay reported by Davis et al., 2011, we report a ‘hit’ where an inhibitor binds to a kinase with Kd ≤10 μM. Using this criterion, a type-II inhibitor hit rate was calculated for each kinase. Analysis of the type-II hit rate distributions for STKs and TKs from the Davis assay (Figure 1A) indicates that STKs, on average, have an unfavorable contribution to the binding of type-II inhibitors relative to TKs.
The size of the gatekeeper residue is important for type-II binding as it controls access to a hydrophobic pocket adjacent to the ATP binding site that is traversed by type-II inhibitors (Zuccotto et al., 2010; Bosc et al., 2015; Liu et al., 1998; Ghose et al., 2008; van Linden et al., 2014), and the size of the gatekeeper residue is thought to negatively affect type-II binding (Zuccotto et al., 2010; Azam et al., 2008; Lovera et al., 2015; Yun et al., 2008). Because TKs tend to have small gatekeepers in comparison to STKs (Zuccotto et al., 2010; Taylor and Kornev, 2011), we considered this as a possible explanation behind the bias for TKs to have larger type-II hit rates. By plotting the hit rate distributions for STKs and TKs where the gatekeeper is either small or large (Figure 1A) we confirm that gatekeeper size has an important influence on type-II binding for both STKs and TKs (Azam et al., 2008). However, the hit rate distribution for TKs appears more sensitive to gatekeeper size than STKs. Even with small gatekeepers, there is a significant fraction of STKs that have hit rates of zero compared with TKs, suggesting the difference in hit rates between TKs and STKs cannot be accounted for primarily by the size of the gatekeeper residue.
Recent solution NMR experiments with Abl kinase revealed two DFG-out conformational states (Xie et al., 2020) one where the DFG motif has flipped ‘DFG-in to DFG-out’ but the activation loop remains in a ‘minimally perturbed’ active-like conformation, and the other state is a classical ‘folded’ DFG-out conformation where the activation loop has moved ~17 Å away from the active conformation (Figure 1B), and the DFG motif is in a ‘classical DFG-out’ (Vijayan et al., 2015) or ‘BBAminus’ (Modi and Dunbrack, 2019b) state. Type-II inhibitors were shown to preferentially bind to this folded DFG-out state, confirming observations that Abl is almost always co-crystallized with type-II inhibitors in this conformation. This binding behavior is also exhibited by other kinases, suggested by the large number of activation loop folded DFG-out states seen in type-II bound co-crystal structures (Figure 1—figure supplement 1). Hence, the importance of large-scale activation loop conformational changes in type-II binding and the large number of residue-residue contact changes involved in this transition (Figure 1—figure supplement 2) suggests the sequence variation of the activation loop and the catalytic loop with which it interacts, might contour the conformational landscape differently for TKs compared with STKs. To investigate this, we used a Potts statistical energy model of sequence covariation to estimate the energetic cost of the active DFG-in (activation loop extended) → inactive DFG-out (activation loop folded) transition for human TKs and STKs (see Methods).
Patterns of coevolution of amino acids at different positions in an MSA are thought to largely reflect fitness constraints for fold stability and function between residues close in 3D space (Lapedes et al., 2012; Hopf et al., 2015; Hopf et al., 2017; Morcos et al., 2014), and these coevolutionary interactions can be successfully modeled by a Potts Hamiltonian (Weigt et al., 2009; Lunt et al., 2010) which we inferred using Mi3-GPU, an algorithm designed to solve ‘Inverse Ising’ problems for protein sequences with high accuracy (Haldane and Levy, 2021). The pairwise interactions from the Potts model can be used as a simple threaded energy function to estimate energetic differences between two conformations, based on changes in residue-residue contacts in the PDB (Haldane et al., 2016). We have calculated the threading penalty for all kinases in the human kinome. Our calculations show the Potts predicted DFG-out penalty (), which is dominated by large-scale reorganization of the activation loop to the folded DFG-out state, is correlated with type-II hit rates (Figure 1C) when controlling for gatekeeper size. From this, we determine that sequence variation of the activation loop and the contacts broken/formed by its large-scale conformational change (Figure 1—figure supplement 2) makes an important contribution to the binding affinity of type-II inhibitors.
Notably, our calculations over the entire human kinome show that the large majority of kinases with large (unfavorable conformational penalties) are STKs, and the large majority of low-penalty kinases are TKs (Figure 1D). To validate this finding, we next perform an independent and more computationally intensive prediction of the conformational reorganization energy of TKs and STKs for select kinase targets, chosen based on the kinome calculations of and type-II hit rates shown in Figure 1, in which we use type-II inhibitors as probes in ABFE simulations as described in the following section. By comparing the conformational penalties predicted from these structure-based molecular dynamics (MD) free-energy simulations with the Potts conformational penalty scores, we also identify the scale of in physical free-energy units. This allows us to predict physical conformational free energies based on Potts calculations which can be carried out at scale on large numbers of sequences, to evaluate the evolutionary divergence of the conformational penalty between STKs and TKs generally.
Structure-based free-energy simulations guided by the sequence-based Potts model
Relative binding free-energy simulations are now widely employed to screen potent inhibitors in large-scale drug discovery studies (Wang et al., 2015). These methods are used to determine the relative free energy of binding between ligands that differ by small substitutions, which permit one to simulate along an alchemical pathway that mutates one ligand to another. By leaving the common core scaffold unperturbed, the cost and difficulty of sampling the transition between unbound (apo) and bound (holo) states of the system are avoided (Wang et al., 2015; Hayes et al., 2022; Guest et al., 2022). Alternatively, alchemical methods to determine ABFEs, such as the ‘double decoupling’ method employed in this work, sample the apo → holo transition along a pathway that decouples the entire ligand from its environment. While more computationally expensive, the advantage of ABFE is that the computed can be directly compared with experimental binding affinities, and successful convergence does not rely on the structural similarity of compounds being simulated (Cournia et al., 2020; Li et al., 2020; Heinzelmann and Gilson, 2021; Lee et al., 2020; Gilson et al., 1997; Qian et al., 2019; Sun et al., 2022).
Our alchemical ABFE simulations of type-II inhibitors binding to TKs and STKs simulate the apo and holo states of the kinase domains in the classical DFG-out conformation with the activation loop folded, starting from the experimentally determined co-crystal structure of the holo state. The apo state remains DFG-out with the activation loop folded throughout the simulations, and therefore the calculated ABFE () excludes the cost to reorganize from DFG-in (). On the other hand, standard binding free-energies () determined experimentally from inhibition or dissociation constants (Equation 1) implicitly include the free-energy cost to reorganize. Therefore given the experimentally determined total binding free energy, , ABFE simulations can be used to separate the free energy of ligand-receptor association in the inactive state () from the cost to reorganize from the active to inactive state, (Equation 3; Deng et al., 2011; Lin et al., 2014; Lin et al., 2013).
We calculated (Equation 1) from literature reported IC50 or values, where the standard concentration is set to 1
can be expressed as the sum of the free-energy change to reorganize from the active to inactive state, plus the free energy to bind to the inactive state (Equation 2). is therefore the excess free-energy difference between and (Equation 3).
Type-II inhibitors generally bind when the activation loop is in a folded DFG-out conformation (Figure 1B), which presents major challenges for direct simulations to determine the free energy cost of the conformational change in contrast to the method employed here (Equation 3).
Because the type-II inhibitor imatinib is co-crystallized in a type-II binding mode with MAPK14 (p38α), an STK, and several other TKs (e.g. ABL1, DDR1, LCK, CSF1R, KIT, and PDGFRA), we chose this inhibitor as an initial probe of our hypothesis that TKs evolved to have lower than STKs (Figure 2). In this example we note that TKs bind strongly to imatinib (‘STI’ in Figure 2) with an average of –9.3 kcal/mol, in contrast to the STK MAPK14 which binds this drug very weakly ( kcal/mol). At face value this appears consistent with our analysis from Figure 1D, where we calculated a large Potts DFG-out penalty for MAPK14 () and low penalties for TKs, suggesting that the weak binding of imatinib to MAPK14 is due at least partially to large . To confirm this, we used ABFE simulations with the imatinib: MAPK14 complex to evaluate Equation 3, confirming that MAPK14 incurs a large penalty to adopt the DFG-out conformation with the activation loop folded ( kcal/mol) (Figure 2).
Despite the large predicted for MAPK14 by both the Potts model and the simulations with imatinib described above, highly potent type-II inhibitors have been successfully developed for this kinase. For example, BIRB-796 (Pargellis et al., 2002) binds to MAPK14 about 7 kcal/mol more strongly than imatinib. This stronger binding of BIRB-796 is captured by from our simulations (Figure 2), and the calculated value of for this complex ( kcal/mol) is very close to the corresponding estimate of based on simulations with imatinib (Figure 2). Importantly, this result suggests that STKs can be potently inhibited by type-II inhibitors despite their large . To support this, we performed additional ABFE simulations with BIRB-796 and calculated for two additional STKs predicted to have large reorganization penalties (MAPK9 and BRAF, ). We calculated kcal/mol for MAPK9 and BRAF, which is consistent with predictions from the Potts model, and comparison of and in Figure 2 confirms that BIRB-796 is able to overcome the large of certain kinases to attain high experimental potencies (e.g. MAPK14 and MAPK9). To further validate this result, we calculated via ABFE simulations of BIRB-796 binding to a TK predicted by the Potts model to have a low penalty (PTK2B, ), which again shows consistency with our Potts prediction of the conformational landscape (Figure 2). The relatively weak value of for this kinase compared with MAPK14 is also consistent with observations of the BIRB-796: PTK2B co-crystal structure (PDB: 3FZS), where the binding mode in PTK2B is more weakly associated with the ATP pocket in comparison with MAPK14 (Han et al., 2009).
The analysis above provides initial support for our hypothesis about the evolutionarily divergent STK and TK conformational landscapes. To further develop this approach, we identified five STKs and five TKs which are predicted by the Potts threading calculations to have large and small , respectively, and for which there are sufficient experimental structural and inhibitory data (co-crystal structures and binding constants) to calculate an average via Equation 3 for each target using multiple type-II inhibitor probes. For each of the TK and STK targets, these sets of calculations can be visualized as a linear regression of vs where the slope is constrained to one, consistent with Equation 2 (see Methods for details). We employed this workflow for the set of five TK and five STK targets by simulating 22 and 23 type-II inhibitor complexes, respectively.
The result of this workflow for the set of five TKs and their type-II complexes revealed a low average of <1 kcal/mol (Figure 3B), consistent with our initial predictions from Potts and type-II hit rates (Table 1). On the other hand, the binding free-energy simulations for the set of five STKs and their type-II complexes show an average of ~6 kcal/mol of is required for these kinases to adopt DFG-out conformation, which is also consistent with our initial predictions from the Potts model (Table 1). To verify that the large identified for STKs is a property of conformational selection for DFG-out rather than systematic overestimation of for these kinases, we performed ABFE simulations of type-I inhibitors binding to the same set of STKs (an additional 23 complexes). For the binding of type-I inhibitors, we expect there to be no reorganization penalty due to the lack of DFG-out conformational selection in their binding mechanism. As anticipated, the calculated values of for type-I inhibitors are very close to their experimental binding affinities () (Figure 3A).
We find that the set of type-II inhibitors complexed with STKs in this dataset tends to have more favorable binding free energies to the reorganized receptor () than type-II inhibitors complexed with TKs, as shown by their distribution along the horizontal axes in Figure 3. The reason for this can be understood as a consequence of selection bias. Our selection of STK complexes for this study usually involved lead compounds from the literature, which were designed for high on-target experimental potency and published for their pharmaceutical potential, similar to BIRB-796: MAPK14 which is a tightly bound complex with high experimental affinity despite the large incurred by this kinase (Figure 2). This tight binding is reflected by the favorability of the term which must be implicitly tuned by medicinal chemists to overcome the large found in STKs. Meanwhile, the chemical space of type-II inhibitors studied against TKs appears to be privileged by their low , judging by the comparably weak for these complexes. This ultimately gives rise to similar experimental potencies for the binding of type II inhibitors to TKs and STKs plotted in Figure 3.
The results of the MD binding free-energy simulations when combined with experimental binding affinities, reveal significant differences in the conformational free-energy landscapes between STKs and TKs. The DFG-in (activation loop extended) to DFG-out (activation loop folded) reorganization penalties are strongly correlated with corresponding calculated from the Potts model () emphasizing the connection between coevolutionary statistical energies in sequence space and physical free-energies in protein conformational space (Figure 4). From this relationship, we can approximate a scale for the Potts scores in physical free-energy units which describe the conformational landscapes of folded proteins in a similar manner to that of an earlier study of protein folding landscapes (Morcos et al., 2014); we find that a Potts statistical energy difference ΔE of one unit corresponds approximately to 1.3 kcal/mol of .
Structural and evolutionary basis for the divergent TK and STK conformational landscapes
The consistency of the predictions between and identified from Potts-guided free-energy simulations (Figure 4) led us to investigate whether the observed difference in the free energy to reorganize from the active to inactive state is a more general feature that distinguishes TKs from STKs. To this end, we extracted ~200,000 STKs and ~10,000 TKs from the large MSA of Pfam sequences used in the construction of our Potts model, based on patterns of sequence conservation that clearly distinguish the two classes (see Methods). For each sequence, we calculated threaded over the structural database (a total of 4268 active DFG-in and 510 classical DFG-out PDB structures) and plotted the distributions for TKs and STKs, revealing a bias for STKs toward larger Potts conformational penalties (Figure 5). The average difference between these distributions, is extremely unlikely to be observed by chance (p≤10−15, see Methods) and supports the hypothesis that TKs are evolutionarily biased toward a lower free-energy cost to adopt the classical ‘folded activation loop’ DFG-out conformation () compared to STKs. We estimate that corresponds to ~4.3 kcal/mol based on the analysis summarized in Figure 4.
To gain insight into the molecular basis for this effect which distinguishes the conformational landscape of TKs from STKs, we examined the residue-residue interactions that make the most significant contributions to the observed . The difference between average Potts threaded-energy penalties, , can also be written as a sum over pairs of alignment positions and along length of the aligned kinase domains, (see Methods for details). We find that ~75% of the total contribution to (approximately 3 kcal/mol) can be traced back to a small number (10) of residue-residue interactions involving the activation loop, suggesting that mutations within the activation loop are largely responsible for the evolutionary divergence between the conformational free-energy landscapes of TKs and STKs. These interactions occur between important structural motifs responsible for controlling the stability of the active ‘extended’ conformation of the activation loop (Figure 6A), especially the activation loop N-terminal and C-terminal ‘anchors’ (Nolen et al., 2004), and the regulatory ‘RD-pocket’ formed by the HRD motif of the catalytic loop which functions to stabilize or destabilize this conformation depending on the activation loop’s phosphorylation state (Nolen et al., 2004; Johnson et al., 1996). The remaining top correspond to contacts that stabilize the DFG-out ‘folded’ conformation of the activation loop for TKs, wherein the kinase’s substrate binding site recognizes its own activation loop tyrosine (Figure 6C, right; Hubbard et al., 1994). We describe these interactions below, focusing on the strongest effects involving these structural motifs that lead to differences in the conformational free-energy landscapes of TKs and STKs. The residue nomenclature we use in our descriptions follows the format , which is the unique residue numbering in our MSA followed by Abl1 (TK) numbering in the subscript (active PDB: 2GQG, inactive PDB: 1IEP) and CDK6 (STK) numbering in the superscript (active PDB: 1XO2, inactive PDB: 1G3N), corresponding to the original PDB files used to generate Figure 6B and Figure 6C.
The ‘RD-pocket’ (Nolen et al., 2004) is a conserved basic pocket formed by the Arg and Asp residues of the HRD motif (Johnson et al., 1996) ( and ) and a positively charged Lys or Arg that is often present in the N-terminal anchor of the activation loop (). in the HRD motif and Lys/Arg in the N-terminal anchor form an unfavorable like-charge interaction when the activation loop is in the active, extended conformation (Nolen et al., 2004). Kinase activation is typically a complex process involving many layers of regulation from other protein domains, cofactors, and phosphorylation events (Endicott et al., 2012) however, a general activation mechanism that applies to the majority of kinases involves quenching the net-charge of the RD-pocket by addition of a negatively charged phosphate group to a nearby residue in the activation loop, stabilizing the active conformation. The conservation of this regulatory mechanism in most protein kinases, particularly those bearing the HRD-Arg residue (termed ‘RD-kinases’ Johnson et al., 1996), explains why Lys or Arg is frequently observed at position of the N-terminal anchor of the activation loop. However, RD-TKs prefer Arg at this position (78%) which the Potts model suggests has a greater destabilizing effect on the active conformation than Lys (9%) due to interactions with HRD-Arg. The activation loop Arg also forms part of an electrostatic interaction network that stabilizes the ‘Src-like inactive’ conformation in TKs (Ozkirimli and Post, 2006; Wu et al., 2020), a conformation with a ‘partially’ folded activation loop (Figure 1—figure supplement 1) that is suggested to be an intermediate state along the transition to DFG-out (Levinson et al., 2006; Shan et al., 2013). On the other hand, RD-STKs display more frequently (26%), which the Potts model suggests interacts more favorably with the HRD-Arg independently of activation loop phosphorylation, thus contributing to greater stabilization of the active DFG-in conformation for RD-STKs in comparison with TKs. Additionally, the Potts statistical energy analysis suggests that packing interactions between HRD-Arg and located near the activation loop C-terminal also contributes to phosphorylation-independent stabilization of the active conformation in RD-STKs (Figure 6B left), appearing in 28% of RD-STKs and only 2% of RD-TKs. In RD-TKs, the residue at this position is usually Arg or Lys which appears to be repelled from the RD-pocket (Figure 6C left) and is suggested by the Potts model to result in a less stable active conformation.
The largest term, which contributes 16% of the total difference in average Potts conformational penalties between STKs and TKs, comes from an interaction pair in the C-terminal of the activation loop () and the C-terminal of the catalytic loop (). These residues form part of the ‘C-terminal anchor’ (Nolen et al., 2004) which is important for creating a suitable binding site for the substrate peptide. The C-terminal anchor residue is Pro in TKs and typically Ser or Thr in STKs (Taylor et al., 1995). In STKs, the sidechain hydroxyl of this residue forms a hydrogen bond with in the catalytic loop, creating a stable binding site for substrate phosphoacceptor residues and stabilizing the C-terminal anchor (Taylor et al., 1995). is also directly involved in catalysis by interacting with and stabilizing the gamma phosphate of ATP (Zheng et al., 1993); hence, it is often referred to as the ‘catalytic lysine’. The hydrogen bond between and or is almost always formed in the active DFG-in conformation, and we observe breakage of this hydrogen bond in many STKs crystallized in the DFG-out/activation loop folded conformation (e.g. CDK6, Figure 6B), suggesting that deformation of the C-terminal anchor contributes an energetic penalty for the active → inactive conformational change. In TKs, however, the catalytic lysine is almost always replaced with Ala, with the exception of a few TKs (e.g. c-Src) which have instead adopted Arg at this position. The C-terminal anchor of TKs containing and is less stable in comparison to STKs containing () or () for which the Potts coupling is very favorable, consistent with the structural observation that () forms weak interactions (Figure 6C). Our analysis suggests the interaction pair () weakens the C-terminal anchor, leading to a less stable active conformation in TKs as compared with STKs. Another significant contribution to the stabilization of the C-terminal anchor in the active DFG-in conformation for STKs comes from interactions between the residue pair (, ) which are both located within the activation loop. We observe () at this position pair in 33% of STKs, but never in TKs (Figure 6—figure supplement 1D). The Potts coupling between these residues is highly favorable. In contrast, we observe () in 40% of TKs but never in STKs (Figure 6—figure supplement 1H), which have weaker coupling. The bulky sidechains of () observed in TKs cause the activation loop to ‘bulge’ in this C-terminal region which has been previously identified as a feature of TKs that helps shape the substrate binding site to accommodate Tyr residues (Nolen et al., 2004). In addition to this paradigm, our analysis suggests that the C-terminal bulge results in weaker structural constraints on the active conformation relative to STKs.
In summary, TKs are suggested by the Potts statistical energy model which is based on sequence covariation, to have on average, weaker N-terminal anchor, RD-pocket, and C-terminal anchor interactions than STKs. This mechanism of shifting the TK conformational equilibrium away from the active DFG-in/extended activation loop conformation can explain 7 of the top 10 , accounting for ~80% of these contributions to the divergence between the STK and TK conformational landscapes. The remaining ~20% of the top contributions can be attributed to residue-residue interactions that occur within the folded DFG-out conformation wherein the activation loop of TKs binds to the kinase’s own active site as though it was engaging a peptide substrate in trans (Figure 6C right) (Nolen et al., 2004). STKs, however, rarely adopt a folded DFG-out conformation with this property, and instead the activation loop is typically found to be unresolved and/or projecting outward toward solvent (Figure 6B right). The Potts model suggests that this substrate mimicry of the folded DFG-out activation loop observed in TKs is highly dependent on the presence of a Tyr phosphorylation site at position in the activation loop (Figure 6C). In the active conformation, the anionic stabilizes the active conformation by binding to the basic RD-pocket (Figure 6C left, phosphate not shown). However, in the (unphosphorylated) folded DFG-out conformation, this mimics a substrate by stacking against the TK-conserved residue (Figure 6C right) (Nolen et al., 2004). The substrate mimicking nature of this binding mode is demonstrated by the autophosphorylation dimer structure of TK FGFR3 (PDB: 6PNX) solved recently (Figure 6—figure supplement 1K; Chen et al., 2020).
The striking connection between the ability of TKs to phosphorylate tyrosine substrates and their enhanced access to the DFG-out conformation via substrate-competitive contacts from their own activation loop described above suggests an evolutionary model for the TK conformational behavior characterized in this work. In this model, the coevolution of residues that form substrate-competitive contacts in folded DFG-out appears to be a byproduct of the evolutionary pressure for TKs to phosphorylate other TKs on their activation loop tyrosine residues. STKs, on the other hand, have optimized the binding of Ser and Thr substrates via a different binding mode (Endicott et al., 2012) which does not have the same energetic feedback with the stability of the folded DFG-out conformation. Additionally, the catalytic domains of TKs appear to have a less energetically stable ‘extended’ activation loop conformation than STKs, which may have encouraged the evolution of more complex mechanisms of allosteric regulation and autophosphorylation which are highly important regulatory mechanisms in TKs (Chen et al., 2020; Beenstock et al., 2016; Lemmon and Schlessinger, 2010). The combined effect of these two TK phenotypes, the former favoring the stabilization of folded DFG-out and the latter favoring destabilization of active DFG-in, may explain their low free-energy cost for the DFG-in → DFG-out conformational change compared with STKs.
Discussion
In this work, we have combined sequence and structure-based approaches to analyze the conformational free energy difference between active DFG-in and inactive DFG-out kinase states. Using a Potts statistical energy model derived from residue-residue covariation in a kinase family multiple sequence alignment, we first threaded all human STKs and TKs onto large ensembles of active DFG-in and classical DFG-out structures from the PDB. We found distinctly different distributions of threading scores for STKs compared with TKs, with STKs having a significant conformational reorganization penalty compared with TKs. The molecular basis for the evolutionary divergence in the conformational landscapes was analyzed; a substantial contribution to the difference is associated with sequence position pairs that couple the N and C terminal anchor residues of the activation loop to N and C terminal residues in the catalytic loop, according to the Potts statistical energy analysis. We then used the Potts statistical energy model to guide the selection for structure-based MD binding free energy simulations of 74 protein-ligand complexes; using the calculated binding free-energy estimates together with experimental values, we were able to estimate free-energy costs for the large-scale (~17–20 Å) conformational change of the activation loop by an indirect approach. The structure-based estimates of the reorganization free-energy penalties are consistent with the sequence-based estimates. Additionally, the strong correlation between and identified in this study reveals that the conformational landscape has a strong sequence dependence with STKs having an ~4 kcal/mol conformational free energy bias favoring the active state over the inactive state relative to TKs (Figure 4). We note that the most potent type-II inhibitors from the literature which target STKs bind with nanomolar Kds, similar to that for TKs, despite the substantial additional reorganization penalty that STKs must overcome. This suggests that medicinal chemists have implicitly been able to exploit particularly favorable characteristics of the type-II binding pocket to design inhibitors with extremely strong affinities to the DFG-out (activation loop folded) receptor conformations of STKs, and that further analysis of the molecular basis for this tight binding could provide a basis for designing more selective inhibitors.
Materials and methods
Multiple sequence alignment (MSA) and classification of serine/threonine vs tyrosine kinases
Request a detailed protocolAn MSA of 236,572 protein kinase catalytic domains with 259 columns was constructed as previously described (McGee et al., 2021). STKs and TKs were classified based on patterns of sequence conservation previously identified by Taylor and co-workers (Taylor et al., 1995); characteristic sequence features of TKs and STKs which form their respective phosphoacceptor binding pockets are found at the HRD +2 (Ala or Arg in TKs, Lys in STKs) and HRD +4 (Arg in TKs, variable in STKs) in the catalytic loop, as well as the APE-2 residue (Trp in TKs, variable in STKs) in the activation loop. These residues correspond to positions 126, 128, and 165 in our MSA, respectively. Kinases which satisfy the conditions for TKs at all three positions were classified as TKs (10,345 raw sequences, 1069 effective sequences), and those that satisfy the condition for STKs at position 126 and are non-overlapping with the TK condition at position 128 were classified as STKs (210,862 raw sequences, 22,893 effective sequences). The effective number of sequences in each class was calculated by summing over sequence weights, where each sequence was assigned a weight defined as the fraction of the number of sequences in the same class that are within 40% identity. In this way, we correct for the effects of phylogenetics in the calculation of sample size as well as other quantities (see below).
Human kinome dataset
Request a detailed protocol497 human kinase catalytic domain sequences were acquired from Modi and Dunbrack, 2019b (excluding atypical kinases). These sequences were aligned using a Hidden Markov Model (HMM) that contains 259 columns (L=259), which was constructed from the same MSA used to derive our Potts model. 447 human kinases remained after filtering-out sequences with 32 or more gaps.
Classification of gatekeeper size
Request a detailed protocolThe designation of gatekeeper residues as ‘large’ or ‘small’ was based on sidechain van der Waals volumes (Miller et al., 1987; Figure 1—figure supplement 3), where small gatekeepers have a volume of <110 Å3 (Gly, Ala, Ser, Pro, Thr, Cys, Val), and large gatekeepers have a volume of >110 Å3 (Asn, His, Ile, Leu, Met, Lys, Phe, Glu, Tyr, Gln, Trp, Arg).
PDB dataset and conformational states
X-ray crystal structures of tyrosine, serine/threonine, and dual-specificity eukaryotic protein kinases in the PDB were collected from http://rcsb.org on July 30, 2020. The protein sequences of 7919 chains were extracted from 6805 PDB files by parsing the SEQRES record and aligned to the MSA used to construct our Potts model, using an HMM.
Contact frequency differences between ensembles of active/DFG-in and inactive/DFG-out (classical DFG-out) PDB structures (Figure 1—figure supplement 2) are incorporated into the calculation of Potts threaded energies, which are central to this work. Our classification of the active DFG-in and classical DFG-out conformational states is based on ref 6, which we describe in further detail here:
Active DFG-in (BLAminus)
Request a detailed protocolThe BLAminus state of the DFG motif (active DFG-in) is the only active conformation of the activation loop, in contrast to several other inactive DFG-in states (ABAminus, BLAplus, BLBplus, BLBminus, BLBtrans). In the active conformation, all structural requirements for catalytic activity are typically met, e.g., a complete hydrophobic spine, a salt bridge between β3-Lys → αC-Glu, and an extended activation loop that ensures unobstructed substrate-binding, all of which have high correspondence with the BLAminus state (Modi and Dunbrack, 2019b). Our analysis using the software provided in Modi and Dunbrack, 2019b identifies 3643 structures in this conformation belonging to STKs and 625 structures belonging to TKs (4268 structures in total).
Classical DFG-out (BBAminus)
Request a detailed protocolThe DFG-out state is characterized by a ‘flip’ of the conserved Phe to occupy the ATP binding pocket which is otherwise occupied by the conserved Asp in the active conformation. The classical DFG-out conformation is associated with the binding of type-II inhibitors which occupy the back pocket region opened up by the DFG-flip (Vijayan et al., 2015). This conformation corresponds to the BBAminus rotamer state of these residues, and it is the dominant DFG-out conformation associated with the binding mode of type-II inhibitors. This classical DFG-out or BBAminus conformation is correlated with a larger-scale conformational change of the activation loop that involves an ~17 Å ‘folding’ transition with respect to the active conformation. This conformational change is usually accompanied by the formation of secondary structure that obstructs the typical substrate binding surface (e.g. PDB ID: 2HIW, Figure 1—figure supplement 1B). Our conformational analysis via Modi and Dunbrack, 2019b identifies 224 structures in this conformational state belonging to STKs and 286 structures belonging to TKs (510 structures in total).
Contact frequency differences
Request a detailed protocolEach of the clustered PDB structures were converted to an adjacency matrix of binary contacts (1 for ‘in-contact’, 0 otherwise). A contact between residues and in structure was assigned when their nearest sidechain atoms (excluding hydrogen) were detected within a distance . The contact frequency in cluster (e.g. active DFG-in) was calculated for each residue pair () by taking a weighted average over all instances of a contact in that cluster –
Where is the number of PDB chains in cluster , and weights were calculated with , where is the number of times the UniProt ID of structure is found within either cluster (i.e. active) or cluster (i.e. DFG-out). In this way, we have down-weighted contributions to the contact differences that are due to overrepresentation of specific kinases in the PDB clusters, with the goal of using contact differences to represent conserved features of the conformational transition across many different kinases. Alignment gaps and unresolved residues were accounted for by excluding these counts in the summations. Only were included in the calculation. The PDB clusters used to calculate these contact differences are described above. The contact frequency differences for both STKs and TKs were plotted on a contact map for visualization (Figure 1—figure supplement 2).
Potts model and threaded-energy calculation
Request a detailed protocolOur Potts Hamiltonian was constructed from an MSA of protein kinase catalytic domains as previously described (McGee et al., 2021). The Potts Hamiltonian takes the form –
where is the number of columns in the MSA (), is a matrix of self-interactions or ‘fields’, and is the coupling matrix which has the interpretation of co-evolutionary interactions between residues.
The Potts threaded-energy penalty for sequence to undergo the conformational transition is calculated using contact frequency differences between the two conformational ensembles (Haldane et al., 2016) –
where represents a class or family of sequences for which sequence has membership, and represents the contact frequency difference between conformations and observed only for other sequences belonging to class (e.g. or ; upper and lower triangle of Figure 1—figure supplement 2, respectively). As described previously (Haldane et al., 2016) the couplings () and fields () were transformed to the ‘zero-gauge’ prior to calculating .
Contributions to average shift in between STKs and TKs (). Where is the Potts conformational penalty for sequence S to undergo the conformational change , we define as the difference in average between two groups of sequences and
To help interpret in a structural and coevolutionary context, we can write as a sum over position pairs
To evaluate this, we note that average can be expressed as a sum over position pairs
where is calculated as follows
is the frequency (bivariate marginal) of residues and at positions and for sequences in the MSA which belong to group , which we calculate after applying the MSA-derived phylogenetic weights described above. Finally, by substituting Equation 10 back into Equation 8, we show how can be decomposed into contributions from individual residue pairs
By viewing the largest (most positive) terms, where and in Equation 12, we are identifying position pairs that cause STKs to have higher penalties than TK in our Potts threading calculations for the active DFG-in to DFG-out conformational change (Figure 6—figure supplement 2).
Calculation of p-value for
Request a detailed protocolThe quantity is a difference between two averages, . Hypothesis testing to determine the statistical significance of this quantity was performed with respect to a null model where the populations of for STKs and TKs, from which our samples were drawn, are indistinguishable. To this end, a p-value was calculated for a t-statistic derived from Welch’s t-test (Welch, 1947), where is the standard error of average –
where the averages and standard errors are calculated after down-weighting each sequence as described above. This was done to lessen the effects of phylogenetic sampling bias from our MSA and ensure that captures general differences between TKs and STKs, rather than specific TK or STK families.
The one-tailed p-value was calculated using the cumulative t-distribution function generated in python using the SciPy package (Virtanen et al., 2020),
where the degrees of freedom for the t-distribution describing the combined population, , was estimated via the Welch-Satterthwaite equation (Welch, 1947) from the degrees of freedom of the two samples and
where represents the effective number of STKs or TKs, which is an unbiased count of sequences in each dataset that can be obtained by summing the sample weights (, ). From the calculation of , we determine the corresponding p-value to be less than , meaning it is highly unlikely for this large of a difference to be observed if the for TKs and STKs were randomly drawn from the same distribution rather than distinct distributions.
Enumeration of absolute binding free-energy simulations
Request a detailed protocolIn this work, we have performed all-atom MD simulations in explicit solvent for a total of 94 different kinase-inhibitor complexes to calculate ABFEs via the alchemical DDM. 74 of these free-energy calculations were guided by insights from the Potts model, specifically the patterns of Potts conformational penalties plotted in Figure 1D. 68 of these complexes correspond to the ABFEs plotted in Figure 3, of which 23 type-II inhibitors and 23 type-I inhibitors ABFEs are plotted for STKs in Figure 3A, and 22 type-II inhibitors ABFEs for TKs are plotted in Figure 3B. Six additional Potts-guided ABFEs corresponding to CSF1R, KIT, PDGFRA, MAPK14, and PTK2B are included in Figure 2. An additional 20 type-I and type-I ½ ABFEs were calculated as part of our benchmarking procedure described in the Methods section.
Double decoupling method setup
Request a detailed protocolThe double decoupling method (DDM), also known as an ‘alchemical’ method, was applied to compute ABFE (), as shown in Equation 16 (Deng et al., 2018; Sakae et al., 2020). This method computes the free energies of decoupling the inhibitor from the bulk solvent in the presence and absence of a receptor via a nonphysical thermodynamic cycle where the two end states are connected via the alchemical pathway. The starting holo-structures for ABFE calculations were taken from the available crystal structure. The absence of crystal structure prompted us to model the structure of the ligand into the active site of the kinase by superimposing over the binding pose of the available holo crystal structure.
Decoupling of the ligand was achieved by first turning off the coulombic intermolecular interactions followed by Lennard-Jones intermolecular interactions from both the legs. This allows DDM to estimate the free energy, i.e., in the presence of protein () and absence of protein, i.e., in the bulk solvent, () as shown in Equations 17; 18.
Substituting Equations 17; 18 into Equation 19 yields the estimated () from DDM
where is the electrostatic energy contribution toward the total ABFE, and is the non-polar energy contribution.
In this study, depending on the system’s convergence, either 20 or 31 total λs were used for decoupling the ligand from bulk solvent. For instance, either 5 λs with Δλ=0.5 or 11 λs with Δλ=0.1 were used for coulombic decoupling and 15 λs with Δλ=0.1 or 20 λs with Δλ=0.05 were used for decoupling Lennard-Jones interactions in the bulk solvent.
Similarly, depending on the convergence, either 30 or 42 total λs were used for decoupling ligand bound to protein. For instance, 11 or 12 non-uniformly distributed λs were used to restrain the ligand. Decoupling the coulombic interactions between ligand and protein was achieved by either using 4 λs with Δλ=0.25 or 10 λs with Δλ=0.1, whereas a large number of λs were used for decoupling Lennard-Jones interactions, i.e., 15 λs with Δλ=0.1 or 20 λs with Δλ=0.05 were used. The correction term developed by Rocklin and coworkers for treating charged ligands during DDM simulations was adopted (Rocklin et al., 2013). In this regard, it is well documented that the use of a finite-sized periodic solvent box during DDM simulations can lead to non-negligible electrostatic energy contribution toward the calculated total ABFE. Thus, calculated () for charged ligand after addition of electrostatic correction term can be expressed as:
For a proper convergence during DDM simulations, the application of restrains is crucial. Herein, we have used six relative orthogonal restrains with harmonic potentials that include one distance, two angles, and three dihedral angles restrain between the ligand and the protein with a force constant of 10 kcal mol−1 Å−2 [deg−2]. At each λ, 10–30 ns of decoupling simulation via replica-exchange (Affentranger et al., 2006) were obtained to compute the over a well-converged trajectory.
Molecular dynamics setup
Request a detailed protocolIn this study, MD simulations were applied to compute the binding free-energy simulations via DDM. GROMACS-2018.8 (Abraham et al., 2015) was used as an MD engine for all simulations. The tleap module of AMBER16 was used to add the missing hydrogen atoms to the kinase enzymes. The system was solvated explicitly using TIP3P water boxes (Jorgensen et al., 1983) that extended at least 10 Å from the center of the system in each direction. The topology file for the kinase enzyme was created using the amber forcefield ff14SB (Maier et al., 2015). The AM1-BCC charge model (Jakalian et al., 2002) and general amber force field 2 (GAFF2) (Wang et al., 2004) were employed to parametrize different inhibitors used in this study. The overall charge of the system was maintained by adding a suitable number of counterions in each system. During the simulations, electrostatic interactions were computed using the particle mesh Ewald method (Essmann et al., 1995) with a cutoff and grid spacing of 10.0 and 1.0 Å, respectively. The NPT (constant Number of particles, Pressure, and Temperature) ensembles with a time step of 2 fs was used in the simulations.
Benchmarking calculations for absolute binding free-energy simulations
Request a detailed protocolAccurate prediction of the ABFE difference between the Apo and Holo state of a protein is extremely important to achieve from force field-based MD simulations (Lin et al., 2013; Lovera et al., 2012). In this study, we used type-II inhibitors as probes to estimate ΔGreorg via ABFE simulations, which is the excess free-energy between experimentally determined binding affinity and ΔGbind calculated from ABFE. Target kinases, i.e., MAPK14, CDK2, and JNK1 bound with type-I and I 1/2 inhibitors in the active conformation states (where ΔGreorg is expected to be close to zero) have been regularly used by the computational community as benchmark systems for absolute or relative binding free energy calculations (Wang et al., 2015; Lee et al., 2020; Goel et al., 2021; Kuhn et al., 2020; Gapsys et al., 2019; Khalak et al., 2021). Benchmarking calculations over multiple protein-ligand complexes show close agreement between calculated (ΔGbind) and experimental (ΔGexp) terms (Table 2, Figure 2—figure supplement 1). In the later part of benchmarking studies, we have included ABL1 bound to type-II inhibitors in the DFG-out/folded activation loop state. NMR studies have shown experimentally that ABL1 has a ΔGreorg of 1.2 kcal/mol (Xie et al., 2020), which is consistent with our range of estimates of ΔGreorg for this kinase (Table 2).
Potts-guided target selection for absolute binding free-energy simulations
Request a detailed protocolOur Potts threaded-energy calculations were used alongside experimental type-II binding data from the large-scale assay by Davis et al., 2011 to identify kinase targets that are likely to have very large or very small ΔGreorg. As described in the main text, all-atom MD simulations to calculate ABFEs of type-II inhibitors can be used alongside experimental binding affinities to calculate the free-energy cost for a kinase to reorganize to the DFG-out/folded activation loop conformation. This relation, ΔGreorg = ΔGexp − ΔGbind(ABFE), gives the free-energy cost to reorganize in physical energy units (kcal/mol) and can be used to approximate a scale for the Potts statistical energy differences provided one sample a sufficient range of ΔGreorg and ΔEPotts. However, ABFE simulations are much more computationally demanding than the Potts threading calculation, which we sought to mitigate by choosing kinase simulation targets which are likely to provide a strong signal, guided by the Potts model. We direct the reader to Table 1, which contains the Potts penalties and type-II hit rates for the targets of interest. For comparison, Figure 1A and Figure 5 provide the overall distributions of hit rates and Potts penalties for TKs and STKs.
A significant challenge for our target selection was the limited availability of type-II inhibitors co-crystallized against STKs which have (a) Potts penalties and type-II hit rates that predict very high ΔGreorg, (b) experimental binding affinities available in the literature in the form of IC50, Ki, or Kd, (c) availability of protein-ligand co-crystallized structure(s), and (d) type-II inhibitor complex systems where the activation loop appears to have undergone a large-scale ‘folding’ conformational change relative to the active ‘extended’ conformation. STK complexes that satisfy all four criteria appear to be sparse, which is consistent with the notion that kinases with large reorganization penalties are more difficult to crystallize in the classical DFG-out conformation (Haldane et al., 2016). However, for some STKs with very high Potts threaded-energy penalties (e.g., MELK) there has been significant medicinal chemistry efforts to design potent type-II inhibitors and structurally characterize their complexes using x-ray crystallography. Using type-II co-crystal structures that cover five different STKs with high Potts penalties (Table 1) and five different TKs with low Potts penalties, we were able to sample a wide range of ΔGreorg from a total of 45 ABFE simulations covering 45 type-II inhibitor complexes and 10 different kinase targets. These simulations and subsequent calculations of ΔGreorg for each kinase resulted in a strong correlation with the Potts threaded energy scores (Figure 5), allowing us to establish a scale for the Potts energies in kcal/mol. We have provided detailed results from the ABFE simulations of these 10 kinase targets in the form of supplementary figures (Figure 3—figure supplement 1) where the average ΔGreorg for each kinase is visualized as the y-intercept of a linear regression with the slope constrained to one.
Data availability
Request a detailed protocolValues of from all ABFE simulations described in this work, including benchmarking calculations (94 simulations in total), are provided in the form of supplementary tables (Figure 2—source data 1 and Figure 3—source data 1). Potts , type-II hit rates computed from Shan et al., 2013, the identity of gatekeeper residues and corresponding van der Waals volumes in Å (Stancik et al., 2018), and the classification of human kinases as TKs or STKs were provided in a separate supplementary table (Figure 1—source data 1).
Data availability
Our computational study makes use of experimental data from the literature, which we extracted and curated manually rather than relying on any specific database. Any experimental data used can be found in our supporting information in the form of tables alongside appropriate citations. A large set of experimental "hit rates" were derived from binding affinities available from Davis et al., 2011. The data used to generate various plots in the main text can be found in tables throughout the supporting information, as well as a distinct "supplementary table" which we provide. The Mi3-GPU (Haldane and Levy, 2021) source code required to reproduce the Potts model employed in this manuscript can be found at the following link: https://github.com/ahaldane/Mi3-GPU (Haldane and avikbiswas, 2021, v1.1, copy archived at swh:1:rev:b8fd4aa67bb2531fdc60e3a00fed6f80c8aceb49).
References
-
A novel hamiltonian replica exchange MD protocol to enhance protein conformational space samplingJournal of Chemical Theory and Computation 2:217–228.https://doi.org/10.1021/ct050250b
-
Comprehensive assay of kinase catalytic activity reveals features of kinase inhibitor selectivityNature Biotechnology 29:1039–1045.https://doi.org/10.1038/nbt.2017
-
Part I: mechanisms of resistance to imatinib in chronic myeloid leukaemiaThe Lancet. Oncology 8:1018–1029.https://doi.org/10.1016/S1470-2045(07)70342-X
-
Activation of tyrosine kinases by mutation of the gatekeeper threonineNature Structural & Molecular Biology 15:1109–1118.https://doi.org/10.1038/nsmb.1486
-
How do protein kinases take a selfie (autophosphorylate)?Trends in Biochemical Sciences 41:938–953.https://doi.org/10.1016/j.tibs.2016.08.006
-
Molecular basis for receptor tyrosine kinase A-loop tyrosine transphosphorylationNature Chemical Biology 16:267–277.https://doi.org/10.1038/s41589-019-0455-7
-
Rigorous free energy simulations in virtual screeningJournal of Chemical Information and Modeling 60:4153–4169.https://doi.org/10.1021/acs.jcim.0c00116
-
Comprehensive analysis of kinase inhibitor selectivityNature Biotechnology 29:1046–1051.https://doi.org/10.1038/nbt.1990
-
Elucidating the energetics of entropically driven protein-ligand association: calculations of absolute binding free energy and entropyThe Journal of Physical Chemistry. B 115:11902–11910.https://doi.org/10.1021/jp204047b
-
Comparing alchemical and physical pathway methods for computing the absolute binding free energy of charged ligandsPhysical Chemistry Chemical Physics 20:17081–17092.https://doi.org/10.1039/c8cp01524d
-
The structural basis for control of eukaryotic protein kinasesAnnual Review of Biochemistry 81:587–613.https://doi.org/10.1146/annurev-biochem-052410-090317
-
Knowledge based prediction of ligand binding modes and rational inhibitor design for kinase drug discoveryJournal of Medicinal Chemistry 51:5149–5171.https://doi.org/10.1021/jm800475y
-
Alchemical free energy methods applied to complexes of the first bromodomain of BRD4Journal of Chemical Information and Modeling 62:1458–1470.https://doi.org/10.1021/acs.jcim.1c01229
-
Mi3-GPU: MCMC-based inverse Ising inference on GPUs for protein covariation analysisComputer Physics Communications 260:107312.https://doi.org/10.1016/j.cpc.2020.107312
-
Structural characterization of proline-rich tyrosine kinase 2 (PYK2) reveals a unique (DFG-out) conformation and enables inhibitor designJournal of Biological Chemistry 284:13193–13201.https://doi.org/10.1074/jbc.M809038200
-
Sequence determinants of a specific inactive protein kinase conformationChemistry & Biology 20:806–815.https://doi.org/10.1016/j.chembiol.2013.05.005
-
Addressing intersite coupling unlocks large combinatorial chemical spaces for alchemical free energy methodsJournal of Chemical Theory and Computation 18:2114–2123.https://doi.org/10.1021/acs.jctc.1c00948
-
Mutation effects predicted from sequence co-variationNature Biotechnology 35:128–135.https://doi.org/10.1038/nbt.3769
-
Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. parameterization and validationJournal of Computational Chemistry 23:1623–1641.https://doi.org/10.1002/jcc.10128
-
Alchemical absolute protein-ligand binding free energies for drug designChemical Science 12:13958–13971.https://doi.org/10.1039/d1sc03472c
-
Assessment of binding affinity via alchemical free-energy calculationsJournal of Chemical Information and Modeling 60:3120–3130.https://doi.org/10.1021/acs.jcim.0c00165
-
Alchemical binding free energy calculations in AMBER20: advances and best practices for drug discoveryJournal of Chemical Information and Modeling 60:5595–5623.https://doi.org/10.1021/acs.jcim.0c00613
-
Computational study of gleevec and G6G reveals molecular determinants of kinase inhibitor selectivityJournal of the American Chemical Society 136:14753–14762.https://doi.org/10.1021/ja504146x
-
A molecular gate which controls unnatural ATP analogue recognition by the tyrosine kinase v-srcBioorganic & Medicinal Chemistry 6:1219–1226.https://doi.org/10.1016/s0968-0896(98)00099-6
-
Rational design of inhibitors that bind to inactive kinase conformationsNature Chemical Biology 2:358–364.https://doi.org/10.1038/nchembio799
-
The different flexibility of c-src and c-abl kinases regulates the accessibility of a druggable inactive conformationJournal of the American Chemical Society 134:2496–2499.https://doi.org/10.1021/ja210751t
-
Towards a molecular understanding of the link between imatinib resistance and kinase conformational dynamicsPLOS Computational Biology 11:e1004578.https://doi.org/10.1371/journal.pcbi.1004578
-
BookInference of Direct Residue Contacts in Two-Component Signaling (1st ed)Elsevier Inc.
-
Ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99sbJournal of Chemical Theory and Computation 11:3696–3713.https://doi.org/10.1021/acs.jctc.5b00255
-
The protein kinase complement of the human genome - supplemental informationBulletin of the World Health Organization 73:7–14.
-
The generative capacity of probabilistic protein sequence modelsNature Communications 12:6302.https://doi.org/10.1038/s41467-021-26529-9
-
Interior and surface of monomeric proteinsJournal of Molecular Biology 196:641–656.https://doi.org/10.1016/0022-2836(87)90038-6
-
Tyrosine kinase signaling and the emergence of multicellularityBiochimica et Biophysica Acta 1823:1053–1057.https://doi.org/10.1016/j.bbamcr.2012.03.009
-
Src kinase activation: a switched electrostatic networkProtein Science 15:1051–1062.https://doi.org/10.1110/ps.051999206
-
Inhibition of P38 MAP kinase by utilizing a novel allosteric binding siteNature Structural Biology 9:268–272.https://doi.org/10.1038/nsb770
-
Absolute free energy of binding calculations for macrophage migration inhibitory factor in complex with a druglike inhibitorThe Journal of Physical Chemistry. B 123:8675–8685.https://doi.org/10.1021/acs.jpcb.9b07588
-
Structure-based virtual screening workflow to identify antivirals targeting HIV-1 capsidJournal of Computer-Aided Molecular Design 36:193–203.https://doi.org/10.1007/s10822-022-00446-5
-
Protein kinases: evolution of dynamic regulatory proteinsTrends in Biochemical Sciences 36:65–77.https://doi.org/10.1016/j.tibs.2010.09.006
-
KLIFS: a knowledge-based structural database to navigate kinase-ligand interaction spaceJournal of Medicinal Chemistry 57:249–277.https://doi.org/10.1021/jm400378w
-
Conformational analysis of the DFG-out kinase motif and biochemical profiling of structurally validated type II inhibitorsJournal of Medicinal Chemistry 58:466–479.https://doi.org/10.1021/jm501603h
-
Development and testing of a general amber force fieldJournal of Computational Chemistry 25:1157–1174.https://doi.org/10.1002/jcc.20035
-
Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force fieldJournal of the American Chemical Society 137:2695–2703.https://doi.org/10.1021/ja512751q
-
Through the “gatekeeper door”: exploiting the active kinase conformationJournal of Medicinal Chemistry 53:2681–2694.https://doi.org/10.1021/jm901443h
Article and author information
Author details
Funding
National Institutes of Health (R35-GM132090)
- Joan Gizzio
- Abhishek Thakur
- Allan Haldane
- Ronald M Levy
National Institutes of Health (OD020095)
- Joan Gizzio
- Abhishek Thakur
- Allan Haldane
- Ronald M Levy
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This research was supported by National Institutes of Health grant number R35-GM132090, and by NIH Computer Equipment Grant (OD020095). Gratitude is also expressed to the OWLSNEST high performance cluster at Temple University for its computing support in this project. We thank Shima Arasteh for helpful discussions related to kinase conformational states and alchemical free-energy simulations.
Copyright
© 2022, Gizzio, Thakur et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 2,150
- views
-
- 253
- downloads
-
- 16
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Computational and Systems Biology
- Microbiology and Infectious Disease
Timely and effective use of antimicrobial drugs can improve patient outcomes, as well as help safeguard against resistance development. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is currently routinely used in clinical diagnostics for rapid species identification. Mining additional data from said spectra in the form of antimicrobial resistance (AMR) profiles is, therefore, highly promising. Such AMR profiles could serve as a drop-in solution for drastically improving treatment efficiency, effectiveness, and costs. This study endeavors to develop the first machine learning models capable of predicting AMR profiles for the whole repertoire of species and drugs encountered in clinical microbiology. The resulting models can be interpreted as drug recommender systems for infectious diseases. We find that our dual-branch method delivers considerably higher performance compared to previous approaches. In addition, experiments show that the models can be efficiently fine-tuned to data from other clinical laboratories. MALDI-TOF-based AMR recommender systems can, hence, greatly extend the value of MALDI-TOF MS for clinical diagnostics. All code supporting this study is distributed on PyPI and is packaged at https://github.com/gdewael/maldi-nn.
-
- Computational and Systems Biology
- Genetics and Genomics
Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with often no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by the sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on the affinities to bind to promoters and enhancers we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation thus challenging the classical model of enhancer activity and propose a model of HOT locus formation based on the existence of large transcriptional condensates.