Introduction

Receptor Tyrosine Kinases (RTKs) are critical signaling molecules that activate and regulate cellular pathways. Disruption of typical RTK regulatory mechanisms through point mutations, gene amplification, protein fusions, or autocrine loops can drive the development, maintenance, and spread of cancers. Small molecule inhibitors are designed to disrupt aberrant signaling cascades by selectively targeting the kinase domain, with tyrosine kinase inhibitors (TKIs) like imatinib showing durable treatment outcomes (Cohen et al., 2021; Attwood et al., 2021). Inhibitors are generally designed against either a wild type kinase or a specific mutational profile, yet acquired mutations can alter sensitivity to different inhibitors and undermine efficacy. The most extreme case of this is resistance, which emerges in the treatment of many cancers by TKI selective pressure (Cohen et al., 2021; Attwood et al., 2021). These mutations may act by altering kinase stability, expression, conformation, or activity of the target kinase. Although several recurrent mutations at inhibitor-interacting positions have predictable resistance, specific and rare resistance mutations can be associated with the interactions or conformations unique to certain inhibitors.

An attractive strategy to counter resistance is optimizing the interactions that differ between inhibitors. Small-molecule kinase inhibitors fall into four distinct groups, characterized by their binding modality to the ATP pocket and conformational preferences (Arter et al., 2022; Attwood et al., 2021; Zuccotto et al., 2009). Among these groups, three are ATP-competitive: type I, type II, and type I½ (Figure 1A-C). Type I inhibitors occupy the adenosine binding pocket, form hydrogen bonds with "hinge" region residues, and favor an active conformation. Type II inhibitors also occupy the adenosine pocket but extend into an opening in the R-spine that is accessible in an inactive conformation (Arter et al., 2022) (Figure 1B). Type I½ inhibitors combine features from both type I and type II inhibitors, engaging with both the adenosine pocket and the R-spine pocket (Arter et al., 2022) (Figure 1B). Finally, type III inhibitors are allosteric, non-ATP competitive inhibitors (Arter et al., 2022) (Figure 1B). Given the chemical interaction differences and conformational preferences among TKI groups for distinct kinase states, a general approach to combating resistance is sequential treatment of type I and II inhibitors (Recondo et al., 2018). However, without understanding the potential sensitivity of an acquired resistance mutation to the subsequent inhibitor, the efficacy of such strategies is not guaranteed.

MET kinase inhibitor types and resistance mutations screened against a nearly comprehensive library of kinase domain substitutions.

(A) Crystal structure of the ATP-bound MET kinase domain (3DKC) overlaid with type Ia (crizotinib, 2WGJ), type Ib (savolitinib, 6SDE), type II (merestinib, 4EEV), type I½ (AMG-458, 5T3Q), and type III inhibitors (tivantinib, 3RHK). (B) Pocket view of ATP and each inhibitor type bound to the active site of the MET kinase domain with the respective inhibitor and crystal structures from panel A. (C) 2D chemical structures of each inhibitor screened against the site saturation mutagenesis library of the MET kinase domain, with each experimentally determined IC50 values displayed for Ba/F3 cells stably expressing the wild-type MET ICD in a TPR-fusion background. (D) Dose-response curves for each inhibitor against the wild-type MET intracellular domain expressed in a TPR-fusion in the Ba/F3 cell line. (E) Schematics of the full-length and exon 14 skipped MET receptor alongside the TPR-fusion constructs with the full-length and exon 14 skipped intracellular domain, displaying four mechanisms of oncogenic activity: point mutations, exon 14 skipping, constitutive activity through domain fusions, and inhibitor resistance mutations. (F) Experimental workflow for defining the mutational landscape of the wild-type TPR-MET and exon 14 skipped TPR-METΔEx14 intracellular domain against 11 ATP-competitive inhibitors in Ba/F3, interleukin-3 (IL-3) withdrawn pooled competition assay.

Structural inhibitor classification and dose-response determination.

(A) Type I (crizotinib, 2WGJ; tepotinib, 4R1V; capmatinib; savolitinib, 6SDE; NVP-BVU972, 3QTI) and type II (merestinib, 4EEV; cabozantinib; glesatinib) inhibitor-bound MET kinase domain structures globally aligned. Hinge (gray) and G1163 (represented as a sphere) are highlighted to show the kinase domain solvent-front relative to each inhibitor. Inhibitors lacking experimental structures (capmatinib, cabozantinib, glumetinib, and glesatinib) were docked onto a representative type I (PDB 2WGJ) and type II (4EEV) structure through AutoDock Vina (Eberheart et al., 2021; Trott et al., 2010). (B) Solvent-front and G1163 highlighted relative to the ATP-bound kinase domain crystal structure (3DKC) and all inhibitors screened. (C) Dose-response curves for each inhibitor against the TPR-fusion MET and METΔEx14 intracellular domains, stably expressed in Ba/F3 cells.

Correlation analysis of the MET kinase domain site saturation mutagenesis library across replicates and conditions.

Replicate correlation analysis for each inhibitor for both the TPR-fusion MET background scores with Enrich2.

Correlation analysis of the METΔEx14 kinase domain site saturation mutagenesis library across replicates and conditions.

Replicate correlation analysis for each inhibitor for the TPR-fusion METΔEx14 background score with Enrich2.

Fitness landscapes of the MET kinase domain against a panel of 11 inhibitors.

Heatmap for the DMSO control condition and all inhibitor fitness scores from Rosace, subtracted from DMSO for >99% of MET kinase domain variants in the full intracellular domain background in the context of the TPR-fusion.

The problem of which inhibitor to use and in what order is exemplified by the choice of inhibitors targeting MET kinase (Bahcall et al., 2016; Recondo et al., 2020; Fernandes et al., 2021). MET is an RTK and proto-oncogene that has been implicated in the pathogenesis of gastric, renal, colorectal, and lung cancers Frampton et al., 2015; Duplaquet et al., 2018; Wood et al., 2021; Lu et al., 2017). Molecular profiling and next-generation sequencing of tumor samples has provided insight on cancer-associated MET variants (Frampton et al., 2015; Bahcall et al., 2021). Clinical reports following post-treatment outcomes have documented recurrent resistance mutations at positions such as D1228, Y1230, G1163, L1195, H1094 for MET (Fernandes et al., 2021; Lu et al., 2017; Recondo et al., 2020). The challenge of acquired resistance following MET inhibitor therapy has been approached with strategies including sequential treatment of type I and type II TKIs (Bahcall et al., 2016; Recondo et al., 2020), and combination therapy with type I and type II TKIs (Bahcall et al., 2022; Fernandes et al., 2021; Smyth et al., 2022). However, without extensive documentation of the behavior of resistance and sensitizing mutations in MET towards specific TKIs, there remains a barrier towards leveraging inhibitors for specific mutational responses, optimizing inhibitor pairings, and informing rational drug design. Thousands of compounds have been screened against the MET kinase domain, and while several have undergone clinical trials, currently four MET inhibitors have received FDA approval: crizotinib, cabozantinib, tepotinib, and capmatinib (Santarpia et al., 2021; Wang and Lu, 2023). Nevertheless, the emergence of resistance not only limits the efficacy of these drugs but also poses challenges for second-line therapeutic strategies, particularly in the context of rare and novel mutations.

Previously, we used Deep Mutational Scanning (DMS), a pooled cellular selection experiment, to massively screen a library of nearly all possible MET kinase domain mutations. The juxtamembrane domain is encoded by Exon 14 in MET, serves an incompletely understood negative regulatory function in the kinase (Ma et al., 2003) and is recurrently excluded in cancer by somatically encoded exon skipping mutations. By testing this library in the context of a wild-type intracellular domain and the recurrent cancer exon 14 skipped variant (METΔEx14) (Figure 1E), we identified conserved regulatory motifs, interactions involving the juxtamembrane and ⍺C-helix, a critical β5 motif, clinically documented cancer mutations, and classified variants of unknown significance (Estevam et al., 2024). Understanding how these variants respond to specific inhibitors can inform therapeutic strategies, with precedent in inhibitor-based DMS studies across kinases such as ERK, CDK4/6, Src, EGFR, and others (Brenan et al., 2016; Persky et al., 2020; Chakraborty et al., 2023; An et al., 2023).

Here, we explore the landscape of TKI resistance of the MET kinase domain against a panel of 11 inhibitors, utilizing our previously established platform (Estevam et al., 2024). By profiling a near-comprehensive library of kinase domain variants in the MET and METΔEx14 intracellular domain, we captured a diverse range of effects based on inhibitor chemistry and ‘type’ classifications (Figure 1). Within our screen, mutations that confer resistance and offer differential sensitivities across inhibitors were identified, which can be leveraged in sequential or combination therapy. We use Rosace, a Bayesian fitness scoring framework, to reduce false discovery rates in mutational scoring and allow for post-processing normalization of inhibitor treatments (Rao et al., 2024). With our dataset, we have analyzed differential sensitivities to inhibitor pairs and provided a platform for assessing inhibitor efficacy based on mutational sensitivity and likelihood. Lastly, we augment a protein language model (Rives et al., 2021; Brandes et al., 2023; Chen and Guestrin, 2016) with biophysical and chemical features to improve predictions for MET inhibitor datasets, and in the future more effectively learn and predict mutational fitness towards novel inhibitors.

Results

Measuring the mutational fitness of 5,764 MET kinase domain variants against ATP-competitive inhibitors

To evaluate the response of MET mutations to different inhibitors, we selected six type I inhibitors (crizotinib, capmatinib, tepotinib, glumetinib, savolitinib, and NVP-BVU972), three type II inhibitors (cabozantinib, glesatinib analog, merestinib), and a proposed type III inhibitor, tivantinib (Figure 1C). Type I MET inhibitors leverage pi-stacking interactions with Y1230 and salt-bridge formation between D1228 and K1110, and are further classified as type Ia or Ib based on whether they interact with solvent front residue G1163 (Figure 1C) (Cui 2014; Fujino et al., 2019; Wang et al., 2023). Here we specifically define type Ia inhibitors as having a solvent-front interaction (Cui 2014), which structurally classifies tepotinib and capmatinib as type Ia based on our analysis and inhibitor docking models (Figure 1A-C), despite classification as type Ib in other studies (Brazel et al., 2022; Fujino et al., 2022; Recondo et al., 2020) (Figure 1- figure supplement 1).

As in our previous work, we employed the Ba/F3 cell line as our selection system due to its undetectable expression of endogenous RTKs and addiction to exogenous interleukin-3 (IL-3) (Estevam et al., 2024). These properties allow for positive selection based on ectopically expressed kinase activity and proliferation in the absence of IL-3 (Daley and Baltimore, 1988; Warmuth et al., 2007; Koga et al., 2022). We used a TPR-MET fusion to generate IL-3 independent constitutive activity (Estevam et al., 2024). While this system affords cytoplasmic expression, constitutive oligomerization, and HGF-independent activation, features like membrane-proximal effects are lost (Cooper et al., 1984; Park et al., 1986; Peschard et al., 2001; Rodrigues and Park, 1993; Vigna et al., 1999; Mak et al., 2007; Pal et al., 2017; Lu et al., 2017; Fujino et al., 2019). The constitutive activity of TPR-MET and the reliance of Ba/F3 cells on that activity render this selection system quite sensitive for determining reduction in growth from small molecule inhibition.

We generated dose response curves for each inhibitor against wild-type TPR-MET (wild-type intracellular domain, including exon 14 that contains the CBL docking site) and TPR-METΔEx14 (exon 14 skipped intracellular domain) constructs, stably expressed in Ba/F3 cells to determine IC50 values for our system (Figure 1D; Figure 1-figure supplement 1). We used our previously published library harboring >99% of all possible 5,764 kinase domain (1059-1345aa) mutations in a TPR-fusion background carrying either a wild-type MET or exon 14 skipped intracellular domain (Figure 1E) (Estevam et al., 2024). Time points were selected every two cell doublings over the course of four time points, and cells were split and maintained in the absence of IL-3 and drugs at IC50 for each inhibitor, including a DMSO control. All samples, across all time points and replicates, were prepared for next-generation sequencing (NGS) in parallel, and sequenced on the same Illumina NovaSeq 6000 flow cell to identify variant frequencies (Figure 1F). We then calculated variant fitness scores using Rosace (Rao et al., 2024) (Figure 1F; Figure 1-figure supplement 1-4). We performed parallel analysis of the TPR-MET and TPR-METΔEx14 screens; however, we focus our analysis below on TPR-MET with the parallel and largely consistent analyses of TPR-METΔEx14 available in the supplement.

Defining the mutational landscape of resistant and sensitizing mutations for the MET kinase domain

While growth rates were experimentally controlled through equipotent dosing during selection, there was no direct way to validate this post-processing. To generate meaningful comparisons between inhibitor scores and conditions, in addition to performing downstream score subtractions, we normalized cell growth rates for each inhibitor to the growth rate observed for the DMSO population. As expected, the DMSO control population displayed a bimodal distribution with mutations exhibiting wild-type fitness centered around 0, with a wider distribution of mutations that exhibited loss -or gain-of-function effects (Figure 2A). Also as expected, inhibitor-treated populations displayed distributions with a loss-of-function peak, representative of mutations that are sensitive to the inhibitor. Unlike DMSO, inhibitor populations were right-skewed, showing greater enrichment of gain-of-function scores at the positive tail of distributions (Figure 2A). These population differences were exemplified by the low correlation between each inhibitor and DMSO, with capmatinib showing the greatest difference from DMSO with a Pearson’s correlation of 0.45, and tivantinib standing as an outlier with a Pearson’s correlation of 0.93 (Figure 2B).

Mutational landscape of the MET kinase domain under 11 ATP-competitive inhibitor selection.

(A) Distributions of all variants (wild-type synonymous, early stop, and missense) for each condition in the wild-type TPR-MET kinase domain, scored with Rosace and normalized to the growth rate of the DMSO control population. (B) Correlation plots for all mutational fitness scores for each drug against DMSO, fitted with a linear regression and Pearson’s R value displayed. (C) Heatmap showing the Pearson’s R correlation for each condition against each other, annotated by condition and inhibitor type. Correlations are colored according to a scale bar from gray to blue (low to high correlation). (D) Crystal structure of the tivantinib-bound MET kinase domain (PDB 3RHK) overlaid with the ATP-bound kinase domain (PDB 3DKC), with tivantinib-stabilizing residues and overlapping density of tivantinib (orange) and ATP (purple) highlighted. (E) Dose responses of crizotinib and tivantinib tested against stable Ba/F3 cells expressing the wild-type intracellular domain of MET fused to TPR, tested in the presence and absence of interleukin-3 (IL-3).

Mutational landscape of the METΔEx14 kinase domain under 11 ATP-competitive inhibitor selection.

(A) Distributions of all variants (wild-type synonymous, early stop, and missense) for each condition, scored with Rosace and normalized to the growth rate of the DMSO control population. (B) Correlation plots for all mutational fitness scores for each drug against DMSO, fitted with a linear regression and Pearson’s R value displayed. (C) Heatmap showing the Pearson’s R correlation for each condition against each other, annotated by condition and inhibitor type. Correlations are colored according to a scale bar from gray to blue (low to high correlation).

In comparing all conditions to each other, we were able to further capture differences between and within inhibitor types. Type II inhibitors displayed the greatest similarities to one another, with merestinib and the glesatinib analog having the highest correlation (r = 0.93) and cabozantinib and glesatinib analog showing the lowest (r = 0.87) (Figure 2C). While type I inhibitors were also highly correlated, capmatinib stood out as an outlier, displaying the lowest correlations potentially due to difficulty in experimental overdosing due to its greater potency (Bahcall et al., 2022; Fujino et al., 2019) (Figure 2C). While there was only one type I½ inhibitor, AMG-458, it displayed higher similarity to type II inhibitors than to type I, likely due to similar type II back pocket interactions with the kinase R-spine (Figure 1B; Figure 2C). Nevertheless, AMG-458 was most distinct from cabozantinib (r = 0.83) and type I inhibitors tepotinib (r = 0.79) and savolitinib (r = 0.73) (Figure 2C). Between type I and type II groups, with the exception of capmatinib, tepotinib and savolitinib showed the lowest correlation with merestinib and glesatinib analog (Figure 2C).

The strong correlation of tivantinib with the DMSO control (r = 0.93) and low correlation with all other inhibitors suggested a MET-independent mode of action. Until recently (Michaelides et al, 2023), tivantinib was considered the only type III MET-inhibitor and showed promising early clinical trial results (Eathiraj et al., 2011; Wang et al., 2022). In vitro assays on the purified MET kinase domain have shown that tivantinib has the potential to hinder catalytic activity (Munshi et al., 2010) and structural studies revealed that it selectively targets an inactive DFG-motif conformation, with tivantinib stabilizing residues (F1089, R1227) blocking ATP binding (Figure 2D) (Eathiraj et al., 2011). Yet in contradiction, comparative MET-dependent and MET-independent cell-based studies on tivantinib have also shown MET agnostic anti-tumor activity, posing that tivantinib may have an alternative inhibitory mechanism than a MET-selective one (Michieli and Di Nicolantonio, 2013; Basilico et al., 2013; Katayama et al., 2013; Fujino et al., 2019).

Therefore, to test the hypothesis that tivantinib is not MET-selective in our system, we compared the dose response of tivantinib and crizotinib in the presence and absence of IL-3 for wild-type TPR-MET, stably expressed in Ba/F3 cells (Figure 2E). As expected, crizotinib only displayed an inhibitory effect under IL-3 withdrawal, highlighting a MET-dependent mode of action. In contrast, tivantinib displayed equivalent inhibition regardless of IL-3, reinforcing that tivantinib has cytotoxicity effects unrelated to MET inhibition (Figure 2E) and underscores the sensitivity of the DMS in identifying direct protein-drug effects.

Crizotinib-MET kinase domain resistance profiles exemplify the information accessible from individual inhibitor DMS

As an example of the insights that can be learned from the inhibitor DMS screens, we examined the profile for crizotinib, one of four FDA approved inhibitors for MET and a multitarget TKI (Cui et al., 2011; Wang and Lu, 2023; Santarpia et al., 2021). To identify mutations that show gain-of-function and loss-of-function behaviors specific to inhibitors compared to DMSO, we subtracted DMSO from all fitness scores. (Figure 3A), with the expectation that effects related to expression or stability would be similar in both conditions, enhancing the ability to identify drug sensitivity or resistance. Indeed, the highest frequency of gain-of-function mutations occurred at residues mediating direct drug-protein interactions, such as D1228, Y1230, and G1163. These sites, and many of the individual mutations, have been noted previously in clinical reports (Fernandes et al., 2021; Yao et al., 2023; Tovar and Graveel, 2017; Tiedt et al., 2011). Beyond these well characterized sites, regions with sensitivity occurred throughout the kinase, primarily in loop-regions which have the greatest mutational tolerance in DMSO, but do not provide a growth advantage in the presence of an inhibitor (Figure 1-figure supplement 1; Figure 1-figure supplement 2).

Novel resistance mutations identified and mapped for crizotinib.

(A) Heatmap of crizotinib fitness scores subtracted from DMSO, scaled from loss-of-function (red) to gain-of-function (blue), with the wild-type protein sequence, secondary structure, kinase domain residue position, and mutational substitution annotated. Wild-type synonymous substitutions are outlined in green. (B) Resistance positions mapped onto the crizotinib-bound, MET crystal structure (PDB 2WGJ). Positions are labeled and colors are scaled according to the average score for the resistance mutations at each site. (C) 2D protein-drug interactions between crizotinib and the MET kinase domain (PDB 2WGJ) with pocket residues and polar and pi interactions annotated. Schematic generated through PoseEdit (Diedrich et al., 2023) (https://proteins.plus/). (D) Condensed crizotinib heatmap displaying direct drug-protein interacting and non-direct resistance position. (E) Crizotinib binding site and pocket residues displayed with resistance positions highlighted (pink) and the wild-type residue and inhibitor interactions shown (PDB 2WGJ). (F) Resistance mutations modeled for I1084H, V1092I, Y1159R, M1211Y, and N1167 relative to ATP (PDB 3DKC).

In mapping positions with resistance to the crizotinib-bound kinase domain crystal structure (PDB 2WGJ), our DMS results further emphasize the emergence of resistance mutations at the ATP-binding site and direct-protein drug interacting residues (Figure 3B-D). Mutations to the hinge position, Y1159, and C-spine residues, including M1211 and V1092, introduce charge or are predicted to change the conformation of the pocket to clash with crizotinib but not ATP (Figure 3E-G). Outside of direct drug-protein interactions, positions I1084, T1261, Y1093, and G1242 displayed the largest resistance signals (Figure 3A-D). Structurally, I1084 is located in β1 at the roof of the ATP-binding pocket. The R-group of a His mutation at this position clashes with crizotinib’s hinge-binding and solvent-front moieties but does not interfere with bound-ATP (Figure 3E-F). Y1093 is also at the roof of the ATP-binding site, residing in β2 (Figure 3B). However, its structural influence on resistance is unobvious. In all rotameric states, the R-group of Y1093 points away from the catalytic site and does not clash with crizotinib. We speculate that Y1093 mutations may negatively impact crizotinib’s stability in the catalytic site compared to ATP, as ATP’s triphosphate groups are stabilized by the P-loop. Therefore, comparing crizotinib to DMSO highlights both known hotspots and rarer sites like I1084 and Y1093, which may contribute to resistance through conformational changes rather than disrupting direct inhibitor-protein contacts. Specific inhibitor fitness landscapes also aid in identifying mutations with potential drug sensitivity, such as R1086 and C1091 in the MET P-loop, which might represent attractive interacting sites for further drug design with a lower probability of acquiring resistance.

We identified unique resistance mutations enriched at the ATP-binding site across all inhibitors, yet also noticed discernible differences between type I and II inhibitors, the R-spine, and ⍺C-helix (Figure 4A-G). Mapping inhibitor-specific positions and mutational scores, not only provides a mutation-level breakdown of inhibitor contributions to common resistance mutations, but also demonstrates differences in structural resistance enrichment across specific inhibitors (Figure 4A-G). To summarize this information, we next examined trends by grouping inhibitors by type.

Resistance mutations mapped onto experimental and docked kinase domain structures for type I and type II inhibitors.

(A-F) Resistance positions and average resistance mutational score mapped onto representative crystal structures (tepotinib, 4R1V; merestinib, 4EEV) and labeled (type I, pink; type II, blue). Inhibitors lacking experimental structures (capmatinib, cabozantinib, glumetinib, and glesatinib) were docked onto a representative type I (PDB 2WGJ) and type II (4EEV) structure through AutoDock Vina (Eberheart et al., 2021; Trott et al., 2010). (G) Heatmaps of each resistance position structurally modeled for each drug with fitness scores scaled from loss-of-function (red) to gain-of-function (blue). Wild-type synonymous substitutions are outlined in green.

Novel resistance mutations identified for type I, type II, and type I ½ inhibitors

To reveal patterns across different inhibitor types, we filtered resistance mutations by their score and test statistics (Figure 5-figure supplement 1) and collapsed the information by inhibitor type, plotting the total frequency of resistance mutations at each position (Figure 5A). In this condensed heatmap, several common resistance positions emerged within and across inhibitor types to provide a broad view of “hotspots.” Two positions stood out with the highest frequency of resistance: G1163 and D1228 (Figure 5A-D). Both sites are unsurprising due to their inhibitor interactions-G1163 is at the solvent front entrance of the active site and D1228 stabilizes an inactive conformation of the A-loop with an inhibitor bound (Cui 2014; Recondo et al., 2020; Fernandes et al., 2021). Located at the base of the active site, M1211 is a previously documented resistance site (Tidet et al., 2011) and a C-spine residue (Estevam et al., 2024), which harbors a smaller number of resistance mutations for all inhibitor types within our analysis (Figure 5A). In contrast to these universal sites, Y1230 was a hotspot for type I and I ½ inhibitors, but not a major resistance site for type II inhibitors (Figure 5A-D). This specificity can be rationalized based on the role of Y1230 in stabilizing inhibitors through pi-stacking interactions (Cui 2014). In contrast, F1200 and L1195 (Bahcall et al., 2022; Recondo et al., 2020), are both hotspots for type II but not type I inhibitors (Figure 5A-C). Again, this effect can be rationalized structurally: both residues make direct contact with type II inhibitors, but not type I inhibitors.

Novel resistance mutations and “hotspots” identified for MET inhibitor types.

(A) Collapsed heatmap of common resistance positions along the kinase domain, with the wild-type protein sequence and secondary structure annotated. Each tile represents a sum of counts for statistically filtered resistance mutations across all inhibitors for type I (pink), type II (blue), and the type I½ inhibitor AMG-458 (green), with the scale bar reflecting counts of resistance mutations across respective inhibitor types. (B-D) Expanded heatmap showing each resistance position and the counts for each specific resistance mutation across all inhibitor types type I (pink), type II (blue), and the type I½ inhibitor AMG-458 (green). Wild-type sequence and variant change are annotated. (E-F) Average frequency of resistance mutations for each mapped on to a representative type I (crizotinib, 2WGJ) and type II (merestinib, 4EEV) crystal structure, alongside the type I½, AMG-458 structure (5T3Q), with associated scale bars. Individual positions with high resistance mutation frequencies are annotated on each structure, with a zoom-in of the bound inhibitor and surrounding resistance sites. (G) Venn diagram showing mutations shared among type I (pink), type II (light blue), and type I½ (green). (H) Structurally mapped (PDB 2WGJ) resistance positions shared among type I, II, I½ (blue-gray), type I and II (purple), type I and I½ (dusty rose), type II and I½ (teal) inhibitors.

Statistically filtered resistance mutations for grouped type I, type II, and type I½ inhibitors for MET.

(A-C) Heatmaps of the sum of resistance mutations grouped for type I (pink), type II (blue), and type I½ (green) for MET.

Across all inhibitor types, there were a total of 17 shared variants with G1163, D1228, and M1211 being the most common (Figure 5G). The overall spatial pattern of mutations for each inhibitor type follows general principles that are expected based on their interactions. For example, I1084 is enriched as a resistance site for type I inhibitors, consistent with previous studies in hereditary papillary renal cell carcinomas (Guérin et al., 2023). I1084 is located at the solvent front of the phosphate-loop (P-loop) of the kinase N-lobe (Figure 1A), which is responsible for stabilization of the ATP phosphate groups. This region of the kinase is leveraged for interactions with type I, but not type II inhibitors. In contrast, L1142, an R-spine residue, and L1140, which sits at the back of the ATP-binding pocket, are enriched for type II inhibitor resistance, consistent with their spatial locations (Figure 5E-F). Resistance mutations tend to cluster around the catalytic site across all types, but the shared mutations across different inhibitor types did not display a unifying pattern that evokes a simple rule for combining or sequencing inhibitors to counter resistance (Figure 5H).

Differential sensitivities of the MET kinase domain to type I and type II inhibitors

Strategies aimed at preventing resistance, such as sequential or combination dosing of type I and type II inhibitors, have been explored and offer promise in preventing resistance (Bahcall et al., 2016; Recondo et al., 2020; Bahcall et al., 2022; Fernandes et al., 2021). However, the efficacy of these strategies is limited to the emergence of secondary resistance mutations, and specific inhibitor pairings are further limited to case examples of disparate effects. Using our DMS datasets, we sought to identify inhibitor pairings with the largest divergence in cross-sensitivity. By comparing the fitness landscape of each type I inhibitor to each type II inhibitor, we could again assess inhibitor response likeness based on correlations (Figure 2A; Figure 6-figure supplement 1). Type I and type II pairs with the highest correlations included capmatinib and glesatinib analog (r = 0.92), suggesting a large overlapping fitness profile, in contrast to pairs with the lower correlations, like savolitinib and merestinib (r = 0.7) (Figure 6A; Figure 6-figure supplement 1). Overall, cabozantinib maintained the lowest average correlation with all type I inhibitors, making it the most divergent type II inhibitor within our screen, and potentially offering the least overlap in resistance (Figure 6A; Figure 6-figure supplement 1).

MET kinase domain differential sensitivities revealed for type I and type II inhibitors.

(A) Heatmap showing Pearson correlation values for all combinations of screened type I and type II inhibitors. Correlations were determined from DMSO subtracted fitness scores (B) Correlation plot correlation plot of DMSO subtracted fitness scores for crizotinib and cabozantinib. Mutations with differential scores are highlighted for type I (pink) and type II (blue). (C) Average scores of mutations with differential sensitivities within inhibitor pairs mapped and annotated in respective crystal structures (crizotinib, 2WGJ; cabozantinib, docked into 4EEV). Positions that are gain-of-function for type I but loss-of-function in type II are highlighted in pink, whereas positions that are gain-of-function for type II but loss-of-function in type I are highlighted in blue. (D) Dose-response curves for crizotinib and cabozantinib in Ba/F3 cells expressing TPR-MET (full MET intracellular domain) harboring mutations at Y1093K and L1195M.

Cross-comparison of type I and type II inhibitor pairs.

Scatter plots of each type II inhibitor fitness scores (cabozantinib, glesatinib analog, merestinib; axis in blue) against each type I inhibitor (crizotinib, capmatinib, tepotinib, glumetinib, savolitinib, NVP-BVU972; axis in pink).

Cross-comparison analysis of inhibitors within the same type.

Scatter plots of each inhibitor pair within the type II group (cabozantinib, glesatinib analog, merestinib; axes in blue) and within the type I group (crizotinib, capmatinib, tepotinib, glumetinib, savolitinib, NVP-BVU972; axes in pink).

To narrow our characterization of cross-sensitivity, we focused on the inhibitor pair crizotinib and cabozantinib (Figure 6B; Figure 6-figure supplement 1). By statistically filtering mutations that are categorized as gain-of-function in one inhibitor, but loss-of-function in the other, a set of 44 mutations were identified as having crizotinib resistance and cabozantinib sensitivity, and 3 mutations with the opposite profile (Figure 6B,C). Structural mapping of divergent mutations further revealed enrichment at the N-lobe and typical protein-drug interaction sites like Y1230, G1163, and M1211 (Figure 6B, C). While these positions have precedence for resistance, as previously noted, they are also resistance hotspots across all inhibitor types (Figure 5A), where even mutations with differential sensitivities may be insufficient targets to counteract the reemergence of resistance, thus limiting the interchangeability of drugs.

Understanding which mutations have resistance profiles for only type I or type II inhibitors provides better leverage for sequential and combination dosing. To identify such mutations across our dataset, we further filtered variants that met our resistance metrics and were only observed for inhibitors of the same type. In again comparing crizotinib to cabozantinib, Y1093K was a mutation with one of the largest differences between crizotinib and cabozantinib, having a gain-of-function profile for crizotinib and loss-of-function for cabozantinib (Figure 6B). Interestingly, Y1093 is located in β2 of the N-lobe, at the roof of the ATP-binding site, and does not directly engage with crizotinib. We speculate this mutation potentially contributes to resistance by perturbing the packing of β1-β2 and altering the conformation of the ATP binding site in a manner that destabilizes crizotinib binding. When comparing the dose-response of Y1093K to the wild-type TPR-MET kinase domain, Y1093K shows a nearly 10-fold shift in crizotinib sensitivity with no difference in cabozantinib sensitivity (Figure 6D). In identifying mutations with the opposite profiles, resistance to cabozantinib and sensitivity to crizotinib, L1195M displayed the greatest differential scores (Figure 6B). L1195 is an ⍺E-helix position with previously recorded resistance (L1195V/F), which our analysis further supports as a type II-only resistance hotspot (Figure 5A). Structurally, mutations like Met or Phe at 1195 clash with the fluorophenyl moiety of cabozantinib used to access and stabilize a deep, back pocket of the kinase in an inactive conformation, unlike crizotinib which occupies the solvent front and adenosine binding region of the ATP binding site. In comparing the dose-response of L1195M to the wild-type TPR-MET kinase domain, we find that L1195M is refractory to all concentrations of cabozantinib tested, but still sensitive to crizotinib (Figure 6D). Beyond a type I and type II pairing, such cross-resistance identification can be further applied to identify differential sensitivities within an inhibitor group (Figure 6-figure supplement 2), which can further expand opportunities for inhibitor-specific sensitivity in therapy and drug design.

Identification of biophysical contributors to inhibitor-specific fitness landscapes using machine learning

Machine learning models originally developed for predicting protein structure (Jumper et al., 2021; Rives et al., 2021; Lin et al., 2023) have been adapted for predicting protein-ligand complexes (Bryant et al., 2023), and predicting fitness values from DMS studies (Meier et al., 2021; Brandes et al., 2023; Jones et al., 2020). In particular, protein language models have shown the ability to estimate the functional effects of sequence variants in correlation with DMS data (Rives et al., 2021; Meier et al., 2021; Brandes et al., 2023). We observed that ESM-1b, a protein language model (Rives et al., 2021), predicts the fitness of variants in the untreated/DMSO condition (correlation 0.50) much better than it does for the inhibitor treated datasets (correlation 0.28). This difference in predictive ability is likely because the language model is trained on sequences in the evolutionary record and fitness in the presence of inhibitors does not reflect a pressure that has operated on evolutionary timescales.

To overcome this limitation and improve the predictive properties of the ESM approach, we sought to augment the model with additional features that reflect the interactions between protein and inhibitors that are not present in the evolutionary record (Chen and Guestrin, 2016). There are several challenges associated with this task, including the narrow sequence space explored, high correlations between datasets, and the limited chemical space explored by the 11 inhibitors. We used an XGBoost regressor framework and designed a test-train-validation strategy to account for these issues (Figure 7A), exploring many features representing conformation, stability, inhibitor-mutation distance, and inhibitor chemical information (Figure 7-Figure Supplement 1). To avoid overfitting, we introduced several constraints on the monotonicity and the precision of certain features. The final model uses a subset of the features we tested and improves the performance from 0.28 to 0.37 (Figure 7B,C). The model primarily improves the correlation by shifting the distribution of predicted fitness values to center around drug sensitivity, reflecting the pressures that are not accounted for by ESM-1b (Figure 7D). Nonetheless, many resistant mutations are correctly predicted by the new model.

Inhibitor-bound variant fitness predicted from a machine learning model trained on the MET DMS dataset.

(A) Model architecture outlining the information flow and inputs for model training, validation, fitness predictions, and prediction tests. (B) Improvement in correlation between experimental and predicted fitness for each inhibitor with usage of different kinds of features. (C) Cross-validation trends between the baseline ESM model and the model with all features incorporated. (D) Scatter plots of predictions versus experimental fitness scores of the baseline ESM model (top) compared to the model with all features (bottom), with a dashed cross-graph line in red displayed. (E) Residue-level analysis of feature significance in fitness predictions (ESM, stability, distance, conformation, all features). The Rosace experimental score is shown as a red line. (F) Residues with improved predictions mapped on a crizotinib-bound MET kinase domain (PDB 2WGJ). Predicted resistance mutations (dark purple) modeled relative to the wild-type residue (pink).

Distribution and visualization of features used in the XGBoost machine learning models.

(A) Distribution of ESM LLR vs. Experimental fitness (top) and ΔΔG vs. Experimental fitness (bottom). (B) Distribution of all features (except ESM LLR and ΔΔG) extracted and used for the XGBoost models. The features that were incorporated in the best performing model are shown in yellow. The red dashed lines within each distribution show the edges of bins used to bin the feature values. (C) ΔΔΔG calculated from predicted ΔΔG Type I (PDB 2WGJ) (left) and Type II (PDB 4EEV) (right) MET kinase structure by subtracting type II ΔΔG from type I ΔΔG. The key regions showing difference in conformation between type I and II structures are the DFG motif (purple) and aC helix (teal). (D) Calculation of “residue to ATP” distance feature for residue D1228 in ATP bound MET Kinase structure (3DKC) is shown. Centroid of the ATP molecule is shown as a pink sphere. (E) Example of ΔVolume feature calculation using the difference between the volume of Asp and Cys. (F) Ensemble of MET kinase domain crystal structures aligned and RMSF of a given residue (D1228 in this example). (G) The shortest distance between the inhibitor and a mutation calculated from the Umol predicted variant-inhibitor structure. (H) The binding pocket of crizotinib in the predicted Umol structure. Pocket volume, hydrophobicity score, polarity score and RF score are calculated from this binding site. (I) Residue RMSD feature is described by the Umol predicted structure of variant D1228C (pink) superposed onto the wild-type reference structure (PDB 2WGJ, gray) and RMSD between D1228 in the wild-type structure and 1228C in the variant structure. (J) Ligand RMSD feature The Umol predicted structure of variant D1228C (pink) superposed onto the wild-type reference structure (PDB 2WGJ, gray) and RMSD between crizotinib in the wild-type structure (pink) and in the variant structure (blue).

To examine whether the model could help interpret the mechanisms of specific mutations, we examined several cases with notable improved predictions as the model increased in complexity (Figure 7E,F). For some mutations, as in Y1230D, we observe a gradual improvement in prediction for each set of features, suggesting that resistance relies on multiple factors. For other mutations, such as N1167K, we see a single set of features driving the improvement, which suggests much more dominant driving forces. Lastly, in other mutations, like G1290D, the models trained with different features can over or under predict the true value, demonstrating the value of combining features together. The reliance on simple features helps identify some of the major factors in drug resistance and sensitization such as distance to the inhibitor and active/inactive conformation; however, improved feature engineering and coverage of both sequence and chemical space will likely be needed to create a more interpretable model.

Discussion

Tyrosine kinase inhibitors have revolutionized the treatment of many diseases, but the development of resistance creates a significant challenge for long term efficacy. Many strategies, including sequential dosing (Attwood et al., 2021; Recondo et al., 2020), are being explored to overcome resistance. Our DMS of the MET receptor tyrosine kinase domain, performed against a panel of varying inhibitors offers a framework for experimentally identifying resistance and sensitizing mutations in an activated kinase context for different inhibitors. By massively screening the effect of a nearly comprehensive library of amino acid mutations in the MET kinase domain against 11 inhibitors, some generalizable patterns emerged. In concordance with the binding mode of both type I and II inhibitors, residues that commonly confer resistance, or act as “hotspots,” were mapped to previously reported sites like D1228, Y1230, M1211, G1163, and novel sites like I1084, L1140, L1142, T1261, and L1272. Annotation of hotspots also offers an opportunity to inform inhibitor selection based on likelihood of cross-inhibitor resistance (Figure 5). For instance, I1084 is a hotspot for type I and II inhibitors within our study that displayed wild-type sensitivity to the type I½ inhibitor screened (Figure 5). Understanding positions with high resistance frequencies that are distal from the ATP-binding site also offers a design opportunity for allosteric inhibitors that can target cancer-associated and resistance-associated regions within the N- and C-lobe (Mingione et al., 2023).

Nevertheless, similar to its ability in identifying resistance for inhibitors, our parallel DMS also demonstrated the ability to detect non-selective drugs, with the example of tivantinib. Despite being a proposed MET-selective inhibitor, like several others, tivantinib failed clinical trials, and follow-up studies suggested cytotoxicity and off-target binding as the culprit (Michieli and Di Nicolantonio, 2013; Basilico et al., 2013; Katayama et al., 2013; Fujino et al., 2019)-a scenario that is not uncommon to antitumor drugs that do not advance to the clinic (Lin et al., 2019). To this effect, the ability of DMS to differentiate between selective compounds provides a unique prospect for developing inverse structure−activity relationships, whereby varying protein sequence both inhibitor specificity and resistance can be learned.

Reported cancer mutations in databases such as OncoKB or cBioPortal are useful for patient data and cancer type reporting (Suehnholz et al., 2024; Chakravarty et al., 2017; Cerami et al., 2012). A recent analysis of these databases aided in annotation of mutations observed within patient populations (Pecci et al., 2024). This study used pre-clinical models to examine a subset of these mutations and identified sensitivities to multiple inhibitors and confirmed clinical responses of two rare driver mutations (H1094Y and F1200I) to elzovantinib, a type Ib inhibitor (Pecci et al., 2024). Their results are consistent with the predictions of our DMS, illustrating the potential value of having a broad dictionary of inhibitor sensitivity and resistance patterns.

Finally, a significant challenge of inhibitor screening is the considerable time and cost involved, even at high-throughput. While docking has accelerated the prioritization of compounds for protein targeting and screening in silico (Sadybekov and Katritch, 2023), prediction of drug resistance is of high interest in informing iterative drug design. Screening for resistance in the early stages of drug design is particularly useful for obtaining inhibitors that can be effective in the long-term by optimizing protein-inhibitor interactions in the wildtype and functionally silent mutant context (Pisa and Kapoor, 2020). While base-editor approaches can rapidly screen for inhibitor resistance mutations within full-length, endogenous genes, undersampling of rare variants due to lower coverage is a significant caveat (Dorighi et al., 2024), compared to DMS where nearly full coverage is achieved and controlled. A full landscape of mutational effects can help to predict drug response and guide small molecule design to counteract acquired resistance. The ideas motivating our ML-model, which combines protein language models and biophysical/chemical features, to novel inhibitors could eventually be used to profile resistance and sensitivity for novel and unscreened small molecules, greatly extending the scale of kinase inhibitor repositioning for second-line therapies.

Materials and methods

Mammalian cell culturing

Ba/F3 cells (DSMZ) were maintained and passaged in 90% RPMI (Gibco), 10% HI-FBS (Gibco), 1% penicillin/streptomycin (Gibco), and 10ng/ml IL-3 (Fisher), and incubated at 37°C with 5% CO2. Cells were passaged at or below 1.0e6 cells/ml to avoid acquired IL-3 resistance, and regularly checked for IL-3 dependence by performing 3x PBS (Gibco) washes and outgrowth in the absence of IL-3.

Plat-E cells stably expressing retroviral envelope and packaging plasmids were originally gifted by Dr. Wendell Lim, and maintained in 90% DMEM, HEPES (Gibco), 10% HI-FBS (Gibco), 1% penicillin/streptomycin (Gibco), 10ug/ml blasticidin, 1ug/ml puromycin. Cells were cultured at 37°C with 5% CO2 and maintained under blasticidin and puromycin antibiotic pressure unless being transfected.

Dose response and IC50 determination of inhibitors

Unless otherwise stated, all inhibitors used in this study were purchased from SelleckChem.

Ba/F3 cells stably expressing TPR-MET and TPR-METΔEx14 were washed with DPBS (Gibco) 3x times to remove IL-3, puromycin, penicillin, and streptomycin. Cells were resuspended in 90% RPMI and 10% FBS, and were seeded in the wells of a 96-well, round-bottom plate at a density of 2.5e4 cells/ml in 200ul. Cells were incubated for 24hr to allow kinase-driven signaling. The next day, inhibitors were added to triplicate rows of cells at a concentration range of 0 to 10uM (2-fold dilutions), and allowed to incubate for 72 hr post TKI addition. CellTiter-Glo reagent (Promega) was mixed at a 1:1 ratio with cells to lyse and provide a luminescence readout, which was measured on a Veritas luminometer. Cell numbers were determined from a Ba/F3 cell and ATP standard curve generated according to the manufacturer’s instructions. Dose response curves were fitted using GraphPad Prism with the log(inhibitor) vs. response, variable slope function. Data are presented as cell viability normalized to the fold change from the TKI free control.

MET kinase domain variant library generation, cloning, and library introduction into Ba/F3

In this study, we repurposed cell lines transduced with TPR-MET and TPR-METΔEx14 kinase domain variant libraries, previously reported in Estevam et al., 2024. All libraries were generated, transfected, and tested in parallel.

In short, the MET kinase domain sequence used in this study spans amino acid positions 1059-1345, which includes the full kinase domain (aa 1071-1345) and a small region of the juxtamembrane (aa 1059-1070). The variant DNA library was synthesized by Twist Bioscience, containing one mammalian high-usage codon per amino acid. A “fill-in” library was generated to introduce an early stop control codon every 11 amino acids evenly spaced across the sequence. In addition, mutations at positions with failed synthesis (positions 1194 and 1278) were generated and added at equimolar concentration into the variant library. The kinase domain variant library was introduced into two different cloning backbones, one carrying the TPR-fusion sequence with the wild-type juxtamembrane sequence (aa 963-1058), wild-type C-terminal tail (aa 1346-1390), and IRES-EGFP (pUC19_kozak-TPR-METΔEx14-IRES-EGFP) and the other carrying the TPR-fusion sequence with an exon 14 skipped juxtamembrane sequence (aa 1010-1058), wild-type C-terminal tail (aa 1346-1390), and IRES-mCherry (pUC19_kozak-TPR-MET-IRES-mCherry). The libraries were transformed in MegaX 10 beta cells (Invitrogen), propagated in 50mL LB and Carbinacillin at 37°C to an OD of 0.5, and then midiprepped (Zymo). Library coverage was determined by colony count of serial dilutions from the recovery plated at varying dilutions (1:100, 1:1k, 1:10k, 1:100k, 1:1M).

The full TPR-METΔEx14-IRES-EGFP and TPR-MET-IRES-mCherry variant libraries were then shuttled into the mammalian retroviral MSCV backbone (addgene) through restriction enzyme digest with MluI-HF (NEB) and MfeI-HF (NEB), then ligated into the empty backbone with T4 ligase (NEB). Ligations were DNA cleaned (Zymo), electroporated into ElectroMAX Stbl4 Competent Cells (Thermo Fisher), plated on LB-agar bioassay plates with Carbenicillin, incubated at 37°C, then colonies were scraped into 50mL LB and midi-prepped for transfections (Zymo).

Variant libraries were transfected into Plat-E cells for retroviral packaging using Lipofectamine3000 (Invitrogen) following the manufacturer’s for a T-175 scale, and using a total of 46 μg DNA. 48 hr post-transfection, the viral supernatant was harvested, passed through a 0.45 μm sterile filter, then concentrated with Retro-X concentrator (TakaraBio) using a 1:4 ratio of concentrator to supernatant. The concentrated virus was titered in Ba/F3 to determine the proper volume for a transduction MOI of 0.1-0.3. The viral titer was calculated from the percent of fluorescent cells and viral dilution. To generate the DMS transduced cell lines, 6 million cells were spinfected at an MOI of 0.1, in triplicate. Then infected cells were selected with 1μg/ml puromycin in 4 days, with fluorescence and cell counts tracked each day.

DMS time point selection and sample preparation

All screening conditions were performed and handled in parallel for TPR-MET and TPR-METΔEx14 libraries across all independent conditions and biological replicates.

For each biological replicate, a stock of 4.0e6 cells transduced with TPR-MET and TPR-METΔEx14 kinase domain variants was thawed and expanded for 48 hrs in the presence of IL-3 and puromycin to prevent pre-TKI selection to reach a density for screen seeding. Each batch of cells were grown to a density of 72 million cells to be split into 12 dishes (15cm) for each selection condition. Cells were first washed with DPBS (Gibco) 3 times to remove IL-3 and antibiotics. Cells were resuspended in 90% RPMI and 10% FBS, counted, and split across 12 dishes (15cm) at a density of 6 million cells in 30mL. A total of 6 million cells from each replicate was harvested and pelleted at 250xg to serve as the “time point 0” pre-selection sample (T0).

To begin selection of each replicate for each library, DMSO was added to the control plate (0.01% final) while the appropriate IC50 concentration of inhibitor was added to each respective plate (independent pool of cells). Three time points post T0 were collected for each library replicate and inhibitor condition for a total of 4 time points (T0, T1, T2, T3). Time points were harvested every two doublings (∼72hr) across 12 days; 6 million cells were harvested for each condition and pelleted at 250xg for 5min; 2.0e5 cells/ml were split at every time point and maintained either in DMSO or TKI at the appropriate concentration to maintain cellular growth rates under inhibitor selection.

The gDNA of each time point sample was isolated with the TakaraBio NucleoSpin Blood QuickPure kit the same day the cells were harvested. gDNA was eluted in 50μl of elution buffer provided by the kit, using the high concentration and high yield elution manufacturer’s protocol. Immediately after gDNA was isolated, 5μg of gDNA was used for PCR amplification of the target MET KD gene to achieve the proper variant coverage. A 150μl PCR master mix was prepared for each sample using the TakaraBio PrimeStar GXL system according to the following recipe: 30μl 5X PrimeStar GXL buffer, 4.5μl 10μM forward primer (0.3uM final), 4.5μl 10μM reverse primer (0.3uM final), 5μg gDNA, 12μl 10mM dNTPs (2.5mM each NTP), 6μl GXL polymerase, nuclease free water to a final reaction volume of 150uL. The PCR master mix for each sample was split into three PCR tubes with 50μl volumes for each condition and amplified with the following thermocycler parameters: initial denaturation at 98°C for 30 s, followed by 24x cycles of denaturation at 98°C for 10 s, annealing at 60°C for 15 s, extension at 68°C for 14 s, and a final extension at 68°C for 1 min.

PCR samples were stored at-20°C until all time points and replicates were harvested and amplified, so as to prepare all final samples for NGS together with the same handling and sequence them in the same pool to prevent sequencing bias.

Library preparation and Next-generation sequencing

After all time points were selected, harvested, and PCR amplified, the target gene amplicon was isolated from gDNA by gel purification (Zymo), for a total of 222 samples. The entire 150μl PCR reaction for each sample was mixed with 1X NEB Purple Loading Dye (6X stock) and run on a 0.8% agarose, 1X TBE gel, at 100 mA until there was clear ladder separation and distinct amplicon bands. The target amplicons were gel excised and purified with the Zymo Gel DNA Recovery kit. To remove excess agarose contamination, each sample was then further cleaned using the Zymo DNA Clean and Concentrator-5 kit and eluted in nuclease free water. Amplicon DNA concentrations were then determined by Qubit dsDNA HS assay (Invitrogen).

Libraries were then prepared for deep sequencing using the Nextera XT DNA Library Prep kit in a 96-well plate format (Illumina). Manufacturer’s instructions were followed for each step: tagmentation, indexing and amplification, and clean up. Libraries were indexed using the IDT for Nextera Unique Dual Indexes Set A,B and C (Illumina). Then, indexed libraries were quantified using the Agilent TapeStation with HS D5000 screen tape (Agilent) and reagents (Agilent). DNA concentrations were further confirmed with a Qubit dsDNA HS assay (Invitrogen). All samples were manually normalized and pooled at 10nM (MET and METΔEx14 in the same pool). The library was then paired-end sequenced (SP300) on two lanes of a NovaSeq6000.

MET kinase domain variant analysis and scoring

Enrich2 Scoring

Our approach followed the one used for our initial MET DMS experiments [cite Met 1]. Sequencing files were obtained from the sequencing core as demultiplexed fastq.gz files. The reads were first filtered for contamination and adapters using BBDuk, then the paired reads were error-corrected and merged with BBMerge and mapped to the reference sequence using BBMap (all from BBTools) [Bushnell, B. (2014). BBTools software package.]. Read consequences were determined and counted using the AnalyzeSaturationMutagenesis tool in GATK v4 (van der Auwera and O’Connor, 2020). This is further processed by use of a script to filter out any variants that are not expected to be in the library (i.e., variants due to errors in sequencing, amplification, etc). The final processed count files were then analyzed with Enrich2, using weighted least squares and normalizing to wildtype sequences (SRA 28784151).

Rosace scoring

We used Rosace to analyze experiments of different conditions (DMSO or inhibitors) independently. In order to make the scores more comparable and interpretable across conditions, we modified the original Rosace software so that the output scores reflect the scale of cell doubling rate between every contiguous time point. For example, in an ideal experiment, if the wild-type cell doubling rate is 2 and its score is 0 by wild-type normalization, a score of-2 means that the cells are not growing (2^(2-2)) and a score of 1 means that the cells are doubling three times (increasing to 2^(2+1) times the original count) between every contiguous time point.

The input to Rosace is the filtered count files provided by the AnalyzeSaturationMutagenesis tool described in the above section. From here, we filtered variants by the mean count (>= 4) and the proportion of 0 count across replicates and time points (<= 10/12) and before inhibitor selection at T0 (<= 2/3). Second, we normalized the counts using wild-type normalization with log2 transformation rather than the default natural log transformation to maintain the doubling rate scale. Finally, the normalized count is regressed on time intervals (t = {1, 2, 3, 4}) instead of the entire time span (t = {1/4, 2/4, 3/4, 4/4}) so that the resulting score reflects the growth rate between every contiguous time point.

Statistical filtering and resistance classification

In mathematical terms, we define the raw Rosace fitness scores of a mutation in DMSO as βv,DMSOand in a specific inhibitor condition as βv,inh. The scores of wildtype variants were normalized to 0, and we denote them as βwt,DMSO= 0 and βwt,inh= 0. Growth rate of wild-type cells under different inhibitor selections were controlled to be identical (two doublings between every time point), so even though raw Rosace scores are computed independently per condition, βv,inh are directly comparable between inhibitor conditions.

Within one condition (DMSO or inhibitor), according to convention, we call variants with βv,inh≫ 0 or βv,DMSO≫ 0 “gain-of-function” and βv,inh≪ 0 or βv,DMSO≪ 0 “loss-of-function”. With scores from multiple conditions, we presented three types of filtering strategies and produced the following classification: inhibitor-specific "resistance mutation", inhibitor-specific "resistance position", and "loss-of-function" and "gain-of-function" mutation in the context of growth rate differential with and without an inhibitor.

We stress the different interpretations of “gain-of-function” and “loss-of-function” labels. Within one condition, this label is a general term to describe whether the function of protein is perturbed by the mutation. In contrast, the latter describes the difference with and without a given inhibitor, canceling effects of folding, expression, and stability and targeting only the inhibitor sensitivity function of the protein.

A “resistance mutation” is specific to a certain inhibitor, and it satisfies the chained inequality βv,inh≫ βwt,inh= βwt,DMSO≥ βv,DMSO. The first inequality specifies that the growth rate of a resistant mutation needs to be much larger than that of the wild-type in the presence of the inhibitor, and we used the one-sided statistical test βv,inh > 0. 5 with the test statistics cutoff 0.1. The second inequality specifies that in DMSO, the growth rate of that mutation is equal to or lower than that of the wild-type, ensuring that the resistance behavior we see is specific to that inhibitor, not that it grows faster under every condition, and thus we used the effect size cutoff βv,DMSO≤ 0.

A “resistance position” is a position that contains at least one “resistance mutation” to a certain inhibitor.

In the context of growth rate differential with and without an inhibitor, a mutation is “gain-of-function” if it has a higher growth rate in the presence of the inhibitor than in its absence, which is one feature of “resistance mutation”. It is “loss-of-function” if it grows faster in the absence of the inhibitor. To label the mutations accordingly, we first computed a recentered Rosace score for each mutation under inhibitor selection γv,inh= βv,inh − βv,DMSO, and define “gain-of-function” γv,inh > 0.75 and "loss-of function" γv,inh< 0 in the differential sensitivity analysis.

Machine learning modeling

Feature selection for the machine learning model

Interpretable features of the MET sequence variants and inhibitors were carefully chosen to be incrementally added to a model. To extract structural features from inhibitor bound mutant complexes, we used Umol to predict the structures of all the MET kinase variants bound to each of the inhibitors (Bryant et al. 2023). The input to Umol is the MET kinase variant sequence, SMILES string of the inhibitor and list of residues lining the putative binding pocket. The predicted complexes (MET kinase bound to inhibitor) were relaxed using OpenMM (Eastman et al. 2013). To ensure the inhibitor in the predicted structures are in the same pose as compared to reference structures, we tethered the predicted inhibitor structure to the reference pose using a modified version of the script available in https://github.com/Discngine/rdkit_tethered_minimization. The reference pose for crizotinib, NVP-BVU972, Merestenib and Savolitinib were taken from the corresponding crystal structures in the PDB-2WGJ, 3QTI, 4EEV and 6SDE, while the reference pose for cabozantinib, capmantinib, glumetinib and glesatinib analog were taken from the structures docked using Autodock Vina (see Kinase domain structural analysis). Following this, the tethered inhibitors were redocked back to the predicted variant structures using Autodock Vina (Eberhardt et al. 2021). We also extracted features from wild-type MET kinase structures. The features could be broadly classified into four categories: inhibitor, stability, distance, conformation and inhibitor binding. Apart from these, ESM Log Likelihood Ratio was used as a feature in all models that we trained. Each of the feature categories that we explored and the rationale behind choosing them are explained below:

  • (1)ESM Log Likelihood Ratio (ESM LLR)

    ESM1b is an unsupervised protein language model trained on a large set of protein sequences from UniProt that has successfully learned protein fitness patterns (Rives et al, 2021; Lin et al., 2023). By including a mask token at a given position in the sequence, the log-likelihoods of all amino acid substitutions can be extracted from the model. The ratio between ESM1b log-likelihoods for the mutant and wildtype amino acids provides a score that indicates the fitness of each variant in the mutational scan, with log-likelihood ratios having precedent as a variant predictor (Rives et al, 2021; Lin et al., 2023). The predictions used here were obtained using esm-variants webserver (https://huggingface.co/spaces/ntranoslab/esm_variants) (Brandes et al., 2023).

  • (2)Inhibitor features:

    • Inhibitor molecular weight: We calculated the molecular weight of each inhibitor as a feature.

    • Ligand RMSD: We structurally superposed the predicted variant structure onto the corresponding wildtype structure and calculated the RMSD between the predicted, re-docked inhibitor and the reference inhibitor structure (Figure 7-figure supplement 1J)

  • (3)Stability features:

    • ΔΔΔG and ΔΔG: Because inhibitor types are largely distinguished based on binding configuration, we reasoned that the difference in stability contributed by each mutation between given binding states (e.g. Type I bound state vs. a Type II bound state) could contribute to the success of the predictor. To compute the stability difference, we used structural representatives for type-I bound (2WGJ) and type II bound (4EEV) MET kinase and calculated the change in free energy (ΔΔG) of every possible mutation at every position using ThermoMPNN (Dieckhaus et al., 2023). The difference in ΔΔG between type-I bound and type-II bound structures (ΔΔΔG) for every variant was added as a feature to the XGBoost model to capture the difference in stabilization from the mutation in the Type I or Type II bound state (Figure 7-figure supplement 1C). We also used the predicted ΔΔG score of the corresponding inhibitor type-bound structure directly as a feature. For instance, if the input data corresponds to a mutation to Alanine at position 1065 in the presence of glumetinib (a type I inhibitor), the difference between ΔΔG predicted for the 1065A variant for the type-I bound (2WGJ) and for type II bound (4EEV) structure is used as a feature (ΔΔΔG). The ΔΔG predicted for the 1065A variant for the type-I bound (2WGJ) structure is also used as a feature.

  • (4)Distance features:

    • Residue to ATP distance: Proximity to the ATP-binding site indicates the ability of the given residue to influence inhibitor binding given that Type I and Type II inhibitors are ATP competitive. To include this feature, the distance between C-alpha residue atoms and the centroid of bound ATP in a representative structure (PDB 3DKC) was calculated and the distance corresponding to each position was added as a feature (Figure 7-figure supplement 1D).

    • Inhibitor distance: This is the shortest distance between the inhibitor and mutated residue in the predicted variant-inhibitor complexes. (Figure 7-figure supplement 1G).

  • (5)Conformational features:

    • MET crystal structure RMSF: The extent of flexibility at the mutation position could be significantly affected by the mutation, which in turn can affect the function of the variant. To account for this, we utilized the structural information abundantly available for MET kinases in PDB. We structurally aligned all crystal structures of human MET kinases with resolution better than 3Å (81 structures) using mTM-align (Dong et al., 2018) and calculated the Root Mean Squared Fluctuation at every residue position using Prody (Zhang et al., 2021) (Figure 7-figure supplement 1F).

    • Residue RMSD: We structurally superposed the predicted variant structure onto the corresponding wildtype structure and calculated the RMSD between the mutant and wildtype residue at the mutation position (Figure 7-figure supplement 1I)

  • (6)Inhibitor Binding features:

    • RF-Score: To quantify the binding strength between the inhibitor and the variant protein structure, we calculated the RF-score, which is a random forest-based approach to predict protein-ligand binding affinity (Wójcikowski et al., 2017)

    • Pocket volume, hydrophobicity score and polarity score: Changes to the binding pocket in terms of volume and hydrophobicity due to mutations could affect the interaction and binding between the inhibitor and variant. These effects were brought in as features into the model by calculating the binding pocket volume, hydrophobicity score and polarity score of the binding pocket using fpocket (Le Guilloux et al., 2009) (Figure 7-figure supplement 1H).

This category of features are not part of the best performing model shown in Figure 7.

Apart from these categories, we calculated the difference in volume between the wildtype and mutated residue at a given position and added it as a feature (ΔVolume) since residue volume changes upon mutation could contribute to steric hindrance (Figure 7-figure supplement 1E). This feature is also not part of the best performing model. This led to a total of 14 interpretable features to evaluate our models on. We trained and tested a total of 8192 models by considering all possible numbers and combinations of these features (keeping ESM LLR as a constant feature in all models). The hyperparameter tuning, cross-validation, training and testing of each of these models are described in detail below.

Training and Selecting the Predictive Model

An XGBoost regressor model, which is a gradient boosting method based on decision trees as the base learner (Chen and Guestrin, 2016), was used to predict DMS fitness scores in presence of inhibitors. Given the relatively small dataset we are using here, the models are prone to overfitting. Hence, we used monotonic constraints on features that had a monotonic relationship with the experimental fitness scores. ESM LLR score and ΔΔG have a positive and negative correlation with the experimental fitness scores respectively (Figure 7-figure supplement 1A). Therefore, ESM LLR was constrained positively and ΔΔG was constrained negatively by assigning 1 and-1 respectively to the ‘monotone_constraints’ parameter in Python XGBoost. This ensures that the monotonic relationship between the input feature and the target value is maintained during predictions.To further prevent overfitting, we binned the values of the 12 remaining into four or five bins and assigned the median of the bin as their value. The bins were chosen such that one or two bins would contain the majority of feature values. The distribution of these twelve features are shown in Figure 7-figure supplement 1B. The bins of each feature are shown as red dashed lines on the histograms. Model performance was evaluated using Pearson’s R and mean squared error (MSE).

Experimental fitness scores of MET variants in the presence of DMSO and AMG458 were ignored in model training and testing since having just one set of data for a type I ½ inhibitor and DMSO leads to learning by simply memorizing the inhibitor type, without generalizability. The remaining dataset was split into training and test sets to further avoid overfitting (Figure 7A). The following data points were held out for testing-(a) all mutations in the presence of one type I (crizotinib) and one type II (glesatinib analog) inhibitor, (b) 20% of randomly chosen positions (columns) and (c) all mutations in two randomly selected amino acids (rows) (e.g. all mutations to Phe, Ser). After splitting the dataset into train and test sets, the train set was used for XGBoost hyperparameter tuning and cross-validation. For tuning the hyperparameters of each of the XGBoost models, we held out 20% of randomly sampled data points in the training set and used the remaining 80% data for Bayesian hyperparameter optimization of the models with Optuna (Akiba et al., 2019), with an objective to minimize the mean squared error between the fitness predictions on 20% held out split and the corresponding experimental fitness scores. The following hyperparameters were sampled and tuned: type of booster (booster-gbtree or dart), maximum tree depth (max_depth), number of trees (n_estimators), learning rate (eta), minimum leaf split loss (gamma), subsample ratio of columns when constructing each tree (colsample_bytree), L1 and L2 regularization terms (alpha and beta) and tree growth policy (grow_policy-depthwise or lossguide). After identifying the best combination of hyperparameters for each of the models, we performed 10-fold cross validation (with re-sampling) of the models on the full training set. The training set consists of data points corresponding to 230 positions and 18 amino acids. We split these into 10 parts such that each part corresponds to data from 23 positions and 2 amino acids. Then, at each of 10 iterations of cross-validation, models were trained on 9 of 10 parts (207 positions and 16 amino acids) and evaluated on the 1 held out part (23 positions and 2 amino acids). Through this protocol we ensure that we evaluate performance of the models with different subsets of positions and amino acids. The average Pearson correlation and mean squared error of the models from these 10 iterations were calculated and the best performing model out of 8192 models was chosen as the one with the highest cross-validation correlation. The final XGBoost models were obtained by training on the full training set and also used to obtain the fitness score predictions for the validation and test sets. These predictions were used to calculate the inhibitor-wise correlations shown in Figure 7B.

Kinase domain structural analysis

Unless otherwise stated, all structural analysis was performed on PyMOL. Structural mapping incorporated tools from the Bio3D bioinformatics package in R (Grant et al., 2006). Inhibitors that lacked an experimental crystal structure were docked into a representative type I (2WGJ) or type II (4EEV) structure with AutoDock Vina (Eberhardt et al., 2021). Existing ligands in both the structures were removed in silico and the proteins prepared for docking using AutoDockTools by adding polar hydrogens and Kollman charges. The inhibitors were also prepared using AutoDockTools by adding polar hydrogens and charges and identifying rotatable torsions. A grid box which dictates the search space for the docking tool was defined approximately around the region where the existing ligands in 2WGJ and 4EEV were bound. The energy range and exhaustiveness of docking was set to 3 and 8 respectively. AutoDock Vina was made to output 5 modes for each ligand. Capmatenib and glumetinib (type I inhibitors) were docked on to 2WGJ and glesatinib analog and cabozantinib (type II inhibitors) were docked on to 4EEV.

Acknowledgements

Sequencing was performed at the UCSF CAT, supported by UCSF PBBR, RRP IMIA, and NIH 1S10OD028511-01 grants. This work was supported by NIH CA239604 to EAC, NJ, JSF; HHMI Hanna Gray Fellowship and UCSF QBI Fellow program to WCM; NIH R01LM013434 to JAC; NIH GM145238 and the UCSF Program for Breakthrough Biomedical Research, funded in part by the Sandler Foundation, to JSF.

Additional information

Author contributions

Conceptualization JSF, NJ, EAC; Methodology, GOE, EML, JR, CBM; Formal analysis, GOE; Writing – Original Draft, GOE and JSF; Writing – Editing, GOE, JSF, EML, NJ, EAC; Validation, GOE, EML; Machine learning, AR, KC; Funding Acquisition, JSF, NJ, EAC; Resources, JSF, NJ, EA, WCM, HP; Supervision, JSF, NJ, EAC, WMC, EML, HP, JAC.

Competing Interests

JSF is a consultant for, has equity in, and receives research support from Relay Therapeutics and is a consultant for Octant Bio. N.J. is a founder of Rezo Therapeutics and a shareholder of Rezo Therapeutics, Sudo Therapeutics, and type6 Therapeutics. N.J. is a SAB member of Sudo Therapeutics, type6 Therapeutic and NIBR Oncology. The Jura laboratory has received sponsored research support from Genentech, Rezo Therapeutics and type6 Therapeutics. E.A.C. is a consultant at IHP Therapeutics, Valar Labs, Tatara Therapeutics and Pear Diagnostics, reports receiving commercial research grants from Pfizer, and has stock ownership in Tatara Therapeutics, HDT Bio, Clara Health, Aqtual, and Guardant Health.

Code and data availability

The sequencing data has been deposited at the NCBI SRA (28784151). Original data files, analysis, and source code is available at https://github.com/fraser-lab/MET_kinase_Inhibitor_DMS