High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display

  1. Allyson Li
  2. Rashmi Voleti
  3. Minhee Lee
  4. Dejan Gagoski
  5. Neel H Shah (corresponding author)
  1. Department of Chemistry, Columbia University, United States
  2. Department of Biological Sciences, Columbia University, United States

Abstract

Tyrosine kinases and SH2 (phosphotyrosine recognition) domains have binding specificities that depend on the amino acid sequence surrounding the target (phospho)tyrosine residue. Although the preferred recognition motifs of many kinases and SH2 domains are known, we lack a quantitative description of sequence specificity that could guide predictions about signaling pathways or be used to design sequences for biomedical applications. Here, we present a platform that combines genetically encoded peptide libraries and deep sequencing to profile sequence recognition by tyrosine kinases and SH2 domains. We screened several tyrosine kinases against a million-peptide random library and used the resulting profiles to design high-activity sequences. We also screened several kinases against a library containing thousands of human proteome-derived peptides and their naturally-occurring variants. These screens recapitulated independently measured phosphorylation rates and revealed hundreds of phosphosite-proximal mutations that impact phosphosite recognition by tyrosine kinases. We extended this platform to the analysis of SH2 domains and showed that screens could predict relative binding affinities. Finally, we expanded our method to assess the impact of non-canonical and post-translationally modified amino acids on sequence recognition. This specificity profiling platform will shed new light on phosphotyrosine signaling and could readily be adapted to other protein modification/recognition domains.

Editor's evaluation

This paper reports an improved bacterial surface peptide display technology and its use to survey the primary sequence specificities of a broad range of tyrosine kinases and to assess the effects of naturally-occurring positional variations around sites of tyrosine phosphorylation on the efficiency of phosphorylation. The versatility of this approach was demonstrated by using expanded genetic code technology to investigate the consequences of installing post-translationally modified amino acids, such as acetyl-lysine, at positions upstream and downstream of a target tyrosine on the efficiency of phosphorylation by different tyrosine kinases. In addition, pre-phosphorylated surface peptide display libraries were exploited to interrogate the primary sequence binding specificities of SH2 phosphotyrosine-binding domains.

https://doi.org/10.7554/eLife.82345.sa0

Introduction

Cells respond to external stimuli by activating a finely-tuned cascade of enzymatic reactions and protein-protein interactions. This signal transduction is governed, in large part, by post-translational modifications that alter protein activity, stability, and localization, as well as the formation of higher-order macromolecular complexes. Despite its low abundance relative to serine and threonine phosphorylation, tyrosine phosphorylation is an essential post-translational modification in metazoans (Lim and Pawson, 2010). Tyrosine kinases, the enzymes that phosphorylate tyrosine residues on proteins, and Src homology 2 (SH2) domains, protein modules that bind tyrosine-phosphorylated sequences, must have the ability to discriminate among a myriad of potential phosphorylation sites (phosphosites) in the proteome, in order to ensure proper signal transduction. The preferential engagement of specific phosphosites by tyrosine kinases and SH2 domains is dependent on the amino acid sequence surrounding the tyrosine or phosphotyrosine residue (Songyang et al., 1995; Songyang et al., 1993).

Isolated tyrosine kinase domains most efficiently engage phosphosites that conform to specific sequence motifs, which are defined by a small number of key residues that contribute significantly to recognition (Songyang et al., 1995). These motifs suggest a mechanism by which a specific set of phosphosites in a proteome is selectively engaged by an individual kinase, based on the presence of favorable sequence features around that site. Negative selection of specific sequence features can also play a role in kinase specificity (Alexander et al., 2011). For example, the T cell tyrosine kinase ZAP-70 cannot readily phosphorylate co-localized proteins that contain even a modest positive charge (Shah et al., 2016).

Phosphosite sequence recognition by kinase domains is just one mechanism of substrate selection for tyrosine kinases, and other interactions are necessary to achieve efficient substrate targeting in vivo. Binding domains, such as SH2 domains, can strongly influence specificity by localizing kinases to the vicinity of phosphorylation targets (Pawson and Nash, 2000). Secondary interactions between SH2 and kinase domains can also refine the substrate preferences of a tyrosine kinase by stabilizing its active state (Filippakopoulos et al., 2008). Thus, for signaling systems that involve a tyrosine kinase domain and a tethered SH2 domain, the sequence specificities of both domains contribute to the intricate control of phosphotyrosine signaling responses.

Many methods have been developed to characterize sequence recognition by tyrosine kinases and SH2 domains. The most prominent approach employs purified kinases/SH2 domains and oriented peptide libraries, which are synthetic, degenerate peptide libraries with a central tyrosine or phosphotyrosine residue (Songyang et al., 1995; Songyang et al., 1993). Several variations on this technique have been reported to improve the throughput and quantification of sequence preferences (Deng et al., 2014; Huang et al., 2008; Hutti et al., 2004; Mok et al., 2010). Notably, this method is also applicable to serine/threonine kinases, and large swaths of the yeast and human kinomes have been characterized using oriented peptide libraries, providing significant insights into kinase-substrate recognition and phospho-signaling (Deng et al., 2014; Johnson et al., 2023; Mok et al., 2010; Songyang et al., 1995). Oriented peptide library screens have primarily been useful for determining the preference for each amino acid at a given position, independent of sequence context, but evidence suggests that some amino acid preferences may depend on the surrounding sequence (Cantor et al., 2018).

Several groups have developed strategies to compare the phosphorylation of specific sequences, rather than obtain position-averaged amino acid preferences from pooled degenerate libraries. Strategies include ‘one-bead-one-peptide’ combinatorial libraries (Imhof et al., 2006; Ren et al., 2011; Sweeney et al., 2005; Trinh et al., 2013; Wavreille et al., 2007) and protein/peptide microarrays (Amanchy et al., 2008; Jones et al., 2006; Koytiger et al., 2013; Mok et al., 2009; Schutkowski et al., 2004; Uttamchandani et al., 2003). One-bead-one-peptide methods often require manual isolation and individual sequencing of positive (phosphorylated or SH2-bound) beads, making the method technically challenging. Microarrays offer the capacity to analyze thousands of discrete sequences and require small quantities of proteins, but their use can be limited by the high cost of reagents. As an alternative, several groups have conducted mass spectrometry proteomics on heterologously expressed purified peptide libraries, kinase-treated cell extracts, and cells over-expressing a kinase of interest (Barber et al., 2018; Chou et al., 2012; Corwin et al., 2017; Douglass et al., 2012; Finneran et al., 2020; Imamura et al., 2014; Kettenbach et al., 2012; Lubner et al., 2018; Sugiyama et al., 2019; Xue et al., 2012). This strategy has enabled the identification of potential substrates and can also be used to infer position-specific amino acid preferences. Studies using intact proteomes have the added benefit that the kinase of interest is operating on intact proteins, rather than isolated peptides, but interpretation of the results can be convoluted by the presence of endogenous kinases.

Molecular display techniques, such as mRNA, phage, yeast, and bacterial display, have also been used for specificity profiling. Early investigations employed phage or mRNA display to profile tyrosine kinase and SH2 specificity. These methods were relatively low-throughput, as they relied on Sanger sequencing of individual clones (Cujec et al., 2002; Dente et al., 1997). The advent of deep sequencing technologies has transformed this style of specificity profiling, by enabling rapid, quantitative analysis of library composition without requiring the sequencing of individual clones. This was demonstrated recently in a series of studies that employed bacterial/yeast peptide display, fluorescence-activated cell sorting (FACS), and deep sequencing to profile tyrosine kinase and SH2 domain specificity (Cantor et al., 2018; Lo et al., 2019; Shah et al., 2018; Shah et al., 2016; Taft et al., 2019). A key facet of these investigations was the facile generation of peptide libraries tailored to specific mechanistic questions: these included scanning mutagenesis libraries derived from individual substrates (Shah et al., 2016), as well as diverse peptide libraries encoding known phosphosites in the human proteome (Shah et al., 2018).

In this report, we describe a high-throughput platform to profile the recognition of large peptide libraries by any tyrosine kinase or SH2 domain. Our approach uses biotinylated bait proteins (pan-phosphotyrosine antibodies or SH2 domains) and avidin-functionalized magnetic beads to isolate tyrosine kinase-phosphorylated bacterial cells, and is coupled to deep sequencing for a quantitative readout (Figure 1A). The use of magnetic bead-based separation, rather than FACS, permits simultaneous, benchtop processing of multiple samples and enables the analysis of larger libraries in less time and at lower cost. Libraries can be custom-made for specific readouts: mutational scanning for structure-activity relationships, libraries derived from natural proteomes to answer specific signaling questions, or degenerate libraries for the generation of predictive models.

Figure 1 (with 3 supplements)
High-throughput profiling of tyrosine kinase substrate specificity using bacterial peptide display.

(A) Schematic representation of the workflow for kinase specificity profiling. (B) Heatmap depicting the specificity of the c-Src kinase domain, measured using the X5-Y-X5 library. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). Values in the heatmap are the average of three replicates. (C) Correlation between position-specific amino acid enrichments from screens with the 4G10 Platinum and PY20 biotinylated pan-phosphotyrosine antibodies.

To demonstrate the versatility of our approach, we designed two new bacterial peptide display libraries that provide distinct insights into tyrosine kinase and SH2 sequence recognition. The first library contains 10^6–10^7 random 11-residue sequences with a central tyrosine (referred to as the X5-Y-X5 library). Screens with the X5-Y-X5 library recapitulate previously reported specificity motifs and can be used to generate highly efficient peptide substrates. The second library contains defined sequences spanning roughly 3000 human tyrosine phosphorylation sites, along with roughly 5000 variant sequences bearing disease-associated mutations and natural polymorphisms (referred to as the pTyr-Var library). Kinase and SH2 screens with the pTyr-Var library reveal hundreds of phosphosite-proximal mutations that significantly impact phosphosite recognition by individual protein domains. These datasets will be a valuable resource in the growing efforts to understand the functional impact of protein variants across the human population that may contribute to disease (Stein et al., 2019). Finally, we show that our peptide display platform is compatible with Amber codon suppression, enabling analysis of how non-canonical or post-translationally modified amino acids impact sequence recognition. Overall, the method described in this report provides an accessible, high-throughput platform to study the specificity of phosphotyrosine signaling proteins.

Results and discussion

A bacterial display and deep sequencing platform to screen tyrosine kinases against large peptide libraries

We expanded upon a previously established screening platform that combines bacterial display of genetically encoded peptide libraries and deep sequencing to quantitatively compare phosphorylation efficiencies across a substrate library (Shah et al., 2016). In the published approach, peptides are displayed on the surface of E. coli cells as fusions to an engineered bacterial surface-display protein, eCPX (Rice and Daugherty, 2008), then phosphorylated by a purified kinase (Henriques et al., 2013). Following this, the cells are labeled with a pan-phosphotyrosine antibody, and cells with high phosphorylation levels are separated by FACS. The DNA encoding the peptides is then amplified and analyzed by Illumina deep sequencing to determine the frequency of each peptide in the library before and after selection (Shah et al., 2018; Shah et al., 2016). In order to determine the phosphorylation efficiency of each peptide by a particular kinase, an enrichment score is determined by calculating the frequency of that peptide in the kinase-selected sample normalized to the frequency in the input sample.
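The enrichment calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name `enrichment_scores`, the toy peptides, and the read counts are all hypothetical, and the handling of peptides missing from the input is an assumption.

```python
import math

def enrichment_scores(input_counts, selected_counts):
    """Frequency of each peptide after selection, normalized to its input frequency."""
    input_total = sum(input_counts.values())
    selected_total = sum(selected_counts.values())
    scores = {}
    for peptide, n_input in input_counts.items():
        if n_input == 0:
            continue  # a ratio is undefined for peptides absent from the input
        f_input = n_input / input_total
        f_selected = selected_counts.get(peptide, 0) / selected_total
        scores[peptide] = f_selected / f_input
    return scores

# Hypothetical read counts for three 11-residue peptides (illustration only).
input_counts = {"AAAAAYAAAAA": 100, "EEEEIYGEFEA": 100, "KKKKKYKKKKK": 200}
selected_counts = {"AAAAAYAAAAA": 50, "EEEEIYGEFEA": 300, "KKKKKYKKKKK": 50}

scores = enrichment_scores(input_counts, selected_counts)
for peptide, score in scores.items():
    # Report the raw ratio and its log2 transform.
    print(peptide, round(score, 2), round(math.log2(score), 2))
```

A score above 1 (positive log2 value) indicates a peptide that became more frequent after kinase-dependent selection, and a score below 1 indicates depletion.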

While peptide libraries of virtually any composition can theoretically be screened using this approach, previous implementations focused on libraries containing fewer than 5000 peptides, due to the low throughput of FACS (Shah et al., 2018). In those experiments, the objective was to over-sample the library at the cell sorting step by a factor of 100–1000, to ensure that enrichment or depletion of every member of the library could be accurately quantified by deep sequencing. When multiple screens were conducted in parallel, the throughput of FACS limited experiments to small libraries (fewer than 5000 sequences). To improve the scalability and cost-effectiveness of this approach, we switched to a bead-based sorting method, using avidin-coated magnetic beads to enrich highly-phosphorylated cells, thus circumventing the need for FACS (Figure 1A). With this approach, the cells are instead labeled with biotinylated pan-phosphotyrosine antibodies and then sorted using magnetic beads. The use of magnetic beads permits simultaneous separation of multiple samples of virtually any size, enabling the analysis of larger libraries in less time and at lower cost. Notably, these screens can be carried out in any laboratory, without the need for a fluorescence-activated cell sorter.

To test our upgraded screening platform, we generated a random library of 11-residue sequences with a central tyrosine (the X5-Y-X5 library, where X is any of the 20 canonical amino acids). The library was generated using a degenerate synthetic oligonucleotide with five NNS codons (N=A,T,G,C and S=G,C) before and after the central codon that encodes for tyrosine (TAT). The NNS triplet has the benefit of encoding all 20 amino acids, but it can still contain an Amber stop codon (TAG) roughly 3% of the time. Therefore, up to 30% of the peptide-coding sequences in the library are expected to have an Amber stop codon – a feature that we take advantage of later in this study. The degenerate oligonucleotide mixture was cloned into a plasmid in between the DNA encoding a signal sequence and the eCPX surface-display scaffold. In a previously reported version of this platform, the eCPX scaffold contained a C-terminal strep-tag to detect surface-display level (Shah et al., 2016). Due to the potential background binding of the strep-tag with the avidin-coated magnetic beads during cell enrichment, we cloned both a strep-tagged and a myc-tagged version of the library. Deep sequencing of both versions of the X5-Y-X5 library confirmed that they have 1–10 million unique peptide sequences, 20% of which contain one or more stop codons. Furthermore, all 20 canonical amino acids were well-represented at each of the 10 variable positions surrounding the fixed tyrosine residue (Figure 1—figure supplement 1). Notably, our library includes peptides containing Cys residues and non-central Tyr residues, both of which are often excluded from tyrosine kinase specificity screens to avoid oxidation-related artifacts and challenges in interpreting signal from multi-Tyr sequences (Deng et al., 2014). These sequences can be filtered during data analysis, if needed, although they did not pose significant issues in our studies.
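The stop-codon estimate for NNS codons can be checked with a short calculation. The Python sketch below enumerates the 32 NNS codons and computes the chance that a sequence with 10 independent, uniformly sampled NNS codons contains at least one Amber codon. Uniform nucleotide incorporation is an idealizing assumption, and the variable names are illustrative.

```python
from itertools import product

# NNS: any nucleotide at positions 1 and 2, G or C at position 3.
codons = ["".join(c) for c in product("ACGT", "ACGT", "GC")]
n_codons = len(codons)

# TAG (Amber) is the only stop codon compatible with NNS; TAA and TGA end in A.
stops_in_nns = [c for c in codons if c in ("TAG", "TAA", "TGA")]
p_amber = codons.count("TAG") / n_codons

# Probability that at least one of the 10 variable codons is TAG.
n_variable = 10
p_at_least_one = 1 - (1 - p_amber) ** n_variable

print(n_codons, stops_in_nns, round(p_amber, 4), round(p_at_least_one, 3))
```

Under these assumptions the per-codon Amber frequency is 1/32 (about 3%), and about 27% of sequences carry at least one Amber codon, in line with the "up to 30%" estimate.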

Using the myc-tagged X5-Y-X5 library, we determined the position-specific amino acid preferences of the kinase domain of c-Src. Cells displaying the library were phosphorylated by c-Src to achieve roughly 20–30% phosphorylation, as determined by flow cytometry (Figure 1—figure supplement 2). The phosphorylated cells were labeled with a biotinylated anti-phosphotyrosine antibody and enriched with magnetic beads, then peptide-coding DNA sequences were counted by deep sequencing. We visualized the sequence preferences of c-Src by generating a heatmap and sequence logo based on the position-specific enrichment scores of each amino acid residue surrounding the central tyrosine (Figure 1B, Figure 1—figure supplement 3). Sequences containing a stop codon were not considered in these calculations, but the depletion of stop codons at each position was separately confirmed and is reported below the heatmap on the same color scale. The preferences determined from this screen matched the sequence specificity of c-Src defined by prior reports using oriented peptide libraries (Deng et al., 2014; Songyang et al., 1995). We observed a strong preference for bulky aliphatic residues (Ile/Leu/Val) at the –1 position relative to the central tyrosine and a phenylalanine at the +3 position (Figure 1B, Figure 1—figure supplement 3). Our results showed modest differences from the specificity observed by oriented peptide libraries, including a strong preference for a +1 Asp/Glu/Ser in addition to the previously reported +1 Gly. To test whether these differences were due to biases introduced by the specific pan-phosphotyrosine antibody used, we obtained a different commercially available biotinylated pan-phosphotyrosine antibody and repeated the screen. The position-specific amino acid enrichments obtained using both antibodies were nearly identical (Figure 1C). This suggests that there is no significant bias in the enrichment of peptides introduced by the pan-phosphotyrosine antibody.
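One plausible way to compute position-specific enrichments of this kind is sketched below in Python. It assumes, for illustration only, that the enrichment of an amino acid at a position is its read-weighted frequency in the selected sample divided by its frequency in the input sample, followed by a log2 transform. Sequences containing a stop codon (written here as `*`) are skipped. The function names, the three-residue toy peptides, and the counts are hypothetical.

```python
import math
from collections import defaultdict

def position_frequencies(counts):
    """Read-weighted amino acid frequencies at each position, ignoring stop-containing reads."""
    kept = {pep: n for pep, n in counts.items() if "*" not in pep}
    total = sum(kept.values())
    length = len(next(iter(kept)))
    freqs = [defaultdict(float) for _ in range(length)]
    for pep, n in kept.items():
        for i, aa in enumerate(pep):
            freqs[i][aa] += n / total
    return freqs

def log2_enrichment_matrix(input_counts, selected_counts):
    """log2(selected frequency / input frequency) per position and amino acid."""
    f_in = position_frequencies(input_counts)
    f_sel = position_frequencies(selected_counts)
    matrix = []
    for pos_in, pos_sel in zip(f_in, f_sel):
        matrix.append({aa: math.log2(pos_sel[aa] / pos_in[aa])
                       for aa in pos_in if pos_sel.get(aa, 0) > 0})
    return matrix

# Toy "X-Y-X" library with made-up counts; "*YG" mimics a stop-containing read.
input_counts = {"IYG": 10, "IYA": 10, "AYG": 10, "AYA": 10, "*YG": 5}
selected_counts = {"IYG": 30, "IYA": 30, "AYG": 10, "AYA": 10, "*YG": 1}

matrix = log2_enrichment_matrix(input_counts, selected_counts)
print({aa: round(v, 2) for aa, v in matrix[0].items()})
```

In this toy example Ile is enriched and Ala is depleted at the first position, while the fixed tyrosine and the last position are neutral.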

Degenerate library screens capture specificity profiles for diverse tyrosine kinases

We next used the degenerate X5-Y-X5 library to characterize the sequence preferences of four additional tyrosine kinase domains, derived from the non-receptor tyrosine kinases c-Abl and Fer, and the receptor tyrosine kinases EPHB1 and EPHB2. The kinases were selected because they represent a few distinct branches of the tyrosine kinome and can be easily produced through bacterial expression (Albanese et al., 2018). The X5-Y-X5 library was screened against the kinases in triplicate, and the data from replicates were averaged to generate specificity profiles for each kinase (Figure 2A and Figure 2—source data 1). The amino acid preferences for c-Abl are well-characterized and were recapitulated in this screen (Deng et al., 2014; Songyang et al., 1995; Till et al., 1999; Till et al., 1994). Like c-Src, c-Abl preferred bulky aliphatic residues at the –1 position with respect to the central tyrosine. Unlike c-Src, c-Abl preferred an alanine at the +1 position and had a notably strong preference for proline at the +3 position (Figures 1B and 2A, Figure 2—figure supplement 1). Fer showed a specificity pattern distinct from both c-Src and c-Abl, which included a preference for tryptophan residues at the +2, +3, and +4 positions. As expected, the closely related EPHB1 and EPHB2 kinases had similar specificities, which included a unique preference for Asn and Asp at the –1 position that was not observed for the tested non-receptor tyrosine kinases (Figure 2A, Figure 2—figure supplement 1).

Figure 2 (with 2 supplements)
Specificity profiling of tyrosine kinases using the X5-Y-X5 library.

(A) Heatmaps depicting the specificities of c-Abl, Fer, EPHB1, and EPHB2. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored sequence features, negative value), to white (neutral sequence features, near zero value), to red (favored sequence features, positive value). Values in the heatmaps are the average of three replicates. (B) Sequences of consensus peptides identified through X5-Y-X5 screens, compared with previously reported SrcTide and AblTide sequences. (C) Phosphorylation kinetics of five consensus peptides against five kinases. Initial rates were normalized to the rate of the cognate consensus peptide. All peptides were used at a concentration of 100 μM, and the kinases were used at a concentration of 10–50 nM. Error bars represent the standard deviation from at least three measurements.

Figure 2—source data 1

Position-specific amino acid enrichment matrices from the tyrosine kinase X5-Y-X5 library screens.

Matrices calculated with and without inclusion of multi-tyrosine sequences are provided.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig2-data1-v2.xlsx

Degenerate library screens can be used to design highly-efficient peptide substrates

Specificity profiling methods are often used to design consensus sequences that serve as optimal peptide substrates for biochemical assays and biosensor design (Deng et al., 2014; Lin et al., 2019; Songyang et al., 1995). We wanted to assess whether our method could also be used to generate high-efficiency substrates, and whether these would differ from sequences identified using oriented peptide libraries. To test this, we combined the most favorable amino acids in each position flanking the central tyrosine residue in our specificity profiles, excluding tyrosine, to generate unique consensus peptide substrates for c-Src, c-Abl, Fer, EPHB1, and EPHB2. Consensus sequences for c-Src and c-Abl have been identified previously using oriented peptide libraries (Deng et al., 2014; Songyang et al., 1995). These sequences, often referred to as SrcTide and AblTide, are different than our consensus sequences at a few residues surrounding the phospho-acceptor tyrosine (Figure 2B). The SrcTide and AblTide peptides are canonically embedded within a conserved peptide scaffold containing N-terminal Gly and C-terminal (Lys)3-Gly flanks. For direct comparison, we embedded our consensus peptides in the same scaffold and conducted a series of kinetic studies.
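The consensus-building step described above can be illustrated with a small Python sketch. It takes the most favorable non-tyrosine residue at each of the 10 flanking positions, places the phospho-acceptor tyrosine in the center, and embeds the result in the N-terminal Gly and C-terminal (Lys)3-Gly scaffold. The function name and the enrichment values are invented for illustration and are not taken from the screens.

```python
def consensus_peptide(matrix, n_flank="G", c_flank="KKKG", exclude=("Y",)):
    """Build a consensus substrate from a position-specific enrichment matrix.

    matrix: list of dicts (positions -5..-1 then +1..+5), amino acid -> log2 enrichment.
    """
    half = len(matrix) // 2
    best = [max((aa for aa in pos if aa not in exclude), key=pos.get)
            for pos in matrix]
    core = "".join(best[:half]) + "Y" + "".join(best[half:])
    return n_flank + core + c_flank

# Hypothetical log2 enrichment values (only two residues listed per position).
toy_matrix = [
    {"A": 0.1, "E": 0.9},
    {"D": 0.5, "K": -1.0},
    {"E": 0.7, "Y": 1.2},   # Tyr scores highest here but is excluded
    {"E": 0.3, "A": 0.0},
    {"I": 1.5, "L": 1.1},
    {"G": 0.8, "D": 0.6},
    {"E": 0.2, "R": -0.5},
    {"F": 1.4, "W": 0.3},
    {"E": 0.1, "P": 0.0},
    {"A": 0.2, "K": -0.8},
]

peptide = consensus_peptide(toy_matrix)
print(peptide, len(peptide))
```

The resulting 16-residue layout (1 + 5 + 1 + 5 + 4) matches the length of the scaffolded peptides listed in Table 1.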

First, we used an in vitro continuous fluorimetric assay to compare the steady-state kinetic parameters (kcat and KM) for our c-Src and c-Abl consensus peptides with the SrcTide and AblTide peptides. The Michaelis-Menten parameters for our Src Consensus peptide were on par with one of the previously reported SrcTide substrates (SrcTide 1995, Songyang et al., 1995), but the KM value for a more recently reported SrcTide variant (SrcTide 2014, Deng et al., 2014) was substantially tighter (Table 1). Our Src Consensus peptide had a higher maximal catalytic rate (kcat) but a lower apparent binding affinity (higher KM) when compared to both SrcTides. We were surprised to see that our Src Consensus peptide had a +1 Asp residue, as opposed to the +1 Gly residue in both SrcTides. Substitution of the +1 Asp with Gly in a related peptide marginally improved the KM value but reduced kcat (Table 1). These results indicate that our c-Src specificity screens may select for peptides with a high kcat, and that there is a trade-off between kcat and KM for c-Src substrate recognition. For c-Abl, our consensus peptide had both a higher maximal rate (kcat) and tighter apparent affinity (KM) relative to the previously reported AblTide peptide (Table 1). Collectively, these experiments suggest that different methods may be biased toward slightly different realms of sequence space, and that there are multiple solutions to achieving high-efficiency phosphorylation.

Table 1
Michaelis-Menten parameters for consensus peptides against c-Src and c-Abl kinase domains.

All measurements were carried out using the ADP-Quest assay in three to five replicates. Errors represent the standard error in global fits of all replicates to the Michaelis-Menten equation.

| Entry | Kinase | Peptide name | Peptide sequence | kcat (s^-1) | KM (μM) |
| --- | --- | --- | --- | --- | --- |
| 1 | c-Src | Src Consensus | GPDECIYDMFPFKKKG | 4.9±0.4 | 196±38 |
| 2 | c-Src | Src Consensus (P-5C, D+1G) | GCDECIYGMFPFKKKG | 4.4±0.2 | 97±10 |
| 3 | c-Src | SrcTide (1995) | GAEEEIYGEFEAKKKG | 3.1±0.2 | 64±10 |
| 4 | c-Src | SrcTide (2014) | GAEEEIYGIFGAKKKG | 1.8±0.1 | 7±3 |
| 5 | c-Src | Fer Consensus | GPDEPIYEWWWIKKKG | 0.4±0.1 | 8±4 |
| 6 | c-Src | Abl Consensus | GPDEPIYAVPPIKKKG | 2.0±0.2 | 159±31 |
| 7 | c-Abl | Abl Consensus | GPDEPIYAVPPIKKKG | 3.0±0.2 | 6±2 |
| 8 | c-Abl | AblTide (2014) | GAPEVIYATPGAKKKG | 2.5±0.2 | 35±8 |

Next, we assayed all of the consensus peptides generated using our approach against their cognate kinases, as well as the other kinases in our screens. For the non-receptor tyrosine kinases (c-Src, c-Abl, and Fer), the corresponding consensus peptides were the best substrates tested. At a higher substrate concentration (100 μM), c-Abl also efficiently phosphorylated the Src and EPHB2 consensus peptides (Figure 2C), but selectivity for the Abl Consensus improved at a lower concentration (20 μM), consistent with selectivity being driven by KM for this set of peptides (Figure 2—figure supplement 2 and Table 1). By contrast, c-Src was selective for the Src Consensus peptide at high concentrations (Figure 2C), but showed significant off-target activity toward the Fer Consensus at low concentrations (Figure 2—figure supplement 2). Michaelis-Menten analysis of the Fer Consensus with c-Src revealed that it has a remarkably tight KM for c-Src, with a low kcat as a trade-off (Table 1). Finally, we observed that the receptor tyrosine kinase EPHB1 showed very little selectivity across the consensus peptides and did not prefer its own cognate consensus sequence (Figure 2C and Figure 2—figure supplement 2). EPHB2, on the other hand, efficiently phosphorylated its own consensus peptide, as well as the Abl Consensus. Both of these sequences contain a –1 Ile and +3 Pro (Figure 2B and C). These experiments demonstrate the applicability of our bacterial peptide display method to the design of high-activity substrates. Our results also suggest that not all consensus peptides will be selective for their given kinase, as there can be overlap in substrate specificities.
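The concentration dependence of selectivity follows directly from the Michaelis-Menten equation, v/[E] = kcat[S]/(KM + [S]). As a worked illustration, the Python sketch below plugs the Table 1 parameters for c-Src with the Src Consensus and Fer Consensus peptides into that equation at 100 μM and 20 μM substrate. It is an idealized calculation from the fitted parameters, not a reproduction of the measured initial rates, and the function and variable names are illustrative.

```python
def mm_rate_per_enzyme(kcat, km, substrate):
    """Michaelis-Menten turnover per enzyme (s^-1) at a given substrate concentration (uM)."""
    return kcat * substrate / (km + substrate)

# (kcat in s^-1, KM in uM) for c-Src, taken from Table 1.
c_src_params = {
    "Src Consensus": (4.9, 196.0),
    "Fer Consensus": (0.4, 8.0),
}

relative_fer_rate = {}
for substrate in (100.0, 20.0):
    rates = {name: mm_rate_per_enzyme(kcat, km, substrate)
             for name, (kcat, km) in c_src_params.items()}
    # Rate with the Fer Consensus relative to the cognate Src Consensus.
    relative_fer_rate[substrate] = rates["Fer Consensus"] / rates["Src Consensus"]
    print(substrate, {k: round(v, 3) for k, v in rates.items()},
          round(relative_fer_rate[substrate], 2))
```

With these parameters, the Fer Consensus is predicted to be turned over at roughly 22% of the Src Consensus rate at 100 μM but roughly 63% at 20 μM, consistent with the loss of c-Src selectivity at low substrate concentration.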

Data from X5-Y-X5 library screens can be used to predict the relative phosphorylation rates of peptides

Given that data from the X5-Y-X5 library screens could yield high-efficiency substrates, we investigated whether the same data could be used to quantitatively predict the relative phosphorylation rates of biologically interesting sequences. If so, this would be a potentially powerful tool for the identification of native substrates and the dissection of phosphotyrosine signaling pathways. Indeed, oriented peptide libraries have been applied extensively to predict the native substrates of protein kinases (Johnson et al., 2023; Miller et al., 2008; Obenauer et al., 2003). We are particularly interested in using high-throughput specificity screens to predict how mutations proximal to phosphorylation sites affect tyrosine kinase selectivity. The PhosphoSitePlus database documents thousands of missense mutations within five residues of tyrosine phosphorylation sites, many of which are associated with human diseases or are human polymorphisms, but the functional consequences of most of these mutations are unexplored (Hornbeck et al., 2019; Hornbeck et al., 2015; Krassowski et al., 2018; Landrum et al., 2018).

We used the c-Src X5-Y-X5 screening data to predict the relative phosphorylation rates of six peptide pairs, corresponding to reference and variant sequences derived from human phosphorylation sites. Each peptide sequence was scored using an approach that is similar to that used for oriented peptide libraries in the Scansite database (Obenauer et al., 2003; Yaffe et al., 2001). For each peptide sequence, we summed the log2-transformed enrichment values for the appropriate amino acid at each position in the peptide (the numerical values that make up the heatmaps in Figures 1B and 2A). This sum was divided by the number of variable positions (10 positions for all peptides in this study), then normalized to be on a scale from 0 (the worst possible sequence) to 1 (the best possible sequence). We compared the predicted scores to in vitro phosphorylation rates measured using a highly-sensitive assay based on reverse-phase high-performance liquid chromatography (RP-HPLC) (Figure 3 and Figure 3—figure supplement 1). We found that our predictions, which were derived from log-transformed enrichment scores, correlated moderately well with the log-transformed rates of phosphorylation by c-Src (Figure 3A). The predictions could differentiate high, medium, and low activity substrates but could not accurately rank peptides within these clusters. Focusing specifically on the effects of the mutations in this set of peptides, we found that the X5-Y-X5 screening data could accurately predict the directionality of the effects of five out of six mutations (Figure 3B).
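The scoring scheme described above can be sketched in Python as follows. The sum of log2 enrichments over the flanking positions is divided by the number of variable positions and then rescaled. How the 0-to-1 normalization was implemented is an assumption here: the sketch rescales linearly between the scores of the worst and best possible sequences under the same matrix. The function name and the two-position toy matrix are hypothetical.

```python
def pssm_score(peptide, matrix):
    """Score a peptide with a central Tyr against a position-specific matrix.

    matrix: list of dicts (one per variable position), amino acid -> log2 enrichment.
    Returns a value from 0 (worst possible sequence) to 1 (best possible sequence).
    """
    center = len(peptide) // 2
    flanks = peptide[:center] + peptide[center + 1:]  # drop the central Tyr
    n = len(matrix)
    raw = sum(pos[aa] for pos, aa in zip(matrix, flanks)) / n
    worst = sum(min(pos.values()) for pos in matrix) / n
    best = sum(max(pos.values()) for pos in matrix) / n
    return (raw - worst) / (best - worst)

# Toy matrix with one variable position on each side of Tyr (made-up values).
toy_matrix = [
    {"I": 2.0, "A": 0.0, "K": -2.0},   # -1 position
    {"G": 1.0, "A": 0.0, "P": -1.0},   # +1 position
]

reference_score = pssm_score("IYG", toy_matrix)
variant_score = pssm_score("IYA", toy_matrix)
print(round(reference_score, 3), round(variant_score, 3))
```

Comparing the score of a variant with that of its reference sequence gives the predicted direction of a mutational effect. In this toy case the G-to-A substitution lowers the score.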

Figure 3 (with 1 supplement)
Predicting relative phosphorylation rates using data from X5-Y-X5 library screens.

(A) Correlation between measured phosphorylation rates and X5-Y-X5 predictions for 12 peptides with c-Src. All peptides were used at a concentration of 100 μM, and c-Src was used at a concentration of 500 nM. Error bars represent the standard deviation from at least three rate measurements and three separate scores with individual replicates of the X5-Y-X5 screen. (B) Correlation between the magnitude of mutational effects for 6 peptide pairs with mutational effects predicted from X5-Y-X5 library screens. Error bars represent the standard deviation of at least three rate measurements and three separate scores with individual replicates of the X5-Y-X5 screen.

One drawback to the aforementioned scoring approach, like all models based on position-specific scoring matrices, is that it cannot capture context-dependent amino acid preferences. We recently explored a machine-learning approach, using screening data from a related degenerate library, to model c-Src kinase specificity (Rube et al., 2022). The model not only incorporated pairwise inter-residue dependencies, but also data from multiple time points. This approach could reasonably predict absolute rate constants, as well as the directionality and magnitude of several phosphosite-proximal mutational effects. As an alternative to building models based on random library screens, we reasoned that direct measurements of reference and variant peptides using our screening platform might also provide reliable assessment of mutational effects.

A proteome-derived peptide library accurately measures sequence specificity and phosphorylation rates

To refine our assessment of phosphosite-proximal mutational effects, we designed a library, derived from the PhosphoSitePlus database, that is composed of 11-residue sequences spanning 3159 human phosphosites and 4760 disease-associated variants of these phosphosites bearing a single amino acid substitution (pTyr-Var library; Figure 4—figure supplement 1; Hornbeck et al., 2019). While the majority of sequences in this library contained a single tyrosine residue, some sequences contained multiple tyrosines, for which we included additional variants where the non-central tyrosine residues were mutated to phenylalanine. Including these tyrosine mutants and additional control sequences, such as previously designed consensus substrates, the library totaled ~10,000 unique sequences. As with the X5-Y-X5 library, we generated two versions of this library, bearing a C-terminal strep-tag or myc-tag. We conducted specificity screens with the myc-tagged pTyr-Var library against 7 non-receptor tyrosine kinases (c-Src, Fyn, Hck, c-Abl, Fer, Jak2, and AncSZ, an engineered homolog of Syk and ZAP-70; Hobbs et al., 2022) and 5 receptor tyrosine kinases (EPHB1, EPHB2, FGFR1, FGFR3, and MERTK). The majority of these kinases could be expressed in bacteria and purified in good yield (Albanese et al., 2018; Hobbs et al., 2022). One of these kinases (Jak2) was purchased from a commercial vendor.

Using the catalytically active tyrosine kinase constructs, we identified an optimal concentration (typically between 0.1–1.5 μM) to ensure 20–30% of maximal phosphorylation in three minutes. For some kinases (FGFR1, FGFR3, and MERTK), pre-incubation with ATP was required in order to activate the kinase by auto-phosphorylation (Figure 4—figure supplement 2). We conducted the screens analogously to those with the X5-Y-X5 library, but rather than calculate position-specific residue preferences from the deep sequencing data, we directly calculated enrichment scores for each peptide in the pTyr-Var library (Figure 4A and Figure 4—source data 1). Three to five replicates of the pTyr-Var screen were conducted with each kinase, and the results were reproducible across replicates (Figure 4—figure supplement 3). To validate our pTyr-Var screens, we examined enrichment scores from the c-Src experiments for the same six peptide pairs for which predictions using X5-Y-X5 screening data were only moderately accurate. We found a strong correlation between the pTyr-Var enrichment scores and phosphorylation rates, particularly for high-activity sequences (Figure 4B). Furthermore, the effects of mutations in the screens were consistent with those observed using the in vitro RP-HPLC assay with purified peptides (Figure 4C).
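As a rough illustration of a per-peptide enrichment score, the sketch below assumes that the score is the ratio of a peptide's read frequency in the selected sample to its read frequency in the unselected input library. That definition, the function name `enrichment_scores`, and the toy read counts are assumptions made for illustration; they are not taken from this section.

```python
from collections import Counter

def enrichment_scores(input_counts, selected_counts, min_input_reads=1):
    """Return {peptide: selected frequency / input frequency}.

    Assumed definition, for illustration only. Peptides with fewer than
    min_input_reads reads in the input library are skipped.
    """
    total_in = sum(input_counts.values())
    total_sel = sum(selected_counts.values())
    scores = {}
    for peptide, n_in in input_counts.items():
        if n_in < min_input_reads:
            continue
        freq_in = n_in / total_in
        freq_sel = selected_counts.get(peptide, 0) / total_sel
        scores[peptide] = freq_sel / freq_in
    return scores

# Hypothetical read counts for three library members.
input_counts = Counter({"pepA": 100, "pepB": 100, "pepC": 200})
selected_counts = Counter({"pepA": 300, "pepB": 50, "pepC": 50})
scores = enrichment_scores(input_counts, selected_counts)
```

Normalizing by the input frequency means that a peptide over-represented in the starting library is not mistaken for an efficient substrate.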

Figure 4 with 9 supplements
Specificity profiling of tyrosine kinases using the pTyr-Var library.

(A) Distribution of enrichment scores from pTyr-Var screens with 13 tyrosine kinases. Each point represents a peptide sequence in the pTyr-Var library. Data points in orange-red represent sequences without a Tyr residue and data points in dark gray represent sequences with a Tyr residue. Each dataset represents the average of three to five replicates. (B) Correlation between enrichment scores and measured phosphorylation rates for 12 peptides (100 μM) with c-Src (500 nM). (C) Correlation between the magnitude of mutational effects for 6 peptide pairs in the pTyr-Var library with mutational effects measured using an in vitro kinetic assay. Error bars in panels B and C represent the standard deviation from 3 to 4 rate measurements and four pTyr-Var screens. (D) Matrix of Pearson’s correlation coefficients for all pairwise comparisons between replicate-averaged pTyr-Var datasets for 13 kinases. (E) Volcano plot depicting mutational effects in the pTyr-Var screen with c-Src kinase domain. Data points represent the average of four replicates. Hits are colored orange-red. (F) Percent phosphorylation of SHP2 wild-type, D61V, and D61N (10 μM) after an hour incubation with c-Src, Fyn, and FGFR1 (1 μM). Error bars represent the standard deviation from 2 to 3 measurements.

Figure 4—source data 1

Enrichment scores from tyrosine kinase pTyr-Var screens.

Data are provided in a flat sheet with average and standard deviation values for all kinase-substrate pairs. Data are also provided for each kinase as a side-by-side comparison of enrichment scores for reference and variant sequences and whether the mutation was considered significant in our analysis. Three sheets are provided listing substrates for c-Src, Fyn, and c-Abl that are also found in a curated list of kinase-substrate pairs in the PhosphoSitePlus database.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig4-data1-v2.xlsx
Figure 4—source data 2

Position-specific amino acid enrichment matrices from the tyrosine kinase pTyr-Var library screens for sequences containing a single central tyrosine residue.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig4-data2-v2.xlsx

A total of 370 peptides in the pTyr-Var library contain no tyrosine residues and thus serve as controls to determine background noise in our screens. For every kinase tested, the tyrosine-free sequences showed distinctly low enrichment scores, consistent with signal in these screens being driven by tyrosine phosphorylation of the surface-displayed peptides (Figure 4A). For each kinase, a subset of the library (between 7% and 10%) showed enrichment scores above this background level (Figure 4—figure supplement 4). To confirm that the pTyr-Var screens were reporting on unique substrate specificities across these tyrosine kinases, we calculated Pearson’s correlation coefficients for the average datasets of each kinase pair and visualized position-specific amino acid preferences as heatmaps (Figure 4D, Figure 4—figure supplement 5, and Figure 4—source data 2). We found strong correlation in specificity between kinases of the same family (the Src-family kinases c-Src/Fyn/Hck, and receptor pairs EPHB1/EPHB2 and FGFR1/FGFR3). We also observed that the specificity of Src-family kinases partly overlapped with the Ephrin receptors and MERTK. The specificity of AncSZ and Jak2 correlated with that of FGFRs.
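The pairwise comparison of replicate-averaged datasets can be sketched as follows. The kinase labels and enrichment values are hypothetical placeholders, and the helper names `pearson` and `correlation_matrix` are introduced here for illustration only.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(datasets):
    """Return {(name_a, name_b): r} for every ordered pair of datasets."""
    names = sorted(datasets)
    return {(a, b): pearson(datasets[a], datasets[b]) for a in names for b in names}

# Hypothetical averaged enrichment scores for the same five peptides.
datasets = {
    "kinase_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "kinase_2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "kinase_3": [5.0, 4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(datasets)
```

Each list must hold scores for the same peptides in the same order; otherwise the coefficient is meaningless.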

Next, we compared the results of our pTyr-Var library screens with a curated list of kinase-substrate pairs found in the PhosphoSitePlus database (Hornbeck et al., 2019). For c-Src, Fyn, and c-Abl, out of the sequences that overlapped between our library and the curated list, 30–40% of the kinase-substrate pairs showed efficient phosphorylation in the peptide-display screen (Figure 4—source data 1). This is consistent with a previous study using bacterial display and a different proteome-derived peptide library (Shah et al., 2018). The modest overlap between peptide screens and literature-reported kinase-substrate pairs is not surprising, given that other mechanisms in kinase-substrate recognition, such as localization, may override kinase domain sequence preferences (Miller and Turk, 2018). Furthermore, the curated list of kinase-substrate pairs comes from both in vitro and in vivo studies and may not accurately represent bona fide substrates for each kinase.

Natural variants of tyrosine phosphorylation sites impact kinase recognition

For pairs of peptides in the pTyr-Var library that correspond to a disease-associated variant and a reference sequence, we calculated the log2-fold change in enrichment for the variant relative to the reference. The large number of replicates for each screen afforded a robust analysis of phosphosite-proximal mutational effects for each kinase. We filtered the results in five steps to identify significant mutations: (1) We omitted phosphosite pairs where there was no statistically significant difference in enrichment between the variant and reference (p-value cutoff of 0.05). (2) We then applied a second filtering step to remove phosphosite pairs where the fold-change in enrichment between the variant and reference sequence was less than two. (3) Next, we excluded pairs where both sequences were low-activity substrates (enrichment score less than 1.5). (4) We removed mutations that added or removed a tyrosine residue, as their interpretation is ambiguous in our assay. (5) Lastly, we excluded phosphosite pairs in which the average read count of either the variant or wild-type sequence was less than 50. This left us with a unique set of 50–400 high-confidence candidates for each tyrosine kinase (Figure 4E, Figure 4—figure supplement 6, and Figure 4—source data 1). From this filtered list, we found that kinases showed distinct patterns of mutational sensitivity at each position around the central tyrosine, consistent with their distinct sequence preferences (Figure 4—figure supplement 7).
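The five filtering steps can be expressed compactly as a single predicate. This is a minimal sketch rather than the analysis code used for the screens: the `VariantPair` record and its field names are hypothetical, the p-value is taken as a precomputed input because the statistical test is not named here, and the handling of values that fall exactly on a cutoff is an assumption.

```python
import math
from dataclasses import dataclass

@dataclass
class VariantPair:
    name: str
    ref_score: float        # mean enrichment score, reference peptide
    var_score: float        # mean enrichment score, variant peptide
    p_value: float          # precomputed from the replicate comparison
    ref_reads: float        # average read count, reference peptide
    var_reads: float        # average read count, variant peptide
    changes_tyrosine: bool  # True if the mutation adds or removes a Tyr

def passes_filters(pair, p_cutoff=0.05, min_fold=2.0, min_score=1.5, min_reads=50):
    """Apply the five filters in the order given in the text."""
    if pair.ref_score <= 0 or pair.var_score <= 0:
        return False  # log2 fold change is undefined (guard added for this sketch)
    if pair.p_value >= p_cutoff:                       # step 1: significance
        return False
    log2_fc = math.log2(pair.var_score / pair.ref_score)
    if abs(log2_fc) < math.log2(min_fold):             # step 2: at least two-fold
        return False
    if max(pair.ref_score, pair.var_score) < min_score:  # step 3: both low-activity
        return False
    if pair.changes_tyrosine:                          # step 4: Tyr gained or lost
        return False
    if min(pair.ref_reads, pair.var_reads) < min_reads:  # step 5: read depth
        return False
    return True

# Hypothetical pairs, each designed to trip a different filter.
pairs = [
    VariantPair("hit", 1.0, 4.0, 0.01, 200, 150, False),
    VariantPair("small_fold", 2.0, 3.0, 0.01, 200, 150, False),
    VariantPair("both_low", 0.2, 0.8, 0.01, 200, 150, False),
    VariantPair("low_reads", 1.0, 4.0, 0.01, 30, 150, False),
    VariantPair("tyr_change", 1.0, 4.0, 0.01, 200, 150, True),
]
hits = [p.name for p in pairs if passes_filters(p)]
```

Taking the absolute log2 fold change treats gain-of-function and loss-of-function mutations symmetrically, so a two-fold decrease passes step 2 just as a two-fold increase does.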

For c-Src, we identified 381 high-confidence mutations (Figure 4E). A number of these mutations were on proteins involved in neurotrophin-regulated signaling, cyclin-dependent serine/threonine kinase activity, and other receptor/non-receptor tyrosine kinase activity. We found notable mutational effects at a known target of c-Src, Tyr 149 of the tumor suppressor protein FHL1 (Wang et al., 2018), as well as on other proteins known to interact with c-Src, such as the lipid and protein phosphatase PTEN and the immune receptor LILRB4 (Kang et al., 2016; Lu et al., 2003). We were particularly interested in cases where a kinase not previously known to phosphorylate a specific phosphosite showed a dramatic gain-of-function upon phosphosite-proximal mutation. For example, we found that the R982C mutation, proximal to Tyr 981 on the receptor tyrosine kinase RET, significantly enhanced phosphorylation by c-Src (Figure 4—figure supplement 8). This phosphosite is known to engage the SH2 domain of c-Src and facilitate c-Src activation upon recruitment to RET, but it is not considered a kinase substrate of c-Src (Encinas et al., 2004). This mutation could potentially rewire signaling by promoting phosphorylation of RET by c-Src, and in doing so, sustaining c-Src activation by its binding to phospho-RET. The RET R982C mutation also enhanced Tyr 981 phosphorylation by several other kinases, most notably Fer (Figure 4—figure supplement 8). These examples show how the pTyr-Var data could be used as a resource to guide mutation-focused signaling studies.

To further validate our approach, we examined the effects of phosphosite-proximal mutations on the phosphorylation of an intact protein, rather than a peptide. Tyr 62 in the tyrosine phosphatase SHP2 sits within a region of this protein that is frequently mutated in various human diseases (Tartaglia et al., 2006), and this residue is highly phosphorylated in receptor tyrosine kinase-driven cancers (Gillette et al., 2020; Pfeiffer et al., 2022). Several Tyr 62-proximal mutations are encoded in the pTyr-Var library. In our screens, the reference peptide for Tyr 62 was preferentially phosphorylated by receptor tyrosine kinases, such as FGFR1, over non-receptors such as c-Src and Fyn, and nearby mutations showed varied effects on Tyr 62 phosphorylation, depending on the kinase tested (Figure 4—figure supplement 9). For example, D61V enhanced and D61N attenuated phosphorylation by Src-family kinases, but these mutations had little impact on recognition by FGFR1. To assess whether the effects of D61 mutations in the screens were retained in the context of the intact protein, we monitored phosphorylation of wild-type, D61V, and D61N SHP2 by c-Src, Fyn, and FGFR1 using intact protein mass spectrometry. We made two modifications to SHP2 to facilitate measurements: (1) substitution of the catalytic residue (C459E) to prevent dephosphorylation by the SHP2 phosphatase domain and (2) deletion of the disordered C-terminal tail to avoid background phosphorylation of an accessible site. Our measurements recapitulated the relative phosphorylation efficiencies for the Tyr 62 reference peptides, with Fyn being the slowest, and FGFR1 being the fastest (Figure 4F and Figure 4—figure supplement 9). Both D61V and D61N dramatically enhanced phosphorylation by all three kinases, consistent with reports that mutations at this site dramatically alter SHP2 structure and probably also increase Tyr 62 accessibility (Keilhack et al., 2005). 
For c-Src and Fyn, but not FGFR1, D61V showed a stronger enhancement of phosphorylation than D61N, consistent with our peptide screens (Figure 4F and Figure 4—figure supplement 9). The effects of these mutations in SHP2 on signal rewiring in cells warrant further investigation.

Position-specific amino acid preferences for tyrosine kinases are context-dependent

As noted earlier, position-specific scoring matrices do not reflect context-dependent sequence preferences. To illustrate this further, we scored peptide sequences in the pTyr-Var library using the position-specific scoring matrices generated from the X5-Y-X5 library. For peptides that showed significant enrichment in the pTyr-Var screens (enrichment >1), there was a modest correlation with the scores predicted using the X5-Y-X5 library, with many outliers (Figure 5A and Figure 5—figure supplement 1). We selected peptides for c-Src and c-Abl that were high-activity sequences based on the pTyr-Var screens (enrichment >4) but deviated significantly from canonical recognition motifs, and therefore were low scoring (score <0.5). The peptides selected for c-Src had unfavorable residues downstream of the central tyrosine (+1 Arg and +3 Gly for MISP_Y95; +1 Asn, +2 Arg, and +3 Glu for HLA-DPB1_Y59_F64L_YF). For c-Abl, the peptides had an unfavorable –1 Glu and +2 Ser (SIRPA_Y496_P491L) or an unfavorable +2 Glu and +3 Gly (HGD_Y166_F169L). We measured phosphorylation rates for these peptides using our RP-HPLC assay. Phosphorylation rates for these peptides deviated from what would be expected based on a position-specific scoring matrix (Figure 5B, Figure 5—figure supplement 1, and Figure 5—source data 1). This suggests that the putatively unfavorable sequence features in these peptides were tolerated in their specific sequence contexts.
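To make concrete why a position-specific scoring matrix cannot represent context, the sketch below scores 11-residue peptides by summing log2 enrichment values over the flanking positions. The additive log2 scheme, the tiny toy matrix, and the two peptides are assumptions for illustration and do not reproduce the scoring function or data used in this work.

```python
import math

CENTER = 5  # zero-based index of the central tyrosine in an 11-residue peptide

def pssm_score(peptide, matrix):
    """Sum log2 enrichment over flanking positions (assumed additive scheme).

    matrix maps position (-5..+5, excluding 0) to {residue: enrichment}.
    Residues absent from the toy matrix are treated as neutral (1.0).
    """
    total = 0.0
    for index, residue in enumerate(peptide):
        position = index - CENTER
        if position == 0:
            continue  # the phospho-acceptor tyrosine itself is not scored
        total += math.log2(matrix.get(position, {}).get(residue, 1.0))
    return total

# Hypothetical enrichment values at two positions only.
toy_matrix = {-2: {"S": 0.5, "P": 2.0}, 3: {"F": 4.0}}

# The same -2 Ser-to-Pro substitution in two unrelated sequence contexts.
delta_context_a = pssm_score("AAAPAYAAFAA", toy_matrix) - pssm_score("AAASAYAAFAA", toy_matrix)
delta_context_b = pssm_score("GGGPGYGGGGG", toy_matrix) - pssm_score("GGGSGYGGGGG", toy_matrix)
```

Because every position contributes independently, the predicted effect of a substitution is identical in both contexts; any measured difference between contexts falls outside what such a model can express.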

Figure 5 with 2 supplements
Context-dependent effects of tyrosine kinase recognition.

(A) Correlation of enrichment scores measured for c-Src in the pTyr-Var library screen with scores predicted from the X5-Y-X5 library using a position-specific scoring matrix. (B) Correlation between predicted scores and measured phosphorylation rates for 14 peptides (100 μM) with c-Src (500 nM). Peptides that could not be accurately scored by the X5-Y-X5 data are highlighted in orange. (C) Correlation of variant effects measured in the pTyr-Var library screen with those predicted from the X5-Y-X5 library screen for c-Src. Several points lie in the top-left and bottom-right quadrants, indicating a discrepancy between the measured mutational effect in the pTyr-Var screen and the predicted mutational effect from the X5-Y-X5 screen. (D) Effects of serine-to-proline substitution at the –2 position in various assays with c-Src. The left panels show the enrichment levels of –2 serine and proline in the X5-Y-X5 screen (top) and the effect of a –2 serine-to-proline substitution in a specific peptide in the pTyr-Var screen (bottom). The right panels show rate measurements using the RP-HPLC assay for the same substitution in the Src consensus peptide (top) and the peptide from the pTyr-Var screen (bottom).

Figure 5—source data 1

Peptide sequences and their phosphorylation rates by c-Src or c-Abl, measured using the RP-HPLC kinetic assay.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig5-data1-v2.xlsx
Figure 5—source data 2

Mutational effects measured from the pTyr-Var library screens and their corresponding predictions based on the X5-Y-X5 library screening data.

Only those sequence pairs with high-quality sequencing data (read counts >100) and a single central tyrosine were included in the analysis.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig5-data2-v2.xlsx

The observation that there are context-dependent sequence preferences for kinase-substrate interactions has important consequences for predicting the effects of phosphosite-proximal mutations. The same substitution could have different effects depending on the composition of the surrounding sequence. This phenomenon is uniquely visible in our screening approach, as we are measuring the phosphorylation of defined peptide sequences, and we are conducting screens with thousands of peptide pairs that vary by only a single amino acid substitution. To test our hypothesis, we assessed whether the directionality of mutational effects observed for specific peptides in the pTyr-Var screen could be predicted using the position-specific scoring matrix derived from the X5-Y-X5 screen (which would represent the effect of making a substitution averaged over all sequence contexts). While the directionality of the effect of most mutations could be predicted by the X5-Y-X5 screen, we observed many mutations that showed a significant effect where none was predicted, as well as mutations where the effect was the opposite of what was predicted (Figure 5C, Figure 5—figure supplement 2, and Figure 5—source data 2).
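The comparison of measured and predicted directionality can be sketched as a simple classification of each variant by the signs of its two log2 fold changes. The two-fold threshold applied to both values, the category labels, and the example numbers are assumptions for illustration.

```python
def classify_effect(measured_log2fc, predicted_log2fc, threshold=1.0):
    """Compare the direction of a measured effect with its prediction."""
    if abs(measured_log2fc) < threshold:
        return "no significant measured effect"
    if abs(predicted_log2fc) < threshold:
        return "measured but not predicted"
    if (measured_log2fc > 0) == (predicted_log2fc > 0):
        return "concordant"
    return "opposite"

# Hypothetical (measured, predicted) log2 fold changes for four variants.
pairs = {
    "variant_1": (2.1, 1.8),
    "variant_2": (-1.6, 1.4),
    "variant_3": (1.9, 0.1),
    "variant_4": (0.2, -1.5),
}
labels = {name: classify_effect(m, p) for name, (m, p) in pairs.items()}

# Fraction of significant measured effects whose direction was mispredicted.
significant = [lab for lab in labels.values() if lab != "no significant measured effect"]
fraction_opposite = significant.count("opposite") / len(significant)
```

Variants labeled "opposite" correspond to points in the top-left and bottom-right quadrants of a measured-versus-predicted scatter plot.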

To validate this observation, we selected a peptide pair in the pTyr-Var library where a mutation (–2 Ser to Pro) had the opposite effect of that predicted by our X5-Y-X5 screen for c-Src (Figure 5D), as well as published results with oriented peptide libraries (Begley et al., 2015; Obenauer et al., 2003). Additionally, we made the same substitution in the c-Src consensus peptide to determine whether the X5-Y-X5 predictions would hold true in that context. Rate measurements for these purified peptides with c-Src showed that the same amino acid substitution had different impacts on c-Src recognition, depending on the sequence context (Figure 5D). A previous study that analyzed the specificity of the epidermal growth factor receptor (EGFR) kinase using bacterial peptide display showed that the effect of mutations at the –2 position was sometimes dependent on the identity of the –1 residue (Cantor et al., 2018). Molecular dynamics analyses in that report suggested that the amino acid identity at the –1 position determined how the side chain of the –2 residue was presented to the kinase, and vice versa, thereby dictating context-dependent preferences at both positions. Our pTyr-Var screens suggest that context-dependent sequence preferences may be commonplace. Depending on the kinase, 5–15% of all significant mutations in the pTyr-Var screen had the opposite effect of that predicted using the X5-Y-X5 library data. Mapping these context-dependent effects comprehensively could have a significant impact on our ability to predict native substrates of kinases, and it will improve our understanding of the structural basis for substrate specificity.

Phosphorylation of bacterial peptide display libraries enables profiling of SH2 domains

In previous implementations of our bacterial peptide display and deep sequencing approach, the specificities of phosphotyrosine recognition domains (e.g. SH2 domains and phosphotyrosine binding (PTB) domains) were analyzed in addition to tyrosine kinase domains (Cantor et al., 2018; Lo et al., 2019). This approach required two amendments to the kinase screening protocol. First, the surface-displayed libraries were phosphorylated to saturating levels using a cocktail of tyrosine kinases. Second, because phosphotyrosine recognition domains generally have fast dissociation rates from their ligands (Morimatsu et al., 2007; Oh et al., 2012), making binding-based selection assays challenging, constructs were generated in which two identical copies of an SH2 domain were artificially fused together. The tandem-SH2 constructs enhanced avidity for phosphopeptides displayed on the cell surface through multivalent effects, thereby enabling enrichment of cells via FACS (Cantor et al., 2018).

For this study, we reasoned that a multivalent SH2 construct could be mimicked by functionalizing avidin-coated magnetic beads with biotinylated SH2 domains. These SH2-coated beads could then be used to select E. coli cells displaying enzymatically phosphorylated peptide display libraries, followed by deep sequencing to determine SH2 sequence preferences (Figure 6A). Thus, we first established a protocol to produce site-specifically biotinylated SH2 domains in E. coli, by co-expressing an Avi-tagged SH2 construct with the biotin ligase BirA (Gräslund et al., 2017). This system yielded quantitatively biotinylated SH2 domains, as confirmed by mass spectrometry (Figure 6—figure supplement 1). Since the biotinylated SH2 domains could be produced in high yields through bacterial expression, the recognition domains were immobilized on the magnetic beads at saturating concentrations to ensure a uniform concentration across experiments. This also prevented background binding of strep-tagged libraries to the beads, making this method compatible with previously reported strep-tagged libraries (Cantor et al., 2018; Shah et al., 2018; Shah et al., 2016).

Figure 6 with 8 supplements
High-throughput profiling of SH2 domain ligand specificity using bacterial peptide display.

(A) Schematic representation of the workflow for SH2 domain specificity profiling. (B) Heatmaps depicting the specificities of the c-Src, SHP2-C, and Grb2 SH2 domains, measured using the X5-Y-X5 library. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored), to white (neutral), to red (favored). Values in the heatmaps are the average of three replicates. (C) Distribution of enrichment scores from pTyr-Var screens with three SH2 domains and the pan-phosphotyrosine antibody 4G10 Platinum. Each point represents a peptide sequence in the library. The antibody selection was done similarly to the kinase screens, with antibody labeling of cells, followed by bead-based enrichment, as opposed to cell enrichment with antibody-saturated beads. Each dataset represents the average of three replicates. (D) Correlation between enrichment scores for 9 peptides from the pTyr-Var screen and binding affinities measured using a fluorescence polarization assay. Error bars represent the standard deviations from three screens or binding measurements. (E) Examples of phosphosite-proximal mutations that selectively enhance binding to specific SH2 domains. Error bars represent the standard deviations from three screens.

Figure 6—source data 1

Position-specific amino acid enrichment matrices from the SH2 domain X5-Y-X5 library screens.

Matrices calculated with and without inclusion of multi-tyrosine sequences are provided.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig6-data1-v2.xlsx
Figure 6—source data 2

Enrichment scores from SH2 domain pTyr-Var screens.

Data are provided in a flat sheet with average and standard deviation values for all SH2-ligand pairs. Data are also provided for each SH2 domain as a side-by-side comparison of enrichment scores for reference and variant sequences and whether the mutation was considered significant in our analysis.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig6-data2-v2.xlsx
Figure 6—source data 3

Position-specific amino acid enrichment matrices from the SH2 domain pTyr-Var library screens for sequences containing a single central tyrosine residue.

https://cdn.elifesciences.org/articles/82345/elife-82345-fig6-data3-v2.xlsx

To implement SH2 specificity screens, the strep-tagged X5-Y-X5 library was phosphorylated to a high level using a mixture of c-Src, c-Abl, AncSZ, and EPHB1 (Figure 6—figure supplement 2). The phosphorylated library was screened against three SH2 domains that fall into distinct specificity classes and are derived from three different types of signaling proteins: the SH2 domain from the tyrosine kinase c-Src, the C-terminal SH2 (C-SH2) domain from the tyrosine phosphatase SHP2, and the SH2 domain from the non-catalytic adaptor protein Grb2 (Figure 6B, Figure 6—figure supplement 3, and Figure 6—source data 1). The X5-Y-X5 library screens recapitulated known sequence preferences for each SH2 domain. For c-Src, there was a distinctive preference for –2 His, +1 Asp/Glu, and +3 Ile, as previously reported from oriented peptide libraries (Huang et al., 2008). For Grb2, a characteristic +2 Asn preference dominated the specificity profile (Gram et al., 1997; Huang et al., 2008; Kessels et al., 2002; Rahuel et al., 1996; Songyang et al., 1994). Notably, our Grb2 screen also reveals subtle amino acid preferences at other positions, which could tune the affinity for +2 Asn-containing sequences. Several studies have measured the sequence specificity of the SHP2 C-SH2 domain using diverse methods, including peptide microarrays, oriented peptide libraries, and one-bead-one-peptide libraries (Huang et al., 2008; Miller et al., 2008; Sweeney et al., 2005; Tinti et al., 2013). The results of these reported screens are not concordant. Our method indicates a preference for β-branched amino acids (Thr/Val/Ile) at the –2 position, a small residue (Ala/Ser/Thr) at the +1 position, and a strong preference for an aliphatic residue (Ile/Val/Leu) at the +3 position. Our results are most in line with the one-bead-one-peptide screens (Sweeney et al., 2005).

We next phosphorylated and screened the pTyr-Var library against the same three SH2 domains in triplicate (Figure 6C, Figure 6—source data 2, and Figure 6—source data 3). The replicates for each SH2 domain were highly correlated, but datasets between SH2 domains had poor correlation, suggesting distinct ligand specificities (Figure 6—figure supplement 4). As observed for kinases, we saw negligible enrichment of peptides lacking a tyrosine residue, but each SH2 domain showed strong enrichment of a few hundred peptides containing one or more tyrosines (Figure 6C). With the phosphorylated pTyr-Var library, we also carried out selection with a biotinylated pan-phosphotyrosine antibody to assess the level of bias in phosphorylation across the library. Compared to selection with SH2 domains, selection with the antibody yielded a narrower distribution of enrichment scores, with very few highly enriched sequences, suggesting relatively uniform phosphorylation (Figure 6C). We further validated the SH2 screening method by measuring the binding affinities of 9 peptides from the pTyr-Var library with the c-Src SH2 domain, using a fluorescence polarization binding assay. Enrichment scores from the pTyr-Var screen showed a good linear correlation with measured Kd values over two orders of magnitude (Figure 6D).

The pTyr-Var library screens with the SH2 domains were analyzed and filtered similarly to those with kinase domains. For each SH2 domain, we identified 50–300 phosphosite-proximal mutations that significantly and reproducibly enhanced or attenuated binding (Figure 6—figure supplement 5 and Figure 6—source data 2). As expected, given their distinct specificities, the c-Src, SHP2-C, and Grb2 SH2 domains showed unique sensitivities to mutations (Figure 6—figure supplement 6). We identified several phosphosite-proximal mutations that were selectively gain-of-function for one or two SH2 domains (Figure 6E and Figure 6—source data 2). These mutations could drive the rewiring of signaling pathways by changing which downstream effector engages a phosphosite. This phenomenon was recently reported for lung-cancer associated mutations near phosphorylation sites in EGFR, which impacted the recruitment of Grb2 and SHP2 to the receptor and altered downstream signaling (Lundby et al., 2019).

Finally, we note that our pTyr-Var datasets included screens with both the kinase and SH2 domains of c-Src. When the SH2 domain of c-Src interacts with phosphoproteins, it both localizes the kinase domain in proximity to its substrates and activates the enzyme (Liu et al., 1993). Our screens revealed that the recognition profiles of the c-Src kinase and SH2 domains against the pTyr-Var library were completely orthogonal (Figure 6—figure supplement 7). Their starkly different activities toward the pTyr-Var library can largely be attributed to kinase domain preference for a +3 Phe and SH2 domain preference for a +3 Ile/Val/Leu/Met. This is in contrast to previous observations for c-Abl, which has kinase and SH2 domains with largely overlapping sequence specificities, dominated by a +3 Pro preference (Songyang et al., 1995). For c-Src, phosphosite mutations that impacted recognition by one domain generally had no effect on the other, because preferred sequence features for one domain were typically tolerated (neutral) for the other (Figure 6—figure supplement 8). A consequence of this is that phosphosite-proximal mutations may alter c-Src function in two mechanistically distinct ways: (1) mutations that enhance SH2 binding can alter the localization and local activation of c-Src or (2) mutations that enhance kinase recognition will directly increase phosphorylation rates by c-Src. These insights highlight the value of profiling multiple domains of the same signaling protein against the same peptide library.

Amber codon suppression yields an expanded repertoire of peptides for specificity profiling

The specificity profiling screens described thus far were constrained to sequences that contain the canonical twenty amino acids. Several studies have suggested that non-canonical amino acids and post-translationally modified amino acids can also impact sequence recognition by kinases and SH2 domains (Alfaro-Lopez et al., 1998; Begley et al., 2015; Chapelat et al., 2012; Johnson et al., 2023; Yeh et al., 2001). The most notable example of this is phospho-priming, whereby phosphorylation of one residue on a protein enhances the ability of a kinase to recognize and phosphorylate a proximal residue. This phenomenon was recently described for EGFR, which preferentially phosphorylates sequences containing a tyrosine followed by a +1 phosphotyrosine (Begley et al., 2015). Other prevalent post-translational modifications, such as lysine acetylation, may also impact the ability of kinases or SH2 domains to recognize a particular phosphosite (Parker et al., 2014; Rust and Thompson, 2011).

We sought to expand our specificity profiling method to incorporate non-canonical and post-translationally modified amino acids (Figure 7A). Since our libraries are genetically encoded, we employed Amber codon suppression and repurposing, using engineered tRNA molecules and aminoacyl tRNA synthetases (Amiram et al., 2015; Xie et al., 2007; Zheng et al., 2018). The degenerate (X) positions in our X5-Y-X5 library are encoded using an NNS codon, which means that an Amber codon (TAG) is sampled at each position approximately 3% of the time (1 of the 32 possible NNS codons). Thus, this library theoretically contains a sufficiently large number of diverse sequences to profile specificity with a 21-amino-acid alphabet. For Amber suppression in E. coli, tRNA/synthetase pairs are commonly expressed from pEVOL or pULTRA plasmids (chloramphenicol and streptomycin resistant, respectively) (Chatterjee et al., 2013). Both of these systems are incompatible with our surface-display platform, which uses MC1061 cells (streptomycin resistance encoded in the genome) and libraries in a pBAD33 vector (chloramphenicol resistant). Thus, we designed a variant of the pULTRA plasmid in which we swapped the streptomycin resistance gene for an ampicillin resistance gene from a common pET vector for protein expression (pULTRA-Amp).
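The Amber codon frequency follows directly from enumerating the NNS codon set. The short calculation below assumes that all NNS codons are sampled with equal probability and that the library has ten degenerate positions (five on each side of the fixed tyrosine); real libraries may deviate from uniform sampling.

```python
from itertools import product

# N = A/C/G/T at the first two positions, S = C/G at the third.
nns_codons = ["".join(bases) for bases in product("ACGT", "ACGT", "CG")]
stop_codons_in_nns = [c for c in nns_codons if c in {"TAA", "TAG", "TGA"}]

p_amber = nns_codons.count("TAG") / len(nns_codons)  # per-position probability

n_positions = 10  # degenerate positions in X5-Y-X5
p_no_amber = (1 - p_amber) ** n_positions
p_exactly_one = n_positions * p_amber * (1 - p_amber) ** (n_positions - 1)
p_at_least_one = 1 - p_no_amber

print(f"NNS codons: {len(nns_codons)}, Amber per position: {p_amber:.4f}")
print(f"Exactly one Amber codon: {p_exactly_one:.3f}")
print(f"At least one Amber codon: {p_at_least_one:.3f}")
```

Under these assumptions, TAG is the only stop codon in the NNS set, and roughly a quarter of library members are expected to carry at least one Amber codon, most of them exactly one.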

Figure 7 with 3 supplements
Expansion of peptide display libraries using Amber suppression.

(A) Non-canonical amino acids used in this study. CMF = 4-carboxymethyl phenylalanine, AzF = 4-azido phenylalanine, and AcK = N-ε-acetyl-L-lysine. (B) Amber suppression in the strep-tagged X5-Y-X5 library using CMF. Library surface-display level was monitored by flow cytometry using a fluorophore-labeled StrepMAB antibody for samples with or without Amber suppression components. (C) AzF labeling on bacterial cells using a DIBO-conjugated fluorophore. Cells expressing the X5-Y-X5 library, with and without various Amber suppression components, were treated with DIBO-conjugated Alexa Fluor 555 then analyzed by flow cytometry. (D) Heatmaps depicting the specificities of c-Src, Hck, and c-Abl after CMF or acetyl lysine incorporation. Only sequences with one stop codon were used in this analysis. Enrichment scores were log2-transformed and are displayed on a color scale from blue (disfavored), to white (neutral), to red (favored). Values in heatmaps are the average of three replicates.

To confirm that non-canonical amino acids could be incorporated into the X5-Y-X5 library, we co-transformed E. coli with the library and a pULTRA-Amp plasmid encoding a tRNA/synthetase pair that can incorporate 4-carboxymethyl phenylalanine (CMF) via Amber suppression (Figure 7A; Xie et al., 2007). We measured peptide display levels by flow cytometry for cultures that were grown with or without CMF in the media. For the cultures grown without CMF, roughly 20% of the cells had no surface-displayed peptides, consistent with termination of translation at Amber codons within the peptide-coding region (Figure 7B). In the presence of CMF, this premature termination was significantly suppressed, and a larger fraction of the cells displayed peptides. As an additional test, we incorporated 4-azido phenylalanine (AzF) into the X5-Y-X5 library (Figure 7A; Amiram et al., 2015). Cells expressing this expanded library were treated with a dibenzocyclooctyne (DIBO)-functionalized fluorophore, which should selectively react with the azide on AzF via strain-promoted azide-alkyne cycloaddition (Ning et al., 2008). Only cells expressing the synthetase and grown in the presence of AzF showed significant DIBO labeling, confirming Amber suppression and non-canonical amino acid incorporation into our library (Figure 7C).

Using this library expansion strategy, we assessed how substrate recognition by c-Src is impacted by neighboring CMF or acetyl-lysine residues. We subjected CMF- or AcK-containing X5-Y-X5 libraries to c-Src phosphorylation, selection, and sequencing, using the same methods described above. When analyzing X5-Y-X5 libraries in standard kinase and SH2 screens, we typically omit all Amber-containing sequences from our calculations, as they do not encode expressed peptides (Figure 1B and Figure 2A). For these experiments, we included Amber-containing sequences in our analysis. Using this strategy, we found that the Amber codon was less depleted at each position surrounding the central tyrosine than we observed for libraries without Amber suppression, but the log-transformed enrichment scores for Amber codons at all positions surrounding the tyrosine residue were still negative (Figure 7—figure supplement 1). We reasoned that, if Amber suppression efficiency was not 100%, any Amber-containing sequence would still be depleted relative to a sequence lacking a stop codon, due to some premature termination. Thus, we re-analyzed the data by exclusively counting sequences that contained one Amber codon, under the assumption that every sequence would have approximately the same amount of premature termination. This revealed positive enrichment for CMF and AcK at select positions (Figure 7D and Figure 7—figure supplement 1). Although we only included a fraction of the total library in our new analysis, the overall specificity profile was almost identical to that observed when including the whole library, indicating that this sub-sampling approach was valid (Figure 7—figure supplement 2).
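The single-Amber filtering step described above can be illustrated with a short Python sketch. This is not the analysis code used in the study; the function names and the toy read counts are hypothetical, and the sketch assumes 33-base peptide-coding sequences in which the Amber codon (TAG) is only counted when it is in frame.

```python
AMBER = "TAG"

def codons(dna):
    """Split a coding sequence into in-frame triplets."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]

def amber_positions(dna, center_index=5):
    """Positions of in-frame Amber codons relative to the central tyrosine.

    center_index=5 assumes an 11-codon X5-Y-X5 sequence (hypothetical default).
    """
    return [i - center_index for i, c in enumerate(codons(dna)) if c == AMBER]

def single_amber_subset(read_counts):
    """Keep only sequences that contain exactly one in-frame Amber codon."""
    return {seq: n for seq, n in read_counts.items()
            if len(amber_positions(seq)) == 1}

# Toy data (hypothetical): sequence -> read count
no_amber = "GCG" * 5 + "TAT" + "GCG" * 5
one_amber = "GCGGCGTAGGCGGCG" + "TAT" + "GCG" * 5        # Amber at position -3
out_of_frame = "GCTAGC" + "GCG" * 3 + "TAT" + "GCG" * 5  # TAG spans two codons
reads = {no_amber: 120, one_amber: 15, out_of_frame: 40}

subset = single_amber_subset(reads)
```

Restricting the analysis to this subset means that every sequence being compared carries the same number of suppression events, which is the assumption stated in the text.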

Next, we compared the preferences for CMF and AcK at each position to their closest canonical amino acids, phenylalanine (Phe) and lysine (Lys). CMF was enriched at the –3 and –2 positions, where Phe is not tolerated by c-Src (Figure 7D). Negatively-charged amino acids (Asp and Glu) are also preferred at these positions, and the negative charge on the carboxymethyl group of CMF at neutral pH may be able to mimic this recognition. c-Src has a strong selective preference for Phe at the +3 position, which it engages via a well-formed hydrophobic pocket near the active site (Bose et al., 2006; Shah et al., 2018). The charged carboxymethyl group on CMF is likely to be incompatible with this mode of binding, consistent with depletion of CMF at this site (Figure 7D). The difference between Lys and AcK was even more striking. Lys is unfavorable for c-Src at every position around the phospho-acceptor tyrosine. By contrast, AcK was not only tolerated, but even favorable at a few positions (Figure 7D).

To determine whether the position-specific responsiveness to lysine acetylation was kinase-dependent, we also performed additional screens of the AcK-containing X5-Y-X5 library with Hck and c-Abl. These screens showed that all three kinases had very similar position-dependent tolerance for AcK over Lys, with the closely-related c-Src and Hck being more similar to one another than their distant relative c-Abl (Figure 7D). Finally, we assessed how the effect of lysine acetylation translated to actual changes in phosphorylation rates. We produced variants of the c-Src and c-Abl consensus peptides with Lys or AcK at various positions and measured their rates of phosphorylation by their respective cognate kinases (Figure 7—figure supplement 3). Of the positions tested (−2, +1, and +5 relative to the tyrosine), we saw the largest effect at the +1 position, consistent with the screens. At the +1 position, where Lys is not tolerated, acetylation enhanced activity as much as five-to-ten-fold, depending on the peptide concentration. In the long term, we envision using this approach to predict sites in the proteome where lysine acetylation creates new, high-activity substrates for tyrosine kinases. Furthermore, the same analysis could be applied to other tyrosine kinases and to SH2 domains, and our strategy could be readily expanded to other post-translational modifications that can be encoded using Amber suppression.

Concluding remarks

In this report, we describe a significant expansion to a previously developed method for profiling the sequence specificities of tyrosine kinases and SH2 (phosphotyrosine recognition) domains (Cantor et al., 2018; Shah et al., 2018; Shah et al., 2016). Our method relies on bacterial display of DNA-encoded peptide libraries and deep sequencing, and it enables the simultaneous analysis of multiple phosphotyrosine signaling proteins against thousands-to-millions of peptides or phosphopeptides. The resulting data can be used to design high-activity consensus sequences, predict the activities of uncharacterized sequences, and accurately measure the effects of amino acid substitutions on sequence recognition. A notable feature of our platform is that it relies on deep sequencing as a readout, yielding quantitative results. Furthermore, the data generated from our screens show a strong correlation with phosphorylation rates and binding affinities measured using orthogonal biochemical assays.

We envision a number of exciting applications of this expanded specificity profiling platform. Several recent reports have aimed to explain the molecular basis for tyrosine kinase and SH2 sequence specificity and affinity, by combining protein sequence and structure analysis with specificity profiling data (Bradley et al., 2021; Creixell et al., 2015a; Kaneko et al., 2010; Liu et al., 2019). The rich datasets generated using our platform will augment these approaches, particularly when coupled with screening data for additional proteins. A long-term goal of these efforts will undoubtedly be to accurately predict the sequence specificity and signaling properties of any uncharacterized phosphotyrosine signaling protein, such as a disease-associated kinase variant (Creixell et al., 2015b). Given the nature of the data generated by our platform, we expect that it will also aid the development and implementation of machine learning models for sequence specificity and design (Creixell et al., 2015a; Cunningham et al., 2020; Kundu et al., 2013). Indeed, our initial efforts in this realm suggest that specificity profiling data using the X5-Y-X5 library, without any protein structural information, may be sufficient to build models of sequence specificity that can accurately predict phosphorylation rates (Rube et al., 2022).

The pTyr-Var Library described in this report provides a unique opportunity to investigate variant effects across the human proteome. The vast majority of mutations near tyrosine phosphorylation sites are functionally uncharacterized (Hornbeck et al., 2019; Krassowski et al., 2018). Our screens are yielding some of the first mechanistic biochemical hypotheses about how many of these mutations could impact cell signaling. For example, these datasets will allow us to identify mutations that tune signaling pathways by altering the phosphorylation efficiency of specific phosphosites or the binding of SH2-containing effector proteins to those sites. Alternatively, these screens may help identify instances of network rewiring, in which a phosphosite-proximal mutation alters the canonical topology of a pathway by changing which kinases phosphorylate a phosphosite or which SH2-containing proteins get recruited to that site. The biological effects of signal tuning and rewiring caused by phosphosite-proximal mutations remain largely unexplored.

Our high-throughput platform to profile tyrosine kinase and SH2 sequence recognition is accessible and easy to use in labs that are equipped to culture E. coli and execute common molecular biology and biochemistry techniques. Screens can be conducted on the benchtop with proteins produced in-house or obtained from commercial vendors. Peptide libraries of virtually any composition, tailored to address specific biochemical questions, can be produced using commercially available oligonucleotides and standard molecular cloning techniques. Furthermore, facile chemical changes to the library (e.g. enzymatic phosphorylation or the introduction of non-canonical amino acids via Amber suppression) afford access to new biochemical questions. For example, the tyrosine-phosphorylated libraries described here will also be useful for the characterization of tyrosine phosphatase specificity, and acetyl-lysine-containing libraries could be used to profile lysine deacetylases and bromodomains. Additional amendments to this platform will enable the analysis of serine/threonine kinases and other protein modification or recognition domains, adding to the growing arsenal of robust methods for the high-throughput biochemical characterization of cell signaling proteins.

Materials and methods

Expression and purification of tyrosine kinase domains

Constructs for the kinase domains of c-Src, c-Abl, Fyn, Hck, AncSZ, Fer, FGFR1, FGFR3, EPHB1, EPHB2, and MERTK all contained an N-terminal His6-tag followed by a TEV protease cleavage site. These proteins were co-expressed in E. coli BL21(DE3) cells with the YopH tyrosine phosphatase. Cells transformed with YopH and the tyrosine kinase domains were grown in LB supplemented with 100 μg/mL ampicillin and 100 μg/mL streptomycin at 37 °C. Once cells reached an optical density of 0.5 at 600 nm, 500 μM isopropyl-β-D-1-thiogalactopyranoside (IPTG) was added to induce protein expression, and the cultures were incubated at 18 °C for 14–16 hours. Cells were harvested by centrifugation (4000 rpm at 4 °C for 30 min), resuspended in a lysis buffer containing 50 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole, 2 mM β-mercaptoethanol (BME), 10% glycerol, plus protease inhibitor cocktail, and lysed using sonication (Fisherbrand Sonic Dismembrator). After separation of insoluble material by centrifugation (33,000 g at 4 °C for 45 min), the supernatant was applied to a 5 mL HisTrap Ni-NTA column (Cytiva). The resin was washed with 10 column volumes each of lysis buffer and a wash buffer containing 50 mM Tris, pH 8.5, 50 mM NaCl, 20 mM imidazole, 2 mM BME, and 10% glycerol. The protein was eluted with 50 mM Tris, pH 8.5, 300 mM NaCl, 500 mM imidazole, 2 mM BME, and 10% glycerol.

The eluted protein was further purified by anion exchange on a 5 mL HiTrap Q column (Cytiva) and eluted with a gradient of 50 mM to 1 M NaCl in 50 mM Tris, pH 8.5, 1 mM TCEP-HCl and 10% glycerol. The His6-TEV tag of the protein in the collected fractions was cleaved by the addition of 0.10 mg/mL TEV protease overnight. The reaction mixture was subsequently flowed through 2 mL of Ni-NTA resin (ThermoFisher). The cleaved protein was collected in the flow-through and washes, then concentrated by centrifugation in an Amicon Ultra-15 30 kDa MWCO spin filter (Millipore). The concentrate was separated on a Superdex 75 16/600 gel filtration column (Cytiva), equilibrated with 10 mM HEPES, pH 7.5, 100 mM NaCl, 1 mM TCEP, 5 mM MgCl2, 10% glycerol. Pure fractions were pooled, aliquoted, and flash frozen in liquid N2 for long-term storage at –80 °C.

Expression and purification of biotinylated SH2 domains

Grb2 SH2 (56-152), c-Src SH2 (143-250), and SHP2 CSH2 (105-220) domains were cloned into a His6-SUMO-SH2-Avi construct and were co-expressed with biotin ligase BirA in E. coli C43(DE3) cells. Specifically, cells transformed with both BirA and SH2 domains were grown in LB supplemented with 100 µg/mL kanamycin and 100 µg/mL streptomycin at 37 °C until cells reached an optical density of 0.5 at 600 nm. The temperature was brought down to 18 °C, protein expression was induced with 1 mM IPTG, and the media was also supplemented with 250 µM biotin to facilitate biotinylation of the Avi-tagged SH2 domains in vivo. Protein expression was carried out at 18 °C for 14–16 hours. After removal of media by centrifugation, the cells were resuspended in a lysis buffer containing 50 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole, 10% glycerol, and 2 mM BME, supplemented with protease inhibitor cocktail. The cells were lysed using sonication (Fisherbrand Sonic Dismembrator), and the lysate was clarified by ultracentrifugation. The supernatant was applied to a 5 mL Ni-NTA column (Cytiva). The resin was washed with 10 column volumes each of two buffers: first 50 mM Tris, pH 7.5, 300 mM NaCl, 20 mM imidazole, 10% glycerol, and 2 mM BME, then 50 mM Tris, pH 7.5, 50 mM NaCl, 20 mM imidazole, 10% glycerol, and 2 mM BME. The protein was eluted in a buffer containing 50 mM Tris, pH 7.5, 300 mM NaCl, 500 mM imidazole, and 10% glycerol.

The eluted protein was further purified by ion exchange on a 5 mL HiTrap Q anion exchange column (Cytiva). The following buffers were used: 50 mM Tris, pH 7.5, 50 mM NaCl, 1 mM TCEP and 50 mM Tris, pH 7.5, 1 M NaCl, 1 mM TCEP. The protein was eluted off the column over a salt gradient from 50 mM to 1 M NaCl. The His6-SUMO tag was cleaved by addition of 0.05 mg/mL Ulp1 protease. The reaction mixture was flowed over a 2 mL Ni-NTA column (ThermoFisher) to remove the Ulp1, the uncleaved protein, and His6-SUMO fragments. The cleaved protein was further purified by size-exclusion chromatography on a Superdex 75 16/60 gel filtration column (Cytiva) equilibrated with buffer containing 20 mM HEPES, pH 7.4, 150 mM NaCl, and 10% glycerol. Pure fractions were pooled, aliquoted, and flash frozen in liquid N2 for long-term storage at –80 °C.

Synthesis and purification of peptides for in vitro validation measurements

All the peptides used for in vitro kinetic assays were synthesized using 9-fluorenylmethoxycarbonyl (Fmoc) solid-phase peptide chemistry. All syntheses were carried out using the Liberty Blue automated microwave-assisted peptide synthesizer from CEM under nitrogen atmosphere, with standard manufacturer-recommended protocols. Peptides were synthesized on MBHA Rink amide resin solid support (0.1 mmol scale). Each Nα-Fmoc amino acid (6 eq, 0.2 M) was activated with diisopropylcarbodiimide (DIC, 1.0 M) and ethyl cyano(hydroxyimino)acetate (Oxyma Pure, 1.0 M) in dimethylformamide (DMF) prior to coupling. Each coupling cycle was done at 75 °C for 15 s then 90 °C for 110 s. Deprotection of the Fmoc group was performed in 20% (v/v) piperidine in DMF (75 °C for 15 s then 90 °C for 50 s). The resin was washed (4 x) with DMF following Fmoc deprotection and after Nα-Fmoc amino acid coupling. All peptides were acetylated at their N-terminus with 10% (v/v) acetic anhydride in DMF and washed (4 x) with DMF.

After peptide synthesis was completed, including N-terminal acetylation, the resin was washed (3 x each) with dichloromethane (DCM) and methanol (MeOH) and dried under reduced pressure overnight. The peptides were cleaved and the side chain protecting groups were simultaneously deprotected in 95% (v/v) trifluoroacetic acid (TFA), 2.5% (v/v) triisopropylsilane (TIPS), and 2.5% water (H2O), in a ratio of 10 μL cleavage cocktail per mg of resin. The cleavage-resin mixture was incubated at room temperature for 90 min, with agitation. The cleaved peptides were precipitated in cold diethyl ether, washed in ether, pelleted, and dried under air. The peptides were redissolved in 50% (v/v) water/acetonitrile solution and filtered from the resin.

The crude peptide mixture was purified using reverse-phase high-performance liquid chromatography (RP-HPLC) on a semi-preparatory C18 column (Agilent, ZORBAX 300 SB-C18, 9.4x250 mm, 5 μm) with an Agilent HPLC system (1260 Infinity II). Flow rate was kept at 4 mL/min with solvents A (H2O, 0.1% (v/v) TFA) and B (acetonitrile, 0.1% (v/v) TFA). Peptides were generally purified over a 40-min linear gradient from solvent A to solvent B, with the specific gradient depending on the peptide sample. Peptide purity was assessed with an analytical column (Agilent, ZORBAX 300 SB-C18, 4.6x150 mm, 5 μm) at a flow rate of 1 mL/min over a 0–90% B gradient in 30 minutes. All peptides were determined to be ≥95% pure by peak integration. The identities of the peptides were confirmed by mass spectrometry (Waters Xevo G2-XS QTOF). Pure peptides were lyophilized and redissolved in 100 mM Tris, pH 8.0, as needed for experiments.

Preparation of the X5-Y-X5 and pTyr-Var libraries for specificity profiling

All bacterial display libraries used in this study are embedded within the pBAD33 plasmid (chloramphenicol resistant), with the surface-display construct inducible by L-(+)-arabinose (Rice and Daugherty, 2008). All libraries have the same general structure:

  • [signal sequence: MKKIACLSALAAVLAFTAGTSVA]-[GQSGQ]-[peptide-coding sequence]-[GGQSGQ]-[eCPX scaffold]-[GGQSGQ]-[strep-tag: WSHPQFEK or myc-tag: EQKLISEEDL]

The X5-Y-X5 library contains 11-residue peptide sequences with five randomized amino acids flanking both sides of a fixed central tyrosine residue. The library was produced using the X5-Y-X5 library oligo, with each X encoded by an NNS codon, and Y encoded by a TAT codon (see key resources table for all primer sequences). This oligo included a 5’ SfiI restriction site and DNA sequences encoding the flanking linkers that connect library peptide sequences to the 5’ signal sequence and 3’ eCPX scaffold.
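As an illustration of what NNS randomization implies for library composition, the following Python sketch enumerates the NNS codons and estimates how often a stop codon appears in the ten randomized positions. It assumes the standard genetic code and equal nucleotide frequencies at every N and S position, which is an idealization that a synthesized oligo pool will not meet exactly; the variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNS: any base at positions 1 and 2, C or G at position 3
nns_codons = [a + b + c for a, b, c in product("ACGT", "ACGT", "CG")]
stop_codons = [c for c in nns_codons if CODON_TABLE[c] == "*"]
amino_acids = sorted({CODON_TABLE[c] for c in nns_codons} - {"*"})

# Idealized probabilities for the 10 randomized positions of X5-Y-X5
n_random = 10
p_stop = len(stop_codons) / len(nns_codons)
p_no_stop = (1 - p_stop) ** n_random
p_one_stop = n_random * p_stop * (1 - p_stop) ** (n_random - 1)

print(f"{len(nns_codons)} NNS codons, stop codons: {stop_codons}")
print(f"{len(amino_acids)} amino acids encoded")
print(f"P(no stop) = {p_no_stop:.3f}, P(exactly one stop) = {p_one_stop:.3f}")
```

Under these assumptions, the only stop codon reachable with NNS is the Amber codon (TAG), which is why Amber suppression can be used to place a non-canonical amino acid at any randomized position of this library.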

The sequences in the pTyr-Var library were derived from the PhosphoSitePlus database and include 3,159 human tyrosine phosphorylation sites and 4,760 variants of these phosphosites bearing a single amino acid mutation (Hornbeck et al., 2019). Reference sequences in this library are named as ‘GeneName_pTyr-position’, and variants carry an additional suffix giving the mutated position and the substituted residue (e.g. ‘SRC_Y530’ and ‘SRC_Y530_527K’). In this initial list, about 2,133 sequences had more than one tyrosine residue, and so a second version of each of those sequences was included in which the tyrosines except the central tyrosine were substituted with phenylalanine (denoted with a ‘_YF’ suffix). In addition, 24 previously reported consensus substrate sequences were included (Begley et al., 2015; Deng et al., 2014; Marholz et al., 2018; Rube et al., 2022; Songyang et al., 1995). In total, our designed pTyr-Var library contained 9,898 unique 11-residue peptide sequences, which were then converted into DNA sequences using the most frequently used codon in E. coli. The DNA sequences were further optimized, swapping synonymous codons to achieve a GC content between 30% and 70% for all sequences. Sequences were also inspected and altered to remove any internal SfiI recognition sites. The 33-base peptide-coding sequences were flanked by 5’-GCTGGCCAGTCTGGCCAG-3’ on the 5’ side and 5’-GGAGGGCAGTCTGGGCAGTCTG-3’ on the 3’ side, the same flanks used for the X5-Y-X5 library oligo. An oligonucleotide pool based on all 9,898 sequences was generated by on-chip massively parallel synthesis (Twist Bioscience). This oligo-pool was amplified by PCR in ten cycles with the Oligopool-fwd-primer and Oligopool-rev-primer, using the NEB Q5 polymerase with a slow ramping speed (2 °C/s) and long denaturation times.
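The design constraints in this paragraph (GC content between 30% and 70%, no internal SfiI site, fixed flanks) can be expressed as a simple check. The Python sketch below is illustrative only and is not the design pipeline used for the library; the function names are hypothetical. It relies on the SfiI recognition sequence GGCCNNNNNGGCC, and it treats a design as acceptable when the flanked oligo contains exactly one SfiI site, namely the intended one in the 5’ flank.

```python
import re

# SfiI recognizes GGCCNNNNNGGCC; a lookahead lets overlapping sites be counted
SFII_SITE = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")
FLANK_5 = "GCTGGCCAGTCTGGCCAG"        # contains the intended SfiI site
FLANK_3 = "GGAGGGCAGTCTGGGCAGTCTG"

def gc_fraction(dna):
    """Fraction of G and C bases in a DNA sequence."""
    return (dna.count("G") + dna.count("C")) / len(dna)

def count_sfii_sites(dna):
    """Number of SfiI recognition sites, including overlapping ones."""
    return sum(1 for _ in SFII_SITE.finditer(dna))

def add_flanks(coding):
    """Attach the constant 5' and 3' flanks to a peptide-coding sequence."""
    return FLANK_5 + coding + FLANK_3

def passes_design_checks(coding, gc_min=0.30, gc_max=0.70):
    """Check length, GC content, and absence of extra SfiI sites."""
    if len(coding) != 33:
        return False
    if not gc_min <= gc_fraction(coding) <= gc_max:
        return False
    # Only the site in the 5' flank should be present in the full oligo
    return count_sfii_sites(add_flanks(coding)) == 1

# Hypothetical examples
good = "GAAGATACCGAACTGTATGAAGTTCCGAGCAAA"   # encodes EDTELYEVPSK
bad = "GGCCAAAAAGGCC" + "TATGAAGTTCCGAGCAAAGA"  # carries an internal SfiI site
print(passes_design_checks(good), passes_design_checks(bad))
```

Counting sites in the flanked oligo, rather than in the 33-base coding region alone, also catches sites that would be created at the junction between a flank and the coding sequence.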

Next, we integrated the oligonucleotide sequences encoding the X5-Y-X5 and pTyr-Var libraries into a pBAD33 vector as a fusion to the eCPX bacterial display scaffold, in a series of steps. The eCPX gene was previously fused to a sequence encoding a 3’ strep-tag (pBAD33-eCPX-cStrep) (Shah et al., 2018), and we produced a myc-tagged eCPX construct analogously, using standard molecular cloning techniques (pBAD33-eCPX-cMyc). The coding sequences for the eCPX-strep and eCPX-myc constructs were amplified from these plasmids by PCR using the link-eCPX-fwd primer and the link-eCPX-rev primer. These PCR products contained a 3’ SfiI restriction site. The peptide-coding sequences were then fused to the eCPX scaffold at the 5’ end of the scaffold in another PCR step to generate the library-scaffold inserts. For the X5-Y-X5 library, this step used the X5-Y-X5 library oligo and the link-eCPX-rev primer, along with the amplified eCPX gene. For the pTyr-Var library, this step used the amplified oligo-pool, the amplified eCPX gene, and the Oligopool-fwd-primer and link-eCPX-rev primer. The resulting PCR products contained the peptide-scaffold fusion constructs flanked by two unique SfiI sites.

In parallel, the pBAD33-eCPX backbone was amplified by PCR from the pBAD33-eCPX plasmid using the BB-fwd-primer and BB-rev primer. Both the amplified insert and backbone were purified over spin columns and then digested with the SfiI restriction endonuclease overnight at 50 °C. After digestion, the backbone was treated with Quick CIP (NEB) to prevent self-ligation. Both the digested insert and backbone were gel purified. The purified library insert was ligated into the digested pBAD33-eCPX backbone using T4 DNA ligase (NEB) overnight at 16 °C. Typically, this reaction was done with a total of approximately 1.5 μg of DNA, with a 1:5 molar ratio of backbone:insert. The ligation reaction was concentrated and desalted over a spin column and then used to transform commercial DH5α cells by electroporation. The transformed DH5α cells were grown in liquid culture overnight, and the plasmid DNA was isolated and purified using a commercial midiprep kit (Zymo).

Experimental procedure for high-throughput specificity screening of tyrosine kinases

Preparation of cells displaying peptide libraries

The high-throughput specificity screens for tyrosine kinases using the X5-Y-X5 and the pTyr-Var peptide libraries were carried out as described previously (Shah et al., 2018), with the main difference being the use of magnetic beads to isolate phosphorylated cells, rather than fluorescence-activated cell sorting. 25 µL of electrocompetent E. coli MC1061 F- cells were transformed with 200 ng of library DNA. Following electroporation, the cells were resuspended in 1 mL of LB and allowed to recover at 37 °C for 1 hr with shaking. These cells were resuspended in 250 mL of LB with 25 µg/mL chloramphenicol and incubated overnight at 37 °C. Of the overnight culture, 150 μL was used to inoculate 5.5 mL of LB containing 25 µg/mL chloramphenicol. This culture was grown at 37 °C for 1–2 hr until the cells reached an optical density of 0.5 at 600 nm. Expression of the library was induced by adding arabinose to a final concentration of 0.4% (w/v). The cells were incubated at 25 °C with shaking at 220 rpm for 4 hr. Small aliquots of the cells (75–150 µL) were transferred to microcentrifuge tubes and centrifuged at 1000 g at 4 °C for 10–15 min. The media was removed and the cells were resuspended in PBS and centrifuged again. The PBS was removed and the cells were stored at 4 °C. Experiments were performed with cells that had been stored at 4 °C for 1–4 days. Typical screens were carried out on a 50 μL to 100 μL scale, with cells that were 50% more concentrated than in culture (OD600 value around 1.5). Thus, for a 100 μL reaction, typically 150 μL of cell culture was pelleted and washed.

Phosphorylation of peptides displayed on cells

Phosphorylation reactions of the library were conducted with the purified kinase domain and 1 mM ATP in a buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 2 mM sodium orthovanadate. To achieve similar library phosphorylation levels across the kinases, an optimal concentration of kinase was determined to achieve 20–30% phosphorylation of the library after 3 minutes of incubation at 37 °C. This was assessed by flow cytometry based on anti-phosphotyrosine antibody labeling (Attune NxT, Invitrogen). To label the phosphorylated cells, 50 μL pellets were resuspended with a 1:25 dilution of the PY20-PerCP-eFluor 710 conjugate (eBioscience) in PBS containing 0.2% bovine serum albumin (BSA). The cells were incubated with the antibody for 1 hr on ice in the dark, then centrifuged, washed once with PBS with 0.2% BSA, and finally resuspended in 100 μL of PBS with 0.2% BSA. For flow cytometry analysis, 20 μL of cells were diluted in 130 μL of PBS with 0.2% BSA.

The following concentrations were used: 0.5 μM for Src, 1.5 μM for Abl, 0.4 μM for Fer, 1.5 μM for EPHB1, 1.25 μM for EPHB2, 0.1 μM for JAK2, 0.5 μM for AncSZ, 0.45 μM for FGFR1, 0.5 μM for FGFR3, and 0.7 μM for MERTK. For some tyrosine kinases, such as FGFR1, FGFR3, and MERTK, pre-activation with ATP was required to enhance their kinetic activity. To accomplish this, autophosphorylation reactions were performed with 25 μM kinase and 5 mM ATP for 0.5–2 hours at 25 °C. The preactivated kinase mixture was then desalted and concentrated using an Amicon Ultra-15 30 kDa MWCO spin filter (Millipore) to remove the residual ATP.

After the desired time of library phosphorylation, kinase activity was quenched with 25 mM EDTA and the cells were washed with PBS containing 0.2% BSA. Kinase-treated cells were then labeled with a 1:1000 dilution of biotinylated 4G10 Platinum anti-phosphotyrosine antibody (Millipore) for an hour on ice and washed with PBS containing 0.1% BSA and 2 mM EDTA (isolation buffer). The cells were finally resuspended in PBS containing 0.1% BSA. The phosphorylated, antibody-labeled cells were then mixed with magnetic beads from the Dynabeads FlowComp Flexi kit (Invitrogen), at a ratio of 37.5 μL of washed beads per 50 μL of cell suspension, diluted into 450 μL of isolation buffer. The suspension was rotated at 4 °C for 30 minutes, then 375 μL of isolation buffer was added and the beads were separated from the bulk solution on a magnetic rack. The beads were washed once with 1 mL of isolation buffer, and then the supernatant was removed by aspiration. The beads were resuspended in 50 µL of fresh water, vortexed, and boiled at 100 °C for 10 minutes to extract DNA from cells bound to Dynabeads. The bead/lysate mixture was centrifuged to pellet the beads and the mixture was stored at –20 °C.

DNA sample preparation and deep sequencing

To amplify the peptide-coding DNA sequence for deep sequencing, the supernatant from this lysate was used as a template in a 50 μL, 15-cycle PCR reaction using the TruSeq-eCPX-Fwd and TruSeq-eCPX-Rev primers and Q5 polymerase. The resulting mixture from this PCR reaction was used without purification as a template for a second, 20-cycle PCR reaction to append a unique pair of Illumina sequencing adapters and 5’ and 3’ indices for each sample (D700 and D500 series primers). The resulting PCR products were purified by gel extraction, and the concentration of each sample was determined using the QuantiFluor dsDNA System (Promega). The samples were pooled at equal molarity and sequenced by paired-end Illumina sequencing on a MiSeq or NextSeq instrument using a 150-cycle kit. The number of samples multiplexed in one run, and the loading density on the sequencing chip, were adjusted to obtain at least 1–2 million reads for each index/sample.
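Pooling to equal molarity requires converting each measured mass concentration into a molar concentration. The Python sketch below shows one way to do this arithmetic; it is not taken from the study. It uses the common approximation of 660 g/mol per base pair of double-stranded DNA, and the amplicon length, sample names, concentrations, and target amount are hypothetical placeholders.

```python
def dsdna_nanomolar(conc_ng_per_ul, length_bp, bp_mass=660.0):
    """Convert a dsDNA concentration from ng/µL to nM.

    nM = (ng/µL) x 1e6 / (660 g/mol per bp x length in bp)
    """
    return conc_ng_per_ul * 1e6 / (bp_mass * length_bp)

def pooling_volumes(samples, length_bp, target_fmol=50.0):
    """Volume (µL) of each sample that contributes target_fmol of DNA.

    1 nM equals 1 fmol/µL, so volume = target_fmol / concentration_nM.
    """
    return {name: target_fmol / dsdna_nanomolar(conc, length_bp)
            for name, conc in samples.items()}

# Hypothetical measurements (ng/µL) for amplicons of an assumed 100 bp length
samples = {"input": 6.6, "selected": 13.2}
volumes = pooling_volumes(samples, length_bp=100)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} µL")
```

Because all samples in these screens share the same amplicon design, they have the same length, and equal molarity then reduces to equal mass; the explicit conversion matters mainly when samples of different lengths share a run.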

Experimental procedure for high-throughput specificity screening of SH2 domains

Preparation of cells displaying peptide libraries

Bacteria displaying peptide libraries for SH2 screens were prepared similarly to the bacteria for the kinase screens, with some small modifications. Specifically, after transformation with the library DNA and outgrowth of an overnight culture, 1.8 mL of the overnight culture was added to 100 mL of LB containing 25 μg/mL of chloramphenicol. This culture was grown at 37 °C until the cells reached an optical density of 0.5 at 600 nm. Then, 20 mL of this culture was transferred to a 50 mL flask, and expression was induced by addition of arabinose to a final concentration of 0.4% (w/v). Expression was carried out at 25 °C for 4 hr, then cells were aliquoted, pelleted, and washed as described for kinase screens.

Phosphorylation of peptides displayed on cells

Phosphorylation of cells was performed in a buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP. A mixture of 2.5 µM of c-Abl kinase domain, 2.5 µM c-Src kinase domain, 2.5 µM of EPHB1 kinase domain, 2.5 µM of AncSZ, 50 µg/mL rabbit muscle creatine phosphokinase, and 5 mM creatine phosphate was prepared in this buffer. Cells were resuspended in this solution such that a pellet derived from 50 μL of cell culture was resuspended in 50 μL of solution. To initiate the phosphorylation reaction, ATP was added from a concentrated stock to a final concentration of 5 mM, and the mixture was incubated at 37 °C for 3 hr. Following this, the kinase activity was quenched by addition of 25 mM EDTA. Library phosphorylation was assessed by flow cytometry based on anti-phosphotyrosine antibody labeling, as described above for the kinase screens (Attune NxT, Invitrogen).

Preparation of magnetic beads functionalized with SH2 domains (SH2-dynabeads)

First, 37.5 µL of magnetic beads from the Dynabeads FlowComp Flexi kit (Invitrogen) were washed with 1 mL of SH2 screen buffer containing 50 mM HEPES, pH 7.5, 150 mM NaCl. After washing, the beads were resuspended in 75 µL of 20 µM biotinylated SH2 domain and incubated at 4 °C for 2.5–3 hr. Unbound SH2 domain protein was removed by washing twice with 1 mL of SH2 screen buffer. The beads were finally resuspended in 37.5 µL of SH2 screen buffer.

Selection with SH2-dynabeads

Fifty µL of the phosphorylated cells were centrifuged at 4000 g at 4 °C for 15 min. After the supernatant was discarded, the cells were resuspended in SH2 screen buffer with 0.1% BSA, mixed with 37.5 µL of SH2-dynabeads, and rotated for 1 hr at 4 °C. Then, the magnetic beads were separated from the bulk solution using a magnetic rack, and the supernatant was removed by aspiration. The SH2-beads were then washed by incubating them with 1 mL of SH2 screen buffer for 30 min at 4 °C. After discarding the wash solution, the beads were resuspended in 50 µL of fresh water, vortexed, and boiled at 100 °C for 10 min to extract DNA from cells bound to SH2-dynabeads. DNA samples were prepared and sequenced as described for the kinase screens.

Procedure for incorporating non-canonical amino acids in the high-throughput specificity screen

General protocol

E. coli MC1061 electrocompetent bacteria were transformed with genetically-encoded peptide libraries and grown in liquid LB media as described in the regular screens, but with an additional plasmid encoding the corresponding non-canonical amino acid aminoacyl synthetase and tRNA pair and the addition of 100 µg/mL ampicillin to the growth medium. The cells were grown to an optical density of 0.5 at 600 nm. Peptide expression was induced with 0.4% (w/v) arabinose, 1 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG), and 5 mM CMF, 5 mM AzF, or 10 mM AcK, and the cultures were incubated at 25 °C for 4 hr. Cell pellets were collected and washed in PBS as described in the regular screens. Bacteria bearing surface-displayed peptides containing the non-canonical amino acid of interest were phosphorylated with 0.5 µM Src kinase for 3 min using the same buffer conditions as in the regular kinase screens: 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 2 mM activated sodium orthovanadate. The reactions were initiated with 1 mM ATP and quenched with 25 mM EDTA, then washed with PBS containing 0.2% BSA, as described for the regular screens. Downstream processing of the samples, including phospho-tyrosine labeling, separation using magnetic beads, and deep sequencing, was done exactly as in the regular kinase screens.

Fluorophore labeling of surface-displayed AzF using click chemistry

The DIBO labeling solution was prepared by dissolving 0.5 mg of DIBO-alkyne Alexa Fluor 555 dye (ThermoFisher) in dimethyl sulfoxide (DMSO) to a concentration of 1 mM, and the solution was kept protected from light. The c-Myc tag labeling solution was prepared by a 1:100 dilution of c-Myc Alexa Fluor 488 conjugate (ThermoFisher) in PBS containing 0.2% BSA. The cell pellets treated with AzF were resuspended in 50 μM of the DIBO labeling solution and incubated overnight at RT with gentle nutation, protected from light (Tian et al., 2014). The cell suspension was pelleted and washed 4 x in PBS containing 0.2% BSA to ensure all excess DIBO dye was removed. The cell pellets were then resuspended in the c-Myc antibody solution and incubated on ice for 1 hr, protected from light. The cell suspension was pelleted and washed using PBS with 0.2% BSA. The pellets were resuspended in PBS with 0.2% BSA and analyzed by flow cytometry (Attune NxT, Invitrogen).

A note about replicates for the bacterial peptide display screens

We define technical replicates as sets of screens conducted with library-expressing cells that are all derived from the same library transformation reaction. Biological replicates are screens done using different transformations with the library DNA, often on different days. The replicates in this study are generally all biological replicates or two biological replicate sets of two to three technical replicates.

Processing and analysis of deep sequencing data from high-throughput specificity screens

The raw paired-end reads for each index pair from an Illumina sequencing run were merged using FLASH (Magoč and Salzberg, 2011). The resulting merged sequences were then searched for the following 5’ and 3’ flanking sequences surrounding the peptide-coding region of the libraries: 5’ flanking sequence = 5’-NNNNNNACCGCAGGTACTTCCGTAGCTGGCCAGTCTGGCCAG-3’, and 3’ flanking sequence = 5’-GGAGGGCAGTCTGGGCAGTCTGGTGACTACAACAAAANNNNNN-3’. These flanks were removed using the software Cutadapt to yield a file named ‘SampleName.trimmed.fastq’ (Martin, 2011). Sequences that did not contain both flanking regions were discarded at this stage (typically less than 5%). From this point onward, all analysis was carried out using Python scripts generated in-house, which can be found in a GitHub repository: https://github.com/nshahlab/2022_Li-et-al_peptide-display (copy archived at Li et al., 2023). Trimmed and translated FastQ and FastA files for all data used in this paper can be found in a Dryad repository (https://doi.org/10.5061/dryad.0zpc86727).

Analysis of data from screens with the pTyr-Var Library

For the samples screened with the pTyr-Var Library, we ran scripts that identify every 33-base trimmed DNA sequence, translate those DNA sequences into amino acid sequences, and count the abundance of each translated sequence that matches a peptide in the pTyr-Var library. In one format of this analysis, we used the countPeptides.py script on a trimmed input file, or the batch-countPeptides.py script for multiple input files, to generate a list of every unique peptide and its corresponding counts. In a second format of this analysis, we used countPeptides-var-ref.py (or batch-countPeptides-var-ref.py), along with paired text files listing each variant (pTyr-Var_variant.txt) and its corresponding reference sequence (pTyr-Var_reference.txt), line by line, to yield side-by-side counts for each variant-reference pair. These processing steps were conducted for both selected samples (after kinase phosphorylation or SH2 binding) and unselected input samples. Next, the number of reads for every sequence ($n_{\text{peptide}}$) was normalized to the total number of peptide-coding reads in that sample ($n_{\text{total}}$), to yield a frequency ($f_{\text{peptide}}$, equation 1). Then, the frequency of each peptide in a selected sample ($f_{\text{peptide,selected}}$) was further normalized to the frequency of that same peptide in the unselected input sample ($f_{\text{peptide,input}}$) to yield an enrichment score ($E_{\text{peptide}}$, equation 2).

$$f_{\text{peptide}} = \frac{n_{\text{peptide}}}{n_{\text{total}}} \tag{1}$$

$$E_{\text{peptide}} = \frac{f_{\text{peptide,selected}}}{f_{\text{peptide,input}}} \tag{2}$$
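
As an illustration of equations 1 and 2, the following is a minimal Python sketch, not the in-house scripts themselves. The function names, the toy read counts, and the choice to skip peptides that are absent from the input sample are all assumptions made for this example.

```python
from collections import Counter

def frequencies(counts):
    """Equation 1: divide each peptide's read count by the sample total."""
    total = sum(counts.values())
    return {pep: n / total for pep, n in counts.items()}

def enrichment_scores(selected_counts, input_counts):
    """Equation 2: frequency in the selected sample over frequency in the input.

    Peptides with zero input reads are skipped to avoid division by zero;
    this handling is an assumption of the sketch.
    """
    f_selected = frequencies(selected_counts)
    f_input = frequencies(input_counts)
    return {
        pep: f_selected.get(pep, 0.0) / f_in
        for pep, f_in in f_input.items()
        if f_in > 0
    }

# Toy read counts, made up for illustration only
input_counts = Counter({"AGQEEYSAMRD": 50, "AGQEKYSAMRD": 50})
selected_counts = Counter({"AGQEEYSAMRD": 90, "AGQEKYSAMRD": 10})

scores = enrichment_scores(selected_counts, input_counts)
for peptide, score in scores.items():
    print(peptide, round(score, 3))
```

In this toy case the first peptide is enriched relative to the input (score above 1) and the second is depleted (score below 1).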

Analysis of data from screens with the X5-Y-X5 Library

For data from the X5-Y-X5 library, we did not calculate enrichments for individual sequences, as the sequencing depth per sample was generally on par with the library size (10^6–10^7 sequences). Instead, we computed the counts for each amino acid (or a stop codon) at every position along peptides of the expected length (11 amino acid residues). To accomplish this, we first translated all of the DNA sequences in the trimmed sequencing files using the translateUnique.py (or batch-translateUnique.py) script. When stop codons were encountered, they were translated as an asterisk symbol. In addition to producing a file of translated reads named ‘SampleName.translate.fasta’, this script also produced lists of every unique translated 11-residue peptide and the corresponding counts for that peptide. These files allowed us to assess whether any individual sequence was disproportionately enriched (not expected for a single round of selection with a library of this size), how many unique sequences were in each sample, and what fraction of the unique sequences contained a stop codon.
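
The translation and counting step can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the translateUnique.py script: the function names and the three toy reads are hypothetical, and reads that are not a whole number of valid codons are simply dropped.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order; stops become "*"
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate a trimmed read; return None if it is not made of whole, valid codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        return None
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if any(codon not in CODON_TABLE for codon in codons):
        return None
    return "".join(CODON_TABLE[codon] for codon in codons)

def count_unique_peptides(dna_reads, length=11):
    """Count every unique translated peptide of the expected length."""
    peptides = (translate(read) for read in dna_reads)
    return Counter(p for p in peptides if p is not None and len(p) == length)

# Toy reads, made up for illustration only
reads = [
    "GCTGGTCAGGAAGAATATTCTGCTATGCGTGAT",  # translates to AGQEEYSAMRD
    "GCTGGTCAGGAAGAATATTCTGCTATGCGTGAT",
    "GCTGGTCAGGAATAGTATTCTGCTATGCGTGAT",  # TAG stop at -1: AGQE*YSAMRD
]
counts = count_unique_peptides(reads)
fraction_with_stop = sum(1 for pep in counts if "*" in pep) / len(counts)
print(counts, fraction_with_stop)
```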

Using the translated read files, we then calculated the position-specific amino acid counts in three formats. In the simplest format, we exclusively counted 11-residue sequences that contained a central tyrosine and no stop codons (AA-count-nostop.py and batch-AA-count-nostop.py). In order to calculate stop codon depletion, we ran a version of the script that counted amino acid and stop codon composition across all 11-residue sequences (AA-count-full.py and batch-AA-count-full.py). Finally, for Amber suppression datasets, we exclusively counted sequences containing one stop codon and a central tyrosine residue (AA-count-1stop.py and batch-AA-count-1stop.py). Each of these scripts generated a counts matrix with 11 columns and 21 rows, in which each column represents a position in the peptide (from –5 to +5) and each row represents an amino acid (in alphabetical order, with the stop codon in the 21st row). Frequencies of each amino acid at each position were determined by taking the position-specific count for each amino acid and dividing that by the column total. Frequencies in a matrix from a selected sample were further normalized against frequencies from an input sample, and the resulting enrichment values were log2-transformed to yield the data represented in the heatmaps in Figures 1, 2, 6 and 7.
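
A minimal sketch of the simplest counting format (central tyrosine, no stop codons) and the subsequent log2 normalization is given below. It is not the AA-count-nostop.py script. The row order (one-letter codes sorted alphabetically, stop codon last), the use of None for cells where either frequency is zero, and the toy peptides are assumptions of this example.

```python
import math

# Assumed row order: one-letter codes sorted alphabetically, stop codon last
ROWS = "ACDEFGHIKLMNPQRSTVWY*"
LENGTH = 11   # X5-Y-X5 peptides
CENTER = 5    # zero-based index of the central tyrosine (position 0)

def count_matrix(peptides):
    """Count residues per position for 11-mers with a central Tyr and no stop."""
    matrix = [[0] * LENGTH for _ in ROWS]
    for pep in peptides:
        if len(pep) != LENGTH or pep[CENTER] != "Y" or "*" in pep:
            continue
        if any(aa not in ROWS for aa in pep):
            continue  # skip reads with unexpected characters
        for pos, aa in enumerate(pep):
            matrix[ROWS.index(aa)][pos] += 1
    return matrix

def frequency_matrix(counts):
    """Divide each count by its column (position) total."""
    totals = [sum(row[pos] for row in counts) for pos in range(LENGTH)]
    return [
        [row[pos] / totals[pos] if totals[pos] else 0.0 for pos in range(LENGTH)]
        for row in counts
    ]

def log2_enrichment(selected_counts, input_counts):
    """log2(selected frequency / input frequency); None where either is zero."""
    f_sel = frequency_matrix(selected_counts)
    f_in = frequency_matrix(input_counts)
    return [
        [math.log2(s / i) if s > 0 and i > 0 else None for s, i in zip(sel_row, in_row)]
        for sel_row, in_row in zip(f_sel, f_in)
    ]

# Toy samples, made up for illustration only
input_peptides = ["AAAAAYAAAAA", "EAAAAYAAAAA"]
selected_peptides = ["EAAAAYAAAAA"] * 3 + ["AAAAAYAAAAA"]
enrichment = log2_enrichment(count_matrix(selected_peptides), count_matrix(input_peptides))
print(enrichment[ROWS.index("E")][0], enrichment[ROWS.index("A")][0])
```

Here glutamate at the –5 position rises from a frequency of 0.5 in the input to 0.75 after selection, giving a positive log2 enrichment, while alanine at the same position is depleted.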

Scoring sequences using data from the X5-Y-X5 Library

In order to score peptides using position-weighted counts matrices from the X5-Y-X5 Library, we wrote a Python script called score_peptide_nostop.py. This script requires the selected and input counts matrices for a kinase or SH2 domain, produced by the AA-count-nostop.py script, along with a list of peptides as a text file, with one peptide per line. The script first calculates the normalized enrichments for each amino acid at each position across the matrices. Then, it reads each target sequence, sums the log2-normalized enrichments for each residue according to the enrichment matrix, ignoring the central tyrosine, and divides the sum by the number of scored residues (10 for the X5-Y-X5 Library). The script also calculates the scores for the best and worst possible sequences, according to the enrichment matrix. Both unnormalized and normalized scores for the whole peptide list are written out as text files.
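
The scoring logic can be sketched as follows. This is a hypothetical illustration, not score_peptide_nostop.py: the enrichment matrix is represented as a list of per-position dictionaries, the toy values are made up, and the min-max form of the normalized score (scaled between the worst and best possible sequences) is an assumption, since the exact normalization is not spelled out in the text.

```python
CENTER = 5  # zero-based index of the central tyrosine, which is not scored

def score_peptide(peptide, log2_enrichment):
    """Mean log2 enrichment over all positions except the central tyrosine."""
    positions = [i for i in range(len(peptide)) if i != CENTER]
    total = sum(log2_enrichment[i][peptide[i]] for i in positions)
    return total / len(positions)

def best_and_worst(log2_enrichment):
    """Scores of the highest- and lowest-scoring sequences the matrix allows."""
    columns = [col for i, col in enumerate(log2_enrichment) if i != CENTER]
    best = sum(max(col.values()) for col in columns) / len(columns)
    worst = sum(min(col.values()) for col in columns) / len(columns)
    return best, worst

def normalized_score(peptide, log2_enrichment):
    """Assumed normalization: 0 for the worst sequence, 1 for the best."""
    best, worst = best_and_worst(log2_enrichment)
    return (score_peptide(peptide, log2_enrichment) - worst) / (best - worst)

# Toy matrix with only two residues per position, made up for illustration
toy_matrix = [{"A": -1.0, "E": 1.0} for _ in range(11)]
toy_matrix[CENTER] = {"Y": 0.0}

for pep in ["EEEEEYEEEEE", "AAAAAYEEEEE", "AAAAAYAAAAA"]:
    print(pep, score_peptide(pep, toy_matrix), normalized_score(pep, toy_matrix))
```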

In vitro measurements of phosphorylation rates with purified kinases and peptides

RP-HPLC assay to measure peptide phosphorylation kinetics

To validate the enrichment scores observed in the c-Src screening data, the phosphorylation rates were measured in vitro with the purified catalytic domain of c-Src and synthetic 11-residue peptides derived from sequences in the pTyr-Var library. Kinetic measurements were carried out at 37 °C by mixing 500 nM c-Src and 100 μM peptide in a buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1 mM TCEP, and 2 mM activated sodium orthovanadate. Reactions were initiated by adding 1 mM ATP. At various time points, 100 μL aliquots were removed and quenched by the addition of EDTA to a final concentration of 25 mM. Each time point sample was analyzed by analytical RP-HPLC, monitoring absorbance at 214 nm. Forty μL of each time point was injected onto a C18 column (ZORBAX 300 SB-C18, 5 μm, 4.6x150 mm). The solvent system used was water with 0.1% trifluoroacetic acid (solvent A) and acetonitrile with 0.1% trifluoroacetic acid (solvent B). Peptides were eluted at a flow rate of 1 mL/min, using the following set of linear gradients: 0–2 min: 5% B, 2–12 min: 5–95% B, 12–13 min: 95% B, 13–14 min: 95–5% B, and 14–17 min: 5% B. The areas under the peaks corresponding to the unphosphorylated and phosphorylated peptides were calculated using the Agilent OpenLAB ChemStation software. The fractional product peak area was plotted as a function of reaction time, and the initial linear regime of this plot was fitted to a straight line to determine a reaction rate. Rates were corrected for substrate and enzyme concentration. Reactions were done in triplicate or quadruplicate.
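
The rate calculation described above can be illustrated with a short Python sketch. Several points are assumptions of this example rather than details from the text: the peak-area fraction is taken as the mole fraction of product (equal absorbance of both species at 214 nm), the concentration correction is applied as slope × [substrate] / [enzyme], and the time points and peak areas are made up.

```python
def fraction_product(area_substrate, area_product):
    """Fractional product peak area from the two integrated HPLC peaks."""
    return area_product / (area_substrate + area_product)

def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Toy data restricted to the initial linear regime (made up for illustration)
times_min = [0, 2, 4, 6]
peak_areas = [(100, 0), (98, 2), (96, 4), (94, 6)]  # (substrate, product)
fractions = [fraction_product(s, p) for s, p in peak_areas]

slope_per_min = linear_slope(times_min, fractions)   # fraction converted per min
substrate_uM = 100.0   # 100 μM peptide, as in the assay
enzyme_uM = 0.5        # 500 nM c-Src, as in the assay

# Assumed correction: μM product per min, per μM enzyme
rate_per_min = slope_per_min * substrate_uM / enzyme_uM
print(round(rate_per_min, 3))
```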

Michaelis-Menten analysis using the ADP-Quest assay

A fluorescence-based assay from Eurofins (ADP Quest) was used to measure the Michaelis-Menten kinetic parameters for phosphorylation of the consensus peptides by purified tyrosine kinase domains. In this assay, ADP production as a result of kinase activity is coupled to the production of resorufin, a fluorophore that emits signal at 590 nm. For all experiments, the assay reactions were set up as described in the provided assay kit protocol, in a 384 well plate format. The peptide solutions were serially diluted in 100 mM Tris, pH 8.0, and the kinases were diluted to 50 nM in buffer (10 mM HEPES, 100 mM NaCl, 1 mM TCEP, 5 mM MgCl2, 10% (v/v) glycerol). The final reaction mixtures contained 10 nM of kinase with 100 μM of ATP. Reactions were initiated with the addition of 1 mM ATP into a 50 μL reaction mixture for a final concentration of 100 μM ATP. Phosphorylation reaction progress was monitored by measuring fluorescence at excitation 530 nm and emission 590 nm every 2 min at 37 °C on a plate reader (BioTek Synergy Neo 2). The fluorescence units (RFU) were converted to μM ADP by comparison to a standard curve, and the initial rates were extracted from the linear regime of the reaction progress curves. Initial rates were also measured for samples containing each kinase but lacking a peptide substrate, to account for background ATP hydrolysis. This background rate was subtracted from the rates measured in the presence of peptide. The subtracted rates were plotted as a function of substrate concentration and fit to the Michaelis-Menten equation to extract kcat and KM values.
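
For reference, the background subtraction and fit described above correspond to the standard Michaelis-Menten relationship, written out here as a sketch. The symbol names are chosen for this illustration: $v_{\text{obs}}$ is the initial rate measured with peptide, $v_{\text{bkg}}$ is the rate without peptide, $[\mathrm{E}]_0$ is the total kinase concentration (10 nM in these reactions), and $[\mathrm{S}]$ is the peptide concentration.

```latex
v_0 = v_{\text{obs}} - v_{\text{bkg}},
\qquad
v_0 = \frac{k_{\text{cat}}\,[\mathrm{E}]_0\,[\mathrm{S}]}{K_M + [\mathrm{S}]}
```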

In vitro measurements of binding affinities with purified SH2 and phospho-peptides

Binding affinities of SH2 domains and phospho-peptides were measured using a fluorescence polarization-based competition binding assay, following previously reported methods (Cushing et al., 2008). The fluorescent peptide (FITC-Acp-GDG(pY)EEISPLLL) used for KD measurements was a gift from the Amacher lab. A buffer containing 60 mM HEPES, pH 7.2, 75 mM KCl, 75 mM NaCl, 1 mM EDTA, and 0.05% Tween 20 was used for the experiments. For the KD measurement, varying concentrations of the c-Src SH2 protein were incubated with 30 nM fluorescent peptide for 15 min in a black, half-area, 96-well plate. The plate was centrifuged for 5 min at 1000 × g to remove air bubbles. Following this, fluorescence polarization data were collected on a plate reader at 25 °C (BioTek Synergy Neo 2). The samples were excited at a wavelength of 485 nm and emission data were collected at 525 nm. Data were analyzed and fitted to a quadratic binding equation to determine the KD for the fluorescent peptide with the c-Src SH2 domain. A KD of 160 nM was obtained, and this value was used in subsequent calculations for the competition binding experiments.
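
A commonly used form of the quadratic binding equation for a direct fluorescence polarization titration is shown below as a sketch. The exact parameterization used for the fits is not stated in the text, so the symbols are assumptions of this illustration: $[\mathrm{L}]_t$ is the total fluorescent peptide concentration (30 nM here), $[\mathrm{P}]_t$ is the total SH2 domain concentration, and $FP_{\text{free}}$ and $FP_{\text{bound}}$ are the polarization values of free and fully bound peptide.

```latex
FP = FP_{\text{free}} + \left(FP_{\text{bound}} - FP_{\text{free}}\right)
\frac{\left([\mathrm{L}]_t + [\mathrm{P}]_t + K_D\right)
      - \sqrt{\left([\mathrm{L}]_t + [\mathrm{P}]_t + K_D\right)^2 - 4\,[\mathrm{L}]_t\,[\mathrm{P}]_t}}
     {2\,[\mathrm{L}]_t}
```

The fraction multiplying the amplitude is the fraction of fluorescent peptide bound, which accounts for depletion of free protein when its concentration is comparable to that of the peptide.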

Competition binding experiments were performed similarly. A stock solution was prepared by mixing 60 nM fluorescent peptide with SH2 domain at a concentration of 480 nM (3 × KD) and incubating the mixture at room temperature for 15 min. Unlabeled competitor peptide was serially diluted in buffer. Each serial dilution was mixed with the fluorescent peptide-SH2 stock solution at a 1:1 ratio in a black, half-area, 96-well plate. After mixing the samples by pipetting, the plate was centrifuged at 1000 × g for 5 min to remove air bubbles. The final fluorescent peptide concentration was 30 nM and the final SH2 concentration was 1.5 × KD (240 nM). Fluorescence polarization was measured as described above for the initial KD measurements. Competition binding data were fit to a cubic binding equation as described previously (Cushing et al., 2008).

In vitro measurements of phosphorylation rates with purified kinases and SHP2 substrate

Expression and Purification of SHP2 WT, D61V, and D61N

All SHP2 variants contained a catalytic cysteine mutation (C459E), C-terminal tail (526-593) deletion, and N-terminal His6-tag followed by a TEV protease cleavage site. The same protocol used to express and purify SH2 domains, excluding co-expression of BirA and addition of biotin, was applied to the expression and purification of the SHP2 variants.

LC-MS assay to measure protein phosphorylation kinetics

To pre-activate the kinases, 1 μM of each purified kinase domain was preincubated at 37 °C for 30 min in the same buffer conditions used in the kinase domain peptide display screen, with 1 mM ATP. The reaction of the kinase with SHP2 was initiated with the addition of 10 μM SHP2, and the mixture was incubated at 37 °C for 1 hr. To terminate the reaction, the mixture was quenched with 200 mM EDTA. The reaction mixture was diluted 3:2 in water and injected onto a BEH C8 column (Waters) on a UPLC-MS system (Xevo QToF, Waters). Reverse-phase liquid chromatography was carried out at 0.3 mL/min with solvents A (H2O, 0.1% (v/v) formic acid) and B (acetonitrile, 0.1% (v/v) formic acid). Proteins were eluted over a gradient of 5–95% B for 8.5 min. The protein peak on the chromatogram was deconvoluted using the MaxEnt1 algorithm from 32,000–65,000 Da with a resolution of 1 Da/channel over 30 iterations. Peaks were chosen according to the theoretical MW of the protein within a range of 5 Da, and integrated for the signal intensity.

Materials and data availability

The key reagents produced in this study (the X5-Y-X5 Library, the pTyr-Var Library, and protein expression plasmids) will be made freely available to any researcher interested in using our specificity profiling platform. Data from the specificity screens in this study, in the form of enrichment scores, are available alongside this publication as source data files. Trimmed and translated deep sequencing data (.fastq and .fasta files) are available via Dryad: https://doi.org/10.5061/dryad.0zpc86727. Code used in this study to process and analyze the data can be found in this GitHub repository: https://github.com/nshahlab/2022_Li-et-al_peptide-display (copy archived at Li et al., 2023). The plasmid libraries and unprocessed data can be requested by directly contacting the corresponding author.

Appendix 1

Appendix 1—key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Strain, strain background (E. coli)MC1061LucigenLucigen: 10361012bacterial cells used for surface-display screens
Strain, strain background (E. coli)DH5αInvitrogenInvitrogen: 18265017bacterial cells used for
general cloning and library cloning
Strain, strain background (E. coli)BL21(DE3)ThermoFisher ScientificThermo: C600003bacterial cells for general protein-expression;
pre-transformed with pCDF-YopH for
tyrosine kinase overexpression
Strain, strain background (E. coli)C43(DE3)LucigenLucigen: NC9581214bacterial cells used for SH2 domain over-expression;
pre-transformed with pCDFDuet-BirA-WT for biotinylation
Antibody4 G10 Platinum, Biotin (mouse monoclonal)Millipore SigmaMillipore Sigma: 16–452-MIbiotin conjugated mouse monoclonal
pan-phosphotyrosine antibody dilution: (1:1000)
AntibodyPY20-PerCP-eFluor 710 (mouse monoclonal)eBioscienceeBioscience: 46-5001-42PerCP-eFluor 710-conjugated mouse monoclonal
pan-phosphotyrosine antibody, clone PY20 dilution: (1:25)
AntibodyPY20-biotin (mouse monoclonal)ExalphaExalpha: 50-210-1865biotin conjugated mouse monoclonal
pan-phosphotyrosine antibody dilution (1:500)
AntibodyStrepMAB Chromeo 488 (mouse monoclonal)IBA LifeSciencesIBA: 2-1546-050Chromeo 488-conjugated antibody that
recognizes the strep-tag dilution: (1:50–100).
Discontinued, but can be replaced with IBA
LifeSciences StrepMAB-Classic conjugate
DY-488 (IBA: 2-1563-050)
Recombinant DNA reagentpBAD33-eCPXPMID:18480093Addgene: 23336pBAD33 plasmid encoding the eCPX bacterial
display gene with flanking 5' and 3' SfiI restriction sites
Recombinant DNA reagentpBAD33-eCPX-cStrepPMID:29547119
pBAD33 plasmid encoding the eCPX bacterial
display gene with a 3' sequence encoding a
strep-tag and flanking 5' and 3' SfiI restriction sites
Recombinant DNA reagentpBAD33-eCPX-cMycthis paper
pBAD33 plasmid encoding the eCPX bacterial
display gene with a 3' sequence encoding a
myc-tag and flanking 5' and 3' SfiI restriction sites
Recombinant DNA reagentX5-Y-X5 Library (myc-tagged)this paper
peptide display library in the pBAD33 vector, fused
to the eCPX scaffold, containing 1–10 million
unique sequences with the structure X5-Y-X5, where X is
encoded by an NNS codon. The scaffold protein is
encoded to have a C-terminal myc-tag: EQKLISEEDL.
Recombinant DNA reagentX5-Y-X5 Library (strep-tagged)this paper
peptide display library in the pBAD33 vector, fused to
the eCPX scaffold, containing 1–10 million unique
sequences with the structure X5-Y-X5,
where X is encoded by an NNS codon.
The scaffold protein is encoded to have a
C-terminal strep-tag: WSHPQFEK.
Recombinant DNA reagentpTyr-Var Library (myc-tagged)this paper
peptide display library in the pBAD33 vector,
fused to the eCPX scaffold, containing ~10,000
unique sequences encoding reference and
variant phosphosite pairs deried from the
PhosphoSitePlus database. The scaffold
protein is encoded to have a C-terminal
myc-tag: EQKLISEEDL.
Recombinant DNA reagentpTyr-Var Library (strep-tagged)this paper
peptide display library in the pBAD33 vector,
fused to the eCPX scaffold, containing ~10,000
unique sequences encoding reference and variant
phosphosite pairs deried from the
PhosphoSitePlus database. The scaffold protein is
encoded to have a C-terminal strep-tag: WSHPQFEK.
Recombinant DNA reagentpET-23a-His6-TEV-Src(KD)PMID:29547119
bacterial expression vector encoding the human
c-Src kinase domain (residues 260–528), with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-23a-His6-TEV-Fyn(KD)PMID:29547119
bacterial expression vector encoding the human
Fyn kinase domain (residues 261–529) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-23a-His6-TEV-Hck(KD)PMID:29547119
bacterial expression vector encoding the human
Hck kinase domain (residues 252–520) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-23a-His6-TEV-Abl(KD)PMID:29547119
bacterial expression vector encoding the mouse
c-Abl kinase domain (residues 232–502) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-23a-His6-TEV-AncSZ(KD)DOI:
10.1101/2022.04.24.489292

bacterial expression vector encoding the AncSZ
kinase domain (residues 352–627) with an N-terminal
His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET23a-His6-TEV-Fer(KD)this paper
bacterial expression vector encoding the mouse
Fer kinase domain (residues 553–823) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-His6-TEV-FGFR1(KD)PMID:30004690Addgene: 79719bacterial expression vector encoding the human
FGFR1 kinase domain (residues 456–763) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-His6-TEV-FGFR3(KD)PMID:30004690Addgene: 79731bacterial expression vector encoding the human
FGFR3 kinase domain (residues 449–759) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-His6-TEV-EPHB1(KD)PMID:30004690Addgene: 79694bacterial expression vector encoding the human
EPHB1 kinase domain (residues 602–896) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-His6-TEV-EPHB2(KD)PMID:30004690Addgene: 79697bacterial expression vector encoding the human
EPHB2 kinase domain (residues 604–898) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpET-His6-TEV-MERTK(KD)PMID:30004690Addgene: 79705bacterial expression vector encoding the human
MERTK kinase domain (residues 570–864) with an
N-terminal His6-tag and TEV protease recognition sequence
Recombinant DNA reagentpCDF-YopHPMID:16260764
bacterial expression vector for co-expression of
untagged YopH phosphatase with tyrosine kinases
Recombinant DNA reagentpET28-His6-TEV-SHP2-C459E-no tailthis paper
bacterial expression vector encoding the human SHP2
(residues 1–526) with the C459E mutation, an N-terminal
His6-tag, and TEV protease recognition sequence
Recombinant DNA reagentpET28-His6-TEV-SHP2-C459E-no tail-D61Vthis paper
bacterial expression vector encoding the human SHP2
(residues 1–526) with C459E and D61V mutations, an
N-terminal His6-tag, and TEV protease recognition sequence
Recombinant DNA reagentpET28-His6-TEV-SHP2-C459E-no tail-D61Nthis paper
bacterial expression vector encoding the human
SHP2 (residues 1–526) with C459E and D61N mutations,
an N-terminal His6-tag, and TEV protease recognition sequence
Recombinant DNA reagentpCDFDuet-BirA-WTthis paper
bacterial expression vector encoding BirA biotin ligase,
used to coexpress with SH2 domain expression
vector for biotinylation of SH2 domain
Recombinant DNA reagentpET-His6-SUMO-Src(SH2)this paper
bacterial expression vector encoding the human
cSrc SH2 domain (residues 143–250) with an
N-terminal His6-SUMO tag
Recombinant DNA reagentpET-His6-SUMO-SHP2(CSH2)this paper
bacterial expression vector encoding the human SHP2
CSH2 domain (residues 105–220) with an N-terminal His6-SUMO tag
Recombinant DNA reagentpET-His6-SUMO-Grb2(SH2)this paper
bacterial expression vector encoding the human Grb2
SH2 domain (residues 56–152) with an N-terminal His6-SUMO tag
Recombinant DNA reagentpULTRA CMFPMID:28604693
bacterial expression vector encoding the tRNA/syntetase
pair for incorporation of 4-carboxymethyl phenylalanine
via Amber suppression
Recombinant DNA reagentpEVOL pAzFRS.2.t1PMID:26571098Addgene: 73546bacterial expression vector encoding the tRNA/syntetase
pair for incorporation of 4-azido phenylalanine and other
Phe derivatives via Amber suppression
Recombinant DNA reagentpULTRA chAcKRS3PMID:29544052
bacterial expression vector encoding the tRNA/syntetase
pair for incorporation of acetyl-lysine via Amber suppression;
gift from Abhishek Chatterjee at Boston College
Recombinant DNA reagentpULTRA-Amp CMFthis paper
bacterial expression vector encoding the tRNA/syntetase
pair for incorporation of 4-carboxymethyl phenylalanine
via Amber suppression, altered to have an ampicillin resistance marker
Recombinant DNA reagentpULTRA-Amp pAzFRS.2.t1this paper
bacterial expression vector encoding the tRNA/syntetase
pair for incorporation of 4-azido phenylalanine and other
Phe derivatives via Amber suppression, altered to have
an ampicillin resistance marker
Recombinant DNA reagentpULTRA-Amp chAcKRS3this paper
bacterial expression vector encoding the tRNA/syntetase
pair for incorporation of acetyl-lysine via Amber suppression,
altered to have an ampicillin resistance marker
Sequence-based reagentX5-Y-X5 library oligo; eCPX-rand-libthis paper, purchased from Millipore Sigma
primer sequence: 5’-GCTGGCCAGTCTGGCCAGNNS
NNSNNSNNSNNStatNNSNNSNNSNNSNNSGGAGG
GCAGTCTGGGCAGTCTG 3’
Sequence-based reagentOligopool-fwd-primerthis paper, purchased from Millipore Sigma
primer sequence: 5’-GCTGGCCAGTCTG-3’
Sequence-based reagentOligopool-rev-primerthis paper, purchased from Millipore Sigma
primer sequence: 5’-CAGACTGCCCAGACT-3’
Sequence-based reagentlink-eCPX-fwdthis paper, purchased from Millipore Sigma
5’-GGAGGGCAGTCTGGGCAGTCTG-3’
Sequence-based reagentlink-eCPX-revthis paper, purchased from Millipore Sigma
5’-GCTTGGCCACCTTGGCCTTATTA-3’
Sequence-based reagentBB-fwd-primerthis paper, purchased from Millipore Sigma
5’-TAATAAGGCCAAGGTGGCCAAGC-3’
Sequence-based reagentBB-rev primerthis paper, purchased from Millipore Sigma
5’-CTGGCCAGACTGGCCAGCTACG-3’
Sequence-based reagentTruSeq-eCPX-Fwdsequence from PMID:29547119, purchased from Millipore Sigmaround one amplicon PCR primerprimer sequence: 5’-TGACTGGAGTTCAGACGTG
TGCTCTTCCGATCTNNNNNNACCGCA
GGTACTTCCGTAGCT-3’
Sequence-based reagentTruSeq-eCPX-Revsequence from PMID:29547119, purchased from Millipore Sigmaround one amplicon PCR primerprimer sequence: 5’-CACTCTTTCCCTACACGACG
CTCTTCCGATCTNNNNNN
TTTTGTTGTAGTCACCAGACTG-3’
Sequence-based reagentD701sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGACGG
CATACGAGATcgagtaatGTG
ACTGGAGTTCAGACGTG-3'
Sequence-based reagentD702sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGA
CGGCATACGAGATtctccgga
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD703sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGA
CGGCATACGAGATaatgagcg
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD704sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGAC
GGCATACGAGATggaatctcG
TGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD705sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGA
AGACGGCATACGAGA
TttctgaatGTGACTGGAGT
TCAGACGTG-3'
Sequence-based reagentD706sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGA
CGGCATACGAGATacgaattc
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD707sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAG
ACGGCATACGAGATagcttcag
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD708sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGACG
GCATACGAGATgcgcattaGT
GACTGGAGTTCAGACGTG-3'
Sequence-based reagentD709sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAG
ACGGCATACGAGATcatagccg
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD710sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGA
AGACGGCATACGAGATttcgcgga
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD711sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAAGACG
GCATACGAGATgcgcgaga
GTGACTGGAGTTCAGACGTG-3'
Sequence-based reagentD712sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-CAAGCAGAA
GACGGCATACGAGATctatcgctGT
GACTGGAGTTCAGACGTG-3'
Sequence-based reagentD501sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGGCGA
CCACCGAGATCTACACtatagcct
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD502sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGGCG
ACCACCGAGATCTACACatagaggc
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD503sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGGCGA
CCACCGAGATCTACACcctatcct
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD504sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGGCGA
CCACCGAGATCTACACggctctga
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD505sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGGC
GACCACCGAGATCTACACaggcgaag
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD506sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGG
CGACCACCGAGATCTACACtaatctta
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD507sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACGGC
GACCACCGAGATCTACACcaggacgt
ACACTCTTTCCCTACACGAC-3'
Sequence-based reagentD508sequence from Illumina, purchased from Millipore Sigmaround two amplicon/indexing PCR primerprimer sequence: 5'-AATGATACG
GCGACCACCGAGATCTACAC
gtactgacACACTCTTTCCCTACACGAC-3'
Peptide, recombinant proteinSrc(KD)this paper, expressed/purified in-house
human c-Src kinase domain (residues 260–528)
Peptide, recombinant proteinFyn(KD)this paper, expressed/purified in-house
human Fyn kinase domain (residues 261–529)
Peptide, recombinant proteinHck(KD)this paper, expressed/purified in-house
human Hck kinase domain (residues 252–520)
Peptide, recombinant proteinAbl(KD)this paper, expressed/purified in-house
mouse c-Abl kinase domain (residues 232–502)
Peptide, recombinant proteinJAK2 Protein, activeMillipore SigmaMillipore Sigma: 14–640 MActive, C-terminal His6-tagged,
recombinant, human JAK2, amino
acids 808-end, expressed by baculo
virus in Sf21 cells, for use in Enzyme Assays.
Peptide, recombinant proteinAncSZ(KD)this paper, expressed/purified in-house
AncSZ kinase domain (residues 352–627)
designed by ancestral sequence reconstruction
Peptide, recombinant proteinFer(KD)this paper, expressed/purified in-house
mouse Fer kinase domain (residues 553–823)
Peptide, recombinant proteinFGFR1(KD)this paper, expressed/purified in-house
human FGFR1 kinase domain (residues 456–763)
Peptide, recombinant proteinFGFR3(KD)this paper, expressed/purified in-house
human FGFR3 kinase domain (residues 449–759)
Peptide, recombinant proteinEPHB1(KD)this paper, expressed/purified in-house
human EPHB1 kinase domain (residues 602–896)
Peptide, recombinant proteinEPHB2(KD)this paper, expressed/purified in-house
human EPHB2 kinase domain (residues 604–898)
Peptide, recombinant proteinMERTK(KD)this paper, expressed/purified in-house
human MERTK kinase domain (residues 570–864)
Peptide, recombinant proteinSrc(SH2)this paper, expressed/purified in-house
human c-Src SH2 domain (residues 143–250)
Peptide, recombinant proteinSHP2(C-SH2)this paper, expressed/purified in-house
human SHP2 C-SH2 domain (residues 105–220)
Peptide, recombinant proteinGrb2(SH2)this paper, expressed/purified in-house
human Grb2 SH2 domain (residues 56–152)
Peptide, recombinant proteinSHP2(PTP; C459E)this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E)
Peptide, recombinant proteinSHP2(PTP; C459E, D61V)this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E, D61V)
Peptide, recombinant proteinSHP2(PTP; C459E, D61N)this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E, D61N)
Peptide, recombinant proteinSHP2(PTP; C459E, G60V)this paper, expressed/purified in-house
human full-length SHP2 (residues 1–526; C459E, G60V)
Peptide, recombinant protein | Src Consensus | this paper, synthesized in-house | peptide sequence: Ac-GPDECIYDMFPFKKKG-NH2
Peptide, recombinant protein | Src Consensus (P-5C, D+1 G) | this paper, synthesized in-house | peptide sequence: Ac-GCDECIYGMFPFRRRG-NH2
Peptide, recombinant protein | Abl Consensus | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIYAVPPIKKKG-NH2
Peptide, recombinant protein | Fer Consensus | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIYEWWWIKKKG-NH2
Peptide, recombinant protein | EPHB1 Consensus | this paper, synthesized in-house | peptide sequence: Ac-GPPEPNYEVIPPKKKG-NH2
Peptide, recombinant protein | EPHB2 Consensus | this paper, synthesized in-house | peptide sequence: Ac-GPPEPIYEVPPPKKKG-NH2
Peptide, recombinant protein | SrcTide (1995) | sequence from PMID:7845468, synthesized in-house | peptide sequence: Ac-GAEEEIYGEFEAKKKG-NH2
Peptide, recombinant protein | SrcTide (2014) | sequence from PMID:25164267, purchased from Synpeptide | peptide sequence: Ac-GAEEEIYGIFGAKKKG-NH2
Peptide, recombinant protein | AblTide (2014) | sequence from PMID:7845468, synthesized in-house | peptide sequence: Ac-GAPEVIYATPGAKKKG-NH2
Peptide, recombinant protein | HRAS_Y64 | sequence from PMID:35606422, purchased from Synpeptide | peptide sequence: Ac-AGQEEYSAMRD-NH2
Peptide, recombinant protein | HRAS_Y64_E63K | sequence from PMID:35606422, purchased from Synpeptide | peptide sequence: Ac-AGQEKYSAMRD-NH2
Peptide, recombinant protein | CDK13_Y716_YF | this paper, synthesized in-house | peptide sequence: Ac-IGEGTYGQVFK-NH2
Peptide, recombinant protein | CDK13_Y716_G717R_YF | this paper, synthesized in-house | peptide sequence: Ac-IGEGTYRQVFK-NH2
Peptide, recombinant protein | CDK5_Y15 | sequence from PMID:35606422, purchased from Synpeptide | peptide sequence: Ac-IGEGTYGTVFK-NH2
Peptide, recombinant protein | CDK5_Y15_G16R | sequence from PMID:35606422, purchased from Synpeptide | peptide sequence: Ac-IGEGTYRTVFK-NH2
Peptide, recombinant protein | PLCG1_Y210 | this paper, synthesized in-house | peptide sequence: Ac-SGDITYGQFAQ-NH2
Peptide, recombinant protein | PLCG1_Y210_T209N | this paper, synthesized in-house | peptide sequence: Ac-SGDINYGQFAQ-NH2
Peptide, recombinant protein | GLB1_Y294 | this paper, synthesized in-house | peptide sequence: Ac-VASSLYDILAR-NH2
Peptide, recombinant protein | GLB1_Y294_L297F | this paper, synthesized in-house | peptide sequence: Ac-VASSLYDIFAR-NH2
Peptide, recombinant protein | MISP_Y95 | this paper, synthesized in-house | peptide sequence: Ac-EGWQVYRLGAR-NH2
Peptide, recombinant protein | HLA-DPB1_Y59_F64L_YF | this paper, synthesized in-house | peptide sequence: Ac-LERFIYNREEL-NH2
Peptide, recombinant protein | PEAK1_Y797 | this paper, synthesized in-house | peptide sequence: Ac-SVEELYAIPPD-NH2
Peptide, recombinant protein | SIRPA_Y496_P491L | this paper, synthesized in-house | peptide sequence: Ac-LFSEYASVQV-NH2
Peptide, recombinant protein | HGD_Y166_F169L | this paper, synthesized in-house | peptide sequence: Ac-GNLLIYTELGK-NH2
Peptide, recombinant protein | ITGA3_Y237_YF | this paper, synthesized in-house | peptide sequence: Ac-WDLSEYSFKDP-NH2
Peptide, recombinant protein | ITGA3_Y237_S235P_YF | this paper, synthesized in-house | peptide sequence: Ac-WDLPEYSFKDP-NH2
Peptide, recombinant protein | Src Consensus (C-2S) | this paper, synthesized in-house | peptide sequence: Ac-GPDESIYDMFPFKKKG-NH2
Peptide, recombinant protein | Src Consensus (C-2P) | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIYDMFPFKKKG-NH2
Peptide, recombinant protein | ACTA1_Y171_YF | this paper, synthesized in-house | peptide sequence: Ac-QPIFEG(pY)ALPHAG-NH2
Peptide, recombinant protein | ACTA1_Y171_A172G_YF | this paper, synthesized in-house | peptide sequence: Ac-QPIFEG(pY)GLPHAG-NH2
Peptide, recombinant protein | ACTB_Y240 | this paper, synthesized in-house | peptide sequence: Ac-QSLEKS(pY)ELPDGG-NH2
Peptide, recombinant protein | ACTB_Y240_P243L | this paper, synthesized in-house | peptide sequence: Ac-QSLEKS(pY)ELLDGG-NH2
Peptide, recombinant protein | CCDC39_Y593 | this paper, synthesized in-house | peptide sequence: Ac-QRKQQL(pY)TAMEEG-NH2
Peptide, recombinant protein | CLIP2_Y972 | this paper, synthesized in-house | peptide sequence: Ac-QSDQRR(pY)SLIDRG-NH2
Peptide, recombinant protein | CLIP2_Y972_R977P | this paper, synthesized in-house | peptide sequence: Ac-QSDQRR(pY)SLIDPG-NH2
Peptide, recombinant protein | CBS_Y308 | this paper, synthesized in-house | peptide sequence: Ac-QVEGIG(pY)DFIPTG-NH2
Peptide, recombinant protein | CBS_Y308_G307S | this paper, synthesized in-house | peptide sequence: Ac-QVEGIS(pY)DFIPTG-NH2
Peptide, recombinant protein | fluorescently-labeled c-Src-SH2 consensus peptide | sequence from PMID:7680959 | peptide sequence: FITC-Ahx-GDG(pY)EEISPLLL-NH2; gift from Jeanine Amacher at Western Washington University
Peptide, recombinant protein | Src Consensus (D+1 K) | this paper, synthesized in-house | peptide sequence: Ac-GPDECIYKMFPFKKKG-NH2
Peptide, recombinant protein | Src Consensus (D1AcK) | this paper, synthesized in-house | peptide sequence: Ac-GPDECIY(AcK)MFPFKKKG-NH2
Peptide, recombinant protein | Src Consensus (C-2K) | this paper, synthesized in-house | peptide sequence: Ac-GPDEKIYDMFPFKKKG-NH2
Peptide, recombinant protein | Src Consensus (C-2AcK) | this paper, synthesized in-house | peptide sequence: Ac-GPDE(AcK)IYDMFPFKKKG-NH2
Peptide, recombinant protein | Abl Consensus (A+1 K) | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIYKVPPIKKKG-NH2
Peptide, recombinant protein | Abl Consensus (A+1 AcK) | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIY(AcK)VPPIKKKG-NH2
Peptide, recombinant protein | Abl Consensus (I+5 K) | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIYAVPPKKKKG-NH2
Peptide, recombinant protein | Abl Consensus (I+5 AcK) | this paper, synthesized in-house | peptide sequence: Ac-GPDEPIYAVPP(AcK)KKKG-NH2
Commercial assay or kit | MiSeq Reagent Kit v3 (150 cycles) | Illumina | Illumina: MS-102–3001
Commercial assay or kit | NextSeq 500 Mid-Output v2 Kit (150 cycles) | Illumina | Illumina: FC-404–2001
Commercial assay or kit | Promega QuantiFluor dsDNA Sample Kit | Promega | Promega: E2671
Commercial assay or kit | ADP Quest Assay Kit | Eurofins Discoverx | Eurofins Discoverx: 90–0071
Commercial assay or kit | Dynabeads FlowComp Flexi Kit | ThermoFisher Scientific | ThermoFisher Scientific: 11061D
Chemical compound, drug | 4-carboxymethyl phenylalanine (CMF) | Millipore Sigma | Millipore Sigma: ENA423210770
Chemical compound, drug | 4-azido-L-phenylalanine (AzF) | Chem-Impex International | Chem-Impex: 06162
Chemical compound, drug | N-ε-Acetyl-L-Lysine (AcK) | MP Biomedicals | MP Biomedicals: 02150235.2
Chemical compound, drug | Click-iT sDIBO-Alexa fluor 555 | ThermoFisher | Thermo: C20021
Other | Creatine Phosphokinase from rabbit muscle | Millipore Sigma | Millipore Sigma: C3755-500UN | purified enzyme extracted from rabbit muscle
Software, algorithm | FLASH (version FLASH2-2.2.00) | PMID:21903629 | https://ccb.jhu.edu/software/FLASH/
Software, algorithm | Cutadapt (version 3.5) | DOI:10.14806/ej.17.1.200 | https://cutadapt.readthedocs.io/en/stable/
Software, algorithm | Python scripts for processing and analysis of deep sequencing data | this paper (Li et al., 2023) | https://github.com/nshahlab/2022_Li-et-al_peptide-display
Software, algorithm | Logomaker | PMID:31821414 | https://logomaker.readthedocs.io/en/latest/index.html

Data availability

All of the processed data from the high-throughput specificity screens are provided as source data files. The raw fastq and fasta sequencing files are available as a Dryad repository (DOI:https://doi.org/10.5061/dryad.0zpc86727). Custom code used to process/analyze screening data can be found in a GitHub repository (copy archived at Li et al., 2023) as specified in the manuscript.

The following data sets were generated
    Li A, Voleti R, Lee M, Gagoski D, Shah NH (2023) Dryad Digital Repository
    Data from: High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display.
    https://doi.org/10.5061/dryad.0zpc86727

Article and author information

Author details

  1. Allyson Li

    Department of Chemistry, Columbia University, New York, United States
    Contribution
    Conceptualization, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2359-7703
  2. Rashmi Voleti

    Department of Chemistry, Columbia University, New York, United States
    Contribution
    Conceptualization, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3705-7460
  3. Minhee Lee

    Department of Chemistry, Columbia University, New York, United States
    Contribution
    Conceptualization, Validation, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7141-7351
  4. Dejan Gagoski

    1. Department of Chemistry, Columbia University, New York, United States
    2. Department of Biological Sciences, Columbia University, New York, United States
    Contribution
    Validation, Investigation, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6194-8514
  5. Neel H Shah

    Department of Chemistry, Columbia University, New York, United States
    Contribution
    Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    neel.shah@columbia.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1186-0626

Funding

National Institute of General Medical Sciences (R35GM138014)

  • Neel H Shah

Damon Runyon Cancer Research Foundation (DFS 31-18)

  • Neel H Shah

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Fereshteh Zandkarimi and Brandon Fowler from the Columbia Chemistry mass spectrometry facility for their assistance with mass spectrometry; Jia Ma from the Columbia Precision Biomolecular Characterization Facility for his guidance with biophysical measurements; and the Columbia Genome Center for their support with deep sequencing. We thank Neil Vasan for his guidance with SHP2 phosphorylation assays. We thank Harmen Bussemaker, Tomas Rube, and Chaitanya Rastogi for their insightful discussions, and members of the Shah lab for their technical and conceptual guidance throughout this project. The fluorescently-labeled c-Src SH2 ligand was a gift from Jeanine Amacher. The pULTRA chAcKRS3 plasmid was a gift from Abhishek Chatterjee. Bacterial expression vectors for Fer, FGFR1, FGFR3, EPHB1, EPHB2, and MERTK were gifts from John Chodera, Nicholas Levinson, and Markus Seeliger (Addgene plasmid #s 79686, 79719, 79731, 79694, 79697, and 79705). The pEVOL pAzFRS.2.t1 plasmid was a gift from Farren Isaacs (Addgene plasmid #73546). This work was supported by NIH grant R35 GM138014 and a Damon Runyon-Dale F Frey Award for Breakthrough Scientists (DFS 31–18), awarded to NHS.

Version history

  1. Preprint posted: August 1, 2022
  2. Received: August 1, 2022
  3. Accepted: March 15, 2023
  4. Accepted Manuscript published: March 16, 2023 (version 1)
  5. Version of Record published: March 31, 2023 (version 2)

Copyright

© 2023, Li et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Allyson Li, Rashmi Voleti, Minhee Lee, Dejan Gagoski, Neel H Shah (2023)
High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display
eLife 12:e82345.
https://doi.org/10.7554/eLife.82345
