High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display

  1. Allyson Li
  2. Rashmi Voleti
  3. Minhee Lee
  4. Dejan Gagoski
  5. Neel H Shah (corresponding author)
  1. Columbia University, United States

Abstract

Tyrosine kinases and SH2 (phosphotyrosine recognition) domains have binding specificities that depend on the amino acid sequence surrounding the target (phospho)tyrosine residue. Although the preferred recognition motifs of many kinases and SH2 domains are known, we lack a quantitative description of sequence specificity that could guide predictions about signaling pathways or be used to design sequences for biomedical applications. Here, we present a platform that combines genetically encoded peptide libraries and deep sequencing to profile sequence recognition by tyrosine kinases and SH2 domains. We screened several tyrosine kinases against a million-peptide random library and used the resulting profiles to design high-activity sequences. We also screened several kinases against a library containing thousands of human proteome-derived peptides and their naturally occurring variants. These screens recapitulated independently measured phosphorylation rates and revealed hundreds of phosphosite-proximal mutations that impact phosphosite recognition by tyrosine kinases. We extended this platform to the analysis of SH2 domains and showed that screens could predict relative binding affinities. Finally, we expanded our method to assess the impact of non-canonical and post-translationally modified amino acids on sequence recognition. This specificity profiling platform will shed new light on phosphotyrosine signaling and could readily be adapted to other protein modification/recognition domains.

Data availability

All processed data from the high-throughput specificity screens are provided as source data files. The raw fastq and fasta sequencing files are available in a Dryad repository (DOI: 10.5061/dryad.0zpc86727). Custom code used to process and analyze the screening data can be found in a GitHub repository, as specified in the manuscript.

Article and author information

Author details

  1. Allyson Li

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-2359-7703
  2. Rashmi Voleti

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Minhee Lee

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  4. Dejan Gagoski

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  5. Neel H Shah

    Department of Chemistry, Columbia University, New York, United States
    For correspondence
    neel.shah@columbia.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-1186-0626

Funding

National Institute of General Medical Sciences (R35GM138014)

  • Neel H Shah

Damon Runyon Cancer Research Foundation (DFS 31-18)

  • Neel H Shah

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2023, Li et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Allyson Li, Rashmi Voleti, Minhee Lee, Dejan Gagoski, Neel H Shah (2023) High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display. eLife 12:e82345. https://doi.org/10.7554/eLife.82345
