High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display

  1. Allyson Li
  2. Rashmi Voleti
  3. Minhee Lee
  4. Dejan Gagoski
  5. Neel H Shah (corresponding author)
  1. Columbia University, United States

Abstract

Tyrosine kinases and SH2 (phosphotyrosine recognition) domains have binding specificities that depend on the amino acid sequence surrounding the target (phospho)tyrosine residue. Although the preferred recognition motifs of many kinases and SH2 domains are known, we lack a quantitative description of sequence specificity that could guide predictions about signaling pathways or be used to design sequences for biomedical applications. Here, we present a platform that combines genetically encoded peptide libraries and deep sequencing to profile sequence recognition by tyrosine kinases and SH2 domains. We screened several tyrosine kinases against a million-peptide random library and used the resulting profiles to design high-activity sequences. We also screened several kinases against a library containing thousands of human proteome-derived peptides and their naturally occurring variants. These screens recapitulated independently measured phosphorylation rates and revealed hundreds of phosphosite-proximal mutations that impact phosphosite recognition by tyrosine kinases. We extended this platform to the analysis of SH2 domains and showed that screens could predict relative binding affinities. Finally, we expanded our method to assess the impact of non-canonical and post-translationally modified amino acids on sequence recognition. This specificity profiling platform will shed new light on phosphotyrosine signaling and could readily be adapted to other protein modification/recognition domains.

Data availability

All of the processed data from the high-throughput specificity screens are provided as source data files. The raw fastq and fasta sequencing files are available in a Dryad repository (DOI: 10.5061/dryad.0zpc86727). Custom code used to process and analyze the screening data can be found in a GitHub repository, as specified in the manuscript.

Article and author information

Author details

  1. Allyson Li

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-2359-7703
  2. Rashmi Voleti

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Minhee Lee

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  4. Dejan Gagoski

    Department of Chemistry, Columbia University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  5. Neel H Shah

    Department of Chemistry, Columbia University, New York, United States
    For correspondence
    neel.shah@columbia.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-1186-0626

Funding

National Institute of General Medical Sciences (R35GM138014)

  • Neel H Shah

Damon Runyon Cancer Research Foundation (DFS 31-18)

  • Neel H Shah

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2023, Li et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Allyson Li, Rashmi Voleti, Minhee Lee, Dejan Gagoski, Neel H Shah (2023)
High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display
eLife 12:e82345.
https://doi.org/10.7554/eLife.82345
