High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display

Abstract

Tyrosine kinases and SH2 (phosphotyrosine recognition) domains have binding specificities that depend on the amino acid sequence surrounding the target (phospho)tyrosine residue. Although the preferred recognition motifs of many kinases and SH2 domains are known, we lack a quantitative description of sequence specificity that could guide predictions about signaling pathways or be used to design sequences for biomedical applications. Here, we present a platform that combines genetically encoded peptide libraries and deep sequencing to profile sequence recognition by tyrosine kinases and SH2 domains. We screened several tyrosine kinases against a million-peptide random library and used the resulting profiles to design high-activity sequences. We also screened several kinases against a library containing thousands of human proteome-derived peptides and their naturally occurring variants. These screens recapitulated independently measured phosphorylation rates and revealed hundreds of phosphosite-proximal mutations that impact phosphosite recognition by tyrosine kinases. We extended this platform to the analysis of SH2 domains and showed that screens could predict relative binding affinities. Finally, we expanded our method to assess the impact of non-canonical and post-translationally modified amino acids on sequence recognition. This specificity profiling platform will shed new light on phosphotyrosine signaling and could readily be adapted to other protein modification/recognition domains.
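As a rough illustration of how deep-sequencing read counts from a peptide display screen might be turned into a per-peptide specificity score, the sketch below computes a log2 enrichment of each peptide's frequency in a selected sample relative to the input library. The abstract does not state the scoring formula, so the function name `enrichment_scores`, the pseudocount, and the toy peptide sequences are all assumptions made for illustration and are not the authors' pipeline.

```python
import math
from collections import Counter


def enrichment_scores(input_reads, selected_reads, pseudocount=1.0):
    """Return log2(selected frequency / input frequency) for each peptide.

    Hypothetical helper: the scoring scheme is an assumption, not the
    method described in the article.
    """
    input_counts = Counter(input_reads)
    selected_counts = Counter(selected_reads)
    peptides = set(input_counts) | set(selected_counts)

    # A pseudocount keeps the ratio finite for peptides absent from one sample.
    input_total = sum(input_counts.values()) + pseudocount * len(peptides)
    selected_total = sum(selected_counts.values()) + pseudocount * len(peptides)

    scores = {}
    for peptide in peptides:
        f_in = (input_counts[peptide] + pseudocount) / input_total
        f_sel = (selected_counts[peptide] + pseudocount) / selected_total
        scores[peptide] = math.log2(f_sel / f_in)
    return scores


# Toy reads (made up): peptide sequences observed before and after selection.
input_reads = ["AEEEIYGEFEA"] * 10 + ["AKKKIYGKFKA"] * 10
selected_reads = ["AEEEIYGEFEA"] * 18 + ["AKKKIYGKFKA"] * 2
scores = enrichment_scores(input_reads, selected_reads)
```

In this toy example the first peptide receives a positive score (enriched after selection) and the second a negative one (depleted). A real analysis would start from translated fastq reads and would likely normalize across replicates.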

Data availability

All of the processed data from the high-throughput specificity screens are provided as source data files. The raw fastq and fasta sequencing files are available as a Dryad repository (DOI: 10.5061/dryad.0zpc86727). Custom code used to process/analyze screening data can be found in a GitHub repository, as specified in the manuscript.

The following data sets were generated

Li A, et al. (2023) Data from: High-throughput profiling of sequence recognition by tyrosine kinases and SH2 domains using bacterial peptide display
Dryad Digital Repository, doi:10.5061/dryad.0zpc86727.

http://dx.doi.org/10.5061/dryad.0zpc86727

The following previously published data sets were used

(2014) PTMVar dataset
PhosphoSitePlus.

https://www.phosphosite.org/staticDownloads.action

Article and author information

Author details

Allyson Li

Department of Chemistry, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0003-2359-7703
Rashmi Voleti

Department of Chemistry, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.
Minhee Lee

Department of Chemistry, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.
Dejan Gagoski

Department of Chemistry, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.
Neel H Shah

Department of Chemistry, Columbia University, New York, United States

For correspondence
neel.shah@columbia.edu

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-1186-0626

Funding

National Institute of General Medical Sciences (R35GM138014)

Neel H Shah

Damon Runyon Cancer Research Foundation (DFS 31-18)

Neel H Shah

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.