A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding

  1. Brooke D Huisman
  2. Zheng Dai
  3. David K Gifford
  4. Michael E Birnbaum (corresponding author)
  1. Koch Institute for Integrative Cancer Research, United States
  2. Department of Biological Engineering, Massachusetts Institute of Technology, United States
  3. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, United States
  4. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, United States
  5. Ragon Institute of MGH, MIT and Harvard, United States
5 figures, 4 tables and 2 additional files

Figures

Figure 1 with 1 supplement
Overview of library and selections.

(a) The defined library contains pathogen proteome peptides (length 15, sliding window 1). Poor binding peptides are displaced with addition of protease, competitor peptide, and HLA-DM. (b) Schematic of doped and undoped libraries: in the doped selection strategy, the library is added to a library of null, non-expressing constructs. (c) Representative flow plots showing enrichment of MHC-expressing yeast over rounds of selection for the library containing SARS-CoV-2 and SARS-CoV peptides on HLA-DR401.
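
The tiling described in panel (a), 15-residue peptides with a sliding window of 1, can be sketched as below. This is a minimal illustration rather than the authors' library design code; the function name `tile_peptides` and the toy sequence are assumptions made for the example.

```python
def tile_peptides(protein: str, length: int = 15, step: int = 1) -> list[str]:
    """Return every window of `length` residues, advancing `step` residues at a time."""
    if len(protein) < length:
        return []  # too short to yield a full-length peptide
    return [protein[i:i + length] for i in range(0, len(protein) - length + 1, step)]


toy_protein = "MFVFLVLLPLVSSQCVNLTT"  # hypothetical 20-residue input
peptides = tile_peptides(toy_protein)
print(len(peptides))   # a protein of N residues yields N - 14 peptides
print(peptides[0], peptides[-1])
```

With a step of 1, adjacent peptides overlap by 14 residues, which is the redundancy the later 9mer-based filtering relies on.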

Figure 1—figure supplement 1
Correlations between selection rounds.

Pearson correlation for HLA-DR401 SARS-CoV-2 and SARS-CoV defined library members (+/- signs indicate enriched (+) or not enriched (-) yeast in undoped library; rounds of selection are indicated, e.g. ‘R1’ indicates ‘Round 1’ and ‘R0’ is the unselected ‘Round 0’ library).

Figure 2 with 3 supplements
Output of selections and analysis of selection data.

(a) Overview of peptide filtering and correspondence between selection strategies for the SARS-CoV and SARS-CoV-2 library on HLA-DR401. Peptides are filtered for enrichment in both doped and undoped libraries. Also shown is the overlap between these peptides and peptides containing a 9mer that is enriched in five or more of the seven 15mers containing it. (b) Relationship between enrichment in doped and undoped libraries. Absolute counts following Round 3 of selection of the doped library are plotted against the log2 fold change in read fraction for peptides between Round 2 and Round 1. Data are shown for the library on HLA-DR401. (c) Sequence logo of 2467 peptides that enriched in both doped and undoped selected libraries for HLA-DR401. Registers are inferred with a position weight matrix-based alignment method. Logos were generated with Seq2Logo-2.0.
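
The log2 fold change in read fraction used in panel (b) can be computed as in the sketch below. The function names, the toy counts, and the choice to skip peptides with zero reads in either round are assumptions for illustration; the caption does not specify how zero counts were handled.

```python
import math


def read_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw read counts into each peptide's fraction of total reads."""
    total = sum(counts.values())
    return {peptide: n / total for peptide, n in counts.items()}


def log2_fold_change(later: dict[str, int], earlier: dict[str, int]) -> dict[str, float]:
    """log2(fraction in later round / fraction in earlier round).

    Assumption: peptides with zero reads in either round are skipped,
    because the ratio is undefined for them.
    """
    f_later, f_earlier = read_fractions(later), read_fractions(earlier)
    return {
        peptide: math.log2(f_later[peptide] / f_earlier[peptide])
        for peptide in f_earlier
        if f_earlier[peptide] > 0 and f_later.get(peptide, 0) > 0
    }


# Hypothetical counts for two peptides in Round 1 and Round 2.
round1 = {"PEPTIDE_A": 10, "PEPTIDE_B": 90}
round2 = {"PEPTIDE_A": 40, "PEPTIDE_B": 60}
lfc = log2_fold_change(round2, round1)
print(lfc)  # PEPTIDE_A is positive (enriched), PEPTIDE_B is negative (depleted)
```

Working with read fractions rather than raw counts makes rounds sequenced to different depths comparable.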

Figure 2—figure supplement 1
Utilizing overlapping peptides to call high-confidence peptides.

(a) Schematic showing use of overlapping 15mers, containing redundant 9mers. Each 9mer is present in seven 15mers; for each 9mer, we calculate how many of these seven 15mers enriched. (b) Number of peptides containing a given 9mer that are hits, for SARS-CoV-2 nucleocapsid on HLA-DR401. Black = hits in both undoped and doped libraries; blue = hits in undoped library only; red = hits in doped library only. Enrichment categories are stacked, for a maximum of seven 15mer hits, since each 9mer is present in seven 15mers.
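
The counting in panel (a) can be sketched as follows: for each 9mer position, tally how many of the overlapping 15mers containing it were called as hits, then keep 9mers supported by five or more. The function name `ninemer_support` and the toy data are hypothetical, and the handling of 9mers near the protein termini (which are contained in fewer than seven 15mers) is an assumption.

```python
def ninemer_support(protein, enriched_15mers, peptide_len=15, core_len=9):
    """Map each 9mer start position to the number of enriched 15mers containing it."""
    n_windows = len(protein) - peptide_len + 1
    offset = peptide_len - core_len  # 6: a 9mer can sit at 7 offsets within a 15mer
    support = {}
    for j in range(len(protein) - core_len + 1):
        # 15mers containing the 9mer at j start between j - 6 and j (clipped at the termini)
        starts = range(max(0, j - offset), min(j, n_windows - 1) + 1)
        support[j] = sum(protein[i:i + peptide_len] in enriched_15mers for i in starts)
    return support


toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"  # hypothetical 30-residue sequence
toy_hits = {toy_protein[i:i + 15] for i in range(2, 7)}  # pretend the 15mers starting at 2..6 enriched
support = ninemer_support(toy_protein, toy_hits)
high_confidence = [j for j, n in support.items() if n >= 5]
print(high_confidence)  # → [6, 7, 8]
```

Requiring agreement among neighboring peptides filters out isolated hits that are not supported by the overlapping peptides sharing the same 9mer.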

Figure 2—figure supplement 2
Full Venn diagrams.

Full Venn diagrams showing relationships between peptides which enriched in the doped library (‘Doped’) and undoped library (‘Undoped’), and contained a 9mer peptide which enriched in five or more of the seven 15mers containing it (‘≥5/7’), for (a) HLA-DR401, (b) HLA-DR402, and (c) HLA-DR404.

Figure 2—figure supplement 3
Sequence logo for HLA-DR402 and HLA-DR404.

(a) HLA-DR402: Sequence logo of 1690 peptides that enriched in both doped and undoped selections of the SARS-CoV and SARS-CoV-2 library for HLA-DR402. (b) HLA-DR404: Sequence logo of 2094 peptides that enriched in both doped and undoped selections of the SARS-CoV and SARS-CoV-2 library for HLA-DR404. Logos were generated with Seq2Logo-2.0.

Figure 3
Comparing HLA-DR401, HLA-DR402, and HLA-DR404 for binding to related spike proteins from SARS-CoV-2 and SARS-CoV.

(a) Sequence alignment showing sequence differences in HLA-DR402 and HLA-DR404 compared to HLA-DR401, highlighted on the HLA-DR401 structure (PDB 1J8H). Colors are: red for amino acids shared between HLA-DR401 and HLA-DR404, green for amino acids shared between HLA-DR402 and HLA-DR404, and yellow for amino acids different in all three alleles. Affected peptide positions (P1, P4, P5, P7) are colored in blue and labelled on the structure. (b) Conservation and enrichment of 9mer peptides from SARS-CoV-2 and SARS-CoV spike proteins. Conserved 9mers are indicated in black. If a 9mer along the proteome enriched in five or more of the adjacent peptides containing it, its enrichment is indicated with a vertical line with color for allele (HLA-DR401: blue; HLA-DR402: red; HLA-DR404: gray) and opacity for virus (SARS-CoV-2: dark; SARS-CoV: light). (c–e) Zoomed regions show enrichment of individual 15mer peptides. Only peptides containing the bolded 9mer sequence are shown. Amino acids in the bolded 9mer that are not conserved between SARS-CoV-2 and SARS-CoV are highlighted in yellow.

Figure 4 with 3 supplements
Comparing measured IC50 values and computational prediction.

Relationship between measured IC50 values and NetMHCIIpan4.0 predicted ranks in Eluted Ligand mode (EL) on invariant-flanked sequences. Data points are colored by label, and IC50 values ≥ 50 µM are set to 50 µM.

Figure 4—figure supplement 1
Comparing defined library selection with algorithmic predictions: SARS-CoV-2 spike protein.

15mer peptides which enriched for binding to HLA-DR401 in both the doped and undoped libraries are indicated with horizontal lines above the enriched 15mer sequence (blue). NetMHCIIpan4.0 predicted binders (rank ≤ 10%) on yeast-formatted peptides are shown in red. Boxed sequences are tested in subsequent fluorescence polarization experiments, and colored as indicated in the legend.

Figure 4—figure supplement 2
Titration curves for peptides tested via fluorescence polarization for binding to HLA-DR401, by category.

(a) Agreed binder peptides which are predicted to bind by NetMHCIIpan4.0 and enriched in yeast display experiments. Dashed line is the positive control HA peptide. (b) Agreed non-binder peptides which did not enrich in yeast display experiments and were not predicted to bind by NetMHCIIpan4.0. (c) Yeast enriched peptides from Table 2 and Table 3. Offset variants from Table 3 are dashed lines. (d) NetMHCIIpan4.0 predicted peptides which are not enriched in the yeast display library. Mean and standard deviation from three replicates are plotted for each peptide concentration. Source data are provided as Figure 4—figure supplement 2—source data 1.

Figure 4—figure supplement 2—source data 1

Fluorescence polarization measurements for peptide-HLA-DR401 binding.

https://cdn.elifesciences.org/articles/78589/elife-78589-fig4-figsupp2-data1-v2.xlsx
Figure 4—figure supplement 3
Comparing measured IC50 values and prediction.

Relationship between measured IC50 values and NetMHCIIpan4.0 predicted ranks in Eluted Ligand mode (EL) on unflanked (native) 15mer sequences. Data points are colored by label, and IC50 values ≥ 50 µM are set to 50 µM.

Figure 5
Conservation and enrichment of dengue virus serotypes 1–4.

(a) Conservation and enrichment of 9mer peptides along four aligned dengue serotypes. All stretches of nine amino acids are compared across the four serotypes and conservation is indicated with a black vertical line (i.e. 2, 3, or 4 of four serotypes conserved). 9mers which enriched on HLA-DR401 are also indicated, colored by virus serotype. (b–d) Zoomed regions, showing enrichment for individual 15mer peptides to HLA-DR401. Only peptides which contain the bolded 9mer sequence are shown. Amino acids in the bolded 9mer that are not conserved between serotypes are highlighted in yellow. Insets show regions which are differently conserved and enriched: (b) non-conserved sequences with peptides from one serotype enriched; (c) conserved sequences enriched across all serotypes; (d) non-conserved sequences which are enriched.
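
The 9mer conservation comparison in panel (a) can be sketched as below: at each alignment position, count the largest number of sequences sharing an identical 9mer. The function name, the toy alignment, and the treatment of gapped windows (excluded, with `-` as the gap character) are assumptions; the caption does not describe how alignment gaps were handled.

```python
from collections import Counter


def ninemer_conservation(aligned: list[str], k: int = 9) -> list[int]:
    """For each alignment column, return the largest number of sequences sharing one k-mer."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    shared = []
    for i in range(length - k + 1):
        # Assumption: windows containing a gap character are left out of the comparison.
        windows = [seq[i:i + k] for seq in aligned if "-" not in seq[i:i + k]]
        shared.append(max(Counter(windows).values(), default=0))
    return shared


# Hypothetical 10-column alignment standing in for four serotypes.
toy_alignment = [
    "AAAAAAAAAC",
    "AAAAAAAAAD",
    "AAAAAAAAAC",
    "GAAAAAAAAC",
]
print(ninemer_conservation(toy_alignment))  # one value per 9mer start position
```

A value of 4 marks a 9mer identical in all four sequences, and a value of 1 marks a position where every sequence differs.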

Tables

Table 1
Summary of enriched peptides for each source protein, including: the number of unique 15mers that enriched in both the doped and undoped libraries; the number of unique 9mer cores identified by register inference in these enriched 15mers (native cores only; inferred cores containing linker residues are excluded); and the number of unique enriched 15mers that contain a 9mer sequence enriched in five or more of the overlapping 15mers containing it (smoothed 15mers).
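
| Virus | Protein | Protein length (# of amino acids) | MHC allele | # of 15mers | # of 9mer cores | # of smoothed 15mers |
|---|---|---|---|---|---|---|
| SARS-CoV | Spike | 1255 | HLA-DR401 | 324 | 74 | 221 |
| SARS-CoV | Spike | 1255 | HLA-DR402 | 217 | 65 | 110 |
| SARS-CoV | Spike | 1255 | HLA-DR404 | 289 | 61 | 193 |
| SARS-CoV | Nucleocapsid | 422 | HLA-DR401 | 40 | 8 | 34 |
| SARS-CoV | Nucleocapsid | 422 | HLA-DR402 | 34 | 13 | 12 |
| SARS-CoV | Nucleocapsid | 422 | HLA-DR404 | 31 | 6 | 20 |
| SARS-CoV-2 | Spike | 1273 | HLA-DR401 | 305 | 67 | 221 |
| SARS-CoV-2 | Spike | 1273 | HLA-DR402 | 230 | 62 | 130 |
| SARS-CoV-2 | Spike | 1273 | HLA-DR404 | 290 | 64 | 217 |
| SARS-CoV-2 | Nucleocapsid | 419 | HLA-DR401 | 34 | 8 | 24 |
| SARS-CoV-2 | Nucleocapsid | 419 | HLA-DR402 | 33 | 10 | 15 |
| SARS-CoV-2 | Nucleocapsid | 419 | HLA-DR404 | 30 | 8 | 18 |
| SARS-CoV-2 | Replicase polyprotein 1ab | 7096 | HLA-DR401 | 1652 | 388 | 1204 |
| SARS-CoV-2 | Replicase polyprotein 1ab | 7096 | HLA-DR402 | 1104 | 325 | 678 |
| SARS-CoV-2 | Replicase polyprotein 1ab | 7096 | HLA-DR404 | 1368 | 350 | 890 |
| SARS-CoV-2 | Non-structural protein 8 | 121 | HLA-DR401 | 41 | 10 | 32 |
| SARS-CoV-2 | Non-structural protein 8 | 121 | HLA-DR402 | 21 | 7 | 17 |
| SARS-CoV-2 | Non-structural protein 8 | 121 | HLA-DR404 | 32 | 8 | 19 |
| SARS-CoV-2 | Protein 7a | 121 | HLA-DR401 | 27 | 8 | 18 |
| SARS-CoV-2 | Protein 7a | 121 | HLA-DR402 | 7 | 3 | 0 |
| SARS-CoV-2 | Protein 7a | 121 | HLA-DR404 | 13 | 2 | 6 |
| SARS-CoV-2 | Non-structural protein 6 | 61 | HLA-DR401 | 0 | 0 | 0 |
| SARS-CoV-2 | Non-structural protein 6 | 61 | HLA-DR402 | 1 | 1 | 0 |
| SARS-CoV-2 | Non-structural protein 6 | 61 | HLA-DR404 | 0 | 0 | 0 |
| SARS-CoV-2 | Membrane protein | 222 | HLA-DR401 | 40 | 7 | 29 |
| SARS-CoV-2 | Membrane protein | 222 | HLA-DR402 | 26 | 6 | 19 |
| SARS-CoV-2 | Membrane protein | 222 | HLA-DR404 | 23 | 7 | 21 |
| SARS-CoV-2 | Envelope small membrane protein | 75 | HLA-DR401 | 6 | 1 | 0 |
| SARS-CoV-2 | Envelope small membrane protein | 75 | HLA-DR402 | 7 | 3 | 0 |
| SARS-CoV-2 | Envelope small membrane protein | 75 | HLA-DR404 | 6 | 1 | 0 |
| SARS-CoV-2 | Protein 3a | 275 | HLA-DR401 | 22 | 4 | 11 |
| SARS-CoV-2 | Protein 3a | 275 | HLA-DR402 | 13 | 4 | 10 |
| SARS-CoV-2 | Protein 3a | 275 | HLA-DR404 | 10 | 2 | 0 |
| SARS-CoV-2 | Replicase polyprotein 1a | 4405 | HLA-DR401 | 948 | 228 | 658 |
| SARS-CoV-2 | Replicase polyprotein 1a | 4405 | HLA-DR402 | 657 | 196 | 409 |
| SARS-CoV-2 | Replicase polyprotein 1a | 4405 | HLA-DR404 | 865 | 222 | 582 |
| SARS-CoV-2 | ORF10 protein | 38 | HLA-DR401 | 6 | 1 | 6 |
| SARS-CoV-2 | ORF10 protein | 38 | HLA-DR402 | 2 | 0 | 0 |
| SARS-CoV-2 | ORF10 protein | 38 | HLA-DR404 | 5 | 1 | 5 |
| SARS-CoV-2 | Protein non-structural 7b | 43 | HLA-DR401 | 0 | 0 | 0 |
| SARS-CoV-2 | Protein non-structural 7b | 43 | HLA-DR402 | 0 | 0 | 0 |
| SARS-CoV-2 | Protein non-structural 7b | 43 | HLA-DR404 | 0 | 0 | 0 |
| SARS-CoV-2 | Uncharacterized protein 14 | 73 | HLA-DR401 | 8 | 4 | 6 |
| SARS-CoV-2 | Uncharacterized protein 14 | 73 | HLA-DR402 | 20 | 5 | 16 |
| SARS-CoV-2 | Uncharacterized protein 14 | 73 | HLA-DR404 | 22 | 4 | 21 |
| SARS-CoV-2 | Protein 9b | 97 | HLA-DR401 | 29 | 7 | 27 |
| SARS-CoV-2 | Protein 9b | 97 | HLA-DR402 | 35 | 6 | 31 |
| SARS-CoV-2 | Protein 9b | 97 | HLA-DR404 | 37 | 9 | 34 |
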
Table 2
Peptides selected for fluorescence polarization (FP) experiments for binding to HLA-DR401.

NetMHCIIpan4.0 predictions for HLA-DR401 binding are performed on 15mers plus invariant flanking residues (N-terminal Ala, C-terminal Gly-Gly-Ser) and percent rank values generated using Eluted Ligand mode. Fluorescence polarization is performed on native 15mer peptides without invariant flanking residues.

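| Category | Spike position | Peptide + flank (A + 15mer + GGS) | NetMHCIIpan4.0 predicted core (A + 15mer + GGS) | NetMHCIIpan4.0 %Rank (A + 15mer + GGS) | 15mer affinity from FP (IC50, nM) |
|---|---|---|---|---|---|
| Agreed Binders | 34–48 | ARGVYYPDKVFRSSVLGGS | YYPDKVFRS | 1.49 | 15.8 |
| Agreed Binders | 87–101 | ANDGVYFASTEKSNIIGGS | VYFASTEKS | 4.28 | 2117 |
| Agreed Binders | 303–317 | ALKSFTVEKGIYQTSNGGS | FTVEKGIYQ | 8.41 | 396.9 |
| Agreed Binders | 362–376 | AVADYSVLYNSASFSTGGS | YSVLYNSAS | 8.36 | 113.7 |
| Agreed Binders | 1015–1029 | AAAEIRASANLAATKMGGS | IRASANLAA | 3.13 | 105.4 |
| Agreed Binders | 1112–1126 | APQIITTDNTFVSGNCGGS | ITTDNTFVS | 7.32 | 527.0 |
| Yeast-Enriched Binders | 165–179 | ANCTFEYVSQPFLMDLGGS | YVSQPFLMD | 64.83 | 14,652 |
| Yeast-Enriched Binders | 172–186 | ASQPFLMDLEGKQGNFGGS | FLMDLEGKQ | 20.34 | 123.2 |
| Yeast-Enriched Binders | 286–300 | ATDAVDCALDPLSETKGGS | VDCALDPLS | 32.68 | 521.6 |
| Yeast-Enriched Binders | 373–387 | ASFSTFKCYGVSPTKLGGS | YGVSPTKLG | 16.59 | 18,452 |
| Yeast-Enriched Binders | 469–483 | ASTEIYQAGSTPCNGVGGS | IYQAGSTPC | 18.22 | 67.7 |
| Yeast-Enriched Binders | 580–594 | AQTLEILDITPCSFGGGGS | LEILDITPC | 62.00 | 119.9 |
| Yeast-Enriched Binders | 739–753 | ATMYICGDSTECSNLLGGS | YICGDSTEC | 70.91 | 14.4 |
| Yeast-Enriched Binders | 920–934 | AQKLIANQFNSAIGKIGGS | FNSAIGKIG | 20.47 | 1121 |
| NetMHC-Predicted Binders | 113–127 | AKTQSLLIVNNATNVVGGS | IVNNATNVV | 8.74 | >50,000 |
| NetMHC-Predicted Binders | 492–506 | ALQSYGFQPTNGVGYQGGS | YGFQPTNGV | 4.11 | 454.7 |
| NetMHC-Predicted Binders | 1151–1165 | AELDKYFKNHTSPDVDGGS | YFKNHTSPD | 5.74 | 35,510 |
| Agreed Non-Binders | 534–548 | AVKNKCVNFNFNGLTGGGS | FNFNGLTGG | 57.13 | >50,000 |
| Agreed Non-Binders | 1079–1093 | APAICHDGKAHFPREGGGS | ICHDGKAHF | 80.47 | >50,000 |
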
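
The invariant flanking residues described for Table 2 (N-terminal Ala, C-terminal Gly-Gly-Ser) can be added to a native 15mer as in the sketch below before the sequence is passed to a predictor. The helper name `add_flanks` and the constant names are assumptions for illustration; the example peptide is spike 34–48 from Table 2.

```python
N_TERMINAL_FLANK = "A"    # invariant N-terminal Ala
C_TERMINAL_FLANK = "GGS"  # invariant C-terminal Gly-Gly-Ser


def add_flanks(peptide_15mer: str) -> str:
    """Return the flanked sequence (A + 15mer + GGS) for a native 15mer."""
    if len(peptide_15mer) != 15:
        raise ValueError("expected a 15-residue peptide")
    return N_TERMINAL_FLANK + peptide_15mer + C_TERMINAL_FLANK


native = "RGVYYPDKVFRSSVL"  # spike 34–48
flanked = add_flanks(native)
print(flanked)  # → ARGVYYPDKVFRSSVLGGS
```

Keeping the native and flanked forms separate matters here because predictions are run on the flanked sequence while fluorescence polarization is measured on the native 15mer.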
Table 3
Effects of peptide flanking sequences on NetMHCIIpan4.0 predictions for HLA-DR401 binding and measured fluorescence polarization (FP) values for overlapping peptides.

Yeast display-enriched peptides that are predicted to bind by NetMHCIIpan4.0 when without flanking residues, plus offset variants of these peptides, which are not predicted to bind, with or without flanking sequence. Yeast display register-inferred consensus cores are highlighted in bold. NetMHCIIpan4.0 percent rank values are generated using Eluted Ligand mode.

Spike positionSequenceNetMHCIIpan4.0 predicted core(A + 15mer + GGS)NetMHCIIpan4.0 %Rank(A + 15mer + GGS)NetMHCIIpan4.0 predicted core(15mer)NetMHCIIpan4.0 %Rank(15mer)15mer affinity from FP (IC50, nM)
172–186SQPFLMDLEGKQGNFFLMDLEGKQ20.34FLMDLEGKQ4.1123.2
173–187QPFLMDLEGKQGNFKFLMDLEGKQ27.73FLMDLEGKQ12.218613
286–300TDAVDCALDPLSETKVDCALDPLS32.68VDCALDPLS9.81154
287–301DAVDCALDPLSETKCVDCALDPLS42.42VDCALDPLS22.574,393
469–483STEIYQAGSTPCNGVIYQAGSTPC18.22IYQAGSTPC5.4167.7
467–481DISTEIYQAGSTPCNIYQAGSTPC11.47IYQAGSTPC12.614875
471–485EIYQAGSTPCNGVEGYQAGSTPCN39.17YQAGSTPCN21.8112,519
920–934QKLIANQFNSAIGKIFNSAIGKIG20.47IANQFNSAI7.891495
921–935KLIANQFNSAIGKIQFNSAIGKIQ18.3IANQFNSAI19.7911,937
Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Strain, strain background (Saccharomyces cerevisiae) | RJY100 | PMID:26333274 | | |
| Cell line (Trichoplusia ni) | High Five cells | Thermo Fisher | Cat#:B85502 | |
| Cell line (Spodoptera frugiperda) | Sf9 cells | Thermo Fisher | Cat#:11496015 | |
| Antibody | Anti-Myc-AlexaFluor647 (Mouse monoclonal) | Cell Signaling Technology | Cat#:2233 | Library selections: 1:100 |
| Recombinant DNA reagent | Peptide-MHC-II with cleavable peptide linker in pYal (plasmid) | PMID:32887877 | | |
| Recombinant DNA reagent | HLA-DR401 in pAcGP67a (plasmid) | PMID:32887877 | | |
| Recombinant DNA reagent | HLA-DM in pAcGP67a (plasmid) | PMID:32887877 | | |
| Peptide, recombinant protein | 3C protease | Other | | Purified from Escherichia coli BL21 cells |
| Software, algorithm | PandaSeq | PMID:22333067 | | |
| Software, algorithm | NetMHCIIpan4.0 | PMID:32406916 | | |
| Software, algorithm | Peptide register inference algorithm | This paper | | See Code Availability |
| Software, algorithm | Prism | GraphPad Prism software (http://www.graphpad.com/) | | Version: 9.3 |

Additional files

Supplementary file 1

Data from generation and deep sequencing of yeast-displayed peptide-major histocompatibility complex (MHC) libraries.

(a) Peptide count data from deep sequencing of the SARS-CoV and SARS-CoV-2 libraries. (b) Inferred binding registers of enriched peptides from the SARS-CoV and SARS-CoV-2 libraries. (c) Peptide count data from deep sequencing of the dengue libraries. (d) Primer sequences utilized for generation and deep sequencing of yeast display libraries.

https://cdn.elifesciences.org/articles/78589/elife-78589-supp1-v2.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/78589/elife-78589-mdarchecklist1-v2.docx

  1. Brooke D Huisman
  2. Zheng Dai
  3. David K Gifford
  4. Michael E Birnbaum
(2022)
A high-throughput yeast display approach to profile pathogen proteomes for MHC-II binding
eLife 11:e78589.
https://doi.org/10.7554/eLife.78589