A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity

  1. Barbara Bravi (corresponding author)
  2. Andrea Di Gioacchino
  3. Jorge Fernandez-de-Cossio-Diaz
  4. Aleksandra M Walczak
  5. Thierry Mora
  6. Simona Cocco
  7. Rémi Monasson
  1. Department of Mathematics, Imperial College London, United Kingdom
  2. Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, France
7 figures and 1 additional file

Figures

Figure 1 with 1 supplement
Cartoon of the differential RBM (diffRBM) learning approach.

(A) The parameters of background RBM (gray) are learnt from the ‘background’ sequence dataset. (B) The diffRBM units (gold) are learnt from a small subset of ‘selected’ sequences. (C) We consider …

Figure 1—figure supplement 1
DiffRBM architecture recapitulating the mathematical notation used in Materials and methods.

In this cartoon example, we have assumed M_d = 2, M_b = 4 and N = 9 (corresponding to the length of the actual sequence input in the case of peptides, see Figure 2A and Figure 2—figure supplement 1).

Figure 2 with 4 supplements
DiffRBM model of immunogenicity and structural interpretation of its parameters.

(A) DiffRBM units are learnt from HLA-specific peptides annotated as immunogenic. (B) HLA contact frequency for each peptide position across 41 structures (bars, left-axis). On the right-axis, …

Figure 2—source data 1

List of TCR-pMHC structures from the PDB and estimated contact positions at 4 Å.

https://cdn.elifesciences.org/articles/85126/elife-85126-fig2-data1-v2.xls
Figure 2—figure supplement 1
Schematic summary of the construction of a diffRBM model of immunogenicity.
Figure 2—figure supplement 2
Hyperparametric search for the diffRBM model of immunogenicity.

(A-B) The hyperparametric search for the diffRBM model of immunogenicity is performed by monitoring the value of the diffRBM units’ score (Equation 10) as a function of the number of differential …

Figure 2—figure supplement 3
Prediction of peptide contact positions with the TCR.

(A) The average PPV shown in Figure 2D is compared to the PPV obtained from: the single-site factors from diffRBM units with fields only (‘diffRBM units (lin.)’, dashed-dotted line); a prediction …
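
The positive predictive value (PPV) used in these contact-prediction panels can be illustrated with a minimal sketch. It assumes that positions are ranked by a per-site score and that the PPV is the fraction of the top-k ranked positions that are true contacts; the function name `ppv_at_k`, the 0-based indexing, and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
def ppv_at_k(site_scores, contact_positions, k):
    """Fraction of the k highest-scoring positions that are true contacts.

    site_scores: one score per sequence position (higher = more likely contact).
    contact_positions: set of 0-based positions annotated as contacts.
    """
    if not 1 <= k <= len(site_scores):
        raise ValueError("k must be between 1 and the number of positions")
    # Rank positions from highest to lowest score.
    ranked = sorted(range(len(site_scores)),
                    key=lambda i: site_scores[i], reverse=True)
    hits = sum(1 for i in ranked[:k] if i in contact_positions)
    return hits / k


# Toy example (made-up scores for a 9-residue peptide).
toy_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.0, 0.3, 0.2]
toy_contacts = {2, 3, 7}
toy_ppv = ppv_at_k(toy_scores, toy_contacts, k=3)
```

Averaging this quantity over structures, for each k, would give a curve of the kind described in the caption.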

Figure 2—figure supplement 4
Prediction of peptide contact positions with the HLA.

(A) The average PPV of contact prediction shown in Figure 2B (inset) is compared to the PPV obtained from: the single-site factors from a background RBM with all parameters (bold gray line); a …

Figure 3 with 1 supplement
DiffRBM units encode molecular features of immunogenicity.

(A) Amino-acid usage log-enrichment of immunogenic to non-immunogenic peptides, across central positions (4-8) for each HLA type. The color code indicates amino acid properties: negatively charged …
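
As a rough illustration of an amino-acid usage log-enrichment, the sketch below pools amino-acid counts over central positions 4-8 and takes the natural log of the frequency ratio between two peptide sets. The pseudocount, the pooling over positions, the choice of natural logarithm, and all function names are assumptions made for this example; the exact definition used for the figure is not given in this listing.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(peptides, positions, pseudocount=1.0):
    """Amino-acid frequencies pooled over the given 1-based positions."""
    counts = Counter()
    for pep in peptides:
        for pos in positions:
            counts[pep[pos - 1]] += 1
    # Normalize over the 20 standard amino acids, with a pseudocount
    # so that unseen residues do not produce log(0).
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def log_enrichment(immunogenic, non_immunogenic,
                   positions=range(4, 9), pseudocount=1.0):
    """log(f_immunogenic(a) / f_non_immunogenic(a)) for each amino acid a."""
    f_pos = aa_frequencies(immunogenic, positions, pseudocount)
    f_neg = aa_frequencies(non_immunogenic, positions, pseudocount)
    return {a: math.log(f_pos[a] / f_neg[a]) for a in AMINO_ACIDS}


# Toy 9-mers (made up): W is enriched and G depleted at positions 4-8.
toy_enrichment = log_enrichment(["AAAWWWWWA"], ["AAAGGGGGA"])
```

Positive values mark residues used more often in the immunogenic set, negative values those used less often.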

Figure 3—figure supplement 1
Prediction of immunogenicity-related residues and mutation costs.

(A) Same as Figure 3B for all the key residues: W at position 5 (39 sequences) and position 6 (20 sequences), F at position 5 (104 sequences) and position 7 (107 sequences); M at position 5 (33 …

Figure 4 with 6 supplements
Immunogenic vs non-immunogenic peptide discrimination performance.

(A) The Area Under the Curve (AUC, see Materials and methods) is computed for HLA-specific diffRBM units’ scores of immunogenic and non-immunogenic held-out peptides. (B) Performance of diffRBM …
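
For readers unfamiliar with the AUC, a minimal sketch follows. It uses the standard rank-based reading of the area under the ROC curve: the probability that a randomly chosen positive (here, immunogenic) item scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy scores are illustrative; the authors' exact procedure is described in their Materials and methods.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve from two lists of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    item has the higher score; ties contribute 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (made up): 3 of the 4 pairs are ranked correctly.
toy_auc = auc([0.9, 0.4], [0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.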

Figure 4—figure supplement 1
Comparison of performance of differential models of immunogenicity.

Comparison of performance, in terms of the AUC of immunogenic vs non-immunogenic discrimination, between diffRBM, its version where the differential part is linear (diffRBM lin.) and a PWM approach. …

Figure 4—figure supplement 2
Score comparison between immunogenic peptides and peptides from the human proteome.

We have drawn at random 10^5 peptides from all the possible 9-mers in the human proteome and we have assigned to them scores under the HLA-specific models listed in the legend. We have measured via …

Figure 4—figure supplement 3
Leave-one-organism-out cross-validation for HLA-A*02:01-specific model (Materials and methods).

The case of Trypanosoma cruzi is a clear outlier: the binding affinity to the HLA alone accurately discriminates immunogenic from non-immunogenic antigens (AUC = 0.85). This …
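
The leave-one-organism-out scheme named in this caption can be sketched as a generic leave-one-group-out split: each organism in turn is held out for testing while the model is trained on peptides from all the others. The record layout (peptide, organism) and the function name are assumptions for illustration, and the toy peptides and organism labels are made up.

```python
def leave_one_organism_out(records):
    """Yield (held_out_organism, train_peptides, test_peptides) splits.

    records: list of (peptide, organism) pairs.
    """
    organisms = sorted({org for _, org in records})
    for held_out in organisms:
        train = [pep for pep, org in records if org != held_out]
        test = [pep for pep, org in records if org == held_out]
        yield held_out, train, test


# Toy records (made up) from three hypothetical source organisms.
toy_records = [
    ("AAAAAAAAA", "organism_1"),
    ("CCCCCCCCC", "organism_1"),
    ("DDDDDDDDD", "organism_2"),
    ("EEEEEEEEE", "organism_3"),
]
toy_splits = list(leave_one_organism_out(toy_records))
```

Evaluating a score such as the AUC on each held-out set shows how well the model transfers to an organism it has never seen.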

Figure 4—figure supplement 4
Further comparison of diffRBM and RBM scores.

Upper row: the AUC of classification of immunogenic vs non-immunogenic peptides given by the full RBM (trained in part on the background dataset and in part on the selected data and indicated as …

Figure 4—figure supplement 5
Hyperparametric search for the classifier of immunogenicity.

The hyperparametric search for the classifier is based on the optimal performance on the validation dataset at discriminating immunogenic vs non-immunogenic peptides as measured by the AUC. We show …

Figure 4—figure supplement 6
Performance of differential models of immunogenicity with sample reweighting.

Performance at discriminating immunogenic vs non-immunogenic peptides (as in Figure 4) when a reweighting scheme based on sequence similarity is applied during training (Materials and methods).
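
One common form of similarity-based reweighting, shown here only as an illustration, gives each sequence a weight equal to the inverse of the number of training sequences (itself included) whose fractional identity to it meets a threshold. This is an assumed scheme: the 0.8 threshold, the identity measure for equal-length sequences, and the function names are not taken from the paper, whose actual scheme is described in its Materials and methods.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_weights(seqs, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences similar to it).

    A sequence always counts itself, so weights lie in (0, 1].
    """
    weights = []
    for s in seqs:
        n_similar = sum(1 for t in seqs if identity(s, t) >= threshold)
        weights.append(1.0 / n_similar)
    return weights


# Toy 9-mers (made up): the first two differ at one position only.
toy_weights = similarity_weights(["AAAAAAAAA", "AAAAAAAAC", "WWWWWWWWW"])
```

Down-weighting clusters of near-identical sequences keeps them from dominating the training objective.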

Figure 5 with 3 supplements
DiffRBM model of TCR epitope specificity and structural interpretation.

(A) DiffRBM units are learnt from CDR3β sequences of antigen-specific TCRs. (B) Contact frequency distribution (bars) with peptide at each CDR3β position, across 12 structures (2 for YLQPRTFLL, 3 …

Figure 5—figure supplement 1
Schematic summary of the construction of a diffRBM model of TCR epitope-specificity.
Figure 5—figure supplement 2
Hyperparametric search for the diffRBM model of TCR specificity.

(A-B) Hyperparametric search for the background RBM model using the dataset from Emerson et al., 2017 (Materials and methods). Due to the large sample size, the differences between training and …

Figure 5—figure supplement 3
Prediction of CDR3β contact positions with the peptide.

(A) Similarly to Figure 2—figure supplement 3A, the average PPV of contact prediction shown in Figure 5C is compared to the PPV by two alternative predictors, the expression (Equation 25) (‘diffRBM …

Figure 6 with 4 supplements
Performance at discriminating antigen-specific from generic T-cell receptors.

(A) For a given epitope model (e.g. the Influenza epitope GILGFVFTL), we assign diffRBM units’ scores to held-out sets of antigen-specific CDR3β and generic CDR3β from the bulk repertoire, and we …

Figure 6—figure supplement 1
Comparison of performance of differential models of TCR specificity.

Comparison of performance, in terms of the AUC of epitope-specific vs bulk sequences discrimination, between diffRBM, its version where the differential part is linear (diffRBM lin.) and a PWM …

Figure 6—figure supplement 2
Comparison of performance of differential models of TCR specificity with different background datasets.

The plots show, for each epitope, the AUC of discrimination between the epitope-specific and naive sequences for the diffRBM units, full RBM and background RBM in the case where the background …

Figure 6—figure supplement 3
Hyperparametric search of the optimal k for the k-NN algorithm.

The optimal k (k=26, indicated by the bold black dot) is chosen as the value that maximizes the AUC of discrimination between epitope-specific and background sequences in the validation set. For each k, …

Figure 6—figure supplement 4
Comparison of performance of models of TCR specificity without V and J type.

Given that NetTCR2.0 (Montemurro et al., 2021) does not account for V and J type in its input, we report the performance of the diffRBM units, full RBM and background RBM for the 4 epitope-specific …

Appendix 4—figure 1
Model-based entropy estimation.

(A) Entropy (expressed in nats) of the space of HLA-specific presented antigens (evaluated by background RBM) and of HLA-specific immunogenic antigens (evaluated by the full RBM) for the 3 HLAs. …
