(A) The parameters of background RBM (gray) are learnt from the ‘background’ sequence dataset. (B) The diffRBM units (gold) are learnt from a small subset of ‘selected’ sequences. (C) We consider …
In this cartoon example, we have assumed , and (corresponding to the length of the actual sequence input in the case of peptides, see Figure 2A and Figure 2—figure supplement 1).
(A) DiffRBM units are learnt from HLA-specific peptides annotated as immunogenic. (B) HLA contact frequency for each peptide position across 41 structures (bars, left-axis). On the right-axis, …
List of TCR-pMHC structures from PDB and estimated contact positions at 4 Å.
(A-B) The hyperparametric search for the diffRBM model of immunogenicity is performed by monitoring the value of the diffRBM units’ score (Equation 10) as a function of the number of differential …
(A) The average PPV shown in Figure 2D is compared to the PPV obtained from: the single-site factors from diffRBM units with fields only (‘diffRBM units (lin.)’, dashed-dotted line); a prediction …
(A) The average PPV of contact prediction shown in Figure 2B (inset) is compared to the PPV obtained from: the single-site factors from a background RBM with all parameters (bold gray line); a …
(A) Amino-acid usage log-enrichment of immunogenic to non-immunogenic peptides, across central positions (4-8) for each HLA type. The color code indicates amino acid properties: negatively charged …
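As an illustration of the log-enrichment quantity named in this caption, the following is a minimal sketch, not the authors' implementation. It assumes that amino-acid counts are pooled over the central positions 4-8 (1-based), that a pseudocount regularizes zero counts, and that the natural logarithm is used; the function names `aa_frequencies` and `log_enrichment` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(peptides, positions, pseudocount=1.0):
    """Amino-acid frequencies pooled over the given 1-based positions.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    counts = Counter(p[i - 1] for p in peptides for i in positions)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def log_enrichment(immunogenic, non_immunogenic, positions=range(4, 9), pseudocount=1.0):
    """Log ratio of amino-acid usage, immunogenic over non-immunogenic."""
    f_pos = aa_frequencies(immunogenic, positions, pseudocount)
    f_neg = aa_frequencies(non_immunogenic, positions, pseudocount)
    return {a: math.log(f_pos[a] / f_neg[a]) for a in AMINO_ACIDS}
```

A positive value indicates an amino acid used more often at central positions in immunogenic peptides than in non-immunogenic ones; a negative value indicates the opposite.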
(A) Same as Figure 3B for all the key residues: W at position 5 (39 sequences) and position 6 (20 sequences), F at position 5 (104 sequences) and position 7 (107 sequences); M at position 5 (33 …
(A) The Area Under the Curve (AUC, see Materials and methods) is computed for HLA-specific diffRBM units’ scores of immunogenic and non-immunogenic held-out peptides. (B) Performance of diffRBM …
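For readers unfamiliar with the AUC used throughout these captions, a minimal sketch follows. It relies on the standard rank interpretation: the AUC equals the probability that a randomly chosen positive (e.g. immunogenic) example scores higher than a randomly chosen negative one, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions, not values from the paper.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties contribute 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy scores (hypothetical): 8 of the 9 pairs are ranked correctly.
example = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to perfect separation of the two classes. The quadratic loop is adequate for small held-out sets; a sort-based rank computation scales better to large ones.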
Comparison of performance, in terms of the AUC of immunogenic vs non-immunogenic discrimination, between diffRBM, its version where the differential part is linear (diffRBM lin.) and a PWM approach. …
We have drawn at random 10⁵ peptides from all the possible 9-mers in the human proteome and we have assigned them scores under the HLA-specific models listed in the legend. We have measured via …
The case of Trypanosoma cruzi is a visible outlier, in which the binding affinity to the HLA alone accurately discriminates immunogenic from non-immunogenic antigens (AUC = 0.85). This …
Upper row: the AUC of classification of immunogenic vs non-immunogenic peptides given by the full RBM (trained in part on the background dataset and in part on the selected data and indicated as …
The hyperparametric search for the classifier is based on the optimal performance on the validation dataset at discriminating immunogenic vs non-immunogenic peptides as measured by the AUC. We show …
Performance at discriminating immunogenic vs non-immunogenic peptides (as in Figure 4), with a reweighting scheme based on sequence similarity applied during training (Materials and methods).
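The caption does not specify the reweighting scheme, which is described in the Materials and methods. The sketch below shows one common form of similarity-based reweighting, given here purely as an assumption: each sequence receives weight 1/n, where n is the number of training sequences (itself included) of the same length within a fractional Hamming distance threshold. The names `hamming` and `similarity_weights` and the default threshold are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def similarity_weights(seqs, max_fraction_diff=0.1):
    """Down-weight sequences that have many close neighbors.

    Assumed scheme (not taken from the source): weight = 1 / n_neighbors,
    where neighbors are same-length sequences differing at no more than
    max_fraction_diff of their positions. Each sequence is its own neighbor.
    """
    weights = []
    for s in seqs:
        n_neighbors = sum(
            1
            for t in seqs
            if len(t) == len(s) and hamming(s, t) <= max_fraction_diff * len(s)
        )
        weights.append(1.0 / n_neighbors)
    return weights
```

Under this scheme, clusters of near-identical sequences contribute roughly as much to training as a single isolated sequence, which limits the effect of redundancy in the dataset.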
(A) DiffRBM units are learnt from CDR3β sequences of antigen-specific TCRs. (B) Contact frequency distribution (bars) with peptide at each CDR3β position, across 12 structures (2 for YLQPRTFLL, 3 …
(A-B) Hyperparametric search for the background RBM model using the dataset from Emerson et al., 2017 (Materials and methods). Due to the large sample size, the differences between training and …
(A) As in Figure 2—figure supplement 3A, the average PPV of contact prediction shown in Figure 5C is compared to the PPV of two alternative predictors, the expression (Equation 25) (‘diffRBM …
(A) For a given epitope model (e.g. the Influenza epitope GILGFVFTL), we assign diffRBM units’ scores to held-out sets of antigen-specific CDR3β and generic CDR3β from the bulk repertoire, and we …
Comparison of performance, in terms of the AUC of epitope-specific vs bulk sequences discrimination, between diffRBM, its version where the differential part is linear (diffRBM lin.) and a PWM …
The plots show, for each epitope, the AUC of discrimination between the epitope-specific and naive sequences for the diffRBM units, full RBM and background RBM in the case where the background …
The optimal (=26, indicated by the black bold dot) is chosen by looking at the maximal AUC of discrimination between epitope-specific and background sequences in the validation set. For each ,…
Given that NetTCR2.0 (Montemurro et al., 2021) does not account for V and J type in its input, we report the performance of the diffRBM units, full RBM and background RBM for the 4 epitope-specific …