The T cell receptor β chain repertoire of tumor infiltrating lymphocytes improves neoantigen prediction and prioritization

eLife assessment

The study presents a potentially valuable approach by combining two measurements (pHLA binding and pHLA-TCR binding) to improve predictions of which mutations in colorectal cancer are likely to be presented to and recognised by the immune system. While this approach is promising, the evidence supporting the primary claim remains somewhat incomplete. The experimental validation of the computational predictions with actual immune responses is still limited, despite the increase in sample size from 4 to 8 in this revision.

https://doi.org/10.7554/eLife.94658.3.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Landmark
Fundamental
Important
Valuable
Useful

Strength of evidence:

Incomplete: Main claims are only partially supported

Exceptional
Compelling
Convincing
Solid
Incomplete
Inadequate

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife Assessments

Abstract
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

In the realm of cancer immunotherapy, the meticulous selection of neoantigens plays a fundamental role in enhancing personalized treatments. Traditionally, this selection process has heavily relied on predicting the binding of peptides to human leukocyte antigens (pHLA). Nevertheless, this approach often overlooks the dynamic interaction between tumor cells and the immune system. In response to this limitation, we have developed an innovative prediction algorithm rooted in machine learning, integrating T cell receptor β chain (TCRβ) profiling data from colorectal cancer (CRC) patients for a more precise neoantigen prioritization. TCRβ sequencing was conducted to profile the TCR repertoire of tumor-infiltrating lymphocytes (TILs) from 28 CRC patients. The data unveiled both intra-tumor and inter-patient heterogeneity in the TCRβ repertoires of CRC patients, likely resulting from the stochastic utilization of V and J segments in response to neoantigens. Our novel combined model integrates pHLA binding information with pHLA-TCR binding to prioritize neoantigens, resulting in heightened specificity and sensitivity compared to models using individual features alone. The efficacy of our proposed model was corroborated through ELISpot assays on long peptides, performed on four CRC patients. These assays demonstrated that neoantigen candidates prioritized by our combined model outperformed predictions made by the established tool NetMHCpan. This comprehensive assessment underscores the significance of integrating pHLA binding with pHLA-TCR binding analysis for more effective immunotherapeutic strategies.

Introduction

In metastatic CRC patients, immunotherapy has fulfilled the promise of improving survival rate (Ganesh et al., 2019). Immune checkpoint inhibitors (ICIs), which block negative regulatory pathways in T-cell activation, have been approved by the US Food and Drug Administration (FDA) for the treatment of deficient mismatch repair (dMMR) or high microsatellite instability (MSI-H) CRC patients (Dudley et al., 2016; Le et al., 2015; Overman et al., 2017). However, there is an urgent need for alternative immunotherapy strategies for metastatic CRC patients, as patients with proficient mismatch repair (pMMR) or microsatellite stability (MSS) have not shown significant responses to immune checkpoint inhibitors (Dudley et al., 2016; Le et al., 2017).

Neoantigens (neopeptides) have emerged as potential targets for personalized cancer immunotherapy, including CRC (Yu et al., 2022; Kim et al., 2020). Neoantigens are peptides that result from somatic mutations and can be displayed by class I human leukocyte antigen (HLA-I) molecules on the surface of cancer cells, thereby activating immune-mediated tumor killing (Blass and Ott, 2021). Recent studies have demonstrated that the presence of neoantigens is associated with better responses to immune checkpoint inhibitor (ICI) therapy in CRC patients (Miao et al., 2018; Yarchoan et al., 2017). A high neoantigen burden has been linked to improved overall survival and progression-free survival in patients with various solid tumors, including CRC (Miao et al., 2018; Yarchoan et al., 2017). Therefore, neoantigen-based immunotherapies are considered to have significant potential for improving treatment outcomes for CRC patients.

The successful development of neoantigen-based therapies hinges upon the identification of neoantigens that exhibit a strong binding affinity to their respective HLA-I molecules and demonstrate high immunogenicity. Initially, DNA sequencing of tumor tissues and paired Peripheral Blood Mononuclear Cells identifies cancer-associated genomic mutations. RNA sequencing then determines the patient’s HLA-I allele profile and the gene expression levels of mutated genes. In silico tools are then applied to analyze the tumor somatic variant, HLA-I allele, and gene expression data, predicting the binding affinity of neoantigens to the patient’s HLA-I alleles and their potential to activate T cell responses (Hundal et al., 2020; Chheda et al., 2018; Reynisson et al., 2020). This standard workflow has been successful in identifying clinically relevant neoantigens in various malignancies (Chheda et al., 2018; Schumacher and Schreiber, 2015).

However, despite its achievements, the current approach’s impact on patient outcomes remains limited due to the scarcity of mutations in cancer patients that lead to effective immunogenic neoantigens. This limitation arises primarily from the fact that the selection of neoantigen candidates by these workflows relies solely on neoantigen-HLA-I binding affinity as the criteria for immunogenicity prediction (Chen et al., 2021). While HLA-I binding is indeed a crucial factor for neoantigen presentation, it does not fully account for T cell receptor (TCR) recognition and interaction. The recognition of the neoantigen-HLA complex through TCR is of paramount importance for T cell activation and eliciting an immune response (Szeto et al., 2020; Guerder and Flavell, 1995). Various factors, such as the specific TCR repertoire, TCR clonality, and the structural characteristics of the TCR-peptide-HLA complex, profoundly influence TCR recognition. Unfortunately, these critical aspects are not entirely captured by HLA-I binding prediction alone.

The TCR is a critical component of the adaptive immune system, responsible for recognizing specific antigens presented by antigen-presenting cells (APCs). This membrane-bound heterodimer protein complex is expressed on the surface of T cells and exists in two distinct forms: TCRα/TCRβ for αβ T cells and TCRγ/TCRδ for γδ T cells, both intricately associated with invariant CD3 chain molecules (Kuhns and Badgandi, 2012; Rast, 1997). TCR diversity arises from the recombination of variable (V), diversity (D), and joining (J) genes at the TCRβ and TCRδ loci, along with VJ recombination at the TCRα and TCRγ loci (Rosati et al., 2017), giving rise to a broad array of unique TCRs collectively known as the TCR repertoire. The complementarity-determining region 3 (CDR3), situated at the junction of V, D, and J gene segments (Wucherpfennig et al., 2010), plays a pivotal role in antigen recognition, with the unique combination of CDR3 sequences contributing significantly to the specificity and diversity of the TCR repertoire. With advancements in technology, TCR sequencing has become a powerful technique used to characterize the diversity and composition of TCR repertoires. Valuable information directly obtained through TCR sequencing, including TCR clonotype diversity, V(D)J gene usage, repertoire size, clonal expansion, and repertoire changes, has provided invaluable insights into immune responses, antigen recognition, and the development of targeted immunotherapies.

Sequencing the TCRs of TILs or lymphocytes found in peripheral blood provides crucial insights into the T-cell repertoire and their responses against neoantigens associated with tumors (Porciello et al., 2022; Lu et al., 2021b; Mazzotti et al., 2022). This information holds paramount importance in identifying TCRs that specifically target these neoantigens. Additionally, it proves to be invaluable in assessing the immunogenic potential of predicted neoantigens. Unfortunately, the current bioinformatic workflows used in neoantigen selection and prioritization do not incorporate TCR sequencing data. Therefore, we speculate that integrating TCR sequencing data into the assessment of immunogenicity of predicted neoantigens holds promise for unveiling effective neoantigens which can induce immune response, consequently, advancing the development of personalized immunotherapies for cancer.

To achieve this goal, we first performed TCR sequencing on frozen samples collected from CRC patients to profile the TCRβ repertoire of tumor-filtrating T cells. We next exploited the peptide-HLA (pHLA) and pHLA-TCR binding affinity of peptides with known immunogenicity status from six public databases including 10 X Genomics (10x Genomics, 2024a; 10x Genomics, 2024b; 10x Genomics, 2024c; 10x Genomics, 2024d), McPAS (Tickotsky et al., 2017), PRIME (Schmidt et al., 2021), VDJdb (Shugay et al., 2018), IEDB (Vita et al., 2019), and TBAdb (Zhang et al., 2020) to develop a predictive algorithm. Subsequently, we employed this algorithm to predict and rank neoantigens by using the TCRβ, HLA types, and mutation profiles identified from patients’ frozen tumor tissues. Finally, we experimentally verified the efficacy of our model by measuring the immunogenicity of the ranked peptides by an ELISpot assay.

Results

The workflow of identifying neoantigens by combining both HLA and TCR binding characteristics

The identification of neoantigens has traditionally heavily relied on pHLA binding prediction while often neglecting the significance of pHLA-TCR interactions (Lu et al., 2021a). In this study, we introduce a novel workflow that integrates both pHLA and pHLA-TCR interactions to enhance the precision of neoantigen identification and prioritization (Figure 1). In the first step (Figure 1A), we conducted RNA and DNA sequencing on matched tumor tissues and peripheral blood mononuclear cells (PBMCs) collected from 28 CRC patients to detect cancer-associated nonsynonymous mutations and determine HLA types, as detailed in our previous work (Nguyen et al., 2023). Additionally, T-cell receptor sequencing (TCR-Seq) was performed to profile the TCRβ repertoire of TILs. In the second step (Figure 1B), we used the pVACseq and pMTNet tools to predict the binding affinities of both pHLA and pHLA-TCR interactions. To exploit the information of both pHLA and pHLA-TCR binding for selecting and ranking neoantigen candidates, we constructed a machine-learning model by using immunogenic and non-immunogenic peptide information sourced from six publicly available databases (Figure 1C). Finally, we designed long peptide (LP) candidates encompassing the selected neoantigens and experimentally assessed their immunogenicity using PBMCs obtained from the same patients (Figure 1D).

Figure 1

Download asset Open asset

A novel workflow based on machine learning that integrates T cell receptor β (TCRβ) sequencing data for the identification and ranking of colorectal cancer (CRC) neoantigens.

(A) Tumor biopsies and peripheral blood from CRC patients were subjected to targeted DNA-seq, RNA-seq, and T cell receptor (TCR)-seq. (B) The prediction of peptide-human leukocyte antigen (HLA) binding and peptide-HLA-TCR binding by indicated tools using the DNA-seq, RNA-seq, and TCR-seq data was performed. (C) Machine learning models were subsequently constructed based on the analysis of the peptide-HLA binding and peptide-HLA-TCR binding features to distinguish immunogenic antigens from non-immunogenic peptides. (D) The immunogenicity of predicted neoantigen candidates prioritized by the model was validated by enzyme-linked immunospot (ELISpot) to evaluate the effectiveness of this approach.

Heterogenous tumor infiltrating TCRβ profiles in colorectal cancer patients

Autologous TILs have exhibited varying reactivity levels to neoantigens, suggesting the potential of TIL-based recognition to improve the identification of immunogenic neoantigens (Chen et al., 2019). Furthermore, characterizing TCRs can complement efforts to predict immunogenicity. To explore this, we initiated our study by characterizing the TIL repertoire in a cohort of 28 colorectal cancer patients. This characterization involved sequencing the complementarity-determining region 3 (CDR3) of T-cell receptor beta (TCRβ), renowned for its remarkable diversity within the TCR gene. Our sequencing analysis yielded an average of 2,992,949 productive TCR reads per sample, with a range between 256,035 and 10,888,726 (Supplementary file 1a), following correction for duplications, sequencing errors, and exclusive use of uniquely barcoded reads mapped to TCRβ-CD3 sequences from the ImMunoGeneTics (IMGT) databases. Across the 28 patients, we observed variable numbers of TCRβ-CDR3 clonotypes, ranging from 433 to 27,749 (Figure 2A, Supplementary file 2A, 28 patients were arranged in ascending order), indicating the intra-tumor heterogeneity of TCR clonotypes. Of these clonotypes, 59.5% exhibited a single uniquely barcoded read mapped to the TCRβ-CDR3 sequences (depicted in yellow in Figure 2A), while 40.5% had at least two TCRβ-CDR3 reads confidently identified (Figure 2A). As observed previously (Hey et al., 2023), the length distribution of CDR3 was approximately normally distributed (median 14, range 4–43, Figure 2—figure supplement 1 and Supplementary file 1b). Subsequently, we explored potential associations between TCR diversity, as quantified by the Shannon index, and patient characteristics. In accordance with prior investigations, we identified an inverse relationship between the number of TCR clones and the Shannon index (Figure 2B). Based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation (Figure 2C, Figure 2—figure supplement 3) as their diversity of clonotypes was saturated. Additionally, the TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, with each representing at least three raw reads sharing the same UMI. Consequently, despite some samples having low sequencing depth for TCRβ sequencing, likely due to the diversity of TIL infiltration between patients, our TCRβ profiling analysis is robust and reliable. However, we did not detect any significant correlations between TCR clonality or diversity and clinical variables, including microsatellite status, tumor staging, patient gender, and tumor location (Figure 2—figure supplement 2). In contrast, MSI-H tumors, which are known to be rich in neoantigens (Roudko et al., 2020; Maleki Vareki, 2018), displayed a significantly lower number of TCRβ clonotypes compared to MSS tumors (2181 vs 5330, p=0.057, Figure 2—figure supplement 2a and 3), aligning with previous research (Laghi et al., 2020).

Figure 2 with 3 supplements see all

Download asset Open asset

Tumor-infiltrating T cell receptor β (TCRβ) profiles in 28 colorectal cancer patients.

(A) A bar plot depicting the distribution of T cell receptor (TCR) clonotypes among 28 colorectal cancer (CRC) patients, categorized into two groups: those with a unique read count and those with read counts greater than or equal to 2 for each TCR clonotype. (B) The scatter plot illustrates the relationship between the Shannon-index and the number of TCR clones. (C) The rarefaction plot shows the variable between sample size and diversity among 28 CRC samples.

After filtering out TCR clones with read counts below 1, we obtained a total of 74,590 TCR clones. Subsequently, we conducted a comprehensive assessment of TCRβ repertoire similarity by calculating the proportion of overlapping TCRβ-CDR3 clones across the 28 patients. The preeminent majority, constituting 95.1% of all identified TCRβ-CDR3 clones, comprised unique clonotypes found in one patient, while the remaining 4.9% were recurrently observed in at least two patients (Figure 2—figure supplement 2b). This observation underscores the substantial inter-patient heterogeneity in TCR profiles. The TCRβ repertoire is generated by the random rearrangement of variable (V), diversity (D), and joining (J) segments. We conducted a search specifically targeting the D segment by scoring the similarities of sub-sequences within the junction sequences against the reference sequences of D segments. Consistent with a previous study, we were unable to unambiguously assign the D segments to any defined D region usage due to the truncation of this region (Yassai et al., 2009). In contrast to the diversified D segments, V and J sequments displayed high recurrent rates across 28 patients (Figure 2—figure supplement 2b). As anticipated, we identified 59 distinct V segments (Figure 2—figure supplement 2c) and 13 distinct J segments (Figure 2—figure supplement 2d), collectively sharing 185,627 clones across the 28 tumor tissue samples. This underscores the conservation of these segments (Figure 2—figure supplement 2c and d). Conversely, we observed a varied combination of V and J segments, which significantly contributes to the heterogeneity of the TIL TCRβ repertoire (Figure 2—figure supplement 2e). Collectively, our data elucidate the presence of both intra-tumor and inter-patient heterogeneity in the TCRβ repertoires of CRC patients, likely stemming from the stochastic utilization of V and J segments in response to neoantigens.

pHLA and pHLA-TCR interactions are two complement determinants of neoantigen immunogenicity

While in silico tools predicting HLA-peptide binding affinity have traditionally played a pivotal role in determining neoantigen immunogenicity (Vitiello and Zanetti, 2017), the evaluation of TCR-peptide binding for screening immunogenic neoantigens remains understudied. In light of this, our study aimed to assess the contributions of both pHLA and pHLA-TCR binding affinity in predicting immunogenic neoantigens. To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic pHLA-TCR complexes (Supplementary file 1c). Subsequently, we employed NetMHCpan (Reynisson et al., 2020) and pMTNet (Lu et al., 2021a) tools to predict pHLA and pHLA-TCR binding, respectively.

For comparative purposes, we generated plots depicting predicted percentile rank values, with lower percentile ranks signifying stronger binding affinity. As anticipated, our analysis revealed a significantly higher prevalence of peptides with robust HLA binding (percentile rank <2%) among immunogenic peptides in contrast to their non-immunogenic counterparts (Figure 3A & B, p<0.00001). Similarly, immunogenic peptides exhibited a greater proportion of peptides with percentile ranks indicating pHLA-TCR binding of <2% compared to non-immunogenic peptides (Figure 3C & D, p=0.086). As recommended by NetMHCpan and pMTNet, we considered peptide candidates with predicted percentile ranks below 2% as binders. Utilizing this predefined threshold, both pHLA and pHLA-TCR binding affinity exhibited comparable positive predictive values (PPV), at 68.5% (Figure 3B) and 64.3% (Figure 3D), respectively. This substantiates the significance of both pHLA and pHLA-TCR binding in determining the immunogenicity of peptides. When we simultaneously applied the cutoff values for both HLA-peptide and TCR-peptide percentile ranks (Q1 group, Figure 3E), the PPV increased to 76.9% from 68.5% and 64.3% for individual binding features (Figure 3F). This underscores the rationale for combining these two criteria to enhance the accuracy of neoantigen prediction. Notably, pHLA-TCR binding displayed a remarkedly lower sensitivity but higher specificity in comparison to pHLA binding (Figure 3G), implying their potential as a complementary criteria for the selection of immunogenic peptides.

Figure 3

Download asset Open asset

Peptide-T cell receptor (TCR) and peptide-human leukocyte antigen (HLA) interactions are two complementary determinants of neoantigen immunogenicity.

(A) The histogram displays the HLA percentile distribution of immunogenic antigens (red bar) and non-immunogenic peptides (gray bar). (B) The percentage of immunogenic antigens (red bar) and non-immunogenic peptides (gray bar) is compared between two groups based on HLA percentile:<2% and ≥ 2% (Chi-square test, p<0.00001). (C) The histogram displays the TCR ranking distribution of immunogenic antigens (red bar) and non-immunogenic peptides (gray bar). (D) The percentage of immunogenic antigens (red bar) and non-immunogenic peptides (gray bar) is compared between two groups based on TCR ranking:<2% and ≥ 2% (Chi-square test, p=0.086). (E) The scatter plot illustrates the relationship between the HLA percentile distribution and TCR ranking of immunogenic antigens (red bar) and non-immunogenic peptides (gray bar). (F) The percentage of immunogenic antigens (red bar) and non-immunogenic peptides (gray bar) is analyzed in four distinct groups based on cutoffs of HLA percentile and TCR ranking. (G) The bar plot illustrates the sensitivity and specificity of three neoantigen prioritization approaches: based on neoantigen-HLA binding affinity alone (yellow bar), neoantigen-TCR binding ranking alone (blue bar), and the combined method using both features (red bar).

Combination of peptide-HLA and peptide-HLA-TCR interactions improves neoantigen prediction

The combination of peptide-HLA and peptide-HLA-TCR binding ranking with a fixed cutoff values of 2% for each feature resulted in high specificity but low sensitivity (Figure 3G). To optimize the precision of immugenic neoantigens, we examined three distinct machine learning classifiers, namely Random Forest (RF), Logistic Regression (LR), and Extreme Gradient Boosting (XGB) classifiers. We first partitioned the pHLA binding and pHLA-TCR binding ranks for immunogenic and non-immunogenic peptides from publicly available databases into separate discovery (70%) and validation (30%) datasets. These subsets are mutually exclusive and do not overlap (Figure 4—figure supplement 1).

These datasets were used to develop and validate machine learning algorithms, as illustrated in Figure 4A. We next assessed the performance of the three examined algorithms, employing a 10-fold cross-validation strategy. Among these algorithms, XGB demonstrated the highest performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.82 in the training dataset and 0.84 in the validation dataset (Figure 4—figure supplement 2). Consequently, the model combining pHLA and pHLA-TCR binding, referred to as the 'combined model' was chosen for further validation.

Figure 4 with 2 supplements see all

Download asset Open asset

The combined model demonstrates improved sensitivity and specificity for neoantigen prioritization.

(A) The workflow for constructing the model. (B) The receiver operating characteristic (ROC) curves demonstrate the performance of both the combined model and individual models in both the discovery and validation cohorts. The bar graphs illustrate the sensitivity (C), negative predictive value (NPV) (D), and positive predictive value (PPV) (E) at specificity levels of at least 95% or 99% for both the combined and individual models in both the discovery and validation cohorts. (F) Ranking coverage scores for the specified models in either the discovery or validation cohorts.

Our findings revealed that the combined model outperformed methods relying on pHLA-TCR or pHLA binding features separately. The combined model yielded sensitivity AUC values of 0.82 (95% CI 0.81–0.84) and 0.84 (95% CI 0.82–0.86) for discovery and validation cohorts, whereas the pHLA-TCR feature alone achieved AUC values of 0.69 (95% CI 0.66–0.71) and 0.74 (95% CI 0.71–0.77). Meanwhile, the pHLA feature resulted in AUC values of 0.76 (95% CI 0.75–0.78) and 0.74 (95% CI 0.72–0.77) (Figure 4B). In order to address the elevated false positive rates associated with current prediction tools for neoantigens, we set the specificity at high thresholds of >95% and>99%. We observed that the combined model achieved a greater sensitivity (39.7% vs 5.9% and 19.7% at > 95% specificity; 47.1% vs 3.5% and 18.8% at >99% specificity, Figure 4C), negative predictive value (NPV, 87.7% vs 82% and 84.2% at >95% specificity; 88.9% vs 81.5% and 83.9% at >99% specificity, Figure 4D), and positive predictive value (PPV, 65.9% vs 22.2% and 47.1% at >95% specificity; 62.3% vs 11.8% and 39.4% at >99% specificity, Figure 4E) compared to the single feature methods.

The accurate prioritization of neoantigen candidates with high immunogenicity holds the potential to streamline the validation process, reducing both costs and time expenditures. Consequently, we proceeded to evaluate the combined model’s capacity to prioritize neoantigens by computing the ranking coverage score, which considers the accuracy in ranking immunogenic peptides versus non-immunogenic peptides (Zhou et al., 2019b). In the validation phase, the combined model exhibited superior rank coverage scores in comparison to the individual feature-based methods. The combined model attained a ranking coverage score of 0.37, while the single-feature methods, pHLA-TCR and pHLA, yielded scores of –0.26 and 0.25, respectively (Figure 4F). These findings underscore the notion that the incorporation of pHLA and pHLA-TCR binding criteria can enhance the accuracy of prediction and prioritization of immunogenic neoantigens.

Experimental validation of the pHLA-TCR and pHLA combined approach in selecting neoantigen candidates in patients with CRC

To experimentally validate the efficacy of our combined approach, we conducted a comparative analysis of the percentage of confirmed immunogenic neoantigen candidates and the ranking coverage scores between the conventional NetMHCpan method, which relies solely on pHLA binding percentiles, and our combined approach (Figure 5A). Due to the limited availability of patient blood samples, we were only able to perform validation on eight patients who possessed a sufficient quantity of PBMCs. For each patient, we chose the top three neoantigen candidates predicted by each method. Among the neoantigen candidates, three were found to be common between the two methods, while twenty neoantigen candidates were uniquely identified by either NetMHCpan or the combined model (Figure 5B), bringing the total number of candidates to 44. Subsequently, we synthesized 44 LPs covering these 44 selected neoantigen candidates, along with 44 LPs corresponding to the wild-type sequences. The LPs were utilized in an ex vivo ELISpot assay to measure the release of interferon-gamma (IFNγ) from the four patients' PBMCs. If the stimulation with a mutant LP resulted in a >twofold increase in IFNγ spots compared to its corresponding wild-type LP, the respective neoantigen candidate was classified as an immunogenic neoantigen.

Figure 5 with 1 supplement see all

Download asset Open asset

Validation of neoantigens identified in silico from the novel workflow through enzyme-linked immunospot (ELISpot) assays conducted on four colorectal cancer (CRC) patients.

(A) A schematic diagram illustrates the procedural steps of neoantigen prioritization and the ELISpot assay. (B) The count of neoantigens identified from each pipeline. (C) The fold change in IFN-γ spots, relative to the wild-type peptides, is shown for 21 long peptides. Note: Only the mutants that result in a positive value in ELISpot are depicted, along with their corresponding amino acid changes and their associated rankings. (D) ELISpot assays on six long peptides resulting in at least a twofold change in IFN-γ spots. (E) The bar graphs display the ranking of validated long peptides identified from the NetMHCpan tool (blue bar) or the combined method (red bar) for individual patients and all patients.

Out of the 44 selected LPs, we confirmed the immunogenicity of three LPs (RNF213_W719S, MAP3K1_L1202F, and TRRAP_T148TEL) identified by the NetMHCpan method and seven LPs (NCOR2_R1963H, TRRAP_F1568S, DICER1_P1592L, KMT2C_K3848T, BRAF_EV275-276-, SMAD4_G230R, and PTPN13_D184A) selected by the combined method (Figure 5C and D). Notably, six patients exhibited at least one immunogenic peptide identified by the combined method, whereas none of the LPs in five patients, PT07, PT18, PT21, PT24, and PT25, chosen via the NetMHCpan method were validated as immunogenic (Figure 5C and D). To further assess the ranking accuracy of these two methods, we calculated the ranking coverage scores. In agreement with our in silico analyses, we observed higher rank coverage scores for the combined method in five out of the eight patients, resulting in an overall score of 0.04 (Figure 5E, Figure 5—figure supplement 1). In contrast, the NetMHCpan method exhibited a lower rank coverage score of –0.37 (Figure 5E, Figure 5—figure supplement 1). To further evaluate our model, we gathered additional public data and assessed its effectiveness in comparison to other models. We utilized immunogenic peptides from databases such as NEPdb (Xia et al., 2021), NeoPeptide (Zhou et al., 2019a), dbPepneo (Tan et al., 2020), Tantigen (Zhang et al., 2021), and TSNAdb (Wu et al., 2018), ensuring there was no overlap with the datasets used for training and validation. For non-immunogenic peptides, we used data from 10 X Genomics Chromium Single Cell Immune Profiling (10x Genomics, 2024a; 10x Genomics, 2024b; 10x Genomics, 2024c; 10x Genomics, 2024d). The findings indicate that the combined model from pMTNet and NetMHCpan outperforms the NetTCR tool (Montemurro et al., 2021; Supplementary file 1d). These outcomes conclusively demonstrate the ability of the combined approach to enhance the prediction and ranking of immunogenic neoantigens in cancer patients.

Discussion

The selection of neoantigens plays a pivotal role in enhancing the efficacy of personalized treatments in cancer immunotherapy. Historically, neoantigen selection has predominantly hinged upon the prediction of pHLA binding. However, the limitations of this approach have become increasingly evident (Borden et al., 2022), as it often neglects the dynamic interplay between tumor cells and the immune system. The findings of our study shed new light on this critical aspect of neoantigen selection. Our research highlights the potential of integrating pHLA binding prediction with the assessment of pHLA-tumor-infiltrating Lymphocyte T Cell receptor (TIL TCR) binding (Figure 1). By encompassing the interaction between neoantigens and the tumor microenvironment, we have demonstrated a substantial enhancement in the accuracy of neoantigen selection and prioritization. Our findings underscore the significance of comprehensive neoantigen assessment in harnessing the full potential of immunotherapeutic strategies in the fight against cancer.

Our investigation into the TIL TCRβ repertoire across 28 CRC patients has unveiled a complex tapestry of intra-tumor and inter-patient heterogeneity (Figure 2A, Figure 2—figure supplement 1). One underlying cause of this diversity is the random rearrangement of variable (V) and joining (J) segments, which contributes to the distinct TCRβ sequences observed (Figure 2—figure supplement 1c, d & e). This remarkable diversity in the TCRβ profile, as elucidated in our study, presents a potential link to the heterogeneity of cancer mutations and neoantigens within CRC. The intricate relationship between the TCRβ repertoire and the genetic alterations within the tumor microenvironment underscores the need for a more comprehensive understanding of this dynamic interplay to develop more precise immunotherapeutic strategies (Joshi et al., 2019). Furthermore, our research revealed a compelling observation regarding patients with MSI-H, who are known to exhibit high mutation and neoantigen burdens (Motta et al., 2021; Xie et al., 2023). Intriguingly, MSI-H patients with strong immune reactivity to neoantigens was shown to display a lower number of TCR clonotypes and reduced diversity, as characterized by the Shannon index, compared to CRC patients with microsatellite stable (MSS) (Figure 2—figure supplement 2). This stark contrast in TCRβ diversity hints at the potential enrichment of neoantigen-reactive TCR clonotypes in MSI-H patients. These findings provide valuable insights into the immune responses to different subtypes of CRC and may guide the development of more tailored immunotherapies for patients with differing mutational landscapes.

By utilizing proven immunogenic and non-immunogenic peptides from public databases, we have demonstrated that predicting the strength of pHLA-TCR binding can be a crucial criterion for selecting immunogenic candidates (Figure 3). For the prediction of pHLA-TCR binding, we employed the well-established tool pMTnet (Lu et al., 2021a). This choice was driven by its proven efficacy, based on its consistently high performance across different validated datasets (Lu et al., 2021a). Although it exhibited lower sensitivity and PPV when compared to pHLA binding alone, pHLA-TCR binding strength exhibited notably superior specificity (Figure 3G). This implies that incorporating pHLA-TCR binding strength as a selection criterion would result in a reduced false positive rate, a crucial factor in the context of neoantigen selection. What makes our study particularly compelling is the convergence of these observations. The integration of both criteria, specifically, pHLA binding and pHLA-TCR binding strength, emerged as a strategy that not only enhanced the PPV but also fortified the rationale for combining these two features in the neoantigen selection process.

The integration of both pHLA binding and pHLA-TCR binding strength features in our approach exhibited superior performance in neoantigen selection and prioritization when compared to the single-feature method (Figure 4). This observation aligns with the findings of previous studies, which consistently indicate that combining multiple criteria enhances the accuracy and efficacy of neoantigen identification (Zhou et al., 2019b; Müller et al., 2023). Furthermore, our experimental validation confirmed the robustness of our approach by consistently demonstrating performance consistent with our analysis of publicly available data (Figure 5, Supplementary file 1d). When examining the percentile ranks of positive LPs predicted by netMHCpan from those predicted by our combined model, the results further underscored the strength of our approach (Figure 5E). This alignment between experimental validation and computational analysis enhances the reliability and applicability of our neoantigen selection method. Interestingly, the four identified neoantigen candidates with confirmed immunogenicity by the combined approach have not been previously reported in public neoantigen-databases, indicating that they could serve as novel targets for neoantigen based therapies.

However, several limitations must be acknowledged. Firstly, the relatively small sample size employed for validation raises the potential for ranking score bias. Additionally, while both TCRα and TCRβ regions play essential roles in engaging peptide-bound HLA complexes, our study focused solely on TCRβ sequences to predict pHLA-TCR binding strength. Future investigations should include TCRα sequences to provide a more comprehensive analysis. Although the PBMC and TILs were previously shown to be congruent in neoantigen reactivity (Malekzadeh et al., 2020), it is important to recognize that differences in the contribution of TIL and PBMC TCR repertoires in neoantigen selection may introduce variability in the selection and validation of neoantigen candidates, and future studies are warranted to address the consistency in neoantigen reactivity between these two sources of T cells. Moreover, to improve the accuracy and effectiveness of the machine learning model in predicting and ranking neoantigens, we have developed an in-house tool called EpiTCR. This tool will utilize immunogenic assays, such as ELISpot and single-cell sequencing, for validation.

In summary, our study delves into the diversity and variation within the TCRβ repertoire of TILs in the tumor tissues of colorectal cancer patients. Through our research, we have introduced a novel approach to the identification and prioritization of neoantigen candidates. This approach combines the assessment of pHLA binding and peptide-human leukocyte antigen-T cell receptor (pHLA-TCR) binding. Our findings underscore the significance of considering pHLA-TCR binding interactions as part of the process for selecting neoantigen candidates. This is a crucial step in the development of personalized immunotherapy strategies. By combining these two factors, we can more accurately identify the neoantigens that hold promise for effective immunotherapy, ultimately improving the prospects for tailored and effective cancer treatment.

Materials and methods

Tumor biopsy and peripheral blood collection

Request a detailed protocol

Twenty-eight patients diagnosed with CRC were enrolled in this study at the University Medical Center in Ho Chi Minh City between June 2022 and May 2023. CRC confirmation was based on abnormal colonoscopies and histopathological analysis. The stages of CRC were determined according to guidelines provided by the American Joint Committee on Cancer and the International Union for Cancer Control (the eighth version) (Amin et al., 2017; Tong et al., 2018). Prior to participation, all patients provided written informed consent for tumor and whole blood sample collection. Relevant clinical data, including demographics, cancer stages, and pathology information, were extracted from the medical records of the University Medical Center. Detailed information regarding the clinical factors of the patients can be found in Supplementary file 1e. This study was approved by the Ethics Committee of the University of Medicine and Pharmacy in Ho Chi Minh City, Vietnam. For eight patients, ten mL of peripheral blood was collected before surgery and stored in Heparin tubes prior to isolation of PBMCs.

Targeted DNA, RNA, and TCRβ sequencing

Request a detailed protocol

The DNA/RNA samples were isolated using either the AllPrep DNA/RNA Mini Kit or the AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, Germany) as per the manufacturer’s protocol. In addition, matched genomic DNA from the white blood cells (WBC) of individuals was also extracted from the buffy coat using the GeneJET Whole Blood Genomic DNA Purification Mini kit (Thermo Fisher, MA, USA), following the manufacturer’s instructions. Genomic DNA samples from the patients’s paired tumor tissues and WBCs were used to prepare DNA libraries for DNA sequencing with the ThruPLEX Tag-seq Kit (Takara Bio, USA). The libraries were then pooled and hybridized with pre-designed probes for 95 targeted genes (Integrated DNA Technologies, USA). This gene panel encompasses commonly mutated genes in CRC tumors, as reported in the Catalog of Somatic Mutations in Cancer (COSMIC) database. The DNA libraries were then subjected to massive parallel sequencing on the DNBSEQ‐G400 sequencer (MGI, Shenzhen, China) for paired-end reads of 2×100 bp with a sequencing depth of 10 X.

Isolated total RNA was processed using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs, MA, USA) to isolate intact poly(A)+ RNA following the manufacturer’s instructions. RNA libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs). These libraries were subsequently sequenced for paired-end reads of 2×100 bp on an MGI system at a sequencing depth of 50 X.

For TCRβ library construction, mRNA was utilized with the SMARTer Human TCR a/b Profiling Kit v2 (Takara, USA). The resulting libraries were sequenced for paired-end reads of 2×250 bp on an Illumina system at a sequencing depth of 7.5 X.

Microsatellite instability status

Request a detailed protocol

Microsatellite instability (MSI) status was determined in both tumor DNA and corresponding DNA from blood, serving as the control, using the MSI Analysis System, version 1.2 (Promega, US). This analysis identified alterations in five mononucleotide repeat markers: BAT-25, BAT-26, NR-21, NR-24, and MONO-27. The results were classified as MSS in cases where either none or only one marker was unstable. Samples showing alterations in two or more markers were categorized as microsatellite instability-high (MSI-H).

Variant calling from DNA-seq and RNA-seq data

Request a detailed protocol

The approach of integrating RNA-seq into calling variants was performed as described in our previous publication (Nguyen et al., 2023). Briefly, we used Dragen (Illumina) (Catreux et al., 2022) in tumor-normal mode to detect somatic mutations from DNA-seq data. The default filtering thresholds of Dragen were applied for the detection of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels). SNPs were further filtered using the dbSNP and 1000 Genome datasets. Germline mutations in tumor tissues were identified by comparison with matched WBC DNA samples. Mutations within immunoglobulin and HLA genes were excluded due to alignment difficulties in these highly polymorphic regions, necessitating specialized analysis tools. Additionally, synonymous mutations were removed from downstream analysis. For analysis, we included somatic mutations that exceeded a minimum threshold of ≥2% variant allele frequency (VAF) in DNA extracted from fresh frozen tissues.

Sequencing reads underwent trimming using Trimmomatic (Bolger et al., 2014) and were then aligned to the human reference genome using STAR (version 2.6.0 c) (Dobin, 2013). Prior to alignment, raw sequencing reads were subjected to quality checks using FastQC version 0.11.9 (Andrews, 2010). VarScan 2 (Koboldt et al., 2012), which accepts both DNA and RNA-seq data, was employed to identify mutations in paired tumor and WBC samples across 95 cancer-associated genes. This analysis was carried out in tumor-normal mode. Four filtering steps were applied: (i) only calls with a PASS status were utilized, (ii) population SNPs overlapping with a panel of normal samples from the 1000 Genome dataset were excluded, (iii) somatic mutations included for analysis met a minimum threshold of ≥10x read depth and ≥2% variant allele frequency (VAF) in RNA extracted from FF tissue, and (iv) synonymous mutations and those related to HLA were removed from downstream analysis.

The resulting BAM files were sorted, and indexed using Samtools version 1.10 (Li et al., 2009), and had PCR duplicates removed using Picard tools version 2.25.6 (Picard, 2024). Somatic variants were manually reviewed using Integrative Genomics Viewer (v2.8.2). The VCF files generated by Dragen (for DNA-seq) and VarScan 2 (for RNA-seq) were subsequently annotated using the Ensembl Variant Effect Predictor (VEP version 105) (McLaren et al., 2016) to extract information about the potential effects of variants on the phenotypic outcome.

CDR3β calling from TCRseq data

Request a detailed protocol

To define the CDR3β from TCRseq, we utilized Cogent NGS Immune Profiler Software v1.0 (Takara Bio, 2024). Before calling clonotypes, the raw sequencing reads underwent filtering based on the following criteria: (i) allowing only one mismatch while splitting reads by matching read sequences to different receptor chains, (ii) excluding reads shorter than 30 bp and reads ambiguously matched to multiple receptor chains, (iii) excluding reads that failed correction when linker-based correction was enabled, (iv) excluding reads that failed the abundance check during sequencing error correction, and (v) excluding molecular identifier groups (MIGs) with fewer than three unique molecular identifier (UMI) reads. The filtered MIGs were then assembled and aligned to the V(D)J reference to define the TCR clonotype.

Shannon index and clonality

Request a detailed protocol

We utilized two indices to characterize T-cell diversity and expansion: the Shannon entropy index and the clonality index.

Shannon entropy:

- \sum_{i = 1}^{n} p_{i} l o g_{e} (p_{i})

Clonality:

1 - \frac{\sum_{i = 1}^{n} p_{i} {l o g}_{e} (p_{i})}{{l o g}_{e} (n)}

These indices consider both the number of T-cell clone types ‘n’ and the frequency ‘p_i’ of each clone. Here, ‘p_i’ represents the proportion of the i-th clone within the TCR library containing n clones.

In silico prediction of peptide-HLA binding and peptide-HLA-TCR binding

Request a detailed protocol

Class I HLA alleles (HLA-A/B/C) with two-digit resolution were identified from patient tumor RNA-seq data using the OptiType tool (Supplementary file 1f; Szolek et al., 2014; Li et al., 2024). The annotated VCF files were analyzed using pVACseq, a tool in pVACtools (v1.5.9) (Hundal et al., 2020; Hundal et al., 2016; Hundal et al., 2019). We used the default settings, except for disabling the coverage and MAF filters. We used all peptide-HLA-I binding algorithms that were implemented in pVACseq to predict 8–11-mer epitopes binding to HLA-I (A, B, or C) for downstream analysis. Mutants with lower binding affinity than wild-type peptides were prioritized using a two-step ranking process. The minimum binding affinity score was calculated from the distribution of scores among the peptides derived from each mutation, and priority was given to mutations with the lowest binding affinity scores. Moreover, in pVACseq, we collected peptides with binding affinity scores lower than wild-type from multiple tools, calculated minimum binding affinity scores for each unique peptide, and incorporated this data into a combined machine learning model.

We used the peptide-HLA-TCR binding algorithm implemented in pMTNet (Lu et al., 2021a) to predict peptide binding to HLA and TCR with default settings. These scores represented the predicted likelihood of peptides being immunogenic. A ranking value for immunogenicity was assigned to each unique peptide by determining the minimum TCR ranking of its immunogenicity score.

Construction of a combined machine-learning model

Request a detailed protocol

To identify immunogenic peptides, we conducted a thorough search across multiple databases, including 10 X Genomics (10x Genomics, 2024a; 10x Genomics, 2024b; 10x Genomics, 2024c; 10x Genomics, 2024d), McPAS (Tickotsky et al., 2017), VDJdb (Shugay et al., 2018), IEDB (Vita et al., 2019), and TBAdb (Zhang et al., 2020). For the creation of a non-immunogenic pHLA-TCR complex, we assembled a negative dataset using the PRIME tool (Schmidt et al., 2021). This dataset was generated by associating each pHLA with 10 randomly generated TCRs sourced from the following databases: 10 X Genomics, McPAS, VDJdb, IEDB, and TBAdb. To objectively train and evaluate the model, we separated the dataset mentioned above into two data subsets: discovery dataset (70%) and validation dataset (30%). These subsets are mutually exclusive and do not overlap (Figure 4—figure supplement 1). pHLA-TCR complex in the discovery dataset, whether labeled as immunogenic or non-immunogenic, were used for model training to classify whether a peptide is positive or not. We examined three machine learning algorithms - Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB) - for each feature type (pHLA binding, pHLA-TCR binding), as well as for combined features. Feature selection was tested using a k-fold cross-validation approach on the discovery dataset with ‘k’ set to 10-fold. This process splits the discovery dataset into 10 equal-sized folds, iteratively using ninefolds for training and onefold for validation. Model performance was evaluated using the ‘roc_auc’ (Receiver Operating Characteristic Area Under the Curve) metric, which measures the model’s ability to distinguish between positive and negative peptides. The average of these scores provides a robust estimate of the model’s performance and generalizability. The model with the highest ‘roc_auc’ average score, XGB, was chosen for all features. The model cut-off was set based on a threshold specificity of >90% or>95% to achieve high specificity for selecting peptides. This combined model’s performance was evaluated on the independent validation dataset of immunogenic and non-immunogenic pHLA-TCR complex to assess its ability to classify positive peptides from negative peptides. In our GitHub repository, we have included links to the GitHub repositories for NetMHCpan and pMTNet (https://github.com/QuynhPham1220/Combined-model copy archived at Pham, 2024).

Ranking coverage score calculation

Request a detailed protocol

The approach of compare ranking between two algorithms was performed as described in a previous publication (Zhou et al., 2019b). Briefly, the rank coverage score was based on the ranking value calculation given by the formula:

\begin{array}{cc} R a n k C o v e r a g e S c o r e = \frac{\sum_{n \in n e g a t i v e} r a n k (n)}{T \times n u m (n)} \times c o v e r a g e (n) - \frac{\sum_{n \in p o s i t i v e} r a n k (p)}{T \times n u m (p)} \times c o v e r a g e (p) \\ c o v e r a g e (k) = \frac{m a x r a n k (k)}{T} k \in (n, p) \end{array}

Where, T presented the total neoepitope number identified and p and n presented the positive and negative peptides, respectively, that were experimentally validated in vitro.

If positive peptides have smaller rank values than negative peptides, this will result in a high-rank coverage score, indicates the better ranking result.

Isolation, culture, and stimulation of PBMCs with long peptides

Request a detailed protocol

Peripheral blood samples were collected from eight patients prior to surgery using BD Vacutainer Heparin Tubes (BD Biosciences, NJ, USA). PBMCs were isolated through gradient centrifugation using Lymphoprep (STEMCELL Technologies) within 4 hr of blood collection. The PBMCs were then resuspended in a solution of FBS (10%) and DMSO (7–10×10⁶ cells/mL) for cryopreservation in liquid nitrogen. Frozen PBMCs were thawed in AIM-V media (Gibco, Thermo Scientific, MA, USA) supplemented with 10% FBS (Cytiva, USA) and DNase I (Stemcell Technology, Canada) (1 mg/mL) solution. A total of 10⁵ PBMCs were allowed to rest in a 96-round bottom well-plate containing AIM V media supplemented with 10% FBS, 10 mM HEPES, and 50 mM β-mercaptoethanol overnight before stimulation with synthesized long peptides at a concentration of 5 mM in a humidified incubator at 37 °C with 5% CO2. PBMCs were further stimulated with GM-CSF (2000 IU/mL, Gibco, MT, USA) and IL-4 (1000 IU/mL, Invitrogen, MA, USA) for 24 hr. Following this initial stimulation, LPS (100 ng/mL, Sigma-Aldrich, MA, USA) and IFN-γ (10 ng/mL, Gibco, MT, USA) were added to the PBMCs along with the peptides for an additional 12 hr. On the following day, IL-7, IL-15, and IL-21 (each at a concentration of 10 ng/mL) (Peprotech, NJ, USA) were added to the PBMC culture. The restimulation process involved exposing the peptides to fresh media containing IL-7, IL-15, and IL-21 every 3 days, for a total of three times. On day 12, PBMCs were restimulated with peptides and cultured in media without cytokines. ELISpot assays were performed on stimulated PBMCs on day 13.

ELISpot assay on PBMCs stimulated with long peptides

Request a detailed protocol

Cultured T cells were transferred to an enzyme-linked immunospot (ELISpot) plate (Mabtech, Sweden) and incubated for 20 hr at 37 °C. PBMCs cultured with DMSO were used as a negative control group, while PBMCs stimulated with anti-CD3 were used as a positive control group. The ELISpot assay was performed on treated PBMCs using the ELISpot Pro: Human IFN-g (ALP) kit (Mabtech, Sweden), following the manufacturer’s protocol. Developed spots on the ELISpot plate were then counted using an ELISpot reader (Mabtech, Sweden). Reactivity was determined by measuring the fold increase in the number of spots in PBMCs treated with mutant peptides compared to those treated with wild-type peptides. A fold change of two was selected as the cutoff for positivity, indicating a significant increase in reactivity (Moodie et al., 2010).

Statistical analysis

Request a detailed protocol

In this study, t-tests were used to compare the TCR clones, clonality, and Shannon index among two groups of four categories (Microsatellite status, stage, gender, and tumor location). Chi-square test was used to compare the proportions of immununogenic and non-immunogenic peptides. All statistical analyses were performed using R (version 4.3.0) with common data analysis packages, including ggplot2 and pROC. The 95% confidence interval (95% CI) was presented in brackets next to a value as appropriate.

Data availability

The Vietnamese government has enacted a new law on the Protection of Personal Data (Decree No. 13/2023/ND-CP), which came into effect on July 1, 2023. This law includes specific requirements for the storage and sharing of personal data, including genetic information. Consequently, researchers interested in using our genetic data must apply for approval before accessing it.To request access to sequencing data (FastQC files) from individuals, please contact the senior author, Dr. Le Son Tran, Medical Genetics Institute (MGI), at 186-188 Nguyen Duy Duong Street, District 10, Ho Chi Minh City, Vietnam, or via email at leson1808@gmail.com. Researchers must submit a data-sharing agreement, agreeing to the terms and conditions outlined therein. The Director of the Medical Genetics Institute will review and approve all data-sharing agreements. There are no restrictions on access to the raw data; both commercial and academic researchers may use the data for their respective studies.The code used in this study is publicly available at https://github.com/QuynhPham1220/Combined-model, copy archived at Pham, 2024. Other code and software used in this publication are mentioned in the methodology. The processed data used to generate the plots in our manuscript are available via FigShare at https://doi.org/10.6084/m9.figshare.26936107.v2.

The following data sets were generated

1. Pham TMQ
2. Nguyen TN
3. Nguyen BQT
4. Tran TPD
5. Pham NMD
6. Nguyen HTP
7. Ho TKC
8. Nguyen DVL
9. Nguyen HT
10. Tran DH
11. Tran TS
12. Pham TVN
13. Le MT
14. Nguyen TTV
15. Phan MD
16. Giang H
17. Nguyen HN
18. Tran LS
(2024) figshare
The T Cell Receptor β Chain Repertoire of Tumor Infiltrating Lymphocytes Improves Neoantigen Prediction and Prioritization.

https://doi.org/10.6084/m9.figshare.26936107.v3

References

Data
1. 10x Genomics
(authors) (2024a) CD8+ T cells of Healthy Donor 1
10x Genomics.

https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-1-1-standard-3-0-2
Data
1. 10x Genomics
(authors) (2024b) CD8+ T cells of Healthy Donor 2
10x Genomics.

https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-2-1-standard-3-0-2
Data
1. 10x Genomics
(authors) (2024c) CD8+ T cells of Healthy Donor 3
10x Genomics.

https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-3-1-standard-3-0-2
Data
1. 10x Genomics
(authors) (2024d) CD8+ T cells of Healthy Donor 4
10x Genomics.

https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-4-1-standard-3-0-2
1. Amin MB
2. Greene FL
3. Edge SB
4. Compton CC
5. Gershenwald JE
6. Brookland RK
7. Meyer L
8. Gress DM
9. Byrd DR
10. Winchester DP
(2017) The Eighth Edition AJCC cancer staging manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging
CA 67:93–99.

https://doi.org/10.3322/caac.21388
- PubMed
- Google Scholar
Book
1. Andrews S
(2010)
FastQC: a quality control tool for high throughput sequence data

Babraham Bioinformatics, Babraham Institute.
- Google Scholar
1. Blass E
2. Ott PA
(2021) Advances in the development of personalized neoantigen-based therapeutic cancer vaccines
Nature Reviews. Clinical Oncology 18:215–229.

https://doi.org/10.1038/s41571-020-00460-2
- PubMed
- Google Scholar
(2014) Trimmomatic: a flexible trimmer for Illumina sequence data
Bioinformatics 30:2114–2120.

https://doi.org/10.1093/bioinformatics/btu170
- PubMed
- Google Scholar
(2022) Cancer neoantigens: challenges and future directions for prediction, prioritization, and validation
Frontiers in Oncology 12:836821.

https://doi.org/10.3389/fonc.2022.836821
- PubMed
- Google Scholar
Website
1. Catreux VJS
2. Murray L
3. Mehio R
4. Parnaby G
5. Roddey C
6. Ruehle M
(2022) DRAGEN sets new standard for data accuracy in PrecisionFDA benchmark data. Optimizing variant calling performance with Illumina machine learning and DRAGEN graph
Accessed January 12, 2022.

https://www.illumina.com/science/genomics-research/articles/dragen-shines-again-precisionfda-truth-challenge-v2.html
1. Chen F
2. Zou Z
3. Du J
4. Su S
5. Shao J
6. Meng F
7. Yang J
8. Xu Q
9. Ding N
10. Yang Y
11. Liu Q
12. Wang Q
13. Sun Z
14. Zhou S
15. Du S
16. Wei J
17. Liu B
(2019) Neoantigen identification strategies enable personalized immunotherapy in refractory solid tumors
The Journal of Clinical Investigation 129:2056–2070.

https://doi.org/10.1172/JCI99538
- PubMed
- Google Scholar
(2021) Challenges targeting cancer neoantigens in 2021: a systematic literature review
Expert Review of Vaccines 20:827–837.

https://doi.org/10.1080/14760584.2021.1935248
- PubMed
- Google Scholar
1. Chheda ZS
2. Kohanbash G
3. Okada K
4. Jahan N
5. Sidney J
6. Pecoraro M
7. Yang X
8. Carrera DA
9. Downey KM
10. Shrivastav S
11. Liu S
12. Lin Y
13. Lagisetti C
14. Chuntova P
15. Watchmaker PB
16. Mueller S
17. Pollack IF
18. Rajalingam R
19. Carcaboso AM
20. Mann M
21. Sette A
22. Garcia KC
23. Hou Y
24. Okada H
(2018) Novel and shared neoantigen derived from histone 3 variant H3.3K27M mutation for glioma T cell therapy
The Journal of Experimental Medicine 215:141–157.

https://doi.org/10.1084/jem.20171046
- PubMed
- Google Scholar
1. Dobin A
(2013) STAR: ultrafast universal RNA-seq aligner
Bioinformatics 29:15–21.

https://doi.org/10.1093/bioinformatics/bts635
- Google Scholar
1. Dudley JC
2. Lin MT
3. Le DT
4. Eshleman JR
(2016) Microsatellite instability as a biomarker for PD-1 Blockade
Clinical Cancer Research 22:813–820.

https://doi.org/10.1158/1078-0432.CCR-15-1678
- PubMed
- Google Scholar
1. Ganesh K
2. Stadler ZK
3. Cercek A
4. Mendelsohn RB
5. Shia J
6. Segal NH
7. Diaz LA Jr
(2019) Immunotherapy in colorectal cancer: rationale, challenges and potential
Nature Reviews. Gastroenterology & Hepatology 16:361–375.

https://doi.org/10.1038/s41575-019-0126-x
- PubMed
- Google Scholar
1. Guerder S
2. Flavell RA
(1995) T-cell activation: two for T
Current Biology: CB 5:866–868.

https://doi.org/10.1016/s0960-9822(95)00175-8
- Google Scholar
1. Hey S
2. Whyte D
3. Hoang MC
4. Le N
5. Natvig J
6. Wingfield C
7. Onyeama C
8. Howrylak J
9. Toby IT
(2023) Analysis of CDR3 sequences from t-cell receptor β in acute respiratory distress syndrome
Biomolecules 13:825.

https://doi.org/10.3390/biom13050825
- PubMed
- Google Scholar
1. Hundal J
2. Carreno BM
3. Petti AA
4. Linette GP
5. Griffith OL
6. Mardis ER
7. Griffith M
(2016) pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens
Genome Medicine 8:11.

https://doi.org/10.1186/s13073-016-0264-5
- PubMed
- Google Scholar
1. Hundal J
2. Kiwala S
3. Feng YY
4. Liu CJ
5. Govindan R
6. Chapman WC
7. Uppaluri R
8. Swamidass SJ
9. Griffith OL
10. Mardis ER
11. Griffith M
(2019) Accounting for proximal variants improves neoantigen prediction
Nature Genetics 51:175–179.

https://doi.org/10.1038/s41588-018-0283-9
- PubMed
- Google Scholar
1. Hundal J
2. Kiwala S
3. McMichael J
4. Miller CA
5. Xia H
6. Wollam AT
7. Liu CJ
8. Zhao S
9. Feng YY
10. Graubert AP
11. Wollam AZ
12. Neichin J
13. Neveau M
14. Walker J
15. Gillanders WE
16. Mardis ER
17. Griffith OL
18. Griffith M
(2020) pVACtools: a computational toolkit to identify and visualize cancer neoantigens
Cancer Immunology Research 8:409–420.

https://doi.org/10.1158/2326-6066.CIR-19-0401
- PubMed
- Google Scholar
1. Joshi K
2. de Massy MR
3. Ismail M
4. Reading JL
5. Uddin I
6. Woolston A
7. Hatipoglu E
8. Oakes T
9. Rosenthal R
10. Peacock T
11. Ronel T
12. Noursadeghi M
13. Turati V
14. Furness AJS
15. Georgiou A
16. Wong YNS
17. Ben Aissa A
18. Sunderland MW
19. Jamal-Hanjani M
20. Veeriah S
21. Birkbak NJ
22. Wilson GA
23. Hiley CT
24. Ghorani E
25. Guerra-Assunção JA
26. Herrero J
27. Enver T
28. Hadrup SR
29. Hackshaw A
30. Peggs KS
31. McGranahan N
32. Swanton C
33. Quezada SA
34. Chain B
35. TRACERx consortium
(2019) Spatial heterogeneity of the T cell receptor repertoire reflects the mutational landscape in lung cancer
Nature Medicine 25:1549–1559.

https://doi.org/10.1038/s41591-019-0592-2
- PubMed
- Google Scholar
1. Kim VM
2. Pan X
3. Soares KC
4. Azad NS
5. Ahuja N
6. Gamper CJ
7. Blair AB
8. Muth S
9. Ding D
10. Ladle BH
11. Zheng L
(2020) Neoantigen-based EpiGVAX vaccine initiates antitumor immunity in colorectal cancer
JCI Insight 5:e136368.

https://doi.org/10.1172/jci.insight.136368
- PubMed
- Google Scholar
1. Koboldt DC
2. Zhang Q
3. Larson DE
4. Shen D
5. McLellan MD
6. Lin L
7. Miller CA
8. Mardis ER
9. Ding L
10. Wilson RK
(2012) VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing
Genome Research 22:568–576.

https://doi.org/10.1101/gr.129684.111
- PubMed
- Google Scholar
1. Kuhns MS
2. Badgandi HB
(2012) Piecing together the family portrait of TCR-CD3 complexes
Immunological Reviews 250:120–143.

https://doi.org/10.1111/imr.12000
- PubMed
- Google Scholar
1. Laghi L
2. Negri F
3. Gaiani F
4. Cavalleri T
5. Grizzi F
6. De’ Angelis GL
7. Malesci A
(2020) Prognostic and predictive cross-roads of microsatellite instability and immune response to colon cancer
International Journal of Molecular Sciences 21:9680.

https://doi.org/10.3390/ijms21249680
- PubMed
- Google Scholar
1. Le DT
2. Uram JN
3. Wang H
4. Bartlett BR
5. Kemberling H
6. Eyring AD
7. Skora AD
8. Luber BS
9. Azad NS
10. Laheru D
11. Biedrzycki B
12. Donehower RC
13. Zaheer A
14. Fisher GA
15. Crocenzi TS
16. Lee JJ
17. Duffy SM
18. Goldberg RM
19. de la Chapelle A
20. Koshiji M
21. Bhaijee F
22. Huebner T
23. Hruban RH
24. Wood LD
25. Cuka N
26. Pardoll DM
27. Papadopoulos N
28. Kinzler KW
29. Zhou S
30. Cornish TC
31. Taube JM
32. Anders RA
33. Eshleman JR
34. Vogelstein B
35. Diaz LA Jr
(2015) PD-1 blockade in tumors with mismatch-repair deficiency
The New England Journal of Medicine 372:2509–2520.

https://doi.org/10.1056/NEJMoa1500596
- PubMed
- Google Scholar
1. Le DT
2. Durham JN
3. Smith KN
4. Wang H
5. Bartlett BR
6. Aulakh LK
7. Lu S
8. Kemberling H
9. Wilt C
10. Luber BS
11. Wong F
12. Azad NS
13. Rucki AA
14. Laheru D
15. Donehower R
16. Zaheer A
17. Fisher GA
18. Crocenzi TS
19. Lee JJ
20. Greten TF
21. Duffy AG
22. Ciombor KK
23. Eyring AD
24. Lam BH
25. Joe A
26. Kang SP
27. Holdhoff M
28. Danilova L
29. Cope L
30. Meyer C
31. Zhou S
32. Goldberg RM
33. Armstrong DK
34. Bever KM
35. Fader AN
36. Taube J
37. Housseau F
38. Spetzler D
39. Xiao N
40. Pardoll DM
41. Papadopoulos N
42. Kinzler KW
43. Eshleman JR
44. Vogelstein B
45. Anders RA
46. Diaz LA Jr
(2017) Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade
Science 357:409–413.

https://doi.org/10.1126/science.aan6733
- PubMed
- Google Scholar
(2009) The sequence alignment/map format and SAMtools
Bioinformatics 25:2078–2079.

https://doi.org/10.1093/bioinformatics/btp352
- PubMed
- Google Scholar
1. Li G
2. Mahajan S
3. Ma S
4. Jeffery ED
5. Zhang X
6. Bhattacharjee A
7. Venkatasubramanian M
8. Weirauch MT
9. Miraldi ER
10. Grimes HL
11. Sheynkman GM
12. Tilburgs T
13. Salomonis N
(2024) Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy
Science Translational Medicine 16:eade2886.

https://doi.org/10.1126/scitranslmed.ade2886
- Google Scholar
1. Lu T
2. Zhang Z
3. Zhu J
4. Wang Y
5. Jiang P
6. Xiao X
7. Bernatchez C
8. Heymach JV
9. Gibbons DL
10. Wang J
11. Xu L
12. Reuben A
13. Wang T
(2021a) Deep learning-based prediction of the T cell receptor-antigen binding specificity
Nature Machine Intelligence 3:864–875.

https://doi.org/10.1038/s42256-021-00383-2
- PubMed
- Google Scholar
1. Lu YC
2. Zheng Z
3. Lowery FJ
4. Gartner JJ
5. Prickett TD
6. Robbins PF
7. Rosenberg SA
(2021b) Direct identification of neoantigen-specific TCRs from tumor specimens by high-throughput single-cell sequencing
Journal for Immunotherapy of Cancer 9:e002595.

https://doi.org/10.1136/jitc-2021-002595
- PubMed
- Google Scholar
1. Maleki Vareki S
(2018) High and low mutational burden tumors versus immunologically hot and cold tumors and response to immune checkpoint inhibitors
Journal for Immunotherapy of Cancer 6:157.

https://doi.org/10.1186/s40425-018-0479-7
- PubMed
- Google Scholar
1. Malekzadeh P
2. Yossef R
3. Cafri G
4. Paria BC
5. Lowery FJ
6. Jafferji M
7. Good ML
8. Sachs A
9. Copeland AR
10. Kim SP
11. Kivitz S
12. Parkhurst MR
13. Robbins PF
14. Ray S
15. Xi L
16. Raffeld M
17. Yu Z
18. Restifo NP
19. Somerville RPT
20. Rosenberg SA
21. Deniger DC
(2020) Antigen experienced t cells from peripheral blood recognize p53 neoantigens
Clinical Cancer Research 26:1267–1276.

https://doi.org/10.1158/1078-0432.CCR-19-1874
- PubMed
- Google Scholar
1. Mazzotti L
2. Gaimari A
3. Bravaccini S
4. Maltoni R
5. Cerchione C
6. Juan M
7. Navarro EAG
8. Pasetto A
9. Nascimento Silva D
10. Ancarani V
11. Sambri V
12. Calabrò L
13. Martinelli G
14. Mazza M
(2022) T-cell receptor repertoire sequencing and its applications: focus on infectious diseases and cancer
International Journal of Molecular Sciences 23:8590.

https://doi.org/10.3390/ijms23158590
- PubMed
- Google Scholar
1. McLaren W
2. Gil L
3. Hunt SE
4. Riat HS
5. Ritchie GRS
6. Thormann A
7. Flicek P
8. Cunningham F
(2016) The ensembl variant effect predictor
Genome Biology 17:122.

https://doi.org/10.1186/s13059-016-0974-4
- PubMed
- Google Scholar
1. Miao D
2. Margolis CA
3. Gao W
4. Voss MH
5. Li W
6. Martini DJ
7. Norton C
8. Bossé D
9. Wankowicz SM
10. Cullen D
11. Horak C
12. Wind-Rotolo M
13. Tracy A
14. Giannakis M
15. Hodi FS
16. Drake CG
17. Ball MW
18. Allaf ME
19. Snyder A
20. Hellmann MD
21. Ho T
22. Motzer RJ
23. Signoretti S
24. Kaelin WG Jr
25. Choueiri TK
26. Van Allen EM
(2018) Genomic correlates of response to immune checkpoint therapies in clear cell renal cell carcinoma
Science 359:801–806.

https://doi.org/10.1126/science.aan5951
- PubMed
- Google Scholar
1. Montemurro A
2. Schuster V
3. Povlsen HR
4. Bentzen AK
5. Jurtz V
6. Chronister WD
7. Crinklaw A
8. Hadrup SR
9. Winther O
10. Peters B
11. Jessen LE
12. Nielsen M
(2021) NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data
Communications Biology 4:1060.

https://doi.org/10.1038/s42003-021-02610-3
- PubMed
- Google Scholar
(2010) Response definition criteria for ELISPOT assays revisited
Cancer Immunology, Immunotherapy 59:1489–1501.

https://doi.org/10.1007/s00262-010-0875-4
- PubMed
- Google Scholar
(2021)
Immunotherapy in microsatellite instability metastatic colorectal cancer: Current status and future perspectives

Journal of Clinical and Translational Research 7:511–522.
- PubMed
- Google Scholar
(2023) Machine learning methods and harmonized datasets improve immunogenic neoantigen prediction
Immunity 56:2650–2663.

https://doi.org/10.1016/j.immuni.2023.09.002
- PubMed
- Google Scholar
1. Nguyen BQT
2. Tran TPD
3. Nguyen HT
4. Nguyen TN
5. Pham TMQ
6. Nguyen HTP
7. Tran DH
8. Nguyen V
9. Tran TS
10. Pham TVN
11. Le MT
12. Phan MD
13. Giang H
14. Nguyen HN
15. Tran LS
(2023) Improvement in neoantigen prediction via integration of RNA sequencing data for variant calling
Frontiers in Immunology 14:1251603.

https://doi.org/10.3389/fimmu.2023.1251603
- PubMed
- Google Scholar
1. Overman MJ
2. McDermott R
3. Leach JL
4. Lonardi S
5. Lenz HJ
6. Morse MA
7. Desai J
8. Hill A
9. Axelson M
10. Moss RA
11. Goldberg MV
12. Cao ZA
13. Ledeine JM
14. Maglinte GA
15. Kopetz S
16. André T
(2017) Nivolumab in patients with metastatic DNA mismatch repair-deficient or microsatellite instability-high colorectal cancer (CheckMate 142): an open-label, multicentre, phase 2 study
The Lancet. Oncology 18:1182–1191.

https://doi.org/10.1016/S1470-2045(17)30422-9
- PubMed
- Google Scholar
Software
1. Pham TMQ
(2024) Combined-model, version swh:1:rev:c2d29b9949c44b14a8056d418108270e2f71fd76
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:c5f5ccec64767bed95e5e077be902621c34b9834;origin=https://github.com/QuynhPham1220/Combined-model;visit=swh:1:snp:0c619b1728d9a3c46a07b920927e440da49c88e9;anchor=swh:1:rev:c2d29b9949c44b14a8056d418108270e2f71fd76
Software
1. Picard
(2024) Picard
Github.

http://broadinstitute.github.io/picard/
(2022) T-cell repertoire diversity: friend or foe for protective antitumor response?
Journal of Experimental & Clinical Cancer Research 41:356.

https://doi.org/10.1186/s13046-022-02566-0
- PubMed
- Google Scholar
1. Rast JP
(1997) alpha, beta, gamma, and delta T cell antigen receptor genes arose early in vertebrate phylogeny
Immunity 6:1–11.

https://doi.org/10.1016/s1074-7613(00)80237-x
- Google Scholar
1. Reynisson B
2. Alvarez B
3. Paul S
4. Peters B
5. Nielsen M
(2020) NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data
Nucleic Acids Research 48:W449–W454.

https://doi.org/10.1093/nar/gkaa379
- PubMed
- Google Scholar
1. Rosati E
2. Dowds CM
3. Liaskou E
4. Henriksen EKK
5. Karlsen TH
6. Franke A
(2017) Overview of methodologies for T-cell receptor repertoire analysis
BMC Biotechnology 17:61.

https://doi.org/10.1186/s12896-017-0379-9
- PubMed
- Google Scholar
1. Roudko V
2. Bozkus CC
3. Orfanelli T
4. McClain CB
5. Carr C
6. O’Donnell T
7. Chakraborty L
8. Samstein R
9. Huang KL
10. Blank SV
11. Greenbaum B
12. Bhardwaj N
(2020) Shared immunogenic poly-epitope frameshift mutations in microsatellite unstable tumors
Cell 183:1634–1649.

https://doi.org/10.1016/j.cell.2020.11.004
- PubMed
- Google Scholar
1. Schmidt J
2. Smith AR
3. Magnin M
4. Racle J
5. Devlin JR
6. Bobisse S
7. Cesbron J
8. Bonnet V
9. Carmona SJ
10. Huber F
11. Ciriello G
12. Speiser DE
13. Bassani-Sternberg M
14. Coukos G
15. Baker BM
16. Harari A
17. Gfeller D
(2021) Prediction of neo-epitope immunogenicity reveals TCR recognition determinants and provides insight into immunoediting
Cell Reports. Medicine 2:100194.

https://doi.org/10.1016/j.xcrm.2021.100194
- PubMed
- Google Scholar
1. Schumacher TN
2. Schreiber RD
(2015) Neoantigens in cancer immunotherapy
Science 348:69–74.

https://doi.org/10.1126/science.aaa4971
- PubMed
- Google Scholar
1. Shugay M
2. Bagaev DV
3. Zvyagin IV
4. Vroomans RM
5. Crawford JC
6. Dolton G
7. Komech EA
8. Sycheva AL
9. Koneva AE
10. Egorov ES
11. Eliseev AV
12. Van Dyk E
13. Dash P
14. Attaf M
15. Rius C
16. Ladell K
17. McLaren JE
18. Matthews KK
19. Clemens EB
20. Douek DC
21. Luciani F
22. van Baarle D
23. Kedzierska K
24. Kesmir C
25. Thomas PG
26. Price DA
27. Sewell AK
28. Chudakov DM
(2018) VDJdb: a curated database of T-cell receptor sequences with known antigen specificity
Nucleic Acids Research 46:D419–D427.

https://doi.org/10.1093/nar/gkx760
- PubMed
- Google Scholar
1. Szeto C
2. Lobos CA
3. Nguyen AT
4. Gras S
(2020) TCR Recognition of peptide-MHC-I: rule makers and breakers
International Journal of Molecular Sciences 22:68.

https://doi.org/10.3390/ijms22010068
- PubMed
- Google Scholar
1. Szolek A
2. Schubert B
3. Mohr C
4. Sturm M
5. Feldhahn M
6. Kohlbacher O
(2014) OptiType: precision HLA typing from next-generation sequencing data
Bioinformatics 30:3310–3316.

https://doi.org/10.1093/bioinformatics/btu548
- PubMed
- Google Scholar
Software
1. Takara Bio
(2024) Cogent NGS immune profiler software
Takara Bio.

https://www.takarabio.com/learning-centers/next-generation-sequencing/bioinformatics-resources/cogent-ngs-immune-profiler
1. Tan X
2. Li D
3. Huang P
4. Jian X
5. Wan H
6. Wang G
7. Li Y
8. Ouyang J
9. Lin Y
10. Xie L
(2020) dbPepNeo: a manually curated database for human tumor neoantigen peptides
Database 2020:baaa004.

https://doi.org/10.1093/database/baaa004
- PubMed
- Google Scholar
(2017) McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences
Bioinformatics 33:2924–2929.

https://doi.org/10.1093/bioinformatics/btx286
- PubMed
- Google Scholar
1. Tong G-J
2. Zhang G-Y
3. Liu J
4. Zheng Z-Z
5. Chen Y
6. Niu P-P
7. Xu X-T
(2018) Comparison of the eighth version of the american joint committee on cancer manual to the seventh version for colorectal cancer: a retrospective review of our data
World Journal of Clinical Oncology 9:148–161.

https://doi.org/10.5306/wjco.v9.i7.148
- PubMed
- Google Scholar
1. Vita R
2. Mahajan S
3. Overton JA
4. Dhanda SK
5. Martini S
6. Cantrell JR
7. Wheeler DK
8. Sette A
9. Peters B
(2019) The immune epitope database (IEDB): 2018 update
Nucleic Acids Research 47:D339–D343.

https://doi.org/10.1093/nar/gky1006
- PubMed
- Google Scholar
1. Vitiello A
2. Zanetti M
(2017) Neoantigen prediction and the need for validation
Nature Biotechnology 35:815–817.

https://doi.org/10.1038/nbt.3932
- PubMed
- Google Scholar
1. Wu J
2. Zhao W
3. Zhou B
4. Su Z
5. Gu X
6. Zhou Z
7. Chen S
(2018) TSNAdb: a database for tumor-specific neoantigens from immunogenomics data analysis
Genomics, Proteomics & Bioinformatics 16:276–282.

https://doi.org/10.1016/j.gpb.2018.06.003
- PubMed
- Google Scholar
(2010) Structural biology of the T-cell receptor: insights into receptor assembly, ligand recognition, and initiation of signaling
Cold Spring Harbor Perspectives in Biology 2:a005140.

https://doi.org/10.1101/cshperspect.a005140
- PubMed
- Google Scholar
1. Xia J
2. Bai P
3. Fan W
4. Li Q
5. Li Y
6. Wang D
7. Yin L
8. Zhou Y
(2021) NEPdb: a database of T-Cell experimentally-validated neoantigens and pan-cancer predicted neoepitopes for cancer immunotherapy
Frontiers in Immunology 12:644637.

https://doi.org/10.3389/fimmu.2021.644637
- PubMed
- Google Scholar
1. Xie N
2. Shen G
3. Gao W
4. Huang Z
5. Huang C
6. Fu L
(2023) Neoantigens: promising targets for cancer therapy
Signal Transduction and Targeted Therapy 8:9.

https://doi.org/10.1038/s41392-022-01270-x
- PubMed
- Google Scholar
(2017) Tumor mutational burden and response rate to PD-1 inhibition
The New England Journal of Medicine 377:2500–2501.

https://doi.org/10.1056/NEJMc1713444
- PubMed
- Google Scholar
(2009) A clonotype nomenclature for T cell receptors
Immunogenetics 61:493–502.

https://doi.org/10.1007/s00251-009-0383-x
- PubMed
- Google Scholar
1. Yu Y
2. Zhang J
3. Ni L
4. Zhu Y
5. Yu H
6. Teng Y
7. Lin L
8. Xue Z
9. Xue X
10. Shen X
11. Song H
12. Su X
13. Sun W
14. Cai Z
(2022) Neoantigen-reactive T cells exhibit effective anti-tumor activity against colorectal cancer
Human Vaccines & Immunotherapeutics 18:1–11.

https://doi.org/10.1080/21645515.2021.1891814
- PubMed
- Google Scholar
1. Zhang W
2. Wang L
3. Liu K
4. Wei X
5. Yang K
6. Du W
7. Wang S
8. Guo N
9. Ma C
10. Luo L
11. Wu J
12. Lin L
13. Yang F
14. Gao F
15. Wang X
16. Li T
17. Zhang R
18. Saksena NK
19. Yang H
20. Wang J
21. Fang L
22. Hou Y
23. Xu X
24. Liu X
(2020) PIRD: pan immune repertoire database
Bioinformatics 36:897–903.

https://doi.org/10.1093/bioinformatics/btz614
- PubMed
- Google Scholar
1. Zhang G
2. Chitkushev L
3. Olsen LR
4. Keskin DB
5. Brusic V
(2021) TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes
BMC Bioinformatics 22:40.

https://doi.org/10.1186/s12859-021-03962-7
- PubMed
- Google Scholar
1. Zhou WJ
2. Qu Z
3. Song CY
4. Sun Y
5. Lai AL
6. Luo MY
7. Ying YZ
8. Meng H
9. Liang Z
10. He YJ
11. Li YH
12. Liu J
(2019a) NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens
Database 2019:baz128.

https://doi.org/10.1093/database/baz128
- PubMed
- Google Scholar
1. Zhou C
2. Wei Z
3. Zhang Z
4. Zhang B
5. Zhu C
6. Chen K
7. Chuai G
8. Qu S
9. Xie L
10. Gao Y
11. Liu Q
(2019b) pTuneos: prioritizing tumor neoantigens from next-generation sequencing data
Genome Medicine 11:67.

https://doi.org/10.1186/s13073-019-0679-x
- PubMed
- Google Scholar

Article and author information

Author details

Thi Mong Quynh Pham

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology, Writing - original draft, Writing - review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-6977-6064
Thanh Nhan Nguyen

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Bui Que Tran Nguyen

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Thi Phuong Diem Tran

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Nguyen My Diem Pham

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Hoang Thien Phuc Nguyen

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Thi Kim Cuong Ho

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Dinh Viet Linh Nguyen

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared
Huu Thinh Nguyen

University Medical Center Ho Chi Minh City, Ho Chi Minh City, Viet Nam

Contribution
Conceptualization, Patient Consultation

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-0981-7274
Duc Huy Tran

University Medical Center Ho Chi Minh City, Ho Chi Minh City, Viet Nam

Contribution
Patients recruitment and tissue histological analysis

Competing interests
No competing interests declared
Thanh Sang Tran

University Medical Center Ho Chi Minh City, Ho Chi Minh City, Viet Nam

Contribution
Patients recruitment and tissue histological analysis

Competing interests
No competing interests declared
Truong Vinh Ngoc Pham

University Medical Center Ho Chi Minh City, Ho Chi Minh City, Viet Nam

Contribution
Patients recruitment and tissue histological analysis

Competing interests
No competing interests declared
Minh Triet Le

University Medical Center Ho Chi Minh City, Ho Chi Minh City, Viet Nam

Contribution
Patients recruitment and tissue histological analysis

Competing interests
No competing interests declared
Thi Tuong Vy Nguyen

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Data curation, Formal analysis, Methodology

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0003-3436-3662
Minh-Duy Phan

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Conceptualization, Writing - review and editing

Competing interests
No competing interests declared
Hoa Giang

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Conceptualization, Methodology

Competing interests
No competing interests declared
Hoai-Nghia Nguyen

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Conceptualization, Supervision

For correspondence
nhnghia81@gmail.com

Competing interests
No competing interests declared
Le Son Tran

Medical Genetics Institute, Ho Chi Minh City, Viet Nam

Contribution
Conceptualization, Writing - review and editing

For correspondence
leson1808@gmail.com

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-5382-3903

Funding

NextCalibur Therapeutic (NC01)

Le Son Tran
Minh-Duy Phan

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank all participants who agreed to take part in this study. We thank Dr. Kien Nguyen for proofreading our manuscript.

Version history

Preprint posted: November 17, 2023
Sent for peer review: December 20, 2023
Reviewed Preprint version 1: April 12, 2024
Reviewed Preprint version 2: September 11, 2024
Version of Record published: October 28, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.94658. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.