
Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity

  1. William S DeWitt III
  2. Anajane Smith
  3. Gary Schoch
  4. John A Hansen
  5. Frederick A Matsen IV
  6. Philip Bradley (corresponding author)
  1. Fred Hutchinson Cancer Research Center, United States
  2. University of Washington, United States
Research Article
Cite this article as: eLife 2018;7:e38358 doi: 10.7554/eLife.38358
12 figures, 4 tables, 3 data sets and 1 additional file

Figures

Figure 1
Clustering of TCR occurrence patterns across the full cohort (top) and within a cohort subset defined by a shared HLA allele (bottom).

As described in detail in the following sections, we used covariation analysis to identify clusters of co-occurring TCRβ chains. Here we provide a graphical introduction to these results by depicting occurrence patterns of clustered TCRs over the full cohort and over a cohort subset defined by a single HLA allele (HLA-A*01:01). TCR clusters over the full cohort are largely driven by the occurrence patterns of specific HLA alleles (compare the occurrence patterns of the top five global clusters to those of the top 5 HLA alleles, respectively), whereas HLA-restricted clusters may reflect shared immune exposures, as illustrated here by a CMV-associated TCR cluster (the pink cluster in the bottom panels). In the top left panels, occurrence patterns of HLA alleles and TCRβ chains (rows) are indicated for each of the cohort subjects (columns) by filled (black) matrix elements. The TCRβ chains chosen for depiction in the occurrence matrix are the members of the 28 global co-occurrence clusters identified in section 'Globally co-occurring TCR pairs form clusters defined by shared associations'. The TCRs (rows) are ordered by cluster membership as indicated by colored bands to the left of the matrix. The selected HLA alleles correspond to the strongest associations for the top 10 clusters (two of which are not HLA-associated). The cohort subjects are ordered by column similarity so as to emphasize block structure present in the matrix. The bottom left panels similarly show occurrence patterns for HLA-A*01:01-associated TCRβ chain clusters over the subset of subjects carrying this allele, alongside an indicator of cytomegalovirus seropositivity for each subject (red). In-depth analysis of these (and other) HLA-associated TCRβ clusters is presented in section 'HLA-restricted TCR clusters'. 
For visualization purposes, two-dimensional embeddings of the TCRβ chains based on their occurrence patterns (binary strings representing presence/absence in the subjects) are depicted in the right panels, with the TCR chains colored by cluster assignment and annotated by known associations.

https://doi.org/10.7554/eLife.38358.003
Figure 2
Strongly co-occurring TCR pairs form two broad classes distinguished by HLA-association strength.

The co-occurrence p-value PCO for each pair of public TCRs is plotted (x-axis) against the HLA-association p-value PHLA for the HLA allele with the strongest mutual association with that TCR pair (y-axis). There are 6092 TCR-pairs above the diagonal (y=x) and 4713 pairs below the diagonal.

https://doi.org/10.7554/eLife.38358.004
Figure 2—source data 1

TCR pairs and corresponding PCO and PHLA values.

https://doi.org/10.7554/eLife.38358.005
Figure 3 with 3 supplements
Clustering public TCRβ chains by co-occurrence over the full cohort identifies associations with HLA and TRBJ alleles as well as an invariant T cell subset.

(A) Graphical representations of the TCRβ chain occurrence matrix (lower left) and the HLA-allele occurrence matrix (upper left), restricted to members of the 28 global co-occurrence TCR clusters and the associated HLA alleles for the top 10 clusters, respectively. TCRβ chains (rows) are ordered by cluster membership and subjects (columns) are ordered by column similarity (Jaccard distance of TCR sets) to emphasize block structure present in the matrix. (B) Cluster size (x-axis) versus the p-value of the most significant HLA allele association (y-axis), with markers colored according to the locus of the associated allele. Dashed line indicates random expectation based on the total number of alleles, assuming independence. (C) Count of cluster member TCRs found in each subject for the cluster labeled ‘2’ in panel (B) (top right). The dotted line represents an averaged curve based on randomly and independently selecting subject sets for each member TCR. Red and blue dots indicate the occurrence of the DRB1*15:01 allele in the cohort. (D) Count of cluster member TCRs found in each subject for the cluster labeled ‘7’ in panel (B) (center bottom). The dotted line again represents a control pattern, and the red and blue dots indicate the occurrence of the TRBJ2-7*02 allele.

https://doi.org/10.7554/eLife.38358.006
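The subject ordering in panel (A) relies on the Jaccard distance between subjects' TCR sets. As a minimal sketch of that distance (the subject names and toy TCR sets below are hypothetical, and this is not the authors' implementation), it could be computed as follows:

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention used here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy repertoires: each subject is the set of public TCRs it carries.
subject_tcrs = {
    "subj1": {"CASSIRSSYEQYF", "CSVGTGGTNEKLFF"},
    "subj2": {"CASSIRSSYEQYF", "CSARNRDYGYTF"},
    "subj3": {"CASSLVVSPYEQYF"},
}

d12 = jaccard_distance(subject_tcrs["subj1"], subject_tcrs["subj2"])
d13 = jaccard_distance(subject_tcrs["subj1"], subject_tcrs["subj3"])
print(d12, d13)
```

A pairwise matrix of such distances can then be passed to any hierarchical clustering routine to order the columns so that subjects with similar TCR content sit next to each other.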
Figure 3—source data 1

Cluster sizes and HLA-allele association p-values.

https://doi.org/10.7554/eLife.38358.010
Figure 3—figure supplement 1
TCRdist tree of the members of the TRBJ2-7*02-associated cluster.

Average-linkage dendrogram of TCRdist receptor clusters colored by generation probability (Pgen), with TCR logos for selected receptor subsets (the branches of the tree enclosed in dashed boxes labelled with size of the TCR clusters). Each logo depicts the V- (left side) and J- (right side) gene frequencies, CDR3 amino acid sequences (middle), and inferred rearrangement structure (bottom bars coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions) of the grouped receptors.

https://doi.org/10.7554/eLife.38358.007
Figure 3—figure supplement 2
TCRdist tree of the members of the putative MAIT cell cluster.

Average-linkage dendrogram of TCRdist receptor clusters colored by generation probability (Pgen), with TCR logos for selected receptor subsets (the branches of the tree enclosed in dashed boxes labelled with size of the TCR clusters). Each logo depicts the V- (left side) and J- (right side) gene frequencies, CDR3 amino acid sequences (middle), and inferred rearrangement structure (bottom bars coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions) of the grouped receptors.

https://doi.org/10.7554/eLife.38358.008
Figure 3—figure supplement 3
More details on the MAIT cell cluster: subject age and N-nucleotide insertion distributions; TCRα chains paired with cluster member TCRβ chains in the pairSEQ dataset of Howie et al. (2015).

Further details on the putative MAIT cell TCR cluster. (A) Distribution of N-nucleotide insertions for TCRβ chains in the MAIT cluster (red), in the DRB1*15-associated cluster (green), and in the union of the members of the top 10 clusters (excluding the members of the MAIT cluster, blue). MAIT cell cluster members have very few N-insertions relative to the members of the other clusters. (B) Subjects enriched for MAIT cluster TCRs (red curve) are younger than the cohort as a whole (blue curve), a trend that is further strengthened in the top half of the enriched subjects by member-TCR count (the ‘high-count subjects’, magenta curve). (C) TCRα chains paired with MAIT cluster TCRβ chains in the pairSEQ dataset of Howie et al. (2015). Ten of the 36 paired TCRα chains match the MAIT sequence consensus (TRAV1-2, TRAJ20 or TRAJ33, and a 12-residue CDR3, enclosed in the blue box).

https://doi.org/10.7554/eLife.38358.009
Figure 4 with 2 supplements
HLA-associated TCRs are more clonally expanded and have lower generation probabilities than equally common, non-HLA associated TCRs.

(A) Comparison of clonal expansion index distributions for the set of HLA-associated TCRs (blue) and a cohort-frequency matched set of non HLA-associated TCRs (green). (B) Comparison of VDJ-rearrangement TCR generation probability (Pgen) distributions for the set of HLA-associated TCRs (blue) and a cohort-frequency matched set of non HLA-associated TCRs (green). (C) Two-dimensional probability density function (PDF) for the distribution of Pgen versus clonal expansion index for HLA-associated TCRs. Contours indicate level sets of the PDF. (D) Two-dimensional probability density function (PDF) for the distribution of Pgen versus clonal expansion index for background (non HLA-associated) TCRs whose cohort frequencies match the TCRs in (C).

https://doi.org/10.7554/eLife.38358.012
Figure 4—source data 1

Generation probabilities, clonal expansion indices, and allele associations for the TCRs analyzed here.

https://doi.org/10.7554/eLife.38358.015
Figure 4—figure supplement 1
Two-dimensional feature distributions for HLA-associated TCR subsets defined by HLA locus.

Two-dimensional distributions of TCR generation probability (x-axis, Pgen) and clonal expansion index (y-axis) for TCRs with the indicated HLA associations (panel headers), and for a background set of non-HLA associated, cohort-frequency matched TCRs.

https://doi.org/10.7554/eLife.38358.013
Figure 4—figure supplement 2
TCRdist trees of experimentally determined pathogen-responsive TCRβ chains for two immunodominant epitopes, EBV BMLF1₂₈₀ and influenza M1₅₈, for comparison with TCRβ chains listed in Table 1.

TCRβ chain sequences were taken from the dataset of Dash et al. (2017). On the right-hand side are average-linkage dendrograms of TCRdist receptor clusters colored by generation probability (Pgen). TCR logos for selected receptor subsets (the branches of the tree enclosed in dashed boxes labelled with size of the TCR clusters) are shown on the left. Each logo depicts the V- (left side) and J- (right side) gene frequencies, CDR3 amino acid sequences (middle), and inferred rearrangement structure (bottom bars coloured by source region, light grey for the V-region, dark grey for J, black for D, and red for N-insertions) of the grouped receptors.

https://doi.org/10.7554/eLife.38358.014
Figure 5 with 1 supplement
Rates of TCR association vary substantially across HLA loci.

The number of HLA-associated TCRs (y-axis) is plotted as a function of allele frequency in the cohort (x-axis). Best fit lines are shown for each locus and also for the set of five DR/DQ haplotypes (‘DRDQ’) which could not be separated into component alleles in this cohort. The following DR-DQ haplotype abbreviations are used: DRB1*03:01-DQ (DRB1*03:01-DQA1*05:01-DQB1*02:01), DRB1*15:01-DQ (DRB1*15:01-DQA1*01:02-DQB1*06:02), and DRB1*13:01-DQ (DRB1*13:01-DQA1*01:03-DQB1*06:03).

https://doi.org/10.7554/eLife.38358.016
Figure 5—source data 1

Allele frequencies and numbers of associated TCRs.

https://doi.org/10.7554/eLife.38358.018
Figure 5—figure supplement 1
HLA class associations are concordant with CD4/CD8 assignments based on independent repertoire data.

For each HLA-associated TCRβ chain, we counted the number of times it was seen in CD4+ versus CD8+ T cell repertoires from independent datasets (see Materials and methods). Given a threshold on the difference between these two counts, we assign as CD4+ (CD8+) all TCRβs whose CD4+ (CD8+) count exceeds its CD8+ (CD4+) count by at least that threshold and then calculate the fraction of TCRβs assigned to the ‘correct’ class (CD8+ for class I-associated TCRβs and CD4+ for class II-associated TCRβs). We can further stratify these accuracies by conditioning on the p-value of the HLA-association and plot them according to this p-value threshold (vertical axis; 7.5×10⁻⁶ corresponds to the approximate FDR threshold of 0.05 used to define HLA-associated TCRs) and the threshold on the CD4 vs CD8 counts difference (horizontal axis). In total, 6808 HLA-associated TCRβ chains occurred in at least one of the independent repertoire datasets.

https://doi.org/10.7554/eLife.38358.017
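The thresholded CD4/CD8 assignment described in this legend can be sketched in a few lines. The function name, the record layout, and the toy counts below are hypothetical and only illustrate the rule as stated in the legend; the sketch assumes a threshold of at least 1 so that a TCRβ can never be assigned to both classes.

```python
def assignment_accuracy(records, threshold):
    """Apply the count-difference rule and score it against the HLA class.

    records: iterable of (cd4_count, cd8_count, hla_class), hla_class in {"I", "II"}.
    threshold: minimum count difference (>= 1) required to make an assignment.
    Returns (number of assigned TCRs, fraction assigned to the expected class).
    """
    assigned = 0
    correct = 0
    for cd4, cd8, hla_class in records:
        if cd4 - cd8 >= threshold:
            call = "CD4"
        elif cd8 - cd4 >= threshold:
            call = "CD8"
        else:
            continue  # difference too small: leave this TCR unassigned
        assigned += 1
        expected = "CD8" if hla_class == "I" else "CD4"
        if call == expected:
            correct += 1
    fraction = correct / assigned if assigned else float("nan")
    return assigned, fraction


# Hypothetical toy data: (CD4 count, CD8 count, class of the associated HLA allele).
toy = [(5, 0, "II"), (0, 3, "I"), (4, 1, "I"), (2, 2, "II"), (1, 0, "II")]
for t in (1, 3, 4):
    print(t, assignment_accuracy(toy, t))
```

Raising the threshold assigns fewer TCRβ chains but keeps only those with clearer CD4 or CD8 evidence, which is the trade-off the horizontal axis of the figure explores.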
Figure 6 with 1 supplement
Many HLA-restricted TCR clusters contain TCRβ chains annotated as pathogen-responsive.

Each point represents one of the 78 significant HLA-restricted TCR clusters, plotted based on a normalized cluster size score (Ssize, x-axis) and an aggregate TCR co-occurrence score for the member TCRs (ZCO, y-axis). Markers are colored by the locus of the restricting HLA allele and sized based on the strength of the association between cluster member TCRs and the HLA allele. The database annotations associated to TCRs in each cluster are summarized with text labels using the following abbreviations: B19 = parvovirus B19, INF = influenza, EBV = Epstein Barr Virus, RA = rheumatoid arthritis, MS = multiple sclerosis, MELA = melanoma, T1D = type one diabetes, CMV = cytomegalovirus. Clusters labeled ‘coCMV’ are significantly associated (P<1×10⁻⁵) with CMV seropositivity (see main text discussion of cluster #3). Clusters labeled 1–5 are discussed in the text and examined in greater detail in Figure 7 and Figure 8.

https://doi.org/10.7554/eLife.38358.019
Figure 6—source data 1

Paired TCRα chain sequences from the pairSEQ dataset of Howie et al. (2015) for all clusters with at least 2 matched TCRβ chains, along with a score for each cluster that assesses the degree of sequence similarity among the partner chains.

https://doi.org/10.7554/eLife.38358.021
Figure 6—figure supplement 1
Distributions of cluster co-occurrence scores on the two validation cohorts.

Gaussian kernel density estimation (KDE)-smoothed distributions of the cluster member TCR co-occurrence scores (ZCO) for the two validation cohorts. A standard normal distribution is shown as an approximate null expectation for these Z-scores.

https://doi.org/10.7554/eLife.38358.020
Figure 7
Top five HLA-restricted clusters (continued on following page).

Details on the TCR sequences, occurrence patterns, and annotations for the five most significant clusters (labeled 1–5 in Figure 6) based on size and TCR co-occurrence scores. Each panel consists of a TCRdist dendrogram (left side, labeled with annotation, CDR3 sequence, and occurrence counts for the member TCRs) and a per-subject TCR count profile (right side) showing the aggregate occurrence pattern of the member TCRs (blue curve) and a control pattern (green curve) produced by averaging occurrence counts from multiple independent randomizations of the subject set for each TCR. The numbers in the two ‘Counts’ columns represent the number of HLA+ (left) and HLA- (right) subjects whose repertoire contained the corresponding TCR, where HLA± means positive/negative for the restricting allele (for example, A*24:02 in the case of cluster 1). Annotations use the following abbreviations: B19 (parvovirus B19), INF (influenza virus), YFV (yellow fever virus), MELA (melanoma), T1D (type 1 diabetes), EBV (Epstein-Barr virus), RA (rheumatoid arthritis). In cases where the peptide epitope for the annotation match is known, the first three peptide amino acids are given after ‘-p’. Non-germline CDR3 amino acids with 2 or 3 non-templated nucleotides in their codon are shown in uppercase, while amino acids with only a single non-templated coding nucleotide are shown in lowercase.

https://doi.org/10.7554/eLife.38358.022
Figure 8
Top five HLA-restricted clusters (continued from previous page).

Clusters 3–5; see preceding legend for details.

https://doi.org/10.7554/eLife.38358.023
Figure 9
Negative correlation between HLA allele charge at DRB1 position 70 and CDR3 charge of HLA-associated TCRs.

(A–B) Allele charge (x-axis) versus average CDR3 charge of allele-associated TCRβ chains (y-axis) for 30 HLA-DRB1 alleles. Charge of the CDR3 loop was calculated over the full CDR3 sequence (A) or over the subset of CDR3 amino acids with at least one non-germline coding nucleotide (B). Correlation p-values correspond to a 2-sided test of the null hypothesis that the slope is zero, as implemented in the function scipy.stats.linregress (N=30 alleles). (C–D) CDR3 charge distributions for TCRs associated with alleles having defined charge at position 70 (x-axis) using the full (C) or non-germline (D) CDR3 sequence (mean values shown as white pluses). (E) Superposition of five TCR:peptide:HLA-DR crystal structures (PDB IDs 1j8h, 2iam, 2wbj, 3o6f, and 4e41; [Hennecke and Wiley, 2002; Deng et al., 2007; Harkiolaki et al., 2009; Yin et al., 2011; Deng et al., 2012]) showing the DRα chain in green, the DRβ chain in cyan, the peptide in magenta, the TCRβ chain in blue with the CDR3 loop colored reddish brown. The TCRα chain is omitted for clarity, and position 70 is highlighted in yellow.

https://doi.org/10.7554/eLife.38358.025
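To make the charge comparison in panels (A–B) concrete, the sketch below computes a net CDR3 charge and correlates the per-allele average with the charge at position 70. Several pieces are assumptions rather than the authors' procedure: the charge table (+1 for K and R, −1 for D and E, 0 otherwise), the toy allele names, and the toy CDR3 sequences are all hypothetical. The legend states that the published p-values come from `scipy.stats.linregress`; this standard-library version computes only the Pearson correlation coefficient.

```python
from math import sqrt

# Assumed charge model (not taken from the source): K/R = +1, D/E = -1, others 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq):
    """Net charge of an amino acid sequence under the assumed model."""
    return sum(CHARGE.get(aa, 0) for aa in seq)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: allele label -> (charge at position 70, associated CDR3s).
toy_alleles = {
    "toyD70": (-1, ["CASSRRGYEQYF", "CASSKGNTEAFF"]),
    "toyQ70": (0, ["CASSLGQAYEQYF", "CASSRDGYTF"]),
    "toyR70": (1, ["CASSDEGYEQYF", "CASSEGDTQYF"]),
}

allele_charge = [charge for charge, _ in toy_alleles.values()]
mean_cdr3_charge = [
    sum(net_charge(s) for s in cdr3s) / len(cdr3s)
    for _, cdr3s in toy_alleles.values()
]
r_value = pearson_r(allele_charge, mean_cdr3_charge)
print(mean_cdr3_charge, r_value)
```

In this toy data the correlation is negative by construction, mirroring the direction of the reported trend: a more positively charged position 70 pairs with more negatively charged CDR3 loops.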
Figure 9—source data 1

Charge at position 70 and average CDR3 charge of allele-associated TCRs for 30 HLA-DRB1 alleles.

https://doi.org/10.7554/eLife.38358.026
Figure 10
CMV-associated TCRβ chains are largely HLA-restricted.

(A) Comparison of CMV-association (x-axis) and HLA-association (y-axis) p-values for 68 CMV-associated TCRβ chains shows that the majority are also HLA associated. (B) Smoothed densities comparing HLA-association p-value distributions for the 68 CMV-associated chains (blue) and a cohort-frequency matched set of 6800 randomly selected public TCRβ chains. CMV-associated TCRs are much more strongly HLA-associated than would be expected based solely on their cohort frequency. (C) CMV-association p-values computed over subsets of the cohort positive (x-axis) or negative (y-axis) for the HLA allele most strongly associated with each TCR. For most of the TCR chains, CMV association is restricted to the subset of the cohort positive for their associated HLA allele. (D) HLA-association p-values computed over CMV-positive (x-axis) or CMV-negative (y-axis) subsets of the cohort suggest that for these 68 CMV-associated TCRβ chains, HLA-association is driven solely by response to CMV (rather than generic affinity for their associated allele, for example, or additional self or viral epitopes). In panels (A), (C), and (D), points are colored by CMV-association p-value; in all panels we use a modified logarithmic scale based on the square root of the exponent when plotting p-values in order to avoid compression due to a few highly significant associations.

https://doi.org/10.7554/eLife.38358.027
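The "modified logarithmic scale based on the square root of the exponent" is not spelled out in the legend. One plausible reading, offered here only as an assumption, is that each p-value is plotted at the square root of −log10(p). The function name below is hypothetical.

```python
import math


def sqrt_exponent_scale(p):
    """Map a p-value to sqrt(-log10(p)); an assumed reading of the axis scale."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # max() guards against a negative zero when p == 1.0
    return math.sqrt(max(0.0, -math.log10(p)))


for p in (1.0, 1e-4, 1e-16, 1e-100):
    print(p, sqrt_exponent_scale(p))
```

Under this transform a p-value of 1e-100 sits only five times further from the origin than 1e-4, so a few extreme associations no longer squeeze the rest of the points into a corner of the plot.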
Figure 10—source data 1

Full and subsetted CMV- and HLA- association p-values for 68 CMV-associated TCRs.

https://doi.org/10.7554/eLife.38358.028
Figure 11
Analysis of TCR sharing at the nucleotide level and VDJ recombination probabilities helps to identify potential contamination.

Each point represents a TCRβ nucleotide sequence that occurs in more than one repertoire, plotted according to its generation probability (Pgen, x-axis) and the number of repertoires in which it was seen (Nrepertoires, y-axis). Very low probability nucleotide sequences that are shared across many repertoires represent potential cross-contamination, as confirmed for one large cluster of artifactual sequences (see the main text). We excluded all TCRβ nucleotide sequences lying above the boundary indicated by the black line (N=592).

https://doi.org/10.7554/eLife.38358.029
Figure 11—source data 1

TCRβ nucleotide sequences excluded from our analysis.

https://doi.org/10.7554/eLife.38358.030
Figure 12
Schematic diagram illustrating the co-occurrence analysis.

Co-occurrence p-values are calculated to assess TCR-TCR (PCO) and TCR-HLA (PHLA) covariation across the cohort. Shared response to unknown immune exposures may explain strongly co-occurring TCR pairs, while significant HLA association can highlight functional TCRs. TCRβ chains are compared to a set of previously characterized TCRs for annotation purposes.

https://doi.org/10.7554/eLife.38358.031
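This legend does not say which statistical test yields PCO and PHLA. A common choice for presence/absence data of this kind is a one-sided hypergeometric tail probability, and the sketch below assumes that choice; it should not be read as the authors' exact procedure. The function name and argument names are hypothetical. The example call uses counts laid out as in Table 1 (total subjects, TCR-positive subjects, allele-positive subjects, overlap).

```python
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_both):
    """P(overlap >= n_both) when n_a and n_b subjects are drawn independently
    from n_total subjects (one-sided hypergeometric tail)."""
    max_overlap = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, max_overlap + 1)
    )
    return tail / comb(n_total, n_b)


# Small worked case: 10 subjects, two features each present in 4 subjects,
# all 4 shared. Only one of the C(10, 4) = 210 placements gives full overlap.
p_small = cooccurrence_pvalue(10, 4, 4, 4)

# Counts of the kind reported in Table 1 (TCR vs HLA allele).
p_table_like = cooccurrence_pvalue(629, 267, 268, 231)
print(p_small, p_table_like)
```

Exact integer arithmetic with `math.comb` avoids the underflow that naive floating-point factorials would hit for cohorts of several hundred subjects; the same function serves for TCR-TCR and TCR-HLA comparisons because both reduce to the overlap of two subject sets.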

Tables

Table 1
The top 50 most significant HLA-associated public TCRβ chains and the top 10 for A*02:01 (indicated in bold).
https://doi.org/10.7554/eLife.38358.011
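| Association p-value | Overlap* | TCR subjects† | HLA subjects‡ | Total subjects§ | V-family | CDR3 | HLA allele# | Epitope annotation |
|---|---|---|---|---|---|---|---|---|
| 3.7e-90 | 231 | 267 | 268 | 629 | TRBV19 | CASSIRSSYEQYF | A*02:01 | Influenza virus |
| 2.4e-72 | 179 | 191 | 268 | 629 | TRBV29 | CSVGTGGTNEKLFF | A*02:01 | Epstein-Barr virus |
| 3.8e-66 | 107 | 124 | 134 | 522 | TRBV20 | CSARNRDYGYTF | DRB1*03:01-DQ | |
| 1.9e-65 | 92 | 95 | 151 | 630 | TRBV05 | CASSLVVSPYEQYF | DRB1*07:01 | |
| 6.7e-64 | 91 | 94 | 134 | 522 | TRBV30 | CAWSRDSGSGNTIYF | DRB1*15:01-DQ | |
| 7.5e-59 | 51 | 53 | 66 | 630 | TRBV15 | CATSREEGDGYTF | B*35:01 | |
| 3.6e-57 | 89 | 96 | 134 | 522 | TRBV11 | CASSPGQGPGNTIYF | DRB1*15:01-DQ | |
| 7.4e-56 | 57 | 57 | 95 | 630 | TRBV02 | CASSENQGSQPQHF | DRB1*04:01 | |
| 1.5e-52 | 86 | 87 | 184 | 629 | TRBV06 | CASSYDSGTGELFF | C*07:01 | |
| 3.3e-52 | 136 | 143 | 268 | 629 | TRBV19 | CASSIRSAYEQYF | A*02:01 | Influenza virus |
| 1.2e-51 | 71 | 96 | 94 | 630 | TRBV27 | CASSLGGQNYGYTF | B*44:02 | |
| 1.8e-50 | 52 | 52 | 94 | 630 | TRBV28 | CASSSSPLNYGYTF | DRB1*01:01 | |
| 3.8e-49 | 69 | 71 | 142 | 630 | TRBV04 | CASSPGQGEGYEQYF | B*08:01 | Epstein-Barr virus |
| 6.3e-49 | 92 | 98 | 189 | 629 | TRBV11 | CASSFGQMNTEAFF | A*01:01 | |
| 1.3e-48 | 73 | 75 | 156 | 630 | TRBV18 | CASSPPTESYGYTF | B*07:02 | |
| 3.2e-48 | 79 | 87 | 151 | 630 | TRBV14 | CASSQAGMNTEAFF | DRB1*07:01 | |
| 8.7e-47 | 49 | 49 | 95 | 630 | TRBV11 | CASSLDQGGSSSYNEQFF | DRB1*04:01 | |
| 3.2e-46 | 50 | 51 | 95 | 630 | TRBV20 | CSAQREYNEQFF | DRB1*04:01 | |
| 3.3e-46 | 68 | 69 | 134 | 522 | TRBV05 | CASSFWGRDTQYF | DRB1*03:01-DQ | |
| 3.3e-46 | 54 | 59 | 94 | 630 | TRBV05 | CASSWTGGGGANVLTF | DRB1*01:01 | |
| 3.1e-45 | 54 | 60 | 94 | 630 | TRBV02 | CASSEARGAGQPQHF | DRB1*01:01 | |
| 1.4e-44 | 41 | 42 | 69 | 630 | TRBV14 | CASSPLGPGNTIYF | DRB1*11:01 | |
| 2.4e-43 | 92 | 121 | 134 | 522 | TRBV07 | CASSPTGLQETQYF | DRB1*03:01-DQ | |
| 4.1e-43 | 43 | 52 | 61 | 630 | TRBV19 | CASSPTGGIYEQYF | B*44:03 | Multiple sclerosis |
| 4.5e-43 | 39 | 40 | 66 | 629 | TRBV10 | CASSESPGNSNQPQHF | C*12:03 | |
| 6.7e-43 | 76 | 86 | 134 | 522 | TRBV28 | CASRGRPEAFF | DRB1*15:01-DQ | |
| 7.5e-43 | 50 | 54 | 94 | 630 | TRBV19 | CASSPTQNTEAFF | DRB1*01:01 | |
| 1.7e-42 | 84 | 110 | 142 | 630 | TRBV07 | CASSSGPNYEQYF | B*08:01 | |
| 1.7e-42 | 61 | 81 | 95 | 630 | TRBV05 | CASSFPGEDTQYF | DRB1*04:01 | |
| 1.3e-41 | 47 | 49 | 95 | 630 | TRBV18 | CASSPPAGAAYEQYF | DRB1*04:01 | |
| 1.5e-41 | 75 | 87 | 151 | 630 | TRBV28 | CASSLTSGGQETQYF | DRB1*07:01 | |
| 2.3e-41 | 64 | 67 | 151 | 630 | TRBV07 | CASSLGQGFYNSPLHF | DRB1*07:01 | |
| 8.2e-40 | 77 | 92 | 134 | 522 | TRBV19 | CASSISVYGYTF | DRB1*15:01-DQ | |
| 2.4e-39 | 43 | 54 | 66 | 630 | TRBV10 | CAISTGDSNQPQHF | B*35:01 | Epstein-Barr virus |
| 3.4e-39 | 115 | 193 | 156 | 630 | TRBV09 | CASSGNEQFF | B*07:02 | |
| 9.5e-39 | 151 | 260 | 189 | 629 | TRBV19 | CASSIRDSNQPQHF | A*01:01 | |
| 1.2e-38 | 100 | 103 | 268 | 629 | TRBV20 | CSARDGTGNGYTF | A*02:01 | Epstein-Barr virus |
| 1.3e-38 | 56 | 60 | 130 | 629 | TRBV25 | CASSEYSLTDTQYF | C*04:01 | |
| 2.1e-38 | 109 | 116 | 268 | 629 | TRBV20 | CSARDRTGNGYTF | A*02:01 | Epstein-Barr virus |
| 2.3e-38 | 102 | 106 | 268 | 629 | TRBV19 | CASSVRSSYEQYF | A*02:01 | Influenza virus |
| 6.4e-38 | 54 | 54 | 151 | 630 | TRBV10 | CAISESQDLNTEAFF | DRB1*07:01 | |
| 1.1e-37 | 43 | 45 | 94 | 630 | TRBV07 | CASSLAGPPNSPLHF | DRB1*01:01 | |
| 1.2e-37 | 44 | 60 | 66 | 630 | TRBV09 | CASSARTGELFF | B*35:01 | Epstein-Barr virus |
| 3.3e-37 | 79 | 88 | 189 | 629 | TRBV19 | CASSIDGEETQYF | A*01:01 | |
| 5.4e-37 | 64 | 70 | 134 | 522 | TRBV05 | CASSLESPNYGYTF | DRB1*03:01-DQ | |
| 2.0e-36 | 38 | 43 | 69 | 630 | TRBV06 | CASGAGHTDTQYF | DRB1*11:01 | |
| 2.9e-36 | 54 | 55 | 151 | 630 | TRBV05 | CASSLVVQPYEQYF | DRB1*07:01 | |
| 3.3e-36 | 57 | 81 | 95 | 630 | TRBV11 | CASSPGQDYGYTF | DRB1*04:01 | |
| 2.4e-35 | 50 | 53 | 109 | 522 | TRBV27 | CASNRQGPNTEAFF | DQB1*03:01-DQA1*05:05 | |
| 5.7e-35 | 75 | 95 | 134 | 522 | TRBV18 | CASSGQANTEAFF | DRB1*03:01-DQ | |
| 2.2e-33 | 86 | 88 | 268 | 629 | TRBV14 | CASSQSPGGTQYF | A*02:01 | Epstein-Barr virus |
| 1.8e-32 | 84 | 86 | 268 | 629 | TRBV10 | CASSEDGMNTEAFF | A*02:01 | |
| 4.3e-32 | 86 | 89 | 268 | 629 | TRBV05 | CASSLEGQASSYEQYF | A*02:01 | Melanoma |
| 4.3e-32 | 86 | 89 | 268 | 629 | TRBV29 | CSVGSGGTNEKLFF | A*02:01 | Epstein-Barr virus |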
*Number of subjects positive for both the TCRβ chain and the indicated HLA allele.
†Number of subjects positive for the TCRβ chain with available HLA typing at the corresponding locus.
‡Number of subjects positive for the indicated HLA allele.
§Total number of subjects with available HLA typing at the corresponding locus.
#The following DR-DQ haplotype abbreviations are used: DRB1*03:01-DQ (DRB1*03:01-DQA1*05:01-DQB1*02:01) and DRB1*15:01-DQ (DRB1*15:01-DQA1*01:02-DQB1*06:02).

Table 2
Covariation between HLA allele charge and average CDR3 charge of HLA-associated TCRs for HLA positions frequently contacted by CDR3 amino acids in solved TCR:pMHC crystal structures.
https://doi.org/10.7554/eLife.38358.024
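| MHC class | Position* | Contact frequency† | Full CDR3 R-value | Full CDR3 p-value | Non-germline CDR3‡ R-value | Non-germline CDR3‡ p-value | AAs§ |
|---|---|---|---|---|---|---|---|
| II-β | 70 | 1.48 | −0.47 | 3.3e-04 | −0.52 | 6.1e-05 | DEGQR |
| II-α | 64 | 1.09 | −0.15 | 0.33 | −0.07 | 0.64 | ART |
| I | 152 | 0.47 | 0.00 | 0.99 | −0.04 | 0.72 | AERTVW |
| I | 151 | 0.46 | 0.08 | 0.50 | 0.06 | 0.59 | HR |
| I | 69 | 0.26 | −0.13 | 0.28 | −0.14 | 0.24 | ART |
| I | 76 | 0.21 | −0.08 | 0.49 | −0.14 | 0.25 | AEV |
| I | 70 | 0.12 | 0.02 | 0.86 | 0.08 | 0.50 | HKNQS |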
*Only positions whose charge varies across alleles are included.
†Total number of CDR3 residues contacted (using a sidechain heavy-atom distance threshold of 4.5 Å) divided by number of structures analyzed.
‡CDR3 charge is calculated over amino acids with at least one non-germline coding nucleotide.
§Amino acids present at this HLA position.

Table 3
HLA-restricted TCR clusters with size (Ssize) and co-occurrence (ZCO) scores, annotations (abbreviated as in Figure 6), and validation scores.
https://doi.org/10.7554/eLife.38358.032
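| Rank | HLA allele | Allele frequency | TCRs | Subjects | Cluster center | Ssize | ZCO | Annotations | ZCO (Keck120) | ZCO (Brit86) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A*24:02 | 102 | 32 | 29 | TRBV05,CASSGSGGYNEQFF | 8.95 | 17.64 | B19 | 10.38 | 6.74 |
| 2 | A*02:01 | 218 | 43 | 66 | TRBV19,CASSGRSTDTQYF | 6.47 | 13.01 | INF, T1D | 12.28 | 4.28 |
| 3 | DRB1*07:01 | 119 | 17 | 36 | TRBV09,CASSGQGAYEQYF | 4.08 | 12.91 | coCMV | 9.46 | 6.40 |
| 4 | DRB1*15:01-DQ | 112 | 16 | 27 | TRBV19,CASSPDRSSYNEQFF | 4.25 | 12.13 | | 1.65 | 1.72 |
| 5 | B*08:01 | 115 | 30 | 34 | TRBV07,CASSQGPAYEQYF | 5.97 | 8.12 | EBV, RA | 3.83 | 1.83 |
| 6 | C*04:01 | 104 | 7 | 24 | TRBV19,CASSPGGDYNEQFF | 3.94 | 11.58 | | 4.48 | 2.01 |
| 7 | C*04:01 | 104 | 11 | 20 | TRBV04,CASSHSGTGETYEQYF | 4.91 | 9.03 | | 7.52 | 1.66 |
| 8 | B*15:01 | 55 | 23 | 27 | TRBV19,CASSTTSGSYNEQFF | 5.43 | 7.51 | | 10.31 | 4.01 |
| 9 | DRB1*03:01-DQ | 108 | 26 | 39 | TRBV29,CSVAPGWGMNTEAFF | 4.49 | 8.61 | | 10.96 | 7.09 |
| 10 | A*01:01 | 154 | 8 | 44 | TRBV24,CATSDGDTQYF | 3.47 | 10.21 | CMV, coCMV | 3.80 | 2.42 |
| 11 | B*35:01 | 56 | 18 | 24 | TRBV10,CATGTGDSNQPQHF | 4.98 | 6.13 | EBV, RA | 4.50 | 5.42 |
| 12 | DRB1*03:01-DQ | 108 | 11 | 35 | TRBV07,CASSLSLAGSYNEQFF | 3.09 | 8.15 | | 5.35 | 1.40 |
| 13 | A*02:01 | 218 | 10 | 84 | TRBV20,CSARDRTGNGYTF | 3.81 | 6.66 | EBV | 7.14 | 3.50 |
| 14 | DRB1*15:01-DQ | 112 | 15 | 38 | TRBV05,CASSLRGVRTDTQYF | 3.05 | 8.08 | | 8.73 | 3.31 |
| 15 | A*01:01 | 154 | 6 | 30 | TRBV10,CAISESRASGDYNEQFF | 3.14 | 7.67 | | 11.31 | 2.99 |
| 16 | DRB1*13:01-DQ | 43 | 7 | 7 | TRBV20,CSASAGESNQPQHF | 3.14 | 7.64 | | −0.55 | −0.35 |
| 17 | DRB1*03:01-DQ | 108 | 16 | 32 | TRBV20,CSARGGGRSYEQYF | 3.31 | 6.95 | | 2.57 | 3.09 |
| 18 | DRB1*11:01 | 58 | 14 | 20 | TRBV06,CASSYSVRGRYSNQPQHF | 3.26 | 7.02 | | 8.72 | 3.44 |
| 19 | C*08:02 | 37 | 6 | 15 | TRBV28,CASSLGIHYEQYF | 3.53 | 6.37 | | 1.82 | 4.37 |
| 20 | DRB1*15:01-DQ | 112 | 13 | 51 | TRBV12,CASSLAGTEKLFF | 3.27 | 6.64 | | 4.61 | 3.01 |
| 21 | DRB1*03:01-DQ | 108 | 11 | 23 | TRBV05,CASSSTGLRSYEQYF | 3.09 | 6.92 | | 4.73 | 5.81 |
| 22 | A*02:01 | 218 | 7 | 64 | TRBV04,CASSQGTGRYEQYF | 3.51 | 6.07 | | 2.79 | 3.23 |
| 23 | C*03:04 | 72 | 5 | 13 | TRBV09,CASSVAYRGNEQFF | 3.39 | 6.14 | | 6.26 | 3.23 |
| 24 | DQB1*03:01-DQA1*05:05 | 84 | 10 | 39 | TRBV09,CASSVGTVQETQYF | 2.97 | 6.73 | | 3.02 | 3.54 |
| 25 | DRB1*04:01 | 78 | 25 | 35 | TRBV05,CASSRQGAGETQYF | 3.00 | 6.31 | | 5.82 | 1.55 |
| 26 | B*08:01 | 115 | 7 | 30 | TRBV12,CASSFEGLHGYTF | 2.67 | 6.67 | | 3.77 | 2.95 |
| 27 | C*04:01 | 104 | 6 | 25 | TRBV06,CASRTGLAGTDTQYF | 3.58 | 4.78 | | 3.53 | 3.76 |
| 28 | DRB1*07:01 | 119 | 9 | 42 | TRBV14,CASSLAGMNTEAFF | 3.15 | 5.54 | | 6.99 | 5.58 |
| 29 | DQB1*03:01-DQA1*05:05 | 84 | 7 | 36 | TRBV02,CASSELENTEAFF | 2.97 | 5.76 | | 5.25 | 3.24 |
| 30 | DPB1*03:01-DPA1*01:03 | 42 | 7 | 16 | TRBV30,CAWSADSNQPQHF | 3.56 | 4.16 | | 2.42 | 1.73 |
| 31 | B*15:01 | 55 | 18 | 27 | TRBV29,CSVETRDYEQYF | 3.54 | 3.94 | | 13.81 | 4.29 |
| 32 | A*01:01 | 154 | 4 | 26 | TRBV09,CASSVGVDSTDTQYF | 2.39 | 6.24 | | −0.31 | 2.17 |
| 33 | C*07:02 | 142 | 4 | 14 | TRBV25,CASSPGDEQYF | 2.94 | 5.11 | coCMV | 6.37 | 3.69 |
| 34 | B*08:01 | 115 | 6 | 38 | TRBV29,CSVGSGDYEQYF | 3.01 | 4.85 | EBV | 2.73 | 0.75 |
| 35 | A*01:01 | 154 | 6 | 37 | TRBV20,CSAPGQGAVEQYF | 2.79 | 5.24 | | 2.42 | 3.00 |
| 36 | A*23:01 | 22 | 5 | 7 | TRBV06,CASSDGNSGNTIYF | 3.38 | 4.02 | | 1.91 | 4.11 |
| 37 | DQB1*03:01-DQA1*05:05 | 84 | 7 | 29 | TRBV15,CATSRDPGGNQPQHF | 2.97 | 4.82 | | 5.00 | 2.67 |
| 38 | DPB1*04:01-DPA1*01:03 | 274 | 5 | 65 | TRBV19,CASSIKGDTEAFF | 3.31 | 4.14 | | 4.89 | 3.42 |
| 39 | DPB1*04:01-DPA1*01:03 | 274 | 4 | 55 | TRBV19,CASRLSGDTQYF | 2.84 | 4.95 | COLO | 3.80 | 1.25 |
| 40 | B*07:02 | 125 | 7 | 37 | TRBV02,CASRGETQYF | 2.73 | 4.88 | | 3.20 | 2.11 |
| 41 | B*44:03 | 41 | 9 | 20 | TRBV19,CASSATGGIYEQYF | 3.35 | 3.41 | MS | 6.61 | 8.76 |
| 42 | A*24:02 | 102 | 6 | 31 | TRBV30,CAWSPGTGDYEQYF | 3.05 | 3.91 | | 3.56 | 2.99 |
| 43 | DRB1*07:01 | 119 | 13 | 31 | TRBV18,CASSPSVRNTEAFF | 2.89 | 4.20 | | 5.32 | 0.96 |
| 44 | B*57:01 | 27 | 5 | 14 | TRBV12,CASSPPEGETQYF | 3.22 | 3.47 | | 6.31 | 1.94 |
| 45 | C*06:02 | 74 | 4 | 14 | TRBV02,CASSAGTASTDTQYF | 2.81 | 4.27 | coCMV | 4.76 | 3.06 |
| 46 | A*11:01 | 47 | 5 | 7 | TRBV09,CASSPKGVGYEQYF | 2.75 | 4.31 | | 2.43 | 3.32 |
| 47 | DRB1*01:01 | 82 | 9 | 21 | TRBV19,CASSIPGLAYEQYF | 2.58 | 4.63 | | 0.96 | −0.49 |
| 48 | B*07:02 | 125 | 7 | 21 | TRBV09,CASSDRRGYTF | 2.73 | 4.34 | | 4.57 | 0.45 |
| 49 | B*08:01 | 115 | 6 | 22 | TRBV07,CASSSTGAGNQPQHF | 2.67 | 4.24 | EBV | 1.00 | 2.85 |
| 50 | B*18:01 | 46 | 5 | 6 | TRBV27,CASSPTSEDTQYF | 2.57 | 4.26 | | 5.79 | −0.23 |
| 51 | B*27:05 | 36 | 7 | 13 | TRBV06,CASSLRLAGLYEQYF | 2.64 | 3.81 | | 9.25 | 1.08 |
| 52 | B*35:01 | 56 | 4 | 7 | TRBV07,CASSQGPGRTYEQYF | 2.46 | 4.10 | | - | - |
| 53 | B*35:03 | 16 | 4 | 7 | TRBV10,CAISVGNEQFF | 2.78 | 3.42 | | 1.50 | 0.73 |
| 54 | A*02:01 | 218 | 5 | 126 | TRBV29,CSVGTGGTNEKLFF | 2.82 | 3.32 | EBV, MELA | 5.65 | 2.37 |
| 55 | DRB1*03:01-DQ | 108 | 6 | 18 | TRBV02,CASSAGAGTEAFF | 2.36 | 4.17 | | 0.98 | 2.79 |
| 56 | B*44:02 | 79 | 4 | 18 | TRBV02,CASSADSSYNEQFF | 2.57 | 3.65 | | 2.09 | 2.12 |
| 57 | C*03:04 | 72 | 3 | 8 | TRBV27,CASSPRPYNEQFF | 2.35 | 4.08 | | 1.36 | 3.22 |
| 58 | A*24:02 | 102 | 4 | 12 | TRBV20,CSAREDGHEQYF | 2.62 | 3.54 | | 0.83 | 2.94 |
| 59 | A*01:01 | 154 | 12 | 65 | TRBV19,CASSIRDHNQPQHF | 2.79 | 3.17 | | 8.44 | 2.33 |
| 60 | B*27:05 | 36 | 4 | 12 | TRBV07,CASSPPGGSAYNEQFF | 2.64 | 3.23 | | 1.13 | 2.12 |
| 61 | C*14:02 | 23 | 4 | 9 | TRBV02,CASSGDTSTNEKLFF | 2.48 | 3.50 | | 6.23 | - |
| 62 | B*27:05 | 36 | 9 | 12 | TRBV27,CASSSGTSGNNEQFF | 2.64 | 3.16 | | 4.32 | 3.24 |
| 63 | C*12:03 | 53 | 6 | 25 | TRBV15,CATSRENEKLFF | 2.90 | 2.51 | | 1.88 | 3.08 |
| 64 | A*68:01 | 29 | 4 | 16 | TRBV05,CASSLIATNEKLFF | 2.71 | 2.88 | | 3.67 | 1.23 |
| 65 | B*51:01 | 53 | 6 | 20 | TRBV04,CASSQDYPGGSYEQYF | 2.76 | 2.73 | | 6.43 | 5.18 |
| 66 | B*35:01 | 56 | 4 | 8 | TRBV27,CASSLGAATGELFF | 2.46 | 3.32 | | 4.52 | 3.01 |
| 67 | B*15:01 | 55 | 4 | 20 | TRBV06,CASSAGTGRYEQYF | 2.44 | 3.18 | | 2.40 | 2.23 |
| 68 | B*44:03 | 41 | 7 | 14 | TRBV07,CASSSGESGANVLTF | 2.97 | 2.01 | | 3.92 | 4.81 |
| 69 | DRB1*04:02 | 14 | 4 | 6 | TRBV03,CASSQASGGANEQFF | 2.44 | 3.04 | | 2.04 | 2.22 |
| 70 | B*15:01 | 55 | 4 | 10 | TRBV19,CASSHRGGNEQFF | 2.44 | 3.03 | | 0.92 | 3.58 |
| 71 | B*15:01 | 55 | 5 | 7 | TRBV05,CASSLGVSAGELFF | 2.44 | 2.98 | | −0.32 | −0.12 |
| 72 | A*32:01 | 34 | 3 | 5 | TRBV12,CASSYGPGNQPQHF | 2.45 | 2.84 | | 5.76 | 3.18 |
| 73 | A*02:01 | 218 | 4 | 23 | TRBV19,CASSTGTATNEKLFF | 2.42 | 2.89 | | 0.84 | - |
| 74 | DRB1*15:01-DQ | 112 | 7 | 51 | TRBV28,CASSLLGGQPQHF | 2.58 | 2.35 | | 0.66 | 1.89 |
| 75 | B*18:01 | 46 | 5 | 15 | TRBV27,CASSFPGKEQYF | 2.57 | 2.22 | | −0.35 | 5.62 |
| 76 | B*49:01 | 16 | 3 | 8 | TRBV29,CSVERGYNEQFF | 2.38 | 2.14 | | 1.03 | 0.43 |
| 77 | A*23:01 | 22 | 3 | 6 | TRBV20,CSARDREGAGYGYTF | 2.35 | 2.14 | | −0.16 | −0.12 |
| 78 | B*55:01 | 13 | 3 | 10 | TRBV19,CASRGGNQPQHF | 2.36 | 2.09 | | 0.95 | −0.28 |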
Table 4
PDB structures analyzed.
https://doi.org/10.7554/eLife.38358.033
| PDB ID* | HLA allele | Vα | Jα | CDR3α | Vβ | Jβ | CDR3β | Peptide |
|---|---|---|---|---|---|---|---|---|
| 5bs0 | A*01 | TRAV21*01 | TRAJ28*01 | CAVRPGGAGPFFVVF | TRBV5-1*01 | TRBJ2-7*01 | CASSFNMATGQYF | ESDPIVAQY |
| 3qdj | A*02 | TRAV12-2*01 | TRAJ23*01 | CAVNFGGGKLIF | TRBV6-4*01 | TRBJ1-1*01 | CASSLSFGTEAFF | AAGIGILTV |
| 4l3e | A*02 | TRAV12-2*01 | TRAJ23*01 | CAVNFGGGKLIF | TRBV6-4*01 | TRBJ1-1*01 | CASSWSFGTEAFF | ELAGIGILTV |
| 5e9d | A*02 | TRAV12-2*01 | TRAJ24*02 | CAVTKYSWGKLQF | TRBV6-5*01 | TRBJ2-7*01 | CASRPGWMAGGVELYF | ELAGIGILTV |
| 3qfj | A*02 | TRAV12-2*01 | TRAJ24*02 | CAVTTDSWGKLQF | TRBV6-5*01 | TRBJ2-7*01 | CASRPGLAGGRPEQYF | LLFGFPVYV |
| 4ftv | A*02 | TRAV12-2*01 | TRAJ24*02 | CAVTTDSWGKLQF | TRBV6-5*01 | TRBJ2-7*01 | CASRPGLMSAQPEQYF | LLFGYPVYV |
| 3hg1 | A*02 | TRAV12-2*01 | TRAJ27*01 | CAVNVAGKSTF | TRBV30*01 | TRBJ2-2*01 | CAWSETGLGTGELFF | ELAGIGILTV |
| 4eup | A*02 | TRAV12-2*01 | TRAJ45*01 | CAVSGGGADGLTF | TRBV28*01 | TRBJ2-1*01 | CASSFLGTGVEQYF | ALGIGILTV |
| 5c0c | A*02 | TRAV12-3*01 | TRAJ12*01 | CAMRGDSSYKLIF | TRBV12-4*01 | TRBJ2-4*01 | CASSLWEKLAKNIQYF | RQFGPDWIVA |
| 5eu6 | A*02 | TRAV21*01 | TRAJ53*01 | CAVLSSGGSNYKLTF | TRBV7-3*01 | TRBJ2-3*01 | CASSFIGGTDTQYF | YLEPGPVTV |
| 2p5e | A*02 | TRAV21*01 | TRAJ6*01 | CAVRPLLDGTYIPTF | TRBV6-5*01 | TRBJ2-2*01 | CASSYLGNTGELFF | SLLMWITQC |
| 2bnq | A*02 | TRAV21*01 | TRAJ6*01 | CAVRPTSGGSYIPTF | TRBV6-5*01 | TRBJ2-2*01 | CASSYVGNTGELFF | SLLMWITQV |
| 4mnq | A*02 | TRAV22*01 | TRAJ40*01 | CAVDSATALPYGYIF | TRBV6-5*01 | TRBJ1-1*01 | CASSYQGTEAFF | ILAKFLHWL |
| 5men | A*02 | TRAV22*01 | TRAJ40*01 | CAVDSATSGTYKYIF | TRBV6-5*01 | TRBJ1-1*01 | CASSYQGTEAFF | ILAKFLHWL |
| 5isz | A*02 | TRAV24*01 | TRAJ27*01 | CAFDTNAGKSTF | TRBV19*01 | TRBJ2-7*01 | CASSIFGQREQYF | GILGFVFTL |
| 5d2l | A*02 | TRAV24*01 | TRAJ49*01 | CAFITGNQFYF | TRBV7-2*02 | TRBJ2-5*01 | CASSQTQLWETQYF | NLVPMVATV |
| 3gsn | A*02 | TRAV24*01 | TRAJ49*01 | CARNTGNQFYF | TRBV6-5*01 | TRBJ1-2*01 | CASSPVTGGIYGYTF | NLVPMVATV |
| 5d2n | A*02 | TRAV26-2*01 | TRAJ43*01 | CILDNNNDMRF | TRBV7-6*01 | TRBJ1-4*01 | CASSLAPGTTNEKLFF | NLVPMVATV |
| 5euo | A*02 | TRAV27*01 | TRAJ37*02 | CAGAIGPSNTGKLIF | TRBV19*01 | TRBJ2-7*01 | CASSIRSSYEQYF | GILGFVFTL |
| 5hho | A*02 | TRAV27*01 | TRAJ42*01 | CAGAGSQGNLIF | TRBV19*01 | TRBJ2-7*01 | CASSIRSSYEQYF | GILEFVFTL |
| 2vlr | A*02 | TRAV27*01 | TRAJ42*01 | CAGAGSQGNLIF | TRBV19*01 | TRBJ2-7*01 | CASSSRASYEQYF | GILGFVFTL |
| 1oga | A*02 | TRAV27*01 | TRAJ42*01 | CAGAGSQGNLIF | TRBV19*01 | TRBJ2-7*01 | CASSSRSSYEQYF | GILGFVFTL |
| 1bd2 | A*02 | TRAV29/DV5*01 | TRAJ54*01 | CAAMEGAQKLVF | TRBV6-5*01 | TRBJ2-7*01 | CASSYPGGGFYEQYF | LLFGYPVYV |
| 5e6i | A*02 | TRAV35*01 | TRAJ37*02 | CAGPGGSSNTGKLIF | TRBV27*01 | TRBJ2-2*01 | CASSLIYPGELFF | GILGFVFTL |
| 3qeq | A*02 | TRAV35*01 | TRAJ49*01 | CAGGTGNQFYF | TRBV10-3*01 | TRBJ1-5*01 | CAISEVGVGQPQHF | AAGIGILTV |
| 4zez | A*02 | TRAV38-2/DV8*01 | TRAJ30*01 | CAYGEDDKIIF | TRBV25-1*01 | TRBJ2-7*01 | CASRRGPYEQYF | KLVALVINAV |
| 5jhd | A*02 | TRAV38-2/DV8*01 | TRAJ52*01 | CAWGVNAGGTSYGKLTF | TRBV19*01 | TRBJ1-2*01 | CASSIGVYGYTF | GILGFVFTL |
| 3o4l | A*02 | TRAV5*01 | TRAJ31*01 | CAEDNNARLMF | TRBV20-1*01 | TRBJ1-2*01 | CSARDGTGNGYTF | GLCTLVAML |
| 3vxs | A*24 | TRAV21*01 | TRAJ12*01 | CAVRMDSSYKLIF | TRBV7-9*01 | TRBJ2-2*01 | CASSSWDTGELFF | RYPLTLGWCF |
| 3vxm | A*24 | TRAV8-3*01 | TRAJ28*01 | CAVGAPSGAGSYQLTF | TRBV4-1*01 | TRBJ2-7*01 | CASSPTSGIYEQYF | RFPLTFGWCF |
| 3sjv | B*08 | TRAV12-1*01 | TRAJ23*01 | CVVRAGKLIF | TRBV6-2*01 | TRBJ2-4*01 | CASGQGNFDIQYF | FLRGRAYGL |
| 3ffc | B*08 | TRAV14/DV4*01 | TRAJ49*01 | CAMREDTGNQFYF | TRBV11-2*01 | TRBJ2-3*01 | CASSFTWTSGGATDTQYF | FLRGRAYGL |
| 1mi5 | B*08 | TRAV26-2*01 | TRAJ52*01 | CILPLAGGTSYGKLTF | TRBV7-8*01 | TRBJ2-7*01 | CASSLGQAYEQYF | FLRGRAYGL |
| 4qrp | B*08 | TRAV9-2*01 | TRAJ43*01 | CALSDPVNDMRF | TRBV11-2*01 | TRBJ1-5*01 | CASSLRGRGDQPQHF | HSKKKCDEL |
| 4g9f | B*27 | TRAV14/DV4*02 | TRAJ21*01 | CAMRDLRDNFNKFYF | TRBV6-5*01 | TRBJ1-1*01 | CASREGLGGTEAFF | KRWIIMGLNK |
| 4jrx | B*35 | TRAV19*01 | TRAJ34*01 | CALSGFYNTDKLIF | TRBV6-1*01 | TRBJ1-1*01 | CASPGETEAFF | LPEPLPQGQLTAY |
| 2ak4 | B*35 | TRAV19*01 | TRAJ34*01 | CALSGFYNTDKLIF | TRBV6-1*01 | TRBJ2-7*01 | CASPGLAGEYEQYF | LPEPLPQGQLTAY |
| 3mv7 | B*35 | TRAV20*01 | TRAJ58*01 | CAVQDLGTSGSRLTF | TRBV9*01 | TRBJ2-2*01 | CASSARSGELFF | HPVGEADYFEY |
| 4jry | B*35 | TRAV39*01 | TRAJ33*01 | CAVGGGSNYQLIW | TRBV5-6*01 | TRBJ2-7*01 | CASSRTGSTYEQYF | LPEPLPQGQLTAY |
| 3dxa | B*44 | TRAV26-1*01 | TRAJ13*02 | CIVWGGYQKVTF | TRBV7-9*01 | TRBJ2-1*01 | CASRYRDDSYNEQFF | EENLLDFVRF |
| 3kpr | B*44 | TRAV26-2*01 | TRAJ52*01 | CILPLAGGTSYGKLTF | TRBV7-8*01 | TRBJ2-7*01 | CASSLGQAYEQYF | EEYLKAWTF |
| 4mji | B*51 | TRAV17*01 | TRAJ22*01 | CATDDDSARQLTF | TRBV7-3*01 | TRBJ2-2*01 | CASSLTGGGELFF | TAFTIPSI |
| 2ypl | B*57 | TRAV5*01 | TRAJ13*01 | CAVSGGYQKVTF | TRBV19*01 | TRBJ1-2*01 | CASTGSYGYTF | KAFSPEVIPMF |
| 4p4k | DPA1*01/DPB1*352 | TRAV9-2*01 | TRAJ28*01 | CALSLYSGAGSYQLTF | TRBV5-1*01 | TRBJ2-5*01 | CASSLAQGGETQYF | QAFWIDLFETIG |
| 4may | DQA1*01/DQB1*05 | TRAV13-1*01 | TRAJ48*01 | CAASSFGNEKLTF | TRBV7-3*01 | TRBJ2-3*01 | CATSALGDTQYF | QLVHFVRDFAQL |
| 5ks9 | DQA1*03/DQB1*03 | TRAV20*01 | TRAJ39*01 | CAVALNNNAGNMLTF | TRBV9*01 | TRBJ2-3*01 | CASSVAPGSDTQYF | APSGEGSFQPSQENPQ |
| 4gg6 | DQA1*03/DQB1*03 | TRAV26-2*01 | TRAJ45*01 | CILRDGRGGADGLTF | TRBV9*01 | TRBJ2-7*01 | CASSVAVSAGTYEQYF | QQYPSGEGSFQPSQENPQ |
| 4z7u | DQA1*03/DQB1*03 | TRAV26-2*01 | TRAJ49*01 | CILRDRSNQFYF | TRBV9*01 | TRBJ2-5*01 | CASSTTPGTGTETQYF | APSGEGSFQPSQENPQGS |
| 4z7v | DQA1*03/DQB1*03 | TRAV26-2*01 | TRAJ54*01 | CILRDSRAQKLVF | TRBV9*01 | TRBJ2-7*01 | CASSAGTSGEYEQYF | APSGEGSFQPSQENPQGS |
| 4z7w | DQA1*03/DQB1*03 | TRAV8-3*01 | TRAJ36*01 | CAVGETGANNLFF | TRBV6-1*01 | TRBJ2-1*01 | CASSEARRYNEQFF | APSGEGSFQPSQENPQGS |
| 4ozh | DQA1*05/DQB1*02 | TRAV26-1*01 | TRAJ32*01 | CIVWGGATNKLIF | TRBV7-2*01 | TRBJ2-3*01 | CASSVRSTDTQYF | APQPELPYPQPGS |
| 4ozg | DQA1*05/DQB1*02 | TRAV26-1*01 | TRAJ45*01 | CIVLGGADGLTF | TRBV7-2*01 | TRBJ2-3*01 | CASSFRFTDTQYF | APQPELPYPQPGS |
| 4ozf | DQA1*05/DQB1*02 | TRAV26-1*01 | TRAJ54*01 | CIAFQGAQKLVF | TRBV7-2*01 | TRBJ2-3*01 | CASSFRALAADTQYF | APQPELPYPQPGS |
| 4ozi | DQA1*05/DQB1*02 | TRAV4*01 | TRAJ4*01 | CLVGDGGSFSGGYNKLIF | TRBV20-1*01 | TRBJ2-5*01 | CSAGVGGQETQYF | QPFPQPELPYPGS |
| 5ksa | DQA1*05/DQB1*03 | TRAV20*01 | TRAJ33*01 | CAVQFMDSNYQLIW | TRBV9*01 | TRBJ2-7*01 | CASSVAGTPSYEQYF | QPQQSFPEQEA |
| 5ksb | DQA1*05/DQB1*03 | TRAV20*01 | TRAJ6*01 | CAVQASGGSYIPTF | TRBV9*01 | TRBJ2-3*01 | CASSNRGLGTDTQYF | GPQQSFPEQEA |
| 4e41 | DRA*01/DRB1*01 | TRAV22*01 | TRAJ18*01 | CAVDRGSTLGRLYF | TRBV5-8*01 | TRBJ2-5*01 | CASSQIRETQYF | GELIGILNAAKVPAD |
| 2iam | DRA*01/DRB1*01 | TRAV22*01 | TRAJ54*01 | CAALIQGAQKLVF | TRBV6-6*01 | TRBJ1-3*01 | CASTYHGTGYF | GELIGILNAAKVPAD |
| 1fyt | DRA*01/DRB1*01 | TRAV8-4*01 | TRAJ48*01 | CAVSESPFGNEKLTF | TRBV28*01 | TRBJ1-2*01 | CASSSTGLPYGYTF | PKYVKQNTLKLAT |
| 3o6f | DRA*01/DRB1*04 | TRAV26-2*01 | TRAJ32*01 | CTVYGGATNKLIF | TRBV20-1*01 | TRBJ1-6*01 | CSARGGSYNSPLHF | FSWGAEGQRPGFGSGG |
| 1j8h | DRA*01/DRB1*04 | TRAV8-4*01 | TRAJ48*01 | CAVSESPFGNEKLTF | TRBV28*01 | TRBJ1-2*01 | CASSSTGLPYGYTF | PKYVKQNTLKLAT |
| 2wbj | DRA*01/DRB1*15 | TRAV17*01 | TRAJ40*01 | CATDTTSGTYKYIF | TRBV20-1*01 | TRBJ2-1*01 | CSARDLTSGANNEQFF | MDFARVHFISALHGSGG |
| 4h1l | DRA*01/DRB3*03 | TRAV8-3*01 | TRAJ37*01 | CAVGASGNTGKLIF | TRBV19*01 | TRBJ2-2*01 | CASSLRDGYTGELFF | QHIRCNIPKRISA |
| 1zgl | DRA*01/DRB5*01 | TRAV9-2*01 | TRAJ12*01 | CALSGGDSSYKLIF | TRBV5-1*01 | TRBJ1-1*01 | CASSLADRVNTEAFF | VHFFKNIVTPRTPGG |
*If there are multiple structures with the same TCR and HLA allele, only the ID of the highest-resolution structure is given. During CDR3β contact analysis, however, we combined the contacts from all redundant structures, downweighting so as to equalize the contribution from all TCR/HLA pairs.

Data availability

Data and analysis scripts needed to reproduce the findings of this study have been deposited in the Zenodo database (doi:10.5281/zenodo.1248193).

