Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity.
https://doi.org/10.7554/eLife.22057.001
The T-cell receptor (TCR), which is generated through random rearrangement of genomic V-D-J segments, is the mediator of specific antigen recognition by T lymphocytes. The collective variety of these receptors expressed by an individual, the TCR repertoire, reflects the state of the adaptive immune system and its history, as its composition changes throughout life in response to immune challenges. The individual TCR repertoire is shaped by biases in the process of VDJ recombination (Robins et al., 2010; Miles et al., 2011; Murugan et al., 2012; Ndifon et al., 2012), and by the subsequent expansion and deletion of certain T cell clones upon antigen recognition during T cell development in the thymus, and later in the periphery.
Here, we studied the organization of TCR repertoires using high-throughput TCR sequencing, comparing data from mice and humans. We focused on the CDR3 (complementarity-determining region 3) amino acid (AA) sequence of the TCRβ chain, which is the most diverse segment of the TCR and is positioned to interact with the antigenic peptide epitope presented by an MHC molecule (Davis and Bjorkman, 1988). The organization of TCR repertoires of individual mice and humans was evaluated using network analysis, where CDR3 sequences were connected based on their level of sequence similarity.
Initially, we constructed TCR networks from a dataset of TCRβ AA sequences obtained from splenic CD4+ T cells from 12 healthy C57BL/6 mice (Madi et al., 2014). We obtained on average about 30,000 different CDR3 sequences from each mouse, which were found at varying abundances and had an average length of 13.4 ± 1.4 (mean ± SD) AA. Figure 1A shows a network obtained using the thousand most frequent CDR3 sequences from a single mouse, which in terms of abundance correspond to 34% of the total sequences obtained for that mouse. CDR3 sequences (nodes) were connected (by edges) if they were separated by one amino acid difference (replacement/addition/deletion of one AA), that is, a Levenshtein distance of 1 (Levenshtein, 1966). A cluster was defined as a set of two or more nodes that are connected to each other by any number of edges and intermediate nodes (Figure 1A, inset). A similar analysis had previously revealed the existence of networks of B-cell immunoglobulin heavy-chains, which were attributed to clonally derived sequences generated by somatic hyper-mutations (SHM) (Ben-Hamo and Efroni, 2011; Bashford-Rogers et al., 2013). Our analysis demonstrated the existence of networks also for TCRβ sequences. As T cells do not undergo SHM, other factors must lead to the formation of TCR similarity networks.
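The network definition above (nodes joined when their CDR3 AA sequences are at Levenshtein distance 1, clusters as connected groups of two or more nodes) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study, which relied on R packages; the function names and the toy sequences below are hypothetical and serve only to make the definition concrete.

```python
from itertools import combinations


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one substituted residue.
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make a the shorter sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    # b must equal a with one extra residue inserted at position i.
    return a[i:] == b[i + 1:]


def build_network(sequences):
    """Adjacency map {sequence: set of neighbours} over the unique sequences."""
    nodes = sorted(set(sequences))
    adjacency = {s: set() for s in nodes}
    for a, b in combinations(nodes, 2):  # all pairs; fine for ~1000 nodes
        if within_one_edit(a, b):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clusters(adjacency):
    """Connected components containing at least two nodes."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen or not adjacency[start]:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


def summarize(adjacency):
    """Return (number of clustered nodes, number of edges)."""
    clustered = sum(1 for nbrs in adjacency.values() if nbrs)
    edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return clustered, edges


# Hypothetical toy sequences, not taken from the dataset.
toy = ["CASSLGQYEQYF", "CASSLGGYEQYF", "CASSLGYEQYF", "CASSPDRGAFF", "CASRDWGNTLYF"]
print(summarize(build_network(toy)), len(clusters(build_network(toy))))
```

The all-pairs comparison is quadratic in the number of sequences, which is acceptable for networks of about a thousand nodes but would need indexing for whole repertoires.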
We repeated this analysis for all 12 mice, and found that of the thousand most frequent CDR3 sequences in each mouse (with an accumulated frequency of 34.5 ± 8% of total sequences), 647 ± 104 (mean ±SD) were clustered, with 1282 ± 383 edges. In contrast, networks composed of a thousand randomly selected CDR3 sequences from a single mouse (with an accumulated frequency of 5 ± 0.7% of total sequences) were much sparser (Figure 1B), with only 225 ± 64 sequences clustered, and with 152 ± 52 edges (average values for 10 independent randomized sets of sequences). These results were not sensitive to the number of sequences used for the analysis (Figure 1—figure supplement 1).
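The comparison above rests on two ways of choosing a thousand sequences (the most frequent versus a random draw) and on the accumulated frequency of each choice. The following Python sketch shows one way to compute these quantities. It assumes that the random sets were drawn uniformly from the unique sequences, which the text does not specify, and all names and counts are hypothetical.

```python
import random


def top_n(counts, n):
    """The n most abundant sequences (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seq for seq, _ in ranked[:n]]


def random_n(counts, n, seed=0):
    """n unique sequences drawn uniformly at random (assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(sorted(counts), n)


def accumulated_frequency(counts, subset):
    """Fraction of all reads accounted for by the chosen sequences."""
    total = sum(counts.values())
    return sum(counts[s] for s in subset) / total


# Hypothetical read counts per CDR3 sequence.
counts = {"CASSLGQYEQYF": 50, "CASSLGGYEQYF": 30, "CASSPDRGAFF": 15, "CASRDWGNTLYF": 5}
print(accumulated_frequency(counts, top_n(counts, 2)))
print(accumulated_frequency(counts, random_n(counts, 2)))
```

Because abundant sequences are few, a uniform random draw of unique sequences captures a much smaller share of the reads than the top-ranked set, as reported in the text.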
To contrast the TCR networks with their BCR counterparts, we tested whether these networks are structurally similar. BCR networks have been shown to center around highly abundant clones, representing a snapshot of the individual-specific local evolution driven by SHM. However, we found no correlation (R² = 0.11 ± 0.07) between the abundance of a TCR CDR3 sequence and its degree of connectivity in the network (number of edges connecting it to other sequences). We further found that each cluster typically contained sequences of a single (or in some cases two) specific J segment (Figure 1—figure supplement 2). V usage, in contrast, was not cluster-specific; any cluster contained sequences with many different V segments (Figure 1—figure supplement 2). This reflects the higher number of V segments compared with J segments, as well as their lower overlap with CDR3 and the relative similarity of their 3’ ends. Networks of similar connectivity were obtained also for the top 1000 CDR3β sequences from CD8 T cells, and for CD4 T cells of a different mouse strain (C3H.HeSnJ), that bears a different MHC haplotype (H2k; Figure 1—figure supplement 3, Figure 1—figure supplement 4).
We found a parallel network organization also in human TCRβ repertoires: we analyzed previously published data containing the TCRβ repertoires of 39 human subjects of different ages (Britanova et al., 2014), and found that the most abundant CDR3 sequences formed connected clusters in human TCR repertoires (Figure 1C, Supplementary file 1, and Figure 1—figure supplement 1), though with a lower connectivity than that found in the similarity networks of inbred mice. From the thousand most frequent CDR3 sequences (accumulated frequency of 17.1 ± 6.6% of total sequences) in each of the 11 young human subjects in that study (ages 6–25 years), 207 ± 79 nodes were clustered, with 367 ± 201 edges. Networks composed of randomly selected sequences from the individual subjects generated only 8 ± 4 clustered nodes with 4 ± 2 edges. We thus conclude that these newly discovered TCR similarity networks are likely to be driven by conserved evolutionary forces, as opposed to BCR networks that are generated by SHM that operates within individuals.
Next, we tested whether these TCR networks reflect our previous finding that TCRβ CDR3 AA sequences express a range of sharing levels between individual mice. As a measure of sharing level, we used a reference dataset of 28 mice (Madi et al., 2014) and assigned to each CDR3 AA sequence in a network a sharing level ranging from 1 (private, found in only one mouse in the reference dataset) to 28 (public, found in all 28 mice in the reference dataset) (Madi et al., 2014). Interestingly, we found a strong association between the sharing level of a CDR3 sequence and its connectivity in the network: highly shared sequences are positioned at the center of network clusters (Figure 1A). This is indicated by a statistically significant correlation between the degree of node connectivity (number of edges connecting it to other nodes in the network) and its sharing level (Figure 1D), (R = 0.69 ± 0.03, p-value<2.2e-16; see also Supplementary file 1). An independent method for estimation of node centrality, betweenness centrality, confirmed the correlation between CDR3 sharing and centrality for the 1000 most abundant CDR3 sequences, but not for a random set of expressed sequences (Figure 1—figure supplement 5, Supplementary file 1). As in mice, public CDR3 sequences in humans manifested a higher degree of connectivity than did more private sequences (Figure 1C, Figure 1—figure supplement 6), and sequence abundance was not correlated with its level of connectivity (Supplementary file 1). Thus, private and public CDR3 sequences are distributed differently across the mouse and human networks: public sequences are highly connected to other similar sequences and are more central in network clusters; in contrast, more private sequences are found at the edges of clusters, or as un-connected nodes, with rare similarity to other sequences in the network.
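The sharing level described above is simply the number of individuals in the reference dataset that carry a given CDR3 AA sequence. The sketch below computes it and correlates it with node degree. The text reports a correlation coefficient R without naming the estimator, so the use of Pearson's r here is an assumption; the function names and toy values are hypothetical.

```python
from collections import Counter
from math import sqrt


def sharing_levels(repertoires):
    """Map each sequence to the number of individuals in which it occurs."""
    levels = Counter()
    for repertoire in repertoires:
        levels.update(set(repertoire))  # count each individual once
    return levels


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical reference set of three individuals.
reference = [{"CASSLGQYEQYF", "CASSPDRGAFF"}, {"CASSLGQYEQYF"}, {"CASSLGQYEQYF", "CASRDWGNTLYF"}]
levels = sharing_levels(reference)

# Hypothetical node degrees in one individual's network.
degree = {"CASSLGQYEQYF": 5, "CASSPDRGAFF": 1, "CASRDWGNTLYF": 0}
nodes = sorted(degree)
print(pearson_r([levels[s] for s in nodes], [degree[s] for s in nodes]))
```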
These findings of a similar organization of mouse and human TCR networks prompted us to look for the existence of shared CDR3β sequences between the two species. Interestingly, we found that a substantial number of TCRβ CDR3 AA sequences were shared by mice and humans. Out of 5,247,785 unique AA sequences in the human dataset (11 young individuals) and 371,977 in the mouse dataset (28 animals), 27,337 were shared by at least one mouse and one human individual. In general, CDR3 sequences with a higher level of sharing in mice were found to have an increased probability of being found in human repertoires; similarly, sequences more shared in humans were found more frequently in mice (Figure 2A, Figure 2—figure supplement 1). Of note, more than 25% of the public CDR3 sequences (found in all 11 young human subjects, or found in all 28 mice) were found also in at least one individual of the other species (Figure 2A).
We defined a set of cross-species (CS) public CDR3 sequences that were public or relatively public in both mice (found in at least 25 of the 28 mice) and humans (found in all 11 young individuals). All these 86 CS-public sequences contained the human Jβ2.7 or Jβ2.3 segments, and the mouse Jβ2.5 or Jβ2.7 segments. V usage was dominated by Vβ20.1 in humans, but a more diverse V usage was observed in mice. Examples of CS-public sequences are shown in Figure 2B. The CS-public CDR3 sequences manifested a significantly higher degree of connectivity in human and mouse networks than did CDR3 sequences that were public only in humans, only in mice or not public in either (Figure 2C,D and Figure 2—figure supplement 2). Moreover, we found a significant correlation between the mean degrees of CS-public sequences in mouse and human networks (Figure 2—figure supplement 3); CS-public sequences that have more neighbors in mouse networks also tended to have more neighbors in human networks, suggesting an evolutionarily conserved network structure. We note that while CS-public sequences are central in network clusters, their frequency is not higher than that of other public sequences that are found only in humans or in mice. These findings suggest that similar driving forces may generate and expand particular public CDR3 TCR sequences that contain conserved sequence motifs in the two species.
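The CS-public definition above amounts to intersecting two thresholded sets: sequences present in at least 25 of the 28 mice and sequences present in all 11 young humans. A minimal Python sketch of that selection follows; the function names and the scaled-down toy repertoires are hypothetical.

```python
from collections import Counter


def public_in(repertoires, min_individuals):
    """Sequences found in at least min_individuals of the given repertoires."""
    levels = Counter()
    for repertoire in repertoires:
        levels.update(set(repertoire))
    return {seq for seq, k in levels.items() if k >= min_individuals}


def cs_public(mouse_repertoires, human_repertoires, min_mice, min_humans):
    """Sequences that pass the sharing threshold in both species."""
    return public_in(mouse_repertoires, min_mice) & public_in(human_repertoires, min_humans)


# Scaled-down toy example: 3 mice (threshold 2) and 2 humans (threshold 2).
mice = [{"CASSLGQYEQYF", "CASSPDRGAFF"}, {"CASSLGQYEQYF"}, {"CASSPDRGAFF", "CASRDWGNTLYF"}]
humans = [{"CASSLGQYEQYF", "CASSFSTDTQYF"}, {"CASSLGQYEQYF"}]
print(cs_public(mice, humans, min_mice=2, min_humans=2))
```

With the thresholds stated in the text, the call would be `cs_public(mice, humans, min_mice=25, min_humans=11)`.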
To further characterize the mechanisms that contribute to the generation of CS-public sequences, we evaluated their existence in synthetic TCR repertoires that simulate the random generation of TCR sequences (see methods). These simulations do not include any clonal selection, thus they allow discrimination between genetic mechanisms that influence the generation of TCRs and selection mechanisms that shape it somatically. We generated 100 datasets of simulated repertoires of 28 mice and 11 humans, the sizes of which matched the sizes of the experimental repertoires. The simulated repertoires contained a somewhat larger number of CS-public CDR3 sequences than observed in the experimental data (average of 221 ± 9 in the simulations, vs. 86 in the data). The simulated CS-public sequences contained the same restricted set of mouse and human J segments, which are highly similar between the two species (J2.7 mouse and human; J2.5 mouse/J2.3 human). Thus, sequence homology of J segments contributes to the formation of CS-public TCRs, but is not sufficient by itself, and is accompanied by other mechanisms that induce bias in the recombination process (e.g. biased V segment usage, statistics of nucleotide deletions and insertions at V-D and D-J junctions). We also asked whether the simulated repertoires contained the same CS-public sequences as those observed experimentally. We found that 54 out of the 86 experimentally observed CS-public sequences were identical to simulated CS-public sequences, while 32 were not CS-public in the simulations (Figure 2—figure supplement 4). The partial overlap between simulations and data may result from inaccuracies in the assumptions of the simulations regarding the random TCR generation process, or indicate that selection mechanisms in the thymus and in the periphery further influence the existence of specific CS-public sequences.
We further evaluated the similarity between public sequences by analyzing the level of connectivity within a network composed of the most highly shared CDR3 sequences. A network formed by the 1000 most public mouse sequences (found in >25 of the 28 mice) was highly connected, with 965 clustered nodes and 3387 edges (Figure 3A). In contrast, networks formed by the 1000 most abundant private sequences (found in only one of the 28 mice) were very sparse, manifesting only 38 ± 15 clustered nodes and 20 ± 7 edges (mean ± SD, averaged over 28 mice). Similarly, a network formed by the 1000 most public human CDR3 sequences was also highly connected (with 969 clustered nodes and 4398 edges, Figure 3B).
The functional TCR is formed by a complex of TCR alpha and beta chains (Davis and Bjorkman, 1988), hence one cannot attribute specific antigen recognition to CDR3β segments alone. Moreover, the current level of understanding precludes the development of general predicting tools that can computationally relate a TCR sequence to an antigen that it recognizes. Defining TCR antigen specificity is further complicated by substantial TCR cross-reactivity (Burrows et al., 1997; Wooldridge et al., 2012). Yet, TCRβ sequences that bind the same pMHC antigen do contain shared CDR3β sequence motifs (Klinger et al., 2015; Chen et al., 2017; Sun et al., 2017; Tickotsky et al., 2017). Thus, some insight on antigen specificity can be gained by linking the sequence-similarity networks to previously annotated TCR sequences. We have reported that 124 of the CDR3β sequences in our mouse dataset were associated with various mouse immune reactivities previously described in the literature (Madi et al., 2014). As a step towards relating antigen specificity to the clusters of public CDR3 sequences, we looked for these 124 annotated CDR3β sequences within the clusters of shared CDR3 sequences. The annotated sequences were grouped according to four categories: a) Immunity to foreign pathogens; b) Allograft reactions; c) Tumor-associated T cells; and d) Autoimmune conditions. Figure 3A includes these annotations in the network formed by the 1000 most public CDR3β sequences. Out of the 124 annotated sequences, 63 were either identical to one of the existing nodes (n = 11), or linked to an existing node by a Levenshtein distance of 1 (n = 52). The clustered annotated nodes were found to be enriched with annotations related to self or self-like autoimmune, cancer or allograft reactions (self-related: 51/63 = 81% of network-clustered sequences vs. 85/124 = 69% in all 124 annotated sequences, compared to non-self: 12/63 = 19% in clusters vs. 39/124 = 31%; Fisher exact test p=0.0035).
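The enrichment figures quoted above can be arranged as a 2×2 table for a Fisher exact test. The sketch below assumes one plausible layout (clustered versus non-clustered annotated sequences, by self-related versus non-self annotation), with the non-clustered counts obtained by subtraction: 85 − 51 = 34 self-related and 39 − 12 = 27 non-self. The text does not state how the table was constructed, so this layout is an assumption and the result is not claimed to reproduce the reported p-value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )


# Assumed layout: rows = clustered / not clustered, columns = self-related / non-self.
clustered_self, clustered_nonself = 51, 12
unclustered_self, unclustered_nonself = 85 - 51, 39 - 12
print(fisher_exact_two_sided(clustered_self, clustered_nonself,
                             unclustered_self, unclustered_nonself))
```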
We find that sequences with a similar annotation tended to be linked in the same cluster. Examples include twelve sequences of tumor infiltrating regulatory T cells (Sainz-Perez et al., 2012) which were found in cluster #2; six COPD related CDR3 sequences (Motz et al., 2008) in cluster #6; and four CDR3 sequences connected with cluster #2 that were associated with type 1 diabetes in NOD mice in two different studies (Nakano et al., 1991; Tikochinski et al., 1999). However, different annotations can also be found in the same cluster (Figure 3A); for example, mouse CDR3 sequences associated with experimental autoimmune encephalomyelitis (EAE; [Menezes et al., 2007]) and collagen-induced arthritis (CIA; [Osman et al., 1993]) were also connected to cluster #2. Figure 3B shows that many previously annotated self/self-like sequences of humans and mice were also linked to clusters in the network of public human sequences. Thus, the CDR3 clusters, which serve as repertoire foci, seem to be enriched with TCR sequences that are associated with self (or self-like) reactivities, whereas pathogen-associated TCR sequences are less clustered and so tend to be more evenly spread throughout sequence space.
To analyze mechanisms involved in network formation, we investigated the contribution of antigen selection using two complementary approaches. First, we analyzed similarity networks formed by CDR3 sequences of CD4-CD8-double-negative (DN) thymocytes. Rearranged TCRβ chains in DN cells are not subject to MHC-dependent selection, which only occurs at later stages of thymic development. We found that networks formed by DN CDR3 sequences were significantly less connected compared to splenic CD4+ T cells, which have undergone antigen selection (Figure 4A and Supplementary file 2). In addition, DN thymocytes and CD4+ spleen T cells manifested different levels of convergent recombination (Venturi et al., 2006, 2008). Public CDR3 AA sequences in DN thymocytes were encoded on average by a low number of nucleotide (nt) sequences, whereas the same AA sequences were encoded by a much larger number of nt sequences in CD4+ splenic T cells (Figure 4C, Figure 4—figure supplement 1). The finding of relatively increased network clusters in T cells that have undergone antigen selection suggests that the CDR3 AA sequences that are found within clusters are positively selected; this antigen selection would extend any underlying physical bias generated during TCR DNA recombination in the thymus (Murugan et al., 2012; Ndifon et al., 2012).
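Convergent recombination, as used above, can be quantified as the number of distinct nt sequences that encode the same CDR3 AA sequence. The following Python sketch counts these per AA sequence from paired (AA, nt) records. The input format, function name and toy records are hypothetical; the nt strings are placeholders and are not real translations.

```python
from collections import defaultdict


def convergent_recombination(records):
    """Map each AA sequence to the number of distinct nt sequences encoding it.

    `records` is an iterable of (aa_sequence, nt_sequence) pairs.
    """
    variants = defaultdict(set)
    for aa, nt in records:
        variants[aa].add(nt)
    return {aa: len(nts) for aa, nts in variants.items()}


# Hypothetical records; nt strings are placeholders, not real codons.
records = [
    ("CASSLGQYEQYF", "nt_variant_1"),
    ("CASSLGQYEQYF", "nt_variant_2"),
    ("CASSLGQYEQYF", "nt_variant_2"),  # duplicate read of the same variant
    ("CASSPDRGAFF", "nt_variant_3"),
]
print(convergent_recombination(records))
```

Comparing these counts for the same public AA sequences in two cell populations gives the kind of contrast described above between DN thymocytes and splenic CD4+ T cells.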
To further study the impact of selection, we evaluated TCR networks formed in the repertoires of splenic T cells from mice lacking four elements needed for physiological MHC-dependent antigen selection: MHC-I and -II molecules together with CD4 and CD8 co-receptor molecules, so-called Quad-KO mice (Van Laethem et al., 2007, 2013). In contrast to wild-type (WT) mice, the TCR of Quad-KO mice are selected by MHC-independent ligands in the thymus and their T cells express a diverse MHC-independent TCR repertoire in the periphery (Van Laethem et al., 2007; Tikhonova et al., 2012; Van Laethem et al., 2013). We found that similarity networks formed by the top 1000 CDR3 sequences from Quad-KO mice were significantly less connected than those of the WT strain (C57BL/6) measured in the same set of experiments (Figure 4A and Supplementary file 2). Together, these findings indicate that MHC-dependent thymic selection plays a significant role in promoting the formation of dense clusters of TCR-similarity networks. Lack of MHC-dependent selection in DN thymocytes and in Quad-KO mice is associated with TCR networks of reduced connectivity; in contrast, TCRs that are subject to MHC selection form dense networks with a higher level of convergent recombination. Thus, recombination biases combined with clonal selection generate a TCR repertoire that is not uniform, but rather focused in specific regions of sequence space that are preferentially associated with self-related antigen-reactivities.
Following these observations, we tested if the relative abundance of CS-public clonotypes is increased by MHC-dependent selection. To this end, we compared the frequency of CS-public sequences in repertoires of Quad-KO mice and DN thymocytes to those of control WT mice (Figure 4B). The cumulative frequencies of the CS-public CDR3 sequences did not differ significantly between two sets of experiments done with WT mice (the 28 WT mice used in the network analysis, and the WT mice used as controls in the Quad-KO experiment; P value = 0.293). On the other hand, the Quad-KO repertoires exhibited a lower total frequency of the CS-public CDR3s compared with both the 28 WT mice (P value = 4.318e-09) and the Quad-WT mice (P value = 0.01781). The cumulative frequency in DN thymocytes showed a similar trend, but the difference was not statistically significant (P value = 0.1877). Together, these results indicate that, although sequence homology of V and J germline segments between mice and humans and bias in the recombination process influence the probability for a sequence to be shared between the two species, additional selection forces influence its abundance.
Since the composition of the TCR repertoire of an individual changes in response to immune challenges throughout life, we tested the effects of both immunization and aging on the network organization of the TCR repertoire. We immunized naïve mice with p277, a self peptide derived from HSP60 (heat shock protein 60), or with a foreign peptide, derived from ovalbumin (OVA). Peptide p277 was previously found to be recognized by the C9 public TCR in NOD mice (Tikochinski et al., 1999), and the CDR3β sequence of the C9 clone was also public in C57BL/6 mice (Madi et al., 2014). Additionally, we analyzed the network structures in the TCR repertoires of T cells from the immunized mice that were further cultured in vitro with antigen presenting cells loaded with the specific peptide. The distribution of sequence abundances and repertoire evenness were evaluated using the Gini inequality coefficient, which ranges from 0 for a repertoire where every sequence is present in equal abundance, to 1 for a repertoire dominated by a single sequence, with other sequences present at zero abundance (Bashford-Rogers et al., 2013; Thomas et al., 2013).
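The Gini inequality coefficient described above can be computed directly from the list of sequence abundances. The sketch below uses the standard rank-based formula for sorted values; it is an illustration rather than the implementation used in the study, which relied on R packages. Note that with this formula a finite sample of n sequences reaches at most (n − 1)/n, which approaches 1 as n grows.

```python
def gini(abundances):
    """Gini coefficient of a list of non-negative abundances.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and i running from 1 to n.
    """
    xs = sorted(abundances)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one sequence with non-zero abundance")
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))   # perfectly even repertoire
print(gini([0, 0, 0, 10]))  # repertoire dominated by a single sequence
```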
We found that immunization with either peptide resulted in repertoires that contained a set of expanded CDR3 sequences and had an increased abundance inequality. In vitro re-stimulation further increased inequality (Figure 5A–C and Supplementary file 3). This inequality was associated with the emergence of private clones that dominated the post-immunization repertoire, such that the relative weight of public clones was reduced (Figure 5E). Interestingly, immunization was also associated with network disruption; the number of clustered nodes and the number of edges both fell after immunization in vivo and fell further after in vitro re-stimulation (Figure 5D, Figure 5—figure supplement 1). Both the increased inequality and the decreased network connectivity reversed spontaneously in the OVA-immunized mice 2 months following immunization (Figure 5D,E (right), Figure 5—figure supplement 1). Similar to immunization, repertoires in aged mice (Figure 5F, Figure 5—figure supplement 2) and in aged humans (Figure 5G, Figure 5—figure supplement 3) were more unequal and less connected than those of young individuals, and private CDR3 sequences became relatively more abundant with age (Figure 5—figure supplement 4). Altogether, we found a strong anti-correlation between the Gini Coefficient of TCR inequality and the number of connected nodes in TCR networks in mice (Figure 5F, Spearman correlation = −0.661) and in humans (Figure 5G, Spearman correlation = −0.865).
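The anti-correlation reported above is a Spearman rank correlation between the Gini coefficient of each repertoire and its number of connected nodes. A minimal Python sketch of that statistic follows, computed as the Pearson correlation of average ranks. The per-individual values in the example are hypothetical and are not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation of two equal-length numeric lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-individual values: Gini coefficient and clustered-node count.
gini_values = [0.20, 0.35, 0.50, 0.70]
clustered_nodes = [650, 600, 400, 380]
print(spearman(gini_values, clustered_nodes))
```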
Another factor that impacted network structure was immune checkpoint blockade. We used published CDR3β sequence data (Robert et al., 2014) from subjects who had undergone CTLA4 (cytotoxic T-lymphocyte-associated protein 4) blockade with tremelimumab. Previous analysis of these data showed that this treatment diversified the peripheral T-cell pool. Applying TCR similarity network analysis, we now show that the 1000 most abundant CDR3 sequences after checkpoint blockade are less connected than pre-treatment (p-value < 0.05, paired Wilcoxon signed-rank test; Figure 5H left); moreover, this reduction in connectivity was detected concurrently with a decrease in the number of public CDR3 sequences and an increase in the frequency of private ones (p-value = 0.01947, paired Wilcoxon signed-rank test; Figure 5H right, Figure 5—figure supplement 5). Thus, broadening of the peripheral repertoire following CTLA4 blockade reduces the presence of public clones and enhances the expansion of private clones, similar to the changes we observed in aging or after immunization. This finding raises the possibility that checkpoint-associated immune regulation could also be involved in the prominence of network connectivity of public T cells. Finally, we analyzed TCR repertoires of patients with the autoimmune disease Juvenile Idiopathic Arthritis (JIA) (Henderson et al., 2016). We found that there was a strong increase of public (network-promoting) TCRs in the peripheral blood of JIA patients compared to healthy donors (P value = 0.0006, Figure 5I). Thus, while immune perturbations such as immunization and aging lead to reduced levels of public clonotypes and a reduction in network connectivity, this specific autoimmune condition is associated with an increased level of public clones which are putatively associated with self-antigens.
Our application of network analysis to TCRβ CDR3 sequencing data reveals a hitherto unrecognized structure of the TCR repertoire in both mice and humans: In young, healthy individuals, the most abundant TCRβ CDR3 sequences are distributed unevenly in sequence-space, with clusters centered around public CDR3s, and in particular around CS-public sequences, which are public both in mice and humans (Figure 5J top-right, even and focused repertoire). The clustering of the most abundant CDR3 sequences in young and healthy individuals results in a repertoire that is much more restricted than would be expected from the random process of TCR somatic recombination. This basic network architecture is modified by immunization and aging due to the dominant expansion of more private CDR3 clonotypes. Thus, public CDR3s that serve as hubs of the TCR networks become less prominent, leading to reduced connectivity of the TCR networks combined with a more skewed repertoire (Figure 5J bottom-left, skewed and spread repertoire). We find that network organization and repertoire evenness are restored with the resolution of immune responses. It might be the case that incomplete resolution of immune responses throughout life leads to accumulation of changes in the TCR repertoire that eventually result in the skewed and spread (less clustered) repertoires that we observe in aged individuals. Interestingly, TCR repertoires from patients with the autoimmune condition JIA showed increased levels of public TCR sequences. This aligns with our observation that public TCR networks are enriched with self-associated TCRs. Taken together, our analysis supports the idea that the level of network connectivity, frequency of public TCRs and repertoire evenness are linked to each other, and are concurrently modulated by the individual’s immune state (disease/immunization/aging).
Mechanistically, we found that MHC-dependent antigen selection contributes to the formation of dense networks, since reduced network connectivity was observed both in pre-selection DN thymocytes and in Quad-KO mice, which lack MHC-dependent selection. These results can be explained by preferential selection and increased survival, in both the thymus and periphery, of T cells that carry specific CDR3 sequences that recognize self-antigens presented by MHC molecules. Different T cell clones, which carry different CDR3 nt sequences but encode the same AA sequence, would appear to enjoy a common selective advantage and accumulate in the peripheral repertoire. This mechanism can explain our observations of increased convergent recombination in splenic CD4+ T cells compared to DN thymocytes (Figure 4—figure supplement 1). Antigen selection can also account for the enhanced network connectivity of TCRs that differ by one AA in their CDR3 sequences; such related CDR3 sequences may be selected by the same peptide-MHC complex, albeit with different affinities (Moss et al., 1991; Serana et al., 2009; Zoete et al., 2013). This working hypothesis needs to be tested experimentally to see if linked CDR3 sequences really cross-react with the same or similar peptide-MHC complexes. MHC-antigen selection of public CDR3 sequences takes place on a background of biases in the biophysical process of DNA recombination (Elhanati et al., 2014). Combined, these processes lead to the formation of dense network clusters of the most abundant public TCR sequences, as we report here. In contrast, the most abundant private TCR sequences generate poorly connected networks.
B cell receptor (BCR) sequences (Ben-Hamo and Efroni, 2011; Bashford-Rogers et al., 2013), unlike the T-cell repertoire networks we disclose here, have long been known to generate networks in individual subjects by affinity maturation that is mediated by SHM; T cells do not undergo SHM so TCR networks must be generated in the developmental process. Thus, dominant and public T cell clonotypes have a higher sequence similarity than non-dominant and private ones. In contrast, BCR networks have a distinct structure resulting from the SHM process, in which abundance and degree are correlated, which is not the case in TCR networks.
Our finding that TCR CDR3 networks include identical and related sequences that are not confined to individuals but are shared by most individuals of the same species and even cross the species divide between mice and humans, suggests the likelihood of some fundamental evolutionary advantage in such sequences. As noted above, antigen specificity of a TCR cannot be defined based on its CDR3β alone. However, the same or very similar CDR3β sequences are frequently observed within repertoires of T cells specific for a given antigen, in combination with flexible or preferential pairing with TCRα (Klinger et al., 2015; Chen et al., 2017; Tickotsky et al., 2017). Hence, we hypothesize that T cell clones bearing the conserved, CS-public, CDR3 sequences recognize similar antigenic epitopes that are conserved across species. These antigens may be derived from evolutionarily conserved regions of self proteins, forming a core of T cell reactivities to specific self epitopes, with potential implications for self-maintenance, autoimmunity and cancer. Further studies relating TCRα, TCRβ and peptide specificity will make it possible to test this hypothesis experimentally.
Our results indicate that T lymphocytes ‘focus their attention’ on specific regions in sequence space. These new findings on the organization of TCR repertoires and their dynamics raise intriguing questions. For example, does the existence of network clusters indicate a healthy immune state? Can restoration of network structure reinstate immune function in the elderly or prevent excess inflammation and autoimmune disease? The theory of the immunological homunculus composed of self-recognizing B cells and T cells (Cohen, 1992, 2000) might be relevant here.
Female C57BL/6 mice, 5–8 weeks old, were obtained from Harlan Laboratories. Analysis of TCR sequences from aged mice is based on data that were previously described in Shifrut et al. (2013). Analysis of TCR sequences from repertoires that are not subject to MHC-dependent selection is based on two sources: Quad-KO mice, which lack four elements needed for physiological MHC-dependent antigen selection (MHC-I and -II molecules together with CD4 and CD8 co-receptor molecules), together with matched control WT mice (Van Laethem et al., 2007, 2013); and DN thymocytes, which represent the landscape of generated TCRs before thymic selection.
A dataset of 39 healthy Caucasian donors, ages 6–90 years, was obtained from Britanova et al. (2014). CTLA4 blockade data were obtained from Robert et al. (2014). Juvenile Idiopathic Arthritis (JIA) data of patients compared to healthy donors were obtained from Henderson et al. (2016).
Mice were injected intraperitoneally (IP) with 100 μg of either chicken ovalbumin (OVA) or peptide 277 (p277) emulsified in CFA (1:1 ratio). Spleens were harvested on day 7 post immunization and T cells were extracted for TCR analysis. In vitro stimulation: T cells from spleens of immunized mice were harvested on day 7 and were re-stimulated with irradiated splenocytes and the relevant peptide antigen. Five of the OVA-immunized mice received a boost IP injection of 100 μg OVA + CFA on day 14, and spleens were harvested on day 60 for TCR analysis (Supplementary file 3).
Libraries were prepared and pre-processed as published (Ndifon et al., 2012). Briefly, T cells were purified from splenocytes by magnetic bead separation, and total RNA was extracted and reverse transcribed using a TCR Cβ-specific primer linked to the 3' Illumina sequencing adapter. cDNA was amplified by PCR with a Cβ-3' adapter primer and a set of 20 Vβ-specific 5' primers, followed by ligation of a 5' Illumina adapter and a second PCR using universal primers for the 5' and 3' Illumina adapters. The libraries were sequenced using Genome Analyzer II or HiSeq 2000 (Illumina). Sequence filtering, VDJ annotation, normalization and translation to AA sequences were performed as published (Ndifon et al., 2012). Libraries for TCR-seq of Quad mice and C57BL/6 controls were sequenced on Illumina sequencers by Adaptive Biotechnologies Corp (Seattle, WA). In brief, αβ T cells were isolated by cell sorting, washed in PBS and lysed in Trizol. RNA was extracted using the RNeasy protocol (Qiagen), and 2 µg per sample was reverse transcribed to cDNA by oligo(dT) priming with the SuperScript III First-Strand Synthesis System (Invitrogen). cDNA was sequenced by Adaptive Biotechnologies Corp.
Statistical analysis was performed using R software (R Core Team, 2013). We used the following packages: ‘ShortRead’ (Morgan et al., 2009) for the pre-processing pipeline; ‘ineq’ (Zeileis, 2012) and ‘reldist’ (Handcock, 2014) to calculate the Gini coefficient; ‘igraph’ (Csardi and Nepusz, 2006) to create network objects and obtain the degree and betweenness of each node; ‘stringdist’ (van der Loo, 2014) to calculate Levenshtein distances; and ‘ggplot2’ (Wickham, 2009) for generating figures. The statistical tests performed are stated in the text. All network figures were made using Cytoscape (http://www.cytoscape.org/) (Cline et al., 2007; Smoot et al., 2011; Saito et al., 2012).
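The following is a minimal illustrative sketch of the kind of computation described above, not the authors' pipeline. The published analysis used R (‘stringdist’, ‘igraph’, ‘ineq’); this sketch uses plain Python instead. It assumes that two CDR3 sequences are linked when their Levenshtein distance is at most one, which is how the clusters are described in the decision letter. The function names (`levenshtein`, `build_network`, `clusters`, `gini`) and the example sequences and clone counts are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def build_network(seqs, max_dist=1):
    """Adjacency sets linking sequences within max_dist edits (assumed threshold)."""
    adj = {s: set() for s in seqs}
    for s, t in combinations(adj, 2):
        if levenshtein(s, t) <= max_dist:
            adj[s].add(t)
            adj[t].add(s)
    return adj


def clusters(adj):
    """Connected components of the network, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def gini(counts):
    """Gini coefficient of clone counts (0 means a perfectly even repertoire)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical CDR3 amino acid sequences, for illustration only.
demo = ["CASSLGGYEQYF", "CASSLGAYEQYF", "CASSLGGYEQF", "CASSPDRNTEVFF"]
network = build_network(demo)
degree = {seq: len(neighbours) for seq, neighbours in network.items()}
print(degree)
print(clusters(network))
print(gini([120, 30, 5, 1]))  # hypothetical clone counts
```

The all-pairs comparison is quadratic in the number of sequences, which is acceptable for a sketch but would need indexing or filtering by length for full repertoires. Betweenness is omitted here; a graph library would normally supply it.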
The sequence data from this study have been made publicly available (https://usegalaxy.org/u/erezgrn/h/network-tcrs).
Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. The Journal of Immunology 192:2689–2698. https://doi.org/10.4049/jimmunol.1302064
Cross-reactive memory T cells for Epstein-Barr virus augment the alloresponse to common human leukocyte antigens: degenerate recognition of Major histocompatibility complex-bound peptide by T cells and its role in alloreactivity. European Journal of Immunology 27:1726–1736. https://doi.org/10.1002/eji.1830270720
The cognitive principle challenges clonal selection. Immunology Today 13:441–444. https://doi.org/10.1016/0167-5699(92)90071-E
Tending Adam’s Garden: Evolving the Cognitive Immune Self. London: Academic Press.
R: A Language and Environment for Statistical Computing. Vienna.
The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10:707–710.
A public T cell clonotype within a heterogeneous autoreactive repertoire is dominant in driving EAE. Journal of Clinical Investigation 117:2176–2185. https://doi.org/10.1172/JCI28277
Bias in the αβ T-cell repertoire: implications for disease pathogenesis and vaccination. Immunology and Cell Biology 89:375–387. https://doi.org/10.1038/icb.2010.139
T cell receptor V gene usage of islet beta cell-reactive T cells is not restricted in non-obese diabetic mice. Journal of Experimental Medicine 173:1091–1097. https://doi.org/10.1084/jem.173.5.1091
Characterization of the T cell receptor repertoire causing collagen arthritis in mice. Journal of Experimental Medicine 177:387–395. https://doi.org/10.1084/jem.177.2.387
Relative distribution methods, 1.6-3.
CTLA4 blockade broadens the peripheral T-cell receptor repertoire. Clinical Cancer Research 20:2424–2432. https://doi.org/10.1158/1078-0432.CCR-13-2648
Overlap and effective size of the human CD8+ T cell receptor repertoire. Science Translational Medicine 2:47ra64. https://doi.org/10.1126/scitranslmed.3001442
A shared TCR CDR3 sequence in NOD mouse autoimmune diabetes. International Immunology 11:951–956. https://doi.org/10.1093/intimm/11.6.951
The stringdist package for approximate string matching. The R Journal 6:111–122.
Ggplot2: Elegant Graphics for Data Analysis. New York: Springer.
A single autoimmune T cell receptor recognizes more than a million different peptides. Journal of Biological Chemistry 287:1168–1177. https://doi.org/10.1074/jbc.M111.289488
Ineq: Measuring Inequality, Concentration, and Poverty.
Structure-Based, rational design of T Cell Receptors. Frontiers in Immunology 4:268. https://doi.org/10.3389/fimmu.2013.00268
Arup K Chakraborty, Reviewing Editor; Ragon Institute of MGH, MIT and Harvard, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public sequences" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Arup Chakraborty as the Reviewing Editor and Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
The paper presents insightful analyses of T cell receptor sequence repertoires. Network analysis is used to identify clusters of CDR3β sequences, which are over-represented in peripheral T cell repertoires and are often found in multiple individual mice or humans (so-called public TCRs). The analysis primarily concerns sequence data that were collected by the authors from a few distinct strains of mice (inbred wild type, plus a couple of immune system knockout strains). In addition, published human repertoire data are used to explore the cross-species relevance of results obtained by analyzing mouse data. The authors focus on the most abundant sequences in the repertoire derived from individual animals (roughly the top 0.3 percent). They then ask whether these abundant sequences are distinguished by any global characteristics.
The authors find that the abundant sequences can largely be decomposed into subsets such that every sequence in a subset is connected to some other sequence in the same subset by at most one substitution, insertion or deletion. The second observation is that the large clusters contain (or have sequences that are one substitution/insertion/deletion away from) members of a group of 124 TCR sequences that have previously been annotated as responders to specific, identifiable antigens. Interestingly, and enigmatically, the annotated TCR sequences that connect to these clusters mostly have annotations associated with self-reactivity. The authors develop these basic observations in several directions: 1) they show that a group of sequences selected for being present in multiple individual mice (25 out of 28) have similar properties of clustering and association with known antigens; 2) they show that similar clustering and antigen association of abundant TCR sequences occurs in human data and that there is strong overlap between abundant sequences in the two species; 3) they perform related analyses on mice in which various elements of the adaptive immune system are knocked out and show that the sequence cluster organization of abundant sequences is not present if T cell activation is not possible (by knocking out the antigen-presenting MHC complex, for example); in other words, the sequence cluster organization is a product of T cell activation and response; 4) they go on to show that T cell selection provides a competitive advantage for selecting T cells that carry the more frequent TCR CDR3 sequences, i.e., T cell selection limits diversity by selecting against thymocytes expressing low frequency, highly variable CDR3βs.
In addition, the authors provide evidence that T cell primary responses, CTLA4 checkpoint blockade and aging disrupt the "normal" TCR CDR3β frequencies; i.e., the immune response diversifies the hierarchy of T cell CDR3β sequence frequencies, presumably by expanding low-frequency T cells specific for particular antigens. The authors speculate that these frequency networks are reflective of, or perhaps required for, proper T cell immune homeostasis.
While the paper is interesting, a number of points need to be addressed.
Major points to be addressed:
1) There is an over-emphasis on describing the network with minimal provision of primary data; e.g., it would be helpful to provide the actual sequences of each node.
2) Given the subject matter, there is a general lack of discussion regarding V-D-J recombination. Specifically, the following points need to be clarified:
Preface: the human and mouse Db1 and Jb2.7 genes are 100% homologous (Db1 nucleotide, Jb2.7 AA sequence). Human Jb2.3 is highly similar to mouse Jb2.5, and identical if 2 AA of the Jb are "chewed back", which is relatively common during V-D-J rearrangement. Because of this sequence homology, CDR3s made from template-only V-D-J recombination using many Vbs, Db1 and Jb2.7 will by definition be identical in mouse and human. It stands to reason that insertions/deletions of these gene segments during recombination will also generate the identical sequences at a reasonable frequency.
"We discovered an unexpected number of public CDR3-TCRβ segments that were identical in mice and humans." Is this more so than would be expected given the extensive sequence homology between mouse and human Db/Jbs?
"These findings propose that similar driving forces may generate and expand particular public CDR3 TCR sequences that contain conserved sequence motifs in the two species." Given that template only V-D-J recombination of Db1 and Jb2.7 (or Jb2.3) would give identical TCRb CDR3 sequences, isn't sequence homology the evolutionary basis of public CS CDR3s?
3) Clarity of discussion of how CDR3β sequences relate to antigen specificity of a TCR. This evidence needs to be spelled out a bit in the main text. The reader is referred to other papers, but the point is so important that it would be appropriate to have a self-contained summary exposition in the paper itself.
4) Given that several aspects of novelty that the authors are claiming are known in other contexts or are predictable, the authors should directly test their hypothesis that disrupted TCR CDR3β networks are at the minimum a "biomarker" for the disease state; e.g., are there TCR CDR3β network signatures of chronic infection? There are several mouse models (LCMV, TB etc.) or human conditions that could be used as source material.
5) This point concerns Figure 3A and the discussion around it. It is not clear why all the nodes in each cluster are colored in the same way. Only a few nodes in a given cluster are identical to, or one step away from, one of the 124 annotated TCR sequences. Is the implication of the color scheme that any node in the cluster is expected to be responsive to one of the antigens associated with an annotated sequence that is identical (or close) to at least one node in the cluster? The discussion of this point is not entirely clear.
6) In Figure 4 and the surrounding discussion, mention is made of network analysis of repertoires obtained from DN (double negative, CD4- CD8-) thymocytes. This data set is not mentioned in Materials and methods, nor is any link to a repository provided. These data are extremely important as they bear on the question of whether the highly shared TCR sequences are abundant because of antigen reaction and clonal expansion or due to some other cause. More detailed information about this data set should be given (how many sequences per DN mouse etc.) and, ideally, a pointer to the repository of these data should be given. The data repository should give the nucleotide sequences and not just the amino acid sequences of the CDR3, since the text makes a point of the difference in the number of nt realizations of specific CDR3 aa sequences when comparing the DN mice with the WT mice.
https://doi.org/10.7554/eLife.22057.029
- Asaf Madi
- Nir Friedman
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank Benjamin Chain and Shalev Itzkovitz for helpful comments on the manuscript. This research was supported by grants from the Minerva Foundation with funding from the Federal German Ministry for Education and Research and the I-CORE Program of the Planning and Budgeting Committee and the Israel Science Foundation. AM was supported by the MD Moross Institute for Cancer Research.
Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (#24110116-2) of the Weizmann Institute of Science. The protocol was approved by the Committee on the Ethics of Animal Experiments of the Weizmann Institute of Science. Every effort was made to minimize suffering.
© 2017, Madi et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.