Thymic selection of the T cell receptor repertoire is biased toward autoimmunity in females

Hélène Vantomme; Valentin Quiniou; Leslie Adda; Charline Jouannet; Vanessa Mhanna; Céline Albalaa; Pierre Barennes; Nicolas Coatnoan; Vimala Diderot; Johanna Dubois; Gwladys Fourcade; Kenz Le Gouge; Otriv Frédéric Nguekap Tchoumba; Martin Pezous; Paul Stys; Adrien Six; Encarnita Mariotti-Ferrandiz; David Klatzmann

doi:10.7554/eLife.109041.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Juan Carlos Zúñiga-Pflücker
University of Toronto, Toronto, Canada
Senior Editor
Satyajit Rath
National Institute of Immunology, New Delhi, India

Reviewer #1 (Public review):

Summary:

The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between a male and a female human. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

They report on participant ages and sexes, but not on ethnicity, race, nor provide information about HLA typing of individuals. Though the experiments themselves are heroic, they do represent a relatively small sampling of diverse humans. They observed no differences in TCRbeta or TCRalpha usage, combinational diversity, or differences in the length of the CDR3 region, or amino acid usage in the CD3aa region between males or females. Though they observed some TCRbeta CD3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

They also compared TCRbeta sequences against those identified in the past using computational approaches to recognize cancer-, bacterial-, viral-, or autoimmune-antigens. They found very little overlap of their sequences with these annotated sequences (depending on the individual, ranging from 0.82-3.58% of sequences). Within the sequences that were in overlap, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a finding that is generalizable to the human population.

Strengths:

This is a novel dataset. Overall, the methodologies appear to be sound. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females.

Weaknesses:

Overall, the sample size is small given that it is an outbred population. The cleaner experiment would have been to study the impact of sex in a number of inbred MHC I/II identical mouse strains or in humans with HLA-identical backgrounds.

It is unclear whether there was consensus between the three databases they used regarding the antigens recognized by the TCR sequences. Given the very low overlap between the TCR sequences identified in these databases and their dataset, and the lack of replication, they should tone down their excitement about the CD8 T cell sequences recognizing autoimmune and bacterial antigens being over-represented in females.

The dataset could be valuable to the community.

https://doi.org/10.7554/eLife.109041.1.sa1

Reviewer #2 (Public review):

Summary:

This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important, and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues that require substantial improvement. In several instances, the authors conclude that there are no sex-associated differences for specific parameters, yet inspection of the data suggests visible trends that are not properly quantified. The authors should either apply more appropriate statistical approaches to test these trends or provide stronger evidence that the observed differences are not significant. In other analyses, the authors report the differences between sexes based on a pulled analysis of TCR sequences from all the donors, which could result in differences driven by one or two single donors (e.g., having particular HLA variants) rather than reflect sex-related differences.

Strengths:

The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

Weaknesses:

Major:

(1) The authors state that there is "no clear separation in PCA for both TRA and TRB across all subsets." However, Figure 2 shows a visible separation for DP thymocytes (especially TRA, and to a lesser degree TRB) and also for TRA of Tregs. This apparent structure should be acknowledged and discussed rather than dismissed.

(2) Supplementary Figures 2-5 involve many comparisons, yet no correction for multiple testing appears to be applied. After appropriate correction, all the reported differences would likely lose significance. These analyses must be re-evaluated with proper multiple-testing correction, and apparent differences should be tested for reproducibility in an external dataset (for example, the pediatric thymus and peripheral blood repertoires later used for motif validation).

(3) Supplementary Figure 6 suggests that women consistently show higher Rényi entropies across all subsets. Although individual p-values are borderline, the consistent direction of change is notable. The authors should apply an integrated statistical test across subsets (for example, a mixed-effects model) to determine whether there is an overall significant trend toward higher diversity in females.

(4) Figures 4B and S8 clearly indicate enrichment of hydrophobic residues in female CDR3s for both TRA and TRB (excluding alanine, which is not strongly hydrophobic). Because CDR3 hydrophobicity has been linked to increased cross-reactivity and self-reactivity (see, e.g., Stadinski et al., Nat Immunol 2016), this observation is biologically meaningful and consistent with higher autoimmune susceptibility in females.

(5) The majority of "hundreds of sex-specific motifs" are probably donor-specific motifs confounded by HLA restriction. This interpretation is supported by the failure to validate motifs in external datasets (pediatric thymus, peripheral blood). The authors should restrict analysis to public motifs (shared across multiple donors) and report the number of donors contributing to each motif.

(6) When comparing TCRs to VDJdb or other databases, it is critical to consider HLA restriction. Only database matches corresponding to epitopes that can be presented by the donor's HLA should be counted. The authors must either perform HLA typing or explicitly discuss this limitation and how it affects their conclusions.

(7) Although the age distributions of male and female donors are similar, the key question is whether HLA alleles are similarly distributed. If women in the cohort happen to carry autoimmune-associated alleles more often, this alone could explain observed repertoire differences. HLA typing and HLA comparison between sexes are therefore essential.

(8) In some analyses (e.g., Figures 8C-D) data are shown per donor, while others (e.g., Fig. 8A-B) pool all sequences. This inconsistency is concerning. The apparent enrichment of autoimmune or bacterial specificities in females could be driven by one or two donors with particular HLAs. All analyses should display donor-level values, not pooled data.

(9) The reported enrichment of matches to certain specificities relative to the database composition is conceptually problematic. Because the reference database has an arbitrary distribution of epitopes, enrichment relative to it lacks biological meaning. HLA distribution in the studied patients and HLA restrictions of antigens in the database could be completely different, which could alone explain enrichment and depletions for particular specificities. Moreover, differences in Pgen distributions across epitopes can produce apparent enrichment artifacts. Exact matches typically correspond to high-Pgen "public" sequences; thus, the enrichment analysis may simply reflect variation in Pgen of specific TCRs (i.e., fraction of high-Pgen TCRs) across epitopes rather than true selection. Consequently, statements such as "We observed a significant enrichment of unique TRB CDR3aa sequences specific to self-antigens" should be removed.

(10) The overrepresentation of self-specific TCRs in females is the manuscript's most interesting finding, yet it is not described in detail. The authors should list the corresponding self-antigens, indicate which autoimmune diseases they relate to, and show per-donor distributions of these matches.

(11) The concept of polyspecificity is controversial. The authors should clearly explain how polyspecific TCRs were defined in this study and highlight that the experimental evidence supporting true polyspecificity is very limited (e.g., just a single TCR from Figure 5 from Quiniou et al.).

Minor:

(1) Clarify why the Pgen model was used only for DP and CD8 subsets and not for others.

(2) The Methods section should define what a "high sequence reliability score" is and describe precisely how the "harmonized" database was constructed.

(3) The statement "we generated 20,000 permuted mixed-sex groups" is unclear. It is not evident how this permutation corrects for individual variation or sex bias. A more appropriate approach would be to train the Pgen model separately for each individual's nonproductive sequences (if the number of sequences is large enough).

https://doi.org/10.7554/eLife.109041.1.sa0

Thymic selection of the T cell receptor repertoire is biased toward autoimmunity in females

Peer review process

Editors

Be the first to read new articles from eLife