Schematic overview of the generation of the thymic TCR dataset and of the analytical pipeline.

Top Panel: Generation of the thymic TCR dataset. From thymuses of deceased male and female human donors, we isolated key T-cell subtypes by cell sorting. These subtypes included double-positive (DP) cells (CD3+ CD4+ CD8+), single-positive (SP) CD8+ cells (CD3+ CD4- CD8+), and SP CD4+ cells (CD3+ CD4+ CD8-); SP CD4+ cells were further separated into T effector (Teff, CD3+ CD4+ CD8- CD25-) and Treg (CD3+ CD4+ CD8- CD25+) cells. TCR libraries were generated from the RNA of each cell population using rapid amplification of cDNA ends by PCR (5’RACE PCR). Following sequencing, data preprocessing involved sequencing quality checks, contig alignment, and quality control. The final dataset comprised 20 DP samples (male-to-female ratio of 1:1), 21 SP CD8+ samples (1.1:1), 6 SP CD4+ samples (1:5), 16 SP CD4 Teff samples (1.67:1), and 14 SP CD4 Treg samples (1.33:1). Males are depicted in violet and females in orange. Bottom Panel: Analytical pipeline. We compared the TCR repertoires of males and females along several dimensions. General features of the TCR repertoire were evaluated, including diversity, gene usage, CDR3aa length distribution, and aa usage within the CDR3 region. Additionally, we analyzed the probability of sequence generation, the TCR repertoire structure based on CDR3aa sequence similarity, the TRB CDR3aa motifs differentially expressed between the sexes, and TRB CDR3aa sequence specificity.

Comparable overall TCR gene usage between males and females.

(A) Principal Component Analysis (PCA) derived from the distribution of TRAV (left) and TRBV (right) gene usage frequencies across sex groups (males vs females), showing results for DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14) cells (displayed from top to bottom). Each point represents an individual. Ellipses indicate 95% confidence intervals. (B) Heatmap showing the Jensen-Shannon Divergence (JSD) score between samples, derived from the distributional usage of TRAV-TRAJ (left) and TRBV-TRBJ (right) gene associations in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14) cells (displayed from top to bottom). Hierarchical clustering was performed using the Euclidean distance and the complete linkage method. Males are shown in violet; females in orange.
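The JSD comparison in panel (B) can be illustrated with a minimal sketch. It assumes each sample is summarized as a dictionary of V-J pairing counts; the function names, the sample labels, and the toy counts are hypothetical, and the base-2 logarithm (which bounds the score between 0 and 1) is an assumption rather than a documented choice of the authors. Hierarchical clustering is not covered here.

```python
import math

def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two count dictionaries keyed by (V gene, J gene)."""
    total_p = sum(p.values())
    total_q = sum(q.values())
    jsd = 0.0
    for key in set(p) | set(q):
        pk = p.get(key, 0) / total_p          # usage frequency in sample p
        qk = q.get(key, 0) / total_q          # usage frequency in sample q
        mk = 0.5 * (pk + qk)                  # mixture distribution
        if pk > 0:
            jsd += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            jsd += 0.5 * qk * math.log2(qk / mk)
    return jsd

def pairwise_jsd(samples):
    """Return {(name_a, name_b): JSD} for every ordered pair of samples."""
    names = sorted(samples)
    return {
        (a, b): jensen_shannon_divergence(samples[a], samples[b])
        for a in names
        for b in names
    }

# Hypothetical toy counts, for illustration only.
toy_samples = {
    "male_1": {("TRBV5-1", "TRBJ2-7"): 30, ("TRBV20-1", "TRBJ2-1"): 70},
    "female_1": {("TRBV5-1", "TRBJ2-7"): 35, ("TRBV20-1", "TRBJ2-1"): 65},
}
jsd_matrix = pairwise_jsd(toy_samples)
print(jsd_matrix[("male_1", "female_1")])
```

A score of 0 means identical V-J usage distributions, and a score of 1 means the two samples share no V-J pairing.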

Comparable thymic TCR repertoire diversity between males and females.

Boxplots display Shannon (A), Simpson (B), and Berger-Parker (C) index values for TRA (left) and TRB (right), across thymic T cell subtypes, displayed from top to bottom, in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14). Each point represents the median value from 50 rarefactions per sample. Statistical analysis (Wilcoxon test) showed no significant sex bias in TCR repertoire diversity (p > 0.05). Males are shown in violet; females in orange.
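As an illustration of how each plotted point could be obtained, the sketch below computes the three indices on rarefied subsamples and takes the median over 50 draws. It assumes a sample is given as a flat list with one clonotype label per read; the function names, the fixed seed, and the choice of the Simpson index as the sum of squared frequencies are assumptions, since the legend does not specify which variant was used.

```python
import math
import random
import statistics
from collections import Counter

def shannon(counts):
    """Shannon entropy (natural log) of clonotype counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def simpson(counts):
    """Simpson index as the sum of squared frequencies (assumed variant)."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def berger_parker(counts):
    """Berger-Parker index: frequency of the most abundant clonotype."""
    return max(counts) / sum(counts)

def rarefied_median(reads, depth, index_fn, n_iter=50, seed=0):
    """Median of an index over n_iter subsamples of `depth` reads, drawn without replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        subsample = rng.sample(reads, depth)
        values.append(index_fn(list(Counter(subsample).values())))
    return statistics.median(values)

# Hypothetical toy sample: one label per sequenced read.
toy_reads = ["clone_a"] * 50 + ["clone_b"] * 30 + ["clone_c"] * 20
print(rarefied_median(toy_reads, depth=40, index_fn=shannon))
```

Rarefying every sample to a common depth keeps the indices comparable between samples sequenced at different depths.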

CDR3aa length and amino acid composition of TRB CDR3s in males and females.

(A) Distribution of TRB CDR3aa length usage in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and Treg (N = 14) SP cells (displayed from top to bottom). Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). (B) Amino acid (aa) usage in males and females, represented as the log2 fold change of the median usage of females over males for each aa in the p108 to p114 CDR3aa region for TRB. The line at log2 fold change = 0 marks equal usage: bars above it indicate higher usage in females, and bars below it higher usage in males. Bars are colored according to the hydropathy class of the aa (neutral in gray, hydrophilic in blue-green and hydrophobic in gold). Stars indicate statistical differences in usage between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). (C) Cumulative hydrophobic aa usage for each individual at positions p109 and p110 in TRB across thymic T cell subtypes, including DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14). Statistical analysis showed no significant sex bias (Wilcoxon test: p > 0.05 for the two positions). Males are depicted in violet; females in orange.
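The bar heights in panel (B) follow from a simple ratio of medians. The sketch below is one possible way to compute it; the function name, the optional `pseudocount` parameter, and the toy frequencies are hypothetical and not taken from the study.

```python
import math
import statistics

def log2_fold_change(female_values, male_values, pseudocount=0.0):
    """log2 of (median female usage / median male usage).

    `pseudocount` is a hypothetical safeguard for medians equal to zero;
    whether the authors used one is not stated.
    """
    female_median = statistics.median(female_values) + pseudocount
    male_median = statistics.median(male_values) + pseudocount
    return math.log2(female_median / male_median)

# Hypothetical per-individual usage frequencies of one aa at one position.
female_usage = [0.2, 0.4, 0.6]
male_usage = [0.1, 0.2, 0.3]
print(log2_fold_change(female_usage, male_usage))
```

With this orientation, a positive value means higher median usage in females and a negative value higher median usage in males.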

Probabilities of generation of TCRs in males and females.

Distribution of the probability of generation (Pgen) of TRA and TRB sequences in males and females, for DP and CD8 cells. For each chain, and separately for DP and CD8 cells, a V(D)J recombination model was built from 400,000 nonproductive sequences drawn at random from the nonproductive sequences of all individuals (65). This model was then used to calculate the Pgen of the sequences of each individual (66). The overall distributions of males and females were compared using the Kolmogorov-Smirnov test; the D value and associated p-value are indicated. Males are depicted in violet and females in orange.
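The D value reported by the two-sample Kolmogorov-Smirnov test is the largest vertical gap between the two empirical cumulative distributions. The sketch below computes only this statistic, not the p-value, and does not reproduce the recombination model or the Pgen calculation; the function name and the toy log10 Pgen values are hypothetical.

```python
import bisect

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D: max |ECDF_x(v) - ECDF_y(v)| over all observed v."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        # Fraction of each sample that is <= v (right-continuous ECDF).
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

# Hypothetical log10(Pgen) values, for illustration only.
male_log_pgen = [-12.1, -10.4, -9.8, -8.7]
female_log_pgen = [-11.9, -10.6, -9.5, -8.9]
print(ks_statistic(male_log_pgen, female_log_pgen))
```

Because D depends only on the ordering of the pooled values, it is the same whether Pgen is compared on the raw or on the log scale.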

Comparable TCR repertoire network structure based on CDR3 amino acid sequence similarity between males and females.

For each sample, 100 random subsamplings were performed down to the minimum number of CDR3aa observed per cell subtype. Two CDR3aa are linked if they have a Levenshtein distance of one. (A) TRA network structure of the subsampled TCR repertoire of a male subject for DP, CD8, CD4 Teff and CD4 Treg SP cells (from left to right). Each point represents a CDR3aa. (B-C) Comparison of the proportion of linked sequences (B) and network density (C) between male and female samples, in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14). Each point represents the median value from 100 subsampling iterations for each sample. Statistical analysis using the Wilcoxon test revealed no significant sex differences for these two metrics (p > 0.05). Males are depicted in violet; females in orange.
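The two network metrics can be sketched as follows for a single subsample. Sequences are nodes, and an edge joins two CDR3aa at Levenshtein distance one (one substitution, insertion, or deletion). The function names and the toy sequences are hypothetical, the density formula (edges divided by the number of possible pairs) is the usual definition for an undirected graph and is assumed here, and the all-pairs comparison is only practical for small inputs.

```python
from itertools import combinations

def is_levenshtein_one(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one mismatching position.
        return sum(x != y for x, y in zip(a, b)) == 1
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    # Skipping one residue of the longer sequence must align the rest.
    return a[i:] == b[i + 1:]

def network_metrics(cdr3s):
    """Proportion of linked sequences and density of the distance-one network."""
    nodes = sorted(set(cdr3s))
    edges = [(a, b) for a, b in combinations(nodes, 2) if is_levenshtein_one(a, b)]
    linked = {seq for edge in edges for seq in edge}
    n = len(nodes)
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    linked_fraction = len(linked) / n if n else 0.0
    return {"linked_fraction": linked_fraction, "density": density, "n_edges": len(edges)}

# Hypothetical toy sequences, for illustration only.
toy_cdr3s = ["CASSL", "CASSV", "CASS", "CAWRT"]
print(network_metrics(toy_cdr3s))
```

In the toy example, three of the four sequences are connected to at least one neighbor, and three of the six possible pairs are linked.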

Thymic TRB TCR sex-associated motifs.

Structural motifs found to be differentially expressed between males and females in our dataset. Local motifs are defined as distinct aa sequences, and global motifs as motif regions with one variable aa position maintaining a BLOSUM62 score ≥ 0. (A) Number of male- and female-associated motifs by cell subset. (B) Euler diagram illustrating the distribution and overlap of all sex-associated motifs. Numbers indicate the number of motifs in the overlap zones. (C-D) Validation of these sex-associated TRB CDR3aa motifs: heatmaps of the usage of all these TRB CDR3aa motifs in the external pediatric thymic dataset (35, 36) (C) and of the TRB CD8 motifs in the peripheral dataset (D). Sex and total CDR3aa number are indicated for each sample. Males are depicted in violet and females in orange; local motifs are shown in blue-green and global motifs in magenta.

Sex-biased enrichment of TCRs specific for known antigens.

The specificity of the TRB CDR3aa sequences of our thymic dataset was inferred by exact match against a pooled and curated database. Specificity groups are defined by the nature of the targeted antigen peptide (bacteria, virus, autoimmune disease [AID], cancer, and self-peptides not associated with disease [human]). The proportion of unique TRB CDR3aa sequences with a given specificity (A) and their usage (B) are compared between females and males across cell subtypes, using the log2 fold change of the median values (females over males) for each specificity group. Specificity groups are further classified as microorganism at the top (bacteria in gold and virus in light blue) and self at the bottom (AID in magenta, cancer in red, human in blue-green). Polyspecific CDR3aa are defined here as CDR3aa capable of recognizing multiple antigens from different organisms (for non-self antigens) or from different specificity groups (e.g. categorized microorganisms, categorized self-antigens, allergens…). The proportion of unique polyspecific CDR3aa in the specific repertoire (C) and their usage (D) are compared between males and females, in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14) cells. Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). Males are depicted in violet and females in orange.
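Exact-match annotation amounts to a dictionary lookup, as in the sketch below. The database layout (CDR3aa mapped to a set of specificity groups), the function names, and every toy entry are hypothetical. The polyspecificity rule is also a simplification: it flags any CDR3aa annotated with more than one specificity group and ignores the organism-level criterion described for non-self antigens.

```python
def annotate_specificity(cdr3s, database):
    """Keep the unique CDR3aa that have an exact match in the database."""
    return {seq: database[seq] for seq in set(cdr3s) if seq in database}

def specific_fraction(cdr3s, annotated, group):
    """Proportion of unique CDR3aa annotated with a given specificity group."""
    unique = set(cdr3s)
    hits = [seq for seq, groups in annotated.items() if group in groups]
    return len(hits) / len(unique) if unique else 0.0

def polyspecific_fraction(annotated):
    """Proportion of annotated CDR3aa linked to more than one group (simplified rule)."""
    if not annotated:
        return 0.0
    poly = [seq for seq, groups in annotated.items() if len(groups) > 1]
    return len(poly) / len(annotated)

# Hypothetical toy database and repertoire, for illustration only.
toy_database = {
    "CASSLGQAYEQYF": {"virus"},
    "CASSPGTGNTEAFF": {"virus", "cancer"},
    "CASSQDRGYGYTF": {"AID"},
}
toy_repertoire = ["CASSLGQAYEQYF", "CASSPGTGNTEAFF", "CASSLGQAYEQYF", "CASRTGELFF"]

annotated = annotate_specificity(toy_repertoire, toy_database)
print(specific_fraction(toy_repertoire, annotated, "virus"))
print(polyspecific_fraction(annotated))
```

Exact matching is conservative: a CDR3aa that differs from a database entry by a single residue receives no annotation.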