Figures and data

Schematic overview of the generation of the thymic TCR dataset and the analytical pipeline.
Top Panel: Generation of the thymic TCR dataset. From thymuses of deceased male and female human donors, we isolated key T-cell subtypes through cell sorting. These subtypes included double-positive (DP) cells (CD3+ CD4+ CD8+), single-positive (SP) CD8+ cells (CD3+ CD4- CD8+), and SP CD4+ cells (CD3+ CD4+ CD8-), which were further separated into T effector (Teff, CD3+ CD4+ CD8- CD25-) and Treg (CD3+ CD4+ CD8- CD25+) cells. TCR libraries were generated from the RNA of each cell population using rapid amplification of cDNA ends by PCR (5' RACE PCR). Following sequencing, data preprocessing involved sequencing quality checks, contig alignment, and quality control. The final dataset comprised 20 DP samples (male-to-female ratio of 1:1), 21 SP CD8+ samples (1.1:1), 6 SP CD4+ samples (1:5), 16 SP CD4 Teff samples (1.67:1), and 14 SP CD4 Treg samples (1.33:1). Males are depicted in violet and females in orange. Bottom Panel: Analytical pipeline. We compared the TCR repertoires of males and females across various dimensions. We evaluated general aspects of the TCR repertoire, including diversity, gene usage, CDR3aa length distribution, and aa usage within the CDR3 region. Additionally, we analyzed the probability of sequence generation and the TCR repertoire structure based on CDR3aa sequence similarity. We identified differentially expressed TRB CDR3aa motifs between sexes and analyzed TRB CDR3aa sequence specificity. Created with BioRender.com

Comparable overall TCR gene usage between males and females.
(A) Principal Component Analysis (PCA) derived from the distribution of TRAV (left) and TRBV (right) gene usage frequencies across sex groups (males vs females), showing results for DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14) cells (displayed from top to bottom). Each point on the graph represents an individual. Ellipses indicate 95% confidence intervals. (B) Heatmap showing the Jensen-Shannon Divergence (JSD) score between samples, derived from the distributional usage of TRAV-TRAJ (left) and TRBV-TRBJ (right) gene associations in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14) cells (displayed from top to bottom). Hierarchical clustering was performed using the Euclidean distance and the complete linkage method. Males are shown in violet; females in orange.
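
As an illustration of the score shown in panel (B), the following minimal sketch computes the Jensen-Shannon divergence between the V-J pairing usage profiles of two samples. It assumes a base-2 logarithm (so the score lies between 0 and 1) and uses made-up counts; the function name and the input layout are hypothetical and do not describe the pipeline actually used here.

```python
import math

def jensen_shannon_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two usage profiles.

    p and q map a V-J pairing (any hashable label) to a count or frequency.
    Each profile is normalised to sum to 1 before comparison.
    """
    p_total = sum(p.values())
    q_total = sum(q.values())
    jsd = 0.0
    for key in set(p) | set(q):
        pk = p.get(key, 0.0) / p_total
        qk = q.get(key, 0.0) / q_total
        mk = 0.5 * (pk + qk)  # mixture of the two profiles
        if pk > 0:
            jsd += 0.5 * pk * math.log(pk / mk, base)
        if qk > 0:
            jsd += 0.5 * qk * math.log(qk / mk, base)
    return jsd

# Made-up V-J pairing counts for two hypothetical samples.
sample_1 = {"TRBV20-1/TRBJ2-7": 40, "TRBV5-1/TRBJ2-1": 35, "TRBV7-9/TRBJ1-1": 25}
sample_2 = {"TRBV20-1/TRBJ2-7": 38, "TRBV5-1/TRBJ2-1": 30, "TRBV7-9/TRBJ1-1": 32}
print(round(jensen_shannon_divergence(sample_1, sample_2), 4))
```

Computing this score for every pair of samples would give the square matrix that a heatmap with hierarchical clustering takes as input.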

Comparable thymic TCR repertoire diversity between males and females.
Boxplots display Shannon (A), Simpson (B), and Berger-Parker (C) index values for TRA (left) and TRB (right), across thymic T cell subtypes, displayed from top to bottom, in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14). Each point on the graph represents the median value from 50 rarefactions per sample. Statistical analysis (Wilcoxon test) showed no significant sex bias in TCR repertoire diversity (p > 0.05). Males are shown in violet; females in orange.
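
The three indices and the rarefaction step can be sketched as follows. This is a minimal illustration under stated assumptions: the Simpson index is written in its Gini-Simpson form (1 minus the sum of squared frequencies), the Berger-Parker index is the frequency of the most abundant clonotype, and rarefaction is sampling without replacement to a fixed depth. The source does not specify these conventions, and all function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

def shannon(counts):
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def simpson(counts):
    # Gini-Simpson form (assumption): 1 - sum of squared frequencies.
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def berger_parker(counts):
    # Frequency of the single most abundant clonotype.
    return max(counts) / sum(counts)

def rarefy(clonotype_counts, depth, rng):
    """Draw `depth` sequences without replacement and recount clonotypes."""
    pool = [clone for clone, c in clonotype_counts.items() for _ in range(c)]
    return Counter(rng.sample(pool, depth))

def median_rarefied_index(clonotype_counts, depth, index, n_iter=50, seed=0):
    """Median of a diversity index over repeated rarefactions of one sample."""
    rng = random.Random(seed)
    values = [
        index(list(rarefy(clonotype_counts, depth, rng).values()))
        for _ in range(n_iter)
    ]
    return statistics.median(values)

# Made-up clonotype counts for one hypothetical sample.
sample = {"clone_a": 50, "clone_b": 30, "clone_c": 15, "clone_d": 5}
print(median_rarefied_index(sample, depth=40, index=shannon))
```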

CDR3aa length and amino acid composition of TRB CDR3s in males and females.
(A) Distribution of TRB CDR3aa length usage in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and Treg (N = 14) SP cells (displayed from top to bottom). Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). (B) The data on amino acid (aa) usage between males and females is presented as the log2 fold change of the median per-donor usage in females over males for each aa in the p108 to p114 CDR3aa region for TRB. A line at log2 fold change = 0 is indicative of the direction of the difference in usage frequency. The bars are colour-coded to the hydropathy class of the aa as defined by the Kyte-Doolittle-based IMGT classification (neutral aa by gray, hydrophilic aa by blue-green and hydrophobic aa by gold). Stars indicate statistical differences of usage between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). (C) Position-specific usage of hydrophobic aa (excluding alanine, due to its weak hydrophobicity) at IMGT positions p109 and p110 in TRB across thymic T cell subtypes, including DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14). For each donor, the values represent the proportion of unique TRB CDR3aa sequences carrying a hydrophobic amino acid at the indicated position. Stars indicate significant sex differences based on the p-value of the Wilcoxon test (*: p < 0.05), with a significant increase in hydrophobic usage at p109 in female CD8 SP cells. Males are depicted in violet; females in orange.
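
The effect size plotted in panel (B) is a log2 fold change of per-donor medians, which can be sketched in a few lines. The per-donor usage values below are made up, and the function name is hypothetical.

```python
import math
import statistics

def log2_fold_change(female_values, male_values):
    """log2 of (median usage in females / median usage in males)."""
    female_median = statistics.median(female_values)
    male_median = statistics.median(male_values)
    if female_median <= 0 or male_median <= 0:
        raise ValueError("both medians must be positive to take a log ratio")
    return math.log2(female_median / male_median)

# Made-up per-donor usage frequencies of one amino acid.
females = [0.2, 0.4, 0.6]
males = [0.1, 0.2, 0.3]
print(log2_fold_change(females, males))
```

A positive value indicates higher median usage in females, a negative value higher median usage in males, and 0 no difference.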

Probabilities of generation of TCRs in males and females.
The figure shows the distribution of the probability of generation (Pgen) of sequences in males and females for TRA and TRB in DP and CD8 cells. A V(D)J recombination model was built for each chain, for both DP and CD8 cells, using 400,000 sequences randomly drawn from the nonproductive sequences of all individuals (65). This model was then used to calculate the Pgen of sequences for each individual (66). The overall distributions in males and females were compared using the Kolmogorov-Smirnov test, with the D value and associated p-value indicated. Males are depicted in violet and females in orange.
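
The D value reported by the two-sample Kolmogorov-Smirnov test is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes only that statistic, on made-up log10 Pgen values; the associated p-value is normally obtained from a statistics library and is not reproduced here. The function name is hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum gap between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the ECDFs.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

# Made-up log10(Pgen) values for two hypothetical groups.
males = [-9.1, -8.4, -10.2, -7.9, -11.0]
females = [-9.0, -8.6, -10.5, -8.1, -10.8]
print(ks_statistic(males, females))
```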

Comparable TCR repertoire network structure based on CDR3 amino acid sequence similarity between males and females.
For each sample, 100 random subsamplings were performed on the minimum number of CDR3aa per cell subtype. Two CDR3aa are linked if they have a Levenshtein distance of one. (A) TRA Network structure of subsampled TCR repertoire of a male subject for DP, CD8, CD4 Teff and CD4 Treg SP (from left to right). Each point on the graph represents a CDR3aa. (B-C) Comparison of the proportion of linked sequences (B) and network density (C) between male and female samples, in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14). Each point on the graph represents the median value from 100 subsampling iterations for each sample. Statistical analysis using the Wilcoxon test revealed no significant sex differences for these two metrics (p > 0.05). Males are depicted in violet; females in orange.
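
The network construction can be sketched as follows: two CDR3aa are linked when they differ by exactly one substitution, insertion or deletion. The sketch assumes that network density is the number of edges divided by the number of possible edges, which the source does not state explicitly. The sequences are made up and the function names are hypothetical.

```python
from itertools import combinations

def is_one_edit_apart(s, t):
    """True if the Levenshtein distance between s and t is exactly 1."""
    if s == t or abs(len(s) - len(t)) > 1:
        return False
    if len(s) == len(t):
        # Same length: exactly one substitution.
        return sum(a != b for a, b in zip(s, t)) == 1
    if len(s) > len(t):
        s, t = t, s  # make s the shorter sequence
    i = 0
    while i < len(s) and s[i] == t[i]:
        i += 1
    # Skipping the extra residue in t must align the remaining suffixes.
    return s[i:] == t[i + 1:]

def network_metrics(cdr3_sequences):
    nodes = sorted(set(cdr3_sequences))
    edges = [(a, b) for a, b in combinations(nodes, 2) if is_one_edit_apart(a, b)]
    linked = {node for edge in edges for node in edge}
    n = len(nodes)
    possible_edges = n * (n - 1) / 2
    return {
        "proportion_linked": len(linked) / n if n else 0.0,
        "density": len(edges) / possible_edges if possible_edges else 0.0,
        "n_edges": len(edges),
    }

# Made-up CDR3aa sequences.
example = ["CASSLGF", "CASSLGY", "CASSLG", "CAWTQYF"]
print(network_metrics(example))
```

The all-pairs comparison is quadratic in the number of sequences, so it only suits small subsamples. Applying it to each random subsample and taking the median of each metric would mirror the per-sample summary described above.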

Thymic TRB TCR sex-associated motifs.
Structural motifs found to be differentially expressed between males and females in our dataset. We distinguish local motifs, defined as distinct aa sequences, from global motifs, defined as motif regions with one variable aa position maintaining a BLOSUM62 score ≥ 0. (A) Number of male- and female-associated motifs by cell subset. (B) Euler diagram illustrating the distribution and overlap of all sex-associated motifs. The numbers indicate the number of motifs in each overlap zone. (C-D) Validation of these sex-associated TRB CDR3aa motifs. The heatmaps illustrate the usage of all TRB CDR3aa motifs in the external pediatric thymic dataset (35, 36) (C) and of the TRB CD8 motifs in the peripheral dataset (D). Sex and total CDR3aa number are shown for each sample. Males are depicted in violet and females in orange; local motifs are shown in blue-green and global motifs in magenta.

Sex-biased enrichment of TCRs specific for known antigens.
The specificity of the TRB CDR3aa in our thymic dataset was inferred by exact match against a pooled and curated database. Specificity groups are defined according to the nature of the targeted antigenic peptide (bacteria, virus, autoimmune disease [AID], cancer and self-peptide not associated with disease [human]). This analysis compares, for each specificity group, the proportion of unique TRB CDR3aa sequences with a given specificity (A) and their usage (B) between females and males across cell subtypes, using the log2 fold change of the median values (females over males). The specificity groups are additionally classified as microorganism in the top section (bacteria in gold and virus in light blue) and self in the bottom section (AID in magenta, cancer in red, human in blue-green). Polyspecific CDR3aa are defined here as CDR3aa capable of recognizing multiple antigens from different organisms (for non-self antigens) or from different specificity groups (e.g. categorized microorganisms, categorized self-antigens, allergens…). The proportion of unique polyspecific CDR3aa in the specific repertoire (C) and their usage (D) are compared between males and females, in DP (N = 22), CD8 (N = 23), CD4 Teff (N = 24) and CD4 Treg (N = 14) cells. Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). Males are depicted in violet and females in orange.
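
The exact-match annotation can be sketched as a dictionary lookup. The sketch makes several simplifying assumptions that are not taken from the source: the database is a mapping from CDR3aa to a set of specificity groups, a CDR3aa is called polyspecific when it maps to more than one group, per-group proportions use the whole repertoire as denominator, and the polyspecific proportion uses the matched (specific) repertoire as denominator. All sequences, names and counts are made up.

```python
def specificity_summary(cdr3_counts, database):
    """Annotate a repertoire by exact CDR3aa match against a database.

    cdr3_counts: CDR3aa -> count in the sample.
    database:    CDR3aa -> set of specificity groups (hypothetical layout).
    """
    matched = {seq: n for seq, n in cdr3_counts.items() if seq in database}
    n_unique = len(cdr3_counts)
    n_total = sum(cdr3_counts.values())

    tallies = {}
    for seq, n in matched.items():
        for group in database[seq]:
            entry = tallies.setdefault(group, {"unique": 0, "count": 0})
            entry["unique"] += 1
            entry["count"] += n

    per_group = {
        group: {
            "prop_unique": entry["unique"] / n_unique,
            "usage": entry["count"] / n_total,
        }
        for group, entry in tallies.items()
    }
    polyspecific = [seq for seq in matched if len(database[seq]) > 1]
    prop_polyspecific = len(polyspecific) / len(matched) if matched else 0.0
    return per_group, prop_polyspecific

# Made-up repertoire and database.
repertoire = {"CASSAAAF": 5, "CASSBBBF": 3, "CASSCCCF": 2}
reference = {
    "CASSAAAF": {"virus"},
    "CASSBBBF": {"virus", "bacteria"},
    "CASSXXXF": {"cancer"},
}
groups, poly = specificity_summary(repertoire, reference)
print(groups, poly)
```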

Age distribution of male and female donors is comparable across thymic subsets.
Age of individual donors plotted separately for females (orange) and males (violet) for each thymic cell subset: double-positive (DP) thymocytes, CD8 single-positive (CD8), CD4 effector single-positive (CD4 Teff) and CD4 regulatory T-cell single-positive (CD4 Treg). For each subset, a two-sample Kolmogorov–Smirnov (K-S) test was performed; the D statistic and corresponding p-value are displayed below each panel. In all cases, no significant difference was detected, indicating balanced age distributions between sexes for every subset.

Rank–frequency distributions of thymic TRA and TRB clonotypes.
Log–log rank–frequency plots of unique TRA (left) and TRB (right) clonotypes are shown for each donor and thymic subset (DP, CD8, CD4 Teff, CD4 Treg, from top to bottom). For each sample, clonotypes are ranked by decreasing abundance (x-axis, log scale), and their corresponding relative frequencies are plotted (y-axis, log scale). Curves display broadly comparable shapes across donors, with no obvious systematic differences between males and females, indicating similar clonotype abundance distributions and sampling depth across sexes.
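
The coordinates of such a curve can be derived from clonotype counts as in the following minimal sketch, which uses made-up counts and a hypothetical function name. Plotting is left out; both axes would be drawn on a log scale.

```python
def rank_frequency(clonotype_counts):
    """Return (rank, relative frequency) pairs, most abundant clonotype first."""
    total = sum(clonotype_counts.values())
    frequencies = sorted(
        (count / total for count in clonotype_counts.values()), reverse=True
    )
    return list(enumerate(frequencies, start=1))

# Made-up clonotype counts for one hypothetical sample.
sample = {"clone_a": 6, "clone_b": 3, "clone_c": 1}
for rank, frequency in rank_frequency(sample):
    print(rank, frequency)
```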

Some differential V and J gene usage in TCR repertoires of DP cells between males and females.
(A-D) Frequencies of TRAV (A), TRAJ (B), TRBV (C), and TRBJ (D) gene usage in males and females. Red lines represent the mean for each sex group. Statistical comparisons between groups were performed using the Wilcoxon test. Stars indicate statistical differences between males and females based on the p-value of the test (*: p < 0.05, **: p < 0.01). (E-F) Principal Component Analysis (PCA) derived from the distribution of usage frequencies of TRAV-TRAJ (E) and TRBV-TRBJ (F) gene associations across sex groups (males vs females). Each point represents an individual. Ellipses show 95% confidence intervals. Males are depicted in violet and females in orange.
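
For two independent groups, the Wilcoxon test is usually the rank-sum (Mann-Whitney) form. The sketch below assumes that form, with mid-ranks for ties and a two-sided p-value from the normal approximation without continuity correction. For groups as small as those in this dataset an exact p-value from a statistics library would be preferable, so this is an illustration only. The gene usage values are made up and the function name is hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test: returns (U statistic for x, two-sided p-value)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    tie_term = 0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j

    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    variance_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance_u == 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(variance_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value

# Made-up usage frequencies (%) of one gene in two hypothetical groups.
males = [4.1, 3.8, 4.6, 5.0, 4.3]
females = [4.9, 5.2, 4.7, 5.5, 5.1]
print(rank_sum_test(males, females))
```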

Some differential V and J gene usage in TCR repertoires of CD8 cells between males and females.
(A-D) Frequencies of TRAV (A), TRAJ (B), TRBV (C), and TRBJ (D) gene usage in males and females. Red lines represent the mean for each sex group. Statistical comparisons between groups were performed using the Wilcoxon test. Stars indicate statistical differences between males and females based on the p-value of the test (*: p < 0.05, **: p < 0.01). (E-F) Principal Component Analysis (PCA) derived from the distribution of usage frequencies of TRAV-TRAJ (E) and TRBV-TRBJ (F) gene associations across sex groups (males vs females). Each point represents an individual. Ellipses show 95% confidence intervals. Males are depicted in violet and females in orange.

Some differential V and J gene usage in TCR repertoires of CD4 Teff cells between males and females.
(A-D) Frequencies of TRAV (A), TRAJ (B), TRBV (C), and TRBJ (D) gene usage in males and females. Red lines represent the mean for each sex group. Statistical comparisons between groups were performed using the Wilcoxon test. Stars indicate statistical differences between males and females based on the p-value of the test (*: p < 0.05, **: p < 0.01). (E-F) Principal Component Analysis (PCA) derived from the distribution of usage frequencies of TRAV-TRAJ (E) and TRBV-TRBJ (F) gene associations across sex groups (males vs females). Each point represents an individual. Ellipses show 95% confidence intervals. Males are depicted in violet and females in orange.

Some differential V and J gene usage in TCR repertoires of CD4 Treg cells between males and females.
(A-D) Frequencies of TRAV (A), TRAJ (B), TRBV (C), and TRBJ (D) gene usage in males and females. Red lines represent the mean for each sex group. Statistical comparisons between groups were performed using the Wilcoxon test. Stars indicate statistical differences between males and females based on the p-value of the test (*: p < 0.05, **: p < 0.01). (E-F) Principal Component Analysis (PCA) derived from the distribution of usage frequencies of TRAV-TRAJ (E) and TRBV-TRBJ (F) gene associations across sex groups (males vs females). Each point represents an individual. Ellipses show 95% confidence intervals. Males are depicted in violet and females in orange.

Minimal differences in diversity profile of CD8 and CD4 Treg thymic TCR repertoire between males and females.
Diversity profile with Rényi diversity index values (α ranging from 0 to infinity) for all cell subsets, both TRA (left) and TRB (right). Clonotypes of each sample were rarefied fifty times to their effective diversity number [i.e. e^(Shannon index)] and the median of their Rényi values was used for analysis. Dotted lines and points show the mean value for each sex group, while the shaded area represents the standard deviation for each sex group. The overall shape of the curves was compared statistically using the Kolmogorov-Smirnov test, with the D value and associated p-value indicated.
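
A Rényi diversity profile can be sketched as below, using the standard definition with natural logarithms: α = 0 gives the log of the number of clonotypes, α = 1 the Shannon index, and α = infinity minus the log of the largest clonotype frequency (the Berger-Parker frequency). The exponential of the Shannon index is the effective diversity number mentioned above. The counts, the α grid and the function names are made up for illustration.

```python
import math

def renyi_entropy(counts, alpha):
    """Rényi entropy of order alpha (natural log) for clonotype counts."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if alpha == 1:
        return -sum(x * math.log(x) for x in p)  # Shannon limit
    if alpha == math.inf:
        return -math.log(max(p))  # dominated by the largest clonotype
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)

def renyi_profile(counts, alphas=(0, 0.5, 1, 2, 4, 8, math.inf)):
    return [(alpha, renyi_entropy(counts, alpha)) for alpha in alphas]

# Made-up clonotype counts for one hypothetical sample.
counts = [50, 30, 15, 5]
for alpha, value in renyi_profile(counts):
    print(alpha, round(value, 4))
print("effective number:", round(math.exp(renyi_entropy(counts, 1)), 2))
```

The profile never increases with α, and a steeper decline indicates a repertoire dominated by a few abundant clonotypes.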

Comparable TRA CDR3aa length distribution between males and females.
Distribution of CDR3aa length usage for TRA in all cell subsets. Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). Males are depicted in violet and females in orange.

Some differences in TRA CDR3aa usage between males and females.
Usage of each aa in the p108 to p114 CDR3 region represented as the log2 fold change of the median usage of females over males in TRA for each cell subset. A line at log2 fold change = 0 indicates the direction of the usage difference. Bars are colored according to the hydropathy class of the amino acid (neutral in gray, hydrophilic in blue-green and hydrophobic in gold). Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01).

DP TRB thymic sex-associated motifs in our dataset.
Structural motifs in the CDR3 amino acid (CDR3aa) region found to be differentially expressed between males and females within our dataset. Local motifs refer to distinct amino acid sequences, whereas global motifs represent motif regions with a single variable amino acid position maintaining a BLOSUM62 score ≥ 0. The heatmap shows the differential usage of TRB CDR3aa motifs between males and females in DP cells. Hierarchical clustering reveals clear segregation of individuals by sex, with most CDR3aa motifs being almost exclusively expressed in one sex while absent in the other. Sample attributes are visualized as follows: (1) Sex: males in violet, females in orange; (2) Age class: children (<18 years) in green, young adults (18-50 years) in blue, older adults (>50 years) in red; (3) sequencing platform: HiSeq2500 samples in persimmon, NovaSeq 6000 samples in purple; (4) the total CDR3aa number. Motifs overexpressed in males are shown in violet, those overexpressed in females in orange; local motifs are indicated in blue-green and global motifs in magenta.

CD8 TRB thymic sex-associated motifs in our dataset.
Structural motifs in the CDR3 amino acid (CDR3aa) region found to be differentially expressed between males and females within our dataset. Local motifs refer to distinct amino acid sequences, whereas global motifs represent motif regions with a single variable amino acid position maintaining a BLOSUM62 score ≥ 0. The heatmap shows the differential usage of TRB CDR3aa motifs between males and females in CD8 cells. Hierarchical clustering reveals clear segregation of individuals by sex, with most CDR3aa motifs being almost exclusively expressed in one sex while absent in the other. Sample attributes are visualized as follows: (1) Sex: males in violet, females in orange; (2) Age class: children (<18 years) in green, young adults (18-50 years) in blue, older adults (>50 years) in red; (3) sequencing platform: HiSeq2500 samples in persimmon, NovaSeq 6000 samples in purple; (4) the total CDR3aa number. Motifs overexpressed in males are shown in violet, those overexpressed in females in orange; local motifs are indicated in blue-green and global motifs in magenta.

CD4 Teff TRB thymic sex-associated motifs in our dataset.
Structural motifs in the CDR3 amino acid (CDR3aa) region found to be differentially expressed between males and females within our dataset. Local motifs refer to distinct amino acid sequences, whereas global motifs represent motif regions with a single variable amino acid position maintaining a BLOSUM62 score ≥ 0. The heatmap shows the differential usage of TRB CDR3aa motifs between males and females in CD4 Teff cells. Hierarchical clustering reveals clear segregation of individuals by sex, with most CDR3aa motifs being almost exclusively expressed in one sex while absent in the other. Sample attributes are visualized as follows: (1) Sex: males in violet, females in orange; (2) Age class: children (<18 years) in green, young adults (18-50 years) in blue, older adults (>50 years) in red; (3) sequencing platform: HiSeq2500 samples in persimmon, NovaSeq 6000 samples in purple; (4) the total CDR3aa number. Motifs overexpressed in males are shown in violet, those overexpressed in females in orange; local motifs are indicated in blue-green and global motifs in magenta.

CD4 Treg TRB thymic sex-associated motifs in our dataset.
Structural motifs in the CDR3 amino acid (CDR3aa) region found to be differentially expressed between males and females within our dataset. Local motifs refer to distinct amino acid sequences, whereas global motifs represent motif regions with a single variable amino acid position maintaining a BLOSUM62 score ≥ 0. The heatmap shows the differential usage of TRB CDR3aa motifs between males and females in CD4 Treg cells. Hierarchical clustering reveals clear segregation of individuals by sex, with most CDR3aa motifs being almost exclusively expressed in one sex while absent in the other. Sample attributes are visualized as follows: (1) Sex: males in violet, females in orange; (2) Age class: children (<18 years) in green, young adults (18-50 years) in blue, older adults (>50 years) in red; (3) sequencing platform: HiSeq2500 samples in persimmon, NovaSeq 6000 samples in purple; (4) the total CDR3aa number. Motifs overexpressed in males are shown in violet, those overexpressed in females in orange; local motifs are indicated in blue-green and global motifs in magenta.

Thymic sex-associated TRB motifs are not differentially expressed between males and females in the peripheral TCR repertoire.
Heatmaps representing the usage, in a peripheral TCR repertoire dataset, of the TRB CDR3aa motifs differentially expressed between males and females in CD4 Teff (A) and CD4 Treg (B) cells. Sex and total CDR3aa number are shown for each sample. Males are depicted in violet and females in orange; local motifs are shown in blue-green and global motifs in magenta.

Enrichment of cancer-associated, unmodified self-peptide-specific and polyspecific TCRs in our thymic dataset compared to the initial curated database.
(A) Representation of the specificity groups in the pooled and curated database. (B) Proportion of unique CDR3aa in the specific TCR repertoire of each sample for the autoimmune disease (AID), bacteria, cancer, human and virus groups, compared between males and females. (C) Proportion of polyspecific CDR3aa in the specific repertoire of each sample, compared between males and females. Stars indicate statistical differences between males or females and the pooled and curated database, based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). Males are depicted in violet and females in orange.

Enrichment of AID-associated and bacteria-specific TCRs in female CD8 cells.
Usage of CDR3aa with a known specificity in the autoimmune disease (AID), bacteria, cancer, human and virus groups. Stars indicate statistical differences between males and females based on the p-value of the Wilcoxon test (*: p < 0.05, **: p < 0.01). Males are depicted in violet and females in orange.

No association between age and thymic usage of CDR3 sequences specific to autoimmunity or bacterial antigens.
Linear regression models assessing the effect of donor age on the cumulative usage (%) of TRB CDR3aa sequences with known specificity to self-antigens associated with autoimmunity (AID, left panels) or bacterial antigens (right panels), in CD8 SP (top) and CD4 Treg SP (bottom) thymic subsets. Each dot represents one donor. The shaded area indicates the 95% confidence interval around the regression line. Reported slope, 95% confidence interval, R², and p-values show no significant effect of age.
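
The slope and R² reported for each panel come from an ordinary least-squares fit of cumulative usage on age, which can be sketched as follows. The sketch also returns the standard error of the slope. Turning that into a 95% confidence interval or a p-value requires a Student t quantile with n - 2 degrees of freedom, which a statistics library would supply and which is not reproduced here. The data are made up and the function name is hypothetical.

```python
import math

def simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "slope_se": slope_se,
    }

# Made-up donor ages (years) and cumulative usage (%).
ages = [2, 15, 30, 45, 60]
usage = [0.8, 1.1, 0.7, 1.0, 0.9]
print(simple_linear_regression(ages, usage))
```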
