LINEs, SINEs, and LTRs exhibit distinct expression profiles in thymic cell populations.

(a) UMAP depicting the cell populations present in human thymi (CD4 SP, CD4 single positive thymocytes; CD8 SP, CD8 single positive thymocytes; cTEC, cortical thymic epithelial cells; DC, dendritic cells; DN, double negative thymocytes; DP, double positive thymocytes; Mono/Macro, monocytes and macrophages; mTEC, medullary thymic epithelial cells; NK, natural killer cells; NKT, natural killer T cells; pro/pre-B, pro-B and pre-B cells; Th17, T helper 17 cells; Treg, regulatory T cells; VSMC, vascular smooth muscle cells). Cells were clustered into 19 populations based on the expression of marker genes from Park et al. (40). (b) Upper panel: Heatmap of TE expression during thymic development, with each column representing the expression of one TE subfamily in one cell type. Unsupervised hierarchical clustering was performed, and the dendrogram was manually cut into 3 clusters (red dashed line). Lower panel: The class of TE subfamilies and significant enrichments in the 3 clusters (Fisher’s exact tests; ****p≤0.0001). (pcw, post-conception week; m, month; y, year). (c) Circos plot showing the expression pattern of TE subfamilies across thymic cells. From outermost to innermost tracks: i) proportion of cells in embryonic and postnatal samples, ii) class of TE subfamilies, iii) expression pattern of TE subfamilies identified in (b). TE subfamilies are in the same order for all cell types. (d) Histograms showing the number of cell types sharing the same expression pattern for a given TE subfamily. LINE (n=171), LTR (n=577), and SINE (n=60) were compared to a randomly generated distribution (n=809) (Kolmogorov-Smirnov tests, ****p≤0.0001).
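
The class enrichment test in panel (b) can be illustrated with a minimal sketch. It assumes a one-sided (enrichment) Fisher's exact test on a 2x2 table of TE subfamilies (in cluster or not, in class or not); the legend does not state the sidedness, and the function name and the counts in the usage note are hypothetical, not values from the study.

```python
from math import comb


def fisher_enrichment_p(in_cluster_in_class, in_cluster, in_class, total):
    """One-sided Fisher's exact p-value for over-representation.

    Probability of observing at least `in_cluster_in_class` subfamilies of a
    given class in a cluster of size `in_cluster`, when `in_class` of the
    `total` subfamilies belong to that class (hypergeometric upper tail).
    Assumption: one-sided test; the legend does not specify this.
    """
    max_k = min(in_cluster, in_class)
    denominator = comb(total, in_cluster)
    numerator = sum(
        comb(in_class, k) * comb(total - in_class, in_cluster - k)
        for k in range(in_cluster_in_class, max_k + 1)
    )
    return numerator / denominator
```

With toy counts, `fisher_enrichment_p(5, 5, 5, 10)` gives the probability that a cluster of 5 subfamilies contains all 5 members of a class drawn from 10 subfamilies in total.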

TEs shape complex gene regulatory networks in thymic cells.

(a) Flowchart depicting the decision tree for each TE promoter or enhancer candidate. (b) Density heatmap representing the correlation coefficient and the empirical p-value determined by bootstrap for TF and TE pairs in each cell type of the dataset. The color code shows density (i.e., the occurrence of TF-TE pairs at a specific point). (c) Connectivity map of interactions between TEs and TFs in mTECs. For visualization purposes, only TF-TE pairs with high positive correlations (Spearman correlation coefficient ≥ 0.3 and p-value adjusted for multiple comparisons with the Benjamini-Hochberg procedure ≤ 0.05) and TF binding sites in ≥ 1% of TE loci are shown. (d) Number of TF-TE interactions for each thymic cell population. (e) Sharing of TF-TE pairs between thymic cell types. (f) Number of promoter (top) or enhancer (bottom) TE candidates per transcription factor in hematopoietic cells of the thymus. (g) Genomic tracks depicting ETS1 occupancy (i.e., read coverage) of two identified TE promoter candidates (in red) in ETS1 ChIP-seq data from NK cells.
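
The display filter described for panel (c) can be sketched as follows. This is a minimal illustration, assuming each TF-TE pair already carries a Spearman coefficient, a raw p-value, and the fraction of TE loci with a binding site for the TF; the field names (`rho`, `p`, `site_fraction`) and function names are hypothetical, and the thresholds are those quoted in the legend.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_pairs(pairs, min_rho=0.3, max_adj_p=0.05, min_site_fraction=0.01):
    """Keep TF-TE pairs passing all three criteria from the legend.

    `pairs` is a list of dicts with hypothetical keys:
    'tf', 'te', 'rho', 'p' (raw p-value), 'site_fraction'.
    """
    adj = benjamini_hochberg([pair["p"] for pair in pairs])
    return [
        (pair["tf"], pair["te"])
        for pair, q in zip(pairs, adj)
        if pair["rho"] >= min_rho
        and q <= max_adj_p
        and pair["site_fraction"] >= min_site_fraction
    ]
```

Adjusting the p-values before filtering on the correlation coefficient keeps the multiple-testing correction independent of the visualization threshold; whether the original analysis applied the steps in this order is an assumption.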

pDCs and mTEC(II) express diverse and distinct repertoires of TE sequences.

(a) Diversity of TEs expressed by thymic populations measured by Shannon entropy. The x and y axes represent the median diversity of TEs expressed by individual cells in a population and the global diversity of TEs expressed by an entire population, respectively. The equation and blue curve represent a linear model summarizing the data. Thymic APC subsets are indicated in orange. (b) Difference between the observed diversity of TEs expressed by cell populations and that expected from the linear model in (a). (c) UMAP showing the subsets of thymic APCs (aDC, activated DC; cDC1, conventional DC1; cDC2, conventional DC2; pDC, plasmacytoid DC). (d) Bar plot showing the number and class of differentially expressed TE subfamilies between APC subsets. (e) Frequency of expression of TE subfamilies by the different APC subsets. The distributions for pDCs and mTEC(II) are highlighted in bold.
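
The two diversity measures plotted in panel (a) can be sketched as below. This is a minimal illustration, assuming diversity is computed from per-cell count vectors over TE subfamilies, that the population-level value is taken on counts pooled across cells, and that the natural logarithm is used; the legend states none of these details, and the function names are hypothetical.

```python
import math
from statistics import median


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a vector of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def population_diversity(cells):
    """Return (median per-cell entropy, entropy of pooled counts).

    `cells` is a list of count vectors, one per cell, with TE subfamilies
    in the same order in every vector.
    """
    per_cell = median(shannon_entropy(cell) for cell in cells)
    pooled = [sum(column) for column in zip(*cells)]
    return per_cell, shannon_entropy(pooled)
```

Two cells that each express a single, different subfamily have zero per-cell entropy but a pooled entropy of ln 2, which is the kind of gap between cell-level and population-level diversity that panel (b) quantifies.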

TE expression in pDCs leads to dsRNA formation and type I IFN signaling.

(a) Frequency of expression of LINE, LTR, and SINE subfamilies in thymic pDCs. (b) Differential expression of TE subfamilies between splenic and thymic pDCs. TE subfamilies significantly upregulated or downregulated in thymic pDCs are indicated in red and blue, respectively (Upregulated, log2(Thymus/Spleen)≥1 and adj. p≤0.05; Downregulated, log2(Thymus/Spleen)≤-1 and adj. p≤0.05). (c,d) Immunostaining of dsRNAs in human thymic pDCs (CD123+) using the J2 antibody (n=3). (c) One representative experiment. Three examples of CD123 and J2 colocalization are shown with white arrows. (d) J2 staining intensity in CD123+ and CD123- cells from three human thymi (Wilcoxon rank-sum test, ****p≤0.0001). (e) UpSet plots showing gene sets enriched in pDCs compared to the other populations of thymic APCs. In the lower panel, black dots represent cell populations for which gene signatures are significantly depleted compared to pDCs. All comparisons where gene signatures were significantly enriched in pDCs are shown.

AIRE, FEZF2, and CHD4 regulate non-redundant sets of TEs in mTECs.

(a) Expression of AIRE, CHD4, and FEZF2 in human TEC subsets. (b) Differential expression of TE loci between wild-type (WT) and Aire-, Chd4- or Fezf2-knockout (KO) mice (Induced, log2(WT/KO)≥2 and adj. p≤0.05; Repressed, log2(WT/KO)≤-2 and adj. p≤0.05). P-values were corrected for multiple comparisons with the Benjamini-Hochberg procedure. The numbers of induced (red) and repressed (blue) TE loci are indicated on the volcano plots. (c) Overlap of TE loci repressed or induced by AIRE, FEZF2, and CHD4. (d) Proportion of TE classes and subfamilies in the TE loci regulated by AIRE, FEZF2, or CHD4, as well as all TE loci in the murine genome for comparison (Chi-squared tests with Bonferroni correction, **adj. p≤0.01, ***adj. p≤0.001). (e) Distance between TE loci induced by AIRE, FEZF2, and CHD4, and random selections of TE loci (Wilcoxon rank-sum tests, ***p≤0.001). (f) Plots for the tag density of H3K4me3 and H3K4me2 on the sequence and flanking regions (3000 base pairs (bp)) of TE loci induced by AIRE, FEZF2, and CHD4.
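
The calling rule in panel (b) can be written out as a short sketch. It assumes the log2 fold change and adjusted p-value of each TE locus are already available from the differential expression analysis; the function names are hypothetical, and only the thresholds come from the legend.

```python
def classify_te_locus(log2_wt_over_ko, adj_p, min_abs_log2fc=2.0, max_adj_p=0.05):
    """Label a TE locus from log2(WT/KO) and its adjusted p-value.

    A locus more expressed in WT than in KO mice is 'induced' by the
    knocked-out factor; a locus more expressed in KO mice is 'repressed'.
    """
    if adj_p > max_adj_p:
        return "unchanged"
    if log2_wt_over_ko >= min_abs_log2fc:
        return "induced"
    if log2_wt_over_ko <= -min_abs_log2fc:
        return "repressed"
    return "unchanged"


def count_calls(loci):
    """Count labels over an iterable of (log2(WT/KO), adjusted p) tuples."""
    counts = {"induced": 0, "repressed": 0, "unchanged": 0}
    for log2_fc, adj_p in loci:
        counts[classify_te_locus(log2_fc, adj_p)] += 1
    return counts
```

Because the ratio is WT over KO, a positive fold change means the locus loses expression when the factor is absent, which is why it is counted as induced by that factor.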

TE MAPs are presented by cTECs and mTECs.

(a) mTECs and cTECs were isolated from the thymi of K5D1 mice (n=2). The peptide-MHC I complexes were immunoprecipitated for both populations independently, and MAPs were sequenced by MS analyses. (b) Number of LINE-, LTR-, and SINE-derived MAPs in mTECs and cTECs from K5D1 mice. (c) Distributions of TE subfamilies in murine TEC subsets based on expression level (x-axis) and frequency of expression (y-axis).

Annotation of human thymic cell populations.

Dot plot depicting the expression of marker genes in the annotated cell types of the thymus. The average expression and percentage of cells expressing the gene are represented by the color and size of the dot, respectively (DN, double negative thymocytes; DP, double positive thymocytes; CD8 SP, CD8 single positive thymocytes; CD4 SP, CD4 single positive thymocytes; Treg, regulatory T cells; NKT, natural killer T cells; NK, natural killer cells; DC, dendritic cells; Mono/Macro, monocytes and macrophages; VSMC, vascular smooth muscle cells; cTEC, cortical thymic epithelial cells; mTEC, medullary thymic epithelial cells).

Assignment to cluster 2 is independent of the developmental stage of cells.

Correlation between the proportion of cells of a population originating from a postnatal sample and the proportion of TE subfamilies assigned to cluster 2 by the hierarchical clustering in Fig. 1b.

TE expression is negatively correlated with cell proliferation.

(a) Spearman correlation between the expression of TE subfamilies and cell cycle scores. Positively (r≥0.2 and adj. p≤0.01) and negatively (r≤-0.2 and adj. p≤0.01) correlated subfamilies are red and blue, respectively. P-values were corrected for multiple comparisons with the Benjamini-Hochberg method. (b) Proportion of subfamilies positively or negatively correlated with cell proliferation belonging to each TE class. (c) Percentage of overlap of TE subfamilies positively or negatively correlated with cell proliferation between cell types.

KZFPs repress TE expression in the hematopoietic lineage of the thymus.

Lower panel: Pairs of TE subfamilies and KZFPs significantly correlated in at least 2 cell types (significant correlation: r>0.2 and adj. p≤0.05, or r<-0.2 and adj. p≤0.05; p-values corrected for multiple comparisons with the Benjamini-Hochberg method). Middle panel: Enrichment of the KZFP in the sequence of the correlated TE subfamily in ChIP-seq data from Imbeault et al. (1). Upper panel: Age of TE subfamilies in millions of years (My). The estimated time of divergence between primates and rodents (82 million years ago) is indicated by the dashed line.

Interaction networks between transcription factors and TE subfamilies.

For each cell type, networks illustrate the interactions between TFs and TE subfamilies. TF-TE pairs are connected by edges when i) their expression levels are significantly correlated (Spearman correlation coefficient ≥ 0.2) and ii) the TF binding motifs are found in the loci of the TE subfamily. TE subfamilies are colored according to their class (LINE, LTR, and SINE).

TE subfamilies occupying larger genomic spaces interact more frequently with TFs.

(a) Number of interactions formed with TFs for each TE subfamily of the LINE, LTR, and SINE classes (Wilcoxon-Mann-Whitney tests, ****p≤0.0001). (b) Scatterplot depicting the Kendall tau correlation between the number of interactions with TFs of a TE subfamily and the number of loci of that subfamily in the human genome. The color code indicates the class of TE subfamilies.
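
The rank correlation in panel b can be illustrated with a minimal sketch. It assumes the tau-b variant, which corrects for ties such as several subfamilies having the same number of TF interactions; the legend does not state which variant was used, and the function name is hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length sequences (O(n^2) pair scan)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored by tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denominator = math.sqrt(
        (concordant + discordant + tied_x_only)
        * (concordant + discordant + tied_y_only)
    )
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator
```

Here `x` would hold the number of loci per TE subfamily and `y` the number of TF interactions; the quadratic pair scan is adequate for a few hundred subfamilies.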

Annotation of human thymic antigen-presenting cell subsets.

Dot plot depicting the expression of marker genes in the annotated cell types of the thymus. The average expression and percentage of cells expressing the gene are represented by the color and size of the dot, respectively. Myoid-, myeloid-, and neuroendocrine-related genes are used as markers of mimetic mTECs (aDC, activated dendritic cell; cDC1, conventional dendritic cell 1; cDC2, conventional dendritic cell 2; cTEC, cortical thymic epithelial cell; mTEC, medullary thymic epithelial cell; pDC, plasmacytoid dendritic cell).

Differential TE expression in metacells of thymic antigen-presenting cells.

(a) Cellular composition of the metacells (x axis) based on the manual annotation of the thymic cell populations (see Fig. S1). (b) Number of TE subfamilies overexpressed between the metacells. TE subfamilies are colored based on their class (LINE, LTR, and SINE). (c) Percentage of overlap of the TE subfamilies overexpressed by each metacell.

TE expression in splenic pDCs.

(a) UMAP depicting the cell populations present in the human spleen. (b) Dot plot showing the expression of marker genes in the annotated cell types of the spleen. The average expression and percentage of cells expressing the gene are represented by the color and size of the dot, respectively. (c) Diversity of TEs expressed by splenic populations measured by Shannon entropy. The x and y axes represent the median diversity of TEs expressed by individual cells of a population and the global diversity of TEs expressed by a population, respectively. A linear model summarizing the data is represented by the equation and blue curve. (d) Bar plot showing the number (y axis) and class (color) of differentially expressed TE subfamilies between splenic cell populations.

Characterization of TE subfamilies regulated by AIRE, CHD4, and FEZF2.

(a) Class of TEs induced or repressed by AIRE, CHD4, and FEZF2. Distributions were compared to the proportions of LINE, LTR, and SINE among all TE sequences of the murine genome with Chi-squared tests (**p≤0.01, ***p≤0.001). (b) Age of TEs induced by, repressed by, or independent of AIRE, CHD4, and FEZF2 (Wilcoxon-Mann-Whitney test, *p≤0.05, ***p≤0.001) (My, millions of years). (c) Genomic localization of the TE loci induced or repressed by AIRE, CHD4, and FEZF2. (d) Intron retention ratio of intronic TEs induced by or independent of AIRE, CHD4, and FEZF2. The dashed line represents intron retention events occurring in at least 10% of transcripts. (e) Plots for the tag density of histone marks on the sequence and flanking regions (3000 bp) of TE loci induced by AIRE, CHD4, and FEZF2.