LINEs, SINEs, and LTRs exhibit distinct expression profiles in human thymic cell populations.

(a) UMAP depicting the cell populations present in human thymi (CD4 SP, CD4 single positive thymocytes; CD8 SP, CD8 single positive thymocytes; cTEC, cortical thymic epithelial cells; DC, dendritic cells; DN, double negative thymocytes; DP, double positive thymocytes; Mono/Macro, monocytes and macrophages; mTEC, medullary thymic epithelial cells; NK, natural killer cells; NKT, natural killer T cells; pro/pre-B, pro-B and pre-B cells; Th17, T helper 17 cells; Treg, regulatory T cells; VSMC, vascular smooth muscle cell). Cells were clustered into 19 populations based on the expression of marker genes from Park et al. (40). (b) Upper panel: Heatmap of TE expression during thymic development, with each column representing the expression of one TE subfamily in one cell type. Unsupervised hierarchical clustering was performed, and the dendrogram was manually cut into 3 clusters (red dashed line). Lower panel: The class of TE subfamilies and significant enrichments in the 3 clusters (Fisher’s exact tests; ****p≤0.0001). (pcw, post-conception week; m, month; y, year). (c) Circos plot showing the expression pattern of TE subfamilies across thymic cells. From outermost to innermost tracks: i) proportion of cells in embryonic and postnatal samples, ii) class of TE subfamilies, iii) expression pattern of TE subfamilies identified in (b). TE subfamilies are in the same order for all cell types. (d) Histograms showing the number of cell types sharing the same expression pattern for a given TE subfamily. LINE (n=171), LTR (n=577), and SINE (n=60) were compared to a randomly generated distribution (n=809) (Kolmogorov-Smirnov tests, ****p≤0.0001).
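
As an illustration of the enrichment tests in panel (b), the following minimal sketch computes a one-sided Fisher's exact p-value for the over-representation of one TE class in one cluster. The legend does not state whether the tests were one- or two-sided, so the one-sided form is an assumption, and the function name, argument names, and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(in_both, class_only, cluster_only, neither):
    """One-sided Fisher's exact p-value for enrichment of a TE class in a cluster.

    2x2 table (all arguments are counts of TE subfamilies):
                      in cluster      not in cluster
        in class      in_both         class_only
        not in class  cluster_only    neither
    """
    class_total = in_both + class_only
    other_total = cluster_only + neither
    cluster_total = in_both + cluster_only
    total = class_total + other_total

    # Sum the hypergeometric probabilities of every table that is at least
    # as enriched as the observed one (same margins, larger overlap).
    upper = min(class_total, cluster_total)
    favourable = sum(
        comb(class_total, k) * comb(other_total, cluster_total - k)
        for k in range(in_both, upper + 1)
    )
    return favourable / comb(total, cluster_total)


# Hypothetical counts, for illustration only (not taken from the figure).
p_value = fisher_enrichment_p(in_both=30, class_only=10, cluster_only=20, neither=40)
```

The hypergeometric tail is summed with exact integer arithmetic, so the sketch stays accurate for tables of the size considered here (a few hundred subfamilies).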

TEs shape complex gene regulatory networks in human thymic cells.

(a) The flowchart depicts the decision tree for each TE promoter or enhancer candidate. (b) Density heatmap representing the correlation coefficient and the empirical p-value determined by bootstrap for TF and TE pairs in each cell type of the dataset. The color code shows density (i.e., the occurrence of TF-TE pairs at a specific point). (c) Connectivity map of interactions between TEs and TFs in mTECs. For visualization purposes, only TF-TE pairs with high positive correlations (Spearman correlation coefficient ≥ 0.3 and p-value adjusted for multiple comparisons with the Benjamini-Hochberg procedure ≤ 0.05) and TF binding sites in ≥ 1% of TE loci are shown. (d) Number of TF-TE interactions for each thymic cell population. (e) Sharing of TF-TE pairs between thymic cell types. (f) Number of promoter (top) or enhancer (bottom) TE candidates per transcription factor in hematopoietic cells of the thymus. (g) The proportion of statistically significant peaks overlapping with TE sequences in ETS1 ChIP-seq data from NK cells. (h) Genomic tracks depicting the colocalization of ETS1 occupancy (i.e., read coverage) and TE sequences (in red) in the upstream region of two genes in ETS1 ChIP-seq data from NK cells. Statistically significant ETS1 peaks are indicated by the black rectangles.
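
The filtering applied for panel (c) can be sketched as follows: p-values are adjusted with the Benjamini-Hochberg procedure, then pairs are kept when the Spearman coefficient is ≥ 0.3, the adjusted p-value is ≤ 0.05, and the TF has binding sites in ≥ 1% of the loci of the TE subfamily. The thresholds come from the legend; the function names, dictionary keys, and example values are hypothetical, and the correlation coefficients and raw p-values are assumed to have been computed beforehand.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values are monotone and capped at 1.
    for position in range(m - 1, -1, -1):
        idx = order[position]
        running_min = min(running_min, pvalues[idx] * m / (position + 1))
        adjusted[idx] = running_min
    return adjusted


def select_tf_te_pairs(pairs, r_min=0.3, adj_p_max=0.05, min_bound_fraction=0.01):
    """Keep the TF-TE pairs that pass all three thresholds.

    `pairs` is a list of dicts with the (hypothetical) keys
    'tf', 'te', 'r', 'p', and 'bound_fraction'.
    """
    adjusted = benjamini_hochberg([pair["p"] for pair in pairs])
    kept = []
    for pair, adj_p in zip(pairs, adjusted):
        if (
            pair["r"] >= r_min
            and adj_p <= adj_p_max
            and pair["bound_fraction"] >= min_bound_fraction
        ):
            kept.append((pair["tf"], pair["te"]))
    return kept


# Placeholder identifiers and values, for illustration only.
example_pairs = [
    {"tf": "TF_A", "te": "TE_1", "r": 0.50, "p": 0.001, "bound_fraction": 0.020},
    {"tf": "TF_B", "te": "TE_2", "r": 0.35, "p": 0.040, "bound_fraction": 0.005},
    {"tf": "TF_C", "te": "TE_3", "r": 0.60, "p": 0.500, "bound_fraction": 0.500},
]
selected = select_tf_te_pairs(example_pairs)
```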

Human pDCs and mTEC(II) express diverse and distinct repertoires of TE sequences.

(a) Diversity of TEs expressed by thymic populations measured by Shannon entropy. The x and y axes represent the median diversity of TEs expressed by individual cells in a population and the global diversity of TEs expressed by an entire population, respectively. The equation and blue curve represent a linear model summarizing the data. Thymic APC subsets are indicated in orange. (b) Difference between the observed diversity of TEs expressed by cell populations and the one expected by the linear model in (a). (c) UMAP showing the subsets of thymic APCs (aDC, activated DC; cDC1, conventional DC1; cDC2, conventional DC2; pDC, plasmacytoid DC). (d) Bar plot showing the number and class of differentially expressed TE subfamilies between APC subsets. (e) Frequency of expression of TE subfamilies by the different APC subsets. The distributions for pDCs and mTEC(II) are highlighted in bold.
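
The two diversity measures plotted in panel (a) can be illustrated with a minimal sketch. It assumes that each cell is described by a vector of TE subfamily counts, that entropy is computed in bits, and that the population-level value is obtained from counts summed over all cells; none of these details are specified in the legend, and the function names are hypothetical.

```python
import math
from statistics import median


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a vector of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def population_diversity(cells):
    """Return (median per-cell entropy, entropy of the pooled population).

    `cells` is a list of count vectors, one per cell, with TE subfamilies
    in the same order in every vector.
    """
    per_cell = [shannon_entropy(cell) for cell in cells]
    pooled = [sum(column) for column in zip(*cells)]
    return median(per_cell), shannon_entropy(pooled)


# Two toy cells, each expressing a different pair of TE subfamilies.
x_value, y_value = population_diversity([[1, 1, 0, 0], [0, 0, 1, 1]])
```

In this toy case the pooled population is more diverse than either cell, which is the kind of gap between the x and y axes that the linear model in panel (a) summarizes.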

TE expression in human pDCs is associated with dsRNA formation and type I IFN signaling.

(a) Frequency of expression of LINE, LTR, and SINE subfamilies in thymic pDCs. (b) Differential expression of TE subfamilies between splenic and thymic pDCs. TE subfamilies significantly upregulated or downregulated by thymic pDCs are indicated in red and blue, respectively (Upregulated, log2(Thymus/Spleen)≥1 and adj. p≤0.05; Downregulated, log2(Thymus/Spleen)≤-1 and adj. p≤0.05). (c,d) Immunostaining of dsRNAs in human thymic pDCs (CD123+) using the J2 antibody (n=3). (c) One representative experiment. Three examples of CD123 and J2 colocalization are shown with white arrows. (d) J2 staining intensity in CD123+ and CD123- cells from three human thymi (Wilcoxon Rank Sum test, ****p-value≤0.0001). (e) UpSet plot showing gene sets enriched in pDCs compared to the other populations of thymic APCs. On the lower panel, black dots represent cell populations for which gene signatures are significantly depleted compared to pDCs. All comparisons where gene signatures were significantly enriched in pDCs are shown.

AIRE, FEZF2, and CHD4 regulate non-redundant sets of TEs in murine mTECs.

(a) Expression of AIRE, CHD4, and FEZF2 in human TEC subsets. (b) Differential expression of TE loci between wild-type (WT) and Aire-, Chd4-, or Fezf2-knockout (KO) mice (Induced, log2(WT/KO)≥2 and adj. p≤0.05; Repressed, log2(WT/KO)≤-2 and adj. p≤0.05). P-values were corrected for multiple comparisons with the Benjamini-Hochberg procedure. The numbers of induced (red) and repressed (blue) TE loci are indicated on the volcano plots. (c) Overlap of TE loci repressed or induced by AIRE, FEZF2, and CHD4. (d) Proportion of TE classes and subfamilies in the TE loci regulated by AIRE, FEZF2, or CHD4, as well as all TE loci in the murine genome for comparison (Chi-squared tests with Bonferroni correction, **adj. p≤0.01, ***adj. p≤0.001). (e) Plots for the tag density of H3K4me3 and H3K4me2 on the sequence and flanking regions (3000 base pairs) of TE loci induced by AIRE, FEZF2, and CHD4. (f) Proportion of statistically significant peaks overlapping TE sequences in AIRE ChIP-seq data from murine mTECs.
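
The thresholds used to label TE loci in panel (b) can be written as a short classification rule. This is a sketch that assumes the log2(WT/KO) fold changes and Benjamini-Hochberg adjusted p-values are already available; the function names and the example values are hypothetical.

```python
from collections import Counter


def classify_locus(log2_fold_change, adj_p, lfc_cutoff=2.0, p_cutoff=0.05):
    """Label one TE locus from its log2(WT/KO) fold change and adjusted p-value."""
    if adj_p <= p_cutoff and log2_fold_change >= lfc_cutoff:
        return "induced"
    if adj_p <= p_cutoff and log2_fold_change <= -lfc_cutoff:
        return "repressed"
    return "unchanged"


def count_regulated(loci, lfc_cutoff=2.0, p_cutoff=0.05):
    """Count labels over an iterable of (log2_fold_change, adj_p) tuples."""
    return Counter(
        classify_locus(lfc, adj_p, lfc_cutoff, p_cutoff) for lfc, adj_p in loci
    )


# Illustrative values only (not taken from the volcano plots).
counts = count_regulated([(2.4, 0.001), (-3.1, 0.02), (2.8, 0.30), (0.5, 0.001)])
```

The cutoffs are parameters so that the same rule can be reused with other thresholds, for example a fold-change cutoff of 1.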

Murine cTECs and mTECs present TE MAPs.

(a) mTECs and cTECs were isolated from the thymi of K5D1 mice (n=2). The peptide-MHC I complexes were immunoprecipitated independently for both populations, and MAPs were sequenced by MS analyses. (b) Number of LINE-, LTR-, and SINE-derived MAPs in mTECs and cTECs from K5D1 mice. (c) Distributions of TE subfamilies in murine TEC subsets based on expression level (x-axis) and frequency of expression (y-axis).

Annotation of human thymic cell populations.

Dot plot depicting the expression of marker genes in the annotated cell types of the thymus. The average expression and percentage of cells expressing the gene are represented by the color and size of the dot, respectively (DN, double negative thymocytes; DP, double positive thymocytes; CD8 SP, CD8 single positive thymocytes; CD4 SP, CD4 single positive thymocytes; Treg, regulatory T cells; NKT, natural killer T cells; NK, natural killer cells; DC, dendritic cells; Mono/Macro, monocytes and macrophages; VSMC, vascular smooth muscle cells; cTEC, cortical thymic epithelial cells; mTEC, medullary thymic epithelial cells).

Assignment to cluster 2 is independent of the developmental stage of cells.

The graph depicts the correlation between the proportion of cells of a population originating from a postnatal sample and the proportion of TE subfamilies assigned to cluster 2 by the hierarchical clustering in Figure 1B.

TE expression is negatively correlated with cell proliferation.

(a) Spearman correlation between the expression of TE subfamilies and cell cycle scores. Positively (r≥0.2 and adj. p≤0.01) and negatively (r≤-0.2 and adj. p≤0.01) correlated subfamilies are red and blue, respectively. P-values were corrected for multiple comparisons with the Benjamini-Hochberg method. (b) Proportion of subfamilies positively or negatively correlated with cell proliferation belonging to each TE class. (c) Percentage of overlap of TE subfamilies positively or negatively correlated with proliferation between cell types.

KZFPs repress TE expression in the hematopoietic lineage of the human thymus.

Lower panel: pairs of TE subfamilies and KZFPs significantly correlated in at least two cell types (significant correlation: r>0.2 and adj. p≤0.05, or r<-0.2 and adj. p≤0.05, p-values corrected for multiple comparisons with the Benjamini-Hochberg method). Middle panel: Enrichment of the KZFP in the sequence of the correlated TE subfamily in ChIP-seq data from Imbeault et al. (44). Upper panel: Age of TE subfamilies in millions of years (My). The dashed line indicates the estimated time of divergence between primates and rodents (82 million years ago).

Interaction networks between transcription factors and TE subfamilies.

For each cell type, networks illustrate the interactions between TF and TE subfamilies. Pairs of TF and TE are connected by edges when i) their expressions are significantly correlated (Spearman correlation coefficient ≥ 0.2) and ii) the TF binding motifs are found in the loci of the TE subfamily. TE subfamilies are colored based on the class of TE subfamily (LINE, LTR, and SINE).

Frequency of interactions between transcription factors and TE subfamilies in thymic cells.

For each cell type of the stromal (left) or hematopoietic (right) compartments of the thymus, the graph shows the number of interactions between transcription factors and TE subfamilies of the LINE, LTR, or SINE groups.

TE subfamilies occupying larger genomic spaces interact more frequently with TFs.

(a) Number of interactions formed with TFs for each TE subfamily of the LINE, LTR, and SINE classes (Wilcoxon-Mann-Whitney tests, ****p≤0.0001). (b) Scatterplot depicting the Kendall tau correlation between the number of interactions with TFs of a TE subfamily and the number of loci of that subfamily in the human genome. The color code indicates the class of TE subfamilies.
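
For reference, the rank correlation in panel (b) can be computed as in the sketch below. The legend does not state which variant of Kendall's tau was used; the tie-corrected tau-b is assumed here, and the function name and example values are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences (O(n^2) pair scan)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) // 2
    denominator = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denominator


# Illustrative values: number of loci and number of TF interactions
# for five made-up TE subfamilies.
n_loci = [120, 450, 900, 3000, 15000]
n_interactions = [2, 5, 4, 11, 30]
tau = kendall_tau_b(n_loci, n_interactions)
```

The quadratic pair scan is adequate for the few hundred subfamilies considered here; a library implementation would be preferable for larger inputs.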

TE expression decreases during thymocyte differentiation.

The average expression level of TE subfamilies across cells of the four main populations of thymocytes is shown: DN, DP, CD4 SP, and CD8 SP. Black lines between thymocyte subsets connect expression values for the same TE subfamily.

Annotation of human thymic antigen-presenting cell subsets.

Dot plot depicting the expression of marker genes in the annotated cell types of the thymus. The average expression and percentage of cells expressing the gene are represented by the color and size of the dot, respectively. Myoid-, myeloid- and neuroendocrine-related genes are used as markers of mimetic mTEC. (aDC, activated dendritic cell; cDC1, conventional dendritic cell 1; cDC2, conventional dendritic cell 2; cTEC, cortical thymic epithelial cell; mTEC, medullary thymic epithelial cell; pDC, plasmacytoid dendritic cell).

Differential TE expression in metacells of human thymic antigen-presenting cells.

(a) Cellular composition of the metacells (x-axis) based on the manual annotation of the thymic cell populations (see Fig. S1). (b) Number of TE subfamilies overexpressed between the metacells. TE subfamilies are colored based on class (LINE, LTR, and SINE). (c) Percentage of overlap of the TE subfamilies overexpressed by each metacell.

TE expression in human splenic pDCs.

(a) UMAP depicting the cell populations present in the human spleen. (b) Dot plot showing the expression of marker genes in the annotated cell types of the spleen. The average expression and percentage of cells expressing the gene are represented by the color and size of the dot, respectively. (c) Diversity of TEs expressed by splenic populations measured by Shannon entropy. The x and y axes represent the median diversity of TEs expressed by individual cells of a population and the global diversity of TEs expressed by discrete populations, respectively. The equation and blue curve represent a linear model summarizing the data. (d) Bar plot showing the number (y-axis) and class (color) of differentially expressed TE subfamilies between splenic cell populations.

A higher proportion of reads originates from TEs in pDCs than in other thymic APCs.

Boxplots depicting the percentage of reads assigned to (a) TE sequences or (b) mitochondrial genes in the different subpopulations of thymic APCs.

Characterization of TE subfamilies regulated by AIRE, CHD4, and FEZF2 in murine mTECs.

(a) Class of TEs induced or repressed by AIRE, CHD4, and FEZF2. Distributions were compared to the proportion of LINEs, LTRs, and SINEs amongst all TE sequences of the murine genome with Chi-squared tests (**p≤0.01, ***p≤0.001). (b) Age of TEs induced, repressed, or independent of AIRE, CHD4, and FEZF2 (Wilcoxon-Mann-Whitney test, *p≤0.05, ***p≤0.001) (My, millions of years). (c) Distance between TE loci induced by AIRE, FEZF2, and CHD4, and random selections of TE loci (Wilcoxon rank-sum tests, ***p≤0.001). (d) Genomic localization of the TE loci induced or repressed by AIRE, CHD4, and FEZF2. (e) Intron retention ratio of intronic TEs induced or independent of AIRE, CHD4, and FEZF2. The dashed line represents intron retention events occurring in at least 10% of transcripts.