Regulatory networks of KRAB zinc finger genes and transposable elements changed during human brain evolution and disease
Figures

TEKRABber and the overview of the analysis workflow.
(A) Two independent RNA-seq datasets, Primate Brain Data and Mayo Data, were analyzed in this study. (1) Transcriptomic data were first preprocessed by removing adapters and low-quality reads and then mapped to the respective reference genomes using STAR to generate BAM files. (2) TEtranscripts was used to quantify the expression of genes and transposable elements (TEs). (3) Expression profiles were normalized across different species. (4) Differential expression (DE) analysis was performed and pairwise correlations were calculated. Steps (3) and (4) were developed together in an R Bioconductor package, TEKRABber. (B) The user interface of TEKRABber features a dashboard layout that allows users to explore one-to-one gene-TE interactions, including correlation and differential expression results (see the Materials and methods section for details).

Comparison of differentially expressed (DE) transposable elements (TEs) with and without scaling.
Humans were compared with each of the nonhuman primates (NHPs) indicated on the x-axis, using expression data from the primary and secondary cortices as an example. The y-axis shows the percentage of TEs that were called DE only with scaled or only with non-scaled data (the remainder up to 100% is the overlap of DE TEs between both methods). To be called DE, a TE needed to show an absolute log2 fold change larger than 1.5 and an adjusted p-value < 0.05. The impact of scaling is highest for the most distantly related species, the rhesus macaque, where more than 30% of TEs changed their DE assignment depending on whether scaling was applied.
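
The percentages described in this caption can be computed from two sets of DE calls. The following is a minimal Python sketch, not the TEKRABber implementation: the function names and the toy values are hypothetical, the thresholds are the ones stated in the caption, and the denominator is assumed to be the union of DE TEs from both methods so that the three shares add up to 100%.

```python
def call_de(results, lfc_cutoff=1.5, padj_cutoff=0.05):
    """Return the set of TE names called DE.

    results maps a TE name to (log2 fold change, adjusted p-value).
    """
    return {
        te for te, (lfc, padj) in results.items()
        if abs(lfc) > lfc_cutoff and padj < padj_cutoff
    }


def scaling_impact(scaled, unscaled):
    """Percentage of DE TEs called only with scaled data, only with
    non-scaled data, or with both (relative to the union of DE TEs)."""
    de_scaled = call_de(scaled)
    de_unscaled = call_de(unscaled)
    union = de_scaled | de_unscaled
    if not union:
        return {"only_scaled": 0.0, "only_unscaled": 0.0, "overlap": 0.0}
    n = len(union)
    return {
        "only_scaled": 100 * len(de_scaled - de_unscaled) / n,
        "only_unscaled": 100 * len(de_unscaled - de_scaled) / n,
        "overlap": 100 * len(de_scaled & de_unscaled) / n,
    }


# Toy values (hypothetical, for illustration only).
scaled = {"TE_a": (2.0, 0.01), "TE_b": (1.8, 0.02),
          "TE_c": (0.4, 0.50), "TE_d": (-2.2, 0.001)}
unscaled = {"TE_a": (1.9, 0.01), "TE_b": (1.2, 0.03),
            "TE_c": (1.7, 0.04), "TE_d": (-2.1, 0.001)}
impact = scaling_impact(scaled, unscaled)
```

In this toy example, TE_b is DE only with scaling and TE_c only without it, so each accounts for a quarter of the union.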

Expression of KRAB-ZNF genes and transposable elements (TEs) in Primate Brain Data.
(A) t-SNE plots of the expression of KRAB-ZNF genes and TEs from all 422 samples, including humans, chimpanzees, bonobos, and macaques, labeled by species and brain region. (B) Differentially expressed KRAB-ZNF genes and TEs comparing humans and chimpanzees in primary and secondary cortices. (C) Species tree with the inferred numbers of TEs and KRAB-ZNFs that have evolved per branch. (Note: There are 247 relatively old TEs and 52 KRAB-ZNFs that were difficult to place into a specific branch. Thus, they are not presented in this panel.) (D) Expression of KRAB-ZNF genes and TEs in primary and secondary cortices across species. Both KRAB-ZNF genes and TEs were divided into two groups based on their inferred evolutionary age, old (>44.2 million years ago [mya]) and young (≤44.2 mya). Young KRAB-ZNFs and young TEs have lower expression levels (Wilcoxon rank sum test, p<0.05). Expression in all brain regions is shown in Figure 2—figure supplement 3. (E) Percentage of differentially expressed KRAB-ZNF genes and TEs in humans compared to chimpanzees in primary and secondary cortices and cerebellar white matter. (F) Human-specific differentially expressed (DE) (i.e. human-specifically changed) KRAB-ZNF genes and TEs in primary and secondary cortices compared to nonhuman primates (NHPs). Gray indicates no expression information. The colors for age inferences in (C), (D), (E), and (F) are the same: blue for evolutionarily old and orange for evolutionarily young KRAB-ZNFs and TEs, respectively.

Distribution of KRAB-ZNFs evolutionary age inference.
The evolutionary age of KRAB-ZNFs was inferred from GenTree (Shao et al., 2019) and primate orthologous annotations (Jovanovic et al., 2021). The evolutionary young group (≤ 44.2 mya) is in orange and the evolutionary old group (> 44.2 mya) is in blue.

Distribution of transposable elements (TEs) evolutionary age inference.
The age of TEs was estimated using Dfam subfamily species annotation. The evolutionary young group (≤ 44.2 mya) is in orange and the evolutionary old group (> 44.2 mya) is in blue.

Expression of KRAB-ZNFs and transposable elements (TEs) among brain regions.
Both KRAB-ZNF genes and TEs were divided into two groups based on their inferred evolutionary age, old (> 44.2 mya) and young (≤ 44.2 mya). Young KRAB-ZNFs and young TEs have lower expression levels (Wilcoxon rank sum test, p < 0.05).

TE:KRAB-ZNF in human primary and secondary cortices.
Panels (A) and (B) illustrate the workflow for identifying significant TE:KRAB-ZNF, using the primary and secondary cortices as an example. (A) We used randomly selected gene sets and KRAB-ZNFs to calculate correlations with transposable elements (TEs). The violet dots indicate the TE:KRAB-ZNF correlation counts for all, positive, and negative correlations; they are significantly higher than those for random gene sets (boxplots below, 1000 iterations, p<0.001). (B) Overlaps between TE:KRAB-ZNF (y-axis) and the KRAB-ZNF protein ChIP-exo data (Imbeault et al., 2017) (x-axis). Note that we use absolute coefficient values for negative correlations. Correlations under the yellow area are selected. (C) Jaccard similarity, demonstrating that the correlations between TEs and KRAB-ZNFs overlapped significantly more with ChIP-exo data than randomly selected TEs and KRAB-ZNFs (p<0.001). The points indicate the overlap between the actual correlations and the ChIP-exo data. The boxplots indicate the overlap between randomly selected TE and KRAB-ZNF pairs and the ChIP-exo data. (D) Subsets of the number of positive and negative TE:KRAB-ZNF in the primary and secondary cortices (c1_p and c1_n) and the limbic and association cortices (c2_p and c2_n). (E) TE:KRAB-ZNF network in the primary and secondary cortices with five modules. Nodes are colored in five colors representing the five modules, and nodes in white do not belong to any module. Young links are in orange and old links are in blue. (F) Distribution of the normalized degree counts of TE and KRAB-ZNF nodes in the TE:KRAB-ZNF network. (G) The subnetwork colored in pink in (E), showing that this module mainly consists of Alu subfamilies. (H) The log count of correlations classified by TEs and the categories of links, including positive-old (P-O), positive-young (P-Y), negative-old (N-O), and negative-young (N-Y). Red stars indicate that the class distribution of the TEs is significantly different (Chi-squared test, p<0.001). The right-hand side barplot shows the exact count of Alu subfamilies from the first row in the heatmap.
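
The comparison in panel (C) can be illustrated with a small permutation sketch in Python. It is not the authors' code: the function names, the way random pairs are drawn (uniformly from all possible TE and KRAB-ZNF combinations), the add-one correction, and the toy identifiers are all assumptions made for illustration.

```python
import random


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of (TE, KRAB-ZNF) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def empirical_p(observed_pairs, reference_pairs, tes, znfs,
                iterations=1000, seed=0):
    """Compare the observed Jaccard similarity with that of random pair sets.

    Returns the observed similarity and the share of random pair sets
    (same size as observed_pairs) that reach or exceed it.
    """
    rng = random.Random(seed)
    observed = jaccard(observed_pairs, reference_pairs)
    all_pairs = [(te, znf) for te in tes for znf in znfs]
    hits = 0
    for _ in range(iterations):
        random_pairs = set(rng.sample(all_pairs, len(observed_pairs)))
        if jaccard(random_pairs, reference_pairs) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (iterations + 1)


# Toy identifiers (hypothetical, for illustration only).
tes = [f"TE{i}" for i in range(20)]
znfs = [f"ZNF{j}" for j in range(20)]
reference = {(f"TE{i}", f"ZNF{i}") for i in range(10)}   # stand-in for ChIP-exo pairs
correlated = {(f"TE{i}", f"ZNF{i}") for i in range(8)} | {("TE0", "ZNF19"), ("TE1", "ZNF18")}
similarity, p_value = empirical_p(correlated, reference, tes, znfs)
```

With 1000 iterations, the smallest p-value this estimator can report is 1/1001, which is consistent with reporting results as p<0.001.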

Distribution of correlation in limbic and association cortices.
The method is the same as in Figure 3A and B. (A) The number of correlations with KRAB-ZNFs (pink dot) is higher than with randomly selected genes (p < 0.001). (B) We use a threshold of adjusted p-value < 0.01 and absolute coefficient larger than 0.4 to select TE:KRAB-ZNF for downstream analysis.

Example of correlations (TE:KRAB-ZNF).
We define a TE:KRAB-ZNF as a significant one-to-one correlation between a TE and a KRAB-ZNF, with an absolute coefficient larger than 0.4 and an adjusted p-value smaller than 0.01. We classify it as a young correlation when at least one of the components (TE or KRAB-ZNF gene) is evolutionarily young. (A) A negative young example (young TE correlated with young KRAB-ZNF). (B) A negative young example (young TE correlated with old KRAB-ZNF). (C) An old example (old TE correlated with old KRAB-ZNF). The x-axis is the log expression level of the KRAB-ZNF and the y-axis is the log expression level of the TE.
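
The selection and labeling rules in this caption can be written as a short Python sketch. The function names and example values are hypothetical; the thresholds (absolute coefficient > 0.4, adjusted p-value < 0.01, young ≤ 44.2 mya) are taken from the captions, and the four labels match the link categories P-O, P-Y, N-O, and N-Y used in Figure 3H.

```python
YOUNG_CUTOFF_MYA = 44.2  # age threshold used in the captions


def is_young(age_mya):
    """A TE or KRAB-ZNF is young if its inferred age is <= 44.2 mya."""
    return age_mya <= YOUNG_CUTOFF_MYA


def classify_link(coef, padj, te_age, znf_age,
                  min_abs_coef=0.4, max_padj=0.01):
    """Return 'P-O', 'P-Y', 'N-O', or 'N-Y', or None if not significant.

    A link is young when at least one component (TE or KRAB-ZNF) is young.
    """
    if abs(coef) <= min_abs_coef or padj >= max_padj:
        return None
    sign = "P" if coef > 0 else "N"
    age = "Y" if is_young(te_age) or is_young(znf_age) else "O"
    return f"{sign}-{age}"


# Hypothetical example: a young TE negatively correlated with an old KRAB-ZNF.
label = classify_link(coef=-0.6, padj=0.001, te_age=30.0, znf_age=90.0)
```

This corresponds to the case in panel (B), where a single young component is enough to make the link young.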

TE:KRAB-ZNF network in human limbic and association cortices.
Nodes are colored based on 13 different modules. Nodes in white do not belong to any module. Links indicate the evolutionary age of the interaction (young link in orange; old link in blue). (A) Primary and secondary cortices (cluster1) and (B) limbic and association cortices (cluster2). Violet dots indicate the numbers of significant TE:KRAB-ZNF correlations (adjusted p < 0.01). Box plots indicate the distribution of 1000 iterations of randomly selected genes correlated with TEs (hm: human, pt: chimpanzee, pp: bonobo, mm: macaque, all: positive and negative correlations, negative: negative correlations, positive: positive correlations).

Comparison of TE:KRAB-ZNF in human and nonhuman primates (NHPs).
(A) Workflow for selecting TE:KRAB-ZNF for comparison between species. First, there were 178 transposable elements (TEs) and 836 KRAB-ZNFs detected in all four species. Second, a leave-one-out test was performed on the human samples for a fair comparison, since humans had four replicates and NHPs only had three (adjusted p<0.01 and absolute coefficient >0.4). (B) Number of positive and negative correlations in humans and NHPs. Red brackets indicate a change of correlation between two species. For example, there are 276 positive correlations in humans that are negative correlations in bonobos (suffix n: negative, p: positive; hs: Homo sapiens, pt: Pan troglodytes, pp: Pan paniscus, mm: Macaca mulatta). (C) Network of 276 TE:KRAB-ZNF that were all positively correlated in humans but negatively correlated in bonobos. This network shows two hubs, ZNF528 and ZNF112, connecting to multiple TE subfamilies. The node size of TEs refers to the relative abundance of connections to the hubs. Details of this network can be found in Figure 4—figure supplement 2. (D) ZNF528 protein sequence difference in a zinc finger domain (ZF), where humans have glutamine (Q) while bonobos have histidine (H) at the –1 finger position. The lower part of the illustration indicates that the zinc finger domain binds to the DNA sequence using the –1, 3, and 6 finger positions. (E) Number of different TE subfamily nodes that form evolutionarily old and young correlations in the network in (C), comparing humans to bonobos. (F) Distribution counts of human-specific correlations categorized by TE subfamily, showing only young links. N-Y: negative and young correlations; P-Y: positive and young correlations.
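
The red-bracket counts in panel (B) are pairs whose correlation sign differs between two species. A minimal Python sketch of that comparison follows; the function name and the toy pairs are hypothetical, and each input is assumed to contain only correlations that already passed the significance thresholds.

```python
def sign_flipped_pairs(species_a, species_b):
    """Pairs positively correlated in species_a and negatively correlated in species_b.

    Each argument maps a (TE, KRAB-ZNF) pair to its correlation coefficient.
    Pairs missing from species_b are not counted as flipped.
    """
    return {
        pair for pair, coef in species_a.items()
        if coef > 0 and species_b.get(pair, 0.0) < 0
    }


# Toy values (hypothetical, for illustration only).
human = {("TE_a", "ZNF_x"): 0.7, ("TE_b", "ZNF_y"): 0.5, ("TE_c", "ZNF_z"): -0.6}
bonobo = {("TE_a", "ZNF_x"): -0.6, ("TE_b", "ZNF_y"): 0.45}
flipped = sign_flipped_pairs(human, bonobo)
```

Here only the first pair is positive in the human toy data and negative in the bonobo toy data.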

276 opposite TE:KRAB-ZNF regulatory network comparing humans to bonobos.
This bipartite network refers to the 276 TE:KRAB-ZNF network mentioned in Figure 4C. KRAB-ZNF nodes are in violet and TE nodes are in green. The colors of the border of nodes and the edges represent their evolutionary age. The evolutionary young nodes and edges are in orange and the evolutionary old nodes and edges are in blue.

Expression of transposable elements (TEs) and KRAB-ZNF genes in Mayo Data.
(A) Variations in the expression of KRAB-ZNF genes and TEs using t-SNE analysis. cbe: cerebellum; tcx: temporal cortex. (B) Distributions of the expression of evolutionary old and young KRAB-ZNF genes and TEs. (C) and (D) Differentially expressed KRAB-ZNF genes and TEs (absolute log2FoldChange>0.5, p<0.05) in temporal cortex (tcx) and cerebellum (cbe). The expression of KRAB-ZNF genes and TEs in the cerebellum shared the same log expression scale.

TE:KRAB-ZNF analysis in Mayo Data.
(A) Overlaps of TE:KRAB-ZNF between control and Alzheimer’s disease (AD) conditions in temporal cortex and cerebellum (denoted as cbe_control, tcx_control, cbe_AD, and tcx_AD). (B) 21 human-control-specific TE:KRAB-ZNFs were selected from the intersection of human-specific TE:KRAB-ZNFs from Primate Brain Data (not detected in the other nonhuman primates [NHPs]) and control-specific TE:KRAB-ZNFs from Mayo Data in temporal cortex (not detected in AD samples). (C) Distribution of transposable element (TE) family counts among the 21 control-specific TE:KRAB-ZNF in the temporal cortex. (D) Comparison of the expression and correlation results of AluYc:ZNF182 and L1MA6:ZNF211 in the temporal cortex. (E) and (F) show the bipartite network of 21 TE:KRAB-ZNF in the temporal cortex. (E) TE nodes are in green and KRAB-ZNF nodes are in violet. Evolutionarily young links are in orange, and evolutionarily old links are in blue. An orange border indicates that the TE or KRAB-ZNF evolved recently. (F) There are two modules in the network, colored in pink and gray based on their bipartite modularity. Brown links indicate negative correlations, and green links indicate positive correlations.
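
The selection in panel (B) is a combination of set differences and one intersection. The sketch below shows that logic in Python under stated assumptions: the function name and all pair identifiers are hypothetical toy values, and each input set is assumed to hold the significant TE:KRAB-ZNF pairs of one species or condition.

```python
def specific_to(target, *others):
    """Pairs present in target and absent from every other set."""
    result = set(target)
    for other in others:
        result -= set(other)
    return result


# Toy sets of significant (TE, KRAB-ZNF) pairs (hypothetical identifiers).
human = {("TE_a", "ZNF_x"), ("TE_b", "ZNF_y"), ("TE_c", "ZNF_z")}
chimpanzee = {("TE_c", "ZNF_z")}
bonobo = set()
macaque = {("TE_d", "ZNF_w")}
tcx_control = {("TE_a", "ZNF_x"), ("TE_b", "ZNF_y"), ("TE_e", "ZNF_v")}
tcx_ad = {("TE_b", "ZNF_y")}

# Human-specific: detected in humans but in none of the NHPs.
human_specific = specific_to(human, chimpanzee, bonobo, macaque)
# Control-specific: detected in control temporal cortex but not in AD samples.
control_specific = specific_to(tcx_control, tcx_ad)
# Human-control-specific: the intersection of both selections.
human_control_specific = human_specific & control_specific
```

In the toy data, only one pair is both human-specific and control-specific.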
Tables
Comparison of transposable element (TE) expression analysis software.
Software name | Description | Comparison feature | References |
---|---|---|---|
RepEnrich | Combines different mapping strategies for differentially expressed TE analysis using RNA-seq and ChIP-seq data | Different conditions (same species) | Criscione et al., 2014 |
TETools | Compares TE expression from RNA-seq data | Different conditions (same species) | Lerat et al., 2016 |
Telescope | Estimates TEs in specific genomic locations using RNA-seq data | One condition in one species | Bendall et al., 2019 |
TE Density | Provides a metric showing the presence of TEs relative to genes within flexible genomic distance | One condition in one species | Teresi et al., 2022 |
PlanTEenrichment | Calculates TE enrichment upon inputting a differentially expressed gene list and selection of a specific plant species | Different conditions (same species) | Eskier et al., 2023 |
GeneTEFlow | A nextflow pipeline for analyzing differential expression of genes and TEs | Different conditions (same species) | Liu et al., 2020 |
TEffectR | Estimates the proximal TE effects on gene expression using a linear regression model | Different conditions (same species) | Karakülah et al., 2019 |
TEKRABber | Computes differentially expressed genes/TEs and one-to-one correlations using RNA-seq data | Different conditions (same species); across-species comparison (different species) | Method presented here |
Datasets.
Dataset | Categories (biological replicates) | Total number of samples |
---|---|---|
Primate Brain Data (GSE127898) | Human (4) | 132 |
 | Chimpanzee (3) | 96 |
 | Macaque (3) | 96 |
 | Bonobo (3) | 98 |
Mayo Data (syn5550404) | Control temporal cortex | 23 |
 | AD temporal cortex | 24 |
 | Control cerebellum | 23 |
 | AD cerebellum | 22 |
Additional files
- Supplementary file 1: Supplementary Tables.
  https://cdn.elifesciences.org/articles/103608/elife-103608-supp1-v1.xlsx
- MDAR checklist
  https://cdn.elifesciences.org/articles/103608/elife-103608-mdarchecklist1-v1.docx