Comparison of TE expression analysis software

TEKRABber and the overview of the analysis workflow.

(A) Two independent RNA-seq datasets, Primate Brain Data and Mayo Data, were analyzed in this study. (1) Transcriptomic data were first preprocessed by removing adapters and low-quality reads and then mapped to the respective reference genome using STAR to generate BAM files. (2) TEtranscripts was used to quantify the expression of genes and TEs. (3) Expression profiles were normalized across species. (4) DE analysis was performed and pairwise correlations were calculated. Steps (3) and (4) are implemented together in the R Bioconductor package TEKRABber. (B) The user interface of TEKRABber features a dashboard layout that allows users to explore one-to-one gene-TE interactions, including correlation and differential expression results. See the Materials and Methods section for details.
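
Step (4) pairs every gene with every TE and computes a correlation coefficient for each pair. The following is a minimal Python sketch of that pairing, assuming Pearson correlation on already-normalized expression values. The function names and the toy values are illustrative assumptions and are not the TEKRABber API (the package itself is written in R).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant expression profile
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(gene_expr, te_expr):
    """Correlate every TE with every gene; keys are (TE, gene) pairs."""
    return {
        (te, gene): pearson(te_values, gene_values)
        for gene, gene_values in gene_expr.items()
        for te, te_values in te_expr.items()
    }


# Hypothetical normalized expression across four samples.
gene_expr = {"ZNF_a": [1.0, 2.0, 3.0, 4.0]}
te_expr = {"TE_x": [2.0, 4.0, 6.0, 8.0], "TE_y": [4.0, 3.0, 2.0, 1.0]}

correlations = pairwise_correlations(gene_expr, te_expr)
```

In this toy input, TE_x rises with ZNF_a and TE_y falls as ZNF_a rises, so the two pairs receive coefficients of opposite sign.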

Expression of KRAB-ZNF genes and TEs in Primate Brain Data.

(A) t-SNE plots of the expression of KRAB-ZNF genes and TEs from all 422 samples from human, chimpanzee, bonobo, and macaque, labeled by species and brain region. (B) Differentially expressed KRAB-ZNF genes and TEs between human and chimpanzee in primary and secondary cortices. (C) Species tree with the inferred numbers of TEs and KRAB-ZNFs that evolved on each branch. Note: 247 relatively old TEs and 52 KRAB-ZNFs were difficult to place on a specific branch and are therefore not shown in this panel. (D) Expression of KRAB-ZNF genes and TEs in primary and secondary cortices across species. Both KRAB-ZNF genes and TEs were divided into two groups based on their inferred evolutionary age: old (> 44.2 mya) and young (≤ 44.2 mya). Young KRAB-ZNFs and young TEs have lower expression levels than old ones (Wilcoxon Rank Sum Test, p < 0.05). Expression in all brain regions is shown in Figure S4. (E) Percentage of differentially expressed KRAB-ZNF genes and TEs in humans compared to chimpanzees in primary and secondary cortices and cerebellar white matter. (F) Human-specific DE (human-specifically-changed) KRAB-ZNF genes and TEs in primary and secondary cortices compared to NHPs. Gray indicates no expression information. The colors for age inference in (C), (D), (E), and (F) are the same: blue for evolutionarily old and orange for evolutionarily young KRAB-ZNFs and TEs.

TE:KRAB-ZNF in human primary and secondary cortices.

(A) and (B) demonstrate the workflow for identifying significant TE:KRAB-ZNF, using the primary and secondary cortices as an example. (A) We used randomly selected gene sets and KRAB-ZNFs to calculate correlations with TEs. The violet dots indicate the correlation counts of TE:KRAB-ZNF for all correlations, positive correlations, and negative correlations. They are significantly higher than those for random gene sets (boxplots below, 1000 iterations, p < 0.001). (B) Overlaps between TE:KRAB-ZNF (y-axis) and the KRAB-ZNF protein ChIP-exo data (Imbeault et al. 2017) (x-axis). Note that we use absolute coefficient values for negative correlations. Correlations in the yellow area were selected. (C) Jaccard similarity, demonstrating that the correlations between TEs and KRAB-ZNFs overlapped significantly more with ChIP-exo data than randomly selected TEs and KRAB-ZNFs (p < 0.001). The points indicate the overlap between the actual correlations and ChIP-exo data. The boxplots indicate the overlap between randomly selected TE and KRAB-ZNF pairs and ChIP-exo data. (D) Subsets of the number of positive and negative TE:KRAB-ZNF in the primary and secondary cortices (c1_p and c1_n) and the limbic and association cortices (c2_p and c2_n). (E) TE:KRAB-ZNF network in the primary and secondary cortices with five modules. Nodes are colored in five colors representing the five modules, and nodes in white do not belong to any module. Young links are in orange and old links are in blue. (F) Distribution of the normalized degree counts of TE and KRAB-ZNF nodes in the TE:KRAB-ZNF network. (G) The subnetwork colored in pink in (E), showing that this module mainly consists of Alu subfamilies. (H) The log count of correlations classified by TEs and the categories of links, including positive-old (P-O), positive-young (P-Y), negative-old (N-O), and negative-young (N-Y). Red stars indicate that the class distribution of the TEs is significantly different (Chi-squared test, p < 0.001). The right-hand side barplot shows the exact count of Alu subfamilies from the first row in the heatmap.
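
The comparison in (C) rests on two ingredients: a Jaccard similarity between correlated pairs and ChIP-exo-supported pairs, and a null distribution built from randomly selected pairs. The following is a minimal Python sketch of that idea, not the authors' implementation. The sampling scheme (drawing the same number of pairs uniformly from all TE and KRAB-ZNF combinations), the add-one p-value estimate, and all names and toy data are assumptions made for illustration.

```python
import random


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of pairs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def empirical_p(observed_pairs, reference_pairs, tes, znfs,
                iterations=1000, seed=0):
    """Compare the observed Jaccard similarity with random pair sets.

    Each iteration draws as many random (TE, KRAB-ZNF) pairs as were
    observed and counts how often they overlap the reference at least
    as well as the observed pairs do.
    """
    rng = random.Random(seed)
    observed = jaccard(observed_pairs, reference_pairs)
    all_pairs = [(te, znf) for te in tes for znf in znfs]
    k = len(set(observed_pairs))
    hits = 0
    for _ in range(iterations):
        random_pairs = rng.sample(all_pairs, k)
        if jaccard(random_pairs, reference_pairs) >= observed:
            hits += 1
    # Add-one estimate so the p-value is never exactly zero.
    return observed, (hits + 1) / (iterations + 1)


# Hypothetical identifiers: 20 TEs x 20 KRAB-ZNFs = 400 possible pairs.
tes = [f"TE_{i}" for i in range(20)]
znfs = [f"ZNF_{i}" for i in range(20)]
chip_exo_pairs = [(f"TE_{i}", f"ZNF_{i}") for i in range(10)]
correlated_pairs = chip_exo_pairs[:8] + [("TE_10", "ZNF_11"), ("TE_12", "ZNF_13")]

similarity, p_value = empirical_p(correlated_pairs, chip_exo_pairs, tes, znfs)
```

With 1000 iterations, the smallest p-value this estimate can return is 1/1001, which is the resolution needed to report p < 0.001.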

Comparison of TE:KRAB-ZNF in human and NHPs.

(A) Workflow for selecting TE:KRAB-ZNF for comparison between species. First, 178 TEs and 836 KRAB-ZNFs were detected in all four species. Second, a leave-one-out test was performed on the human samples for a fair comparison, since humans had four replicates and NHPs had only three. Correlations were selected with adjusted p < 0.01 and absolute coefficient > 0.4. (B) Number of positive and negative correlations in human and NHPs. Red brackets indicate a change of correlation between two species. For example, there are 276 human positive correlations that are negatively correlated in bonobos (suffix n: negative, p: positive; hs: Homo sapiens, pt: Pan troglodytes, pp: Pan paniscus, mm: Macaca mulatta). (C) Network of the 276 TE:KRAB-ZNF that were positively correlated in humans but negatively correlated in bonobos. This network has two hubs, ZNF528 and ZNF112, connecting to multiple TE subfamilies. The node size of TEs reflects the relative abundance of connections to the hubs. Details of this network can be found in Figure S9. (D) ZNF528 protein sequence difference in a zinc finger domain (ZF), where humans have glutamine (Q) while bonobos have histidine (H) at the -1 finger position. The lower part of the illustration indicates that the zinc finger domain binds to the DNA sequence using the -1, 3, and 6 finger positions. (E) Number of different TE subfamily nodes that form evolutionarily old and young correlations in the network in (C), comparing humans to bonobos. (F) Distribution of counts of human-specific correlations categorized by TE subfamily, showing only young links. N-Y: negative and young correlations; P-Y: positive and young correlations.
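
The selection in (A) and the sign changes marked in (B) can be sketched as a threshold filter followed by a comparison of signs between two species. This Python sketch assumes that each correlation result is stored as a (coefficient, adjusted p-value) tuple. The data layout, the function names, and the toy values are hypothetical, and the leave-one-out step is not reproduced here.

```python
def significant(results, min_abs_coef=0.4, max_adj_p=0.01):
    """Keep pairs with |coefficient| > 0.4 and adjusted p < 0.01.

    `results` maps a (TE, KRAB-ZNF) pair to (coefficient, adjusted p).
    """
    return {
        pair: coef
        for pair, (coef, adj_p) in results.items()
        if abs(coef) > min_abs_coef and adj_p < max_adj_p
    }


def sign_flipped(species_a, species_b):
    """Pairs positively correlated in species_a and negatively in species_b."""
    return sorted(
        pair
        for pair, coef in species_a.items()
        if coef > 0 and species_b.get(pair, 0.0) < 0
    )


# Hypothetical correlation results for two species.
human = {
    ("TE_x", "ZNF_a"): (0.82, 0.001),
    ("TE_y", "ZNF_a"): (0.55, 0.004),
    ("TE_z", "ZNF_b"): (0.30, 0.002),   # fails the coefficient threshold
}
bonobo = {
    ("TE_x", "ZNF_a"): (-0.71, 0.003),
    ("TE_y", "ZNF_a"): (0.60, 0.002),
    ("TE_z", "ZNF_b"): (-0.90, 0.001),
}

flipped = sign_flipped(significant(human), significant(bonobo))
```

Filtering each species before comparing signs means a pair counts as changed only when it is significant in both species, which is one possible reading of the bracketed counts in (B).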

Datasets

Expression of TEs and KRAB-ZNF genes in Mayo Data.

(A) Variation in the expression of KRAB-ZNF genes and TEs using t-SNE analysis. cbe: cerebellum; tcx: temporal cortex. (B) Distributions of the expression of evolutionarily old and young KRAB-ZNF genes and TEs. (C) and (D) Differentially expressed KRAB-ZNF genes and TEs (absolute log2FoldChange > 0.5, p < 0.05) in temporal cortex (tcx) and cerebellum (cbe). The expression of KRAB-ZNF genes and TEs in cerebellum is shown on the same log expression scale.

TE:KRAB-ZNF analysis in Mayo Data.

(A) Overlaps of TE:KRAB-ZNF between control and AD conditions in temporal cortex and cerebellum (denoted as cbe_control, tcx_control, cbe_AD and tcx_AD). (B) 21 human-control-specific TE:KRAB-ZNFs were selected from the intersection of human-specific TE:KRAB-ZNFs from Primate Brain Data (not detected in the other NHPs) and control-specific TE:KRAB-ZNFs from Mayo Data in temporal cortex (not detected in AD samples). (C) Distribution of TE family counts among the 21 control-specific TE:KRAB-ZNF in the temporal cortex. (D) Comparison of the expression and correlation results of AluYc:ZNF182 and L1MA6:ZNF211 in the temporal cortex. (E) and (F) show the bipartite network of the 21 TE:KRAB-ZNF in the temporal cortex. (E) TE nodes are colored green and KRAB-ZNF nodes violet. Evolutionarily young links are in orange and evolutionarily old links are in blue. An orange border indicates that the TE or KRAB-ZNF evolved recently. (F) The network has two modules, colored pink and gray, based on bipartite modularity. Brown links indicate negative correlations and green links indicate positive correlations.
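
The selection described in (B) is a combination of set differences and one intersection. A minimal Python sketch under the assumption that each condition is represented by a set of pair labels follows. The labels below are placeholders and not results from the study.

```python
def specific_to(target, *others):
    """Pairs present in `target` and absent from every set in `others`."""
    result = set(target)
    for other in others:
        result -= set(other)
    return result


# Placeholder pair labels for the Primate Brain Data (per species).
human = {"TE_1:ZNF_a", "TE_2:ZNF_b", "TE_3:ZNF_c", "TE_4:ZNF_d"}
chimpanzee = {"TE_3:ZNF_c"}
bonobo = {"TE_4:ZNF_d"}
macaque = set()

# Placeholder pair labels for the Mayo Data temporal cortex.
tcx_control = {"TE_1:ZNF_a", "TE_2:ZNF_b", "TE_5:ZNF_e"}
tcx_ad = {"TE_2:ZNF_b"}

human_specific = specific_to(human, chimpanzee, bonobo, macaque)
control_specific = specific_to(tcx_control, tcx_ad)

# Pairs that are both human-specific and control-specific.
human_control_specific = human_specific & control_specific
```

In this toy input only TE_1:ZNF_a survives both filters, because TE_2:ZNF_b is also detected in the AD samples and TE_5:ZNF_e is not among the human-specific pairs.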