Comparison of TE expression analysis software

TEKRABber and the overview of the analysis workflow.

(A) Two independent RNA-seq datasets, Primate Brain Data and Mayo Data, were analyzed in this study. (1) Transcriptomic data were first preprocessed by removing adapters and low-quality reads and then mapped to their respective reference genomes using STAR to generate BAM files. (2) TEtranscripts was used to quantify the expression of genes and TEs. (3) Expression profiles were normalized across species. (4) DE analysis was performed and pairwise correlations were calculated. Steps (3) and (4) were implemented together in an R Bioconductor package, TEKRABber. (B) The user interface of TEKRABber features a dashboard layout that allows users to explore one-to-one gene-TE interactions, including correlation and differential expression results. (See the Materials and Methods section for details.)
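The pairwise correlation part of step (4) can be illustrated with a minimal Python sketch. It is not TEKRABber code (TEKRABber is an R package). The choice of Pearson correlation, the function names, and the toy expression values are assumptions made only for illustration.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(gene_expr, te_expr):
    """Correlate every gene profile with every TE profile across samples."""
    return {
        (gene, te): pearson(gene_values, te_values)
        for (gene, gene_values), (te, te_values)
        in product(gene_expr.items(), te_expr.items())
    }


# Hypothetical normalized expression across four samples.
genes = {"ZNF_a": [1.0, 2.0, 3.0, 4.0], "ZNF_b": [4.0, 3.0, 2.0, 1.0]}
tes = {"TE_x": [2.0, 4.0, 6.0, 8.0]}
corr = pairwise_correlations(genes, tes)
```

In this toy input, ZNF_a rises with TE_x and ZNF_b falls, so the two pairs sit at opposite ends of the coefficient range.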

Expression of KRAB-ZNF genes and TEs in Primate Brain Data.

(A) t-SNE plots of the expression of KRAB-ZNF genes and TEs from all 422 samples, including humans, chimpanzees, bonobos, and macaques, labeled by species and brain region. (B) Differentially expressed KRAB-ZNF genes and TEs comparing humans and chimpanzees in primary and secondary cortices. (C) Species tree with the inferred numbers of TEs and KRAB-ZNFs that have evolved per branch. (Note: 247 relatively old TEs and 52 KRAB-ZNFs were difficult to place on a specific branch and are therefore not shown in this panel.) (D) Expression of KRAB-ZNF genes and TEs in primary and secondary cortices across species. Both KRAB-ZNF genes and TEs were divided into two groups based on their inferred evolutionary age: old (> 44.2 mya) and young (≤ 44.2 mya). Young KRAB-ZNFs and young TEs have lower expression levels (Wilcoxon rank sum test, p < 0.05). Expression in all brain regions can be found in Figure S4. (E) Percentage of differentially expressed KRAB-ZNF genes and TEs in humans compared to chimpanzees in primary and secondary cortices and in limbic and association cortices. (F) Human-specific DE (human-specifically-changed) KRAB-ZNF genes and TEs in primary and secondary cortices compared to NHPs. Gray indicates no expression information. The colors for age inferences in (C), (D), (E), and (F) are the same: blue for evolutionary old and orange for evolutionary young KRAB-ZNFs and TEs.
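The age split used in (D) can be sketched as follows. Only the 44.2 mya boundary comes from the caption. The feature names, ages, and expression values are hypothetical, and the sketch summarizes each group by its median rather than running the Wilcoxon rank sum test.

```python
from statistics import median

AGE_THRESHOLD_MYA = 44.2  # boundary stated in the caption


def age_group(age_mya):
    """Old if strictly older than the threshold, otherwise young."""
    return "old" if age_mya > AGE_THRESHOLD_MYA else "young"


def median_expression_by_age(features):
    """features maps a name to (inferred age in mya, expression value)."""
    groups = {"old": [], "young": []}
    for age_mya, expression in features.values():
        groups[age_group(age_mya)].append(expression)
    return {name: median(values) for name, values in groups.items() if values}


# Hypothetical features: (inferred age in mya, log expression).
toy = {
    "feat_1": (105.0, 8.1),
    "feat_2": (44.2, 3.0),  # exactly on the boundary, so it counts as young
    "feat_3": (20.0, 2.0),
    "feat_4": (60.0, 6.5),
}
medians = median_expression_by_age(toy)
```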

TE:KRAB-ZNF in human primary and secondary cortices.

(A) and (B) demonstrate the workflow for identifying significant TE:KRAB-ZNF, using the primary and secondary cortices as an example. (A) We used randomly selected gene sets and KRAB-ZNFs to calculate correlations with TEs. The violet dots indicate the correlation counts of TE:KRAB-ZNF for all correlations, positive correlations, and negative correlations. These counts are significantly higher than those for random gene sets (boxplots below, 1000 iterations, p < 0.001). (B) Overlaps between TE:KRAB-ZNF and the KRAB-ZNF protein ChIP-exo data (Imbeault et al., 2017). The overlap counts are shown on the y-axis and the correlation coefficient on the x-axis. Note that we use absolute coefficient values for negative correlations. Samples under the yellow area are selected. (C) Jaccard similarity shows that the correlation results overlapped significantly more with ChIP-exo data than randomly selected TE and KRAB-ZNF pairs did (p < 0.001). The points indicate the overlap between the actual correlations and ChIP-exo data. The boxplots indicate the overlap between randomly selected TE and KRAB-ZNF pairs and ChIP-exo data. (D) Subsets of the number of positive and negative TE:KRAB-ZNF in the primary and secondary cortices (c1_p and c1_n) and the limbic and association cortices (c2_p and c2_n). (E) TE:KRAB-ZNF network in the primary and secondary cortices with five modules. Nodes are shown in five colors representing the five modules, and nodes in white do not belong to any module. Young links are in orange and old links are in blue. (F) Distribution of the normalized degree counts of TE and KRAB-ZNF nodes in the TE:KRAB-ZNF network. (G) Subnetwork colored in pink from (E), showing that this module mainly consists of Alu subfamilies. (H) The log count of correlations classified by TEs and link category, including positive-old (P-O), positive-young (P-Y), negative-old (N-O), and negative-young (N-Y). Red asterisks indicate that the class distribution of the TEs is significantly different (Chi-squared test, p < 0.001). The barplot on the right-hand side shows the exact count of Alu subfamilies from the first row of the heatmap.
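The comparison in (C) can be sketched as a Jaccard similarity combined with an empirical p-value from randomly drawn TE and KRAB-ZNF pairs. This is a minimal illustration under stated assumptions and not the authors' code: the identifiers and toy pairs are hypothetical, and the (hits + 1) / (iterations + 1) estimator is a common convention that the caption does not specify.

```python
import random


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of pairs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def empirical_p(observed, tes, znfs, reference, n_pairs, iterations=1000, seed=0):
    """Fraction of random pair sets that overlap the reference at least as well."""
    rng = random.Random(seed)
    all_pairs = [(te, znf) for te in tes for znf in znfs]
    hits = 0
    for _ in range(iterations):
        sample = rng.sample(all_pairs, n_pairs)
        if jaccard(sample, reference) >= observed:
            hits += 1
    # Add-one correction (assumed convention) so the estimate is never zero.
    return (hits + 1) / (iterations + 1)


# Hypothetical universe of 20 TEs x 10 KRAB-ZNFs (200 possible pairs).
tes = [f"TE{i}" for i in range(20)]
znfs = [f"Z{j}" for j in range(10)]
reference = {(f"TE{i}", "Z0") for i in range(10)}  # stand-in for binding data
correlated = {(f"TE{i}", "Z0") for i in range(8)} | {("TE0", "Z1"), ("TE1", "Z1")}

observed = jaccard(correlated, reference)
p_value = empirical_p(observed, tes, znfs, reference, n_pairs=len(correlated))
```

With 1000 iterations, the smallest p-value this estimator can return is 1/1001.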

Comparison of TE:KRAB-ZNF in human and NHPs.

(A) Workflow for selecting TE:KRAB-ZNF for comparison between species. First, there were 178 TEs and 836 KRAB-ZNFs detected in all four species. Second, a leave-one-out test was performed on the human samples for a fair comparison, since humans had four replicates and NHPs only had three (adjusted p < 0.01 and absolute coefficient > 0.4). (B) Number of positive and negative correlations in humans and NHPs. Red brackets indicate a change of correlation between two species. For example, there are 276 positive correlations in humans that are negative correlations in bonobos (suffix n: negative, p: positive; hs: Homo sapiens, pt: Pan troglodytes, pp: Pan paniscus, mm: Macaca mulatta). (C) Network of the 276 TE:KRAB-ZNF that were positively correlated in humans but negatively correlated in bonobos. This network contains two hubs, ZNF528 and ZNF112, connecting to multiple TE subfamilies. The node size of TEs reflects the relative abundance of connections to the hubs. Details of this network can be found in Figure S9. (D) ZNF528 protein sequence difference in the zinc finger domain (ZF) between humans and bonobos. At the −1 finger position, humans have glutamine (Q) while bonobos have histidine (H). The lower part of the illustration indicates that the zinc finger domain binds to the DNA sequence using the −1, 3, and 6 finger positions. (E) Number of different TE subfamily nodes that form evolutionary old and young correlations in the network from (C), comparing humans to bonobos. (F) Distribution of counts of human-specific correlations categorized by TE subfamily, showing only young links. N-Y: negative and young correlations; P-Y: positive and young correlations.
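The thresholding in (A) and the sign comparison in (B) can be sketched as follows. The cutoffs (adjusted p < 0.01, absolute coefficient > 0.4) are taken from the caption. The function names and the toy correlation values are hypothetical, and the leave-one-out step is not modeled.

```python
def significant(corrs, max_padj=0.01, min_abs_coef=0.4):
    """Keep pairs passing both cutoffs; map each to 'p' (positive) or 'n' (negative)."""
    return {
        pair: ("p" if coef > 0 else "n")
        for pair, (coef, padj) in corrs.items()
        if padj < max_padj and abs(coef) > min_abs_coef
    }


def sign_flips(species_a, species_b, sign_a="p", sign_b="n"):
    """Pairs with sign_a in species_a and sign_b in species_b."""
    return sorted(
        pair for pair, sign in species_a.items()
        if sign == sign_a and species_b.get(pair) == sign_b
    )


# Hypothetical (coefficient, adjusted p) values per TE:KRAB-ZNF pair.
human = {
    ("TE_1", "ZNF_A"): (0.8, 0.001),
    ("TE_2", "ZNF_A"): (0.5, 0.2),    # fails the adjusted p cutoff
    ("TE_3", "ZNF_B"): (0.6, 0.005),
    ("TE_4", "ZNF_B"): (0.3, 0.001),  # fails the coefficient cutoff
}
bonobo = {
    ("TE_1", "ZNF_A"): (-0.7, 0.002),
    ("TE_3", "ZNF_B"): (0.55, 0.004),
    ("TE_4", "ZNF_B"): (-0.9, 0.0001),
}

hs = significant(human)
pp = significant(bonobo)
flipped = sign_flips(hs, pp)  # positive in human, negative in bonobo
```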

Datasets

Expression of TEs and KRAB-ZNF genes in Mayo Data.

(A) Variations in the expression of KRAB-ZNF genes and TEs using t-SNE analysis. cbe: cerebellum; tcx: temporal cortex. (B) Distributions of the expression of evolutionary old and young KRAB-ZNF genes and TEs. (C) and (D) Differentially expressed KRAB-ZNF genes and TEs (absolute log2FoldChange > 0.5, p < 0.05) in temporal cortex (tcx) and cerebellum (cbe). The expression of KRAB-ZNF genes and TEs in cerebellum shares the same log expression scale.

TE:KRAB-ZNF analysis in Mayo Data.

(A) Overlaps of TE:KRAB-ZNF between control and AD conditions in temporal cortex and cerebellum (denoted as cbe_control, tcx_control, cbe_AD, and tcx_AD). (B) 21 human-control-specific TE:KRAB-ZNFs were selected from the intersection of human-specific TE:KRAB-ZNFs from Primate Brain Data (not detected in the other NHPs) and control-specific TE:KRAB-ZNFs from Mayo Data in temporal cortex (not detected in AD samples). (C) Distribution of TE family counts among the 21 control-specific TE:KRAB-ZNF in the temporal cortex. (D) Comparison of the expression and correlation results of AluYc:ZNF182 and L1MA6:ZNF211 in the temporal cortex. (E) and (F) show the bipartite network of the 21 TE:KRAB-ZNF in the temporal cortex. (E) TE nodes are colored in green and KRAB-ZNF nodes in violet. Evolutionary young links are in orange and evolutionary old links are in blue. An orange border indicates that the TE or KRAB-ZNF evolved recently. (F) There are two modules in the network, colored in pink and gray based on their bipartite modularity. Brown links indicate negative correlations and green links indicate positive correlations.
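The selection described in (B) is a set operation: pairs found in humans but in no NHP, intersected with pairs found in control but not in AD samples. A minimal sketch, in which the helper name and all pair identifiers are hypothetical:

```python
def specific_to(target, *others):
    """Pairs present in target and absent from every other set."""
    return set(target) - set().union(*others)


# Hypothetical significant TE:KRAB-ZNF pairs per group.
p1, p2, p3, p4, p5 = (
    ("TE_1", "ZNF_A"), ("TE_2", "ZNF_A"), ("TE_3", "ZNF_B"),
    ("TE_4", "ZNF_B"), ("TE_5", "ZNF_C"),
)
human = {p1, p2, p3, p4}
chimpanzee, bonobo, macaque = {p3}, {p4}, set()
tcx_control, tcx_ad = {p1, p2, p5}, {p2}

human_specific = specific_to(human, chimpanzee, bonobo, macaque)
control_specific = specific_to(tcx_control, tcx_ad)
human_control_specific = human_specific & control_specific
```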