Study Overview. (A) The relationships between chimpanzees, the three archaic humans (Altai Neanderthals, Denisovans, and Vindija Neanderthals), and the three modern human populations. Dashed lines indicate the phylogenetic distances from modern humans to archaic humans and chimpanzees. The genes and DBSs shown illustrate the following cases: the DBS in B2M lacks a counterpart in chimpanzees; the DBS in ABL2 differs greatly between archaic and modern humans; the DBS in IFNAR1 is highly polymorphic in modern humans. Red letters in DBSs indicate tissue-specific eQTLs or population-specific favored mutations. (B) The mean length and binding affinity of DBSs. (C) The target genes and transcripts of 66 HS lncRNAs. (D) The targeting relationships between HS lncRNAs. (E) The sequence distances of DBSs (the top 40%) from modern humans to chimpanzees and archaic humans. (F) The impacts of interactions between HS lncRNAs and DBSs on gene expression in GTEx tissues.

Genes with the strongest DBSs and DBSs with the most changed sequence distances from modern humans to archaic humans and chimpanzees

1256 target genes whose DBSs have the largest distances from modern humans to chimpanzees and to Altai Neanderthals are enriched in different Biological Process GO terms. At a significance threshold of 0.05 (Benjamini-Hochberg FDR), the two gene sets in chimpanzees and Altai Neanderthals are enriched in 199 and 409 GO terms (50 < term size < 1000), respectively. Pink indicates shared GO terms. Left: Top GO terms enriched in genes in chimpanzees. Right: Top GO terms enriched in genes in Altai Neanderthals.
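
The Benjamini-Hochberg step named in this caption can be sketched as follows. This is a minimal illustration, not the enrichment pipeline used in the study: the function names, the dictionary inputs, and the choice to apply the term-size filter before the correction are all assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_terms(term_pvalues, term_sizes, alpha=0.05, min_size=50, max_size=1000):
    """Hypothetical helper: keep terms with min_size < size < max_size,
    adjust their p-values, and return those with adjusted p < alpha.
    Filtering before correction is an assumption, not taken from the source."""
    kept = [t for t in term_pvalues if min_size < term_sizes[t] < max_size]
    adjusted = benjamini_hochberg([term_pvalues[t] for t in kept])
    return [t for t, q in zip(kept, adjusted) if q < alpha]
```

Running the same function on the chimpanzee and Altai Neanderthal gene sets and intersecting the two returned lists would give the shared terms of the kind marked in pink.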

Genes with the most polymorphic DBSs and DBSs with the most changed sequence distances from modern humans to archaic humans and chimpanzees

The impact of HS lncRNA-DBS interaction on gene expression in GTEx tissues and organs. (A) The percentage distribution of correlated HS lncRNA-target transcript pairs across GTEx tissues and organs. The percentages of correlated pairs are higher in the brain than in other tissues and organs. (B) The distribution of significantly changed DBSs in HS lncRNA-target transcript pairs across GTEx tissues and organs between archaic and modern humans. Orange, red, and dark red indicate significant DBS changes from Denisovans (D), Altai Neanderthals and Denisovans (A.D.), and all three archaic humans (ADV), respectively. DBSs in HS lncRNA-target transcript pairs correlated in seven brain regions (in dark red) have changed significantly and consistently since the Altai Neanderthals, Denisovans, and Vindija Neanderthals (one-sided two-sample Kolmogorov-Smirnov test; significant changes determined by FDR < 0.001).

DBSs of the HS lncRNA RP11-423H2.3 in two genomic regions. In the upper and lower panels that display two genomic regions, from top to bottom, are tracks of DBSs (orange peaks), gene annotation, histone modification signals in cell lines, DNA methylation signals in cell lines, H3K4me3 RawSignal, and MRE CpG signals. DBSs overlap very well with DNA methylation and histone modification signals in multiple cell lines.

Predicted DBSs and experimentally identified (by CHART-seq) DNA binding sites of NEAT1 and MALAT1 in two cell lines (West et al., 2014). DBSs were predicted using the DNA sequences of CHART-seq peaks. 99% and 87% of the experimentally identified DNA binding sites of NEAT1 and MALAT1, respectively, overlap with predicted DBSs. (A) Predicted DBSs and experimentally identified DNA binding sites of NEAT1 in three genomic regions. (B) Predicted DBSs and experimentally identified DNA binding sites of MALAT1 in three genomic regions.

Examples of co-localization of DBSs, TEs, and cCREs in the promoter regions of genes. (A) The DBSs of AC106795.2 in the promoter region of ADARB1. (B) The DBSs of AC106795.2 in the promoter region of CDC42EP1. (C) The DBS of AL008727.1 in the promoter region of CD81. (D) The DBS of AC007876.1 in the promoter region of DIDO1 and GID8.

The expression change of target genes was significantly larger than that of non-target genes after DBD knockout. The fold change of gene expression was computed using the edgeR package. The |fold change| distribution of target genes was compared with the |fold change| distribution of non-target genes (one-sided Mann-Whitney test). (A) The knockout of a 157 bp sequence (chr17:80252565-80252721, containing the DBD of RP13-516M14.1) in the HeLa cell line. (B) The knockout of a 202 bp sequence (chr1:113392603-113392804, containing the DBD of RP11-426L16.8) in the RKO cell line. (C) The knockout of a 198 bp sequence (chr17:19460524-19460721, containing the DBD of SNORA59B) in the SK-MES-1 cell line. (D-E) The knockout of the DBD of a wrongly transcribed long noncoding RNA (chr1:156641670-156661464) in the A549 cell line and the HCT116 cell line. (F-G) The knockout of the DBD of a wrongly transcribed long noncoding RNA (chr10:52443915-52455313) in the A549 cell line and the HCT116 cell line. These wrongly transcribed long noncoding RNAs are labeled as “MSTRG” transcripts by the StringTie package.
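
The comparison of |fold change| distributions described in this caption can be sketched with a one-sided Mann-Whitney test. The version below is a standard-library illustration using the normal approximation with tie and continuity corrections; it is not the authors' code, the function names are hypothetical, and it assumes the inputs are not all identical (otherwise the variance is zero).

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_greater(x, y):
    """One-sided test that x tends to be larger than y.
    Returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction for the variance of U.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    # Continuity-corrected z score and upper-tail normal probability.
    z = (u1 - n1 * n2 / 2 - 0.5) / math.sqrt(var_u)
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return u1, p
```

In this setting `x` would hold the absolute fold changes of target genes and `y` those of non-target genes, so a small p-value supports a larger expression change in targets.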

Significant up- and down-regulation (|log2 (fold change)| > 1, FDR < 0.1) of target genes after DBD knockout. (A) RP13-516M14.1. (B) RP11-426L16.8. (C) SNORA59B.
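
The thresholds in this caption amount to a simple filter on differential expression results. The sketch below assumes a hypothetical mapping from gene name to a (log2 fold change, FDR) pair; the gene names in the usage lines are placeholders, not genes from the study.

```python
def classify_targets(results, lfc_cutoff=1.0, fdr_cutoff=0.1):
    """Split genes into up- and down-regulated lists using
    |log2(fold change)| > lfc_cutoff and FDR < fdr_cutoff."""
    up, down = [], []
    for gene, (log2_fc, fdr) in results.items():
        if fdr < fdr_cutoff and abs(log2_fc) > lfc_cutoff:
            (up if log2_fc > 0 else down).append(gene)
    return up, down


# Placeholder input for illustration only.
example = {
    "GENE_A": (2.3, 0.01),   # passes both cutoffs, up-regulated
    "GENE_B": (-1.7, 0.05),  # passes both cutoffs, down-regulated
    "GENE_C": (0.4, 0.001),  # fold change too small
    "GENE_D": (3.0, 0.5),    # FDR too high
}
up_genes, down_genes = classify_targets(example)
```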

Potential targeting regulation between HS lncRNAs. Brown and green regions in the circle indicate promoter regions and gene body regions. Arrows are from the gene body to promoter regions. The width of the arrows indicates the binding affinity of DBSs, and the sizes of blue dots indicate the number of DBSs of the lncRNA in the genome.

Some DBSs (indicated by blue bars) are in human-specific genome sequences. (A-D) The DBSs of RP11-848P1.4 in the genes ADCY2, CTD-3179P9.2, IPO11, and PRKAA1. (E) The DBS of RP11-598D14.1 in the gene NCAPG2.

Many genes and transcripts contain DBSs for multiple HS lncRNAs. (A) Left to right: the DBSs of RP11-65G9.1, LA16c-306A4.2, RP13-516M14.1, SNORA59B, RP11-423H2.3, and TTTY8/8B in A1BG. (B) Left to right: the DBSs of TTTY8/8B, RP4-669L17.10, and RP11-423H2.3 in TLR1. (C) Left to right: the DBSs of LA16c-306A4.2, RP11-423H2.3, RP11-423H2.3, RP1-118J21.25, RP11-706O15.5, and SNORA59B in TMEM210B.

In the GNAS region, RP11-423H2.3 has a DBS (indicated by the blue bar) wherein a selection signal was detected in CEU and CHB (Tajima’s D=-0.99/-1.13/1.86 in CEU/CHB/YRI, integrated Fst=0.22), and has another DBS (indicated by the orange bar) wherein a selection signal was detected in YRI (Tajima’s D=0.25/1.09/-1.17 in CEU/CHB/YRI, integrated Fst=0.33).

HS lncRNAs on the Y chromosome often have longer DBSs than HS lncRNAs on the autosomes. The top panel shows that the DBS of TTTY2/2B in HLA-C (indicated by the blue bar) is longer than the two DBSs of RP11-423H2.3 (indicated by the green bars). The bottom panel shows that the DBS of TTTY8/8B in IFNAR1 (indicated by the blue bar) is longer than the DBS of LINC00279 (indicated by the green bar).

The LD of the key SNP in DBSs of HS lncRNAs in genes on some chromosomes. (A) The LD of the key SNP in the DBSs of LA16c-306A4.2 in some genes on chromosome 16. (B) The LD of the key SNP in the DBSs of RP11-423H2.3 in some genes on chromosome 1. (C) The LD of the key SNP in the DBSs of SNORA59B in some genes on chromosome 1. (D) The LD of the key SNP in the DBSs of TTTY8B in some genes on chromosome 16.

The top 2000 genes with strong binding affinities (left) and the bottom 2000 genes with weak binding affinities (right) are enriched in many common GO terms (marked in pink).

The bottom 2000 genes with weak binding affinities (right) are also enriched in many specific GO terms (marked in yellow) with relatively low significance.

The numbers of DBSs with large distances from modern humans to archaic humans and chimpanzees, and from the human ancestor to modern humans, archaic humans, and chimpanzees. Left: DBSs in the top 20% (4248) genes in chimpanzees have distances > 0.034, and the numbers of DBSs having distances > 0.034 in genes in Altai Neanderthals, Denisovans, and Vindija Neanderthals are 1256, 2514, and 134, respectively. Right: DBSs in the top 20% (5033) genes have distances > 0.015 from the human ancestor to modern humans, and the numbers of DBSs having distances > 0.015 from the human ancestor to Altai Neanderthals, Denisovans, Vindija Neanderthals, and chimpanzees are 6908, 9707, 5189, and 5521, respectively.

The most changed DBSs also have large sequence distances between humans and gorillas. (A) Scatter plot showing the sequence distances between humans and chimpanzees and between humans and gorillas. (B) The scatter plot shows the average sequence distances between humans and chimpanzees, the three archaic humans, and between humans and gorillas. The rho and p values were estimated using the Spearman correlation test.
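
The Spearman correlation used for these scatter plots is the Pearson correlation of the ranked values. The sketch below computes rho only, with the standard library; the function names are hypothetical, and a p-value would normally come from a statistics package such as `scipy.stats.spearmanr`.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Here `x` and `y` would be the per-DBS sequence distances to the two species being compared.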

Positive selection signals detected by the XP-CLR program in (A) RP11-848P1.4, (B) RP11-598D14.1, (C) CTD-2291D10.1 in CEU and CHB.

Favored mutations detected by iSAFE. The left and right vertical axes indicate iSAFE scores and recombination rate; the purple diamond marks the top-scored mutation; colors mark LD (r2) between the top-scored mutation and other ones; the yellow line indicates that the probability that mutations above are neutral is smaller than p = 1e-6; the blue curve indicates the position-specific recombination rates.

(A) SNPs in the RP11-598D14.1 gene. The top-scored SNP has DAF 0.125/0.960/0.922 in YRI/CEU/CHB. (B) SNPs in the AC006129.1 gene. The top-scored SNP has DAF 0.134/0.717/0.587 in YRI/CEU/CHB.

HS lncRNA genes with significantly changed Tajima’s D in CEU, CHB, and YRI. Negative and positive Tajima’s D scores, which are significantly smaller or larger than the genome-wide background in a population, indicate the signature of positive selection or balancing selection, respectively, in the population.
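
For readers unfamiliar with the statistic, Tajima's D can be sketched from per-site allele counts as below. This follows the standard formula (Tajima, 1989) and only computes D for one region; the comparison against the genome-wide background described in the caption is not reproduced, and the function name and input layout are assumptions.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D for one region.
    allele_counts: count of one allele at each site among n sampled chromosomes.
    Returns None when there are no segregating sites or n < 4, where the
    variance term is zero."""
    counts = [c for c in allele_counts if 0 < c < n]
    s = len(counts)  # number of segregating sites
    if s == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean pairwise differences (pi) versus Watterson's estimator (S / a1).
    pi = sum(2 * c * (n - c) / (n * (n - 1)) for c in counts)
    theta_w = s / a1
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare alleles makes D negative and an excess of intermediate-frequency alleles makes it positive, which matches the interpretation given in the caption.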

The LD of SNPs in HS lncRNA genes in CEU, CHB, and YRI. The red color indicates high LD values. These panels show that LD between SNPs in CEU and CHB in these lncRNA genes is stronger than LD between SNPs in YRI. (A) AC024592.9. (B) AC129929.5. (C) RP11-157B13.7. (D) RP11-277P12.10. (E) CTD-2291D10.1. (F) CTD-2142D14.1.

The distributions of sequence distances between DBSs in modern and archaic humans and between Ensembl-annotated promoters in modern and archaic humans, with the right-hand panels illustrating the distributions of distances > 0.005. (A) The distances between modern humans and Altai Neanderthals. (B) The distances between modern humans and Denisovans. (C) The distances between modern humans and Vindija Neanderthals. It is most prominent in panel B that a fraction of DBSs has larger distances than promoters, agreeing with the finding of less gene flow from Denisovans to modern humans (Meyer et al., 2012). However, it is least prominent in panel C, agreeing that the evolutionary distance between Vindija Neanderthals and modern humans is short (Figure 1A).

The distribution of SNP frequencies (MAF > 0.05) in DBSs.

The 14 SNPs have high DAF in YRI and are eQTLs exclusively in the GTEx tissue Thyroid.

DBSs have significantly higher eQTL density than promoters. DBSs and promoters harboring at least one eQTL were used to compute eQTL density and make the comparison. A one-sided Mann-Whitney test was used to compute the p value.