(A) Density plots of transcript length. (B) Box-plots of transcript expression level in log2(FPKM) units. lncRNA_ribo: lncRNAs associated with ribosomes; lncRNA_noribo: lncRNAs for which association …
The percentage of transcripts associated with ribosomes is shown for several transcript expression intervals. codRNA: annotated coding transcripts encoding experimentally verified proteins (except …
Cumulative distribution of TE values in human codRNAs, lncRNAs, and 3′UTR sequences. We randomly selected 3′UTRs with a minimum length of 30 nucleotides to build a set of 3′UTR sequences with the …
Box-plots of transcript translational efficiency (TE) in log2(TE) units. The area within the box-plot comprises 50% of the data, and the line represents the median value. lncRNA: lncRNAs for which …
Single isoforms correspond to data for genes with a single transcript. The number of such genes was 2961 codRNA and 246 lncRNA_ribo for mouse, 2853 codRNA and 150 lncRNA_ribo for human, 9352 codRNA …
(A) Density plot of the relative length of the primary ORF in lncRNA_ribo and codRNA with respect to transcript length. For comparison, data for the longest ORF in lncRNA_noribo are also shown (except …
In codRNAs and lncRNA_ribo, we selected the primary ORF (the ORF with the largest number of ribosome profiling reads), whereas in lncRNA_noribo we selected the longest ORF.
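The ORF selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Orf` record, its field names, and the `select_orf` function are hypothetical, and ties are broken by input order, which the text does not specify.

```python
from typing import NamedTuple, Optional, Sequence


class Orf(NamedTuple):
    """Hypothetical ORF record; coordinates are relative to the transcript."""
    start: int     # 0-based start position
    end: int       # exclusive end position
    rp_reads: int  # ribosome profiling reads mapped to the ORF


def select_orf(orfs: Sequence[Orf], ribosome_associated: bool) -> Optional[Orf]:
    """Pick one ORF per transcript.

    Ribosome-associated transcripts (codRNA, lncRNA_ribo): the primary ORF,
    i.e. the ORF with the largest number of ribosome profiling reads.
    Other transcripts (lncRNA_noribo): the longest ORF.
    """
    if not orfs:
        return None
    if ribosome_associated:
        return max(orfs, key=lambda o: o.rp_reads)
    return max(orfs, key=lambda o: o.end - o.start)


# Illustrative values only, not data from the study.
example = [Orf(0, 300, 5), Orf(50, 200, 40)]
primary = select_orf(example, ribosome_associated=True)
longest = select_orf(example, ribosome_associated=False)
```

Here `primary` is the shorter ORF carrying most of the reads, while `longest` is the 300-nucleotide ORF, which shows how the two criteria can disagree on the same transcript.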
(A) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The analysis considered only genes with one isoform, with UTR and ORF regions expressed at >0.2 FPKM and with 5′UTR and …
(A) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The analysis considered only annotated transcripts, with UTR and ORF regions expressed at >0.2 FPKM and with 5′UTR and …
We restricted this analysis to transcripts with ORF and UTR regions expressed at >0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. (A) Expressed at low levels: transcripts expressed at …
Intron: randomly selected intronic regions; lncRNA_noribo: lncRNAs not associated with ribosomes; lncRNA_ribo: lncRNAs associated with ribosomes; pseudogene: pseudogenes associated with ribosomes; …
Comparison between lncRNAs associated and not associated with ribosomes using the longest ORF in both cases (lncRNA_ribo and lncRNA_noribo, respectively). Differences between lncRNA_ribo and …
Comparison between different transcript classes using only annotated lncRNAs. The yeast transcriptome contains very few annotated lncRNAs, so this analysis could not be performed for yeast.
Comparison between different transcript classes using only lncRNA with no homologues (noH) in other species. Only species in which several lncRNA_ribo and lncRNA_noribo had homology matches were …
Here we only employed lncRNAs in which the primary ORF was shorter than 100 amino acids. codRNA refers to joined codRNAe and codRNAne sets, since experimentally verified proteins are usually longer …
Equal dicodon was based on the observed hexamer frequencies in coding sequences vs hexamer equiprobability, intron dicodon was based on the differences between hexamer frequencies in coding vs …
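A hexamer (dicodon) score of the "equal" kind can be sketched as a log ratio between the in-frame hexamer frequency observed in coding sequences and the equiprobable frequency 1/4096. The sketch below rests on assumptions: the function names, the averaging over in-frame hexamers, and the pseudo-frequency for unseen hexamers are illustrative choices, and the scoring actually used in the study may differ in these details.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def hexamer_freqs(seqs: Iterable[str]) -> Dict[str, float]:
    """Relative frequencies of in-frame hexamers (step of one codon)."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(0, len(seq) - 5, 3):
            counts[seq[i:i + 6]] += 1
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


def dicodon_score(orf: str,
                  coding_freqs: Dict[str, float],
                  background_freqs: Optional[Dict[str, float]] = None,
                  pseudo: float = 1e-6) -> float:
    """Mean log ratio of coding vs background hexamer frequency.

    With background_freqs=None the background is hexamer equiprobability
    (1/4096). Passing frequencies estimated from another sequence set,
    for example introns, gives the corresponding alternative score.
    The pseudo-frequency for unseen hexamers is an assumption.
    """
    equal = 1.0 / 4 ** 6
    scores = []
    for i in range(0, len(orf) - 5, 3):
        hexamer = orf[i:i + 6]
        fg = coding_freqs.get(hexamer, pseudo)
        bg = equal if background_freqs is None else background_freqs.get(hexamer, pseudo)
        scores.append(math.log(fg / bg))
    return sum(scores) / len(scores) if scores else 0.0


# Toy training set, for illustration only.
freqs = hexamer_freqs(["ATGGCCATGGCC"])
score = dicodon_score("ATGGCC", freqs)
```

A positive score means the ORF uses hexamers that are more frequent in the coding training set than under the chosen background.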
(A) Mouse CUFF.34338.1 (chr5:113183493–113188347) is a novel lncRNA. It contains an ORF that encodes a 169 amino acid protein, is associated with ribosomes, and has protein-coding homologues in human, …
PN/PS: ratio between the number of non-synonymous and synonymous single nucleotide polymorphisms (SNPs) in the complete set of primary ORFs for a given class of transcripts (in lncRNA_noribo the …
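As a rough illustration of the PN/PS definition, the sketch below classifies each SNP in an ORF as synonymous or non-synonymous using the standard genetic code and then pools the counts over a set of ORFs. The function names and the input layout (ORF sequence plus a list of `(position, alternative base)` pairs) are hypothetical, and SNP calling, filtering, and any normalisation by the number of sites are outside its scope.

```python
from itertools import product
from typing import Iterable, List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(orf: str, pos: int, alt: str) -> bool:
    """True if replacing orf[pos] (0-based) by alt keeps the amino acid."""
    start = pos - pos % 3
    ref_codon = orf[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


def pn_ps(snps_by_orf: Iterable[Tuple[str, List[Tuple[int, str]]]]) -> float:
    """Ratio of non-synonymous to synonymous SNP counts pooled over all ORFs."""
    pn = ps = 0
    for orf, snps in snps_by_orf:
        for pos, alt in snps:
            if is_synonymous(orf, pos, alt):
                ps += 1
            else:
                pn += 1
    return pn / ps if ps else float("nan")


# Toy ORF ATG GCT TAA with one synonymous and one non-synonymous change.
ratio = pn_ps([("ATGGCTTAA", [(5, "C"), (3, "A")])])
```

Pooling counts across the complete set of ORFs, rather than averaging per-ORF ratios, matches the wording of the definition and avoids division by zero for ORFs without synonymous SNPs.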
Data sets used in the study
Species | Data type | GEO Accession | Mapped reads (millions) | Max read length (bp) | Description | Reference |
---|---|---|---|---|---|---|
Mouse (M. musculus) | RNA-seq | GSE30839 | 226.0 | 43 | ES cells, E14 | Ingolia et al., 2011 |
Mouse (M. musculus) | Ribosome profiling | GSE30839 | 39.2 | 47 | | |
Human (H. sapiens) | RNA-seq | GSE22004 | 29.8 | 36 | HeLa cells | Guo et al., 2010 |
Human (H. sapiens) | Ribosome profiling | GSE22004 | 78.3 | 36 | | |
Zebrafish (D. rerio) | RNA-seq | GSE32900 | 1382.2 | 2 × 75 | Series of developmental stages | Chew et al., 2013 |
Zebrafish (D. rerio) | Ribosome profiling | GSE46512 | 1040.0 | 44 | | |
Fruit fly (D. melanogaster) | RNA-seq | GSE49197 | 1317.9 | 50 | 0–2 hr embryos, wild type | Dunn et al., 2013 |
Fruit fly (D. melanogaster) | Ribosome profiling | GSE49197 | 105.7 | 50 | | |
Arabidopsis (A. thaliana) | RNA-seq | GSE50597 | 79.8 | 51 | No stress conditions, TRAP purification | Juntawong et al., 2014 |
Arabidopsis (A. thaliana) | Ribosome profiling | GSE50597 | 140.3 | 51 | | |
Yeast (S. cerevisiae) | RNA-seq | GSE52119 | 20.54 | 50 | GSY83, diploid | McManus et al., 2014 |
Yeast (S. cerevisiae) | Ribosome profiling | GSE52119 | 6.83 | 50 | | |
Fraction of transcripts associated with ribosomes
Species | codRNA: expressed | codRNA: associated with ribosomes (RP), total | codRNA: RP, stringent | lncRNA: expressed | lncRNA: associated with ribosomes (RP), total | lncRNA: RP, stringent |
---|---|---|---|---|---|---|
Mouse | 14,245 | 14,196 (99.7%) | 13,918 (97.7%) | 476 | 390 (81.9%) | 367 (77.1%) |
Human | 17,011 | 16,630 (97.8%) | 16,617 (97.7%) | 934 | 403 (43.1%) | 343 (36.7%) |
Zebrafish | 12,595 | 11,643 (92.4%) | 11,637 (92.4%) | 2392 | 726 (30.4%) | 684 (28.6%) |
Fruit fly | 8041 | 8031 (99.9%) | 7623 (94.8%) | 28 | 22 (78.6%) | 10 (35.7%) |
Arabidopsis | 19,162 | 18,879 (98.5%) | 10,329 (53.9%) | 139 | 93 (66.9%) | 68 (48.9%) |
Yeast | 4740 | 4547 (95.9%) | 4335 (91.5%) | 21 | 6 (28.6%) | 6 (28.6%) |
Stringent: number of transcripts significant at p < 0.05 using 3′UTRs as a null model (see ‘Materials and methods’ for more details).
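One common way to use a set of sequences as a null model is an empirical p-value: the observed value for a transcript is compared with the distribution of the same statistic in 3′UTRs. The sketch below assumes this approach. The statistic being compared, the function name `empirical_p`, and the add-one correction are all assumptions for illustration, and the exact procedure is the one given in 'Materials and methods'.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p(value: float, null_sorted: Sequence[float]) -> float:
    """One-sided empirical p-value against a sorted null sample.

    Returns the fraction of null values greater than or equal to the
    observed value, with an add-one correction so p is never exactly 0.
    """
    n = len(null_sorted)
    n_ge = n - bisect_left(null_sorted, value)
    return (n_ge + 1) / (n + 1)


# Toy null distribution standing in for values measured in 3'UTRs.
null_values = sorted(float(x) for x in range(100))
p_high = empirical_p(99.0, null_values)  # value at the top of the null
p_mid = empirical_p(50.0, null_values)   # value in the middle of the null
is_stringent = p_high < 0.05
```

Under this sketch a transcript would count as 'stringent' when its p-value falls below 0.05, that is, when few 3′UTRs reach a value as high as the one observed.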
Fraction of translated proteins of different size detected in proteomics databases
Species | 24–80 amino acids | 81–130 amino acids | 131–180 amino acids | >180 amino acids |
---|---|---|---|---|
Mouse | 27/58 (46.6%) | 222/286 (77.6%) | 256/330 (77.6%) | 3716/4786 (77.7%) |
Human | 116/272 (42.6%) | 536/748 (71.7%) | 669/875 (76.5%) | 6757/8964 (75.4%) |
Yeast | 27/30 (90.0%) | 168/207 (81.1%) | 234/265 (88.3%) | 2934/3224 (91.0%) |
Only transcripts encoding experimentally validated proteins (codRNAe) were considered.
Long non-coding RNAs as a source of new peptides. (A) Details on the number of coding transcripts associated with ribosomes. (B) ORF density and length in different types of transcripts. (C) Details on the number of non-coding transcripts associated with ribosomes. (D) Homology hits for ORFs. (E) GC content (%) in ORFs and complete sequences. (F) PN and PS values for different sequence subsets.
(A) Human ncRNA literature. (B) lncRNA homologies. (C) lncRNA top coding score. (D) Young codRNAe.