Figures and data

Annotation of ncORFs in humans and mice.
(A) Overview of the integrative pipeline for ncORF annotation in human and mouse genomes. (B-C) RPF coverage plots for a representative uORF (B) and a representative lncORF (C), visualized using the Ribo-Seq signal track from GWIPS-viz [72]. ncORFs are shown in yellow and CDSs in cyan. Dashed lines indicate identical genomic regions shared across transcript isoforms. (D-E) Proportion of GENCODE ncORFs rediscovered in this study. GENCODE ncORFs were stratified either by reproducibility across independent studies (D) or by tiered strength of translation evidence (E). (F) Known ncORFs that were experimentally characterized in earlier studies and independently rediscovered in this study in humans and mice. CDS, coding sequence; FLOSS, fragment length organization similarity score; MANE, matched annotation from NCBI and EBI; ncORF, non-canonical open reading frame; ORF, open reading frame; RPF, ribosome-protected fragment.

Sequence features of ncORFs and putative ncEPs in humans and mice.
(A) Number of ncORFs belonging to different categories. (B) Distribution of ncORF lengths. (C) Proportion of ncEPs that contain known Pfam domains. (D) Number of ncORFs with Pfam domains among lncORFs and other ncORF categories. (E) Distribution of the proportion of intrinsically disordered residues in CDS- and ncORF-encoded proteins. (F) Most frequently identified Pfam domains among human and mouse ncORFs. (G) Proportion of ncORFs overlapping with TEs, stratified by presence or absence of Pfam domains. Differences were assessed using Wilcoxon rank-sum tests. CDS, coding sequence; ncORF, non-canonical open reading frame; ncEP, ncORF-encoded protein; TE, transposable element.

Evolutionary constraints of ncORFs.
(A) Mean PhyloP scores in mammals for the 30 base pairs upstream and downstream of ORF start or stop codons. Codons are delineated with bars of alternating colors by frame, and nucleotides in untranslated regions are shown in grey. The significance of three-nucleotide periodicity was assessed by autocorrelation with a lag of three. (B) Cladograms of vertebrates with the number of ncORFs originating at each ancestral branch of humans. Triangles indicate species merged into a larger clade for visual simplicity. (C) Relationship between ORF origination rates and node ages measured as node-to-tip distances. Origination rate was defined as the number of ncORFs that originated on a branch divided by the branch length. Blue lines show linear regression fits, and grey bands represent 95% prediction confidence intervals. Spearman’s correlation is indicated. (D) Distribution of ncORF PhyloCSF scores normalized by the number of codons per ncORF. The dashed line denotes zero. CDS, coding sequence; ncORF, non-canonical open reading frame; ORF, open reading frame.
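
The three quantities named in this legend (lag-three autocorrelation, origination rate, and per-codon PhyloCSF score) can be illustrated with a minimal Python sketch. The function names and toy values below are hypothetical and are not taken from the study's code; the sketch computes only the autocorrelation statistic, because the legend does not specify how its significance was evaluated.

```python
from statistics import mean


def lag_autocorrelation(values, lag=3):
    """Sample autocorrelation of a score profile at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        raise ValueError("constant profile: autocorrelation is undefined")
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
    return num / denom


def origination_rate(n_originated, branch_length):
    """Number of ncORFs originating on a branch divided by the branch length."""
    if branch_length <= 0:
        raise ValueError("branch length must be positive")
    return n_originated / branch_length


def score_per_codon(phylocsf_score, n_codons):
    """PhyloCSF score normalized by the number of codons in the ncORF."""
    if n_codons <= 0:
        raise ValueError("number of codons must be positive")
    return phylocsf_score / n_codons


# Toy profile (made-up numbers): one value per nucleotide, repeating every codon.
toy_profile = [0.2, 0.3, 1.5] * 10

lag3 = lag_autocorrelation(toy_profile, lag=3)  # strongly positive for a 3-nt period
lag1 = lag_autocorrelation(toy_profile, lag=1)  # negative for this toy profile
rate = origination_rate(n_originated=120, branch_length=0.5)
per_codon = score_per_codon(phylocsf_score=-30.0, n_codons=60)
```

In this toy profile the lag-three autocorrelation is much larger than the lag-one value, which is the signature of codon-level periodicity that panel (A) tests for.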

Lineage-specifically conserved ncORFs.
(A) Schematic illustration of the BLS calculation. (B) Cladograms of vertebrates showing the number of lineage-specifically conserved ncORFs (local BLS > 0.9) at each ancestral node for humans and mice. (C) Distribution of PhyloCSF scores per codon for lineage-specifically conserved ncORFs compared with all other ncORFs. BLS, branch length score; ncORF, non-canonical open reading frame; ORF, open reading frame.

Evolutionary dynamics of ncORF expression.
(A) Distribution of mean translation levels of human ncORFs grouped by their origin nodes. Statistical significance was assessed using Wilcoxon rank-sum tests. ***, P < 0.001; **, P < 0.01; *, P < 0.05. (B) Similar to (A), but ncORFs are further stratified by local BLS. (C) Relationship between ncORF tissue specificity at the translation and transcription levels. Differences were assessed using Wilcoxon signed-rank tests. (D-E) Similar to (A) and (B), but showing the distribution of tissue specificity at the translation level. CDS, coding sequence; ncORF, non-canonical open reading frame; ORF, open reading frame.

ncORF-CDS co-translation network.
(A) Bipartite network of ncORF-CDS co-translation in humans. (B) Bipartite network of ncORF-CDS co-translation in mice. ncORFs and CDSs are represented by different shapes and colored according to their cluster membership. Only the largest clusters are highlighted (top two in humans and top five in mice). (C) Top five enriched gene ontology terms for each of the two largest clusters in the human network shown in (A). (D) Top five enriched gene ontology terms for each of the three largest clusters in the mouse network shown in (B). (E) Proportion of ncORFs co-translated with CDSs among ancient (pre-mammalian origin) versus younger (mammalian-specific) ncORFs. Differences were tested with Fisher’s exact tests. CDS, coding sequence; ncORF, non-canonical open reading frame; ORF, open reading frame.