Protein evidence of unannotated ORFs in Drosophila reveals diversity in the evolution and properties of young proteins

  1. Eric B Zheng
  2. Li Zhao (corresponding author)

  1. Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, United States
8 figures, 1 table and 2 additional files

Figures

Figure 1
Identification of unannotated translated open reading frames (utORFs) and their properties.

(A) We detected utORFs by searching via a two-round approach through a comprehensive database of potential open reading frames (ORFs) that was generated from a six-frame translation of the …
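The six-frame translation mentioned in panel A can be illustrated with a minimal sketch. It is not the authors' pipeline: the stop-to-stop ORF definition, the function names, and the `min_len` cutoff are assumptions chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=10):
    """List (strand, frame, peptide) for stop-to-stop segments in all six frames.

    Assumption: an ORF here is any stretch between stop codons that is at
    least min_len residues long; min_len is a hypothetical cutoff.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, frame, segment))
    return orfs
```

Calling `six_frame_orfs(genome_sequence)` on a chromosome string would give a list of candidate peptides that could then be written to a search database. A real database would also need genomic coordinates and a decision about start codons, both of which this sketch leaves out.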

Figure 1—source data 1

All unannotated translated open reading frames (utORFs) sequences.

https://cdn.elifesciences.org/articles/78772/elife-78772-fig1-data1-v2.csv
Figure 1—source data 2

Unannotated translated open reading frame (utORF) supporting peptides.

https://cdn.elifesciences.org/articles/78772/elife-78772-fig1-data2-v2.csv
Figure 1—source data 3

Unannotated translated open reading frame (utORF) locations.

https://cdn.elifesciences.org/articles/78772/elife-78772-fig1-data3-v2.txt
Figure 2
Synteny-based orthology detection and protein sequence similarity quantification.

(A) When performing a simple homology search for a locus of interest (red arrow) across a given genome (blue line), the search space is orders of magnitude larger, requiring heuristic shortcuts to …

Figure 3 with 1 supplement
Inferred gene ages of the unannotated translated open reading frames (utORFs).

(A) The reference phylogenetic tree used for these analyses (UCSC 27-way insect alignment). Abbreviations are as follows: D. mel: Drosophila melanogaster, D. sim: D. simulans, D. sec: D. sechellia, D…

Figure 3—figure supplement 1
Robustness of gene age inferences with respect to significance threshold.

(A) Figure 3A, but also showing the melanogaster species subgroup, species group, and Drosophila taxa. (B) Change in furthest significant ortholog (using a significance threshold of 2.3 instead of …

Figure 4 with 3 supplements
Latent class analysis of the unannotated translated open reading frames (utORFs) reveals differences between classes.

(A) Class 1 is notably distinct for strong bias toward intergenic and antisense locations at the expense of sense locations. Class 2 is notable for being relatively unbiased and for being the only …

Figure 4—source data 1

Unannotated translated open reading frame (utORF) inferred latent class analysis (LCA) classes.

https://cdn.elifesciences.org/articles/78772/elife-78772-fig4-data1-v2.csv
Figure 4—figure supplement 1
Posterior probabilities per unannotated translated open reading frame (utORF) of class membership inferred from latent class analysis for all utORFs.
Figure 4—figure supplement 2
Latent class analysis of unannotated translated open reading frames (utORFs) with canonical start sites reveals differences between classes.

(A–E) Same as Figure 4 but examining utORFs with canonical start sites.

Figure 4—figure supplement 3
Posterior probabilities per unannotated translated open reading frame (utORF) of class membership inferred from latent class analysis for utORFs with canonical start sites.
Figure 5 with 2 supplements
Differences between inferred classes recapitulate expected trends in age and conservation and reveal surprising trends in lengths and expression.

(A) As expected, phastCons conservation scores vary by class. Scores near 0 indicate low conservation, while scores near 1 indicate high conservation. Note that fast-evolving and melanogaster-specifi…

Figure 5—figure supplement 1
Transcription of unannotated translated open reading frames (utORFs) in selected tissues.

Top panel: utORFs, separated by inferred latent class analysis (LCA) class. Mean TPMs across the given tissue in FlyAtlas2 are log10-transformed with a pseudocount of 1E-3. Horizontal line marks an …
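The transformation described in this caption can be sketched as follows. The pseudocount of 1E-3 comes from the caption; the function name and the choice to add the pseudocount to the mean before taking the logarithm are assumptions.

```python
import math

PSEUDOCOUNT = 1e-3  # value stated in the caption


def log10_mean_tpm(tpms, pseudocount=PSEUDOCOUNT):
    """Average TPM values for one tissue, then log10-transform.

    Assumption: the pseudocount is added to the mean, not to each replicate.
    """
    mean_tpm = sum(tpms) / len(tpms)
    return math.log10(mean_tpm + pseudocount)
```

With this convention, a utORF with no detected transcription in a tissue maps to log10(1E-3) = -3 instead of being undefined, so it can still be shown on the log-scaled axis.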

Figure 5—figure supplement 2
Differences between inferred classes for unannotated translated open reading frames (utORFs) with canonical start sites recapitulate expected trends.

(A–G) Same as Figure 5 but examining utORFs with canonical start sites.

Figure 6
Many unannotated translated open reading frames (utORFs) have evidence consistent with a de novo origin.

(A) Proportion of utORFs by inferred class with genomic conservation consistent with de novo origin. Box widths correlate with size of class (Table 1). (B) Number of supporting outgroups by inferred …

Figure 7 with 2 supplements
Independent validation of unannotated translated open reading frame (utORF) identification.

(A) Cumulative distribution of differences between observed and predicted retention times for peptide-spectrum matches (PSMs) of peptides supporting annotated FlyBase proteins (orange) and PSMs of …
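A cumulative distribution of the kind plotted in panel A can be computed with a short sketch. The function names are hypothetical, and the use of signed differences (observed minus predicted) is an assumption, since the caption does not say whether absolute values were taken.

```python
def rt_differences(observed, predicted):
    """Signed differences between observed and predicted retention times."""
    return [o - p for o, p in zip(observed, predicted)]


def ecdf(values):
    """Empirical CDF as (value, fraction of values <= value) pairs, sorted."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

Computing `ecdf(rt_differences(obs, pred))` separately for PSMs supporting annotated proteins and for PSMs supporting utORFs gives two curves that can be overlaid. Similar curves would suggest that the utORF-supporting peptides behave chromatographically like the annotated ones.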

Figure 7—figure supplement 1
Cumulative distribution of differences between observed and predicted retention times for every peptide-spectrum match of peptides supporting all annotated FlyBase proteins (blue) and all unannotated translated open reading frames (utORFs) (orange).
Figure 7—figure supplement 2
Proportion of unannotated translated open reading frames (utORFs) by inferred class with supporting evidence from ribosome profiling.

Box widths correlate with size of class (Table 1).

Author response image 1

Tables

Table 1
Latent class analysis of all unannotated translated open reading frames (utORFs).
Class  Interpretation                 Estimated percent  Number
1      Putatively nonfunctional loci  4.35%              41
2      melanogaster-specific ORFs     5.71%              54
3      Fast-evolving ORFs             12.03%             96
4      General unannotated ORFs       57.61%             591
5      Alternative-frame ORFs         20.30%             161

Additional files
