Protein evidence of unannotated ORFs in Drosophila reveals diversity in the evolution and properties of young proteins

  1. Eric B Zheng
  2. Li Zhao (corresponding author)
  1. Rockefeller University, United States

Abstract

De novo gene origination, in which previously non-genic genomic sequence becomes genic through evolution, has been increasingly recognized as an important source of evolutionary novelty across diverse taxa. Many de novo genes have been proposed to be protein-coding, and several have been experimentally shown to yield protein products. However, the systematic study of de novo proteins has been hampered by doubts about whether their transcripts are translated, in the absence of experimentally observed protein products. Using a systematic, ORF-focused, mass-spectrometry-first computational approach, we identify almost 1000 unannotated open reading frames with evidence of translation (utORFs) in the model organism Drosophila melanogaster, 371 of which have canonical start codons. To quantify the comparative genomic similarity of these utORFs across Drosophila and to infer their phylostratigraphic age, we further develop a synteny-based protein similarity approach. Combining these results with reference datasets on tissue- and life-stage-specific transcription and conservation, we find that these utORFs differ in their properties. Contrary to expectations, the fastest-evolving utORFs are not the evolutionarily youngest. We observe more utORFs in the brain than in the testis. Most of the identified utORFs may be of de novo origin, even accounting for the possibility of false-negative similarity detection. Finally, sequence divergence after an inferred de novo origin event remains substantial, raising the possibility that de novo proteins turn over frequently. Our results suggest substantial unappreciated diversity in de novo protein evolution: many more de novo proteins may exist than previously recognized, they may follow divergent evolutionary trajectories, and they may be gained and lost frequently. Overall, there may be no single characteristic model of de novo protein evolution.

Data availability

Raw MS data are deposited in PRIDE under accession number PXD032197. Relevant scripts and intermediate files can be found in our GitHub repository https://github.com/LiZhaoLab/utORF_mass_spec.

Article and author information

Author details

  1. Eric B Zheng

    Laboratory of Evolutionary Genetics and Genomics, Rockefeller University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  2. Li Zhao

    Laboratory of Evolutionary Genetics and Genomics, Rockefeller University, New York, United States
    For correspondence
    lzhao@rockefeller.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-6776-1996

Funding

National Institute of General Medical Sciences (R35GM133780)

  • Li Zhao

National Institute of General Medical Sciences (T32GM007739)

  • Eric B Zheng

Robertson Foundation

  • Li Zhao

Rita Allen Foundation (Rita Allen Foundation Scholar)

  • Li Zhao

Vallee Foundation (Vallee Scholar)

  • Li Zhao

Monique Weill-Caulier Trust

  • Li Zhao

Alfred P. Sloan Foundation (Alfred P. Sloan Research Fellowship)

  • Li Zhao

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2022, Zheng & Zhao

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Zheng EB, Zhao L (2022) Protein evidence of unannotated ORFs in Drosophila reveals diversity in the evolution and properties of young proteins. eLife 11:e78772. https://doi.org/10.7554/eLife.78772