Protein evidence of unannotated ORFs in Drosophila reveals diversity in the evolution and properties of young proteins
Abstract
De novo gene origination, where a previously non-genic genomic sequence becomes genic through evolution, has been increasingly recognized as an important source of evolutionary novelty across diverse taxa. Many de novo genes have been proposed to be protein-coding, and in several cases have been experimentally shown to yield protein products. However, the systematic study of de novo proteins has been hampered by doubts regarding the translation of their transcripts without the experimental observation of protein products. Using a systematic, ORF-focused mass-spectrometry-first computational approach, we identify almost 1000 unannotated open reading frames with evidence of translation (utORFs) in the model organism Drosophila melanogaster, 371 of which have canonical start codons. To quantify the comparative genomic similarity of these utORFs across Drosophila and to infer phylostratigraphic age, we further develop a synteny-based protein similarity approach. Combining these results with reference datasets on tissue- and life-stage-specific transcription and conservation, we identify different properties amongst these utORFs. Contrary to expectations, the fastest-evolving utORFs are not the youngest evolutionarily. We observed more utORFs in the brain than in the testis. Most of the identified utORFs may be of de novo origin, even accounting for the possibility of false-negative similarity detection. Finally, sequence divergence after an inferred de novo origin event remains substantial, raising the possibility that de novo proteins turn over frequently. Our results suggest that there is substantial unappreciated diversity in de novo protein evolution: many more may exist than have been previously appreciated; there may be divergent evolutionary trajectories; and de novo proteins may be gained and lost frequently. 
All in all, there may not exist a single characteristic model of de novo protein evolution, but instead, there may be diverse evolutionary trajectories for de novo proteins.
Data availability
Raw MS data are deposited in PRIDE under accession number PXD032197. Relevant scripts and intermediate files can be found in our GitHub repository: https://github.com/LiZhaoLab/utORF_mass_spec.
Article and author information
Funding
National Institute of General Medical Sciences (R35GM133780)
- Li Zhao
National Institute of General Medical Sciences (T32GM007739)
- Eric B Zheng
Robertson Foundation
- Li Zhao
Rita Allen Foundation (Rita Allen Foundation Scholar)
- Li Zhao
Vallee Foundation (Vallee Scholar)
- Li Zhao
Monique Weill-Caulier Trust
- Li Zhao
Alfred P. Sloan Foundation (Alfred P. Sloan Research Fellowship)
- Li Zhao
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Mia T Levine, University of Pennsylvania, United States
Publication history
- Received: March 18, 2022
- Preprint posted: April 5, 2022
- Accepted: September 26, 2022
- Accepted Manuscript published: September 30, 2022 (version 1)
- Version of Record published: October 13, 2022 (version 2)
Copyright
© 2022, Zheng & Zhao
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- Page views: 1,085
- Downloads: 280
- Citations: 0
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Further reading
-
- Developmental Biology
- Evolutionary Biology
We have focused on the mushroom bodies (MB) of Drosophila to determine how the larval circuits are formed and then transformed into those of the adult at metamorphosis. The adult MB has a core of thousands of Kenyon neurons; axons of the early-born γ class form a medial lobe and those from later-born α'β' and αβ classes form both medial and vertical lobes. The larva, however, hatches with only γ neurons and forms a vertical lobe 'facsimile' using larval-specific axon branches from its γ neurons. Computations by the MB involve MB input (MBINs) and output (MBONs) neurons that divide the lobes into discrete compartments. The larva has 10 such compartments while the adult MB has 16. We determined the fates of 28 of the 32 types of MBONs and MBINs that define the 10 larval compartments. Seven larval compartments are eventually incorporated into the adult MB; four of their larval MBINs die, while 12 MBINs/MBONs continue into the adult MB although with some compartment shifting. The remaining three larval compartments are larval specific, and their MBIN/MBONs trans-differentiate at metamorphosis, leaving the MB and joining other adult brain circuits. With the loss of the larval vertical lobe facsimile, the adult vertical lobes are made de novo at metamorphosis, and their MBONs/MBINs are recruited from the pool of adult-specific cells. The combination of cell death, compartment shifting, trans-differentiation, and recruitment of new neurons results in no larval MBIN-MBON connections persisting through metamorphosis. At this simple level, then, we find no anatomical substrate for a memory trace persisting from larva to adult. For the neurons that trans-differentiate, our data suggest that their adult phenotypes are in line with their evolutionarily ancestral roles while their larval phenotypes are derived adaptations for the larval stage.
These cells arise primarily within lineages that also produce permanent MBINs and MBONs, suggesting that larval specifying factors may allow information related to birth-order or sibling identity to be interpreted in a modified manner in these neurons to cause them to adopt a modified, larval phenotype. The loss of such factors at metamorphosis, though, would then allow these cells to adopt their ancestral phenotype in the adult system.
-
- Evolutionary Biology
- Genetics and Genomics
Causal loss-of-function (LOF) variants for Mendelian and severe complex diseases are enriched in 'mutation intolerant' genes. We show how such observations can be interpreted in light of a model of mutation-selection balance, and use the model to relate the pathogenic consequences of LOF mutations at present-day to their evolutionary fitness effects. To this end, we first infer posterior distributions for the fitness costs of LOF mutations in 17,318 autosomal and 679 X-linked genes from exome sequences in 56,855 individuals. Estimated fitness costs for the loss of a gene copy are typically above 1%; they tend to be largest for X-linked genes, whether or not they have a Y homolog, followed by autosomal genes and genes in the pseudoautosomal region. We then compare inferred fitness effects for all possible de novo LOF mutations to those of de novo mutations identified in individuals diagnosed with one of six severe, complex diseases or developmental disorders. Probands carry an excess of mutations with estimated fitness effects above 10%; as we show by simulation, when sampled in the population, such highly deleterious mutations are typically only a couple of generations old. Moreover, the proportion of highly deleterious mutations carried by probands reflects the typical age of onset of the disease. The study design also has a discernible influence: a greater proportion of highly deleterious mutations is detected in pedigree than case-control studies, and for autism, in simplex than multiplex families and in female versus male probands. Thus, anchoring observations in human genetics to a population genetic model allows us to learn about the fitness effects of mutations identified by different mapping strategies and for different traits.