Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

  1. Anil Raj (corresponding author)
  2. Sidney H Wang
  3. Heejung Shim
  4. Arbel Harpak
  5. Yang I Li
  6. Brett Engelmann
  7. Matthew Stephens
  8. Yoav Gilad
  9. Jonathan K Pritchard

Affiliations

  1. Stanford University, United States
  2. University of Chicago, United States
  3. Purdue University, United States

Abstract

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7,273 novel coding sequences, including 2,442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed a negative correlation between the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite the known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.
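As its name suggests, riboHMM is built on a hidden Markov model. The sketch below is a toy illustration of the general idea of using an HMM to segment a transcript into untranslated and translated stretches from footprint data. It is not the riboHMM model, whose states and emission distributions are considerably richer. The two-state structure, the per-codon 0/1 "frame-0 enrichment" symbol (a crude proxy for the three-nucleotide periodicity of footprints in translated regions), the helper names `frame_signal` and `viterbi`, and all probabilities are hypothetical assumptions chosen only for illustration.

```python
import math
from typing import List, Sequence

# Toy two-state HMM, a hypothetical illustration only (NOT the riboHMM model).
# State 0 = untranslated, state 1 = translated.
STATES = (0, 1)
START = (0.9, 0.1)                    # P(first state)
TRANS = ((0.95, 0.05), (0.05, 0.95))  # TRANS[i][j] = P(next state j | state i)
# EMIT[s][o] = P(symbol o | state s); o = 1 means the codon shows frame-0 enrichment.
EMIT = ((0.7, 0.3), (0.1, 0.9))


def frame_signal(counts: Sequence[int]) -> List[int]:
    """Collapse per-nucleotide footprint counts into one 0/1 symbol per codon.

    A codon scores 1 when its first position has strictly more footprints than
    each of its other two positions (a crude proxy for 3-nt periodicity).
    Trailing nucleotides that do not fill a whole codon are ignored.
    """
    signal = []
    for i in range(0, len(counts) - 2, 3):
        a, b, c = counts[i], counts[i + 1], counts[i + 2]
        signal.append(1 if a > b and a > c else 0)
    return signal


def viterbi(obs: Sequence[int]) -> List[int]:
    """Return the most probable hidden-state path for obs, computed in log space."""
    if not obs:
        return []
    log = math.log
    score = [log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES]
    back = []  # back[t][s] = best state at step t, given state s at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            ptr.append(best)
            new.append(score[best] + log(TRANS[best][s]) + log(EMIT[s][o]))
        back.append(ptr)
        score = new
    # Trace the best final state back through the stored pointers.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, with `counts = [1, 1, 1] * 5 + [6, 1, 1] * 8 + [1, 1, 1] * 5`, calling `viterbi(frame_signal(counts))` labels the eight periodic codons in the middle as translated (1) and the flanking codons as untranslated (0). The sticky transition probabilities are what keep isolated noisy codons from flipping the label on their own.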

Article and author information

Author details

  1. Anil Raj

    Department of Genetics, Stanford University, Stanford, United States
    For correspondence: rajanil@stanford.edu
  2. Sidney H Wang

    Department of Human Genetics, University of Chicago, Chicago, United States
  3. Heejung Shim

    Department of Statistics, Purdue University, West Lafayette, United States
  4. Arbel Harpak

    Department of Biology, Stanford University, Stanford, United States
  5. Yang I Li

    Department of Genetics, Stanford University, Stanford, United States
  6. Brett Engelmann

    Department of Human Genetics, University of Chicago, Chicago, United States
  7. Matthew Stephens

    Department of Human Genetics, University of Chicago, Chicago, United States
  8. Yoav Gilad

    Department of Human Genetics, University of Chicago, Chicago, United States
  9. Jonathan K Pritchard

    Department of Genetics, Stanford University, Stanford, United States

Competing interests

The authors declare that no competing interests exist.

Reviewing Editor

  1. Nicholas T Ingolia, University of California, Berkeley, United States

Publication history

  1. Received: December 24, 2015
  2. Accepted: May 26, 2016
  3. Accepted Manuscript published: May 27, 2016 (version 1)
  4. Version of Record published: July 11, 2016 (version 2)
  5. Version of Record updated: July 12, 2016 (version 3)

Copyright

© 2016, Raj et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Anil Raj, Sidney H Wang, Heejung Shim, Arbel Harpak, Yang I Li, Brett Engelmann, Matthew Stephens, Yoav Gilad, Jonathan K Pritchard (2016) Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. eLife 5:e13328. https://doi.org/10.7554/eLife.13328
