A computational screen for alternative genetic codes in over 250,000 genomes

  1. Yekaterina Shulgina
  2. Sean R Eddy (corresponding author)

Harvard University, United States

Abstract

The genetic code has been proposed to be a 'frozen accident', but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Because most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment and about why some codons are affected more frequently than others. To better map the diversity of genetic codes, we developed Codetta, a computational method that predicts the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon reassignments reported in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved through a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content, an evolutionary pressure that likely helped drive these codons to low frequency and enable their reassignment.
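The abstract describes Codetta in a single sentence. As a rough, hypothetical illustration of the general idea of inferring a codon's meaning from alignments (not Codetta's actual implementation, which is described in the full paper), suppose each occurrence of a codon has been aligned to a column of a protein profile model with known amino acid emission probabilities. If a codon is assumed to decode a single amino acid throughout a genome, and aligned columns are treated as independent, the most likely meaning of that codon is the amino acid that maximizes the summed log emission probabilities. The helper name `infer_codon_meaning`, its input format, the pseudocount, and the numbers in the example are all assumptions made up for illustration.

```python
import math
from collections import defaultdict

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def infer_codon_meaning(observations, pseudocount=1e-6):
    """Toy maximum-likelihood codon decoding (hypothetical helper).

    observations: iterable of (codon, emissions) pairs, where emissions maps
    one-letter amino acid codes to the emission probability of the profile
    column that this codon occurrence was aligned to. Amino acids missing
    from the mapping are treated as probability 0 (plus the pseudocount).
    Returns a dict mapping each observed codon to its highest-scoring amino acid.
    """
    # Per-codon running sum of log emission probabilities for each amino acid.
    log_scores = defaultdict(lambda: {aa: 0.0 for aa in AMINO_ACIDS})
    for codon, emissions in observations:
        for aa in AMINO_ACIDS:
            # Assumes aligned columns are independent, so log-likelihoods add.
            log_scores[codon][aa] += math.log(emissions.get(aa, 0.0) + pseudocount)
    # With a uniform prior, the maximum-likelihood amino acid is also the MAP one.
    return {codon: max(scores, key=scores.get) for codon, scores in log_scores.items()}


# Made-up toy input: AGG occurrences align to methionine-favoring columns.
observations = [
    ("AGG", {"M": 0.7, "L": 0.2, "R": 0.05}),
    ("AGG", {"M": 0.5, "I": 0.3, "R": 0.1}),
    ("CGT", {"R": 0.8, "K": 0.1}),
]
print(infer_codon_meaning(observations))  # {'AGG': 'M', 'CGT': 'R'}
```

In this toy input AGG is called as methionine even though the standard code assigns it to arginine, mirroring the kind of signal the abstract describes. A real genome-scale screen would also have to deal with alignment quality, codons with too few aligned positions, and codons used as stop signals, none of which this sketch models.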

Data availability

Results of computational analyses performed in this study are included in the manuscript and supporting files. Source data files have been provided for Figures 2, 3, and 4.

The following previously published data sets were used

Article and author information

Author details

  1. Yekaterina Shulgina

    Systems Biology, Harvard University, Cambridge, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-7658-9294
  2. Sean R Eddy

    Molecular & Cellular Biology, Harvard University, Cambridge, United States
    For correspondence
    seaneddy@fas.harvard.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-6676-4706

Funding

National Human Genome Research Institute (F31-HG010984)

  • Yekaterina Shulgina

National Human Genome Research Institute (R01-HG009116)

  • Sean R Eddy

Howard Hughes Medical Institute

  • Sean R Eddy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2021, Shulgina & Eddy

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Yekaterina Shulgina, Sean R Eddy (2021) A computational screen for alternative genetic codes in over 250,000 genomes. eLife 10:e71402. https://doi.org/10.7554/eLife.71402
