A computational screen for alternative genetic codes in over 250,000 genomes

  1. Yekaterina Shulgina
  2. Sean R Eddy (corresponding author)
  1. Harvard University, United States

Abstract

The genetic code has been proposed to be a 'frozen accident', but the discovery of alternative genetic codes over the past four decades has shown that it can evolve to some degree. Because most examples were found anecdotally, it is difficult to draw general conclusions about the evolutionary trajectories of codon reassignment or about why some codons are reassigned more frequently than others. To map the diversity of genetic codes more systematically, we developed Codetta, a computational method that predicts the amino acid decoding of each codon from nucleotide sequence data. We surveyed the genetic code usage of over 250,000 bacterial and archaeal genome sequences in GenBank and discovered five new reassignments of arginine codons (AGG, CGA, and CGG), representing the first sense codon changes in bacteria. In a clade of uncultivated Bacilli, the reassignment of AGG to become the dominant methionine codon likely evolved through a change in the amino acid charging of an arginine tRNA. The reassignments of CGA and/or CGG were found in genomes with low GC content; the evolutionary pressure toward low GC likely helped drive these codons to low frequency and enable their reassignment.
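Why low GC content pushes CGA and CGG toward rarity can be made concrete with a simple null model. The sketch below is a simplified illustration, not part of Codetta or the analyses in this study: it assumes each nucleotide is drawn independently, with G and C each at frequency GC/2 and A and T each at (1 - GC)/2, and it ignores codon-position effects and selection. The function names `base_prob` and `expected_codon_freq` are hypothetical.

```python
# Simplified null model (an assumption, not the paper's method): each
# nucleotide is drawn independently, with P(G) = P(C) = gc/2 and
# P(A) = P(T) = (1 - gc)/2. Codon-position bias and selection are ignored.


def base_prob(base: str, gc: float) -> float:
    """Probability of a single nucleotide given the genomic GC fraction."""
    if base in "GC":
        return gc / 2
    if base in "AT":
        return (1 - gc) / 2
    raise ValueError(f"unexpected base: {base!r}")


def expected_codon_freq(codon: str, gc: float) -> float:
    """Expected frequency of a codon among all 64 codons under the null model."""
    p = 1.0
    for base in codon.upper():
        p *= base_prob(base, gc)
    return p


if __name__ == "__main__":
    # Compare the two arginine codons at a balanced and a low GC content.
    for gc in (0.50, 0.25):
        cgg = expected_codon_freq("CGG", gc)
        cga = expected_codon_freq("CGA", gc)
        print(f"GC={gc:.2f}  CGG={cgg:.4f}  CGA={cga:.4f}")
```

Under this model, lowering GC content from 50% to 25% reduces the expected frequency of CGG eightfold, from 1/64 to 1/512, and CGA from 1/64 to 3/512. Real codon usage also depends on amino acid composition and codon-level selection, so this only indicates the direction of the effect.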

Data availability

Results of computational analyses performed in this study are included in the manuscript and supporting files. Source data files have been provided for Figures 2, 3, and 4.

The following previously published data sets were used

Article and author information

Author details

  1. Yekaterina Shulgina

    Systems Biology, Harvard University, Cambridge, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-7658-9294
  2. Sean R Eddy

    Molecular & Cellular Biology, Harvard University, Cambridge, United States
    For correspondence
    seaneddy@fas.harvard.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-6676-4706

Funding

National Human Genome Research Institute (F31-HG010984)

  • Yekaterina Shulgina

National Human Genome Research Institute (R01-HG009116)

  • Sean R Eddy

Howard Hughes Medical Institute

  • Sean R Eddy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2021, Shulgina & Eddy

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Yekaterina Shulgina, Sean R Eddy (2021) A computational screen for alternative genetic codes in over 250,000 genomes. eLife 10:e71402. https://doi.org/10.7554/eLife.71402
