Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases

Abstract

Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.

Data availability

All the data generated during the study are summarized and provided in the manuscript and supporting files. Source files have been provided for Figures 1, 3, 6 and 7. Additionally, all the sequences curated during this study have been deposited to Dryad (doi:10.5061/dryad.v15dv41sh).

Article and author information

Author details

  1. Rahil Taujale

    Institute of Bioinformatics, Complex Carbohydrate Research Center, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  2. Aarya Venkat

    Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Liang-Chin Huang

    Institute of Bioinformatics, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  4. Zhongliang Zhou

    Department of Computer Science, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  5. Wayland Yeung

    Institute of Bioinformatics, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  6. Khaled M Rasheed

    Department of Computer Science, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  7. Sheng Li

    Department of Computer Science, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  8. Arthur S Edison

    Genetics and Biochemistry & Molecular Biology, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-5686-2350
  9. Kelley W Moremen

    Complex Carbohydrate Research Center, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States
    Competing interests
    The authors declare that no competing interests exist.
  10. Natarajan Kannan

    Institute of Bioinformatics, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States
    For correspondence
    nkannan@uga.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-2833-8375

Funding

National Institutes of Health (R01 GM130915)

  • Kelley W Moremen
  • Natarajan Kannan

National Institutes of Health (T32 GM107004)

  • Rahil Taujale

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Monica Palcic, University of Victoria

Publication history

  1. Received: December 17, 2019
  2. Accepted: March 31, 2020
  3. Accepted Manuscript published: April 1, 2020 (version 1)
  4. Version of Record published: April 27, 2020 (version 2)

Copyright

© 2020, Taujale et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Rahil Taujale, Aarya Venkat, Liang-Chin Huang, Zhongliang Zhou, Wayland Yeung, Khaled M Rasheed, Sheng Li, Arthur S Edison, Kelley W Moremen, Natarajan Kannan (2020)
Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases
eLife 9:e54532.
https://doi.org/10.7554/eLife.54532
