Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases

Abstract

Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular function. The evolutionary basis for their complex and diverse modes of catalytic function remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and the emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.

Data availability

All the data generated during the study are summarized and provided in the manuscript and supporting files. Source files have been provided for Figures 1, 3, 6 and 7. Additionally, all the sequences curated during this study have been deposited to Dryad (doi:10.5061/dryad.v15dv41sh).

The following data sets were generated

Taujale R, Venkat A, Huang LC, Yeung W, Rasheed K, Edison AS, Moremen KW, Kannan N (2019) Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases
Dryad Digital Repository, doi:10.5061/dryad.v15dv41sh.

https://dx.doi.org/10.5061/dryad.v15dv41sh

The following previously published data sets were used

(2007) NCBI reference sequences (RefSeq): a curated non redundant sequence database of genomes, transcripts and proteins
NCBI RefSeq, doi:10.1093/nar/gkl842.

http://www.ncbi.nlm.nih.gov/RefSeq/

The UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge
UniProt, doi:10.1093/nar/gky1049.

https://www.uniprot.org/

Article and author information

Author details

Rahil Taujale

Institute of Bioinformatics, Complex Carbohydrate Research Center, University of Georgia, Athens, United States

Competing interests
The authors declare that no competing interests exist.
Aarya Venkat

Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States

Liang-Chin Huang

Institute of Bioinformatics, University of Georgia, Athens, United States

Zhongliang Zhou

Department of Computer Science, University of Georgia, Athens, United States

Wayland Yeung

Institute of Bioinformatics, University of Georgia, Athens, United States

Khaled M Rasheed

Department of Computer Science, University of Georgia, Athens, United States

Sheng Li

Department of Computer Science, University of Georgia, Athens, United States

Arthur S Edison

Genetics and Biochemistry & Molecular Biology, University of Georgia, Athens, United States


ORCID iD: 0000-0002-5686-2350
Kelley W Moremen

Complex Carbohydrate Research Center, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States

Natarajan Kannan

Institute of Bioinformatics, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, United States

For correspondence
nkannan@uga.edu


ORCID iD: 0000-0002-2833-8375

Funding

National Institutes of Health (R01 GM130915)

Kelley W Moremen
Natarajan Kannan

National Institutes of Health (T32 GM107004)

Rahil Taujale

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.