Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular function. The evolutionary basis for their complex and diverse modes of catalytic function remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and the emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.
All the data generated during the study are summarized and provided in the manuscript and supporting files. Source files have been provided for Figures 1, 3, 6 and 7. Additionally, all the sequences curated during this study have been deposited to Dryad (doi:10.5061/dryad.v15dv41sh).
Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases. Dryad Digital Repository, doi:10.5061/dryad.v15dv41sh.
NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. NCBI RefSeq, doi:10.1093/nar/gkl842.
UniProt: a worldwide hub of protein knowledge. UniProt, doi:10.1093/nar/gky1049.
- Kelley W Moremen
- Natarajan Kannan
- Rahil Taujale
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- Monica Palcic, University of Victoria
© 2020, Taujale et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Induced differentiation is one of the most experience- and skill-dependent experimental processes in regenerative medicine, and establishing optimal conditions often takes years. We developed a robotic AI system with a batch Bayesian optimization algorithm that autonomously induces the differentiation of induced pluripotent stem cell-derived retinal pigment epithelial (iPSC-RPE) cells. From 200 million possible parameter combinations, the system performed cell culture in 143 different conditions in 111 days, resulting in 88% better iPSC-RPE production than that obtained by the pre-optimized culture in terms of the pigmentation scores. Our work demonstrates that the use of autonomous robotic AI systems drastically accelerates systematic and unbiased exploration of experimental search space, suggesting immense use in medicine and research.
Splicing is highly regulated and is modulated by numerous factors. Quantitative predictions for how a mutation will affect precursor mRNA (pre-mRNA) structure and downstream function are particularly challenging. Here, we use a novel chemical probing strategy to visualize endogenous precursor and mature MAPT mRNA structures in cells. We used these data to estimate Boltzmann suboptimal structural ensembles, which were then analyzed to predict consequences of mutations on pre-mRNA structure. Further analysis of recent cryo-EM structures of the spliceosome at different stages of the splicing cycle revealed that the footprint of the Bact complex with pre-mRNA best predicted alternative splicing outcomes for exon 10 inclusion of the alternatively spliced MAPT gene, achieving 74% accuracy. We further developed a β-regression weighting framework that incorporates splice site strength, RNA structure, and exonic/intronic splicing regulatory elements capable of predicting, with 90% accuracy, the effects of 47 known and 6 newly discovered mutations on inclusion of exon 10 of MAPT. This combined experimental and computational framework represents a path forward for accurate prediction of splicing-related disease-causing variants.