Structure-guided isoform identification for the human transcriptome
Abstract
Recently-developed methods to predict three-dimensional protein structure with high accuracy have opened new avenues for genome and proteome research. We explore a new hypothesis in genome annotation, namely whether computationally predicted structures can help to identify which of multiple possible gene isoforms represents a functional protein product. Guided by protein structure predictions, we evaluated over 230,000 isoforms of human protein-coding genes assembled from over 10,000 RNA sequencing experiments across many human tissues. From this set of assembled transcripts, we identified hundreds of isoforms with more confidently predicted structure and potentially superior function in comparison to canonical isoforms in the latest human gene database. We illustrate our new method with examples where structure provides a guide to function in combination with expression and evolutionary evidence. Additionally, we provide the complete set of structures as a resource to better understand the function of human genes and their isoforms. These results demonstrate the promise of protein structure prediction as a genome annotation tool, allowing us to refine even the most highly-curated catalog of human proteins. More generally we demonstrate a practical, structure-guided approach that can be used to enhance the annotation of any genome.
Data availability
Gene identifiers for all predicted protein isoforms as well as pLDDT scores and evolutionary conservation data from mouse can be found in table S1. Predicted scores and GTEx expression data for all isoforms overlapping a MANE locus can be found in table S2. Data for the 401 alternate isoforms with evidence of relatively superior structure, and possibly superior function, can be found in table S3. Additionally, all data can be downloaded from the project website, isoform.io.
Article and author information
Author details
Funding
National Institutes of Health (R01-HG006677)
- Steven L Salzberg
National Institutes of Health (R35-GM130151)
- Steven L Salzberg
National Research Foundation of Korea (2019R1-A6A1-A10073437)
- Martin Steinegger
National Research Foundation of Korea (2020M3-A9G7-103933)
- Martin Steinegger
National Research Foundation of Korea (2021-R1C1-C102065)
- Martin Steinegger
National Research Foundation of Korea (2021-M3A9-I4021220)
- Martin Steinegger
Seoul National University (Creative-Pioneering Researchers Program)
- Martin Steinegger
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2022, Sommer et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 5,086
- views
-
- 533
- downloads
-
- 26
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Genetics and Genomics
Genetic effects on complex traits may depend on context, such as age, sex, environmental exposures, or social settings. However, it remains often unclear if the extent of context dependency, or gene-by-environment interaction (GxE), merits more involved models than the additive model typically used to analyze data from genome-wide association studies (GWAS). Here, we suggest considering the utility of GxE models in GWAS as a trade-off between bias and variance parameters. In particular, we derive a decision rule for choosing between competing models for the estimation of allelic effects. The rule weighs the increased estimation noise when context is considered against the potential bias when context dependency is ignored. In the empirical example of GxSex in human physiology, the increased noise of context-specific estimation often outweighs the bias reduction, rendering GxE models less useful when variants are considered independently. However, for complex traits, we argue that the joint consideration of context dependency across many variants mitigates both noise and bias. As a result, polygenic GxE models can improve both estimation and trait prediction. Finally, we exemplify (using GxDiet effects on longevity in fruit flies) how analyses based on independently ascertained ‘top hits’ alone can be misleading, and that considering polygenic patterns of GxE can improve interpretation.
-
- Biochemistry and Chemical Biology
- Genetics and Genomics
Deep Mutational Scanning (DMS) is an emerging method to systematically test the functional consequences of thousands of sequence changes to a protein target in a single experiment. Because of its utility in interpreting both human variant effects and protein structure-function relationships, it holds substantial promise to improve drug discovery and clinical development. However, applications in this domain require improved experimental and analytical methods. To address this need, we report novel DMS methods to precisely and quantitatively interrogate disease-relevant mechanisms, protein-ligand interactions, and assess predicted response to drug treatment. Using these methods, we performed a DMS of the melanocortin-4 receptor (MC4R), a G-protein-coupled receptor (GPCR) implicated in obesity and an active target of drug development efforts. We assessed the effects of >6600 single amino acid substitutions on MC4R’s function across 18 distinct experimental conditions, resulting in >20 million unique measurements. From this, we identified variants that have unique effects on MC4R-mediated Gαs- and Gαq-signaling pathways, which could be used to design drugs that selectively bias MC4R’s activity. We also identified pathogenic variants that are likely amenable to a corrector therapy. Finally, we functionally characterized structural relationships that distinguish the binding of peptide versus small molecule ligands, which could guide compound optimization. Collectively, these results demonstrate that DMS is a powerful method to empower drug discovery and development.