Decoding molecular mechanisms for loss-of-function variants in the human proteome

Matteo Cagiada; Nicolas Jonsson; Kresten Lindorff-Larsen

doi:10.7554/eLife.108160.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Rosana Collepardo
University of Cambridge, Cambridge, United Kingdom
Senior Editor
Volker Dötsch
Goethe University Frankfurt, Frankfurt am Main, Germany

Reviewer #1 (Public review):

Summary:

In this work, the authors aim to improve upon their previous frameworks and models for decoupling variant effects on protein stability from direct effects on function. This is motivated by the utility of understanding the specific molecular mechanisms underlying loss-of-function disease, since potential treatment approaches differ depending on the causal mechanism. The authors demonstrably achieve this goal: FunC-ESMs is an elegant approach that uses the pre-trained ESM-1b and ESM-IF models, which frees them from model training and from running computationally intensive Rosetta predictions. While the performance improvements over their previous model are not unambiguous in some of the examples, FunC-ESMs allowed them to scale up their analysis to the proteome level, deriving variant classifications of stable-but-inactive and total-loss across 20,144 human proteins, and further allowing them to identify functionally and structurally critical sites. However, the manuscript could be strengthened by clarifying or rewording some terminology concerning the molecular effects, and by discussing what other underlying molecular mechanisms could also reside in the stable-but-inactive group, given the stated motivation of setting up a mechanistic starting point for therapeutic development and clinical applications.

Strengths:

Overall, the manuscript is very well framed and written, with clear motivations and objectives. The previous works are explained well and set up a clear methodological comparison with the new framework. FunC-ESMs is solidly designed to minimize data circularity, and the methodology to derive optimal thresholds is well reasoned. The authors make an effort to make all the data and code readily accessible.

Weaknesses:

(1) Considering how loss-of-function mechanisms dominate the known missense disease variant landscape, it is understandable that the scope of the work focuses on loss of function. However, variants exceeding the established ESM-1b threshold in the manuscript are often generalized as loss-of-function variants (e.g., lines 176 and 304; line 285, by contrast, uses much more neutral language), which can be misleading due to the guaranteed presence of deleterious variants that manifest through other mechanisms, such as gain-of-function.

Although they are predicted less well, gain-of-function variants would still likely demonstrate inflated ESM-1b scores and end up in the SBI class. Given the emphasis on the potential utility of the framework for tailoring therapeutic approaches, it seems pertinent to highlight gain-of-function and dominant-negative mechanisms in the manuscript, as they would require considerably different therapeutics than loss-of-function variants.

A short disclaimer explaining the other mechanisms and the potential limitations of the framework in picking them out would improve the clarity of the manuscript. As an additional step, it would be interesting to explore where clinically validated gain-of-function and dominant-negative variant examples fall within the framework's classification.

(2) Given the clinical angle, it would be useful to see the predicted label distribution in population datasets like gnomAD, for instance, focusing on dominant Mendelian disease genes to minimize the impact of non-penetrant or heterozygous disease variants. The performance demonstration using (likely) benign ClinVar variants is not as informative about the real-world use cases in which clinicians or researchers would apply the method.

https://doi.org/10.7554/eLife.108160.1.sa1

Reviewer #2 (Public review):

Summary:

The paper by Cagiada et al. builds on their previously published work, but now uses two independent and complementary machine learning models to predict the deleteriousness of every missense change in the human proteome. The authors were able to separate all missense variants into three classes: wild-type-like, total-loss (important for stability), or stable-but-inactive (important for function), showing that the predictions correlated well with intuition in terms of clustering and location in folded versus intrinsically disordered regions. Evaluation of known pathogenic and benign variants from ClinVar suggested that around half of all pathogenic missense variants cause disease by disrupting protein stability. These results could be valuable for researchers and genomic diagnostics laboratories performing variant interpretation.

Strengths:

The method uses data from two independent state-of-the-art ML models, which were developed and published by other groups. The predictions were provided for every missense variant in the entire human proteome, and have been validated against a small previously published experimental dataset, as well as using known pathogenic and benign variants from ClinVar. Results are clearly stated and well illustrated with useful figures.

Weaknesses:

Both the description and the analysis could benefit from some additional work around the thresholds used for both ML models (ESM-1b and ESM-IF). The thresholds were selected based on an ROC analysis using published MAVE data, which has various limitations, including the small number of proteins for which MAVE data are available. Moreover, the correlation between the predictions from the two ML models was not evaluated, and there was no discussion of the limitations of the models or where they might predict different things, which was avoided by using two independent thresholds. The threshold approach needs further explanation, and a sensitivity analysis of how the results would change using different thresholds or by defining thresholds in an alternative way would be informative. In addition, the ClinVar pathogenic variants are all treated equally, when in fact it is known that some act via a gain-of-function rather than a loss-of-function mechanism. It would be useful to know if these known patho-mechanisms correlate with predictions of variants that affect stability versus function.
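
To make the requested sensitivity analysis concrete, the following is a minimal sketch of a two-threshold classification and a sweep over alternative cutoffs. It is not the authors' implementation: the function names, the threshold values, the toy scores, and the sign convention (more negative means more damaging) are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical cutoffs and sign convention (assumptions, not values from the
# manuscript): more negative scores indicate a more damaging substitution.
FUNCTION_THRESHOLD = -6.5   # placeholder cutoff on an ESM-1b-like score
STABILITY_THRESHOLD = -7.0  # placeholder cutoff on an ESM-IF-like score

LABELS = ("WT-like", "total-loss", "stable-but-inactive")


def classify(function_score, stability_score,
             function_threshold=FUNCTION_THRESHOLD,
             stability_threshold=STABILITY_THRESHOLD):
    """Assign one of three labels from two independent cutoffs."""
    if function_score > function_threshold:
        return "WT-like"              # not predicted to be deleterious
    if stability_score <= stability_threshold:
        return "total-loss"           # deleterious and predicted destabilizing
    return "stable-but-inactive"      # deleterious but predicted stable


def label_fractions(variants, function_threshold, stability_threshold):
    """Fraction of variants in each class for one pair of cutoffs."""
    counts = Counter(
        classify(f, s, function_threshold, stability_threshold)
        for f, s in variants
    )
    return {label: counts[label] / len(variants) for label in LABELS}


def sensitivity_sweep(variants, function_thresholds, stability_thresholds):
    """Class fractions for every combination of candidate cutoffs."""
    return {
        (ft, st): label_fractions(variants, ft, st)
        for ft in function_thresholds
        for st in stability_thresholds
    }


# Toy (function_score, stability_score) pairs, invented for illustration.
toy_variants = [(-2.0, -1.0), (-9.0, -10.0), (-8.0, -3.0), (-7.0, -7.5)]

sweep = sensitivity_sweep(toy_variants, [-6.5, -7.5], [-7.0])
for cutoffs, fractions in sweep.items():
    print(cutoffs, fractions)
```

Reporting how the class fractions shift across such a grid, ideally on the real proteome-wide scores, would show how robust the headline proportions are to the choice of cutoffs.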

https://doi.org/10.7554/eLife.108160.1.sa0
