AI-enabled Alkaline-resistant Evolution of Protein to Apply in Mass Production

  1. School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, China
  2. School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  3. Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
  4. Shanghai National Centre for Applied Mathematics (SJTU Center), MOE-LSC, Shanghai Jiao Tong University, Shanghai, China
  5. Shanghai Artificial Intelligence Laboratory, Shanghai, China
  6. Changchun GeneScience Pharmaceuticals Co., Ltd., Changchun, China
  7. Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai, China

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Editors

  • Reviewing Editor
    Martin Graña
    Institut Pasteur de Montevideo, Montevideo, Uruguay
  • Senior Editor
    David Ron
    University of Cambridge, Cambridge, United Kingdom

Reviewer #1 (Public review):

Summary:

In this manuscript, the model's capacity to capture epistatic interactions through multi-point mutations and its success in finding the global optimum within the protein fitness landscape highlight the strength of deep learning methods over traditional approaches.

Strengths:

It is impressive that the authors used AI combined with limited experimental validation to achieve such significant enhancements in protein performance. Besides, the successful application of the designed antibody in industrial settings demonstrates the practical and economic relevance of the study. Overall, this work has broad implications for future AI-guided protein engineering efforts.

Weaknesses:

However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations? Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?
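As a rough illustration of the scaling concern raised above (an editorial sketch, not taken from the manuscript or the review), the number of candidate variants can be counted directly. The function names are hypothetical, the lengths 150 and 1000 simply echo the sizes mentioned in the review, and the count assumes exhaustive enumeration over the 20 standard amino acids, which is an upper bound; an actual design pipeline may score only a selected subset of combinations.

```python
from math import comb

AMINO_ACIDS = 20  # standard amino-acid alphabet (assumption for this sketch)


def count_single_site_mutants(length: int) -> int:
    """Single-site saturation: 19 possible substitutions at each position."""
    return length * (AMINO_ACIDS - 1)


def count_multi_point_mutants(length: int, k: int) -> int:
    """Variants with exactly k substituted positions (exhaustive upper bound)."""
    return comb(length, k) * (AMINO_ACIDS - 1) ** k


if __name__ == "__main__":
    # Illustrative lengths only: a short VHH-sized protein and a large protein.
    for length in (150, 1000):
        singles = count_single_site_mutants(length)
        doubles = count_multi_point_mutants(length, 2)
        print(f"length={length}: singles={singles}, doubles={doubles}")
```

Under these assumptions the single-site count grows linearly with length, while the exhaustive k-point count grows roughly as the k-th power of the length, which is why the number of mutation sites matters at least as much as the protein size.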

Reviewer #2 (Public review):

In this paper, the authors aim to explore whether an AI model trained on natural protein data can aid in designing proteins that are resistant to extreme environments. While this is an interesting attempt, the study's computational contributions are weak, and the design of the computational experiments appears arbitrary.

(1) The writing throughout the paper is poor. This leaves the reader confused.

(2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

(3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

(4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

Author response:

Reviewer #1:

Weaknesses:

However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations? Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?

We agree that further analysis of the mechanisms by which the identified mutations enhance protein performance would strengthen our study. In the revised manuscript, we plan to conduct molecular dynamics simulations to explore the physicochemical effects of these mutations in more detail. This analysis will help elucidate how the observed structural and dynamic changes contribute to the improved resistance and stability of the designed VHH antibody.

We acknowledge the need to assess the scalability of our method to larger proteins. To address this, we will include an analysis of the method’s performance when applied to longer proteins, including an estimation of computational cost and potential bottlenecks.

Reviewer #2:

(1) The writing throughout the paper is poor. This leaves the reader confused.

(2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

(3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

(4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

(1&2) We will revise the manuscript to improve the overall clarity and readability. Specifically, we will restructure the introduction to focus more concisely on the key scientific questions and contributions of our study.

(3) While the Pro-PRIME model was previously developed, this work focuses on designing proteins with properties that are scarce or absent in the natural world. To address the concern about novelty, we will expand the discussion to highlight this unique contribution and its implications for advancing protein design.

(4) We appreciate the comment regarding the discrepancy between the zero-shot and fine-tuning strategies. In the revised manuscript, we will provide a detailed explanation for the choice of these settings, including an analysis of the trade-offs between zero-shot and fine-tuning approaches in multi-point mutation tasks. We will also explore the model’s performance in zero-shot settings for multi-point mutations and report these results in the supplementary materials to ensure completeness.
