Exploring the repository of de novo designed bifunctional antimicrobial peptides through deep learning

  1. Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin Key Laboratory of Function and Application of Biological Macromolecular Structures, School of Life Sciences, Tianjin University, Tianjin 300072, China
  2. Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  3. Department of Microbiology, School of Basic Medicine, Fourth Military Medical University, Xi’an 710000, Shaanxi, China
  4. Department of Microbiology, Second Military Medical University, Shanghai 200433, China
  5. State Key Laboratory of Trauma and Chemical Poisoning, Institute of Combined Injury of PLA, College of Preventive Medicine, Third Military Medical University (Army Medical University), Chongqing 400038, China
  6. Tianjin Cancer Hospital Airport Hospital, Tianjin 300308, China
  7. School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Read more about eLife’s peer review process.

Editors

  • Reviewing Editor
    Anne-Florence Bitbol
    Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland
  • Senior Editor
    Aleksandra Walczak
    École Normale Supérieure - PSL, Paris, France

Reviewer #1 (Public Review):

This manuscript presents a pipeline incorporating a deep generative model and peptide property predictors for the de novo design of peptide sequences with dual antimicrobial/antiviral functions. The authors synthesized and experimentally validated three peptides designed by the pipeline, demonstrating antimicrobial and antiviral activities, with one leading peptide exhibiting antimicrobial efficacy in animal models. However, the manuscript as it stands, has several major limitations on the computational side.

Major issues:

(1) The choice of GAN as the generative model. There are multiple deep generative frameworks (e.g., language models, VAEs, and diffusion models), and GANs are known for their training difficulty and mode collapse. Could the authors elaborate on the specific rationale behind choosing GANs for this task?

(2) The pipeline is supposed to generate peptides showing dual properties. Why were antiviral peptides not used to train the GAN? Would adding antiviral peptides into the training lead to a higher chance of getting antiviral generations?

(3) For the antimicrobial peptide predictor, where were the contact maps of peptides sourced from?

(4) Morgan fingerprint can be used to generate amino acid features. Would it be better to concatenate ESM features with amino acid-level fingerprints and use them as node features of GNN?

(5) Although the number of labeled antiviral peptides may be limited, the input features (ESM embeddings) should be predictive enough when coupled with shallow neural networks. Have the authors tried simple GNNs on antiviral prediction and compared the prediction performance to those of existing tools?

(6) Instead of using global alignment to get match scores, the authors should use local alignment.

(7) How novel are the validated peptides? The authors should run a sequence alignment to get the most similar known AMP for each validated peptide, and analyze whether they are similar.

(8) Only three peptides were synthesized and experimentally validated. This is too few and unacceptable in this field currently. The standard is to synthesize and characterize several dozens of peptides at the very least to have a robust study.

Reviewer #2 (Public Review):

Summary:

This study marks a noteworthy advance in the targeted design of AMPs, leveraging a pioneering deep-learning framework to generate potent bifunctional peptides with specificity against both bacteria and viruses. The introduction of a GAN for generation and a GCN-based AMPredictor for MIC predictions is methodologically robust and a major stride in computational biology. Experimental validation in vitro and in animal models, notably with the highly potent P076 against a multidrug-resistant bacterium and P002's broad-spectrum viral inhibition, underpins the strength of their evidence. The findings are significant, showcasing not just promising therapeutic candidates, but also demonstrating a replicable means to rapidly develop new antimicrobials against the threat of drug-resistant pathogens.

Strengths:

The de novo AMP design framework combines a generative adversarial network (GAN) with an AMP predictor (AMPredictor), which is a novel approach in the field. The integration of deep generative models and graph-encoding activity regressors for discovering bifunctional AMPs is cutting-edge and addresses the need for new antimicrobial agents against drug-resistant pathogens. The in vitro and in vivo experimental validations of the AMPs provide strong evidence to support the computational predictions. The successful inhibition of a spectrum of pathogens in vitro and in animal models gives credibility to the claims. The discovery of effective peptides, such as P076, which demonstrates potent bactericidal activity against multidrug-resistant A. baumannii with low cytotoxicity, is noteworthy. This could have far-reaching implications for addressing antibiotic resistance. The demonstrated activity of the peptides against both bacterial and viral pathogens suggests that the discovered AMPs have a wide therapeutic potential and could be effective against a range of pathogens.

Reviewer #3 (Public Review):

Summary:

Dong et al. described a deep learning-based framework of antimicrobial (AMP) generator and regressor to design and rank de novo antimicrobial peptides (AMPs). For generated AMPs, they predicted their minimum inhibitory concentration (MIC) using a model that combines the Morgan fingerprint, contact map, and ESM language model. For their selected AMPs based on predicted MIC, they also use a combination of antiviral peptide (AVP) prediction models to select AMPs with potential antiviral activity. They experimentally validated 3 candidates for antimicrobial activity against S. aureus, A. baumannii, E. coli, and P. aeruginosa, and their toxicity on mouse blood and three human cell lines. The authors select their most promising AMP (P076) for in vivo experiments in A. baumannii-infected mice. They finally test the antiviral activity of their 3 AMPs against viruses.

Strengths:

-The development of de novo antimicrobial peptides (AMPs) with the novelty of being bifunctional (antimicrobial and antiviral activity).

-Novel, combined approach to AMP activity prediction from their amino acid sequence.

Weaknesses:

-I missed justification on why training AMPs without information of their antiviral activity would generate AMPs that could also have antiviral activity with such high frequency (32 out of 104).

-The justification for AMP predictor advantages over previous tools lacks rationale, comparison with previous tools (e.g., with the very successful AMP prediction approach described by Ma et al. 10.1038/s41587-022-01226-0), and proper referencing.

-Experimental validation of three de novo AMPs is a very low number compared to recent similar studies.

-I have concerns regarding the in vivo experiments including i) the short period of reported survival compared to recent studies (0.1038/s41587-022-01226-0, 10.1016/j.chom.2023.07.001, 0.1038/s41551-022-00991-2) and ii) although in Figure 2 f and g statistics have been provided, log scale y-axis would provide a better comparative representation of different conditions.

-I had difficulty reading the story because of the use of acronyms without referring to their full name for the first time, and incomplete annotation in figures and captions.

Author response:

We thank the reviewers for their help and their suggestions to make this manuscript more rigorous. We would like to post provisional author responses when eLife publish the reviewed preprint, and the more detailed responses will be supplemented with the revised manuscript.

  • There are questions about choices made in the computational approach (architecture and type of generative model, training set).

We will train a new generator model based on the current GAN architecture, but with ‘hybrid’ AMP/AVP training sets (Reviewer 1 and 3). Hence, we can directly compare the performances of two generators. Based on our preliminary data, providing GAN with more AVP sequences during training helped the designed peptides pass the AVP filter, at the cost of reducing the average AMPredicgtor scores. The new generator also elevated the diversity of designed sequences.

We also perturbed the detailed architecture of our deep learning models, including fully-connected graph edge encodings and different versions of ESM (e.g. esm1b_t33_650M_UR50S, esm2_t48_15B_UR50D, Reviewer 2). In the revised manuscript, we will report the effects of these modifications and suggest the overall construct of GCN and GAN are suitable for a light-weight sequence label model, as demonstrated in Author response table 1 and 2. For the generator, we suggest that using our approach, we may have reached a plateau for the GAN sampling (Author response table 3).

Author response table 1.

Results of AMPredictor with different graph edge encodings

Author response table 2.

Results of AMPredictor with different ESM versions

Author response table 3.

Evaluation of generated sequences with different sampling numbers

  • There is an important concern about the small number of antimicrobial peptides tested, compared to other studies, and the origin of antiviral activities.

We will address this concern by increasing the number of peptides tested in anti-microbial and anti-viral experiments. As reported in current version of our manuscript, the first generation of GAN generated 128 unique designs and the top 2% (3 designs) was tested experimentally. The second generation of GAN will produce ~1024 designs (1-2 weeks) and the top 2% (~ 20 new sequences) will be tested. We are in the process of synthesize (2-3 weeks) and MIC measurement (1 week). The overall size of tested sample will reach 20-30 sequences. We will focus on sequences with low similarity (< 30%) to any known AMPs, thus expanding the universe functional peptides. We estimated the collection of these new data in 6 weeks.

  1. Howard Hughes Medical Institute
  2. Wellcome Trust
  3. Max-Planck-Gesellschaft
  4. Knut and Alice Wallenberg Foundation