Author Response
The following is the authors’ response to the original reviews.
Reviewer No.1 (public)
The authors present a study focused on addressing the key challenge in drug discovery, which is the optimization of absorption and affinity properties of small molecules through in silico methods. They propose active learning as a strategy for optimizing these properties and describe the development of two novel active learning batch selection methods. The methods are tested on various public datasets with different optimization goals and sizes, and new affinity datasets are curated to provide up-to-date experimental information. The authors claim that their active learning methods outperform existing batch selection methods, potentially reducing the number of experiments required to achieve the same model performance. They also emphasize the general applicability of their methods, including compatibility with popular packages like DeepChem.
Strengths:
Relevance and Importance: The study addresses a significant challenge in the field of drug discovery, highlighting the importance of optimizing the absorption and affinity properties of small molecules through in silico methods. This topic is of great interest to researchers and pharmaceutical industries.
Novelty: The development of two novel active learning batch selection methods is a commendable contribution. The study also adds value by curating new affinity datasets that provide chronological information on state-of-the-art experimental strategies.
Comprehensive Evaluation: Testing the proposed methods on multiple public datasets with varying optimization goals and sizes enhances the credibility and generalizability of the findings. The focus on comparing the performance of the new methods against existing batch selection methods further strengthens the evaluation.
Weaknesses:
Lack of Technical Details: The feedback lacks specific technical details regarding the developed active learning batch selection methods. Information such as the underlying algorithms, implementation specifics, and key design choices should be provided to enable readers to understand and evaluate the methods thoroughly.
Evaluation Metrics: The feedback does not mention the specific evaluation metrics used to assess the performance of the proposed methods. The authors should clarify the criteria employed to compare their methods against existing batch selection methods and demonstrate the statistical significance of the observed improvements.
Reproducibility: While the authors claim that their methods can be used with any package, including DeepChem, no mention is made of providing the necessary code or resources to reproduce the experiments. Including code repositories or detailed instructions would enhance the reproducibility and practical utility of the study.
Suggestion 1:
Elaborate on the Methodology: Provide an in-depth explanation of the two active learning batch selection methods, including algorithmic details, implementation considerations, and any specific assumptions made. This will enable readers to better comprehend and evaluate the proposed techniques.
Answer: We thank the reviewer for this suggestion. Following this comment, we have extended the text in Methods (in Section: Batch selection via determinant maximization and Section: Approximation of the posterior distribution) and in Supporting Methods (Section: Toy example). We have also included the pseudocode for the batch optimization method.
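For readers who want a concrete picture of what a determinant-maximization batch selector looks like, the following is a minimal, illustrative sketch only; it assumes an epistemic covariance matrix over the candidate pool is already available and uses a simple greedy heuristic, so it is not the authors' exact implementation (the pseudocode in Methods remains the authoritative description).

```python
import numpy as np

def greedy_determinant_batch(cov: np.ndarray, batch_size: int) -> list[int]:
    """Greedily pick candidate indices whose covariance submatrix has
    (approximately) maximal determinant, i.e. a jointly informative batch.

    cov : (n, n) epistemic covariance over the candidate pool.
    """
    n = cov.shape[0]
    selected: list[int] = []
    remaining = list(range(n))
    for _ in range(batch_size):
        best_idx, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sub = cov[np.ix_(idx, idx)]
            # slogdet is numerically safer than det for near-singular matrices
            sign, logdet = np.linalg.slogdet(sub)
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:  # covariance is degenerate; stop early
            break
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```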
Suggestion 2:
Clarify Evaluation Metrics: Clearly specify the evaluation metrics employed in the study to measure the performance of the active learning methods. Additionally, conduct statistical tests to establish the significance of the improvements observed over existing batch selection methods.
Answer: Following this comment, we added details to Table 1 about the way we computed the cutoff times for the different methods. We also provide more details on the statistical tests we performed to determine the significance of these differences.
Suggestion 3:
Enhance Reproducibility: To facilitate the reproducibility of the study, consider sharing the code, data, and resources necessary for readers to replicate the experiments. This will allow researchers in the field to validate and build upon your work more effectively.
Answer: This is something we already included with the original submission. The code is publicly available. In fact, we provide a Python library, ALIEN (Active Learning in data Exploration), which is published on the Sanofi GitHub (https://github.com/Sanofi-Public/Alien). We also provide details on the public data used and expect to provide the internal data as well. We included a short paragraph on code and data availability.
Reviewer No.2 (public)
Suggestion 1:
The authors presented a well-written manuscript describing the comparison of active-learning methods with state-of-the-art methods for several datasets of pharmaceutical interest. This is a very important topic, since active learning resembles a cyclic drug design campaign: compounds are tested, new ones are designed, those are used for further tests, and a new design cycle begins, and so on. The experimental design is comprehensive and adequate for the proposed comparisons. However, I would expect to see a comparison regarding other regression metrics and consideration of the applicability domain of the models, which are two essential topics for the drug design modelers' community.
Answer: We want to thank the reviewer for these comments. We provide a detailed response to the specific comments below.
Reviewer No.1 (Recommendations For The Authors)
Recommendation 1:
The description provided regarding the data collection process and the benchmark datasets used in the study raises some concerns. The comment specifically addresses the use of both private (Sanofi-owned) and public datasets to benchmark the various batch selection methods.
Lack of Transparency: The comment lacks transparency regarding the specific sources and origins of the private datasets. It would be crucial to disclose whether these datasets were obtained from external sources or if they were generated internally within Sanofi. Without this information, it becomes difficult to assess the potential biases or conflicts of interest associated with the data.
Answer: We would like to thank the reviewer for this comment. As mentioned in the paper, the public GitHub page contains links to all the public data and, we expect, will also link to the internal Sanofi data. We also now provide more information on the specific experiments that were performed internally at Sanofi to collect that data.
Potential Data Accessibility Issues: The utilization of private datasets, particularly those owned by Sanofi, may raise concerns about data accessibility. The lack of availability of these datasets to the wider scientific community may limit the ability of other researchers to replicate and validate the study’s findings. It is essential to ensure that the data used in research is openly accessible to foster transparency and encourage collaboration.
Answer: Again, as stated above, we expect to release the data collected internally on the GitHub page.
Limited Information on Dataset Properties: The comment briefly mentions that the benchmark datasets cover properties related to absorption, distribution, pharmacokinetic processes, and affinity of small drug molecules to target proteins. However, it does not provide any specific details about the properties included in the datasets or how they were curated. Providing more comprehensive information about the properties covered and the methods used for curation would enhance the transparency and reliability of the study.
To address these concerns, it is crucial for the authors to provide more detailed information about the data sources, dataset composition, representativeness, and curation methods employed. Transparency and accessibility of data are fundamental principles in scientific research, and addressing these issues will strengthen the credibility and impact of the study.
Answer: We agree with this comment and believe that it is important to be explicit about each of the datasets and to provide information on the new data. We note that we already discuss the details of each of the experiments in Methods and, of course, provide links to the original papers for the public data. We have now added text to Supporting Methods that describes the experiments in more detail and provides literature references for the experimental protocols used. As noted above, we expect to provide our new internal data on the public GitHub page.
Recommendation 2:
Some comments on the modeling example Approximation of the posterior distribution.
Lack of Methodological Transparency: The comment fails to provide any information regarding the specific method or approach used for approximating the posterior distribution. Without understanding the methodology employed, it is impossible to evaluate the quality or rigor of the approximation. This lack of transparency undermines the credibility of the study.
Answer: We want to thank the reviewer for pointing this out. Based on this comment we added more information to Section: Approximation of the posterior distribution. Moreover, we now provide details on the posterior approximation in Section: Two approximations for computing the epistemic covariance.
Questionable Assumptions: The comment does not mention any of the assumptions made during the approximation process. The validity of any approximation heavily depends on the underlying assumptions, and their omission suggests a lack of thorough analysis. Failing to acknowledge these assumptions leaves room for doubt regarding the accuracy and relevance of the approximation.
Answer: We are not entirely sure which assumptions the reviewer is referring to here. The main assumption we used is that getting within X% of the optimal model's performance is a good enough approximation. We have specifically discussed this assumption and tested multiple values of X. While it would have been great to require X = 0, this is unrealistic for retrospective studies. For active learning, the main question is how many experiments can be saved while obtaining similar results, and the assumption we used is essentially the definition of 'similar'. We have now added this to the Discussion.
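To make the within-X% criterion concrete, here is a minimal sketch of how such a cutoff can be computed from a retrospective learning curve. The curve values and the helper name are hypothetical illustrations, not the exact procedure reported in Table 1.

```python
def rounds_to_within_x_percent(rmse_per_round, best_rmse, x=5.0):
    """Return the first active-learning round whose RMSE is within x% of
    the best (reference) RMSE, or None if that level is never reached."""
    threshold = best_rmse * (1.0 + x / 100.0)
    for round_idx, rmse in enumerate(rmse_per_round):
        if rmse <= threshold:
            return round_idx
    return None

# Hypothetical learning curve: RMSE after each batch of experiments
curve = [0.92, 0.75, 0.61, 0.55, 0.52, 0.51]
print(rounds_to_within_x_percent(curve, best_rmse=0.50, x=5.0))  # -> 4, since 0.52 <= 0.525
```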
Inadequate Validation: There is no mention of any validation measures or techniques used to assess the accuracy and reliability of the approximated posterior distribution. Without proper validation, it is impossible to determine whether the approximation provides a reasonable representation of the true posterior. The absence of validation raises concerns about the potential biases or errors introduced by the approximation process.
Answer: We sincerely appreciate your concern regarding the validation of the approximated posterior distribution. We acknowledge that our initial submission might not have clearly highlighted our validation strategy. It is, of course, very hard to determine the accuracy of the distribution our model learns, since such a distribution cannot be directly inferred from experiments (there is no 'ground truth'). Instead, we use an indirect method to determine the accuracy. Specifically, we conducted retrospective experiments using the learned distribution. In these experiments, we indirectly validated our approximation by measuring the prediction error achieved with each method. The results from these retrospective experiments provided evidence for the accuracy and reliability of our approximation in representing the true posterior distribution. We now emphasize this in Methods.
Uncertainty Quantification: The comment does not discuss the quantification of uncertainty associated with the approximated posterior distribution. Properly characterizing the uncertainty is crucial in statistical inference and decision-making. Neglecting this aspect undermines the usefulness and applicability of the approximation results.
Answer: Thank you for pointing out the importance of characterizing uncertainty in statistical inference and decision-making, a sentiment with which we wholeheartedly agree. In our work, we have indeed addressed the quantification of uncertainty associated with the approximated posterior distribution. Specifically, we utilized Monte Carlo Dropout (MC Dropout) as our method of choice. MC Dropout is a widely recognized and employed technique in the neural networks domain to approximate the posterior distribution, and it offers an efficient way to estimate model uncertainty without requiring any changes to the existing network architecture [1, 2]. In the revised version, we provide a more detailed discussion on the use of Monte Carlo Dropout in our methodology and its implications for characterizing uncertainty.
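As a brief illustration of the MC Dropout procedure referenced above, the following is a minimal sketch in PyTorch, assuming a generic dropout-equipped regressor over precomputed molecular feature vectors; the network shown is a placeholder, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 50):
    """Estimate predictive mean and epistemic uncertainty with MC Dropout:
    keep dropout active at inference time and average stochastic forward passes."""
    model.train()  # keeps Dropout layers stochastic (Gal & Ghahramani, 2016)
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])  # (S, N, 1)
    return draws.mean(dim=0), draws.std(dim=0)

# Hypothetical regressor with dropout, applied to 128-dimensional feature vectors
net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(64, 1))
mean, std = mc_dropout_predict(net, torch.randn(10, 128))
```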
Comparison with Gold Standard: There is no mention of comparing the approximated posterior distribution with a gold standard or benchmark. Failing to provide such a comparison leaves doubts about the performance and accuracy of the approximation method. A lack of benchmarking makes it difficult to ascertain the superiority or inferiority of the approximation technique employed.
Answer: As noted above, it is impossible to find gold standard information for the uncertainty distribution. It is not even clear to us how such a gold standard could be experimentally determined, since it is a function of a specific model and dataset. If the reviewer is aware of such a gold standard, we would be happy to test it. Instead, in our study, we opted to benchmark our results against state-of-the-art batch active learning methods, which also rely on uncertainty prediction (such uncertainty prediction is at the heart of any active learning method, as we discuss). Results clearly indicate that our method outperforms prior methods, though we agree that this is only an indirect way to validate the uncertainty approximation.
Reviewer No.2 (Recommendations For The Authors)
Recommendation 1:
The text is kind of messy: there are two results sections, for example. It seems that part of the text was duplicated. Please correct it.
Answer: We want to thank the reviewer for pointing this out. These were typos and we fixed them accordingly.
Recommendation 2:
Text in figures is very small and difficult to read. Please redraw the figures, increasing the font size: 10-12pt is ideal in comparison with the main text.
Answer: We want to thank the reviewer for this comment and we have made the graphics larger.
Recommendation 3:
Please, include specific links to data availability instead of just stating it is available at the Sanofi-Public repository.
Answer: We want to thank the reviewer for this comment and added the links and data to the Sanofi Github page listed in the paper.
Recommendation 4:
What are the descriptors used to train the models?
Answer: We represented the molecules as molecular graphs using the MolGraphConvFeaturizer from the DeepChem library. We now explicitly mention this in Methods.
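For context, featurization with MolGraphConvFeaturizer follows the standard DeepChem pattern; the short sketch below assumes a recent DeepChem version and illustrative SMILES strings, and is not taken from our training pipeline.

```python
import deepchem as dc

# Convert SMILES strings into molecular graph objects (node and edge features)
featurizer = dc.feat.MolGraphConvFeaturizer(use_edges=True)
graphs = featurizer.featurize(["CCO", "c1ccccc1O"])  # ethanol, phenol
print(graphs[0].node_features.shape)  # (num_atoms, num_atom_features)
```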
Recommendation 5:
Regarding the quality of the models, I strongly suggest two approaches instead of using only RMSE as the metric of model performance. I recommend using as many metrics as possible, as reported by Gramatica (https://doi.org/10.1021/acs.jcim.6b00088). I also recommend somehow comparing the increase in dataset diversity according to the employed descriptors (applicability domain) as a measure of further applicability to unseen molecules.
Answer: We want to thank the reviewer for these great suggestions. As suggested, we added new comparison metrics to the Supplement (a brief sketch of how such metrics can be computed follows the list below):
• Distribution plots for the range of the Y values (Figure 8)
• Clustering of the datasets represented as fingerprints (Supplementary Material Figures 5 and 6)
• Retrospective experiments with the Spearman correlation coefficient (Supplementary Material Figures 2, 3, and 4)
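The sketch below shows one common way to compute such metrics, assuming SciPy for the Spearman correlation and RDKit Morgan fingerprints with Butina clustering for diversity; the SMILES, thresholds, and values are illustrative only and do not reproduce the figures in the Supplement.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina
from scipy.stats import spearmanr

# Spearman rank correlation between predicted and measured values
rho, p_value = spearmanr([0.1, 0.4, 0.35, 0.8], [0.2, 0.5, 0.3, 0.9])

# Butina clustering of Morgan fingerprints using Tanimoto distance
smiles = ["CCO", "CCN", "c1ccccc1O", "c1ccccc1N"]
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]
dists = []
for i in range(1, len(fps)):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    dists.extend(1.0 - s for s in sims)  # flattened lower-triangular distances
clusters = Butina.ClusterData(dists, len(fps), distThresh=0.6, isDistData=True)
print(rho, clusters)
```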
I also suggest a better characterization of the datasets, including the nature and range of the Y variable, the experimental source of the data, and a chemical (structural and physicochemical) comparison of samples within each dataset.
Answer: As noted above in response to a similar comment by Reviewer 1, we have added more detailed information about the different experiments we tested to Supporting Methods.
References
[1] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, 20–22 Jun 2016. PMLR.
[2] N.D. Lawrence. Variational Inference in Probabilistic Models. University of Cambridge, 2001.