Improved clinical data imputation via classical and quantum determinantal point processes

  1. Skander Kazdaghli (corresponding author)
  2. Iordanis Kerenidis
  3. Jens Kieckbusch
  4. Philip Teare
  1. QC Ware, France
  2. Universite de Paris, CNRS, IRIF, France
  3. Emerging Innovations Unit, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, United Kingdom
  4. Centre for AI, Data Science & AI, BioPharmaceuticals R&D, AstraZeneca, United Kingdom

Editors

Senior Editor
  Aleksandra M Walczak, École Normale Supérieure - PSL, France
Reviewing Editor
  Martin Graña, Institut Pasteur de Montevideo, Uruguay

Reviewer #1 (Public Review):

Summary:

The article written by Kazdaghli et al. proposes a modification of imputation methods to better account for and exploit the variability of the data. The aim is to reduce the variability of the imputation results.

The authors propose two methods. The first still includes some imputation variability but accounts for the distribution of the data points to improve the imputation. The second uses determinantal sampling, which produces no variation in the imputed data; however, it seems that the authors instead measure the variation in the classification task. Because the computational cost and running time of these methods grow quickly, they also propose an algorithm for running them on quantum processors.

Strengths:

The sampling method for imputing missing values, which accounts for the variability of the data, seems to be accurate.

Weaknesses:

The authors state "Ultimately, the quality and reliability of imputations can be measured by the performance of a downstream predictor, which is usually the AUC (area under the receiver operating curve) for a classification task." but they cite no other work that does this. I think the authors could have evaluated the imputations directly, as they mention in the introduction, although I understand that the final goal of the task is a better classification. In a real situation, they would have data used to train the algorithm and then new data that needs to be imputed and classified. Is there any difference between imputing all the data together and then training the algorithm, versus imputing the training data, training a classifier, then imputing new data (the test set) and testing the classification?
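To make the two protocols in this question concrete, here is a minimal sketch in Python. It uses plain column-mean imputation as a stand-in, not the authors' DPP-based imputers, and the helper names (`fit_column_means`, `apply_imputation`) and the toy numbers are assumptions made up for illustration.

```python
import statistics

def fit_column_means(rows):
    """Per-column mean over observed entries (None marks a missing value)."""
    n_cols = len(rows[0])
    return [
        statistics.fmean(r[j] for r in rows if r[j] is not None)
        for j in range(n_cols)
    ]

def apply_imputation(rows, fill_values):
    """Replace each None with the fill value learned for its column."""
    return [
        [fill_values[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]

# Toy data, invented for illustration only (not from the paper).
train = [[1.0, 10.0], [2.0, None], [None, 30.0], [3.0, 20.0]]
test = [[None, 100.0], [9.0, None]]

# Protocol A: impute all data together, then split.
# The test rows influence the fill values used everywhere.
joint_fill = fit_column_means(train + test)
test_joint = apply_imputation(test, joint_fill)

# Protocol B: fit on the training rows only, then reuse
# those fill values when new (test) data arrives.
train_fill = fit_column_means(train)
test_separate = apply_imputation(test, train_fill)

print("joint fill values:     ", joint_fill)
print("train-only fill values:", train_fill)
```

In this toy case the two protocols fill the same missing test entries with different values, which is the kind of difference the question asks the authors to quantify for their own imputers and classifier.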

I wonder whether there could be some spurious interaction between the imputation and classification methods that biases the data toward better classification without imputing the real values, in particular when the deterministic DPP is used.
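The concern can be illustrated with a small sketch that reports a direct imputation error next to the downstream AUC. Everything here is an assumption for illustration: the numbers are invented, the single feature is used directly as the classification score in place of a trained classifier, and neither imputation comes from the authors' methods.

```python
import math

def rmse_on_masked(truth, imputed, mask):
    """RMSE over the entries that were hidden (mask[i][j] is True)."""
    sq = [
        (truth[i][j] - imputed[i][j]) ** 2
        for i in range(len(truth))
        for j in range(len(truth[i]))
        if mask[i][j]
    ]
    return math.sqrt(sum(sq) / len(sq))

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Invented one-feature dataset; rows 1 and 2 are hidden and then imputed.
truth = [[1.0], [2.0], [3.0], [4.0]]
labels = [0, 1, 0, 1]
mask = [[False], [True], [True], [False]]

imputed_a = [[1.0], [2.1], [2.9], [4.0]]  # close to the true values
imputed_b = [[1.0], [3.5], [1.5], [4.0]]  # far from the true values

for name, imputed in [("A", imputed_a), ("B", imputed_b)]:
    scores = [row[0] for row in imputed]
    print(name, "RMSE:", rmse_on_masked(truth, imputed, mask),
          "AUC:", auc(scores, labels))
```

Here imputation B has the higher AUC but a much larger error on the hidden entries, so a comparison based on AUC alone would not reveal that its imputed values are far from the real ones.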

https://doi.org/10.7554/eLife.89947.3.sa1

Cite this article

Skander Kazdaghli, Iordanis Kerenidis, Jens Kieckbusch, Philip Teare (2024)
Improved clinical data imputation via classical and quantum determinantal point processes
eLife 12:RP89947.
https://doi.org/10.7554/eLife.89947.3
