Development and evaluation of a machine learning-based in-hospital COVID-19 disease outcome predictor (CODOP): A multicontinental retrospective study

  1. Riku Klén
  2. Disha Purohit
  3. Ricardo Gómez-Huelgas
  4. José Manuel Casas-Rojo
  5. Juan Miguel Antón-Santos
  6. Jesús Millán Núñez-Cortés
  7. Carlos Lumbreras
  8. José Manuel Ramos-Rincón
  9. Noelia García Barrio
  10. Miguel Pedrera-Jiménez
  11. Antonio Lalueza Blanco
  12. María Dolores Martin-Escalante
  13. Francisco Rivas-Ruiz
  14. Maria Ángeles Onieva-García
  15. Pablo Young
  16. Juan Ignacio Ramirez
  17. Estela Edith Titto Omonte
  18. Rosmery Gross Artega
  19. Magdy Teresa Canales Beltrán
  20. Pascual Ruben Valdez
  21. Florencia Pugliese
  22. Rosa Castagna
  23. Ivan A Huespe
  24. Bruno Boietti
  25. Javier A Pollan
  26. Nico Funke
  27. Benjamin Leiding
  28. David Gómez-Varela (corresponding author)
  1. Turku PET Centre, University of Turku and Turku University Hospital, Finland
  2. Max Planck Institute of Experimental Medicine, Germany
  3. Internal Medicine Department, Regional University Hospital of Málaga, Biomedical Research Institute of Málaga (IBIMA), University of Málaga (UMA), Spain
  4. Internal Medicine Department, Infanta Cristina University Hospital, Spain
  5. Internal Medicine Department, Gregorio Marañón University Hospital, Spain
  6. Internal Medicine Department, 12 de Octubre University Hospital, Spain
  7. Internal Medicine Department, General University Hospital of Alicante, Alicante Institute for Health and Biomedical Research (ISABIAL), Spain
  8. Data Science Unit, Research Institute Hospital 12 de Octubre, Spain
  9. Internal Medicine Department, Hospital Costa del Sol, Spain
  10. Hospital Costa del Sol. Research Unit, Spain
  11. Preventive Medicine Department, Hospital Costa del Sol, Spain
  12. Hospital Británico of Buenos Aires, Argentina
  13. Internal Medicine Service, Hospital Santa Cruz - Caja Petrolera de Salud, Bolivia
  14. Epidemiology Unit, Hospital of San Juan de Dios, Bolivia
  15. Instituto Hondureno of social security, Hospital Honduras Medical Centre, Honduras
  16. Hospital Velez Sarsfield, Argentina
  17. Hospital Italiano de Buenos Aires, Argentina
  18. Max Planck Institute for Experimental Medicine, Germany
  19. Institute for Software and Systems Engineering at TU Clausthal, Germany
  20. Systems Biology of Pain, Division of Pharmacology & Toxicology, Department of Pharmaceutical Sciences, University of Vienna, Austria
4 figures, 1 table and 3 additional files

Figures

Figure 1
Flowchart depicting the different patient cohorts used in this study and the steps followed during the development, testing, and independent evaluation of CODOP.
Figure 2 with 3 supplements
Discriminatory ability (measured as the area under the receiver operating characteristic curve, AUROC; A) and calibration curves (B) for CODOP, COPE, Zhang et al., and Age in the training dataset.
Figure 2—source data 1

Prediction values for CODOP, COPE, and Zhang et al. in the training dataset.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig2-data1-v2.xlsx
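
The AUROC reported in panel A can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The following is a minimal Python sketch of that pairwise (Mann-Whitney) formulation. It is not the authors' code, and the function name `auroc` and the toy data are assumptions made for illustration only.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney U) formulation.

    labels: iterable of 0/1 outcomes (1 = event, e.g. in-hospital death).
    scores: predicted risk per patient; higher means higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs in which the positive case is ranked
    # higher; a tie counts as half a concordant pair.
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data invented for illustration (not taken from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs are concordant
```

The double loop is quadratic in the number of patients, which is fine for a sketch; a rank-based implementation would be preferable for cohorts of this study's size.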
Figure 2—figure supplement 1
Optimisation of the final CODOP model by selecting predictors using the least absolute shrinkage and selection operator (LASSO) method.

The mean squared error is plotted against the logarithm of the penalty parameter (λ). The figure was produced with the cv.glmnet function from the R package glmnet.

Figure 2—figure supplement 1—source data 1

Mean squared error and the penalty parameter (λ).

https://cdn.elifesciences.org/articles/75985/elife-75985-fig2-figsupp1-data1-v2.xlsx
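
To illustrate how the LASSO penalty λ selects predictors, the sketch below fits a least-squares LASSO by coordinate descent on a tiny invented dataset and counts the non-zero coefficients at three values of λ. It is a Python illustration under stated assumptions, not the authors' R workflow: the function names, the toy data, the absence of an intercept (the toy columns are centred), and the omission of cross-validation are all choices made here. The objective follows the usual form (1/(2n))·‖y − Xβ‖² + λ‖β‖₁.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Least-squares LASSO without intercept (assumes centred X and y)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of predictor j with the partial residual
            norm = 0.0  # squared norm of predictor j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy data invented for illustration: y = 2*x1 + 0.1*x2, orthogonal columns.
X_toy = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y_toy = [-2.1, -1.9, 1.9, 2.1]

path = {lam: lasso_coordinate_descent(X_toy, y_toy, lam) for lam in (0.01, 0.5, 3.0)}
nonzero_counts = {lam: sum(b != 0.0 for b in beta) for lam, beta in path.items()}
```

As λ grows, the weak predictor is dropped first and then the strong one. A cross-validated procedure such as cv.glmnet would additionally estimate the out-of-sample error at each λ, which is the quantity plotted in the figure.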
Figure 2—figure supplement 2
Discriminatory ability (measured as the area under the receiver operating characteristic curve, AUROC; A) and calibration curves (B) for CODOP, COPE, Zhang et al., and Age in the test datasets.
Figure 2—figure supplement 2—source data 1

Prediction values for CODOP, COPE, and Zhang et al. in the Test 1, Test 2, External Test 3, and Test 4 datasets.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig2-figsupp2-data1-v2.xlsx
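
A calibration curve such as the one in panel B compares predicted probabilities with observed event rates. One common construction, sketched below in Python, groups patients into equal-width probability bins and reports the mean prediction and the observed rate per bin. The binning scheme, the function name `calibration_bins`, and the toy data are assumptions for illustration; the captions do not state how the authors built their curves.

```python
def calibration_bins(labels, probs, n_bins=5):
    """Return (mean predicted probability, observed event rate, count) per bin.

    Bins are equal-width intervals of [0, 1]; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            curve.append((mean_pred, obs_rate, len(members)))
    return curve


# Toy data invented for illustration (not taken from the study).
toy_curve = calibration_bins([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9], n_bins=2)
```

For a well-calibrated model the points lie close to the diagonal, that is, the mean predicted probability in each bin is close to the observed rate.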
Figure 2—figure supplement 3
Discriminatory ability of CODOP (measured as the area under the receiver operating characteristic curve, AUROC), taking into account the Delta and Omicron variants of concern (VOCs; A) and the vaccination status of the patients (B) in the Test 4 dataset.
Figure 2—figure supplement 3—source data 1

Prediction values for CODOP in vaccinated individuals and in patients infected with the Delta or Omicron virus variants.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig2-figsupp3-data1-v2.xlsx
Figure 3 with 1 supplement
Horizon analysis (A) and survival analysis (B) in the training dataset.

In the horizon plot, the x-axis represents the number of days in hospital before clinical resolution, the bars show the number of samples (green for survival and red for death), and the lines show sensitivity when specificity was fixed at 75% in the training cohort (black, CODOP; red, COPE; green, Zhang et al.; blue, Age). In the survival analysis, the risk scores refer to the probability provided by CODOP.

Figure 3—source data 1

Prediction values for CODOP, COPE, and Zhang et al. for the horizon analysis in the training dataset.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig3-data1-v2.xlsx
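
The horizon plot reports sensitivity at a fixed specificity of 75%. A minimal Python sketch of that calculation follows: it picks the lowest score threshold at which at least 75% of survivors fall below the threshold, then measures the fraction of deaths at or above it. The function name, the "score ≥ threshold means predicted death" rule, and the toy data are assumptions for illustration. Whether the authors set the threshold once or separately for each day stratum is not stated in the caption.

```python
def sensitivity_at_specificity(labels, scores, target_specificity=0.75):
    """Return (threshold, specificity, sensitivity) at the lowest threshold
    whose specificity reaches the target.

    A patient is predicted positive when score >= threshold.
    """
    neg = [s for y, s in zip(labels, scores) if y == 0]
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Try each observed score as a threshold, lowest first; +inf guarantees
    # that the loop always returns (specificity 1, sensitivity 0).
    candidates = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        specificity = sum(s < t for s in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(s >= t for s in pos) / len(pos)
            return t, specificity, sensitivity


# Toy data invented for illustration (not taken from the study).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.25, 0.5, 0.7, 0.9]
toy_result = sensitivity_at_specificity(toy_labels, toy_scores)
```

In a horizon analysis, this calculation would be repeated for each group of patients defined by the number of days before clinical resolution.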
Figure 3—figure supplement 1
Survival analysis in the test datasets.

Shaded areas indicate 95% confidence intervals. The risk scores refer to the probability provided by CODOP.

Figure 3—figure supplement 1—source data 1

Prediction values for CODOP, COPE, and Zhang et al. for the risk stratification analysis in the Test 1, Test 2, and External Test 3 datasets.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig3-figsupp1-data1-v2.xlsx
Figure 4 with 1 supplement
The geographical location of the external cohorts from 42 different Latin American hospitals used during the online evaluations (A) and the performance of the web calculators CODOP-Ovt and CODOP-Unt in these external cohorts (B; the number of patients from each institution is indicated in parentheses).
Figure 4—source data 1

Prediction values for CODOP in the Latin American cohort.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig4-data1-v2.xlsx
Figure 4—figure supplement 1
Horizon analysis in the training dataset for sensitivity (A) and specificity (B).

The black solid line is CODOP-Ovt and black dotted line is CODOP-Unt.

Figure 4—figure supplement 1—source data 1

Prediction values for CODOP-Ovt and CODOP-Unt for the horizon analysis in the training dataset.

https://cdn.elifesciences.org/articles/75985/elife-75985-fig4-figsupp1-data1-v2.xlsx

Tables

Table 1
Features used during CODOP development with the training cohort, the values used for imputation, and the percentage of missing values.

Numerical variables are reported as median (Md) and interquartile range (IQR).

| Variable | Imputed value | Md (IQR) | Missing % |
| --- | --- | --- | --- |
| Age (years) | 66·67911 | 68 (56–79) | 0·0 |
| Sex (male, female) | none | 6 775 females and 9 127 males | 0·0 |
| Hemoglobin (g/dL) | 13·33201 | 13 (12–15) | 1·7 |
| Platelet count (×10⁶/L) | 250 097·7 | 223 000 (164 000–311 000) | 1·8 |
| Eosinophils (×10⁶/L) | 63·81817 | 10 (0–100) | 3·0 |
| Lymphocytes (×10⁶/L) | 1 243·575 | 1 000 (700–1 420) | 1·9 |
| Neutrophils (×10⁶/L) | 5 525·894 | 4 490 (3 090–6 800) | 2·2 |
| Monocytes (×10⁶/L) | 535·8804 | 470 (300–660) | 2·7 |
| C-reactive protein (mg/L) | 74·48964 | 41 (12–108) | 4·6 |
| Creatinine (mg/dL) | 1·156574 | 1 (1–1) | 2·0 |
| Lactate dehydrogenase (U/L) | 363·9083 | 306 (234–424) | 13·0 |
| Aspartate aminotransferase (U/L) | 49·27098 | 35 (24–53) | 18·4 |
| Alanine aminotransferase (U/L) | 48·99699 | 32 (20–54) | 7·4 |
| Total bilirubin (mg/dL) | 0·6429202 | 1 (0–1) | 26·5 |
| Serum sodium (mmol/L) | 138·4268 | 138 (136–141) | 2·6 |
| Serum potassium (mmol/L) | 4·178441 | 4 (4–4) | 3·7 |
| Glucose (mg/dL) | 124·2852 | 108 (92–135) | 5·2 |
| Prothrombin time (s) | 19·99798 | 13 (12–14) | 35·8 |
| Fibrinogen (mg/dL) | 608·0043 | 601 (497–713) | 37·0 |
| D-dimer (ng/mL) | 2 122·158 | 672 (370–1 320) | 21·7 |
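
Table 1 lists a fixed value per feature that replaces missing measurements. A minimal Python sketch of applying such stored values to a patient record is shown below. The numbers are transcribed from Table 1 with "." as the decimal mark and cover only a subset of the features. The dictionary keys, the function name `impute`, and the example patient are hypothetical and are not part of the CODOP software.

```python
# Imputation values transcribed from Table 1 (subset, for brevity).
# The key names are hypothetical and chosen for this sketch.
IMPUTATION_VALUES = {
    "age_years": 66.67911,
    "hemoglobin_g_dl": 13.33201,
    "crp_mg_l": 74.48964,
    "creatinine_mg_dl": 1.156574,
}


def impute(record, defaults=IMPUTATION_VALUES):
    """Return a copy of record with missing (None or absent) features filled.

    Measured values are left untouched; the input record is not modified.
    """
    filled = dict(record)
    for name, value in defaults.items():
        if filled.get(name) is None:
            filled[name] = value
    return filled


# Invented example patient: haemoglobin not measured, creatinine absent.
patient = {"age_years": 71, "hemoglobin_g_dl": None, "crp_mg_l": 120.0}
completed = impute(patient)
```

Storing the values derived from the training cohort and reusing them at prediction time keeps the preprocessing identical for new patients.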

Riku Klén, Disha Purohit, Ricardo Gómez-Huelgas, José Manuel Casas-Rojo, Juan Miguel Antón-Santos, Jesús Millán Núñez-Cortés, Carlos Lumbreras, José Manuel Ramos-Rincón, Noelia García Barrio, Miguel Pedrera-Jiménez, Antonio Lalueza Blanco, María Dolores Martin-Escalante, Francisco Rivas-Ruiz, Maria Ángeles Onieva-García, Pablo Young, Juan Ignacio Ramirez, Estela Edith Titto Omonte, Rosmery Gross Artega, Magdy Teresa Canales Beltrán, Pascual Ruben Valdez, Florencia Pugliese, Rosa Castagna, Ivan A Huespe, Bruno Boietti, Javier A Pollan, Nico Funke, Benjamin Leiding, David Gómez-Varela (2022) Development and evaluation of a machine learning-based in-hospital COVID-19 disease outcome predictor (CODOP): A multicontinental retrospective study. eLife 11:e75985. https://doi.org/10.7554/eLife.75985