Development and evaluation of a machine learning-based in-hospital COvid-19 disease outcome predictor (CODOP): a multicontinental retrospective study
Abstract
New SARS-CoV-2 variants, breakthrough infections, waning immunity, and sub-optimal vaccination rates account for surges of hospitalizations and deaths. There is an urgent need for clinically valuable and generalizable triage tools to assist the allocation of hospital resources, particularly in resource-limited countries. We developed and validated CODOP, a machine learning-based tool for predicting the clinical outcome of hospitalized COVID-19 patients. CODOP was trained, tested and validated with six cohorts encompassing 29,223 COVID-19 patients from more than 150 hospitals in Spain, the USA and Latin America during 2020-22. CODOP uses 12 clinical parameters commonly measured at hospital admission to reach high discriminative ability up to nine days before clinical resolution (AUROC: 0·90-0·96); it is well calibrated and enables effective dynamic risk stratification during hospitalization. Furthermore, CODOP maintains its predictive ability independently of the virus variant and the vaccination status. To account for the fluctuating pressure on hospitals during the pandemic, we offer two online CODOP calculators, suited for undertriage or overtriage scenarios, validated with a cohort of patients from 42 hospitals in three Latin American countries (78-100% sensitivity and 89-97% specificity). The performance of CODOP in heterogeneous and geographically dispersed patient cohorts and its ease of use strongly suggest its clinical utility, particularly in resource-limited countries.
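For readers less familiar with the metrics quoted in the abstract, the following minimal Python sketch shows how AUROC, sensitivity and specificity are computed from predicted risk scores, and how moving the decision threshold trades one against the other. It is not the CODOP model or its calculators: the toy labels, scores, threshold values and function names are all invented for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold means 'high risk'."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # deaths flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # deaths missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # survivors cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # survivors flagged
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only: 1 = died, 0 = survived.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.35, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]

# Hypothetical thresholds; these are not the cut-offs used by CODOP.
LOW_THRESHOLD = 0.30   # flags more patients: higher sensitivity, lower specificity
HIGH_THRESHOLD = 0.65  # flags fewer patients: lower sensitivity, higher specificity

if __name__ == "__main__":
    print("AUROC:", auroc(labels, scores))
    for name, t in [("low", LOW_THRESHOLD), ("high", HIGH_THRESHOLD)]:
        sens, spec = sensitivity_specificity(labels, scores, t)
        print(f"{name} threshold {t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

AUROC does not depend on any threshold, whereas sensitivity and specificity do. Offering two operating points is therefore one way to adapt the same model to different levels of hospital pressure.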
Data availability
The raw patient data used in this study are not freely available due to legal restrictions imposed by the ethical committees of the different hospitals. However, they can be accessed upon request to the Scientific Committees of these institutions. An exception is the patient data from the USA cohort, which have been published elsewhere. All of the model's numerical data needed to generate the figures can be found in the submitted source data tables. Furthermore, all supplementary tables can be found in Supplementary File 1.
Article and author information
Funding
Max Planck Society (Publication cost)
- David Gómez-Varela
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Evangelos J Giamarellos-Bourboulis, National and Kapodistrian University of Athens, Medical School, Greece
Version history
- Preprint posted: September 22, 2021
- Received: November 30, 2021
- Accepted: April 24, 2022
- Accepted Manuscript published: May 17, 2022 (version 1)
- Version of Record published: May 24, 2022 (version 2)
Copyright
© 2022, Klén et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- Views: 1,364
- Downloads: 263
- Citations: 5
Views, downloads and citations are aggregated across all versions of this paper published by eLife.