Investigating phenotypes of pulmonary COVID-19 recovery: A longitudinal observational prospective multicenter trial

  1. Thomas Sonnweber
  2. Piotr Tymoszuk
  3. Sabina Sahanic
  4. Anna Boehm
  5. Alex Pizzini
  6. Anna Luger
  7. Christoph Schwabl
  8. Manfred Nairz
  9. Philipp Grubwieser
  10. Katharina Kurz
  11. Sabine Koppelstätter
  12. Magdalena Aichner
  13. Bernhard Puchner
  14. Alexander Egger
  15. Gregor Hoermann
  16. Ewald Wöll
  17. Günter Weiss
  18. Gerlig Widmann
  19. Ivan Tancevski (corresponding author)
  20. Judith Löffler-Ragg (corresponding author)
  1. Department of Internal Medicine II, Medical University of Innsbruck, Austria
  2. Department of Radiology, Medical University of Innsbruck, Austria
  3. The Karl Landsteiner Institute, Austria
  4. Central Institute of Medical and Chemical Laboratory Diagnostics, University Hospital Innsbruck, Austria
  5. Munich Leukemia Laboratory, Germany
  6. Department of Internal Medicine, St. Vinzenz Hospital, Austria

Abstract

Background:

The optimal procedures to prevent, identify, monitor, and treat long-term pulmonary sequelae of COVID-19 are elusive. Here, we characterized the kinetics of respiratory and symptom recovery following COVID-19.

Methods:

We conducted a longitudinal, multicenter observational study in ambulatory and hospitalized COVID-19 patients recruited in early 2020 (n = 145). Pulmonary computed tomography (CT) and lung function (LF) readouts, symptom prevalence, and clinical and laboratory parameters were collected during acute COVID-19 and at 60, 100, and 180 days follow-up visits. Recovery kinetics and risk factors were investigated by logistic regression. Classification of clinical features and participants was accomplished by unsupervised and semi-supervised multiparameter clustering and machine learning.

Results:

At the 6-month follow-up, 49% of participants reported persistent symptoms. The frequency of structural lung CT abnormalities ranged from 18% in the mild outpatient cases to 76% in the intensive care unit (ICU) convalescents. Prevalence of impaired LF ranged from 14% in the mild outpatient cases to 50% in the ICU survivors. Incomplete radiological lung recovery was associated with increased anti-S1/S2 antibody titer, IL-6, and CRP levels at the early follow-up. We demonstrated that the risk of perturbed pulmonary recovery could be robustly estimated at early follow-up by clustering and machine learning classifiers employing solely non-CT and non-LF parameters.

Conclusions:

The severity of acute COVID-19 and protracted systemic inflammation are strongly linked to persistent structural and functional lung abnormality. Automated screening of multiparameter health record data may assist in the prediction of incomplete pulmonary recovery and optimize COVID-19 follow-up management.

Funding:

The State of Tyrol (GZ 71934), Boehringer Ingelheim/Investigator initiated study (IIS 1199-0424).

Clinical trial number:

ClinicalTrials.gov: NCT04416100

Editor's evaluation

This is an informative paper describing the incidence and predictors of long-term radiological and functional lung abnormalities following COVID-19. Congratulations on the importance of the work!

https://doi.org/10.7554/eLife.72500.sa0

Introduction

The ongoing COVID-19 pandemic challenges health-care systems. As of December 2021, the Johns Hopkins dashboard (Dong et al., 2020) reports 276 million cases and 5.4 million COVID-19-related deaths worldwide (Johns Hopkins Coronavirus Resource Center, 2021). Although the vast majority of COVID-19 patients display mild disease, approximately 10–15% of cases progress to a severe condition and approximately 5% suffer from critical illness (Perez-Saez, 2021; Huang et al., 2020). Similar to severe acute respiratory syndrome (SARS) (Hui et al., 2005; Ng et al., 2004; Ngai et al., 2010; Lam et al., 2009), a significant portion of COVID-19 patients report lingering or recurring clinical impairment, and cardiopulmonary recovery may take several months to years (Sonnweber et al., 2021; Sahanic et al., 2021; Caruso et al., 2021; Huang et al., 2021b; Huang et al., 2021a; Faverio et al., 2021; Hellemons et al., 2021; Zhou et al., 2021; Venkatesan, 2021). This observation has led to the introduction of the term ‘long COVID,’ defined by the persistence of COVID-19 symptoms for more than 4 weeks, and the ‘post-acute sequelae of COVID-19’ (PASC) referring to symptom persistence for more than 12 weeks (Sahanic et al., 2021; Shah et al., 2021; Sudre et al., 2021b). Evidence-based strategies for prediction, monitoring, and treatment of PASC are urgently needed (Raghu and Wilson, 2020).

We herein prospectively analyzed the prevalence of nonresolving structural and functional lung abnormalities and persistent COVID-19-related symptoms 6 months after diagnosis. Using univariate risk modeling as well as multiparameter clustering and machine learning (ML), we investigated sets of risk factors and tested the operability of ML classifiers at predicting protracted lung and symptom recovery. The classification and prediction procedures were implemented in an open-source risk assessment tool (https://im2-ibk.shinyapps.io/CovILD/).

Methods

Study design

The CovILD (‘Development of interstitial lung disease in COVID-19’) multicenter, longitudinal observational study (Sonnweber et al., 2021) was initiated in April 2020. Adult residents of Tyrol, Austria, with symptomatic, PCR-confirmed SARS-CoV-2 infection (WHO, 2021) were enrolled by the Department of Internal Medicine II at the Medical University of Innsbruck (primary follow-up center), St. Vinzenz Hospital in Zams, and the acute rehabilitation facility in Münster (Table 1). The participants were diagnosed with COVID-19 between 3 March and 29 June 2020. In the course of the study, including the 2020 SARS-CoV-2 outbreak and follow-up visits, the regional health system was able to guarantee an unrestricted, optimal standard of diagnostics and care for all participants. Corticosteroids were not standard of care during the recruitment period of the study and thus were not administered as a therapy of acute COVID-19. Some participants with nonresolving pneumonia received systemic steroids beginning from week 4 post diagnosis at the discretion of the physician (Table 2). The analysis endpoints were the presence of any, mild (severity score ≤ 5), and moderate-to-severe (severity score > 5) lung computed tomography (CT) abnormalities, impaired lung function (LF), and persistent COVID-19 symptoms at the 180-day follow-up visit (Table 3).

Table 1
Characteristics of the study population.

| Characteristics | % cohort |
| --- | --- |
| Total participants, no. | 145 |
| Mean age, years | 57.3 (SD = 14.3) |
| Female sex | 42.4% (n = 63) |
| Obesity (body mass index >30 kg/m2) | 19.3% (n = 28) |
| Ex-smoker | 39.3% (n = 57) |
| Active smoker | 2.8% (n = 4) |
| Acute COVID-19 severity | % cohort |
| Mild: outpatient | 24.8% (n = 36) |
| Moderate: inpatient without oxygen therapy | 25.5% (n = 37) |
| Severe: inpatient with oxygen therapy | 27.6% (n = 40) |
| Critical: intensive care unit | 22.1% (n = 32) |
| Comorbidities | % cohort |
| None | 22.8% (n = 33) |
| Cardiovascular disease | 40% (n = 58) |
| Pulmonary disease | 18.6% (n = 27) |
| Metabolic disease | 43.4% (n = 63) |
| Chronic kidney disease | 6.9% (n = 10) |
| Gastrointestinal tract diseases | 13.8% (n = 20) |
| Malignancy | 11.7% (n = 17) |

Table 2
Hospitalization and medication during acute COVID-19.

| Parameter | Outpatient (n = 36) | Hospitalized (n = 37) | Hospitalized oxygen therapy (n = 40) | Hospitalized intensive care unit (n = 32) |
| --- | --- | --- | --- | --- |
| Mean hospitalization time, days | 0 (SD = 0) | 6.9 (SD = 3.6) | 11.8 (SD = 6.3) | 34.8 (SD = 15.7) |
| Hospitalized >7 days | 0% (n = 0) | 43.2% (n = 16) | 80% (n = 32) | 100% (n = 32) |
| Anti-infectives | 11.1% (n = 4) | 45.9% (n = 17) | 72.5% (n = 29) | 87.5% (n = 28) |
| Antiplatelet drugs | 2.8% (n = 1) | 10.8% (n = 4) | 22.5% (n = 9) | 25% (n = 8) |
| Anticoagulatives | 2.8% (n = 1) | 2.7% (n = 1) | 5% (n = 2) | 15.6% (n = 5) |
| Corticosteroids* | 2.8% (n = 1) | 5.4% (n = 2) | 22.5% (n = 9) | 40.6% (n = 13) |
| Immunosuppression | 0% (n = 0) | 2.7% (n = 1) | 5% (n = 2) | 9.4% (n = 3) |

  1. * From week 4 post diagnosis on, at the discretion of the physician.
  2. Subsumed under ‘immunosuppression, acute COVID-19’ for data analysis.
  3. Immunosuppressive medication prior to COVID-19.

Table 3
Radiological, functional, and clinical study outcomes.

| Outcome | 60-day follow-up | 100-day follow-up | 180-day follow-up |
| --- | --- | --- | --- |
| Any lung CT abnormalities (complete: n = 103) | 74.8% (n = 77) | 60.2% (n = 62) | 48.5% (n = 50) |
| Mild lung CT abnormalities (severity score ≤ 5) (complete: n = 103) | 26.2% (n = 27) | 36.9% (n = 38) | 29.1% (n = 30) |
| Moderate-to-severe CT abnormalities (severity score > 5) (complete: n = 103) | 48.5% (n = 50) | 23.3% (n = 24) | 19.4% (n = 20) |
| Functional lung impairment (complete: n = 116) | 39.7% (n = 46) | 37.1% (n = 43) | 33.6% (n = 39) |
| Persistent symptoms (complete: n = 145) | 79.3% (n = 115) | 67.6% (n = 98) | 49% (n = 71) |

  1. CT = computed tomography.

In total, 190 COVID-19 patients were screened for participation. Of these, n = 18 subjects declined to give informed consent and n = 27 reported difficulties attending the study follow-ups. Data of the remaining n = 145 participants were eligible for analysis (Figure 1). All participants gave written informed consent. The study was approved by the Institutional Review Board at the Medical University of Innsbruck (approval number: 1103/2020) and registered at ClinicalTrials.gov (NCT04416100).

Procedures

We retrospectively assessed patient characteristics during acute COVID-19 and performed follow-up investigations at 60 days (63 ± 23 days [mean ± SD]; visit 1), 100 days (103 ± 21 days; visit 2), and 180 days (190 ± 15 days; visit 3) after diagnosis of COVID-19. Each visit included symptom and physical performance assessment with a standardized questionnaire, LF testing, standard laboratory testing, and a CT scan of the chest. The variables available for analysis with their stratification schemes are listed in Appendix 1—table 1.

Serological markers were determined in certified laboratories (Central Institute of Clinical and Chemical Laboratory Diagnostics, Rheumatology and Infectious Diseases Laboratory, both at the University Hospital of Innsbruck). C-reactive protein (CRP), interleukin-6 (IL-6), N-terminal pro natriuretic peptide (NT-proBNP), and serum ferritin were measured using a Roche Cobas 8000 analyzer. D-dimer was determined with a Siemens BCS-XP instrument using the Siemens D-Dimer Innovance reagent. Anti-S1/S2 protein SARS-CoV-2 immunoglobulin gamma (IgG) was quantified with the LIAISON chemiluminescence assay (DiaSorin, Italy), expressed as binding antibody units (BAU, conversion factor = 5.7), and stratified by quartiles (Ferrari et al., 2021).

Low-dose (100 kVp tube potential) craniocaudal CT scans of the chest were acquired without iodine contrast and without ECG gating on a 128-slice multidetector CT (128 × 0.6 mm collimation, 1.1 spiral pitch factor, SOMATOM Definition Flash, Siemens Healthineers, Erlangen, Germany). In case of clinically suspected pulmonary embolism, CT scans were performed with a contrast agent. Axial reconstructions were done with 1 mm slices. CT scans were evaluated for ground-glass opacities, consolidations, bronchial dilation, and reticulations as defined by the Fleischner Society. Lung findings were graded with a semi-quantitative CT severity score (0–25 points) (Sonnweber et al., 2021)⁠.
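How the CT-based study endpoints follow from the severity score can be illustrated with a short Python sketch (the study analysis itself was done in R). The function name is hypothetical, and treating a score of 0 as "no abnormality" is an assumption made for this example rather than a detail confirmed by the study code.

```python
def ct_severity_category(score: int) -> str:
    """Map a semi-quantitative CT severity score (0-25 points) to an outcome stratum."""
    if not 0 <= score <= 25:
        raise ValueError("CT severity score must lie between 0 and 25")
    if score == 0:
        # Assumption: a score of 0 corresponds to a scan without abnormalities.
        return "none"
    if score <= 5:
        # Mild abnormalities: severity score <= 5.
        return "mild"
    # Moderate-to-severe abnormalities: severity score > 5.
    return "moderate-to-severe"
```

Under this reading, the endpoint "any CT abnormality" would correspond to every category other than "none".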

Impaired LF was defined as (1) forced vital capacity (FVC) < 80% or (2) forced expiratory volume in 1 s (FEV1) < 80%, or (3) FEV1:FVC < 70% or (4) total lung capacity (TLC) < 80% or (5) diffusing capacity of carbon monoxide (DLCO) < 80% predicted.
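The composite definition above can be expressed as a simple rule. The following Python sketch is illustrative only: the function and argument names are hypothetical, the FEV1:FVC cutoff is read as a ratio in percent while the other four values are read as percent of predicted, and the handling of missing measurements is an assumption.

```python
from typing import Optional


def lung_function_impaired(
    fvc: Optional[float],
    fev1: Optional[float],
    fev1_fvc: Optional[float],
    tlc: Optional[float],
    dlco: Optional[float],
) -> bool:
    """Return True if at least one of the five impairment criteria is met.

    fvc, fev1, tlc and dlco are given in % of the predicted value;
    fev1_fvc is the FEV1:FVC ratio in %.
    """
    criteria = [
        (fvc, 80.0),       # (1) FVC < 80%
        (fev1, 80.0),      # (2) FEV1 < 80%
        (fev1_fvc, 70.0),  # (3) FEV1:FVC < 70%
        (tlc, 80.0),       # (4) TLC < 80%
        (dlco, 80.0),      # (5) DLCO < 80%
    ]
    # Assumption: a missing measurement (None) cannot fulfil a criterion.
    return any(value is not None and value < cutoff for value, cutoff in criteria)
```

With these cutoffs, a participant with normal spirometry but a DLCO of 72% predicted would be classified as having impaired LF.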

Statistical analysis

Statistical analyses were performed with R version 4.0.5 (Figure 1). Data transformation and visualization were accomplished by tidyverse (Wickham et al., 2019), ggplot2 (Wickham, 2016), ggvenn, plotROC (Sachs, 2017), and cowplot (Wilke, 2019) packages. The recorded variables were binarized as shown in Appendix 1—table 1. Acute COVID-19 severity strata were defined as presented in Table 1. p-Values were corrected for multiple comparisons with the Benjamini–Hochberg method (Benjamini and Hochberg, 1995), and effects were termed significant for p<0.05.
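For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal Python sketch is given here (the study used R, where the same adjustment is available via p.adjust with method "BH"). The function name is hypothetical and any input values are for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

An effect would then be termed significant when its adjusted p-value is below 0.05.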

Variable overlap, kinetics, and risk modeling

Overlap between the 180-day follow-up outcome features was assessed by analysis of quasi-proportional Venn plots (package nVennR) (Pérez-Silva et al., 2018) and calculation of the Cohen’s κ statistic (package vcd) (Fleiss et al., 1969). The kinetics of binary outcome variables in participant subsets with a complete longitudinal data record were modeled with mixed-effect logistic regression (random effect: individual, fixed effect: time, packages lme4 [Bates et al., 2015] and lmerTest [Kuznetsova et al., 2017]). Analyses in the severity groups were done with separate models. Significance was assessed by the likelihood ratio test (LRT) against the random-term-only model. Univariate risk modeling was performed with fixed-effect logistic regression (Appendix 1—table 2). Odds ratio (OR) significance was determined by Wald Z test. In-house-developed linear modeling wrappers around base R tools are available at https://github.com/PiotrTymoszuk/lmqc.
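For a single binary explanatory variable, the univariate logistic model reduces to a 2 × 2 table: the fitted OR equals the cross-product ratio and the Wald standard error of log(OR) is the square root of the summed reciprocal cell counts. The Python sketch below illustrates this special case; it is not the authors' lmqc code, the function name is hypothetical, and the counts in any usage are invented.

```python
import math


def odds_ratio_wald(a: int, b: int, c: int, d: int) -> dict:
    """Odds ratio with Wald Z test for a 2 x 2 table.

    a: factor present, outcome present   b: factor present, outcome absent
    c: factor absent, outcome present    d: factor absent, outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald test")
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "odds_ratio": odds_ratio,
        "ci_low": math.exp(log_or - 1.96 * se),   # 95% confidence interval
        "ci_high": math.exp(log_or + 1.96 * se),
        "z": z,
        "p_value": p_value,
    }
```

In a screen of many explanatory variables, the resulting p-values would subsequently be corrected for multiple testing.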

Cluster analysis

Clustering of non-CT and non-LF binary clinical features (Appendix 1—table 1) was accomplished with PAM algorithm (partitioning around medoids, package cluster) (Amato et al., 2019) and simple matching distance (SMD, package nomclust) (Boriah et al., 2008). Association analysis for the participants was performed with a combined procedure involving clustering of the observations by the self-organizing map algorithm (SOM, 4 × 4 hexagonal grid, SMD distance, kohonen package), followed by clustering of the SOM nodes by the Ward.D2 hierarchical clustering algorithm (Euclidean distance, hclust() function, package stats) (Vesanto and Alhoniemi, 2000; Kohonen, 1995; Wehrens and Kruisselbrink, 2018). Clustering analyses were performed in the participant subset with the complete set of clustering variables. The selection of the optimal clustering algorithm was motivated by the highest ratio of between-cluster to total variance and the best stability measured by mean classification error in 20-fold cross-validation (CV) (Figure 6—figure supplement 1A and B, Figure 7—figure supplement 1A and B; Lange et al., 2004). The optimal cluster number was determined by the bend of the within-cluster sum-of-squares curve (function fviz_nbclust(), package factoextra) and by the stability in 20-fold CV (Figure 6—figure supplement 1C and D, Figure 7—figure supplement 1D and F; Lange et al., 2004; Wang, 2010), as well as by a visual inspection of the SOM node clustering dendrograms (Figure 7—figure supplement 1E). Assignment of 180-day follow-up outcome features to the clusters of clinical parameters was accomplished with a k-nearest neighbor (kNN) label propagation algorithm (Appendix 1—table 3; Sahanic et al., 2021; Leng et al., 2013). Cluster assignment visualization in a four-dimensional principal component analysis score plot was done with the PCAproj() tool (package pcaPP) (Croux et al., 2007). To determine the importance of particular clustering variables, the explained variance (ratio of between-cluster to total variance) of the initial cluster structure was compared with that of the structure obtained after random resampling of the variable, as initially proposed for the random forests ML classifier (Breiman, 2001). Frequencies of the outcome events in the participant clusters were compared with χ2 test. In-house-developed association analysis wrappers are available at https://github.com/PiotrTymoszuk/clustering-tools-2.
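The two building blocks of the feature-level analysis, the simple matching distance and the kNN-based assignment of an outcome to a feature cluster, can be sketched as follows. This Python example is a simplified majority-vote illustration, not the authors' clustering-tools-2 implementation; all function names, feature names, cluster labels, and data are hypothetical.

```python
from collections import Counter


def simple_matching_distance(x, y):
    """Fraction of positions at which two equally long binary vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)


def knn_cluster_label(outcome, features, clusters, k=5):
    """Assign an outcome vector to the majority cluster of its k nearest features."""
    ranked = sorted(
        features, key=lambda name: simple_matching_distance(outcome, features[name])
    )
    neighbors = ranked[:k]
    votes = Counter(clusters[name] for name in neighbors)
    return votes.most_common(1)[0][0], neighbors


# Hypothetical toy data: each vector holds one binary value per participant.
features = {
    "icu_stay": [1, 1, 1, 0, 0, 0],
    "elevated_crp": [1, 1, 0, 0, 0, 0],
    "fatigue": [0, 0, 1, 1, 1, 0],
    "hyposmia": [0, 1, 0, 1, 1, 1],
}
clusters = {"icu_stay": 2, "elevated_crp": 2, "fatigue": 3, "hyposmia": 3}
ct_abnormality = [1, 1, 1, 0, 0, 0]

label, neighbors = knn_cluster_label(ct_abnormality, features, clusters, k=3)
print(label, neighbors)
```

In this toy example the outcome is grouped with the severity and inflammation features because they have the lowest SMD to it.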

Machine learning

ML classifiers C5.0 (package C50) (Quinlan, 1993)⁠, random forests (randomForest) (Breiman, 2001)⁠, support vector machines with radial kernel (kernlab) (Weston and Watkins, 1998)⁠, neural networks (nnet) (Ripley, 2014)⁠, and elastic net (glmnet) (Friedman et al., 2010)⁠ were trained to predict the 180-day follow-up outcomes employing non-CT and non-LF binary explanatory features (Appendix 1—table 1). The ML training was performed in the participant subsets with the complete set of explanatory and outcome variables. The training, optimization, and CV (20-fold, five repetitions) were accomplished by the train() tool from caret package, with the Cohen’s κ statistic as a model selection metric (Appendix 1—table 4; Kuhn, 2008)⁠. Classifier ensembles were constructed with the elastic net procedure (caretStack() function, caretEnsemble package, Appendix 1—table 4; Deane-Mayer and Knowles, 2019)⁠. Classifier performance in the training cohort and CV was assessed by receiver-operating characteristics (ROCs), Cohen’s κ and accuracy (packages caret and vcd, Appendix 1—table 5; Fleiss et al., 1969; Kuhn, 2008). Variable importance measures were extracted from the C5.0 (percent variable usage, c5imp() function, package C50) (Quinlan, 1993)⁠, random forests (Δ Gini index, importance(), package randomForest) (Breiman, 2001)⁠, and elastic net classifiers (regression coefficient β, coef(), package glmnet) (Friedman et al., 2010)⁠.
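The model selection metric and the resampling scheme can be illustrated without any ML library. The Python sketch below computes Cohen's κ and generates the index splits for a repeated 20-fold CV with five repetitions. It is a simplified stand-in for the caret workflow, not a reproduction of it: the function names are hypothetical and the folds are drawn without stratification by outcome, which is an assumption of this example.

```python
import random


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement between two label lists corrected for chance."""
    n = len(observed)
    labels = set(observed) | set(predicted)
    p_observed = sum(o == p for o, p in zip(observed, predicted)) / n
    p_expected = sum(
        (observed.count(label) / n) * (predicted.count(label) / n) for label in labels
    )
    if p_expected == 1.0:
        # Kappa is undefined when only one label occurs in both lists.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


def repeated_kfold(n_samples, n_folds=20, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index forms the hold-out set of this fold.
            test_idx = indices[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx
```

A classifier would be fitted on each training split, its predictions for the hold-out split compared with the observed outcome via cohen_kappa(), and the tuning setting with the highest mean κ retained.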

Pulmonary recovery assessment app

Participant clustering and ML classifiers trained in the CovILD cohort were implemented in an open-source online pulmonary assessment R shiny app (https://im2-ibk.shinyapps.io/CovILD/; code: https://github.com/PiotrTymoszuk/COVILD-recovery-assessment-app). Prediction of the cluster assignment based on the user-provided patient data is done by the kNN label propagation algorithm (Sahanic et al., 2021; Leng et al., 2013)⁠.

Results

Patient characteristics

The CovILD study participants (n = 145) were predominantly male (57.8%) and aged between 19 and 87 years. Preexisting comorbidity, predominantly cardiovascular and metabolic disease, was present in 77.2% of participants. The cohort included mild (outpatient care, 24.8%), moderate (hospitalization without oxygen supply, 25.5%), severe (hospitalization with oxygen supply, 27.6%), and critical (intensive care unit [ICU] treatment, 22.1%) cases of acute COVID-19 (Table 1). The majority of hospitalized participants received anti-infectives during acute COVID-19, whereas anticoagulative and/or antiplatelet treatment was introduced primarily in the ventilated patients. Systemic steroid administration was initiated at the discretion of the physician beginning from week 4 after diagnosis (Table 2).

Clinical recovery after COVID-19

Most patients, irrespective of the acute COVID-19 severity, showed a significant resolution of disease symptoms over time (Figure 1, Figure 2A). Persistent complaints at the 6-month follow-up were reported by 49% of the study subjects (Table 3), with self-reported impaired physical performance (34.7%), sleep disorders (27.1%), and exertional dyspnea (22.8%) as leading manifestations. The frequency of all investigated symptoms declined significantly, although the pace of their resolution was markedly slower in the late (100- and 180-day follow-ups) than in the early recovery phase (acute COVID-19 to the 60-day follow-up) (Figure 2B).

Figure 1
Study inclusion flow diagram and analysis scheme.

Figure 2
Kinetics of recovery from COVID-19 symptoms.

Recovery from any COVID-19 symptoms was investigated by mixed-effect logistic modeling (random effect: individual; fixed effect: time). Significance was determined by the likelihood ratio test corrected for multiple testing with the Benjamini–Hochberg method, and p-values and the numbers of complete observations are indicated in the plots. (A) Frequencies of individuals with any symptoms in the study cohort stratified by acute COVID-19 severity. (B) Frequencies of participants with particular symptoms. imp.: impaired.

Impaired LF was observed in 33.6% of the participants at the 6-month follow-up (Table 3). Except for the critical COVID-19 survivors (60 days: 66.7%; 180 days post-COVID-19: 50%), no significant reduction in the frequency of LF impairment over time was observed (Figure 3). At the 6-month follow-up, structural lung abnormalities were found in 48.5% of patients and moderate-to-severe radiological lung alterations (CT severity score > 5) were present in 19.4% of participants (Table 3). The majority of the participants with impaired LF displayed radiological lung findings. However, a substantial fraction of CT abnormalities, especially mild ones, were accompanied neither by persistent symptoms nor by LF deficits (Figure 3—figure supplement 1, Figure 3—figure supplement 2, Figure 3—figure supplement 3A).

Figure 3 (with 3 supplements)
Kinetics of pulmonary recovery.

Recovery from any lung computed tomography (CT) abnormalities, moderate-to-severe lung CT abnormalities (severity score > 5), and recovery from functional lung impairment were investigated in the participants stratified by acute COVID-19 severity by mixed-effect logistic modeling (random effect: individual; fixed effect: time). Significance was determined by the likelihood ratio test corrected for multiple testing with the Benjamini–Hochberg method. Frequencies of the given abnormality at the indicated time points are presented, and p-values and the numbers of complete observations are indicated in the plots.

The frequency, scoring, and recovery of CT lung findings were related to the severity of acute infection. Pulmonary lesions scoring > 5 CT severity points at the 180-day follow-up were most frequent in the individuals with severe and critical acute COVID-19 (Figure 3—figure supplement 3). Notably, the hospitalized group with oxygen therapy demonstrated the fastest recovery kinetics. As with symptom resolution, LF and CT lung recovery decelerated in the late phase of COVID-19 convalescence (Figure 3).

Risk factors of protracted recovery

To identify risk factors of delayed recovery at the 6-month follow-up, we screened a set of 52 binary clinical parameters (Appendix 1—table 1) recorded during acute COVID-19 and at the 60-day visit by univariate modeling (Appendix 1—table 2). By this means, no significant correlates for long-term symptom persistence were identified. Risk factors and readouts of severe and critical COVID-19 including multimorbidity, malignancy, male sex, prolonged hospitalization, ICU stay, and immunosuppressive therapy were significantly associated with persistent CT (Figure 4) and LF abnormalities (Figure 5). Persistently elevated inflammatory markers, IL-6 (>7 ng/L) and CRP (>0.5 mg/L), were strong unfavorable risk factors for incomplete radiological and functional pulmonary recovery. Additionally, the biochemical readout of microvascular inflammation, D-dimer (>500 pg/mL) was significantly linked to LF deficits. Low serum anti-S1/S2 IgG titers at the 60-day follow-up and ambulatory acute COVID-19 correlated with an improved pulmonary recovery (Figures 4 and 5).

Figure 4
Risk factors of persistent radiological lung abnormalities.

Association of 52 binary explanatory variables (Appendix 1—table 1) with the presence of any lung computed tomography (CT) abnormalities (A) or moderate-to-severe lung CT abnormalities (severity score > 5) (B) at the 180-day follow-up visit was investigated with a series of univariate logistic models (Appendix 1—table 2). Odds ratio (OR) significance was determined by Wald Z test and corrected for multiple testing with the Benjamini–Hochberg method. ORs with 95% confidence intervals for significant favorable and unfavorable factors are presented in forest plots. Model baseline (ref) and numbers of complete observations are presented in the plot axis text. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; ICU: intensive care unit.

Figure 5
Risk factors of persistent functional lung impairment.

Association of 52 binary explanatory variables (Appendix 1—table 1) with the presence of functional lung impairment at the 180-day follow-up visit was investigated with a series of univariate logistic models (Appendix 1—table 2). Odds ratio (OR) significance was determined by Wald Z test and corrected for multiple testing with the Benjamini–Hochberg method. ORs with 95% confidence intervals for the significant favorable and unfavorable factors are presented in a forest plot. Model baseline (ref) and numbers of complete observations are presented in the plot axis text. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; CKD: chronic kidney disease.

Clusters of clinical features linked to persistent symptoms and lung abnormalities

Employing the unsupervised PAM algorithm (Amato et al., 2019)⁠, three clusters of co-occurring non-CT and non-LF clinical features of acute COVID-19 and early convalescence (Appendix 1—table 1) were identified (Figure 6—figure supplement 1, Appendix 1—table 3): (1) cluster 1 with male sex, hypertension, and cardiovascular and metabolic comorbidity; (2) cluster 2, including characteristics of acute COVID-19 severity and inflammatory markers; and (3) cluster 3 consisting of acute and persistent COVID-19 symptoms (Figure 6—figure supplement 2, Appendix 1—table 3).

The 6-month follow-up outcome variables were incorporated in the cluster structure using kNN prediction (Leng et al., 2013). Long-term symptom persistence was associated with acute and long-lasting COVID-19 symptoms in cluster 3, whereas pulmonary outcome parameters were grouped with cluster 2 features (Figure 6A, Figure 6—figure supplement 2, Appendix 1—table 3). Preexisting comorbidities such as malignancy, kidney, lung and gastrointestinal disease, obesity, and diabetes were found to be the closest cluster neighbors of mild CT abnormalities (severity score ≤ 5). Moderate-to-severe structural alterations (severity score > 5) and LF deficits were, in turn, tightly linked to markers of protracted systemic inflammation (IL-6, CRP, anemia of inflammation) (Sonnweber et al., 2020; Figure 6B).

Figure 6 (with 2 supplements)
Association of incomplete symptom, lung function, and radiological lung recovery with demographic and clinical parameters of acute COVID-19 and early recovery.

Clustering of 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the early 60-day follow-up visit (Appendix 1—table 1) was investigated by partitioning around medoids (PAM) algorithm with simple matching distance (SMD) dissimilarity measure (Figure 6—figure supplement 1, Appendix 1—table 3). The cluster assignment for the outcome variables at the 180-day follow-up visit (persistent symptoms, functional lung impairment, mild lung CT abnormalities [severity score ≤ 5] and moderate-to-severe lung CT abnormalities [severity score > 5]) was predicted by k-nearest neighbor (kNN) label propagation procedure. Numbers of complete observations and numbers of features in the clusters are indicated in (A). (A) Cluster assignment of the outcome variables (diamonds) presented in the plot of principal component (PC) scores. The first two major PCs are displayed. The explanatory variables are visualized as points. Percentages of the data set variance associated with the PC are presented in the plot axes. (B) Five nearest neighbors (lowest SMD) of the outcome variables presented in radial plots. Font size, point radius, and color code for SMD values. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; GITD: gastrointestinal disease; CKD: chronic kidney disease; ICU: intensive care unit; COPD: chronic obstructive pulmonary disease.

Risk stratification for perturbed pulmonary recovery by unsupervised clustering

Next, we tested whether subsets of patients at risk of an incomplete 6-month recovery may be defined by a similar clustering procedure employing exclusively non-CT and non-LF clinical variables (Appendix 1—table 1). Applying a combined SOM – hierarchical clustering approach, three clusters of the study participants were identified (Figure 7, Figure 7—figure supplement 1; Vesanto and Alhoniemi, 2000; Kohonen, 1995). Prolonged hospitalization, anti-infective therapy, overweight or obesity, pain during acute COVID-19, and low anti-S1/S2 titers at the 60-day follow-up were found to be the most influential clustering features (Figure 7—figure supplement 2; Breiman, 2001). The patient subsets identified by the SOM approach differed significantly in frequency of radiological lung abnormalities and substantially, yet not significantly, in the frequency of LF impairment at the 180-day follow-up. In particular, most of the individuals assigned to the largest, low-risk (LR) subset were CT and LF abnormality-free. The frequency and severity of radiological pulmonary findings were elevated in the smallest intermediate-risk subset (IR) and peaked in the high-risk (HR) group (Figure 8A). Despite a comparable frequency of long-term symptoms between the LR, IR, and HR subsets (Figure 8A), the HR collective showed the lowest prevalence of dyspnea, cough, night sweating, pain, gastrointestinal manifestations, and complete absence of hyposmia at the 180-day follow-up (Figure 8B). Although the LR subset primarily comprised mild COVID-19 cases and the HR subset ICU survivors, the cluster assignment (IR vs. LR, HR vs. LR) remained an independent correlate of persistent CT and LF abnormalities after adjustment for the acute COVID-19 severity (Figure 8—figure supplement 1).

Figure 7 (with 2 supplements)
Clustering of the study participants by non-lung function and non-computed tomography (non-CT) clinical features.

Study participants (n = 133 with the complete variable set) were clustered with respect to 52 non-CT and non-lung function binary explanatory variables recorded for acute COVID-19 or at the 60-day follow-up visit (Appendix 1—table 1) using a combined self-organizing map (SOM: simple matching distance) and hierarchical clustering (Ward.D2 method, Euclidean distance) procedure (Figure 7—figure supplement 1). The numbers of participants assigned to low-risk (LR), intermediate-risk (IR), and high-risk (HR) clusters are indicated in (A). (A) Cluster assignment of the study participants in the plot of principal component (PC) scores. The first two major PCs are displayed. Percentages of the data set variance associated with the PC are presented in the plot axes. (B) Presence of the most influential clustering features (Figure 7—figure supplement 2) in the participant clusters presented as a heat map. Cluster #1, #2, and #3 refer to the feature clusters defined in Figure 6. Q1, Q2, Q3, Q4: first, second, third, and fourth quartile of anti-S1/S2 IgG titer; GITD: gastrointestinal disease; CKD: chronic kidney disease; CVD: cardiovascular disease; GI: gastrointestinal; PD: pulmonary disease.

Figure 8 (with 1 supplement)
Frequency of persistent radiological lung abnormalities, functional lung impairment, and symptoms in the participant clusters.

The clusters of study participants were defined by non-lung function and non-computed tomography (non-CT) features as presented in Figure 7. Frequencies of outcome variables at the 180-day follow-up visit (mild [severity score ≤ 5], moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) were compared between the low-risk (LR), intermediate-risk (IR), and high-risk (HR) participant clusters by χ2 test corrected for multiple testing with the Benjamini–Hochberg method. p-Values and numbers of participants assigned to the clusters are indicated in the plots. (A) Frequencies of the outcome features in the participant clusters. (B) Frequencies of specific symptoms in the participant clusters.

Prediction of persistent symptoms and pulmonary abnormalities by machine learning

Finally, we investigated if the 6-month follow-up outcome may be predicted by ML classifiers trained with a set of non-CT and non-LF variables recorded during acute COVID-19 and at the 60-day follow-up (Appendix 1—table 1). To this end, five technically unrelated ML classifiers were tested (Appendix 1—table 4; Kuhn, 2008): C5.0 (Quinlan, 1993), random forests (RF) (Breiman, 2001), support vector machines with radial kernel (SVM-R) (Weston and Watkins, 1998), shallow neural network (NNet) (Ripley, 2014), and elastic net generalized linear regression (glmNet) (Friedman et al., 2010). In addition, the single classifiers with varying outcome-specific accuracy (Figure 9—figure supplement 1) were bundled into ensembles by the elastic net procedure (Figure 9—figure supplement 2, Appendix 1—table 4; Kuhn, 2008; Deane-Mayer and Knowles, 2019). The classifier and ensemble performance was then investigated in the training cohort and 20-fold CV by ROC (Appendix 1—table 5).

All tested ML algorithms and ensembles demonstrated good accuracy (area under the curve [AUC] > 0.78) and sensitivity (>0.84) at predicting any lung CT abnormalities at the 6-month follow-up in the study cohort serving as a training data set. Their efficiency in CV was moderate (AUC: 0.69–0.81; sensitivity: 0.69–0.78) (Figure 9, Figure 9—figure supplement 3, Appendix 1—table 5). In turn, moderate-to-severe structural lung findings were recognized with markedly lower sensitivity both in the training data set (>0.43) and the CV (0.39–0.48). Even though impaired LF and persistent symptoms were common at the 6-month follow-up in the training data set (Figures 2 and 3), nearly half of the cases were not identified by any of the tested ML algorithms and their ensembles in the CV setting (Figure 9, Figure 9—figure supplement 3, Appendix 1—table 5). The sensitivity of the ensembles and single classifiers at predicting CT and LF abnormalities was substantially better in severe and critical COVID-19 survivors than in ambulatory and moderate cases (Figure 10, Appendix 1—table 6).
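
The ROC statistics reported here (AUC, sensitivity, specificity) can be computed as in the following minimal sketch. The labels, scores, and the 0.5 decision threshold are hypothetical; the AUC is calculated as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is equivalent to the area under the ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive-negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = abnormality present) and classifier scores
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

A classifier can reach a high AUC while still missing many cases at a given threshold, which is why sensitivity is reported separately.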

Figure 9 (with 7 supplements)
Prediction of persistent radiological lung abnormalities, functional lung impairment, and symptoms by machine learning algorithms.

Single machine learning classifiers (C5.0; RF: random forests; SVM-R: support vector machines with radial kernel; NNet: neural network; glmNet: elastic net) and their ensemble (Ens) were trained in the cohort data set with 52 non-computed tomography (non-CT) and non-lung function binary explanatory variables recorded for acute COVID-19 or at the 60-day follow-up visit (Appendix 1—table 1) for predicting outcome variables at the 180-day follow-up visit (any lung CT abnormalities, moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) (Appendix 1—table 4). The prediction accuracy was verified by repeated 20-fold cross-validation (five repeats). Receiver-operating characteristics (ROCs) of the algorithms in the cross-validation are presented: area under the curve (AUC), sensitivity (Sens), and specificity (Spec) (Appendix 1—table 5). The numbers of complete observations and outcome events are indicated under the plots.

Figure 10
Performance of the machine learning ensemble classifier in mild-to-moderate and severe-to-critical COVID-19 convalescents.

The machine learning classifier ensemble (Ens) was developed as presented in Figure 9. Its performance at predicting outcome variables at the 180-day follow-up visit (any computed tomography [CT] lung abnormalities, moderate-to-severe lung CT abnormalities [severity score > 5], functional lung impairment, and persistent symptoms) in the entire cohort, mild-to-moderate (outpatient or hospitalized without oxygen), and severe-to-critical COVID-19 convalescents (oxygen therapy or ICU) in repeated 20-fold cross-validation (five repeats) was assessed by receiver-operating characteristic (ROC) (Appendix 1—table 6). ROC curves and statistics (AUC: area under the curve; Se: sensitivity; Sp: specificity) in the cross-validation are shown. Numbers of complete observations and outcome events are indicated in the plots.

The most important explanatory variables for pulmonary abnormalities identified by three unrelated classifiers (C5.0, RF, and glmNet) included preexisting malignancy, multimorbidity, markers of systemic inflammation (IL-6 and CRP), and anti-S1/S2 antibody levels at the 60-day follow-up (Figure 9—figure supplement 4, Figure 9—figure supplement 5, Figure 9—figure supplement 6). The most influential parameters for predicting symptoms at the 180-day follow-up were symptom presence at the 60-day follow-up as well as obesity and dyspnea during acute COVID-19 (Figure 9—figure supplement 7).

Discussion

Herein, we prospectively evaluated trajectories of COVID-19 recovery in an observational cohort enrolled in the Austrian CovILD study (Sonnweber et al., 2021). Despite the resolution of symptoms and pulmonary abnormalities at the 6-month follow-up in a large fraction of the study participants, the recovery pace was substantially slower in late convalescence than in the first 3 months after diagnosis (Sonnweber et al., 2021; Huang et al., 2021a). Persistent symptoms and CT findings were detected in more than 40% and reduced LF in approximately one-third of the cohort, which is in line with the recovery kinetics and signs of lung lesion chronicity reported by others (Caruso et al., 2021; Huang et al., 2021b; Huang et al., 2021a; Faverio et al., 2021; Hellemons et al., 2021; Zhou et al., 2021). A similarly protracted pulmonary recovery was reported for SARS (Hui et al., 2005; Ng et al., 2004; Ngai et al., 2010; Lam et al., 2009) and non-COVID-19 acute respiratory distress syndrome (Wilcox et al., 2013; Masclans et al., 2011). Of note, treatment approaches for hospitalized patients in our cohort and similar cohorts recruited at the pandemic onset in early 2020 (Caruso et al., 2021; Huang et al., 2021b; Huang et al., 2021a; Faverio et al., 2021; Hellemons et al., 2021) differ significantly from the current standard of care for acute COVID-19, which includes early systemic steroid use as well as antiviral and various immunomodulatory medications. How improved standardized therapy and anti-SARS-CoV-2 vaccination affect clinical and pulmonary recovery remains to be investigated.

In roughly half of our study participants with abnormal lung CT findings, and especially in those with low-grade structural abnormalities, no overt LF impairment at follow-up was discerned. Still, even subclinical lung alterations may bear the potential for clinically relevant progression of interstitial lung disease (Suliman et al., 2015; Hatabu et al., 2020) requiring systematic CT and LF monitoring. Conversely, symptom persistence was weakly associated with incomplete functional or structural pulmonary recovery.

Since PASC are found in as many as 10% of COVID-19 patients (Sahanic et al., 2021; Venkatesan, 2021; Sudre et al., 2021b), robust, resource-saving tools assessing the individual risk of pulmonary complications are urgently needed (Shah et al., 2021; Raghu and Wilson, 2020). Covariates and characteristics of severe acute COVID-19 such as male sex, age, preexisting comorbidities, hospitalization, ventilation, and ICU stay have been proposed as risk factors for persistent pulmonary impairment (Sonnweber et al., 2021; Caruso et al., 2021; Huang et al., 2021a; Faverio et al., 2021; Raghu and Wilson, 2020). However, their applicability in predicting complications of pulmonary recovery from mild or moderate COVID-19 is limited. Our results of univariate modeling, clustering, and ML prediction point towards a distinct long-term pulmonary risk phenotype that manifests during acute COVID-19 and early recovery and whose central components are protracted systemic inflammation (IL-6, CRP, anemia of inflammation), microvascular inflammation (D-dimer), and a strong humoral response (anti-S1/S2 IgG), together with demographic risk factors and comorbidities (Sonnweber et al., 2020). Hence, consecutive monitoring of systemic inflammatory parameters, analogous to concepts of interstitial lung disease in autoimmune disorders (Khanna et al., 2020), and of anti-S1/S2 antibody levels may improve the identification of individuals at risk of chronic pulmonary damage irrespective of the acute COVID-19 severity.

Clustering and ML have been employed for deep phenotyping and predicting acute and post-acute COVID-19 outcomes in multivariable data sets (Sahanic et al., 2021; Sudre et al., 2021a; Estiri et al., 2021; Demichev et al., 2021; Benito-León et al., 2021)⁠. We demonstrate that subsets of COVID-19 patients that significantly differ in the risk for long-term CT abnormalities may be defined by an easily accessible clinical parameter set available at the early post-COVID-19 assessment. This approach did not involve any CT or LF variables. Furthermore, the cluster classification correlated with the risk of long-term pulmonary abnormalities independently of the acute COVID-19 severity. Thus, these characteristics provide a useful tool for broad screening of convalescent populations, including individuals who experienced mild or moderate COVID-19.

We show that technically unrelated ML classifiers and their ensemble trained without CT and LF explanatory variables can predict lung CT findings independently of their grading at the 6-month follow-up with good specificity and sensitivity in the training collective and CV. By contrast, the more specific prediction of moderate-to-severe lung CT abnormalities and the risk estimation for LF deficits showed limited sensitivity. For the moderate-to-severe CT abnormalities, this can be primarily traced back to their low frequency, which resulted in suboptimal classifier training, especially in CV. A substantial fraction of the participants (20.7%, n = 30) suffered from a preexisting respiratory condition (pulmonary disease, asthma, or COPD) likely paralleled by LF reduction, which possibly confounded the prediction of post-COVID-19 LF deficits both by clustering and ML. Accumulating evidence suggests that post-acute COVID-19 symptoms are highly heterogeneous, with multiorgan, neurocognitive, and psychological manifestations (Sahanic et al., 2021; Evans et al., 2021; Davis et al., 2021), which may differ in risk factor constellations. This could explain why univariate modeling, clustering, and ML failed to estimate persistent symptom risk in our small study cohort. In general, the ML prediction quality may greatly benefit from a larger training data set and the inclusion of additional explanatory variables such as cellular readouts of inflammation, in-depth medication data, and broader acute symptom data. Nevertheless, the cluster and ML classifiers described herein represent resource-effective tools that may assist in the screening of medical record data and the identification of COVID-19 patients requiring systematic CT and LF monitoring. To facilitate the identification of patients at risk for protracted respiratory recovery and enable validation in an external collective, we implemented the clustering and prediction procedures in an open-source risk assessment application (https://im2-ibk.shinyapps.io/CovILD/).

Our study bears limitations primarily concerning the low sample size and the cross-sectional character of the trial. Because of the limited availability of the patients and the prolonged inpatient rehabilitation, the 60- and 100-day follow-up visits in part showed a temporal overlap that may have impacted the accuracy of the longitudinal data. Missing consecutive outcome records and participant dropout, particularly among mild and moderate COVID-19 cases, may also have confounded the participant clustering results and the ML risk estimation for CT abnormalities and LF impairment, since prolonged hospitalization was found to be a crucial cluster-defining and influential explanatory feature. Additionally, even though the reproducibility of the risk assessment algorithms was partially addressed by CV, the cluster and ML classifiers call for verification in a larger, independent multicenter collective of COVID-19 convalescents.

In summary, in our CovILD study cohort we found a high frequency of CT and LF abnormalities and persistent symptoms at the 6-month follow-up, and flattened recovery kinetics after 3 months post-COVID-19. Systematic risk modeling revealed a set of clinical variables linked to protracted pulmonary recovery apart from the severity of acute infection, such as inflammatory markers, anti-S1/S2 IgG levels, multimorbidity, and male sex. We demonstrate that clustering and ML classifiers may help to identify individuals at risk of persistent lung lesions and to reallocate medical resources to prevent long-term disability.

Appendix 1

Appendix 1—table 1
Study variables.

Variable: variable name in the analysis pipeline; reference time point: study visit at which the variable was recorded; label: variable label in figures and tables.

| Variable | Reference time point | Label | Variable type | Stratification cutoff |
| --- | --- | --- | --- | --- |
| sex_male_V0 | Acute COVID-19 | Male sex | Explanatory | |
| obesity_rec_V0 | Acute COVID-19 | Obesity | Explanatory | BMI > 30 kg/m² |
| current_smoker_V0 | Acute COVID-19 | Current smoker | Explanatory | |
| smoking_ex_V0 | Acute COVID-19 | Ex-smoker | Explanatory | |
| CVDis_rec_V0 | Acute COVID-19 | CVD | Explanatory | |
| hypertension_rec_V0 | Acute COVID-19 | Hypertension | Explanatory | |
| PDis_rec_V0 | Acute COVID-19 | PD | Explanatory | |
| COPD_rec_V0 | Acute COVID-19 | COPD | Explanatory | |
| asthma_rec_V0 | Acute COVID-19 | Asthma | Explanatory | |
| endocrine_metabolic_rec_V0 | Acute COVID-19 | Metabolic disorders | Explanatory | |
| hypercholesterolemia_rec_V0 | Acute COVID-19 | Hypercholesterolemia | Explanatory | |
| diabetes_rec_V0 | Acute COVID-19 | Diabetes | Explanatory | |
| CKDis_rec_V0 | Acute COVID-19 | CKD | Explanatory | |
| GITDis_rec_V0 | Acute COVID-19 | GITD | Explanatory | |
| malignancy_rec_V0 | Acute COVID-19 | Malignancy | Explanatory | |
| immune_deficiency_rec_V0 | Acute COVID-19 | Immune deficiency | Explanatory | |
| weight_change_rec_V0 | Acute COVID-19 | Weight loss, acute COVID-19 | Explanatory | ≥1 kg |
| dyspnoe_rec_V0 | Acute COVID-19 | Dyspnea, acute COVID-19 | Explanatory | |
| cough_rec_V0 | Acute COVID-19 | Cough, acute COVID-19 | Explanatory | |
| fever_rec_V0 | Acute COVID-19 | Fever, acute COVID-19 | Explanatory | |
| night_sweat_rec_V0 | Acute COVID-19 | Night sweat, acute COVID-19 | Explanatory | |
| pain_rec_V0 | Acute COVID-19 | Pain, acute COVID-19 | Explanatory | |
| GI_sympt_rec_V0 | Acute COVID-19 | GI symptoms, acute COVID-19 | Explanatory | |
| anosmia_rec_V0 | Acute COVID-19 | Anosmia, acute COVID-19 | Explanatory | |
| ECOG_imp_rec_V0 | Acute COVID-19 | Impaired performance, acute COVID-19 | Explanatory | ECOG ≥ 1 |
| sleep_disorder_rec_V0 | Acute COVID-19 | Sleep disorders, acute COVID-19 | Explanatory | |
| treat_antiinfec_rec_V0 | Acute COVID-19 | Anti-infectives, acute COVID-19 | Explanatory | |
| treat_antiplat_rec_V0 | Acute COVID-19 | Antiplatelet, acute COVID-19 | Explanatory | |
| treat_anticoag_rec_V0 | Acute COVID-19 | Anticoagulatives, acute COVID-19 | Explanatory | |
| treat_immunosuppr_rec_V0 | Acute COVID-19 | Immunosuppression, acute COVID-19 | Explanatory | |
| anemia_rec_V1 | 60-day follow-up | Anemia, 60-day visit | Explanatory | Male: Hb < 14 g/dL; female: Hb < 12 g/dL |
| ferr_elv_rec_V1 | 60-day follow-up | Elevated ferritin, 60-day visit | Explanatory | Male: > 300 ng/mL; female: > 150 ng/mL |
| NTelv_rec_V1 | 60-day follow-up | Elevated NTproBNP, 60-day visit | Explanatory | >125 pg/mL |
| Ddimerelv_rec_V1 | 60-day follow-up | Elevated D-dimer, 60-day visit | Explanatory | >500 pg/mL FEU |
| CRP_elv_rec_V1 | 60-day follow-up | Elevated CRP, 60-day visit | Explanatory | >0.5 mg/dL |
| IL6_elv_rec_V1 | 60-day follow-up | Elevated IL-6, 60-day visit | Explanatory | >7 pg/mL |
| iron_deficiency_30_rec_V1 | 60-day follow-up | Iron deficiency, 60-day visit | Explanatory | TF-saturation < 15% |
| age_65_V0 | Acute COVID-19 | Age over 65 | Explanatory | >65 years |
| hosp_7d_V0 | Acute COVID-19 | Hospitalized > 7 days, acute COVID-19 | Explanatory | >7 days |
| comorb_present_V0 | Acute COVID-19 | Any comorbidity | Explanatory | >0 comorbidities |
| comorb_3_V0 | Acute COVID-19 | >3 comorbidities | Explanatory | >3 comorbidities |
| overweight_V0 | Acute COVID-19 | Overweight or obesity | Explanatory | BMI > 25 kg/m² |
| sympt_6_V0 | Acute COVID-19 | >6 symptoms, acute COVID-19 | Explanatory | >6 symptoms |
| sympt_present_V1 | 60-day follow-up | Persistent symptoms, 60-day visit | Explanatory | >0 symptoms at 180-day visit |
| ab_0_V1 | 60-day follow-up | Anti-S1/S2 IgG Q1, 60-day visit | Explanatory | (0, 312] BAU/mL |
| ab_25_V1 | 60-day follow-up | Anti-S1/S2 IgG Q2, 60-day visit | Explanatory | (312, 644] BAU/mL |
| ab_50_V1 | 60-day follow-up | Anti-S1/S2 IgG Q3, 60-day visit | Explanatory | (644, 975] BAU/mL |
| ab_75_V1 | 60-day follow-up | Anti-S1/S2 IgG Q4, 60-day visit | Explanatory | > 975 BAU/mL |
| pat_group_G1_V0 | Acute COVID-19 | Ambulatory, acute COVID-19 | Explanatory | |
| pat_group_G2_V0 | Acute COVID-19 | Hospitalized, acute COVID-19 | Explanatory | |
| pat_group_G3_V0 | Acute COVID-19 | Oxygen therapy, acute COVID-19 | Explanatory | |
| pat_group_G4_V0 | Acute COVID-19 | ICU, acute COVID-19 | Explanatory | |
| CT_findings_V3 | 180-day follow-up | CT abnormalities at 180-day visit | Outcome | |
| CT_sev_low_V3 | 180-day follow-up | CT severity score 1–5 at 180-day visit | Outcome | |
| CTsevabove5_V3 | 180-day follow-up | CT severity score >5 at 180-day visit | Outcome | |
| sympt_present_V3 | 180-day follow-up | Symptoms at 180-day visit | Outcome | |
| lung_function_impaired_V3 | 180-day follow-up | Lung function impairment at 180-day visit | Outcome | |
  1. CVD = cardiovascular disease; PD = pulmonary disease; COPD = chronic obstructive pulmonary disease; CKD = chronic kidney disease; GITD = gastrointestinal disease; GI = gastrointestinal; CRP = C-reactive protein; ICU = intensive care unit; CT = computed tomography; BMI = body mass index; BAU = binding antibody unit.
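
The stratification cutoffs listed in this table translate into binary explanatory variables. The following Python sketch shows how two of them could be derived: the sex-specific anemia flag and the anti-S1/S2 IgG quartile indicators. The function names are hypothetical and the code is an illustration of the cutoffs only, not the study's actual preprocessing.

```python
from typing import Dict


def anemia_flag(hb_g_dl: float, male: bool) -> int:
    """1 if hemoglobin is below the sex-specific cutoff, else 0."""
    cutoff = 14.0 if male else 12.0  # g/dL, as listed in the table
    return int(hb_g_dl < cutoff)


def antibody_quartile_flags(titer_bau_ml: float) -> Dict[str, int]:
    """Indicator variables for the anti-S1/S2 IgG quartile bins."""
    return {
        "ab_0_V1": int(0 < titer_bau_ml <= 312),
        "ab_25_V1": int(312 < titer_bau_ml <= 644),
        "ab_50_V1": int(644 < titer_bau_ml <= 975),
        "ab_75_V1": int(titer_bau_ml > 975),
    }


# Hypothetical values for one participant
anemic = anemia_flag(13.5, male=True)
quartiles = antibody_quartile_flags(700.0)
```

Because the quartile bins are mutually exclusive, at most one of the four antibody indicators is set for any participant.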

Appendix 1—table 2
Results of univariate risk modeling.

Outcome: outcome variable at the 180-day follow-up visit; covariate: explanatory variable; baseline: reference level of the explanatory variable; OR: odds ratios with 95% confidence intervals; pFDR: p-value corrected for multiple testing with the Benjamini–Hochberg method (FDR: false discovery rate).
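
For orientation, an odds ratio for a binary covariate can be obtained from a 2 × 2 table as sketched below. The counts are hypothetical and the Wald-type confidence interval is an assumption made for illustration. The ORs in this table were estimated by logistic regression, and their confidence intervals may have been computed by a different method.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald 95% CI from a 2 x 2 table.

    a: covariate present, outcome present
    b: covariate present, outcome absent
    c: covariate absent, outcome present
    d: covariate absent, outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts, not taken from the study
estimate, lower, upper = odds_ratio_wald_ci(30, 20, 15, 40)
```

An interval that excludes 1 indicates an association between covariate and outcome at the chosen confidence level, before any correction for multiple testing.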

| Outcome | Covariate | Baseline | Complete cases | OR [95% CI] | pFDR |
| --- | --- | --- | --- | --- | --- |
| CT abnormalities at 180-day visit | Male sex, n = 63 | No male sex, n = 55 | 118 | 3.79 [1.77–8.44] | p=0.01 |
| CT abnormalities at 180-day visit | Obesity, n = 22 | No obesity, n = 96 | 118 | 1.07 [0.415–2.72] | ns (p=0.9) |
| CT abnormalities at 180-day visit | Current smoker, n = 4 | No current smoker, n = 114 | 118 | 0.412 [0.02–3.33] | ns (p=0.51) |
| CT abnormalities at 180-day visit | Ex-smoker, n = 48 | No ex-smoker, n = 70 | 118 | 1.5 [0.716–3.16] | ns (p=0.36) |
| CT abnormalities at 180-day visit | CVD, n = 45 | No CVD, n = 73 | 118 | 3.36 [1.57–7.43] | p=0.012 |
| CT abnormalities at 180-day visit | Hypertension, n = 34 | No hypertension, n = 84 | 118 | 3.97 [1.73–9.54] | p=0.01 |
| CT abnormalities at 180-day visit | PD, n = 24 | No PD, n = 94 | 118 | 2.06 [0.837–5.25] | ns (p=0.2) |
| CT abnormalities at 180-day visit | COPD, n = 6 | No COPD, n = 112 | 118 | 2.67 [0.499–19.8] | ns (p=0.34) |
| CT abnormalities at 180-day visit | Asthma, n = 9 | No asthma, n = 109 | 118 | 1.02 [0.24–4.04] | ns (p=0.99) |
| CT abnormalities at 180-day visit | Metabolic disorders, n = 50 | No metabolic disorders, n = 68 | 118 | 3.14 [1.48–6.81] | p=0.017 |
| CT abnormalities at 180-day visit | Hypercholesterolemia, n = 22 | No hypercholesterolemia, n = 96 | 118 | 2.67 [1.04–7.27] | ns (p=0.093) |
| CT abnormalities at 180-day visit | Diabetes, n = 18 | No diabetes, n = 100 | 118 | 4.07 [1.41–13.5] | p=0.041 |
| CT abnormalities at 180-day visit | GITD, n = 17 | No GITD, n = 101 | 118 | 3.66 [1.25–12.2] | ns (p=0.061) |
| CT abnormalities at 180-day visit | Malignancy, n = 13 | No malignancy, n = 105 | 118 | 19.5 [3.63–362] | p=0.021 |
| CT abnormalities at 180-day visit | Immune deficiency, n = 5 | No immune deficiency, n = 113 | 118 | 1.96 [0.313–15.3] | ns (p=0.53) |
| CT abnormalities at 180-day visit | Weight loss, acute COVID-19, n = 84 | No weight loss, acute COVID-19, n = 34 | 118 | 4.45 [1.83–12.1] | p=0.011 |
| CT abnormalities at 180-day visit | Dyspnea, acute COVID-19, n = 81 | No dyspnea, acute COVID-19, n = 37 | 118 | 1.45 [0.661–3.27] | ns (p=0.43) |
| CT abnormalities at 180-day visit | Cough, acute COVID-19, n = 83 | No cough, acute COVID-19, n = 35 | 118 | 1.07 [0.484–2.41] | ns (p=0.89) |
| CT abnormalities at 180-day visit | Fever, acute COVID-19, n = 83 | No fever, acute COVID-19, n = 35 | 118 | 2.56 [1.12–6.21] | ns (p=0.072) |
| CT abnormalities at 180-day visit | Night sweat, acute COVID-19, n = 74 | No night sweat, acute COVID-19, n = 44 | 118 | 1.93 [0.902–4.26] | ns (p=0.17) |
| CT abnormalities at 180-day visit | Pain, acute COVID-19, n = 65 | No pain, acute COVID-19, n = 53 | 118 | 0.339 [0.157–0.713] | p=0.021 |
| CT abnormalities at 180-day visit | GI symptoms, acute COVID-19, n = 47 | No GI symptoms, acute COVID-19, n = 71 | 118 | 0.675 [0.316–1.42] | ns (p=0.38) |
| CT abnormalities at 180-day visit | Anosmia, acute COVID-19, n = 53 | No anosmia, acute COVID-19, n = 65 | 118 | 1.09 [0.526–2.28] | ns (p=0.85) |
| CT abnormalities at 180-day visit | Impaired performance, acute COVID-19, n = 106 | No impaired performance, acute COVID-19, n = 12 | 118 | 1.12 [0.335–3.98] | ns (p=0.89) |
| CT abnormalities at 180-day visit | Sleep disorders, acute COVID-19, n = 40 | No sleep disorders, acute COVID-19, n = 77 | 117 | 0.887 [0.407–1.91] | ns (p=0.82) |
| CT abnormalities at 180-day visit | Anti-infectives, acute COVID-19, n = 64 | No anti-infectives, acute COVID-19, n = 54 | 118 | 3.56 [1.67–7.9] | p=0.01 |
| CT abnormalities at 180-day visit | Antiplatelet, acute COVID-19, n = 12 | No antiplatelet, acute COVID-19, n = 106 | 118 | 4.4 [1.23–20.7] | ns (p=0.077) |
| CT abnormalities at 180-day visit | Anticoagulatives, acute COVID-19, n = 4 | No anticoagulatives, acute COVID-19, n = 114 | 118 | 3.98 [0.493–81.8] | ns (p=0.32) |
| CT abnormalities at 180-day visit | Immunosuppression, acute COVID-19, n = 20 | No immunosuppression, acute COVID-19, n = 98 | 118 | 6.89 [2.32–25.5] | p=0.01 |
| CT abnormalities at 180-day visit | Anemia, 60-day visit, n = 10 | No anemia, 60-day visit, n = 108 | 118 | 5.82 [1.38–39.8] | ns (p=0.072) |
| CT abnormalities at 180-day visit | Elevated ferritin, 60-day visit, n = 20 | No elevated ferritin, 60-day visit, n = 98 | 118 | 2.18 [0.825–6.01] | ns (p=0.2) |
| CT abnormalities at 180-day visit | Elevated NTproBNP, 60-day visit, n = 38 | No elevated NTproBNP, 60-day visit, n = 80 | 118 | 2.29 [1.05–5.1] | ns (p=0.084) |
| CT abnormalities at 180-day visit | Elevated D-dimer, 60-day visit, n = 49 | No elevated D-dimer, 60-day visit, n = 69 | 118 | 2.9 [1.37–6.28] | p=0.023 |
| CT abnormalities at 180-day visit | Elevated CRP, 60-day visit, n = 18 | No elevated CRP, 60-day visit, n = 100 | 118 | 5.71 [1.89–21.3] | p=0.019 |
| CT abnormalities at 180-day visit | Elevated IL-6, 60-day visit, n = 11 | No elevated IL-6, 60-day visit, n = 107 | 118 | 15.5 [2.81–289] | p=0.036 |
| CT abnormalities at 180-day visit | Iron deficiency, 60-day visit, n = 6 | No iron deficiency, 60-day visit, n = 112 | 118 | 0.239 [0.0123–1.55] | ns (p=0.29) |
| CT abnormalities at 180-day visit | Age over 65, n = 32 | No age over 65, n = 86 | 118 | 2.81 [1.23–6.66] | p=0.045 |
| CT abnormalities at 180-day visit | Hospitalized >7 days, acute COVID-19, n = 59 | No hospitalized >7 days, acute COVID-19, n = 59 | 118 | 4.93 [2.28–11.1] | p=0.0026 |
| CT abnormalities at 180-day visit | Any comorbidity, n = 90 | No any comorbidity, n = 28 | 118 | 6.86 [2.41–24.8] | p=0.01 |
| CT abnormalities at 180-day visit | >3 comorbidities, n = 37 | No >3 comorbidities, n = 81 | 118 | 6.05 [2.62–14.9] | p=0.0026 |
| CT abnormalities at 180-day visit | Overweight or obesity, n = 72 | No overweight or obesity, n = 46 | 118 | 1.61 [0.762–3.48] | ns (p=0.3) |
| CT abnormalities at 180-day visit | >6 symptoms, acute COVID-19, n = 33 | No >6 symptoms, acute COVID-19, n = 85 | 118 | 0.767 [0.333–1.73] | ns (p=0.59) |
| CT abnormalities at 180-day visit | Persistent symptoms, 60-day visit, n = 93 | No persistent symptoms, 60-day visit, n = 25 | 118 | 1.91 [0.769–5.08] | ns (p=0.26) |
| CT abnormalities at 180-day visit | Anti-S1/S2 IgG Q1, 60-day visit, n = 31 | No anti-S1/S2 IgG Q1, 60-day visit, n = 79 | 110 | 0.0769 [0.0173–0.24] | p=0.0026 |
| CT abnormalities at 180-day visit | Anti-S1/S2 IgG Q2, 60-day visit, n = 30 | No anti-S1/S2 IgG Q2, 60-day visit, n = 80 | 110 | 1.12 [0.481–2.62] | ns (p=0.83) |
| CT abnormalities at 180-day visit | Anti-S1/S2 IgG Q3, 60-day visit, n = 27 | No anti-S1/S2 IgG Q3, 60-day visit, n = 83 | 110 | 1.8 [0.753–4.4] | ns (p=0.28) |
| CT abnormalities at 180-day visit | Anti-S1/S2 IgG Q4, 60-day visit, n = 22 | No anti-S1/S2 IgG Q4, 60-day visit, n = 88 | 110 | 5.95 [2.13–19.5] | p=0.01 |
| CT abnormalities at 180-day visit | Ambulatory, acute COVID-19, n = 33 | No ambulatory, acute COVID-19, n = 85 | 118 | 0.106 [0.0296–0.299] | p=0.0026 |
| CT abnormalities at 180-day visit | Hospitalized, acute COVID-19, n = 33 | No hospitalized, acute COVID-19, n = 85 | 118 | 1.28 [0.569–2.88] | ns (p=0.61) |
| CT abnormalities at 180-day visit | Oxygen therapy, acute COVID-19, n = 33 | No oxygen therapy, acute COVID-19, n = 85 | 118 | 1.52 [0.676–3.43] | ns (p=0.38) |
| CT abnormalities at 180-day visit | ICU, acute COVID-19, n = 19 | No ICU, acute COVID-19, n = 99 | 118 | 6.28 [2.1–23.3] | p=0.012 |
| CT severity score >5 at 180-day visit | Male sex, n = 63 | No male sex, n = 55 | 118 | 5.1 [1.75–18.7] | p=0.01 |
| CT severity score >5 at 180-day visit | Obesity, n = 22 | No obesity, n = 96 | 118 | 0.38 [0.0577–1.46] | ns (p=0.26) |
| CT severity score >5 at 180-day visit | Current smoker, n = 4 | No current smoker, n = 114 | 118 | 1.48 [0.0711–12.2] | ns (p=0.77) |
| CT severity score >5 at 180-day visit | Ex-smoker, n = 48 | No ex-smoker, n = 70 | 118 | 1.59 [0.623–4.09] | ns (p=0.37) |
| CT severity score >5 at 180-day visit | CVD, n = 45 | No CVD, n = 73 | 118 | 4.71 [1.8–13.5] | p=0.0042 |
| CT severity score >5 at 180-day visit | Hypertension, n = 34 | No hypertension, n = 84 | 118 | 3.17 [1.21–8.38] | p=0.029 |
| CT severity score >5 at 180-day visit | PD, n = 24 | No PD, n = 94 | 118 | 2.17 [0.735–6.02] | ns (p=0.18) |
| CT severity score >5 at 180-day visit | COPD, n = 6 | No COPD, n = 112 | 118 | 2.3 [0.304–12.7] | ns (p=0.39) |
| CT severity score >5 at 180-day visit | Asthma, n = 9 | No asthma, n = 109 | 118 | 2.37 [0.468–9.85] | ns (p=0.29) |
| CT severity score >5 at 180-day visit | Metabolic disorders, n = 50 | No metabolic disorders, n = 68 | 118 | 2.92 [1.14–7.95] | p=0.045 |
| CT severity score >5 at 180-day visit | Hypercholesterolemia, n = 22 | No hypercholesterolemia, n = 96 | 118 | 2.52 [0.845–7.12] | ns (p=0.12) |
| CT severity score >5 at 180-day visit | Diabetes, n = 18 | No diabetes, n = 100 | 118 | 2.63 [0.816–7.87] | ns (p=0.12) |
| CT severity score >5 at 180-day visit | CKD, n = 6 | No CKD, n = 112 | 118 | 4.89 [0.851–28.3] | ns (p=0.091) |
| CT severity score >5 at 180-day visit | GITD, n = 17 | No GITD, n = 101 | 118 | 2.9 [0.892–8.83] | ns (p=0.092) |
| CT severity score >5 at 180-day visit | Malignancy, n = 13 | No malignancy, n = 105 | 118 | 0.333 [0.0178–1.84] | ns (p=0.35) |
| CT severity score >5 at 180-day visit | Immune deficiency, n = 5 | No immune deficiency, n = 113 | 118 | 7.42 [1.16–59.3] | ns (p=0.052) |
| CT severity score >5 at 180-day visit | Weight loss, acute COVID-19, n = 84 | No weight loss, acute COVID-19, n = 34 | 118 | 3.02 [0.939–13.5] | ns (p=0.13) |
| CT severity score >5 at 180-day visit | Dyspnea, acute COVID-19, n = 81 | No dyspnea, acute COVID-19, n = 37 | 118 | 1.7 [0.609–5.54] | ns (p=0.38) |
| CT severity score >5 at 180-day visit | Cough, acute COVID-19, n = 83 | No cough, acute COVID-19, n = 35 | 118 | 0.537 [0.206–1.44] | ns (p=0.25) |
| CT severity score >5 at 180-day visit | Fever, acute COVID-19, n = 83 | No fever, acute COVID-19, n = 35 | 118 | 2.15 [0.727–7.9] | ns (p=0.24) |
| CT severity score >5 at 180-day visit | Night sweat, acute COVID-19, n = 74 | No night sweat, acute COVID-19, n = 44 | 118 | 2.33 [0.84–7.55] | ns (p=0.17) |
| CT severity score >5 at 180-day visit | Pain, acute COVID-19, n = 65 | No pain, acute COVID-19, n = 53 | 118 | 0.495 [0.187–1.26] | ns (p=0.18) |
| CT severity score >5 at 180-day visit | GI symptoms, acute COVID-19, n = 47 | No GI symptoms, acute COVID-19, n = 71 | 118 | 0.503 [0.168–1.34] | ns (p=0.23) |
| CT severity score >5 at 180-day visit | Anosmia, acute COVID-19, n = 53 | No anosmia, acute COVID-19, n = 65 | 118 | 1.61 [0.634–4.16] | ns (p=0.36) |
| CT severity score >5 at 180-day visit | Impaired performance, acute COVID-19, n = 106 | No impaired performance, acute COVID-19, n = 12 | 118 | 2.72 [0.486–51] | ns (p=0.39) |
| CT severity score >5 at 180-day visit | Sleep disorders, acute COVID-19, n = 40 | No sleep disorders, acute COVID-19, n = 77 | 117 | 1.13 [0.412–2.91] | ns (p=0.84) |
| CT severity score >5 at 180-day visit | Anti-infectives, acute COVID-19, n = 64 | No anti-infectives, acute COVID-19, n = 54 | 118 | 4.89 [1.68–17.9] | p=0.012 |
| CT severity score >5 at 180-day visit | Antiplatelet, acute COVID-19, n = 12 | No antiplatelet, acute COVID-19, n = 106 | 118 | 3.74 [1.01–13.2] | ns (p=0.06) |
| CT severity score >5 at 180-day visit | Anticoagulatives, acute COVID-19, n = 4 | No anticoagulatives, acute COVID-19, n = 114 | 118 | 4.7 [0.538–41.1] | ns (p=0.17) |
| CT severity score >5 at 180-day visit | Immunosuppression, acute COVID-19, n = 20 | No immunosuppression, acute COVID-19, n = 98 | 118 | 5.35 [1.85–15.6] | p=0.0036 |
| CT severity score >5 at 180-day visit | Anemia, 60-day visit, n = 10 | No anemia, 60-day visit, n = 108 | 118 | 8.62 [2.23–37.1] | p=0.0039 |
| CT severity score >5 at 180-day visit | Elevated ferritin, 60-day visit, n = 20 | No elevated ferritin, 60-day visit, n = 98 | 118 | 2.2 [0.693–6.42] | ns (p=0.2) |
| CT severity score >5 at 180-day visit | Elevated NTproBNP, 60-day visit, n = 38 | No elevated NTproBNP, 60-day visit, n = 80 | 118 | 3.23 [1.25–8.55] | p=0.026 |
| CT severity score >5 at 180-day visit | Elevated D-dimer, 60-day visit, n = 49 | No elevated D-dimer, 60-day visit, n = 69 | 118 | 2.41 [0.945–6.38] | ns (p=0.096) |
| CT severity score >5 at 180-day visit | Elevated CRP, 60-day visit, n = 18 | No elevated CRP, 60-day visit, n = 100 | 118 | 4.91 [1.63–14.7] | p=0.0075 |
| CT severity score >5 at 180-day visit | Elevated IL-6, 60-day visit, n = 11 | No elevated IL-6, 60-day visit, n = 107 | 118 | 32.5 [7.43–230] | p=7.5e-05 |
| CT severity score >5 at 180-day visit | Iron deficiency, 60-day visit, n = 6 | No iron deficiency, 60-day visit, n = 112 | 118 | 0.867 [0.044–5.75] | ns (p=0.92) |
| CT severity score >5 at 180-day visit | Age over 65, n = 32 | No age over 65, n = 86 | 118 | 2.8 [1.05–7.4] | ns (p=0.055) |
| CT severity score >5 at 180-day visit | Hospitalized >7 days, acute COVID-19, n = 59 | No hospitalized >7 days, acute COVID-19, n = 59 | 118 | 4.37 [1.58–14.2] | p=0.012 |
| CT severity score >5 at 180-day visit | Any comorbidity, n = 90 | No any comorbidity, n = 28 | 118 | 8.22 [1.59–151] | ns (p=0.065) |
| CT severity score >5 at 180-day visit | >3 comorbidities, n = 37 | No >3 comorbidities, n = 81 | 118 | 5.55 [2.11–15.5] | p=0.0013 |
| CT severity score >5 at 180-day visit | Overweight or obesity, n = 72 | No overweight or obesity, n = 46 | 118 | 0.72 [0.282–1.87] | ns (p=0.53) |
| CT severity score >5 at 180-day visit | >6 symptoms, acute COVID-19, n = 33 | No >6 symptoms, acute COVID-19, n = 85 | 118 | 1.26 [0.438–3.35] | ns (p=0.69) |
| CT severity score >5 at 180-day visit | Persistent symptoms, 60-day visit, n = 93 | No persistent symptoms, 60-day visit, n = 25 | 118 | 3.15 [0.831–20.7] | ns (p=0.18) |
| CT severity score >5 at 180-day visit | Anti-S1/S2 IgG Q2, 60-day visit, n = 30 | No anti-S1/S2 IgG Q2, 60-day visit, n = 80 | 110 | 1.87 [0.666–5.07] | ns (p=0.26) |
| CT severity score >5 at 180-day visit | Anti-S1/S2 IgG Q3, 60-day visit, n = 27 | No anti-S1/S2 IgG Q3, 60-day visit, n = 83 | 110 | 0.675 [0.18–2.05] | ns (p=0.55) |
| CT severity score >5 at 180-day visit | Anti-S1/S2 IgG Q4, 60-day visit, n = 22 | No anti-S1/S2 IgG Q4, 60-day visit, n = 88 | 110 | 4.38 [1.53–12.6] | p=0.01 |
| CT severity score >5 at 180-day visit | Ambulatory, acute COVID-19, n = 33 | No ambulatory, acute COVID-19, n = 85 | 118 | 0.0952 [0.0052–0.488] | p=0.039 |
| CT severity score >5 at 180-day visit | Hospitalized, acute COVID-19, n = 33 | No hospitalized, acute COVID-19, n = 85 | 118 | 0.714 [0.218–2.01] | ns (p=0.58) |
| CT severity score >5 at 180-day visit | Oxygen therapy, acute COVID-19, n = 33 | No oxygen therapy, acute COVID-19, n = 85 | 118 | 0.958 [0.316–2.61] | ns (p=0.95) |
| CT severity score >5 at 180-day visit | ICU, acute COVID-19, n = 19 | No ICU, acute COVID-19, n = 99 | 118 | 8.06 [2.75–24.5] | p=0.00035 |
| Symptoms at 180-day visit | Male sex, n = 82 | No male sex, n = 63 | 145 | 0.701 [0.361–1.35] | ns (p=0.97) |
| Symptoms at 180-day visit | Obesity, n = 28 | No obesity, n = 117 | 145 | 0.42 [0.169–0.982] | ns (p=0.84) |
| Symptoms at 180-day visit | Current smoker, n = 4 | No current smoker, n = 141 | 145 | 3.22 [0.401–66] | ns (p=0.97) |
| Symptoms at 180-day visit | Ex-smoker, n = 57 | No ex-smoker, n = 88 | 145 | 1.27 [0.654–2.49] | ns (p=0.97) |
| Symptoms at 180-day visit | CVD, n = 58 | No CVD, n = 87 | 145 | 0.851 [0.436–1.66] | ns (p=0.97) |
| Symptoms at 180-day visit | Hypertension, n = 44 | No hypertension, n = 101 | 145 | 0.931 [0.456–1.89] | ns (p=0.97) |
| Symptoms at 180-day visit | PD, n = 27 | No PD, n = 118 | 145 | 1.38 [0.598–3.26] | ns (p=0.97) |
| Symptoms at 180-day visit | COPD, n = 8 | No COPD, n = 137 | 145 | 1.04 [0.238–4.58] | ns (p=0.97) |
| Symptoms at 180-day visit | Asthma, n = 10 | No asthma, n = 135 | 145 | 1.05 [0.279–3.92] | ns (p=0.97) |
| Symptoms at 180-day visit | Metabolic disorders, n = 63 | No metabolic disorders, n = 82 | 145 | 1.02 [0.527–1.96] | ns (p=0.97) |
| Symptoms at 180-day visit | Hypercholesterolemia, n = 27 | No hypercholesterolemia, n = 118 | 145 | 0.55 [0.226–1.28] | ns (p=0.97) |
| Symptoms at 180-day visit | Diabetes, n = 24 | No diabetes, n = 121 | 145 | 1.05 [0.434–2.54] | ns (p=0.97) |
| Symptoms at 180-day visit | CKD, n = 10 | No CKD, n = 135 | 145 | 1.62 [0.442–6.56] | ns (p=0.97) |
| Symptoms at 180-day visit | GITD, n = 20 | No GITD, n = 125 | 145 | 1.68 [0.649–4.55] | ns (p=0.97) |
| Symptoms at 180-day visit | Malignancy, n = 17 | No malignancy, n = 128 | 145 | 0.7 [0.241–1.94] | ns (p=0.97) |
| Symptoms at 180-day visit | Immune deficiency, n = 9 | No immune deficiency, n = 136 | 145 | 0.824 [0.197–3.24] | ns (p=0.97) |
| Symptoms at 180-day visit | Weight loss, acute COVID-19, n = 106 | No weight loss, acute COVID-19, n = 39 | 145 | 1.34 [0.644–2.84] | ns (p=0.97) |
| Symptoms at 180-day visit | Dyspnea, acute COVID-19, n = 98 | No dyspnea, acute COVID-19, n = 47 | 145 | 2.84 [1.39–6.04] | ns (p=0.2) |
| Symptoms at 180-day visit | Cough, acute COVID-19, n = 102 | No cough, acute COVID-19, n = 43 | 145 | 1.97 [0.96–4.17] | ns (p=0.88) |
| Symptoms at 180-day visit | Fever, acute COVID-19, n = 106 | No fever, acute COVID-19, n = 39 | 145 | 1.17 [0.559–2.45] | ns (p=0.97) |
| Symptoms at 180-day visit | Night sweat, acute COVID-19, n = 92 | No night sweat, acute COVID-19, n = 53 | 145 | 1.42 [0.723–2.83] | ns (p=0.97) |
| Symptoms at 180-day visit | Pain, acute COVID-19, n = 78 | No pain, acute COVID-19, n = 67 | 145 | 1.92 [0.993–3.75] | ns (p=0.84) |
| Symptoms at 180-day visit | GI symptoms, acute COVID-19, n = 59 | No GI symptoms, acute COVID-19, n = 86 | 145 | 1.27 [0.656–2.48] | ns (p=0.97) |
| Symptoms at 180-day visit | Anosmia, acute COVID-19, n = 62 | No anosmia, acute COVID-19, n = 83 | 145 | 1.69 [0.874–3.31] | ns (p=0.96) |
| Symptoms at 180-day visit | Impaired performance, acute COVID-19, n = 132 | No impaired performance, acute COVID-19, n = 13 | 145 | 1.13 [0.358–3.69] | ns (p=0.97) |
| Symptoms at 180-day visit | Sleep disorders, acute COVID-19, n = 56 | No sleep disorders, acute COVID-19, n = 88 | 144 | 1.38 [0.708–2.73] | ns (p=0.97) |
| Symptoms at 180-day visit | Anti-infectives, acute COVID-19, n = 78 | No anti-infectives, acute COVID-19, n = 67 | 145 | 0.701 [0.362–1.35] | ns (p=0.97) |
| Symptoms at 180-day visit | Antiplatelet, acute COVID-19, n = 22 | No antiplatelet, acute COVID-19, n = 123 | 145 | 1.05 [0.42–2.63] | ns (p=0.97) |
| Symptoms at 180-day visit | Anticoagulatives, acute COVID-19, n = 9 | No anticoagulatives, acute COVID-19, n = 136 | 145 | 2.18 [0.553–10.7] | ns (p=0.97) |
| Symptoms at 180-day visit | Immunosuppression, acute COVID-19, n = 27 | No immunosuppression, acute COVID-19, n = 118 | 145 | 1.38 [0.598–3.26] | ns (p=0.97) |
| Symptoms at 180-day visit | Anemia, 60-day visit, n = 16 | No anemia, 60-day visit, n = 129 | 145 | 0.591 [0.191–1.69] | ns (p=0.97) |
| Symptoms at 180-day visit | Elevated ferritin, 60-day visit, n = 26 | No elevated ferritin, 60-day visit, n = 118 | 144 | 1.29 [0.551–3.07] | ns (p=0.97) |
| Symptoms at 180-day visit | Elevated NTproBNP, 60-day visit, n = 52 | No elevated NTproBNP, 60-day visit, n = 93 | 145 | 1.96 [0.987–3.94] | ns (p=0.84) |
| Symptoms at 180-day visit | Elevated D-dimer, 60-day visit, n = 60 | No elevated D-dimer, 60-day visit, n = 85 | 145 | 1.7 [0.874–3.33] | ns (p=0.96) |
| Symptoms at 180-day visit | Elevated CRP, 60-day visit, n = 23 | No elevated CRP, 60-day visit, n = 122 | 145 | 1.16 [0.475–2.88] | ns (p=0.97) |
| Symptoms at 180-day visit | Elevated IL-6, 60-day visit, n = 17 | No elevated IL-6, 60-day visit, n = 128 | 145 | 0.529 [0.173–1.48] | ns (p=0.97) |
| Symptoms at 180-day visit | Iron deficiency, 60-day visit, n = 6 | No iron deficiency, 60-day visit, n = 138 | 144 | 2.18 [0.412–16.1] | ns (p=0.97) |
Symptoms at 180-day visitAge over 65, n = 43No age over 65, n = 1021451.69 [0.827–3.51]ns (p=0.97)
Symptoms at 180-day visitHospitalized >7 days, acute COVID-19, n = 80No hospitalized >7 days, acute COVID-19, n = 651451.1 [0.569–2.12]ns (p=0.97)
Symptoms at 180-day visitAny comorbidity, n = 112No any comorbidity, n = 331451.03 [0.47–2.24]ns (p=0.97)
Symptoms at 180-day visit>3 comorbidities, n = 47No >3 comorbidities, n = 981451.46 [0.727–2.95]ns (p=0.97)
Symptoms at 180-day visitOverweight or obesity, n = 86No overweight or obesity, n = 591450.7 [0.358–1.36]ns (p=0.97)
Symptoms at 180-day visit>6 symptoms, acute COVID-19, n = 42No >6 symptoms, acute COVID-19, n = 1031451.82 [0.885–3.82]ns (p=0.96)
Symptoms at 180-day visitPersistent symptoms, 60-day visit, n = 115No persistent symptoms, 60-day visit, n = 301454.12 [1.71–11.1]ns (p=0.2)
Symptoms at 180-day visitAnti-S1/S2 IgG Q1, 60-day visit, n = 34No anti-S1/S2 IgG Q1, 60-day visit, n = 1001341.04 [0.476–2.28]ns (p=0.97)
Symptoms at 180-day visitAnti-S1/S2 IgG Q2, 60-day visit, n = 33No anti-S1/S2 IgG Q2, 60-day visit, n = 1011341.13 [0.512–2.49]ns (p=0.97)
Symptoms at 180-day visitAnti-S1/S2 IgG Q3, 60-day visit, n = 34No anti-S1/S2 IgG Q3, 60-day visit, n = 1001340.646 [0.289–1.41]ns (p=0.97)
Symptoms at 180-day visitAnti-S1/S2 IgG Q4, 60-day visit, n = 33No anti-S1/S2 IgG Q4, 60-day visit, n = 1011341.32 [0.603–2.95]ns (p=0.97)
Symptoms at 180-day visitAmbulatory, acute COVID-19, n = 36No ambulatory, acute COVID-19, n = 1091450.911 [0.426–1.94]ns (p=0.97)
Symptoms at 180-day visitHospitalized, acute COVID-19, n = 37No hospitalized, acute COVID-19, n = 1081450.983 [0.463–2.08]ns (p=0.97)
Symptoms at 180-day visitOxygen therapy, acute COVID-19, n = 40No oxygen therapy, acute COVID-19, n = 1051450.922 [0.442–1.91]ns (p=0.97)
Symptoms at 180-day visitICU, acute COVID-19, n = 32No ICU, acute COVID-19, n = 1131451.24 [0.564–2.74]ns (p=0.97)
Lung function impairment at 180-day visitMale sex, n = 71No male sex, n = 511222.12 [0.964–4.85]ns (p=0.1)
Lung function impairment at 180-day visitObesity, n = 22No obesity, n = 1001221.94 [0.746–5]ns (p=0.22)
Lung function impairment at 180-day visitCurrent smoker, n = 3No current smoker, n = 1191224.26 [0.397–93.4]ns (p=0.3)
Lung function impairment at 180-day visitEx-smoker, n = 45No ex-smoker, n = 771221.95 [0.897–4.26]ns (p=0.13)
Lung function impairment at 180-day visitCVD, n = 49No CVD, n = 731221.57 [0.727–3.39]ns (p=0.31)
Lung function impairment at 180-day visitHypertension, n = 35No hypertension, n = 871221.56 [0.683–3.54]ns (p=0.34)
Lung function impairment at 180-day visitPD, n = 23No PD, n = 991222.21 [0.869–5.62]ns (p=0.13)
Lung function impairment at 180-day visitCOPD, n = 7No COPD, n = 1151222.93 [0.615–15.5]ns (p=0.23)
Lung function impairment at 180-day visitAsthma, n = 9No asthma, n = 1131221.03 [0.208–4.12]ns (p=0.97)
Lung function impairment at 180-day visitMetabolic disorders, n = 53No metabolic disorders, n = 691221.73 [0.807–3.73]ns (p=0.22)
Lung function impairment at 180-day visitHypercholesterolemia, n = 24No hypercholesterolemia, n = 981221.03 [0.383–2.61]ns (p=0.96)
Lung function impairment at 180-day visitDiabetes, n = 21No diabetes, n = 1011222.15 [0.816–5.64]ns (p=0.16)
Lung function impairment at 180-day visitCKD, n = 8No CKD, n = 11412217.2 [2.9–328]p=0.02
Lung function impairment at 180-day visitGITD, n = 16No GITD, n = 1061223.11 [1.07–9.42]ns (p=0.061)
Lung function impairment at 180-day visitMalignancy, n = 14No malignancy, n = 1081224.47 [1.43–15.6]p=0.025
Lung function impairment at 180-day visitImmune deficiency, n = 6No immune deficiency, n = 1161222.14 [0.38–12]ns (p=0.43)
Lung function impairment at 180-day visitWeight loss, acute COVID-19, n = 91No weight loss, acute COVID-19, n = 311221.56 [0.645–4.08]ns (p=0.41)
Lung function impairment at 180-day visitDyspnea, acute COVID-19, n = 82No dyspnea, acute COVID-19, n = 401223.17 [1.31–8.58]p=0.029
Lung function impairment at 180-day visitCough, acute COVID-19, n = 88No cough, acute COVID-19, n = 341220.856 [0.375–2.01]ns (p=0.76)
Lung function impairment at 180-day visitFever, acute COVID-19, n = 92No fever, acute COVID-19, n = 301221.19 [0.496–3.01]ns (p=0.76)
Lung function impairment at 180-day visitNight sweat, acute COVID-19, n = 79No night sweat, acute COVID-19, n = 431220.39 [0.176–0.852]p=0.033
Lung function impairment at 180-day visitPain, acute COVID-19, n = 65No pain, acute COVID-19, n = 571220.609 [0.282–1.3]ns (p=0.26)
Lung function impairment at 180-day visitGI symptoms, acute COVID-19, n = 46No GI symptoms, acute COVID-19, n = 761220.715 [0.316–1.57]ns (p=0.47)
Lung function impairment at 180-day visitAnosmia, acute COVID-19, n = 51No anosmia, acute COVID-19, n = 711220.895 [0.41–1.92]ns (p=0.82)
Lung function impairment at 180-day visitImpaired performance, acute COVID-19, n = 111No impaired performance, acute COVID-19, n = 111220.84 [0.238–3.38]ns (p=0.82)
Lung function impairment at 180-day visitSleep disorders, acute COVID-19, n = 46No sleep disorders, acute COVID-19, n = 751210.7 [0.309–1.54]ns (p=0.44)
Lung function impairment at 180-day visitAnti-infectives, acute COVID-19, n = 63No anti-infectives, acute COVID-19, n = 591222.65 [1.22–6]p=0.03
Lung function impairment at 180-day visitAntiplatelet, acute COVID-19, n = 17No antiplatelet, acute COVID-19, n = 1051224.8 [1.67–15.1]p=0.011
Lung function impairment at 180-day visitAnticoagulatives, acute COVID-19, n = 7No anticoagulatives, acute COVID-19, n = 1151222.93 [0.615–15.5]ns (p=0.23)
Lung function impairment at 180-day visitImmunosuppression, acute COVID-19, n = 22No immunosuppression, acute COVID-19, n = 1001222.45 [0.95–6.34]ns (p=0.096)
Lung function impairment at 180-day visitAnemia, 60-day visit, n = 11No anemia, 60-day visit, n = 1111224.14 [1.17–16.7]ns (p=0.053)
Lung function impairment at 180-day visitElevated ferritin, 60-day visit, n = 21No elevated ferritin, 60-day visit, n = 1001211.37 [0.498–3.6]ns (p=0.58)
Lung function impairment at 180-day visitElevated NTproBNP, 60-day visit, n = 44No elevated NTproBNP, 60-day visit, n = 781222.42 [1.11–5.33]p=0.046
Lung function impairment at 180-day visitElevated D-dimer, 60-day visit, n = 50No elevated D-dimer, 60-day visit, n = 721223.23 [1.49–7.2]p=0.0089
Lung function impairment at 180-day visitElevated CRP, 60-day visit, n = 17No elevated CRP, 60-day visit, n = 1051226.6 [2.24–22.3]p=0.0029
Lung function impairment at 180-day visitElevated IL-6, 60-day visit, n = 9No elevated IL-6, 60-day visit, n = 11312220.2 [3.52–383]p=0.013
Lung function impairment at 180-day visitIron deficiency, 60-day visit, n = 6No iron deficiency, 60-day visit, n = 1151211.05 [0.142–5.65]ns (p=0.96)
Lung function impairment at 180-day visitAge over 65, n = 33No age over 65, n = 891222.55 [1.11–5.88]p=0.046
Lung function impairment at 180-day visitHospitalized >7 days, acute COVID-19, n = 66No hospitalized >7 days, acute COVID-19, n = 561223.83 [1.7–9.21]p=0.0045
Lung function impairment at 180-day visitAny comorbidity, n = 93No any comorbidity, n = 291223.95 [1.39–14.2]p=0.032
Lung function impairment at 180-day visit>3 comorbidities, n = 41No >3 comorbidities, n = 811222.47 [1.12–5.48]p=0.044
Lung function impairment at 180-day visitOverweight or obesity, n = 72No overweight or obesity, n = 501221.24 [0.575–2.73]ns (p=0.64)
Lung function impairment at 180-day visit>6 symptoms, acute COVID-19, n = 34No >6 symptoms, acute COVID-19, n = 881220.538 [0.207–1.29]ns (p=0.23)
Lung function impairment at 180-day visitPersistent symptoms, 60-day visit, n = 96No persistent symptoms, 60-day visit, n = 261221.83 [0.702–5.39]ns (p=0.3)
Lung function impairment at 180-day visitAnti-S1/S2 IgG Q1, 60-day visit, n = 28No anti-S1/S2 IgG Q1, 60-day visit, n = 841120.245 [0.0675–0.704]p=0.03
Lung function impairment at 180-day visitAnti-S1/S2 IgG Q2, 60-day visit, n = 27No anti-S1/S2 IgG Q2, 60-day visit, n = 851122.23 [0.913–5.45]ns (p=0.12)
Lung function impairment at 180-day visitAnti-S1/S2 IgG Q3, 60-day visit, n = 28No anti-S1/S2 IgG Q3, 60-day visit, n = 841120.72 [0.27–1.78]ns (p=0.55)
Lung function impairment at 180-day visitAnti-S1/S2 IgG Q4, 60-day visit, n = 29No anti-S1/S2 IgG Q4, 60-day visit, n = 831121.88 [0.784–4.51]ns (p=0.21)
Lung function impairment at 180-day visitAmbulatory, acute COVID-19, n = 32No ambulatory, acute COVID-19, n = 901220.214 [0.0597–0.603]p=0.017
Lung function impairment at 180-day visitHospitalized, acute COVID-19, n = 32No hospitalized, acute COVID-19, n = 901221.1 [0.458–2.56]ns (p=0.85)
Lung function impairment at 180-day visitOxygen therapy, acute COVID-19, n = 32No oxygen therapy, acute COVID-19, n = 901221.33 [0.561–3.07]ns (p=0.57)
Lung function impairment at 180-day visitICU, acute COVID-19, n = 26No ICU, acute COVID-19, n = 961222.56 [1.05–6.27]ns (p=0.061)
  1. CVD = cardiovascular disease; PD = pulmonary disease; COPD = chronic obstructive pulmonary disease; CKD = chronic kidney disease; GITD = gastrointestinal disease; GI = gastrointestinal; CRP = C-reactive protein; ICU = intensive care unit; CT = computed tomography.
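
The table above reports odds ratios with 95% confidence intervals from logistic regression. For a single binary covariate, the maximum-likelihood odds ratio equals the cross-product ratio of the 2×2 table of covariate against outcome. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `odds_ratio_wald_ci`, the Wald (normal-approximation) interval, and the example counts are assumptions for illustration; the source gives group sizes but not per-cell counts, and the published intervals may have been computed by a different method.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: covariate present, outcome present
    b: covariate present, outcome absent
    c: covariate absent,  outcome present
    d: covariate absent,  outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the normal approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study
or_, lo, hi = odds_ratio_wald_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f} [{lo:.2f}-{hi:.2f}]")
```

With small cell counts the Wald interval can be poor, which is one reason wide intervals such as those for rare covariates in the table should be read with caution.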

Appendix 1—table 3
Feature cluster assignment scheme.
Cluster # | Variable
1 | Male sex; CVD; hypertension; metabolic disorders; anti-infectives, acute COVID-19; elevated NTproBNP, 60-day visit; elevated D-dimer, 60-day visit; hospitalized >7 days, acute COVID-19; >3 comorbidities; overweight
2 | Obesity; current smoker; ex-smoker; PD; COPD; asthma; hypercholesterolemia; diabetes; CKD; GITD; malignancy; immune deficiency; GI symptoms, acute COVID-19; anosmia, acute COVID-19; sleep disorders, acute COVID-19; antiplatelet, acute COVID-19; anticoagulatives, acute COVID-19; immunosuppression, acute COVID-19; anemia, 60-day visit; elevated ferritin, 60-day visit; elevated CRP, 60-day visit; elevated IL-6, 60-day visit; iron deficiency, 60-day visit; age over 65; >6 symptoms, acute COVID-19; anti-S1/S2 IgG Q1, 60-day visit; anti-S1/S2 IgG Q2, 60-day visit; anti-S1/S2 IgG Q3, 60-day visit; anti-S1/S2 IgG Q4, 60-day visit; ambulatory, acute COVID-19; hospitalized, acute COVID-19; oxygen therapy, acute COVID-19; ICU, acute COVID-19; CT severity score 1–5 at 180-day visit; CT severity score >5 at 180-day visit; lung function impairment at 180-day visit
3 | Weight loss, acute COVID-19; dyspnea, acute COVID-19; cough, acute COVID-19; fever, acute COVID-19; night sweat, acute COVID-19; pain, acute COVID-19; impaired performance, acute COVID-19; any comorbidity; persistent symptoms, 60-day visit; symptoms at 180-day visit
  1. CVD = cardiovascular disease; PD = pulmonary disease; COPD = chronic obstructive pulmonary disease; CKD = chronic kidney disease; GITD = gastrointestinal disease; GI = gastrointestinal; CRP = C-reactive protein; ICU = intensive care unit; CT = computed tomography.

Appendix 1—table 4
Development of machine learning models.

Outcome: outcome variable at the 180-day follow-up visit

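Outcome | Classifier type | Caret method | Description | Package | Optimal arguments
CT abnormalities at 180-day visit | model | C5.0 | C5.0 | C50 | trials = 10, model = tree, winnow = FALSE
CT abnormalities at 180-day visit | model | rf | Random Forest | randomForest | mtry = 27
CT abnormalities at 180-day visit | model | svmRadial | Support Vector Machines with Radial Basis Function Kernel | kernlab | sigma = 0.0105, C = 0.5
CT abnormalities at 180-day visit | model | nnet | Neural Network | nnet | size = 1, decay = 0
CT abnormalities at 180-day visit | model | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.1, lambda = 0.000431
CT abnormalities at 180-day visit | ensemble | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 1, lambda = 0.0523
CT severity score >5 at 180-day visit | model | C5.0 | C5.0 | C50 | trials = 1, model = rules, winnow = TRUE
CT severity score >5 at 180-day visit | model | rf | Random Forest | randomForest | mtry = 52
CT severity score >5 at 180-day visit | model | svmRadial | Support Vector Machines with Radial Basis Function Kernel | kernlab | sigma = 0.00979, C = 0.5
CT severity score >5 at 180-day visit | model | nnet | Neural Network | nnet | size = 1, decay = 0.1
CT severity score >5 at 180-day visit | model | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.1, lambda = 0.0419
CT severity score >5 at 180-day visit | ensemble | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.1, lambda = 0.00379
Symptoms at 180-day visit | model | C5.0 | C5.0 | C50 | trials = 1, model = tree, winnow = FALSE
Symptoms at 180-day visit | model | rf | Random Forest | randomForest | mtry = 27
Symptoms at 180-day visit | model | svmRadial | Support Vector Machines with Radial Basis Function Kernel | kernlab | sigma = 0.0109, C = 1
Symptoms at 180-day visit | model | nnet | Neural Network | nnet | size = 3, decay = 0.1
Symptoms at 180-day visit | model | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.1, lambda = 0.000247
Symptoms at 180-day visit | ensemble | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.1, lambda = 0.0167
Lung function impairment at 180-day visit | model | C5.0 | C5.0 | C50 | trials = 1, model = rules, winnow = FALSE
Lung function impairment at 180-day visit | model | rf | Random Forest | randomForest | mtry = 52
Lung function impairment at 180-day visit | model | svmRadial | Support Vector Machines with Radial Basis Function Kernel | kernlab | sigma = 0.0108, C = 0.5
Lung function impairment at 180-day visit | model | nnet | Neural Network | nnet | size = 1, decay = 0.1
Lung function impairment at 180-day visit | model | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.55, lambda = 0.0341
Lung function impairment at 180-day visit | ensemble | glmnet | Elastic-Net Regularized Generalized Linear Models | glmnet | alpha = 0.55, lambda = 0.0387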
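
The glmnet rows above list two tuning arguments, alpha and lambda. In the elastic-net formulation used by glmnet, lambda scales the overall penalty and alpha mixes the ridge and lasso terms, so alpha = 1 is a pure lasso penalty and alpha = 0 a pure ridge penalty. The sketch below evaluates only the penalty term, as an illustration; the function name `elastic_net_penalty` and the coefficient vector are hypothetical, and the fitted coefficients of the study models are not given in this table.

```python
def elastic_net_penalty(beta, alpha, lam):
    """Elastic-net penalty: lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    beta:  coefficients excluding the intercept (hypothetical values here)
    alpha: mixing parameter in [0, 1]; 1 = lasso, 0 = ridge
    lam:   overall penalty strength (lambda), non-negative
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * ((1.0 - alpha) / 2.0 * l2_squared + alpha * l1)


# Hypothetical coefficients, evaluated with two of the tabulated settings
beta = [0.8, -0.3, 0.0, 1.2]
print(elastic_net_penalty(beta, alpha=1.0, lam=0.0523))  # lasso-only penalty
print(elastic_net_penalty(beta, alpha=0.1, lam=0.0419))  # mostly ridge penalty
```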
Appendix 1—table 5
Performance of machine learning classifiers.

Outcome: outcome variable at the 180-day follow-up visit; Method: Caret method; Accuracy: model accuracy with 95% confidence intervals; Kappa: model kappa statistic with 95% confidence intervals; AUC: area under the curve.

Outcome | Total N | Events N | Method | Data set | Accuracy | Kappa | AUC | Sensitivity | Specificity
CT abnormalities at 180-day visit | 109 | 49 | C5.0 | CV | 0.72 [0.36–1] | 0.43 [-0.35–1] | 0.78 | 0.69 | 0.74
CT abnormalities at 180-day visit | 109 | 49 | C5.0 | Training | 1 | 1 | 1 | 1 | 1
CT abnormalities at 180-day visit | 109 | 49 | ensemble | CV | 0.78 [0.63–0.93] | 0.55 [0.26–0.85] | 0.81 | 0.75 | 0.8
CT abnormalities at 180-day visit | 109 | 49 | ensemble | Training | 0.93 | 0.85 | 0.98 | 0.86 | 0.98
CT abnormalities at 180-day visit | 109 | 49 | glmnet | CV | 0.71 [0.3–1] | 0.42 [-0.52–1] | 0.79 | 0.71 | 0.72
CT abnormalities at 180-day visit | 109 | 49 | glmnet | Training | 1 | 1 | 1 | 1 | 1
CT abnormalities at 180-day visit | 109 | 49 | nnet | CV | 0.67 [0.26–1] | 0.35 [-0.38–1] | 0.69 | 0.71 | 0.64
CT abnormalities at 180-day visit | 109 | 49 | nnet | Training | 0.76 | 0.54 | 0.78 | 1 | 0.57
CT abnormalities at 180-day visit | 109 | 49 | rf | CV | 0.73 [0.4–1] | 0.45 [-0.33–1] | 0.78 | 0.72 | 0.74
CT abnormalities at 180-day visit | 109 | 49 | rf | Training | 1 | 1 | 1 | 1 | 1
CT abnormalities at 180-day visit | 109 | 49 | svmRadial | CV | 0.75 [0.4–1] | 0.51 [-0.25–1] | 0.8 | 0.78 | 0.73
CT abnormalities at 180-day visit | 109 | 49 | svmRadial | Training | 0.85 | 0.7 | 0.93 | 0.84 | 0.87
CT severity score >5 at 180-day visit | 109 | 21 | C5.0 | CV | 0.86 [0.67–1] | 0.37 [-0.2–1] | 0.7 | 0.39 | 0.98
CT severity score >5 at 180-day visit | 109 | 21 | C5.0 | Training | 0.87 | 0.5 | 0.7 | 0.43 | 0.98
CT severity score >5 at 180-day visit | 109 | 21 | ensemble | CV | 0.88 [0.81–0.96] | 0.51 [0.044–0.89] | 0.75 | 0.45 | 0.98
CT severity score >5 at 180-day visit | 109 | 21 | ensemble | Training | 0.89 | 0.57 | 0.65 | 0.48 | 0.99
CT severity score >5 at 180-day visit | 109 | 21 | glmnet | CV | 0.84 [0.6–1] | 0.34 [-0.25–1] | 0.76 | 0.41 | 0.94
CT severity score >5 at 180-day visit | 109 | 21 | glmnet | Training | 0.94 | 0.8 | 0.97 | 0.71 | 1
CT severity score >5 at 180-day visit | 109 | 21 | nnet | CV | 0.79 [0.5–1] | 0.31 [-0.29–1] | 0.72 | 0.47 | 0.87
CT severity score >5 at 180-day visit | 109 | 21 | nnet | Training | 0.99 | 0.97 | 1 | 0.95 | 1
CT severity score >5 at 180-day visit | 109 | 21 | rf | CV | 0.84 [0.6–1] | 0.34 [-0.25–1] | 0.73 | 0.4 | 0.95
CT severity score >5 at 180-day visit | 109 | 21 | rf | Training | 1 | 1 | 1 | 1 | 1
CT severity score >5 at 180-day visit | 109 | 21 | svmRadial | CV | 0.87 [0.63–1] | 0.43 [-0.23–1] | 0.75 | 0.48 | 0.97
CT severity score >5 at 180-day visit | 109 | 21 | svmRadial | Training | 0.92 | 0.68 | 0.99 | 0.57 | 1
Lung function impairment at 180-day visit | 111 | 38 | C5.0 | CV | 0.73 [0.33–1] | 0.39 [-0.5–1] | 0.7 | 0.54 | 0.84
Lung function impairment at 180-day visit | 111 | 38 | C5.0 | Training | 0.86 | 0.7 | 0.85 | 0.79 | 0.9
Lung function impairment at 180-day visit | 111 | 38 | ensemble | CV | 0.75 [0.61–0.86] | 0.39 [0.052–0.67] | 0.72 | 0.48 | 0.89
Lung function impairment at 180-day visit | 111 | 38 | ensemble | Training | 0.89 | 0.75 | 0.98 | 0.79 | 0.95
Lung function impairment at 180-day visit | 111 | 38 | glmnet | CV | 0.74 [0.4–1] | 0.37 [-0.36–1] | 0.66 | 0.51 | 0.86
Lung function impairment at 180-day visit | 111 | 38 | glmnet | Training | 0.83 | 0.59 | 0.89 | 0.61 | 0.95
Lung function impairment at 180-day visit | 111 | 38 | nnet | CV | 0.65 [0.2–1] | 0.2 [-0.5–1] | 0.59 | 0.44 | 0.76
Lung function impairment at 180-day visit | 111 | 38 | nnet | Training | 0.93 | 0.83 | 0.82 | 0.79 | 1
Lung function impairment at 180-day visit | 111 | 38 | rf | CV | 0.73 [0.4–1] | 0.35 [-0.33–1] | 0.72 | 0.49 | 0.85
Lung function impairment at 180-day visit | 111 | 38 | rf | Training | 1 | 1 | 1 | 1 | 1
Lung function impairment at 180-day visit | 111 | 38 | svmRadial | CV | 0.72 [0.36–1] | 0.35 [-0.44–1] | 0.69 | 0.5 | 0.84
Lung function impairment at 180-day visit | 111 | 38 | svmRadial | Training | 0.87 | 0.71 | 0.94 | 0.71 | 0.96
Symptoms at 180-day visit | 133 | 65 | C5.0 | CV | 0.6 [0.22–0.93] | 0.2 [-0.51–0.87] | 0.57 | 0.61 | 0.58
Symptoms at 180-day visit | 133 | 65 | C5.0 | Training | 0.93 | 0.86 | 0.96 | 0.89 | 0.97
Symptoms at 180-day visit | 133 | 65 | ensemble | CV | 0.58 [0.41–0.74] | 0.16 [-0.19–0.49] | 0.6 | 0.52 | 0.63
Symptoms at 180-day visit | 133 | 65 | ensemble | Training | 0.99 | 0.98 | 1 | 0.98 | 1
Symptoms at 180-day visit | 133 | 65 | glmnet | CV | 0.56 [0.17–0.86] | 0.13 [-0.64–0.72] | 0.56 | 0.54 | 0.58
Symptoms at 180-day visit | 133 | 65 | glmnet | Training | 0.85 | 0.7 | 0.92 | 0.82 | 0.88
Symptoms at 180-day visit | 133 | 65 | nnet | CV | 0.59 [0.29–0.86] | 0.17 [-0.52–0.72] | 0.58 | 0.6 | 0.57
Symptoms at 180-day visit | 133 | 65 | nnet | Training | 1 | 1 | 1 | 1 | 1
Symptoms at 180-day visit | 133 | 65 | rf | CV | 0.56 [0.29–0.86] | 0.13 [-0.46–0.71] | 0.59 | 0.56 | 0.56
Symptoms at 180-day visit | 133 | 65 | rf | Training | 1 | 1 | 1 | 1 | 1
Symptoms at 180-day visit | 133 | 65 | svmRadial | CV | 0.54 [0.17–0.83] | 0.089 [-0.67–0.67] | 0.55 | 0.45 | 0.62
Symptoms at 180-day visit | 133 | 65 | svmRadial | Training | 0.86 | 0.73 | 0.94 | 0.85 | 0.88
  1. AUC = area under the curve; CT = computed tomography; glmnet = elastic-net regularized generalized linear models; nnet = neural networks; svmRadial = support vector machines with radial basis function kernel; rf = random forest; ensemble = model ensemble with elastic-net regularized generalized linear models
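
The accuracy, kappa, sensitivity, and specificity columns above all derive from a 2×2 confusion matrix of predicted against observed outcome. The sketch below shows those definitions, with Cohen's kappa computed from observed and chance agreement. The function name `confusion_metrics` is hypothetical. The example counts are an assumption: they were back-calculated from the rounded training-set values of the ensemble classifier for CT abnormalities (109 participants, 49 events) and are not reported in the source.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, Cohen's kappa, sensitivity and specificity from a 2x2 table.

    tp: true positives, fn: false negatives,
    tn: true negatives, fp: false positives
    """
    n = tp + fn + tn + fp
    if n == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Assumed counts consistent with the rounded table values (not from the source)
metrics = confusion_metrics(tp=42, fn=7, tn=59, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Kappa corrects accuracy for chance agreement, which is why outcomes with few events, such as CT severity score >5, can show high accuracy alongside a modest kappa.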

Appendix 1—table 6
Performance of machine learning classifiers in the acute COVID-19 severity strata.

Outcome: outcome variable at the 180-day follow-up visit; Cohort subset: acute COVID-19 severity strata of the cohort (mild–moderate: outpatient or hospitalized without oxygen; severe–critical: oxygen therapy or ICU).

Outcome | Cohort subset | Total N | Events N | Method | Data set | AUC | Sensitivity | Specificity
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | C5.0 | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | C5.0 | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | C5.0 | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | rf | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | rf | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | rf | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | svmRadial | Training | 0.93 | 0.84 | 0.87
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | svmRadial | Training | 0.9 | 0.61 | 0.95
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | svmRadial | Training | 0.96 | 0.97 | 0.7
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | nnet | Training | 0.78 | 1 | 0.57
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | nnet | Training | 0.92 | 1 | 0.85
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | nnet | Training | 0.5 | 1 | 0
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | glmnet | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | glmnet | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | glmnet | Training | 1 | 1 | 1
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | ensemble | Training | 0.98 | 0.86 | 0.98
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | ensemble | Training | 0.98 | 0.61 | 1
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | ensemble | Training | 1 | 1 | 0.95
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | C5.0 | Training | 0.7 | 0.43 | 0.98
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | C5.0 | Training | 0.57 | 0.17 | 0.98
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | C5.0 | Training | 0.75 | 0.53 | 0.97
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | rf | Training | 1 | 1 | 1
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | rf | Training | 1 | 1 | 1
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | rf | Training | 1 | 1 | 1
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | svmRadial | Training | 0.99 | 0.57 | 1
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | svmRadial | Training | 0.98 | 0.17 | 1
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | svmRadial | Training | 1 | 0.73 | 1
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | nnet | Training | 1 | 0.95 | 1
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | nnet | Training | 1 | 0.83 | 1
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | nnet | Training | 1 | 1 | 1
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | glmnet | Training | 0.97 | 0.71 | 1
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | glmnet | Training | 0.94 | 0.33 | 1
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | glmnet | Training | 1 | 0.87 | 1
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | ensemble | Training | 0.65 | 0.48 | 0.99
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | ensemble | Training | 0.38 | 0.17 | 0.98
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | ensemble | Training | 0.74 | 0.6 | 1
Symptoms at 180-day visit | Whole cohort | 133 | 65 | C5.0 | Training | 0.96 | 0.89 | 0.97
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | C5.0 | Training | 0.97 | 0.9 | 1
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | C5.0 | Training | 0.96 | 0.89 | 0.94
Symptoms at 180-day visit | Whole cohort | 133 | 65 | rf | Training | 1 | 1 | 1
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | rf | Training | 1 | 1 | 1
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | rf | Training | 1 | 1 | 1
Symptoms at 180-day visit | Whole cohort | 133 | 65 | svmRadial | Training | 0.94 | 0.85 | 0.88
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | svmRadial | Training | 0.93 | 0.77 | 0.85
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | svmRadial | Training | 0.95 | 0.91 | 0.91
Symptoms at 180-day visit | Whole cohort | 133 | 65 | nnet | Training | 1 | 1 | 1
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | nnet | Training | 1 | 1 | 1
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | nnet | Training | 1 | 1 | 1
Symptoms at 180-day visit | Whole cohort | 133 | 65 | glmnet | Training | 0.92 | 0.82 | 0.88
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | glmnet | Training | 0.91 | 0.73 | 0.88
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | glmnet | Training | 0.92 | 0.89 | 0.88
Symptoms at 180-day visit | Whole cohort | 133 | 65 | ensemble | Training | 1 | 0.98 | 1
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | ensemble | Training | 1 | 0.97 | 1
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | ensemble | Training | 1 | 1 | 1
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | C5.0 | Training | 0.85 | 0.79 | 0.9
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | C5.0 | Training | 0.81 | 0.71 | 0.9
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | C5.0 | Training | 0.87 | 0.83 | 0.91
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | rf | Training | 1 | 1 | 1
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | rf | Training | 1 | 1 | 1
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | rf | Training | 1 | 1 | 1
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | svmRadial | Training | 0.94 | 0.71 | 0.96
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | svmRadial | Training | 0.88 | 0.5 | 0.98
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | svmRadial | Training | 0.98 | 0.83 | 0.94
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | nnet | Training | 0.82 | 0.79 | 1
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | nnet | Training | 0.7 | 0.64 | 1
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | nnet | Training | 0.89 | 0.88 | 1
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | glmnet | Training | 0.89 | 0.61 | 0.95
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | glmnet | Training | 0.84 | 0.29 | 0.95
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | glmnet | Training | 0.91 | 0.79 | 0.94
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | ensemble | Training | 0.98 | 0.79 | 0.95
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | ensemble | Training | 0.97 | 0.71 | 0.95
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | ensemble | Training | 0.98 | 0.83 | 0.94
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | C5.0 | CV | 0.78 | 0.69 | 0.74
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | C5.0 | CV | 0.69 | 0.43 | 0.8
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | C5.0 | CV | 0.78 | 0.85 | 0.62
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | rf | CV | 0.78 | 0.72 | 0.74
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | rf | CV | 0.76 | 0.43 | 0.88
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | rf | CV | 0.71 | 0.88 | 0.47
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | svmRadial | CV | 0.8 | 0.78 | 0.73
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | svmRadial | CV | 0.75 | 0.56 | 0.9
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | svmRadial | CV | 0.76 | 0.92 | 0.4
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | nnet | CV | 0.69 | 0.71 | 0.64
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | nnet | CV | 0.67 | 0.59 | 0.77
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | nnet | CV | 0.62 | 0.78 | 0.39
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | glmnet | CV | 0.79 | 0.71 | 0.72
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | glmnet | CV | 0.78 | 0.66 | 0.78
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | glmnet | CV | 0.75 | 0.75 | 0.6
CT abnormalities at 180-day visit | Whole cohort | 109 | 49 | ensemble | CV | 0.81 | 0.75 | 0.8
CT abnormalities at 180-day visit | Mild–moderate COVID-19 | 58 | 18 | ensemble | CV | 0.76 | 0.55 | 0.92
CT abnormalities at 180-day visit | Severe–critical COVID-19 | 51 | 31 | ensemble | CV | 0.79 | 0.87 | 0.55
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | C5.0 | CV | 0.7 | 0.39 | 0.98
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | C5.0 | CV | 0.55 | 0.13 | 0.98
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | C5.0 | CV | 0.76 | 0.49 | 0.97
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | rf | CV | 0.73 | 0.4 | 0.95
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | rf | CV | 0.58 | 0.033 | 0.96
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | rf | CV | 0.76 | 0.55 | 0.93
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | svmRadial | CV | 0.75 | 0.48 | 0.97
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | svmRadial | CV | 0.59 | 0.13 | 0.97
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | svmRadial | CV | 0.79 | 0.61 | 0.97
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | nnet | CV | 0.72 | 0.47 | 0.87
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | nnet | CV | 0.57 | 0.17 | 0.89
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | nnet | CV | 0.76 | 0.59 | 0.84
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | glmnet | CV | 0.76 | 0.41 | 0.94
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | glmnet | CV | 0.63 | 0.17 | 0.97
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | glmnet | CV | 0.78 | 0.51 | 0.89
CT severity score >5 at 180-day visit | Whole cohort | 109 | 21 | ensemble | CV | 0.75 | 0.45 | 0.98
CT severity score >5 at 180-day visit | Mild–moderate COVID-19 | 58 | 6 | ensemble | CV | 0.64 | 0.17 | 0.98
CT severity score >5 at 180-day visit | Severe–critical COVID-19 | 51 | 15 | ensemble | CV | 0.78 | 0.57 | 0.99
Symptoms at 180-day visit | Whole cohort | 133 | 65 | C5.0 | CV | 0.57 | 0.61 | 0.58
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | C5.0 | CV | 0.58 | 0.62 | 0.56
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | C5.0 | CV | 0.55 | 0.6 | 0.6
Symptoms at 180-day visit | Whole cohort | 133 | 65 | rf | CV | 0.59 | 0.56 | 0.56
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | rf | CV | 0.6 | 0.61 | 0.55
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | rf | CV | 0.57 | 0.52 | 0.58
Symptoms at 180-day visit | Whole cohort | 133 | 65 | svmRadial | CV | 0.55 | 0.45 | 0.62
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | svmRadial | CV | 0.54 | 0.48 | 0.59
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | svmRadial | CV | 0.56 | 0.43 | 0.66
Symptoms at 180-day visit | Whole cohort | 133 | 65 | nnet | CV | 0.58 | 0.6 | 0.57
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | nnet | CV | 0.58 | 0.63 | 0.54
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | nnet | CV | 0.58 | 0.58 | 0.6
Symptoms at 180-day visit | Whole cohort | 133 | 65 | glmnet | CV | 0.56 | 0.54 | 0.58
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | glmnet | CV | 0.56 | 0.57 | 0.6
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | glmnet | CV | 0.55 | 0.51 | 0.56
Symptoms at 180-day visit | Whole cohort | 133 | 65 | ensemble | CV | 0.6 | 0.52 | 0.63
Symptoms at 180-day visit | Mild–moderate COVID-19 | 64 | 30 | ensemble | CV | 0.61 | 0.57 | 0.63
Symptoms at 180-day visit | Severe–critical COVID-19 | 69 | 35 | ensemble | CV | 0.6 | 0.49 | 0.63
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | C5.0 | CV | 0.7 | 0.54 | 0.84
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | C5.0 | CV | 0.61 | 0.37 | 0.86
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | C5.0 | CV | 0.75 | 0.63 | 0.81
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | rf | CV | 0.72 | 0.49 | 0.85
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | rf | CV | 0.58 | 0.26 | 0.9
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | rf | CV | 0.79 | 0.62 | 0.79
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | svmRadial | CV | 0.69 | 0.5 | 0.84
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | svmRadial | CV | 0.56 | 0.29 | 0.88
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | svmRadial | CV | 0.75 | 0.62 | 0.79
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | nnet | CV | 0.59 | 0.44 | 0.76
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | nnet | CV | 0.47 | 0.29 | 0.83
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | nnet | CV | 0.62 | 0.52 | 0.67
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | glmnet | CV | 0.66 | 0.51 | 0.86
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | glmnet | CV | 0.55 | 0.21 | 0.88
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | glmnet | CV | 0.73 | 0.68 | 0.83
Lung function impairment at 180-day visit | Whole cohort | 111 | 38 | ensemble | CV | 0.72 | 0.48 | 0.89
Lung function impairment at 180-day visit | Mild–moderate COVID-19 | 55 | 14 | ensemble | CV | 0.59 | 0.26 | 0.92
Lung function impairment at 180-day visit | Severe–critical COVID-19 | 56 | 24 | ensemble | CV | 0.78 | 0.61 | 0.85
  1. AUC = area under the curve; CT = computed tomography; ICU = intensive care unit; glmnet = elastic-net regularized generalized linear models; nnet = neural network; svmRadial = support vector machines with radial basis function kernel; rf = random forest; ensemble = model ensemble with elastic-net regularized generalized linear models
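
The AUC column above can be read as the probability that a randomly chosen participant with the outcome receives a higher predicted score than a randomly chosen participant without it, counting ties as one half. The following is a minimal sketch of that pairwise definition, not the routine used in the study; the function name `pairwise_auc` and all scores are hypothetical. It also illustrates why a classifier that assigns every participant the same score shows sensitivity 1, specificity 0, and AUC 0.5, the pattern seen in one nnet training row of the table.

```python
def pairwise_auc(scores_with_outcome, scores_without_outcome):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    A pair counts 1 if the event case scores higher, 0.5 if tied, 0 otherwise.
    This is quadratic in the number of cases, which is fine for small cohorts.
    """
    if not scores_with_outcome or not scores_without_outcome:
        raise ValueError("both groups must contain at least one score")
    total = 0.0
    for pos in scores_with_outcome:
        for neg in scores_without_outcome:
            if pos > neg:
                total += 1.0
            elif pos == neg:
                total += 0.5
    return total / (len(scores_with_outcome) * len(scores_without_outcome))


# Hypothetical predicted probabilities
print(pairwise_auc([0.8, 0.4], [0.6, 0.2]))   # three of four pairs ranked correctly
print(pairwise_auc([0.7, 0.7], [0.7, 0.7]))   # constant score: no discrimination
```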

Data availability

The complete R analysis pipeline and the anonymized study data in the form of stratified study variables are available in a public GitHub repository: https://github.com/PiotrTymoszuk/CovILD_6_Months (copy archived at swh:1:rev:df521ede1d284e074a0484d3e4d0ce71097d00c3). The R code for the key tools used for univariate modeling and model quality control (Figures 4 and 5, https://github.com/PiotrTymoszuk/lmqc; copy archived at swh:1:rev:a020119d8f23b60901115c5c2ce6f6c71998ed31), cluster analysis and its quality control (Figures 6–7, https://github.com/PiotrTymoszuk/clustering-tools-2; copy archived at swh:1:rev:64141197ca28838a8978dce9093443537157d79f), and the risk assessment application (https://github.com/PiotrTymoszuk/COVILD-recovery-assessment-app; copy archived at swh:1:rev:95f02215f4c13425d3b76f6a13b7862a53279ab9) is available on GitHub. Source data for Figures 2–10 have been included as Source data 1.

Decision letter

  1. Joshua T Schiffer
    Reviewing Editor; Fred Hutchinson Cancer Research Center, United States
  2. Jos W Van der Meer
    Senior Editor; Radboud University Medical Centre, Netherlands
  3. Guang-Shing Cheng
    Reviewer; Fred Hutchinson Cancer Research Center, United States
  4. Joshua T Schiffer
    Reviewer; Fred Hutchinson Cancer Research Center, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Investigating phenotypes of pulmonary COVID-19 recovery - a longitudinal observational prospective multicenter trial" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, including Joshua T Schiffer as the Reviewing Editor and Reviewer #2, and the evaluation has been overseen by Jos Van der Meer as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Guang-Shing Cheng (Reviewer #1).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Please describe potential methods to operationalize the machine learning approach.

2) Please take care to precisely define all endpoints throughout the study as per the requests of reviewer 3.

3) Please provide rationale for using radiologic endpoints rather than clinical and functional endpoints throughout the study and consider further analyses using clinical and functional endpoints.

4) Please clearly discriminate the radiologic endpoints. If severe abnormalities are a subset of any abnormality, then please explicitly state this. It is necessary to include the number of study participants who meet each endpoint to provide complete clarity.

5) More precisely describe the meaning of low, medium and high risk in figure 6.

6) Please be sure that exposure and outcome variables are independent of one another or remove the exposure variable from the analysis.

Reviewer #1 (Recommendations for the authors):

Well done study, well written, and of great interest to me personally and scientifically.

1. Would you be able to apply your machine learning algorithms to an external validation cohort from the same time frame? Would lend additional support to your model.

2. Additional follow-up assessments at 1 year would be informative, but perhaps that data is forthcoming in another manuscript.

3. How would you operationalize ML algorithms for clinical use?

Reviewer #2 (Recommendations for the authors):

1) Please justify selection of radiologic endpoints as primary endpoints rather than functional and symptomatic endpoints.

2) Please define endpoints specifically and explicitly state the number of patients who fall within each endpoint, taking great care to discriminate whether different groups overlap.

https://doi.org/10.7554/eLife.72500.sa1

Author response

Essential revisions:

1) Please describe potential methods to operationalize the machine learning approach.

We appreciate this important point. The applicability of a machine learning algorithm to classify real-life data is the central challenge and the greatest strength of the approach. Unfortunately, our longitudinal one-cohort study does not provide us with a possibility of external validation of the clustering and classification procedures presented in the manuscript. For this reason, we hesitated to discuss extensively the performance features and the reproducibility of the presented machine learning algorithms in the initial manuscript.

In the revised manuscript, we discuss the potential of machine-learning-assisted analysis of medical record, laboratory and patient self-reported data for early prediction of COVID-19 severity [1–3] as well as for prediction and phenotyping of complicated recovery [3–6]. In addition, to gain more confidence in the robustness of the clustering and classification procedures shown in the manuscript, we consistently applied 20-fold cross-validation in all those analyses instead of the repeated holdout strategy used previously (feature cluster validation: Figure 6—figure supplement 1, participant cluster validation: Figure 7—figure supplement 1, validation of machine learning models: Figure 9 and Appendix 1 – table 5).
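As an illustration of the validation scheme named above, the following is a minimal Python sketch of 20-fold cross-validation. It is not the authors' R pipeline: the function names, the seed and the `fit_and_score` callback are hypothetical, and the sample size of 109 is borrowed from the CT outcome counts reported later in this response purely as an example.

```python
import random


def k_fold_indices(n_samples, n_folds=20, seed=0):
    """Split range(n_samples) into n_folds disjoint test folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(n_samples, fit_and_score, n_folds=20, seed=0):
    """Call fit_and_score(train_idx, test_idx) once per fold; return the mean score."""
    scores = []
    for test_idx in k_fold_indices(n_samples, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # With 109 observations and 20 folds, each fold holds out 5 or 6 observations,
    # so every model is trained on 103 or 104 of them.
    folds = k_fold_indices(109)
    print(sorted({len(fold) for fold in folds}))
```

Each observation is held out exactly once, so the training set in every fold stays close to the full cohort size. A repeated holdout scheme, by contrast, sets aside a larger share of the data in each repetition.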

Finally, we developed an online, open-source pulmonary assessment tool based on the R Shiny platform (https://im2-ibk.shinyapps.io/CovILD/, code available from https://github.com/PiotrTymoszuk/COVILD-recovery-assessment-app). The tool assigns user-provided patient records to the Risk Clusters described in the manuscript. In addition, it enables prediction of any lung CT abnormalities, moderate-to-severe CT abnormalities or functional lung impairment at the 180-day follow-up with the machine learning algorithms presented in the manuscript, which were trained and cross-validated in the CovILD cohort. We believe that such a tool can increase the visibility of our work, foster collaboration, and give us an opportunity to validate our clustering approach in the future.

2) Please take care to precisely define all endpoints throughout the study as per the requests of reviewer 3.

We thank the reviewers for pointing out this lack of clarity. In the revised manuscript, we precisely define the primary endpoint (any radiological lung findings at the 6-month follow-up) and the secondary endpoints (radiological lung abnormalities with CT score > 5, lung function impairment and persistent symptoms at the 6-month follow-up) of the study and analysis. See the Introduction and Methods/Study design for the description in the text, and Table 3 for the numbers and percentages of the study participants reaching the endpoints. The overlap between the subjects reaching the radiological, functional and clinical endpoints is presented in Figure 3—figure supplement 1 and Figure 3—figure supplement 2.
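The endpoint definitions above can be written down as a small Python sketch. The record layout and field names are hypothetical, and treating a CT severity score above 0 as "any radiological lung finding" is an assumption made for illustration. Only the CT score > 5 threshold is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit180:
    """Hypothetical 180-day follow-up record; None marks a missing assessment."""
    ct_severity_score: Optional[int]
    lung_function_impaired: Optional[bool]
    any_symptom: Optional[bool]


def endpoints(visit):
    """Derive the primary and secondary endpoint flags for one participant."""
    ct = visit.ct_severity_score
    return {
        # Primary endpoint (assumption: a score above 0 means any CT finding).
        "any_ct_abnormality": None if ct is None else ct > 0,
        # Secondary endpoints; the CT score > 5 cut-off is stated in the text.
        "ct_score_above_5": None if ct is None else ct > 5,
        "lung_function_impairment": visit.lung_function_impaired,
        "persistent_symptoms": visit.any_symptom,
    }


if __name__ == "__main__":
    print(endpoints(Visit180(ct_severity_score=7,
                             lung_function_impaired=False,
                             any_symptom=True)))
```

Written this way, the nesting is explicit: every participant with a CT score above 5 also meets the primary endpoint, so the two radiological endpoints overlap by construction.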

3) Please provide rationale for using radiologic endpoints rather than clinical and functional endpoints throughout the study and consider further analyses using clinical and functional endpoints.

This is an important issue, which we clarify in the revised manuscript. First, the study was established after the emergence of the first COVID-19 cases in Europe in March 2020, and at that time hardly any information was available concerning the pulmonary outcome of COVID-19. One major concern was that, comparable to SARS-CoV-1 infection, many patients might develop long-term persistent structural lung abnormalities in general and interstitial lung disease (ILD) in particular following acute COVID-19 pneumonia [7–9]. Thus, we implemented computed tomography as the primary assessment tool, as it is the best diagnostic tool to assess early ILD [10,11].

Another goal of the CovILD study was to provide evidence for the development of structured follow-up algorithms for COVID-19 patients. As medical resources are limited, especially during a pandemic, we aimed to identify surrogate parameters, which enable us to identify patients at risk for structural pulmonary damage and the need for close-meshed functional and radiological follow-up. In this context, clinical symptoms, which are typically multifactorial and do not necessarily aid early identification of ILD, were used as a secondary outcome parameter.

Still, we agree that clinical and functional endpoints are of great interest for the scientific, clinical and patient community. For this reason, we additionally included the long-term symptom persistence and lung function impairment outcome variables in the univariate (Figure 5, Appendix 1 – table 2) and machine learning multi-parameter risk modeling (Figures 9–10, Appendix 1 – table 5 and Appendix 1 – table 6). We also compare the frequency of those outcome variables in the Risk Clusters of the study participants (Figure 8).

4) Please clearly discriminate the radiologic endpoints. If severe abnormalities are a subset of any abnormality, then please explicitly state this. It is necessary to include the number of study participants who meet each endpoint to provide complete clarity.

We now clarify this important point in the Introduction, Methods and Results. N numbers of the participants meeting the endpoints at subsequent follow-up visits are presented in Table 2.

The individuals with CT severity score > 5 were a subset of the participants with any CT abnormality. The same was true for the GGO-positive patients. We agree with the Editor and Reviewer 2 that the overlap between the radiological outcomes obscures the message of the clustering and modeling results. To overcome this, we removed the GGO outcome variable from the kinetic (Figure 3) and risk modeling (Figure 4), machine learning classification (Figures 9 and 10) and comparisons of CT abnormality frequency between the patient clusters (Figure 8). Please note that the great majority of the CT findings present in the study cohort were classified as GGOs anyway, and a detailed characterization of CT abnormalities will be addressed in another report by our study team (Luger A et al., in revision).

In the revised manuscript, we differentiate between mild (CT severity score ≤ 5) and moderate-to-severe radiological abnormalities (CT severity score > 5) in feature (Figure 6) and participant clustering (Figure 8). Furthermore, to guarantee a consistent distinction between explanatory and outcome variables, both the feature (Figure 6) and participant clusters (Figure 7) are defined exclusively with non-CT variables. To investigate the association of mild and moderate-to-severe CT abnormalities with the non-CT variables (Figure 6), the CT features are assigned to the non-CT clusters by a k-NN-based label propagation algorithm, i.e., a semi-supervised procedure [12,13] also employed in our recent paper [6].
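The label propagation step described above can be illustrated with a minimal Python sketch. It is a generic k-nearest-neighbour majority vote, not the authors' R implementation: the simple matching distance, k = 3, and the toy binary feature vectors and names are all assumptions chosen for the example.

```python
from collections import Counter


def matching_distance(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_propagate(labeled, unlabeled, k=3):
    """Assign each unlabeled feature to a cluster by k-NN majority vote.

    labeled:   {feature_name: (vector, cluster_label)}, clusters already defined
    unlabeled: {feature_name: vector}, features held out from cluster definition
    """
    assigned = {}
    for name, vec in unlabeled.items():
        # sorted() is stable, so equally distant items keep their input order.
        neighbours = sorted(
            labeled.values(), key=lambda item: matching_distance(vec, item[0])
        )[:k]
        votes = Counter(cluster for _, cluster in neighbours)
        # On a tied vote, the cluster of the nearest tied neighbour wins.
        assigned[name] = votes.most_common(1)[0][0]
    return assigned


if __name__ == "__main__":
    # Toy data: each vector marks presence/absence of a feature in six participants.
    non_ct = {
        "feature_1": ([1, 1, 0, 0, 1, 0], "cluster_1"),
        "feature_2": ([1, 1, 0, 0, 0, 0], "cluster_1"),
        "feature_3": ([1, 1, 1, 0, 1, 0], "cluster_1"),
        "feature_4": ([0, 0, 1, 1, 0, 1], "cluster_2"),
        "feature_5": ([0, 0, 0, 1, 0, 1], "cluster_2"),
        "feature_6": ([0, 0, 1, 1, 1, 1], "cluster_2"),
    }
    ct = {
        "ct_mild": [0, 0, 1, 1, 0, 0],
        "ct_moderate_to_severe": [1, 1, 0, 0, 1, 1],
    }
    print(knn_propagate(non_ct, ct))
```

Because the CT features only receive labels and never take part in defining the clusters, the cluster structure stays independent of the CT outcomes.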

5) More precisely describe the meaning of low, medium and high risk in figure 6.

The nomenclature (low-, intermediate- and high-risk) pertains to the frequency of long-term radiological lung abnormalities in the study participant clusters (Figure 8A). We now describe it more clearly in the section ‘Results/Risk stratification for perturbed pulmonary recovery by unsupervised clustering’ before describing other cluster features.
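The naming rule can be made concrete with a short Python sketch that ranks three clusters by their frequency of CT abnormalities and labels them accordingly. The function name, cluster identifiers and toy data are hypothetical and do not reproduce study results.

```python
def name_risk_clusters(cluster_ids, has_ct_abnormality):
    """Label three clusters low/intermediate/high-risk by CT abnormality frequency.

    cluster_ids:        cluster assignment of each participant
    has_ct_abnormality: matching sequence of 0/1 (or bool) outcome flags
    Returns {cluster_id: (risk_label, frequency)}.
    """
    totals, events = {}, {}
    for cid, flag in zip(cluster_ids, has_ct_abnormality):
        totals[cid] = totals.get(cid, 0) + 1
        events[cid] = events.get(cid, 0) + int(flag)
    freq = {cid: events[cid] / totals[cid] for cid in totals}

    names = ["low-risk", "intermediate-risk", "high-risk"]
    if len(freq) != len(names):
        raise ValueError("this sketch expects exactly three clusters")
    ranked = sorted(freq, key=freq.get)  # ascending outcome frequency
    return {cid: (names[rank], freq[cid]) for rank, cid in enumerate(ranked)}


if __name__ == "__main__":
    ids = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
    flags = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
    print(name_risk_clusters(ids, flags))
```

The labels therefore describe the clusters after the fact. The clustering itself never sees the CT outcome.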

6) Please be sure that exposure and outcome variables are independent of one another or remove the exposure variable from the analysis.

This is an important issue. We agree with the argument of the Editor and Reviewer 3 that the inclusion of the overlapping CT responses in risk modeling, clustering and machine learning classification obfuscates the conclusions. For this reason, and for the reasons described in response to Essential revisions comment 4, we removed the GGO variable from the revised analysis pipeline and now differentiate between mild (CT severity score ≤ 5) and moderate-to-severe (CT severity score > 5) radiological lung abnormalities in the modeling, clustering and machine learning classification. In addition, we define symptom and participant clusters solely with the non-CT parameters. See response to Essential revisions comment 4 for details.

Reviewer #1 (Recommendations for the authors):

Well done study, well written, and of great interest to me personally and scientifically.

1. Would you be able to apply your machine learning algorithms to an external validation cohort from the same time frame? Would lend additional support to your model.

This is an extremely important point. As already stressed in the initial version of the manuscript, clustering and classification algorithms perform best with large training cohorts, and external validation is a crucial step that would require an additional cohort with a comparable parameter record. We searched for external collaborators, but unfortunately, none of the contacted academic pulmonology centers in Europe could provide us with a comparably rich set of demographic, clinical, biochemical, functional and imaging data collected at analogous time points. Furthermore, large cross-sectional observation cohorts, like the Wuhan Study [14], do not offer open data access.

In the revised manuscript, we tried to improve the robustness of the classifiers and to partially compensate for the lack of external validation:

1. We do not restrict the analysis to the subset of the CovILD study with the complete set of all variables. Instead, the non-missingness criterion is applied to each outcome variable separately (any CT abnormalities: n = 109, moderate-to-severe abnormalities: n = 109, lung function impairment: n = 111, persistent symptoms: n = 133). This resulted in a greater number of observations used for training of the machine learning algorithms.

2. We altered the internal validation strategy. Instead of the repeated holdout approach applied to the machine learning classification, which strongly limits the size of the training data set, we switched to 20-fold cross-validation both for the cluster algorithms (as described by Lange et al. [15]⁠, Figure 6—figure supplement 1BD and Figure 7—figure supplement 1BF) and the machine learning models (Figure 9, Appendix 1 – table 5).

3. The algorithm for clustering of the study participants was changed to a more stable one, as determined by a 20-fold cross-validation stability test [15]. Instead of the k-means procedure [16] applied to the clinical non-CT parameters in the initial manuscript version, a combined self-organizing map (SOM) – hierarchical clustering algorithm is used (Supplementary Figure S6) [17,18]. Importantly, both methods classified the study participants in a comparable way into clusters differing significantly in frequency of pulmonary CT abnormalities (Figure 8). In addition, the cluster assignment was shown to be a significant correlate of persistent radiological lung abnormalities independently of the acute COVID-19 severity (Figure 8—figure supplement 1B), as shown in the initial version of the manuscript.

4. The set of machine learning models was optimized and now includes multiple tools provided by the R package caret [19], which represent various families of machine learning algorithms (rule tree classifier: C5.0 [20], bagged tree classifier: Random Forests [21], support vector machines (SVM) with radial kernel [22], shallow neural networks: nnet [23] and elastic net: glmnet [24], Appendix 1 – table 4) and provided more consistent results than the simple kNN and naive Bayes algorithms presented before. Finally, ensemble models, each a linear combination of the C5.0, Random Forests, SVM, nnet and glmnet classifiers, were constructed with the elastic net algorithm and the caretEnsemble package (Figure 9—figure supplement 2) [25]. Notably, the respective ensemble models showed a superior accuracy at predicting any CT abnormalities and persistent symptoms in the cross-validation setting (Figure 9, Appendix 1 – table 5).

5. We provide an open-source online R Shiny application (https://im2-ibk.shinyapps.io/CovILD/) that implements k-NN-based assignment (supervised clustering) [12,13,26] of user-provided patient records to the low-, intermediate- and high-risk clusters (Figure 7) and enables pulmonary outcome prediction by machine learning. We expect that such a tool may foster collaboration and facilitate verification of the manuscript’s findings intramurally and by all interested centers.
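The per-outcome non-missingness criterion from point 1 of the list above can be sketched in Python as follows. The record layout and variable names are hypothetical, and requiring complete predictors is an assumption of this sketch. The point is only that filtering per outcome keeps more observations than demanding that every outcome be present at once.

```python
def complete_cases(records, outcome, predictors):
    """Keep records in which the given outcome and all predictors are present."""
    needed = [outcome, *predictors]
    return [r for r in records if all(r.get(key) is not None for key in needed)]


if __name__ == "__main__":
    # Toy records; None marks a missing value.
    records = [
        {"age": 50, "crp": 1.2, "ct_abnormal": True, "lf_impaired": None},
        {"age": 61, "crp": 0.4, "ct_abnormal": None, "lf_impaired": False},
        {"age": 47, "crp": None, "ct_abnormal": False, "lf_impaired": True},
        {"age": 70, "crp": 3.1, "ct_abnormal": True, "lf_impaired": True},
    ]
    predictors = ["age", "crp"]

    # One training set per outcome, each filtered on that outcome alone.
    for outcome in ("ct_abnormal", "lf_impaired"):
        print(outcome, len(complete_cases(records, outcome, predictors)))

    # Requiring all outcomes at once leaves fewer observations.
    joint = complete_cases(records, "ct_abnormal", predictors + ["lf_impaired"])
    print("all outcomes", len(joint))
```

In the toy data, each outcome retains two records, whereas the joint criterion retains one. The same effect explains why the per-outcome sample sizes quoted in point 1 differ from one another.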

2. Additional follow-up assessments at 1 year would be informative, but perhaps that data is forthcoming in another manuscript.

Unfortunately, we are not able to disclose the one-year follow-up data in the revised manuscript, as the radiological and clinical findings of the CovILD cohort are included in a manuscript currently under revision in Radiology (Luger A et al.) and we are still working on the analysis of the clinical, cardiopulmonary and mental recovery data for the one-year follow-up time point.

3. How would you operationalize ML algorithms for clinical use?

We appreciate this point. The drawback of the clustering and classification algorithms presented in the manuscript is the large set of input variables (50 non-CT features), which precludes manual one-by-one risk computation in clinical routine. As described in response to the public review, an open-source risk cluster classification and machine learning pulmonary outcome prediction tool now accompanies the revised manuscript (https://im2-ibk.shinyapps.io/CovILD/). The tool accepts data sheet (.xlsx) input, enabling analysis of multiple patient records at a time.

Reviewer #2 (Recommendations for the authors):

1) Please justify selection of radiologic endpoints as primary endpoints rather than functional and symptomatic endpoints.

Please see response to the public review for the motivation of the focus on radiological/structural lung recovery.

2) Please define endpoints specifically and explicitly state the number of patients who fall within each endpoint, taking great care to discriminate whether different groups overlap.

Done as requested, please see response to the public review and Table 3 of the revised manuscript.

References:

1. Gutmann C, Takov K, Burnap SA, et al. SARS-CoV-2 RNAemia and proteomic trajectories inform prognostication in COVID-19 patients admitted to intensive care. Nat Commun 2021;12. doi:10.1038/S41467-021-23494-1

2. Benito-León J, Castillo MD Del, Estirado A, et al. Using Unsupervised Machine Learning to Identify Age- and Sex-Independent Severity Subgroups Among Patients with COVID-19: Observational Longitudinal Study. J Med Internet Res 2021;23. doi:10.2196/25988

3. Demichev V, Tober-Lau P, Lemke O, et al. A time-resolved proteomic and prognostic map of COVID-19. Cell Syst 2021;12:780. doi:10.1016/J.CELS.2021.05.005

4. Estiri H, Strasser ZH, Brat GA, et al. Evolving phenotypes of non-hospitalized patients that indicate long COVID. BMC Med 2021;19. doi:10.1186/S12916-021-02115-0

5. Sudre CH, Murray B, Varsavsky T, et al. Attributes and predictors of long COVID. Nat Med 2021;27. doi:10.1038/s41591-021-01292-y

6. Sahanic S, Tymoszuk P, Ausserhofer D, et al. Phenotyping of acute and persistent COVID-19 features in the outpatient setting: exploratory analysis of an international cross-sectional online survey. Clin Infect Dis Published Online First: 26 November 2021. doi:10.1093/CID/CIAB978

7. Hui DS, Wong KT, Ko FW, et al. The 1-Year Impact of Severe Acute Respiratory Syndrome on Pulmonary Function, Exercise Capacity, and Quality of Life in a Cohort of Survivors. Chest 2005;128:2247–61. doi:10.1378/CHEST.128.4.2247

8. Ng CK, Chan JWM, Kwan TL, et al. Six month radiological and physiological outcomes in severe acute respiratory syndrome (SARS) survivors. Thorax 2004;59:889–91. doi:10.1136/THX.2004.023762

9. Raghu G, Wilson KC. COVID-19 interstitial pneumonia: monitoring the clinical course in survivors. Lancet Respir Med 2020;8:839–42. doi:10.1016/S2213-2600(20)30349-0

10. Suliman YA, Dobrota R, Huscher D, et al. Pulmonary function tests: High rate of false-negative results in the early detection and screening of scleroderma-related interstitial lung disease. Arthritis Rheumatol 2015;67:3256–61. doi:10.1002/ART.39405/ABSTRACT

11. Hatabu H, Hunninghake GM, Richeldi L, et al. Interstitial lung abnormalities detected incidentally on CT: a Position Paper from the Fleischner Society. Lancet Respir Med 2020;8:726. doi:10.1016/S2213-2600(20)30168-5

12. Leng M, Wang J, Cheng J, et al. Adaptive semi-supervised clustering algorithm with label propagation. J Softw Eng 2014;8:14–22. doi:10.3923/JSE.2014.14.22

13. Lelis L, Sander J. Semi-supervised density-based clustering. Proc – IEEE Int Conf Data Mining, ICDM 2009:842–7. doi:10.1109/ICDM.2009.143

14. Huang C, Huang L, Wang Y, et al. 6-month consequences of COVID-19 in patients discharged from hospital: a cohort study. Lancet 2021;397:220–32. doi:10.1016/S0140-6736(20)32656-8

15. Lange T, Roth V, Braun ML, et al. Stability-Based Validation of Clustering Solutions. Neural Comput 2004;16:1299–323. doi:10.1162/089976604773717621

16. Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. Appl Stat 1979;28:100. doi:10.2307/2346830

17. Kohonen T. Self-Organizing Maps. Berlin, Heidelberg: Springer Berlin Heidelberg 1995. doi:10.1007/978-3-642-97610-0

18. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Networks 2000;11:586–600. doi:10.1109/72.846731

19. Kuhn M. Building predictive models in R using the caret package. J Stat Softw 2008;28:1–26. doi:10.18637/jss.v028.i05

20. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 1993. doi:10.5555/152181

21. Breiman L. Random forests. Mach Learn 2001;45:5–32. doi:10.1023/A:1010933404324

22. Weston J, Watkins C. Multi-Class Support Vector Machines. 1998.

23. Ripley BD. Pattern recognition and neural networks. Cambridge University Press 2014. doi:10.1017/CBO9780511812651

24. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1–22. doi:10.18637/jss.v033.i01

25. Deane-Mayer ZA, Knowles JE. Ensembles of Caret Models [R package caretEnsemble version 2.0.1]. 2019. https://cran.r-project.org/package=caretEnsemble (accessed 13 Dec 2021).

26. Glennan T, Leckie C, Erfani SM. Improved Classification of Known and Unknown Network Traffic Flows Using Semi-supervised Machine Learning. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 2016;9723:493–501. doi:10.1007/978-3-319-40367-0_33

27. Sonnweber T, Sahanic S, Pizzini A, et al. Cardiopulmonary recovery after COVID-19 – an observational prospective multi-center trial. Eur Respir J Published Online First: 10 December 2020. doi:10.1183/13993003.03481-2020

https://doi.org/10.7554/eLife.72500.sa2

Article and author information

Author details

  1. Thomas Sonnweber

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review and editing
    Contributed equally with
    Piotr Tymoszuk
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5080-386X
  2. Piotr Tymoszuk

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Present address
    Data Analytics As a Service Tirol, Innsbruck, Austria
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Writing – original draft, Writing – review and editing
    Contributed equally with
    Thomas Sonnweber
    Competing interests
    Owns his own business, Data Analytics as a Service Tirol, for which he performs freelance data science work. Has also received an honorarium for the study data management, curation and analysis and minor manuscript work. The author has no other competing interests to declare
    ORCID: 0000-0002-0398-6034
  3. Sabina Sahanic

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Investigation, Methodology, Project administration, Resources
    Competing interests
    No competing interests declared
  4. Anna Boehm

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  5. Alex Pizzini

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  6. Anna Luger

    Department of Radiology, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0445-8372
  7. Christoph Schwabl

    Department of Radiology, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration, Resources
    Competing interests
    No competing interests declared
  8. Manfred Nairz

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration, Resources
    Competing interests
    No competing interests declared
  9. Philipp Grubwieser

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Methodology, Resources
    Competing interests
    No competing interests declared
  10. Katharina Kurz

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  11. Sabine Koppelstätter

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  12. Magdalena Aichner

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  13. Bernhard Puchner

    The Karl Landsteiner Institute, Muenster, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  14. Alexander Egger

    Central Institute of Medical and Chemical Laboratory Diagnostics, University Hospital Innsbruck, Innsbruck, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  15. Gregor Hoermann

    1. Central Institute of Medical and Chemical Laboratory Diagnostics, University Hospital Innsbruck, Innsbruck, Austria
    2. Munich Leukemia Laboratory, Munich, Germany
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  16. Ewald Wöll

    Department of Internal Medicine, St. Vinzenz Hospital, Zams, Austria
    Contribution
    Investigation, Methodology, Project administration
    Competing interests
    No competing interests declared
  17. Günter Weiss

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Investigation, Methodology, Project administration, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
  18. Gerlig Widmann

    Department of Radiology, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Investigation, Methodology, Project administration, Resources
    Competing interests
    No competing interests declared
  19. Ivan Tancevski

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review and editing
    For correspondence
    Ivan.Tancevski@i-med.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5116-8960
  20. Judith Löffler-Ragg

    Department of Internal Medicine II, Medical University of Innsbruck, Innsbruck, Austria
    Contribution
    Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review and editing
    For correspondence
    Judith.Loeffler@i-med.ac.at
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0873-7501

Funding

Land Tirol (GZ 71934)

  • Judith Löffler-Ragg

Boehringer Ingelheim (IIS 1199-0424)

  • Ivan Tancevski

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We acknowledge the commitment of the staff and providers of our institutions through the COVID-19 crisis and the suffering and loss of our patients as well as their families. PT has been (since May 2021) a freelance data scientist working in his own enterprise ‘Data Analytics as a Service Tirol’. He received an honorarium for the study data management, curation and analysis and minor manuscript work. The other authors declare no conflict of interest related to this study. The study was funded by the research fund of the state of Tyrol (Project GZ 71934, JLR) and an Investigator-Initiated Study grant by Boehringer Ingelheim (IIS 1199-0424, IT). The funding bodies did not influence the development of the research and manuscript.

Ethics

Human subjects: All participants gave written informed consent. The study was approved by the institutional review board at the Medical University of Innsbruck (approval number: 1103/2020), and registered at ClinicalTrials.gov (NCT04416100).

Senior Editor

  1. Jos W Van der Meer, Radboud University Medical Centre, Netherlands

Reviewing Editor

  1. Joshua T Schiffer, Fred Hutchinson Cancer Research Center, United States

Reviewers

  1. Guang-Shing Cheng, Fred Hutchinson Cancer Research Center, United States
  2. Joshua T Schiffer, Fred Hutchinson Cancer Research Center, United States

Publication history

  1. Preprint posted: June 25, 2021 (view preprint)
  2. Received: July 26, 2021
  3. Accepted: January 19, 2022
  4. Accepted Manuscript published: February 8, 2022 (version 1)
  5. Version of Record published: March 4, 2022 (version 2)

Copyright

© 2022, Sonnweber et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Thomas Sonnweber, Piotr Tymoszuk, Sabina Sahanic, Anna Boehm, Alex Pizzini, Anna Luger, Christoph Schwabl, Manfred Nairz, Philipp Grubwieser, Katharina Kurz, Sabine Koppelstätter, Magdalena Aichner, Bernhard Puchner, Alexander Egger, Gregor Hoermann, Ewald Wöll, Günter Weiss, Gerlig Widmann, Ivan Tancevski, Judith Löffler-Ragg (2022)
Investigating phenotypes of pulmonary COVID-19 recovery: A longitudinal observational prospective multicenter trial
eLife 11:e72500.
https://doi.org/10.7554/eLife.72500