1. Epidemiology and Global Health
  2. Medicine
Download icon

Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population

  1. Heli Julkunen
  2. Anna Cichońska
  3. P Eline Slagboom
  4. Peter Würtz  Is a corresponding author
  5. Nightingale Health UK Biobank Initiative
  1. Nightingale Health Plc, Finland
  2. Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Netherlands
  3. Max Planck Institute for Biology of Ageing, Germany
Research Article
  • Cited 0
  • Views 1,021
  • Annotations
Cite this article as: eLife 2021;10:e63033 doi: 10.7554/eLife.63033

Abstract

Biomarkers of low-grade inflammation have been associated with susceptibility to a severe infectious disease course, even when measured prior to disease onset. We investigated whether metabolic biomarkers measured by nuclear magnetic resonance (NMR) spectroscopy could be associated with susceptibility to severe pneumonia (2507 hospitalised or fatal cases) and severe COVID-19 (652 hospitalised cases) in 105,146 generally healthy individuals from UK Biobank, with blood samples collected 2007–2010. The overall signature of metabolic biomarker associations was similar for the risk of severe pneumonia and severe COVID-19. A multi-biomarker score, comprised of 25 proteins, fatty acids, amino acids, and lipids, was associated equally strongly with enhanced susceptibility to severe COVID-19 (odds ratio 2.9 [95%CI 2.1–3.8] for highest vs lowest quintile) and severe pneumonia events occurring 7–11 years after blood sampling (2.6 [1.7–3.9]). However, the risk for severe pneumonia occurring during the first 2 years after blood sampling for people with elevated levels of the multi-biomarker score was over four times higher than for long-term risk (8.0 [4.1–15.6]). If these hypothesis generating findings on increased susceptibility to severe pneumonia during the first few years after blood sampling extend to severe COVID-19, metabolic biomarker profiling could potentially complement existing tools for identifying individuals at high risk. These results provide novel molecular understanding on how metabolic biomarkers reflect the susceptibility to severe COVID-19 and other infections in the general population.

eLife digest

National policies for mitigating the COVID-19 pandemic include stricter measures for people considered to be at high risk of severe and potentially fatal cases of the disease. Although older age and pre-existing health conditions are strong risk factors, it is poorly understood why susceptibility varies so widely in the population.

People with cardiometabolic diseases, such as diabetes and liver diseases, or chronic inflammation are at higher risk of severe COVID-19 and other infections including pneumonia. These conditions alter the molecules circulating in the blood, providing potential ‘biomarkers’ to determine whether a person is more likely to develop a fatal infection. Uncovering these blood biomarkers could help to identify people who are prone to life-threatening infections despite not having ever been diagnosed with a cardiometabolic disease.

To find these biomarkers, Julkunen et al. studied blood samples that had been collected from 105,000 healthy individuals in the United Kingdom over ten years ago. The data showed that individuals with biomarkers linked to low-grade inflammation and cardiometabolic disease were more likely to have died or been hospitalised with pneumonia.

A score based on 25 of these biomarkers provided the best predictor of severe pneumonia. This biomarker score performed up to four times better within the first few years after blood sampling compared to predicting cases of pneumonia a decade later. The same blood biomarker changes were also linked with developing severe COVID-19 over ten years after the blood samples had been collected. The predictive value of the biomarker score was similar for both severe COVID-19 and the long-term risk of severe pneumonia.

Julkunen et al. propose that the metabolic biomarkers reflect inhibited immunity that impairs response to infections. The results from over 100,000 individuals suggest that these blood biomarkers may help to identify people at high risk of severe COVID-19 or other infectious diseases.

Introduction

The coronavirus disease 2019 (COVID-19) pandemic affects societies and healthcare systems worldwide. Protection of those individuals who are most susceptible to a severe and potentially fatal COVID-19 disease course is a prime component of national policies, with stricter social distancing and other preventative means recommended mainly for elderly people and individuals with pre-existing disease conditions. The prominent susceptibility to severe COVID-19 for people at high age has been linked with impaired immune response due to chronic inflammation caused by ageing processes (Akbar and Gilroy, 2020). However, large numbers of seemingly healthy middle-aged individuals also suffer from severe COVID-19 (Zhou et al., 2020; Atkins et al., 2020; Williamson et al., 2020); this could partly be due to similar molecular processes related to impaired immunity. A better understanding of the molecular factors predisposing to severe COVID-19 outcomes may help to explain the risk elevation ascribed to pre-existing disease conditions. From a translational point of view, this might also complement the identification of highly susceptible individuals in general population settings beyond current risk factor assessment.

Pneumonia is a life-threatening complication of COVID-19 and the most common diagnosis in severe COVID-19 patients. As for COVID-19, the main factors that increase the susceptibility for severe community-acquired pneumonia are high age and pre-existing respiratory and cardiometabolic diseases, which can weaken the lungs and the immune system (Almirall et al., 2017). Based on analyses of large blood sample collections of healthy individuals, biomarkers associated with the risk for severe COVID-19 are largely shared with the biomarkers associated with the risk for severe pneumonia, including elevated markers of impaired kidney function and inflammation and lower HDL cholesterol (Ho et al., 2020). This may indicate that these molecular markers may reflect an overall susceptibility to severe complications after contracting an infectious disease.

Comprehensive profiling of metabolic biomarkers, also known as metabolomics, in prospective population studies have suggested a range of blood biomarkers for cardiovascular disease and diabetes to also be reflective of the susceptibility for severe infectious diseases (Ritchie et al., 2015; Deelen et al., 2019). Metabolic profiling could therefore potentially identify biomarkers that reflect the susceptibility to severe COVID-19 among initially healthy individuals. However, such studies require measurement of vast numbers of blood samples collected prior to the COVID-19 pre-pandemic. Conveniently, a broad panel of metabolic biomarkers have recently been measured using nuclear magnetic resonance (NMR) spectroscopy in over 100,000 plasma samples from the UK Biobank.

Here, we examined if NMR-based metabolic biomarkers from blood samples collected a decade before the COVID-19 pandemic associate with the risk of severe infectious disease in UK general population settings. Exploiting the shared risk factor relation between susceptibility to severe COVID-19 and pneumonia (Ho et al., 2020), we used well-powered statistical analyses of biomarkers with severe pneumonia events to develop a multi-biomarker score that condenses the information from the metabolic measures into a single multi-biomarker score. Taking advantage of the time-resolved information on the occurrence of severe pneumonia events in the UK Biobank, we mimicked the influence of the decade lag from blood sampling to the COVID-19 pandemic on the biomarker associations, and used analyses with short-term follow-up to interpolate to a scenario of identifying individuals susceptible to severe COVID-19 in a preventative screening setting. Our primary aim was to improve the molecular understanding on how metabolic risk markers may contribute to increased predisposition to severe COVID-19 and other infections.

Results

A flow diagram of eligible study participants and case numbers is shown in Figure 1. Clinical characteristics of the study population are listed in Table 1. Among the 105,146 UK Biobank study participants with complete data on metabolic biomarkers and severe pneumonia outcomes, and no prior history of diagnosed pneumonia, there were 2507 severe pneumonia events recorded in hospital or death registries after the baseline blood sampling (median follow-up time 8.1 years).

Figure 1 with 1 supplement see all
Flow diagram of study participants and case numbers.

Overview of eligible study participants for the analysis of metabolic biomarkers for the susceptibility to severe pneumonia and COVID-19 in the UK Biobank. Case and control definitions are described in Materials and methods.

Table 1
Clinical characteristics of the UK Biobank participants in the current study.
Severe pneumonia
(diagnosis in hospital or death record)
Severe COVID-19
(diagnosis in hospital)
Incident casesControlsIncident casesControls
Individuals with NMR biomarker measures2507102 63965292 073
Age at blood sampling (median, [range])62
[40-70]
58
[39-70]
60
[40-70]
58
[39-70]
Females (%)44%54%43%54%
Body mass index (mean, kg/m2)28.527.428.727.3
Proportion with prevalent diseases
Cardiovascular disease (%)17.5%6.6%14.7%6.4%
Diabetes (%)9.3%3.9%9.2%3.8%
Lung cancer (%)0.4%0.1%0.3%0.1%
Chronic obstructive pulmonary disease (%)6.1%0.7%1.8%0.8%
Liver diseases (%)1.5%0.7%1.7%0.7%
Renal failure (%)3.6%1.3%2.9%1.4%
Dementia (%)0.1%0.01%0.0%0.01%
  1. The number of individuals analysed for severe COVID-19 is slightly lower than for severe pneumonia, since COVID-19 data were not available from assessment centres in Scotland and Wales.

For the severe COVID-19 analyses, there were 652 PCR-confirmed positive cases diagnosed in hospital (inferred as severe cases in this study) among the 92,725 individuals with COVID-19 data linkage available per 3rd of February 2021. The number of severe COVID-19 cases in the UK Biobank closely followed the trends in hospitalised individuals for COVID-19 in England (Figure 1—figure supplement 1). In February 2021, the age range of study participants was 49–84 years. The median duration from blood sampling to the COVID-19 pandemic was 11.2 years (interquartile range 10.0–12.6). The prevalence of chronic respiratory and cardiometabolic diseases was similar for study participants who developed severe pneumonia and those who contracted COVID-19 and required hospitalisation, with the exception of COPD. There were 33 overlapping cases between severe pneumonia and COVID-19.

Metabolic biomarkers and severe pneumonia risk

Figure 2A shows the associations of 37 biomarkers with severe pneumonia events occurring during the follow-up in the entire study population (n = 105 146). The biomarkers highlighted here are those with a regulatory approval for diagnostics use in the Nightingale Health NMR platform. These biomarkers span most of the different metabolic pathways captured with the NMR platform; results for all 249 metabolic measures quantified are shown in Figure 2—figure supplements 13. Strong associations were observed across several metabolic pathways: increased plasma concentrations of cholesterol measures, omega-3 and omega-6 fatty acid levels, histidine, branched-chain amino acids and albumin were associated with lower susceptibility to contracting severe pneumonia. Increased concentrations of monounsaturated and saturated fatty acids, as well glycoprotein acetyls (GlycA, a marker of low-grade inflammation) were associated with elevated susceptibility to contracting severe pneumonia.

Figure 2 with 3 supplements see all
Relation of baseline biomarker concentrations to future risk of severe pneumonia in the UK Biobank (n = 105 146; 2507 incident events).

(A) Odds ratios with severe pneumonia (2507 hospitalisations or deaths during a median of 8 years of follow-up) for 37 clinically validated biomarkers measured simultaneously in a single assay by Nightingale Health NMR platform. (B) Odds ratio with severe pneumonia for the multi-biomarker infectious disease score. The infectious disease score comprises of the weighted sum of 25 out of 37 clinically validated biomarkers, optimised for association with severe pneumonia based on one half of the study population using LASSO regression. Biomarkers included in the infectious disease score are marked by #. The odds ratio for infectious disease score is evaluated in the other half of the study population (n = 52 573; 1250 events). All models are adjusted for age, sex, and assessment centre. Odds ratios are per 1-SD increment in the biomarker levels. Horizontal bars denote 95% confidence intervals. Closed circles denote p-value<0.001 and open circles p-value≥0.001. BCAA indicates branched-chain amino acids; DHA: docosahexaenoic acid; MUFA: monounsaturated fatty acids; PUFA: polyunsaturated fatty acids; SFA: saturated fatty acids.

Figure 2—source data 1

Numerical tabulation of odds ratios, betas, standard errors, and p-values for results shown in Figure 2.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig2-data1-v2.csv

Since all the biomarkers are quantified in the same single measurement, we examined if even stronger associations with severe pneumonia could be obtained using a combination of multiple biomarkers. We derived this multi-biomarker combination, denoted ‘infectious disease score’, using logistic regression with LASSO for variable selection, considering the 37 clinically validated biomarkers in a half of the study population as the training set. This resulted in an infectious disease score comprised of the weighted sum of 25 biomarkers, with the weights selected by the machine learning algorithm (Supplementary file 1). Broadly similar results were obtained using all 249 metabolic measures quantified in the Nightingale Health NMR platform to derive the multi-biomarker score.

The multi-biomarker infectious disease score was then tested for association with severe pneumonia in the other half of the study population. The magnitude of association for the infectious disease score was approximately twice as strong with severe pneumonia compared to any of the individual biomarkers (Figure 2B). The odds for contracting severe pneumonia was increased 67% per 1-SD increment in the infectious disease score. This corresponds to close to fourfold higher risk for contracting severe pneumonia among people in the highest quintile of the infectious disease score, compared to those with a score in the lowest quintile.

To assess the robustness of the multi-biomarker score association with severe pneumonia, we adjusted the analyses for prevalent diseases and performed analyses stratified by age and sex (Figure 3). The association was attenuated by ~10% in magnitude when adjusting for, or omitting, individuals with a diagnosis of prevalent diseases at time of blood sampling (cardiovascular diseases, diabetes, lung cancer, COPD, liver diseases, renal failure, and dementia; panels 3A and 3B). The association was similar across age groups, and also for men and women analysed separately (panels 3C and 3D).

Relation of the multi-biomarker infectious disease score to future risk of severe pneumonia with additional adjustments and in subgroups (n = 52 573; 1250 incident events).

(A) Odds ratios with severe pneumonia after additional adjustments for BMI, smoking status, and prevalent diseases. (B) Odds ratios with severe pneumonia in study participants with and without prevalent diseases. (C) Odds ratios by age tertiles at the time of blood sampling. (D) Odds ratios for men and women separately. All models are adjusted for age, sex, and assessment centre. The left-hand side shows odds ratios per 1-SD increment in the multi-biomarker infectious disease score, and the right-hand side odds ratios for comparing individuals in the highest and lowest quintiles of the score. The results are based on the validation half of the study population not used in deriving the infectious disease score (1250 events during a median of 8 years of follow-up).

Figure 3—source data 1

Numerical tabulation of odds ratios, betas, standard errors, and p-values for results shown in Figure 3.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig3-data1-v2.csv

To mimic the influence of the decade-long lag from blood sample collection to the COVID-19 pandemic, we tested the association of the multi-biomarker infectious disease score with severe pneumonia events occurring during 7–11 years after the blood sampling (Figure 4A). Since there were only few severe pneumonia events recorded with more than 9 years of follow-up, we could not fully mimic the decade long time lag to the COVID-19 pandemic. The risk elevation observed in this time-lag accounting scenario was only approximately half of that observed for severe pneumonia events occurring within the first 7 years (odds ratio 1.43 vs 1.75 per 1-SD, respectively; and 2.59 vs 4.27 for individuals in the highest vs lowest quintile of the infectious disease score).

Figure 4 with 1 supplement see all
Relation of the multi-biomarker infectious disease score to long-term and short-term future risk for severe pneumonia (n = 52 573; 1250 incident events).

(A) Odds ratios with severe pneumonia events occurring within the first 7 years after the blood sampling, compared to events that occurred 7–11 years after blood sampling. (B) Odds ratios for severe pneumonia occurring within and after the first 2 years of blood sampling. Models are adjusted for age, sex, and assessment centre. The left-hand side shows odds ratios per 1-SD increment in the multi-biomarker infectious disease score, and the right-hand side odds ratios for comparing individuals in the highest and lowest quintiles of the score. The results are based on the validation half of the study population that was not used in deriving the infectious disease score.

Figure 4—source data 1

Numerical tabulation of odds ratios, betas, standard errors, and p-values for results shown in Figure 4.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig4-data1-v2.csv

To interpolate to a screening scenario conducted today, we also tested the association with short-term risk of severe pneumonia by analysing events occurring within the first 2 years after the blood sampling (Figure 4B). The association magnitude in this analysis of short-term risk scenario was approximately twice as strong as for severe pneumonia events occurring more than 2 years after blood sampling (odds ratio 2.21 vs 1.59 per 1-SD; 7.95 vs 3.35 for individuals in the highest vs lowest quintile of the infectious disease score). The elevated susceptibility to severe pneumonia associated with the multi-biomarker score was therefore three to four times stronger when examining short-term risk as compared to risk of severe pneumonia events occurring almost a decade after the blood sampling.

The elevation in the short-term risk for severe pneumonia for high levels of the infectious disease multi-biomarker score remained strong when adjusting for BMI, smoking and prevalent diseases (odds ratio 6.10 for individuals in the highest vs lowest quintile; Figure 4—figure supplement 1).

We further explored the risk gradient for a future onset of severe pneumonia along increasing levels of the infectious disease score, since non-linear effects could potentially facilitate the identification of thresholds for individuals at high susceptibility. Figure 5A shows the increase in the proportion of individuals who contracted severe pneumonia according to percentiles of the score. The risk increased prominently in the highest quintile, and particularly for the highest few percentiles. The time-resolved plot of the cumulative probability of severe pneumonia during follow-up is shown in Figure 5B. The susceptibility to severe pneumonia was particularly elevated among individuals with the very highest levels of the multi-biomarker infectious disease score. This was observed already during the first few years of follow-up, corroborating the results for long-term and short-term risk shown in Figure 3. The prominent and immediate elevation in susceptibility to severe pneumonia was also observed when limiting analyses to individuals without chronic respiratory and cardiometabolic diseases at the time of blood sampling (Figure 5—figure supplement 1).

Figure 5 with 1 supplement see all
Risk gradient for contracting severe pneumonia after the blood sampling according to percentiles of the multi-biomarker infectious disease score (n = 52 573; 1250 incident events).

(A) Proportion of individuals who contracted severe pneumonia during a median follow-up time of 8.1 years after the blood sampling according to percentiles of the multi-biomarker infectious disease score. Each point represents approximately 500 individuals. (B) Kaplan-Meier curves of the cumulative probability for severe pneumonia in quantiles of the multi-biomarker infectious disease score. The follow-up time was truncated at 9.5 years since only a small fraction of individuals were followed longer. Results are based on the validation half of the study population that was not used in deriving the infectious disease score (n = 52,573). The corresponding plots for individuals free of baseline respiratory and cardiometabolic diseases are shown in Figure 5—figure supplement 1.

Figure 5—source data 1

Numerical tabulation of event rates for each percentile in Figure 5A.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig5-data1-v2.csv

Metabolic biomarkers and severe COVID-19

Figure 6 shows the associations of the 37 clinically validated biomarkers and the infectious disease score with the future onset of severe COVID-19 (defined as PCR-confirmed positive inpatient diagnosis). Many of the individual biomarkers had significant associations (p-value<0.001) with increased risk for severe COVID-19. These biomarkers for susceptibility to severe COVID-19 include lower levels of omega-3 omega-6 fatty acids as well as albumin, and higher levels of GlycA. We observed a high concordance in the overall pattern of COVID-19 biomarker associations with severe pneumonia (Figure 2A), with a Spearman correlation of 0.89 between the overall biomarker association signatures for severe pneumonia and severe COVID-19 (Figure 7).

Relation of baseline biomarkers and multi-biomarker infectious disease score to future risk of severe COVID-19 (n = 92 725; 652 cases diagnosed in hospital).

(A) Odds ratios with severe COVID-19 (defined as PCR-positive diagnosis in hospital; 652 cases out of 92 725 individuals) for 37 clinically validated biomarkers measured by NMR. (B) Odds ratio with severe COVID-19 for the multi-biomarker infectious disease score. Biomarkers included in the infectious disease score are marked by #. Odds ratios are per 1-SD increment in the biomarker levels. Models are adjusted for age, sex, and assessment centre.

Figure 6—source data 1

Numerical tabulation of odds ratios, betas, standard errors, and p-values for results shown in Figure 6.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig6-data1-v2.csv
Concordance of the overall pattern of biomarker associations with future onset of severe pneumonia and severe COVID-19.

Biomarker associations with future onset of severe pneumonia (y-axis) plotted against the corresponding associations with severe COVID-19 (x-axis). The odds ratios, with adjustment for age, sex, and assessment centre, for each of the 37 clinically validated biomarkers in the Nightingale Health NMR platform are given with 95% confidence intervals in vertical and horizontal error bars. The dashed line denotes the diagonal.

Figure 7—source data 1

Numerical tabulation of odds ratios, and 95% confidence intervals for results shown in Figure 7.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig7-data1-v2.csv

The multi-biomarker infectious disease score derived for the future onset of severe pneumonia was also robustly associated with the future onset of severe COVID-19. The odds ratio was 1.40 per 1-SD increment and 2.90 for comparing individuals in the highest quintile of the multi-biomarker infectious disease score to those in the lowest quintile. This magnitude of association with susceptibility to severe COVID-19 was similar to that observed with severe pneumonia events occurring during the interval of 7–11 years after the blood sampling.

We further examined the association of the multi-biomarker infectious disease score with severe COVID-19 after adjustment or exclusion for prevalent diseases, and conducted stratified analyses for age and sex (Figure 8). The association with severe COVID-19 was attenuated, but remained significant when adjusted for BMI, smoking and prevalent diseases (panel 7A). The association magnitudes were approximately 20% weaker when limiting the COVID-19 analyses to individuals without prevalent diseases at time of blood sampling (panel 7B). There was no robust evidence of differences in association magnitude according to age (panel 7C) and odd ratios were broadly similar for men and women (panel 7D).

Relation of the multi-biomarker infectious disease score to future risk of severe COVID-19 with additional adjustments and in subgroups of the study population (n = 92,725; 652 cases diagnosed in hospital).

(A) Odds ratios with severe COVID-19 after additional adjustments for BMI, smoking status and prevalent diseases. (B) Odds ratios with severe pneumonia in study participants with and without prevalent diseases at the time of blood sampling. (C) Odds ratios by age tertiles at the time of the COVID-19 pandemic. (D) Odds ratios for men and women, separately. The left-hand side shows the odds ratios per 1-SD increment in the multi-biomarker infectious disease score, and the right-hand side the odds ratios for comparing individuals in the highest and lowest quintiles of the score. All models are adjusted for age, sex, and assessment centre.

Figure 8—source data 1

Numerical tabulation of odds ratios, betas, standard errors, and p-values for results shown in Figure 8.

https://cdn.elifesciences.org/articles/63033/elife-63033-fig8-data1-v2.csv

Finally, we examined the technical repeatability and biological stability of measuring the multi-biomarker infectious disease score. The measurement repeatability was high (Pearson correlation 0.94 in blind duplicate samples; Figure 9A). Even though the blood samples were primarily non-fasting, the levels of the infectious disease score remained broadly stable during 4 years based on blood samples from repeat visits (Pearson correlation 0.61 between baseline and repeat visit measurements; Figure 9B).

Technical repeatability for measuring the multi-biomarker infectious disease score and biological stability in repeat measures 4 years after the baseline blood sampling.

(A) Technical repeatability of the infectious disease score assessed in blind duplicate samples. The correlation plot is based on 2863 blind duplicate plasma samples measured along with the regular measurements of ~105,000 samples in the Nightingale Health-UK Biobank initiative. (B) Biological stability of the infectious disease score based on plasma samples from 1298 individuals who attended both the baseline visit as well as a repeat visit ~4 years later at the UK Biobank assessment centres.

Discussion

Most biomarker studies on COVID–19 have focused on characterising already infected patients and their disease prognosis (Kermali et al., 2020; Shen et al., 2020; Messner et al., 2020; Dierckx et al., 2020). In contrast, in the largest blood metabolic profiling study to date, we explored biomarker associations for susceptibility to severe pneumonia and COVID-19 in general population settings. We developed a multi-biomarker score for increased susceptibility to a severe infectious disease course, and demonstrated that this biomarker score captures an increased risk for COVID-19 hospitalisation a decade after the blood sampling.

The overall signature of biomarker associations was similar for the susceptibility to severe COVID-19 and to severe pneumonia (Figure 7). The proportions of individuals with existing cardiometabolic diseases were also consistent for both of these infectious diseases (Table 1). We used these observations of a shared risk factor basis to draw an analogy between susceptibility to severe pneumonia and severe COVID-19, and hereby infer potential implications for preventative screening. We therefore exploited the strong statistical power and time-resolved information on severe pneumonia events for more detailed analyses than was feasible with COVID-19. This led to three important observations. First, the infectious disease multi-biomarker score was largely independent of prevalent chronic respiratory and cardiometabolic diseases (Figure 3). Second, the susceptibility to severe pneumonia was drastically elevated in the extreme tail of the multi-biomarker infectious disease score, with 5–10 times higher risk compared to individuals with normal levels of the multi-biomarker score (Figure 5). Such features might aid in establishing thresholds for identifying individuals most susceptible to a severe disease course. Third, the odds ratio of the multi-biomarker score for severe pneumonia events occurring after 7–11 years closely matched that of severe COVID-19, for which all events occurred over decade after blood sampling (Figure 4A). Yet, screening for the susceptibility to severe COVID-19 would require a strong association with the short-term risk. When confining the analyses of severe pneumonia to events occurring within the first 2 years after blood sampling, the short-term risk elevation was over four times stronger than that observed for long-term risk — individuals with high levels of the multi-biomarker score were almost 7-times more susceptible than people with low levels (Figure 4B). If similar enhancement in short-term risk extend to COVID-19, our results could potentially indicate applications for identification of individuals at high susceptibility to a severe COVID-19 disease course. However, the unavailability of metabolic biomarker data from blood samples drawn shortly prior to the pandemic prevents us from examining biomarker associations with short-term COVID-19 susceptibility, and our results should therefore be considered of hypothesis generating nature.

We observed multiple blood biomarkers commonly linked with the risk for cardiovascular disease and diabetes (Soininen et al., 2015; Würtz et al., 2017; Holmes et al., 2018; Ahola-Olli et al., 2019) to also be associated with increased susceptibility to both severe pneumonia and severe COVID-19. The biomarkers span multiple metabolic pathways, including low concentrations of lipoprotein lipids, impaired fatty acid balance, decreased amino acid levels and high chronic inflammation. This is the first study to show that many of these blood biomarkers associate with susceptibility to severe infections, potentially indicating that fatty acids and amino acids should not be considered only as biomarkers for cardiometabolic risk. The associations of omega-3 and other fatty acids with the risk for severe COVID-19 may be particularly important, as these measures are more directly modifiable by lifestyle means than common markers of inflammation. The overall pattern of biomarker associations followed a characteristic metabolic signature reflective of an increased susceptibility to a severe infectious disease. This pattern of biomarker associations is broadly similar to what has previously been reported with the risk for all-cause mortality in smaller prospective cohort studies (Deelen et al., 2019). It is therefore unlikely that the identified biomarker signature is specific to the risk for severe pneumonia and COVID-19, or even specific to infectious diseases in general. We propose that the overall metabolic biomarker perturbations observed here reflect molecular signals of low-grade inflammation that exacerbate disease severity, in case of both infectious and chronic diseases (Akbar and Gilroy, 2020; Bonafè et al., 2020). In line with this, prior studies have demonstrated that elevated levels of GlycA, the biomarker with the strongest weight in the infectious disease score, is associated with increased neutrophil activity and the long-term risk for fatal infections (Ritchie et al., 2015). Such over-activity of immune response from pneumonia or COVID-19 infection is known to cause tissue damage and organ dysfunction through cytokine storm, a common complication of severe COVID-19 (Mangalmurti and Hunter, 2020). While the specific biological mechanisms underpinning the blood metabolic biomarker associations with chronic and infectious diseases remain poorly understood, we emphasize that the observational character of our study does not allow us to conclude whether the biomarkers are contributing causally to increase the risk or are merely indirect risk markers.

Replication of novel biomarker associations is a key aspect in observational studies. We are not aware of other prospective studies with sufficient COVID-19 hospitalisation events and NMR-based metabolic biomarker data to address this. However, a preprint of the present study featured analysis of 195 severe COVID-19 cases, based on data available in UK Biobank back in June 2020 (Julkunen et al., 2020). In the present updated analyses, with over three times the number of cases, all biomarker associations with susceptibility to severe COVID-19 were similar or stronger, and hereby provide a within-cohort replication of our initial findings. In addition, a recent study used the same metabolic biomarker panel in three cohorts of hospitalised patients and observed similar overall biomarker perturbations to be predictive of COVID-19 severity (Dierckx et al., 2020). The study also reported the multi-biomarker infectious disease score to be among the strongest biomarkers for discriminating COVID-19 severity among already hospitalised patients.

Our study has both strengths and limitations. Strengths include the large sample size, which enabled the analysis of biomarkers for susceptibility to severe COVID-19 based on pre-pandemic blood samples from general population settings. We used a validated metabolic profiling platform that enables simultaneous quantification of numerous metabolic biomarkers in a scalable low-cost setup. Although the number of hospitalised COVID-19 cases was in line with the prevalence in England, we acknowledge that the statistical power was limited for prediction analyses even with close to 100,000 samples linked with COVID-19 outcome data. Furthermore, the UK Biobank study participants are not fully representative of the UK population by demographic characteristics; the individuals were enrolled on a volunteer basis and are therefore more representative of healthier individuals than average (Sudlow et al., 2015; Fry et al., 2017). Even though this is generally not a concern for investigating risk associations (Keyes and Westreich, 2019), it does limit the statistical power to explore effects of ethnicity and old age. Other limitations include the decade long duration from blood sampling to the COVID-19 pandemic. While this limits inference on how well the biomarkers predict short-term risk for severe COVID-19, our analogy with long-term risk for severe pneumonia indicates that the time lag likely attenuates the biomarker association magnitudes substantially. Conversely, the remarkably strong associations for short-term risk of severe pneumonia led us to speculate that similar enhancements in association magnitudes could also hold for severe COVID-19. However, this inference should be further tested, in particular in the light of the bacterial origin of many severe pneumonia cases and the viral origin of COVID-19. Weaker biomarker associations for severe COVID-19 compared to severe pneumonia may also arise from the UK Biobank COVID-19 data being influenced by ascertainment bias in terms of differential healthcare seeking and differential testing (Griffith et al., 2020), whereas pneumonia is anticipated to have nearly complete case ascertainment (Ho et al., 2020).

In conclusion, a metabolic signature of perturbed blood biomarkers is associated with an increased susceptibility to both severe pneumonia and COVID-19 in blood samples collected a decade before the pandemic. The multi-biomarker score captures an elevated susceptibility to severe pneumonia within few years after blood sampling that is several times stronger than the risk elevation associated with many pre-existing health conditions, such as obesity and diabetes (Ho et al., 2020). If the three- to fourfold elevation in short-term risk compared to long-term risk of severe pneumonia also applies to severe COVID-19, then the metabolic biomarker profiling could potentially complement existing tools for identifying individuals most susceptible to a severe COVID-19 disease course. Regardless of the translational prospects, these results provide novel understanding on how metabolic biomarkers may reflect the susceptibility of severe COVID-19 and other infections.

Materials and methods

Study population

Request a detailed protocol

Details of the design of the UK Biobank have been reported previously (Sudlow et al., 2015). Briefly, the UK Biobank recruited 502,639 participants aged 37–70 years in 22 assessment centres across the UK. All study participants had to be able to attend the assessment centres by their own means, and there was no enrolment at nursing homes. All participants provided written informed consent and ethical approval was obtained from the North West Multi-Center Research Ethics Committee. Blood samples were drawn at baseline between 2007 and 2010. The current analysis was approved under UK Biobank Project 30418. No selection criteria were applied to the sampling.

Metabolic biomarker profiling

Request a detailed protocol

From the entire UK Biobank population, a random subset of non-fasting baseline plasma samples (aliquot 3) from 118 466 individuals and 1298 repeat-visit samples were measured using high-throughput NMR spectroscopy (Nightingale Health Plc; biomarker quantification version 2020). This provides simultaneous quantification of 249 metabolic biomarker measures in a single assay, including routine lipids, lipoprotein subclass profiling with lipid concentrations within 14 subclasses, fatty acid composition, and various low-molecular-weight metabolites such as amino acids, ketone bodies, and glycolysis metabolites quantified in molar concentration units. Technical details and epidemiological applications of the metabolic biomarker data have been reviewed (Soininen et al., 2015; Würtz et al., 2017). The Nightingale NMR platform has received various regulatory approvals, including CE-mark, and 37 biomarkers in the panel have been certified for diagnostics use. We focused on this particular set of certified biomarkers, as we wanted to investigate if these markers of systemic metabolism — commonly linked to cardiometabolic diseases — could also be associated with future risk for severe infectious disease. Furthermore, these clinically validated biomarkers span most of the different metabolic pathways measured by the NMR platform and could facilitate potential translational applications as they are certified for diagnostics use and are measured simultaneously in a single assay. The mean and standard deviation of concentrations for 249 quantified metabolic biomarkers are given in Supplementary file 2.

Measurements of the metabolic biomarkers were conducted blinded prior to the linkage to the UK Biobank health outcomes. The metabolic biomarker data were curated and linked to UK Biobank clinical data in late-May 2020. The metabolic biomarker dataset has been made available for the research community through the UK biobank in March 2021.

Severe pneumonia outcomes

Request a detailed protocol

We combined ICD-10 codes J12–J18 to define the pneumonia endpoint. To strengthen the analogy with the analysis of severe COVID-19, we focused on severe pneumonia events, defined as diagnosis in hospital or death records based on UK Hospital Episode Statistics data and national death registries (2507 incident cases in the current study). All analyses are based on the first occurrence of a diagnosis. Therefore, 2658 individuals with recorded hospitalisation of pneumonia prior to the blood sampling were excluded. Additionally, 346 individuals with pneumonia diagnosis recorded in primary care settings and by self-reports were also omitted from the analyses. The registry-based follow-up was from blood sampling in 2007–2010 through to 2016–2017, depending on assessment centre (850,000 person-years).

Severe COVID-19 outcomes

Request a detailed protocol

We used COVID-19 data available in the UK Biobank per 3rd of February 2021, which covers test results from 16 March to 1st of February 2021. These data include information on positive/negative PCR-based diagnosis results and explicit evidence in the microbiological record on whether the participant was an inpatient (Resource UKBiobankD, 2020). For the present analyses, we focused on PCR-positive inpatient diagnoses. These hospitalised cases are here denoted as severe COVID-19 (652 cases in the current study). COVID-19 data were not available for assessment centres in Scotland and Wales, so individuals from these centres were excluded. Individuals who had died during follow-up prior to 2018 were also excluded, since they were never exposed to COVID-19.

Control group

Request a detailed protocol

The entire study population of non-cases was used as controls in the statistical analyses (n = 102,639 for severe pneumonia and n = 92,073 for severe COVID-19, respectively). This choice of controls is consistent with the majority of publications examining risk factors for susceptibility to severe COVID-19 (e.g. Ho et al., 2020; Williamson et al., 2020). It allows to address the question of whether an initially healthy person with a high value of a given biomarker is at an increased risk of eventually getting the disease outcome (severe pneumonia or COVID-19 hospitalisation) compared to people from the general population with low levels of the biomarker. This choice of controls also overcomes biases that may arise from analyses using confirmed mild infections as the control group, such as collider bias caused by non-random testing of the control group compared to the rest of the study population (Griffith et al., 2020).

Prevalent diseases

Request a detailed protocol

To examine the influence of prevalent diseases in the prospective analyses of severe pneumonia and severe COVID-19, we used the following: prevalent cardiovascular disease (ICD-10 codes I20–I25, I50, I60–I64, and G45), diabetes (E10–E14), lung cancer (C33–C34, D02.2, Z85.1), chronic obstructive pulmonary disease (COPD; J43–J44), liver diseases (K70–K77), renal failure (N17–N19), and dementia (F00-F03).

Statistical methods

Request a detailed protocol

Biomarker levels outside four interquartile ranges from median were considered as outliers and excluded. All 37 biomarkers were scaled to standard deviation (SD) units prior to analyses. For biomarker association testing with severe pneumonia and with severe COVID-19 (as separate outcomes), we used logistic regression models adjusted for age, sex, and assessment centre. To examine the utility of multiple biomarkers in combination, we used a weighted sum of the biomarkers optimised for association with future risk of severe pneumonia; this multi-biomarker score was denoted as ‘infectious disease score’. To minimise the collinearity of the biomarkers, the multi-biomarker score was trained using logistic regression with least absolute shrinkage and selection operator (LASSO), which uses L1 regularisation that adds penalty equal to the absolute value of the magnitude of the coefficients. The multi-biomarker infectious disease score was trained using half of the study population with complete data available for the 37 clinically validated biomarkers (n = 52,573 and 1257 severe pneumonia events) using five-fold cross-validation to optimise the regularizsation parameter λ. The remaining half of the study population was used in validating the performance of the biomarker score in relation to future risk for severe pneumonia. The multi-biomarker infectious disease score was subsequently tested for association with severe pneumonia and COVID-19 in logistic regression models adjusted for age, sex, and assessment centre. We further examined the effect of additional adjustment for body mass index (BMI) and smoking status (never, former, current) and prevalent diseases. The associations were also examined by omitting individuals with prevalent diseases and stratified by age and sex. In the case of severe pneumonia, we further examined the association magnitudes according to follow-up time: we used severe pneumonia events occurring during 7–11 years after the blood sampling to mimic the decade long lag from blood sampling to the COVID-19 pandemic, and severe pneumonia events occurring within the first 2 years to interpolate to the scenario of preventative COVID-19 screening carried out today. In both scenarios, the confined follow-up times were arbitrarily chosen to be as short as possible while ensuring sufficient numbers of events. Finally, to explore potential non-linear effects, the infectious disease score was plotted as a proportion of individuals who contracted severe pneumonia during follow-up when binning individuals into percentiles of the infectious disease score (Khera et al., 2018). The time-resolution was further examined by Kaplan-Meier curves of the cumulative risk for severe pneumonia.

Data availability

The data are available for approved researchers from UK Biobank. The metabolic biomarker data has been released to the UK Biobank resource in March 2021.

The following data sets were generated

References

Decision letter

  1. Edward D Janus
    Reviewing Editor; University of Melbourne, Australia
  2. Matthias Barton
    Senior Editor; University of Zurich, Switzerland
  3. Harin Karunajeewa
    Reviewer; University of Western Australia, Australia
  4. Edward D Janus
    Reviewer; University of Melbourne, Australia

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The authors show that in over 100,000 healthy individuals a baseline multi-biomarker score comprised of 25 proteins, fatty acids, amino acids and lipids obtained simultaneously from a single sample strongly predicts the risk of severe pneumonia and COVID 19 hospitalization 7-11 years later. These findings now been validated in Belgian patients provide novel insights for both pneumonia and severe COVID -19 risk.

Decision letter after peer review:

Thank you for submitting your article "Metabolic biomarker profiling for identification of high risk for severe pneumonia and COVID-19 hospitalisation" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Edward D Janus as the Reviewing Editor and Reviewer #3, and the evaluation has been overseen by a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Harin Karunajeewa (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper.

Summary:

The authors report that in over 100,000 healthy individuals a baseline multi-biomarker score comprised of 25 proteins, fatty acids, amino acids and lipids strongly predicts the risk of severe pneumonia and COVID 19 hospitalization 7-11 years later. The issue is whether other parameters related to inflammatory or immunological processes have been excluded. Also there are many simpler scores for determining severity and outcomes in clinical practice.

Thus there are two aspects the population susceptibility and the prediction of severe disease. There is a need to change the comparison group if looking for a useful predictor of severe disease.

The paper needs to be revised to highlight its hypothesis generation aspects which is its major point of interest, but also to cover or at least discuss the broader context of other inflammatory, immunological markers and other patient susceptibility characteristics and limitations including whether there is actual clinical utility.

Essential revisions:

This is a dense and complex analysis of a large, rich and powerful dataset that examines a clinically relevant issue that is obviously extremely timely. As an observational study it is limited to "hypothesis-generating conclusions". Its findings are intriguing and potentially of high medical and scientific importance.

The manuscript identifies a group of biomarkers that is highly predictive of severe pneumonia and COVID-19 hospitalization in prospective samples that were collected many years earlier. There are several strengths of the manuscript, including the large samples size using the UK biobank. The inquiry of a consistent set of markers that predict severe response to multiple important outcomes is also important. However, there are also not notable issues with the manuscript that need to be addressed and we believe inaccurate conclusions have been drawn from the analyses.

1. Intuitively one would expect markers of inflammation and immunological susceptibility to be the strongest predictors in this disease situation. Have these been explored previously? Is it that you are testing the already available marker set heavily slanted to metabolic markers because that is what is available so its wider use is being explored?

2. The primary concern with the manuscript as written is that the controls being used in both analyses do not seem appropriate. It is unclear why controls are being used in the analyses vs. those with pneumonia without a severe response or those with COVID-19 without a severe reaction. Aren't those the real comparison groups? By just using random controls, the analysis is not taking into account that disease itself. This is a fundamental flaw in the way the analyses are presented and the conclusions drawn. Furthermore, it completely inflates that actual statistical power in the analyses, given that the control group is literally several magnitudes bigger in the case of COVID-19.

3. Having said that, at an absolute minimum, these analyses should be rerun using COVID-19+ with severe and non-severe response. In this case, the real question is being answered: using prospective samples or individuals who get pneumonia/COVID, can we predict this that have a severe response vs. those who do not? In this case we are actually deciphering the difference between the biology of individuals who we know will have a severe response and those who we know will not. In the analyses, that the authors do, we have no idea who in the control group would have a severe response and who would not. Therefore the control group is completely inappropriate for addressing that issue.

4. In the case where one is using severe/non-severe, it should be acknowledged that the actual statistical power of the tru comparison is substantially lower that what is being used with the larger UK biobank samples.

5. Some approach to assess the robustness of these findings in an additional population would be ideal; however, it is acknowledged that this may not be feasible currently.

6. Please advise specifically on this point – that the metabolomic data has been transparently documented prior to the linked UK biobank data being unlocked, so that it can be matched up with that present in the final linked dataset – ie verifies that authors were blinded and hasn't changed since unblinded. The UK biobank policies and procedures hopefully address these issues of research integrity.

7. Please expand on clinical utility. The tests and score would need to be affordable and readily available and compliment existing predictors – not only cardiometabolic diseases but many others and specifically age and be cost effective.

A key line in the introduction outlines the study's rationale as its potential value for risk stratification as "facilitating the identification of high-risk individuals missed by current clinical tools". I think the authors should therefore address this point more explicitly when framing their results in the conclusion. To what extent do their results really support this statement? This could take the form of providing important context as to the predictive ability of existing or proposed clinical risk stratification tools. There has been an explosion of these in relation to COVID, many of which show that easily collected clinical and demographic information can predict very large differences in risk of hospitalization or death – eg Halalau et al. Annals of Internal Medicine quote an odds ratio of 19 for risk of hospitalization in their highest risk strata. So how much would metabolomic profiling really add to information much more readily and easily accessible without a blood test and expensive laboratory analysis?

8. The authors need to acknowledge the limitations of the clinical data that has been used to adjust comparisons. For instance they have adjusted for cardiac and respiratory disease but there is no mention of dementia, general "frailty, residence in a nursing home and measures of independent function which are all very highly associated with the risk of pneumonia. Could the high risk metabolic profiles lie on the same causal pathway and therefore just be surrogates for these easily-measured biological risk factors? Again, goes to the question of what metabolomic profiling could add to much more readily available clinical and demographic information.

https://doi.org/10.7554/eLife.63033.sa1

Author response

Summary:

The authors report that in over 100,000 healthy individuals a baseline multi-biomarker score comprised of 25 proteins, fatty acids, amino acids and lipids strongly predicts the risk of severe pneumonia and COVID 19 hospitalization 7-11 years later. The issue is whether other parameters related to inflammatory or immunological processes have been excluded. Also there are many simpler scores for determining severity and outcomes in clinical practice.

Thus there are two aspects the population susceptibility and the prediction of severe disease. There is a need to change the comparison group if looking for a useful predictor of severe disease.

The paper needs to be revised to highlight its hypothesis generation aspects which is its major point of interest, but also to cover or at least discuss the broader context of other inflammatory, immunological markers and other patient susceptibility characteristics and limitations including whether there is actual clinical utility.

We thank the editors and reviewers for their valuable comments. We have now revised the paper according to the comments. We have also shifted the wording to emphasise the population susceptibility rather than the prediction aspect, and toned down some of the initially proposed public health implications to be more in line with current COVID-19 preventative policies and vaccine roll outs in many countries.

We would like to emphasize two points that substantially strengthen the revised manuscript:

1. We have re-analysed biomarker associations with severe COVID-19 based on updated COVID-19 data from UK Biobank. This update more than triples the number of severe COVID-19 cases (from 195 to 652). All association magnitudes are similar or stronger in these re-analyses, and the statistical significance is therefore substantially improved: all analyses of the multi-biomarker infectious disease score vs severe COVID-19 risk are now significant and many individual biomarkers are also now significant (Figure 6). The updated analysis thus provides a within-cohort replication of our initial results. The updated analysis also overcomes a prime limitation of the initial submission, since the multi-biomarker score associations with severe COVID-19 are now robust to adjustments and omission of individuals with prevalent diseases (Figure 8).

2. After our initial submission, a medRxiv preprint (Dierckx et al. (2020), https://doi.org/10.1101/2020.11.09.20228221) has provided independent validation of our biomarker findings in hospital settings: this preprint examines the same metabolic biomarker panel in relation to COVID-19 severity in three cohorts of COVID‑19 patients in Belgian hospitals. Their results confirm that all the key changes in the metabolic biomarkers and the “infectious disease biomarker score” that we propose to be predictive of severe COVID-19 among initially disease-free individuals, are also discriminating COVID-19 severity among already hospitalised patients.

Essential revisions:

This is a dense and complex analysis of a large, rich and powerful dataset that examines a clinically relevant issue that is obviously extremely timely. As an observational study it is limited to "hypothesis-generating conclusions". Its findings are intriguing and potentially of high medical and scientific importance.

The manuscript identifies a group of biomarkers that is highly predictive of severe pneumonia and COVID-19 hospitalization in prospective samples that were collected many years earlier. There are several strengths of the manuscript, including the large samples size using the UK biobank. The inquiry of a consistent set of markers that predict severe response to multiple important outcomes is also important. However, there are also not notable issues with the manuscript that need to be addressed and we believe inaccurate conclusions have been drawn from the analyses.

1. Intuitively one would expect markers of inflammation and immunological susceptibility to be the strongest predictors in this disease situation. Have these been explored previously?

Many inflammatory biomarkers have been shown to predict COVID-19 severity among infected individuals, but only few studies have examined molecular markers of inflammation in relation to future COVID-19 risk among healthy individuals. This is likely due to the challenge of attaining the kind of study design we present here, as it requires that the biomarkers are measured from blood samples drawn prior to the pandemic. Studies with similar design based on UK Biobank have shown that hsCRP and blood cell counts are only modest predictors of COVID-19 hospitalisation and pneumonia (Ho et al., BMJ Open 2020); and those results are weaker than many of the metabolic biomarkers analysed in the present study (p. 7).

Glycoprotein acetyls (GlycA), the primary marker of low-grade inflammation in the metabolic biomarker panel analysed in our study, has indeed previously been associated with long-term risk for severe infections (Ritchie et al., Cell Systems 2015). Also PUFA% has been associated with risk of death from non-localised infections (Deelen et al., Nature Communications 2019), but no prior studies have examined the wider panel of metabolic biomarkers quantified by the NMR platform in relation to infectious diseases. Although GlycA was the strongest individual biomarker for the risk of severe pneumonia, this was not the case for the risk of severe COVID‑19 (for which blood levels of DHA and monounsaturated fatty acids were the strongest biomarkers). In addition to the expected results on the GlycA inflammatory biomarker, our study demonstrates that many cardiometabolic biomarkers, such as fatty acids and amino acids, can also be predictive of severe infectious disease risk among generally healthy individuals (p. 6). We believe that the novel evidence on how the susceptibility to severe infectious diseases is reflected in the systemic metabolic profile could broaden the understanding of impaired immunity beyond the focus on established inflammatory biomarkers. In addition, we show that the combination of many metabolic biomarkers measured simultaneously from a single assay provides stronger associations for infectious disease susceptibility than individual biomarkers.

Is it that you are testing the already available marker set heavily slanted to metabolic markers because that is what is available so its wider use is being explored?

All the metabolic markers analysed in our study are indeed obtained simultaneously in a single assay using NMR spectroscopy. We have now clarified this point in the manuscript (p. 3 and 7). The measurements of these metabolic biomarkers in >100,000 samples from the UK Biobank were undertaken prior to the COVID-19 pandemic. It is thus a pre-specified set of biomarkers that has been related primarily to risk for cardiovascular disease and diabetes in earlier epidemiological studies based on smaller cohorts.

The reason for focusing on this particular set of metabolic biomarkers was two-fold: first, we wanted to investigate whether these biomarkers of systemic metabolism, and commonly described as cardiometabolic risk markers, could be also associated with susceptibility to a severe infectious disease course in general population settings. The second reason was that these biomarkers can be measured simultaneously in a single blood assay at low cost. This allows using multi-biomarker scores for enhanced risk prediction without additional costs. These points can both provide molecular insights into COVID-19 etiology, and potentially open for novel tools to support identification of individuals who are most susceptible. The panel of biomarkers shown in Figures 2 and 6 are those from the employed NMR platform that have been certified for clinical use and which would therefore facilitate potential translational applications, as now noted on p. 3 and 7.

2. The primary concern with the manuscript as written is that the controls being used in both analyses do not seem appropriate. It is unclear why controls are being used in the analyses vs. those with pneumonia without a severe response or those with COVID-19 without a severe reaction. Aren't those the real comparison groups? By just using random controls, the analysis is not taking into account that disease itself. This is a fundamental flaw in the way the analyses are presented and the conclusions drawn. Furthermore, it completely inflates that actual statistical power in the analyses, given that the control group is literally several magnitudes bigger in the case of COVID-19.

The use of the entire non-case study population as controls in our statistical analyses is the established approach for biomarker association testing with disease risk in the general population settings – regardless of whether it is for infectious diseases, cardiovascular disease or any other disease. The same selection of controls is used by many other COVID-19 related publications based on UK Biobank (e.g. Petermann-Rocha et al., BMC Medicine 2020). The choice of using all non-cases as controls, rather than only test-positive mild cases, is also taken in high profile papers on risk factors associated with COVID-19-related death, e.g. the OpenSAFELY study on determining the relative risk ascribed to the prevalence of many chronic conditions (Williamson et al., Nature 2020).

We are convinced that the chosen control group and the statistical analysis framework is the appropriate one to address study question we are after: for a person with a given value of the biomarker score, what is the risk increase for the disease outcome (severe pneumonia or COVID-19 hospitalisation) compared to the risk in a reference population (such as people in the general population with median levels of that biomarker score). This is the same principle as when risk factors for COVID-19 mortality are reported, e.g. the two-fold risk increase linked to class III obesity vs people with normal weight (Williamson et al., Nature 2020). The chosen statistical analyses can thus be used to identify individuals who are at increased risk compared to their peers according to their metabolic biomarker profile. We have now revised the wording throughout the manuscript to emphasize that we estimate the susceptibility for the disease outcome with respect to that person’s peers, rather than the relative risk of a given person for getting severe disease vs mild disease. We have also clarified the selection of control group in detail on p. 8.

The statistical power in the analyses is dictated by the number of cases when there is strong class-imbalance between the number of cases and controls. We address this point further below in answer #4.

3. Having said that, at an absolute minimum, these analyses should be rerun using COVID-19+ with severe and non-severe response. In this case, the real question is being answered: using prospective samples or individuals who get pneumonia/COVID, can we predict this that have a severe response vs. those who do not? In this case we are actually deciphering the difference between the biology of individuals who we know will have a severe response and those who we know will not. In the analyses, that the authors do, we have no idea who in the control group would have a severe response and who would not. Therefore the control group is completely inappropriate for addressing that issue.

As reasoned above, we are answering a slightly different study question than whether an individual will have a severe response or not; instead, we deliberately address the study question of the elevated risk for severe disease relative to peers from the general population as the reference group. We have now framed this point as identification of individuals at increased susceptibility in the general population, e.g. in the title and the abstract of the revised manuscript.

The chosen study design was also deemed necessary due to the limited COVID-19 testing performed in the UK at the time of the analysis. At the time of submission, the COVID-19 testing was heavily focused on healthcare workers and those with severe symptoms. Therefore, the subset of UK biobank participants who were tested for COVID-19 were far from being a random sample of the study participants. This is known to cause bias and potentially spurious results, and it prevents us from robustly addressing the question of the risk for severe vs mild response for an infected individual. For instance, being a health care worker influences the likelihood of being tested at all, and in turn more healthcare workers than average end of being tested positive – if health care workers generally have more healthy biomarker profiles than mild cases who never get tested in the general population, it can inflate the differences between mild and severe cases.

The various biases introduced by different case-control designs of COVID-19 analyses in the UK Biobank have been described in detail recently (Griffith et al., Nature Communications 2020); we have now referred to this in the revised manuscript (p. 7 and 8). In the case of the suggested mild vs severe cases design the authors note: when sampling conditional on having a positive test for COVID-19 infection and testing is unlikely to be random, then conditioning on the positive test result introduces bias by factors causing severe infections as well as those causing increased likelihood of testing. The ascertainment bias induced by these factors would have therefore caused the associations to not reflect patterns in the general population of interest. Please also see our response above explaining the study question we aim to address with the chosen study design.

Finally, we note that the fraction of the population with severe COVID-19 in UK Biobank largely follow the hospitalisation rate in the UK, see Figure 1—figure supplement 1. Only a small fraction of hospitalised cases is likely to be missed (due to lack of inpatient positive PCR test); however, many mild cases continue to be missed throughout the pandemic, which further complicates addressing the study design and related scientific questions suggested by the reviewers.

4. In the case where one is using severe/non-severe, it should be acknowledged that the actual statistical power of the tru comparison is substantially lower that what is being used with the larger UK biobank samples.

Please see our response above explaining the reasoning behind the chosen study design.

We acknowledge that the statistical power is dictated by the number of cases rather than the total of the study population. We now further note this in the revised manuscript (p. 7).

The tripling of the number of severe COVID-19 cases in our analyses of the most recent UK Biobank alleviates this limitation. Nonetheless, we would like to emphasise that a sample size of >100,000 individuals is required to obtain sufficient number of cases in the prospective population-based study design that we present – and that this a major novelty compared to studies based on samples from already infected individuals.

5. Some approach to assess the robustness of these findings in an additional population would be ideal; however, it is acknowledged that this may not be feasible currently.

We agree that replication and validation of the findings are essential for biomarker studies. No replication data existed at the time of submission, owing to difficulties in accessing pre-pandemic blood samples with subsequent COVID-19 infection. This has now changed, with two sources providing replication and validation of our initial findings as detailed in a new paragraph on p. 6.

First, UK Biobank updates the information on COVID-19 hospitalisations on a monthly basis. This has more than tripled the number of cases with severe COVID-19 since the initial submission, from 195 to 652 individuals with metabolic biomarker data. When re-analysing the data using the enhanced number of severe COVID-19 cases, all association magnitudes are now similar or even stronger. This provides within-cohort replication of our results. Second, our findings have later on been supported by study on Belgian COVID-19 patient cohorts (Dierckx et al. 2020, MedrXiv preprint). The authors report strong associations with COVID-19 severity for the same set of metabolic biomarkers that we observe, e.g. inflammatory markers, amino acids and fatty acids. The multibiomarker score derived in the present study, “Infectious Disease Score”, was also found to be strongly associated with COVID-19 severity in these already infected patients (odds ratios of 4 and 4.5 with COVID-19 for having the most severe disease course, compared to mild-albeit-hospitalised disease course, in both patient cohorts with such data available). These results provide further support to the robustness of our findings.

6. Please advise specifically on this point – that the metabolomic data has been transparently documented prior to the linked UK biobank data being unlocked, so that it can be matched up with that present in the final linked dataset – ie verifies that authors were blinded and hasn't changed since unblinded. The UK biobank policies and procedures hopefully address these issues of research integrity.

The measurements of the metabolic biomarker data have been conducted blinded prior to linkage to the UK Biobank health outcomes. We have now clarified this point in the revised manuscript. We also emphasized that all the metabolic biomarker data analyzed for this paper are becoming available for the research community through UK Biobank, with data released on 23 March 2021 (p. 8 and 10).

We would like to note that the metabolic biomarker measurements have been undertaken in accordance with a prespecified protocol approved by UK Biobank’s biomarker advisory committee and UK Biobank laboratory. This includes detailed quality control assurance using blinded duplicate blood samples measured throughout the project. These aspects are detailed along with the release of the metabolic biomarker data for the research community by UK Biobank.

7. Please expand on clinical utility. The tests and score would need to be affordable and readily available and compliment existing predictors – not only cardiometabolic diseases but many others and specifically age and be cost effective.

We have toned down the wording on clinical utility throughout in the revised manuscript to account for the latest public health policy developments. Instead, we have emphasized the novel biological insights gained from the analyses of both pneumonia and severe COVID-19 risk.

The re-analyses of the COVID-19 data to the latest follow-up in UK Biobank, which more than tripled the number of cases with severe disease, reinforces the utility to identify high-risk individuals over and above existing risk factors. This includes age, which is accounted for in all the models. Nonetheless, the real-world clinical utility relies on the premise that the 3-fold increase in short-term risk observed for severe pneumonia, compared to long-term risk, also applies for risk of severe COVID-19. The inherently long follow-up time between UK Biobank blood sampling and COVID-19 prevents us from demonstrating this point with currently available data. We have acknowledged this limitation in the manuscript; further studies with shorter time between blood sampling and COVID-19 are under way, but accessing large amounts of samples with ascertained COVID-19 hospitalized and blood drawn prior to the pandemic is challenging.

With a view to translation, we would like to highlight that employed NMR platform has received various regulatory approvals, and the 37 biomarkers included in the present study have been certified for diagnostics use.

A key line in the introduction outlines the study's rationale as its potential value for risk stratification as "facilitating the identification of high-risk individuals MISSED BY CURRENT CLINICAL TOOLS". I think the authors should therefore address this point more explicitly when framing their results in the conclusion. To what extent do their results really support this statement? This could take the form of providing important context as to the predictive ability of existing or proposed clinical risk stratification tools. There has been an explosion of these in relation to COVID, many of which show that easily collected clinical and demographic information can predict very large differences in risk of hospitalization or death – eg Halalau et al. Annals of Internal Medicine quote an odds ratio of 19 for risk of hospitalization in their highest risk strata. So how much would metabolomic profiling really add to information much more readily and easily accessible without a blood test and expensive laboratory analysis?

We have now reworded the introduction and toned down the claims on potential clinical utility throughout, as described for the previous point. Despite the improved results in Figure 8 in the revised manuscript, the specific utility compared to other risk factors rely on the assumption of stronger associations for short-term risk of severe COVID-19, as demonstrated for short-term risk of severe pneumonia. We have also strongly clarified this point in the Abstract, Introduction and Discussion (p. 1, 2, 5 and 7).

We agree that many studies have suggested risk stratification tools for patients with acute COVID-19 infection, such as the cited paper by Halalau et al. That is very different from our study, which aims to identify high risk individuals prior to being infected, i.e. the susceptibility to a severe disease course in case they are infected. The risk differences are generally much smaller in this type of study setting. One of the most highly cited papers with such study setting is Williamson et al. Nature 2020, where they report hazard ratios of ~2-3 for many diseases that are commonly used to shift middle-aged people into the group of “high-risk individuals”, such as uncontrolled diabetes, class III obesity, and newly diagnosed cancer. The results we present on Figure 8 are of similar hazard ratios, and based on our hypothesis generating results for short-term risk of severe pneumonia, the odds ratios are likely to be considerably stronger magnitude for short-term risk.

8. The authors need to acknowledge the limitations of the clinical data that has been used to adjust comparisons. For instance they have adjusted for cardiac and respiratory disease but there is no mention of dementia, general "frailty, residence in a nursing home and measures of independent function which are all very highly associated with the risk of pneumonia. Could the high risk metabolic profiles lie on the same causal pathway and therefore just be surrogates for these easily-measured biological risk factors? Again, goes to the question of what metabolomic profiling could add to much more readily available clinical and demographic information.

We have now revised the analyses to also account for presence of dementia. The results were essentially unaltered because of this – in fact, none of the 652 severe COVID-19 cases had a dementia diagnosis at the time of blood sampling. This point reflects the general characteristics of the study population in the UK Biobank, with baseline age 40–70 and volunteer enrollment skewing the disease prevalence towards more healthy than average. All study participants had to be able to attend the assessment centres by their own means, and there was no enrollment at nursing homes. We have now clarified these points in the revised manuscript (p 4).

We agree that the metabolic biomarkers highlighted in our study correlate with ageing, and that they likely provide a molecular reflection of frailty, immune response and overall susceptibility to severe disease. However, the associations with future risk for pneumonia and severe COVID-19 were only weakly attenuated after accounting for prevalent cardiovascular disease, diabetes, lung cancer, COPD, liver diseases, renal failure and dementia – we are therefore convinced that they do reflect aspects of frailty and disease risk that is not easily captured by readily available clinical and demographic information.

https://doi.org/10.7554/eLife.63033.sa2

Article and author information

Author details

  1. Heli Julkunen

    Nightingale Health Plc, Helsinki, Finland
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    HJ is employee and holds stock options with Nightingale Health Plc.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4282-0248
  2. Anna Cichońska

    Nightingale Health Plc, Helsinki, Finland
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing - original draft, Project administration
    Competing interests
    AC is employee and holds stock options with Nightingale Health Plc.
  3. P Eline Slagboom

    1. Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
    2. Max Planck Institute for Biology of Ageing, Cologne, Germany
    Contribution
    Investigation, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Peter Würtz

    Nightingale Health Plc, Helsinki, Finland
    Contribution
    Conceptualization, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    peter.wurtz@nightingalehealth.com
    Competing interests
    PW is employee and shareholder of Nightingale Health Plc.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5832-0221
  5. Nightingale Health UK Biobank Initiative

    Nightingale Health Plc, Helsinki, Finland

Funding

Nightingale Health Plc

  • Heli Julkunen
  • Anna Cichońska
  • Peter Würtz

This work, including data collection, statistical analysis and writing of the paper, was done by employees of Nightingale Health Plc.

Acknowledgements

The authors are grateful to UK Biobank for access to data to undertake this study (Project #30418).

Ethics

Human subjects: The UK Biobank recruited 502 639 participants aged 37-70 years in 22 assessment centres across the UK. All participants provided written informed consent and ethical approval was obtained from the North West Multi-Center Research Ethics Committee. Details of the design of the UK Biobank have been reported previously (Sudlow et al PLOS Medicine 2015). The current analysis was approved under UK Biobank Project 30418.

Senior Editor

  1. Matthias Barton, University of Zurich, Switzerland

Reviewing Editor

  1. Edward D Janus, University of Melbourne, Australia

Reviewers

  1. Harin Karunajeewa, University of Western Australia, Australia
  2. Edward D Janus, University of Melbourne, Australia

Publication history

  1. Received: September 11, 2020
  2. Accepted: May 2, 2021
  3. Accepted Manuscript published: May 4, 2021 (version 1)
  4. Version of Record published: June 2, 2021 (version 2)

Copyright

© 2021, Julkunen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,021
    Page views
  • 116
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

  1. Further reading

Further reading

    1. Computational and Systems Biology
    2. Epidemiology and Global Health
    Ewan Colman et al.
    Research Article

    Humans and other group-living animals tend to distribute their social effort disproportionately. Individuals predominantly interact with a small number of close companions while maintaining weaker social bonds with less familiar group members. By incorporating this behavior into a mathematical model, we find that a single parameter, which we refer to as social fluidity, controls the rate of social mixing within the group. Large values of social fluidity correspond to gregarious behavior, whereas small values signify the existence of persistent bonds between individuals. We compare the social fluidity of 13 species by applying the model to empirical human and animal social interaction data. To investigate how social behavior influences the likelihood of an epidemic outbreak, we derive an analytical expression of the relationship between social fluidity and the basic reproductive number of an infectious disease. For species that form more stable social bonds, the model describes frequency-dependent transmission that is sensitive to changes in social fluidity. As social fluidity increases, animal-disease systems become increasingly density-dependent. Finally, we demonstrate that social fluidity is a stronger predictor of disease outcomes than both group size and connectivity, and it provides an integrated framework for both density-dependent and frequency-dependent transmission.

    1. Epidemiology and Global Health
    Daniel J Laydon et al.
    Research Article Updated

    Background:

    Sanofi-Pasteur’s CYD-TDV is the only licensed dengue vaccine. Two phase three trials showed higher efficacy in seropositive than seronegative recipients. Hospital follow-up revealed increased hospitalisation in 2–5- year-old vaccinees, where serostatus and age effects were unresolved.

    Methods:

    We fit a survival model to individual-level data from both trials, including year 1 of hospital follow-up. We determine efficacy by age, serostatus, serotype and severity, and examine efficacy duration and vaccine action mechanism.

    Results:

    Our modelling indicates that vaccine-induced immunity is long-lived in seropositive recipients, and therefore that vaccinating seropositives gives higher protection than two natural infections. Long-term increased hospitalisation risk outweighs short-lived immunity in seronegatives. Independently of serostatus, transient immunity increases with age, and is highest against serotype 4. Benefit is higher in seropositives, and risk enhancement is greater in seronegatives, against hospitalised disease than against febrile disease.

    Conclusions:

    Our results support vaccinating seropositives only. Rapid diagnostic tests would enable viable ‘screen-then-vaccinate’ programs. Since CYD-TDV acts as a silent infection, long-term safety of other vaccine candidates must be closely monitored.

    Funding:

    Bill & Melinda Gates Foundation, National Institute for Health Research, UK Medical Research Council, Wellcome Trust, Royal Society.

    Clinical trial number:

    NCT01373281 and NCT01374516.