Epigenetic scores for the circulating proteome as tools for disease prediction

  1. Danni A Gadd
  2. Robert F Hillary
  3. Daniel L McCartney
  4. Shaza B Zaghlool
  5. Anna J Stevenson
  6. Yipeng Cheng
  7. Chloe Fawns-Ritchie
  8. Cliff Nangle
  9. Archie Campbell
  10. Robin Flaig
  11. Sarah E Harris
  12. Rosie M Walker
  13. Liu Shi
  14. Elliot M Tucker-Drob
  15. Christian Gieger
  16. Annette Peters
  17. Melanie Waldenberger
  18. Johannes Graumann
  19. Allan F McRae
  20. Ian J Deary
  21. David J Porteous
  22. Caroline Hayward
  23. Peter M Visscher
  24. Simon R Cox
  25. Kathryn L Evans
  26. Andrew M McIntosh
  27. Karsten Suhre
  28. Riccardo E Marioni (corresponding author)
  1. Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, United Kingdom
  2. Department of Physiology and Biophysics, Weill Cornell Medicine-Qatar, Education City, Qatar
  3. Computer Engineering Department, Virginia Tech, United States
  4. Department of Psychology, University of Edinburgh, United Kingdom
  5. Lothian Birth Cohorts, University of Edinburgh, United Kingdom
  6. Centre for Clinical Brain Sciences, Chancellor’s Building, University of Edinburgh, United Kingdom
  7. Department of Psychiatry, University of Oxford, United Kingdom
  8. Department of Psychology, The University of Texas at Austin, United States
  9. Population Research Center, The University of Texas at Austin, United States
  10. Research Unit Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Germany
  11. Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Germany
  12. German Center for Cardiovascular Research (DZHK), partner site Munich Heart Alliance, Germany
  13. German Center for Diabetes Research (DZD), Germany
  14. Scientific Service Group Biomolecular Mass Spectrometry, Max Planck Institute for Heart and Lung Research, W.G. Kerckhoff Institute, Germany
  15. German Centre for Cardiovascular Research (DZHK), Partner Site Rhine-Main, Max Planck Institute of Heart and Lung Research, Germany
  16. Institute for Molecular Bioscience, University of Queensland, Australia
  17. Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, United Kingdom
  18. Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, United Kingdom
7 figures, 1 table and 2 additional files

Figures

Figure 1
EpiScores for plasma proteins as tools for disease prediction: study design.

DNA methylation scores were trained on 953 circulating plasma protein levels in the KORA and LBC1936 cohorts. Of these, 109 EpiScores were selected based on performance (r > 0.1, p < 0.05) in …
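The selection rule in this caption (r > 0.1 and p < 0.05) can be illustrated with a minimal sketch. It assumes the test-set correlation and p-value for each EpiScore have already been computed; the class name `EpiScoreResult`, the function `select_episcores`, the protein labels, and all numeric values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpiScoreResult:
    """Test-set performance of one EpiScore (hypothetical container)."""
    protein: str  # placeholder label, not a real protein name
    r: float      # correlation between EpiScore and measured protein level
    p: float      # p-value for that correlation


def select_episcores(results, r_min=0.1, p_max=0.05):
    """Keep EpiScores with r strictly above r_min and p strictly below p_max."""
    return [res for res in results if res.r > r_min and res.p < p_max]


# Made-up values for illustration only.
example = [
    EpiScoreResult("protein_A", 0.35, 0.001),  # passes both thresholds
    EpiScoreResult("protein_B", 0.08, 0.20),   # fails both thresholds
    EpiScoreResult("protein_C", 0.12, 0.07),   # fails the p threshold
    EpiScoreResult("protein_D", 0.10, 0.01),   # fails: r is not strictly > 0.1
]

selected = select_episcores(example)
print([res.protein for res in selected])
```

Both thresholds are treated as strict inequalities, following the caption's wording; an EpiScore sitting exactly on a boundary is excluded.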

Figure 2 with 2 supplements
Test performance for the 109 selected protein EpiScores.

Test set correlation coefficients for associations between protein EpiScores for (a) inflammatory Olink, (b) neurology Olink, and (c) SOMAmer protein panel EpiScores and measured protein levels are …

Figure 2—figure supplement 1
Correlation heatmap for protein EpiScore measures in Generation Scotland.

Correlation heatmap for EpiScore measures projected into Generation Scotland (N = 9537) for the 109 protein EpiScores selected in the test sample (r > 0.1, p < 0.05). At the top of the heatmap, an …

Figure 2—figure supplement 2
Gene set enrichment of canonical pathways common to the genes encoding the proteins used to train the 109 selected EpiScores.

Genes selected for pathway enrichment (false discovery rate [FDR]-adjusted p < 0.05) are summarised, with the proportion of overlapping genes enriched in the gene-set also shown. The corresponding …

Figure 3 with 1 supplement
Nested Cox proportional hazards assessment of protein EpiScore-disease prediction.

Mixed effects Cox proportional hazards analyses in Generation Scotland (n = 9537) tested the relationships between each of the 109 selected EpiScores and the incidence of 12 leading causes of …
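As a reading aid, a mixed effects Cox proportional hazards model of this general kind is often written in the form below. This is a generic sketch, not the study's exact specification: the covariate vector $\mathbf{z}_i$, the random effect $b_i$, and the covariance structure $\sigma^2 \mathbf{K}$ (for example, a kinship matrix to account for relatedness) are assumptions introduced here for illustration.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta\,\mathrm{EpiScore}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + b_i\right),
\qquad
\mathbf{b} \sim \mathcal{N}\!\left(\mathbf{0},\ \sigma^2 \mathbf{K}\right)
```

Here $h_0(t)$ is the baseline hazard, and $\exp(\beta)$ is the hazard ratio per unit increase in the EpiScore, holding the covariates and the random effect fixed.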

Figure 3—figure supplement 1
Phenotypic trait and estimated white blood cell proportion correlations with EpiScores.

Heatmap of Pearson's correlations (r) between the 70 protein EpiScore measures that were associated with incident disease (with p < 0.05 in the fully adjusted Cox mixed effects proportional hazards …

Figure 4
Protein EpiScore associations with incident disease.

EpiScore-disease associations for 9 of the 11 morbidities with associations where p < 0.05 in the fully adjusted mixed effects Cox proportional hazards models in Generation Scotland (n = 9537). …

Figure 5
Protein EpiScores that associated with the greatest number of morbidities.

EpiScores with a minimum of three relationships with incident morbidities in the fully adjusted Cox models. The network includes 16 EpiScores as dark blue (SOMAscan) and grey (Olink) nodes, with …

Figure 6
Replication of known protein-diabetes associations with protein EpiScores.

EpiScore-incident diabetes associations in Generation Scotland (n = 9537). The 34 SOMAscan (top panel) and four Olink (bottom panel) associations shown with p < 0.05 in fully adjusted mixed effects …

Author response image 1

Tables

Table 1
Incident morbidities in the Generation Scotland cohort.

Counts are provided for the number of cases and controls for each incident trait in the basic and fully adjusted Cox models run in the Generation Scotland cohort (n = 9537). Mean time-to-event is …

Morbidity | Basic model: N cases | Basic model: N controls | Basic model: years to event, mean (SD) | Fully adjusted model: N cases | Fully adjusted model: N controls | Fully adjusted model: years to event, mean (SD)
Rheumatoid arthritis | 63 | 9289 | 5.6 (3.5) | 52 | 7742 | 6.1 (3.3)
Alzheimer’s dementia | 69 | 3764 | 7.7 (3) | 52 | 3137 | 7.6 (3.1)
Bowel cancer | 78 | 9398 | 6.4 (3.2) | 66 | 7817 | 6.5 (3.2)
Depression | 95 | 8317 | 4 (3.2) | 75 | 6984 | 3.8 (3.2)
Lung cancer | 100 | 9433 | 5.6 (3.2) | 78 | 7850 | 5.6 (3.1)
Breast cancer | 131 | 5356 | 6.1 (3.4) | 111 | 4402 | 5.9 (3.4)
Inflammatory bowel disease | 194 | 9114 | 5 (3.6) | 155 | 7592 | 4.8 (3.6)
Stroke | 313 | 9026 | 6.4 (3.4) | 246 | 7547 | 6.3 (3.5)
COPD | 322 | 8960 | 5.5 (3.4) | 253 | 7476 | 5.5 (3.5)
Ischaemic heart disease | 385 | 8649 | 5.6 (3.4) | 302 | 7251 | 5.7 (3.4)
Diabetes | 429 | 8757 | 5.6 (3.4) | 322 | 7332 | 5.5 (3.4)
Pain | 1329 | 5480 | 4.8 (3.5) | 1081 | 4593 | 4.9 (3.5)
  1. COPD: chronic obstructive pulmonary disease.

Additional files

Supplementary file 1

Demographic information and supplementary datasets.

(A) Demographic and array information for the cohorts and samples used in the study.
(B) SomaScan panel EpiScore performance in the Stratifying Resilience and Depression Longitudinally (STRADL) test set.
(C) Performance of Olink panel EpiScores in holdout, STRADL, and LBC1921 test sets.
(D) Annotations for the proteins corresponding to the 109 selected EpiScores.
(E) Predictor weights for the 109 EpiScores from Olink and SomaScan panels that passed testing in independent cohorts.
(F) CpG feature counts for the 109 selected EpiScores.
(G) Frequency of CpG sites selected for EpiScores with EWAS catalog annotations to phenotypic traits.
(H) FUMA canonical pathway gene set enrichment for the genes encoding the 109 proteins on which the EpiScores were trained.
(I) Basic Cox proportional hazards model results in Generation Scotland.
(J) Fully adjusted and sensitivity analysis results for Cox proportional hazards models in Generation Scotland.
(K) Schoenfeld residual Cox sensitivity analyses.
(L) Schoenfeld residual Cox sensitivity analyses split by year of follow-up.
(M) SOMAscan-EpiScore diabetes association lookup against three large-scale plasma protein-diabetes studies.
(N) White blood cell sensitivity analyses.
(O) GrimAge sensitivity analyses.
(P) COVID-19 analyses.
Q-1B1 Primary and secondary diagnosis codes for each of the 12 morbidities in this study that were used to assign case/control status of participants.

https://cdn.elifesciences.org/articles/71802/elife-71802-supp1-v3.xlsx

Transparent reporting form

https://cdn.elifesciences.org/articles/71802/elife-71802-transrepform1-v3.pdf
