Epigenome-wide analysis of DNA methylation and coronary heart disease: a nested case-control study

  1. Jiahui Si
  2. Songchun Yang
  3. Dianjianyi Sun
  4. Canqing Yu
  5. Yu Guo
  6. Yifei Lin
  7. Iona Y Millwood
  8. Robin G Walters
  9. Ling Yang
  10. Yiping Chen
  11. Huaidong Du
  12. Yujie Hua
  13. Jingchao Liu
  14. Junshi Chen
  15. Zhengming Chen
  16. Wei Chen
  17. Jun Lv  Is a corresponding author
  18. Liming Liang  Is a corresponding author
  19. Liming Li  Is a corresponding author
  20. China Kadoorie Biobank Collaborative Group
  1. Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, China
  2. Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, United States
  3. Chinese Academy of Medical Sciences, China
  4. Department of Urology, West China Hospital, Sichuan University, China
  5. Medical Research Council Population Health Research Unit at the University of Oxford, United Kingdom
  6. Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, United Kingdom
  7. NCDs Prevention and Control Department, Suzhou CDC, China
  8. NCDs Prevention and Control Department, Wuzhong CDC, China
  9. China National Center for Food Safety Risk Assessment, China
  10. Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, United States
  11. Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Ministry of Education, China
  12. Peking University Institute of Environmental Medicine, China

Abstract

Background:

Identifying environmentally responsive genetic loci where DNA methylation is associated with coronary heart disease (CHD) may reveal novel pathways or therapeutic targets for CHD. We conducted the first prospective epigenome-wide analysis of DNA methylation in relation to incident CHD in the Asian population.

Methods:

We conducted a nested case-control study comprising incident CHD cases and 1:1 matched controls identified from the 10-year follow-up of the China Kadoorie Biobank. Methylation level of baseline blood leukocyte DNA was measured by Infinium Methylation EPIC BeadChip. We performed single cytosine-phosphate-guanine (CpG) site association analysis and a network approach to identify CHD-associated CpG sites and co-methylation gene modules.

Results:

After quality control, 982 participants (mean age 50.1 years) were retained. Methylation level at 25 CpG sites across the genome was associated with incident CHD (genome-wide false discovery rate [FDR] < 0.05 or module-specific FDR < 0.01). One SD increase in methylation level of identified CpGs was associated with differences in CHD risk, ranging from a 47% decrease to a 118% increase. Mediation analyses revealed that 28.5% of the excess CHD risk associated with smoking was mediated by methylation level at the promoter region of the ANKS1A gene (P for mediation effect = 0.036). Methylation level at the promoter region of SNX30 was associated with blood pressure and subsequent risk of CHD, with 7.7% of the association mediated via systolic blood pressure (P = 0.003) and 6.4% via diastolic blood pressure (P = 0.006). Network analysis revealed a co-methylation module associated with CHD.

Conclusions:

We identified novel blood methylation alterations associated with incident CHD in the Asian population and provided evidence of the possible role of epigenetic regulations in the smoking- and blood pressure-related pathways to CHD risk.

Funding:

This work was supported by the National Natural Science Foundation of China (81390544 and 91846303). The CKB baseline survey and the first re-survey were supported by a grant from the Kadoorie Charitable Foundation in Hong Kong. The long-term follow-up is supported by grants from the UK Wellcome Trust (202922/Z/16/Z, 088158/Z/09/Z, 104085/Z/14/Z), grants (2016YFC0900500, 2016YFC0900501, 2016YFC0900504, 2016YFC1303904) from the National Key R&D Program of China, and the Chinese Ministry of Science and Technology (2011BAI09B01).

Introduction

Coronary heart disease (CHD) is one of the leading causes of morbidity and mortality worldwide (GBD 2017 Causes of Death Collaborators, 2018). Despite known environmental risk factors and the identification of genetic variations, a considerable proportion of the observed CHD risk remains unexplained (Deloukas et al., 2013).

Methylation at cytosine-phosphate-guanine (CpG) dinucleotides is a common epigenetic modification of DNA (Deaton and Bird, 2011), which forms an interface between the genotype and the environment (Rosa-Garrido et al., 2018). DNA methylation is responsive to environmental stimuli and unhealthy lifestyles, including smoking (McCartney et al., 2018), alcohol consumption (Liu et al., 2018), and obesity (Wahl et al., 2017). This makes DNA methylation a potential biomarker of environment-related and lifestyle-driven diseases of adulthood, for example, metabolic dysfunction (including hypertension (Richard et al., 2017), diabetes (Chambers et al., 2015), and atherogenic dyslipidemia (Irvin et al., 2014)). Unhealthy lifestyles, together with metabolic dysfunction, further increase the risk of cardiovascular disease. Investigating the environmentally responsive DNA methylation changes linked to CHD could provide insights into the underlying mechanisms and identify novel clinical biomarkers and therapeutic targets for CHD.

Previous epigenome-wide analyses of DNA methylation and CHD were limited by small sample sizes (Silvio et al., 2014; Nakatochi et al., 2017; Guarrera et al., 2015; Sharma et al., 2014; Li et al., 2017; Yamada et al., 2014), were based primarily in Western countries (Silvio et al., 2014; Guarrera et al., 2015; Yamada et al., 2014; Golareh et al., 2019; Liu et al., 2017; Fernández-Sanlés et al., 2018; Rask-Andersen et al., 2016), focused on selected genomic regions (Guarrera et al., 2015; Sharma et al., 2014), or were cross-sectional, which precludes establishing any temporal relationship (Silvio et al., 2014; Nakatochi et al., 2017; Sharma et al., 2014; Li et al., 2017; Yamada et al., 2014; Liu et al., 2017; Fernández-Sanlés et al., 2018; Rask-Andersen et al., 2016). Only a few prospective studies have been conducted, and these were in white populations (Guarrera et al., 2015; Golareh et al., 2019).

We examined the association between epigenome-wide methylation of blood-derived DNA and CHD risk over the next 10 years, by comparing prospectively ascertained CHD cases with 1:1 matched controls in the China Kadoorie Biobank (CKB). We then examined the relationships between the identified CHD-associated methylation sites and cardiovascular risk factors, and further identified potential pathways by causal mediation analysis. The overall analysis flowchart is provided in Figure 1.

Figure 1. Flowchart of the present study. CHD = coronary heart disease; QC = quality control; CpG = cytosine-phosphate-guanine; FDR = false discovery rate.

Materials and methods

Study population

The CKB is a prospective cohort of 512,715 adults aged 30–79 years, recruited during 2004–2008 from 10 geographically diverse areas across China (five urban and five rural areas). Details of the study design, survey methods, and long-term follow-up have been given elsewhere (Chen et al., 2011). Briefly, all participants completed laptop-based questionnaires (covering sociodemographic and lifestyle factors, and medical and medication history) and physical measurements (including body weight, height, and blood pressure). Participants also provided a 10 ml random blood sample for an immediate on-site test of random plasma glucose and for long-term storage, with the time since last meal recorded. Mortality and morbidity during follow-up were identified through linkage with local death and disease registries, with the national health insurance system, and by active follow-up if necessary (i.e., visiting local communities or directly contacting participants).

Study design

Baseline DNA methylation was measured for 494 CHD cases, whose CHD occurred during the follow-up period until 31 December 2015, and 494 matched controls. All these participants were free of heart disease, stroke, or cancer at baseline. They also had clinical chemistry measured in baseline plasma samples, including total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (Wolfson Laboratory at University of Oxford, UK).

Incident CHD cases were defined as fatal ischemic heart disease (IHD; ICD-10 codes I20-I25) and nonfatal acute myocardial infarction (I21). Diagnosis adjudication, by review of hospital medical records, has been completed for 134 reported cases. Overall, 90% of the CHD diagnoses were confirmed. Cases were excluded if they developed malignant neoplasms (C00-C97) or cerebrovascular diseases (I60-I69) during follow-up. Each case was individually matched to one control who was free of IHD, malignant neoplasms, and cerebrovascular diseases throughout follow-up. Controls were matched to cases by birth year (±3 years), age at baseline (±3 years), sex, study area, and hours fasting prior to blood draw at baseline (0 to <6, 6 to <8, 8 to <10, and ≥10 hours).
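The matching criteria above can be expressed as a simple eligibility check. The following is a minimal sketch only: the dictionary keys (`birth_year`, `age`, `sex`, `area`, `fasting_hours`) and both function names are hypothetical and are not taken from the study's own pipeline.

```python
def fasting_category(hours):
    """Map hours since last meal to the four matching strata."""
    if hours < 6:
        return "0-<6"
    if hours < 8:
        return "6-<8"
    if hours < 10:
        return "8-<10"
    return ">=10"


def is_eligible_control(case, control):
    """True if the control satisfies every matching criterion for the case."""
    return (
        abs(case["birth_year"] - control["birth_year"]) <= 3
        and abs(case["age"] - control["age"]) <= 3
        and case["sex"] == control["sex"]
        and case["area"] == control["area"]
        and fasting_category(case["fasting_hours"])
        == fasting_category(control["fasting_hours"])
    )
```

In a 1:1 design, one control would then be drawn from the eligible set for each case; how ties were resolved is not described in the text.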

Measurement of DNA methylation

For 494 pairs of CHD cases and controls, epigenome-wide methylation level of baseline blood leukocyte DNA was measured by Infinium Methylation EPIC BeadChip (Illumina, USA), which interrogates ~850,000 CpG sites (BGI, China). Although the laboratory staff were blinded to case/control status, the cases and controls were not strictly randomized on arrays.

We used the minfi package (RRID:SCR_012830) to process methylation data. CpG sites were excluded if they: (1) were assayed SNPs rather than CpGs (n = 59); (2) had bead count <3 in 5% of samples (n = 1,644), or had >1% of samples with a detection P > 0.05 (n = 2,536); (3) overlapped with SNPs in the 1000 Genomes Project (20130502 release) with minor allele frequency in the East Asian population >0.05 at the target CpG sites, single base extension sites of Type I probes, or the probe body (Pidsley et al., 2016); or (4) possibly cross-hybridized to other genomic locations (Pidsley et al., 2016) (criteria 3 and 4 together covered 132,762 sites). Samples were excluded if they: (1) were outliers detected by multidimensional scaling analysis (n = 0); (2) had a sex mismatch (n = 2); (3) had a missing rate >0.01 across CpG sites (n = 2); or (4) were measured in a distinct study batch (n = 2).
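As an illustration of probe-level criterion (2), the sketch below flags a probe from its per-sample detection P values and bead counts. It is not the minfi implementation: the function name and arguments are hypothetical, and it assumes that "bead count <3 in 5% of samples" means at least 5% of samples.

```python
def probe_fails_qc(detection_p, bead_counts, p_cut=0.05, max_undetected_frac=0.01,
                   min_beads=3, max_low_bead_frac=0.05):
    """Flag one probe given per-sample detection P values and bead counts."""
    n = len(detection_p)
    # Fraction of samples in which the probe signal is not distinguishable from background.
    frac_undetected = sum(p > p_cut for p in detection_p) / n
    # Fraction of samples with too few beads (assumed threshold: at least 5%).
    frac_low_beads = sum(b < min_beads for b in bead_counts) / n
    return frac_undetected > max_undetected_frac or frac_low_beads >= max_low_bead_frac
```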

After quality control, 982 of 988 samples with 747,726 CpG sites were retained. Also, we randomly chose 11 samples (one sample per plate) for duplicate measurements. The correlation for duplicate measurements on the same sample ranged from 0.992 to 0.997.

Assessment of covariates

In the baseline questionnaire, for smoking, we asked ever smokers about the frequency, type, and amount of tobacco smoked per day, and former smokers about their reason for quitting. We included former smokers who had stopped smoking because of illness in the current smoker category to avoid a misleadingly elevated risk. We then calculated the current average number of cigarette equivalents consumed per day. For alcohol consumption, we asked about weekly drinking frequency, type of alcoholic beverage, and volume of alcohol consumed on a typical drinking day. We calculated the average pure alcohol volume consumed per day. For physical activity, we asked about the usual type and duration of activities. The daily level of physical activity was calculated by multiplying the metabolic equivalent of task (MET) value for a particular type of physical activity by the hours spent on that activity per day and summing the MET-hours for all activities. For dietary habits, we used a short qualitative food frequency questionnaire to assess habitual intakes of 12 conventional food groups that are mainly addressed in the Chinese dietary guidelines (2016). We then calculated a diet score, awarding one point for each of the following: consuming fresh vegetables every day, fresh fruits every day, red meat <7 days/week, soybean products ≥4 days/week, fish ≥1 day/week, and coarse grains ≥4 days/week. We summed the six item scores to obtain the total diet score. Trained staff measured weight and height with calibrated instruments. Body mass index (BMI) was calculated as weight in kilograms divided by the square of the height in meters.
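The derived lifestyle variables can be sketched as follows. This is an illustration under stated assumptions: the questionnaire recorded qualitative frequency categories, whereas the hypothetical `DietRecord` below stores intake as days per week, and all names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DietRecord:
    """Intake frequency in days per week (illustrative field names)."""
    vegetables: float
    fruit: float
    red_meat: float
    soybean: float
    fish: float
    coarse_grains: float


def diet_score(d):
    """One point per criterion met, giving a total between 0 and 6."""
    criteria = [
        d.vegetables >= 7,     # fresh vegetables every day
        d.fruit >= 7,          # fresh fruit every day
        d.red_meat < 7,        # red meat on fewer than 7 days/week
        d.soybean >= 4,        # soybean products at least 4 days/week
        d.fish >= 1,           # fish at least 1 day/week
        d.coarse_grains >= 4,  # coarse grains at least 4 days/week
    ]
    return sum(criteria)


def met_hours_per_day(activities):
    """Sum MET value x hours/day over (met_value, hours_per_day) pairs."""
    return sum(met * hours for met, hours in activities)


def body_mass_index(weight_kg, height_m):
    """Weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2
```

For example, a participant who eats vegetables daily but meets no other criterion scores 1, and two activities of 3.0 METs for 2 hours and 1.5 METs for 4 hours sum to 12 MET-hours per day.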

Statistical analysis

Single DNA methylation marker and incident CHD

In the epigenome-wide analysis, raw methylation matrices were normalized using the dasen method in the wateRmelon package (RRID:SCR_001296). Linear regression was applied for single-marker tests, with the beta-values of methylation as dependent variables, CHD as an indicator, and age (years), sex (male or female), study area (10 areas), fasting time (<8 or ≥8 hr), education level (no formal or primary school, middle or high school, technical school or college or higher), marital status (married or not), smoking (current average number of cigarette equivalents consumed per day), alcohol consumption (average pure alcohol volume consumed per day), physical activity (MET-hours), diet score (continuous variable ranging from 0 to 6), and BMI (kg/m2) as covariates. To quantify latent factors, including unobserved batch effects, cell compositions, and other unmeasured confounding factors, we used smart surrogate variable analysis in the smartSVA package (Chen et al., 2017). This method has been reported to be fast and robust for removing batch effects while preserving power (Brägelmann and Lorenzo Bermejo, 2019). Variables considered in the surrogate variable (SV) analysis included case or control status and all covariates. A total of 56 SVs were generated and also included as covariates in the above model. We used a false discovery rate (FDR) < 0.05 to determine epigenome-wide significant CpGs in relation to CHD. We annotated CpGs to genes based on the official EPIC array annotation file from Illumina, 2017.
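The FDR step can be sketched as below. The text does not name the FDR procedure, so this example assumes the Benjamini-Hochberg adjustment; the function name is illustrative and the input is simply the list of single-marker P values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under this assumption, the top-ranked site among 747,726 tests with P = 1.57E-08 would have an adjusted value of about 1.57E-08 × 747,726 ≈ 0.012, which agrees with the FDR reported for cg23398826 in Table 2.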

In sensitivity analysis, we excluded 100 participants who reported usage of blood pressure-lowering drugs at baseline to avoid potential confounding effect caused by medications.

Weighted gene co-methylation network and incident CHD

We also used a network approach to first identify CHD-related co-methylation network modules and then CHD-related CpGs within the discovered modules. We used weighted gene co-methylation network analysis (R package WGCNA, RRID:SCR_003302) to identify potential co-methylation networks related to CHD. To ensure computational feasibility, we selected the top 20,000 CHD-associated CpGs from single-marker tests. This is about the maximum number of CpGs the WGCNA package can handle on our high performance computing cluster. Two samples were outliers and were excluded during sample clustering. We used the function “blockwiseModules” with a minimum module size of 30 sites to construct the network automatically. Modules were created and merged with the mergeCutHeight set to 0.25. We then identified modules that were statistically significantly associated with CHD using the module eigengene (the first principal component of the given module), with the same set of covariates as in the individual CpG association analysis. After detection of CHD-associated modules, we visualized the network module and its hub genes to depict the connections among the annotated genes using VisANT 5.0 (http://visant.bu.edu/). Because the module was rather large, we restricted the genes used in the visualization to the annotated genes of the 24 CpGs with module-specific FDR < 0.01.
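To make the module eigengene concrete, the sketch below computes the first principal component scores of a samples-by-CpGs matrix using power iteration in plain Python. It is not the WGCNA code: the function name is hypothetical, it assumes no CpG column is constant, and the sign convention (oriented along the average standardized profile) is a choice made for this example.

```python
import math
import random


def module_eigengene(matrix, n_iter=200, seed=0):
    """First principal component scores (unit length) of a samples x CpGs matrix."""
    n, p = len(matrix), len(matrix[0])
    # Standardize each CpG column to mean 0 and unit variance.
    cols = []
    for j in range(p):
        col = [row[j] for row in matrix]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        cols.append([(x - mean) / sd for x in col])
    # Sample-by-sample Gram matrix of the standardized data.
    gram = [[sum(c[a] * c[b] for c in cols) for b in range(n)] for a in range(n)]
    # Power iteration converges to the leading eigenvector of the Gram matrix.
    rng = random.Random(seed)
    vec = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(n_iter):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # The sign of a principal component is arbitrary; orient it along the
    # average standardized methylation profile.
    avg = [sum(c[i] for c in cols) / p for i in range(n)]
    if sum(a * v for a, v in zip(avg, vec)) < 0:
        vec = [-v for v in vec]
    return vec
```

The resulting vector has one value per sample and can then serve as the dependent variable in a regression on CHD status and covariates.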

To ensure that the selection of the top 20,000 CpGs did not inflate false positives in the CHD-module association, we carried out a permutation-based test: we shuffled the case-control status, re-selected the top 20,000 CpGs based on the permuted data, constructed modules, and tested them for association with CHD. In the permutation test, we found no inflated false positives due to the selection of the top 20,000 CpGs (the most significant module had P > 0.032, Figure 2—figure supplement 1).
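The logic of a label-permutation test can be sketched as follows. This is a generic illustration, not the authors' pipeline: `statistic` is a hypothetical callable that would rerun every label-dependent step (CpG selection, module construction, association testing) and return a number where larger means stronger association. The toy statistic `abs_mean_difference` is included only so the example runs.

```python
import random


def permutation_pvalue(labels, statistic, n_perm=1000, seed=0):
    """Empirical p-value of statistic(labels) against shuffled labels."""
    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


def abs_mean_difference(values):
    """Build a toy statistic: |mean(cases) - mean(controls)| for given labels."""
    def stat(labels):
        cases = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))
    return stat
```

The key design point is that selection happens inside the permuted pipeline, so any optimism introduced by picking the top CpGs is reproduced under the null.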

For CHD-associated modules (P < 0.05/the number of modules, Bonferroni correction), we performed gene enrichment analysis using the list of annotated genes from this module (DAVID, https://david.ncifcrf.gov/) (Huang et al., 2007), and further determined the significant CHD related CpGs within the module (module-specific FDR < 0.01). For CHD-associated loci, we further fitted logistic regression adjusting for the same set of covariates and all SVs to interpret the effect size better.

Association between CHD-associated CpGs and cardiovascular risk factors

We investigated the associations between lifestyle factors and CHD-associated CpGs, with the methylation value as the outcome. Lifestyle factors included smoking, alcohol consumption, physical activity, dietary habit, and BMI.

Where a lifestyle-methylation association was nominally significant (P < 0.05), we performed causal mediation analysis using parametric regression models, implemented in the paramed package in STATA (RRID:SCR_012763). Two models were estimated for each CpG: (1) a model for the mediator (methylation level as a continuous variable) conditional on exposure (the corresponding lifestyle factor) and covariates (age, sex, study area, fasting time, education level, marital status, the other four lifestyle factors, and batch); (2) a model for the risk of CHD conditional on exposure, the mediator, and covariates. We allowed for the presence of exposure-mediator interactions in the outcome regression model.

We aimed to calculate how much of the CHD risk associated with lifestyle factors (total effect, TE) was attributable to mediating effect of methylation level at a specific locus (natural indirect effect, NIE). The proportion attributable to the NIE was calculated as NIE divided by TE on log odds scale, with 0 indicating no mediation effect.
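The proportion mediated described above can be written as a one-line calculation. The sketch assumes the mediation software reports the natural indirect effect and the total effect as odds ratios; the function name and inputs are illustrative.

```python
import math


def proportion_mediated(or_nie, or_te):
    """NIE divided by TE on the log odds scale; 0 means no mediation."""
    log_te = math.log(or_te)
    if log_te == 0:
        raise ValueError("total effect is null, so the proportion is undefined")
    return math.log(or_nie) / log_te
```

A negative value arises when the indirect and total effects point in opposite directions, which is how the negative proportions in Tables 3 and 4 can be read.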

We also investigated the association between CHD-associated CpGs and cardiometabolic traits, with the cardiometabolic traits as the outcome. Cardiometabolic traits included systolic blood pressure (SBP), diastolic blood pressure (DBP), blood lipid level (TC, LDL-C, HDL-C, and TG), and random glucose.

For CpGs which were statistically significantly associated with any of the cardiometabolic traits, we calculated the mediation effect of methylation level on CHD through a specific cardiometabolic trait. Two models were estimated for each CpG: (1) a model for the mediator (the corresponding cardiometabolic risk factor) conditional on exposure (methylation level) and covariates (age, sex, study area, fasting time, education level, marital status, five lifestyle factors, and batch); (2) a model for the risk of CHD conditional on exposure, the mediator, and covariates.

In the analysis of blood pressure, we added 15 and 10 mmHg to the measured SBP and DBP respectively among participants who reported usage of blood pressure-lowering medications. In the analysis of random glucose, we additionally adjusted for treatment of diabetes at baseline.
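The medication adjustment is a fixed offset, sketched here with a hypothetical function name; the default offsets are the 15 and 10 mmHg stated above.

```python
def adjust_blood_pressure(sbp, dbp, on_bp_medication, sbp_add=15.0, dbp_add=10.0):
    """Add fixed offsets (mmHg) to measured SBP and DBP for treated participants."""
    if on_bp_medication:
        return sbp + sbp_add, dbp + dbp_add
    return sbp, dbp
```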

We adjusted for batch IDs instead of SVs in the CpG-CHD risk factor association analysis and the corresponding mediation analysis because SV adjustment was more appropriate when the methylation value was treated as the outcome.

Results

Baseline DNA methylation was measured for 494 CHD cases and 494 matched controls. After quality control, we included 491 cases who were free of CHD at baseline and developed CHD during follow-up, and 491 controls who were free of CHD at baseline and throughout follow-up and were matched for birth year, age at baseline, sex, study area, and hours fasting prior to blood draw.

The mean age was 50.6 ± 7.6 years for incident CHD cases and 49.5 ± 7.3 years for matched controls. Compared with control participants, the CHD cases were more likely to be daily smokers, have unhealthy dietary habits, and have higher BMI. CHD cases also had a higher prevalence of hypertension and diabetes and worse lipid profile at baseline (Table 1).

Table 1
Age-, sex- and study area-adjusted baseline characteristics of 982 participants according to the case or control status.
Baseline characteristics | Cases (n = 491) | Controls (n = 491) | P value
Age, year | 50.6 | 49.5 | -
Female, % | 43.6 | 43.6 | -
Urban area, % | 20.6 | 20.6 | -
Middle school and above, % | 43.4 | 45.6 | 0.730
Married, % | 90.4 | 94.7 | 0.028
Family history of heart attack, % | 6.9 | 4.7 | 0.127
Fasting time, h | 4.0 | 4.0 | -
Lifestyle factors
Daily tobacco smoker, % | 46.6 | 40.3 | 0.004
Daily alcohol drinker, % | 9.0 | 10.0 | 0.455
Physical activity, MET-h/day | 22.0 | 23.9 | 0.097
Diet score | 2.3 | 2.5 | 0.001
Vegetables 7 days/week, % | 92.7 | 91.0 | 0.278
Fruit 7 days/week, % | 9.4 | 13.8 | 0.030
Red meat <7 days/week, % | 79.2 | 80.0 | 0.600
Soybean product ≥4 days/week, % | 5.9 | 9.6 | 0.026
Fish ≥1 days/week, % | 24.6 | 28.9 | 0.022
Coarse grains ≥4 days/week, % | 22.8 | 24.6 | 0.047
Body mass index, kg/m2 | 23.9 | 23.3 | 0.002
Metabolic risk factors
Prevalent hypertension, % | 52.5 | 29.9 | <0.001
Prevalent diabetes, % | 10.0 | 4.5 | 0.004
Blood lipids
Total cholesterol, mmol/L | 4.69 | 4.52 | 0.005
LDL-C, mmol/L | 2.35 | 2.21 | 0.003
HDL-C, mmol/L | 1.22 | 1.18 | 0.025
Triglyceride, mmol/L | 2.20 | 2.01 | 0.064
  1. The results are presented as means or percentages. P values are not shown for matched factors. MET = metabolic equivalent of task; LDL-C = low-density lipoprotein cholesterol; HDL-C = high-density lipoprotein cholesterol.

Association between single DNA methylation marker and incident CHD

EWAS revealed an excess of association across a range of P thresholds (Supplementary file 2A). The genomic inflation factor was 1.09 after adjustment for covariates and surrogate variables (SV). The Quantile-Quantile plot (Q-Q plot) indicated little residual confounding (Supplementary file 2B). Methylation markers at two genetic regions were associated with incident CHD at FDR < 0.05 (Table 2 and Supplementary file 2B). The corresponding p-value of the FDR = 0.05 threshold was 2.01E-07. The adjusted difference (standard error, SE) in methylation level between cases and controls was –0.003 (0.0006) for cg23398826 (P = 1.57E-08), which was annotated to SNX30. The SD of the beta value of cg23398826 was 0.008. The odds ratio (OR) (95% confidence interval [CI]) for incident CHD was 0.56 (0.45, 0.70) per SD increase in methylation level at cg23398826. The corresponding adjusted difference (SE) for cg02386575 was 0.006 (0.0011; P = 9.61E-08), annotated to IMPDH2 and QRICH1 (Table 2). The SD was 0.016. The OR (95% CI) per SD increase in cg02386575 was 2.00 (1.57, 2.56).
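The text does not state how the genomic inflation factor was computed. A common definition, assumed here, is the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). The function name is illustrative and all P values are assumed to lie strictly between 0 and 1 or equal 1.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Median observed 1-df chi-square over its expected null median."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to z = inv_cdf(p / 2); chi-square = z^2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median
```

Values close to 1 indicate that the bulk of the test statistics follow the null distribution; a value such as 1.09 indicates mild inflation.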

Table 2
Associations of 25 significant CpGs with the risk of coronary heart disease.
Chr | Position (hg19) | CpG | SD | Gene | Relation to gene | EWAS β | EWAS P | EWAS FDR | WGCNA* module-specific FDR | Odds Ratio (95% CI)
9 | 115513036 | cg23398826 | 0.008 | SNX30 | TSS200 | –0.003 | 1.57E-08 | 0.012 | 1.05E-04 | 0.56 (0.45, 0.70)
3 | 49068057 | cg02386575 | 0.016 | IMPDH2; QRICH1 | TSS1500; Body | 0.006 | 9.61E-08 | 0.036 | 2.02E-04 | 2.00 (1.57, 2.56)
19 | 37329330 | cg10400937 | 0.007 | ZNF790 | TSS200 | 0.002 | 1.09E-05 | 0.288 | 0.009 | 1.53 (1.24, 1.89)
12 | 131758671 | cg20562821 | 0.022 | (RPS6P20§) |  | 0.005 | 2.42E-05 | 0.288 | 0.009 | 1.72 (1.28, 2.31)
6 | 34855635 | cg08106661 | 0.016 | TAF11; ANKS1A | 1stExon; TSS1500 | 0.003 | 3.16E-05 | 0.305 | 0.009 | 1.87 (1.35, 2.59)
1 | 153203211 | cg11630610 | 0.019 | (MIR584§) |  | 0.005 | 3.83E-05 | 0.329 | 0.009 | 1.77 (1.36, 2.32)
1 | 8426319 | cg20302171 | 0.018 | RERE | 5'UTR | –0.004 | 4.29E-05 | 0.340 | 0.009 | 0.55 (0.42, 0.73)
11 | 63909324 | cg26334131 | 0.025 | MACROD1 | Body | –0.005 | 4.44E-05 | 0.340 | 0.009 | 0.53 (0.38, 0.73)
20 | 2444631 | cg07560408 | 0.018 | SNORD119; SNRPB | TSS1500; Body | –0.005 | 4.46E-05 | 0.340 | 0.009 | 0.60 (0.47, 0.77)
19 | 46522185 | cg21210537 | 0.027 | MIR769 | TSS200 | 0.004 | 4.85E-05 | 0.356 | 0.009 | 2.18 (1.47, 3.23)
20 | 60546782 | cg15833447 | 0.021 | (TAF4§) |  | 0.006 | 5.55E-05 | 0.375 | 0.009 | 1.50 (1.20, 1.88)
11 | 94963255 | cg02591826 | 0.005 | LOC100129203 | TSS200 | 0.002 | 5.70E-05 | 0.375 | 0.009 | 1.52 (1.23, 1.87)
7 | 100861083 | cg16639138 | 0.006 | ZNHIT1; PLOD3 | 5'UTR/1stExon; TSS200 | 0.002 | 6.46E-05 | 0.375 | 0.009 | 1.52 (1.24, 1.86)
6 | 27863042 | cg01545454 | 0.007 | (HIST1H2BO§) |  | 0.002 | 7.29E-05 | 0.378 | 0.009 | 1.64 (1.26, 2.13)
1 | 203242409 | cg07219103 | 0.008 | (CHIT1§) |  | 0.002 | 7.35E-05 | 0.378 | 0.009 | 1.78 (1.28, 2.47)
22 | 23994996 | cg05681643 | 0.018 | GUSBP11 | Body | 0.004 | 7.42E-05 | 0.378 | 0.009 | 1.60 (1.24, 2.08)
2 | 88991375 | cg06358566 | 0.009 | RPIA | 1stExon | –0.002 | 7.74E-05 | 0.385 | 0.009 | 0.62 (0.48, 0.80)
2 | 162273185 | cg19583211 | 0.016 | TBR1 | 1stExon | –0.003 | 7.97E-05 | 0.385 | 0.009 | 0.56 (0.41, 0.77)
20 | 3613189 | cg10643850 | 0.025 | ATRN | Body | 0.004 | 8.04E-05 | 0.385 | 0.009 | 1.97 (1.37, 2.82)
17 | 17460905 | cg13311494 | 0.016 | PEMT | Body | –0.005 | 8.50E-05 | 0.397 | 0.009 | 0.64 (0.52, 0.79)
1 | 179852195 | cg11754670 | 0.009 | TOR1AIP1 | Body | 0.001 | 8.84E-05 | 0.398 | 0.009 | 2.04 (1.40, 2.97)
15 | 74928935 | cg05740632 | 0.014 | EDC3 | Body | –0.004 | 9.07E-05 | 0.398 | 0.009 | 0.62 (0.49, 0.78)
11 | 1972510 | cg08484100 | 0.023 | MRPL23 | Body | –0.004 | 9.19E-05 | 0.398 | 0.009 | 0.54 (0.40, 0.74)
1 | 52822428 | cg24792179 | 0.019 | CC2D1B | Body | 0.004 | 9.87E-05 | 0.410 | 0.009 | 1.79 (1.36, 2.35)
7 | 68973036 | cg22794712 | 0.021 | (LOC100507468§) |  | –0.006 | 1.10E-04 | 0.413 | 0.010 | 0.63 (0.50, 0.80)
  1. * cg23398826 is in the Turquoise module; all other CpGs are in the Brown module.

  2. Odds ratios are per standard deviation increase in DNA methylation level.

  3. Effect sizes were calculated based on normalized methylation values, denoting the methylation difference between cases and controls.

  4. § For inter-genic CpG sites, R package FDb.InfiniumMethylation.hg19 was used to locate the nearest annotated gene.

  5. CpG = cytosine-phosphoguanine site; Chr = chromosome; EWAS = epigenome-wide association study; WGCNA = weighted gene co-methylation network analysis; FDR = false discovery rate; CI = confidence interval; TSS200 = within 200 bp from transcription start site; TSS1500 = within 1500 bp from transcription start site; Body = the CpG is in gene body; 1stExon = the first exon; and UTR = untranslated region.

Association between weighted gene co-methylation network and CHD risk

We used weighted gene co-methylation network analysis (WGCNA) (Langfelder and Horvath, 2008) to identify potential co-methylation networks related to CHD. This method can be used to identify clusters of highly correlated co-methylated genes and to relate modules to external sample traits in order to find biologically or clinically significant modules. Two samples were outliers and were excluded during the sample clustering step. We included 491 cases and 489 controls in the following analysis. A total of five modules were produced in the clustering step of WGCNA (Figure 2). One module (the Brown module), containing 2,106 CpG sites, was associated with incident CHD after adjustment for covariates and all SVs (P = 6.41E-08, below the Bonferroni threshold of 0.05/the number of modules).

Figure 2. Heatmap of association with methylation network modules.

Correlation coefficient and -log10(P) (inside the bracket) were reported; the degree of -log10(p) is illustrated with the color legend. Linear regressions were fitted with inverse normal transformed module eigengene (ME) as dependent variables; coronary heart disease (CHD1) as indicator; and age, sex, education, marital status, smoking (SMK), drinking (DRK), physical activity (PA1 and PA2 as the second and third tertile respectively), diet score, body mass index, fasting time, study area and all surrogate variables as covariates.

Gene enrichment analysis of the annotated genes of 2106 CpG sites in this module revealed six annotation clusters with at least one term having an FDR < 0.05 (Supplementary file 2C). These annotation clusters were significantly enriched in terms associated with intracellular signaling (zinc-finger, pleckstrin homology domains, C2 domains, and protein kinase activity) and transcription regulation. Annotated genes in this module were also enriched in genes associated with tobacco use disorder, stroke, and kidney disease (the Genetic Association Database). We performed the visualization of the Brown module (Hu et al., 2008) and found CpGs annotated to ZNF790, CC2D1B, TBR1, RERE, and PLXNB2 had the most connections with other genes (Figure 2—figure supplement 2).

Within the Brown module, 24 CpGs were significantly associated with CHD (module-specific FDR < 0.01), with P ranging from 1.10E-04 to 9.61E-08 (Table 2). Together with the two CpGs identified from single-marker tests, a total of 25 CpGs were associated with CHD, with OR (95% confidence interval) ranging from 0.53 (0.38, 0.73) for cg26334131 to 2.18 (1.47, 3.23) for cg21210537 (Table 2).

CHD-associated CpGs and cardiovascular risk factors

Methylation level at cg08106661 was associated with the average number of cigarette equivalents per day (effect size = 1.50E-04, SE = 4.67E-05, P = 0.001; Table 3). Further mediation analysis revealed that 28.5 % of the smoking-associated CHD risk was mediated through methylation level at cg08106661 (P = 0.036). We also found three loci associated with diet score and two loci associated with BMI, but no statistically significant mediation effect was noted (P > 0.05). Alcohol consumption and physical activity were not associated with any of the CHD-associated CpGs (Table 3 and Table 3—source data 1).

Table 3
Associations between lifestyle factors and methylation level of identified CpGs, and the risk of coronary heart disease mediated through methylation level of CpG sites.

Lifestyle factor / CpG | Effect size (SE) | P | Mediation effect: proportion mediated, % | Mediation effect: P
Smoking, no. of cigarettes/day
cg08106661 | 1.50E-04 (4.67E-05) | 0.001 | 28.50 | 0.036
Diet score (ranging 0-6)
cg21210537 | 3.60E-03 (1.27E-03) | 0.005 | 4.66 | 0.206
cg10643850 | 2.57E-03 (1.26E-03) | 0.042 | -6.91 | 0.088
cg05740632 | 1.37E-03 (6.88E-04) | 0.047 | 11.30 | 0.068
Body mass index, kg/m2
cg20302171 | 3.90E-04 (1.67E-04) | 0.020 | -2.87 | 0.267
cg08484100 | 4.17E-04 (2.10E-04) | 0.048 | -1.91 | 0.373
  1. Linear regression was fitted by including all five lifestyle factors (smoking, alcohol consumption, physical activity, diet score, and body mass index) simultaneously in the same model, with methylation values as dependent variables, and age, sex, study area, fasting time, education level, marital status and batch as covariates. CpG = cytosine-phosphoguanine site; SE = standard error. Alcohol consumption and physical activity were not associated with any of the coronary heart disease-associated CpGs. Details were reported in the Table 3—source data 1.

Table 3—source data 1

Association between lifestyle factors and identified CpGs.

https://cdn.elifesciences.org/articles/68671/elife-68671-table3-data1-v3.docx

Compared with participants in the bottom quartile of methylation level at cg23398826, those in the top quartile had 6.4 (SE 2.1) mmHg lower SBP (P = 0.003) and 3.6 (1.2) mmHg lower DBP (P = 0.003). The proportions of reduced CHD risk associated with cg23398826 mediated by SBP and DBP were 7.65 % (P = 0.003) and 6.39 % (P = 0.006), respectively (Table 4, Table 4—source data 1 and Table 4—source data 2). The analysis also showed statistically significant mediation of methylation level at cg13311494 (annotated to PEMT) on CHD risk through SBP and DBP, with the mediation proportions of 15.61% and 12.38%, respectively. Four TC-related (Table 4 and Table 4—source data 3), six LDL-C related (Table 4 and Table 4—source data 4), two HDL-C related (Table 4 and Table 4—source data 5), and six random glucose-related (Table 4 and Table 4—source data 6) CpGs had P for trend <0.05. However, no statistically significant mediation effect was shown for the associations between corresponding methylation level on CHD through these traits. TG was not associated with any of the CHD-associated CpGs (Table 4—source data 7).

Table 4
Associations between quartile methylation level of identified CpGs and cardiometabolic traits, and the risk of coronary heart disease mediated through different cardiometabolic traits.


Trait / CpG | Quartile 1 vs. 4: effect size (SE)* | Quartile 1 vs. 4: P | P for trend | Mediation effect: proportion mediated, % | Mediation effect: P
Systolic blood pressure, mmHg
cg23398826 | -6.410 (2.118) | 0.003 | <0.001 | 7.65 | 0.003
cg13311494 | -6.580 (2.122) | 0.002 | 0.020 | 15.61 | 0.031
Diastolic blood pressure, mmHg
cg23398826 | -3.574 (1.218) | 0.003 | <0.001 | 6.39 | 0.006
cg13311494 | -3.650 (1.221) | 0.003 | 0.029 | 12.38 | 0.045
Total cholesterol, mmol/L
cg26334131 | 0.197 (0.089) | 0.026 | 0.003 | -31.62 | 0.168
cg05740632 | 0.163 (0.089) | 0.066 | 0.013 | -3.21 | 0.126
cg21210537 | 0.175 (0.094) | 0.064 | 0.027 | -8.19 | 0.197
cg19583211 | -0.064 (0.088) | 0.466 | 0.047 | 2.73 | 0.270
Cholesterol in LDL, mmol/L
cg26334131 | 0.110 (0.063) | 0.079 | 0.007 | -32.14 | 0.135
cg20302171 | 0.107 (0.063) | 0.09 | 0.029 | -10.48 | 0.161
cg05740632 | 0.109 (0.063) | 0.083 | 0.019 | -3.38 | 0.110
cg19583211 | -0.078 (0.062) | 0.208 | 0.020 | 3.60 | 0.210
cg13311494 | -0.117 (0.062) | 0.06 | 0.027 | 3.70 | 0.208
cg21210537 | 0.126 (0.067) | 0.059 | 0.044 | -8.55 | 0.177
Cholesterol in HDL, mmol/L
cg15833447 | 0.037 (0.026) | 0.154 | 0.013 | -10.77 | 0.180
cg21210537 | 0.040 (0.027) | 0.146 | 0.019 | 7.30 | 0.235
Random blood glucose, mmol/L
cg10400937 | 0.551 (0.231) | 0.017 | 0.003 | 6.72 | 0.107
cg01545454 | 0.203 (0.234) | 0.385 | 0.006 | 9.58 | 0.097
cg11754670 | 0.466 (0.236) | 0.049 | 0.005 | 36.39 | 0.086
cg26334131 | -0.517 (0.234) | 0.028 | 0.018 | 32.70 | 0.109
cg07219103 | 0.578 (0.244) | 0.018 | 0.032 | 6.17 | 0.135
cg20302171 | -0.556 (0.234) | 0.018 | 0.027 | 15.77 | 0.123
  1. * Effect sizes denote the differences in metabolic traits between the top and bottom quartiles of methylation level. Details of other quartiles are reported in Table 4—source data 1.

  2. We added 15 and 10 mmHg to the measured systolic blood pressure and diastolic blood pressure respectively among participants who reported usage of blood pressure-lowering medications.

  3. Additionally adjusted for treatment of diabetes (yes or no) at baseline. Multivariable model was adjusted for: age, sex, education level, marital status, smoking, drinking, physical activity, dietary score, body mass index, fasting time, study area, and batch. The CpGs which were significantly associated with any metabolic risk factors were reported. Details of other CpGs were reported in the Supplementary tables. CpG = cytosine-phosphoguanine site; SE = standard error; LDL = low-density lipoprotein; HDL = high-density lipoprotein.
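
The blood pressure correction described in the footnote above (adding 15 mmHg to systolic and 10 mmHg to diastolic readings for participants reporting use of blood pressure-lowering medication) can be sketched as follows. This is a minimal illustration; the names `BloodPressure` and `adjust_for_medication` are hypothetical and are not taken from the authors' analysis syntax.

```python
from dataclasses import dataclass

# Offsets stated in the Table 4 footnote (mmHg).
SBP_OFFSET_MMHG = 15.0
DBP_OFFSET_MMHG = 10.0


@dataclass(frozen=True)
class BloodPressure:
    """A single blood pressure reading in mmHg (hypothetical container)."""
    sbp: float
    dbp: float


def adjust_for_medication(measured: BloodPressure, on_bp_medication: bool) -> BloodPressure:
    """Return the reading used for analysis.

    Participants who reported blood pressure-lowering medication get the
    fixed offsets added; all other readings are returned unchanged.
    """
    if not on_bp_medication:
        return measured
    return BloodPressure(
        sbp=measured.sbp + SBP_OFFSET_MMHG,
        dbp=measured.dbp + DBP_OFFSET_MMHG,
    )
```

For example, a treated participant measured at 130/80 mmHg would enter the analysis as 145/90 mmHg, while an untreated participant's reading is left as measured.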

Table 4—source data 1

Association between quartile methylation level of identified CpGs and systolic blood pressure (mmHg).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data1-v3.docx
Table 4—source data 2

Association between quartile methylation level of identified CpGs and diastolic blood pressure (mmHg).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data2-v3.docx
Table 4—source data 3

Association between quartile methylation level of identified CpGs and total cholesterol (mmol/L).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data3-v3.docx
Table 4—source data 4

Association between quartile methylation level of identified CpGs and cholesterol in low-density lipoprotein (mmol/L).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data4-v3.docx
Table 4—source data 5

Association between quartile methylation level of identified CpGs and cholesterol in high-density lipoprotein (mmol/L).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data5-v3.docx
Table 4—source data 6

Association between quartile methylation level of identified CpGs and random glucose (mmol/L).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data6-v3.docx
Table 4—source data 7

Association between quartile methylation level of identified CpGs and triglyceride (mmol/L).

https://cdn.elifesciences.org/articles/68671/elife-68671-table4-data7-v3.docx

To test the robustness of the findings, we restricted both the CpG-CHD and CpG-SBP/DBP analyses to participants who were not using blood pressure-lowering drugs at baseline (n = 880). The magnitudes of the associations of methylation level with CHD were mostly unchanged (Supplementary file 2D). This restriction slightly attenuated the associations of methylation level at cg23398826 and cg13311494 with SBP and DBP (Supplementary file 2E and F). In the association analyses of the 25 CHD-associated CpGs and cardiovascular risk factors, we also performed smart SVA for each trait. Adjustment for all SVs instead of batch did not materially change the associations (Table 4—source data 1; Table 4—source data 2; Table 4—source data 3; Table 4—source data 4; Table 4—source data 5; Table 4—source data 6; Table 4—source data 7).

Discussion

In this prospective study of middle-aged Chinese adults, we found that methylation at 25 CpGs across the genome was associated with incident CHD risk over the next 10 years. A one-SD increase in methylation level at the identified CpGs was associated with differences in CHD risk ranging from a 47% decrease (cg26334131) to a 118% increase (cg21210537). Further mediation analyses revealed two potential pathways to CHD risk, one with methylation at cg08106661 mediating the impact of smoking, and the other with blood pressure mediating the impact of methylation at cg23398826 and cg13311494. One co-methylation network suggested a role for intracellular signaling in CHD risk.

We summarized the annotated or nearest annotated gene of the identified CHD-associated CpGs in our study and the previous GWAS findings (Supplementary file 2G). Four of the total 25 identified CpGs map to genes that have been reported in association with cardiovascular disease in previous GWAS studies. CpG cg08106661 maps to the ANKS1A (Ankyrin repeat and SAM domain-containing protein 1 A) gene with critical roles in regulating the epidermal growth factor receptor (EGFR). Activation of EGFR has been implicated in endothelial dysfunction, atherogenesis, and cardiac remodeling (Makki et al., 2013). SNPs in ANKS1A have been consistently linked to CHD and smoking behaviour in different populations (Dichgans et al., 2014; Charmet et al., 2018; Schunkert et al., 2011). Furthermore, our mediation analysis noted that more than 25 % of the increased CHD risk related to smoking was mediated through methylation level at cg08106661. Our results provide evidence that smoking-induced epigenetic modification of DNA may play an important part in the underlying pathway from smoking to CHD.

Five identified CHD-associated CpG loci were linked to blood pressure in previous GWAS studies. CpG cg23398826 is located within 200 bp of the transcription start site of the SNX30 (Sorting Nexin 30) gene, a member of the sorting nexin family, which plays a vital role in endocytic trafficking. Perturbation of this process may lead to impaired homeostatic responses and possibly disease states, including cardiovascular disease (Yang et al., 1979). SNX30 has been reported in a GWAS of DBP night-to-day ratio (Rimpelä et al., 2018). The methylation level at CpG cg23398826 was associated with SBP and DBP in our study. Further mediation analysis showed that blood pressure mediated ~10% of the reduced CHD risk related to methylation at cg23398826, suggesting that such epigenetic regulation might exert an important influence on blood pressure and subsequent risk of CHD. However, because methylation level and blood pressure were both measured at baseline, the direction of the association between them remains unknown.

WGCNA identified one CHD-associated gene co-methylation network. Gene members of this network were enriched in several protein domains, molecular functions, and pathways that are involved in intracellular signaling. Cells respond to the environment and extracellular cues through this vital mechanism (Schulman, 2013). One previous study using an in vitro model of cardiac hypertrophy revealed that differentially methylated promoters were involved in the intracellular signaling process (Stenzig et al., 2015). Nevertheless, these findings can only be interpreted as a possible functional indication, and they may stimulate future studies that translate them into a better understanding of disease mechanisms.

Previous epigenome-wide studies of CHD in Asian populations all used a cross-sectional design with relatively small sample sizes (Nakatochi et al., 2017; Sharma et al., 2014; Li et al., 2017), in which the changes in DNA methylation at identified CpGs might be a result of the disease state. Only two studies have employed a prospective design (Guarrera et al., 2015; Golareh et al., 2019). One of them used a meta-analysis of nine population-based cohorts from the US and Europe (11,461 participants, mean age 64 years, mean follow-up time 11.2 years) to analyze CHD-associated DNA methylation at single-nucleotide resolution (Golareh et al., 2019). Based on HumanMethylation450 BeadChip data, methylation levels at 52 CpG sites were found to be associated with incident CHD or myocardial infarction. Differences in study design, genetic background and age distribution of the study population, follow-up time, and coverage of CpG sites might explain why our findings did not overlap with those of previous CHD EWAS in either Asian or European populations. Three loci identified in the present study could be replicated at the gene level (ANKS1A, RERE, and EDC3) by a small study that investigated differentially methylated loci using 15 donor-matched healthy and atherosclerotic human aorta samples in Spain (Silvio et al., 2014). DNA methylation patterns at certain genes might therefore be similar in blood leukocytes and atherosclerotic lesions during the development of CHD.

Our study is, so far, the first prospective and the largest EWAS of CHD in the Asian population. The prospective design allowed us to identify loci where changes in DNA methylation potentially predict the risk of future CHD. The use of the latest DNA methylation array, which covers over 850,000 CpG sites, provides extensive coverage of CpG islands, genes, and enhancers, but also increases the burden of multiple hypothesis testing. The causal mediation analysis was added to help understand the functional potential of the identified loci.

Our study has limitations. Although we adjusted comprehensively for preselected potential confounders and also used the recommended SVA method to remove unknown confounding effects, residual confounding is still possible. However, the Q-Q plot indicated that any inflation due to unadjusted confounding was small. The cases and controls were not randomized on arrays, and adjusting for batch effects may have reduced power to some extent. Future studies with samples at multiple time points preceding CHD onset are expected to provide insights into the role of dynamic changes in methylation and expression levels in the progression of CHD.

We presented novel findings on associations of leukocyte DNA methylation at 25 CpGs with CHD risk over the next ten years among Chinese. Our findings also suggested the possible role of epigenetic regulations in the pathways to CHD risk, through or from lifestyle and cardiometabolic factors. Studies are warranted to validate our findings, elucidate the functional mechanisms of newly identified CpGs, and further translate our findings toward preventive or clinical implications.

Data availability

According to the Regulation of the People's Republic of China on the Administration of Human Genetic Resources, we are not allowed to provide Chinese human clinical and genetic data abroad without official approval. The process of obtaining official approval usually takes 2-3 months. Based on our previous experience, after approval we can make available the raw data for part of the dataset (the significant CpGs found in our study), but not all data. For researchers who are interested in accessing the original data, the access policy and procedures are available at https://www.ckbiobank.org/site/. In brief, the China Kadoorie Biobank (CKB) is being conducted jointly by Peking University (PKU) in Beijing and the Clinical Trial Service Unit (CTSU), Nuffield Department of Population Health, University of Oxford. Requesters should be employees of a recognised academic institution, health service organisation or charitable research organisation with experience in medical research. Requesters should be able to demonstrate, through their peer-reviewed publications in the area of interest, their ability to carry out the proposed study. After registration, details of the required information are provided on the CKB Data Access System. The CKB Access Team will review and respond to data requests within 6-8 weeks. We are providing our syntax of statistical analysis and the source data for Table 3 and Table 4. Figure 2 is the direct output from R software.

References

Deloukas P, Kanoni S, Willenborg C, Farrall M, Assimes TL, Thompson JR, Ingelsson E, Saleheen D, Erdmann J, Goldstein BA, Stirrups K, König IR, Cazier JB, Johansson A, Hall AS, Lee JY, Willer CJ, Chambers JC, Esko T, Folkersen L, Goel A, Grundberg E, Havulinna AS, Ho WK, Hopewell JC, Eriksson N, Kleber ME, Kristiansson K, Lundmark P, Lyytikäinen LP, Rafelt S, Shungin D, Strawbridge RJ, Thorleifsson G, Tikkanen E, Van Zuydam N, Voight BF, Waite LL, Zhang W, Ziegler A, Absher D, Altshuler D, Balmforth AJ, Barroso I, Braund PS, Burgdorf C, Claudi-Boehm S, Cox D, Dimitriou M, Do R, DIAGRAM Consortium, CARDIOGENICS Consortium, Doney ASF, El Mokhtari N, Eriksson P, Fischer K, Fontanillas P, Franco-Cereceda A, Gigante B, Groop L, Gustafsson S, Hager J, Hallmans G, Han BG, Hunt SE, Kang HM, Illig T, Kessler T, Knowles JW, Kolovou G, Kuusisto J, Langenberg C, Langford C, Leander K, Lokki ML, Lundmark A, McCarthy MI, Meisinger C, Melander O, Mihailov E, Maouche S, Morris AD, Müller-Nurasyid M, MuTHER Consortium, Nikus K, Peden JF, Rayner NW, Rasheed A, Rosinger S, Rubin D, Rumpf MP, Schäfer A, Sivananthan M, Song C, Stewart AFR, Tan ST, Thorgeirsson G, van der Schoot CE, Wagner PJ, Wellcome Trust Case Control Consortium, Wells GA, Wild PS, Yang TP, Amouyel P, Arveiler D, Basart H, Boehnke M, Boerwinkle E, Brambilla P, Cambien F, Cupples AL, de Faire U, Dehghan A, Diemert P, Epstein SE, Evans A, Ferrario MM, Ferrières J, Gauguier D, Go AS, Goodall AH, Gudnason V, Hazen SL, Holm H, Iribarren C, Jang Y, Kähönen M, Kee F, Kim HS, Klopp N, Koenig W, Kratzer W, Kuulasmaa K, Laakso M, Laaksonen R, Lee JY, Lind L, Ouwehand WH, Parish S, Park JE, Pedersen NL, Peters A, Quertermous T, Rader DJ, Salomaa V, Schadt E, Shah SH, Sinisalo J, Stark K, Stefansson K, Trégouët DA, Virtamo J, Wallentin L, Wareham N, Zimmermann ME, Nieminen MS, Hengstenberg C, Sandhu MS, Pastinen T, Syvänen AC, Hovingh GK, Dedoussis G, Franks PW, Lehtimäki T, Metspalu A, Zalloua PA, Siegbahn A, Schreiber S, Ripatti S, Blankenberg SS, Perola M, Clarke R, Boehm BO, O’Donnell C, Reilly MP, März W, Collins R, Kathiresan S, Hamsten A, Kooner JS, Thorsteinsdottir U, Danesh J, Palmer CNA, Roberts R, Watkins H, Schunkert H, Samani NJ (2013) Large-scale association analysis identifies new risk loci for coronary artery disease. Nature Genetics 45:25–33. https://doi.org/10.1038/ng.2480

Illumina (2017) Infinium Methylation EPIC Product.

Schulman H (2013) Intracellular signaling. In: Squire LR, Berg D, Bloom FE, du Lac S, Ghosh A, Spitzer NC, editors. Fundamental Neuroscience. Academic Press. pp. 189–209.

Decision letter

  1. Edward D Janus
    Reviewing Editor; University of Melbourne, Australia
  2. Matthias Barton
    Senior Editor; University of Zurich, Switzerland
  3. Edward D Janus
    Reviewer; University of Melbourne, Australia
  4. Ida Karlsson
    Reviewer; Karolinska Institutet, Sweden

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The authors present the first large long-term study in a non-white population of the association between epigenetic changes and coronary heart disease risk over ten years. In this well-conducted study there are novel findings for the increased risk of smoking and for impact on blood pressure. This advances our overall knowledge of epigenetic regulation in the pathways to coronary heart disease.

Decision letter after peer review:

Thank you for submitting your article "Epigenome-wide analysis of DNA methylation and coronary heart disease: a nested case-control study" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Edward D Janus as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Matthias Barton as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Ida Karlsson (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

The overall comments from the three reviewers are reproduced below. The paper is excellent. We have provided a list of comments and request you consider these in providing your revision.

Suggested recommendations:

1) Starting with the main concern, the lack of replication is of course an issue (as in most EWASs). Could you compare the top hits from the largest previous EWAS of incident CHD, to see if the direction of effect is similar and if the significance is suggestive in your data? That would tell us a little bit more about the comparability.

2) It would be helpful if some of the methods (and sample description) were provided along with the results, to better follow along the findings (without having to go to the end to read the results first). I found Figure 1 helpful in understanding the results and suggest that it could be presented with the results instead of the methods.

3) Some minor clarifications and justifications I would find helpful as a reader:

a. I am not very familiar with the co-methylation network approach. Could you provide a brief overview of the method (preferably along with the results)? As the majority of the findings stem from this method, a justification of its robustness would be helpful.

b. I agree with what you say in the discussion, that it is not possible to say what direction mediation works, but it would be good with a sentence to explain the reason of the directions selected to study mediation.

c. The sample selection: I assume you aimed to capture severe cases by including only fatal IHD or nonfatal MI counted as CHD? Why were individuals with neoplasms or cerebrovascular disease excluded?

4) A couple of questions regarding the statistical analyses:

a. The analysis description for single CpG sites only says linear regression. Were the analyses not matched for case-control status (i.e. conditional linear regression), and if so why not? That should be the more powerful and robust approach.

b. In relation to that, why adjust for the matching variables?

c. I am slightly worried about overadjustment, especially in the mediation analyses as several of the lifestyle covariates are likely correlated (e.g. physical activity and BMI). Might including these adjustments in the mediation analyses mask an effect? And for the main analyses, did you also test a simpler model with only basic adjustments for comparison?

5) Please check the number of incident cases and controls so the numbers are consistent in the abstract, figure 1, introduction, results etc. The numbers as currently shown vary slightly from 489 to 494.

6) In the first paragraph of the Results section please expand as the readers won't all look at the methodology section. See also (2) above.

Suggest – 491 cases free of CHD at baseline and developing CHD during follow up and 491 controls free of CHD at baseline and follow up and matched for age, gender, region and timing of blood sampling…

7) Table 1. Please show statistical significance p value for prevalent hypertension and diabetes and for lipids.

8) line 122, can you be specific about items related "healthy lifestyle" and are they known risk factors for CHD?

9) line 126, in my opinion, the genomic inflation factor is not helpful here as it is not a good indicator for EWAS. This is because CpGs are much more correlated than SNPs, and the inflation factor is very much dependent on the trait.

10) Line 130: The difference "-0.003" between cases and controls is very small and hard to detect. Can you also show the SD of the two CpGs, or simply plot their distributions in cases and controls.

11) Line 144: Why do you use "probe" not "CpG" in this paragraph?

12) Line 156 change none to no.

13) Line 242: Did you compare the maximum follow-up time and age distribution in the other two prospective publications? I believe the choice of different follow-up time can be an important factor as we are not sure how long it takes CpGs to have effects on CHD.

14) Line 312: "lab staff were blinded": does it also mean the cases and controls were randomized on arrays/batches? This is important as we don't want to lose study power by adjusting for batch effects.

15) Line 365: Remove the repeated sentence "A total of 56 SVs were generated."

16) Line 434: How were cellular compositions controlled when you could not use SV here?

17) Can you also report the corresponding p-value of the FDR 0.05 threshold?

18) If you used β-values of DNA methylation in the analysis, please state it.

Reviewer #1:

The authors examine the association between genome-wide epigenetic changes and coronary heart disease in an effort to further explore the residual risk not explained by known risk factors.

The major strengths are that this is a large longitudinal study over 10 years in a non-white population, namely Chinese adults. Earlier studies have used small samples, been primarily in Western countries, have focused on selective genomic regions rather than the whole genome and with a few exceptions have been cross-sectional, which precludes establishment of a temporal and potentially causative relationship.

No major weaknesses are apparent. There has been a comprehensive effort to exclude confounders. In limitations the possibility of some residual confounders is noted.

In this well conducted study there are novel findings for impact of DNA methylation at 25 cytosine phosphate-guanine (CpG) dinucleotides, predominantly novel, on CHD risk; for the increased risk of smoking (25% mediated by methylation at one specific site) and for impact on blood pressure and intracellular signaling as well.

The authors have achieved their aims and the results support their conclusions.

This study advances our overall knowledge of epigenetic regulation in the pathways to coronary heart disease.

Reviewer #2:

This study used a matched case-control design to study the prospective effect of DNA methylation on coronary heart disease. The study data included 491 cases and 491 controls which can provide enough statistical power. The authors used proper statistical methods to achieve their aims, and the results support their conclusions. This study will contribute to our better understanding of the effect of DNA methylation on coronary heart disease.

Reviewer #3:

The authors present a thorough and very well-written EWAS of incident CHD in 491 cases and 491 matched controls. The sample was drawn from the China Kadoorie Biobank, with 10-years of follow-up, and the data had rich baseline information in addition to DNA methylation measures. Single site association analyses identified two sites (FDR p-value <0.05), and a network approach 24 sites (module-specific FDR<0.01; one site overlapping with single site associations). They further conducted mediation analyses, testing a wide range of cardiometabolic and lifestyle factors, and found evidence of mediation effects at some of the sites in relation to smoking and blood pressure. A weakness is that which is common to most EWASs, namely the lack of replication. However, the issue also highlights the importance of further studies in the field. The prospective nature as well as the Asian population further strengthens the importance, as both aspects are largely lacking in the field.

The methodology is sound and supports the conclusions of the paper. I especially appreciate the addition of mediation analyses, as it provides a better understanding of potential biological pathways.

https://doi.org/10.7554/eLife.68671.sa1

Author response

Suggested recommendations:

1) Starting with the main concern, the lack of replication is of course an issue (as in most EWASs). Could you compare the top hits from the largest previous EWAS of incident CHD, to see if the direction of effect is similar and if the significance is suggestive in your data? That would tell us a little bit more about the comparability.

The largest previous EWAS of incident CHD was a meta-analysis of nine population-based cohorts comprising 11,461 participants from the United States and European countries [1]. Based on HumanMethylation450 BeadChip data, methylation levels at 30 CpG sites were associated with incident CHD, and 30 were associated with incident myocardial infarction (MI). The directions of effect were only partly consistent: 55.2% of the CHD-associated and 51.9% of the MI-associated CpGs showed the same direction of effect in our study as in the previous study. Author response table 1 shows the β coefficients and p-values of these 60 top hits in our study and in the previous study. The genetic background of the study population might be an important factor for this difference and lack of comparability. Also, as the reviewer mentioned in comment 14, age distribution might explain the difference in results. The mean age of the participants in the present study was 50.1 years, whereas that of the previous EWAS was 64 years.

Author response table 1
| Outcome | CpG name | beta coefficient in our study | p-value in our study | beta coefficient in the previous study | p-value in the previous study | Gene |
|---|---|---|---|---|---|---|
| CHD | cg22617878 | -5.89E-04 | 3.75E-01 | −0.3719 | 1.99E-08 | ATP2B2 |
| | cg13827209 | 8.39E-04 | 4.47E-01 | 0.268 | 3.76E-08 | TGFBR1 |
| | cg14185717 | / | / | −0.2878 | 1.38E-07 | BNC2 |
| | cg10307345 | -3.62E-03 | 1.88E-01 | −0.1480 | 1.86E-07 | PTPN5 |
| | cg13822123 | -6.12E-04 | 1.45E-01 | 0.4138 | 2.03E-07 | PSME4 |
| | cg23245316 | 1.04E-03 | 1.18E-01 | −0.4674 | 2.17E-07 | TSSC1 |
| | cg24977276 | 2.83E-03 | 4.60E-02 | −0.3256 | 2.54E-07 | GTF2I |
| | cg24447788 | 1.17E-03 | 3.20E-01 | −0.2679 | 4.33E-07 | (PTBP1**) |
| | cg08422803 | 1.35E-03 | 3.47E-01 | 0.1994 | 7.52E-07 | ITGB2 |
| | cg01751802 | -1.43E-03 | 5.85E-01 | 0.1473 | 9.35E-07 | KANK2 |
| | cg02449373 | 2.25E-05 | 9.77E-01 | 0.3715 | 9.98E-07 | FUT1 |
| | cg02683350 | -1.19E-03 | 3.62E-01 | −0.5062 | 1.55E-06 | ADAMTS2 |
| | cg05820312 | 2.89E-04 | 7.87E-01 | 0.5031 | 1.60E-06 | TRAPPC9 |
| | cg06639874 | -8.25E-04 | 6.41E-01 | −0.2506 | 1.83E-06 | MLPH |
| | cg06582394 | -2.56E-03 | 1.49E-01 | 0.1657 | 1.90E-06 | CASR |
| | cg02155262 | -4.62E-04 | 3.07E-01 | 0.477 | 1.97E-06 | AGA |
| | cg12766383 | 2.26E-03 | 3.11E-02 | −0.6194 | 1.98E-06 | UBR4 |
| | cg05892484 | 5.55E-05 | 9.60E-01 | −0.5020 | 2.01E-06 | MAD1L1 |
| | cg03031868 | 1.46E-04 | 8.65E-01 | 0.3461 | 2.29E-06 | ESD |
| | cg25497530 | -5.63E-03 | 1.45E-03 | −0.2225 | 2.62E-06 | PTPRN2 |
| | cg06596307 | -2.06E-03 | 1.24E-01 | −0.4198 | 2.99E-06 | IGF1R |
| | cg10702366 | -2.35E-03 | 1.47E-01 | −0.1093 | 3.09E-06 | FGGY |
| | cg26470101 | 1.24E-03 | 4.46E-01 | 0.3052 | 3.09E-06 | (DLX2**) |
| | cg26042024 | 1.66E-03 | 4.16E-01 | −0.3109 | 3.13E-06 | ZFAT |
| | cg00466121 | 7.27E-04 | 4.26E-01 | 0.4646 | 3.16E-06 | ZNHIT6 |
| | cg04987302 | -1.96E-03 | 1.85E-01 | −0.3378 | 3.71E-06 | (OTX2-AS1**) |
| | cg08853494 | 1.82E-04 | 7.27E-01 | 0.221 | 4.03E-06 | RCHY1;THAP6 |
| | cg26467725 | 1.34E-03 | 2.02E-01 | −0.4225 | 4.22E-06 | SLCO3A1 |
| | cg06442192 | 9.07E-04 | 5.78E-01 | −0.5241 | 4.89E-06 | ZNF541 |
| | cg00393373 | 1.11E-03 | 2.92E-01 | −0.3156 | 4.91E-06 | ZNF518B |
| MI | cg22871797 | 4.77E-04 | 6.85E-01 | −0.599 | 5.29E-08 | CYFIP1 |
| | cg24977276 | 2.83E-03 | 4.60E-02 | −0.366 | 9.97E-08 | GTF2I |
| | cg18598861 | 1.43E-03 | 1.98E-01 | −0.671 | 1.61E-07 | IRF9 |
| | cg09777776 | 7.95E-04 | 7.38E-01 | 0.287 | 2.25E-07 | ZNF254 |
| | cg20545941 | -3.22E-04 | 6.69E-01 | −0.885 | 2.47E-07 | MPPED1 |
| | cg19935845 | 9.61E-04 | 5.30E-01 | −0.336 | 4.65E-07 | TNXB |
| | cg24423782 | -2.37E-03 | 1.33E-01 | −0.398 | 5.37E-07 | MIR182 |
| | cg00393373 | 1.11E-03 | 2.92E-01 | −0.401 | 7.68E-07 | ZNF518B |
| | cg00466121 | 7.27E-04 | 4.26E-01 | 0.487 | 7.79E-07 | ZNHIT6 |
| | cg19227382 | / | / | −0.504 | 8.12E-07 | CDH23 |
| | cg03467256 | / | / | −0.408 | 8.33E-07 | HPCAL1 |
| | cg25196881 | 3.45E-04 | 7.46E-01 | −0.269 | 1.05E-06 | (THBS1**) |
| | cg02321112 | 9.68E-05 | 8.99E-01 | 0.39 | 1.08E-06 | (MNX1-AS1**) |
| | cg00355799 | 2.79E-04 | 8.27E-01 | −0.216 | 1.40E-06 | (LOC339529**) |
| | cg17556588 | -1.20E-03 | 3.90E-01 | −0.154 | 1.45E-06 | PRRG4 |
| | cg04987302 | -1.96E-03 | 1.85E-01 | −0.428 | 1.50E-06 | (OTX2-AS1**) |
| | cg07289306 | -1.45E-03 | 3.24E-02 | 0.616 | 1.71E-06 | (MIR138-1**) |
| | cg05892484 | 5.55E-05 | 9.60E-01 | −0.551 | 1.84E-06 | MAD1L1 |
| | cg10702366 | -2.35E-03 | 1.47E-01 | −0.150 | 2.11E-06 | FGGY |
| | cg22618720 | / | / | −0.424 | 2.37E-06 | (MIR5095**) |
| | cg14010194 | 3.46E-04 | 7.84E-01 | −0.484 | 2.71E-06 | GUCA1B |
| | cg13827209 | 8.39E-04 | 4.47E-01 | 0.285 | 2.71E-06 | TGFBR1 |
| | cg24318598 | -1.75E-03 | 2.59E-01 | −0.254 | 2.79E-06 | ANO1 |
| | cg07015775 | 1.01E-03 | 1.36E-01 | 0.479 | 3.13E-06 | ZNHIT6 |
| | cg21018156 | 2.00E-03 | 1.39E-01 | −0.135 | 3.17E-06 | (LINC01312**) |
| | cg07475527 | -1.64E-03 | 2.46E-01 | −0.225 | 3.77E-06 | (RCAN3**) |
| | cg20000562 | 2.06E-03 | 8.15E-02 | 0.218 | 3.93E-06 | SFTA3 |
| | cg07436807 | 3.57E-04 | 7.88E-01 | −0.779 | 4.10E-06 | STAMBPL1; ACTA2 |
| | cg14029912 | -1.62E-03 | 1.39E-01 | −0.367 | 4.29E-06 | (BHLHE40**) |
| | cg22871797 | 4.77E-04 | 6.85E-01 | −0.599 | 5.29E-08 | CYFIP1 |
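
The direction-of-effect comparison summarised in the response to comment 1 can be illustrated with a short sketch. It assumes that a "/" entry in Author response table 1 means the CpG was not available in our data and is therefore left out of the denominator; the function name `direction_concordance` is hypothetical, and only the first five CHD rows of the table are used as sample input.

```python
from typing import Iterable, Optional, Tuple


def direction_concordance(pairs: Iterable[Tuple[Optional[float], float]]) -> float:
    """Fraction of CpGs whose beta coefficient has the same sign in both studies.

    Each pair is (beta in our study, beta in the previous study). A value of
    None stands for a "/" entry (assumed: CpG not available) and is skipped.
    """
    same = 0
    compared = 0
    for ours, previous in pairs:
        if ours is None:
            continue
        compared += 1
        if (ours > 0) == (previous > 0):
            same += 1
    if compared == 0:
        raise ValueError("no CpGs with estimates in both studies")
    return same / compared


# First five CHD rows of Author response table 1.
chd_rows = [
    (-5.89e-04, -0.3719),  # cg22617878
    (8.39e-04, 0.268),     # cg13827209
    (None, -0.2878),       # cg14185717, "/" in our study
    (-3.62e-03, -0.1480),  # cg10307345
    (-6.12e-04, 0.4138),   # cg13822123
]
print(direction_concordance(chd_rows))  # 3 of 4 comparable CpGs -> 0.75
```

Counting all 30 CHD rows in the same way, with the one unavailable CpG excluded, gives 16 concordant CpGs out of 29, which matches the reported 55.2%.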

2) It would be helpful if some of the methods (and sample description) were provided along with the results, to better follow along the findings (without having to go to the end to read the results first). I found Figure 1 helpful in understanding the results and suggest that it could be presented with the results instead of the methods.

We thank the reviewer for the thoughtful comment. We have revised the manuscript to make it easy to follow (Line 81-88).

3) Some minor clarifications and justifications I would find helpful as a reader:

a. I am not very familiar with the co-methylation network approach. Could you provide a brief overview of the method (preferably along with the results)? As the majority of the findings stem from this method, a justification of its robustness would be helpful.

We used weighted gene co-methylation network analysis [2] to identify potential co-methylation networks related to CHD. This method identifies clusters of highly correlated co-methylated genes (modules) and relates these modules to external sample traits to find biologically or clinically significant modules. By calculating correlations among the methylation levels of the selected CpG sites, we constructed a gene co-methylation network. We then identified gene modules using hierarchical clustering. Next, we related the gene modules to the CHD outcome.

For computational reasons, we selected the top 20,000 CHD-associated CpGs from single-marker tests. This is about the maximum number of CpGs the WGCNA package can handle on our high-performance computing cluster. A previous study restricted 23,000 probe sets to 3,600 probes to test robustness and found that the module detection results were generally similar [2]. We additionally carried out a permutation-based test by shuffling the case-control status, re-selecting the top 20,000 CpGs based on the permuted data, constructing modules, and testing them for association with CHD. We found no inflation of false positives due to the selection of the top 20,000 CpGs (the most significant module has P>0.032, Figure 2—figure supplement 1).

We have added a brief overview of the co-methylation network approach to the “Results” section (Line 111-114).
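
The logic of the label-permutation check described in the response to comment 3a can be sketched as below. This is a simplified stand-in, not the WGCNA-based procedure itself: instead of building modules, it re-selects the top k CpGs by absolute case-control mean difference within each permutation and uses the mean of those differences as the test statistic. All function names and the choice of statistic are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import List, Sequence


def mean_difference(values: Sequence[float], labels: Sequence[int]) -> float:
    """Case minus control mean methylation for one CpG (labels: 1 = case, 0 = control)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return mean(cases) - mean(controls)


def top_k_statistic(methylation: Sequence[Sequence[float]],
                    labels: Sequence[int], k: int) -> float:
    """Select the k CpGs with the largest absolute difference and average them.

    The selection is part of the statistic, so it is repeated on every
    permuted data set, mirroring the re-selection of top CpGs after shuffling.
    """
    diffs = sorted((abs(mean_difference(row, labels)) for row in methylation),
                   reverse=True)
    return mean(diffs[:k])


def permutation_p_value(methylation: Sequence[Sequence[float]],
                        labels: Sequence[int], k: int,
                        n_permutations: int, seed: int = 0) -> float:
    """Permutation p-value for the top-k statistic under shuffled case-control labels."""
    rng = random.Random(seed)
    observed = top_k_statistic(methylation, labels, k)
    shuffled: List[int] = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if top_k_statistic(methylation, shuffled, k) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)
```

Because the top-k selection is repeated inside every permutation, the null distribution already reflects the selection step, which is why such a check can indicate whether pre-selecting the most associated CpGs inflates false positives.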

b. I agree with what you say in the discussion, that it is not possible to say what direction mediation works, but it would be good with a sentence to explain the reason of the directions selected to study mediation.

We thank the reviewer for the thoughtful comment. DNA methylation is responsive to environmental stimuli and unhealthy lifestyles. This makes DNA methylation a potential biomarker of environment-related and lifestyle-driven diseases of adulthood, for example, metabolic dysfunction. Unhealthy lifestyles, together with metabolic dysfunction, further increase the risk of cardiovascular disease. We have added text to address this comment (Line 59-65).

c. The sample selection: I assume you aimed to capture severe cases by including only fatal IHD or nonfatal MI counted as CHD? Why were individuals with neoplasms or cerebrovascular disease excluded?

Yes, we included fatal IHD and nonfatal acute MI to capture severe cases. Previous studies suggested that DNA methylation is also a potential biomarker for neoplasms3 and cerebrovascular disease.4,5 Thus, cases with both CHD and cerebrovascular disease or neoplasms could present a mixture of epigenetic changes. To better capture the DNA methylation changes associated with incident CHD, we excluded participants who reported neoplasms or cerebrovascular disease at baseline or developed them during follow-up.

4) A couple of questions regarding the statistical analyses:

a. The analysis description for single CpG sites only says linear regression. Were the analyses not matched for case-control status (i.e. conditional linear regression), and if so why not? That should be the more powerful and robust approach.

We did not use conditional linear regression in the analysis. Instead, we followed the recommendation of Leek JT, et al.6 to use simple linear regression when applying surrogate variable analysis to remove batch effects and other unwanted variation in high-throughput experiments. Previous studies7–10 that used a matched case-control design also did not perform conditional analysis for the matched factors, although the authors employed different methods to remove batch effects and other unmeasured confounding (SVA,9 adjustment for principal components,7,10 or direct adjustment for technical variables8).

b. In relation to that, why adjust for the matching variables?

We agree with the reviewer’s consideration for the matched design. In addition to the reason given in our response to the previous comment, we noticed that cases were still slightly older than controls even though they had been matched by age (Table 1). Thus, we adjusted for the matching factors in the model instead of conditioning on the matched case-control pairs in the analysis, as in previous studies.7–10
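
The effect of adjusting for a matching factor that remains imbalanced can be illustrated with a small sketch. It assumes an unconditional linear model with the methylation β-value as the outcome and CHD status plus covariates as predictors; the solver, the variable names, and the toy numbers are illustrative assumptions, and surrogate variables would enter the model as additional covariate columns.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular (no collinear covariates).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, columns):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Toy data: cases are slightly older than controls despite matching,
# and methylation depends on both CHD status (-0.003) and age (+0.001 per year).
chd = [1, 0, 1, 0, 1, 0]
age = [62, 60, 55, 54, 48, 47]
methylation = [0.5 - 0.003 * c + 0.001 * a for c, a in zip(chd, age)]

crude = ols(methylation, [chd])          # CHD coefficient mixes in the age imbalance
adjusted = ols(methylation, [chd, age])  # CHD coefficient with age held fixed
```

In this toy example the crude CHD coefficient is pulled toward zero by the residual age difference, while the adjusted coefficient recovers the simulated value.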

c. I am slightly worried about overadjustment, especially in the mediation analyses as several of the lifestyle covariates are likely correlated (e.g. physical activity and BMI). Might including these adjustments in the mediation analyses mask an effect? And for the main analyses, did you also test a simpler model with only basic adjustments for comparison?

We re-calculated SVs and adjusted for age, sex, body mass index, smoking status, education level, study area, and all SVs for comparison with the previous largest EWAS of CHD. We refer to this as the basic model and to our original model as the full model. The results of the two models are shown in Author response table 2. Adjustment for additional lifestyle covariates did not change the associations materially.

Author response table 2
| CpG | Basic model β | Basic model P | Full model β | Full model P |
|---|---|---|---|---|
| cg23398826 | -0.003 | 3.44E-08 | -0.003 | 1.57E-08 |
| cg02386575 | 0.005 | 2.76E-07 | 0.006 | 9.61E-08 |
| cg10400937 | 0.002 | 1.28E-05 | 0.002 | 1.09E-05 |
| cg20562821 | 0.005 | 2.95E-05 | 0.005 | 2.42E-05 |
| cg08106661 | 0.003 | 6.75E-05 | 0.003 | 3.16E-05 |
| cg11630610 | 0.005 | 4.22E-05 | 0.005 | 3.83E-05 |
| cg20302171 | -0.004 | 3.86E-05 | -0.004 | 4.29E-05 |
| cg26334131 | -0.005 | 3.51E-05 | -0.005 | 4.44E-05 |
| cg07560408 | -0.004 | 5.73E-05 | -0.005 | 4.46E-05 |
| cg21210537 | 0.0043 | 4.81E-05 | 0.004 | 4.85E-05 |
| cg15833447 | 0.005 | 0.000107 | 0.006 | 5.55E-05 |
| cg02591826 | 0.002 | 4.97E-05 | 0.002 | 5.70E-05 |
| cg16639138 | 0.002 | 0.000122 | 0.002 | 6.46E-05 |
| cg01545454 | 0.002 | 6.38E-05 | 0.002 | 7.29E-05 |
| cg07219103 | 0.002 | 6.77E-05 | 0.002 | 7.35E-05 |
| cg05681643 | 0.004 | 7.33E-05 | 0.004 | 7.42E-05 |
| cg06358566 | -0.002 | 8.02E-05 | -0.002 | 7.74E-05 |
| cg19583211 | -0.003 | 6.46E-05 | -0.003 | 7.97E-05 |
| cg10643850 | 0.004 | 0.000104 | 0.004 | 8.04E-05 |
| cg13311494 | -0.004 | 8.04E-05 | -0.005 | 8.50E-05 |
| cg11754670 | 0.001 | 0.000101 | 0.001 | 8.84E-05 |
| cg05740632 | -0.004 | 4.51E-05 | -0.004 | 9.07E-05 |
| cg08484100 | -0.005 | 3.69E-05 | -0.004 | 9.19E-05 |
| cg24792179 | 0.004 | 0.000115 | 0.004 | 9.87E-05 |
| cg22794712 | -0.005 | 0.000173 | -0.006 | 1.10E-04 |

Similarly, we also fitted basic models in the mediation analysis by including age, sex, BMI (excluded when BMI was the exposure), smoking status (excluded when smoking was the exposure), education level, study area, and batch as covariates. The results were largely unchanged (see Author response table 3).

Author response table 3
| Exposure | CpG | Basic model: proportion mediated, % | Basic model: P | Full model: proportion mediated, % | Full model: P |
|---|---|---|---|---|---|
| Smoking, no. of cigarettes/day | cg08106661 | 26.80 | 0.038 | 28.50 | 0.036 |
| Diet score (ranging 0-6) | cg21210537 | 3.89 | 0.256 | 4.66 | 0.206 |
| | cg10643850 | -6.73 | 0.083 | -6.91 | 0.088 |
| | cg05740632 | 11.03 | 0.062 | 11.30 | 0.068 |
| Body mass index, kg/m2 | cg20302171 | -2.90 | 0.269 | -2.87 | 0.267 |
| | cg08484100 | -2.32 | 0.304 | -1.91 | 0.373 |
| Systolic blood pressure*, mmHg | cg23398826 | 13.21 | 0.002 | 7.65 | 0.003 |
| | cg13311494 | 3.14 | 0.082 | 15.61 | 0.031 |
| Diastolic blood pressure*, mmHg | cg23398826 | 16.32 | 0.002 | 6.39 | 0.006 |
| | cg13311494 | 3.80 | 0.079 | 12.38 | 0.045 |
| Total cholesterol, mmol/L | cg26334131 | -5.67 | 0.449 | -31.62 | 0.168 |
| | cg05740632 | -33.04 | 0.020 | -3.21 | 0.126 |
| | cg21210537 | -6.70 | 0.169 | -8.19 | 0.197 |
| | cg19583211 | 8.67 | 0.114 | 2.73 | 0.270 |
| Cholesterol in LDL, mmol/L | cg26334131 | -3.25 | 0.406 | -32.14 | 0.135 |
| | cg20302171 | -4.47 | 0.263 | -10.48 | 0.161 |
| | cg05740632 | -17.77 | 0.032 | -3.38 | 0.110 |
| | cg19583211 | 9.03 | 0.175 | 3.60 | 0.210 |
| | cg13311494 | 9.02 | 0.177 | 3.70 | 0.208 |
| | cg21210537 | -5.50 | 0.175 | -8.55 | 0.177 |
| Cholesterol in HDL, mmol/L | cg15833447 | -3.69 | 0.293 | -10.77 | 0.180 |
| | cg21210537 | 3.79 | 0.260 | 7.30 | 0.235 |
| Random blood glucose‡, mmol/L | cg10400937 | 8.92 | 0.056 | 6.72 | 0.107 |
| | cg01545454 | 6.08 | 0.122 | 9.58 | 0.097 |
| | cg11754670 | 1.26 | 0.613 | 36.39 | 0.086 |
| | cg26334131 | 1.50 | 0.612 | 32.70 | 0.109 |
| | cg07219103 | 6.76 | 0.102 | 6.17 | 0.135 |
| | cg20302171 | 2.49 | 0.397 | 15.77 | 0.123 |
* We added 15 and 10 mmHg to the measured systolic blood pressure and diastolic blood pressure, respectively, among participants who reported usage of blood pressure-lowering medications.

‡ Additionally adjusted for treatment of diabetes (yes or no) at baseline.
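
The blood pressure correction described in the first footnote amounts to adding a fixed constant for treated participants. A minimal sketch, with a hypothetical function name and argument names:

```python
SBP_TREATMENT_OFFSET = 15  # mmHg added to systolic blood pressure, as stated in the footnote
DBP_TREATMENT_OFFSET = 10  # mmHg added to diastolic blood pressure, as stated in the footnote

def adjust_blood_pressure(sbp, dbp, on_bp_medication):
    """Return (sbp, dbp) with the fixed offsets added for medication users."""
    if on_bp_medication:
        return sbp + SBP_TREATMENT_OFFSET, dbp + DBP_TREATMENT_OFFSET
    return sbp, dbp

treated = adjust_blood_pressure(130, 80, on_bp_medication=True)
untreated = adjust_blood_pressure(130, 80, on_bp_medication=False)
```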

5) Please check the number of incident cases and controls so the numbers are consistent in the abstract, figure 1, introduction, results etc. The numbers as currently shown vary slightly from 489 to 494.

Baseline DNA methylation was measured for 494 CHD cases and 494 matched controls. In the quality control process, we excluded samples with sex mismatch (n=2), samples with a missing rate >0.01 across probes (n=2), and samples measured in a distinct study batch (n=2). A total of 491 cases and 491 matched controls were retained for the single-marker test. Two samples were further excluded from the network analysis because they were outliers in the sample clustering step, leaving 491 cases and 489 controls. We have revised the manuscript to avoid confusion (Line 85-86, and line 114-116).

6) In the first paragraph of the Results section please expand as the readers won't all look at the methodology section. See also (2) above.

Suggest – 491 cases free of CHD at baseline and developing CHD during follow up and 491 controls free of CHD at baseline and follow up and matched for age, gender, region and timing of blood sampling…

We thank the reviewer for the thoughtful comment. We have revised the manuscript to make it easier to follow (Line 81-88).

7) Table 1. Please show statistical significance p value for prevalent hypertension and diabetes and for lipids

We have added the corresponding p values to Table 1.

8) line 122, can you be specific about items related "healthy lifestyle" and are they known risk factors for CHD?

In our study, we found that CHD cases were more likely to be daily smokers, to have unhealthy dietary habits, and to have a higher BMI. We have revised the “Results” section to make this clear (Line 92, 93).

9) line 126, in my opinion, the genomic inflation factor is not helpful here as it is not a good indicator for EWAS. It is because CpGs are more much correlated than SNPs, and the inflation factor is very much dependent on the trait.

We showed the inflation factor for comparison with previous studies.12–16 The QQ plot was also shown in Supplementary file 2B. No evidence for inflation was observed in the QQ plots. We have revised the manuscript to address this comment (Line 98, 246). If the reviewer and editor think it is redundant to present the inflation factor, we would be happy to remove it.
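
For readers unfamiliar with the quantity, the genomic inflation factor is commonly computed as the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.455). The sketch below follows that common definition; it is an assumption that the manuscript uses exactly this form, and the function name and test inputs are illustrative.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """lambda = median observed 1-df chi-square / expected median under the null.

    Each two-sided P value (0 < p <= 1) is converted back to a chi-square
    statistic via the squared normal quantile at p / 2.
    """
    chi2 = [_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced P values mimic a null distribution; squaring them mimics inflation.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(uniform_p)
lambda_inflated = genomic_inflation([p ** 2 for p in uniform_p])
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution, which is the same information a QQ plot conveys graphically.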

10) Line 130: The difference "-0.003" between cases and controls is very small and hard to detect. Can you also show the SD of the two CpGs, or simply plot their distributions in cases and controls.

We have added a column to Table 2 to show the SD of each CpG. We also report the SD of the top two hits in the “Results” section (Line 104, 105, and 108).

11) Line 144: Why do you use "probe" not "CpG" in this paragraph?

We thank the reviewer for pointing this out. We performed a gene enrichment analysis of the annotated genes of CpG sites from the CHD-associated module. We have revised the wording to keep it consistent and avoid confusion (Line 122, 127).

12) Line 156 change none to no

We have revised “none” to “no” (Line 159).

13) Line 242: Did you compare the maximum follow-up time and age distribution in the other two prospective publications? I believe the choice of different follow-up time can be an important factor as we are not sure how long it takes CpGs to have effects on CHD.

We have summarized the mean age and mean follow-up time of our study and the two previous prospective studies below. Our study included younger participants and followed them for a shorter period. We agree with the reviewer that differences in follow-up time and age distribution might explain the differences between our findings and those of the previous prospective publications. We have revised the “Discussion” section to address this comment (Line 223, 227, Author response table 4).

Author response table 4
| | Present study | Guarrera S, et al.13 | Agha Golareh, et al.1 |
|---|---|---|---|
| No. of cases | 491 | 292 | 1,895 |
| Sample size | 982 | 584 | 11,461 |
| Mean age (years) | 50.1 | 52.4 | 64 |
| Mean follow-up time (years) | 7.6 | 12.9 | 11.2 |

14) Line 312: "lab staff were blinded" does it also mean the cases and controls were randomized on arrays/batches. This is important as we don't want to lose study power by adjusting batch effect.

The cases and controls were not strictly randomized on arrays. However, the lab staff were blinded to the case/control status. We agree with the reviewer that adjusting for the batch effect may reduce power to some extent. We have added this as a limitation to address this comment (Line 246-248, 292, and 293).

15) Line 365: Remove the repeated sentence "A total of 56 SVs were generated."

We thank the reviewer for pointing this out. We have deleted the repeated sentence (Line 346).

16) Line 434: How were cellular compositions controlled when you could not use SV here?

In the association analyses of the 25 CHD-associated CpGs and cardiovascular risk factors, we also performed smartSVA for each trait. Adjustment for all SVs instead of batch did not change the associations materially (Table 4–Source Data 1-7). To further address this comment, we additionally adjusted for cellular composition (CC). The results were generally similar (Author response table 5). If the reviewer and the editors think it is better to present the results with adjustment for cellular composition, we would be happy to update all tables in the manuscript.

Author response table 5
| Trait | CpG | Model 1: effect size (SE) | Model 1: P | + adjusted for CC: effect size (SE) | + adjusted for CC: P |
|---|---|---|---|---|---|
| Smoking, no. of cigarettes/day | cg08106661 | 1.50E-04 (4.67E-05) | 0.001 | 9.00E-05 (4.39E-05) | 0.041 |
| Systolic blood pressure, mmHg | cg23398826 | -313.75 (91.72) | <0.001 | -255.60 (94.88) | 0.007 |
| | cg13311494 | -108.86 (48.16) | 0.020 | -101.84 (47.02) | 0.031 |
| Diastolic blood pressure, mmHg | cg23398826 | -184.53 (52.74) | <0.001 | -151.73 (55.07) | 0.006 |
| | cg13311494 | -59.54 (27.30) | 0.029 | -54.51 (27.30) | 0.046 |

17) Can you also report the corresponding p-value of the FDR 0.05 threshold?

The p-value corresponding to the FDR = 0.05 threshold was 2.01E-07. We have added this to the “Results” section (Line 101-102).
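
How an FDR level maps to a p-value cut-off can be sketched with the Benjamini-Hochberg step-up procedure. It is an assumption that this is the FDR procedure used in the manuscript; the function name and the toy p-values are illustrative. The sketch returns both the largest p-value that is declared significant and the critical value of the step-up line at that rank, since either may be reported as "the threshold".

```python
def bh_threshold(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up cut-off.

    Returns (largest rejected p-value, critical value fdr * k / m) where k is
    the largest rank with p_(k) <= fdr * k / m, or None if nothing is rejected.
    """
    m = len(p_values)
    ordered = sorted(p_values)
    k = 0
    for rank, p in enumerate(ordered, start=1):
        if p <= fdr * rank / m:
            k = rank
    if k == 0:
        return None
    return ordered[k - 1], fdr * k / m

toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
result = bh_threshold(toy_p)
```

In the toy example only the two smallest p-values pass, so the cut-off is 0.008 and the critical value at rank 2 of 10 tests is 0.01.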

18) If you used β-values of DNA methylation in the analysis, please state it.

Yes, we used the β-values of DNA methylation in the analysis. We have added a statement to clarify this (Line 334).

References

1. Agha G, Mendelson MM, Ward-Caviness CK, et al., Blood Leukocyte DNA Methylation Predicts Risk of Future Myocardial Infarction and Coronary Heart Disease. Circulation 2019;140(8):645–57.

2. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9(1):559.

3. Koch A, Joosten SC, Feng Z, et al., Analysis of DNA methylation in cancer: location revisited. Nat Rev Clin Oncol 2018;15(7):459–66.

4. Martínez-Iglesias O, Carrera I, Carril JC, Fernández-Novoa L, Cacabelos N, Cacabelos R. DNA Methylation in Neurodegenerative and Cerebrovascular Disorders. Int J Mol Sci 2020;21(6):2220.

5. Li X-G, Ma N, Wang B, et al., The impact of P2Y12 promoter DNA methylation on the recurrence of ischemic events in Chinese patients with ischemic cerebrovascular disease. Sci Rep 2016;6(1):34570.

6. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012;28(6):882–3.

7. Nakatochi M, Ichihara S, Yamamoto K, et al., Epigenome-wide association of myocardial infarction with DNA methylation sites at loci related to cardiovascular disease. Clin Epigenetics 2017;9:54.

8. Guarrera S, Fiorito G, Onland-Moret NC, et al., Gene-specific DNA methylation profiles and LINE-1 hypomethylation are associated with myocardial infarction risk. Clin Epigenetics 2015;7:133.

9. Li J, Zhu X, Yu K, et al., Genome-Wide Analysis of DNA Methylation and Acute Coronary Syndrome. Circ Res 2017;120(11):1754–67.

10. Chambers JC, Loh M, Lehne B, et al., Epigenome-wide association of DNA methylation markers in peripheral blood from Indian Asians and Europeans with incident type 2 diabetes: a nested case-control study. Lancet Diabetes Endocrinol 2015;3(7):526–34.

11. Wahl S, Drong A, Lehne B, et al., Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 2017;541(7635):81–6.

12. Nakatochi M, Ichihara S, Yamamoto K, et al., Epigenome-wide association of myocardial infarction with DNA methylation sites at loci related to cardiovascular disease. Clin Epigenetics 2017;9:54.

13. Guarrera S, Fiorito G, Onland-Moret NC, et al., Gene-specific DNA methylation profiles and LINE-1 hypomethylation are associated with myocardial infarction risk. Clin Epigenetics 2015;7:133.

14. Li J, Zhu X, Yu K, et al., Genome-Wide Analysis of DNA Methylation and Acute Coronary Syndrome. Circ Res 2017;120(11):1754–67.

15. Fernández-Sanlés A, Sayols-Baixeras S, Curcio S, Subirana I, Marrugat J, Elosua R. DNA Methylation and Age-Independent Cardiovascular Risk, an Epigenome-Wide Approach: The REGICOR Study (REgistre GIroní del COR). Arterioscler Thromb Vasc Biol 2018;38(3):645–52.

16. Rask-Andersen M, Martinsson D, Ahsan M, et al., Epigenome-wide association study reveals differential DNA methylation in individuals with a history of myocardial infarction. Hum Mol Genet 2016;25(21):4739–48.

https://doi.org/10.7554/eLife.68671.sa2

Article and author information

Author details

  1. Jiahui Si

    1. Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
    2. Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States
    Contribution
    Conceptualization, Formal analysis, Validation, Visualization, Writing – original draft, Data curation
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-0827-4973
  2. Songchun Yang

    Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
    Contribution
    Formal analysis, Validation, Data curation
    Competing interests
    No competing interests declared
  3. Dianjianyi Sun

    Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
    Contribution
    Investigation, Funding acquisition, Data curation
    Competing interests
    No competing interests declared
  4. Canqing Yu

    Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
    Contribution
    Project administration, Resources, Data curation
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-0019-0014
  5. Yu Guo

    Chinese Academy of Medical Sciences, Beijing, China
    Contribution
    Project administration, Supervision, Resources, Data curation
    Competing interests
    No competing interests declared
  6. Yifei Lin

    Department of Urology, West China Hospital, Sichuan University, Chengdu, China
    Contribution
    Investigation, Data curation
    Competing interests
    No competing interests declared
  7. Iona Y Millwood

    1. Medical Research Council Population Health Research Unit at the University of Oxford, Oxford, United Kingdom
    2. Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
    Contribution
    Project administration, Resources, Data curation
    Competing interests
    No competing interests declared
  8. Robin G Walters

    1. Medical Research Council Population Health Research Unit at the University of Oxford, Oxford, United Kingdom
    2. Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
    Contribution
    Project administration, Resources, Data curation
    Competing interests
    No competing interests declared
  9. Ling Yang

    1. Medical Research Council Population Health Research Unit at the University of Oxford, Oxford, United Kingdom
    2. Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
    Contribution
    Project administration, Resources, Data curation
    Competing interests
    No competing interests declared
  10. Yiping Chen

    1. Medical Research Council Population Health Research Unit at the University of Oxford, Oxford, United Kingdom
    2. Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
    Contribution
    Project administration, Resources, Data curation
    Competing interests
    No competing interests declared
  11. Huaidong Du

    1. Medical Research Council Population Health Research Unit at the University of Oxford, Oxford, United Kingdom
    2. Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
    Contribution
    Project administration, Resources, Data curation
    Competing interests
    No competing interests declared
  12. Yujie Hua

    NCDs Prevention and Control Department, Suzhou CDC, Jiangsu, China
    Contribution
    Resources, Data curation
    Competing interests
    No competing interests declared
  13. Jingchao Liu

    NCDs Prevention and Control Department, Wuzhong CDC, Jiangsu, China
    Contribution
    Resources, Data curation
    Competing interests
    No competing interests declared
  14. Junshi Chen

    China National Center for Food Safety Risk Assessment, Beijing, China
    Contribution
    Project administration, Writing – review and editing, Data curation
    Competing interests
    No competing interests declared
  15. Zhengming Chen

    Clinical Trial Service Unit & Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
    Contribution
    Project administration, Supervision, Writing – review and editing, Data curation
    Competing interests
    No competing interests declared
  16. Wei Chen

    Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, United States
    Contribution
    Investigation, Funding acquisition, Data curation
    Competing interests
    No competing interests declared
  17. Jun Lv

    1. Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
    2. Key Laboratory of Molecular Cardiovascular Sciences (Peking University), Ministry of Education, Beijing, China
    3. Peking University Institute of Environmental Medicine, Beijing, China
    Contribution
    Conceptualization, Project administration, Supervision, Resources, Investigation, Writing – review and editing, Methodology, Software, Data curation
    For correspondence
    lvjun@bjmu.edu.cn
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7916-3870
  18. Liming Liang

    Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, Boston, United States
    Contribution
    Conceptualization, Investigation, Methodology, Funding acquisition, Software, Data curation
    For correspondence
    lliang@hsph.harvard.edu
    Competing interests
    No competing interests declared
  19. Liming Li

    Department of Epidemiology and Biostatistics, School of Public Health, Peking University Health Science Center, Beijing, China
    Contribution
    Conceptualization, Project administration, Supervision, Resources, Writing – review and editing, Methodology, Software, Data curation
    For correspondence
    lmlee@vip.163.com
    Competing interests
    No competing interests declared
  20. China Kadoorie Biobank Collaborative Group

    Competing interests
    The members of the steering committee and collaborative group are listed in the Supplementary file 1.

Funding

National Natural Science Foundation of China (81390544)

  • Jun Lv

National Natural Science Foundation of China (91846303)

  • Jun Lv

Wellcome Trust (202922/Z/16/Z)

  • Zhengming Chen

National Key Research and Development Program of China (2016YFC0900500)

  • Yu Guo

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The most important acknowledgement is to the participants in the study and the members of the survey teams in each of the 10 regional centres, as well as to the project development and management teams based at Beijing, Oxford and the 10 regional centres.

Ethics

The study protocol was approved by the Ethics Review Committee of the Chinese Center for Disease Control and Prevention (Beijing, China), the Oxford Tropical Research Ethics Committee, University of Oxford (UK), and Peking University Institutional Review Board (Beijing, China). All participants provided written informed consent.

Senior Editor

  1. Matthias Barton, University of Zurich, Switzerland

Reviewing Editor

  1. Edward D Janus, University of Melbourne, Australia

Reviewers

  1. Edward D Janus, University of Melbourne, Australia
  2. Ida Karlsson, Karolinska Institutet, Sweden

Version history

  1. Received: March 23, 2021
  2. Accepted: September 12, 2021
  3. Accepted Manuscript published: September 13, 2021 (version 1)
  4. Version of Record published: November 11, 2021 (version 2)
  5. Version of Record updated: November 17, 2021 (version 3)

Copyright

© 2021, Si et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Jiahui Si, Songchun Yang, Dianjianyi Sun, Canqing Yu, Yu Guo, Yifei Lin, Iona Y Millwood, Robin G Walters, Ling Yang, Yiping Chen, Huaidong Du, Yujie Hua, Jingchao Liu, Junshi Chen, Zhengming Chen, Wei Chen, Jun Lv, Liming Liang, Liming Li, China Kadoorie Biobank Collaborative Group (2021)
Epigenome-wide analysis of DNA methylation and coronary heart disease: a nested case-control study
eLife 10:e68671.
https://doi.org/10.7554/eLife.68671
