Effects of smoking on genome-wide DNA methylation profiles: A study of discordant and concordant monozygotic twin pairs

  1. Jenny van Dongen (corresponding author)
  2. Gonneke Willemsen
  3. BIOS Consortium
  4. Eco JC de Geus
  5. Dorret I Boomsma
  6. Michael C Neale
  1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Netherlands
  2. Amsterdam Public Health Research Institute, Netherlands
  3. Amsterdam Reproduction and Development (AR&D) Research Institute, Netherlands
  4. Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, United States

Abstract

Background:

Smoking-associated DNA methylation levels identified through epigenome-wide association studies (EWASs) are generally ascribed to smoking-reactive mechanisms, but the contribution of a shared genetic predisposition to smoking and DNA methylation levels is typically not accounted for.

Methods:

We exploited a strong within-family design, that is, the discordant monozygotic twin design, to study reactiveness of DNA methylation in blood cells to smoking and reversibility of methylation patterns upon quitting smoking. Illumina HumanMethylation450 BeadChip data were available for 769 monozygotic twin pairs (mean age = 36 years, range = 18–78, 70% female), including pairs discordant or concordant for current or former smoking.

Results:

In pairs discordant for current smoking, 13 differentially methylated CpGs were found between current smoking twins and their genetically identical co-twin who never smoked. Top sites include multiple CpGs in CACNA1D and GNG12, which encode subunits of a calcium voltage-gated channel and G protein, respectively. These proteins interact with the nicotinic acetylcholine receptor, suggesting that methylation levels at these CpGs might be reactive to nicotine exposure. All 13 CpGs have been previously associated with smoking in unrelated individuals and data from monozygotic pairs discordant for former smoking indicated that methylation patterns are to a large extent reversible upon smoking cessation. We further showed that differences in smoking level exposure for monozygotic twins who are both current smokers but differ in the number of cigarettes they smoke are reflected in their DNA methylation profiles.

Conclusions:

In conclusion, by analysing data from monozygotic twins, we robustly demonstrate that DNA methylation level in human blood cells is reactive to cigarette smoking.

Funding:

We acknowledge funding from the National Institute on Drug Abuse grant DA049867, the Netherlands Organization for Scientific Research (NWO): Biobanking and Biomolecular Research Infrastructure (BBMRI-NL, NWO 184.033.111) and the BBMRI-NL-financed BIOS Consortium (NWO 184.021.007), NWO Large Scale infrastructures X-Omics (184.034.019), Genotype/phenotype database for behaviour genetic and genetic epidemiological studies (ZonMw Middelgroot 911-09-032); Netherlands Twin Registry Repository: researching the interplay between genome and environment (NWO-Groot 480-15-001/674); the Avera Institute, Sioux Falls (USA), and the National Institutes of Health (NIH R01 HD042157-01A1, MH081802, Grand Opportunity grants 1RC2 MH089951 and 1RC2 MH089995); epigenetic data were generated at the Human Genomics Facility (HuGe-F) at ErasmusMC Rotterdam. Cotinine assaying was sponsored by the Neuroscience Campus Amsterdam. DIB acknowledges the Royal Netherlands Academy of Science Professor Award (PAH/6635).

Editor's evaluation

This study presents valuable findings regarding how smoking can leave a lasting imprint on the human genome. The twin pairs study design is unique, and the methods applied by the authors are solid, providing an excellent starting point for large translational studies with rigorous laboratory approaches. This work will be of interest to geneticists and genetic epidemiologists.

https://doi.org/10.7554/eLife.83286.sa0

eLife digest

The genetic information of people who smoke presents distinctive characteristics. In particular, previous research has revealed differences in patterns of DNA methylation, a type of chemical modification that helps cells switch certain genes on or off. However, most of these studies could not establish for sure whether these changes were caused by smoking, predisposed individuals to smoke, or were driven by underlying genetic variation in the DNA sequence itself.

To investigate this question, van Dongen et al. examined DNA methylation data from the blood cells of over 700 pairs of identical twins. These individuals share the exact same genetic information, making it possible to better evaluate the impact of lifestyle on DNA modifications.

The analyses identified differences in methylation at 13 DNA locations in pairs of twins where one was a current smoker and their sibling had never smoked. Two of the genes code for proteins involved in the response to nicotine, the primary addictive chemical in cigarette smoke. The differences were smaller if one of the twins had stopped smoking, suggesting that quitting can help to reverse some of these changes.

These findings confirm that DNA methylation in blood cells is influenced by cigarette smoke, which could help to better understand smoking-associated diseases. They also demonstrate how useful studies of identical twins can be for identifying methylation changes that are markers of lifestyle.

Introduction

Epigenome-wide association studies (EWASs) have identified robust differences in DNA methylation between smokers and non-smokers (Gao et al., 2015; Heikkinen et al., 2022). In a meta-analysis of blood-based DNA methylation studies (N = 15,907 individuals; the largest EWAS of smoking to date), 2623 CpG sites passed the Bonferroni threshold for genome-wide significance in a comparison of current and never smokers (Joehanes et al., 2016). Based on comparison with loci identified in large genome-wide association studies (GWASs), differentially methylated sites were significantly enriched in genes implicated in well-established smoking-associated diseases, such as cancer, cardiovascular disease, inflammatory disease, and lung disease, as well as in genes associated with schizophrenia and educational attainment (Joehanes et al., 2016). It has been hypothesized that smoking-induced methylation changes might also contribute to the addictive effect of smoking (Zillich et al., 2022).

Importantly, smoking-associated DNA methylation levels, as established in human EWASs, may reflect different mechanisms. They may reflect causal effects of smoking on methylation, causal effects of methylation on smoking behaviour, methylation differences associated with epiphenomena of other exposures that correlate with smoking, e.g. alcohol use (Liu et al., 2018), or they may reflect a shared genetic predisposition to smoking and methylation level. Distinguishing these mechanisms requires incisive study designs (Vink et al., 2017). Establishing whether methylation levels in smokers revert to levels of never smokers upon smoking cessation is a first step. A previous study of 2648 former smokers with cross-sectional methylation data from the Framingham Heart Study suggested that methylation levels at most CpGs return to the level of never smokers within 5 years after quitting smoking, but 36 CpGs were still differentially methylated in former smokers who had quit smoking 30 years earlier (Joehanes et al., 2016). In the large EWAS meta-analysis of smoking (Joehanes et al., 2016), 185 CpGs were differentially methylated between former and never smokers (compared to 2623 between current and never smokers). In addition, differences between former and never smokers were smaller than between current and never smokers. Reversible DNA methylation patterns may suggest that DNA methylation is reactive to smoking. However, it is also possible that the different methylation level in current smokers reflects a higher genetic liability to smoking behaviour (that makes them more likely to initiate and keep smoking). Similarly, differences between former smokers and never smokers could reflect a persistent methylation change caused by smoking, but they could also be driven by genetic factors.

In population-based studies, smoking cases and non-smoking individuals may differ on many aspects, including their genetic predisposition to smoking. On the other hand, monozygotic twins are genetically identical (except for de novo mutations, but these are rare [Jonsson et al., 2021; Ouwens et al., 2018]), share a womb, and are matched on sex, age, and childhood environment. They have been exposed to similar prenatal conditions, which may include second-hand smoke from smoking mothers and others. Differences in the prenatal environment of monozygotic twins due to, for instance, unequal vascular supply are also recognized (Hall, 1996; Martin et al., 1997), although it remains to be investigated to what extent the impact of prenatal smoke exposure might differ between monozygotic twins. Smoking discordant monozygotic twin pairs offer a unique opportunity to assess smoking-reactive DNA methylation patterns (Leeuwen et al., 2007; Vink et al., 2017). Despite the large number of previous population-based smoking EWASs, only one previous study compared genome-wide DNA methylation in smoking discordant monozygotic twin pairs (Allione et al., 2015). This study analysed whole-blood Illumina 450k array methylation data from 20 discordant pairs, and reported 22 top loci, many of which had been associated with cigarette smoking in previous studies. However, following the correction for multiple testing, none of the differentially methylated loci were statistically significant, and this previous twin study did not examine reversibility of smoking effects, that is, whether methylation status changes again following smoking cessation.

Here, we analyse unique data from a large cohort of monozygotic twin pairs. This cohort is sufficiently large to include current smoking discordant and concordant pairs, as well as pairs discordant for former smoking (Figure 1). These groups allow identification of loci that are reactive to smoking, and examination of the extent to which effects are reversible upon quitting smoking. Monozygotic pairs in which both twins are current smokers, but who differ in quantity smoked, enable examination of the effects of smoking intensity. Finally, concordant pairs who never smoked allow assessment of the amount of DNA methylation variation at smoking-reactive loci that is due to non-genetic sources of variation other than smoking. In secondary enrichment analyses, we examined whether smoking-reactive methylation patterns are enriched (1) at loci detected in previous EWASs of other traits and exposures, (2) at loci detected in a previous large GWAS meta-analysis of smoking initiation (Liu et al., 2019) – these loci are presumed to have a causal effect on smoking behaviour, and (3) within Gene Ontology and KEGG pathways. Finally, we examined the relationship between DNA methylation and RNA transcript levels in blood for smoking-reactive loci.

Figure 1. DNA methylation analysis in smoking discordant and smoking concordant monozygotic twin pairs.

Blood DNA methylation profiles (Illumina 450k array) from six groups of monozygotic twin pairs were analysed.

Methods

Participants

In the Netherlands Twin Register (NTR), DNA methylation data are available for 3089 whole-blood samples from 3057 individuals in twin families, as described in detail previously (van Dongen et al., 2016). The samples were obtained from twins and family members who participated in NTR longitudinal survey studies (Ligthart et al., 2019) and the NTR biobank project (Willemsen et al., 2010). In the current study, methylation data from monozygotic twin pairs were analysed. Among 768 monozygotic twin pairs with genome-wide methylation data and information on smoking and covariates, we identified the following discordant pairs: 53 discordant pairs in which one twin was a current smoker at blood draw and the other never smoked, 72 discordant pairs in which one twin was a former smoker at blood draw and the other never smoked, and 66 discordant pairs in which one twin was a former smoker and the other a current smoker at blood draw. In addition, we identified the following concordant pairs: 83 twin pairs concordant for current smoking, 88 twin pairs concordant for former smoking, and 406 concordant twin pairs who never smoked. A flowchart is provided in Figure 2. Informed consent was obtained from all participants. The twin pairs were primarily of Dutch-European ancestry. For 753 of the 768 MZ pairs who are included in the current study, ancestry could be derived from principal components (PCs) calculated from genome-wide Single Nucleotide Polymorphism (SNP) array data that were available for the twins (750 pairs) or for both of their parents (3 pairs). According to the genotype data PCs, 4.5% of the pairs classify as ancestry outliers. The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance – FWA00017598; IRB/institute code, NTR 03-180).

Figure 2. Study flowchart.

Peripheral blood DNA methylation and cell counts

Genome-wide DNA methylation in whole blood was measured by the Human Genomics Facility (HuGe-F) of ErasmusMC, the Netherlands (http://www.glimdna.org/). DNA methylation was assessed with the Infinium HumanMethylation450 BeadChip Kit (Illumina, San Diego, CA, USA). Genomic DNA (500 ng) from whole blood was bisulfite treated using the Zymo EZ DNA Methylation kit (Zymo Research Corp, Irvine, CA, USA), and 4 μl of bisulfite-converted DNA was measured on the Illumina 450k array (Bibikova et al., 2011) following the manufacturer’s protocol. A custom pipeline for quality control and normalization of the methylation data was developed by the BIOS consortium. First, sample quality control was performed using MethylAid (van Iterson et al., 2014). Next, probe filtering was applied with DNAmArray (Sinke et al., 2019) to remove ambiguously mapped probes (Chen et al., 2013) and probes with a detection p-value >0.01, a bead number <3, or a raw signal intensity of zero. After these probe filtering steps, probes and samples with a success rate <95% were removed. Next, the DNA methylation data were normalized using functional normalization (Fortin et al., 2014), as implemented in DNAmArray (Sinke et al., 2019), using the cohort-specific optimum number of control probe-based PCs. Probes containing an SNP, identified in a DNA sequencing project in the Dutch population (The Genome of the Netherlands Consortium, 2014), within the CpG site (at the C or G position) were excluded irrespective of minor allele frequency, and only autosomal probes were analysed, leading to a total number of 411,169 methylation sites. The following subtypes of white blood cells were counted in blood samples: neutrophils, lymphocytes, monocytes, eosinophils, and basophils (Willemsen et al., 2010).
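As an illustration of the measurement-level filters described above, the following is a minimal Python sketch. It is not the DNAmArray implementation (which is an R package); the `Measurement` class and all function names are hypothetical, and only the three thresholds (detection p-value >0.01, bead number <3, zero raw intensity) and the 95% success-rate cut-off are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One probe measured in one sample (hypothetical container)."""
    detection_p: float    # detection p-value
    bead_count: int       # number of beads contributing to the signal
    raw_intensity: float  # raw signal intensity


def passes_qc(m, max_detection_p=0.01, min_beads=3):
    """True if the measurement survives all three filters named in the text."""
    return (
        m.detection_p <= max_detection_p
        and m.bead_count >= min_beads
        and m.raw_intensity != 0
    )


def success_rate(measurements):
    """Fraction of measurements that pass QC (0.0 for an empty list)."""
    if not measurements:
        return 0.0
    return sum(passes_qc(m) for m in measurements) / len(measurements)


def keep(measurements, min_success=0.95):
    """Keep a probe (or a sample) only if its success rate is at least 95%."""
    return success_rate(measurements) >= min_success
```

The same `keep` function can be applied to all measurements of one probe across samples, or to all probes within one sample, mirroring the statement that both probes and samples with a success rate below 95% were removed.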

Smoking and other phenotypes

Information on smoking behaviour was obtained by interview during the home visit for blood collection as part of the NTR biobank project (2004–2008 and 2010–2011). The questions are included in Supplementary file 1. Participants were asked: ‘Did you ever smoke?’, with answer categories: (1) no, I never smoked, (2) I’m a former smoker, and (3) yes. Current smokers were asked how many years they smoked and how many cigarettes per day they smoked at present, while ex-smokers were asked how many years ago they quit, for how many years they smoked and how many cigarettes per day they smoked (note that the question on cigarettes per day put to former smokers did not specify a particular time period, which may introduce variation in responses). Data were checked for consistency and missing data were completed by linking this information to data from surveys filled out close to the time of biobanking within the longitudinal survey study of the NTR. More details on these checks are described in Supplementary file 1. Packyears were calculated as (number of cigarettes smoked per day/20) × number of years smoked. Plasma cotinine level measurements have been described previously (Bot et al., 2013). Body mass index (BMI) was obtained at blood draw. Educational attainment was obtained in multiple longitudinal surveys and was defined as the highest completed level of education at the age of 25 or higher. It was classified on a 7-point scale: 1 = primary school only, 2 = lower vocational schooling, 3 = lower secondary schooling (general), 4 = intermediate vocational schooling, 5 = intermediate/higher secondary schooling (general), 6 = higher vocational schooling, 7 = university.
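The packyears definition above can be written as a one-line function. The sketch below is illustrative only; the function name and the input validation are assumptions, and the example inputs are invented rather than taken from the study data.

```python
def pack_years(cigarettes_per_day, years_smoked):
    """Packyears = (cigarettes per day / 20) x years smoked, as defined in the text."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return (cigarettes_per_day / 20) * years_smoked


# Invented example: half a pack per day for 14 years
print(pack_years(10, 14))
```

One packyear therefore corresponds to smoking 20 cigarettes per day for one year.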

Statistical analyses

Overview and hypotheses

All analyses were performed in R (R Development Core Team, 2013). Analyses were performed in six groups of monozygotic twin pairs (Figure 1). To identify DNA methylation differences in smoking-discordant monozygotic twin pairs, we first compared the twin pairs in which one twin had never smoked and the other was a current smoker at the time of blood sampling. Second, to identify which of these DNA methylation differences might be reversible, we analysed data from (1) monozygotic pairs in which one twin had never smoked and the other was a former smoker at the time of blood sampling, (2) monozygotic pairs in which one twin was a current smoker and the other was a former smoker at the time of blood sampling, and (3) monozygotic pairs who were both former smokers. Third, to quantify within-pair methylation differences that occur by chance alone, we examined the within-pair differences in monozygotic twins concordant for never having smoked. Fourth, data from monozygotic twins concordant for current smoking were analysed to examine the effects of smoking intensity.
Our hypotheses were as follows: (1) if DNA methylation level is reactive to cigarette smoking, methylation differences will be present between smokers and non-smokers after ruling out genetic differences, that is in smoking-discordant monozygotic twin pairs, and these differences will be larger than in monozygotic pairs concordant for never smoking, (2) if DNA methylation patterns are reversible upon quitting smoking, methylation differences (∆M) in monozygotic pairs will show the following pattern: ∆M discordant current-never > ∆M discordant current-former and ∆M discordant former-never > ∆M concordant never, (3) a correlation between time since quitting smoking and ∆M in pairs discordant for former smoking is consistent with a gradual reversibility of methylation levels upon quitting smoking, and (4) a correlation between ∆M and the difference in number of cigarettes smoked per day in smoking concordant pairs is consistent with smoking-reactive methylation patterns that show a dose–response relationship with amount of cigarettes smoked.

Epigenome-wide association study

In the entire dataset of 3089 blood samples, we used linear regression analysis to correct the DNA methylation levels (β-values) for commonly used covariates (van Rooij et al., 2019), including HM450k array row, bisulphite plate (dummy-coding) and white blood cell percentages (% neutrophils, % monocytes, and % eosinophils). White blood cell percentages were included to account for variation in cellular composition between whole-blood samples. Lymphocyte percentage was not included in models because it was strongly correlated with neutrophil percentage (r = −0.93), and basophil percentage was not included because it showed little variation between subjects, with a large number of subjects having 0% of basophils. We did not adjust for sex and age, because monozygotic twins have the same sex and age. The residuals from this regression analysis were used in the within-pair EWAS analyses. Specifically, the residuals were used as input for paired t-tests to compare the methylation of the smoking twins with that of their non-smoking co-twins. Similarly, paired t-tests were applied to data from smoking concordant pairs. Statistical significance was assessed following stringent Bonferroni correction for the number of methylation sites tested (α = 0.05/411,169 = 1.2 × 10⁻⁷). For each EWAS analysis, the R package Bacon was used to compute the Bayesian inflation factor (van Iterson et al., 2017). A previous power analysis for DNA methylation studies in discordant monozygotic twins indicated that with 50 discordant pairs, there is 80% power to detect methylation differences of 15% (at epigenome-wide significance; that is, following multiple testing correction) (Tsai and Bell, 2015). Power quickly drops for smaller effect sizes; for example, with 50 discordant pairs, the power to detect a 10% methylation difference is 10% and the power to detect a methylation difference of 5% approaches alpha (Tsai and Bell, 2015). We tested for within-pair differences in demographics (e.g. BMI, educational attainment) and smoking characteristics (e.g. number of cigarettes per day) with paired t-tests (continuous data) and Wilcoxon Signed Ranks tests (ordinal data) in R.
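The within-pair test and the Bonferroni threshold can be sketched as follows. This is a minimal Python illustration using only the standard library, not the R code used in the study; the function and argument names are hypothetical, and the inputs are assumed to be covariate-adjusted residuals for one CpG, ordered so that position i in both lists belongs to the same pair. The sketch stops at the t-statistic; converting it to a p-value requires the t distribution with n − 1 degrees of freedom from a statistics library.

```python
import math
from statistics import mean, stdev

N_SITES = 411_169  # number of methylation sites tested (from the text)
ALPHA = 0.05


def bonferroni_threshold(alpha=ALPHA, n_tests=N_SITES):
    """Per-site significance threshold after Bonferroni correction."""
    return alpha / n_tests


def paired_t(twin_a_residuals, twin_b_residuals):
    """Paired t-test for one CpG.

    Returns (mean within-pair difference, t-statistic, degrees of freedom).
    """
    if len(twin_a_residuals) != len(twin_b_residuals):
        raise ValueError("both lists must contain one value per pair")
    diffs = [a - b for a, b in zip(twin_a_residuals, twin_b_residuals)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample SD of the within-pair differences
    if sd == 0:
        raise ValueError("within-pair differences have zero variance")
    standard_error = sd / math.sqrt(n)
    return mean(diffs), mean(diffs) / standard_error, n - 1
```

Because each twin serves as the genetically matched control for the co-twin, only the within-pair differences enter the test, which is why sex and age do not need to be modelled.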

Dose–response relationships

For significant CpGs from the EWAS of discordant monozygotic twin pairs, we examined dose–response relationships in smoking concordant pairs (both twins were current smokers) by correlating within-pair differences in DNA methylation with within-pair differences in smoking packyears and cigarettes per day. All correlations reported in this paper are Pearson correlations. Secondly, in twin pairs discordant for former smoking (one twin never smoked and the other one is a former smoker), we correlated and plotted within-pair differences in DNA methylation with the time since quitting smoking to assess the relationship between time since quitting smoking and reversal of methylation differences within monozygotic twin pairs.
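A dose–response check of this kind reduces to a Pearson correlation between two vectors of within-pair differences. The sketch below is a hedged Python illustration with hypothetical function names; it assumes the two input lists are aligned by pair and contain no missing values.

```python
import math


def within_pair_differences(twin1_values, twin2_values):
    """Twin 1 minus twin 2, one value per pair."""
    if len(twin1_values) != len(twin2_values):
        raise ValueError("both lists must contain one value per pair")
    return [a - b for a, b in zip(twin1_values, twin2_values)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

For the concordant current smokers, `x` would hold the within-pair differences in cigarettes per day (or packyears) and `y` the within-pair differences in methylation at one CpG; for pairs discordant for former smoking, `x` would hold the time since quitting of the former smoker.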

Enrichment analyses

We used the EWAS Toolkit from the EWAS atlas (Li et al., 2019) to perform enrichment analyses of Gene Ontology terms, KEGG pathways, and previously associated traits among top sites from the EWAS in discordant monozygotic twin pairs (current versus never). With the trait enrichment tool of the EWAS Toolkit, we tested for enrichment of all traits (680) that were present in the atlas on 26 April 2022. Because the software requires a minimum of 20 input CpGs, we selected the top 20 CpGs from the EWAS in discordant monozygotic pairs for the enrichment analyses using the EWAS Toolkit.

To study overlap of EWAS signal with genetic findings for smoking, we compared our EWAS results against GWAS results from the largest GWAS meta-analysis of smoking phenotypes. This is the meta-analysis of smoking initiation by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) (Liu et al., 2019). We obtained leave-one-out meta-analysis results with NTR excluded. From the GWAS, we selected all SNPs with a p-value <5.0 × 10⁻⁸ and determined the distance of each Illumina 450k methylation site to each SNP. We then tested whether methylation sites within 1 Mb of genome-wide significant SNPs from the GWAS showed a stronger signal in the within-pair EWAS of smoking discordant monozygotic pairs compared to other genome-wide methylation sites, by regressing the EWAS test statistics on a variable (GWAS locus) indicating if the CpG is located within a 1 Mb window from SNPs associated with smoking initiation (1 = yes, 0 = no):

$$t = \mathrm{Intercept} + \beta_{\mathrm{GWASlocus}} \times \mathrm{GWASlocus}$$

where t represents the absolute t-statistic from the paired t-test comparing within-pair methylation differences in smoking discordant pairs and βGWASlocus represents the estimate for GWASlocus, that is, the change in the t-test statistic associated with a one-unit change in the variable GWASlocus (i.e. being within 1 Mb of SNPs associated with smoking initiation). For each enrichment test, bootstrap standard errors were computed with 2000 bootstraps with the R-package ‘simpleboot’.
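The enrichment regression can be illustrated with a short Python sketch. It is not the authors' R code and does not reproduce ‘simpleboot’; instead it uses a hand-rolled case-resampling bootstrap. All function names are hypothetical, treating "within 1 Mb" as an inclusive distance on the same chromosome is an assumption, and the sketch relies on the fact that with a single 0/1 predictor the least-squares slope equals the difference between the two group means.

```python
import random
from statistics import mean, stdev

WINDOW_BP = 1_000_000  # 1 Mb window (from the text)


def gwas_locus(cpg_chr, cpg_pos, snps, window=WINDOW_BP):
    """1 if the CpG lies within `window` bp of any significant SNP, else 0.

    `snps` is a list of (chromosome, position) tuples.
    """
    return int(any(c == cpg_chr and abs(p - cpg_pos) <= window for c, p in snps))


def fit(abs_t, locus):
    """Least-squares fit of |t| on a 0/1 indicator; returns (intercept, slope)."""
    inside = [t for t, g in zip(abs_t, locus) if g == 1]
    outside = [t for t, g in zip(abs_t, locus) if g == 0]
    if not inside or not outside:
        raise ValueError("both groups must contain at least one CpG")
    intercept = mean(outside)
    return intercept, mean(inside) - intercept


def bootstrap_se(abs_t, locus, n_boot=2000, seed=1):
    """Bootstrap standard error of the slope, resampling CpGs with replacement."""
    fit(abs_t, locus)  # fail early if one group is empty
    rng = random.Random(seed)
    rows = list(zip(abs_t, locus))
    slopes = []
    while len(slopes) < n_boot:
        sample = [rng.choice(rows) for _ in rows]
        t_s, g_s = zip(*sample)
        try:
            slopes.append(fit(t_s, g_s)[1])
        except ValueError:
            continue  # this resample lacked one of the two groups; draw again
    return stdev(slopes)
```

A positive slope that is large relative to its bootstrap standard error would indicate that CpGs near smoking-initiation loci carry stronger within-pair EWAS signal than other CpGs.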

Gene expression

For significant CpGs from the EWAS of discordant monozygotic twin pairs (current versus never), we examined whether the DNA methylation was associated with gene expression levels in cis. To this end, we used an independent whole-blood RNA-sequencing dataset from the Biobank-based Integrative Omics Study (BIOS) consortium that did not include NTR, and tested associations between genome-wide CpGs and transcripts in cis (<250 kb), as described in detail previously (the BIOS Consortium et al., 2017). In short, methylation and expression levels in whole-blood samples (n = 2101) were quantified with Illumina Infinium HumanMethylation450 BeadChip arrays and with RNA-seq (2 × 50 bp paired-end, Hiseq2000, >15 M read pairs per sample). For each target CpG (epigenome-wide significant differentially methylated positions [DMPs]), we identified transcripts in cis (<250 kb), for which methylation levels were significantly associated with gene expression levels at the experiment-wide threshold applied by this study (False Discovery Rate (FDR) <5.0%), after regressing out methylation Quantitative Trait Locus (mQTL) and expression Quantitative Trait Locus (eQTL) effects. We also examined whether significant CpGs from the EWAS of discordant monozygotic twin pairs mapped to genes that were previously reported to be differentially expressed in monozygotic pairs of which one twin never smoked, and the other was a current smoker at the time of blood sampling (based on Affymetrix U219 array data; n = 56 pairs; note: the 53 discordant pairs included in the current study of DNA methylation are a subset of the 56 discordant pairs included in the study of gene expression) (Vink et al., 2017).

Results

Descriptives of the smoking-discordant and concordant monozygotic twin pairs are given in Table 1. In twin pairs discordant for current smoking status (i.e. one twin a current smoker at the time of blood sampling and the other never initiated regular smoking, N = 53 pairs, mean age = 33 years), the smoking twin on average smoked 8.9 cigarettes per day at the time of blood sampling, and had an average smoking history equivalent to 6.8 packyears. The EWAS analysis in pairs discordant for current smoking status identified 13 epigenome-wide significant (p < 1.20 × 10⁻⁷) DMPs (Figure 3a). Genome-wide test statistics were not inflated (Supplementary file 2). Absolute differences in methylation ranged from 2.5% to 13% (0.025–0.13 on the methylation β-value scale), with a mean of 5.4% (Table 2). Eight of the 13 CpGs (61.5%) showed lower methylation in the current smoking twins compared to their non-smoking co-twins. Pair-level methylation β-values are shown in Figure 3—figure supplement 1 and illustrate strong consistency in the direction of effect. For example, at top CpG site cg05575921, for 51 out of the 53 pairs, the smoking twin had a lower methylation level than the non-smoking twin. At 11 of the 13 CpGs, the methylation difference in smoking discordant monozygotic twin pairs was smaller (on average 19.0%, range = 2.2–37.5%) compared to the methylation difference reported previously in an EWAS meta-analysis of smoking (Joehanes et al., 2016). At two CpGs, the methylation difference in smoking discordant monozygotic twins was larger (on average 24.6%).

Figure 3. Top differentially methylated loci identified in monozygotic twin pairs discordant for current smoking.

(a) Manhattan plot of the epigenome-wide association study (EWAS) in 53 smoking discordant monozygotic twin pairs (current versus never). The red horizontal line denotes the epigenome-wide significance threshold (Bonferroni correction) and 13 CpGs with significant differences are highlighted. (b) Mean within-pair differences in monozygotic twin pairs at the 13 CpGs that were epigenome-wide significant in smoking discordant monozygotic pairs. Mean within-pair differences of the residuals obtained after correction of methylation β-values for covariates are shown for 53 monozygotic pairs discordant for current/never smoking, 66 monozygotic pairs discordant for current/former smoking, 72 monozygotic pairs discordant for former/never smoking, 83 concordant current smoking monozygotic pairs, 88 concordant former smoking monozygotic pairs, and 406 concordant never smoking monozygotic pairs. (c) QQ-plot showing p-values from the EWAS in 53 smoking discordant monozygotic twin pairs (current versus never). P-values for CpGs located nearby significant SNPs from the genome-wide association study (GWAS) of smoking initiation are plotted in blue and all other genome-wide CpGs are plotted in orange.

Table 1
Descriptive statistics for smoking discordant and concordant monozygotic twin pairs.
Discordant current/never(53 pairs)Discordant former/never(72 pairs)Discordant current/former(66 pairs)Concordant current(83 pairs)Concordant never(406 pairs)Concordant former(88 pairs)
Current smokerNever-smokerMean diffp-valueFormer smokerNever-smokerMean diffp-valueCurrent smokerFormer smokerMean diffp-valueTwin 1Twin 2Mean diffp-valueTwin 1Twin 2Mean diffp-valueTwin 1Twin 2Mean diffp-value
% Female pairs60.4%60.4%n.a.n.a.77.80%77.80%n.a.n.a.69.7%69.7%n.a.n.a.61.4%61.4%n.a.n.a.73.6%73.6%n.a.n.a.64.8%64.8%n.a.n.a.
Age at blood sampling, mean (SD)33.1 (8.0)33.0 (7.9)0.100.3441.4 (13.2)41.4 (13.1)0.020.8342.2
(12.6)
42.2
(12.5)
−0.060.4533.8
(10.3)
33.9
(10.5)
−0.120.1033.1
(11.3)
33.0
(11.2)
0.060.0845.2
(13.4)
45.2
(13.4)
0.090.29
Cigarettes per day at blood sampling, mean (SD), N missings8.9 (6.4), 6n.a.n.a.n.a.n.a.n.a.n.a.n.a.11.9 (7.2), 9n.a.n.a.n.a.11.1
(7.0), 2
10.9
(6.9), 1
0.001.00n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.
Packyears, mean (SD), N missings6.8 (7.0), 13n.a.n.a.n.a.5.9 (11.1), 15n.a.n.a.n.a.13.6
(13.2), 9
9.3
(8.7), 10
3.90.059.7 (9.3), 108.3 (7.6), 90.220.82n.a.n.a.n.a.n.a.10.6
(11.5), 7
9.8
(10.4), 11
0.780.55
Years since quitting smoking, mean (SD), N missingsn.a.n.a.n.a.n.a.13.5 (11.4), 9n.a.n.a.n.a.n.a.9.0
(10.2), 2
n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.11.9
(9.1), 8
13.6
(11.8), 7
−1.620.20
Plasma cotinine level, mean (SD), N missings*222 (197.5), 11.8 (2.6), 28261.11.6 × 10−51.4 (1.8), 490.9 (1.0), 520.620.43286.7 (330.5), 219.2 (70.0), 28293.82.5 × 10−6267 (290.9), 3279 (308.1), 3−8.70.781.3 (9.7), 2740.5 (0.8), 2600.070.5055.8 (222.2), 616 (16.2), 6483.30.26
Educational Attainment, N (%)0.040.970.340.600.760.83
N missing18111313141428341051081420
1.  Primary school only0 (0%)1 (2.3%)0 (0%)1 (1.7%)2 (3.8%)5 (9.6%)1 (1.8%)2 (4.1%)3 (1.0%)4 (1.3%)6 (8.1%)4 (5.9%)
2.  Lower vocational schooling1 (2.9%)0 (0%)7 (11.9%)6 (10.2%)10 (19.2%)4 (7.7%)4 (7.3%)1 (2.0%)8 (2.7%)10 (3.4%)2 (2.7%)10 (14.7%)
3. Lower secondary schooling (general)2 (5.7%)2 (4.8%)11 (18.6%)8 (13.6%)7 (13.5%)4 (7.7%)5 (9.1%)7 (14.3%)11 (3.7%)11 (3.7%)10 (13.5%)6 (8.8%)
4. Intermediate vocational schooling12 (34.3%)15 (35.7%)13 (23.7%)19 (32.2%)8 (15.4%)13 (25.0%)16 (29.1%)16 (32.7%)81 (26.9%)85 (28.5%)26 (35.1%)17 (25.0%)
5. Intermediate/higher secondary schooling (general)2 (5.7%)1 (2.4%)3 (6.8%)3 (5.1%)2 (3.8%)2 (3.8%)4 (7.3%)6 (12.2%)15 (5.0%)17 (5.7%)3 (4.1%)2 (2.9%)
6. Higher vocational schooling14 (40.0%)14 (33.3%)14 (22.0%)11 (18.6%)15 (28.8%)16 (30.8%)19 (34.5%)11 (22.4%)97 (32.2%)81 (27.2%)17 (23.0%)17 (25.0%)
7. University4 (11.4%)9 (21.4%)10 (16.9%)11 (18.6%)8 (15.4%)8 (15.4%)6 (10.9%)6 (12.2%)86 (28.6%)90 (30.2%)10 (13.5%)12 (17.6%)
BMI, mean (SD), N missings23.8 (3.8), n.a.24.0 (3.4), n.a.−0.170.7425.0 (3.6), n.a.25.1 (4.2), n.a.−0.130.7523.7 (3.1)25.2 (4.2)−1.473.4 × 10−423.6 (3.5), n.a.23.1 (3.2), n.a.0.470.0623.8 (3.8),423.6 (3.4), 20.270.0325.6 (4.0), n.a.24.9 (3.7), n.a.0.630.02
Percentage monocytes, mean (SD), N missings8.0 (2.3), 08.5 (2.4), 0−0.440.198.6 (2.0), 08.3 (1.8), 00.290.198.6
(1.9),0
9.2
(3.1), 0
−0.570.148.3 (2.1), 08.1 (1.9), 00.200.368.5 (2.0), 08.5
(2.2), 0
0.030.758.4
(2.4), 0
8.5
(2.4), 0
−0.060.75
Percentage lymphocytes, mean (SD), N missings35.0 (8.9), 035.9 (10.0)−0.940.5033.6 (8.5), 034.0 (8.7)−0.370.7735.8
(8.2),0
35.6
(8.5), 0
0.250.8433.7 (8.3), 034.1 (8.3), 0−0.440.6736.3 (8.4), 036.2 (8.4), 00.040.9235.0
(7.7), 0
34.1
(8.4), 0
0.950.26
Percentage neutrophils, mean (SD), N missings53.4 (9.5), 052.1 (9.8), 01.340.3854.4 (9.1), 054.5 (9.1), 0−0.080.9551.6
(9.0),0
51.7
(8.9), 0
0–0.060.9653.7 (8.9), 053.7 (9.3), 00.030.9851.8, (8.7), 051.9 (9.3), 0−0.080.8652.8
(8.2), 0
53.7
(8.4), 0
−0.840.35
Percentage eosinophils, mean (SD), N missings3.1 (2.5), 03.1 (2.1), 00.050.913.1 (1.9), 02.9 (2.0), 00.150.533.3
(1.9),0
3.1
(1.7), 0
0.210.333.4 (2.2), 03.4 (1.8), 0−0.030.912.9 (1.8), 02.9
(1.9), 0
−0.040.663.1
(1.9), 0
3.2
(2.4), 0
−0.060.77
Percentage basophils, mean (SD), N missings0.5 (0.7), 00.5 (0.7), 0−0.010.960.3 (0.3), 00.4 (0.5), 0−0.020.760.6
(0.9),0
0.4
(0.4), 0
0.180.120.9 (3.1), 00.6 (1.1), 00.250.490.5 (0.7), 00.4
(0.7), 0
0.060.210.6
(1.1), 0
0.6
(0.9), 0
−0.010.97
* Missing values include values that are below the detection limit. BMI = body mass index.

Table 2
Epigenome-wide significant differentially methylated CpGs in monozygotic pairs discordant for current smoking status.
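Columns labelled "Current" refer to current smoking discordant pairs; columns labelled "Former" refer to former smoking discordant pairs (former/never).

| IlmnID | CHR | MAPINFO | Gene* | Nearest gene | Current: Mean diff | Current: p-value | Current: 95conf_L | Current: 95conf_H | Current: T-Statistic | Former: Mean diff | Former: p-value | Former: 95conf_L | Former: 95conf_H | Former: T-Statistic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cg05575921 | 5 | 373378 | AHRR | AHRR | 0.132 | 4.9 × 10−11 | 0.100 | 0.165 | 8.265 | 0.027 | 3.3 × 10−4 | 0.013 | 0.041 | 3.778 |
| cg21566642 | 2 | 233284661 |  | ALPPL2 | 0.092 | 1.5 × 10−10 | 0.069 | 0.115 | 7.960 | 0.033 | 3.2 × 10−6 | 0.020 | 0.046 | 5.067 |
| cg05951221 | 2 | 233284402 |  | ALPPL2 | 0.066 | 1.8 × 10−9 | 0.048 | 0.084 | 7.270 | 0.026 | 4.6 × 10−6 | 0.016 | 0.037 | 4.964 |
| cg01940273 | 2 | 233284934 |  | ALPPL2 | 0.060 | 2.1 × 10−9 | 0.044 | 0.077 | 7.240 | 0.018 | 1.3 × 10−4 | 0.009 | 0.027 | 4.052 |
| cg13411554 | 3 | 53700276 | CACNA1D | CACNA1D | −0.038 | 6.0 × 10−9 | −0.049 | −0.027 | −6.947 | −0.007 | 0.10 | −0.016 | 0.002 | −1.655 |
| cg01901332 | 11 | 75031054 | ARRB1 | ARRB1 | 0.025 | 8.0 × 10−9 | 0.018 | 0.033 | 6.868 | 0.006 | 0.16 | −0.002 | 0.013 | 1.425 |
| cg21161138 | 5 | 399360 | AHRR | AHRR | 0.046 | 1.9 × 10−8 | 0.032 | 0.059 | 6.642 | 0.002 | 0.64 | −0.006 | 0.009 | 0.466 |
| cg00336149 | 3 | 53700195 | CACNA1D | CACNA1D | −0.027 | 2.0 × 10−8 | −0.035 | −0.019 | −6.615 | −0.002 | 0.60 | −0.008 | 0.005 | −0.524 |
| cg22132788 | 7 | 45002486 | MYO1G | MYO1G | −0.056 | 2.4 × 10−8 | −0.073 | −0.039 | −6.596 | −0.011 | 4.5 × 10−3 | −0.019 | −0.004 | −2.930 |
| cg21188533 | 3 | 53700263 | CACNA1D | CACNA1D | −0.036 | 3.9 × 10−8 | −0.047 | −0.025 | −6.437 | −0.006 | 0.24 | −0.015 | 0.004 | −1.196 |
| cg09935388 | 1 | 92947588 | GFI1 | GFI1 | 0.052 | 4.1 × 10−8 | 0.035 | 0.068 | 6.423 | 0.002 | 0.61 | −0.006 | 0.010 | 0.519 |
| cg25648203 | 5 | 395444 | AHRR | AHRR | 0.035 | 5.3 × 10−8 | 0.024 | 0.046 | 6.353 | 0.002 | 0.48 | −0.004 | 0.008 | 0.710 |
| cg19089201 | 7 | 45002287 | MYO1G | MYO1G | −0.040 | 7.5 × 10−8 | −0.053 | −0.028 | −6.260 | −0.007 | 0.13 | −0.017 | 0.002 | −1.529 |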
Coordinates are given based on genome build 37. Mean differences represent the non-smoking twin minus the smoking twin (hence positive values indicate a higher methylation level in non-smoking twins). The table shows the 13 epigenome-wide significant CpGs from the within-pair EWAS in 53 discordant monozygotic twin pairs (current versus never smokers). Results from the comparison within 72 monozygotic pairs discordant for former smoking are also shown.

* CpGs without a gene name are located in intergenic regions. 95conf_L = 95% confidence interval lower bound, 95conf_H = 95% confidence interval upper bound.

In twin pairs discordant for former smoking (N = 72 pairs, mean age = 41 years), the twins who used to smoke had quit smoking on average 14 years ago (standard deviation [SD] = 11.4, range = 0.04–50 years), while their co-twins had never initiated regular smoking. In this group, no epigenome-wide significant DMPs were identified, and within-pair differences at the 13 significant DMPs identified in the previous analysis were diminished (average reduction: 81%, range = 61–96%; Figure 3b, Table 2). By contrast, in twin pairs of which one twin was a current smoker at blood draw and the co-twin had quit smoking (on average 9 years ago, SD = 10.2, range = 0.02–40 years, N = 66 pairs), the reduction of within-pair differences at the 13 top CpGs was much smaller (on average 31%, range = 15–52%; Figure 3b, Supplementary file 3), and 5 of the 13 DMPs identified by comparing current and never smoking twins were also epigenome-wide significant in this group. Furthermore, five additional epigenome-wide significant CpGs were identified in current/former smoking discordant pairs (Supplementary file 4). Figure 3b illustrates the pattern of within-pair differences at the 13 top DMPs identified in current/never discordant monozygotic pairs: the largest differences are seen in current/never smoking discordant pairs, smaller differences in former/never discordant pairs, and intermediate differences in current/former discordant pairs. Differences are smallest within smoking concordant pairs. This pattern is in line with smoking-associated methylation patterns in blood cells being to a large extent reversible upon quitting smoking.
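
As a rough check on the reduction figures above, the following sketch recomputes the percentage reduction per CpG from the rounded mean differences listed in Table 2. It assumes that the reduction is defined as one minus the ratio of the absolute within-pair difference in former/never pairs to that in current/never pairs; the exact computation used in the study may differ, and the names `TABLE2_MEAN_DIFFS` and `percent_reduction` are illustrative only.

```python
# Within-pair mean differences (non-smoking twin minus smoking twin) from Table 2.
# Tuple layout: (current/never discordant pairs, former/never discordant pairs).
TABLE2_MEAN_DIFFS = {
    "cg05575921": (0.132, 0.027),
    "cg21566642": (0.092, 0.033),
    "cg05951221": (0.066, 0.026),
    "cg01940273": (0.060, 0.018),
    "cg13411554": (-0.038, -0.007),
    "cg01901332": (0.025, 0.006),
    "cg21161138": (0.046, 0.002),
    "cg00336149": (-0.027, -0.002),
    "cg22132788": (-0.056, -0.011),
    "cg21188533": (-0.036, -0.006),
    "cg09935388": (0.052, 0.002),
    "cg25648203": (0.035, 0.002),
    "cg19089201": (-0.040, -0.007),
}


def percent_reduction(current_diff, former_diff):
    """Percentage by which the absolute within-pair difference shrinks.

    Assumed definition: 100 * (1 - |former| / |current|).
    """
    return 100.0 * (1.0 - abs(former_diff) / abs(current_diff))


reductions = {
    cpg: percent_reduction(cur, fmr) for cpg, (cur, fmr) in TABLE2_MEAN_DIFFS.items()
}
mean_reduction = sum(reductions.values()) / len(reductions)

print(f"mean reduction: {mean_reduction:.1f}%")
print(f"range: {min(reductions.values()):.1f}% to {max(reductions.values()):.1f}%")
```

Under this assumed definition, the rounded Table 2 values give a mean reduction of about 81% and a range of about 61–96%, in agreement with the figures reported in the text.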

Distributions of within-pair differences in smoking discordant and concordant pairs for the top 1000 CpGs of the EWAS in discordant pairs are shown in Figure 4a. The distributions illustrate that differences are largest, as expected, within monozygotic twin pairs discordant for current smoking (current/never smoking pairs), followed by current/former smoking discordant pairs, followed by former/never smoking discordant monozygotic twin pairs. Monozygotic pairs concordant for current smoking also show notable within-pair differences at these CpGs that are substantially larger than those in monozygotic pairs concordant for never smoking (Figure 4a). This could be explained by within-pair differences in the number of cigarettes smoked by monozygotic twins who were concordant for current smoking. The twin correlations in current smoking monozygotic twin pairs were r = 0.50, p = 2.2 × 10−6 for cigarettes per day (Figure 4b), r = 0.43, p = 3.2 × 10−4 for packyears, and r = 0.58, p = 1.6 × 10−8 for plasma cotinine levels. Within-pair differences in DNA methylation at the 13 top CpGs correlated with within-pair differences in the number of cigarettes smoked per day (mean absolute r = 0.38, range [for different CpGs]: −0.56 to 0.41; Table 3, Figure 4c) and with within-pair differences in packyears (mean absolute r = 0.46, range: −0.65 to 0.42; Table 3), but did not correlate strongly with within-pair differences in plasma cotinine level (mean absolute r = 0.14, range: −0.23 to 0.28; Table 3). In twin pairs discordant for former smoking, within-pair differences in DNA methylation at the 13 top CpGs were weakly correlated with time since quitting smoking (mean r = −0.11, range = −0.28 to 0.05; Supplementary file 5).
Based on scatterplots of the within-pair methylation differences against time since quitting smoking (Figure 4d), we hypothesized that the lack of a strong correlation with time since quitting smoking might be explained by most of the reversal taking place within the first years after quitting smoking. We therefore repeated the analysis restricting to those pairs of which the smoking twin had quit smoking less than 5 years ago (N = 15 pairs). In this group, within-pair differences in DNA methylation at the 13 top CpGs were on average more strongly correlated with time since quitting smoking (mean r = −0.16, range = −0.48 to 0.23) but the sample size was greatly reduced and correlations were non-significant.
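
The within-pair correlation analyses described above can be illustrated with a minimal sketch. It assumes a Pearson correlation (the type of correlation test is not stated in this section) and uses made-up values for five hypothetical concordant current-smoking pairs; none of the numbers are study data, and the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pairs (invented values, NOT study data):
# (cigarettes/day twin 1, cigarettes/day twin 2, methylation twin 1, methylation twin 2)
pairs = [
    (20, 10, 0.55, 0.63),
    (5, 15, 0.70, 0.61),
    (12, 12, 0.64, 0.65),
    (25, 8, 0.50, 0.66),
    (10, 18, 0.66, 0.58),
]

# Within-pair differences: twin 1 minus twin 2 for both variables.
cpd_diff = [c1 - c2 for c1, c2, _, _ in pairs]
meth_diff = [m1 - m2 for _, _, m1, m2 in pairs]

r = pearson_r(cpd_diff, meth_diff)
print(f"r = {r:.2f}")
```

Because both differences are taken in the same direction (twin 1 minus twin 2), a negative r means that the twin who smokes more tends to have the lower methylation level, which is the direction reported for cg05575921 (AHRR) in Table 3.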

Figure 4
DNA methylation differences in smoking discordant and smoking concordant pairs.

(a) Distributions of the mean absolute within-pair differences in discordant and concordant pairs at the top 1000 CpGs with the lowest p-value from the epigenome-wide association study (EWAS) in discordant monozygotic pairs (current versus never smokers). (b) Scatterplot of cigarettes smoked per day in 80 concordant current smoking monozygotic pairs with complete data. (c) Scatterplot of within-pair differences in cigarettes smoked per day versus DNA methylation at cg05575921 (AHRR) in 80 concordant current smoking monozygotic pairs with complete data. (d) Scatterplot of within-pair differences in DNA methylation at cg05575921 (AHRR) versus time since quitting smoking (years) in 63 pairs discordant for former smoking.

Table 3
Correlations of within-pair differences in DNA methylation with within-pair differences in cigarettes per day and packyears in 83 concordant current smoking monozygotic pairs.
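| cgid | CHR | Position | Gene | Nearest gene | Cigarettes per day: r | Cigarettes per day: p-value | Packyears: r | Packyears: p-value | Plasma cotinine: r | Plasma cotinine: p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| cg05575921 | 5 | 373378 | AHRR | AHRR | −0.52 | 5.9 × 10−7 | −0.55 | 1.2 × 10−6 | −0.08 | 0.50 |
| cg21566642 | 2 | 233284661 |  | ALPPL2 | −0.49 | 4.7 × 10−6 | −0.56 | 8.1 × 10−7 | −0.19 | 0.10 |
| cg05951221 | 2 | 233284402 |  | ALPPL2 | −0.44 | 4.7 × 10−5 | −0.56 | 9.0 × 10−7 | −0.18 | 0.12 |
| cg01940273 | 2 | 233284934 |  | ALPPL2 | −0.56 | 7.0 × 10−8 | −0.65 | 2.0 × 10−9 | −0.23 | 0.04 |
| cg13411554 | 3 | 53700276 | CACNA1D | CACNA1D | 0.27 | 1.4 × 10−2 | 0.32 | 7.4 × 10−3 | 0.20 | 0.08 |
| cg01901332 | 11 | 75031054 | ARRB1 | ARRB1 | −0.23 | 3.8 × 10−2 | −0.34 | 5.5 × 10−3 | −0.04 | 0.71 |
| cg21161138 | 5 | 399360 | AHRR | AHRR | −0.52 | 8.4 × 10−7 | −0.52 | 7.4 × 10−6 | −0.05 | 0.65 |
| cg00336149 | 3 | 53700195 | CACNA1D | CACNA1D | 0.36 | 1.0 × 10−3 | 0.42 | 3.5 × 10−4 | 0.16 | 0.16 |
| cg22132788 | 7 | 45002486 | MYO1G | MYO1G | 0.41 | 2.1 × 10−4 | 0.41 | 8.4 × 10−4 | 0.14 | 0.23 |
| cg21188533 | 3 | 53700263 | CACNA1D | CACNA1D | 0.32 | 4.1 × 10−3 | 0.38 | 1.4 × 10−3 | 0.27 | 0.01 |
| cg09935388 | 1 | 92947588 | GFI1 | GFI1 | −0.42 | 9.1 × 10−5 | −0.59 | 1.4 × 10−7 | −0.03 | 0.80 |
| cg25648203 | 5 | 395444 | AHRR | AHRR | −0.28 | 1.2 × 10−2 | −0.42 | 4.3 × 10−4 | −0.06 | 0.61 |
| cg19089201 | 7 | 45002287 | MYO1G | MYO1G | 0.15 | 1.8 × 10−1 | 0.27 | 2.6 × 10−2 | 0.23 | 0.05 |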

All 13 differentially methylated CpGs identified in current smoking discordant pairs and all 10 CpGs identified in current/former smoking discordant pairs have been previously associated with smoking. To study the overlap of methylation differences in smoking discordant twin pairs with loci that have a causal effect on smoking, we considered the largest GWAS meta-analysis of smoking phenotypes, the meta-analysis of smoking initiation by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) (Liu et al., 2019). Three of the 13 epigenome-wide significant DMPs detected in smoking discordant monozygotic pairs (cg13411554, cg00336149, and cg21188533 in CACNA1D) are located within 1 Mb of a GWAS locus associated with smoking initiation. Overall, the methylation sites within 1 Mb of genome-wide significant SNPs from the GWAS did not show a stronger signal in the within-pair EWAS of smoking discordant monozygotic pairs than other genome-wide methylation sites (β = −0.002, se = 0.004, p = 0.56, Figure 3c).
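
The 1 Mb proximity criterion used above can be sketched as a simple window lookup. The CpG coordinates below are taken from Table 2 (genome build 37), but the SNP positions are hypothetical placeholders chosen for illustration and are not the actual GSCAN smoking-initiation loci; the function name `near_gwas_locus` is likewise an assumption.

```python
WINDOW_BP = 1_000_000  # 1 Mb window, as described in the text

# CpG coordinates (chromosome, position in build 37) from Table 2.
cpgs = {
    "cg13411554": ("3", 53700276),
    "cg00336149": ("3", 53700195),
    "cg21188533": ("3", 53700263),
    "cg05575921": ("5", 373378),
}

# Hypothetical lead SNP positions for illustration only; these are NOT the
# genome-wide significant SNPs reported by GSCAN.
gwas_snps = [("3", 54100000), ("5", 60000000)]


def near_gwas_locus(chrom, pos, snps, window=WINDOW_BP):
    """Return True if any SNP on the same chromosome lies within `window` bp."""
    return any(c == chrom and abs(p - pos) <= window for c, p in snps)


flagged = {
    cpg: near_gwas_locus(chrom, pos, gwas_snps) for cpg, (chrom, pos) in cpgs.items()
}
for cpg, is_near in flagged.items():
    print(cpg, is_near)
```

With these placeholder positions, the three CACNA1D CpGs fall inside the window and the AHRR CpG does not. A genome-wide version of the same lookup would normally sort SNP positions per chromosome and use binary search rather than scanning every SNP for every CpG.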

We tested whether methylation sites previously associated with 680 traits reported in the EWAS atlas (Li et al., 2019) were enriched among the top differentially methylated loci in smoking discordant pairs; this analysis showed strong enrichment of smoking-related traits (Supplementary file 6). Enrichment analysis based on KEGG pathways showed one significantly enriched pathway, Dopaminergic Synapse (hsa04728; Supplementary file 7), with three of the top differentially methylated loci in smoking discordant monozygotic pairs mapping to this pathway: CACNA1D, GNG12, and ARRB1. No significant enrichment was seen in GO pathways after multiple testing correction (Supplementary file 8).

To examine potential functional consequences of top DMPs, we used previously published data on whole-blood DNA methylation and RNA-sequencing (n = 2101 samples). At 4 of the 13 CpGs, DNA methylation level in blood was associated with the expression level of nearby genes (Table 4). At three CpGs, a higher methylation level correlated with lower expression level. None of the 13 CpGs overlapped with six genes that were differentially expressed in monozygotic pairs discordant for current smoking (Vink et al., 2017).

Table 4
Significantly associated transcripts in cis for CpGs that are differentially methylated in smoking discordant monozygotic twin pairs.
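| CpG | Gene | Z score | p-value | FDR |
|---|---|---|---|---|
| cg25648203 | EXOC3 | −7.34 | 2.11e−13 | 0 |
| cg19089201 | RP4-647J21.1 | 5.55 | 2.84e−8 | 0 |
| cg05575921 | EXOC3 | −4.86 | 0.00000119 | 0.00039 |
| cg21161138 | EXOC3 | −3.82 | 0.000133 | 0.0254 |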

Discussion

Previous EWASs have identified robust differences in DNA methylation between smokers and non-smokers at a number of loci. These differences may reflect true smoking-reactive DNA methylation patterns, but can also be driven by (genetic) confounding or reverse causation. We exploited a strong within-family design, that is, the discordant monozygotic twin design (Bell and Spector, 2012), to identify smoking-reactive loci. By analysing whole-blood genome-wide DNA methylation patterns in 53 monozygotic pairs discordant for current smoking, we found 13 CpGs with a difference in methylation level between the current smoking twin and the twin who never smoked. All 13 CpGs have been previously associated with smoking in unrelated individuals. In line with previous studies that compared unrelated smokers and controls (Joehanes et al., 2016), our data from monozygotic pairs discordant for former smoking also indicate that methylation patterns are to a large extent reversible upon smoking cessation. We further showed that differences in smoking exposure level between monozygotic twins who are both current smokers but differ in the number of cigarettes they smoke are reflected in their DNA methylation profiles.

The strongest smoking-associated loci typically detected in human blood EWAS are genes involved in detoxification pathways of aromatic hydrocarbons, such as AHRR and CYP1A1 (Gao et al., 2015), of which AHRR was also present among the top differentially methylated loci in our analysis of discordant twin pairs. Mainstream tobacco smoke is a mixture of thousands of chemicals (Rodgman and Perfetti, 2008). Although the effects of many of the compounds present in cigarette smoke are unknown, several mechanisms have been described through which cigarette smoking may affect global or gene-specific DNA methylation levels. These include DNA damage induced by certain compounds such as arsenic, chromium, formaldehyde, polycyclic aromatic hydrocarbons, and nitrosamines, all of which cause double-stranded breaks (Smith and Hansch, 2000) that in turn lead to increased methylation near repaired DNA (Mortusewicz et al., 2005; Cuozzo et al., 2007); hypoxia induced by carbon monoxide (Olson, 1984), which causes global CpG island demethylation by disrupting methyl donor availability; and modulation of the expression level or activity of DNA-binding proteins, such as transcription factors (Lee and Pausova, 2013). Nicotine, presumed to be the major addictive compound in cigarette smoke (although other putative addictive compounds have also been described [Talhout et al., 2011]), has gene regulatory effects. Binding of nicotine to nicotinic acetylcholine receptors causes downstream activation of cAMP response element-binding protein, which is a key transcription factor for many genes (Shen and Yakel, 2009). In mouse brain, nicotine downregulates the DNA methyl transferase gene Dnmt (Satta et al., 2008). Previous EWASs based on blood cotinine levels, as a biomarker for nicotine exposure, and based on polygenic scores for nicotine metabolism, reported differentially methylated CpGs that largely overlap with CpGs found in EWAS of smoking status (Gupta et al., 2019; Lee et al., 2016).
Furthermore, E-cigarette-based nicotine exposure of mice has been shown to cause DNA methylation changes in white blood cells (Peng et al., 2022).

Importantly, effects of smoking on DNA methylation in brain cells have been hypothesized to contribute to addiction (Zillich et al., 2022), but it is largely unknown to what extent addiction-related DNA methylation dynamics are captured in other tissues such as blood. Nicotinic receptors are especially abundant in the central and peripheral nervous system, but are also present in other tissues. In peripheral blood, nicotinic receptors are present on lymphocytes and polymorphonuclear cells (Benhammou et al., 2000), suggesting that EWA studies performed on blood cells might capture nicotine-reactive methylation patterns. Interesting in this regard is our finding that the top differentially methylated CpGs in smoking discordant pairs include multiple CpGs in CACNA1D and GNG12, which encode subunits of a calcium voltage-gated channel and G protein, respectively (proteins that interact with the nicotinic acetylcholine receptor), together with the related enrichment of the KEGG pathway Dopaminergic Synapse. Methylation levels at these CpGs might be reactive to nicotine exposure. Furthermore, the CpGs in CACNA1D are in proximity of a GWAS locus for smoking initiation, suggesting that this might be a locus that is not only reactive to smoking exposure, but may also contribute to smoking behaviour. Although it remains to be established if the epigenetic and genetic variation at this locus are functionally connected (i.e. have the same downstream consequences on gene expression), these results suggest that these CpGs can be interesting candidates for further studies into peripheral biomarkers of smoking addiction. Since we applied a discordant monozygotic twin design, the methylation differences identified at this locus in our study cannot be driven by mQTL effects of the SNPs associated with smoking.
The data from monozygotic pairs discordant for former smoking indicate that methylation patterns are to a large extent reversible upon smoking cessation, which is in line with DNA methylation patterns being reactive to smoking. Nevertheless, our findings do not rule out the possibility that reverse causation (DNA methylation driving smoking behaviour) might also contribute to the (maintenance of) smoking discordance in smoking discordant monozygotic twin pairs. Future analyses combining DNA methylation and genetic data from monozygotic and dizygotic twins may be applied to examine bidirectional causal associations between DNA methylation and smoking (Minică et al., 2018).

The main strength of our study is the use of the discordant monozygotic twin design to examine the effects of smoking, because it rules out genetic confounding, as well as many other confounding factors. The value of studying smoking effects against an identical genetic background is clear if one considers that one of the most strongly associated genetic variants for nicotine dependence is located in the DNA methyltransferase gene DNMT3B (Hancock et al., 2018). This strongly implies a role for DNA methylation in nicotine addiction, but it also suggests that horizontal genetic pleiotropy might contribute to associations between DNA methylation and smoking in ordinary case–control EWASs, where differences in DNA methylation between unrelated smokers and non-smokers may reflect differences in genotype. Our analysis had adequate power to detect large effects (i.e., the top hits identified in typical smoking EWAS) (Tsai and Bell, 2015). These reflect only a small proportion, however, of all smoking-associated sites. In our analysis of 53 monozygotic twin pairs discordant for current versus never smoking, we detected 13 CpGs at genome-wide significance, which represent 0.5% of the total number of CpGs (2623) detected in the smoking meta-analysis of unrelated individuals (2433 current versus 6956 never smokers) (Joehanes et al., 2016). The within-pair difference in smoking discordant monozygotic pairs was smaller than the effect size reported previously based on the comparison of unrelated smokers and non-smokers. Larger sample sizes are required to achieve adequate power to detect smaller effects. While the pattern of within-pair differences in current/never, current/former and former/never discordant monozygotic twin pairs was clearly in line with reversal of methylation patterns following smoking cessation, we did not find a strong correlation between within-pair differences in DNA methylation and time since quitting smoking in former smoking discordant pairs.
If most reversal takes place gradually in the first few years after smoking cessation, it might require larger sample sizes of twin pairs discordant for recently quitting smoking to detect such a correlation. Larger sample sizes may be achieved by combining data from multiple twin cohorts in a meta-analysis.

Common limitations that apply to many EWA studies including ours are that we only analysed DNA methylation data from blood and that the technique used to measure DNA methylation only covers a small subset of all CpG sites in the genome. Another limitation is that information on smoking was obtained through self-report. We previously described smoking misclassification in this cohort based on blood levels of cotinine (van Dongen et al., 2018), a biomarker for nicotine exposure, that has been measured in a subset of the cohort (Bot et al., 2013), which indicated a low misclassification rate. Plasma cotinine levels were available for 591 individuals classified as never smokers by self-report. Five of these individuals (0.8%) had cotinine levels ≥15 ng/ml, which is indicative of smoking, and thus indicates a misclassification of smoking status. In the current paper, we further showed that the correlation between cotinine levels in concordant current smoking pairs was similar to the correlation between self-reported number of cigarettes per day.
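
The cotinine-based misclassification check described above amounts to counting self-reported never smokers whose plasma cotinine is at or above the 15 ng/ml threshold. The sketch below illustrates this with invented cotinine values (not study data) and then reproduces the reported proportion of 5 in 591; the function name `misclassification_rate` is an assumption made for illustration.

```python
COTININE_CUTOFF_NG_ML = 15.0  # threshold quoted in the text


def misclassification_rate(cotinine_levels, cutoff=COTININE_CUTOFF_NG_ML):
    """Count and fraction of self-reported never smokers at or above the cutoff."""
    flagged = sum(1 for level in cotinine_levels if level >= cutoff)
    return flagged, flagged / len(cotinine_levels)


# Hypothetical cotinine values (ng/ml) for self-reported never smokers;
# invented for illustration, NOT study data.
example_levels = [0.4, 1.2, 0.0, 210.5, 3.3, 0.9, 14.9, 15.0]
n_flagged, rate = misclassification_rate(example_levels)
print(n_flagged, f"{100 * rate:.1f}%")

# Proportion reported in the text: 5 of 591 self-reported never smokers.
reported_rate = 5 / 591
print(f"{100 * reported_rate:.1f}%")
```

The second calculation confirms that 5 of 591 corresponds to roughly 0.8%, as stated in the text.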

Conclusion

In conclusion, we studied reactiveness of DNA methylation in blood cells to smoking and reversibility of methylation patterns upon quitting smoking in monozygotic twins. Analyses in special groups such as monozygotic twins are valuable to validate results from large population-based EWAS meta-analyses, or to train more accurate methylation scores for environmental exposures that are not confounded by genetic effects. Our results illustrate the potential to utilize DNA methylation profiles of monozygotic twins as a readout of discordant exposures at present and in the past.

Data availability

The HumanMethylation450 BeadChip data from the NTR are available as part of the Biobank-based Integrative Omics Studies (BIOS) Consortium in the European Genome-phenome Archive (EGA), under the accession code EGAD00010000887, https://ega-archive.org/datasets/EGAD00010000887 (Study ID EGAS00001001077, Title: The mission of the BIOS Consortium is to create a large-scale data infrastructure and to bring together BBMRI researchers focusing on integrative omics studies in Dutch Biobanks, contact: The BIOS Consortium: Biobank-based Integrative Omics Studies, Contact person: Rick Jansen). The OMICs data and additional phenotype data are available upon request via the BBMRI-NL BIOS consortium (https://www.bbmri.nl/acquisition-use-analyze/bios). All NTR data can be requested by bona fide researchers (https://ntr-data-request.psy.vu.nl/). Because of the consent given by study participants, the data cannot be made publicly available. The pipeline for DNA methylation-array analysis developed by the Biobank-based Integrative Omics Study (BIOS) consortium is available here: https://molepi.github.io/DNAmArray_workflow/ (copy archived at Sinke, 2020, https://doi.org/10.5281/zenodo.3355292). The code for the EWAS analysis in monozygotic twin pairs is included in Source code 1.

References

Liu M, Jiang Y, Wedow R, Li Y, Brazel DM, Chen F, Datta G, Davila-Velderrain J, McGuire D, Tian C, Zhan X, Choquet H, Docherty AR, Faul JD, Foerster JR, Fritsche LG, Gabrielsen ME, Gordon SD, Haessler J, Hottenga JJ, Huang H, Jang SK, Jansen PR, Ling Y, Mägi R, Matoba N, McMahon G, Mulas A, Orrù V, Palviainen T, Pandit A, Reginsson GW, Skogholt AH, Smith JA, Taylor AE, Turman C, Willemsen G, Young H, Young KA, Zajac GJM, Zhao W, Zhou W, Bjornsdottir G, Boardman JD, Boehnke M, Boomsma DI, Chen C, Cucca F, Davies GE, Eaton CB, Ehringer MA, Esko T, Fiorillo E, Gillespie NA, Gudbjartsson DF, Haller T, Harris KM, Heath AC, Hewitt JK, Hickie IB, Hokanson JE, Hopfer CJ, Hunter DJ, Iacono WG, Johnson EO, Kamatani Y, Kardia SLR, Keller MC, Kellis M, Kooperberg C, Kraft P, Krauter KS, Laakso M, Lind PA, Loukola A, Lutz SM, Madden PAF, Martin NG, McGue M, McQueen MB, Medland SE, Metspalu A, Mohlke KL, Nielsen JB, Okada Y, Peters U, Polderman TJC, Posthuma D, Reiner AP, Rice JP, Rimm E, Rose RJ, Runarsdottir V, Stallings MC, Stančáková A, Stefansson H, Thai KK, Tindle HA, Tyrfingsson T, Wall TL, Weir DR, Weisner C, Whitfield JB, Winsvold BS, Yin J, Zuccolo L, Bierut LJ, Hveem K, Lee JJ, Munafò MR, Saccone NL, Willer CJ, Cornelis MC, David SP, Hinds DA, Jorgenson E, Kaprio J, Stitzel JA, Stefansson K, Thorgeirsson TE, Abecasis G, Liu DJ, Vrieze S, 23andMe Research Team, HUNT All-In Psychiatry (2019) Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nature Genetics 51:237–244. https://doi.org/10.1038/s41588-018-0307-5

Peng G, Xi Y, Bellini C, Pham K, Zhuang ZW, Yan Q, Jia M, Wang G, Lu L, Tang MS, Zhao H, Wang H (2022) Nicotine dose-dependent epigenomic-wide DNA methylation changes in the mice with long-term electronic cigarette exposure. American Journal of Cancer Research 12:3679–3692.

R Development Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Decision letter

  1. Melinda Aldrich
    Reviewing Editor; Vanderbilt University Medical Center, United States
  2. W Kimryn Rathmell
    Senior Editor; Vanderbilt University Medical Center, United States
  3. Melinda Aldrich
    Reviewer; Vanderbilt University Medical Center, United States
  4. Jeff Craig
    Reviewer; Deakin University, Australia
  5. Jaakko Kaprio
    Reviewer; University of Helsinki/FIMM, Finland

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Effects of smoking on genome-wide DNA methylation profiles: A study of discordant and concordant monozygotic twin pairs" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Melinda Aldrich as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by W Kimryn Rathmell as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jeff Craig (Reviewer #2); Jaakko Kaprio (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

Request from Reviewing Editor:

One of the references cited by the authors (Allione et al., PLoS One 2015) cites two papers (Andreoli et al. and Marcon et al.) that utilize a study population collected and phenotyped with support from the tobacco industry, specifically by British American Tobacco. We request the Allione reference be annotated to indicate the study utilized data that was collected with support from the tobacco industry.

Reviewer #1 (Recommendations for the authors):

1. There are several typos throughout that need to be corrected (extra words inserted, for example: line 208 says "To the study the overlap…" this should instead read "To study the overlap…").

2. The EWAS analysis says that the linear regression analysis corrected for array row. It is unclear to this reviewer why one would correct for array row. Can this be clarified?

3. It is unclear whether the 5 significant CpGs that were identified using the discordant current-former twin pair design have been previously identified or if these are novel findings.

4. What type of correlation test was performed? Please add this detail.

5. Please write out the abbreviation for DMP the first time used in the manuscript.

6. Clarification regarding the time period that was asked about for smoking behaviors would be helpful. For example, the authors indicate current smokers were asked how many cigarettes per day they smoked, but no time period is provided. The same applies to former smokers. Also, this reviewer assumes the authors are referring to the average number of cigarettes smoked per day. If the authors could clarify these points, that would be useful for the reader.

Reviewer #2 (Recommendations for the authors):

1. Introduction, lines 111-112 "They [identical twins] have been exposed to similar prenatal conditions": please expand to differentiating between shared and nonshared intrauterine exposures.

2. Results, lines 228-9 "At four of the 13 CpGs, DNA methylation level in blood was associated with the expression level of nearby genes." Can you test whether this number would have been achieved by chance?

3. Results: could you compare average effect sizes with studies of singletons to test whether genetic identity attenuated any effects?

4. Results line 161 "Genome-wide test statistics were not inflated": could you please explain how you came to this conclusion using Figure 2C, especially for those without experience of using this metric. Please also note that Figure 2C is too small to read even when viewed on an A4 page.

Reviewer #3 (Recommendations for the authors):

1. In the introduction, a recent review could also be cited (Heikkinen A, Bollepalli S, Ollikainen M. The potential of DNA methylation as a biomarker for obesity and smoking. J Intern Med. 2022 Sep;292(3):390-408. PMID: 35404524).

2. Line 111, It should be noted that in MZ twins matching on early environment is not perfect, especially the prenatal environment with placentation and chorionicity effects (Martin et al., 1997, PMID: 9398838).

3. Line 163-164 How consistent was the difference within pairs? On average the smoking twin had lower methylation, but was this the case in all pairs?

4. Line 170 The average time since cessation was 9 years, but what was the variation (SD) and range? Did the authors define a minimum duration of smoking cessation for the smoker to be considered as a true former smoker? Persons who report very recent quitting are often found to continue some level of smoking (based on biomarkers).

5. Line 190-191 The intrapair correlations of amount (CPD) and total exposure (packyears) within twin pairs in which both smoke are consistent with prior literature. Could the authors speculate on why the correlations are not higher? Is there a role for measurement error, or for differing brands of cigarettes being smoked but yielding equal amounts of nicotine (which is not measured)? Would biomarkers have provided higher correlations?

6. Line 208 is a strange sentence that does not make sense.

In addition to the discordant pair analyses, I suggest they run bivariate twin analyses to evaluate shared genetics (a) of smoking with the methylation values of top CpGs, and (b) of smoking with an epigenome-wide predicted smoking score (such as EpiSmokEr, PMIDs 35716602, 31466478). Such analyses would help to address to what extent within-pair analyses capture differences seen between individuals.

7. Line 237 I would like to see some more discussion and evidence for or against their second potential explanation of Reverse causation. Given the complex nature of smoking behavior, is it at all likely that methylation drives smoking behavior?

8. Line 295-297 Can the authors quantify what fraction of individual based EWAS hits are identified here and how much of the difference between smokers and never smokers is accounted for by the observed differences in twin pairs discordant for smoking.

9. Topics for discussion that could have been included.

a) How substantial are the effect of misclassification of smoking status and of amount smoked (see related comment on assessment of smoking behavior).

b) Metabolism of nicotine to cotinine and related metabolites. Cotinine is a well-known biomarker of recent tobacco use/nicotine exposure. Methylation associations with cotinine levels have been published (Gupta R et al., 2019. PMID: 30611298; Lee MK et al., 2016 PMID: 27688819), which specifically address the relationship between nicotine and methylation (rather than all exposures in tobacco). Recent model organism work could also be cited (Peng et al., 2022, PMID 36119846).

10. Methods: Line 338 – Are there any ethnicity effects? Please provide more detail on the pairs (in Table 1 or in text) on their socio-economic status, marital status and spousal tobacco use, and other behavioral traits that affect methylation (such as obesity, alcohol use, traumatic events). Did the birth weights and birth order of the smoking and non-smoking twins in a discordant pair differ?

11. Line 339 From how many families are the 3055 individuals? A comparison with DZ twins discordant for smoking would be a valuable addition, to tease apart effects of controlling for genetics and shared environment (in MZ pairs) versus some genetics and shared environment (in DZ pairs). The genetic risk for smoking could be controlled for in DZ pairs using polygenic risk scores for smoking behavior from the GSCAN consortium and/or UK Biobank.

12. Line 377 Smoking

Was the interview for smoking behavior more detailed than described here? How was a regular smoker defined? For example, the given question "Did you ever smoke?" (line 380) implies two answers: yes and no, and a yes answer is an ever smoker. It does not distinguish between current and former smokers. Please provide the actual questions used in a supplement, or a link to an appropriate webpage with the items (in Dutch and English).

13. Given that the participants have answered multiple surveys, can you document the consistency of the responses over time? For example, how many of those who now reported being never smokers had reported smoking in an earlier survey?

How did you handle non-daily smokers? Are they considered non-smokers, smokers or excluded?

14. Did you ask any pairs discordant for smoking why one had initiated smoking and the other did not, and likewise why one quit and the other did not?

15. Are there any validation studies of smoking status, using biomarkers such as cotinine or carbon monoxide in the NTR?

16. Did the smoking assessment cover cigarette use only, or all smoked tobacco products such as cigars and pipes? What about smokeless tobacco/snus, e-cigs (rare at that time I believe) or nicotine replacement therapy as a source of nicotine?

17. Line 406 uses the wording " smoking discordant monozygotic twins". I would use twin pairs discordant for smoking OR smoking discordant pairs consistently as the pairs are discordant, not the individual twins.

18. Line 432 The Bonferroni correction is overly conservative here as the CpG sites are correlated, so that should be taken into account.

19. References: There is missing or erroneous information in the cited literature; for example, line 636 has no author names, lines 555, 562, 608, 633, 643, 576, 586, 605, etc. have no issue and/or page info, and line 621 has no publisher. Please check all references.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Effects of smoking on genome-wide DNA methylation profiles: A study of discordant and concordant monozygotic twin pairs" for further consideration by eLife. Your revised article has been evaluated by W Kimryn Rathmell (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

– In their rebuttal, the authors mention cigs per day was measured at the present time for current smokers and for former smokers no time period was requested. This leaves it vague as to the time period the former smoker respondents provided, e.g. was it on average over the time smoked or during the last year they smoked? A brief acknowledgment should be included that the time period was not captured when assessing cigs per day for former smokers. Thus, it was left to the respondent to determine the time period, which could lead to variation in reporting by respondents.

– The authors corrected smoking status using longitudinal survey data but did not indicate that they made these corrections in the revised manuscript, only in the rebuttal. This detail should be included for transparency. Related to this, the authors provide in their response and revised manuscript the cotinine levels for a large subset of the never smoking participants. For most, the cotinine levels were consistent with amounts expected for never smokers, but there were 5 (0.8%) persons that had cotinine levels indicative of a current smoker. Can the authors confirm corrections to smoking status were not made with cotinine? Or if they were corrected, this should be mentioned in the manuscript.

– Line 574 has a typo: ‘low classification rate’ should read ‘low misclassification rate’.

– In the initial review, one of the reviewers asked for information about the statistical test for inflation of the GWAS results. The authors indicate in their response that a sentence was added to the manuscript about this inflation factor, but it appears this sentence may have been mistakenly omitted from the manuscript.

– In the prior review, it was requested the authors remove the phrase "smoking discordant monozygotic twins" and instead refer to pairs. They made the requested revision, but then added it back into one of their revised sentences (line 369). This should be adjusted to address the reviewer's comment.

https://doi.org/10.7554/eLife.83286.sa1

Author response

Essential revisions:

Request from Reviewing Editor:

One of the references cited by the authors (Allione et al. Plos One 2015) cites two papers (Andreoli et al. and Marcon et al) that utilize a study population collected and phenotyped with support from the tobacco industry, specifically by British American Tobacco. We request the Allione reference be annotated to indicate the study utilized data that was collected with support from the tobacco industry.

We thank the editor for this comment and have added this information to the reference list.

Reviewer #1 (Recommendations for the authors):

1. There are several typos throughout that need to be corrected (extra words inserted, for example: line 208 says "To the study the overlap…" this should instead read "To study the overlap…").

We thank the reviewer for the careful reading of our manuscript. We thoroughly checked the manuscript and corrected typos.

2. The EWAS analysis says that the linear regression analysis corrected for array row. It is unclear to this reviewer why one would correct for array row. Can this be clarified?

Each Illumina 450k array contains 6 rows and 2 columns to fit 12 samples in total. Array row is a technical confounder with a linear effect on DNA methylation signals. This is connected to the technical procedure of fluorescent staining, in which the arrays are placed in vertical orientation in the machine and fluorescent dye is injected at the top, resulting in a signal gradient across samples from the top to the bottom row of the array. Most of this effect (at the global level) is removed with normalization; however, probe-specific effects typically persist. Probe-specific effects of array row can be corrected for efficiently by including array row number as a covariate. It is common practice in Illumina DNA methylation array analysis to correct for array row (see for example the following reference on commonly used analysis strategies for epigenome-wide association studies https://doi.org/10.1186/s13059-019-1878-x; we have now cited this paper in our methods section).
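As a rough illustration of why including array row as a covariate matters, the sketch below fits a smoking effect on methylation with and without adjustment for row. It is a minimal example on made-up, noise-free data, not the EWAS model used in the manuscript: the variable names, the sample values, and the use of residualization (the Frisch-Waugh-Lovell approach, which gives the same coefficient as a two-predictor linear regression) are all assumptions chosen for illustration.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares fit y ~ x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residualize(values, covariate):
    """Remove the linear effect of the covariate from the values."""
    intercept, slope = simple_ols(covariate, values)
    return [v - (intercept + slope * c) for v, c in zip(values, covariate)]


def adjusted_effect(exposure, outcome, covariate):
    """Slope of outcome on exposure after adjusting both for the covariate."""
    rx = residualize(exposure, covariate)
    ry = residualize(outcome, covariate)
    return simple_ols(rx, ry)[1]


# Hypothetical data: 12 samples on one array (rows 1-6, two columns).
array_row = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
smoking = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1]  # smokers sit in lower rows here
# Assumed true model: smoking lowers methylation by 0.05, each row by 0.01.
methylation = [0.80 - 0.05 * s - 0.01 * r for s, r in zip(smoking, array_row)]

unadjusted = simple_ols(smoking, methylation)[1]
adjusted = adjusted_effect(smoking, methylation, array_row)
print(f"unadjusted: {unadjusted:.4f}, adjusted for array row: {adjusted:.4f}")
```

In this toy setup the smokers happen to be placed further down the array, so the unadjusted estimate overstates the smoking effect, while the row-adjusted estimate recovers the assumed value.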

3. It is unclear whether the 5 significant CpGs that were identified using the discordant current-former twin pair design have been previously identified or if these are novel findings.

Thank you for pointing this out. We have now clarified that all CpGs were previously identified:

“All 13 differentially methylated CpGs identified in current smoking-discordant pairs and all 10 CpGs identified in former-smoking discordant pairs have been previously associated with smoking.”

4. What type of correlation test was performed? Please add this detail.

We’ve now clarified in the methods section that Pearson correlations were used.
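For readers less familiar with the statistic, a Pearson correlation between co-twins can be computed as in the sketch below. The cigarettes-per-day values are invented for illustration and are not data from the study; the function name is likewise hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical cigarettes per day for six concordant smoking pairs.
twin1_cpd = [5, 10, 15, 20, 25, 8]
twin2_cpd = [7, 8, 20, 15, 30, 10]
print(round(pearson_r(twin1_cpd, twin2_cpd), 2))
```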

5. Please write out the abbreviation for DMP the first time used in the manuscript.

We did this upon first occurrence (results, page 7, line 162).

6. Clarification regarding the time period that was asked about for smoking behaviors would be helpful. For example, the authors indicate current smokers were asked how many cigarettes per day they smoked, but no time period is provided. Same for former smokers. Also, this reviewer assumes the authors are referring to the average number of cigarettes were smoked per day. If the authors could clarify these points that would be useful for the reader.

We apologize that this was unclear. During the home visit, blood samples were collected and participants were interviewed. Thus, at blood draw, current smokers were asked how many cigarettes they smoked per day at present (at the moment of blood draw), with the simple question: “how many cigarettes do you smoke per day?” and former smokers were asked how many cigarettes per day they used to smoke (“how many cigarettes did you use to smoke per day?”). We’ve edited the following sentences to clarify this.

Page 15, line 383: “Current smokers were asked how many years they smoked and how many cigarettes per day they smoked at present, while ex-smokers were asked how many years ago they quit, for how many years they smoked and how many cigarettes per day they smoked.”

Reviewer #2 (Recommendations for the authors):

1. Introduction, lines 111-112 "They [identical twins] have been exposed to similar prenatal conditions": please expand to differentiating between shared and nonshared intrauterine exposures.

Thank you for this suggestion. We have modified this section as follows:

“On the other hand, monozygotic twins are genetically identical (except for de novo mutations, but these are rare), share a womb, and are matched on sex, age and childhood environment. They have been exposed to similar prenatal conditions, which may include second hand smoke from smoking mothers and others. Differences in prenatal environment of monozygotic twins due to for instance unequal vascular supply are also recognized, although it remains to be investigated to what extent the impact of prenatal smoke exposure might differ between monozygotic twins.”

2. Results, lines 228-9 "At four of the 13 CpGs, DNA methylation level in blood was associated with the expression level of nearby genes." Can you test whether this number would have been achieved by chance?

The associations between DNA methylation and gene expression are unlikely due to chance, because multiple testing correction was applied. Associations between DNA methylation and gene expression were previously tested in an independent whole blood RNA-sequencing dataset from the Biobank-based Integrative Omics Study (BIOS) consortium that did not include NTR, and which tested associations between all genome-wide CpGs and transcripts in cis (<250 kb). In this previous study, we looked up for which of our top CpGs the methylation levels were significantly associated with gene expression levels at the experiment-wide threshold applied by this study (FDR<5.0%, across all genome-wide CpGs and transcripts in cis). Hence these associations are unlikely due to chance.

3. Results: could you compare average effect sizes with studies of singletons to test whether genetic identity attenuated any effects?

We have added a comparison of the effect sizes observed in the discordant monozygotic twin pairs to the effect sizes observed previously in a large EWAS meta-analysis of unrelated smokers and non-smokers to the Results section.

Page 7, line 164: “At 11 of the 13 CpGs, the methylation difference in smoking discordant monozygotic twin pairs was smaller (on average 19.0%, range=2.2-37.5%) compared to the methylation difference reported previously in an EWAS meta-analysis of smoking. At two CpGs, the methylation difference in smoking discordant monozygotic twins was larger (on average 24.6%).”

4. Results line 161 "Genome-wide test statistics were not inflated": could you please explain how you came to this conclusion using Figure 2C, especially for those without experience of using this metric. Please also note that Figure 2C is too small to read even when viewed on an A4 page.

Please note that this cannot be directly assessed based on figure 2C, because in this figure, 2 sets of p-values are plotted (this figure is meant to illustrate the p-value distribution of CpGs inside and outside of smoking-associated genetic regions identified in GWAS). The sentence quoted refers to Additional file 1, which gives the λ (inflation factor) for each analysis (all were close to 1, indicating no inflation of test statistics). We realized that we had omitted to describe how we assessed inflation, and have added the following sentence to the methods section. We also improved the readability of the figures.

“For each EWAS analysis, the R package Bacon was used to compute the Bayesian inflation factor.”
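To give readers unfamiliar with inflation factors a concrete picture, the sketch below computes the conventional genomic inflation factor λ, defined as the median of the squared test statistics divided by the median of a chi-square distribution with 1 degree of freedom. This is a simpler stand-in and not the Bayesian inflation factor estimated by the Bacon package; the z-scores are simulated and the function name is an assumption.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Conventional lambda: median(z^2) / median of chi-square(1 df)."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


# Simulated null test statistics: evenly spaced standard-normal quantiles.
n = 10001
null_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
# Simulated inflated statistics: every z-score scaled up by 10%.
inflated_z = [1.1 * z for z in null_z]

lambda_null = genomic_inflation(null_z)
lambda_inflated = genomic_inflation(inflated_z)
print(f"null: {lambda_null:.3f}, inflated: {lambda_inflated:.3f}")
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution, whereas values clearly above 1 suggest inflation.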

Reviewer #3 (Recommendations for the authors):

1. In the introduction, a recent review could also be cited (Heikkinen A, Bollepalli S, Ollikainen M. The potential of DNA methylation as a biomarker for obesity and smoking. J Intern Med. 2022 Sep;292(3):390-408. PMID: 35404524).

Thank you for sharing the reference, we’ve added it to the introduction.

2. Line 111, It should be noted that in MZ twins matching on early environment is not perfect, especially the prenatal environment with placentation and chorionicity effects (Martin et al., 1997, PMID: 9398838).

Thank you for this suggestion. We have modified this section as follows:

“On the other hand, monozygotic twins are genetically identical (except for de novo mutations, but these are rare), share a womb, and are matched on sex, age and childhood environment. They have been exposed to similar prenatal conditions, which may include second hand smoke from smoking mothers and others. Differences in prenatal environment of monozygotic twins due to for instance unequal vascular supply are also recognized, although it remains to be investigated to what extent the impact of prenatal smoke exposure might differ between monozygotic twins.”

3. Line 163-164 How consistent was the difference within pairs? On average the smoking twin had lower methylation, but was this the case in all pairs?

The differences were highly consistent. We’ve added figures of the raw methylation β-values of the discordant twin pairs for the 13 top CpGs (Figure 2 —figure supplement 1), and added the following sentence to the Results section:

“Pair-level methylation β-values are shown in Figure 2 —figure supplement 1 and illustrate large consistency in the direction of effect. For example, at top CpG site cg05575921, for 51 out of the 53 discordant pairs, the smoking twin had a lower methylation level than the non-smoking twin.”
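The direction-of-effect count quoted above can be tallied from pair-level β-values as in the sketch below. The helper names and the example β-values are hypothetical. The exact sign test is included only to show how unusual 51 of 53 would be if both directions were equally likely; it is an illustration and not an analysis reported in the manuscript.

```python
from math import comb


def direction_consistency(smoker_beta, cotwin_beta):
    """Count pairs in which the smoking twin has the lower methylation value."""
    lower = sum(1 for s, c in zip(smoker_beta, cotwin_beta) if s < c)
    return lower, len(smoker_beta)


def sign_test_p(k, n):
    """Two-sided exact binomial p-value, assuming both directions are equally likely."""
    extreme = max(k, n - k)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical β-values for three discordant pairs.
smoker = [0.62, 0.70, 0.81]
cotwin = [0.85, 0.88, 0.80]
print(direction_consistency(smoker, cotwin))

# Counts taken from the quoted sentence: 51 of 53 pairs in the same direction.
p_value = sign_test_p(51, 53)
print(p_value < 1e-6)
```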

4. Line 170 The average time since cessation was 9 years, but what was the variation (SD) and range? Did the authors define a minimum duration of smoking cessation for the smoker to be considered as a true former smoker? Persons who report very recent quitting are often found to continue some level of smoking (based on biomarkers).

Thank you for this suggestion. We’ve added SD and range of quitting smoking. We did not apply a threshold for minimum duration of reported quitting smoking. We acknowledge that a small percentage of smoking statuses may be misclassified. Plasma cotinine level (which we have now added to table 1), indeed suggest active cigarette smoking for a small number of individuals whose self-reported status is former-smoker, while cotinine levels are essentially consistent with smoking status in the other groups (current and never smokers). We have now added the topic of misclassification to the discussion.

“In twin pairs discordant for former smoking (N=72 pairs, mean age=41 years), the twins who used to smoke had quit smoking on average 14 years ago (SD=11.4, range=0.04-50 years), while the other twins had never initiated regular smoking”

“By contrast, in twin pairs of which one twin was a current smoker at blood draw and the co-twin had quit smoking (on average 9 years ago, SD=10.2, range=0.02-40 years, N=66 pairs),”

“Another limitation is that information on smoking was obtained through self-report. We previously described smoking misclassification in this cohort based on blood levels of cotinine, a biomarker for nicotine exposure, that has been measured in a subset of the cohort, which indicated a low classification rate. Plasma cotinine levels were available for 591 individuals classified as never smokers by self-report. Five of these individuals (0.8%) had cotinine levels ≥ 15 ng/mL, which is indicative of smoking, and thus indicates a misclassification of smoking status. In the current paper, we further showed that the correlation between cotinine levels in concordant current smoking pairs was similar to the correlation between self-reported number of cigarettes per day.”

5. Line 190 -191 The intrapair correlations of amount (CPD) and total exposure (packyears) within twin pairs in which both smoke are consistent with prior literature. Could the author speculate on why the correlations are not higher – is there a role for measurement error, or differing brands of cigarettes being smoked but yielding equal amounts of nicotine (which is not measured). Would biomarkers have provided higher correlations?

This is an interesting question. We’ve now also looked at the correlation between plasma levels of cotinine in the smoking-concordant twin pairs, and the correlation (r=0.58) is only slightly higher compared to the correlation for self-reported cigarettes per day (r=0.50), and the correlation for packyears (r=0.43), i.e. the biomarker level is quite consistent with the similarity in reported amounts of cigarettes smoked per day. One could argue that the exact amount (or brand) smoked is not only subject to genetic influences but is an individual habit that is also influenced by the unique environment, individual life trajectories of the twins and potentially by differences in nicotine metabolic rate. An alternative explanation could be that the twins are in fact typically smoking similar amounts, but that self-report and measured cotinine levels have similar amounts of measurement error.

“The twin correlations in current smoking monozygotic twin pairs were r=0.50, p=2.2x10-6 for cigarettes per day (Figure 3b), r=0.43, p=3.2 x10-4 for packyears, and r=0.58, p=1.6x10-8, for plasma cotinine levels, respectively.”

6. Line 208 is a strange sentence that does not make sense.

We have adjusted the sentence.

“To study overlap of EWAS signal with genetic findings for smoking, we compared our EWAS results against GWAS results from the largest GWAS meta-analysis of smoking phenotypes. This is the meta-analysis of smoking initiation by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN)”

In addition to the discordant pair analyses, I suggest they run bivariate twin analyses to evaluate shared genetics (a) of smoking with the methylation values of top CpGs, and (b) of smoking with an epigenome-wide predicted smoking score (such as EpiSmokEr, PMIDs 35716602, 31466478). Such analyses would help to address to what extent within-pair analyses capture differences seen between individuals.

We agree that this is a great suggestion for follow-up, and one that we are in fact currently working on (van Dongen et al. 2021 BGA Conference Abstract). This work will be included in a next manuscript. In the current manuscript, we chose to focus on the discordant/concordant monozygotic twin analysis. At present, there is only one previous study on DNA methylation in smoking discordant monozygotic twins.

van Dongen, Jenny, et al. "Examining Causality of the Association Between Smoking and DNA Methylation." Behavior Genetics 51.6 (2021): 749-750.

7. Line 237 I would like to see some more discussion and evidence for or against their second potential explanation of Reverse causation. Given the complex nature of smoking behavior, is it at all likely that methylation drives smoking behavior?

Thank you for this suggestion, but there is not much more we can say about it based on the current results. As already mentioned in the discussion, it is not unlikely that DNA methylation in relevant brain regions contributes to smoking behavior, however there is currently no evidence that DNA methylation in blood cells may reflect causal effects on smoking behaviour.

Discussion line 271 “Importantly, effects of smoking on DNA methylation in brain cells have been hypothesized to contribute to addiction, but it is largely unknown to what extent addiction-related DNA methylation dynamics are captured in other tissues such as blood. Nicotinic receptors are especially abundant in the central and peripheral nervous system, but are also present in other tissues. In peripheral blood, nicotinic receptors are present on lymphocytes and polymorphonuclear cells, suggesting that EWA studies performed on blood cells might capture nicotine-reactive methylation patterns.”

We have also added the following sentences to the discussion:

“The data from monozygotic pairs discordant for former smoking indicate that methylation patterns are to a large extent reversible upon smoking cessation, which is in line with DNA methylation patterns being reactive to smoking. Nevertheless, our findings do not rule out the possibility that reverse causation (DNA methylation driving smoking behaviour) might contribute to the (maintenance of) smoking discordance in smoking discordant monozygotic twin pairs. Future analyses combining DNA methylation and genetic data from monozygotic and dizygotic twins may be applied to examine bi-directional causal associations between DNA methylation and smoking (Minică et al., 2018).”

8. Line 295-297 Can the authors quantify what fraction of individual based EWAS hits are identified here and how much of the difference between smokers and never smokers is accounted for by the observed differences in twin pairs discordant for smoking.

Thank you for this suggestion. We’ve added the following sentences to the discussion:

“These reflect only a small proportion, however, of all smoking-associated sites. In our analysis of 53 monozygotic twin pairs discordant for current versus never smoking, we detected 13 CpGs at genome-wide significance, which represent 0.5% of the total number of CpGs (2623) detected in the smoking meta-analysis of unrelated individuals (2433 current versus 6956 never smokers). The within-pair difference in smoking discordant monozygotic pairs was smaller compared to the effect size reported previously based on the comparison of unrelated smokers and non-smokers.”

We’ve also added the following to the results (see reply to reviewer 2, comment 3).

Page 7, line 164: “At 11 of the 13 CpGs, the methylation difference in smoking discordant monozygotic twin pairs was smaller (on average 19.0%, range=2.2-37.5%) compared to the methylation difference reported previously in an EWAS meta-analysis of smoking. At two CpGs, the methylation difference in smoking discordant monozygotic twins was larger (on average 24.6%).”

9. Topics for discussion that could have been included.

a) How substantial are the effect of misclassification of smoking status and of amount smoked (see related comment on assessment of smoking behavior).

Thank you for this suggestion. We’ve added the following to the discussion.

“Another limitation is that information on smoking was obtained through self-report. We previously described smoking misclassification in this cohort based on blood levels of cotinine, a biomarker for nicotine exposure, that has been measured in a subset of the cohort, which indicated a low classification rate. Plasma cotinine levels were available for 591 individuals classified as never smokers by self-report. Five of these individuals (0.8%) had cotinine levels ≥ 15 ng/mL, which is indicative of smoking, and thus indicates a misclassification of smoking status. In the current paper, we further showed that the correlation between cotinine levels in concordant current smoking pairs was similar to the correlation between self-reported number of cigarettes per day.”

b) Metabolism of nicotine to cotinine and related metabolites. Cotinine is a well-known biomarker of recent tobacco use/nicotine exposure. Methylation associations with cotinine levels have been published (Gupta R et al., 2019. PMID: 30611298; Lee MK et al., 2016 PMID: 27688819), which specifically address the relationship between nicotine and methylation (rather than all exposures in tobacco). Recent model organism work could also be cited (Peng et al., 2022, PMID 36119846).

This is a great suggestion, thank you for pointing out these references. We’ve added the following sentences to the discussion.

“Previous EWAS studies based on blood cotinine levels, as a biomarker for nicotine exposure, and based on polygenic scores for nicotine metabolism, reported differentially methylated CpGs that largely overlap with CpGs found in EWAS of smoking status. Furthermore, e-cigarette-based nicotine exposure of mice has been shown to cause DNA methylation changes in white blood cells.”

10. Methods: Line 338 – Are there any ethnicity effects? Please provide more detail on the pairs (in Table 1 or in text) on their socio-economic status, marital status and spousal tobacco use, and other behavioral traits that affect methylation (such as obesity, alcohol use, traumatic events). Did the birth weights and birth order of the smoking and non-smoking twins in a discordant pair differ?

Thank you for the suggestion. We’ve added the most widely available information: BMI and educational attainment as a measure of SES (note that alcohol use was not measured at the time of blood sampling, and that spousal information on smoking is also not always available). This showed a significant difference in BMI in pairs discordant for current/former smoking (higher BMI in the former smoking twin), and no significant differences in educational attainment that would survive multiple testing correction. Note that these twins took part in the Adult Netherlands Twin Register, and have not been followed since birth. Therefore, we have limited information related to birth for this group. In addition, we’ve added information on ancestry of the cohort in the methods section. A strength of the within-MZ pair design is that it is robust to effects of genetic variation and ancestry.

“The twin pairs were primarily of Dutch-European ancestry. For 753 of the 768 MZ pairs who are included in the current study, ancestry could be derived from principal components (PCs) calculated from genome-wide SNP array data that were available for the twins (750 pairs) or for both of their parents (3 pairs). According to the genotype data PCs, 4.5% of the pairs classify as ancestry outliers.”

11. Line 339 From how many families are the 3055 individuals? A comparison with DZ twins discordant for smoking would be a valuable addition, to tease apart effects of controlling for genetics and shared environment (in MZ pairs) versus some genetics and shared environment (in DZ pairs). The genetic risk for smoking could be controlled for in DZ pairs using polygenic risk scores for smoking behavior from the GSCAN consortium and/or UK Biobank.

As indicated in our response to comments 6 and 7, we are currently performing follow-up work that will be included in a next manuscript. In the current manuscript, we deliberately chose to focus on the data from monozygotic twins only, to fully showcase the value of data from monozygotic twins. We believe the current analysis of ‘just monozygotic twins’ is already a valuable addition to the existing literature because of the uniqueness and strength of the design and because (to our surprise) only one earlier EWAS study of smoking discordant monozygotic twins has been published so far (which had a much smaller sample size than ours and only included current/never discordant pairs).

12. Line 377 Smoking

Was the interview for smoking behavior more detailed than described here? How was a regular smoker defined? For example, the given question "Did you ever smoke?" (line 380) implies two answers: yes and no, and a yes answer is an ever smoker. It does not distinguish between current and former smokers. Please provide the actual questions used in a supplement or a link to an appropriate webpage with the items (in Dutch and English).

Thank you for this suggestion. To clarify, we have now added an English translation of the questions that were asked at blood draw to the supplement (additional file 8):

Additional file 8: Questions on smoking that were asked at blood draw

“Did you ever smoke?”

(1) no, I never smoked

(2) I’m a former smoker

2a How many years ago did you quit?

2b For how many years have you smoked?

2c How many cigarettes did you use to smoke per day?

(3) yes.

3a How many years have you smoked?

3b How many cigarettes do you smoke per day?

13. Given that the participants have answered multiple surveys, can you document the consistency of the responses over time? For example, how many of those who now reported being never smokers had reported smoking in an earlier survey?

How did you handle non-daily smokers? Are they considered non-smokers, smokers or excluded?

We have indeed used the longitudinal surveys to compare (and correct, if necessary) the smoking status at blood draw with smoking status from the longitudinal NTR surveys. In case of inconsistencies, smoking status has been adjusted. Out of 9628 participants of the NTR biobank 1 project, smoking status at blood draw was consistent with longitudinal surveys for 97.1% of participants. The remaining 2.9% included the following cases: For 0.3%, the status at blood draw has been adjusted based on checks against longitudinal surveys (e.g. the person reported to have never smoked at blood draw, while they reported smoking in longitudinal surveys). For 2%, smoking status at blood draw was missing, and has been added from survey data. For 0.1%, smoking status was set to missing due to unresolvable inconsistencies (such participants are not included in our study because of missing smoking status). For 0.4%, the status at blood draw was missing and it could also not be retrieved from surveys due to insufficient information (these participants are also not included in our study). This is summarized in Author response table 1.

Author response table 1
Variable: check (valid cases)

| Code | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
| 1 | no changes in original status | 9349 | 97.1 | 97.1 | 97.1 |
| 2 | original status was adjusted based on data checks | 33 | 0.3 | 0.3 | 97.4 |
| 3 | original status was missing, added from survey data | 192 | 2.0 | 2.0 | 99.4 |
| 4 | made missing based on data checks | 11 | 0.1 | 0.1 | 99.6 |
| 5 | original status missing and insufficient data to add status | 43 | 0.4 | 0.4 | 100.0 |
| Total | | 9628 | 100.0 | 100.0 | |
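
The percentages in Author response table 1 can be recomputed from the reported frequencies. The following is a minimal sketch, not the authors' code: the frequencies are copied from the table, while the shortened category labels and the helper name `summarize` are illustrative assumptions.

```python
# Frequencies copied from Author response table 1 (categories 1 to 5).
# The dictionary keys are shortened labels chosen for this sketch.
counts = {
    "no changes in original status": 9349,
    "adjusted based on data checks": 33,
    "missing, added from survey data": 192,
    "made missing based on data checks": 11,
    "missing, insufficient data to add status": 43,
}


def summarize(counts):
    """Return (label, n, percent, cumulative percent) rows, rounded to 1 decimal."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, n in counts.items():
        running += n
        rows.append(
            (label, n, round(100 * n / total, 1), round(100 * running / total, 1))
        )
    return rows


total = sum(counts.values())
rows = summarize(counts)
for label, n, pct, cum in rows:
    print(f"{label:45s} {n:5d} {pct:6.1f} {cum:6.1f}")
print(f"{'Total':45s} {total:5d}")
```

Under these assumptions the recomputed values agree with the table, including the cumulative 99.6% for category 4, which follows from the cumulative count (9585 of 9628) rather than from summing the rounded percentages.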

Note, we also looked up specifically the MZ pairs discordant for current/never smoking. For all of these twins, smoking status obtained at blood draw was consistent with longitudinal surveys.

When individuals reported that they smoked regularly they were classified as a smoker, but there was no cutoff on the number of times an individual smoked. Based on the reported number of cigarettes per day in current smokers, we can see that the majority report smoking at least one cigarette per day, but a few report less than one cigarette per day, indicative of non-daily smoking.

14. Did you ask any pairs discordant for smoking why one had initiated smoking and the other did not? Likewise, why one quit and the other did not?

We did not, unfortunately. That would be interesting information to obtain indeed.

15. Are there any validation studies of smoking status, using biomarkers such as cotinine or carbon monoxide in the NTR?

Yes, plasma cotinine levels were measured in a subset of the samples (4099 NTR participants, described in Bot et al. 2013 https://doi.org/10.1016/j.jpsychores.2013.08.016). We previously described smoking misclassification among individuals with DNA methylation data and cotinine levels in van Dongen et al. 2018 (https://doi.org/10.1038/s41539-018-0020-2): “Plasma cotinine levels were available for 591 individuals classified as never smokers by self-report. Five of these individuals (0.8%) had cotinine levels ≥ 15 ng/mL, which is indicative of smoking, and thus indicates a misclassification of smoking status”.

Bot, M. et al. Exposure to secondhand smoke and depression and anxiety: A report from two studies in the Netherlands. J. Psychosom. Res. 75, 431–436 (2013).

van Dongen, J., Bonder, M.J., Dekkers, K.F. et al. DNA methylation signatures of educational attainment. npj Science Learn 3, 7 (2018). https://doi.org/10.1038/s41539-018-0020-2

16. Did the smoking questions assess cigarette use only, or all smoked tobacco products such as cigars and pipes? What about smokeless tobacco/snus, e-cigs (rare at that time I believe) or nicotine replacement therapy as a source of nicotine?

We focused on cigarette smoking; we did not include questions on other ways to take in nicotine (like e-cigarettes or water pipe) or on cannabis use at the time of blood draw. Note that snus use is not common in the Netherlands (in particular not at the time when blood sampling was carried out) and that e-cigarettes have become popular only more recently. Furthermore, a survey on e-cigarette use in the Dutch population, conducted in the period when our study took place, found that individuals who reported using e-cigarettes were primarily individuals who also smoked conventional cigarettes (Willemsen et al. De elektronische sigaret. Gebruik, gezondheidsrisico’s, en effectiviteit als stopmethode. Ned Tijdschr Geneeskd. 2015;159:A9259.).

17. Line 406 uses the wording " smoking discordant monozygotic twins". I would use twin pairs discordant for smoking OR smoking discordant pairs consistently as the pairs are discordant, not the individual twins.

Thank you for the suggestion. We have corrected this.

18. Line 432 The Bonferroni correction is overly conservative here as the CpG sites are correlated, so that should be taken into account.

We are aware that Bonferroni correction is stringent, and have now acknowledged this in the methods section:

“Statistical significance was assessed following stringent Bonferroni correction for the number of methylation sites tested (α = 0.05/411,169 = 1.2 × 10⁻⁷).”

Both Bonferroni correction and FDR are commonly applied in EWA studies. In our experience, Bonferroni correction, although perhaps slightly too stringent, produces the most robust results. For example, we previously reported that the replication rate across cohorts is much higher for Bonferroni-significant CpGs than it is for CpGs that meet FDR 5% (https://doi.org/10.1186/s13059-019-1878-x).
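
As a worked illustration of the threshold quoted above, and of why Bonferroni correction is the stricter of the two approaches, the sketch below computes α / number of tests for the 411,169 sites and compares it with a plain Benjamini-Hochberg step-up procedure. This is not the authors' analysis code: the function names and the four toy p-values are hypothetical and serve only to show the mechanics.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking p-values significant at FDR level q (step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Threshold for the number of methylation sites given in the methods sentence.
threshold = bonferroni_threshold(0.05, 411_169)
print(f"Bonferroni threshold: {threshold:.1e}")

# Hypothetical p-values, for illustration only (not study results).
toy_p = [1e-9, 0.004, 0.03, 0.5]
toy_bonferroni = [p <= bonferroni_threshold(0.05, len(toy_p)) for p in toy_p]
toy_fdr = benjamini_hochberg(toy_p, q=0.05)
print("Bonferroni:", toy_bonferroni)
print("FDR 5%:    ", toy_fdr)
```

In this toy example the FDR procedure flags one more test than Bonferroni does, which is the sense in which Bonferroni is the more conservative choice.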

19. References: There is missing or erroneous information in the cited literature. For example, line 636 has no author names; lines 555, 562, 608, 633, 643, 576, 586, 605, etc. have no issue and/or page information; and line 621 has no publisher. Please check all references.

Thank you for pointing this out. We have carefully checked and corrected the reference list.

[Editors’ note: what follows is the authors’ response to the second round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

– In their rebuttal, the authors mention cigs per day was measured at the present time for current smokers and for former smokers no time period was requested. This leaves it vague as to the time period the former smoker respondents provided, e.g. was it on average over the time smoked or during the last year they smoked? A brief acknowledgment should be included that the time period was not captured when assessing cigs per day for former smokers. Thus, it was left to the respondent to determine the time period, which could lead to variation in reporting by respondents.

Indeed, we asked participants the simple question “how many cigarettes did you use to smoke”. We have added a sentence on the inherent limitations of this question:

“Current smokers were asked how many years they smoked and how many cigarettes per day they smoked at present, while ex-smokers were asked how many years ago they quit, for how many years they smoked and how many cigarettes per day they smoked (note that the question on cigarettes per day to former smokers did not specify a particular time period, which may introduce variation in responses).”

– The authors corrected smoking status using longitudinal survey data but did not indicate that they made these corrections in the revised manuscript, only in the rebuttal. This detail should be included for transparency.

Please note that we did not feel that additions were necessary during revision because it was already stated in the first submission of the manuscript:

Page 9 “Data were checked for consistencies and missing data were completed by linking this information to data from surveys filled out close to the time of biobanking within the longitudinal survey study of the NTR.”

We have now added the additional information that we included in the rebuttal letter to the supplementary information:

“Out of 9628 participants of the NTR biobank 1 project, smoking status at blood draw was consistent with longitudinal surveys for 97.1% of participants. The remaining 2.9% included the following cases: For 0.3% the status at blood draw has been adjusted based on checks against longitudinal surveys (e.g. the person reported to have never smoked at blood draw, while they reported smoking in longitudinal surveys). For 2%, smoking status at blood draw was missing, and has been added from survey data. For 0.1%, smoking status was set to missing due to unresolvable inconsistencies (such participants are not included in our study because of missing smoking status). For 0.4%, the status at blood draw was missing and it could also not be retrieved from surveys due to insufficient information (these participants are also not included in our study).”

We refer to this on page 9 of the main text: “More details on these checks are described in Supplementary file 1.”

Related to this, the authors provide in their response and revised manuscript the cotinine levels for a large subset of the never smoking participants. For most, the cotinine levels were consistent with amounts expected for never smokers, but there were 5 (0.8%) persons that had cotinine levels indicative of a current smoker. Can the authors confirm corrections to smoking status were not made with cotinine? Or if they were corrected, this should be mentioned in the manuscript.

No, self-reported smoking data were not adjusted based on cotinine levels.

– Line 574 has a typo: ‘low classification rate’ should read ‘low misclassification rate’.

Thank you. We have corrected this.

– In the initial review, one of the reviewers asked for information about the statistical test for inflation of the GWAS results. The authors indicate in their response that a sentence was added to the manuscript about this inflation factor, but it appears this sentence may have been mistakenly omitted from the manuscript.

We apologize for the omission. The sentence has now been added to page 11.

– In the prior review, it was requested the authors remove the phrase "smoking discordant monozygotic twins" and instead refer to pairs. They made the requested revision, but then added it back into one of their revised sentences (line 369). This should be adjusted to address the reviewer's comment.

We apologize for this mistake. We now use discordant pairs throughout the manuscript.

https://doi.org/10.7554/eLife.83286.sa2

Article and author information

Author details

  1. Jenny van Dongen

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    3. Amsterdam Reproduction and Development (AR&D) Research Institute, Amsterdam, Netherlands
    Contribution
    Conceptualization, Formal analysis, Writing - original draft
    For correspondence
    j.van.dongen@vu.nl
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2063-8741
  2. Gonneke Willemsen

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    Contribution
    Data curation, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  3. BIOS Consortium

    Contribution
    Data curation, Funding acquisition, Methodology, Software
    Competing interests
    No competing interests declared
    1. Bastiaan T Heijmans, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    2. Peter AC ’t Hoen, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
    3. Joyce van Meurs, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    4. Aaron Isaacs, Department of Genetic Epidemiology, Erasmus MC, Rotterdam, Netherlands
    5. Rick Jansen, Department of Psychiatry, VU University Medical Center, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    6. Lude Franke, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    7. Dorret I Boomsma, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    8. René Pool, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    9. Jenny van Dongen, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    10. Jouke J Hottenga, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    11. Marleen MJ van Greevenbroek, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    12. Coen DA Stehouwer, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    13. Carla JH van der Kallen, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    14. Casper G Schalkwijk, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    15. Cisca Wijmenga, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    16. Sasha Zhernakova, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    17. Ettje F Tigchelaar, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    18. P Eline Slagboom, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    19. Marian Beekman, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    20. Joris Deelen, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    21. Diana van Heemst, Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, Netherlands
    22. Jan H Veldink, Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, Netherlands
    23. Leonard H van den Berg, Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, Netherlands
    24. Cornelia M van Duijn, Department of Genetic Epidemiology, Erasmus MC, Rotterdam, Netherlands
    25. Bert A Hofman, Department of Epidemiology, Erasmus MC, Rotterdam, Netherlands
    26. André G Uitterlinden, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    27. P Mila Jhamai, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    28. Michael Verbiest, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    29. H Eka D Suchiman, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    30. Marijn Verkerk, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    31. Ruud van der Breggen, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    32. Jeroen van Rooij, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    33. Nico Lakenberg, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    34. Hailiang Mei, Sequence Analysis Support Core, Leiden University Medical Center, Leiden, Netherlands
    35. Maarten van Iterson, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    36. Michiel van Galen, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
    37. Jan Bot, SURFsara, Amsterdam, Netherlands
    38. Dasha V Zhernakova, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    39. Peter van ’t Hof, Sequence Analysis Support Core, Leiden University Medical Center, Leiden, Netherlands
    40. Patrick Deelen, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    41. Irene Nooren, SURFsara, Amsterdam, Netherlands
    42. Bastiaan T Heijmans, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    43. Matthijs Moed, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    44. Martijn Vermaat, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
    45. René Luijk, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    46. Marc Jan Bonder, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    47. Freerk van Dijk, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
    48. Wibowo Arindrarto, Sequence Analysis Support Core, Leiden University Medical Center, Leiden, Netherlands
    49. Szymon M Kielbasa, Medical Statistics Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    50. Morris A Swertz, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
    51. Erik W van Zwet, Medical Statistics Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    52. Peter-Bram ’t Hoen, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
  4. Eco JC de Geus

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    Contribution
    Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Dorret I Boomsma

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    3. Amsterdam Reproduction and Development (AR&D) Research Institute, Amsterdam, Netherlands
    Contribution
    Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Michael C Neale

    Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, United States
    Contribution
    Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared

Funding

National Institute on Drug Abuse (DA049867)

  • Michael C Neale

ZonMw (NWO-Groot 480-15-001/674)

  • Gonneke Willemsen
  • Eco JC de Geus
  • Dorret I Boomsma

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

NTR warmly thanks all participants. We thank Conor Dolan for providing feedback on the manuscript. We acknowledge the contributions of the investigators of the BIOS consortium (Supplementary file 9): Bastiaan T Heijmans, Peter AC ’t Hoen, Joyce van Meurs, Aaron Isaacs, Rick Jansen, Lude Franke, René Pool, Jouke J Hottenga, Marleen MJ van Greevenbroek, Coen DA Stehouwer, Carla JH van der Kallen, Casper G Schalkwijk, Cisca Wijmenga, Sasha Zhernakova, Ettje F Tigchelaar, P Eline Slagboom, Marian Beekman, Joris Deelen, Diana van Heemst, Jan H Veldink, Leonard H van den Berg, Cornelia M van Duijn, Bert A Hofman, Aaron Isaacs, André G Uitterlinden, P Mila Jhamai, Michael Verbiest, H Eka D Suchiman, Marijn Verkerk, Ruud van der Breggen, Jeroen van Rooij, Nico Lakenberg, Hailiang Mei, Maarten van Iterson, Michiel van Galen, Jan Bot, Dasha V Zhernakova, Rick Jansen, Peter van ’t Hof, Patrick Deelen, Irene Nooren, Matthijs Moed, Martijn Vermaat, René Luijk, Marc Jan Bonder, Freerk van Dijk, Michiel van Galen, Wibowo Arindrarto, Szymon M Kielbasa, Morris A Swertz, and Erik W van Zwet.

Ethics

Informed consent was obtained from all participants. The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance – FWA00017598; IRB/institute code, NTR 03-180).

Senior Editor

  1. W Kimryn Rathmell, Vanderbilt University Medical Center, United States

Reviewing Editor

  1. Melinda Aldrich, Vanderbilt University Medical Center, United States

Reviewers

  1. Melinda Aldrich, Vanderbilt University Medical Center, United States
  2. Jeff Craig, Deakin University, Australia
  3. Jaakko Kaprio, University of Helsinki/FIMM, Finland

Version history

  1. Preprint posted: August 19, 2022
  2. Received: September 6, 2022
  3. Accepted: August 8, 2023
  4. Accepted Manuscript published: August 10, 2023 (version 1)
  5. Version of Record published: September 14, 2023 (version 2)

Copyright

© 2023, van Dongen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Jenny van Dongen
  2. Gonneke Willemsen
  3. BIOS Consortium
  4. Eco JC de Geus
  5. Dorret I Boomsma
  6. Michael C Neale
(2023)
Effects of smoking on genome-wide DNA methylation profiles: A study of discordant and concordant monozygotic twin pairs
eLife 12:e83286.
https://doi.org/10.7554/eLife.83286
