Effects of smoking on genome-wide DNA methylation profiles: A study of discordant and concordant monozygotic twin pairs

  1. Jenny van Dongen (corresponding author)
  2. Gonneke Willemsen
  3. BIOS Consortium
  4. Eco JC de Geus
  5. Dorret I Boomsma
  6. Michael C Neale
  1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Netherlands
  2. Amsterdam Public Health Research Institute, Netherlands
  3. Amsterdam Reproduction and Development (AR&D) Research Institute, Netherlands
  4. Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, United States

Abstract

Background:

Smoking-associated DNA methylation levels identified through epigenome-wide association studies (EWASs) are generally ascribed to smoking-reactive mechanisms, but the contribution of a shared genetic predisposition to smoking and DNA methylation levels is typically not accounted for.

Methods:

We exploited a strong within-family design, that is, the discordant monozygotic twin design, to study reactiveness of DNA methylation in blood cells to smoking and reversibility of methylation patterns upon quitting smoking. Illumina HumanMethylation450 BeadChip data were available for 769 monozygotic twin pairs (mean age = 36 years, range = 18–78, 70% female), including pairs discordant or concordant for current or former smoking.

Results:

In pairs discordant for current smoking, 13 differentially methylated CpGs were found between current smoking twins and their genetically identical co-twin who never smoked. Top sites include multiple CpGs in CACNA1D and GNG12, which encode subunits of a calcium voltage-gated channel and G protein, respectively. These proteins interact with the nicotinic acetylcholine receptor, suggesting that methylation levels at these CpGs might be reactive to nicotine exposure. All 13 CpGs have been previously associated with smoking in unrelated individuals and data from monozygotic pairs discordant for former smoking indicated that methylation patterns are to a large extent reversible upon smoking cessation. We further showed that differences in smoking level exposure for monozygotic twins who are both current smokers but differ in the number of cigarettes they smoke are reflected in their DNA methylation profiles.

Conclusions:

In conclusion, by analysing data from monozygotic twins, we robustly demonstrate that DNA methylation level in human blood cells is reactive to cigarette smoking.

Funding:

We acknowledge funding from the National Institute on Drug Abuse grant DA049867, the Netherlands Organization for Scientific Research (NWO): Biobanking and Biomolecular Research Infrastructure (BBMRI-NL, NWO 184.033.111) and the BBMRI-NL-financed BIOS Consortium (NWO 184.021.007), NWO Large Scale infrastructures X-Omics (184.034.019), Genotype/phenotype database for behaviour genetic and genetic epidemiological studies (ZonMw Middelgroot 911-09-032); Netherlands Twin Registry Repository: researching the interplay between genome and environment (NWO-Groot 480-15-001/674); the Avera Institute, Sioux Falls (USA), and the National Institutes of Health (NIH R01 HD042157-01A1, MH081802, Grand Opportunity grants 1RC2 MH089951 and 1RC2 MH089995); epigenetic data were generated at the Human Genomics Facility (HuGe-F) at ErasmusMC Rotterdam. Cotinine assaying was sponsored by the Neuroscience Campus Amsterdam. DIB acknowledges the Royal Netherlands Academy of Science Professor Award (PAH/6635).

Editor's evaluation

This study presents valuable findings regarding how smoking can leave a lasting imprint on the human genome. The twin pairs study design is unique, and the methods applied by the authors are solid, providing an excellent starting point for large translational studies with rigorous laboratory approaches. This work will be of interest to geneticists and genetic epidemiologists.

https://doi.org/10.7554/eLife.83286.sa0

eLife digest

The genetic information of people who smoke presents distinctive characteristics. In particular, previous research has revealed differences in patterns of DNA methylation, a type of chemical modification that helps cells switch certain genes on or off. However, most of these studies could not establish for sure whether these changes were caused by smoking, predisposed individuals to smoke, or were driven by underlying genetic variation in the DNA sequence itself.

To investigate this question, van Dongen et al. examined DNA methylation data from the blood cells of over 700 pairs of identical twins. These individuals share the exact same genetic information, making it possible to better evaluate the impact of lifestyle on DNA modifications.

The analyses identified differences in methylation at 13 DNA locations in pairs of twins where one was a current smoker and their sibling had never smoked. Two of the genes code for proteins involved in the response to nicotine, the primary addictive chemical in cigarette smoke. The differences were smaller if one of the twins had stopped smoking, suggesting that quitting can help to reverse some of these changes.

These findings confirm that DNA methylation in blood cells is influenced by cigarette smoke, which could help to better understand smoking-associated diseases. They also demonstrate how useful studies of identical twins can be for identifying methylation changes that are markers of lifestyle.

Introduction

Epigenome-wide association studies (EWASs) have identified robust differences in DNA methylation between smokers and non-smokers (Gao et al., 2015; Heikkinen et al., 2022). In a meta-analysis of blood-based DNA methylation studies (N = 15,907 individuals; the largest EWAS of smoking to date), 2623 CpG sites passed the Bonferroni threshold for genome-wide significance in a comparison of current and never smokers (Joehanes et al., 2016). Based on comparison with loci identified in large genome-wide association studies (GWASs), differentially methylated sites were significantly enriched in genes implicated in well-established smoking-associated diseases, such as cancer, cardiovascular disease, inflammatory disease, and lung disease, as well as in genes associated with schizophrenia and educational attainment (Joehanes et al., 2016). It has been hypothesized that smoking-induced methylation changes might also contribute to the addictive effect of smoking (Zillich et al., 2022).

Importantly, smoking-associated DNA methylation levels, as established in human EWASs, may reflect different mechanisms. They may reflect causal effects of smoking on methylation, causal effects of methylation on smoking behaviour, methylation differences associated with epiphenomena of other exposures that correlate with smoking, e.g. alcohol use (Liu et al., 2018), or they may reflect a shared genetic predisposition to smoking and methylation level. Distinguishing these mechanisms requires incisive study designs (Vink et al., 2017). Establishing whether methylation levels in smokers revert to levels of never smokers upon smoking cessation is a first step. A previous study of 2648 former smokers with cross-sectional methylation data from the Framingham Heart Study suggested that methylation levels at most CpGs return to the level of never smokers within 5 years after quitting smoking, but 36 CpGs were still differentially methylated in former smokers who had quit smoking for 30 years (Joehanes et al., 2016). In the large EWAS meta-analysis of smoking (Joehanes et al., 2016), 185 CpGs were differentially methylated between former and never smokers (compared to 2623 between current and never smokers). In addition, differences between former and never smokers were smaller than between current and never smokers. Reversible DNA methylation patterns may suggest that DNA methylation is reactive to smoking. However, it is also possible that the different methylation level in current smokers reflects a higher genetic liability to smoking behaviour (that makes them more likely to initiate and keep smoking). Similarly, differences between former smokers and never smokers could reflect a persistent methylation change caused by smoking, but could also be driven by genetic factors.

In population-based studies, smoking cases and non-smoking individuals may differ on many aspects, including their genetic predisposition to smoking. On the other hand, monozygotic twins are genetically identical (except for de novo mutations, but these are rare [Jonsson et al., 2021; Ouwens et al., 2018]), share a womb, and are matched on sex, age, and childhood environment. They have been exposed to similar prenatal conditions, which may include second-hand smoke from smoking mothers and others. Differences in prenatal environment of monozygotic twins due to, for instance, unequal vascular supply are also recognized (Hall, 1996; Martin et al., 1997), although it remains to be investigated to what extent the impact of prenatal smoke exposure might differ between monozygotic twins. Smoking discordant monozygotic twin pairs offer a unique opportunity to assess smoking-reactive DNA methylation patterns (Leeuwen et al., 2007; Vink et al., 2017). Despite the large number of previous population-based smoking EWASs, only one previous study compared genome-wide DNA methylation in smoking discordant monozygotic twin pairs (Allione et al., 2015). This study analysed whole-blood Illumina 450k array methylation data from 20 discordant pairs, and reported 22 top loci, many of which had been associated with cigarette smoking in earlier studies. However, following the correction for multiple testing, none of the differentially methylated loci were statistically significant, and this previous twin study did not examine reversibility of smoking effects, that is, whether methylation status changes again following smoking cessation.

Here, we analyse unique data from a large cohort of monozygotic twin pairs. This cohort is sufficiently large to include current smoking discordant and concordant pairs, as well as pairs discordant for former smoking (Figure 1). These groups allow identification of loci that are reactive to smoking, and examination of the extent to which effects are reversible upon quitting smoking. Monozygotic pairs in which both twins are current smokers, but who differ in quantity smoked, enable examination of the effects of smoking intensity. In addition, concordant pairs who never smoked allow assessment of the amount of DNA methylation variation at smoking-reactive loci that is due to non-genetic sources of variation other than smoking. In secondary enrichment analyses, we examined whether smoking-reactive methylation patterns are enriched (1) at loci detected in previous EWASs of other traits and exposures, (2) at loci detected in a previous large GWAS meta-analysis of smoking initiation (Liu et al., 2019) – these loci are presumed to have a causal effect on smoking behaviour, and (3) within Gene Ontology and KEGG pathways. Finally, we examined the relationship between DNA methylation and RNA transcript levels in blood for smoking-reactive loci.

Figure 1
DNA methylation analysis in smoking discordant and smoking concordant monozygotic twin pairs.

Blood DNA methylation profiles (Illumina 450k array) from six groups of monozygotic twin pairs were analysed.

Methods

Participants

In the Netherlands Twin Register (NTR), DNA methylation data are available for 3089 whole-blood samples from 3057 individuals in twin families, as described in detail previously (van Dongen et al., 2016). The samples were obtained from twins and family members, who participated in NTR longitudinal survey studies (Ligthart et al., 2019) and the NTR biobank project (Willemsen et al., 2010). In the current study, methylation data from monozygotic twin pairs were analysed. Among 768 monozygotic twin pairs with genome-wide methylation data and information on smoking and covariates, we identified the following discordant pairs: 53 pairs in which one twin was a current smoker at blood draw and the other never smoked, 72 pairs in which one twin was a former smoker at blood draw and the other never smoked, and 66 pairs in which one twin was a former smoker and the other a current smoker at blood draw. In addition, we identified the following concordant pairs: 83 twin pairs concordant for current smoking, 88 twin pairs concordant for former smoking, and 406 concordant twin pairs who never smoked. A flowchart is provided in Figure 2. Informed consent was obtained from all participants. The twin pairs were primarily of Dutch-European ancestry. For 753 of the 768 MZ pairs who are included in the current study, ancestry could be derived from principal components (PCs) calculated from genome-wide Single Nucleotide Polymorphism (SNP) array data that were available for the twins (750 pairs) or for both of their parents (3 pairs). According to the genotype data PCs, 4.5% of the pairs classify as ancestry outliers. The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance – FWA00017598; IRB/institute code, NTR 03-180).
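
The six-group classification described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R); the function name, status labels, and toy pairs are hypothetical, and only the group sizes are taken from the text.

```python
from collections import Counter

# Assumed status labels; the ordering fixes how discordant groups are named.
STATUSES = ("current", "former", "never")

def classify_pair(status_1, status_2):
    """Return the group label for one twin pair from the two smoking statuses."""
    for status in (status_1, status_2):
        if status not in STATUSES:
            raise ValueError(f"unknown smoking status: {status!r}")
    if status_1 == status_2:
        return f"concordant {status_1}"
    first, second = sorted((status_1, status_2), key=STATUSES.index)
    return f"discordant {first}/{second}"

# Group sizes as reported in the Participants section.
REPORTED_GROUP_SIZES = {
    "discordant current/never": 53,
    "discordant former/never": 72,
    "discordant current/former": 66,
    "concordant current": 83,
    "concordant former": 88,
    "concordant never": 406,
}
total_pairs = sum(REPORTED_GROUP_SIZES.values())  # matches the 768 pairs in the text

# Toy input (made up) to show the classification in use.
toy_pairs = [("never", "current"), ("former", "never"),
             ("former", "current"), ("never", "never")]
toy_counts = Counter(classify_pair(a, b) for a, b in toy_pairs)
```

The order of the two twins does not matter in this sketch, so a pair entered as never/current and one entered as current/never fall into the same group.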

Figure 2
Study flowchart.

Peripheral blood DNA methylation and cell counts

Genome-wide DNA methylation in whole blood was measured by the Human Genomics Facility (HuGe-F) of ErasmusMC, the Netherlands (http://www.glimdna.org/). DNA methylation was assessed with the Infinium HumanMethylation450 BeadChip Kit (Illumina, San Diego, CA, USA). Genomic DNA (500 ng) from whole blood was bisulfite treated using the Zymo EZ DNA Methylation kit (Zymo Research Corp, Irvine, CA, USA), and 4 μl of bisulfite-converted DNA was measured on the Illumina 450k array (Bibikova et al., 2011) following the manufacturer’s protocol. A custom pipeline for quality control and normalization of the methylation data was developed by the BIOS consortium. First, sample quality control was performed using MethylAid (van Iterson et al., 2014). Next, probe filtering was applied with DNAmArray (Sinke et al., 2019) to remove: ambiguously mapped probes (Chen et al., 2013), probes with a detection p-value >0.01, or bead number <3, or raw signal intensity of zero. After these probe filtering steps, probes and samples with a success rate <95% were removed. Next, the DNA methylation data were normalized using functional normalization (Fortin et al., 2014), as implemented in DNAmArray (Sinke et al., 2019) using the cohort-specific optimum number of control probe-based PCs. Probes containing an SNP, identified in a DNA sequencing project in the Dutch population (The Genome of the Netherlands Consortium, 2014), within the CpG site (at the C or G position) were excluded irrespective of minor allele frequency, and only autosomal probes were analysed, leading to a total number of 411,169 methylation sites. The following subtypes of white blood cells were counted in blood samples: neutrophils, lymphocytes, monocytes, eosinophils, and basophils (Willemsen et al., 2010).
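
The measurement-level filters and the 95% success-rate criterion can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the DNAmArray implementation: the thresholds come from the text, whereas the data layout, function names, and toy probes are hypothetical, and the sketch shows the probe-level criterion only (the sample-level criterion would be applied analogously).

```python
# Thresholds taken from the text; everything else here is an illustrative assumption.
MAX_DETECTION_P = 0.01
MIN_BEAD_COUNT = 3
MIN_SUCCESS_RATE = 0.95

def measurement_ok(detection_p, bead_count, intensity):
    """True if one probe measurement in one sample survives all three filters."""
    return (detection_p <= MAX_DETECTION_P
            and bead_count >= MIN_BEAD_COUNT
            and intensity != 0)

def success_rate(flags):
    """Fraction of measurements that passed the filters."""
    return sum(flags) / len(flags)

def probes_to_keep(passed_by_probe):
    """Keep probes whose success rate across samples is at least 95%."""
    return [probe for probe, flags in passed_by_probe.items()
            if success_rate(flags) >= MIN_SUCCESS_RATE]

# Toy data: 20 samples per probe; probe names are made up.
passed = {
    "probe_a": [True] * 20,                # 100% success, kept
    "probe_b": [True] * 18 + [False] * 2,  # 90% success, removed
}
kept = probes_to_keep(passed)
```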

Smoking and other phenotypes

Information on smoking behaviour was obtained by interview during the home visit for blood collection as part of the NTR biobank project (2004–2008 and 2010–2011). The questions are included in Supplementary file 1. Participants were asked: ‘Did you ever smoke?’, with answer categories: (1) no, I never smoked, (2) I’m a former smoker, and (3) yes. Current smokers were asked how many years they smoked and how many cigarettes per day they smoked at present, while ex-smokers were asked how many years ago they quit, for how many years they smoked and how many cigarettes per day they smoked (note that the question on cigarettes per day for former smokers did not specify a particular time period, which may introduce variation in responses). Data were checked for consistency, and missing data were completed by linking this information to data from surveys filled out close to the time of biobanking within the longitudinal survey study of the NTR. More details on these checks are described in Supplementary file 1. Packyears were calculated as (number of cigarettes smoked per day/20) × number of years smoked. Plasma cotinine level measurements have been described previously (Bot et al., 2013). Body mass index (BMI) was obtained at blood draw. Educational attainment was obtained in multiple longitudinal surveys and was defined as the highest completed level of education at the age of 25 or higher. It was classified on a 7-point scale: 1 = primary school only, 2 = lower vocational schooling, 3 = lower secondary schooling (general), 4 = intermediate vocational schooling, 5 = intermediate/higher secondary schooling (general), 6 = higher vocational schooling, 7 = university.
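
As a worked illustration of the packyears definition, a short Python sketch is given below. The function name and the example values are hypothetical; only the formula is taken from the text.

```python
def packyears(cigarettes_per_day, years_smoked):
    """Packyears = (cigarettes per day / 20) x years smoked, as defined in the text."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return (cigarettes_per_day / 20) * years_smoked

# Made-up example: half a pack per day for 14 years.
example = packyears(10, 14)
```

Under this definition, one packyear corresponds to smoking 20 cigarettes per day for one year.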

Statistical analyses

Overview and hypotheses

All analyses were performed in R (R Development Core Team, 2013). Analyses were performed in six groups of monozygotic twin pairs (Figure 1). To identify DNA methylation differences in smoking-discordant monozygotic twin pairs, we first compared the twin pairs in which one twin had never smoked and the other was a current smoker at the time of blood sampling. Second, to identify which of these DNA methylation differences might be reversible, we analysed data from (1) monozygotic pairs in which one twin had never smoked and the other was a former smoker at the time of blood sampling, (2) monozygotic pairs in which one twin was a current smoker and the other was a former smoker at the time of blood sampling, and (3) monozygotic pairs who were both former smokers. Third, to quantify within-pair methylation differences that occur by chance alone, we examined the within-pair differences in monozygotic twins concordant for never having smoked. Fourth, data from monozygotic twins concordant for current smoking were analysed to examine the effects of smoking intensity.

Our hypotheses were as follows: (1) if DNA methylation level is reactive to cigarette smoking, methylation differences will be present between smokers and non-smokers after ruling out genetic differences, that is, in smoking-discordant monozygotic twin pairs, and these differences will be larger than in monozygotic pairs concordant for never smoking, (2) if DNA methylation patterns are reversible upon quitting smoking, methylation differences (∆M) in monozygotic pairs will show the following pattern: ∆M discordant current-never > ∆M discordant current-former and ∆M discordant former-never > ∆M concordant never, (3) a correlation between time since quitting smoking and ∆M in pairs discordant for former smoking is consistent with a gradual reversibility of methylation levels upon quitting smoking, and (4) a correlation between ∆M and the difference in number of cigarettes smoked per day in smoking concordant pairs is consistent with smoking-reactive methylation patterns that show a dose–response relationship with the number of cigarettes smoked.

Epigenome-wide association study

In the entire dataset of 3089 blood samples, we used linear regression analysis to correct the DNA methylation levels (β-values) for commonly used covariates (van Rooij et al., 2019), including HM450k array row, bisulphite plate (dummy-coding) and white blood cell percentages (% neutrophils, % monocytes, and % eosinophils). White blood cell percentages were included to account for variation in cellular composition between whole-blood samples. Lymphocyte percentage was not included in models because it was strongly correlated with neutrophil percentage (r = −0.93), and basophil percentage was not included because it showed little variation between subjects, with a large number of subjects having 0% of basophils. We did not adjust for sex and age, because monozygotic twins have the same sex and age. The residuals from this regression analysis were used in the within-pair EWAS analyses. Specifically, the residuals were used as input for paired t-tests to compare the methylation of the smoking twins with that of their non-smoking co-twins. Similarly, paired t-tests were applied to data from smoking concordant pairs. Statistical significance was assessed following stringent Bonferroni correction for the number of methylation sites tested (α = 0.05/411,169 = 1.2 × 10⁻⁷). For each EWAS analysis, the R package Bacon was used to compute the Bayesian inflation factor (van Iterson et al., 2017). A previous power analysis for DNA methylation studies in discordant monozygotic twins indicated that with 50 discordant pairs, there is 80% power to detect methylation differences of 15% (at epigenome-wide significance, that is, following multiple testing correction) (Tsai and Bell, 2015). Power quickly drops for smaller effect sizes; for example, with 50 discordant pairs, the power to detect a 10% methylation difference is 10% and the power to detect a methylation difference of 5% approaches alpha (Tsai and Bell, 2015). We tested for within-pair differences in demographics (e.g. BMI, educational attainment) and smoking characteristics (e.g. number of cigarettes per day) with paired t-tests (continuous data) and Wilcoxon Signed Ranks tests (ordinal data) in R.
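
The within-pair test and the Bonferroni threshold can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function name and the toy residuals are hypothetical, the sign convention (first twin minus second twin) is an assumption, and the sketch stops at the t-statistic because the p-value requires a t-distribution that the Python standard library does not provide.

```python
import math
from statistics import mean, stdev

N_SITES = 411_169                   # methylation sites tested (from the text)
BONFERRONI_ALPHA = 0.05 / N_SITES   # about 1.2e-7

def paired_t_statistic(first_twin, second_twin):
    """Paired t-statistic and degrees of freedom for within-pair differences."""
    if len(first_twin) != len(second_twin):
        raise ValueError("both twins need one value per pair")
    diffs = [a - b for a, b in zip(first_twin, second_twin)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("within-pair differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

# Toy covariate-adjusted residuals at one CpG (made-up values, four pairs).
smoker_residuals = [-0.10, -0.12, -0.08, -0.15]
cotwin_residuals = [0.02, 0.01, 0.03, 0.00]
t_stat, df = paired_t_statistic(smoker_residuals, cotwin_residuals)
```

Because each difference is taken within a genetically identical pair, the test statistic reflects only the variation between co-twins, which is the core of the discordant twin design.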

Dose–response relationships

For significant CpGs from the EWAS of discordant monozygotic twin pairs, we examined dose–response relationships in smoking concordant pairs (both twins were current smokers) by correlating within-pair differences in DNA methylation with within-pair differences in smoking packyears and cigarettes per day. All correlations reported in this paper are Pearson correlations. Secondly, in twin pairs discordant for former smoking (one twin never smoked and the other one is a former smoker), we correlated and plotted within-pair differences in DNA methylation with the time since quitting smoking to assess the relationship between time since quitting smoking and reversal of methylation differences within monozygotic twin pairs.
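
A dose–response check of this kind amounts to a Pearson correlation between two vectors of within-pair differences. The Python sketch below illustrates the computation with made-up numbers; the function names and values are hypothetical and do not reproduce the authors' R analysis.

```python
import math

def within_pair_differences(twin_1, twin_2):
    """Difference (twin 1 minus twin 2) for each pair."""
    return [a - b for a, b in zip(twin_1, twin_2)]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Toy data for four concordant current-smoking pairs (made-up values).
cigs_twin_1, cigs_twin_2 = [15, 5, 20, 10], [10, 8, 10, 10]
meth_twin_1, meth_twin_2 = [0.60, 0.71, 0.55, 0.66], [0.62, 0.70, 0.60, 0.66]

delta_cigs = within_pair_differences(cigs_twin_1, cigs_twin_2)
delta_meth = within_pair_differences(meth_twin_1, meth_twin_2)
r = pearson_r(delta_cigs, delta_meth)
```

In this toy example the twin who smokes more has the lower methylation level, so the correlation is negative, as would be expected at a site that loses methylation with heavier smoking.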

Enrichment analyses

We used the EWAS Toolkit from the EWAS Atlas (Li et al., 2019) to perform enrichment analyses of Gene Ontology terms, KEGG pathways, and previously associated traits among top sites from the EWAS in discordant monozygotic twin pairs (current versus never). With the trait enrichment tool of the EWAS Toolkit, we tested for enrichment of all traits (680) that were present in the atlas on 26 April 2022. Because the software requires a minimum of 20 input CpGs, we selected the top 20 CpGs from the EWAS in discordant monozygotic pairs for the enrichment analyses using the EWAS Toolkit.

To study overlap of EWAS signal with genetic findings for smoking, we compared our EWAS results against GWAS results from the largest GWAS meta-analysis of smoking phenotypes. This is the meta-analysis of smoking initiation by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) (Liu et al., 2019). We obtained leave-one-out meta-analysis results with NTR excluded. From the GWAS, we selected all SNPs with a p-value <5.0 × 10⁻⁸ and determined the distance of each Illumina 450k methylation site to each SNP. We then tested whether methylation sites within 1 Mb of genome-wide significant SNPs from the GWAS showed a stronger signal in the within-pair EWAS of smoking discordant monozygotic pairs compared to other genome-wide methylation sites, by regressing the EWAS test statistics on a variable (GWASlocus) indicating if the CpG is located within a 1 Mb window from SNPs associated with smoking initiation (1 = yes, 0 = no):

t = Intercept + β_GWASlocus × GWASlocus

where t represents the absolute t-statistic from the paired t-test comparing within-pair methylation differences in smoking discordant pairs and β_GWASlocus represents the estimate for GWASlocus, that is, the change in the t-test statistic associated with a one-unit change in the variable GWASlocus (i.e. being within 1 Mb of SNPs associated with smoking initiation). For each enrichment test, bootstrap standard errors were computed with 2000 bootstraps with the R package ‘simpleboot’.
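
Because the regression has a single binary predictor, its ordinary least squares slope equals the difference in mean absolute t-statistic between CpGs inside and outside GWAS loci. The Python sketch below illustrates this with toy values. It is not the authors' R code: the function names and data are hypothetical, treating a distance of exactly 1 Mb as inside the window is an assumption, and the case-resampling bootstrap shown here may differ from the scheme used by ‘simpleboot’.

```python
import random
from statistics import mean, stdev

def near_gwas_snp(cpg, snps, window=1_000_000):
    """1 if the CpG lies within `window` bp of any SNP on the same chromosome, else 0."""
    chrom, pos = cpg
    return int(any(c == chrom and abs(pos - p) <= window for c, p in snps))

def slope_binary(abs_t, indicator):
    """OLS slope of |t| on a 0/1 indicator, i.e. the difference in group means."""
    inside = [t for t, g in zip(abs_t, indicator) if g == 1]
    outside = [t for t, g in zip(abs_t, indicator) if g == 0]
    if not inside or not outside:
        raise ValueError("both groups must contain at least one CpG")
    return mean(inside) - mean(outside)

def bootstrap_se(abs_t, indicator, n_boot=2000, seed=1):
    """Standard error of the slope from resampling CpGs with replacement."""
    slope_binary(abs_t, indicator)  # fail early if one group is empty
    rng = random.Random(seed)
    n = len(abs_t)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        groups = [indicator[i] for i in idx]
        if 0 < sum(groups) < n:  # skip resamples that lack one of the groups
            estimates.append(slope_binary([abs_t[i] for i in idx], groups))
    return stdev(estimates)

# Toy |t| values and locus indicators for eight CpGs (made up).
abs_t = [2.5, 3.0, 2.8, 0.9, 1.1, 1.0, 0.8, 1.2]
gwas_locus = [1, 1, 1, 0, 0, 0, 0, 0]
beta = slope_binary(abs_t, gwas_locus)
se = bootstrap_se(abs_t, gwas_locus)
```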

Gene expression

For significant CpGs from the EWAS of discordant monozygotic twin pairs (current versus never), we examined whether the DNA methylation was associated with gene expression levels in cis. To this end, we used an independent whole-blood RNA-sequencing dataset from the Biobank-based Integrative Omics Study (BIOS) consortium that did not include NTR, and tested associations between genome-wide CpGs and transcripts in cis (<250 kb), as described in detail previously (the BIOS Consortium et al., 2017). In short, methylation and expression levels in whole-blood samples (n = 2101) were quantified with Illumina Infinium HumanMethylation450 BeadChip arrays and with RNA-seq (2 × 50 bp paired-end, Hiseq2000, >15 M read pairs per sample). For each target CpG (epigenome-wide significant differentially methylated positions [DMPs]), we identified transcripts in cis (<250 kb), for which methylation levels were significantly associated with gene expression levels at the experiment-wide threshold applied by this study (False Discovery Rate (FDR) <5.0%), after regressing out methylation Quantitative Trait Locus (mQTL) and expression Quantitative Trait Locus (eQTL) effects. We also examined whether significant CpGs from the EWAS of discordant monozygotic twin pairs mapped to genes that were previously reported to be differentially expressed in monozygotic pairs of which one twin never smoked, and the other was a current smoker at the time of blood sampling (based on Affymetrix U219 array data; n = 56 pairs; note: the 53 discordant pairs included in the current study of DNA methylation are a subset of the 56 discordant pairs included in the study of gene expression) (Vink et al., 2017).

Results

Descriptive statistics for the smoking-discordant and concordant monozygotic twin pairs are given in Table 1. In twin pairs discordant for current smoking status (i.e. one twin a current smoker at the time of blood sampling and the other never initiated regular smoking, N = 53 pairs, mean age = 33 years), the smoking twin on average smoked 8.9 cigarettes per day at the time of blood sampling, and had an average smoking history equivalent to 6.8 packyears. The EWAS analysis in pairs discordant for current smoking status identified 13 epigenome-wide significant (p < 1.20 × 10⁻⁷) DMPs (Figure 3a). Genome-wide test statistics were not inflated (Supplementary file 2). Absolute differences in methylation ranged from 2.5% to 13% (0.025–0.13 on the methylation β-value scale), with a mean of 5.4% (Table 2). Eight of the 13 CpGs (61.5%) showed lower methylation in the current smoking twins compared to their non-smoking co-twins. Pair-level methylation β-values are shown in Figure 3—figure supplement 1 and illustrate a high degree of consistency in the direction of effect. For example, at top CpG site cg05575921, for 51 out of the 53 pairs, the smoking twin had a lower methylation level than the non-smoking twin. At 11 of the 13 CpGs, the methylation difference in smoking discordant monozygotic twin pairs was smaller (on average 19.0%, range = 2.2–37.5%) than the methylation difference reported previously in an EWAS meta-analysis of smoking (Joehanes et al., 2016). At two CpGs, the methylation difference in smoking discordant monozygotic twins was larger (on average 24.6%).

Figure 3
Top differentially methylated loci identified in monozygotic twin pairs discordant for current smoking.

(a) Manhattan plot of the epigenome-wide association study (EWAS) in 53 smoking discordant monozygotic twin pairs (current versus never). The red horizontal line denotes the epigenome-wide significance threshold (Bonferroni correction) and 13 CpGs with significant differences are highlighted. (b) Mean within-pair differences in monozygotic twin pairs at the 13 CpGs that were epigenome-wide significant in smoking discordant monozygotic pairs. Mean within-pair differences of the residuals obtained after correction of methylation β-values for covariates are shown for 53 monozygotic pairs discordant for current/never smoking, 66 monozygotic pairs discordant for current/former smoking, 72 monozygotic pairs discordant for former/never smoking, 83 concordant current smoking monozygotic pairs, 88 concordant former smoking monozygotic pairs, and 406 concordant never smoking monozygotic pairs. (c) QQ-plot showing p-values from the EWAS in 53 smoking discordant monozygotic twin pairs (current versus never). P-values for CpGs located near significant SNPs from the genome-wide association study (GWAS) of smoking initiation are plotted in blue and all other genome-wide CpGs are plotted in orange.

Table 1
Descriptive statistics for smoking discordant and concordant monozygotic twin pairs.
Discordant current/never(53 pairs)Discordant former/never(72 pairs)Discordant current/former(66 pairs)Concordant current(83 pairs)Concordant never(406 pairs)Concordant former(88 pairs)
Current smokerNever-smokerMean diffp-valueFormer smokerNever-smokerMean diffp-valueCurrent smokerFormer smokerMean diffp-valueTwin 1Twin 2Mean diffp-valueTwin 1Twin 2Mean diffp-valueTwin 1Twin 2Mean diffp-value
% Female pairs60.4%60.4%n.a.n.a.77.80%77.80%n.a.n.a.69.7%69.7%n.a.n.a.61.4%61.4%n.a.n.a.73.6%73.6%n.a.n.a.64.8%64.8%n.a.n.a.
Age at blood sampling, mean (SD)33.1 (8.0)33.0 (7.9)0.100.3441.4 (13.2)41.4 (13.1)0.020.8342.2
(12.6)
42.2
(12.5)
−0.060.4533.8
(10.3)
33.9
(10.5)
−0.120.1033.1
(11.3)
33.0
(11.2)
0.060.0845.2
(13.4)
45.2
(13.4)
0.090.29
Cigarettes per day at blood sampling, mean (SD), N missings8.9 (6.4), 6n.a.n.a.n.a.n.a.n.a.n.a.n.a.11.9 (7.2), 9n.a.n.a.n.a.11.1
(7.0), 2
10.9
(6.9), 1
0.001.00n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.
Packyears, mean (SD), N missings6.8 (7.0), 13n.a.n.a.n.a.5.9 (11.1), 15n.a.n.a.n.a.13.6
(13.2), 9
9.3
(8.7), 10
3.90.059.7 (9.3), 108.3 (7.6), 90.220.82n.a.n.a.n.a.n.a.10.6
(11.5), 7
9.8
(10.4), 11
0.780.55
Years since quitting smoking, mean (SD), N missingsn.a.n.a.n.a.n.a.13.5 (11.4), 9n.a.n.a.n.a.n.a.9.0
(10.2), 2
n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.n.a.11.9
(9.1), 8
13.6
(11.8), 7
−1.620.20
Plasma cotinine level, mean (SD), N missings*222 (197.5), 11.8 (2.6), 28261.11.6 × 10−51.4 (1.8), 490.9 (1.0), 520.620.43286.7 (330.5), 219.2 (70.0), 28293.82.5 × 10−6267 (290.9), 3279 (308.1), 3−8.70.781.3 (9.7), 2740.5 (0.8), 2600.070.5055.8 (222.2), 616 (16.2), 6483.30.26
Educational Attainment, N (%) | 0.04 | 0.97 | 0.34 | 0.60 | 0.76 | 0.83
N missing | 18 | 11 | 13 | 13 | 14 | 14 | 28 | 34 | 105 | 108 | 14 | 20
1. Primary school only | 0 (0%) | 1 (2.3%) | 0 (0%) | 1 (1.7%) | 2 (3.8%) | 5 (9.6%) | 1 (1.8%) | 2 (4.1%) | 3 (1.0%) | 4 (1.3%) | 6 (8.1%) | 4 (5.9%)
2. Lower vocational schooling | 1 (2.9%) | 0 (0%) | 7 (11.9%) | 6 (10.2%) | 10 (19.2%) | 4 (7.7%) | 4 (7.3%) | 1 (2.0%) | 8 (2.7%) | 10 (3.4%) | 2 (2.7%) | 10 (14.7%)
3. Lower secondary schooling (general) | 2 (5.7%) | 2 (4.8%) | 11 (18.6%) | 8 (13.6%) | 7 (13.5%) | 4 (7.7%) | 5 (9.1%) | 7 (14.3%) | 11 (3.7%) | 11 (3.7%) | 10 (13.5%) | 6 (8.8%)
4. Intermediate vocational schooling | 12 (34.3%) | 15 (35.7%) | 13 (23.7%) | 19 (32.2%) | 8 (15.4%) | 13 (25.0%) | 16 (29.1%) | 16 (32.7%) | 81 (26.9%) | 85 (28.5%) | 26 (35.1%) | 17 (25.0%)
5. Intermediate/higher secondary schooling (general) | 2 (5.7%) | 1 (2.4%) | 3 (6.8%) | 3 (5.1%) | 2 (3.8%) | 2 (3.8%) | 4 (7.3%) | 6 (12.2%) | 15 (5.0%) | 17 (5.7%) | 3 (4.1%) | 2 (2.9%)
6. Higher vocational schooling | 14 (40.0%) | 14 (33.3%) | 14 (22.0%) | 11 (18.6%) | 15 (28.8%) | 16 (30.8%) | 19 (34.5%) | 11 (22.4%) | 97 (32.2%) | 81 (27.2%) | 17 (23.0%) | 17 (25.0%)
7. University | 4 (11.4%) | 9 (21.4%) | 10 (16.9%) | 11 (18.6%) | 8 (15.4%) | 8 (15.4%) | 6 (10.9%) | 6 (12.2%) | 86 (28.6%) | 90 (30.2%) | 10 (13.5%) | 12 (17.6%)
BMI, mean (SD), N missings | 23.8 (3.8), n.a. | 24.0 (3.4), n.a. | −0.17 | 0.74 | 25.0 (3.6), n.a. | 25.1 (4.2), n.a. | −0.13 | 0.75 | 23.7 (3.1) | 25.2 (4.2) | −1.47 | 3.4 × 10−4 | 23.6 (3.5), n.a. | 23.1 (3.2), n.a. | 0.47 | 0.06 | 23.8 (3.8), 4 | 23.6 (3.4), 2 | 0.27 | 0.03 | 25.6 (4.0), n.a. | 24.9 (3.7), n.a. | 0.63 | 0.02
Percentage monocytes, mean (SD), N missings | 8.0 (2.3), 0 | 8.5 (2.4), 0 | −0.44 | 0.19 | 8.6 (2.0), 0 | 8.3 (1.8), 0 | 0.29 | 0.19 | 8.6 (1.9), 0 | 9.2 (3.1), 0 | −0.57 | 0.14 | 8.3 (2.1), 0 | 8.1 (1.9), 0 | 0.20 | 0.36 | 8.5 (2.0), 0 | 8.5 (2.2), 0 | 0.03 | 0.75 | 8.4 (2.4), 0 | 8.5 (2.4), 0 | −0.06 | 0.75
Percentage lymphocytes, mean (SD), N missings | 35.0 (8.9), 0 | 35.9 (10.0) | −0.94 | 0.50 | 33.6 (8.5), 0 | 34.0 (8.7) | −0.37 | 0.77 | 35.8 (8.2), 0 | 35.6 (8.5), 0 | 0.25 | 0.84 | 33.7 (8.3), 0 | 34.1 (8.3), 0 | −0.44 | 0.67 | 36.3 (8.4), 0 | 36.2 (8.4), 0 | 0.04 | 0.92 | 35.0 (7.7), 0 | 34.1 (8.4), 0 | 0.95 | 0.26
Percentage neutrophils, mean (SD), N missings | 53.4 (9.5), 0 | 52.1 (9.8), 0 | 1.34 | 0.38 | 54.4 (9.1), 0 | 54.5 (9.1), 0 | −0.08 | 0.95 | 51.6 (9.0), 0 | 51.7 (8.9), 0 | −0.06 | 0.96 | 53.7 (8.9), 0 | 53.7 (9.3), 0 | 0.03 | 0.98 | 51.8 (8.7), 0 | 51.9 (9.3), 0 | −0.08 | 0.86 | 52.8 (8.2), 0 | 53.7 (8.4), 0 | −0.84 | 0.35
Percentage eosinophils, mean (SD), N missings | 3.1 (2.5), 0 | 3.1 (2.1), 0 | 0.05 | 0.91 | 3.1 (1.9), 0 | 2.9 (2.0), 0 | 0.15 | 0.53 | 3.3 (1.9), 0 | 3.1 (1.7), 0 | 0.21 | 0.33 | 3.4 (2.2), 0 | 3.4 (1.8), 0 | −0.03 | 0.91 | 2.9 (1.8), 0 | 2.9 (1.9), 0 | −0.04 | 0.66 | 3.1 (1.9), 0 | 3.2 (2.4), 0 | −0.06 | 0.77
Percentage basophils, mean (SD), N missings | 0.5 (0.7), 0 | 0.5 (0.7), 0 | −0.01 | 0.96 | 0.3 (0.3), 0 | 0.4 (0.5), 0 | −0.02 | 0.76 | 0.6 (0.9), 0 | 0.4 (0.4), 0 | 0.18 | 0.12 | 0.9 (3.1), 0 | 0.6 (1.1), 0 | 0.25 | 0.49 | 0.5 (0.7), 0 | 0.4 (0.7), 0 | 0.06 | 0.21 | 0.6 (1.1), 0 | 0.6 (0.9), 0 | −0.01 | 0.97
* Missing values include values that are below the detection limit. BMI = body mass index.

Table 2
Epigenome-wide significant differentially methylated CpGs in monozygotic pairs discordant for current smoking status.
IlmnID | CHR | MAPINFO | Gene* | Nearest gene | Current smoking discordant pairs: Mean diff | p-value | 95conf_L | 95conf_H | T-Statistic | Former smoking discordant pairs (former/never): Mean diff | p-value | 95conf_L | 95conf_H | T-Statistic
cg05575921 | 5 | 373378 | AHRR | AHRR | 0.132 | 4.9 × 10−11 | 0.100 | 0.165 | 8.265 | 0.027 | 3.3 × 10−4 | 0.013 | 0.041 | 3.778
cg21566642 | 2 | 233284661 |  | ALPPL2 | 0.092 | 1.5 × 10−10 | 0.069 | 0.115 | 7.960 | 0.033 | 3.2 × 10−6 | 0.020 | 0.046 | 5.067
cg05951221 | 2 | 233284402 |  | ALPPL2 | 0.066 | 1.8 × 10−9 | 0.048 | 0.084 | 7.270 | 0.026 | 4.6 × 10−6 | 0.016 | 0.037 | 4.964
cg01940273 | 2 | 233284934 |  | ALPPL2 | 0.060 | 2.1 × 10−9 | 0.044 | 0.077 | 7.240 | 0.018 | 1.3 × 10−4 | 0.009 | 0.027 | 4.052
cg13411554 | 3 | 53700276 | CACNA1D | CACNA1D | −0.038 | 6.0 × 10−9 | −0.049 | −0.027 | −6.947 | −0.007 | 0.10 | −0.016 | 0.002 | −1.655
cg01901332 | 11 | 75031054 | ARRB1 | ARRB1 | 0.025 | 8.0 × 10−9 | 0.018 | 0.033 | 6.868 | 0.006 | 0.16 | −0.002 | 0.013 | 1.425
cg21161138 | 5 | 399360 | AHRR | AHRR | 0.046 | 1.9 × 10−8 | 0.032 | 0.059 | 6.642 | 0.002 | 0.64 | −0.006 | 0.009 | 0.466
cg00336149 | 3 | 53700195 | CACNA1D | CACNA1D | −0.027 | 2.0 × 10−8 | −0.035 | −0.019 | −6.615 | −0.002 | 0.60 | −0.008 | 0.005 | −0.524
cg22132788 | 7 | 45002486 | MYO1G | MYO1G | −0.056 | 2.4 × 10−8 | −0.073 | −0.039 | −6.596 | −0.011 | 4.5 × 10−3 | −0.019 | −0.004 | −2.930
cg21188533 | 3 | 53700263 | CACNA1D | CACNA1D | −0.036 | 3.9 × 10−8 | −0.047 | −0.025 | −6.437 | −0.006 | 0.24 | −0.015 | 0.004 | −1.196
cg09935388 | 1 | 92947588 | GFI1 | GFI1 | 0.052 | 4.1 × 10−8 | 0.035 | 0.068 | 6.423 | 0.002 | 0.61 | −0.006 | 0.010 | 0.519
cg25648203 | 5 | 395444 | AHRR | AHRR | 0.035 | 5.3 × 10−8 | 0.024 | 0.046 | 6.353 | 0.002 | 0.48 | −0.004 | 0.008 | 0.710
cg19089201 | 7 | 45002287 | MYO1G | MYO1G | −0.040 | 7.5 × 10−8 | −0.053 | −0.028 | −6.260 | −0.007 | 0.13 | −0.017 | 0.002 | −1.529
Coordinates are given based on genome build 37. Mean differences represent non-smoking twin minus smoking twin (hence positive values indicate a higher methylation level in non-smoking twins). The table shows the 13 epigenome-wide significant CpGs from the within-pair EWAS in 53 discordant monozygotic twin pairs (current versus never smokers). Results from the comparison within 72 monozygotic pairs discordant for former smoking are also shown.

* CpGs without a gene name are located in intergenic regions. 95conf_L = 95% confidence interval lower bound, 95conf_H = 95% confidence interval upper bound.
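
The within-pair statistics reported in Table 2 (mean difference and t-statistic) can be illustrated with a minimal sketch of a paired comparison. It assumes a plain paired t-test on unadjusted methylation values; the beta values are hypothetical toy numbers, not study data, and `paired_t` is an illustrative helper rather than the analysis code in Source code 1.

```python
import math
import statistics


def paired_t(never, smoker):
    """Return (mean difference, t-statistic) for never-smoking minus smoking twin."""
    diffs = [n - s for n, s in zip(never, smoker)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean within-pair difference (sample SD / sqrt(n)).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / se


# Hypothetical methylation beta values at one CpG for five discordant pairs.
never_twin = [0.80, 0.78, 0.85, 0.82, 0.79]
smoking_twin = [0.70, 0.69, 0.71, 0.72, 0.68]

mean_diff, t_stat = paired_t(never_twin, smoking_twin)
print(f"mean diff = {mean_diff:.3f}, t = {t_stat:.2f}")
```

With the sign convention of Table 2, a positive mean difference in this sketch indicates a higher methylation level in the never-smoking twin.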

In twin pairs discordant for former smoking (N = 72 pairs, mean age = 41 years), the twins who used to smoke had quit on average 14 years earlier (standard deviation [SD] = 11.4, range = 0.04–50 years), while their co-twins had never initiated regular smoking. In this group, no epigenome-wide significant DMPs were identified, and within-pair differences at the 13 significant DMPs identified in the previous analysis were diminished (average reduction: 81%, range = 61–96%; Figure 3b, Table 2). By contrast, in twin pairs of which one twin was a current smoker at blood draw and the co-twin had quit smoking (on average 9 years earlier, SD = 10.2, range = 0.02–40 years, N = 66 pairs), the reduction of within-pair differences at the 13 top CpGs was much smaller (on average 31%, range = 15–52%; Figure 3b, Supplementary file 3), and 5 of the 13 DMPs identified by comparing current and never smoking twins were also epigenome-wide significant in this group. Furthermore, five additional epigenome-wide significant CpGs were identified in current/former smoking discordant pairs (Supplementary file 4). Figure 3b illustrates the pattern of within-pair differences at the 13 top DMPs identified in current/never discordant monozygotic pairs: differences are largest in current/never discordant pairs, intermediate in current/former discordant pairs, and smaller in former/never discordant pairs. Differences are smallest within smoking concordant pairs. This pattern is consistent with smoking-associated methylation patterns in blood cells being to a large extent reversible upon quitting smoking.
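
As a worked check on the reversal figures for former/never discordant pairs, the sketch below recomputes the percentage reduction in within-pair difference from the rounded mean differences printed in Table 2. The formula (one minus the ratio of the former/never to the current/never mean difference) is an assumption about how the reduction was defined, and the variable names are illustrative.

```python
# Mean within-pair differences as printed in Table 2:
# (current/never discordant pairs, former/never discordant pairs)
table2 = {
    "cg05575921": (0.132, 0.027),
    "cg21566642": (0.092, 0.033),
    "cg05951221": (0.066, 0.026),
    "cg01940273": (0.060, 0.018),
    "cg13411554": (-0.038, -0.007),
    "cg01901332": (0.025, 0.006),
    "cg21161138": (0.046, 0.002),
    "cg00336149": (-0.027, -0.002),
    "cg22132788": (-0.056, -0.011),
    "cg21188533": (-0.036, -0.006),
    "cg09935388": (0.052, 0.002),
    "cg25648203": (0.035, 0.002),
    "cg19089201": (-0.040, -0.007),
}

# Assumed definition: percentage by which the difference shrinks after quitting.
reductions = {
    cpg: 100 * (1 - former / current)
    for cpg, (current, former) in table2.items()
}
mean_reduction = sum(reductions.values()) / len(reductions)

print(f"mean reduction: {mean_reduction:.0f}%")
print(f"range: {min(reductions.values()):.0f}% to {max(reductions.values()):.0f}%")
```

With these rounded inputs the sketch gives a mean reduction of about 81% and a range of about 61–96%, in agreement with the values quoted in the text.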

Distributions of within-pair differences in smoking discordant and concordant pairs for the top 1000 CpGs of the EWAS in discordant pairs are shown in Figure 4a. The distributions illustrate that differences are largest, as expected, within monozygotic twin pairs discordant for current smoking (current/never smoking pairs), followed by current/former smoking discordant pairs, and then by former/never smoking discordant pairs. Monozygotic pairs concordant for current smoking also show notable within-pair differences at these CpGs, which are substantially larger than those in monozygotic pairs concordant for never smoking (Figure 4a). This could be explained by within-pair differences in the number of cigarettes smoked by monozygotic twins who were concordant for current smoking. The twin correlations in current smoking monozygotic twin pairs were r = 0.50 (p = 2.2 × 10−6) for cigarettes per day (Figure 4b), r = 0.43 (p = 3.2 × 10−4) for packyears, and r = 0.58 (p = 1.6 × 10−8) for plasma cotinine levels. Within-pair differences in DNA methylation at the 13 top CpGs correlated with within-pair differences in the number of cigarettes smoked per day (mean absolute r = 0.38, range [for different CpGs]: −0.56 to 0.41; Table 3, Figure 4c) and with within-pair differences in packyears (mean absolute r = 0.46, range: −0.65 to 0.42; Table 3), but did not correlate strongly with within-pair differences in plasma cotinine level (mean absolute r = 0.14, range: −0.23 to 0.28; Table 3). In twin pairs discordant for former smoking, within-pair differences in DNA methylation at the 13 top CpGs were weakly correlated with time since quitting smoking (mean r = −0.11, range = −0.28 to 0.05; Supplementary file 5).
Based on scatterplots of the within-pair methylation differences against time since quitting smoking (Figure 4d), we hypothesized that the lack of a strong correlation with time since quitting might be explained by most of the reversal taking place within the first years after quitting. We therefore repeated the analysis, restricting it to pairs in which the former smoker had quit less than 5 years earlier (N = 15 pairs). In this group, within-pair differences in DNA methylation at the 13 top CpGs were on average more strongly correlated with time since quitting smoking (mean r = −0.16, range = −0.48 to 0.23), but the sample size was greatly reduced and the correlations were non-significant.

DNA methylation differences in smoking discordant and smoking concordant pairs.

(a) Distributions of the mean absolute within-pair differences in discordant and concordant pairs at the top 1000 CpGs with the lowest p-value from the epigenome-wide association study (EWAS) in discordant monozygotic pairs (current versus never smokers). (b) Scatterplot of cigarettes smoked per day in 80 concordant current smoking monozygotic pairs with complete data. (c) Scatterplot of within-pair differences in cigarettes smoked per day versus DNA methylation at cg05575921 (AHRR) in 80 concordant current smoking monozygotic pairs with complete data. (d) Scatterplot of within-pair differences in DNA methylation at cg05575921 (AHRR) versus time since quitting smoking (years) in 63 pairs discordant for former smoking.

Table 3
Correlations of within-pair differences in DNA methylation with within-pair differences in cigarettes per day and packyears in 83 concordant current smoking monozygotic pairs.
cgid | CHR | Position | Gene | Nearest gene | Cigarettes per day: r | p-value | Packyears: r | p-value | Plasma cotinine level: r | p-value
cg05575921 | 5 | 373378 | AHRR | AHRR | −0.52 | 5.9 × 10−7 | −0.55 | 1.2 × 10−6 | −0.08 | 0.50
cg21566642 | 2 | 233284661 |  | ALPPL2 | −0.49 | 4.7 × 10−6 | −0.56 | 8.1 × 10−7 | −0.19 | 0.10
cg05951221 | 2 | 233284402 |  | ALPPL2 | −0.44 | 4.7 × 10−5 | −0.56 | 9.0 × 10−7 | −0.18 | 0.12
cg01940273 | 2 | 233284934 |  | ALPPL2 | −0.56 | 7.0 × 10−8 | −0.65 | 2.0 × 10−9 | −0.23 | 0.04
cg13411554 | 3 | 53700276 | CACNA1D | CACNA1D | 0.27 | 1.4 × 10−2 | 0.32 | 7.4 × 10−3 | 0.20 | 0.08
cg01901332 | 11 | 75031054 | ARRB1 | ARRB1 | −0.23 | 3.8 × 10−2 | −0.34 | 5.5 × 10−3 | −0.04 | 0.71
cg21161138 | 5 | 399360 | AHRR | AHRR | −0.52 | 8.4 × 10−7 | −0.52 | 7.4 × 10−6 | −0.05 | 0.65
cg00336149 | 3 | 53700195 | CACNA1D | CACNA1D | 0.36 | 1.0 × 10−3 | 0.42 | 3.5 × 10−4 | 0.16 | 0.16
cg22132788 | 7 | 45002486 | MYO1G | MYO1G | 0.41 | 2.1 × 10−4 | 0.41 | 8.4 × 10−4 | 0.14 | 0.23
cg21188533 | 3 | 53700263 | CACNA1D | CACNA1D | 0.32 | 4.1 × 10−3 | 0.38 | 1.4 × 10−3 | 0.27 | 0.01
cg09935388 | 1 | 92947588 | GFI1 | GFI1 | −0.42 | 9.1 × 10−5 | −0.59 | 1.4 × 10−7 | −0.03 | 0.80
cg25648203 | 5 | 395444 | AHRR | AHRR | −0.28 | 1.2 × 10−2 | −0.42 | 4.3 × 10−4 | −0.06 | 0.61
cg19089201 | 7 | 45002287 | MYO1G | MYO1G | 0.15 | 1.8 × 10−1 | 0.27 | 2.6 × 10−2 | 0.23 | 0.05
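
The summary values quoted in the text for Table 3 (mean absolute r per exposure measure) can be recomputed from the printed correlations, as in the sketch below. It assumes that the third pair of r and p-value columns in Table 3 refers to plasma cotinine level, as the text implies; the list and function names are illustrative.

```python
# Correlation coefficients (r) as printed in Table 3, one value per CpG,
# in the row order of the table.
r_cigarettes = [-0.52, -0.49, -0.44, -0.56, 0.27, -0.23, -0.52,
                0.36, 0.41, 0.32, -0.42, -0.28, 0.15]
r_packyears = [-0.55, -0.56, -0.56, -0.65, 0.32, -0.34, -0.52,
               0.42, 0.41, 0.38, -0.59, -0.42, 0.27]
# Assumed to be the plasma cotinine columns (third r / p-value pair).
r_cotinine = [-0.08, -0.19, -0.18, -0.23, 0.20, -0.04, -0.05,
              0.16, 0.14, 0.27, -0.03, -0.06, 0.23]


def mean_abs(values):
    """Mean of the absolute values, ignoring the direction of each correlation."""
    return sum(abs(v) for v in values) / len(values)


for label, values in [("cigarettes per day", r_cigarettes),
                      ("packyears", r_packyears),
                      ("plasma cotinine", r_cotinine)]:
    print(f"{label}: mean |r| = {mean_abs(values):.2f}, "
          f"range {min(values):.2f} to {max(values):.2f}")
```

Taking absolute values before averaging matters here because the direction of the correlation differs between CpGs: it is negative at AHRR, for example, and positive at CACNA1D.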

All 13 differentially methylated CpGs identified in current smoking discordant pairs and all 10 CpGs identified in current/former smoking discordant pairs have been previously associated with smoking. To study the overlap of methylation differences in smoking discordant twin pairs with loci that have a causal effect on smoking, we considered the largest GWAS meta-analysis of smoking phenotypes, the meta-analysis of smoking initiation by the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN) (Liu et al., 2019). Three of the 13 epigenome-wide significant DMPs detected in smoking discordant monozygotic pairs (cg13411554, cg00336149, and cg21188533 in CACNA1D) are located within 1 Mb of a GWAS locus associated with smoking initiation. Overall, the methylation sites within 1 Mb of genome-wide significant SNPs from the GWAS did not show a stronger signal in the within-pair EWAS of smoking discordant monozygotic pairs than other genome-wide methylation sites (β = −0.002, se = 0.004, p = 0.56, Figure 3c).

We tested whether methylation sites previously associated with 680 traits reported in the EWAS atlas (Li et al., 2019) were enriched among the top differentially methylated loci in smoking discordant pairs; this showed strong enrichment of smoking-related traits (Supplementary file 6). Enrichment analysis based on KEGG pathways showed one significantly enriched pathway, Dopaminergic Synapse (hsa04728; Supplementary file 7), with three of the top differentially methylated loci in smoking discordant monozygotic pairs mapping to this pathway: CACNA1D, GNG12, and ARRB1. No significant enrichment was seen in GO pathways after multiple testing correction (Supplementary file 8).

To examine potential functional consequences of top DMPs, we used previously published data on whole-blood DNA methylation and RNA-sequencing (n = 2101 samples). At 4 of the 13 CpGs, DNA methylation level in blood was associated with the expression level of nearby genes (Table 4). At three of these CpGs, a higher methylation level correlated with a lower expression level. None of the 13 CpGs overlapped with the six genes that were differentially expressed in monozygotic pairs discordant for current smoking (Vink et al., 2017).

Table 4
Significantly associated transcripts in cis for CpGs that are differentially methylated in smoking discordant monozygotic twin pairs.
CpG | Gene | Z score | p-value | FDR
cg25648203 | EXOC3 | −7.34 | 2.11e−13 | 0
cg19089201 | RP4-647J21.1 | 5.55 | 2.84e−8 | 0
cg05575921 | EXOC3 | −4.86 | 0.00000119 | 0.00039
cg21161138 | EXOC3 | −3.82 | 0.000133 | 0.0254
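
The relation between the Z scores and p-values in Table 4 can be checked with a short sketch. It assumes that the p-values are two-sided and derived from a standard normal distribution, which the source does not state explicitly; `two_sided_p` is an illustrative helper.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a Z score under a standard normal distribution."""
    # erfc keeps precision for very small tail probabilities.
    return math.erfc(abs(z) / math.sqrt(2))


# Z scores as printed in Table 4 (CpG, gene, Z score).
table4 = [
    ("cg25648203", "EXOC3", -7.34),
    ("cg19089201", "RP4-647J21.1", 5.55),
    ("cg05575921", "EXOC3", -4.86),
    ("cg21161138", "EXOC3", -3.82),
]

for cpg, gene, z in table4:
    print(f"{cpg} {gene}: Z = {z:+.2f}, p = {two_sided_p(z):.3g}")
```

Because the Z scores are rounded to two decimals, the recomputed p-values agree with the printed ones only approximately.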

Discussion

Previous EWASs have identified robust differences in DNA methylation between smokers and non-smokers at a number of loci. These differences may reflect true smoking-reactive DNA methylation patterns, but can also be driven by (genetic) confounding or reverse causation. We exploited a strong within-family design, that is, the discordant monozygotic twin design (Bell and Spector, 2012), to identify smoking-reactive loci. By analysing whole-blood genome-wide DNA methylation patterns in 53 monozygotic pairs discordant for current smoking, we found 13 CpGs with a difference in methylation level between the current smoking twin and the twin who never smoked. All 13 CpGs have been previously associated with smoking in unrelated individuals. In line with previous studies that compared unrelated smokers and controls (Joehanes et al., 2016), our data from monozygotic pairs discordant for former smoking also indicate that methylation patterns are to a large extent reversible upon smoking cessation. We further showed that differences in smoking level exposure for monozygotic twins who are both current smokers but differ in the number of cigarettes they smoke are reflected in their DNA methylation profiles.

The strongest smoking-associated loci typically detected in human blood EWAS are genes involved in detoxification pathways of aromatic hydrocarbons, such as AHRR and CYP1A1 (Gao et al., 2015), of which AHRR was also present among the top differentially methylated loci in our analysis of discordant twin pairs. Mainstream tobacco smoke is a mixture of thousands of chemicals (Rodgman and Perfetti, 2008). Although the effects of many of the compounds present in cigarette smoke are unknown, several mechanisms have been described through which cigarette smoking may affect global or gene-specific DNA methylation levels. These include DNA damage induced by certain compounds such as arsenic, chromium, formaldehyde, polycyclic aromatic hydrocarbons, and nitrosamines, all of which cause double-stranded breaks (Smith and Hansch, 2000), which in turn lead to increased methylation near repaired DNA (Mortusewicz et al., 2005; Cuozzo et al., 2007); hypoxia induced by carbon monoxide (Olson, 1984), which causes global CpG island demethylation by disrupting methyl donor availability; and modulation of the expression level or activity of DNA-binding proteins, such as transcription factors (Lee and Pausova, 2013). Nicotine, presumed to be the major addictive compound in cigarette smoke (although other putative addictive compounds have also been described [Talhout et al., 2011]), has gene regulatory effects. Binding of nicotine to nicotinic acetylcholine receptors causes downstream activation of cAMP response element-binding protein, which is a key transcription factor for many genes (Shen and Yakel, 2009). In mouse brain, nicotine downregulates the DNA methyl transferase gene Dnmt (Satta et al., 2008). Previous EWASs based on blood cotinine levels, as a biomarker for nicotine exposure, and based on a polygenic score for nicotine metabolism, reported differentially methylated CpGs that largely overlap with CpGs found in EWAS of smoking status (Gupta et al., 2019; Lee et al., 2016). Furthermore, E-cigarette-based nicotine exposure of mice has been shown to cause DNA methylation changes in white blood cells (Peng et al., 2022).

Importantly, effects of smoking on DNA methylation in brain cells have been hypothesized to contribute to addiction (Zillich et al., 2022), but it is largely unknown to what extent addiction-related DNA methylation dynamics are captured in other tissues such as blood. Nicotinic receptors are especially abundant in the central and peripheral nervous system, but are also present in other tissues. In peripheral blood, nicotinic receptors are present on lymphocytes and polymorphonuclear cells (Benhammou et al., 2000), suggesting that EWA studies performed on blood cells might capture nicotine-reactive methylation patterns. Of interest in this regard is our finding that the top differentially methylated CpGs in smoking discordant pairs include multiple CpGs in CACNA1D and GNG12, which encode subunits of a calcium voltage-gated channel and G protein, respectively (proteins that interact with the nicotinic acetylcholine receptor), together with the related enrichment of the KEGG pathway Dopaminergic Synapse. Methylation levels at these CpGs might be reactive to nicotine exposure. Furthermore, the CpGs in CACNA1D are in proximity of a GWAS locus for smoking initiation, suggesting that this might be a locus that is not only reactive to smoking exposure, but may also contribute to smoking behaviour. Although it remains to be established if the epigenetic and genetic variation at this locus are functionally connected (i.e. have the same downstream consequences on gene expression), these results suggest that these CpGs can be interesting candidates for further studies into peripheral biomarkers of smoking addiction. Since we applied a discordant monozygotic twin design, the methylation differences identified at this locus in our study cannot be driven by mQTL effects of the SNPs associated with smoking.
The data from monozygotic pairs discordant for former smoking indicate that methylation patterns are to a large extent reversible upon smoking cessation, which is in line with DNA methylation patterns being reactive to smoking. Nevertheless, our findings do not rule out the possibility that reverse causation (DNA methylation driving smoking behaviour) might also contribute to the (maintenance of) smoking discordance in smoking discordant monozygotic twin pairs. Future analyses combining DNA methylation and genetic data from monozygotic and dizygotic twins may be applied to examine bidirectional causal associations between DNA methylation and smoking (Minică et al., 2018).

The main strength of our study is the use of the discordant monozygotic twin design to examine the effects of smoking, because it rules out genetic confounding, as well as many other confounding factors. The value of studying smoking effects against an identical genetic background is clear if one considers that one of the most strongly associated genetic variants for nicotine dependence is located in the DNA methyltransferase gene DNMT3B (Hancock et al., 2018). This strongly implies a role for DNA methylation in nicotine addiction, but it also suggests that horizontal genetic pleiotropy might contribute to associations between DNA methylation and smoking in ordinary case–control EWASs, where differences in DNA methylation between unrelated smokers and non-smokers may reflect differences in genotype. Our analysis had adequate power to detect large effects (i.e., the top hits identified in typical smoking EWAS) (Tsai and Bell, 2015). These, however, reflect only a small proportion of all smoking-associated sites. In our analysis of 53 monozygotic twin pairs discordant for current versus never smoking, we detected 13 CpGs at genome-wide significance, which represent 0.5% of the total number of CpGs (2623) detected in the smoking meta-analysis of unrelated individuals (2433 current versus 6956 never smokers) (Joehanes et al., 2016). The within-pair difference in smoking discordant monozygotic pairs was smaller than the effect size reported previously based on the comparison of unrelated smokers and non-smokers. Larger sample sizes are required to achieve adequate power to detect smaller effects. While the pattern of within-pair differences in current/never, current/former and former/never discordant monozygotic twin pairs was clearly in line with reversal of methylation patterns following smoking cessation, we did not find a strong correlation between within-pair differences in DNA methylation and time since quitting smoking in former smoking discordant pairs.
If most reversal takes place gradually in the first few years after smoking cessation, larger samples of twin pairs discordant for recently quitting smoking might be required to detect such a correlation. Larger sample sizes may be achieved by combining data from multiple twin cohorts in a meta-analysis.

Common limitations that apply to many EWA studies including ours are that we only analysed DNA methylation data from blood and that the technique used to measure DNA methylation only covers a small subset of all CpG sites in the genome. Another limitation is that information on smoking was obtained through self-report. We previously described smoking misclassification in this cohort based on blood levels of cotinine (van Dongen et al., 2018), a biomarker for nicotine exposure that has been measured in a subset of the cohort (Bot et al., 2013); this analysis indicated a low misclassification rate. Plasma cotinine levels were available for 591 individuals classified as never smokers by self-report. Five of these individuals (0.8%) had cotinine levels ≥15 ng/ml, which is indicative of smoking and thus indicates a misclassification of smoking status. In the current paper, we further showed that the correlation between cotinine levels in concordant current smoking pairs was similar to the correlation between self-reported number of cigarettes per day.

Conclusion

In conclusion, we studied reactiveness of DNA methylation in blood cells to smoking and reversibility of methylation patterns upon quitting smoking in monozygotic twins. Analyses in special groups such as monozygotic twins are valuable to validate results from large population-based EWAS meta-analyses, or to train more accurate methylation scores for environmental exposures that are not confounded by genetic effects. Our results illustrate the potential to utilize DNA methylation profiles of monozygotic twins as a readout of discordant exposures at present and in the past.

Data availability

The HumanMethylation450 BeadChip data from the NTR are available as part of the Biobank-based Integrative Omics Studies (BIOS) Consortium in the European Genome-phenome Archive (EGA), under the accession code EGAD00010000887, https://ega-archive.org/datasets/EGAD00010000887 (Study ID EGAS00001001077, Title: The mission of the BIOS Consortium is to create a large-scale data infrastructure and to bring together BBMRI researchers focusing on integrative omics studies in Dutch Biobanks, contact: The BIOS Consortium: Biobank-based Integrative Omics Studies, Contact person: Rick Jansen). The omics data and additional phenotype data are available upon request via the BBMRI-NL BIOS consortium (https://www.bbmri.nl/acquisition-use-analyze/bios). All NTR data can be requested by bona fide researchers (https://ntr-data-request.psy.vu.nl/). Because of the consent given by study participants, the data cannot be made publicly available. The pipeline for DNA methylation-array analysis developed by the Biobank-based Integrative Omics Study (BIOS) consortium is available here: https://molepi.github.io/DNAmArray_workflow/ (copy archived at Sinke, 2020; https://doi.org/10.5281/zenodo.3355292). The code for the EWAS analysis in monozygotic twin pairs is included in Source code 1.

References

Liu M, Jiang Y, Wedow R, Li Y, Brazel DM, Chen F, Datta G, Davila-Velderrain J, McGuire D, Tian C, Zhan X, Choquet H, Docherty AR, Faul JD, Foerster JR, Fritsche LG, Gabrielsen ME, Gordon SD, Haessler J, Hottenga JJ, Huang H, Jang SK, Jansen PR, Ling Y, Mägi R, Matoba N, McMahon G, Mulas A, Orrù V, Palviainen T, Pandit A, Reginsson GW, Skogholt AH, Smith JA, Taylor AE, Turman C, Willemsen G, Young H, Young KA, Zajac GJM, Zhao W, Zhou W, Bjornsdottir G, Boardman JD, Boehnke M, Boomsma DI, Chen C, Cucca F, Davies GE, Eaton CB, Ehringer MA, Esko T, Fiorillo E, Gillespie NA, Gudbjartsson DF, Haller T, Harris KM, Heath AC, Hewitt JK, Hickie IB, Hokanson JE, Hopfer CJ, Hunter DJ, Iacono WG, Johnson EO, Kamatani Y, Kardia SLR, Keller MC, Kellis M, Kooperberg C, Kraft P, Krauter KS, Laakso M, Lind PA, Loukola A, Lutz SM, Madden PAF, Martin NG, McGue M, McQueen MB, Medland SE, Metspalu A, Mohlke KL, Nielsen JB, Okada Y, Peters U, Polderman TJC, Posthuma D, Reiner AP, Rice JP, Rimm E, Rose RJ, Runarsdottir V, Stallings MC, Stančáková A, Stefansson H, Thai KK, Tindle HA, Tyrfingsson T, Wall TL, Weir DR, Weisner C, Whitfield JB, Winsvold BS, Yin J, Zuccolo L, Bierut LJ, Hveem K, Lee JJ, Munafò MR, Saccone NL, Willer CJ, Cornelis MC, David SP, Hinds DA, Jorgenson E, Kaprio J, Stitzel JA, Stefansson K, Thorgeirsson TE, Abecasis G, Liu DJ, Vrieze S, 23andMe Research Team, HUNT All-In Psychiatry (2019) Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nature Genetics 51:237–244. https://doi.org/10.1038/s41588-018-0307-5

Peng G, Xi Y, Bellini C, Pham K, Zhuang ZW, Yan Q, Jia M, Wang G, Lu L, Tang MS, Zhao H, Wang H (2022) Nicotine dose-dependent epigenomic-wide DNA methylation changes in the mice with long-term electronic cigarette exposure. American Journal of Cancer Research 12:3679–3692.

R Development Core Team (2013) R: A language and environment for statistical computing (Software). R Foundation for Statistical Computing, Vienna, Austria.

Article and author information

Author details

  1. Jenny van Dongen

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    3. Amsterdam Reproduction and Development (AR&D) Research Institute, Amsterdam, Netherlands
    Contribution
    Conceptualization, Formal analysis, Writing - original draft
    For correspondence
    j.van.dongen@vu.nl
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-2063-8741
  2. Gonneke Willemsen

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    Contribution
    Data curation, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  3. BIOS Consortium

    Contribution
    Data curation, Funding acquisition, Methodology, Software
    Competing interests
    No competing interests declared
    1. Bastiaan T Heijmans, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    2. Peter AC ’t Hoen, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
    3. Joyce van Meurs, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    4. Aaron Isaacs, Department of Genetic Epidemiology, Erasmus MC, Rotterdam, Netherlands
    5. Rick Jansen, Department of Psychiatry, VU University Medical Center, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    6. Lude Franke, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    7. Dorret I Boomsma, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    8. René Pool, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    9. Jenny van Dongen, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    10. Jouke J Hottenga, Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, Netherlands
    11. Marleen MJ van Greevenbroek, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    12. Coen DA Stehouwer, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    13. Carla JH van der Kallen, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    14. Casper G Schalkwijk, Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, Netherlands
    15. Cisca Wijmenga, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    16. Sasha Zhernakova, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    17. Ettje F Tigchelaar, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    18. P Eline Slagboom, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    19. Marian Beekman, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    20. Joris Deelen, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    21. Diana van Heemst, Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, Netherlands
    22. Jan H Veldink, Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, Netherlands
    23. Leonard H van den Berg, Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, Netherlands
    24. Cornelia M van Duijn, Department of Genetic Epidemiology, Erasmus MC, Rotterdam, Netherlands
    25. Bert A Hofman, Department of Epidemiology, Erasmus MC, Rotterdam, Netherlands
    26. André G Uitterlinden, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    27. P Mila Jhamai, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    28. Michael Verbiest, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    29. H Eka D Suchiman, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    30. Marijn Verkerk, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    31. Ruud van der Breggen, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    32. Jeroen van Rooij, Department of Internal Medicine, Erasmus MC, Rotterdam, Netherlands
    33. Nico Lakenberg, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    34. Hailiang Mei, Sequence Analysis Support Core, Leiden University Medical Center, Leiden, Netherlands
    35. Maarten van Iterson, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    36. Michiel van Galen, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
    37. Jan Bot, SURFsara, Amsterdam, Netherlands
    38. Dasha V Zhernakova, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    39. Peter van ’t Hof, Sequence Analysis Support Core, Leiden University Medical Center, Leiden, Netherlands
    40. Patrick Deelen, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    41. Irene Nooren, SURFsara, Amsterdam, Netherlands
    42. Bastiaan T Heijmans, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    43. Matthijs Moed, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    44. Martijn Vermaat, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
    45. René Luijk, Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    46. Marc Jan Bonder, Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, Netherlands
    47. Freerk van Dijk, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
    48. Wibowo Arindrarto, Sequence Analysis Support Core, Leiden University Medical Center, Leiden, Netherlands
    49. Szymon M Kielbasa, Medical Statistics Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    50. Morris A Swertz, Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
    51. Erik W van Zwet, Medical Statistics Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, Netherlands
    52. Peter-Bram ’t Hoen, Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands
  4. Eco JC de Geus

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    Contribution
    Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Dorret I Boomsma

    1. Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
    2. Amsterdam Public Health Research Institute, Amsterdam, Netherlands
    3. Amsterdam Reproduction and Development (AR&D) Research Institute, Amsterdam, Netherlands
    Contribution
    Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Michael C Neale

    Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, United States
    Contribution
    Supervision, Funding acquisition, Writing - review and editing
    Competing interests
    No competing interests declared

Funding

National Institute on Drug Abuse (DA049867)

  • Michael C Neale

ZonMw (NWO-Groot 480-15-001/674)

  • Gonneke Willemsen
  • Eco JC de Geus
  • Dorret I Boomsma

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

NTR warmly thanks all participants. We thank Conor Dolan for providing feedback on the manuscript. We acknowledge the contributions of the investigators of the BIOS consortium (Supplementary file 9): Bastiaan T Heijmans, Peter AC ’t Hoen, Joyce van Meurs, Aaron Isaacs, Rick Jansen, Lude Franke, René Pool, Jouke J Hottenga, Marleen MJ van Greevenbroek, Coen DA Stehouwer, Carla JH van der Kallen, Casper G Schalkwijk, Cisca Wijmenga, Sasha Zhernakova, Ettje F Tigchelaar, P Eline Slagboom, Marian Beekman, Joris Deelen, Diana van Heemst, Jan H Veldink, Leonard H van den Berg, Cornelia M van Duijn, Bert A Hofman, André G Uitterlinden, P Mila Jhamai, Michael Verbiest, H Eka D Suchiman, Marijn Verkerk, Ruud van der Breggen, Jeroen van Rooij, Nico Lakenberg, Hailiang Mei, Maarten van Iterson, Michiel van Galen, Jan Bot, Dasha V Zhernakova, Peter van ’t Hof, Patrick Deelen, Irene Nooren, Matthijs Moed, Martijn Vermaat, René Luijk, Marc Jan Bonder, Freerk van Dijk, Wibowo Arindrarto, Szymon M Kielbasa, Morris A Swertz, and Erik W van Zwet.

Ethics

Informed consent was obtained from all participants. The study was approved by the Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Centre, Amsterdam, an Institutional Review Board certified by the U.S. Office of Human Research Protections (IRB number IRB00002991 under Federal-wide Assurance – FWA00017598; IRB/institute code, NTR 03-180).

Version history

  1. Preprint posted: August 19, 2022
  2. Received: September 6, 2022
  3. Accepted: August 8, 2023
  4. Accepted Manuscript published: August 10, 2023 (version 1)
  5. Version of Record published: September 14, 2023 (version 2)

Copyright

© 2023, van Dongen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Jenny van Dongen, Gonneke Willemsen, BIOS Consortium, Eco JC de Geus, Dorret I Boomsma, Michael C Neale (2023) Effects of smoking on genome-wide DNA methylation profiles: A study of discordant and concordant monozygotic twin pairs. eLife 12:e83286. https://doi.org/10.7554/eLife.83286
