Environmentally sensitive hotspots in the methylome of the early human embryo

  1. Matt J Silver  Is a corresponding author
  2. Ayden Saffari
  3. Noah J Kessler
  4. Gririraj R Chandak
  5. Caroline HD Fall
  6. Prachand Issarapu
  7. Akshay Dedaniya
  8. Modupeh Betts
  9. Sophie E Moore
  10. Michael N Routledge
  11. Zdenko Herceg
  12. Cyrille Cuenin
  13. Maria Derakhshan
  14. Philip T James
  15. David Monk
  16. Andrew M Prentice
  1. Medical Research Council Unit The Gambia at the London School of Hygiene and Tropical Medicine, United Kingdom
  2. Department of Genetics, University of Cambridge, United Kingdom
  3. Genomic Research on Complex Diseases, CSIR-Centre for Cellular and Molecular Biology, India
  4. MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, United Kingdom
  5. Department of Women and Children's Health, King's College London, United Kingdom
  6. School of Medicine, University of Leeds, United Kingdom
  7. School of Food and Biological Engineering, Jiangsu University, China
  8. Epigenomics and Mechanisms Branch, International Agency For Research On Cancer, France
  9. Biomedical Research Centre, University of East Anglia, United Kingdom
  10. Bellvitge Institute for Biomedical Research, Spain
24 figures, 4 tables and 2 additional files

Figures

Study design.

DNAm: DNA methylation; EPIC: Illumina Infinium MethylationEPIC BeadChip; Epig. Roadmap: Roadmap Epigenomics Consortium; ESC: embryonic stem cell; ESS: epigenetic supersimilarity; GSA: Global …

Identification of Gambian season of conception associated CpGs: ENID (2 yr) vs. EMPHASIS (7–9 yr) cross-cohort analysis.

(A) Relationship between date of conception and date of sample collection for ENID (top) and EMPHASIS (bottom) cohorts. (B) Modelled seasonal change in methylation for 768 SoC-associated loci (false …

Identification of Gambian season of conception associated CpGs: ENID 2 yr vs. 5–7 yr longitudinal analysis.

(A) Relationship between conception date of modelled methylation maximum measured at 2 and 5–7 yr in the same n = 138 individuals from the ENID cohort. n = 157 SoC-CpGs with a significant SoC …

Properties of SoC-CpGs.

(A) Date of conception at modelled methylation maxima for 259 SoC-CpGs and 259 corresponding matched and random controls across all three analysed datasets. Green and yellow bands indicate the …

Early stage embryo, gametic, and parent-of-origin-specific methylation (PofOm).

(A) Methylation distribution of SoC-CpGs, matched controls, and array background in pre-gastrulation inner cell mass (ICM) and post-gastrulation embryonic liver, measured in reduced representation …

SoC-CpG overlap with predicted chromatin states.

Chromatin states predicted by ChromHMM (Ernst and Kellis, 2012) from chromatin marks in four cell lines and tissues generated by the Roadmap Epigenomics Consortium et al., 2015. Predicted states for …

Links between endogenous retroviruses (ERVs), ZFP57 binding sites, genetic variation, and DNAm at season of conception associated loci.

(A) Proportion of SoC-CpGs, matched controls, and array background CpGs proximal to ERV1 endogenous retroviral elements (top) and ZFP57 binding sites (bottom), within the specified distance. CpG …

Appendix 1—figure 1
Cross-cohort correlation of mean methylation values at 768 season of conception (SoC)-associated loci.

Data comprises n=233 and n=289 individuals from the ENID (2 yr) and EMPHASIS cohorts, respectively. 768 SoC-associated CpGs (false discovery rate [FDR]<5%) identified in the ENID (2 yr) cohort.

Appendix 1—figure 2
Date of conception vs. date of collection for samples analysed in the ENID longitudinal (5–7 yr) dataset.

Data comprises samples from n=138 ENID participants with methylation data at both 2 yr and 5–7 yr.

Appendix 1—figure 3
Date of conception at modelled methylation minima.

Conception dates at methylation minima for loci plotted in Figure 4A. Note that since seasonality is modelled by a single pair of Fourier terms, maxima and minima are 6 months apart.

Appendix 1—figure 4
Genomic locations (hg19) of 259 SoC-CpGs.
Appendix 1—figure 5
Relationship between the inter-CpG distance used to define season of conception (SoC-CpG clusters) and the number of identified clusters and singletons (CpGs not falling within a cluster).
Appendix 1—figure 6
Comparison of SoC effects at SoC-CpGs within and outside of SoC-CpG clusters.

Top: Conception date at modelled methylation maxima for ENID 2 yr (x-axis) and EMPHASIS (y-axis) cohorts, according to whether locus falls within (n=154; left-red), or outside (n=105; right-blue) of …

Appendix 1—figure 7
Seasonal variation at the IGF1R locus in ENID (2 yr) and EMPHASIS cohorts.

Unadjusted methylation beta values are plotted. Boxes represent inter-quartile ranges (IQRs) for season-specific DNAm at each locus. Note that for visualisation purposes Gambian seasons are …

Appendix 1—figure 8
Cluster-adjusted pairwise inter-CpG methylation correlations for CpG sets in ENID 2 yr (left) and EMPHASIS (right) cohorts.

This is the same as Figure 4D, but with a single CpG randomly sampled from each CpG cluster so that CpGs in each set are a minimum distance of 5000 bp apart. This shows that pairwise correlation …

Appendix 1—figure 9
Mean methylation at SoC-CpGs and array background measured in n=289 individuals in the EMPHASIS (7–9 yr) cohort stratified by sperm and oocyte methylation status.

(Left): Methylation stratified according to sperm methylation status reported in Sugden et al., 2020. Sperm hypomethylation is defined as methylation ≤ 25%. (Right) As left but with loci …

Appendix 1—figure 10
Histone marks overlapping SoC-CpGs in H1 embryonic stem cells.

Histone marks or combinations thereof are ordered by abundance. H3 marks were generated by the Roadmap Epigenomics Consortium Sanchez-Delgado et al., 2016 and downloaded using the annotatr (v1.10.0) …

Appendix 1—figure 11
Links between ERVK endogenous retroviral elements, ZFP57 and TRIM28 binding sites, and DNAm at SoC-CpGs.

(A) Proportion of SoC-CpGs, matched controls, and array background CpGs proximal to ERVK endogenous retroviral elements (top), and ZFP57 and TRIM28 binding sites (bottom), within the specified …

Appendix 1—figure 12
Population structure in the EMPHASIS cohort.

(A) First and second (left) and second and third (right) principal components (PCs) from a principal component analysis (PCA) of genome-wide genetic data from n=294 individuals from the EMPHASIS …

Appendix 1—figure 13
Season of conception analysis adjusted for ethnicity.

This figure replicates Figure 4A in the main paper for the ENID 2 yr and EMPHASIS cohorts, but with additional adjustment for ethnicity. Here, we adjust for ethnicity in the EMPHASIS cohort using …

Appendix 1—figure 14
UCSC genome browser plot of the IGF1R SoC-CpG cluster.

Seven SoC-CpGs on the Illumina 450k array are highlighted. This region falls within intron 2 and bears the hallmarks of being a promoter and/or active or poised enhancer in multiple cell lines. The …

Appendix 1—figure 15
Seasonal differences in methylation at 30,573 LINE1 and 30,840 Alu elements, and 391,814 array background CpGs.

Methylation values at each locus are centred to have mean zero to enable comparison across loci. LINE1 and Alu methylation values are predicted using REMP (v1.16.0) (see Materials and methods). …

Appendix 1—figure 16
Selection of matched controls.

Matched controls are selected from array background using Kolmogorov-Smirnov (KS) tests to identify CpGs with similar methylation distributions to SoC-CpGs (see Materials and methods). Top: …

Appendix 1—figure 17
Seasonal variation in estimated cell composition.

Cell proportions for all samples from each cohort are estimated using the estimateCellCounts function from the minfi package in R. Seasonal variation (red curves) for each cell type is determined …

Tables

Table 1
Gambian seasonality-methylation analysis: cohort characteristics.
CohortSample sizeAge% maleTissueMethylation array
ENID (2 yr)2332 years50.6Peripheral bloodIllumina Infinium HM450
ENID (5–7 yr)1385–7 years56.5Illumina Infinium MethylationEPIC
EMPHASIS2897–9 years54.3Peripheral bloodIllumina Infinium MethylationEPIC
  1. Note: ENID: Early Nutrition and Immune Development Trial (Moore et al., 2012); EMPHASIS: Epigenetic Mechanisms linking Pre-conceptional nutrition and Health Assessed in India and Sub-Saharan Africa (Chandak et al., 2017). Individuals with ENID longitudinal (5–7 yr) methylation data are a subset of those with methylation at 2 yr. There is no overlap between individuals included in the ENID and EMPHASIS cohorts.

Table 2
CpG sets considered in this analysis.
CpG setNumber of CpGsNotes
Array background391,814Intersection of CpGs on Illumina HM450 (ENID 2 yr) and EPIC (EMPHASIS) cohort arrays, post QC
SoC-CpGs259SoC-associated CpGs with SoC effect size (SoC methylation amplitude) > 4% in the ENID 2 yr dataset
Matched controls259CpGs with similar methylation distributions to SoC-CpGs in the ENID 2 yr dataset*
Random controls259Random sample from array background
  1. *

    Matching methylation distributions determined by Kolmogorov-Smirnov tests (see Appendix 1—figure 16). QC: quality control; LRT: likelihood ratio test. See Materials and methods for further details.

Table 3
External datasets considered in this analysis.
CpG setNotes
Putative metastable epialleles (MEs)1881 ME/SIV/ESS CpGs overlapping array background identified in multi-tissue and MZ/DZ screens in Van Baak et al., 2018 and Kessler et al., 2018.
Parent-of-origin-specific methylation (PofOm)699 Parent-of-origin-specific methylation loci identified in peripheral blood in Zink et al., 2018, overlapping array background.
Embryo DNAm dataRRBS data for inner cell mass and embryonic liver (<10 weeks’ gestation) from Guo et al., 2014.
Sperm DNAm dataWGBS data from Okae et al., 2014.
Germline DMRs (gDMRs)Regions differentially methylated in sperm and oocytes identified in WGBS data by Sanchez-Delgado et al., 2016.
Transposons (ERVs)ERVs determined by RepeatMasker were downloaded from the UCSC h19 annotations repository.
Transcription factor ChIP-seqZFP57, TRIM28, and CTCF transcription factor binding sites identified from ChIP-seq in human embryonic kidney and hESCs are described in Kessler et al., 2018.
Chromatin state predictions and histone three marksChromatin state predictions for H1 ESCs, fetal brain, fetal muscle, and fetal small intestine generated using Ernst and Kellis, 2012, from Roadmap Epigenomics Consortium et al., 2015. Histone mark data are from the same source.
  1. ME: metastable epiallele; SIV: systemic interindividual variation; ESS: epigenetic supersimilarity; MZ/DZ: monozygotic/dizygotic twins; PofOm: parent-of-origin methylation; RRBS: reduced representation bisulfite-seq; DMR: differentially methylated region; ERV: endogenous retrovirus; ESCs: embryonic stem cells. See materials and methods for further details.

Table 4
Methylation quantitative trait loci (mQTL) associated with SoC-CpGs and controls.
CpG setNumber of CpGs with mQTLNumber of mQTL (cis/trans)Median number of mQTL per CpG (IQR)Methylation variance explained*
SoC-CpGs130 (50%)2771 (2549/222)6 (2–30)0.09 (0.08–0.15)
Matched controls201 (78%)7886 (7417/469)15 (4–50)0.09 (0.06–0.21)
Random controls50 (19%)1512 (1476/36)7 (2–35)0.1 (0.08–0.18)
  1. *

    delta adjusted R2 (see Materials and methods); IQR: inter-quartile range.

Additional files

Transparent reporting form
https://cdn.elifesciences.org/articles/72031/elife-72031-transrepform1-v2.docx
Supplementary file 1

Supplementary tables.

(a) ENID SoC-associated CpGs (no amplitude filter applied). (b) Seasonal amplitude tests. (c) Inter-cohort change in SoC amplitude at ENID SoC-associated CpGs. (d) SoC-CpGs (ENID SoC-associated CpGs with SoC methylation amplitude >=4%). (e) SoC-CpG clusters and singletons. (f) SoC-CpG cluster sizes. (g) SoC-CpG enrichment for MEs and overlap with ME sub-classes. (h) SoC-CpG enrichment for gDMRs and parent-of-origin specific methylation. (i) SoC-CpGs overlapping maternally methylated germline DMRs. (j) SoC-CpGs mQTL and their association with SoC. (k) Candidate genes previously associated with periconception / gestational exposures overlapping array background. (l) Look-up of SoC-CpGs in the EWAS Catalog. (m) EWAS Catalog data grouped by trait. (n) SoC-CpG associations with selected variables measured in the ENID 2yr dataset. (o) Look-up of SoC-CpGs in the GWAS Catalog. (p) ENID (2yr): Association test p-values to detect potential residual confounding. (q) EMPHASIS (7-9yr): Association test p-values to detect potential residual confounding. (r) ENID 5-7yr: Association test p-values to detect potential residual confounding. (s) ENID (2yr): Cell count, batch, village sensitivity analyses.

https://cdn.elifesciences.org/articles/72031/elife-72031-supp1-v2.xlsx

Download links