DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation

  1. Manu J Dubin (corresponding author)
  2. Pei Zhang
  3. Dazhe Meng
  4. Marie-Stanislas Remigereau
  5. Edward J Osborne
  6. Francesco Paolo Casale
  7. Philipp Drewe
  8. André Kahles
  9. Geraldine Jean
  10. Bjarni Vilhjálmsson
  11. Joanna Jagoda
  12. Selen Irez
  13. Viktor Voronin
  14. Qiang Song
  15. Quan Long
  16. Gunnar Rätsch
  17. Oliver Stegle
  18. Richard M Clark
  19. Magnus Nordborg (corresponding author)
  1. Vienna Biocenter, Austria
  2. University of Southern California, United States
  3. University of Utah, United States
  4. Wellcome Trust Genome Campus, United Kingdom
  5. Max Planck Society, Germany
  6. Memorial Sloan-Kettering Cancer Center, United States
8 figures, 5 tables and 3 additional files

Figures

Figure 1 with 2 supplements
The effect of CMT2 on genome-wide CHH methylation levels.

(A) Genome-wide average methylation level reaction norms for each accession (156 samples at 10°C and 125 samples at 16°C). Only CHH levels differ significantly between temperatures (Wilcoxon rank sum test; p = 1.7e-16). (B) Manhattan plot of genome-wide association study (GWAS) results using the average level of CHH methylation on large transposons for 151 accessions at 10°C as the phenotype (the peak is also seen at 16°C [not shown]). The threshold line indicates a Bonferroni-corrected p-value of 0.05. (C) CHH methylation on large (over 2 kb) transposons at 10°C by CMT2 two-locus genotype (population sizes are 36, 82, and 24 for CMT2a(nr/nr)/CMT2b(r/r), CMT2a(r/r)/CMT2b(r/r), and CMT2a(r/r)/CMT2b(nr/nr), respectively, where r and nr denote the reference and non-reference alleles). The values plotted are the Best Linear Unbiased Predictor (BLUP) estimates after correcting for population structure. Since accessions are homozygous, only four genotypes are possible, of which only three exist due to complete linkage disequilibrium between CMT2a and CMT2b. Figure 1—figure supplement 1 shows Manhattan plots of GWAS results for global methylation averages. Figure 1—figure supplement 2 shows stepwise GWAS using average CHH methylation of TEs.
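The Bonferroni-corrected threshold line mentioned in panel B can be illustrated with a minimal sketch. The number of SNPs tested is not given in the caption, so `n_snps` below is a hypothetical value chosen only for illustration, and the function names are likewise illustrative.

```python
import math


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def manhattan_height(p_value: float) -> float:
    """Height of a point (or threshold line) on a Manhattan plot: -log10(p)."""
    return -math.log10(p_value)


# Hypothetical number of SNPs tested; the real count is not stated in the caption.
n_snps = 200_000
cutoff = bonferroni_threshold(n_snps)
line_height = manhattan_height(cutoff)
print(f"per-SNP cutoff = {cutoff:.2e}, threshold line at -log10(p) = {line_height:.2f}")
```

A SNP is drawn above the threshold line when its own -log10(p) exceeds `line_height`.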

https://doi.org/10.7554/eLife.05255.003
Figure 1—figure supplement 1
Manhattan plots of GWAS results for global methylation averages.

(A) CG methylation at 10°C. (B) CHG methylation at 10°C. (C) CHH methylation at 10°C. Results for methylation at 16°C were similar (data not shown).

https://doi.org/10.7554/eLife.05255.004
Figure 1—figure supplement 2
Stepwise GWAS using average CHH methylation of TEs as a phenotype.

(A) Without a cofactor. (B) Including the SNP on chr 4 at position 10,459,127 (CMT2a) as a cofactor. (C) Including the SNPs on chr 4 at positions 10,459,127 (CMT2a) and 10,454,628 (CMT2b) as cofactors. The threshold line indicates a Bonferroni-corrected p-value of 0.05.

https://doi.org/10.7554/eLife.05255.005
Figure 2
CHH methylation levels in an F2 population map to CMT2.

(A) CHH methylation on large transposons by CMT2 genotype in an F2 population of 113 individuals (population sizes are 19, 52, and 38 for CMT2a(nr/nr)/CMT2b(r/r), CMT2a(r/r)/CMT2b(r/r), and CMT2a(r/r)/CMT2b(nr/nr), respectively; 4 individuals whose genotype at CMT2 could not be accurately inferred were omitted). (B) Mapping of CHH methylation of long TEs in the same population. The dotted line indicates a LOD threshold with a genome-wide p-value of 0.05 obtained using 1000 permutations, and the vertical blue line shows the marker interval that contains CMT2.

https://doi.org/10.7554/eLife.05255.006
Figure 3
Genetic basis of CHH methylation variation.

(A) GWAS for CHH differentially methylated regions (DMRs) at 10°C in 151 accessions, defined using 200 bp sliding windows across the genome and selecting the 200,000 most variable ones. For each DMR, SNPs significantly associated at the Bonferroni-corrected 0.05-level are plotted. (B) Variance-components analysis of the CHH DMRs. For each DMR, a mixed model with cis, CMT2, and genome-wide trans effects, plus environment and genetic interactions with environment was fitted (see ‘Materials and methods’). DMRs were binned by the total variance explained by the model. The density of DMRs in each bin is shown at the top, and the bottom shows the average variance-decomposition for each bin.
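The window selection described in panel A (200 bp windows, keeping the most variable ones) can be sketched as follows. This is only an illustration under stated assumptions: the caption does not give the step between windows or the variability measure, so the `step` default and the use of variance across accessions are assumptions, and the methylation values are made up.

```python
from statistics import pvariance


def window_starts(chrom_length, size=200, step=200):
    """Start coordinates of windows of `size` bp; `step` is an assumed parameter."""
    return range(0, chrom_length - size + 1, step)


def most_variable_windows(levels_by_window, top_n):
    """Rank windows by the variance of methylation across accessions.

    levels_by_window maps a window start to a list with one methylation
    level per accession; the top_n most variable starts are returned.
    """
    ranked = sorted(
        levels_by_window,
        key=lambda start: pvariance(levels_by_window[start]),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up CHH methylation levels for three windows in three accessions.
toy_levels = {
    0: [0.10, 0.12, 0.11],
    200: [0.05, 0.40, 0.80],
    400: [0.30, 0.20, 0.35],
}
selected = most_variable_windows(toy_levels, top_n=2)
print(selected)
```

In the caption the same idea is applied genome-wide with the 200,000 most variable windows retained as candidate DMRs.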

https://doi.org/10.7554/eLife.05255.007
Figure 4
CHH methylation varies with temperature.

(A) Average methylation levels over variable transposons at 10°C (orange) vs 16°C (red), and over non-variable transposons at 10°C (purple) vs 16°C (dark blue). Methylation for variable TEs is significantly higher (permutation p-value for CHH methylation = 0.05). (B) The density of variable (red) and non-variable TEs along chromosomes in 500 kb windows. Density is defined as the percentage of the total number in either category in each window; pericentromeric regions are shaded grey. (C) The expression of TEs at both temperatures. Variable TEs are more highly expressed than non-variable TEs, but the difference is only statistically significant at 16°C (Wilcoxon: 10°C, p = 0.15; 16°C, p = 0.023).
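The density measure defined in panel B (the percentage of all TEs in a category that fall in each 500 kb window) can be sketched as below. The positions are made up, and the function name is illustrative rather than taken from the study.

```python
from collections import Counter


def window_density(positions, window=500_000):
    """Percentage of all positions that fall in each window, keyed by window start."""
    if not positions:
        return {}
    counts = Counter(pos // window for pos in positions)
    total = len(positions)
    return {idx * window: 100.0 * n / total for idx, n in sorted(counts.items())}


# Made-up TE start coordinates on one chromosome.
toy_positions = [100_000, 400_000, 600_000, 1_200_000]
density = window_density(toy_positions)
print(density)
```

Because each category is normalised by its own total, the variable and non-variable TE curves are comparable even though the two categories differ in size.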

https://doi.org/10.7554/eLife.05255.008
Figure 5 with 2 supplements
Temperature-dependent CHH methylation variation at RdDM- and CMT2-controlled DMRs.

CHH methylation at CMT2- and DCL3-dependent DMRs in natural accessions grown at 10°C and 16°C (cf. Figure 1A; each population has 110 individuals). The difference between temperatures was highly significant for both types of DMR (Wilcoxon p-value = 9.1e-11 and p-value = 5.9e-12, respectively). Black points/grey lines indicate accessions with the CMT2 reference allele; green, the CMT2a non-reference allele; and orange, the CMT2b non-reference allele. Red is the TAA-03 accession, which has a putative null allele of CMT2. Average methylation levels for each of the genotypes are shown in bars to the side. Figure 5—figure supplement 1 shows GWAS on CMT2- and DCL3-dependent DMRs. Figure 5—figure supplement 2 shows a putative null allele of CMT2.

https://doi.org/10.7554/eLife.05255.010
Figure 5—figure supplement 1
GWAS on CMT2- and DCL3-dependent DMRs.

(A) GWAS for CMT2-dependent DMRs at 10°C. (B) GWAS for DCL3-dependent DMRs at 10°C. Results from 16°C were similar in both cases. The threshold line indicates a Bonferroni-corrected p-value of 0.05.

https://doi.org/10.7554/eLife.05255.011
Figure 5—figure supplement 2
Putative null allele of CMT2.

A screenshot from a genome browser indicating the lack of read coverage for CMT2 stretching from intron 7 to exon 16 in the accession TAA-03.

https://doi.org/10.7554/eLife.05255.012
Figure 6 with 6 supplements
Latitudinal difference in gene body methylation (GBM) and gene expression.

(A) Global CG methylation levels at 10°C for 151 accessions are strongly correlated with minimum temperature at the location of origin. Results for 16°C are similar. (B) Genes with GBM are more highly expressed at 10°C in northern (blue) than in southern (red) accessions (Wilcoxon rank sum test p = 2.1e-03), whereas genes without GBM show little difference (p = 1.9e-02). At 16°C the difference for genes with GBM is more significant (p = 6.4e-05), whereas the difference for genes without GBM is not significant (p = 0.49).

https://doi.org/10.7554/eLife.05255.014
Figure 6—figure supplement 1
Correlation between CG methylation levels and the minimum temperature at location of origin.

Above, GBM at 10°C and 16°C. Below, TE CG methylation at 10°C and 16°C.

https://doi.org/10.7554/eLife.05255.015
Figure 6—figure supplement 2
Filtering of GBM variation data.

(A) Genes with low or no CHG methylation have variable levels of CG methylation, while genes with appreciable CHG methylation have very high CG (and CHH) methylation. (B) Among genes with only CG GBM, variance-component analysis reveals a bimodal distribution of the total variance explained: variation in methylation for genes with low levels of methylation typically does not appear to have a genetic basis.

https://doi.org/10.7554/eLife.05255.016
Figure 6—figure supplement 3
Distribution of methylation levels at individual CG dinucleotides within GBM genes.

The histogram shows the average methylation level for each individual CG dinucleotide on GBM genes in all accessions in the north (blue) or in the south (red).

https://doi.org/10.7554/eLife.05255.017
Figure 6—figure supplement 4
Distribution of variation in methylation levels between the north and the south for individual CG dinucleotides within GBM genes.

The histogram shows the average methylation level for each individual CG dinucleotide on GBM genes in all accessions in the north minus the average methylation level in the south for each dinucleotide.

https://doi.org/10.7554/eLife.05255.018
Figure 6—figure supplement 5
Accessions with higher average GBM tend to have higher average expression (of genes with GBM, normalized by genes without GBM; r = 0.131, p = 0.0386).
https://doi.org/10.7554/eLife.05255.019
Figure 6—figure supplement 6
Genes with GBM show less expression variation between temperatures.

Mean per-gene variation in expression between 10°C and 16°C is reduced for GBM-containing genes in northern (blue) accessions compared to southern (red) accessions (Wilcoxon rank sum test p = 1.2e-05), whereas for genes without GBM the difference between north (light blue) and south (pink) is not significant.

https://doi.org/10.7554/eLife.05255.020
Figure 7
The genetic basis of GBM.

(A) Variance component analysis of GBM. (B) Significant associations (Bonferroni-corrected 0.05-level) from a GWAS of GBM for individual genes. (C) Correlation between non-reference allele at associated SNPs and GBM.

https://doi.org/10.7554/eLife.05255.021
Figure 8 with 1 supplement
Frequency and distribution of GBM associated SNPs.

(A) Correlation between non-reference allele at associated SNPs and latitude. (B) Non-reference allele frequency distribution for cis and trans SNPs compared to random SNPs. (C) Accessions carrying the non-reference alleles are limited to northern Sweden (accessions with the non-reference allele at 8 or more of the 15 loci are blue; the remaining accessions are red).

https://doi.org/10.7554/eLife.05255.022
Figure 8—figure supplement 1
Linkage disequilibrium between the 15 highly associated trans-SNPs.
https://doi.org/10.7554/eLife.05255.023

Tables

Table 1

Super-families (italics) and families that are over-represented among ‘variable’ TEs

https://doi.org/10.7554/eLife.05255.009
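| TE (super-)family | Expected | Observed | Enrichment | 95th quantile |
|---|---|---|---|---|
| RathE1_cons | 5 | 26 | 4.56 | 10 |
| RathE3_cons | 2 | 9 | 3.23 | 6 |
| RathE2_cons | 1 | 5 | 2.52 | 4 |
| SINE | 3 | 7 | 2.00 | 7 |
| RC/Helitron | 346 | 444 | 1.28 | 368 |
| DNA/MuDR | 144 | 184 | 1.27 | 162 |
| ATREP2 | 4 | 53 | 12.07 | 8 |
| RP1_AT | 2 | 27 | 11.59 | 5 |
| ATTIRX1C | 1 | 12 | 11.49 | 3 |
| ATREP13 | 2 | 28 | 9.87 | 6 |
| VANDAL22 | 1 | 11 | 8.56 | 3 |
| SIMPLEHAT1 | 1 | 11 | 7.34 | 4 |
| VANDAL2N1 | 1 | 10 | 7.32 | 3 |
| ATREP8 | 2 | 13 | 6.47 | 5 |
| VANDAL2 | 1 | 7 | 6.08 | 3 |
| ATREP10 | 1 | 10 | 5.93 | 4 |
| AT9NMU1 | 1 | 7 | 5.81 | 3 |
| ATN9_1 | 1 | 10 | 5.75 | 4 |
| SIMPLEHAT2 | 1 | 11 | 5.63 | 4 |
| META1 | 3 | 20 | 5.41 | 7 |
| ATDNAI27T9A | 3 | 15 | 4.83 | 6 |
| ATREP2A | 3 | 15 | 4.83 | 6 |
| ATCOPIA78 | 0 | 3 | 4.67 | 2 |
| VANDAL18NA | 0 | 3 | 4.67 | 2 |
| RathE1_cons | 5 | 26 | 4.56 | 10 |
| VANDAL14 | 0 | 3 | 4.48 | 2 |
| SIMPLEGUY1 | 3 | 13 | 4.19 | 6 |
| ATDNATA1 | 0 | 3 | 4.00 | 2 |
| TNAT2A | 1 | 4 | 3.93 | 3 |
| ATREP7 | 4 | 16 | 3.64 | 8 |
| RathE3_cons | 2 | 9 | 3.23 | 6 |
| ATREP14 | 1 | 4 | 3.18 | 3 |
| ATREP16 | 1 | 4 | 3.18 | 3 |
| LIMPET1 | 3 | 9 | 2.90 | 6 |
| ATREP6 | 4 | 14 | 2.89 | 8 |
| ARNOLDY2 | 7 | 22 | 2.85 | 13 |
| ATSINE4 | 2 | 7 | 2.67 | 5 |
| ATDNAI27T9C | 2 | 7 | 2.42 | 6 |
| ATREP3 | 38 | 92 | 2.39 | 49 |
| ARNOLDY1 | 6 | 14 | 2.21 | 11 |
| ATREP11 | 13 | 23 | 1.73 | 19 |
| HELITRONY3 | 37 | 51 | 1.36 | 48 |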
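The Enrichment column above is consistent with the observed count divided by the expected count, with Expected apparently shown as a truncated integer. How the expected counts and 95th quantiles were obtained is not stated on this page. The sketch below therefore assumes a simple null model, repeatedly drawing random TE sets of the same size as the variable set; the family name, TE identifiers, and counts are hypothetical.

```python
import random


def enrichment_by_sampling(family_of_te, variable_tes, family, n_draws=1000, seed=0):
    """Compare the observed count of `family` among variable TEs with random draws.

    Assumed null model: random TE sets of the same size as `variable_tes`,
    sampled without replacement from all annotated TEs.
    Returns (expected, observed, enrichment, q95).
    """
    rng = random.Random(seed)
    all_tes = list(family_of_te)
    k = len(variable_tes)
    observed = sum(1 for te in variable_tes if family_of_te[te] == family)
    null_counts = sorted(
        sum(1 for te in rng.sample(all_tes, k) if family_of_te[te] == family)
        for _ in range(n_draws)
    )
    expected = sum(null_counts) / n_draws
    q95 = null_counts[int(0.95 * (n_draws - 1))]  # 95th quantile of the null counts
    enrichment = observed / expected if expected > 0 else float("inf")
    return expected, observed, enrichment, q95


# Hypothetical annotation: 100 TEs, 10 of which belong to family "FAM_A".
family_of_te = {f"TE{i}": ("FAM_A" if i < 10 else "OTHER") for i in range(100)}
# Hypothetical variable set: 20 TEs, 8 of them from FAM_A.
variable_tes = [f"TE{i}" for i in range(8)] + [f"TE{i}" for i in range(50, 62)]

expected, observed, enrichment, q95 = enrichment_by_sampling(
    family_of_te, variable_tes, "FAM_A"
)
print(f"expected={expected:.2f} observed={observed} enrichment={enrichment:.2f} q95={q95}")
```

Under this reading, a family counts as over-represented when its observed count exceeds the 95th quantile of the null counts.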
Table 2

Correlation between methylation levels and environment-of-origin variables (Hancock et al., 2011)

https://doi.org/10.7554/eLife.05255.013
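| Environmental variable | Growing temp. (°C) | CG r | CG rho | CG p-value | CHG r | CHG rho | CHG p-value | CHH r | CHH rho | CHH p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| Latitude | 10 | 0.69 | 0.52 | 7.8E-11 | −0.24 | −0.19 | 2.7E-02 | 0.10 | 0.14 | 1.1E-01 |
| | 16 | 0.62 | 0.47 | 3.2E-07 | −0.21 | −0.20 | 4.2E-02 | 0.04 | −0.11 | 2.5E-01 |
| Longitude | 10 | 0.59 | 0.54 | 1.2E-11 | −0.14 | −0.09 | 3.1E-01 | 0.23 | 0.28 | 7.5E-04 |
| | 16 | 0.55 | 0.53 | 4.4E-09 | −0.12 | −0.03 | 7.4E-01 | 0.14 | 0.15 | 1.2E-01 |
| Temperature seasonality | 10 | 0.68 | 0.49 | 1.6E-09 | −0.27 | −0.24 | 4.8E-03 | 0.09 | 0.09 | 2.8E-01 |
| | 16 | 0.62 | 0.42 | 1.1E-05 | −0.23 | −0.26 | 6.6E-03 | 0.04 | −0.12 | 2.1E-01 |
| Max. temp. (warmest month) | 10 | −0.14 | 0.06 | 4.6E-01 | −0.07 | −0.13 | 1.3E-01 | 0.14 | 0.20 | 2.0E-02 |
| | 16 | −0.03 | 0.10 | 2.9E-01 | −0.10 | −0.20 | 3.8E-02 | 0.05 | 0.03 | 7.3E-01 |
| Min. temp. (coldest month) | 10 | −0.70 | −0.56 | 9.1E-13 | 0.27 | 0.21 | 1.2E-02 | −0.07 | −0.06 | 4.7E-01 |
| | 16 | −0.63 | −0.48 | 2.7E-07 | 0.24 | 0.24 | 1.4E-02 | 0.00 | 0.19 | 5.6E-02 |
| Precipitation (wettest month) | 10 | 0.45 | 0.52 | 1.2E-10 | −0.25 | −0.27 | 1.2E-03 | −0.20 | −0.12 | 1.7E-01 |
| | 16 | 0.29 | 0.43 | 4.0E-06 | −0.26 | −0.24 | 1.2E-02 | −0.22 | −0.19 | 5.8E-02 |
| Precipitation (driest month) | 10 | 0.31 | 0.40 | 1.5E-06 | −0.33 | −0.29 | 6.5E-04 | −0.24 | −0.21 | 1.6E-02 |
| | 16 | 0.21 | 0.32 | 7.4E-04 | −0.26 | −0.24 | 1.4E-02 | −0.15 | −0.18 | 6.0E-02 |
| Precipitation seasonality | 10 | 0.42 | 0.44 | 7.1E-08 | −0.07 | −0.16 | 5.4E-02 | 0.05 | 0.01 | 9.0E-01 |
| | 16 | 0.36 | 0.37 | 1.2E-04 | −0.13 | −0.16 | 1.1E-01 | 0.01 | −0.01 | 9.1E-01 |
| PAR (spring) | 10 | 0.04 | 0.22 | 8.9E-03 | 0.20 | 0.18 | 3.7E-02 | 0.24 | 0.23 | 7.3E-03 |
| | 16 | 0.03 | 0.18 | 6.6E-02 | 0.27 | 0.21 | 3.5E-02 | 0.38 | 0.35 | 2.8E-04 |
| Length of growing season | 10 | −0.59 | −0.57 | 5.5E-13 | 0.24 | 0.23 | 7.3E-03 | −0.16 | −0.18 | 3.3E-02 |
| | 16 | −0.58 | −0.54 | 4.0E-09 | 0.23 | 0.21 | 3.0E-02 | −0.04 | 0.01 | 8.9E-01 |
| No. consecutive cold days | 10 | 0.60 | 0.53 | 4.0E-11 | −0.19 | −0.13 | 1.2E-01 | 0.17 | 0.28 | 1.1E-03 |
| | 16 | 0.57 | 0.53 | 4.2E-09 | −0.17 | −0.09 | 3.7E-01 | 0.10 | 0.08 | 4.1E-01 |
| No. consecutive frost-free days | 10 | −0.59 | −0.49 | 1.2E-09 | 0.29 | 0.27 | 1.5E-03 | 0.02 | 0.03 | 7.1E-01 |
| | 16 | −0.51 | −0.39 | 4.9E-05 | 0.30 | 0.30 | 1.6E-03 | 0.07 | 0.13 | 1.9E-01 |
| Relative humidity (spring) | 10 | 0.62 | 0.47 | 5.6E-09 | −0.23 | −0.18 | 3.9E-02 | 0.09 | 0.06 | 4.5E-01 |
| | 16 | 0.53 | 0.37 | 1.2E-04 | −0.20 | −0.26 | 7.6E-03 | 0.04 | −0.08 | 4.3E-01 |
| Daylength (spring) | 10 | 0.69 | 0.50 | 7.2E-10 | −0.27 | −0.21 | 1.4E-02 | 0.08 | 0.05 | 5.7E-01 |
| | 16 | 0.63 | 0.41 | 1.5E-05 | −0.23 | −0.29 | 2.7E-03 | 0.04 | −0.17 | 8.7E-02 |
| Aridity | 10 | 0.53 | 0.49 | 8.4E-10 | −0.35 | −0.31 | 1.9E-04 | −0.18 | −0.21 | 1.3E-02 |
| | 16 | 0.43 | 0.42 | 8.4E-06 | −0.28 | −0.24 | 1.3E-02 | −0.13 | −0.20 | 3.8E-02 |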
  1. r = Pearson's correlation, rho = Spearman's rank correlation, p-value = significance of rho.

  2. PAR = photosynthetically active radiation.
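The two coefficients reported in Table 2 can be illustrated with a short standard-library sketch: Pearson's r on the raw values and Spearman's rho as Pearson's r on the ranks, with ties given their average rank. The latitude and methylation values below are made up for illustration, and the p-value of rho is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up values: latitude of origin and genome-wide CG methylation level.
latitude = [40.0, 48.0, 55.0, 60.0, 64.0]
cg_level = [0.20, 0.21, 0.25, 0.32, 0.45]
print(round(pearson_r(latitude, cg_level), 3), round(spearman_rho(latitude, cg_level), 3))
```

In this toy example the relationship is monotonic but not linear, so rho is 1 while r is lower. That is why r and rho for the same variable can differ, as they do in several rows of the table.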

Table 3

15 SNPs associated with gene body methylation (GBM) at 5 or more genes

https://doi.org/10.7554/eLife.05255.024
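| Chr | Position | Associated with GBM at how many genes? | Non-reference allele count | SNP-latitude correlation | Overlap with sweep (Long et al., 2013) | Overlap with min. temp. assoc. SNPs (Hancock et al., 2011) |
|---|---|---|---|---|---|---|
| 1 | 912291 | 8 | 42 | 0.73 | none | 1_914088_0.21 |
| 1 | 4405103 | 5 | 66 | 0.64 | none | none |
| 1 | 7614101 | 5 | 48 | 0.66 | none | none |
| 1 | 19755967 | 5 | 88 | 0.75 | none | 1_19757140_0.24 |
| 2 | 6998631 | 6 | 55 | 0.87 | 2_6931030 | none |
| 2 | 7655016 | 6 | 81 | 0.61 | 2_7613651 | none |
| 2 | 7660469 | 9 | 55 | 0.78 | 2_7613651 | 2_7662427_0.30 |
| 2 | 7666059 | 5 | 69 | 0.72 | 2_7613651 | 2_7665507_0.25 |
| 2 | 7680882 | 5 | 82 | 0.63 | 2_7613651 | none |
| 2 | 7915712 | 6 | 51 | 0.83 | none | 2_7913782_0.23 |
| 2 | 9382495 | 5 | 73 | 0.71 | none | 2_9383856_0.34 |
| 2 | 9653878 | 9 | 48 | 0.80 | none | none |
| 3 | 419309 | 8 | 66 | 0.68 | none | none |
| 4 | 519982 | 8 | 66 | 0.70 | none | none |
| 4 | 13290034 | 5 | 74 | 0.74 | none | none |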
Table 4

Genes in the plant chromatin database that are within 100 kb of one of the 15 SNPs associated with GBM at 5 or more genes

https://doi.org/10.7554/eLife.05255.025
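| ChromDB | Locus |
|---|---|
| ARID3 | AT2G17410 |
| ARP3 | AT1G13180 |
| CHB4 | AT1G21700 |
| CHR9 | AT1G03750 |
| CHR35 | AT2G16390 |
| CONS3 | AT3G02380 |
| DNG12 | AT1G21710 |
| FLCP39 | AT3G02310 |
| FLCP16 | AT2G22630 |
| FLCP9 | AT2G22540 |
| GTI1 | AT2G22720 |
| HMGB4 | AT2G17560 |
| JMJ27 | AT4G00990 |
| NFA1 | AT4G26110 |
| SDG23 | AT2G22740 |
| SDG37 | AT2G17900 |
| YDG2 | AT2G18000 |
| HON3 | AT2G18050 |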
Table 5

Genes within 100 kb of the 15 SNPs associated with GBM at 5 or more genes whose expression is also correlated with the SNP

https://doi.org/10.7554/eLife.05255.026
| SNP | Locus | Description | p-value |
|---|---|---|---|
| 1_19755967 | AT1G53030 | Encodes a copper chaperone | 4.72E-07 |
| 1_19755967 | AT1G52880 | NO APICAL MERISTEM (NAM) Transcription factor with a NAC domain | 5.47E-07 |
| 1_19755967 | AT1G52990 | Thioredoxin family protein | 2.36E-05 |
| 1_19755967 | AT1G52780 | Protein of unknown function (DUF2921) | 1.46E-04 |
| 1_4405103 | AT1G12750 | RHOMBOID-like protein 6 (RBL6); FUNCTIONS IN: serine-type endopeptidase activity | 3.74E-08 |
| 1_4405103 | AT1G12790 | RuvA domain 2-like | 2.76E-05 |
| 1_4405103 | AT1G12730 | GPI transamidase subunit | 2.81E-05 |
| 1_4405103 | AT1G13080 | CYTOCHROME P450 FAMILY 71 SUBFAMILY B POLYPEPTIDE 2 (CYP71B2) | 1.65E-04 |
| 1_7614101 | AT1G21790 | TRAM LAG1 and CLN8 (TLC) lipid-sensing domain containing protein | 1.10E-05 |
| 1_7614101 | AT1G21900 | Encodes an ER-localized p24 protein | 8.81E-05 |
| 1_7614101 | AT1G21760 | F-BOX PROTEIN 7 (FBP7) putative translation regulator in temperature stress response | 8.54E-04 |
| 1_912291 | AT1G03660 | Ankyrin-repeat containing protein | 1.26E-10 |
| 1_912291 | AT1G03770 | RING1B protein with similarity to polycomb repressive core complex1 (PRC1) | 5.76E-07 |
| 1_912291 | AT1G03940 | HXXXD-type acyl-transferase family protein | 1.18E-06 |
| 1_912291 | AT1G03610 | Protein of unknown function (DUF789) | 6.91E-06 |
| 1_912291 | AT1G03580 | Pseudogene with weak similarity to ubiquitin-specific protease 12 | 1.29E-05 |
| 1_912291 | AT1G03830 | Guanylate-binding family protein | 3.50E-05 |
| 2_6998631 | AT2G16340 | Unknown protein | 1.35E-08 |
| 2_6998631 | AT2G16210 | Transcriptional factor B3 family protein | 1.69E-04 |
| 2_7666059 | AT2G17630 | Pyridoxal phosphate (PLP)-dependent transferases superfamily protein | 2.47E-18 |
| 2_7660469 | AT2G17620 | Cyclin B2;1 (CYCB2;1) | 9.68E-07 |
| 2_7655016 | AT2G17740 | Cysteine/Histidine-rich C1 domain family protein | 1.22E-04 |
| 2_7655016 | AT2G17420 | NADPH-DEPENDENT THIOREDOXIN REDUCTASE 2 (NTR2) | 9.96E-04 |
| 2_7666059 | AT2G17430 | MILDEW RESISTANCE LOCUS O 7 (MLO7) | 7.56E-04 |
| 2_7915712 | AT2G18100 | Protein of unknown function (DUF726) | 1.73E-06 |
| 2_7915712 | AT2G17980 | ATSLY member of SLY1 Gene Family | 1.33E-05 |
| 2_7915712 | AT2G18400 | Ribosomal protein L6 family protein | 1.26E-04 |
| 2_7915712 | AT2G18150 | Haem peroxidase | 8.05E-04 |
| 2_7915712 | AT2G18050 | HISTONE H1-3 (HIS1-3) | 9.47E-04 |
| 2_9382495 | AT2G22260 | HOMOLOG OF E. COLI ALKB (ALKBH2) enzyme involved in DNA methylation damage repair | 1.21E-08 |
| 2_9382495 | AT2G21850 | Cysteine/Histidine-rich C1 domain family protein | 5.38E-06 |
| 2_9382495 | AT2G22240 | MYO-INOSITOL-1-PHOSPHATE SYNTHASE 1 (MIPS1) | 8.71E-05 |
| 2_9382495 | AT2G21940 | SHIKIMATE KINASE 1 (ATSK1) localized to the chloroplast | 1.80E-04 |
| 2_9653878 | AT2G22660 | Protein of unknown function (duplicated DUF1399) | 2.22E-14 |
| 2_9653878 | AT2G22900 | Galactosyl transferase GMA12/MNN10 family protein | 5.08E-09 |
| 2_9653878 | AT2G22830 | Squalene epoxidase 2 (SQE2) | 3.91E-06 |
| 2_9653878 | AT2G22640 | BRICK1 (BRK1) | 6.17E-05 |
| 2_9653878 | AT2G22540 | SHORT VEGETATIVE PHASE (SVP) Floral repressor involved in thermosensory pathway | 2.46E-04 |
| 2_9653878 | AT2G22570 | NICOTINAMIDASE 1 (NIC1) | 2.67E-04 |
| 2_9653878 | AT2G22770 | NAI1 Transcription factor | 7.71E-04 |
| 3_419309 | AT3G02220 | Protein of unknown function (DUF2039) | 2.06E-16 |
| 3_419309 | AT3G02230 | REVERSIBLY GLYCOSYLATED POLYPEPTIDE 1 (RGP1) | 4.58E-14 |
| 3_419309 | AT3G02300 | Regulator of chromosome condensation (RCC1) family protein | 1.25E-10 |
| 3_419309 | AT3G02120 | Hydroxyproline-rich glycoprotein family protein | 1.81E-09 |
| 3_419309 | AT3G01980 | Short-chain dehydrogenase/reductase (SDR) | 3.91E-09 |
| 3_419309 | AT3G02370 | Unknown protein | 4.53E-08 |
| 3_419309 | AT3G02020 | ASPARTATE KINASE 3 (AK3) | 4.18E-07 |
| 3_419309 | AT3G02160 | Bromodomain transcription factor | 2.60E-06 |
| 3_419309 | AT3G02390 | Unknown chloroplast protein | 5.60E-06 |
| 3_419309 | AT3G02050 | K+ UPTAKE TRANSPORTER 3 (KUP3) | 1.28E-05 |
| 3_419309 | AT3G02125 | Unknown chloroplast protein | 2.12E-05 |
| 3_419309 | AT3G02200 | Proteasome component (PCI) domain protein | 1.16E-04 |
| 3_419309 | AT3G02180 | SPIRAL1-LIKE3 Regulates cortical microtubule organization | 4.56E-04 |
| 3_419309 | AT3G02250 | O-fucosyltransferase family protein | 5.31E-04 |
| 3_419309 | AT3G02110 | Serine carboxypeptidase-like 25 (scpl25) | 6.18E-04 |
| 4_13290034 | AT4G26255 | Non-coding RNA of unknown function | 1.67E-13 |
| 4_13290034 | AT4G26450 | WPP DOMAIN INTERACTING PROTEIN 1 (WIP1) | 1.13E-04 |
| 4_13290034 | AT4G26230 | Ribosomal protein L31e family protein | 1.74E-04 |
| 4_13290034 | AT4G26160 | ATYPICAL CYS HIS RICH THIOREDOXIN 1 (ACHT1) | 5.72E-04 |
| 4_519982 | AT4G01090 | Protein of unknown function (DUF3133) | 1.23E-06 |
| 4_519982 | AT4G01230 | Reticulon family protein | 2.33E-05 |
| 4_519982 | AT4G01410 | Late embryogenesis abundant (LEA) hydroxyproline-rich glycoprotein family | 5.44E-05 |
| 4_519982 | AT4G01330 | Serine/threonine-protein kinase | 2.22E-04 |
| 4_519982 | AT4G01200 | Calcium-dependent lipid-binding (CaLB domain) family protein | 3.93E-04 |
| 4_519982 | AT4G01390 | TRAF-like family protein | 3.99E-04 |
| 4_519982 | AT4G01040 | Glycosyl hydrolase superfamily protein | 5.66E-04 |
| 4_519982 | AT4G01000 | Ubiquitin-like superfamily protein | 8.55E-04 |

Additional files

Supplementary file 1

Multiple sequence alignment of CMT2 sequences from different accessions.

https://doi.org/10.7554/eLife.05255.027
Supplementary file 2

Multiple sequence alignment of CMT2 sequences from different accessions in FASTA format.

https://doi.org/10.7554/eLife.05255.028
Supplementary file 3

Source code for the scripts used to extract SNPs from RNA-seq data and compare them with existing 250k SNP data.

https://doi.org/10.7554/eLife.05255.029

Manu J Dubin, Pei Zhang, Dazhe Meng, Marie-Stanislas Remigereau, Edward J Osborne, Francesco Paolo Casale, Philipp Drewe, André Kahles, Geraldine Jean, Bjarni Vilhjálmsson, Joanna Jagoda, Selen Irez, Viktor Voronin, Qiang Song, Quan Long, Gunnar Rätsch, Oliver Stegle, Richard M Clark, Magnus Nordborg (2015) DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. eLife 4:e05255. https://doi.org/10.7554/eLife.05255