Natural epigenetic polymorphisms lead to intraspecific variation in Arabidopsis gene imprinting

  1. Daniela Pignatta
  2. Robert M Erdmann
  3. Elias Scheer
  4. Colette L Picard
  5. George W Bell
  6. Mary Gehring (corresponding author)
  1. Whitehead Institute for Biomedical Research, United States
  2. Massachusetts Institute of Technology, United States
7 figures and 3 additional files

Figures

Figure 1 with 4 supplements
mRNA-seq identifies genes with biased expression.

(A) Proportion of maternal (m) and paternal (p) reads for all three sets of reciprocal crosses in the endosperm. One replicate of each reciprocal cross is shown. Biases represented by each quadrant are depicted for Col-Ler endosperm crosses but apply to all graphs. Orange and pink dots represent MEGs (pink dots are MEGs in all three sets of reciprocal crosses), blue and green dots represent PEGs (blue dots are PEGs in all three sets of reciprocal crosses). Crosshairs indicate the expected log ratio for genes that lack biased expression. (B) Overlap of MEGs and PEGs in the endosperm among three sets of reciprocal crosses. Pink and blue circles: Col-Ler; brown and purple circles: Col-Cvi; yellow and gray circles: Ler-Cvi. (C) Proportion of maternal (m) and paternal (p) reads for Col-Cvi and Cvi-Ler reciprocal crosses in the embryo. Colored dots as in part A. Figure 1—figure supplement 1 shows seeds used in the experiment. Figure 1—figure supplement 2 shows validation of an imprinted gene. Figure 1—figure supplement 3 examines maternal:paternal ratios of imprinted genes identified in one set of crosses in the other two sets of reciprocal crosses. Figure 1—figure supplement 4 examines overall expression levels of imprinted genes at other stages of development. Information on mRNA-seq library metrics is in Figure 1—source data 1 and allele-specific expression information for all genes in endosperm and embryo is in Figure 1—source data 2 and Figure 1—source data 3, respectively. Figure 1—source data 4 shows the overlap among imprinted genes identified in this study and those identified in previous efforts and Figure 1—source data 5 includes independent validation of imprinted genes.

https://doi.org/10.7554/eLife.03198.003
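
The crosshairs described in panels A and C can be illustrated with a minimal sketch. It assumes the plotted quantity is log2(maternal/paternal reads), that the endosperm expectation is 2 maternal : 1 paternal (consistent with the 2:3 maternal:total ratio cited for Figure 6), and that the embryo expectation is 1:1. The function names and the pseudocount are hypothetical and are not taken from the authors' pipeline.

```python
import math

# Assumed expected maternal:paternal dosage for an unbiased gene.
# Endosperm is triploid (2 maternal : 1 paternal); embryo is diploid (1:1).
EXPECTED_M_TO_P = {"endosperm": 2.0, "embryo": 1.0}


def log2_ratio(maternal_reads, paternal_reads, pseudocount=1.0):
    """log2(m/p); the pseudocount is an illustrative choice to avoid division by zero."""
    return math.log2((maternal_reads + pseudocount) / (paternal_reads + pseudocount))


def expected_log2_ratio(tissue):
    """Position of the crosshair for a gene without parent-of-origin bias."""
    return math.log2(EXPECTED_M_TO_P[tissue])
```

Under these assumptions, a gene plotted above the crosshair value in both directions of a reciprocal cross is maternally biased, and one plotted below it in both directions is paternally biased.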
Figure 1—source data 1

mRNA-seq libraries generated in this study.

https://doi.org/10.7554/eLife.03198.004
Figure 1—source data 2

Endosperm imprinting data for all genes.

https://doi.org/10.7554/eLife.03198.005
Figure 1—source data 3

Embryo imprinting data for all genes.

https://doi.org/10.7554/eLife.03198.006
Figure 1—source data 4

Overlap among published imprinted gene lists.

https://doi.org/10.7554/eLife.03198.007
Figure 1—source data 5

Validation of imprinted genes.

https://doi.org/10.7554/eLife.03198.008
Figure 1—figure supplement 1
Seed development in the crosses used in this study.

(A) Seeds cleared with chloral hydrate and imaged 6 days after pollination. Scale bar = 100 microns for all panels. (B) Mature seeds. Scale bar = 500 microns.

https://doi.org/10.7554/eLife.03198.009
Figure 1—figure supplement 2
Validation of AT4G00750 allele-specific imprinting by RT-PCR and CAPS digestion.

Endosperm cDNA from the indicated crosses (female in cross listed first) was amplified using intron-spanning primers that flank a C>G polymorphism between Col and Ler or Cvi and then restriction digested with Hpy188I. The PCR amplifies a 324 bp product. After digestion with Hpy188I, Col remains uncut but Ler or Cvi alleles are digested to 209 and 115 bp. Consistent with the RNA-seq data, AT4G00750 expression is primarily from the maternally inherited allele except when Ler is the male parent. AT4G00750 is a MEG in both directions of the cross for Col-Cvi.

https://doi.org/10.7554/eLife.03198.010
Figure 1—figure supplement 3
Consistency of imprinting among different sets of reciprocal crosses.

Allele-specific expression ratios of imprinted genes identified in one set of reciprocal crosses in the other two sets of reciprocal crosses. Pink dots, MEGs in both sets of crosses being compared; orange dots, MEGs not shared with the dataset being plotted; blue dots, PEGs in both sets of crosses being compared; green dots, PEGs not shared with the dataset being plotted. Most pink and orange dots are in the upper right quadrant and most blue and green dots are in the lower left, indicating consistent parental bias. m, maternal; p, paternal.

https://doi.org/10.7554/eLife.03198.011
Figure 1—figure supplement 4
Imprinted genes are expressed at multiple stages of development.

(A) Expression of 199 MEGs and 82 PEGs in leaves, shoot apex (Sh), flowers, roots (R), pollen (P), and seeds at various stages of development. Tissue series data was downloaded using the e-Northern expression tool from the Bio-Analytic Resource (Toufighi et al., 2005). (B) Expression of MEGs and PEGs in whole seeds (WS), embryo proper (EP), suspensor (S), micropylar endosperm (MCE), peripheral endosperm (PEN), chalazal endosperm (CZE), general seed coat (SC) and chalazal seed coat (CSC). Data is from Belmonte et al., 2013. Each tissue is organized by increasing developmental age from pre-globular to mature green. Data was clustered and visualized using GENE-E. Gene order is the same between A and B.

https://doi.org/10.7554/eLife.03198.012
Figure 2
A subset of genes is only imprinted when a certain strain is the male or female parent.

Process for identifying allele-specific imprinted genes that are PEGs except when Cvi is the male parent. Genes that are paternally biased in Cvi x Col but not Col x Cvi (blue dots) were identified. These genes were overlapped with the Ler-Cvi maternal/paternal log ratios for the same genes (green dots) to generate a list of candidate loci that are not PEGs when Cvi is the male parent. Intersection with Col-Ler PEGs (pink dots) identifies strain-specific imprinted genes that are PEGs except when Cvi is the male parent, including AT2G32370 and AT3G14205. All candidate allele-specific imprinted genes are in Figure 2—source data 1. m, maternal; p, paternal.

https://doi.org/10.7554/eLife.03198.013
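
The selection steps in the caption above amount to a few set operations. The sketch below is one possible reading of that process, not the authors' code: it assumes each cross (written "female x male") has already been reduced to a set of paternally biased gene IDs, and the function name and input layout are hypothetical.

```python
def pegs_except_when_cvi_is_father(paternally_biased, col_ler_pegs):
    """Return genes that behave as PEGs unless Cvi is the male parent.

    paternally_biased: dict mapping a cross name ("female x male") to the
        set of gene IDs with paternally biased expression in that cross.
    col_ler_pegs: set of gene IDs called as PEGs in the Col-Ler crosses.
    """
    # Paternally biased with a Col father but not with a Cvi father.
    col_cvi_candidates = paternally_biased["Cvi x Col"] - paternally_biased["Col x Cvi"]
    # Same pattern in the Ler-Cvi crosses.
    ler_cvi_candidates = paternally_biased["Cvi x Ler"] - paternally_biased["Ler x Cvi"]
    # Require PEG status when Cvi is not involved at all.
    return col_cvi_candidates & ler_cvi_candidates & col_ler_pegs
```

The gene sets themselves would come from the allele-specific expression calls; the thresholds used to call paternal bias are not restated in the caption and are left out here.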
Figure 2—source data 1

Candidate allele-specific imprinted genes.

https://doi.org/10.7554/eLife.03198.014
Figure 3 with 1 supplement
Cvi is hypomethylated in CG contexts.

(A) Box and whisker plots of CG DNA methylation levels of genes and TEs in Col, Ler, and Cvi embryos. Line: median; gray dots: outliers. (B) Average CG DNA methylation profiles of genes (blue colors) and TEs (orange colors) in Col, Ler, and Cvi embryos. Relative to Col, mean Cvi methylation level was decreased by 56% in genes (p=0.00, Tukey's HSD test) and by 14% in TEs (p=0.00, Tukey's HSD test). (C) DNA methylation in Col, Ler and Cvi embryos at a representative genomic region that includes genes and TEs. CG (red), CHG (blue) and CHH (green) methylation. Tick marks below the line indicate cytosines for which data was present but no methylation was detected. Figure 3—figure supplement 1 contains additional analyses, Figure 3—source data 1 has statistics on BS-seq libraries and Figure 3—source data 2 shows the complete statistical analysis of methylation in Cvi compared to other strains.

https://doi.org/10.7554/eLife.03198.015
Figure 3—source data 1

BS-Seq libraries generated in this study.

https://doi.org/10.7554/eLife.03198.016
Figure 3—source data 2

Statistical analysis of strain differential methylation in genes and TEs.

https://doi.org/10.7554/eLife.03198.017
Figure 3—figure supplement 1
Cvi is hypomethylated in CG contexts regardless of tissue type but is not as hypomethylated in CHG and CHH contexts.

Box and whisker plots of DNA methylation levels of genes (left) and TEs (right). (A) CHG methylation in Col, Ler, and Cvi embryos. Relative to Col, mean Cvi methylation level was decreased by 22% in genes (p<0.05, Tukey's HSD test) and by 2% in TEs (p>0.05, Tukey's HSD test). (B) CHH methylation in Col, Ler, and Cvi embryos. Relative to Col, mean Cvi methylation level was decreased by 26% in genes (p<0.05, Tukey's HSD test) and by 9% in TEs (p<0.05, Tukey's HSD test). (C) Box and whisker plots of % CG DNA methylation of genes (left) and TEs (right) in Col and Cvi embryos in comparison to Col and Cvi leaves, using methylation data from Schmitz et al. (2013). Relative to Col, mean Cvi methylation level was decreased by 56% (embryo) and 54% (leaf) in genes and by 14% in TEs in both tissues (p<0.05, Tukey's HSD test). Line: median; gray dots: outliers. Statistics are in Figure 3—source data 2.

https://doi.org/10.7554/eLife.03198.018
Figure 4 with 5 supplements
Strain DMRs and embryo-endosperm DMRs are in distinct genomic regions.

(A) Number of features overlapping strain DMRs between Col and Ler or Col and Cvi embryos. (B) Number of features overlapping embryo-endosperm DMRs in Col x Cvi and Cvi x Col crosses. (C and D) 24 nt small RNA quantities (reads per million) corresponding to Col-Cvi strain (C) and Col x Cvi or Cvi x Col embryo-endosperm DMRs (D). (E) Overlap between Col-Cvi strain positive CG DMRs (more methylated in Col than Cvi) and the union of Col x Cvi and Cvi x Col embryo-endosperm CG DMRs (embryo more methylated than endosperm) corresponding to genes, TEs, and intergenic regions. Figure 4—figure supplement 1 shows the distribution of all methylation differences; Figure 4—figure supplement 2 shows DMR analysis in Ler-Cvi crosses and other datasets; Figure 4—figure supplement 3 validates DMRs identified in this analysis; Figure 4—figure supplement 4 examines small RNAs at TEs and Figure 4—figure supplement 5 shows the overlap of embryo-endosperm CpG DMRs with previous studies. sRNA-seq library metrics are in Figure 4—source data 1.

https://doi.org/10.7554/eLife.03198.023
Figure 4—source data 1

Whole seed sRNA-seq libraries generated in this study.

https://doi.org/10.7554/eLife.03198.024
Figure 4—figure supplement 1
Distribution of endosperm-embryo and strain CG DNA methylation differences.

Histograms showing the distribution of all the 300 nt comparisons irrespective of the associated p-value. x axis: difference in weighted methylation; y axis: number of windows.

https://doi.org/10.7554/eLife.03198.025
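
The x axis of these histograms, the difference in weighted methylation per window, can be sketched as follows. This assumes the common definition of weighted methylation (methylated reads summed over all cytosines in a window, divided by total reads summed over the same cytosines); the caption does not restate the formula, and the function names are hypothetical.

```python
def weighted_methylation(sites):
    """Weighted methylation of one window.

    sites: list of (methylated_reads, total_reads) tuples, one per cytosine.
    Returns None when the window has no read coverage.
    """
    methylated = sum(m for m, _ in sites)
    total = sum(t for _, t in sites)
    return methylated / total if total else None


def window_difference(sites_a, sites_b):
    """Difference in weighted methylation (sample A minus sample B) for one window."""
    a = weighted_methylation(sites_a)
    b = weighted_methylation(sites_b)
    if a is None or b is None:
        return None
    return a - b
```

Weighting by read depth means that a deeply covered cytosine contributes more than a shallowly covered one, which is the main difference from averaging per-site methylation fractions.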
Figure 4—figure supplement 2
Ler and Cvi strain DMRs and embryo-endosperm DMRs in additional datasets.

(A) Number of features overlapping strain DMRs between Ler and Cvi embryos. (B) Number of features overlapping embryo-endosperm DMRs in Cvi x Ler and Ler x Cvi crosses. (C) Number of features overlapping embryo-endosperm DMRs in Ler x Col and Col x Ler crosses (analysis of dataset from Ibarra et al., 2012).

https://doi.org/10.7554/eLife.03198.026
Figure 4—figure supplement 3
Validation of BS-seq results with locus-specific BS-PCR or McrBC-PCR.

(A) AT4G21430. (B) AT5G17320. (C) AT1G65330. Top: CG, CHG and CHH methylation profiles of Col, Ler, and Cvi embryos and strain DMRs selected for validation. Bottom: methylation of individual sequenced clones from locus-specific BS-PCR. Filled circles indicate methylation, whereas unmethylated positions remain unfilled. (D) McrBC digestion of leaf genomic DNA followed by PCR of AT2G34880 and AT1G48910.

https://doi.org/10.7554/eLife.03198.027
Figure 4—figure supplement 4
Distribution of TE superfamilies and small RNAs within embryo-endosperm DMRs.

(A) TE superfamilies overlapped by CG or CHH DMRs (embryo > endosperm methylation in Cvi x Cvi) in comparison to the whole genome TE representation. TE superfamilies are as defined by TAIR10. (B) Box plots depicting the number of 21–24 nt small RNAs (reads per million) overlapping: all TEs of the designated class (black), CG DMRs (red) or CHH DMRs (green). p-values were calculated using the Wilcoxon-Mann-Whitney test, followed by a Bonferroni correction. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001.

https://doi.org/10.7554/eLife.03198.028
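
The Bonferroni step and the asterisk notation used in panel B can be sketched in a few lines. The raw p-values are taken as given (the Wilcoxon-Mann-Whitney test itself is not reimplemented here), and the function names are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def stars(p):
    """Map an adjusted p-value to the asterisk notation used in the caption."""
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return ""
```

For example, three raw p-values of 0.01, 0.5 and 0.00001 would be adjusted to roughly 0.03, 1.0 and 0.00003 before the significance labels are assigned.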
Figure 4—figure supplement 5
Overlap of embryo-endosperm CG DMRs with previous studies.

This study: DMRs identified from all matched comparisons; Gehring et al., 2009: Col-gl x Col-gl and Ler x Ler DMRs; Ibarra et al., 2012: Col x Ler and Ler x Col DMRs combined. To identify the DMRs, the datasets from this study and from Ibarra et al. (2012) were analyzed with the same pipeline, described in ‘Materials and methods’. Minimum overlap required between DMRs = 100 nt.

https://doi.org/10.7554/eLife.03198.029
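
The 100 nt minimum-overlap rule can be expressed as a simple interval check. This is a sketch only: it assumes DMRs are given as half-open (start, end) coordinates on the same chromosome, and the function names are hypothetical.

```python
def overlap_length(a, b):
    """Number of shared positions between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def dmrs_overlap(a, b, min_overlap=100):
    """True if two DMRs share at least min_overlap nucleotides."""
    return overlap_length(a, b) >= min_overlap
```

With this rule, DMRs at 1000-1300 and 1200-1500 count as overlapping (100 nt shared), whereas DMRs at 1000-1300 and 1250-1500 do not (50 nt shared).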
Figure 5 with 3 supplements
Correspondence between DNA methylation, TEs, and sRNAs for imprinted genes.

(A) Average CG methylation in embryo and endosperm for the union set of PEGs, MEGs and all genes. (B) Percentage of genes with a TE at the indicated position. (C) Distribution of TEs and 24 nt small RNAs around endosperm imprinted MEGs (n = 85) and PEGs (n = 29) identified in at least two of three sets of reciprocal crosses. TE heatmap indicates the presence or absence of TEs according to TAIR10 annotation. 24 nt small RNA data is from Col x Cvi whole seeds. Other libraries showed the same overall small RNA profile. Values were calculated in 200 nt windows extending 2 kb upstream and downstream from the 5′ and 3′ ends of the gene and 1 kb into the gene body. White indicates the absence of data. Figure 5—figure supplement 1 shows H3K27me3 profiles around imprinted genes in vegetative tissues. Figure 5—figure supplement 2 and Figure 5—figure supplement 3 further explore the distribution and allelic contribution of small RNAs associated with imprinted genes.

https://doi.org/10.7554/eLife.03198.030
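
The windowing scheme in panel C (200 nt windows covering 2 kb of flanking sequence and 1 kb of gene body at each gene end) can be sketched as relative offsets. The function name and the sign convention (negative offsets lie outside the gene, positive offsets inside it) are assumptions made for illustration; strand handling is omitted.

```python
def relative_windows(flank=2000, body=1000, size=200):
    """Window offsets around one gene end.

    Returns (start, end) offsets relative to the gene end, running from
    -flank (outside the gene) to +body (inside the gene) in steps of size.
    """
    return [(start, start + size) for start in range(-flank, body, size)]


windows = relative_windows()
```

With the default values this yields 15 windows per gene end, so each gene contributes 30 values to the heatmap, with missing data shown in white as the caption notes.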
Figure 5—figure supplement 1
Histone H3 lysine 27 trimethylation (H3K27me3) profiles of PEGs and MEGs in vegetative tissues.

H3K27me3 leaf data from Lafos et al. (2011).

https://doi.org/10.7554/eLife.03198.031
Figure 5—figure supplement 2
Small RNA levels around imprinted genes.

Box plots depicting 24 nt sRNAs in reads per million reads (RPM) within 1 kb windows associated with MEGs, PEGs, or all genes that could be evaluated for imprinted expression within that cross. Asterisks indicate significance when compared to all genes analyzed. p-values were calculated using the Wilcoxon-Mann-Whitney test, followed by a Bonferroni correction. *p<0.05; **p<0.01; ***p<0.001.

https://doi.org/10.7554/eLife.03198.032
Figure 5—figure supplement 3
Fraction of maternal small RNAs near the 5′ end of imprinted genes.

Boxplots illustrating the fraction of classified 24 nt sRNA reads identified as derived from the maternally inherited genome for the set of all genes that were analyzed for imprinting, the union of all identified MEGs, and the union of all identified PEGs (Figure 1B). Windows had to exceed a threshold of five classified reads (i.e., reads that could be assigned to the maternal or paternal genome based on SNPs) to be included in the analysis. p-values were calculated using the Wilcoxon-Mann-Whitney test. *p<0.05; **p<0.01; ***p<0.001.

https://doi.org/10.7554/eLife.03198.033
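
The per-window quantity plotted here can be sketched as below. The sketch takes "exceed a threshold of five classified reads" literally (more than five reads assigned to a parent); the function name is hypothetical.

```python
def maternal_fraction(maternal_reads, paternal_reads, threshold=5):
    """Fraction of classified 24 nt sRNA reads assigned to the maternal genome.

    Returns None for windows that do not exceed the classified-read threshold.
    """
    classified = maternal_reads + paternal_reads
    if classified <= threshold:
        return None
    return maternal_reads / classified
```

Windows returning None would simply be dropped before the distributions for all genes, MEGs and PEGs are compared.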
Figure 6 with 3 supplements
Expression and methylation analysis of HDG3, an allele-specific imprinted gene.

(A) HDG3 is a PEG except when Cvi is the paternal parent. Blue bars, % paternal allele expression; red bars, % maternal allele expression from combined mRNA-seq data; vertical line, expected percent paternal allele expression for a non-imprinted gene. (B) Methylation of HDG3 5′ flanking region in Col embryo, Ler embryo and endosperm, and Cvi embryo (additional analysis in Figure 6—figure supplement 1). Red track, CG; green track, CHH. (C) Methylation profile of maternal and paternal HDG3 alleles in Col x Cvi and Cvi x Col endosperm as determined by locus-specific bisulfite PCR. Red circles, CG; blue circles, CHG; green circles, CHH. Filled circles indicate methylation, whereas unmethylated positions are unfilled. (D) Methylation profile of HDG3 in Col, Ler, Cvi, Kz_9 and An_1 in leaves (http://neomorph.salk.edu/1001_epigenomes.html). (E) HDG3 is not imprinted in 6 DAP endosperm when another hypomethylated strain (Kz_9) is the pollen parent, but is a PEG in a cross with another methylated strain (An_1), as determined by sequencing RT-PCR products that span informative SNPs. Blue bars, % paternal allele expression; red bars, % maternal allele expression; vertical line, expected paternal allele expression for a non-imprinted gene. The number of RT-PCR clones sequenced is indicated. p value represents a binomial test of whether the observed maternal:total ratio is less than the expected 2:3 ratio. (F) Cartoon representation of results. Expression and methylation results for AT3G14205 and AT2G34890 are in Figure 6—figure supplement 2. Examples of genetic differences causing methylation differences are in Figure 6—figure supplement 3.

https://doi.org/10.7554/eLife.03198.019
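
The binomial test mentioned in panel E can be sketched with the standard library alone. It assumes an exact one-sided test against an expected maternal:total proportion of 2/3, applied to counts of sequenced RT-PCR clones; the function names are hypothetical and the authors' exact procedure may differ.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_maternal_less(maternal_clones, total_clones, expected=2 / 3):
    """One-sided p-value: P(X <= observed maternal clones) under the expected ratio."""
    return sum(binom_pmf(i, total_clones, expected) for i in range(maternal_clones + 1))


def p_maternal_greater(maternal_clones, total_clones, expected=2 / 3):
    """One-sided p-value: P(X >= observed maternal clones) under the expected ratio."""
    return sum(
        binom_pmf(i, total_clones, expected)
        for i in range(maternal_clones, total_clones + 1)
    )
```

The lower-tail version corresponds to testing for a PEG (fewer maternal clones than expected); the upper-tail version corresponds to testing for a MEG.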
Figure 6—figure supplement 1
Methylation analysis of HDG3.

(A) Allele-specific CG methylation of maternal and paternal HDG3 alleles in Cvi x Col and Col x Cvi embryos from whole genome BS-seq data. (B) Methylation profile of the HDG3 DMR in different strains and tissues and of maternal and paternal HDG3 alleles in Col x Cvi and Cvi x Col embryos as determined by locus-specific bisulfite PCR. Filled circles indicate methylation, whereas unmethylated positions remain unfilled.

https://doi.org/10.7554/eLife.03198.020
Figure 6—figure supplement 2
Expression and methylation analysis of other variably imprinted genes.

(A) AT3G14205 is a PEG except when Cvi is the paternal parent (mRNA-seq data). Blue bars, % paternal allele expression; red bars, % maternal allele expression from combined mRNA-seq data; vertical line, expected % paternal expression for a non-imprinted gene. (B) Methylation of AT3G14205 5′ flanking region in Col embryo, Ler embryo and endosperm, and Cvi embryo. Red track, CG; green track, CHH. (C) Leaf methylation profile of AT3G14205 in Col, Ler, Kz_9, Cvi and Seattle_0 (http://neomorph.salk.edu/1001_epigenomes.html). (D) AT3G14205 is not imprinted when another hypomethylated strain (Seattle_0) is the pollen parent, but is a PEG in a cross with another methylated strain (Kz_9), as determined by sequencing RT-PCR products that span informative SNPs. The number of RT-PCR clones sequenced is indicated. p value, binomial test of whether the observed maternal:total ratio is less than the expected 2:3 ratio. (E) AT2G34890 is a MEG except when Ler is the paternal parent (mRNA-seq data). Vertical line, expected maternal allele expression for a non-imprinted gene. (F) Methylation of AT2G34890 5′ flanking region in Col and Ler embryo and Cvi embryo and endosperm. (G) Leaf methylation profile of AT2G34890 in Col, Cvi, Ler and Es_0 (http://neomorph.salk.edu/1001_epigenomes.html). (H) AT2G34890 is still imprinted when another hypomethylated strain (Es_0) is the pollen parent. p value: binomial test of whether the observed maternal:total ratio is greater than the expected 2:3 ratio.

https://doi.org/10.7554/eLife.03198.021
Figure 6—figure supplement 3
Genetic difference between strains can underlie differential methylation.

Differences in methylation between strains can be due to genetic differences (e.g., absence of a sequence in one strain). To uncover possible genetic differences between Col and Cvi strains, we compared the set of Col-Cvi methylation difference positive windows to regions of the Cvi genome not covered by any reads in the 1001 Genomes resequencing project (http://signal.salk.edu/atg1001/index.php). We validated DMRs at one MEG (AT5G17165) and one PEG (AT1G57820) that were polymorphic between the two strains using PCR and sequencing. (A) In Cvi, the TE at the 3′ end of AT5G17165 lacks 600 nt, corresponding to a methylated region in Col and Ler. Red track, CG; blue track, CHG; green track, CHH. (B) In Cvi, the 86 nt long TE at the 5′ end of AT1G57820 has 9 SNPs, and the upstream and downstream intergenic DNA sequences have insertions and deletions compared to Col.

https://doi.org/10.7554/eLife.03198.022
Figure 7
Natural epigenetic variability across strains at embryo-endosperm CG DMRs.

(A) Methylation variability across strains for regions targeted for endosperm demethylation. Classification of methylation range in the Schmitz et al., 2013 dataset (total of 140 strains) for all embryo-endosperm CG DMRs (n = 10,370) identified in this study. Only DMRs with at least 5 CG sites and a minimum of five reads coverage at each site in the Schmitz et al. dataset were classified. Additionally, only DMRs with at least 70 strains with sufficient data were included. (B) Examples of low range (gray), not bimodal (blue), weak bimodal (orange) and strongly bimodal (red) DMRs. (C) Association of the classified CG endosperm-embryo DMRs with PEGs. This study: DMRs identified from all pairwise matched endosperm-embryo comparisons from bisulfite datasets in Figure 3—source data 1; Ibarra et al., 2012: Col x Ler and Ler x Col DMRs combined. Allele-specific PEGs are listed around the pie chart. n/a = not classifiable because gene was associated with DMR of more than one type.

https://doi.org/10.7554/eLife.03198.034
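
The inclusion filters described in panel A above can be sketched as follows. This is one possible reading of the caption, not the authors' code: a strain is treated as having sufficient data for a DMR if at least 5 of its CG sites are covered by at least five reads, and a DMR is classified only if at least 70 strains meet that condition. The function name and input layout are hypothetical.

```python
def dmr_is_classifiable(depths_by_strain, min_sites=5, min_reads=5, min_strains=70):
    """Decide whether a DMR has enough data across strains to be classified.

    depths_by_strain: dict mapping strain name to a list of read depths,
        one per CG site within the DMR.
    """
    def strain_has_data(depths):
        covered_sites = sum(1 for depth in depths if depth >= min_reads)
        return covered_sites >= min_sites

    usable_strains = sum(1 for depths in depths_by_strain.values() if strain_has_data(depths))
    return usable_strains >= min_strains
```

DMRs passing this filter would then be assigned to the low range, not bimodal, weak bimodal or strongly bimodal classes; the criteria for those classes are not given in the caption and are not sketched here.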

Additional files

Supplementary file 1

Primers used in this study.

https://doi.org/10.7554/eLife.03198.035
Supplementary file 2

SNPs used in this study.

https://doi.org/10.7554/eLife.03198.036
Supplementary file 3

Script to classify mRNA-seq reads by strain.

https://doi.org/10.7554/eLife.03198.037

Cite this article

  1. Daniela Pignatta
  2. Robert M Erdmann
  3. Elias Scheer
  4. Colette L Picard
  5. George W Bell
  6. Mary Gehring
(2014)
Natural epigenetic polymorphisms lead to intraspecific variation in Arabidopsis gene imprinting
eLife 3:e03198.
https://doi.org/10.7554/eLife.03198