Introduction

Drastic epigenome reprogramming occurs during early mammalian embryo development, and epigenetic marks such as DNA methylation and histone modifications have been shown to play important roles in this process1,2. Epigenome reprogramming during preimplantation development erases the epigenetic memory of the oocyte and sperm and allows the zygote to establish a new epigenomic landscape, which is essential for the maternal-to-zygotic transition (MZT) and differentiation1,3. Failure to reconstruct the epigenome can result in developmental defects such as embryonic arrest3. Histone modifications are fundamental epigenetic regulators that control gene expression and many key cellular processes4-6. Different types of histone modifications are deposited in specific genomic regions and perform distinct functions3. In embryos, most histone modifications accumulate around transcription start sites (TSSs), a feature shared with somatic cells7-9. Because this distribution is found in many species and cell lines10-12, it is referred to as the ‘canonical’ pattern13. It has been reported that highly pervasive H3K27me3 is found in oocytes in regions depleted of transcription and DNA methylation. Upon fertilization, promoter H3K27me3 at Hox and other developmental genes is extensively lost, accompanied by global erasure of sperm H3K27me3 but inheritance of distal H3K27me3 from oocytes; H3K4me3/H3K27me3 bivalent promoter marks are subsequently restored at developmental genes14.

The lysine 4 residue in the N-terminal tail of histone H3 can be mono- (H3K4me1), di- (H3K4me2) or tri-methylated (H3K4me3) by a family of SET domain-containing proteins, and the number of methyl groups added reflects different chromatin states15-17. The number and type of histone modification marks are associated with different gene expression states18. H3K4me3 is usually enriched near transcription start sites (TSSs) and is associated with activation of nearby genes19, whereas H3K4me1 is a global marker of enhancers, and regions enriched for both H3K4me1 and H3K27ac have been identified as super-enhancers20-22. H3K4me1, H3K4me2 and H3K4me3 are reported to be enriched on both sides of the TSS and to play diverse and important roles in regulating gene expression in mammals23,24. Previous studies revealed that the distribution of H3K4me2 resembles that of H3K4me3 in plants25,26. However, how H3K4me2 is reprogrammed during early mammalian development remains unknown.

H3K4me2 has been shown to coordinate the recovery of protein biosynthesis and homeostasis following DNA damage, which is required for developmental growth and longevity27. Unlike in animals, H3K4me2 enrichment in plants is negatively correlated with gene transcript levels: in rice and Arabidopsis, H3K4me2 is specifically distributed over the gene body and functions as a repressive epigenetic mark28. H3K4me2 was found to be a key epigenetic factor in chicken primordial germ cell (PGC) lineage determination, where narrow H3K4me2 domains activate PGC-specific genes and signaling pathways29. H3K4me2 can be acquired as early as the embryonic stem cell stage, and its distribution is dynamic and progressive throughout mouse brain differentiation and tissue development30. Furthermore, 90% of transcription factor binding regions (TFBRs) are reported to overlap with H3K4me2-enriched regions; combining these with H3K27ac-enriched regions greatly reduces false-positive TFBR predictions, and regions with stronger H3K4me2 signals are more likely to overlap with TFBRs than regions with weaker signals31. A subgroup of genes with high levels of H3K4me2 within the gene body has been linked to T cell functions, in contrast to the TSS-centered enrichment seen at housekeeping genes32. It has also been reported that many H3K4me2-bearing regions of the sperm epigenome are already present in spermatogonia, suggesting that H3K4me2 is established early in spermatogenesis and is highly conserved33.

In this study, we aimed to reveal how H3K4me2 is reprogrammed during early mammalian development. To address this question, we optimized the Cleavage Under Targets and Release Using Nuclease (CUT&RUN)34,35 method with an H3K4me2 antibody for as few as 50 cells, and revealed a unique H3K4me2 pattern and distinct modes of epigenetic reprogramming during the parental-to-zygotic transition in mouse.

Materials and methods

Cell culture of K562

The human K562 cell line was cultured at 37°C with 5% CO2 and saturated humidity in 1640 culture medium (GIBCO) supplemented with 10% fetal bovine serum. K562 cells were used to optimize low-input CUT&RUN with the H3K4me2 antibody.

Cell culture of mouse ESC

Mouse embryonic stem cells (ESCs) were cultured on 0.1% gelatin-coated plates without irradiated mouse embryonic fibroblasts (MEFs) in standard ESC medium consisting of DMEM (GIBCO) supplemented with 15-20% fetal bovine serum (FBS, Sigma), 0.1 mM β-mercaptoethanol (Sigma), 0.1 mM MEM non-essential amino acids (Cellgro), 2 mM Glutamax (GIBCO), 1% nucleoside (Millipore), 1% EmbryoMax® Penicillin-Streptomycin Solution (Millipore) and 10 μg/L recombinant leukemia inhibitory factor (LIF) (Millipore). Cells were cultured at 37°C with 5% CO2 and saturated humidity.

Collection of mouse gametes and early embryos

Mouse oocytes were collected from 4-5-week-old C57BL/6N females (purchased from Charles River), and pre-implantation embryos were harvested from 4-5-week-old C57BL/6N females mated with DBA/2 males (8 weeks old when purchased from Charles River). Germinal vesicle (GV) oocytes were obtained from 4-5-week-old C57BL/6N females 46-48 h after injection of 10 IU of pregnant mare serum gonadotropin (PMSG, Solarbio). For embryo collection, 4-5-week-old C57BL/6N females were injected intraperitoneally with 10 IU of human chorionic gonadotropin (hCG) 46-48 h after PMSG treatment and then mated with DBA/2 males. PN5 zygotes were collected at 28-32 h post-hCG injection, and early 2-cell, late 2-cell, 4-cell and 8-cell embryos, morulae and blastocysts were harvested at 32-36 h, 45-48 h, 54-58 h, 60-64 h, 68-75 h and 84-96 h post-hCG injection, respectively. Metaphase II (MII) oocytes were collected 12-14 h after intraperitoneal injection of 10 IU of hCG. The zona pellucida of all oocytes and embryos was removed with glutathione (GSH, Solarbio).

Immunofluorescence

Samples (mouse oocytes or pre-implantation embryos) were washed twice in MII medium and fixed in 4% PFA (BOSTER, AR1068) at room temperature for about 60 min. Samples were then transferred by mouth pipette and washed once with 1% BSA (Sigma, A1933) in 1 × PBS (Corning, 21-040-CVR). They were permeabilized with 0.5% Triton X-100 (Sigma, T8787) in 1 × PBS at room temperature for 20 min, washed with 1% BSA in 1 × PBS, and blocked with 1% BSA in 1 × PBS at room temperature for 1 h. The blocked samples were incubated with H3K4me2 antibody (Sigma Millipore, 07030) diluted 1:500 in 1% BSA in 1 × PBS, at room temperature for 2 h or at 4°C overnight, and then washed at least five times for 3 min each with 1% BSA in 1 × PBS. The samples were next incubated with Alexa Fluor 568-conjugated donkey anti-rabbit IgG (Invitrogen, A10042) diluted 1:200 in 1% BSA in 1 × PBS, at room temperature for 50 min in the dark, and washed five times for 3 min each with 1% BSA in 1 × PBS. Finally, the samples were mounted in ProLong Gold Antifade Reagent with DAPI (Cell Signaling, 8961S) in a 35 mm four-chamber glass-bottom dish with a 20 mm bottom well (Cellvis, D35C4-20-1-N) and kept at room temperature in the dark for about 24 h. Images were acquired on a Zeiss 700 confocal microscope. Raw images were projected along the Z-axis and processed using ZEN software (Zeiss), with brightness and contrast adjusted using the same parameters for each channel across all samples.

Zona pellucida removal and ICM/TE isolation

Before CUT&RUN, to ensure efficient permeabilization, the zona pellucida of all oocytes and embryos was chemically removed with glutathione (GSH, Solarbio). GSH was added dropwise by mouth pipette to oocytes or embryos in MII medium (Sigma), and the zona pellucida was removed within about 10-30 s. The samples were then washed twice in PBS (Corning, 21-040-CVR) and once in wash buffer. The inner cell masses (ICMs) and trophectoderm (TE) of day 4-5 blastocysts were isolated mechanically. Before CUT&RUN, the separated cells were washed twice with PBS and once with wash buffer, and then resuspended in wash buffer.

CUT&RUN library construction and next-generation sequencing

CUT&RUN was performed following the previously published protocol34,35, with modifications to the antibody binding, adaptor ligation and DNA purification steps. Washing buffer and binding buffer were equilibrated to room temperature for about 30 min in advance. Samples (mouse zygotes, embryos or ESCs with zona pellucida removed) were transferred into 0.2 mL conventional (non-low-binding) PCR tubes (Axygen) and resuspended in 60 μL washing buffer (20 mM HEPES-KOH, pH 7.5; 150 mM NaCl; 0.5 mM spermidine; Roche complete protease inhibitor). Concanavalin-coated magnetic beads (Polysciences, 86057) were gently washed three or four times and resuspended in binding buffer (20 mM HEPES-KOH, pH 7.9; 10 mM KCl; 1 mM CaCl2; 1 mM MnCl2), and 10-15 μL of beads was added to each sample. The samples were then incubated at room temperature (about 25°C) at 450 rpm for 20 min on a Thermomixer (Eppendorf). The tubes were placed on a magnetic stand, the supernatant was removed, and the samples were resuspended in 50 μL antibody buffer (washing buffer plus digitonin at 1000:1 and 2 mM EDTA, pH 8.0) containing H3K4me2 antibody (Sigma Millipore, 07030) diluted 1:100. The samples were incubated at 450 rpm on a Thermomixer, either at room temperature (23-26°C) for 2 h or at 4°C overnight (≥ 16 h).

Next, the samples were held on the magnetic stand and washed once with 190 μL and once with 100 μL dig washing buffer (washing buffer plus freshly pre-heated digitonin at 0.005%-0.01%, tested for each batch). The samples were then resuspended in 50 μL dig washing buffer containing PA/G-MNase (final concentration 700 ng/mL) and incubated at 450 rpm on a Thermomixer, either at room temperature (23-26°C) for 1 h or at 4°C for 3 h. The samples were washed twice with 190 μL and once with 100 μL dig washing buffer on the magnetic stand, then resuspended in 100 μL dig washing buffer and equilibrated on ice for 3 min. Targeted cleavage was performed on ice for about 30 min by adding 2 μL of 100 mM CaCl2, and the reaction was stopped by adding 100 μL of 2 × stop buffer (340 mM NaCl; 20 mM EDTA, pH 8.0; 4 mM EGTA, pH 8.0; 50 μg/mL RNase A; 100 μg/mL glycogen; pre-heated digitonin at 0.005%-0.01%, tested for each batch), followed by thorough vortexing. The samples were then incubated at 37°C for 30 min to release fragments. The whole samples or the supernatants were digested by adding 10 ng carrier RNA (Qiagen, 1068337), 1 μL of 20% SDS and 2 μL of Proteinase K (Roche), and incubated at 56°C for 45 min and then 72°C for 20 min in a PCR machine. DNA was purified by phenol-chloroform extraction followed by ethanol precipitation: 200 μL phenol-chloroform was added, and the samples were mixed and centrifuged at 16,000 g for 5 min at 20°C. Then 200 μL chloroform was added to each sample; after mixing, the samples were centrifuged at 16,000 g for 5 min at 20°C and transferred to low-binding 1.5 mL tubes. To each sample, 2 μL glycogen, 20 μL NaAc and 500 μL 100% ethanol (pre-cooled at −20°C) were added, and the samples were mixed and incubated at −20°C for 30 min or overnight. Meanwhile, the centrifuge was pre-cooled to 4°C, and 100% ethanol was chilled at −20°C for 1 h.

The samples were centrifuged at 16,000 g for 20 min at 4°C and the supernatant was removed. Then 1 mL of anhydrous ethanol was added to each sample, followed by centrifugation at 16,000 g for 10 min at 4°C. The supernatant was removed and the tubes were left open until the pellet had air-dried. Purified DNA was used for Tru-Seq library construction with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB, E7645S). The DNA was resuspended in 25 μL ddH2O, followed by end repair/A-tailing with 3.5 μL End Prep buffer and 1.5 μL enzyme mix at 20°C for 30 min and 65°C for 30 min in a PCR machine. Next, 1.25 μL of diluted adaptors (NEB Multiplex Oligos for Illumina, E7335S) was added and thoroughly vortexed, followed by 15 μL Ligation master mix and 0.5 μL Ligation enhancer, and the samples were incubated at 16-20°C for 16-20 min or at 4°C overnight. After ligation, the samples were treated with 1.5 μL USER™ enzyme at 37°C for 15 min in a PCR machine and purified with 1 × AMPure beads. PCR was performed by adding 25 μL of 2 × KAPA HiFi HotStart Ready Mix (KAPA Biosystems, KM2602) with 2.5 μL of primers from the NEB Oligos kit and 2.5 μL of Illumina universal primers, using the following program: 98°C for 45 s; 16-17 cycles of 98°C for 15 s and 60°C for 10 s; and 72°C for 1 min. The final libraries were purified with 0.9-1 × AMPure beads and subjected to next-generation sequencing.

Data analysis

CUT&RUN data processing

Single-end CUT&RUN data were mapped to the mm9 genome using Bowtie2 (version 2.2.9) 36 with the parameters -t -q -N 1 -L 25. Paired-end CUT&RUN data were mapped with the parameters -t -q -N 1 -L 25 -X 1000 --no-mixed --no-discordant. All unmapped reads, non-uniquely mapped reads and PCR duplicates were removed. For downstream analysis, we normalized the read counts by calculating the number of reads per kilobase of bin per million reads sequenced (RPKM) for 100-bp bins of the genome. RPKM values were further Z-score normalized to minimize batch effects.
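The normalization described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the function names `bin_rpkm` and `zscore`, the toy read counts, and the use of the population standard deviation for the Z-score are all assumptions.

```python
from statistics import mean, pstdev

BIN_SIZE_BP = 100  # bin width stated in the text


def bin_rpkm(bin_counts, total_reads, bin_size=BIN_SIZE_BP):
    """Reads per kilobase of bin per million sequenced reads, one value per bin."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    scale = (bin_size / 1_000) * (total_reads / 1_000_000)
    return [count / scale for count in bin_counts]


def zscore(values):
    """Genome-wide Z-score; the population standard deviation is an assumption."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat signal carries no information, so every bin is set to 0.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: four 100-bp bins from a library of 2 million reads.
counts = [0, 5, 10, 5]
rpkm = bin_rpkm(counts, total_reads=2_000_000)
normalized = zscore(rpkm)
```

Because the Z-score is computed over all bins of a sample, values from libraries of different depth or background become comparable on a common scale, which is the stated purpose of the step.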

ATAC-seq data processing

The ATAC-seq reads were mapped to the mm9 genome using Bowtie2 (version 2.2.2) 36 with the parameters described previously. All unmapped reads, non-uniquely mapped reads and PCR duplicates were removed. For downstream analysis, we normalized the read counts by calculating the RPKM for 100-bp bins of the genome. RPKM values were further Z-score normalized to minimize batch effects.

DNA methylation data processing

Bismark was used to map DNA methylation reads to the mm9 genome. PCR duplicates were removed. For each CpG site, the methylation level was computed as the total methylated counts divided by the total counts across all reads covering this CpG.
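As a minimal sketch of the per-CpG calculation, the snippet below aggregates methylation calls by site. The input layout (one `(chrom, pos, is_methylated)` tuple per read covering a CpG) and the function name are hypothetical and do not correspond to Bismark's output format.

```python
from collections import defaultdict


def cpg_methylation_levels(calls):
    """Return {(chrom, pos): methylated calls / total calls} for each CpG.

    `calls` is an iterable of (chrom, pos, is_methylated) tuples, one per
    read covering the site (a hypothetical layout used for illustration).
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, is_methylated in calls:
        site = (chrom, pos)
        total[site] += 1
        if is_methylated:
            methylated[site] += 1
    return {site: methylated[site] / total[site] for site in total}


# Hypothetical toy calls: three reads over one CpG, one read over another.
toy_calls = [
    ("chr1", 100, True),
    ("chr1", 100, False),
    ("chr1", 100, True),
    ("chr1", 200, False),
]
levels = cpg_methylation_levels(toy_calls)
```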

Analysis of the promoter H3K4me2 and H3K4me3 enrichment

Annotated promoters (TSS ± 2.5 kb) from the mm9 refFlat annotation database of the UCSC Genome Browser were used. Whole-genome Z-score normalized RPKM scores were used for the calculation.
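One way to turn the promoter definition into a per-promoter score is sketched below. The text does not say how bin scores were summarized within a promoter, so averaging the 100-bp bins that overlap the window is an assumption, as are the function names and the toy signal.

```python
def promoter_window(tss, flank=2_500):
    """Half-open interval [start, end) covering TSS ± flank, clipped at 0."""
    return max(0, tss - flank), tss + flank


def mean_signal(bin_scores, start, end, bin_size=100):
    """Average the per-bin scores of all bins overlapping [start, end).

    Averaging is an assumption; the source only states that whole-genome
    Z-score normalized RPKM scores were used.
    """
    first = start // bin_size
    last = (end - 1) // bin_size
    window = bin_scores[first:last + 1]
    if not window:
        raise ValueError("window lies outside the binned region")
    return sum(window) / len(window)


# Hypothetical toy chromosome: 100 bins whose score equals the bin index.
scores = [float(i) for i in range(100)]
start, end = promoter_window(3_000)
promoter_score = mean_signal(scores, start, end)
```

Since the window is symmetric around the TSS, gene strand does not change which bins are included.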

Identification of stage-specific distal ATAC-seq peaks

Stage-specific distal ATAC-seq peaks were identified as described previously. In brief, distal peaks from mouse embryos and ES cells were merged, and average Z-score normalized RPKM scores were calculated for the merged peaks. A Shannon entropy-based method 3 was used to identify stage-specific peaks. Peaks with entropy less than 2 were selected as candidates and were further filtered by the following criteria: the distal peak has high enrichment at the given stage (normalized RPKM > 1) and positive enrichment (normalized RPKM > 0) at no more than two additional stages.
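The entropy filter and the two enrichment criteria can be illustrated as follows. This is a sketch under stated assumptions, not the cited method itself: entropy is computed in bits from each stage's share of the total signal, negative normalized scores are clipped to zero before computing those shares (an assumption, since entropy needs non-negative values), and all names and toy values are hypothetical.

```python
import math


def shannon_entropy(values):
    """Entropy (bits) of the relative signal across stages.

    Negative scores are clipped to zero first (an assumption). Returns None
    when no stage has positive signal.
    """
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    if total == 0:
        return None
    entropy = 0.0
    for v in clipped:
        if v > 0:
            p = v / total
            entropy -= p * math.log2(p)
    return entropy


def stage_specific_stages(scores, max_entropy=2.0, high=1.0, max_other_positive=2):
    """Stages at which a peak passes the criteria given in the text.

    `scores` maps stage name -> normalized RPKM for one peak.
    """
    h = shannon_entropy(list(scores.values()))
    if h is None or h >= max_entropy:
        return []
    hits = []
    for stage, value in scores.items():
        others_positive = sum(1 for s, v in scores.items() if s != stage and v > 0)
        if value > high and others_positive <= max_other_positive:
            hits.append(stage)
    return hits


# Hypothetical peak with signal concentrated at the 8-cell stage.
peak = {"2cell": -0.5, "4cell": 0.2, "8cell": 3.0, "ICM": -0.1}
specific = stage_specific_stages(peak)
```

A peak with equal signal at four stages has an entropy of exactly 2 bits and is rejected, whereas a peak dominated by one stage has entropy close to 0.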

Identification of stage-specific genes

For stage-specific genes in 8-cell embryos and ICMs, a Shannon entropy-based method 3 was used as previously described. Maternally expressed genes, defined as those expressed in GV or MII oocytes (FPKM ≥ 1), were excluded from this analysis to avoid the confounding effects of lingering maternal RNAs.

Clustering analysis

The k-means clustering of gene expression levels at various stages was conducted using Cluster 3.0 with the parameters -g 7 (Euclidean distance). Heat maps were generated using Java Treeview 3.0. Hierarchical clustering of H3K4me3 across stages was conducted using the R package ape, based on the Pearson correlation between each pair of stages. The distance between two stages was calculated as 1 - Pearson correlation.
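The stage-to-stage distance defined above is easy to reproduce. The sketch below uses Python rather than the R package named in the text and covers only the distance calculation, not the tree building; the function names and toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def stage_distances(profiles):
    """Return {(stage_a, stage_b): 1 - Pearson r} for every unordered pair."""
    stages = sorted(profiles)
    return {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(stages)
        for b in stages[i + 1:]
    }


# Hypothetical toy signal profiles over the same four genomic bins.
toy_profiles = {
    "4cell": [1.0, 2.0, 3.0, 4.0],
    "8cell": [2.0, 4.0, 6.0, 8.0],
    "MII": [4.0, 3.0, 2.0, 1.0],
}
distances = stage_distances(toy_profiles)
```

With this definition, perfectly correlated stages have distance 0 and perfectly anti-correlated stages have distance 2.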

Identification of oocyte PMDs

Briefly, average DNA methylation levels for 10-kb bins of the genome were calculated, and the number of CpG dyads in each 10-kb bin was also counted. Bins with an average DNA methylation level lower than 0.6 and more than 20 CpG dyads covered were selected and further merged into PMDs using BEDtools (version 2.25.0)37. Promoter regions (±2.5 kb) were removed from PMDs.
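The bin selection and merging can be sketched in pure Python as below. This stands in for the BEDtools step only for illustration: it assumes that selected bins are merged when they are directly adjacent on the same chromosome, it omits the promoter subtraction, and the function name and toy bins are hypothetical.

```python
def call_pmds(bins, max_meth=0.6, min_cpgs=20, bin_size=10_000):
    """Select low-methylation, CpG-covered bins and merge adjacent ones.

    `bins` is a list of (chrom, start, mean_methylation, n_cpg_dyads)
    tuples sorted by chromosome and start. Returns (chrom, start, end).
    """
    pmds = []
    for chrom, start, meth, n_cpg in bins:
        # Thresholds from the text: methylation < 0.6 and > 20 CpG dyads.
        if meth >= max_meth or n_cpg <= min_cpgs:
            continue
        end = start + bin_size
        if pmds and pmds[-1][0] == chrom and pmds[-1][2] == start:
            # Extend the previous domain when this bin is directly adjacent.
            pmds[-1] = (chrom, pmds[-1][1], end)
        else:
            pmds.append((chrom, start, end))
    return pmds


# Hypothetical toy bins.
toy_bins = [
    ("chr1", 0, 0.3, 50),
    ("chr1", 10_000, 0.5, 25),
    ("chr1", 20_000, 0.9, 40),   # too highly methylated
    ("chr1", 30_000, 0.2, 10),   # too few CpG dyads
    ("chr1", 40_000, 0.4, 30),
    ("chr2", 0, 0.1, 100),
]
pmds = call_pmds(toy_bins)
```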

Identification of H3K4me2 peaks

H3K4me2 peaks were called using MACS2 (version 2.0.9)38 with the parameters --broad --nomodel --nolambda and -g mm (1.87e9) for mouse or -g hs (2.7e9) for human. Noisy peaks with weak signals (summed RPKM < 0) were removed from further analyses.

Statistical analysis

All statistical analyses in this study are briefly described in the text and were performed using R (v3.6.1) (http://CRAN.R-project.org/).

Results

Genome-wide profiling of H3K4me2 in mouse oocytes and early embryos

The epigenome undergoes drastic reprogramming during early development. To ask how H3K4me2 is passed from sperm and oocyte to the zygote during early development, we investigated H3K4me2 in mouse gametes and preimplantation embryos using CUT&RUN. We optimized the CUT&RUN protocol (see Methods) with an H3K4me2 antibody and obtained high-quality data from as few as 50 cells, from which most peaks could be retrieved (Figure S1A, S1B). We then collected mouse GV oocytes, MII oocytes, PN5 zygotes, early 2-cell, late 2-cell, 4-cell and 8-cell embryos, and inner cell masses (ICMs), with mouse embryonic stem cells (mESCs) as a control, and profiled H3K4me2 in oocytes and early embryos using CUT&RUN and immunofluorescence (IF). Replicates were highly consistent (Figure 1A, Figure S1C). We identified 57455, 25073, 110517, 126318, 62575 and 69154 H3K4me2 peaks in GV oocytes, late 2-cell embryos, 4-cell embryos, 8-cell embryos, ICMs and mESCs, respectively. H3K4me2 was erased between the GV and MII stages of oocyte development, paternal H3K4me2 was also erased after fertilization, and H3K4me2 may be re-established from the late 2-cell stage (Figure 1A-B). In addition, we show H3K4me2, ATAC-seq and mRNA expression profiles at several key developmental genes (Figure 1C). Moreover, H3K4me2 was positively correlated with H3K4me3 and chromatin accessibility in early embryos (Figure S1D-E). Notably, the genome-wide H3K4me2 patterns of 4-cell and 8-cell embryos were closely related: they fell into one cluster and were highly correlated (R = 0.93) (Figure S1F-G).

Figure 1. Mapping H3K4me2 in mouse (C57BL/6N) gametes and pre-implantation embryos.

(A) IGV view showing the global landscape of H3K4me2 signals in mouse sperm, GV oocytes, MII oocytes and pre-implantation embryos. Sperm H3K4me2 data are from ENCODE (Lambrot R et al). ICMs were isolated from blastocysts.

(B) Immunofluorescence of H3K4me2 in mouse (C57BL/6N) GV oocytes, MII oocytes, PN5 zygotes, early 2-cell, late 2-cell, 4-cell and 8-cell embryos, morulae and blastocysts. Scale bars are shown.

(C) H3K4me2 enrichment near representative genes and their expression levels. H3K4me2 signal, ATAC-seq signal and gene expression are shown.

Distinct H3K4me2 patterns in sperm and oocytes

To systematically characterize the epigenome of mouse gametes, we included previously published DNA methylation 39, H3K4me3 40 and ATAC-seq data from early embryos 41, as well as sperm H3K4me2 data 42, in our analyses. A non-canonical pattern of H3K4me3, which forms broad domains at both promoters and distal regions, has been reported in mouse oocytes, where these broad domains lie in partially methylated domains (PMDs) 43,44. Interestingly, H3K4me2 also showed a non-canonical pattern in mouse GV oocytes and sperm (Figure 2A-B, 2E-F). Mature sperm and oocytes are transcriptionally inactive 45,46, yet strong ChIP-seq and CUT&RUN enrichment has been reported for this mark, which is associated with transcription 47. These results raise the possibility that H3K4me2 is deposited during gametogenesis and retained in mature gametes. To examine this possibility, we performed k-means clustering of H3K4me2 in gametes. By comparing promoter H3K4me2 in sperm and GV oocytes, we found that promoters can be divided into four groups (Figure 2A). The first group (sperm-specific H3K4me2) is involved in fertilization and germ cell development, and these marks are erased at later stages (Figure 2A, 2C). By contrast, the second group (GV-specific H3K4me2) is enriched for mitosis and gamete generation, which are known to function during oocyte development, and these marks are re-established at later stages (Figure 2A, 2C). The third group (H3K4me2 shared by sperm and GV oocytes) is related to mRNA processing and translation; the signal in these regions is erased (at the MII, zygote and early 2-cell stages) and then rebuilt (from the late 2-cell stage) during development (Figure 2A, 2C). The fourth group has low H3K4me2 levels at all stages (Figure 2A).

Figure 2. Comparison of H3K4me2 between sperm and oocytes.

(A-B) Heatmaps showing H3K4me2 and H3K4me3 enrichment at promoters and distal regions in sperm and GV oocytes and across different stages. CpG densities are also shown.

(C) Enriched GO terms for genes with sperm-specific, GV-specific or shared H3K4me2. P values are shown.

(D) GREAT analysis results for sperm-specific, GV-specific or shared distal peaks. P values are shown.

(E-F) IGV views showing H3K4me2 signals for the sperm-specific, GV-specific and shared groups at representative promoters (left, shaded) and distal accessible regions (right, shaded).

H3K4me2 enrichment at these promoters is consistent with the dynamic changes of H3K4me3 (Figure 2A). In particular, promoters with high CpG density are preferentially marked by H3K4me2 in the shared sperm-GV group (Figure 2A). We next investigated H3K4me2-marked distal regions in gametes. Sperm-specific distal H3K4me2 regions are associated with gamete generation, and GV-specific distal H3K4me2 regions with cell differentiation and the mitotic cell cycle; both were erased after fertilization and not re-established during embryo development (Figure 2B, 2D). H3K4me3 signals did not occupy the sperm-specific or GV-specific distal H3K4me2 regions (Figure 2B). By contrast, the distal H3K4me2 regions shared by GV oocytes and sperm are also marked by H3K4me3, and the associated genes participate in system development, RNA processing and regulation of gene expression (Figure 2B, 2D). Unlike in GV oocytes, sperm H3K4me2 was enriched at fully methylated promoters, whereas sperm distal H3K4me2, like that in GV oocytes, was enriched in hypomethylated regions (Figure S2A-D). These results therefore reveal different H3K4me2 patterns in sperm and oocytes.

Resetting of H3K4me2 during pre-implantation embryo development

Drastic epigenetic reprogramming and chromosome remodeling involving H3K4me3, H3K27me3, H3K36me3 and H3K27ac occur during early mammalian development44,48,49, and our previous data revealed an epigenome reboot in human early development50. We next asked how the parental histone mark H3K4me2 at regulatory elements is reprogrammed after fertilization. To our surprise, the GV and sperm H3K4me2 peaks were almost completely erased in MII oocytes and were reconstructed along with zygotic genome activation (ZGA) (Figure 3A-B). This result was further supported by IF analysis, which showed that re-establishment of H3K4me2 may begin at the late 2-cell stage (Figure 1B, Figure 3B). Coincidentally, the H3K4me2 demethylase KDM1A is up-regulated at the MII stage (Figure S3A). The CUT&RUN data showed that weak H3K4me2 peaks (relative to other stages) re-appear at the late 2-cell stage (Figure 3A), and we concluded that IF was unable to detect this newly established weak H3K4me2 signal at the late 2-cell stage. Consistent with the activation of Prdm9 and Kmt2c at the 4-cell stage (Figure S3A), H3K4me2 became readily detectable by both CUT&RUN and IF at the 4-cell and 8-cell stages (Figure 3A-B). H3K4me2 targets in 4/8-cell embryos only partially overlap with those in sperm or oocytes, and the overlapping targets are mainly enriched in development-associated pathways and processes (Figure 3C). In addition, gamete-specific H3K4me2 targets are involved in gamete development and generation, whereas 4/8-cell-specific targets are associated with cell communication and metabolic processes (Figure 3C). These data indicate that the resetting of H3K4me2 from mouse gametes to early embryos involves distinct sets of genes that are closely related to embryo development. In particular, about 1% of 4/8-cell H3K4me2 occurs in hypomethylated regions as early as the late 2-cell stage (Figure S3B-D). This implies that the resetting of H3K4me2 starts at the late 2-cell stage but that H3K4me2 is largely re-established at the 4/8-cell stage.

Figure 3. Resetting of H3K4me2 during mouse pre-implantation development.

(A) Number of H3K4me2 peaks in mouse embryonic stem cells (mESCs), mouse gametes and pre-implantation embryos.

(B) IGV view showing H3K4me2 signals in mESCs, mouse gametes and pre-implantation embryos at the Ncc1 gene locus.

(C) Venn diagrams showing the overlap of H3K4me2-marked genes between 4-cell embryos and sperm or GV oocytes, separately (left) or jointly (right). Enriched GO terms for shared or non-overlapping genes are also shown.

Non-canonical H3K4me2 in mammalian embryos and pervasive H3K4me2 in CpG-rich regulatory regions in mouse 4 and 8-cell embryos

In GV oocytes and at post-ZGA stages, H3K4me2 was enriched at both canonical gene promoters and partially methylated domains (PMDs) (Figure 4A-D). Our results showed that H3K4me2 may begin to be re-established at the late 2-cell stage, and that strong promoter H3K4me2 enrichment is already re-established by the 4-cell stage and maintained at the 8-cell stage; furthermore, H3K4me2 signals at these stages occupy broader regions than at other stages (Figure 1A). About 65% of gene promoters were occupied by H3K4me2 in 4-cell and 8-cell embryos and ICMs, and these genes function in RNA metabolism and RNA splicing (Figure 4E). Combining these data with previous H3K4me3 and chromatin accessibility data, we found that these regions overlap with regions of high CpG density, H3K4me3 enrichment and chromatin accessibility (Figure 4E), indicating that H3K4me2 preferentially occupies hypomethylated regions with high CpG density (Figure S4A-C).

Figure 4. Pervasive H3K4me2 in CpG-rich regulatory regions in mouse embryos.

(A-B) H3K4me2 and H3K4me3 enrichment near TSSs (TSS ± 5 kb) and PMDs (3 × PMD) in mouse gametes, pre-implantation embryos and mESCs.

(C) IGV view showing promoter H3K4me2 signals and widespread distal H3K4me2 signals in mouse oocytes and early embryos.

(D) IGV view showing H3K4me2 signals and DNA methylation at a partially methylated domain (PMD) in mouse oocytes and early embryos.

(E) Heatmaps showing H3K4me2 and H3K4me3 enrichment at promoters marked at all stages or specifically in 4-cell/8-cell embryos. ATAC-seq signal, promoter CpG density and GO analysis results are also shown.

(F) Heatmaps showing stage-specific distal H3K4me2 peaks in 8-cell embryos, ICMs and mESCs. ATAC-seq signal is also shown.

We further asked how stage-specific distal peaks are reprogrammed. For about 35% of gene promoters, which are associated with cell differentiation and development, H3K4me2 was not re-established on a large scale at the 4-cell, 8-cell and blastocyst stages (Figure 4E). We then identified stage-specifically expressed genes and examined their promoter H3K4me2 (Figure S4E). Promoters with constantly high H3K4me2 are preferentially CpG-rich, whereas promoters with constantly low H3K4me2 are generally CpG-poor (Figure 4F, Figure S4E). Interestingly, a group of genes showed dynamic promoter H3K4me2 that correlates with gene expression (Figure S4E). These genes preferentially function in development, differentiation and morphogenesis. Our data therefore indicate that promoter H3K4me2 in early embryos correlates with both gene activity and CpG density.

Next, we investigated distal H3K4me2, as 4/8-cell embryos also showed widespread distal H3K4me2 signals (Figure 4C). We first identified distal H3K4me2 regions in mouse gametes and pre-implantation embryos. These regions are relatively CpG-rich (although less so than promoters) and hypomethylated (Figure S4B and S4C). However, unlike at promoters, distal H3K4me2 correlated with CpG density in a continuous and monotonic manner (Figure S4A). In sum, these data reveal widespread distal H3K4me2 in CpG-rich and hypomethylated regions in mouse 4/8-cell embryos.

Transcriptional regulation and CcRE priming and resolution during the maternal-to-zygotic transition

A previous study found that most transcription factor binding regions overlap with H3K4me2-enriched regions51. To investigate the association between H3K4me2 and transcriptional regulation, we identified stage-specific distal H3K4me2 peaks and examined chromatin accessibility at these peaks (Figure 4F). We then identified the transcription factor (TF) binding motifs enriched in these distal regions (Figure S4D). TFs enriched in mESCs and 8-cell embryos include pluripotency factors involved in system development52,53. CTF is specifically enriched in 8-cell embryos, and the GATA and KLF families are enriched in ICMs.

Our recent study in human found widespread deposition of H3K4me3 at promoters, including those of developmental genes, prior to ZGA, which subsequently resolve to either active or repressed states after ZGA50. Such promoter priming is related to embryonic development. To ask whether this also occurs in mouse, we identified genes specifically targeted by H3K4me2 in 4/8-cell embryos and compared them with genes that are expressed or inactive at all stages. Promoter H3K4me2 of the 4/8-cell-specific group decreased in ICMs (Figure 5A and S5A). “Active” genes showed strong H3K4me2, H3K4me3 and ATAC-seq signals before and after ZGA (Figure 5A and S5A). We then asked whether such a priming-resolving epigenetic transition also occurs in distal regions. Distal H3K4me2 sites are enriched at regulatory elements, as 75.4% of them (compared with 16.2% of random regions) overlap with candidate cis-Regulatory Elements (CcREs) identified by ENCODE (Figure 5B).

Figure 5. Chromatin states at specific stages and a model of H3K4me2 dynamics during the mouse maternal-to-zygotic transition.

(A) Heatmaps showing H3K4me2, H3K4me3 and ATAC-seq signals at “All active” and “All inactive” promoters, together with expression of the corresponding genes. CpG densities are also shown.

(B) Bar chart showing the overlap of distal H3K4me2 peaks with CcREs at the 4/8-cell stage. Random peaks are shown as a control.

(C) Heatmaps showing the chromatin states of CcREs overlapping with distal H3K4me2 and H3K27ac peaks. Clustering was conducted using ATAC-seq and H3K4me2 signals at the 4-cell, 8-cell and ICM stages. GREAT analysis results for each cluster are shown (right). Mouse H3K27ac data were obtained from the GEO database (GSE72784).

(D) Schematic model of dynamic H3K4me2 reprogramming (“H3K4me2 resetting”) during the mouse parental-to-zygotic transition. After fertilization, parental H3K4me2 is almost globally erased. De novo H3K4me2 at promoters and distal regions in CpG-rich, DNA-hypomethylated regions is re-established from the late 2-cell stage, possibly as a priming chromatin state. Primed promoters then resolve to either “active” promoters with H3K4me3 or “poised” promoters. A similar transition occurs for putative enhancers.

Enhancers are DNA sequences that increase promoter activity and thus the frequency of gene transcription. Enhancer regions are usually marked by H3K27ac, catalyzed by P300/CBP, and by H3K4me1, catalyzed by MLL3/454, while their H3K4me3 level is low, which is the opposite of promoter regions. We compared H3K4me2 with H3K27ac and found no significant signal overlap between the two modifications in distal regions at specific developmental stages. CcREs marked by distal 4/8-cell H3K4me2 show high enrichment of chromatin accessibility (Figure 5C, S5B). Interestingly, after the 4/8-cell stage, one class (“active”) remains accessible and is associated with basic biological processes (GREAT analysis). By contrast, the other class (Figure 5, S5B, “poised”) is associated with H3K27me3 in ICM and somatic cells and is located preferentially near developmental genes. These data indicate that widespread de novo H3K4me2 is established following ZGA, and that H3K4me2 may be globally reset to a permissive state before resolving after the 4/8-cell stage (Figure 5D).

Discussion

Histone modifications are fundamental epigenetic marks, and their reprogramming plays important roles in early development9. In this study, we profiled the H3K4me2 landscape in mESCs, oocytes and early embryos using a modified, low-input CUT&RUN method. We found that H3K4me2 shows different patterns in sperm and oocytes: it exhibits a non-canonical pattern in GV oocytes but a canonical pattern in mouse sperm33. Furthermore, the GV H3K4me2 peaks were almost completely erased in MII oocytes and H3K4me2 was re-established along with ZGA, indicating that maternal H3K4me2 is lost during oocyte maturation. We also observed non-canonical H3K4me2 in mammalian embryos and pervasive H3K4me2 in CpG-rich regulatory regions in 4-cell and 8-cell embryos and in ICM. Hence, we provide the first landscape of H3K4me2 during early mammalian development and reveal its potential function.

Paternal epigenetic inheritance is important for development and disease in offspring, and drastic epigenome reprogramming occurs and is associated with key processes during early mammalian development1,3,9,14,39-41,55-59. It has been reported that genome-wide H3K4me3 exhibits a non-canonical pattern (ncH3K4me3) in full-grown and mature oocytes, meaning that it exists as broad peaks at promoters and at many distal loci40,44,48. In our study, we found that H3K4me2 shows a non-canonical pattern in GV oocytes but is erased in MII oocytes, suggesting that H3K4me2 plays an important role during oocyte maturation. H3K4me3 is inherited in pre-implantation embryos and erased in late 2-cell embryos, after which canonical H3K4me3 starts to be established40,44, whereas H3K4me2 mainly starts to be re-established at the late 2-cell stage and exists in a non-canonical pattern in the following stages. Emerging data reveal that in many species, including mouse, zebrafish and human, the parental epigenome undergoes dramatic reprogramming after fertilization of terminally differentiated gametes, followed by re-establishment of the embryonic epigenome, leading to epigenetic “rebooting”9. We found that H3K4me2 is erased in MII oocytes but retained in sperm33, and that paternal H3K4me2 is erased upon fertilization and starts to be re-established at the late 2-cell stage, indicating that H3K4me2 participates in the rebooting of the embryonic epigenome.

H3K4me3 is known as a marker of active gene expression and mainly marks gene promoter regions. H3K4me1 and H3K4me2 mark both enhancers and promoters and have been reported to have multiple functions in gene activation, depending on the enzymes and complexes involved22,27,32,60-69. A distinct pattern of H3K4me2 enrichment, spanning from the transcription start site (TSS) into the gene body, has been reported to mark a subset of tissue-specific genes, indicating that a unique H3K4me2 profile marks tissue-specific gene regulation30,32. Our results showed non-canonical H3K4me2 in mammalian embryos and pervasive H3K4me2 in CpG-rich regulatory regions in mouse 4-cell and 8-cell embryos. We next interrogated whether H3K4me2 is associated with active or repressed genes during embryo development. Our data revealed that promoter H3K4me2 in early mouse embryos correlates with both gene activity and high CpG density and, as expected, is consistent with H3K4me3.

Histone modification and chromatin accessibility profiles are commonly used to identify transcription factor binding regions and motifs. We previously found large-scale chromatin accessibility in pre-ZGA human embryos, and these regions were enriched for many transcription factors3. It has been reported that on average about 90% of transcription factor binding regions overlap with H3K4me2-enriched regions, and that the stronger the H3K4me2 signal, the more likely a region is to overlap with a transcription factor binding region31. Interestingly, in our study we identified binding motifs of transcription factors enriched in these distal regions: CCAAT transcription factor (CTF) is specifically enriched in 8-cell embryos, and the GATA and KLF families are enriched in ICM. GATA factors are critical regulators of the primitive endoderm (PE), which is part of the ICM70, and KLF factors support naive pluripotency71. Thus, these results show that stage-specific distal H3K4me2 is involved in the transcriptional circuitry of mammalian embryonic development.

The state of chromatin influences DNA replication, DNA repair and gene expression. It has been reported that accessible chromatin occupies regulatory sequences such as promoters and enhancers72. These elements interact with cell type-specific transcription factors to initiate transcriptional programs that determine cell fate73. Early animal embryos undergo extensive histone modification reprogramming and chromatin remodeling before terminally differentiated gametes develop into pluripotent cells. A previous study found extensive overlaps of accessible promoters and enhancers at the 4-cell stage in human embryos50. This phenomenon indicates crosstalk between histone modification and chromatin state. In our study, most H3K4me2 was re-established in accessible chromatin regions, including promoters and distal regions, at the 4-cell stage. We therefore infer that H3K4me2 establishment may play an important role in shaping chromatin state. Our results thus further support crosstalk between accessible chromatin and histone modification. However, whether chromatin state influences H3K4me2 deposition or vice versa needs further investigation.

Dynamic histone modifications play important roles during the mammalian maternal-to-zygotic transition (MZT). About 22% of the oocyte genome is associated with broad H3K4me3 domains that are anti-correlated with DNA methylation and that become confined to transcription start site (TSS) regions in 2-cell embryos, concomitant with the onset of major ZGA48. In contrast, H3K4me2 was erased in MII oocytes, and sperm H3K4me2 was erased after fertilization. Like H3K4me3, embryonic H3K4me2 was re-established along with the onset of major ZGA. The ncH3K4me3 in oocytes overlaps almost exclusively with partially methylated DNA domains (PMDs)40, and H3K4me2 was likewise enriched at both canonical gene promoters and PMDs.

Taken together, our investigation provides a genome-wide landscape of H3K4me2 in mammalian oocytes and pre-implantation embryos, facilitating further exploration of the epigenetic regulatory network in early mammalian development.

Data availability

All raw sequencing data generated during this work have been submitted to the Genome Sequence Archive in the National Genomics Data Center (China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences).

H3K4me2 CUT&RUN data of Human K562 cell line http://bigd.big.ac.cn/gsa-human

H3K4me2 CUT&RUN data of mouse gametes and early embryos https://ngdc.cncb.ac.cn/gsa/

Funding and acknowledgement

This work was supported by the National Key R&D Program of China (2019YFA0802200 and 2019YFA0110900 to Jiawei Xu), the National Natural Science Foundation of China (31870817 and 32170819 to Jiawei Xu), the Scientific and Technological Innovation Talent Project of Universities of Henan Province (20HASTIT045 to Jiawei Xu), and the Central Plains-Youth Talents Program (ZYQR201810155 to Jiawei Xu).

Author contributions

Conceptualization: JWX

Methodology: CW

Performing experiments: CW, SQY, XQX, KYH, YQW, YL, SLS

Data NGS (next generation sequencing): CW

Bioinformatics analysis: YS, JWX, CW

Supervision: JWX

Writing—original draft: JWX, CW, YS

Writing—review & editing: JWX, CW, YS

Competing interests

The authors declare no competing interests.

Supplementary figures

Validation of H3K4me2 CUT&RUN data in Human K562 cell line, oocytes and early embryos.

(A) IGV views showing H3K4me2 distributions obtained by CUT&RUN using various numbers of human K562 cells. ENCODE reference tracks are shown for comparison. GEO accession: GSM733651.

(B) Pearson correlation coefficients comparing H3K4me2 CUT&RUN data obtained from various numbers of human K562 cells with conventional ChIP–seq data from ENCODE.

(C) Scatter plots showing the correlations of biological replicates (n=2) of H3K4me2 signals (2kb-bin, whole genome) by CUT&RUN for each stage of mouse oocytes and early embryos. The Pearson correlation coefficients are also shown.

(D) Scatter plots comparing the H3K4me2 signals with ATAC-seq (Assay for Transposase-Accessible Chromatin with high throughput sequencing) at promoters or distal accessible regions in mouse oocytes and early embryos. Spearman correlation coefficients are also shown.

(E) Scatter plots comparing the H3K4me2 signals with H3K4me3 signals in mouse sperm, oocytes and early embryos. Spearman correlation coefficients are also shown.

(F) Hierarchical clustering of H3K4me2 signals in mouse oocytes and early embryos. Pearson correlation was used to measure distances.

(G) Scatter plots comparing the H3K4me2 signals between 4-cell and 8-cell in mouse embryos. Spearman correlation coefficients are also shown.
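The replicate comparison in panel C (2kb bins, Pearson correlation) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `bin_signal` and `pearson`, the use of raw read-start counts rather than a normalized signal, and the toy coordinates are hypothetical and are not taken from the analysis in this study.

```python
import math


def bin_signal(read_starts, chrom_length, bin_size=2000):
    """Count read start positions in fixed-size bins along one chromosome."""
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read start positions for two replicates on an 8 kb toy region.
rep1_reads = [100, 150, 2500, 2600, 2700, 7000]
rep2_reads = [50, 120, 300, 2100, 2200, 2300, 2400, 7999]

rep1_bins = bin_signal(rep1_reads, chrom_length=8000)
rep2_bins = bin_signal(rep2_reads, chrom_length=8000)
r = pearson(rep1_bins, rep2_bins)
print(rep1_bins, rep2_bins, round(r, 3))
```

A genome-wide version would concatenate the bins of all chromosomes and would typically normalize for sequencing depth before computing the coefficient.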

H3K4me2 in mouse gametes.

(A) Heatmaps showing H3K4me2 signals at all promoter regions (TSS ± 2.5kb) in mouse gametes. DNA methylation (TSS ± 2.5kb) is also mapped. DNA methylation data are from GEO dataset GSE56697.

(B) Box plots showing DNA methylation levels at all promoter regions in mouse gametes. Random peaks are shown as a control. DNA methylation data are from GEO dataset GSE56697.

(C) Heatmaps showing H3K4me2 signals at all distal regions (peak ± 2kb) in mouse gametes. DNA methylation (peak ± 2kb) is also mapped. DNA methylation data are from GEO dataset GSE56697.

(D) Box plots showing DNA methylation levels at all distal regions in mouse gametes. Random peaks are shown as a control. DNA methylation data are from GEO dataset GSE56697.

H3K4me2 in late 2-cell embryos.

(A) Heatmaps showing the expression of H3K4me2-related regulators in mouse (based on published RNA-seq data40).

(B) Box plots showing DNA methylation levels at all promoter and distal regions at the mouse 2-cell stage. Random peaks are shown as a control.

(C-D) Venn diagrams showing the numbers of de novo 4-cell H3K4me2 promoter and distal peaks.

H3K4me2 in mouse 4/8-cell embryos.

(A) Scatter plots comparing the promoter and distal H3K4me2 signals with CpG densities in mouse gametes, 4-cell, 8-cell, ICM and mESC. The Spearman correlation coefficients are also shown.

(B) Box plots showing the CpG density of promoter and distal H3K4me2 peaks. Random peaks are shown as a control.

(C) Box plot showing DNA methylation at promoter and distal 4-cell H3K4me2 peaks. Random peaks are shown as a control.

(D) TF motifs identified from active enhancers in mouse early embryos (8-cell, ICM) and mESC. Only motifs with an enrichment p value < 1e-20 in at least one stage were included. Circle size indicates TF motif enrichment, and TF expression is color coded.

(E) Heatmaps showing stage-specifically expressed genes further classified by their H3K4me2 states (high, dynamic, and low). RNA-seq, H3K4me3 state, CpG densities and GO analysis results are also mapped and shown.

Chromatin states at specific stages and distal CcREs.

(A) Box plots showing H3K4me2, H3K4me3 and ATAC-seq signals at the indicated promoter groups, including “All active” and “All inactive” promoters, in mouse gametes, embryos and mESC. CpG density for each group is also shown (right).

(B) Box plots showing ATAC-seq, H3K4me2 and H3K27ac signals at putative active, poised or random enhancers in mouse gametes, early embryos and mESC. Mouse H3K27ac data were obtained from GEO (accession GSE72784).

Supplementary Table

The samples applied in this study are listed in the table below.

CUT&RUN experimental sample information