Arabidopsis SDG proteins mediate Polycomb removal and transcription-coupled H3K36 methylation for gene activation

Yicong Wang; Masato Abe; Yuka Kadoya; Takeru Saiki; Kanae Imai; Xuejing Wang; Taiko Kim To; Soichi Inagaki; Takamasa Suzuki; Tetsuji Kakutani; Toshiro Ito; Nobutoshi Yamaguchi

doi:10.7554/eLife.100905.1

Post-translational modifications of histone proteins play critical roles in regulating the physical properties of chromatin and defining the transcriptional state¹. Among various histone modifications, histone methylation primarily occurs on lysine or arginine residues and can affect both chromatin condensation and transcriptional states. Trimethylation of lysine 27 on histone H3 (H3K27me3) is associated with downregulation of nearby genes, leading to the formation of heterochromatic regions that are responsive to certain conditions². Methylation of lysine 4 and 36 on histone H3 (H3K4 and H3K36) is present on nucleosomes of transcriptionally active chromatin². Histone modifications are complex and reflect the topological constraints caused by nearby modifications. Indeed, the presence of particular histone modifications can promote the addition of the same type of modification or the removal of different types: for example, higher H3K36 levels hinder H3K27me3 deposition³. Such interconnected epigenetic systems, characterized by bistable states, determine the transcriptional activity of each gene.

Methylation marks can be added to histones by specific enzymes called methyltransferases. In all eukaryotic organisms, the activity of H3K27 methyltransferase is controlled by Polycomb Repressive Complex 2 (PRC2)^4–6. In Drosophila melanogaster, PRC2 contains four core subunits, including Enhancer of zeste [E(z)] with a catalytic SET domain and Extra sex comb (Esc)⁶. Arabidopsis (Arabidopsis thaliana) has three E(z) methyltransferases homologous to E(z)—CURLY LEAF (CLF), SWINGER (SWN) and MEDEA (MEA)—that catalyze H3K27me3⁷. Because these methyltransferases act redundantly, higher-order mutants exhibit more severe developmental defects than any single mutant⁸. Arabidopsis FERTILIZATION INDEPENDENT ENDOSPERM (FIE) is a WD40 homolog of D. melanogaster Esc. PRC2 interacts with multiple transcription factors through its subunits and is recruited to cis-regulatory regions of several hundred base pairs in length called Polycomb Response Elements (PREs). Proper PRC2 placement by trans-acting factors, such as BASIC PENTACYSTEINE (BPC), APETALA2 (AP2)-like or C2H2 zinc-finger (ZnF) family members, triggers the initial deposition of H3K27me3^9,10. The opposing H3K36 trimethylation is regulated by SET DOMAIN GROUP8 (SDG8)¹¹. SDG8 interacts with the transcription machinery, namely RNA polymerase II (RNA Pol II) and the polymerase-associated factor 1 (Paf1) complex (PAF1C)¹². PAF1C components include VERNALIZATION INDEPENDENCE2 (VIP2, also named EARLY FLOWERING 7 [ELF7]), VIP3, VIP4, VIP5 and VIP6 (also named ELF8)¹³. Despite the critical importance of H3K36me3 and the general transcription machinery, sdg8 single mutants show relatively weak developmental defects¹⁴, suggesting that the full function of SDG8 may be masked by partial redundancy with closely related protein(s). However, the identity of such factor(s) and their roles in the epigenetic systems have not been established.

In this study, we focus on one SDG8 homolog, SDG7, to consolidate the existing knowledge of the epigenetic interaction network along with the general transcription machinery. We show that the sdg7 sdg8 double mutant exhibits developmental abnormalities and a notable decrease in both H3K36me2 and H3K36me3 levels, supporting the partial redundancy of SDG7 and SDG8. We also examine the genomic regions to which SDG7 and SDG8 are recruited and explore the connection between the sdg7 sdg8 mutant phenotypes, the PRC2 complex, and RNA Pol II. We propose that SDGs displace PRC2 from PREs, preventing the deposition of H3K27me3 and activating the expression of target genes through transcription-coupled H3K36 methylation.

Results

H3K36me2 and K36me3 levels are decreased by the double loss of SDG7/SDG8 genes

To identify potential factors that function together with SDG8, we examined the five class II SDG genes for overlapping expression patterns¹⁶. Visualization of expression data from the Arabidopsis eFP website as a heatmap revealed that SDG8 is expressed throughout development (Fig. 1a). Among the five members, SDG7emerged as a promising candidate for synergistic interaction with SDG8, as illustrated by promoter reporter lines in which b-GLUCURONIDASE was driven by the SDG7 or SDG8 promoter (Supplementary Fig. 1). AlphaFold predicted similar structures for the SET, post-SET and Associated With SET (AWS) domains of SDG7 and SDG8 (Fig. 1b,c and Supplementary Fig. 2). Comparison between predicted SDG7 and SDG8 and the crystal structure of the SET domain from ARABIDOPSIS TRITHORAX-RELATED PROTEIN 5 (ATXR5) in complex with a K36me3 peptide suggested that these proteins have highly similar catalytic core pockets¹⁷ (Fig. 1d). These results suggest a conserved biochemical function for SDG7 and SDG8 in H3K36me3 recognition (Fig. 1d).

SDG7 and SDG8 mediate H3K36me2 and K36me3 deposition
a, Heatmap representation of gene expression levels for class II SDG family genes in specific tissues. b, Diagrams of Arabidopsis class II SDG family proteins. c, 3D structure prediction of the conserved domain of SDG7 and SDG8 by AlphaFold 2. Left, Inter-chain predicted aligned error (PAE) plot for the best-ranked model structures of SDG7 and SDG8 by AlphaFold 2. Right, Structure models of SDG7 and SDG8 predicted by AlphaFold 2. Yellow, green and purple lines indicate the AWS, SET and post-SET domains, respectively. d, Overlay of the crystal structure for ATXR5 SET domain in complex with K36me3 histone H3 peptide (Dark green: 5VAC), SDG8 (pale purple) and SDG7 (purple). The right panels show the interfaces of SDG7 and SDG8 with histone tails. e, Representative photographs of the wild type (WT), *sdg7, sdg8* and *sdg7 sdg8* at 25 days after germination. Scale bar, 1 cm. f, Plant height of the WT, *sdg7, sdg8*and *sdg7 sdg8*(WT: n = 35; *sdg7: n* = 25; *sdg8: n* = 25; *sdg7 sdg8: n* = 18). Significant differences were calculated based on one-way ANOVA tests (p = 1.1 x 10^-16). Different letters indicate significant differences based on post-hoc Tukey’s HSD test (p < 0.05: See the Data Source file for all combinations of the exact p-values). g, Top view of stage 14 flowers from soil-grown WT, *sdg7, sdg8* and *sdg7 sdg8*. Scale bar, 0.5 mm. h, Representative photographs showing the roots of WT, *sdg7, sdg8* and *sdg7 sdg8*7-day-old seedlings. Scale bar, 1 cm. i, Primary root length of WT, *sdg7, sdg8* and *sdg7 sdg8* (WT: n = 41; *sdg7: n* = 33; *sdg8: n* = 46; *sdg7 sdg8: n* = 15). Significant differences were calculated based on one-way ANOVA tests (p = 1.1 x 10^-16). Different letters indicate significant differences based on post-hoc Tukey’s HSD test (p < 0.05: See the Data Source file for all combinations of the exact p-values). j, Averaged profiles of H3K36me3 (left) and H3K36me2 (right) signal intensity around genes in WT and *sdg7 sdg8*. k, Heatmaps of H3K36me3 (left) and H3K36me2 (right) accumulation across genes in WT and *sdg7 sdg8*. l, Scatterplot of H3K36me3 (above) and H3K36me2 (below) levels in WT and *sdg7 sdg8*. Top, shoot samples. Bottom, root samples. Hypermethylated and hypomethylated genes in *sdg7 sdg8* relative to WT are shown in light and dark colors, respectively. m, Selected enriched GO terms determined by their – log₁₀-adjusted p-values. H3K36me3 (top) and H3K36me2 (bottom) hypomethylated genes in *sdg7 sdg8* relative to WT. Top, shoot samples. Bottom, root samples. n, H3K36me3 and H3K36me2 ChlP-qPCR at *MAF3*and At1g01040 in the WT and *sdg7 sdg8*. ChIP-seq results and the diagram of the genomic loci and PCR amplicons are shown below. Values are means ± SE (n = 4 independent experiments). Significant differences were calculated based on one-way ANOVA tests (K36me3 at *MAF3: p* = 3.6 x 10^-3; K36me3 at *DCL1: p* = 4.8 x 10^-7; K36me2 at *MAF3: p* = 2.9 x 10^-3; K36me2 at *DCL1: p* = 7.0 x 10^-3). Different letters indicate significant differences based on post-hoc Tukey’s HSD test (p < 0.05: See the Data Source file for all combinations of the exact p-values).

To assess the biological function of SDG7 and SDG8 during plant development, we generated three independent sdg7 sdg8 double mutant using different sdg7 alleles (Supplementary Fig. 3). Although the sdg7 single mutants did not show any clear developmental defects, the sdg7 sdg8 double mutant flowered significantly earlier and exhibited a dwarf phenotype compared to the sdg8 single mutant, which itself had an earlier flowering time and smaller than the wild type (WT) (Fig. 1e–i and Supplementary Figs. 3 and 4) (p< 0.05, one-way ANOVA tests). We closely examined cells from leaves and petals in the sdg7 sdg8 double mutant, which revealed that the observed dwarfism is due to defects in both cell size and cell number (Supplementary Fig. 4). Consistent with these phenotypic changes, we detected lower expression levels for marker genes of auxin homeostasis, cell division and key developmental regulators in the sdg7 sdg8 mutant using promoter reporter lines driving GUS, green fluorescent protein (GFP) or yellow fluorescent protein (YFP) expression (Supplementary Fig. 5).

To evaluate the biochemical function of SDG7 and SDG8 in the context of H3K36me3 and H3K36me2 deposition, we conducted chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). Metaplot profiles in the wild type indicated that H3K36me3 and H3K36me2 deposition reaches their highest levels near the 5′ and 3′ ends of transcribed genes, respectively (Fig. 1j,k). The sdg7 sdg8 mutant showed higher levels of the H3K36me3 mark at the 5′ ends of genes, downstream of the transcription start site (TSS), compared to the wild type, and lower levels for the same mark located in the middle of the genes, as reported previously¹¹(Fig. 1j,k). By contrast, the H3K36me2 marks were absent across the entire gene body of transcribed regions in the sdg7 sdg8 mutant (Fig. 1j,k). In the sdg7 sdg8 mutant, we identified 3,380 genes as being H3K36me3-hypomethylated and another 4,878 as H3K36me2-hypomethylated relative to the wild type (Fig. 1l and Supplementary Tables 1 and 2). Based on gene ontology (GO) term enrichment analysis, these hypomethylated genes were associated with similar developmental processes, such as meristem and reproductive development as well as cell cycle process (Fig. 1m and Supplementary Tables 3 and 4). At selected targets, H3K36me3 and H3K36me2 levels in sdg7 sdg8 were either similar to or significantly lower than those in the sdg8 single mutant (Fig. 1n). These results indicate that SDG7 and SDG8 are required to maintain proper H3K36me3 and H3K36me2 levels.

SDG7 occupies TSS and TES while SDG8 occupies gene body

We addressed the genome-wide binding pattern of SDG8 and SDG7 by ChIP-seq (Supplementary Tables 5 and 6). We detected SDG8 binding along the gene body, particularly in introns, exons and their junctions, whereas SDG7 binding was primarily located near the TSS and transcription end site (TES) (Fig. 2a,b). We determined that more than 50% of SDG7-bound regions are within promoter regions (Fig. 2b). We conducted a GO term enrichment analysis using the predicted direct SDG8 and SDG7 targets based on their binding peaks. In good agreement with a previous publication¹¹, we observed an enrichment for light-related GO terms associated with SDG8 target genes such as ‘photosynthesis’, ‘light reaction’ and ‘circadian rhythm’ (Fig. 2c and Supplementary Table 7). The top 20 significantly enriched GO terms for SDG7 targets included ‘biological regulation’, ‘response to stimulus’, ‘response to chemical’ and ‘developmental process’ (Fig. 2c and Supplementary Table 8). An exploration of semantic similarity among all significant GO terms with REVIGO revealed that SDG8 and SDG7 are involved in similar biological processes, such as response to stimulus and circadian rhythm (Fig. 2d and Supplementary Tables 9 and 10).

SDG7 is present at TSS and TES, whereas SDG8 occupies gene body
a, Averaged profiles of SDG8-GFP and SDG7-VENUS signal intensity around genes in wild type. b, Distribution of peak position for SDG8-GFP and SDG7-VENUS. The percentages of binding peaks at each region type are shown. c, Top 20 GO terms as determined by their –log₁₀-adjusted p-values using SDG8-(left) or SDG7- (right) bound genes. d, Scatterplot showing representative clusters in a two-dimensional space based on semantic similarities between GO terms. Color and size indicate p-value and the frequency of the GO term in the underlying gene ontology annotation database, respectively. Left: SDG8-bound genes. Right: SDG7-bound genes. e, Upset bar plots showing the number of overlapping genes among SDG8- and SDG7-bound genes and H3K36me3- and H3K36me2-hypermethylated genes in the wild type. f, Left: Venn diagram showing the overlap between H3K36me3- and H3K36me2-hypomethylated genes in *sdg7 sdg8* and direct SDG8 (top) and SDG7 (bottom) targets. Right, distribution of peak position for SDG8-(left) and SDG7-(right) bound peaks according to their functional genomic distribution, using the same color legend as in b. AnnotatePeak was used for assignment.

We also examined the degree of genome-wide overlap in chromatin occupancy for SDG8, SDG7, H3K36me3 and H3K36me2 at the same developmental stages (Fig. 2e). We observed a co-localization between significant SDG8- and SDG7-bound peaks (p< 1.1 × 10^-17, hypergeometric test) and with peaks for H3K36me3 and H3K36me2, both on a global scale and at specific loci (Fig. 2e and Supplementary Table 11). Among 331 direct SDG8 targets, 103 genes (31%) met the criteria of decreased H3K36me3 deposition, with another 147 genes (44%) showing lower H3K36me2 levels in the sdg7 sdg8 mutant (Fig. 2f). Consistent with the general distribution patterns, SDG8 bound to genes enriched with H3K36me3, showing a greater preference for the 5′ region of genes, encompassing the gene body, promoter and 5′ untranslated region (5′ UTR) (Fig. 2f). For both H3K36me3- and H3K36me2-decorated genes, we observed most SDG8 binding peaks within the gene body (Fig. 2f). Looking at H3K36me2 deposition, we determined that SDG8 primarily binds to the 3′ ends of genes, including the 3′ UTR and intergenic regions (Fig. 2f), suggesting that SDG8 is recruited to H3K36 methylation peaks. We also established that 34% of all SDG7-bound genes overlap with H3K36me3-hypomethylated genes, while another 42% overlapped with H3K36me2-hypomethylated genes. By contrast, SDG7 binding peaks exhibited the same patterns along genomic features for both H3K36me2- and H3K36me3-hypomethylated genes, unlike SDG8. Taking these results together, we propose that SDG7 and SDG8 regulate key shared target genes via H3K36 methylation by distinct mechanisms.

SDG7 displaces PRC2 from PRE to overcome H3K27me3-mediated gene silencing

Permissive histone modifications influence the access of transcription factors and/or the transcriptional machinery by controlling accessibility to their binding sites along chromatin. Since SDG7 is preferentially localized at gene regulatory regions on the genome, we conducted de novo motif profiling and identified motifs recognized by known transcription factors within the SDG7-bound peaks (Fig. 3a and Supplementary Table 12). We noticed the overrepresentation of GA repeat (recognized by the BPC family), CTCC, CCG (recognized by AP2/ETHYLENE-RESPONSE FACTORs [ERFs]), G-box (recognized by ZnF proteins),and CArG box (recognized by MADS-box transcription factors) (Fig. 3a and Supplementary Table 12). Except for the CArG box, these motifs have previously been reported to be associated with Polycomb occupancy⁹. Overrepresentation of the CArG box may be due to BPC-mediated recruitment of MADS domain complex to DNA for tissue-specific regulation of target genes¹⁸. Indeed, we observed a significant overlap between cis-elements located at SDG7 peaks and previously identified cis-elements located at PREs⁹ (p< 1.2 × 10^-58, hypergeometric test) (Fig. 3b and Supplementary Table 13). The top 10 significantly enriched shared cis-elements included binding motifs for Polycomb-recruiting or interacting transcription factors, such as AP2/ERF, Barley B Recombinant (BBR)-BPC and ZnF (Fig. 3c). After recognizing these motifs as common elements for SDG7 and PRC2, we investigated the mechanism underlying the antagonism between SDG7 and PRC2. As reported previously, enrichment of PRC2 components (FIE and CLF), and their recruiters (AZF1 and BPC1) was seen at key target loci⁹, such as PETAL LOSS (PTL), SHORT VEGETATIVE PHASE (SVP) and AGAMOUS (AG) (Fig. 3d). Indeed, SDG7 binding peaks were present near the PRC2 and recruiter peaks at these loci (Fig. 3d). To examine the possible competition between SDG7 and PRC2 for their binding sites, we performed chromatin immunoprecipitation–quantitative PCR (ChIP-qPCR) in 35S:SDG7-HA-GR proCLF:CLF-GFP plants 8 hours after mock or dexamethasone (DEX) treatment. DEX induction resulted in rapid recruitment of SDG7-HA-GR (a fusion between SDG7, the hemagglutinin tag and the glucocorticoid receptor) to known PREs, the regions occupied by PRC2, as evidenced by ChIP-qPCR with an anti-HA antibody (Fig. 3e). An anti-GFP and anti-H3K27me3 ChIP-qPCR performed on the same sample uncovered a concomitant decrease in CLF and H3K27me3 occupancy at PTL, SVP, and AG (Fig. 3e), suggesting that SDG7 evicts PRC2 complex from PREs to overcome H3K27me3-mediated gene silencing.

SDG7 displaces PRC2 complex from PRE to overcome Polycomb repression
a, *De novo* motif analysis under each SDG7 peak summit. p-values obtained from HOMER are shown. b, Venn diagram showing the overlap between SDG7-bound *cis*-elements and all PREs bound by transcription factors. c, Classification of transcription factor families seen binding to SDG7 peak summit and PREs. d, ChIP-seq signals for H3K27me3, FIE, CLF, AZF1, BPC1, and SDG7 at the selected target loci. The gene models are shown as black bars and lines at the bottom of each panel. e, SDG7-HA-GR, CLF-GFP, and H3K27me3 ChIP-qPCR at PREs of *PTL, SVP*, and AG genes in mock- and dexamethasone (DEX)-treated *pro35S:SDG7-HA-GR* seedlings. ChIP-seq assay and diagram of the genomic loci and PCR amplicons are shown below. Values are means ± SE (n = 3 independent experiments). Significant differences were calculated based on one-tailed Student’s t-test (SDG7-HA at *PTL*: p = 0.039; SDG7-HA at *SVP: p* = 0.046; SDG7=HA at AG: p = 4.7 x 10^-3; CLF-GFP at *PTL*: p = 6.5 x 10^-3; CLF-GFP at *SVP: p* = 0.011; CLF-GFP at AG: p = 0.01; H3K27me3 at *PTL*: p = 0.040; H3K27me3 at *SVP: p* = 0.029; H3K27me3 at AG: p = 0.018). f, Representative photographs of *sdg7 sdg8, clf* and *sdg7 sdg8 clf* at 25 days after germination. Scale bar, 1 cm. g, Top view of stage 14 flowers from soil-grown *sdg7 sdg8, clf* and *sdg7 sdg8 clf*. Scale bar, 0.5 mm. h, Representative photographs of roots from *sdg7 sdg8, clf* and *sdg7 sdg8 clf*. Scale bar, 1 cm. i, Plant height in *sdg7 sdg8, clf* and *sdg7 sdg8 clf (sdg7 sdg8 n* = 18; *clf*: n = 10; *sdg7 sdg8 clf: n* = 22). Significant differences were calculated based on one-way ANOVA tests (p = 1.1 x 10^-16). Different letters indicate significant differences based on a post-hoc Tukey’s HSD test (p < 0.05: See the Data Source file for all combinations of the exact p-values). j, Root length in *sdg7 sdg8, clf* and *sdg7 sdg8 clf (sdg7 sdg8: n* = 15; *clf*: n = 29; *sdg7 sdg8 clf: n* = 46). Significant differences were calculated based on one-way ANOVA tests (p = 1.1 x 10^-16). Different letters indicate significant differences based on a posthoc Tukey’s HSD test (p < 0.05: See the Data Source file for all combinations of the exact p-values). k, Volcano plot showing differentially expressed genes (DEGs). Top, WT vs *clf*. Middle, WT vs *sdg7 sdg8*. Bottom, *sdg7 sdg8* vs *sdg7 sdg8 clf*. l, Venn diagram showing the overlap between differentially expressed genes identified by RNA-seq. Top, upregulated genes in *clf* vs WT, downregulated genes in *sdg7 sdg8* vs WT, and upregulated genes in *sdg7 sdg8 clf* vs *sdg7 sdg8*. Bottom, downregulated genes in *clf* vs WT, upregulated genes in *sdg7 sdg8* vs WT, and downregulated genes in *sdg7 sdg8 clf* vs *sdg7 sdg8*. m, Heatmap representation of gene expression levels in wild type (WT), *clf, sdg7 sdg8*, and *sdg7 sdg8 clf* at 7 days after germination. The Z-scores are based on FPKM values. n, Interactive graph view of 827 genes (genes shown by while color in Fig. 3m) generated by REVIGO.

Considering the deposition of opposing histone modifications by SDG7/SDG8 and PRC2, these two types of proteins may function antagonistically in plant development and gene expression. Indeed, introduction of the clf mutant causing loss of Polycomb function in the sdg7 sdg8 background partially rescued the defects seen in the sdg7 sdg8 mutant (Fig. 3f–h). Most phenotypic defects (plant height, flower size, primary root length and lateral root density) in sdg7 sdg8 were partially suppressed in cf sdg7 sdg8 compared to sdg7 sdg8 (Fig. 3i,j) (p< 0.05, one-way ANOVA tests). Some phenotypes, however, such as flowering time and leaf size, were enhanced in the triple mutant (Supplementary Fig. 6), possibly due to secondary effects caused by expression changes in many regulatory genes as reported previously¹⁹. To assess the effect of introducing the clf mutation in sdg7 sdg8 for gene expression patterns, we performed transcriptome deep sequencing (RNA-seq) of wild type, clf, sdg7 sdg8 and sdg7 sdg8 clf 7-day-old seedlings (Supplementary Table 14). Based on a principal component analysis (PCA) and heatmap analysis with clustering of the RNA-seq data, each genotype formed a different group, reflecting the high quality of our RNA-seq data (Supplementary Fig. 6).

We next defined DEGs that were upregulated in the clf mutant relative to WT (195 genes), downregulated in the sdg7 sdg8 mutant compared to WT (467 genes) and downregulated in clf sdg7 sdg8 relative to sdg7 sdg8 (838 genes) (Fig. 3k). The genes that fulfilled these criteria included the floral repressor gene FLOWERING LOCUS C²⁵ (FLC) and the antisense transcript originating from the 3′ end of FLC, COOLAIR²⁵ (Fig. 3l). DEGs that were downregulated in the clf mutant compared to WT (77 genes), upregulated in the sdg7 sdg8 mutant relative to WT (1,287 genes), and upregulated in clf sdg7 sdg8 compared to sdg7 sdg8 (802 genes) may also be involved in phenotypic rescue (Fig. 3k,l). We next defined genes that may contribute to the partial phenotypic rescue of sdg7 sdg8 by the clf mutation. K-means clustering of all differentially expressed genes (DEGs) identified four main patterns and 16 subclusters (Fig. 3m). SMALL AUXIN UPREGULATED RNAs²⁰ (SAURs) and YUCCA4²¹ (YUC4), which affect plant size, clustered together (cluster 1 in Fig. 3m) and were upregulated in clf sdg7 sdg8 compared with sdg7 sdg8. GIGANTEA²² (GI), whose protein product regulates the circadian clock, phytochrome B signaling and photoperiod-mediated flowering, clustered with these genes (Fig. 3m). By contrast, the genes encoding the calmodulin-like proteins TOUCH3²³ (TCH3) and TCH4, and the serine/threonine kinase PINOID²⁴ (PID), were downregulated in clf sdg7 sdg8compared to sdg7 sdg8. A GO term enrichment analysis linked the differentially expressed genes (DEGs) antagonistically regulated by CLF and SDG7/SDG8 (828 genes) to GO terms related to development, phytohormone response, phytohormone-mediated development, indole metabolism, and cell wall organization (Fig. 3n). Hence, the partial rescue of the pleiotropic defects and altered gene expression seen in sdg7 sdg8 by loss of Polycomb function underscores the antagonistic relationship between SDG7/SDG8 and PRC2.

SDG8 and PAF1C propagate H3K36 methylation together with RNA Pol II for shared target activation

SDG8 was previously shown to interact with the transcription machinery components RNA Pol II and its associated factor PAF1C. We integrated multiple transcriptome datasets in sdg7 sdg8 and five PAF1C-deficient mutants¹³ (elf7, vip3, vip4, vip5 and vip6) and identified shared downstream genes regulated by both SDG8 and PAF1C. To this end, we selected DEGs with decreased transcript levels in sdg7 sdg8 and at least one PAF1 C-deficient mutant (Fig. 4a,b and Supplementary Table 15). We determined that 230 out of 467 genes (49%) are commonly downregulated in these mutants (Fig. 4a), suggesting that SDG8 primarily mediates target gene expression in conjunction with PAF1C. We next focused on PAF1C-regulated DEGs that exhibit CLF-dependent repression and contributed to the partial phenotypic rescue of the clf sdg7 sdg8 triple mutant relative to sdg7 sdg8 (Fig. 4c). Several genes, including FLC, were significantly repressed by CLF (Fig. 4c). Indeed, CLF and SDG7 binding peaks were present near the TSS of these loci (Fig. 4d). In addition, we observed H3K27me3 deposition along the gene body at these loci in wild type, while the spreading of the silencing modification disappeared in the clf mutant. By contrast, we noticed the higher enrichment of SDG8, RNA Pol II and its phosphorylated versions for elongation²⁶ near the TSS and 3′ end pause site of RNA Pol II (Fig. 4d). We also detected weaker SDG8 and RNA Pol II enrichment along the gene body for these loci (Fig. 4d). The presence of peaks near the TSS and 3′ end pause site of RNA Pol II correlated well with the positions of H3K36me3 and H3K36me2 peaks, respectively (Fig. 4d). Consistent with the levels of H3K36me3, H3K36me2 and H3K27me3 at these loci in the sdg7 sdg8 mutant, the expression levels of FLC, At4g13572, At4g13575, UMAMIT14 and COOLAIR were lower in sdg7 sdg8 compared to wild type and clf (Fig. 4d,e). In addition, the expression levels of most of these genes returned to wild-type levels in the clf sdg7 sdg8 triple mutant (Fig. 4e). Our combined data suggest that SDG proteins evict PRC2 from PREs to prevent H3K27me3 deposition and activate target gene expression via transcription-coupled H3K36 methylation (Fig. 4f).

SDG8 and PAF1C propagate H3K36 methylation together with RNA Pol II
a, Venn diagram showing overlapping downregulated genes in *sdg7 sdg8* vs wild type (WT), *ef7* vs WT, *vip3* vs WT, *vip4* vs WT, *vp5*vs WT and *vip6* vs WT. b, Heatmap representation of gene expression levels for 230 PAF1 C-regulated genes in WT, *elf7, vip3, vip4, vip5*and *vip6*. RPKM values are shown. c, Venn diagram showing the overlap between 230 PAF1C complex–regulated genes, upregulated genes in *clf* vs WT, and upregulated genes in *sdg7 sdg8 clfvs sdg7 sdg8*. d, ChlP-seq signals for CLF, H3K27me3, SDG8, SDG7, RNA Pol II, RNA Pol II-S2P, RNA Pol II-S5P H3K36me3 and H3K36me2 at the selected target loci and their expression levels in WT, *clf, sdg7 sdg8* and *sdg7 sdg8 clf*. The gene models are shown as black bars and lines at the bottom of each panel. e, Expression levels of *COOLAIR, FLC*, At4g13572, At4g13575 and *UMAMIT14* in WT, *clf, sdg7 sdg8and sdg7 sdg8 clf* Values are means ± SE (n = 3 independent experiments). f, Molecular mechanism of K27/K36 methylation propagation by PRC2 and SDG8/SDG7. H3K27me3; purple circles. H3K36me3; green circles. H3K36me2; blue circles.

Discussion

Here we identify SDG7 and SDG8 as key regulators that overcome PRC2-mediated silencing, leading to a switch from H3K27 to H3K36 methylation during growth and development. We provide a molecular framework for the antagonistic roles of SDG7 and PRC2, which relies on their competition at PREs in the regulatory regions of their shared target loci. Key targets, under dual opposite transcriptional regulation by SDG7 and PRC2, are controlled by SDG8 and the RNA Pol II elongation complex. The binding of these factors balances global distribution patterns between methylation of H3K36 and H3K27 and determines expression levels.

In Arabidopsis, PRC2 binds to PREs consisting of GAGA, telobox and G-box motifs. Trans-acting factors such as BPC, Arabidopsis ZINC-FINGER PROTEINs (AZFs), SCREAM and MUTE recruit PRC2 to PREs for H3K27me3 deposition and gene silencing^9,27. We uncover a PRC2 eviction strategy that enables the alteration of epigenetic states. SDG7 is recruited to PREs and displaces PRC2, facilitating H3K36 methylation and activation of the target gene. Hence, the opposing dynamics of H3K27 and H3K36 methylation is regulated by cis-localized DNA motifs. It has been reported that the MADS-box protein AG can evict PRC2 from PREs at specific loci²⁸. Interestingly, CArG boxes, which are important for MADS box binding, were significantly enriched under the SDG7 binding peaks. The relationship between SDG7 and MADS-box proteins in the context of PRC2 displacement needs to be examined in the future. Although we identified many cis-elements associated with PRC2 occupancy, we did not identify the telobox among them. Hence, additional factors that can displace PRC2 from PREs besides SDG7 likely exist and may act at different stages in different tissues.

Our data suggest that SDG8 and RNA Pol II (as well as its phosphorylated versions Ser2P and Ser5P) are located near the TSS and 3′ ends of target genes. We observed H3K36me3 and H3K36me2 peaks at the TSS and 3′ end pause sites of RNA Pol II, and these signals were significantly weaker in sdg7 sdg8, suggesting the critical role of SDG8 and the RNA Pol II elongation complex for H3K36me3 and H3K36me2 deposition. Differentially H3K36me3-methylated genes maintained their H3K36me3 peaks at their 5′ ends in the wild type and the sdg7 sdg8 double mutant. Additional support for this pattern comes from previous ChIP assays in the sdg8 single mutant¹¹, suggesting that SDG8 may primarily function in the propagation of H3K36me3 rather than its initial deposition. Thus, additional histone methyltransferases may contribute to H3K36me3 deposition near the TSS, even in the sdg7 sdg8 double mutant background. Notably, H3K36me2 was almost absent from differentially H3K36me2-methylated genes in the sdg7 sdg8 double mutant. H3K36me2 functions not only in the elongation and/or termination steps of mRNA transcription from the 5′ end, but also in the initiation of transcription from the 3′ end, as observed with COOLAIR.

Changes in the relative balance of mutually exclusive histone modifications occur during plant development. Due to the different distribution of histone modification patterns along the genome between plants and animals, gaining mechanistic insight into the antagonism in plants will benefit not only plant development but also the evolution of gene expression systems.

Methods

Plant materials and growth conditions

All Arabidopsis (Arabidopsis thaliana) lines analyzed in this study are in the Columbia (Col-0) accession. sdg8-2²⁹ (SALK_026442), sdg7-2²⁹ (SALK_131218), sdg7-3 (SALK_072682), sdg7-4 (WiscDsLox430F09), clf-28 (SALK_139371). proLBD16:GUS30 proPLT2:PLT2-YFP3, proSHR:SHR-GFP3², DR5:GUS33, proWOX5:NLS-GFP3⁴ proCYCB1;2:CYCB1;2-NLS-YFP⁵, proCLF:CLF-GFP3 were used. Primers used for genotyping are listed in Supplementary Table 16.

Arabidopsis seedlings were grown on half-strength Murashige and Skoog (MS) medium plates. MS salts (Duchefa Biochemie), 2-(N-morpholino)ethanesulfonic acid (Nacalai Tesque, Kyoto) and 0.8% (w/v) agar (Nacalai Tesque, Kyoto) were added to distilled water. The pH was adjusted from 4.3 to 5.6 with KOH. After autoclaving at 121°C for 20 min, 50 ml medium was poured into individual sterile No. 2 square culture dishes (Eiken Chemical) on a clean bench.

Seeds were surface sterilized with 70% (v/v) ethanol for 10 min, washed three times with sterile distilled water and sown on half-strength MS plates. After stratification at 4°C in the dark for up to 7 days, the plates were placed in dish drainer trays located in a growth chamber at 22°C under constant light conditions (200 μl m² s^-1).

Cloning and plant transformation

To generate the construct harboring the SDG8 promoter, the genomic region preceding the gene was amplified from Col-0 genomic DNA with primers proSDG8-FW and proSDG8-RV using PrimeSTAR GXL DNA polymerase (TaKaRa) through gradient PCR. The resulting PCR product was subjected to electrophoresis on an agarose gel, purified from the gel using a gel/PCR extraction kit (NIPPON Genetics) and then inserted into pENTR/D-TOPO (Thermo Fisher Scientific) using Gateway cloning. The identity of the inserted sequence was confirmed via Sanger sequencing on an Applied Biosystems 3130 genetic analyzer (Applied Biosystems). Supplementary Table 16 provides the list of primers employed for the promoter cloning. The promoter fragments were recombined into the pBGWSF7.0 GUS-GFP³⁷ vector via LR reaction facilitated by LR Clonase II mix (Thermo Fisher Scientific). Following verification of the junction between the promoter insertion and GUS-GFP, the resulting plasmids were introduced into Agrobacterium (Agrobacterium tumefaciens strain GV3101. Positive Agrobacterium colonies were used for transformation of Arabidopsis plants through the floral dip method³⁸. For gSDG7-VENUS and gSDG7-GUS, a 4461 -bp SDG7genomic fragment (from - 1051 bp to + 3410 bp; the A of the start codon was set as +1) amplified with the primers gSDG7-FW and gSDG7-RV was cloned into the pCR8/GW/TOPO vector (Invitrogen). An Sfol restriction site was introduced upstream of the SDG7 stop codon through mutagenesis PCR using the primers mSDG7-SfoI-FW and mSDG7-SfoI-RV. VENUS and GUS fragments were then inserted into the SfoI site of the pCR8/GW/TOPO-gSDG8 plasmid. Finally, the gSDG7-VENUS or gSDG7-GUS cassette was recombined into pEarleyGate303 using LR Clonase II (Invitrogen). To generate the pro35S:SDG7-GR plasmid, a SDG7 cDNA fragment of 1174 bp with 85 bp of the 5′ untranslated region (5′ UTR) was amplified by the primers SDG7-KpnI-FW and SDG7-ApaI-RV. After digestion, the SDG7 cDNA without the stop codon was inserted into pGreen0281 in-frame with the sequence encoding GR-HA.

The resulting plasmids were introduced into Agrobacterium (strain GV3101. Positive Agrobacterium colonies were used for transformation of Arabidopsis plants through the floral dip method³⁸.

Data comparison

Publicity available expression datasets were obtained from The Bio-Analytic Resource for Plant Biology database (http://bar.utoronto.ca) or from published work¹³. After reorganizing gene expression levels in Microsoft Excel (Version 16.79.2), a heatmap was generated with the packages heatmaply (Version 1.4.2) or superheat (Version 0.1.0) in R (Version 4.3.1). Venn diagrams were generated using Venny²·¹ (https://bioinfogp.cnb.csic.es/tools/venny/) or BioVenn (https://www.biovenn.nl) on the respective websites, or ggVennDiagram (Version 1.2.3) in R (Version 4.3.1). p-values were calculated based on the hypergeometric test (http://nemates.org/MA/progs/overlap_stats_prog.html). Upset plots were generated using the UpSet R Shiny App (https://gehlenborglab.shinyapps.io/upsetr/).

Protein structure prediction and comparison

Protein structure was predicted using Alpha fold 2.0³⁹ (Version 1.5.3) on Google Colab (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). The amino acid sequences for five class II SDG family members were obtained from the TAIR website (https://www.arabidopsis.org). The crystal structure data of the ATXR5 SET domain in complex with K36me3 (PDB entry 5VAC¹⁷) was obtained from the Protein Data Bank (https://doi.org/10.2210/pdb5VAC/pdb). Comparison between predicted SDG proteins and ATXR5 was conducted using the PyMOL Molecular Graphics System (Version 2.5.2).

Phenotyping

For plant height analysis, plants were grown on soil. The final height of the main stem bearing an inflorescence was measured. Photographs of the entire plant were taken, and plant height was measured with ImageJ software (NIH).

For root phenotypic analysis, seedlings were grown on vertically oriented MS plates. The root length of seedlings 7 days after germination (7 DAG) was measured. Photographs of the entire seedling were taken. The length from the base of the hypocotyl to the root tip was measured with ImageJ software (NIH). To quantify lateral root number, the roots of 7-DAG seedlings were used. The lateral root number was counted using a microscope (Olympus SZ).

For leaf phenotypic analysis, the 5th true leaves from plants grown on soil were used. Leaves were excised with forceps and placed into glass vials containing FAA (formalin: ice-cold acetic acid:70% [v/v] ethanol at a volume ratio of 1:1:18) for fixation. After 30 min in Faa under vacuum, the leaves were transferred to a chloral hydrate/glycerol/H2O mixture solution (8 g: 1 mL: 2 mL) and incubated for at least 16 h until clear. Leaves were then placed onto glass slides (Matsunami Glass Industry Co., Ltd.), and covered with coverslips (Matsunami Glass Industry Co., Ltd.). Images were taken with an AXIO Zoom V16 stereo microscope (Zeiss). Leaf area and leaf length from the leaf tip to the petiole were measured with ImageJ software (NIH). To quantify palisade cell size and number, the same leaf samples were used. The cells were imaged using an AxioScope A light microscope (Zeiss). The epidermal cell area was measured with ImageJ software (NIH). Cell numbers were calculated from leaf and cell areas.

For petal phenotypic analysis, plants were grown on soil. Petals from stage 15 flowers were used. The petal area was measured with ImageJ software (NIH). To quantify cell size and number, petal epidermal cells were imaged by scanning electron microscopy (SEM). For SEM, petals were placed in FAA (45% [v/v] ethanol, 2.5% [v/v] formaldehyde, and 2.5% [v/v] acetic acid), vacuum-infiltrated until the tissues sank and incubated at room temperature for at least 16 h. The fixed tissues were then passed through a gradient ethanol series (50% [ethanol:water, v/v], 60%, 70%, 80%, 90%, 95%, 100% × 2) for 20 min each, followed by a gradient acetone series (25% [acetone:ethanol, v/v], 50%, 75%, 95%, 100% × 2) for 30 min each. Then, the tissues were dried with an EM CPD300 critical point dryer with liquid CO₂ (Leica Microsystems). Samples were gold-coated with a coating time of 45 s using a gold coater instrument (Hitachi E 1010). The tissues were imaged with a S-4700 scanning electron microscope (Hitachi) with an accelerating voltage of 15 kV. At least 10 petals for each genotype were observed; representative images are shown. Cell area was measured with ImageJ software (NIH). Cell numbers were calculated from petal and cell areas.

All plots were generated using ggplot2 (Version 3.4.3) in R (Version 4.3.1). Significant differences were determined through one-way analysis of variance (ANOVA) tests, followed by a post-hoc Tukey’s honest significant difference (HSD) test (https://astatsa.com/OneWay_Anova_with_TukeyHSD/).

ChIP-seq

ChIP-seq was conducted following the procedures reported previously with minor modifications⁴⁰. Two grams of 5-day-old seedlings were used. Frozen tissues were ground to a fine powder using a mortar and pestle and transferred to nuclei isolation buffer (10 mM HEPES, 1 M sucrose, 5 mM KCl, 5 mM MgCl₂, 5 mM EDTA, 16% [v/v] formaldehyde, 20% [v/v] Triton X-100, β-mercaptoethanol, 100 mM Pefabloc) for 10 min to crosslink proteins and DNA. The reaction was quenched by adding 0.125 M glycine. After removal of debris by filtration through two layers of Miracloth, the resulting solution was centrifuged at 3,000g for 10 min at 4°C. Pellets were resuspended in nuclei isolation buffer. Chromatin was separated using nuclei separation buffer (1 M HEPES (pH = 7.6), 3 M KCl, 1 M MgCl₂, 0.5 M EDTA) by centrifugation. The pellets were dissolved in SDS lysis buffer (1 M Tris-HCl (pH = 7.8), 10% [w/v] SDS, 0.5 M EDTA) and ChIP dilution buffer (50 mM Tris-HCl (pH = 7.8), 0.167 M NaCl, 1.1% [v/v] Triton-X, 0.11% [v/v] sodium deoxycholate). Then, sonication (Time: 20 min, Duty cycle: 5%, Intensity: 4, Cycles per burst: 200, Temperature: 5°C, Power mode frequency: Sweeping) was conducted with a Covaris M2 (M&S instruments) to obtain approximately 300-bp sheared DNA fragments. Immunoprecipitation was conducted using anti-H3K36me3, anti-H3K36me2 and anti-H3 antibodies (H3K36me3, ab9050; H3K36me2, ab9049; H3, abcam), anti-GFP antibody (SAB4701015; Sigma-Aldrich), and Dynabead Protein A (Thermo Fisher) for at least 16 h at 4°C. Beads were washed with low salt RIPA buffer (50 mM Tris-HCl (pH = 7.8), 0.15 M NaCl, 1 mM EDTA, 0.1% [w/v] SDS, 1% [v/v] Triton-X, 0.1% [v/v] sodium deoxycholate), high-salt RIPA buffer (50 mM Tris-HCl (pH = 7.8), 0.3 M NaCl, 1 mM EDTA, 0.1% [w/v] SDS, 1% [v/v] Triton-X, 0.1% [v/v] sodium deoxycholate), LNDET buffer (0.25 M LiCl, 1% [v/v] IGEPAL, 1% [v/v] sodium deoxycholate, 1 mM EDTA, 10 mM Tris-HCl (pH = 7.8)) and 1× TE (10 mM Tris-HCl, 0.1 mM EDTA pH 8.1). Then, ChIP dilution buffer was added to beads and incubated at 65°C overnight to reverse the crosslinking. RNase and Proteinase K were then added to digest residual RNA and proteins. ChIP DNA was purified with a Monarch PCR & DNA Cleanup Kit (Monarc). To generate libraries for sequencing, a ThruPLEX DNA-seq kit (RUBICON GENOMICS) was used following the manufacturer’s instructions. Libraries were purified with AMPure beads (Beckman Coulter) and sequenced on a NextSeq 500 or NovaSeq 6000 instrument (Illumina).

Mapping of ChIP-seq data for histone modification was performed as reported previously³⁹. Using Bowtie (version 1.2.2), bam files were created by selecting the same sequences matching the Arabidopsis thaliana TAIR10 reference genome (https://www.arabidopsis.org). Based on the bam files, bed files were created with bedtools (Version 2.27.0). The coverage was determined in R (Version 3.5.2). Mapping of binding peaks for histone modification enzymes was also conducted as described previously²⁸. The raw sequencing reads underwent quality checks and trimming using FastQC (Version 0.11.7) and Trimmomatic (Version 0.38), respectively. Subsequently, mapping was carried out with Bowtie2 (Version 2.3.4.2) using the resulting fastq files. Peaks were then identified using SICER (Version 1.1). Read counting was performed through featureCounts (Version 1.6.3). Heatmaps were generated using deeptools (Version 3.2.1). Motif analysis was conducted using HOMER (Version 4.1). Binding peaks were visualized in the Integrative Genomics Viewer (IGV) Web App (https://igv.org/app/). The ChIP-seq data have been submitted to the DDBJ database (SDG7/SDG8: DRA17907, H3K36me2: DRA17938, and H3K36me3: DRA17939).

Gene Ontology (GO) term enrichment analysis

GO term analysis was conducted with the online tool g:Profiler⁴¹ (https://biit.cs.ut.ee/gprofiler_beta/gost) with g:GOS function. To reduce duplicated GO terms, the agriGO⁴² database was used (http://bioinfo.cau.edu.cn/agriGO/). The p-values produced by the GO analysis were filtered by REVIGO⁴³ (http://revigo.irb.hr/). Original files of the network and tree maps were generated by the REVIGO website (http://revigo.irb.hr/). The graphs of the tree maps were created using ggplot2 (Version 3.4.3) in R (Version 4.3.1). The final network figure was generated in Cytoscape (Version 3.10.1).

Chromatin immunoprecipitation quantitative PCR (ChIP-qPCR)

ChIP-qPCR was conducted following the procedures reported previously with minor modifications⁴⁴. To conduct ChIP up to 0.8 g (fresh weight) of 5-day-old seedlings grown on half-strength MS plates was harvested in vials filled with phosphate-buffered saline (PBS) (137 mM NaCl, 2.7 mM KCl, 10 mM Na₂HPO₄ and 2 mM KH₂PO₄, pH 7.4). Fixation was performed in 1% (v/v) formaldehyde for 15 min. During fixation, tissues were vacuum-infiltrated three to five times until they sank. Afterwards, samples were moved to PBS with 0.125 M glycine for 5 min to quench the fixation. Seedlings were rinsed with icecold PBS twice, frozen in liquid nitrogen, and stored in a deep freezer until use. Tissues were ground to a fine powder using a mortar and pestle and dissolved in Nuclei Extraction Buffer (100 mM MOPs pH 8.0, 10 mM MgCl₂, 0.25 M sucrose, 0.5% [w/v] dextran T-40, 2.5% [w/v] Ficoll 400, 9.36 μL/mL β-mercaptoethanol and 10 μL/mL protease inhibitors) to extract protein–DNA complexes. After filtration over two layers of Miracloth and centrifugation, chromatin pellets were resuspended in nuclei lysis buffer (10 mM EDTA pH 8.0, 50 mM Tris-HCl, pH 8.0, and 1% [w/v] SDS). ChIP dilution buffer (16.7 mM Tris-HCl pH 8.0, 1.2 mM EDTA, 167 mM NaCl, 1.1% [v/v] Triton X-100 and 0.01% [w/v] SDS) was then added to each sample. Sonication was conducted with an ultrasonic disruptor (TOMY) to obtain 200–700-bp sheared DNA. ChIP dilution buffer with 22% (v/v) Triton X-100 was mixed in with the sonicated samples and centrifuged to remove debris. After preclearing by Dynabeads™ Protein A (Thermo Fisher) and collection of input DNA, immunoprecipitation was conducted at 4°C overnight using Dynabeads and commercial antibodies (GFP SAB4301138; HA, 12C5; H3K27me3, ab6002; H3K36me3, ab9050; H3K36me2, ab9049 Abcam). Beads were successively washed in low salt buffer (2 mM EDTA, 20 mM Tris-HCl pH 8.1, 150 mM NaCl, 0.1% [w/v] SDS and 1% [v/v] Triton X-100), LiCl buffer (0.25 M LiCl, 1% [w/v] Nonidet P-40, 1% [w/v] deoxycholate, 1 mM EDTA, 10 mM Tris-HCl, pH 8.1), and Tris EDTA buffer (TE, 10 mM Tris-HCl, 0.1 mM EDTA, pH 8.1) twice each for 10 min at 4°C. Elution of protein–DNA from beads was conducted in nuclei lysis buffer by incubation for 25 min at 65°C twice. After combining both eluates, crosslinking was reversed with NaCl overnight at 65°C. DNA was purified with a QIAquick PCR purification kit (Qiagen) according to the manufacturer’s instructions with slight modifications. qPCR was performed as described above. Primer sequences are listed in Supplementary Table 16. Significant differences were calculated based on one-way ANOVA tests followed by a post-hoc Tukey’s HSD test.

RNA-seq

Total RNA was extracted from the entire root of 5-day-old MS-grown seedlings. An RNeasy plant mini kit (Qiagen) was employed to extract RNA following the manufacturer’s instructions. Using total RNA treated with DNase I (Qiagen), sequencing libraries were prepared using a NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB) and NEBNext Ultra TMII Directional RNA Library Prep Kit (NEB). After confirmation of library quality on a bioanalyzer (Agilent), sequencing was conducted using a NovaSeq 6000 instrument (Illumina).

Prior to mapping, the raw sequencing reads underwent quality checks and trimming using FastQC (Version 0.11.7) and Trimmomatic (Version 0.38), respectively. Subsequently, mapping against TAIR10 genome was carried out with HISAT2 (Version 2.1.0) using the resulting fastq files. Read counting and calculation of FPKM and TPM were performed through featureCounts (Version 1.6.3). Hierarchical clustering and principal component analysis were conducted with the R package stat (Version 3.6.1). Differentially expressed genes were identified using DESeq2 (Version 1.24.0). Volvano plots were generated by plotly (Version 4.9.2.1). Reads were visualized in the Integrative Genomics Viewer (IGV) Web App (https://igv.org/app/). The RNA-seq data have been submitted to the DDBJ database (DRA17906).

β-glucuronidase (GUS) staining

For GUS staining, 5-day-old seedlings grown on MS plates were fixed in ice-cold 90% (v/v) acetone for 10–15 min. The samples were washed with sterile water, followed by GUS buffer twice (50 mM phosphoric acid buffer, 0.5 mM K₃Fe[CN]₆ [Nacalai Tesque, Kyoto], 0.5 mM K₄Fe[CN]₆ [Nacalai Tesque, Kyoto]). Subsequently, tissues were incubated at 37°C for 30 min to 1 h in GUS buffer with 1 mM X-Gluc (5-bromo-4-chloro-3-indolyl β-D-glucuronide) (Nacalai Tesque, Kyoto). To stop the staining reaction, samples were transferred into 70% (v/v) ethanol.

To observe GUS-stained tissues, samples were washed in sterile water three times, incubated in 4% (v/v) HCl with 20% (v/v) methanol for 15 min at 55°C. Then, the solution was changed to 70% (v/v) ethanol with 7% (w/v) NaOH and incubated for another 15 min at room temperature. Subsequently, tissues were successively incubated in 40% (v/v), 20%, 10% ethanol for 10 min. Finally, samples were incubated in 10% (v/v) ethanol with 50% (v/v) glycerol for 30 min and observed with an upright microscope (Zeiss Scope.A1) by differential interference contrast microscopy.

Confocal microscopy

Fluorescence observation of roots and seedlings was conducted using an FV1000 (Olympus), LSM710, or SP8 microscope (Leica). To generate seedling sections, tissues were placed into DISPOSABLE BASE MOLDS M475-2 (Simport) containing 5% (w/v) agar (Nippon Gene). The molds were kept for at least 5 min to form agar blocks. Sections of 50 μm thickness were then prepared using a Linear Slicer PRO7 (Dosaka-em). Tissues were sandwiched between a NEO MICRO cover glass (MATSUNAMI) and a slide glass (MATSUNAMI) for GFP fluorescence observation. During the observation, a water-immersion lens (x60) was used as the objective lens, and the chosen channel was the GFP channel (500–570 nm).

Fluorescence observation of the inflorescences was conducted using a LSM900 microscope (Carl Zeiss) as described previously⁴⁵. After cutting the tip of the primary inflorescence to a length of approximately 0.5 cm with a razor blade, the primary inflorescence was immersed in a 1 mg/ml solution of propidium iodide (PI) for 2 min to stain the cell walls. The stained inflorescences were then inserted upright into halfstrength MS medium, ensuring the plant surface was submerged in deionized water. After preparing the samples, fluorescence was promptly observed, and both 2D and 3D images were acquired. For observation, a water-immersion lens (×20) was used as the objective lens. In terms of the optical path setup, the Airy scan mode was selected, and the channels chosen were the GFP channel (500–570 nm) and the PI channel (590–700 nm). Additionally, the acquisition of 3D images involved moving the z-axis continuously to scan and create images.

Declarations

Data Availability

Protein structure data was obtained from Protein Data Bank (5VAC: https://www.rcsb.org/structure/5vac).

ChIP-seq and RNA-seq data were collected from the National Center for Biotechnology Information GEO database (CLF/H3K27me3 ChIP: GSE108960 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108960]; AZF1/BPC1/H3K27me3 ChIP: GSE84483 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84483], Pol II/Pol II-S2P/Pol II-S5P ChIP: GSE112443 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112443], and vip3-6/elf7 RNA-seq :GSE171778 [https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/geo/query/acc.cgi?acc=GSE171778]). The RNA-seq and ChIP-seq data have been deposited in the DDBJ database (DRA017906 [https://ddbj.nig.ac.jp/resource/sra-submission/DRA017906], DRA017907 [https://ddbj.nig.ac.jp/resource/sra-submission/DRA017907], DRA017938 [https://ddbj.nig.ac.jp/resource/sra-submission/DRA017938], and DRA017939 [https://ddbj.nig.ac.jp/resource/sra-submission/DRA017939]. Source data are provided with this paper.

Code Availability

Main source codes that are used in this work are available on the GitHub repository [https://zefeng2018.github.io/Plant-Gene-Expression-Prediction/]. Please refer to “Methods” above for details of key parameters and package versions.

Author contributions

Conceptualization: W.Y., M.A., Y.K., Y.X., T.I. and N.Y. (lead) and all other authors (supporting); data curation: N.Y.; formal analysis: W.Y., M.A., Y.K., T.S., K.I., T.T., S.I., T.S. and N.Y.; funding acquisition: T.I. and N.Y.; investigation: W.Y., M.A., Y.K., T.S., K.I. and N.Y. (lead) and T.T., S.I., T.S., T.K., Y.X. and T.I. (supporting); project administration: T.I. and N.Y.; software: N.Y.; supervision: T.I. and N.Y.; validation: N.Y.; visualization: N.Y.; writing: N.Y. (original draft) and all authors (review and editing).

Competing interests

The authors declare no competing interests.

Additional Declarations:

There is NO Competing Interest.

Acknowledgements

We thank Akie Takahashi, Hiroko Egashira, Mizuki Shinoda Taeko Kawakami and Rie Shimano for technical assistance, TAIR for providing sdg7-2, sdg8-2, sdg8-3, sdg8-4, clf-28and DR5:GUSseeds, Dr. Keiji Nakajima (Professor in Nara Institute of Science and Technology) for giving proPLT2:PLT2-YFP, proSHR:SHR-GFP and proWOX5:NLS-GFP and Dr. Masaki Ito (Professor in Kanazawa University), Dr. Caroline Dean (Professor in John Innes Centre) and Dr. Hidehiro Fukaki (Professor in Kobe University) for sharing proCYCB1;2:CYCB1 ;2-YFP, proSDG8:SDG8-GFPand proLBD16:GUSseeds, respectively. Computations were partially performed on the National Institute of Genetics (NIG) supercomputer at the Research Organization of Information and Systems, NIG. This work was supported by a grant from a JSPS KAKENHI Grant-in-Aid for Scientific Research B (No. 23H02503), a Grant-in-Aid for challenging Exploratory Research (No. 19K22431), a Grant-in-Aid for Transformative Research Area (A) (No. 21H05663, 23H04968), a grant from the Daiichi Sankyo Foundation of Life Science, a grant from the LOTTE Foundation, and a grant from Ohsumi Frontier Science Foundation to N.Y., a JSPS KAKENHI Grant-in-Aid for Scientific Research C (No. 22K06180) to TKT, and a JSPS KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas (No. 21K19266), a JSPS KAKENHI Grant-in-Aid for Scientific Research A (No. 20H00470), and a Grant-in-Aid for Transformative Research Area (A) (No. 22H05176) to T.I.