Abstract
The biological activities of organisms are closely linked to seasonality. Phenology, the temporal orchestration of biological activities, is governed by gene expression, yet the evolutionary dynamics underlying seasonal gene expression remain unclear. To investigate these dynamics, we compared genome-wide expression dynamics (molecular phenology) in four dominant evergreen Fagaceae species in Asia (Quercus glauca, Q. acuta, Lithocarpus edulis, and L. glaber), using leaf and bud tissues over two seasonal cycles. We assembled high-quality reference genomes, identifying 11749 single-copy orthologous genes. Seasonal transcriptomic profiling of these orthologous genes revealed highly conserved gene expression across species in winter when temperatures fall below ∼10°C. Rhythmic gene expression with significant periodic oscillations was more prevalent in buds (51.9%) than in leaves (40.6%), with most rhythmic genes (78.4–92.0%) exhibiting annual periodicity, while a smaller fraction (1.2–11.9%) followed half-annual cycles. The seasonal peaks of rhythmic genes were highly synchronized across species in winter but diverged during the growing season, reflecting species-specific timing of leaf flushing and flowering. These findings suggest that the four species share a common molecular calendar in winter, which constrains the evolution of gene expression under seasonal environments.
Introduction
Fundamental biological processes, such as growth, mortality, and reproduction, rarely remain constant over time; instead, they exhibit pronounced variation in response to seasonal environmental changes (Fretwell, 1972). This temporal organization of a series of biological activities, known as phenology, plays a critical role in the evolution of life histories in response to seasonal fluctuating environments (Forrest and Miller-Rushing, 2010; Morellato et al., 2013; Piao et al., 2019; Schwartz, 2013). Phenology is regulated by the coordinated expression of genes that enable organisms to exhibit essential functions at the appropriate times (Kudoh, 2016; Satake et al., 2024a, 2022). However, gene expression dynamics underpinning seasonal responses in organisms remain not fully understood.
To explore the evolution of gene expression, transcriptome analyses are widely employed to compare transcript profiles across organs, sexes, species, and developmental stages within a given environment. Such studies have been conducted in mammals (Brawand et al., 2011; Cardoso-Moreira et al., 2019; Naqvi et al., 2019), Drosophila (Lemos et al., 2005; Nuzhdin et al., 2004), and Tanzanian cichlids (El Taher et al., 2021), providing key insights into the evolutionary dynamics of gene expression and its central role in shaping phenotypic diversity. These studies have revealed that gene expression profiles tend to be more conserved across species within the same tissue, whereas the evolutionary rate of gene expression varies among organs. In mammals, for instance, nervous tissue exhibits a slower rate of gene expression change compared to the testis (Brawand et al., 2011; Cardoso-Moreira et al., 2019; Naqvi et al., 2019), suggesting the presence of tissue-specific functional constraints that influence the evolution of gene expression.
In addition to tissue-dependent constraints, seasonal environmental changes may play a crucial role in shaping the evolution of gene expression, particularly by influencing the temporal coordination of gene expression under natural conditions. Gene expression responses to certain seasons may be more resistant to change, whereas in other seasons, divergence in gene expression timing across species may occur, contributing to temporal niche differentiation. Despite its potential importance, the influence of seasonal environmental fluctuations on the gene expression evolution remains poorly understood, largely because most transcriptomic analyses have been conducted under tightly controlled laboratory conditions.
To investigate the evolution of gene expression in seasonal environments, we performed a molecular phenology analysis on Fagaceae tree species, which are widely distributed across the Northern Hemisphere, with its center of diversity in subtropical Southeast Asia (Hipp et al., 2020; Kremer et al., 2012; Manos et al., 2001; Zhou et al., 2022). Molecular phenology refers to transcriptome analysis conducted under natural fluctuating conditions (Komoto et al., 2024; Kudoh, 2016; Nagano et al., 2019; Satake et al., 2023) and provides a valuable framework for studying gene–environment interactions and their role in orchestrating the temporal biological functions essential for adaptation to environmental fluctuations (Satake et al., 2022). In this study, we selected four evergreen Fagaceae species—two Quercus species (Q. glauca and Q. acuta) and two Lithocarpus species (L. edulis and L. glaber)—that co-occur in the same locality in Japan. Q. glauca, L. edulis and L. glaber inhabit nearly identical environment, while Q. acuta is restricted to the high-altitude regions with slightly colder environment compared to the other three species. Leaf flushing and flowering phenology vary across species, with Q. glauca and Q. acuta flowering in early spring, almost simultaneously with leaf flushing, L. edulis flowering in late spring, and L. glaber flowering in autumn, approximately one to two months after leaf flushing. This setting enabled us to compare seasonal gene expression profiles across species with diverse phenology, both within and between genera, under nearly identical environmental fluctuations. By assessing and comparing seasonal transcriptomic dynamics across Fagaceae species, our study provides new insights into the evolution of gene expression and adaptation to seasonal natural environments.
Results
Genome assemblies of Q. glauca and L. edulis
To quantify transcript abundance of non-model Fagaceae species, we first constructed reference genomes for Q. glauca, as a representative of genus Quercus, and L. edulis, as a representative of genus Lithocarpus (Fig. 1A). To obtain high-quality chromosome-level reference genomes, we conducted a hybrid genome assembly for the HiFi reads and the high-throughput chromosome conformation capture (Hi-C) reads (Table S1).The resultant scaffolds were 867.7 and 835.7 Mb and consisted of 797 and 750 scaffolds including 12 chromosome-level scaffolds with contig N50 of 71.7 Mb and 68.8 Mb for Q. glauca and L. edulis, respectively (Table S1). Approximately 94.8% and 97.2% of the final assemblies were assigned to 12 pseudochromosomes of Q. glauca and L. edulis, respectively (Figure 1B). The number of chromosome-scale scaffolds were identical to the known numbers in other Quercus (Zoldos et al., 1999) and Lithocarpus species (Liu et al., 2023). The genomes were estimated to contain 39758 and 34059 protein-coding genes (33947 and 30246 genes were annotated within 12 chromosomes), covering 98.3% and 95.0% of complete Benchmarking Universal Single Copy Orthologs (BUSCO) genes (eudicots_odb10) for Q. glauca and L. edulis, respectively. The genome sizes of Q. glauca and L. edulis were estimated to be 891.2 Mb and 868.7 Mb, respectively, using KmerGenie (Chikhi and Medvedev, 2014) and GenomeScope2 (Ranallo-Benavidez et al., 2020).

Synteny and distribution of genomic features of Q. glauca and L. edulis genomes.
(A) A phylogenetic tree of five Fagaceae species from three genera, Quercus, Lithocarpus, and Fagus. Bootstrap values for all branches are 100. (B) Synteny relationship between the Q. glauca and L. edulis genomes. The collinear blocks within the genomes were displayed by lines. Black lines show reverse alignment and gray lines represent regular. Numbers correspond to the 12 chromosomes. (C–D) Circos plots of the Q. glauca (C) and L. edulis genomes (D). Lanes depict circular representation of chromosome length (Mbp) (a), gene density (b), GC density (c), and repeat (d). Lines in the inner circle represent links between synteny-selected paralogs. (E) Gene copy-number variations between the Q. glauca and L. edulis genomes. The bar plot illustrates the number of genes in gene families categorized into 1:1 orthologs, N:N (many-to-many), 1:N (one-to-many), N:1 (many-to-one), and species-specific unique genes. (F–G) Mean gene expression levels in each copy number variation category for Q. glauca (F) and L. edulis (G). Different letters denote statistically significant differences (p < 0.05, Steel-Dwass test).
Overall, the abundance and distribution of protein-coding genes, GC content, and repeat sequences (Fig. 1C, D; Table S1) and syntenic structure (Fig. 1B) were similar between Q. glauca and L. edulis. Using OrthoFinder2 (Emms and Kelly, 2019), we clustered genes from the two newly assembled genomes into orthologous groups. We identified 13427 single-copy orthologs, enabling pairwise comparisons between species (Fig. 1E). Single-copy genes exhibited significantly higher average expression levels than multi-copy genes (Steel-Dwass test, p < 0.05; Fig. 1F, G). These single-copy orthologous genes were used for further analyses.
Seasonal gene expression dynamics
To characterize the evolution of seasonal gene expressions, we generated a total of 483 transcriptome profiles every four weeks over two years using leaf and bud tissues from Q. glauca, Q. acuta, L. edulis, and L. glaber during 2021–2023. Reads from the two Quercus species were mapped to the newly constructed Q. glauca reference genome, achieving a mean mapping rate of 87.0 ± 5.6%. Similarly, reads from the two Lithocarpus species were mapped to the newly constructed L. edulis reference genome, with a mean mapping rate of 90.0 ± 3.0%. The number of reads uniquely mapped on the reference genome was 19.9 million reads per sample, on average. Among the single copy orthologous genes, we excluded genes with low expression levels (mean of Reads Per Kilobase (RPK) across samples for each species < 1) and normalized the transcript count data using the GeTMM method (Smid et al., 2018). This process resulted in a high-quality expression dataset for 11749 genes (Fig. S1A). For further analyses, we used the mean expression levels across three biological replicates (three individuals per species), except for L. edulis, where only one individual was available.
To gain an initial overview of gene expression dynamics in a seasonal environment, we assessed the relative magnitude of seasonal variation in expression levels for each gene by calculating the standard deviation divided by the mean for each gene (σi/μi) and compared it to the overall variation across all genes in the genome (σg/μg) (Fig. S1B). A prior study using 3025 yeast RNA-seq experiments reported that the dynamic range of individual genes across experiments was considerably smaller than the overall genomic variation under laboratory conditions (Zrimec et al., 2021). Consistently, our results demonstrate that despite substantial seasonal fluctuations in environmental conditions (Fig. S2), individual gene expression remains relatively conserved. On average, the magnitude of genome-wide variation was 15.8 times greater than the seasonal variation observed for individual genes (Fig. S1B).
To assess the similarity of gene expression profiles across tissues, species, and seasons, we performed a hierarchical clustering of seasonal transcriptome profiles from leaf and bud tissues across four species and identified five distinct clusters (cluster L1, L2, B1–B3) according to the Elbow method (Fig. 2A; Fig. S3A; Dataset S1). Transcriptional profiles first grouped into leaves (cluster L1 and L2) and buds (cluster B1–B3), irrespective of seasons and species (Fig. 2A), except that newly flushed leaf samples were grouped closely with buds (Fig. 2A; Fig. S4). Within the leaf cluster, transcriptome profiles were first clustered by genus and then by species in Lithocarpus (cluster L1; Fig. 2A). In contrast, samples from Quercus species formed a mixture of two species (cluster L2; Fig. 2A). Notably, the winter samples of Q. acuta, collected during periods of low mean temperatures, were distinctly separated from those collected in other seasons (cluster L2; Fig. 2A). Within the bud cluster, transcriptome profiles were predominantly grouped by season—winter (cluster B3; November to February, with the exception of a March sample from L. edulis; Dataset S1) versus the other growing seasons (cluster B1 and B2; March to November; Dataset S1)—and subsequently organized by genus and species. This suggests that the winter season exerts a stronger influence on gene expression in buds than phylogenetic relationships.

Seasonal gene expression dynamics.
(A) Hierarchical clustering of 213 gene expression profiles encompassing two tissues (leaf and bud) and four species (Q. glauca, Q. acuta, L. edulis, and L. glaber), collected over two-year. (B) Distribution of daily mean temperature on each sampling date across clusters, from L1 (top) to B3 (bottom). The Mann-Whitney U test was used to compare daily mean temperature between cluster L1 and L2 (p = 0.136). The Steel-Dwass test was applied for multiple comparisons across clusters B1–B3 (p < 0.05). Different letters denote statistically significant differences (p < 0.05). (C–D) Hierarchical clustering of 11749 genes based on the similarity of gene expression dynamics. Daily mean temperature, flowering and leaf flushing phenology (proportion) are shown in the upper three panels of the heatmaps.
Within the bud cluster, transcriptional profiles in the winter cluster were observed under significantly lower temperatures, approximately below 10°C, compared to other seasons (Fig. 2B; Fig. S5A). In contrast, no such temperature-related distinction was evident within the leaf cluster (Fig. 2B). Additionally, photoperiods were markedly shorter in the winter cluster in buds (Fig. S5B). These findings suggest that the buds of the four species exhibit a conserved response to environmental factors characteristic of the winter season. To further investigate the relationships among all samples, we conducted a principal component analysis (PCA). The first principal component (PC1) clearly separated samples by tissue type, while the second (PC2) and the third principal component (PC3) gradually differentiated winter samples from those collected during other seasons (Fig. S6).
To investigate seasonal expression dynamics of each gene, we conducted hierarchical clustering on the transcriptional dynamics of 11749 genes and identified five distinct gene sets characterizing different transcriptional modes (gene set I–V; Fig. 2C, D; Fig. S3B; Dataset S2). Gene set I exhibited predominant expression in buds, with seasonal peaks during summer, whereas gene set II was primarily expressed in leaves, highlighting clear tissue-specificity (Fig. 2C, D). Gene sets III and V demonstrated genus-specific expression patterns, with higher expression levels observed in Lithocarpus and Quercus, respectively (Fig. 2C, D). In contrast, gene set IV was expressed during winter across both leaf and bud tissues (Fig. 2C, D).
Detection of rhythmic genes
To identify genes revealing seasonal expression dynamics and determine the period and seasonal peaks in gene expression, we applied the RAIN algorithm (Thaben and Westermark, 2014). This approach enabled us to detect genes with rhythmic expression (hereafter rhythmic genes) and analyze their periods and phases (peak months). Our analysis revealed that 44.6–58.8% of genes in bud tissues and 33.9–45.9% in leaf tissues exhibited significant rhythmic expression across species (Fig. S7A; Dataset S3). Notably, bud tissues demonstrated a higher proportion of rhythmic genes, with 51.9% across four species on average, compared to 40.6% in leaf tissues (Fig. S7A).
Among the rhythmic genes, the majority (78.4–92.0%) exhibited annual periodicity, with a period of 12 ± 1 month observed across species and tissues (Fig. 3A, B, Dataset S3). Additionally, hundreds of genes (1.2–11.9%) displayed half-annual periodicity (6 ± 1 month) with these genes being most frequently observed in the bud samples of L. edulis (Fig. 3A, B). The fraction of genes exhibiting other periodicities was relatively small (5.7–14.1%; Fig. 3A, B). Based on the observed period distributions, we classified genes with periods within the range of 8–16 months as exhibiting annual periodicity (96.0%) and those with periods shorter than 8 months as having half-annual periodicity (4.0%), accounting for detection noise (Fig. S7A). Rhythmic genes with longer periods (>17 months) were also identified, although their periodicity may be less reliable due to the limited number of time points. Many genes with annual periodicity were shared across all species, including 1280 genes in leaves and 1894 genes in buds (Fig. S7B). In contrast, most genes exhibiting half-annual periodicity were species-specific, with no genes shared among all the species (Fig. S7C).

Distribution of period and peak month in rhythmic gene expression.
(A– B) Distribution of period length (month) among the genes with significant rhythmicity calculated by RAIN for leaf (A) and bud (B) samples of each species. (C–D) Distribution of seasonal peaks among the genes with annual periodicity. The angle of each bar represents the peak month, while the bar height indicates the number of genes. Purple and green squares around each rose plot denote the observed phenology of flowering and leaf flushing (proportion), respectively.
Peak month distribution of rhythmic genes and intra-genus and inter-genera comparison
To determine the seasonal timing of gene expression peaks, we focused on genes with annual or half-annual periodicity and visualized the density and direction of these peaks throughout the year using a circular histogram (Fig. 3C, D; Dataset S3). On average, 34.9% and 41.3% of annual rhythmic genes peaked in winter The biological activities of organisms are closely linked to seasonality. Phenology, the temporal orchestration of biological activities, is governed by gene expression, yet the evolutionary dynamics underlying seasonal gene expression remain unclear. To investigate these dynamics, we compared genome-wide expression dynamics (molecular phenology) in four dominant evergreen Fagaceae species in Asia (Quercus glauca, Q. acuta, Lithocarpus edulis, and L. glaber), using leaf and bud tissues over two seasonal cycles. We assembled high-quality reference genomes, identifying 11749 single-copy orthologous genes. Seasonal transcriptomic profiling of these orthologous genes revealed highly conserved gene expression across species in winter when temperatures fall below ∼10°C. Rhythmic gene expression with significant periodic oscillations was more prevalent in buds (51.9%) than in leaves (40.6%), with most rhythmic genes (78.4–92.0%) exhibiting annual periodicity, while a smaller fraction (1.2–11.9%) followed half-annual cycles. The seasonal peaks of rhythmic genes were highly synchronized across species in winter but diverged during the growing season, reflecting species-specific timing of leaf flushing and flowering. These findings suggest that the four species share a common molecular calendar in winter, which constrains the evolution of gene expression under seasonal environments.
(December–February) in leaf and bud samples, respectively. Additionally, another peak corresponding to or occurring slightly after leaf flushing and flowering events was observed, with its timing varying by species: April in Q. glauca (15.6%), July in Q. acuta (15.3%), June in L. edulis (16.0%), September in L. glaber (18.1%), for leaf tissues (Fig. 3C). Similar peaks corresponding to leaf flushing or flowering were also detected in bud samples, occurring approximately one to two months earlier than in leaf samples (Fig. 3D). For genes with half-annual periodicity, the majority (63.2 ± 4.8%) exhibited expression peaks both in spring (March–May) and autumn (September–November) in both tissues (Fig. S7D; Dataset S3), with the exception of L. glaber, an autumn flowering species. Interestingly, WRKY22, a transcription factor regulating ethylene biosynthesis and senescence (Zhu et al., 2024), exhibited half-annual periodicity in L. glaber (Fig. S7E). In addition, SNF1-RELATED PROTEIN KINASE REGULATORY SUBUNIT GAMMA 1 (KINγ1), a subunit of SNF1-related kinase (SnRK) involved in stress response such as starvation (Emanuelle et al., 2016), showed two distinct peaks in Q. glauca and L. edulis, occurring in both summer (July) and winter (January) (Fig. S7E). This result suggests that both summer and winter pose stressful conditions for these species.
To compare the peak month of gene expression with annual periodicity across species, we quantified the number of genes within each peak combination for both intra-genus (Fig. 4A) and inter-genera species pairs in buds (Fig. 4B) and leaves (Fig. S8A, B). During winter, a large number of genes exhibited highly synchronized expression peaks across species, with high pairwise Pearson’s correlation coefficients observed (Fig. 4A, B). In contrast, the peak timing of gene expression from spring to autumn varied across species, aligning with differences in leaf flushing and flowering times (Fig. 4A, B). Some genes exhibited pronounced divergence, showing negative Pearson’s correlation coefficients (< −0.3) between species. Notably, the number of such genes was larger in inter-genera comparisons (11–77 genes; Fig. 4B) than in intra-genus comparisons (4–12 genes; Fig. 4A). A similar pattern was observed in leaf samples (Fig. S8A, B). When comparing seasonal expression peaks between tissues, the majority of genes were synchronously activated in winter in both tissues (Fig. S8C). During spring to summer, slight variations in seasonal expression peaks were observed among tissues (Fig. S8C).

Comparison of peak months in seasonal gene expression across species in buds.
(A) Intra-genus and (B) inter-genera comparisons of peak months in seasonal gene expression. The size and color of the circles represent the number of genes and the mean of Pearson’s correlation, respectively. Genes with negative correlations lower than −0.3 are highlighted in blue, while the month of leaf flushing is highlighted in green. (C) Plot of molecular phenology divergence index D across months. Triangles and circles indicate the values of D calculated from intra-genus and inter-genera comparison, respectively. The line and shaded envelope indicate the mean and standard deviation (s.d.). Different letters denote statistically significant differences (Nemenyi test, p < 0.05).
We quantified the degree of divergence in seasonal gene expression patterns across species by calculating the peak month differences between two species, where one species exhibits an expression peak in the given month. We defined the molecular phenology divergence index (D) for each month by quantifying the proportion of genes with peak month differences greater than 2 months between two species, relative to the total number of genes with expression peaks in that month. From spring to autumn, the mean of D over all species pairs in buds ranged from 0.47 to 0.66, but it declined sharply during winter, reaching a minimum of 0.12 in January in buds (Nemenyi test, p < 0.05; Fig. 4C). A similar trend was observed in leaf samples, with another trough in summer (Fig. S8D). The values of D were not consistently smaller in inter-genera pairs compared to intra-genus pairs. These findings suggest that there is a seasonal constraint in gene expression in which the degree of molecular phenology divergence is not constant but varies significantly across seasons, with the least divergence occurring in winter.
Phylogenetic constraints in the evolution of seasonal gene expression
We next investigated whether phylogenetic relationship influences the evolution of seasonal gene expression by calculating pairwise Pearson’s correlations of seasonal expression for each gene across all species pairs (Dataset S4). The median of Pearson’s correlation coefficients across all genes were significantly higher between species within the same genus compared to those between species from different genera in both leaves (Steel-Dwass test, p < 0.05; Fig. S9A) and buds (Steel-Dwass test, p < 0.05; Fig. 5A). The intra-genus comparison in genus Lithocarpus exhibited the highest correlation in their seasonal gene expression, despite their notable differences in flowering phenology, with L. edulis flowering in spring and L. glaber in autumn. In contrast, the intra-genus comparison in genus Quercus resulted in a relatively low correlation in their gene expression regardless of relatively similar phenology in leaf flushing and flowering. This difference may be attributed to environmental factors, as Q. acuta is the only species found in the high-altitude study site with colder climates compared to the other three species.

Phylogenetic constraints in the evolution of seasonal gene expression and the relationship between seasonal gene expression divergence and sequence evolution.
(A) Distribution of pairwise Pearson’s correlation coefficients for gene expression across species (Le: L. edulis, Lg: L. glaber, Qg: Q. glauca, Qa: Q. acuta; n = 11749). Colors indicate rhythmicity categories. Black bars show median correlation values, and different letters indicate significant differences (Nemenyi test, p < 0.05). (B) Heatmap of pairwise correlation coefficients for each gene, with mean gene counts in four seasonal expression categories: (1) annual rhythmic genes peaking in the growing season (March–November), (2) annual rhythmic genes peaking in winter (December–February), (3) half-annual rhythmic genes, and (4) arrhythmic genes. Genes are ordered by mean correlation across species pairs. (C–D) Conserved (C) and diverged (D) gene expression patterns, based on the top and bottom 5% of pairwise correlations. Solid lines and error bars indicate mean ± SD. Numbers in parentheses show peak month differences between Q. glauca and L. edulis. Peak month difference for BFT was not calculated due to arrhythmic expression in Q. glauca. (E) Relationship between evolutionary rate (dN/dS) and peak month difference (Δφ) for genes with annual periodicity in Q. glauca and L. edulis (n = 951). Black lines and transparent bands represent regression lines and 95% confidence intervals.
Rhythmic genes exhibited more conserved expression patterns than arrhythmic genes in both tissues (Fig. 5A; Fig. S9A). In particular, genes with seasonal expression peaks in winter showed high correlation across species in both tissues (Fig. 5B; Fig. S9B). The proportion of rhythmic genes with annual periodicity declined as the mean correlation coefficient across species decreased, with a more rapid decline observed for genes peaking in winter compared to those with peak expression during the growing seasons (Fig. 5B; Fig. S9B). Additionally, the mean absolute difference in seasonal gene expression dynamics was significantly smaller between species within the same genus than between species from different genera in both tissues (Fig. S10A, B). The relationship between the mean absolute difference and the proportion of rhythmic genes was less clear compared to that of the mean correlation coefficient (Fig. S10C, D). These findings suggest that, in addition to seasonal constraints, the overall dynamics of seasonal gene expression are also shaped by phylogenetic constraints.
Functional annotation of genes with conserved or divergent expression across species
To functionally annotate genes with conserved or divergent expression across species, we extracted genes with the top and bottom 5% based on the distribution of mean correlation (Fig. 5B; Fig. S9B) and conducted Gene Ontology (GO) enrichment analysis. GO analysis revealed that the top 5% of genes in leaves were associated with photosynthesis, while those in buds were linked to cell fate determination (Fig. S11A; Dataset S5). The gene with the highest correlation in leaves was STRESS RESPONSE SUPPRESSOR1 (STRS1; mean of pairwise correlation = 0.91; Fig. 5C), a DEAD-box RNA helicase previously reported to enhance tolerance to abiotic stresses such as cold (Kant et al., 2007). In buds, the most highly correlated gene was the histone deacetylase HDA19 (mean of pairwise correlation = 0.93; Fig. 5C), which plays a key role in environmental stress responses through histone modification and chromatin remodeling (Chen and Wu, 2010). These findings suggest that genes associated with photosynthesis during the growing season and stress responses during winter exhibit highly conserved seasonal expression patterns across species.
The bottom 5% of genes were predominantly associated with DNA replication and cell division in leaves (Fig. S11B), aligning with the divergent leaf phenology observed (Fig. 3B). In contrast, GO terms related to immune response were weakly enriched in buds (Fig. S11B). Notably, we identified MADS-box genes linked to floral transition among the bottom 5% of genes: BROTHER OF FT AND TFL1 (BFT) in leaves (mean of pairwise correlation = −0.094) and FRUITFULL (FUL) in buds (mean of pairwise correlation = −0.069). BFT, a floral repressor (Yoo et al., 2010), was expressed in autumn in the spring-flowering species L. edulis and in spring in the autumn-flowering species L. glaber leaves (Fig. 5D). Meanwhile, FRUITFULL (FUL), a positive regulator of floral transition (Smaczniak et al., 2012), was expressed in spring in L. edulis and in autumn in L. glaber (Fig. 5D), mirroring the timing of flowering. In Q. glauca and Q. acuta, FUL was expressed in winter (Fig. 5D), nearly out of phase with their expressions in Lithocarpus species. These results suggest that genes involved in reproductive processes exhibit species-specific divergence in expression patterns in response to seasonal environmental changes.
Relating seasonal gene expression divergence to sequence divergence
If divergence of seasonal timing of gene expression is related to selective constraint, we would expect a correlation between sequence divergence patterns and the divergence in seasonal expression peaks. To explore the relationship between the divergence of seasonal gene expression and protein sequence evolution, we calculated synonymous (dS) and non-synonymous substitution rates (dN) and their ratio (dN/dS) for each gene using coding sequences from newly assembled genomes of Q. glauca and L. edulis. We first confirmed that the highly expressed genes evolve more slowly by identifying a significant negative correlation between evolutionary rate (dN/dS) and mean of the gene expression (r = −0.21, p = 1.8 × 10−118 for dN/dS; Fig. S12A; Dataset S6). This finding aligns with previous studies in yeasts, mammals, and plants (Drummond et al., 2005; Gaut et al., 2011; Liao and Zhang, 2006; Yang and Gaut, 2011). By targeting the genes that exhibited annual periodicity across both species and tissues, we examined the relationship between these evolutionary measures and peak month differences in seasonal gene expression between species (Δφ) in leaves and bud tissues using linear regression. Assessment of the significance of the regression coefficient using a t-test revealed that genes with smaller peak month differences between species exhibited lower values of dN/dS in buds (r = 0.072, p =2.6 × 10⁻2) and marginally lower values of dN/dS in leaves (r = 0.062, p = 5.7 × 10⁻2). The lower values were also found in dN in both leaves (r = 0.086, p = 8.1 × 10⁻3) and buds (r = 0.086, p = 7.8 × 10⁻3), whereas no significant trend was observed for dS in both tissues (Fig. 5E; Fig. S12B, C; Dataset S6). This finding suggests that genes with conserved seasonal expression patterns tend to evolve at a slower rate. We further investigated whether the association between the divergence in seasonal expression peaks and sequence divergence patterns varies between genes expressed in winter and in the growing season. We classified genes that exhibited annual periodicity across both species and tissues into four groups based on their expression peak timing (winter or growing seasons) and tissue specificity (leaves or buds) and compared the extent of protein sequence divergence across these groups. We found no significant differences in dN/dS (Kruskal Wallis test, p = 2.8 × 10⁻1; Fig. S12D; Dataset S6) between genes expressed in winter in both tissues and those in growing seasons. The same result was observed even when focusing only on genes with conserved peaks (peak month difference < 2), with no significant differences in dN/dS between these gene sets (Kruskal Wallis test, p = 3.5 × 10⁻1; Fig. S12E; Dataset S6). These results suggest that, regardless of the season, genes with conserved expression peaks tend to evolve at slower rates.
Discussion
A central goal in biology is to unravel the molecular basis of phenotypic evolution, with gene expression serving as a fundamental driver of this process. In this study, we investigated the impact of seasonal environmental fluctuations on the evolution of gene expression and its relationship with phenological traits in four dominant Fagaceae tree species in forest ecosystems. We found that gene expression profiles were more resistant to change during winter compared to the growing season, indicating the presence of a seasonal constraint on the evolution of gene expression. This constraint becomes particularly pronounced in buds during winter when temperatures fall below approximately 10°C (Fig. 2B). This temperature threshold aligns with previous findings, where winter-specific transcript profiles were reported at temperatures below 8.7–11.3°C in cherry trees (Miyawaki - Kuwakado et al., 2024), 11.3°C in ever-green Fagaceae (Satake et al., 2023), and 10.5°C in perennial Arabidopsis (Aikawa et al., 2010). These results suggest the presence of a conserved mechanism for gene expression regulation under low-temperature environments across different species. The greater conservation of gene expression patterns in buds compared to leaves across species may be attributed to their functional role protecting essential meristematic tissues, including the shoot apex and leaf and flower primordia. Additionally, buds likely play a crucial role in maintaining common physiological functions necessary for the resumption of growth in spring. This functional consistency may impose evolutionary constraints on gene expression, contributing to its conservation across species.
It is intriguing to compare the seasonal constraints identified in this study with developmental constraints in gene expression evolution, where evolutionary conservation and variation across developmental stages during the life cycle have been observed, such as in the early conservation model and the developmental hourglass model (Fig. S13A)(Irie and Kuratani, 2011; Lotharukpong et al., 2024; Quint et al., 2012). Previous studies have demonstrated that certain developmental stages exhibit remarkably similar gene expression patterns (Drost et al., 2016; Kalinka et al., 2010; Levin et al., 2016). Our findings highlight that such highly similar gene expression patterns are observed during the winter season (Fig. S13B), suggesting that a common physiological state is essential for enduring harsh winter conditions. This seasonal constraint on gene expression evolution may have the potential to limit temporal niche partitioning at the phenotypic level and slow species divergence rates in seasonal environments (Cleland and Wolkovich, 2024; Kubo and Iwasa, 1996). Although completely disentangling the effects of developmental stages and seasonal changes on gene expression is challenging when using molecular phenology data collected from natural environments, the observation that species with different phenology—i.e., different developmental timing in seasonal environments—exhibit convergent gene expression patterns during winter underscores the profound influence of winter environmental conditions on the evolution of gene expression. Future research is needed to elucidate the mechanisms driving the establishment of such conserved gene expression patterns, including the dynamic reorganization of chromatin architecture in response to cold (Fischl et al., 2020; Nishio et al., 2020; Zhang et al., 2023).
We found that the evolution of coding-sequences was weakly correlated with the divergence of seasonal peaks in gene expression across species. This suggests that weak functional constraints exist for specific proteins that are expressed in certain seasons, such as photosynthesis in summer and stress response to winter. However, among genes with conserved seasonal peaks, the evolutionary rate of coding-sequences was similarly slow regardless of whether they were expressed in winter or during the growing season. Therefore, to understand the evolutionary mechanism underlying the seasonal constraint on gene expression that results in similar expression patterns during winter, it is essential to consider evolution of non-coding sequences that modulate gene expression levels in response to seasonal environmental changes. Regulatory variation has been recognized as a major driver of evolutionary innovation (Prud’homme et al., 2007; True and Haag, 2001). Recent deep learning studies utilizing genome and transcriptome data from yeast and humans have highlighted the significance of co-evolution between coding and non-coding sequences in shaping gene expression levels (Zrimec et al., 2021, 2020). Understanding how seasonal environments drive the co-evolution of coding and regulatory sequences, as well as the divergence of molecular phenology and its phenological outputs, will require expanding analyses to a broader range of species. Such efforts will facilitate the development of predictive models for evolutionary processes influenced by seasonal dynamics. Comparison of molecular phenology across wide climatic zones, including both tropical and temperate zones, will also be effective to identify key genes for the phenological response to on-going climate change.
Materials and Methods
Sample collection for genome sequencing
To construct genomes of Q. glauca and L. edulis, we collected approximately 5g of fresh leaves from one individual in spring, respectively. We shipped these samples to Dovetail Genomics, LLC (Scotts Valley, CA, USA) where Dovetail Staff performed DNA extraction, library preparation, sequencing, and assembly steps.
Genome sequencing and assembly
DNA extraction was performed using the QIAamp DNA Mini Kit (Qiagen, Germantown, MD, USA) using the manufacturer’s recommended protocol. Mean fragment length of the extracted DNA was 60 kbp for Q. glauca and 65 kbp for L. edulis. DNA samples were quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). The library preparation, sequencing and scaffolding were carried out by Dovetail Genomics (California, USA) according to their standard genome assembly pipeline (https://dovetailgenomics.com/). The PacBio SMRTbell library (∼20kb) for PacBio Sequel was constructed using SMRTbell Express Template Prep Kit 2.0 (PacBio, Menlo Park, CA, USA) using the manufacturer’s recommended protocol. The library was bound to polymerase using the Sequel II Binding Kit 2.0 (PacBio) and loaded onto PacBio Sequel II. Sequencing was performed on PacBio Sequel II 8M SMRT cells. PacBio CCS reads were used as an input to Hifiasm1 v0.15.4-r347 with default parameters. For Q. glauca, we also generated sequence data using PacBio Revio. BLASTn results of the contigs obtained by Hifiasm assembly against the NCBI’s NT database were used as input for blobtools2 v1.1.1 and contigs identified as possible contamination were removed from the assembly. Finally, purge_dups3 v1.2.5 was used to remove haplotigs and contig overlaps.
Assembly scaffolding with HiRise
A proximity ligation library was generated by the Dovetail’s Omni-C technique, followed by sequencing on an Illumina HiSeqX platform. Chromatin was fixed in place in the nucleus with formaldehyde before extraction (https://dovetailgenomics.com/wp-content/uploads/2021/09/Omni-C-Tech-Note.pdf). Fixed chromatin was digested with DNase I, fragmented chromatin ends were repaired and biotinylated to adapters followed by proximity ligation. Crosslinks were then reversed, the DNA purified, and the biotin subsequently removed. The DNA library was prepared and sequenced to produce 2 × 150 bp paired-end reads (around 30× coverage). The Omni-C technology uses a sequence-independent endonuclease which provides even, unbiased genome coverage. The input de novo assembly and Omni-C library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies (Putnam et al., 2016). The Omni-C library sequences were aligned to the contigs assembled by Hifiasm using bwa (Li and Durbin, 2009). The separations of Omni-C reads from paired-end mapped within the contigs were analyzed by HiRise to produce a likelihood model for genomic distance between paired-end reads, and the model was used to identify and break putative misjoins, to score prospective joins, and make joins above a threshold.
Genome size estimation
The genome size of Q. glauca was estimated by GenomeScope2 v2.0 (Ranallo-Benavidez et al., 2020) using k-mer frequency plot obtained by kmc v3.2.4 (Kokot et al., 2017) with k-mer size 61 against the HiFi reads. The paired-end reads of four individuals of L. edulis were merged and trimmed by fastp v0.23.2 (Chen et al., 2018). The genome size of L. edulis was estimated by KmerGenie (Chikhi and Medvedev, 2014) using k-mer size 55 against the trimmed paired-end reads.
Sample collection for RNA sequencing
To monitor the genome-wide gene expression profiles, we collected leaf, bud, and flower samples from four Fagaceae species, Q. glauca, Q. acuta, L. edulis and L. glaber, every four weeks spanning from May 2019 to June 2023 across Fukuoka and Saga prefectures in Kyushu, Japan. We targeted three individuals, and each sample was collected from three branches per individual, except for L. edulis. Given the consistent expression patterns observed among individuals in a previous study (Satake et al., 2023), we focused on only one individual for L. edulis. Study sites for Q. glauca is the biodiversity reserve on the Ito campus of Kyushu University (33°35′ N, 130°12′ E, 20 to 57 m a.s.l). Q. acuta was monitored in Sefuri mountains (33°26′ N, 130°22′ E, 970 to 974 m a.s.l). L. edulis and L. glaber were studied at the Imajuku Field Activity Center (33°33′ N, 130°16′ E, 84 to 111 m a.s.l). Environmental data from Ito Campus and the Imajuku Field Activity Center were obtained from the Japan Meteorological Agency, while data from Sefuri Mountain were collected using a logger installed at the sampling site.
Samples for RNA-seq were taken from the sun-exposed crown (approximately 2–5 m from the ground) using long pruning shears from 11:30 to 12:30 h. For each sample, 0.2–0.4 g of tissue was preserved in a 2 ml microtube containing 1.5 ml of RNA-stabilizing reagent (RNAlater; Thermo Fisher Scientific, Waltham, MA, USA) immediately after harvesting. The samples were transferred to the laboratory within 2 hr after sampling, stored at 4°C overnight and then stored at −80°C until RNA extraction. During transport to the laboratory, the samples were kept in a cooler box with ice to maintain a low temperature.
RNA extraction and RNA-seq analysis
RNA was extracted independently from bud and leaf samples of each tree. Total RNA of the leaf sample was extracted in accordance with the method described in a previous study (Miyazaki et al., 2014). Total RNA of the bud sample was extracted using the PureLink™ Plant RNA Reagent (Thermo Fisher Scientific, Waltham, MA, USA) and subsequently processed with DNase using the TURBO DNA-free kit™ (Thermo Fisher Scientific) according to their protocols and purified using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to its RNA cleanup protocol. RNA integrity was examined using the Agilent RNA 6000 Nano kit on a 2100 Bioanalyzer (Agilent Technologies), and the RNA yield was determined using a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific). Five to six micrograms of RNA extracted from each sample were sent to Hangzhou Veritas Genetics Medical Institute Co., Ltd., where a cDNA library was prepared with a NEBNext Ultra II RNA Library Prep Kit for Illumina, and 150 base paired-end transcriptome sequencing of each sample was conducted using an Illumina NovaSeq6000 sequencer (Illumina).
The quantification of gene expression followed a three-step process. First, quality filtering was performed using the fastp v0.23.2 with default options: phred quality ≥15, the maximum number of N base in one read = 5 (Chen et al., 2018). Second, RNA-seq reads were mapped to the reference genome using STAR v2.7.10b (Dobin et al., 2013). Third, RNA-seq reads were quantified using RSEM v1.3.1 (Li and Dewey, 2011). We excluded the genes with low expression level using the threshold, the value of mean Reads Per Kilobase (RPK) < 1 for each species. Then, we further normalized the expression level using the value of Gene length corrected Trimmed Mean of M-values (GeTMM) (Smid et al., 2018) using an R package “edgeR” v3.42.4. For the downstream analysis we calculated the mean expression levels of GeTMM of each gene across multiple individuals and transformed them to log2(GeTMM+1).
Genome annotation
The RNA-seq reads of Q. glauca (8 samples obtained by a previous study (20) collected from May to December 2017; GEO’s accession number: GSE211384) and L. edulis (20 samples covering leaf, bud and flower tissues collected in January, April, July, and November 2021) were respectively mapped against the scaffolds of Q. glauca and L. edulis obtained by HiRise with HISAT2 v2.1.0 (Kim et al., 2019). The bam files generated by the mappings and amino acid sequences of the 9 species obtained from NCBI (Castanea mollissima: GCA_000763605.2 (NCBI’s accession number), Q. lobata: GCF_001633185.1, Q. robur: GCF_932294415.1, Q. suber: GCF_002906115.1, Carpinus fangiana: GCA_006937295.1, Carya illinoinensis: GCF_018687715.1, Juglans macrocarpa × J. regia: GCF_004785595.1, Juglans regia: GCF_001411555.2, Morella rubra: GCA_003952965.2) were used for ab initio gene prediction by Braker2 v2.1.6 (Brůna et al., 2021), and the genes with highest-score were selected to exclude splicing variants. In parallel, the gene models were respectively constructed in Q. glauca and L. edulis by GeMoMa v1.7.1 (Keilwagen et al., 2019) with RNA-seq reads and amino acid sequences used for the gene prediction in Braker2, and the genes with highest-score were selected to exclude splicing variants. The duplicated genes between Braker2 and GeMoMa were excluded by GFFcompare v0.11.2 (Pertea and Pertea, 2020). The remaining genes were searched against UniProtKB (Bateman et al., 2025) by DIAMOND v2.0.5 (Buchfink et al., 2014) in ‘more-sensitive’ option with E-value ≤10−20 and identity ≥25%. Next, gene expression analysis was performed by Salmon v1.4.0 (Patro et al., 2017), and the genes with TPM values >0.1 were considered as expressed. The orthologous genes were searched against EggNOG (Hernández-Plaza et al., 2023) using eggNOG-mapper v2.1.8 (Cantalapiedra et al., 2021) with E-value ≤10−10. The genes were searched against protein sequences of Castanea mollissima (Accession: GCA_000763605.2), Quercus lobata (GCF_001633185.2), and Arabidopsis thaliana (Araport11 (Krishnakumar et al., 2015)) by BLASTp v2.13.0 (Camacho et al., 2009) with E-value ≤10−20. According to these similarity searches, the genes whose product names contain keywords related to transposable elements (TEs) were categorized into TE. The genes having similarity hits and being expressed by Salmon were classified into high-confidence (HC) genes, while remaining genes were categorized into low-confidence (LC) genes. The genes categorized into HC were used for further analyses.
Genome synteny analysis
To investigate the syntenic relationship between Q. glauca and L. edulis, we first performed an all-vs-all BLASTp search of the protein sequences, using a cut-off E-value of 10−5. Collinearity blocks within and between the genomes were then identified using MCScanX (Wang et al., 2012) based on the results of the BLASTp searches. Finally, the collinearity blocks were visualized using SynVisio (https://synvisio.github.io).
Illumina short-read sequencing and library preparation
To draw molecular phylogeny, we also generated short-read sequencing of Q. glauca, Q. acuta, L. edulis, and L. glaber. DNA extraction was performed using a modified version of the method described previously (Satake et al., 2024b). The DNA samples were sent to Macrogen Inc. (Republic of Korea) for sequencing on the Illumina HiseqX platform (Illumina, San Diego, CA, USA). DNA was sheared to around 500 bp fragments in size using dsDNA fragmentase (New England BioLabs, Ipswich, MA, USA). Library preparation was performed using the NEBNext Ultra II DNA Library Prep Kit (New England BioLabs) according to the manufacturer’s protocol, and the libraries were individually indexed with the NEBNext Multiplex Oligos for Illumina (New England BioLabs) by PCR. The quality and quantity of each amplified library were analyzed using the Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA), the High Sensitivity DNA kit (Agilent Technologies), and the NEBNext Library Quant Kit for Illumina (New England BioLabs).
Phylogenetic analysis
We constructed a phylogenetic tree using an alignment-free method (Fan et al., 2015) based on short-read data from Q. acuta and L. glaber and genome assembly data from Q. glauca and L. edulis. Additionally, genomic data for Fagus sylvatica was obtained from beechgenome.net (Mishra et al., 2022) and included as an outgroup. Due to the large volume of short-read data for Q. acuta and L. glaber (18 Gb each), we randomly subsampled half of the reads. Bootstrap values were estimated through 100 resampling of the k-mer table.
Identification of orthologs
The prediction of orthogroups was based on a BLASTp all-against-all comparison of the protein sequences (E-value < 10−5) of 5 Fagaceae species: L. edulis, Q. glauca, Q. robur, Fagus sylvatica and Juglans regia. Clustering was conducted with OrthoFinder2, using version 2.2.6 for BLAST data generation and version 2.5.5 for orthogroup determination (Emms and Kelly, 2019). The genomic data of Q. robur and J. regia were downloaded from NCBI (dhQueRobu3.1 and Walnut 2.0), while F. sylvatica data were sourced from beechgenome.net (Mishra et al., 2022). Since the F. sylvatica dataset contained numerous predicted transposable elements (TEs), we re-annotated genes using three databases—EggNOG5 (EggNOG-mapper), Pfam35.0 (HMMER v3.3.2), and UniRef90_2023_0628 (DIAMOND v2.1.8.162)—and excluded genes identified as TEs in at least one database. In total, we identified 26692 orthogroups and focused on single-copy orthogroups after filtering out genes mapped outside the 12 chromosomes. This resulted in a final dataset of 11749 genes for further analysis (Data S2).
Hierarchical clustering and principal component analysis (PCA)
To assess the similarity of the genome-wide transcriptional profiles across orthologous genes and 483 samples collected in different seasons, we performed hierarchical clustering using the monthly time series data of 11749 orthologous genes from October 2019 to April 2023. For each orthologous gene, there were 26 or 27 time points, with three individuals each for Q. glauca, Q. acuta, L. glaber and one individual for L. edulis. We calculated the mean expression levels of each orthologous gene across three individuals in each species and subsequently normalized the values by adjusting the mean to zero and the standard deviation to one. We performed hierarchical clustering using the Ward method and the Euclidian distance using the pheatmap function in R (v 1.0.12). To determine the number of the clusters, we calculated Within-Cluster-Sum of Squared Error (WSS) and determined the number of clusters as an elbow point of it (Fig. S3). To assess the seasonal expression dynamics of 11749 orthologous genes, we also performed PCA of the gene expression profiles from all samples using the prcomp function in R (v 4.3.1).
Identification of genes with rhythmic expression
To identify genes with rhythmic expression and determine the period and peak month of gene expression, we applied the RAIN algorithm (Thaben and Westermark, 2014) using the rain function in R (v 1.34.0). We set the following settings: deltat = 1, period = 12, period.delta = 11, peak.border = (0,1), nr.series = 1, method= “independent”, na.rm= T, adjp.method = “ABH”. Based on the results of the RAIN analysis, we identified genes with significant periodicity using a Benjamini-Hochberg (BH) adjusted q-value threshold of < 0.01. Since the period and peak month in the RAIN output are expressed in terms of the number of time points, we converted them into months using the sampling span (four weeks) and sampling dates. Subsequently, we categorized the rhythmic genes into three categories based on the period length T (in months): “half-annual” (0 ≦ T < 8), “annual” (8 ≦ T ≦ 16), and “long” (T > 16).
Calculation of Pearson’s correlation coefficient across species
To quantify the similarity of seasonal gene expression pattern across species, we calculated pairwise Pearson’s correlations of seasonal expression for each gene across all the species pairs. For species within the same genus, the monitoring period was identical, allowing direct comparison. On the other hand, the monitoring periods differed between Quercus and Lithocarpus species: Q. glauca and Q. acuta were monitored from May 2021 to April 2023, while L. edulis and L. glaber were monitored from October 2019 to October 2021 (Fig. S2). To ensure comparable time series data for inter-genera comparisons, we extracted the time series from Q. glauca and Q. acuta spanning May 19, 2021, to November 2, 2022, and from L. edulis and L. glaber spanning May 13, 2020, to October 20, 2021, for correlation analysis. The mean absolute differences for each gene across all the species pairs were also calculated in a similar manner.
Molecular phenology divergence index (D)
To quantify the seasonal difference of peak month divergence across species, we defined the molecular phenology divergence index (D) for each month t and species pair A-B as the proportion of genes that peak in month t in the reference species A but exhibit a peak month difference (Δφ) greater than two months in species B using following formula:
Gene Ontology (GO) enrichment analysis
First, we assigned GO terms for each gene in Lithocarpus and Quercus genomes based on the annotation data of Arabidopsis thaliana orthologs downloaded from PLAZA Database (http://bioinformatics.psb.ugent.be/plaza/). Then, GO enrichment analysis was conducted using the GOstats (Falcon and Gentleman, 2007) function in R (v 2.66.0) with the following settings: ontology = BP, pvalueCutoff = 0.05, conditional = TRUE, and testDirection = over. The universe gene set was defined as genes associated with GO terms common to both of the genomes. Genes without annotated GO terms were removed from the universe set, and those with differing GO term annotations between the two genomes were also excluded. The BH adjusted q-values were calculated from the p-values generated by GOstats to account for multiple testing.
Phenology measurement: leaf unfolding and flowering
To compare seasonal changes in genome-wide gene expression profiles with phenology, we conducted observations of leaf unfolding and flowering phenology on the three terminal branches of each of the three individuals used for gene expression monitoring. Leaf unfolding was defined as the stage where both the entire leaf blade and the leaf stalk were visible (Fig. S4). Flowering was defined as the stage where either male or female flowers were visible (Fig. S4). We subsequently plotted the proportion of branches exhibiting leaf unfolding and flowering. The data are provided in Dataset S2.
Estimation of protein-coding sequence divergence
Pairwise distances for non-synonymous (dN) and synonymous (dS) substitutions, as well as the dN/dS ratio, were estimated for each orthologous gene pair between L. edulis and Q. glauca using codeml (PAML 4.0)(Yang, 2007). The analysis was performed with the following settings: seqtype = 1, CodonFreq = 2, Runmode = −2, and the transition/transversion ratio estimated from the data. To ensure reliability, genes exhibiting signs of saturated divergence or underestimated dS values were excluded, as codeml produces robust results only within a moderate range of sequence divergence. Specifically, we removed 43 orthologs with dN/dS ≧ 99 and 483 genes with dS > 10, dS < 0.001, or dN < 0.0003. The thresholds of dN and dS were determined by examining the relationship between dN and dS and excluding outliers that deviated from the main distribution. After filtering, the final dataset comprised 11223 orthologous genes.
Linear regression analyses were performed to examine the relationship between dN, dS, dN/dS and peak month differences (Δφ) in gene expression between Q. glauca and L. edulis, as well as mean expression levels. These analyses were conducted using the lm function in R software (v 4.3.1). Specifically, we modeled the evolutionary rates (dN, dS, and dN/dS) as the response variable and the peak month difference (Δφ) or the mean levels in gene expression as the predictor variable. To account for the skewed distributions of evolutionary rates, logarithm transformations with base 2 were applied.
Acknowledgements
We thank Atsuko Miyawaki-Kuwakado for her valuable technical advice on RNA-seq analysis. We also appreciate the technical support provided by Kayoko Ohta. Additionally, we thank Koharu Yamaguchi, Yuta Aoyagi, and Ryosuke Imai for their help with sample collection.
Additional information
Data deposition and code availability
The genome sequences of Q. glauca and L. edulis determined in this paper are available from the National Center for Biotechnology Information (NCBI). The sequence reads used for genome assembly and expression analyses are available from BioProject PRJNA1242478. All codes used in the analysis are available at https://github.com/ku-biomath/Evolution_of_gene_expression_in_seasonal_envrionment.
Funding
This work was supported by JSPS KAKENHI Grant Numbers 23H04965 and 23H04966.
Author Contributions
S.N.K. and A.S. designed research; S.N.K., Y.I., J.K., H.H., S.I., and A.S. performed research; S.N.K., Y.I., J.K., H.H., S.I., and A.S. analyzed data; S.N.K. and A.S. wrote the paper with inputs from all the authors.
Funding
Japan Society for the Promotion of Science (23H04965)
Japan Society for the Promotion of Science (23H04966)
Additional files
References
- Robust control of the seasonal expression of the Arabidopsis FLC gene in a fluctuating environmentProc. Natl. Acad. Sci. U. S. A 107:11632–11637https://doi.org/10.1073/pnas.0914293107Google Scholar
- UniProt: the Universal Protein Knowledgebase in 2025Nucleic Acids Res 53:D609–D617https://doi.org/10.1093/nar/gkae1010Google Scholar
- The evolution of gene expression levels in mammalian organsNature 478:343–348https://doi.org/10.1038/nature10532Google Scholar
- BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein databaseNAR Genom. Bioinform 3:1–11https://doi.org/10.1093/nargab/lqaa108Google Scholar
- Fast and sensitive protein alignment using DIAMONDNat. Methods https://doi.org/10.1038/nmeth.3176Google Scholar
- BLAST+: Architecture and applicationsBMC Bioinformatics 10https://doi.org/10.1186/1471-2105-10-421Google Scholar
- eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scaleMol. Biol. Evol 38:5825–5829https://doi.org/10.1093/molbev/msab293Google Scholar
- Gene expression across mammalian organ developmentNature 571:505–509https://doi.org/10.1038/s41586-019-1338-5Google Scholar
- Role of histone deacetylases HDA6 and HDA19 in ABA and abiotic stress responsePlant Signal. Behav 5:1318–1320https://doi.org/10.4161/psb.5.10.13168Google Scholar
- . fastp: an ultra-fast all-in-one FASTQ preprocessorBioinformatics 34:i884–i890https://doi.org/10.1093/bioinformatics/bty560Google Scholar
- Informed and automated k-mer size selection for genome assemblyBioinformatics 30:31–37https://doi.org/10.1093/bioinformatics/btt310Google Scholar
- Effects of phenology on plant community assembly and structureAnnu. Rev. Ecol. Evol. Syst 55:471–492https://doi.org/10.1146/annurev-ecolsys-102722-011653Google Scholar
- STAR: Ultrafast universal RNA-seq alignerBioinformatics 29:15–21https://doi.org/10.1093/bioinformatics/bts635Google Scholar
- Post-embryonic hourglass patterns mark ontogenetic transitions in plant developmentMol. Biol. Evol 33:1158–1163https://doi.org/10.1093/molbev/msw039Google Scholar
- Why highly expressed proteins evolve slowlyProc. Natl. Acad. Sci 102:14338–14343https://doi.org/10.1073/pnas.0504070102Google Scholar
- Gene expression dynamics during rapid organismal diversification in African cichlid fishesNat. Ecol. Evol 5:243–250https://doi.org/10.1038/s41559-020-01354-3Google Scholar
- Molecular insights into the enigmatic metabolic regulator, SnRK1Trends Plant Sci 21:341–353https://doi.org/10.1016/j.tplants.2015.11.001Google Scholar
- OrthoFinder: Phylogenetic orthology inference for comparative genomicsGenome Biol 20https://doi.org/10.1186/s13059-019-1832-yGoogle Scholar
- Using GOstats to test gene lists for GO term associationBioinformatics 23:257–258https://doi.org/10.1093/bioinformatics/btl567Google Scholar
- An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing dataBMC Genomics 16https://doi.org/10.1186/s12864-015-1647-5Google Scholar
- Cold-induced chromatin compaction and nuclear retention of clock mRNAs resets the circadian rhythmEMBO J 39https://doi.org/10.15252/embj.2020105604Google Scholar
- Toward a synthetic understanding of the role of phenology in ecology and evolutionPhilos. Trans. R. Soc. B 365:3101–3112https://doi.org/10.1098/rstb.2010.0145Google Scholar
- Populations in a seasonal environmentMonogr. Popul. Biol 5:1–217Google Scholar
- The patterns and causes of variation in plant nucleotide substitution ratesAnnu. Rev. Ecol. Evol. Syst 42:245–266https://doi.org/10.1146/annurev-ecolsys-102710-145119Google Scholar
- eggNOG 6.0: enabling comparative genomics across 12 535 organismsNucleic Acids Res 51:D389–D394https://doi.org/10.1093/nar/gkac1022Google Scholar
- Genomic landscape of the global oak phylogenyNew Phytol 226:1198–1212https://doi.org/10.1111/nph.16162Google Scholar
- Comparative transcriptome analysis reveals vertebrate phylotypic period during organogenesisNat. Commun 2https://doi.org/10.1038/ncomms1248Google Scholar
- Gene expression divergence recapitulates the developmental hourglass modelNature 468:811–816https://doi.org/10.1038/nature09634Google Scholar
- Stress Response Suppressor1 and Stress Response Suppressor2, two Dead-box RNA helicases that attenuate Arabidopsis responses to multiple abiotic stressesPlant Physiol 145:814–830https://doi.org/10.1104/pp.107.099895Google Scholar
- GeMoMa: Homology-based gene prediction utilizing intron position conservation and RNA-seq dataMethods in Molecular BiologyHumana Press Inc pp. 161–177https://doi.org/10.1007/978-1-4939-9173-0_9Google Scholar
- Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotypeNat. Biotechnol 37:907–915https://doi.org/10.1038/s41587-019-0201-4Google Scholar
- KMC 3: counting and manipulating k-mer statisticsBioinformatics 33:2759–2761https://doi.org/10.1093/bioinformatics/btx304Google Scholar
- The transcriptional changes underlying the flowering phenology shift of Arabidopsis halleri in response to climate warmingPlant Cell Environ 47:174–186https://doi.org/10.1111/pce.14716Google Scholar
- Genomics of FagaceaeTree Genet. Genomes 8:583–610https://doi.org/10.1007/s11295-012-0498-3Google Scholar
- Araport: the Arabidopsis information portalNucleic Acids Res 43:D1003–D1009https://doi.org/10.1093/nar/gku1200Google Scholar
- Phenological pattern of tree regeneration in a model for forest species diversityTheor. Popul. Biol 49:90–117https://doi.org/10.1006/tpbi.1996.0004Google Scholar
- Molecular phenology in plants: in natura systems biology for the comprehensive understanding of seasonal responses under natural environmentsNew Phytol 210:399–412https://doi.org/10.1111/nph.13733Google Scholar
- Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactionsMol. Biol. Evol 22:1345–1354https://doi.org/10.1093/molbev/msi122Google Scholar
- The mid-developmental transition and the evolution of animal body plansNature 531:637–641https://doi.org/10.1038/nature16994Google Scholar
- RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genomeBMC Bioinformatics 12https://doi.org/10.1186/1471-2105-12-323Google Scholar
- Fast and accurate short read alignment with Burrows-Wheeler transformBioinformatics 25:1754–1760https://doi.org/10.1093/bioinformatics/btp324Google Scholar
- Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolutionMol. Biol. Evol 23:1119–1128https://doi.org/10.1093/molbev/msj119Google Scholar
- Chromosome-scale genome assembly of sweet tea (Lithocarpus polystachyus Rehder)Sci. Data 10https://doi.org/10.1038/s41597-023-02791-yGoogle Scholar
- A transcriptomic hourglass in brown algaeNature 635:129–135https://doi.org/10.1038/s41586-024-08059-8Google Scholar
- Systematics of Fagaceae: phylogenetic tests of reproductive trait evolutionInt. J. Plant Sci 162:1361–1379https://doi.org/10.1086/322949Google Scholar
- A Chromosome-level genome assembly of the European beech (Fagus sylvatica) reveals anomalies for organelle DNA integration, repeat content and distribution of SNPsFront. Genet 12https://doi.org/10.3389/fgene.2021.691058Google Scholar
- Impacts of climate change on the transcriptional dynamics and timing of bud dormancy release in Yoshino-cherry treePlants, People, Planet 6:1505–1521https://doi.org/10.1002/ppp3.10548Google Scholar
- Nitrogen as a key regulator of flowering in Fagus crenata: understanding the physiological mechanism of masting by gene expression analysisEcol. Lett 17:1299–1309https://doi.org/10.1111/ele.12338Google Scholar
- A review of plant phenology in South and Central AmericaIn:
- Schwartz M
- Annual transcriptome dynamics in natural environments reveals plant seasonal adaptationNat. Plants 5:74–83https://doi.org/10.1038/s41477-018-0338-zGoogle Scholar
- Conservation, acquisition, and functional impact of sex-biased gene expression in mammalsScience (1979) 365https://doi.org/10.1126/science.aaw7317Google Scholar
- Seasonal plasticity and diel stability of H3K27me3 in natural fluctuating environmentsNat. Plants 6:1091–1097https://doi.org/10.1038/s41477-020-00757-1Google Scholar
- Common pattern of evolution of gene expression level and protein sequence in DrosophilaMol. Biol. Evol 21:1308–1317https://doi.org/10.1093/molbev/msh128Google Scholar
- Salmon provides fast and bias-aware quantification of transcript expressionNat. Methods 14:417–419https://doi.org/10.1038/nmeth.4197Google Scholar
- GFF Utilities: GffRead and GffCompareF1000Res 9https://doi.org/10.12688/f1000research.23297.1Google Scholar
- Plant phenology and global climate change: current progresses and challengesGlob. Chang. Biol 25:1922–1940https://doi.org/10.1111/gcb.14619Google Scholar
- Emerging principles of regulatory evolutionProc. Natl. Acad. Sci 104:8605–8612https://doi.org/10.1073/pnas.0700488104Google Scholar
- Chromosome-scale shotgun assembly using an in vitro method for long-range linkageGenome Res 26:342–350https://doi.org/10.1101/gr.193474.115Google Scholar
- A transcriptomic hourglass in plant embryogenesisNature 490:98–101https://doi.org/10.1038/nature11394Google Scholar
- GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomesNat. Commun 11https://doi.org/10.1038/s41467-020-14998-3Google Scholar
- Plant molecular phenology and climate feedbacks mediated by BVOCsAnnu. Rev. Plant Biol 75:605–627https://doi.org/10.1146/annurev-arplant-060223-032108Google Scholar
- Somatic mutation rates scale with time not growth rate in long-lived tropical treeseLife 12https://doi.org/10.7554/eLife.88456Google Scholar
- A cross-scale approach to unravel the molecular basis of plant phenology in temperate and tropical climatesNew Phytol 233:2340–2353https://doi.org/10.1111/nph.17897Google Scholar
- Seasonal gene expression signatures of delayed fertilization in FagaceaeMol. Ecol 32:4801–4813https://doi.org/10.1111/mec.17079Google Scholar
- Phenology: An Integrative Environmental ScienceDordrecht: Springer Google Scholar
- Characterization of MADS-domain transcription factor complexes in Arabidopsis flower developmentProc. Natl. Acad. Sci. U. S. A 109:1560–1565https://doi.org/10.1073/pnas.1112871109Google Scholar
- Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisonsBMC Bioinformatics 19https://doi.org/10.1186/s12859-018-2246-7Google Scholar
- Detecting rhythms in time series with rainJ Biol Rhythms 29:391–400https://doi.org/10.1177/0748730414553029Google Scholar
- Developmental system drift and flexibility in evolutionary trajectoriesEvol. Dev 3:109–119https://doi.org/10.1046/j.1525-142x.2001.003002109.xGoogle Scholar
- MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40https://doi.org/10.1093/nar/gkr1293Google Scholar
- Factors that contribute to variation in evolutionary rate among arabidopsis genesMol. Biol. Evol 28:2359–2369https://doi.org/10.1093/molbev/msr058Google Scholar
- PAML 4: Phylogenetic analysis by maximum likelihoodMol. Biol. Evol 24:1586–1591https://doi.org/10.1093/molbev/msm088Google Scholar
- BROTHER of FT and TFL1 (BFT) has TFL1-like activity and functions redundantly with TFL1 in inflorescence meristem development in ArabidopsisPlant J 63:241–253https://doi.org/10.1111/j.1365-313X.2010.04234.xGoogle Scholar
- High-resolution Hi-C maps highlight multiscale chromatin architecture reorganization during cold stress in Brachypodium distachyonBMC Plant Biol 23https://doi.org/10.1186/s12870-023-04269-wGoogle Scholar
- Phylogenomic analyses highlight innovation and introgression in the continental radiations of Fagaceae across the Northern HemisphereNat. Commun 13https://doi.org/10.1038/s41467-022-28917-1Google Scholar
- Casein kinase 1 AELs promote senescence by enhancing ethylene biosynthesis through phosphorylating WRKY22 transcription factorNew Phytol 244:116–130https://doi.org/10.1111/nph.19785Google Scholar
- Molecular-cytogenetic studies of ribosomal genes and heterochromatin reveal conserved genome organization among 11 Quercus speciesTheor. Appl. Genet 99:969–977https://doi.org/10.1007/s001220051404Google Scholar
- Deep learning suggests that gene expression is encoded in all parts of a co-evolving interacting gene regulatory structureNat. Commun 11https://doi.org/10.1038/s41467-020-19921-4Google Scholar
- Learning the regulatory code of gene expressionFront. Mol. Biosci 8https://doi.org/10.3389/fmolb.2021.673363Google Scholar
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.107309. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Kudo et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- views
- 13
- downloads
- 0
- citations
- 0
Views, downloads and citations are aggregated across all versions of this paper published by eLife.