Somatic mutation rates scale with time not growth rate in long-lived tropical trees

Akiko Satake; Ryosuke Imai; Takeshi Fujino; Sou Tomimoto; Kayoko Ohta; Mohammad Na’iem; Sapto Indrioko; Widiyatno; Susilo Purnomo; Almudena Mollá–Morales; Viktoria Nizhynska; Naoki Tani; Yoshihisa Suyama; Eriko Sasaki; Masahiro Kasahara

doi:10.7554/eLife.88456.2

eLife assessment

Satake and colleagues' important study elucidates somatic mutation processes in plants, demonstrating that in two tropical trees, mutation rates correlate with age, not growth rates. Their convincing evidence shows that many mutations do not align with cell divisions, suggesting many somatic mutations are generated in a replication-independent manner. This study represents a significant step towards advancing our understanding of plant development and the patterns and inheritance of mutations. This significant research is poised to engage a diverse array of scholars in plant evolution and development.

https://doi.org/10.7554/eLife.88456.2.sa3

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

landmark
fundamental
important
valuable
useful

Strength of evidence

convincing: Appropriate and validated methodology in line with current state-of-the-art

exceptional
compelling
convincing
solid
incomplete
inadequate

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife assessments

Abstract

The rates of appearance of new mutations play a central role in evolution. However, mutational processes in natural environments and their relationship with growth rates are largely unknown, particular in tropical ecosystems with high biodiversity. Here, we examined the somatic mutation landscapes of two tropical trees, Shorea laevis (slow-growing) and S. leprosula (fast-growing), in central Borneo, Indonesia. Using newly-constructed genomes, we identified a greater number of somatic mutations in tropical trees than in temperate trees. In both species, we observed a linear increase in the number of somatic mutations with physical distance between branches. However, we found that the rate of somatic mutation accumulation per meter of growth was 3.7-fold higher in S. laevis than in S. leprosula. This difference in the somatic mutation rate was scaled with the slower growth rate of S. laevis compared to S. leprosula, resulting in a constant somatic mutation rate per year between the two species. We also found that somatic mutations are neutral within an individual, but those mutations transmitted to the next generation are subject to purifying selection. These findings suggest that somatic mutations accumulate with absolute time and older trees have a greater contribution towards generating genetic variation.

Significance Statement

The significance of our study lies in the discovery of an absolute time-dependent accumulation of somatic mutations in long-lived tropical trees, independent of growth rate. Through a comparative analysis of somatic mutation landscapes in slow- and fast-growing species, we observed a clock-like accumulation of somatic mutations in both species, regardless of their growth rates. Although the majority of somatic mutations were restricted to a single branch, we also identified mutations present in multiple branches, likely transmitted during growth. Our findings suggest that older trees make a greater contribution towards generating genetic variation.

Main text

Biodiversity ultimately results from mutations that provide genetic variation for organisms to adapt to their environments. However, how and when mutations occur in natural environments is poorly understood^1–3. Recent genomic data from long-lived multicellular species have begun to uncover the somatic genetic variation and the rate of naturally occurring mutations^4,5. The rate of somatic mutations per year in a 234-year-old oak tree has been found to be surprisingly low⁶ compared to the rate in an annual herb⁷. Similar analyses in other long-lived trees have also shown low mutation rates in both broadleaf trees^8–12 and conifers¹³. Despite the growing body of knowledge of somatic mutation landscapes in temperate regions, there is currently no knowledge on the somatic mutation landscapes in organisms living in tropical ecosystems, which are among the most diverse biomes on Earth.

Mutations can arise from errors during replication¹⁴, or from DNA damage caused by exogenous mutagens or endogenous reactions at any time during cell growth¹⁵. While DNA replication errors have long been assumed to be major sources of mutations^16,17, a modeling study that relates the mutation rate to rates of DNA damage, repair and cell division¹⁵ and experimental studies in yeast¹⁸, human¹⁹, and other animals²⁰ have shown the importance of mutagenic processes that do not depend on cell division. Consequently, it remains largely unknown which source of mutations, whether replicative or non-replicative, predominates in naturally growing organisms.

To investigate the rates and patterns of somatic mutation and their relation to growth rates in tropical organisms, we studied the somatic mutation landscapes of slow- and fast-growing tropical trees in a humid tropical rain forest of Southeast Asia. By comparing the somatic mutation landscape between slow- and fast-growing species in a tropical ecosystem, we can gain insights into the mutagenesis that occurs in a natural setting. This comparison provides a unique opportunity to understand the impact of growth rate on somatic mutations and its potential role in driving evolutionary processes.

Detecting somatic mutations in slow- and fast-growing tropical trees

The humid tropical rainforests of Southeast Asia are characterized by a preponderance of trees of the Dipterocarpaceae family²¹. Dipterocarp trees are highly valued for both their contribution to forest diversity and their use in timber production. For the purposes of this study, we selected Shorea laevis and S. leprosula, both native hardwood species of the Dipterocarpaceae family (Supplementary Fig. 1a). S. laevis is a slow-growing species²², with a mean annual increment (MAI) of diameter at breast height (DBH) of 0.38 cm/year (as measured over a 20 year period in n = 2 individuals; Supplementary Data 1). In contrast, S. leprosula exhibits a faster growth rate²², with an MAI of 1.21 cm/year (n = 18; Supplementary Data 1), which is 3.2 times greater than that of S. laevis. We selected the two largest individuals of each species (S1 and S2 for S.laevis and F1 and F2 for S. leprosula; Fig. 1a) at the study site, located just below the equator in central Borneo, Indonesia (Supplementary Fig. 1b). We collected leaves from the apices of seven branches and a cambium from the base of the stem from each tree (Fig. 1a; Supplementary Fig. 2), resulting in a total of 32 samples. To determine the physical distance between the sampling positions, we measured the length of each branch (Supplementary Data 2) and DBH (Supplementary Table 1). The average heights of the slow- and fast-growing species were 44.1 m and 43.9 m, respectively (Fig. 1a; Supplementary Data 1). While it is challenging to accurately estimate the age of tropical trees due to the absence of annual rings, we used the DBH/MAI to approximate the average age of the slow-growing species to be 256 years and the fast-growing species to be 66 years (Supplementary Table 1).

Physical tree structures and phylogenetic trees constructed from somatic mutations.
a, Comparisons of physical tree structures (left, branch length in meters) and neighbor-joining (NJ) trees (right, branch length in the number of nucleotide substitutions) in two tropical tree species: *S. laevis*, a slow-growing species (S1 and S2), and *S. leprosula*, a fast-growing species (F1 and F2). IDs are assigned to each sample from which genome sequencing data were generated. Vertical lines represent tree heights. b, Distribution of somatic mutations within tree architecture. A white and gray panel indicates the presence (gray) and absence (white) of somatic mutation in each of eight samples compared to the genotype of sample 0. Sample IDs are the same between panels a and b. The distribution pattern of somatic mutations is categorized as Single, Double, and More depending on the number of samples possessing the focal somatic mutations. Among 2⁷ – 1 possible distribution patterns, the patterns observed in at least one of the four individuals are shown.

To identify somatic mutations, we constructed new reference genomes of the slow- and fast-growing species. We generated sequence data using long-read PacBio RS II and short-read Illumina sequencing and assembled the genome using DNA extracted from the apical leaf at branch 1-1 of the tallest individual of each species (S1 and F1). The genomes were estimated to contain 52,935 and 40,665 protein-coding genes, covering 97.9% and 97.8% of complete BUSCO genes (eudicots_odb10) for the slow- and fast-growing species (Supplementary Table 2). Genome sizes estimated using k-mer distribution were 347 and 376 Mb for the slow- and fast-growing species, respectively. The synteny relationship between S. laevis and S. leprosula exhibited a high level of conservation overall (Supplementary Fig. 3).

To accurately identify somatic mutations, we extracted DNA from each sample twice to generate two biological replicates (Supplementary Fig. 2). A total of 64 DNA samples were sequenced, yielding an average coverage of 69.3 and 56.5× per sample for the slow- and fast-growing species, respectively (Supplementary Data 5). We identified Single Nucleotide Variants (SNVs) within the same individual by identifying those that were identical within two biological replicates of each sample (Supplementary Fig. 2). We identified 728 and 234 SNVs in S1 and S2, and 106 and 68 SNVs in F1 and F2, respectively (Supplementary Fig. 2; Supplementary Data 4). All somatic mutations were unique and did not overlap between individuals. We conducted an independent evaluation of a subset of the inferred single nucleotide variants (SNVs) using amplicon sequencing. Our analysis demonstrated accurate annotation for 31 out of 33 mutations (94% overall), with 22 out of 24 mutations on S1 and all 9 mutations on S2 (Supplementary Table 5).

Somatic mutation rates per year is independent of growth rate

Phylogenetic trees constructed using somatic mutations were almost perfectly congruent with the physical tree structures (Fig. 1a), even though we did not incorporate knowledge of the branching topology of the tree in the SNV discovery process. The majority of somatic mutations were present at a single branch, but we also identified somatic mutations present in multiple branches (Fig. 1b) which are likely transmitted to new branches during growth. We also observed somatic mutations that did not conform to the branching topology (Fig. 1b), as theoretically predicted due to the stochastic loss of somatic mutations during branching²³.

Our analysis revealed that the number of SNVs increases linearly as the physical distance between branch tips increases (Fig. 2a). The somatic mutation rate per site per meter was determined by dividing the slope of the linear regression of the number of SNVs against the physical distance between branch tips by the number of callable sites from the diploid genome of each tree (Fig. 2b; Supplementary Table 3). The somatic mutation rate per nucleotide per meter was 7.08 × 10^-9 (95% CI: 6.41–7.74 × 10⁻⁹) and 4.27 × 10^-9 (95% CI: 3.99–4.55 × 10⁻⁹) for S1 and S2, and 1.77 × 10^-9 (95% CI: 1.64–1.91 × 10⁻⁹) and 1.29 × 10^-9 (95% CI: 1.05–1.53 × 10⁻⁹) for F1 and F2, respectively. The average rate of somatic mutation for the slow-growing species was 5.67 × 10^-9 nucleotide^-1 m^-1, which is 3.7-fold higher than the average rate of 1.53 × 10^-9 nucleotide^-1 m^-1 observed in the fast-growing species (Fig. 2b; Supplementary Table 3). This result indicates that the slow-growing tree accumulates more somatic mutations compared to the fast-growing tree to grow the unit length. This cannot be explained by differences in the number of cell divisions, as the length and diameter of fiber cells in both species are not substantially different (1.29 mm and 19.0 μm for the slow-growing species²⁴ and 0.91mm and 22.7 μm for the fast-growing species²⁵).

The relationship between the physical distance and the numbers of SNVs.
a, Linear regression of the number of SNVs against the pair-wise distance between branch tipcs with an intercept of 0 for each tree (S1: blue, S2: right blue, F1: red, and F2: orange). Shaded areas represent 95% confidence intervals of regression lines. Regression coefficients are listed in Supplementary Table 3. b, Comparison of somatic mutation rates per nucleotide per growth and per year across four tropical trees. Bars indicate 95% confidence intervals.

Based on the estimated age of each tree, somatic mutation rate per nucleotide per year was calculated for each tree. On average, resultant values were largely similar between the two species, with 7.71 × 10^-10 and 8.05 × 10^-10 nucleotide^-1 year^-1 for the slow- and fast-growing species, respectively (Fig. 2b; Supplementary Table 3). This result suggests that somatic mutation accumulates in a clock-like manner as they age regardless of tree growth. The result suggests that somatic mutation accumulates in a clock-like manner as they age regardless of tree growth. Our estimates of somatic mutation rates per nucleotide per year in Shorea are higher than those previously reported in other long-lived trees such as Quercus robur⁶, Populus trichocarpa¹¹, Eucalyptus melliodora¹⁰ and Picea sitchensis¹³. This might suggest that long-lived trees in the tropics do not necessarily suppress somatic mutation rates to the same extent as their temperate counterparts. To validate this assertion, additional studies are required to compare somatic mutation rates among trees in tropical, temperate, and boreal regions, employing standardized methodologies.

Mutational spectra are similar between slow- and fast-growing trees

Somatic mutations may be caused by exogenous factors such as ultraviolet and ionizing radiation, or endogenous factors such as oxidative respiration and errors in DNA replication. To identify characteristic mutational signatures caused by different mutagenic factors, we characterized mutational spectra by calculating the relative frequency of mutations at the 96 triplets defined by the mutated base and its flanking 5’ and 3’ bases (Fig. 3; Supplementary Fig. 4). Across species, the mutational spectra showed a dominance of cytosine-to-thymine (C>T and G>A on the other strand, noted as C:G>T:A) substitutions at CpG sites with CG (Fig. 3a, b). This is believed to result from the spontaneous deamination of 5-methylcytosine^26,27. Methylated CpG sites spontaneously deaminate, leading to TpG sites and increasing the number of C>T substitutions²⁸. Compared to the proportion of CpG sites in the reference genomes, the proportion of somatic mutations at CpG sites showed a 3.38-fold and 2.56-fold increase for F1 and F2, and a 4.54-fold and 3.53-fold increase for S1 and S2, respectively.

Mutational spectra of somatic SNVs.
Somatic mutation spectra in *S. laevis* (upper panel) and *S. leprosula* (lower panel). The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type. Different colours within each bar indicate complementary bases. For each species, the data from two trees (S1 and S2 for *S. laevis* and F1 and F2 for *S. leprosula*) were pooled to calculate the fraction of each mutated triplet.

We compared the mutational spectra of our tropical trees to single-base substitution (SBS) signatures in human cancers using the Catalogue Of Somatic Mutations In Cancer (COSMIC) compendium of mutation signatures (COSMICv.2^29–31). The mutational spectra were largely similar to the dominant mutation signature in humans known as SBS1 (cosine similarity = 0.789 and 0.597 for the slow- and fast-growing species; Supplementary Data 6). SBS1 is believed to result from the spontaneous deamination of 5-methylcytosine. The mutational spectra were also comparable to another dominant signature in all human cancers, SBS5 (cosine similarity = 0.577 and 0.558 for the slow- and fast-growing species; Supplementary Data 6), the origin of which remains unknown. Our finding that somatic mutations in tropical trees accumulate in a clock-like manner (Fig. 2a) is consistent with the clock-like mutational process observed in SBS1 and SBS5 in human somatic cells^32,33. This suggests that the mutational processes in plants and animals are conserved, despite the variation in their life forms and environmental conditions.

Somatic mutations are neutral but inter-individual SNVs are subject to selection

We tested whether the somatic mutations and inter-individual SNVs are subject to selection (Fig. 4a). The observed rate of non-synonymous somatic mutations did not deviate significantly from the expected rate under the null hypothesis of neutral selection in both the slow-(binomial test: P = 0.71) and fast-growing (binomial test: P = 1.0) species (Fig. 4b; Supplementary Table 4). In contrast, the number of inter-individual SNVs were significantly smaller than expected (P < 10^-15 for both species: Fig. 4c). These results indicate that somatic mutations are largely neutral within an individual, but mutations passed to next generation are subject to strong purifying selection during the process of embryogenesis, seed germination and growth.

Detecting selection on somatic and inter-individual SNVs.
a, An illustration of somatic and inter-individual SNVs. Different colours indicate different genotypes. b, Expected (Exp.) and observed (Obs.) rates of somatic non-synonymous substitutions. c, Expected (Exp.) and observed (Obs.) rates of inter-individual non-synonymous substitutions. d, The difference between the fractions of inter-individual and somatic substitutions spectra in *S. laevis* (upper panel) and *S. leprosula* (lower panel). The positive and negative values are plotted in different colours. The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type.

Overall, the mutational spectra were similar between somatic and inter-individual SNVs (Supplementary Fig. 4). However, the fraction of C>T substitutions, in particular at CpG sites, was lower in inter-individual SNVs compared to somatic SNVs (Fig. 4d). This observation may be indicative of the potential influence of GC-biased gene conversion during meiosis³⁴ or biased purifying selection for C>T inter-individual nucleotide substitutions.

Discussion

Our study demonstrates that while the somatic mutation rate per meter is higher in the slow-than in fast-growing species, the somatic mutation rate per year is independent of growth rate. To gain deeper understanding of these findings, we developed a simple model that decomposes the mutation rate per site per cell division (μ) into the two components: DNA replication dependent (α) and replication independent (β) mutagenesis. This can be represented as μ = α + βϋ, where ϋ is the duration of cell cycle measured in years. The replication dependent mutation emanates from errors that occur during DNA replication, such as the misincorporation of a nucleotide during DNA synthesis. The replication independent mutation arises from DNA damage caused by endogenous reactions or exogenous mutagens at any time of cell cycle. Since the number of cell division per year is given as r = 1/ϋ, the mutation rate per year becomes rμ = α/ϋ + β. From the relationship, the number of nucleotide substitution per site accumulated over t years, denoted as m(t), is given by m(t) = (α/ϋ + β)t. The formula indicates that when β is significantly greater than α, somatic mutations accumulate with tree age rather than with tree growth.

We estimated the relative magnitudes of α and β by using the results obtained from our study. Given that the cell cycle duration is likely inversely proportional to MAI, we have ϋ_S/ϋ_F = 3.2 (Supplementary Data 1), where ϋ_S and ϋ_F denote the cell cycle duration for the slow- and fast-growing species, respectively. It is also reasonable to assume that the same number of cell divisions are required to achieve 1 m of growth in both species as the cell size is similar between the two species. Based on our estimates of the somatic mutation rate per site per meter for the slow-(μ_S) and fast-growing species (μ_F), we have μ_S/μ_F = (α + βϋ_S)/(α + βϋ_F) = 3.7, which is close to the ratio of cell cycle duration ϋ_S/ϋ_F. This consistency can be explained by the substantial contribution of the replication independent mutagenesis to the somatic mutation rate (i.e. β ≫ α), as long as the magnitudes for α and β are similar between the two species. The time required for a unit length to grow can vary even within the same species, depending on microenvironmental conditions such as the availability of light and nutrients. These variations could explain the differences in somatic mutation rates per unit growth between two individuals within the same species (Fig. 2).

This argument concords with previous studies in human and other animals, which showed the presence of mutations that do not track cell division^19,20. This study contributes to understanding the importance of non-replicative mutagenesis in naturally grown trees by decoupling the impacts of growth and time on the rate of somatic mutation. The preponderance of non-replicative mutational process can be attributed to its distinct molecular origin, the accumulation of spontaneous CpG mutations with absolute time. The neutral nature of newly arising somatic mutations within the tree results in a molecular clock, a constant rate of molecular evolution^35–37. For our argument, we made an intuitive assumption that the number of stem cell divisions increases with distance regardless of species when cell size is similar. However, to further validate this assumption, we require mathematical models that consider the asymmetric division of stem cells within the meristem^38,39 and complex stem cell population dynamics during elongation and branching in tree growth^23,40. Moreover, understanding establishment timing of germlines during development is crucial in addressing the impact of somatic mutation on the next generation³⁹. The model we have presented here is based on the assumption that genetic drift is prominent within a stem cell population, and that a single stem cell lineage becomes fixed within a meristem. However, future studies could explore relaxing this assumption to consider the contribution of multiple stem cell lineages. By doing so, we can gain insights into how the relationship between pairwise genetic differences and the distance between branch tips is influenced by the branching architecture of the tree and the strength of genetic drift. Furthermore, improving the accuracy of our argument, as derived from the model, can be achieved through future investigations that directly estimate the cell cycle duration for each individual tree.

The relative importance of replication independent mutagenesis, represented as the relative magnitude of β compared to α, can vary through evolution possibly through selection on DNA repair pathways. The selection pressure that leads to different magnitudes either or both for

α or β may explain the differential somatic mutation rate per year in mammals with different lifespan⁴¹. Conversely, in plants, the selection pressure to constrain somatic mutation rates to lower levels in long-lived trees might be less significant. A definitive answer to this query awaits the accumulation of additional data on somatic mutation rates in closely related plant species inhabiting the same environment but exhibiting different growth rates.

Supporting information

Supplementary Tables

Materials and Methods

Study site and sampling methods

The study site is in a humid tropical rain forest in Central Borneo, Indonesia (00°49′ 45.7′′ S, 112°00′ 09.5′′ E; Supplementary Fig. 1b). The forest is characterized by a prevalence of trees of the Dipterocarpaceae family and is managed through a combination of selective logging and line planting (Tebang Pilih Tanam Jalur, TPTJ). The mean annual temperature range from 2001 to 2009 was between 22 to 28°C at night and 30 to 33°C during the day, with an average annual precipitation of 3376 mm⁴².

The study focuses on two native Dipterocarpaceae species, S. laevis and S. leprosula (Supplementary Fig. 1a). We logged two individuals from each species (S1 and S2 for S. laevis and F1 and F2 for S leprosula; Supplementary Fig. 1a) on July 17–18, 2018 and collected samples prior to their transportation for timber production. Approximately 0.4–1.0 g of leaf tissue was collected from each of the apices of seven branches and approximately 5 g of cambium tissue was taken from the base of the stem per individual (Supplementary Fig. 2). To calculate the physical distance between sampling positions within the tree architecture, we measured the length of each branch (Supplementary Data 2). Samples were promptly preserved in a plastic bag with silica gel following harvest and transported to the laboratory within 4 days of sampling. During transportation, samples were kept in a cooler box with ice to maintain a low temperature. Once in the laboratory, samples were stored at −80°C until DNA and RNA extraction.

DBH have been recorded for the trees with DBH greater than 10 cm every two years since 1998 within three census plots of 1 hectare (100 × 100 m) in size located near the target trees. The mean growth was calculated by taking the average of MAI of DBH for 2 and 18 trees for the slow- and fast-growing species, respectively (Supplementary Data 1).

DNA extraction

For short-read sequencing, DNA extraction was performed using a modified version of the method described previously⁴³ as follows: Frozen leaves were ground in liquid nitrogen and washed up to five times with 1 mL buffer (including 100 mM HEPES pH 8.0, 1% PVP, 50 mM Ascorbic acid, 2% (v/v) β-mercaptoethanol)⁴⁴. DNA was treated with Ribonuclease (Nippongene, Tokyo, Japan) according to the manufacture’s instruction. DNA was extracted twice independently from each sample for two biological replicates. The DNA yield was measured on a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and Qubit4 Fluorometer (Thermo Fisher Scientific). For long-read sequencing, we extracted high molecular weight genomic DNA from branch 1-1 leaf materials of S1 and F1 individuals using a modified CTAB method⁴⁵.

RNA extraction and sequencing

For genome annotation, total RNA was extracted from the cambium sample of the S1 individual of S. laevis in accordance with the method described in a previous study⁴⁶. RNA integrity was measured using the Agilent RNA 6000 Nano kit on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and the RNA yield was determined using a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific). The extracted RNA was sent to Pacific Alliance Lab (Singapore), where a cDNA library was prepared with a NEBNext^® Ultra™ RNA Library Prep Kit for Illumina (New England BioLabs, Ipswich, MA, USA) and 150 paired-end transcriptome sequencing was conducted using an Illumina NovaSeq6000 sequencer (Illumina, San Diego, CA, USA). For S. leprosula, we used published RNA-seq data⁴⁷.

Illumina short-read sequencing and library preparation

For Illumina short-read sequencing, the DNA sample from the first replicate of the S1 individual of S. laevis was sent to the Next Generation Sequencing Facility at Vienna BioCenter Core Facilities (VBCF), a member of the Vienna BioCenter (VBC) in Austria, for library preparation and sequencing on the Illumina HiSeq2500 platform (Illumina). The library was prepared using the on-bead tagmentation library prep method according to the manufacturer’s protocol and was individually indexed with the Nextera index Kit (Illumina) by PCR. Insert size was adjusted to around 450 bp. The quantity and quality of each amplified library were analyzed using the Fragment Analyzer (Agilent Technologies) and the HS NGS Fragment Kit (Agilent Technologies).

The DNA sample from the second replicate of the S1 individual and two replicates from the S2, F1, and F2 individuals were sent to Macrogen Inc. (Republic of Korea) for sequencing on the Illumina HiseqX platform (Illumina). DNA was sheared to around 500 bp fragments in size using dsDNA fragmentase (New England BioLabs). Library preparation was performed using the NEBNext Ultra II DNA Library Prep Kit (New England BioLabs) according to the manufacturer’s protocol, and the libraries were individually indexed with the NEBNext Multiplex Oligos for Illumina (New England BioLabs) by PCR. The quality and quantity of each amplified library were analyzed using the Bioanalyzer 2100 (Agilent Technologies), the High Sensitivity DNA kit (Agilent Technologies), and the NEBNext Library Quant Kit for Illumina (New England BioLabs). In total, 64 samples (16 samples per individual) were used for short-read sequencing.

PacBio long-read sequencing and library preparation

To construct the reference genome of S. laevis and S. leprosula, high molecular weight DNA samples were extracted from branch 1-1 leaf materials of S1 and F1 individuals of each species, and sequenced using PacBio platforms. For S. laevis, library preparation and sequencing were performed at VBCF. The library was prepared using the SMRTbell express Kit (PacBio, Menlo Park, CA, USA), and sequenced on the Sequel platform with six SMRT cells (PacBio). For S. leprosula, library preparation and sequencing were performed by Macrogen Inc. (Republic of Korea). The library for S. leprosula was prepared using the HiFi SMARTbell library preparation system (PacBio) according to the manufacturer’s protocol, and was sequenced on the Sequel II platform (PacBio) with one SMRT cell.

Genome assembly

The PacBio continuous long reads of S. laevis were assembled using Flye 2.7-b1587⁴⁸ with 12 threads and with an estimated genome size of 350 Mbp. We subsequently used HyPo v1.0.3⁴⁹ for polishing the contigs. The Illumina read alignments provided to HyPo were created using Bowtie v2.3.4.3⁵⁰ with--very-sensitive option and using 32 threads. We used the Illumina reads from all branches of the individual S1 rather than utilizing exclusively those of branch 1-1, in order to capitalize on the increased aggregate sequencing depth.

The PacBio HiFi reads of S. leprosula with an average Quality Value (QV) 20 or higher were extracted, and subsequently assembled using Hifiasm 0.16.1-r375⁵¹, with -z10 option and using 40 threads. The primary assembly of S. leprosula was used for further analysis. The quality and completeness of the genome assembly were assessed by searching for a set of 2,326 core genes from eudicots_odb10 using BUSCO v5.3.0⁵² for each species (Supplementary Table 2).

Genome annotation

We constructed repeat libraries of S. laevis and S. leprosula using EDTA v2.0.0⁵³. Using the libraries, we ran RepeatMasker 4.1.2-p1⁵⁴ with -s option and with Cross_match as a search engine, to perform soft-masking of trepetitive sequences in the genomes. The estimated percentages of the repetitive sequences were 42.4% for S. laevis and 39.5% for S. leprosula (Supplementary Table 2).

We ran BRAKER 2.1.6⁵⁵ to perform gene prediction by first incorporating RNA-seq data and subsequently utilizing a protein database, resulting in the generation of two sets of gene predictions for each species. To perform RNA-seq-based prediction, we mapped the RNA-seq reads (see RNA extraction in Methods section) to the genomes using HISAT 2.2.1⁵⁶, with the alignments subsequently being employed as training data for BRAKER. For protein-based prediction, we used proteins from the Viridiplantae level of OrthoDB v10⁵⁷ as the training data.

The two sets of gene predictions were merged using TSEBRA (commit 0e6c9bf in the GitHub repository)⁵⁸ to select reliable gene predictions for each species. Although in principle TSEBRA groups overlapping transcripts and considers them as alternative spliced isoforms of the same gene, we identified instances where one transcript in a gene overlapped with another transcript in a separate gene. In such cases, we manually clustered these transcripts into the same gene.

We used EnTAP 0.10.8⁵⁹ with default parameters for functional annotation. The databases employed were: UniProtKB release 2022_05⁶⁰, NCBI RefSeq plant proteins release 215⁶¹, EnTAP Binary Database v0.10.8⁵⁹ and EggNOG 4.1⁶². We constructed the standard gene model by utilizing the gene predictions of each species, eliminating any gene structures that lacked a complete ORF. Transcripts containing Ns were also excluded. Following the filtering process, the splice variant displaying the longest coding sequence (CDS) was selected as the primary isoform for each gene. The set of primary isoforms was used as the standard gene model.

Genome size estimation

We estimated genome size of two species using GenomeScope⁶³. We counted k-mer from forward sequence data of branch 1-1 from the S1 and F1 individuals using KMC 3⁶⁴ (k = 21). The genome size and heterozygous ratio were estimated by best model fitting. Estimated genome sizes were 347 Mb for the slow-growing species and 376 Mb for the fast-growing species. These estimates were 8% and 7% smaller than the estimates obtained through flow cytometry⁶⁵, respectively. The genome size of the fast-growing species was nearly identical to that previously reported for S. leprosula in peninsular Malaysia⁴⁷.

Genome synteny analysis

To investigate the syntenic relationship between S. laevis and S. leprosula, the synteny analysis performed using the MCScanX in TBtools-II (Toolbox for Biologists) v1.120 (https://github.com/CJ-Chen/TBtools/releases) with default parameters. For the synteny analysis, we selected 20 contigs from S. leprosula because these were the only ones that exhibited synteny blocks between the two species. 20 contigs covers more than 99.5% of the S. leprosula genome. The syntenic blocks spanning more than 30 genes were displayed in the synteny map (Supplementary Fig. 3).

Somatic (intra-individual) SNV discovery

We filtered low quality reads out and trimmed adapters using fastp v22.0⁶⁶ with following options: -q 20 -n 10 -t 1 -T 1 -l 20 -w 16. The cleaned reads were mapped to the reference genome using bwa-mem2 22.1⁶⁷ with default parameters. We removed PCR duplicates using fixmate and markdup function of samtools 1.13⁶⁸. The sequence reads were mapped to the reference genome, yielding average mapping rates of 91.61% and 89.5% for the slow- and fast-growing species, respectively. To identify reliable SNVs, we utilized two SNP callers (bcftools mpileup^68,69 and GATK (4.2.4.0) HaplotypeCaller⁷⁰) and extracted SNVs detected by both (Supplementary Fig. 2).

We first called SNVs with BCFtools 1.13⁷¹ mpileup at three different thresholds; threshold 1 (T40): mapping quality (MQ) = 40, base quality (BQ) = 40; threshold 2 (T30): MQ = 30, BQ = 30; threshold 3 (T20): MQ = 20, BQ = 20. SNVs detected under each threshold were pooled for further analyses, with duplicates removed. We normalized indels using bcftools norm for vcf files. We removed indels and missing data using vcftools 0.1.16⁷².

Second, we called SNVs using GATK (4.2.4.0) HaplotypeCaller and merged the individual gvcfs into a vcf file containing only variant sites. We removed indels from the vcf using the GATK SelectVariants. We filtered out unreliable SNVs using GATK VariantFiltration with the following filters: QD (Qual By Depth) < 2.0, QUAL (Base Quality) < 30.0, SOR (Strand Odds Ratio) > 4.0, FS (Fisher Strand) > 60.0, MQ (RMS Mapping Quality) < 40.0, MQRankSum (Mapping Quality Rank Sum Test) < -12.5, ReadPosRankSum (Read Pos Rank Sum Test) < -8.0. After performing independent SNV calling for each biological replicate using each SNP caller, we extracted SNVs that were detected in both replicates for each SNP caller. We further extracted SNVs that were detected by both bcftools mpileup and GATK HaplotypeCaller (Supplementary Fig. 2) using Tassel5⁷³ and a custom python script, generating potential SNVs for each threshold. Finally, SNVs detected at any of the three thresholds were extracted to obtain candidate SNVs. The number of SNVs at each filtering step can be found in Supplementary Data 4.

The candidate SNV calls were manually confirmed by two independent researchers using the IGV browser⁷⁴. We removed sites from the list of candidates if there were fewer than five high-quality reads (MQ > 20) in at least one branch sample among the 16 samples. After labeling branches carrying the called variant as somatic mutations, we compared the observed pattern with the genotyping call and extracted SNVs that were supported more than one read in both biological replicates (Supplementary Fig. 5a). We illustrated three types of false positive SNVs that were removed from the list of candidates in Supplementary Fig. 5b–d. The final set of SNVs can be found in Supplementary Data 7. Proportion of potential false positive and negative SNVs for each threshold are illustrated in Supplementary Fig. 6 and 7.

The NJ tree for each individual was generated using MEGA11⁷⁵ based on the matrix of the number of sites with somatic SNVs present between each pair of branches and edited using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). Most of the somatic SNVs were heterozygous, whereas 4% of the total SNVs (46/1136) were homozygous (Supplementary Data 7). The homozygous sites were treated as a single mutation due to the likelihood of a genotyping error being higher than the probability of two mutations occurring at the same site.

Inter-individual SNV discovery

We also identified SNVs between pairs of individuals within each species as inter-individuals SNVs. The method for calling inter-individual SNVs was the same as for intra-individual SNVs, except that only threshold 2 (MQ = 30, BQ = 30) for BCFtools 1.13⁷¹ was used. We extracted SNVs that are present in all branches within an individual using Tassel5⁷³. To exclude ambiguous SNV calls, we removed SNVs within 151 bp of indels that were called with BCFtools 1.13⁷¹ with the option of threshold 2. We eliminated SNVs within 151 bp of sites with a depth value of zero that occur in more than ten consecutive sites. We also removed SNVs that had a depth smaller than five or larger than d + 3√d, where d represents the mean depth of all sites⁷⁶. Due to the large number of candidates for inter-individual SNVs, the manual checking process was skipped.

Somatic SNVs confirmation by amplicon sequencing

We verified the reliability of the final set of somatic SNVs by amplicon sequencing approximately 5% of the SNVs in S. laevis (31 and 10 SNVs for S1 and S2, respectively). We used multiplexed phylogenetic marker sequencing method (MPM-seq⁷⁷) with modifications to the protocol as follows: to amplify 152–280 bp fragments, the first PCR primers comprising tail sequences for the second PCR primers were designed on the flanking regions of each SNV. The first PCR was conducted using the Fast PCR cycling kit (Qiagen, Düsseldorf, Germany) under the following conditions: an initial activation step at 95°C for 5 minutes, followed by 30 cycles of denaturation at 96°C for 5 seconds, annealing at 50/54/56°C for 5 seconds, and extension at 68°C for 10 seconds. This was followed by a final incubation at 72°C for 1 minute. Subsequent next-generation sequencing was performed on an Illumina MiSeq platform using the MiSeq Reagent Kit v2 (300 cycles: Illumina).

Amplicon sequencing reads were mapped to the reference genome using bwa-mem2 22.1⁶⁷ with default parameters. Using bcftools mpileup⁶⁹, we called the genotypes of all sites on target regions and eliminated candidate sequences with MQ and BQ less than 10. The final set of sites selected for confirmation consisted of 24 for the S1 individual and 9 for the S2 individual. We manually confirmed the polymorphic patterns at the target sites using the IGV browser⁷⁴. If the alternative allele was present or absent in all eight branches in the amplicon sequence, the site was determined as fixed. The site was determined as mismatch if the difference of polymorphic patterns between the somatic SNV calls and amplicon sequence was supported by more than four reads per branch. The sites that were neither fixed nor mismatched were determined as true. 94% (31/33) of SNVs at the final target sites, with 22 out of 24 mutations on S1 and all 9 mutations on S2, were confirmed to exhibit a polymorphic pattern that exactly matched between the somatic SNV calls and amplicon sequence (Supplementary Data 5). It is important to note that the SNVs that were not matched with amplicon sequencing data could potentially represent true somatic mutations. This discrepancy could be attributed to a low allele frequency, where the call is not identified as heterozygous despite the presence of a true mutation.

Somatic mutation rates per growth and per year

To estimate the somatic mutation rate per nucleotide per growth (μ_g), a linear regression analysis of the number of somatic SNVs against the physical distance between sampling positions within an individual was conducted using the lm package, with an intercept of zero, in R version 3.6.2. The somatic mutation rate per nucleotide per growth was estimated as:

where b indicates the slope of linear regression and R denotes the number of callable sites, respectively. Note that the denominator includes a factor of two due to diploidy. A site was considered callable when it passed the filters as the polymorphic sites, that is, a mapping quality of at least 40 using GATK, a mapping quality of at least 20 using BCFtools, and a depth greater than or equal to 5. This resulted in 388,801,756 and 320,739,335 base pairs for S1 and S2 and 327,435,618 and 263,488,812 base pairs for F1 and F2, respectively.

The somatic mutation rate per nucleotide per year (μ_y) was estimated as:

Here, M indicate the total number of SNVs accumulated from the base (ID 0 in Fig. 1a; Supplementary Data 2) to the branch tip and A represents tree age, respectively. R denotes the number of callable sites that was also used to estimate μ_g. Because there are seven branch tips for each tree (Fig. 1a), we estimated μ_y for each of branch tips and then calculated the mean and 95% confidence interval for each tree (Supplementary Table 3).

Mutational spectrum

Mutational spectra were derived directly from the reference genome and alternative alleles at each variant site. There are a total of six possible classes of base substitutions at each variant site: A:T>G:C (T>C), G:C>A:T (C>T), A:T>T:A (T>A), G:C>T:A (C>A), A:T>C:G (T>G), and G:C>C:G (C>G), By considering the bases immediately 5′ and 3′ to each mutated base, there are a total of 96 possible mutation classes, referred to as triplets, in this classification. We used seqkit⁷⁸ to extract the triplets for each variant site. To count the number of each triplet, we used the Wordcount tool in the EMBOSS web service (https://www.bioinformatics.nl/cgi-bin/emboss/wordcount). We calculated the fraction of each mutated triplet by dividing the number of mutated triplets by the total number of triplets in the reference genome.

We compared the mutational signatures of our tropical trees to those of single-base substitution (SBS) signatures in human cancers using Catalogue Of Somatic Mutations In Cancer (COSMIC) compendium of mutation signatures (COSMICv.2^79–81, available at https://cancer.sanger.ac.uk/cosmic/signatures_v2). Cosine similarity was calculated between each tropical tree species and each SBS signature in human cancers.

Testing selection of somatic and inter-individual SNVs

To test whether somatic and inter-individual SNVs are subject to selection, we calculated the expected rate of non-synonymous mutation. For the CDS of length L_cds, there are possible numbers of mutations of length of 3L_cds (Supplementary Fig. 8). We classified all possible mutations into three types based on the codon table: synonymous, missense, and nonsense (Supplementary Fig. 8). Each type of mutation was counted for each of the six base substitution classes (Supplementary Fig. 8). We generated count tables based on two distinct categories of CDS: those that included all isoforms and those that only encompassed primary isoforms (Supplementary Data 8). As the two tables were largely congruent, we employed the version which included all isoforms of CDS.

Using the count table and background mutation rate for each category of substitution class, we calculated the expected number of synonymous (λ_S) and non-synonymous mutations (λ_N) (Supplementary Fig. 8). As a background mutation rate, we adopted the observed somatic mutation rates in the six substitution classes in the intergenic region (Supplementary Table 5), assuming that the intergenic region is nearly neutral to selection. Because the number of nonsense somatic mutation is small, we combined missense and nonsense mutations as non-synonymous. The intergenic regions were identified as the regions situated between 1 kbp upstream of the start codon and 500 bp downstream of the stop codon. Expected rate of synonymous mutation (p_N) is given as λ_N/(λ_S + λ_N). Given the observed number of non-synonymous and synonymous mutations, we rejected the null hypothesis of neutral selection using a binomial test with the significance level of 5% (Supplementary Table 4). We used the package binom.test in R v3.6.2.

We also used the observed somatic mutation rate in the whole genome (Supplementary Table 5), including genic and intergenic regions, as the background mutation rate and confirmed the robustness of our conclusion (Supplementary Tables 4). The somatic mutation rates in the intergenic region and the whole genome were calculated for each species by pooling the data from two individuals (Supplementary Table 5). While cancer genomics studies have accounted for more detailed context-dependent mutations, such as the high rate of C>T at CpG dinucleotides⁸² or comprehensive analysis of 96 possible substitution classes in triplet context⁸³, the number of SNVs in our tropical trees is too small to perform such a comprehensive analysis. Therefore, we used the relatively simple six base substitution classes. The genes with somatic SNVs can be found in Supplementary Data 7.

Data availability

The raw sequencing data have been deposited to DDBJ under accessons DRX404986-DRX405036 for S. laevis and DRX412534-DRX412566 for S. leprosula. The genome assembly and the gene annotation are available under accessions BSQA01000001-BSQA01007745 for S. laevis and BSQB01000001-BSQB01000070 for S. leprosula.

Code availability

The codes for the bioinformatics pipeline to process whole genome sequencing data is available from https://github.com/ku-biomath/Shorea_mutation_detection.

Acknowledgements

The authors would like to thank to M. Ohno for her insightful discussion, M. Seki for his assistance with statistical analysis, S.K. Hirota for his technical support in molecular experiments, and Y. Ikezaki for her support in synteny analyses. We also thank Y. Iwasa, H. Tachida, M.M. Manuel, N. Spisak, M. Przeworski and M, Nordborg for their very insightful comments on the initial draft of our manuscript.

Authors contributions

A.S. conceived and designed the analysis; M.N, S.I, W., S.P., N.T., Y.S. and A.S. collected samples; K.O., R.I., A.M.M., V.N., and Y.S. performed molecular experiments; R.I., E.S., S.T. and A.S. analyzed data; T.F. and M.K. performed reference genome construction. A.S. leaded writing the paper with input from all authors. This study was funded by JSPS KAKENHI (JP17H06478, JP23H04966, JP23H04965, and JP23H04966 to A.S. and JP22H04925 (PAGS) to M.K.).

Additional information

Additional supporting information will be found in the online version of this article.

Competing interest

The authors declare that they have no competing financial and non-financial interests.

Target tropical trees and location of study site.
a, Images of *S. laevis* (S1), a slow-growing species, and *S. leprosula* (F1), a fast-growing species. b, Location of the study site in central Borneo, Indonesia.

Workflow for identifying *de novo* somatic SNVs.
8 samples (seven leaves and one cambium) were collected from four trees (two trees from each species). DNA was extracted twice independently from each sample and sequenced independently. Reads were mapped to the reference genome and used for SNV calling and filtering. SNVs over 8 samples were called using GATK HaplotypeCaller (GATK) and Bcftools mpileup (BCF tools) for each set of biological replicates from 7 branches and 1 cambium independently, generating potential SNVs for each set of replicates and for each SNP caller (G₁ and G₂ for GATK, B₁ and B₂ for BCF tools). For BCF tools, we set three thresholds (T40, T30, and T20) with different base quality (BQ) and mapping quality (MQ). SNVs detected in both replicates were extracted for each SNP callers and generated potential SNVs for each SNP caller, SNV_GATK for GATK and SNV_BCF for Bcftools with three thresholds. These SNVs were filtered by extracting SNVs detected in both SNP callers, generating potential SNVs for each threshold: SNV_T40, SNV_T30, and SNV_T20. Finally, SNVs detected at any of the three thresholds were extracted to obtain candidate SNVs. We checked the candidate SNVs manually and obtained a final set of SNVs, SNV_Final.

Synteny relationship between *S. laevis* and *S. leprosula*.
The collinear blocks within the genomes of *S. leprosula* and *S. laevis* were displayed by gray lines, with orange objects representing the contigs of the *S. leprosula* genome and green objects denoting the contigs of the *S. laevis* genome. In cases where the direction of a contig in *S. laevis* was partly different from that in *S. leprosula*, the contigs of the *S. laevis* genome were colored in red, otherwise it is indicated as green.

Mutational spectra of somatic and inter-individual substitutions.
a, Somatic mutation spectra for S1 and S2 individuals in *S. laevis*. b, Somatic mutation spectra for F1 and F2 individuals in *S. leprosula*. c, Inter-individual SNVs between S1 and S2 (upper panel) and between F1 and F2 (lower panel). The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type. Different colours in each bar indicate complementary bases.

Manual confirmation of candidate SNVs.
a, SNVs that passed manual confirmation. b, SNVs that were removed due to their fixed heterozygote pattern. c, SNVs that have been removed due to the difference between the observed pattern and the genotyping call. d, SNVs that were removed due to the presence of another allele with multiple reads.

Proportion of potential false positive SNVs for *S. laevis* (S1, S2) and *S. leprosula* (F1, F2).
Potential false positive SNVs was identified as the subset of candidate SNVs that were not included in the final set for each threshold (T40, T30, and T20). This subset was then divided by the total number of potential SNVs at each threshold to determine the proportion.

Proportion of potential false negative SNVs for *S. laevis* (S1, S2) and *S. leprosula* (F1, F2).
Potential false negative SNVs was identified as the subset of potential SNVs present in the final set but excluded from the candidate SNVs for each threshold (T40, T30, and T20). This subset was then divided by the total number of potential SNVs at each threshold to calculate the proportion.

A calculation scheme for the expected rate of non-synonymous mutation.
The possible numbers of synonymous (N_S), missense (N_M), and nonsense (N_NON) mutations were counted for each of six base substitution classes from all possible mutations in CDS of length L_cds and used for the calculation of expected rate of non-synonymous mutation. For non-synonymous mutation, we pooled the number for missense and nonsense mutations. The background mutation rate for each substitution class i (*r_i*) is calculated from the observed somatic substitutions in intergenic regions.

Summary statistics of the studied trees.
Height and DBH were directly measured for two individuals of *S. laevis* and *S. leprosula*. Age was estimated as DBH divided by a mean annual increment (MAI).

Summary statistics of genome assemblies for *S. laevis* and *S. leprosula*.
We assembled the genome using DNA extracted from the apical leaf at branch 1-1 of the tallest individual of each species (S1 and F1). Summary statistics of genome assemblies are listed here.

Somatic mutation rates.
The somatic mutation rate per nucleotide per meter was estimated as , where b indicates the slope of linear regression. The somatic mutation per nucleotide per year (*μ_y*) was estimated as , where M indicates the total number of SNVs accumulated from the base to the branch tip and A represents tree age, respectively. R denotes the number of callable sites.

Results of the binomial test for selection on somatic and inter-individual SNVs.
To test whether somatic and inter-individual SNVs are subject to selection, we calculated the expected rate of non-synonymous mutation. Given the observed number of non-synonymous and synonymous mutations, we rejected the null hypothesis of neutral selection using a binomial test with the significance level of 5%. *p_N*_expected and *p_N*_observed represent the expected and observed rate of non-synonymous substitutions.

Somatic mutation rates for six substitution classes.
Somatic mutation rates for six substitution classes were calculated based on the observed number of SNVs both from the intergenic region and the whole genome. S1+S2 and F1+F2 represent the use of pooled data from two individuals for each species: *S. laevis* (S1, S2) and *S. leprosula* (F1, F2). The values based on the pooled data (indicated in bold type) were used to calculate the expected rate of non-synonymous mutation.

References

1.
1. Whitham T. G.
2. Slobodchikoff C. N
1981Evolution by individuals, plant-herbivore interactions, and mosaics of genetic variability: The adaptive significance of somatic mutations in plantsOecologia 49:287–292Google Scholar
2.
1. Gill D. E.
2. Chao L.
3. Perkins S. L.
4. Wolf J. B
1995Genetic mosaicism in plants and clonal animalsAnnu Rev Ecol Syst 26:423–444Google Scholar
3.
1. Schoen D. J.
2. Schultz S. T
2019Somatic mutation and evolution in plantsAnnu. Rev. Ecol. Evol. Syst 50:49–73Google Scholar
4.
1. Yu L.
2. et al.
2020Somatic genetic drift and multilevel selection in a clonal seagrassNature Ecology & Evolution 2020 4:7 4:952–962Google Scholar
5.
1. Reusch T. B. H.
2. Baums I. B.
3. Werner B
2021Evolution via somatic genetic variation in modular speciesTrends Ecol Evol 36:1083–1092Google Scholar
6.
1. Schmid-Siegert E.
2. et al.
2017Low number of fixed somatic mutations in a long-lived oak treeNat Plants 3:926–929Google Scholar
7.
1. Ossowski S.
2. et al.
2010The rate and molecular spectrum of spontaneous mutations in Arabidopsis thalianaScience 327:92–94Google Scholar
8.
1. Plomion C.
2. et al.
2018Oak genome reveals facets of long lifespanNat Plants 4:440–452Google Scholar
9.
1. Wang L.
2. et al.
2019The architecture of intra-organism mutation rate variation in plantsPLoS Biol 17:1–29Google Scholar
10.
1. Orr A. J.
2. et al.
2020A phylogenomic approach reveals a low somatic mutation rate in a long-lived plantProceedings of the Royal Society B: Biological Sciences 287Google Scholar
11.
1. Hofmeister B. T.
2. et al.
2020A genome assembly and the somatic genetic and epigenetic mutation rate in a wild long-lived perennial Populus trichocarpaGenome Biol 21:1–27Google Scholar
12.
1. Duan Y.
2. et al.
2022Limited accumulation of high-frequency somatic mutations in a 1700-year-old Osmanthus fragrans treeTree Physiol 42:2040–2049Google Scholar
13.
1. Hanlon V. C. T.
2. Otto S. P.
3. Aitken S. N
2019Somatic mutations substantially increase the per-generation mutation rate in the conifer Picea sitchensisEvol Lett 3:348–358Google Scholar
14.
1. Reijns M. A. M.
2. et al.
2015Lagging-strand replication shapes the mutational landscape of the genomeNature 518:502–506Google Scholar
15.
1. Gao Z.
2. Wyman M. J.
3. Sella G.
4. Przeworski M
2016Interpreting the dependence of mutation rates on age and timePLoS Biol 14:e1002355Google Scholar
16.
1. Makova K. D.
2. Li W.-H
2002Strong male-driven evolution of DNA sequences in humans and apesNature 416:624–626Google Scholar
17.
1. Tomasetti C.
2. Vogelstein B
2015Variation in cancer risk among tissues can be explained by the number of stem cell divisionsScience 347:78–81Google Scholar
18.
1. Liu H.
2. Zhang J
2019Yeast Spontaneous mutation rate and spectrum vary with environmentCurrent Biology 29:1584–1591Google Scholar
19.
1. Abascal F.
2. et al.
Somatic mutation landscapes at single-molecule resolutionGoogle Scholar
20.
1. De Manuel M.
2. Wu F. L.
3. Przeworski M
2022A paternal bias in germline mutation is widespread in amniotes and can arise independently of cell division numbersElife 11Google Scholar
21.
1. J, G
2016Dipterocarp Biology, Ecology, and ConservationOxford University Press Google Scholar
22.
1. Widiyatno W.
2. Soekotjo S.
3. Naiem M.
4. Purnomo S.
5. Setiyanto P. E
2014Early performance of 23 dipterocarp species planted in logged-over rainforestJournal of Tropical Forest Science 26:259–266Google Scholar
23.
1. Tomimoto S.
2. Satake A
2023Modelling somatic mutation accumulation and expansion in a long-lived tree with hierarchical modular architectureJournal of Theoretical Biology 565:111465Google Scholar
24.
1. Usami K
1978Tropical Woods as Pulp StuffsJournal of Agricultural Research Quarterly 12:109–114Google Scholar
25.
1. Praptoyo H.
2. Mayaningsih R
2012Anatomical features of wood from some fast growing red merantiProceeding of The 4th International Symposium of IWoRs 7:8Google Scholar
26.
1. Coulondre C.
2. Miller J. H.
3. Farabaugh P. J.
4. Gilbert W
1978Molecular basis of base substitution hotspots Escherichia coliNature 274:568–571Google Scholar
27.
1. Duncan B. K.
2. Miller J. H
1980Mutagenic deamination of cytosine residues in DNANature 287:560–561Google Scholar
28.
1. Cooper D. N.
2. Krawczak M
1989Cytosine methylation and the fate of CpG dinucleotides in vertebrate genomesHum Genet 83:181–188Google Scholar
29.
1. Alexandrov L. B.
2. Nik-Zainal S.
3. Wedge D. C.
4. Campbell P. J.
5. Stratton M. R
2013Deciphering signatures of mutational processes operative in human cancerCell Rep 3:246–259Google Scholar
30.
1. Nik-Zainal S.
2. et al.
2016Landscape of somatic mutations in 560 breast cancer whole-genome sequencesNature 534:47–54Google Scholar
31.
1. Alexandrov L. B.
2. et al.
2020The repertoire of mutational signatures in human cancerNature 578:94–101Google Scholar
32.
1. Alexandrov L. B.
2. et al.
2015Clock-like mutational processes in human somatic cells. Europe PMC Funders GroupNat Genet 47:1402–1407Google Scholar
33.
1. Lee-Six H.
2. et al.
2019The landscape of somatic mutation in normal colorectal epithelial cellsNature 574:532–537Google Scholar
34.
1. Duret L.
2. Galtier N
2009Biased gene conversion and the evolution of mammalian genomic landscapesAnnu Rev Genomics Hum Genet 10:285–311Google Scholar
35.
1. Zuckerkandl E.
2. Pauling L
1965Evolving Genes and ProteinsAcademic Press Google Scholar
36.
1. Kimura M.
2. Ohta T
1971On the rate of molecular evolutionJ Mol Evol 1:1–17Google Scholar
37.
1. Kimura M
1983The Neutral theory of molecular evolutionCambridge University Press https://doi.org/10.1017/CBO9780511623486 Google Scholar
38.
1. Watson J. M.
2. et al.
2016Germline replications and somatic mutation accumulation are independent of vegetative life span in ArabidopsisProc Natl Acad Sci U S A 113:12226–12231Google Scholar
39.
1. Lanfear R
2018Do plants have a segregated germline?PLoS Biol 16:e2005439Google Scholar
40.
1. Iwasa Y.
2. Tomimoto S.
3. Satake A
2023The genetic structure within a single tree is determined by the behavior of the stem cells in the meristemGenetics https://doi.org/10.1093/GENETICS/IYAD020 Google Scholar
41.
1. Cagan A.
2. et al.
2022Somatic mutation rates scale with lifespan across mammalsNature 604:517–524Google Scholar
42.
1. Widiyatno W.
2. Soekotjo S.
3. Naiem M.
4. Purnomo S.
5. Setiyanto P. E
2014Early performance of 23 dipterocarp species planted in logged-over rainforestJournal of Tropical Forest Science 26:259–266Google Scholar
43.
1. Doyle J. J.
2. Doyle J. L.
1987A rapid DNA isolation procedure for small quantities of fresh leaf tissuePhytochemical Bulletin 19:11–15Google Scholar
44.
1. Toyama H.
2. et al.
2015Effects of logging and recruitment on community phylogenetic structure in 32 permanent forest plots of Kampong Thom, CambodiaPhilosophical Transactions of the Royal Society B: Biological Sciences 370:1–13Google Scholar
45.
1. Doyle J
1991DNA Protocols for plants in molecular techniques in taxonomySpringer https://doi.org/10.1007/978-3-642-83962-7_18 Google Scholar
46.
1. Yeoh S. H.
2. et al.
2017Unravelling proximate cues of mass flowering in the tropical forests of South-East Asia from gene expression analysesMol Ecol 26:5074–5085Google Scholar
47.
1. Ng K. K. S.
2. et al.
2021The genome of Shorea leprosula (Dipterocarpaceae) highlights the ecological relevance of drought in aseasonal tropical rainforestsCommun Biol 4:1–14Google Scholar
48.
1. Kolmogorov M.
2. Yuan J.
3. Lin Y.
4. Pevzner P. A
2019Assembly of long, error-prone reads using repeat graphsNat Biotechnol 37:540–546Google Scholar
49.
1. Kundu R.
2. Casey J.
3. Sung W.-K.
2019HyPo: Super Fast & Accurate polisher for long read genome assembliesbioRxiv Google Scholar
50.
1. Langmead B.
2. Salzberg S. L
2012Fast gapped-read alignment with Bowtie 2Nat Methods 9Google Scholar
51.
1. Cheng H.
2. Concepcion G. T.
3. Feng X.
4. Zhang H.
5. Li H
2021Haplotype-resolved de novo assembly using phased assembly graphs with hifiasmNat Methods 18Google Scholar
52.
1. Manni M.
2. Berkeley M. R.
3. Seppey M.
4. Simão F. A.
5. Zdobnov E. M
2021BUSCO Update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomesMol Biol Evol 38Google Scholar
53.
1. Ou S.
2. et al.
2019Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipelineGenome Biol 20Google Scholar
54.
1. Smit A.
2. Hubley R.
3. Green P.
RepeatMasker Open-4.0http://www.repeatmasker.org
55.
1. Brůna T.
2. Hoff K. J.
3. Lomsadze A.
4. Stanke M.
5. Borodovsky M
2021BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein databaseNAR Genom Bioinform 3Google Scholar
56.
1. Kim D.
2. Paggi J. M.
3. Park C.
4. Bennett C.
5. Salzberg S. L
2019Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotypeNat Biotechnol 37Google Scholar
57.
1. Zdobnov E. M.
2. et al.
2021OrthoDB in 2020: Evolutionary and functional annotations of orthologsNucleic Acids Res 49Google Scholar
58.
1. Gabriel L.
2. Hoff K. J.
3. Brůna T.
4. Borodovsky M.
5. Stanke M
2021TSEBRA: transcript selector for BRAKERBMC Bioinformatics 22Google Scholar
59.
1. Hart A. J.
2. et al.
2020EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomesMol Ecol Resour 20Google Scholar
60.
1. Bateman A.
2. et al.
2021UniProt: the universal protein knowledgebase in 2021Nucleic Acids Res 49Google Scholar
61.
1. O’Leary N. A.
2. et al.
2016Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotationNucleic Acids Res 44Google Scholar
62.
1. Powell S.
2. et al.
2014EggNOG v4.0: Nested orthology inference across 3686 organismsNucleic Acids Res 42Google Scholar
63.
1. Vurture G. W.
2. et al.
2017GenomeScope: Fast reference-free genome profiling from short readsBioinformatics 33:2202–2204Google Scholar
64.
1. Kokot M.
2. Dlugosz M.
3. Deorowicz S
2017KMC 3: counting and manipulating k-mer statisticsBioinformatics 33:2759–2761Google Scholar
65.
1. Ng C. H.
2. et al.
2016Genome size variation and evolution in DipterocarpaceaePlant Ecol Divers 9:437–446Google Scholar
66.
1. Chen S.
2. Zhou Y.
3. Chen Y.
4. Gu J
2018Fastp: An ultra-fast all-in-one FASTQ preprocessorBioinformatics 34:i884–i890Google Scholar
67.
1. Vasimuddin M.
2. Sanchit M.
3. Heng L.
4. Srinivas A
2019Efficient architecture-aware acceleration of BWA-MEM for multicore systemsProceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019:314–324https://doi.org/10.1109/IPDPS.2019.00041 Google Scholar
68.
1. Li H.
2. et al.
2009The sequence alignment/map format and SAMtoolsBioinformatics 25:2078–2079Google Scholar
69.
1. Li H
2011A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing dataBioinformatics 27:2987–2993Google Scholar
70.
1. McKenna A.
2. et al.
2010The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing dataGenome Res 20Google Scholar
71.
1. Danecek P.
2. et al.
2021Twelve years of SAMtools and BCFtoolsGigascience 10:1–4Google Scholar
72.
1. Danecek P.
2. et al.
2011The variant call format and VCFtoolsBioinformatics 27:2156–2158Google Scholar
73.
1. Bradbury P. J.
2. et al.
2007TASSEL: Software for association mapping of complex traits in diverse samplesBioinformatics 23Google Scholar
74.
1. Robinson J. T.
2. Thorvaldsdóttir H.
3. Wenger A. M.
4. Zehir A.
5. Mesirov J. P
2017Variant review with the integrative genomics viewerCancer Research 77https://doi.org/10.1158/0008-5472.CAN-17-0337 Google Scholar
75.
1. Tamura K.
2. Stecher G.
3. Kumar S
2021MEGA11: Molecular evolutionary genetics analysis version 11Mol Biol Evol 38Google Scholar
76.
1. Li H.
2. Wren J
2014Toward better understanding of artifacts in variant calling from high-coverage samplesBioinformatics 30:2843–2851Google Scholar
77.
1. Suyama Y.
2. et al.
2022Complementary combination of multiplex high-throughput DNA sequencing for molecular phylogenyEcol Res 37:171–181Google Scholar
78.
1. Shen W.
2. Le S.
3. Li Y.
4. Hu F
2016SeqKit: A cross-platform and ultrafast toolkit for FASTA/Q file manipulationPLoS One 11:1–10Google Scholar
79.
1. Alexandrov L. B.
2. Nik-Zainal S.
3. Wedge D. C.
4. Campbell P. J.
5. Stratton M. R
2013Deciphering signatures of mutational processes operative in human cancerCell Rep 3:246–259Google Scholar
80.
1. Nik-Zainal S.
2. et al.
2016Landscape of somatic mutations in 560 breast cancer whole-genome sequencesNature 534:47–54Google Scholar
81.
1. Alexandrov L. B.
2. et al.
2020The repertoire of mutational signatures in human cancerNature 578:94–101Google Scholar
82.
1. Greenman C.
2. Wooster R.
3. Futreal P. A.
4. Stratton M. R.
5. Easton D. F
2006Statistical analysis of pathogenicity of somatic mutations in cancerGenetics 173:2187–2198Google Scholar
83.
1. Martincorena I.
2. et al.
2017Universal patterns of selection in cancer and somatic tissuesCell 171:1029–1041Google Scholar

Article and author information

Author information

Akiko Satake
Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
ORCID iD: 0000-0002-0831-8617
- Correspondence to: Akiko Satake (akiko.satake@kyudai.jp)
- These two authors contributed equally to this work
Ryosuke Imai
Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
- These two authors contributed equally to this work
Takeshi Fujino
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 277-8561, Chiba, Japan
Sou Tomimoto
Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
Kayoko Ohta
Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
Mohammad Na’iem
Faculty of Forestry, Universitas Gadjah Mada, Jl. Agro No. 1 Bulaksumur Yogyakarta 55281, Indonesia
Sapto Indrioko
Faculty of Forestry, Universitas Gadjah Mada, Jl. Agro No. 1 Bulaksumur Yogyakarta 55281, Indonesia
Widiyatno
Faculty of Forestry, Universitas Gadjah Mada, Jl. Agro No. 1 Bulaksumur Yogyakarta 55281, Indonesia
Susilo Purnomo
PT. Sari Bumi Kusuma, Sungai Raya, Pontianak Kota, Pontianak, West Kalimantan 78391, Indonesia
Almudena Mollá–Morales
Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
Viktoria Nizhynska
Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030, Vienna, Austria
Naoki Tani
Forestry Division, Japan International Research Center for Agricultural Sciences, Tsukuba, Ibaraki, 305-8686, Japan, Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8686, Japan
ORCID iD: 0000-0001-9878-6352
Yoshihisa Suyama
Field Science Center, Graduate School of Agricultural Science, Tohoku University, 232-3 Yomogida, Naruko-onsen, Osaki, Miyagi, 989-6711, Japan
ORCID iD: 0000-0002-3136-5489
Eriko Sasaki
Department of Biology, Faculty of Science, Kyushu University, Fukuoka 819-0395, Japan
ORCID iD: 0000-0002-0878-364X
Masahiro Kasahara
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 277-8561, Chiba, Japan
ORCID iD: 0000-0001-8430-1414

Version history

Preprint posted: January 27, 2023
Sent for peer review: April 19, 2023
Reviewed Preprint version 1: June 6, 2023
Reviewed Preprint version 2: October 9, 2023
Version of Record published: October 23, 2024

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.88456. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

views: 3,821
downloads: 324
citations: 26

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Significance of findings

Strength of evidence

Abstract

Significance Statement

Main text

Detecting somatic mutations in slow- and fast-growing tropical trees

Physical tree structures and phylogenetic trees constructed from somatic mutations.

Somatic mutation rates per year is independent of growth rate

The relationship between the physical distance and the numbers of SNVs.

Mutational spectra are similar between slow- and fast-growing trees

Mutational spectra of somatic SNVs.

Somatic mutations are neutral but inter-individual SNVs are subject to selection

Detecting selection on somatic and inter-individual SNVs.

Discussion

Supporting information

Materials and Methods

Study site and sampling methods

DNA extraction

RNA extraction and sequencing

Illumina short-read sequencing and library preparation

PacBio long-read sequencing and library preparation

Genome assembly

Genome annotation

Genome size estimation

Genome synteny analysis

Somatic (intra-individual) SNV discovery

Inter-individual SNV discovery

Somatic SNVs confirmation by amplicon sequencing

Somatic mutation rates per growth and per year

Mutational spectrum

Testing selection of somatic and inter-individual SNVs

Data availability

Code availability

Acknowledgements

Authors contributions

Additional information

Competing interest

Target tropical trees and location of study site.

Workflow for identifying de novo somatic SNVs.

Synteny relationship between S. laevis and S. leprosula.

Mutational spectra of somatic and inter-individual substitutions.

Manual confirmation of candidate SNVs.

Proportion of potential false positive SNVs for S. laevis (S1, S2) and S. leprosula (F1, F2).

Proportion of potential false negative SNVs for S. laevis (S1, S2) and S. leprosula (F1, F2).

A calculation scheme for the expected rate of non-synonymous mutation.

Summary statistics of the studied trees.

Summary statistics of genome assemblies for S. laevis and S. leprosula.

Somatic mutation rates.

Results of the binomial test for selection on somatic and inter-individual SNVs.

Somatic mutation rates for six substitution classes.

References

Article and author information

Author information

Akiko Satake†

Ryosuke Imai†

Takeshi Fujino

Sou Tomimoto

Kayoko Ohta

Mohammad Na’iem

Sapto Indrioko

Widiyatno

Susilo Purnomo

Almudena Mollá–Morales

Viktoria Nizhynska

Naoki Tani

Yoshihisa Suyama

Eriko Sasaki

Masahiro Kasahara

Version history

Cite all versions

Copyright

Metrics

Akiko Satake

Ryosuke Imai