Physical tree structures and phylogenetic trees constructed from somatic mutations.

a, Comparisons of physical tree structures (left, branch length in meters) and neighbor-joining (NJ) trees (right, branch length in the number of nucleotide substitutions) in two tropical tree species: S. laevis, a slow-growing species (S1 and S2), and S. leprosula, a fast-growing species (F1 and F2). IDs are assigned to each sample from which genome sequencing data were generated. Vertical lines represent tree heights. b, Distribution of somatic mutations within tree architecture. A white and gray panel indicates the presence (gray) and absence (white) of somatic mutation in each of eight samples compared to the genotype of sample 0. Sample IDs are the same between panels a and b. The distribution pattern of somatic mutations is categorized as Single, Double, and More depending on the number of samples possessing the focal somatic mutations. Among 2^7 − 1 possible distribution patterns, the patterns observed in at least one of the four individuals are shown.

The relationship between the physical distance and the numbers of SNVs.

a, Linear regression of the number of SNVs against the pair-wise distance between branch tips with an intercept of 0 for each tree (S1: blue, S2: light blue, F1: red, and F2: orange). Shaded areas represent 95% confidence intervals of regression lines. Regression coefficients are listed in Supplementary Table 3. b, Comparison of somatic mutation rates per nucleotide per meter of growth and per year across four tropical trees. Bars indicate 95% confidence intervals.
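The zero-intercept fit in panel a can be sketched as follows. This is a minimal illustration, not the analysis code behind the figure; the function name and the example values are hypothetical.

```python
def slope_through_origin(distances_m, snv_counts):
    """Least-squares slope b for the model y = b * x (intercept fixed at 0)."""
    if len(distances_m) != len(snv_counts):
        raise ValueError("inputs must have the same length")
    sxx = sum(x * x for x in distances_m)
    if sxx == 0:
        raise ValueError("at least one distance must be non-zero")
    sxy = sum(x * y for x, y in zip(distances_m, snv_counts))
    return sxy / sxx


# Made-up pair-wise distances (meters) and SNV counts, for illustration only.
distances = [2.0, 4.0, 6.0]
counts = [3, 6, 9]
b = slope_through_origin(distances, counts)  # SNVs per meter
```

With the intercept fixed at 0, the slope reduces to the sum of x*y divided by the sum of x*x, so two tips separated by zero distance are forced to share zero SNVs.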

Mutational spectra of somatic SNVs.

Somatic mutation spectra in S. laevis (upper panel) and S. leprosula (lower panel). The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type. Different colours within each bar indicate complementary bases. For each species, the data from two trees (S1 and S2 for S. laevis and F1 and F2 for S. leprosula) were pooled to calculate the fraction of each mutated triplet.
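The 96 mutation types on the horizontal axis arise from six pyrimidine-centred substitution classes combined with four possible 5' and four possible 3' neighbouring bases. The sketch below enumerates them and folds a purine-centred triplet onto its complementary strand; the function names and the label format are illustrative assumptions, not taken from the study.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_types():
    """All 96 labels of the form 5'[ref>alt]3' with a pyrimidine reference."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def classify(trinucleotide, alt):
    """Map a mutated triplet (mutation at the middle base) to one of the 96 types."""
    if trinucleotide[1] in "AG":
        # Purine reference: describe the mutation on the complementary strand.
        trinucleotide = trinucleotide.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"
```

For example, a G>A change in the triplet TGA is reported as a C>T change in TCA, which is why each bar can be split into two complementary contributions.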

Detecting selection on somatic and inter-individual SNVs.

a, An illustration of somatic and inter-individual SNVs. Different colours indicate different genotypes. b, Expected (Exp.) and observed (Obs.) rates of somatic non-synonymous substitutions. c, Expected (Exp.) and observed (Obs.) rates of inter-individual non-synonymous substitutions. d, The difference between the fractions of inter-individual and somatic substitution spectra in S. laevis (upper panel) and S. leprosula (lower panel). The positive and negative values are plotted in different colours. The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type.

Target tropical trees and location of study site.

a, Images of S. laevis (S1), a slow-growing species, and S. leprosula (F1), a fast-growing species. b, Location of the study site in central Borneo, Indonesia.

Workflow for identifying de novo somatic SNVs.

Eight samples (seven leaves and one cambium) were collected from each of four trees (two trees per species). DNA was extracted twice independently from each sample, and the two extracts were sequenced independently. Reads were mapped to the reference genome and used for SNV calling and filtering. SNVs across the eight samples were called with GATK HaplotypeCaller (GATK) and Bcftools mpileup (BCF tools) independently for each set of biological replicates from the seven branches and one cambium, generating potential SNVs for each set of replicates and for each SNP caller (G1 and G2 for GATK, B1 and B2 for BCF tools). For BCF tools, we set three thresholds (T40, T30, and T20) with different base quality (BQ) and mapping quality (MQ) cutoffs. For each SNP caller, SNVs detected in both replicates were extracted, yielding SNV_GATK for GATK and SNV_BCF for BCF tools at each of the three thresholds. SNVs detected by both SNP callers were then extracted, generating potential SNVs for each threshold: SNV_T40, SNV_T30, and SNV_T20. Finally, SNVs detected at any of the three thresholds were combined to obtain candidate SNVs. We checked the candidate SNVs manually and obtained the final set of SNVs, SNV_Final.
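The filtering logic of this workflow amounts to set intersections followed by a union. The sketch below assumes each call set is a Python set of (contig, position) tuples; the function name, the data layout, and the toy sites are hypothetical, and the final manual check is not modelled.

```python
def candidate_snvs(gatk_replicates, bcf_replicates_by_threshold):
    """Return per-threshold potential SNVs and their union (candidate SNVs)."""
    g1, g2 = gatk_replicates
    snv_gatk = g1 & g2  # detected in both GATK replicates
    per_threshold = {}
    candidates = set()
    for threshold, (b1, b2) in bcf_replicates_by_threshold.items():
        snv_bcf = b1 & b2  # detected in both BCF tools replicates
        per_threshold[threshold] = snv_gatk & snv_bcf  # detected by both callers
        candidates |= per_threshold[threshold]  # detected at any threshold
    return per_threshold, candidates


# Toy call sets, for illustration only.
a, b, c, d = ("ctg1", 10), ("ctg1", 20), ("ctg2", 5), ("ctg2", 7)
gatk = ({a, b, c}, {a, b, d})
bcf = {
    "T40": ({a}, {a}),
    "T30": ({a, b}, {a}),
    "T20": ({a, b, c}, {a, b}),
}
per_threshold, candidates = candidate_snvs(gatk, bcf)
```

In this toy input, site b survives only at the most permissive threshold (T20) but still enters the candidate set, because the last step takes the union across thresholds.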

Synteny relationship between S. laevis and S. leprosula.

Collinear blocks between the genomes of S. leprosula and S. laevis are shown as gray lines. Orange objects represent the contigs of the S. leprosula genome. Contigs of the S. laevis genome are shown in green, except those whose direction partly differs from that in S. leprosula, which are shown in red.

Mutational spectra of somatic and inter-individual substitutions.

a, Somatic mutation spectra for S1 and S2 individuals in S. laevis. b, Somatic mutation spectra for F1 and F2 individuals in S. leprosula. c, Inter-individual SNVs between S1 and S2 (upper panel) and between F1 and F2 (lower panel). The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type. Different colours in each bar indicate complementary bases.

Manual confirmation of candidate SNVs.

a, SNVs that passed manual confirmation. b, SNVs that were removed due to their fixed heterozygote pattern. c, SNVs that were removed due to a discrepancy between the observed pattern and the genotyping call. d, SNVs that were removed due to the presence of another allele supported by multiple reads.

Proportion of potential false positive SNVs for S. laevis (S1, S2) and S. leprosula (F1, F2).

Potential false positive SNVs were identified as the subset of candidate SNVs that were not included in the final set for each threshold (T40, T30, and T20). The size of this subset was then divided by the total number of potential SNVs at each threshold to determine the proportion.

Proportion of potential false negative SNVs for S. laevis (S1, S2) and S. leprosula (F1, F2).

Potential false negative SNVs were identified as the subset of potential SNVs present in the final set but excluded from the candidate SNVs for each threshold (T40, T30, and T20). The size of this subset was then divided by the total number of potential SNVs at each threshold to calculate the proportion.

A calculation scheme for the expected rate of non-synonymous mutation.

The possible numbers of synonymous (N_S), missense (N_M), and nonsense (N_NON) mutations were counted for each of the six base substitution classes from all possible mutations in CDS of length L_cds and used to calculate the expected rate of non-synonymous mutation. For non-synonymous mutations, we pooled the numbers of missense and nonsense mutations. The background mutation rate for each substitution class i (r_i) was calculated from the observed somatic substitutions in intergenic regions.
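The counting step can be sketched as follows, assuming the standard genetic code and an in-frame CDS. The way the counts and background rates are combined in `expected_nonsyn_rate` (a rate-weighted fraction of non-synonymous changes) is an assumption made for illustration; the scheme in the figure may differ in detail. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Fold a substitution into one of six pyrimidine-referenced classes."""
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_possible_mutations(cds):
    """Count synonymous (S), missense (M), and nonsense (NON) changes per class."""
    counts = {}
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        aa = CODON_TABLE[codon]
        for pos, ref in enumerate(codon):
            for alt in "ACGT":
                if alt == ref:
                    continue
                new_aa = CODON_TABLE[codon[:pos] + alt + codon[pos + 1:]]
                if new_aa == aa:
                    kind = "S"
                elif new_aa == "*":
                    kind = "NON"
                else:
                    kind = "M"  # simplification: stop-loss changes land here too
                cls = substitution_class(ref, alt)
                counts.setdefault(cls, {"S": 0, "M": 0, "NON": 0})[kind] += 1
    return counts


def expected_nonsyn_rate(counts, rates):
    """Assumed form: rate-weighted fraction of missense plus nonsense changes."""
    nonsyn = sum(rates[c] * (n["M"] + n["NON"]) for c, n in counts.items())
    total = sum(rates[c] * sum(n.values()) for c, n in counts.items())
    return nonsyn / total
```

For the single codon GCT (alanine), all three third-position changes are synonymous and the other six are missense, so uniform background rates give an expected non-synonymous rate of 6/9.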

Summary statistics of the studied trees.

Height and diameter at breast height (DBH) were directly measured for two individuals each of S. laevis and S. leprosula. Age was estimated as DBH divided by the mean annual increment (MAI).

Summary statistics of genome assemblies for S. laevis and S. leprosula.

We assembled the genome using DNA extracted from the apical leaf at branch 1-1 of the tallest individual of each species (S1 and F1). Summary statistics of genome assemblies are listed here.

Somatic mutation rates.

The somatic mutation rate per nucleotide per meter was estimated from b, the slope of the linear regression. The somatic mutation rate per nucleotide per year (μy) was estimated from M, the total number of SNVs accumulated from the base to the branch tip, and A, the tree age. R denotes the number of callable sites.

Results of the binomial test for selection on somatic and inter-individual SNVs.

To test whether somatic and inter-individual SNVs are subject to selection, we calculated the expected rate of non-synonymous mutation. Given the observed numbers of non-synonymous and synonymous mutations, the null hypothesis of neutrality was tested using a binomial test at a significance level of 5%. pN_expected and pN_observed represent the expected and observed rates of non-synonymous substitutions, respectively.
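A binomial test of this kind can be sketched with the standard library alone. The version below is an exact two-sided test that sums the probabilities of all outcomes no more likely than the observed one; whether the original analysis used a two-sided or one-sided test is not stated here, so that choice is an assumption, as are the function name and the example counts.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided p-value for k successes in n trials under success rate p."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with the observed outcome are included.
    threshold = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical counts, for illustration only.
n_nonsyn, n_syn = 30, 20
pn_expected = 0.7
p_value = binomial_two_sided_p(n_nonsyn, n_nonsyn + n_syn, pn_expected)
reject_neutrality = p_value < 0.05
```

Here the number of trials is the total of observed non-synonymous and synonymous SNVs, the number of successes is the non-synonymous count, and the success probability under the null hypothesis is the expected non-synonymous rate.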

Somatic mutation rates for six substitution classes.

Somatic mutation rates for six substitution classes were calculated based on the observed number of SNVs from both the intergenic region and the whole genome. S1+S2 and F1+F2 represent the use of pooled data from two individuals for each species: S. laevis (S1, S2) and S. leprosula (F1, F2). The values based on the pooled data (indicated in bold type) were used to calculate the expected rate of non-synonymous mutation.