Mosaic of somatic mutations in one of Earth’s largest organisms, Pando

Rozenn M Pineau
Karen E Mock
Jesse Morris
Vachel Kraklow
Andrea Brunelle
Aurore Pageot
William C Ratcliff
Zachariah Gompert

School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States
University of Chicago, Chicago, United States
Department of Wildland Resources, Utah State University, Logan, United States
Ecology Center, Utah State University, Logan, United States
School of Environment, Society and Sustainability, University of Utah, Salt Lake City, United States
Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, United States
Independent researcher, Grenoble, France
Department of Biology, Utah State University, Logan, United States

https://doi.org/10.7554/eLife.110244.1

Open access

Figures and data

Separating the Pando samples from samples of the surrounding clones.
(A) Projecting the genotypes (22,888 variants) onto principal components reveals three distinct clusters: two with negative PC1 values and one with positive PC1 values. Points are colored according to their PC1 value. (B) Mapping PC1 values onto the sampling locations separates the Pando cluster (positive PC1 values) from the surrounding clone clusters (negative PC1 values).
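The caption separates Pando from neighboring clones by the sign of PC1. The following is a minimal sketch of that idea, assuming a samples × variants matrix of allele counts (0/1/2). It computes PC1 scores by power iteration on the centered matrix and splits samples by score sign. The function names and the encoding are hypothetical illustrations, not the authors' pipeline.

```python
import random
from typing import List, Tuple

def pc1_scores(genotypes: List[List[float]], iters: int = 200, seed: int = 0) -> List[float]:
    """Hypothetical helper: PC1 score of each sample (rows = samples, cols = variants)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each variant (column) on its mean across samples.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        # One power-iteration step on X^T X without forming it: u = X v, w = X^T u.
        u = [sum(x * vj for x, vj in zip(row, v)) for row in X]
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Score = projection of each centered sample onto the leading direction v.
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]

def split_by_pc1_sign(scores: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of samples with positive vs non-positive PC1 scores."""
    pos = [i for i, s in enumerate(scores) if s > 0]
    neg = [i for i, s in enumerate(scores) if s <= 0]
    return pos, neg
```

The sign of a principal component is arbitrary, so which cluster lands on the positive side must be fixed by inspection (in the figure, Pando happens to be positive).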

Replication power for somatic mutations
(A) To filter for somatic mutations, we labeled as somatic the mutations found in at least two samples per replicate group and in at most 80% of the samples (see Methods for details on the filters). We identified 101 somatic mutations, (B) each found in fewer than 40% of the individuals. (C) A mutation present in two samples of a group is found, on average, in 44% of all samples.
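The two filter criteria in (A) can be sketched as follows. This assumes each mutation maps to the set of sample IDs carrying it, and reads "at least two samples per replicate group" as "at least two carriers within at least one replicate group". That reading, and all names here, are hypothetical; the exact filters are in the paper's Methods.

```python
from typing import Dict, List, Set

def filter_somatic(
    carriers: Dict[str, Set[str]],
    replicate_groups: Dict[str, List[str]],
    min_per_group: int = 2,
    max_fraction: float = 0.8,
) -> List[str]:
    """Hypothetical filter: keep mutations that replicate within a group and are not near-fixed."""
    all_samples = {s for members in replicate_groups.values() for s in members}
    kept = []
    for mut, samples in carriers.items():
        present = samples & all_samples
        # Replicated: seen in >= min_per_group samples of at least one replicate group.
        replicated = any(
            len(present.intersection(members)) >= min_per_group
            for members in replicate_groups.values()
        )
        # Not near-fixed: carried by at most max_fraction of all samples.
        not_fixed = len(present) <= max_fraction * len(all_samples)
        if replicated and not_fixed:
            kept.append(mut)
    return kept
```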

To study the evolutionary history of the Pando clone, we generated datasets at different spatial scales and with different sequencing strategies.

The large-scale and fine-scale datasets start from the same initial number of mutations because variant calling was performed on both sets jointly. Nb. = number.

Detecting spatial genetic structure at large scale.
(A) We use the set of 3,942 somatic mutations identified in the Pando clone samples to test for spatial genetic structure. At the sample level, the number of variants shared between two samples decreases with the physical distance between them (Pearson correlation coefficient between number of shared variants and spatial distance = −0.02, 95% CI = [−0.05, 0.00]), a correlation that differs significantly from a randomized distribution (P < 0.001) (B). (C & D) At the variant level, the mean distance among samples sharing a variant is significantly smaller than expected by chance (264.28 m for the data versus 279.93 m for the randomized dataset, P < 0.001).
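The sample-level test in (A, B) can be sketched as a permutation test. The sketch computes Pearson's r between shared-variant counts and distances over all sample pairs, then shuffles sample locations while keeping genotypes fixed. The randomization scheme, the two-sided P value and the function names are assumptions for illustration; the authors' exact null model may differ.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shared_vs_distance_r(coords: Dict[str, Coord], genotypes: Dict[str, Set[str]]) -> float:
    """Correlation, over all sample pairs, between shared mutations and distance."""
    shared, dist = [], []
    for a, b in combinations(sorted(coords), 2):
        shared.append(len(genotypes[a] & genotypes[b]))
        dist.append(math.dist(coords[a], coords[b]))
    return pearson(shared, dist)

def permutation_test(coords, genotypes, n_perm: int = 999, seed: int = 1):
    """Hypothetical two-sided test: shuffle sample locations, keep genotypes fixed."""
    rng = random.Random(seed)
    r_obs = shared_vs_distance_r(coords, genotypes)
    ids = sorted(coords)
    locs = [coords[i] for i in ids]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(locs)
        r_null = shared_vs_distance_r(dict(zip(ids, locs)), genotypes)
        if abs(r_null) >= abs(r_obs):
            hits += 1
    # +1 in numerator and denominator counts the observed arrangement itself.
    return r_obs, (hits + 1) / (n_perm + 1)
```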

Conceptual model of somatic mutation inheritance between ramets within an aspen clone.
When a mutation arises, we expect it to be passed down to the new tissues derived from it as the clone continues to grow. New mutations are symbolized by a lightning bolt. The mutation identity is marked by a colored star, and the dark marks correspond to where samples could be collected from the clone.
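A toy simulation makes the conceptual model concrete. In this hypothetical sketch, each new ramet sprouts from a uniformly chosen existing ramet, inherits all of its mutations, and gains a fixed number of new ones. This is not the authors' model. It illustrates the key consequence: two ramets share exactly the mutations of their most recent common ancestor, so recently diverged lineages share more variants.

```python
import random
from typing import Dict, Optional, Set, Tuple

def grow_clone(
    n_ramets: int, new_per_ramet: int = 2, seed: int = 0
) -> Tuple[Dict[int, Set[str]], Dict[int, Optional[int]]]:
    """Hypothetical toy model of somatic mutation inheritance between ramets."""
    rng = random.Random(seed)
    mutations: Dict[int, Set[str]] = {0: set()}  # founding ramet carries no somatic mutations
    parent: Dict[int, Optional[int]] = {0: None}
    next_id = 0
    for r in range(1, n_ramets):
        p = rng.randrange(r)  # any existing ramet can produce a new sucker
        new = {f"m{next_id + k}" for k in range(new_per_ramet)}  # unique new mutations
        next_id += new_per_ramet
        mutations[r] = mutations[p] | new  # inherit the parent's mutations, add the new ones
        parent[r] = p
    return mutations, parent
```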

Detecting spatial genetic structure at the finer scale.
We use the set of 3,034 somatic mutations detected in the finer-scale dataset to test for spatial genetic structure at smaller scales and within tissues. (A) At the sample level, we observe an overall significantly negative correlation between the number of shared mutations and physical distance (thick lines, Pearson correlation coefficient = −0.097, CI = [−0.12, −0.07]), driven mostly by the leaves and the roots (compared to null distributions, P < 0.001 and P = 0.026, respectively). (B) At the variant level, the mean distance within a group of samples sharing a variant (thick line; 46.33 m for the data) is significantly smaller than expected by chance when all tissue types are considered together (55.31 m under the null distribution, P < 0.001), a signal mostly driven by the leaves (39.28 m for leaves only, compared with 53.36 m expected under the null distribution; see S7 for means and p-values).
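The variant-level test in (B) can be sketched as follows. For each variant, take the mean pairwise distance among its carriers and average over variants. Compare that statistic with values obtained after shuffling sample locations, using a one-sided P value because the caption tests for distances smaller than chance. The shuffling scheme and names are assumptions. Running the same function on a per-tissue subset of samples would give tissue-specific values like those for leaves.

```python
import math
import random
from itertools import combinations
from typing import Dict, Set, Tuple

Coord = Tuple[float, float]

def mean_carrier_distance(coords: Dict[str, Coord], carriers: Dict[str, Set[str]]) -> float:
    """Average over variants of the mean pairwise distance among carrier samples."""
    per_variant = []
    for samples in carriers.values():
        pts = [coords[s] for s in samples if s in coords]
        if len(pts) < 2:
            continue  # a distance needs at least two carriers
        d = [math.dist(a, b) for a, b in combinations(pts, 2)]
        per_variant.append(sum(d) / len(d))
    return sum(per_variant) / len(per_variant)

def carrier_distance_test(coords, carriers, n_perm: int = 999, seed: int = 2):
    """Hypothetical one-sided test: are carriers closer together than under shuffled locations?"""
    rng = random.Random(seed)
    obs = mean_carrier_distance(coords, carriers)
    ids = sorted(coords)
    locs = [coords[i] for i in ids]
    null = []
    for _ in range(n_perm):
        rng.shuffle(locs)
        null.append(mean_carrier_distance(dict(zip(ids, locs)), carriers))
    p = (sum(1 for x in null if x <= obs) + 1) / (n_perm + 1)
    return obs, sum(null) / n_perm, p
```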

Pando is between 17,000 and 36,000 years old based on the large-scale dataset.
(A) We use the relationship between the proportion of missing mutations in a simulated dataset and the phylogenetic tree height to account for the somatic mutations that we might be missing in the Pando clone (linear regression y = 0.10 + 0.11x, P < 2.2 × 10⁻¹⁶, R² = 0.92). (B) With this correction, we calculate the Pando clone age under three assumptions: (1) if all the mutations we detect are real, the Pando clone would be about 34,055 years old (SD = 1,255 years); (2) if we are missing 80% of the mutations, the clone would on average be 36,351 years old (SD = 733 years); (3) if only 20% of the mutations we detect are real somatic mutations, the Pando clone would be 17,238 years old (SD = 29 years). (C) The Bayesian skyline plot suggests a steady population increase followed by a plateau. Note that this plot was scaled for assumption 1 (all the mutations that we detect are real somatic mutations). (D) Despite thousands of years of evolutionary history, the Pando clone shows minimal phylogenetic structure (points colored according to PC1 score). (E) Pollen records from Fish Lake show that Populus was consistently present during the last 15,000 years and generally well represented over the last 60,000 years.
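The correction in (A) rests on a straight-line fit of the form y = a + bx, summarized by R². A minimal ordinary-least-squares sketch follows. This caption does not spell out which quantity is x and which is y in the authors' regression, so the inputs are generic and the example values in the test are synthetic, not Pando data.

```python
from typing import List, Tuple

def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot
```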

Localities for large-scale (left) and fine-scale (right) sampling.
Coordinates are given in Supplementary Table.

Sampling strategy for the fine-scale dataset.
Leaf, bark, branch and root samples were collected from two localities within the Pando stand. At site #1, located in a recently clear-cut area, two ramets were heavily sampled (leaf, bark, branch and root samples) and five surrounding ramets were lightly sampled (leaf and root samples). At site #2, one ramet was heavily sampled and three surrounding ramets were lightly sampled.

Principal coordinate analysis (PCoA) of the large-scale and fine-scale datasets, with points colored by dataset, tree ID and tissue type.

Mean read depth per SNP in the replicate dataset.
The mean read depth per SNP for mutations found in more than 2 samples per replicate group did not differ significantly from that of mutations not found in more than 2 samples per group (two-sided t-test, t = 0.69, P = 0.51). Error bars indicate standard error. Replicate group 7 had only 2 samples left after filtering and was removed from downstream analyses.

Distributions of mean correlations between the spatial distance separating pairs of samples and the number of mutations they share, by tissue type.
Correlation values: r = −0.06 for branches, r = −0.44 for leaves, r = −0.11 for roots, r = −0.05 for shoots.

Tissue-specific mutation loads.
(A) The number of somatic mutations differs between tissue types (ANOVA, F(3, 97) = 14.47, P = 7.26 × 10⁻⁸), with leaves carrying significantly more mutations than roots, branches or shoots (Tukey HSD, P < 0.0003). (B) When normalized by read depth, leaves still carry significantly more mutations than roots and branches, but not shoots (ANOVA, F(3, 97) = 16.55, P = 9.22 × 10⁻⁹, followed by Tukey HSD with P < 0.0001 for roots and branches).
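If these are standard one-way ANOVAs, the degrees of freedom F(3, 97) follow from k − 1 = 3 (four tissue types) and n − k = 97, implying n = 101 samples. A minimal sketch of the F statistic follows. P values need the F-distribution tail, which library routines such as `scipy.stats.f_oneway` provide, and `scipy.stats.tukey_hsd` covers the post hoc comparisons; the helper below is a hypothetical illustration only.

```python
from typing import List, Sequence, Tuple

def one_way_anova(groups: List[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```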

Distributions of mean distance between samples sharing one mutation in the fine-scale dataset, sorted by tissue type.
Mean distance for leaves is 39.28 m, mean distance for roots is 51.36 m, mean distance for shoots is 46.12 m, mean distance for branches is 46.69 m.

Quality, read depth and allele frequency comparisons between all sites in the “fine-scale” and “replicate” datasets (A), and between the sites that replicate (B).
(A) The mean read depth per sample is higher overall in the “replicate” dataset than in the “fine-scale” dataset (two-sided t-test, t = −5.37, P = 4.93 × 10⁻⁷). (B) We also observe a higher depth in the set of mutations that replicate between datasets.

Mean number of times a mutation from the swapped fine-scale sub-dataset is found in the swapped replicate set.
To test whether the set of mutations from the fine-scale sub-dataset was more likely to replicate than a random set, we isolated one sample at random from the fine-scale sub-dataset (12 samples) and from the replicate dataset (these same 12 samples, replicated 8 times) to create a new reference set. We calculated the mean number of times a mutation identified in a sample from the fine-scale sub-dataset was found in the replicate dataset and compared it with the mean value from the original replicate and fine-scale datasets (P = 0.46).

The Pando clone is between ∼12,000 and 37,000 years old based on the fine-scale dataset.
(A) We use the relationship between the proportion of missing mutations in a simulated dataset and the phylogenetic tree height to account for the somatic mutations that we are missing in the Pando clone fine-scale dataset (linear regression y = 0.10 + 0.16x, P < 2.2 × 10⁻¹⁶, R² = 0.95). (B) With this correction, we calculate the Pando clone age under three assumptions: (1) if all the mutations we detect are real, the Pando clone would be about 31,421 years old (SD = 1,584 years); (2) if we are missing 80% of the mutations, the clone would on average be 37,126 years old (SD = 1,308 years); (3) if only 20% of the mutations we detect are real somatic mutations, the Pando clone would be 12,215 years old (SD = 52 years). (C) Despite thousands of years of evolutionary history, the Pando clone shows minimal phylogenetic structure (points colored according to PC1 score).
