The distribution of variant allele frequencies changes with the growth phases and by sampling.

(a) In the current population, cells divide symmetrically into two daughter cells or asymmetrically, with only one daughter cell kept in the focal population. All other events are mathematically equivalent to cell death and are treated as such. (b) The rates of symmetric and asymmetric division change during population growth and lead to a dynamic distribution of variant allele frequencies (VAFs). (c) Sampling shifts the observed VAF distribution again relative to the VAF distribution of the whole population, a fact that should be considered when inferring population properties from genetic data.
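The sampling shift in panel (c) can be illustrated with a minimal sketch. It assumes that S cells are drawn uniformly without replacement from a population of N cells, so that a variant carried by k cells appears in j sampled cells with hypergeometric probability. The function names and the 1/k² input spectrum are illustrative choices, not taken from the source.

```python
from math import comb


def hypergeom_pmf(j, N, k, S):
    """Probability that j of S sampled cells carry a variant present in k of N cells."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes automatically get probability 0.
    return comb(k, j) * comb(N - k, S - j) / comb(N, S)


def sample_spectrum(population_counts, S):
    """Map a population spectrum onto the expected spectrum of a sample.

    population_counts[k] is the expected number of variants carried by
    exactly k cells (k = 0..N). The result has entries j = 0..S, where
    entry 0 collects variants that the sample misses entirely.
    """
    N = len(population_counts) - 1
    sampled = [0.0] * (S + 1)
    for k, count in enumerate(population_counts):
        if count == 0:
            continue
        for j in range(S + 1):
            sampled[j] += count * hypergeom_pmf(j, N, k, S)
    return sampled


# Illustrative input: a 1/k^2 spectrum in a population of 100 cells, sampled with S = 10.
N, S = 100, 10
population = [0.0] + [1000.0 / k**2 for k in range(1, N + 1)]
sampled = sample_spectrum(population, S)
```

The total number of variants is conserved by the mapping, but many rare variants end up in the j = 0 bin and become unobservable, which is one way the sampled spectrum departs from the population spectrum.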

Bulk sequencing based VAF and mutation rate inferences in healthy esophagus.

(a) Expected VAF distributions from evolving Eq. (1) to different time points for a population with an initial exponential growth phase and a subsequent constant population phase (mature size N = 10³). Once the population reaches the maximum carrying capacity, the distribution moves from a 1/f² growing population shape (purple) to a 1/f constant population shape (green). Note that the shift slows considerably at older age. (b) VAF from healthy tissue in the esophagus of 9 individuals sorted into age brackets. The youngest bracket, 20-39, is closer to the developmental 1/f² scaling. The older age brackets are both close to the constant population 1/f scaling, resembling the theoretical expectations. (c) Inferred mutation rates increase linearly with age. (d) Simulations of slowly growing stem cell populations reveal that mutation rates appear to increase with age, although the true underlying per division mutation rate remains constant.
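One crude way to place a measured spectrum between the two limiting shapes is to fit a straight line on log-log axes: a slope near -2 corresponds to the 1/f² growing population shape and a slope near -1 to the 1/f constant population shape. The sketch below is an illustration only, not the inference procedure of the source; the function name and the synthetic spectra are assumptions.

```python
import math


def loglog_slope(freqs, counts):
    """Least-squares slope of log(count) against log(frequency)."""
    # Drop empty bins, which have no logarithm.
    points = [(math.log(f), math.log(c)) for f, c in zip(freqs, counts) if f > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic spectra with the two limiting shapes, on frequencies 1/N .. 100/N.
N = 1000
freqs = [k / N for k in range(1, 101)]
growing = [1.0 / f**2 for f in freqs]   # 1/f^2 shape
constant = [1.0 / f for f in freqs]     # 1/f shape

slope_growing = loglog_slope(freqs, growing)
slope_constant = loglog_slope(freqs, constant)
```

Spectra at intermediate ages lie between the two shapes and are not pure power laws, so a single fitted slope should be read as a summary rather than as a parameter estimate.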

Inference of evolutionary parameters on simulated stem cell populations.

Simulated populations were run up to age 59, growing exponentially from a single cell to constant size NM = 10000 at age tM = 5, with mutation rate μ = 1.2, division rate λ = 5 and asymmetric division proportion p = 0.4. Where sampling is mentioned, a sample of 89 cells was taken. (a) The single-cell mutational burden distribution. The compound Poisson distribution (dashed line) matches the burden distribution when averaging over multiple independently evolved populations (filled curve). (b) Distribution of estimated mutation rates from 10,000 individual simulations, obtained from burden distributions of the complete populations (blue) as well as sampled sets of cells (orange). Because the expected mutational burden distribution is unaltered by sampling, the expected estimate of the mutation rate from Eq. (5) remains unchanged: . However, sampling increases the noise on the observed burden distribution, which results in a higher error margin of the estimate: . (c) VAF spectra measured in the complete population (blue) and a sampled set of cells (orange). In contrast to the mutational burden distribution, strong sampling changes the shape of the expected distribution. A single simulation result is shown (diamonds) alongside the theoretically predicted expected values for both the total and sampled populations (Eqs. (1) and (6)) (dashed line) and the average across 100 simulations (solid line). (d) Distribution of NM and p inference results for 100 simulated and sampled populations, through estimation of and from the single cell burden distribution and fitting the number of lowest frequency (1/S) mutations to the theoretical prediction (Eq. (1)).
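The link between the burden distribution and the mutation rate can be made concrete with a small moment-based sketch. It assumes the simplest compound Poisson model: the number of divisions in a lineage is Poisson with mean Λ, and each division adds a Poisson(μ) number of mutations. The burden then has mean Λμ and variance Λμ(1 + μ), so variance/mean − 1 recovers μ. Eq. (5) is not reproduced in this section, so this estimator is only assumed to play the same role; the helper names and the value Λ = 50 are illustrative.

```python
import random
import statistics


def poisson(rng, mean):
    """Poisson variate: number of unit-rate exponential arrivals before time `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_burden(rng, expected_divisions, mu):
    """Mutational burden of one cell under the assumed compound Poisson model."""
    divisions = poisson(rng, expected_divisions)
    return sum(poisson(rng, mu) for _ in range(divisions))


def estimate_mu(burdens):
    """Moment estimator: variance / mean - 1 (valid under the model assumed above)."""
    return statistics.variance(burdens) / statistics.fmean(burdens) - 1.0


rng = random.Random(1)
# Illustrative values: 5000 cells, 50 expected divisions per lineage, mu = 1.2.
burdens = [simulate_burden(rng, 50.0, 1.2) for _ in range(5000)]
mu_hat_full = estimate_mu(burdens)
# The same estimator on a sample of 89 cells targets the same value but is noisier.
mu_hat_sample = estimate_mu(rng.sample(burdens, 89))
```

Repeating the last line with different samples gives a feel for the wider spread of the sampled estimate around the same expected value.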

Evolutionary inferences in single cell HSC data.

(a) The single-cell mutational burden distribution of the data (bars) and the compound Poisson distribution obtained from its mean and variance, used to obtain the estimated per division mutation rate . (b) Distribution of mutation frequencies of the data and the theoretically predicted average, fitted to only the lowest frequency (1/S) data point. (c) Difference Δvf between the measured value of the VAF spectrum at the lowest frequency (1/S) and its prediction from Eq. (1), for varying total population size N and asymmetric division proportion p, with fixed maturation time tM = 5 and operational hematopoietic population size NH = 50. The solid line denotes the plane of best fit where this difference is 0. (d) Maximum inferred population size N (taking p = 0 in (c)) for varying maturation time tM and operational hematopoietic population size NH.

Comparison of an exponential and logistic growth model.

(a) Population size over time for exponential (dotted) and logistic (dashed) growth functions. At the point of maturity (t = 20) the exponential function reaches the population capacity (NM = 10000), and the logistic function equals 0.99NM. The time points at which the VAF spectra are measured are indicated by solid vertical lines: t1 = 25 (blue) and t2 = 50 (orange). (b) Comparison of the expected VAF spectra, calculated with Eq. (1), in the exponential (dotted) and logistic (dashed) growth models measured at the time points t1 and t2. For reference, the theoretical predictions for an exponentially growing population without cell death and a constant population in the long time limit are shown as solid purple and green lines respectively.
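The two growth curves in panel (a) can be sketched under stated assumptions: both populations start from a single cell, the exponential curve is capped at NM once it reaches capacity, and the logistic curve has the standard form with carrying capacity NM. The growth rates then follow from the two conditions in the caption, N(20) = NM for the exponential model and N(20) = 0.99 NM for the logistic model. The functional forms and names below are assumptions, not taken from the source.

```python
import math

N_M = 10_000   # population capacity
T_M = 20.0     # time of maturity

# Exponential model: exp(r * T_M) = N_M.
R_EXP = math.log(N_M) / T_M
# Logistic model: N_M / (1 + (N_M - 1) * exp(-r * T_M)) = 0.99 * N_M.
R_LOG = math.log((N_M - 1) * 0.99 / 0.01) / T_M


def exponential_size(t):
    """Exponential growth from one cell, held at capacity after maturity."""
    return min(math.exp(R_EXP * t), N_M)


def logistic_size(t):
    """Logistic growth from one cell towards capacity N_M."""
    return N_M / (1.0 + (N_M - 1) * math.exp(-R_LOG * t))


# Population sizes at maturity and at the two measurement times t1 = 25 and t2 = 50.
sizes = {t: (exponential_size(t), logistic_size(t)) for t in (20, 25, 50)}
```

Under these assumptions the logistic model needs a larger intrinsic growth rate than the exponential one (about 0.69 versus 0.46 per unit time) to come within 1% of capacity by t = 20.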

Stochastic simulations of the Variance/Mean of the mutational burden distribution over time for a per cell division mutation rate of μ = 1.3 and varying stem cell population size N and asymmetric division probability p. Stochastic fluctuations are pronounced for small population size N and low asymmetric division probability p.

Likelihood of the Variance/Mean to be in the interval 3 < θ < 5 for a per division mutation rate of μ = 1.3.

Likelihood of the Variance/Mean to be in the interval 3 < θ < 5 for a per division mutation rate of μ = 3.

The standard deviation on the VAF spectrum increases for higher frequencies.

(a) The VAF spectrum Vf averaged across 100 simulations of a population evolved according to the model described in section 1.1. The standard deviation from the mean is shown as the band Vf ± σf around the average. (b) The standard deviation across all simulations for each frequency f = 1/N, 2/N, …, scaled by the average spectrum Vf.
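The scaled standard deviation in panel (b) amounts to a per-frequency coefficient of variation across simulations. A minimal sketch follows, assuming each simulation yields a list of counts indexed by frequency bin. The function name, the toy numbers, and the choice of the population (rather than sample) standard deviation are assumptions.

```python
import statistics


def relative_spread(spectra):
    """Standard deviation divided by the mean, per frequency bin.

    spectra[s][k] is the number of variants at frequency (k + 1) / N
    in simulation s; all simulations must share the same bins.
    """
    result = []
    for column in zip(*spectra):  # one column per frequency bin
        mean = statistics.fmean(column)
        spread = statistics.pstdev(column)
        result.append(spread / mean if mean > 0 else float("nan"))
    return result


# Toy example: three simulations, two frequency bins (low, then high).
toy_spectra = [[10, 2], [12, 0], [8, 4]]
toy_relative = relative_spread(toy_spectra)
```

In the toy data the absolute spread is the same in both bins, but the higher-frequency bin has a smaller mean and therefore a larger relative spread.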

Evolutionary parameters appearing in the model system.

Evolutionary parameters appearing in the analytical derivations of the expected VAF distribution in the Moran and pure-birth models.