## Peer review process

**Not revised:** This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

## Editors

- Reviewing Editor: Jennifer Flegg, The University of Melbourne, Melbourne, Australia
- Senior Editor: Alan Moses, University of Toronto, Toronto, Canada

**Reviewer #1 (Public Review):**

This paper can be seen as an extension of a recent study by two of the same authors [1]. In the previous paper, the authors considered two variants of the Moran process, labelled Model A and Model B, and examined differences between the evolutionary dynamics of these two models. They further described the site frequency spectra, expected allele counts, and expected singleton counts of these models, building on analytical results from prior studies, and used numerical simulations to investigate the models' evolutionary dynamics. Finally, they compared the site frequency spectra of the two models (using numerical simulations) to spectra derived from a small breast cancer data set (two sets of three samples).

In the new paper, the authors consider the same two Moran process variants (Model A and Model B) and some related branching processes. As before, they compare the site frequency spectra and various summary statistics of these models, but here they present only numerical simulations (apart from a summary of prior analytical results in Appendix A, which is never referred to in the main text and seems unconnected to the study). They then compare the site frequency spectra of these models (again using numerical simulations) to those derived from the same breast cancer samples as before, and thereby infer some evolutionary parameters.

The first main conclusion is that the critical branching process and the Moran process models behave similarly and generate similar site frequency spectra. This finding is unsurprising (indeed, the authors acknowledge that the result "has been expected"). For a reasonably large population size, the population size in the critical branching process has been shown to vary relatively little over time and the model is thus essentially a continuous time Moran process (see, for example, Equation 8.55 in ref 2). Nor is it surprising that the authors see stronger similarities when they select only the subset of branching process replicates in which the final population size is particularly close to the initial population size (this is because, in these replicates, the population size likely varies even less than usual).
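
The point about small fluctuations can be made quantitative with a standard result for the linear birth-death process: if every cell divides at rate λ and dies at rate λ, a population started from N0 cells has E[N(t)] = N0 and Var[N(t)] = 2λtN0, so the coefficient of variation is sqrt(2λt/N0), which is small only when N0 is large compared with λt. The hypothetical Gillespie sketch below checks this by simulation. It does not condition on non-extinction, and the parameter values (N0 = 100, λ = 1, t = 1) are illustrative assumptions, not those used in the manuscript.

```python
import math
import random
import statistics


def critical_birth_death(n0, rate, t_end, rng):
    """Return N(t_end) for one run of a critical linear birth-death process.

    Each cell divides at `rate` and dies at `rate`, so E[N(t)] = n0.
    Extinct runs (N = 0) are kept, i.e. there is no conditioning on survival.
    """
    n, t = n0, 0.0
    while n > 0:
        t += rng.expovariate(2.0 * rate * n)  # waiting time to the next event
        if t > t_end:
            break
        n += 1 if rng.random() < 0.5 else -1  # birth or death, equally likely
    return n


def predicted_cv(n0, rate, t):
    """Coefficient of variation sqrt(Var)/mean = sqrt(2 * rate * t / n0)."""
    return math.sqrt(2.0 * rate * t / n0)


if __name__ == "__main__":
    rng = random.Random(2)
    n0, rate, t_end = 100, 1.0, 1.0
    sizes = [critical_birth_death(n0, rate, t_end, rng) for _ in range(2000)]
    print("mean", statistics.fmean(sizes), "expected", n0)
    print("variance", statistics.variance(sizes), "expected", 2 * rate * t_end * n0)
    print("predicted CV", round(predicted_cv(n0, rate, t_end), 3))  # 0.141
```

Because the coefficient of variation grows as the square root of elapsed time, approximating the critical branching process by a constant population is most accurate over horizons much shorter than N0/(2λ).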

The second main conclusion is that, although "the mutational SFS alone is not adequate" to quantify the strength of selection, "All fitted values for the selective disadvantage of passenger mutations are nonzero, supporting the view that they exert deleterious selection during tumorigenesis". Although the question of whether mildly deleterious mutations play an important role in cancer evolution is of considerable interest, it's debatable whether the results presented here help resolve the issue.

Many prominent researchers have called into question whether cancer evolutionary parameters can be reliably inferred from site frequency spectra (e.g., [3-7]), even using sophisticated statistical methods. The statistical approach used here (though not named as such in the paper) is a crude kind of approximate Bayesian computation. To improve the accuracy of the results, it would have been better to have set reasonably vague priors for the uncertain mutation rates, rather than fixing them arbitrarily. It would also have been better to have chosen a likelihood function explicitly based on an analysis of the sampling and error distributions, rather than just summing the absolute logged deviations. It is well known that "Checking the model is crucial to statistical analysis" and "A good Bayesian analysis, therefore, should include at least some check of the adequacy of the fit of the model to the data and the plausibility of the model for the purposes for which the model will be used" [8]. The authors' failure to describe any attempt to validate or check their model, using simulated data or otherwise, casts doubt on the reliability of their inferences.
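
To make the suggestion concrete, the following is a minimal sketch of rejection approximate Bayesian computation with a vague prior and a simulation-based recovery check. Everything in it is a hypothetical stand-in rather than the authors' procedure: the toy simulator assumes a neutral constant-population spectrum with E[S_k] = θ/k and independent Poisson noise in each bin, the prior on θ is log-uniform on [1, 300], and the distance is a sum of absolute differences of log(1 + count), one possible reading of "summing the absolute logged deviations" with 1 added to avoid log(0).

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate rates used here (lam up to a few hundred).
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_sfs(theta, n_bins, rng):
    """Hypothetical toy simulator: E[S_k] = theta / k with Poisson noise per bin."""
    return [poisson(theta / k, rng) for k in range(1, n_bins + 1)]


def log_abs_distance(observed, simulated):
    """Sum of |log(1 + obs_k) - log(1 + sim_k)|; the +1 avoids log(0)."""
    return sum(abs(math.log1p(o) - math.log1p(s))
               for o, s in zip(observed, simulated))


def rejection_abc(observed, n_draws, keep_fraction, rng,
                  theta_min=1.0, theta_max=300.0):
    """Keep the prior draws whose simulated spectra lie closest to the data."""
    scored = []
    for _ in range(n_draws):
        # Vague log-uniform prior on theta instead of fixing it in advance.
        theta = math.exp(rng.uniform(math.log(theta_min), math.log(theta_max)))
        sim = simulate_sfs(theta, len(observed), rng)
        scored.append((log_abs_distance(observed, sim), theta))
    scored.sort()
    n_keep = max(1, int(keep_fraction * n_draws))
    return [theta for _, theta in scored[:n_keep]]


if __name__ == "__main__":
    rng = random.Random(1)
    true_theta = 50.0
    # Recovery check: the "observed" spectrum is itself simulated from a known theta.
    observed = simulate_sfs(true_theta, 20, rng)
    accepted = rejection_abc(observed, 5000, 0.02, rng)
    print("true theta:", true_theta)
    print("posterior median:", statistics.median(accepted))
    print("accepted range:", min(accepted), max(accepted))
```

Because the true θ is known when the "observed" spectrum is itself simulated, checking whether the accepted values concentrate around it is the simplest form of model validation; repeating the check over a grid of true values would show whether a given prior and distance can recover the parameter at all.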

Putting aside the potential biasing effects of sampling error, measurement error, and the limitations of the authors' statistical method, it is well established that both population growth and spatial structure profoundly alter the shape of site frequency spectra in ways that can mimic the effects of selection (e.g. [9-11]). Indeed, Figures 3, 4 and 5 show that the critical and super-critical branching processes generate markedly different site frequency spectra. It follows that, if the population dynamics and spatial structure of the mathematical model used for inference don't match those of the biological process that produced the data, then any inferred evolutionary parameter values will be unreliable. Breast cancer has two indisputable ecological features that shape its evolutionary dynamics: the cell population expands by many orders of magnitude from a single cell, and the population is spatially structured. In the authors' mathematical model, the population size is initially 100 cells and either remains constant or varies little, and there is no spatial structure. These profound mismatches between model and data cast further doubt on what is supposed to be the paper's most important biological finding.

In this paper the authors offer no justification for their decision to model breast cancer as a non-growing, non-spatial cell population. Nor do they engage with the extensive recent literature on the challenges of inferring evolutionary parameters from cancer site frequency spectra (they cite none of the many relevant papers listed at https://www.sottorivalab.org/neutral-evolution.html). Their 2022 paper [1] claims that, "it sometimes makes sense to consider cancer growth in the framework of constant-population models. Our models correspond to the situation in which a constant population of N "healthy" stem cells is gradually replaced by a growing clone of transformed cells with increasing fitness." No evidence was presented to support this hypothesis regarding breast cancer progression. On the other hand, a wealth of evidence supports the consensus view that, in breast cancer and other human solid tumours, the number of cells with unlimited proliferative potential is several orders of magnitude greater than 100 and grows over time (e.g. [12]).

Analytic expressions for the site frequency spectra with neutral mutations are already known. It is well known that the site frequency spectrum of an exponentially growing population has a tail following a power law $S_k \sim k^{-2}$ [13, 14]. Similarly, it is known that for the critical branching process or the Moran process, the site frequency spectrum at equilibrium is $S_k \sim k^{-1}$ [13, 15]. Especially noteworthy yet uncited studies that use those results about site frequency spectra to make inferences based on sequencing data include ref 16, in which selection is inferred, and ref 17, in which evolutionary parameters of constant populations (healthy cell populations) are inferred.
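
As a rough illustration of how the two power laws can be told apart, the hypothetical helper `loglog_slope` below fits a least-squares line to log S_k against log k, optionally restricted to the tail (k ≥ k_min). It assumes idealised, noise-free, strictly positive counts; real spectra are affected by sampling depth, detection thresholds and zero counts, and log-log regression is a crude estimator of a power-law exponent. The second spectrum, 1/(k(k+1)), is a toy choice that is flatter than k^(-2) at small k but follows k^(-2) in the tail.

```python
import math


def loglog_slope(sfs, k_min=1):
    """Least-squares slope of log(S_k) on log(k).

    `sfs[k - 1]` holds S_k, the number of mutations carried by k cells
    (or k sampled reads). Only bins with k >= k_min and S_k > 0 are used,
    so the fit can be restricted to the tail of the spectrum.
    """
    points = [(math.log(k), math.log(s))
              for k, s in enumerate(sfs, start=1)
              if k >= k_min and s > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive bins to fit a slope")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


if __name__ == "__main__":
    ks = range(1, 201)
    constant_pop = [1000.0 / k for k in ks]             # idealised S_k ~ k^(-1)
    toy_growing = [1000.0 / (k * (k + 1)) for k in ks]  # tail behaves like k^(-2)
    print(round(loglog_slope(constant_pop), 3))  # -1.0
    print(round(loglog_slope(toy_growing, k_min=50), 3))
```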

Although the paper is well written, the figures are ineffective in communicating the results. As others have put it, "A figure is meant to express an idea or introduce some facts or a result that would be too long (or nearly impossible) to explain only with words" and "If your figure is able to convey a striking message at first glance, chances are increased that your article will draw more attention from the community" [18]. On the contrary, Figures 3, 4, 5 and 6 are bewilderingly complicated, crowded, and repetitive. These figures comprise no fewer than fifty-six plots, each containing numerous curves or histograms, spread across four pages. To compare the results of different scenarios, the reader is presumably expected to put these figures side by side and try to spot the differences, hampered by inconsistent axis ranges, absence of axis labels, absence of titles, absence of legends, and unreliable captions ("cyan" seems to refer to pale blue, and "orange" to something closer to red). For example, the only notable difference between Figures 3 and 4 is in the shape of a single green curve in panel I. In the main text of a published paper, one would expect fewer, more carefully curated figures drawing attention to salient features, so that the reader can infer the main results with minimal effort. The rest can be put in supplementary figures.

In summary, this paper adds somewhat to our understanding of some standard mathematical models; whether it tells us anything new about cancer is open to debate.

References

(1) Kurpas, Monika K., and Marek Kimmel. "Modes of selection in tumors as reflected by two mathematical models and site frequency spectra." Frontiers in Ecology and Evolution 10 (2022): 889438.

(2) Bailey, Norman TJ. The elements of stochastic processes with applications to the natural sciences. John Wiley & Sons, 1964.

(3) Tarabichi, Maxime, et al. "Neutral tumor evolution?." Nature Genetics 50.12 (2018): 1630-1633.

(4) McDonald, Thomas O., Shaon Chakrabarti, and Franziska Michor. "Currently available bulk sequencing data do not necessarily support a model of neutral tumor evolution." Nature Genetics 50.12 (2018): 1620-1623.

(5) Balaparya, Abdul, and Subhajyoti De. "Revisiting signatures of neutral tumor evolution in the light of complexity of cancer genomic data." Nature Genetics 50.12 (2018): 1626-1628.

(6) Noorbakhsh, Javad, and Jeffrey H. Chuang. "Uncertainties in tumor allele frequencies limit power to infer evolutionary pressures." Nature Genetics 49.9 (2017): 1288-1289.

(7) Bozic, Ivana, Chay Paterson, and Bartlomiej Waclaw. "On measuring selection in cancer from subclonal mutation frequencies." PLoS Computational Biology 15.9 (2019): e1007368.

(8) Gelman, Andrew, et al. Bayesian Data Analysis (Third Edition). Chapman and Hall/CRC, 2014.

(9) Neher, Richard A., and Oskar Hallatschek. "Genealogies of rapidly adapting populations." Proceedings of the National Academy of Sciences 110.2 (2013): 437-442.

(10) Fusco, Diana, et al. "Excess of mutational jackpot events in expanding populations revealed by spatial Luria-Delbrück experiments." Nature Communications 7.1 (2016): 12760.

(11) Noble, Robert, et al. "Spatial structure governs the mode of tumour evolution." Nature Ecology & Evolution 6.2 (2022): 207-217.

(12) Lawson, Devon A., et al. "Single-cell analysis reveals a stem-cell program in human metastatic breast cancer cells." Nature 526.7571 (2015): 131-135.

(13) Gunnarsson, Einar B., Kevin Leder, and Jasmine Foo. "Exact site frequency spectra of neutrally evolving tumors: A transition between power laws reveals a signature of cell viability." Theoretical Population Biology 142 (2021): 67-90.

(14) Durrett, Richard. Branching Process Models of Cancer. Springer, 2015.

(15) Durrett, Richard. Probability Models for DNA Sequence Evolution. Springer Science & Business Media, 2008.

(16) Williams, Marc J., et al. "Quantification of subclonal selection in cancer from bulk sequencing data." Nature Genetics 50.6 (2018): 895-903.

(17) Moeller, Marius E., et al. "Measures of genetic diversification in somatic tissues at bulk and single-cell resolution." eLife 12 (2024): RP89780.

(18) Rougier, Nicolas P., Michael Droettboom, and Philip E. Bourne. "Ten simple rules for better figures." PLoS Computational Biology 10.9 (2014): e1003833.

**Reviewer #2 (Public Review):**

Summary:

In this manuscript, the authors present a comparison of two models of cancer evolution with advantageous drivers and deleterious passengers: a fixed-population "Moran" model, and a "Branching Process" (BP) model with dynamic population size. The Moran model is more mathematically tractable, but since cancer is a disease of uncontrolled growth, it is unclear to me how clinically relevant it is to consider a model with constant population size. Intriguingly, both models can explain the observed site frequency spectra (SFSs) in three breast cancers, which suggests that the Moran model may have some value. This distinction between the two models is addressed well.

Strengths:

The comparisons of the various BP models (extinction/non-extinction, and balanced/supercritical) are very interesting. The survivability of rare, fitness-disadvantaged clones has huge implications for treatment resistance in general: drug-resistant clones are very often disadvantaged in the absence of drug. Clinical sequencing most decidedly investigates population dynamics conditioned on non-extinction; however, most published models do not condition on non-extinction, an unfortunate community oversight that this publication rectifies.

To my knowledge, the site frequency spectra in three breast cancers are measured with unprecedented resolution (allele abundances below one in a thousand).

Detailed description of the behavior of the various models.

Weaknesses:

I do not believe Moran B is a useful theoretical distinction from Moran A. Incorporating fitness effects into the birth process, instead of the death process, is generally mathematically equivalent when time is measured in generations (or cell divisions). The visible differences between the two models in Figures 2-6 seem, by all accounts, to be due to the fact that Moran B experiences more evolution in the balanced/driver-dominated case, and less evolution in the passenger-dominated case. For this reason, we generally do not use arbitrary time steps; we quantify time in 'generations'.
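
The equivalence argument can be checked in a toy two-type Moran process. The sketch below is hypothetical and assumes that in each step one cell reproduces and one cell dies, with a mutant of relative fitness r either more likely to be chosen as the parent (selection at birth) or less likely to be chosen to die, with death weight 1/r (selection at death). These variants need not match the manuscript's exact definitions of Moran A and Moran B. In both, the ratio of down-steps to up-steps is 1/r, so the embedded jump chain, and hence the fixation probability, is identical; what differs is the probability that a given step changes the mutant count at all, which is a difference in time scaling.

```python
import math


def step_probs_birth_selection(i, n, r):
    """P(+1), P(-1) for i mutants when the parent is chosen proportionally
    to fitness and the dying cell uniformly at random."""
    total_fitness = r * i + (n - i)
    up = (r * i / total_fitness) * ((n - i) / n)
    down = ((n - i) / total_fitness) * (i / n)
    return up, down


def step_probs_death_selection(i, n, r):
    """P(+1), P(-1) when the parent is chosen uniformly and the dying cell
    with weight 1/r for mutants and 1 for residents."""
    total_death_weight = i / r + (n - i)
    up = (i / n) * ((n - i) / total_death_weight)
    down = ((n - i) / n) * ((i / r) / total_death_weight)
    return up, down


def fixation_probability(step_probs, n, r):
    """Probability that a single mutant takes over, using the birth-death
    chain formula 1 / (1 + sum_j prod_{i<=j} down_i / up_i)."""
    total, product = 1.0, 1.0
    for i in range(1, n):
        up, down = step_probs(i, n, r)
        product *= down / up
        total += product
    return 1.0 / total


if __name__ == "__main__":
    n, r = 100, 1.1
    rho_birth = fixation_probability(step_probs_birth_selection, n, r)
    rho_death = fixation_probability(step_probs_death_selection, n, r)
    rho_exact = (1 - 1 / r) / (1 - r ** -n)
    print(rho_birth, rho_death, rho_exact)  # all three agree to rounding error
    for i in (10, 50, 90):
        # Probability that one step changes the mutant count, per variant.
        print(i, sum(step_probs_birth_selection(i, n, r)),
              sum(step_probs_death_selection(i, n, r)))
```

In this toy setting, for r > 1 the birth-selection variant changes state more often while mutants are in the minority and less often once they are in the majority, with the two coinciding at i = n/2; rescaling time by the number of state-changing events removes the difference.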