Measures of genetic diversification in somatic tissues at bulk and single cell resolution suggest sources of unknown stochasticity

  1. Department of Mathematics, Queen Mary University of London, United Kingdom
  2. Evolutionary Dynamics Group, Centre for Cancer Genomics and Computational Biology, Barts Cancer Centre, Queen Mary University of London, Charterhouse Square, London, United Kingdom EC1M 6BQ
  3. Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles, Belgium
  4. Group of Theoretical Biology, The State Key Laboratory of Biocontrol, School of Life Science, Sun Yat-sen University, Guangzhou, 510060 China

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Editors

  • Reviewing Editor
    Jennifer Flegg
    The University of Melbourne, Melbourne, Australia
  • Senior Editor
    Molly Przeworski
    Columbia University, New York, United States of America

Reviewer #1 (Public Review):

Summary:
The authors propose mathematical methods for inferring evolutionary parameters of interest from bulk and single-cell sequencing data in healthy tissue and hematopoiesis. In general, the introduction is well written and adequately references the relevant previous literature and findings in this field (e.g. the power laws for well-mixed, exponentially growing populations). The authors consider three phases of human development: early development, growth and maintenance, and the mature phase. In particular, the time-dependent mutation rate in Figure 2d is an intriguing and strong result, and the processes underlying Figures 3 and 4 are generally well explained and convincing.

Notes & suggestions:
1. The explanation of Figure 2 in lines 101-111 should be expanded for clarity. First, is Figure 2a derived from stochastic simulation (as line 101 suggests) or from some theoretical analysis? Second, the gradual transition from f^-2 to f^-1 is appreciated, but the shape of the intermediate curves is not addressed in detail. The power laws are straight lines, whereas the simulations produce curved lines; please state in what frequency range (low- or high-frequency variants) the power-law approximations apply.

Additionally, I do not understand the claim in line 108 that the transition is fast for low-frequency variants: the lines at low frequency (on the left of the graph) are all close together, whereas the lines at high frequency are far apart.

It would be helpful to reiterate in this paragraph that these power laws are derived for exponentially growing populations and are expected to break down under homeostatic conditions.

2. The sample vs population (blue vs orange) comparison in Figure 3 is under-explained. How is it that the mutational burden and inferred mutation rate in A and B roughly match, but the VAF distributions in C are so different? How was the sampled set chosen? Perhaps this is an unimportant distinction that depends on the particular sample set, but the divergence of the two in C may serve as a distraction here.

3. The comparison of the results herein with the claims by Mitchell (ref. 12) is an important result of the paper. I appreciate the note in the final paragraph of the discussion, and I suggest also adding a sentence to the abstract that references the result noted in lines 248-249.
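
To illustrate the observation in point 1 that pure power laws are straight lines on log-log axes while intermediate curves are not, here is a minimal Python sketch. It is not the authors' simulation: the function names, the frequency grid, and the weighted-sum "intermediate" curve (with its weight of 50) are hypothetical choices made only for illustration.

```python
import math

def power_law(freqs, exponent):
    """Return f**(-exponent) for each frequency f (unnormalised)."""
    return [f ** (-exponent) for f in freqs]

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical frequency grid, geometrically spaced from 0.01 to about 0.38.
freqs = [0.01 * 1.5 ** k for k in range(10)]

# Pure power laws: the log-log slope is constant and equals minus the exponent.
slope_f2 = loglog_slope(freqs, power_law(freqs, 2))
slope_f1 = loglog_slope(freqs, power_law(freqs, 1))

# Illustrative intermediate curve: an arbitrary weighted sum of the two laws.
# This is an assumption for demonstration, not a model taken from the paper.
mixture = [a + 50.0 * b
           for a, b in zip(power_law(freqs, 2), power_law(freqs, 1))]

# Local slopes at the low- and high-frequency ends of the mixture differ,
# so the curve is bent rather than straight on log-log axes.
slope_low = loglog_slope(freqs[:2], mixture[:2])
slope_high = loglog_slope(freqs[-2:], mixture[-2:])

print(slope_f2, slope_f1, slope_low, slope_high)
```

In this toy example the local slope depends on where along the frequency axis it is measured, which is why a statement of the frequency range in which each power-law approximation holds would be useful.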

Reviewer #2 (Public Review):

Summary: The authors provide a nice summary of how genetic heterogeneity can be studied and how the dynamics of stem cells can be measured. By combining single-cell and bulk sequencing analyses, they aim to use a stochastic process to inform on different aspects of genetic heterogeneity.

Strengths: Well-designed study and strong methods.

Weaknesses: Minor
Further clarification in the Figure 3 legend would help explain the 'no association' between the number of samples and the mutational burden estimate, as per lines 180-182 (p. 8).
