Reconstructing the in vivo dynamics of hematopoietic stem cells from telomere length distributions
Abstract
We investigate the in vivo patterns of stem cell divisions in the human hematopoietic system throughout life. In particular, we analyze the shape of telomere length distributions underlying stem cell behavior within individuals. Our mathematical model shows that these distributions contain a fingerprint of the progressive telomere loss and the fraction of symmetric cell proliferations. Our predictions are tested against measured telomere length distributions in humans across all ages, collected from lymphocyte and granulocyte sorted telomere length data of 356 healthy individuals, including 47 cord blood and 28 bone marrow samples. We find an increasing stem cell pool during childhood and adolescence and an approximately maintained stem cell population in adults. Furthermore, our method is able to detect individual differences from a single tissue sample, i.e. a single snapshot. Prospectively, this allows us to compare cell proliferation between individuals and identify abnormal stem cell dynamics, which affects the risk of stem cell related diseases.
https://doi.org/10.7554/eLife.08687.001eLife digest
Human cells die off regularly due to normal wear and tear, aging or injury. To replace these cells, humans maintain pockets of tissue specific stem cells that can develop into one of several different types of specialized cell. For example, stem cells in the bone marrow can develop into red blood cells, white blood cells or any of the other blood cell types. Unavoidably, over the course of a lifetime stem cells accumulate mutations that may cause them to become cancerous.
Researchers have learned a lot about stem cells by studying them under laboratory conditions. However, these studies cannot answer all the questions we have about human stem cells. As a result, human studies are needed; but frequently taking samples of stem cells from humans to assess them is impossible for numerous reasons, most importantly it is invasive and potentially harmful. Instead, researchers are looking for indirect ways to measure how stem cells grow.
Each time a cell divides, the protective ends of a chromosome – known as telomeres – get shorter. Now, Werner, Beier et al. have developed a mathematical model to assess human stem cell growth based on the length of the cells’ telomeres. This model can gauge the growth patterns of the stem cell populations in an individual based on a sample taken from a single tissue.
Werner, Beier et al. tested the model using telomere measurements from blood and bone marrow samples taken from 356 healthy people of different ages. The results suggest that the stem cell population that gives rise to blood cells (the hematopoietic stem cells) increases in size during childhood and adolescence, but levels off during adulthood. The model also revealed that patterns of stem cell growth vary among individuals. Further studies of telomere length differences may help scientists identify the abnormal (stem celllike) growth patterns associated with diseases like cancer.
https://doi.org/10.7554/eLife.08687.002Introduction
Homeostasis is in most mammalian tissues maintained by the occasional differentiation of infrequently dividing multipotent stem cells (Li and Clevers, 2010; Busch et al., 2015). These cells are involved in the formation, maintenance, renewal, and aging of tissues (Reya et al., 2001; Morrison and Kimble, 2006). Their longevity imposes the risk of the accumulation of multiple mutations that potentially induce aberrant stem cell proliferation and can ultimately cause the emergence of cancer (Hanahan and Weinberg, 2011). The quantification of aberrant stem cell properties in cancer is impeded by the lack of detailed information about the expected patterns of cell replication in healthy human tissues (Rossi et al., 2008; Vermeulen et al., 2013). Dynamic properties of stem cell populations in vivo are predominantly obtained from sequential experiments in animal models (Morrison and Spradling, 2008; Orford and Scadden, 2008). Unfortunately, these methods are mostly inapplicable to humans and to infer in vivo properties of human stem cell populations remains a challenge. Indirect methods, i.e. biomarkers that reflect the proliferation history of a tissue, may overcome these limitations (Greaves et al., 2006; Graham et al., 2011; Kozar et al., 2013). In the following, we combine data of telomere length distributions and mathematical modelling of the underlying dynamical processes to deduce proliferation properties of human hematopoietic stem cells in vivo.
Telomeres are noncoding repetitive DNA sequences at the ends of all eukaryotic chromosomes. In vertebrates, these sequences consist of hundreds to thousands of repeats of the nucleobase blocks TTAGGG (Griffith et al., 1999). Telomere repeats are progressively lost in most somatic cells with age, as the conventional DNA polymerase is unable to fully copy the lagging DNA strand of chromosomes during cell replication (Olovnikov, 1973). Short telomeres are associated with genetic instability (Hande, 1999; Feldser et al., 2003). They trigger DNAdamage checkpoint pathways and enforce permanent cell cycle arrest ( d'Adda di Fagagna et al., 2003). Thus, telomere length limits the replication capacity of somatic cells (Hayflick and Moorhead, 1961) and can indirectly act as a tumor suppressor (Kinzler and Vogelstein, 1997; Campisi, 2005). This effect can be attenuated by the enzyme telomerase, which tags additional TTAGGG repeats to the end of chromosomes by utilizing single stranded RNA templates (Greider and Blackburn, 1989). Telomerase is primarily expressed in compartments of stem and germ line cells, as well as in numerous tumors (Kim et al., 1994). However, telomerase expression levels are insufficient to prevent the progressive loss of telomere repeats in most healthy human tissues with age (Harley et al., 1990; Rufer et al., 1999). This net loss of telomere repeats during cell replication leads to a characteristic telomere length distribution that reflects the replication history of cells. Since telomere length dynamics is important for a number of genetic and acquired disorders (Hastie et al., 1990; Blasco, 2005; Calado and Young, 2009), it is critical to understand the underlying mechanisms of this fundamental process. We have developed a mathematical model that allows us to interpret data of telomere length shortening in hematopoietic cells obtained from 356 healthy humans. Most importantly, we can infer the patterns of stem cell behavior from the underlying telomere dynamics within individuals from a single tissue sample, i.e. a single snapshot.
Modelling telomere length dynamics
Our mathematical model recovers the temporal change of telomere length distributions in human hematopoietic cells with a minimal number of required model parameters. Since hematopoietic cells proliferate in a hierarchical organised tissue with slowly dividing stem cells at its root, such a model needs to connect properties of cell proliferation and telomere shortening. Telomere length can be assessed on three different levels of resolution, (i) the level of single telomeres, (ii) the level of single cells and (iii) the level of the tissue. Of course these levels are not independent, for example the knowledge of telomere length in all cells allows to obtain the (average) telomere length of a tissue. The processes that drive telomere length dynamics differ at these levels of resolution. Single telomeres are prone to stochastic events such as oxidative stress or recombination and thus may also shorten by effects independent of proliferation associated attrition (von Zglinicki, 2002; Antal et al., 2007). Healthy human cells contain 184 telomeres, four on each of the 46 chromosomes. Thus, the noise on the level of single telomeres becomes much smaller on the cell level. We capitalise on this and consider telomere length on the cell level in the following. Thus, the average telomere length of a cell shortens by a constant factor during each division. Such an approach might underestimate the number of senescent cells once telomeres become critically short, since it is the length of the shortest telomere rather then the average telomere length that triggers cell cycle arrest (Hemann et al., 2001). Our model is sensitive to the accumulation of cells in the state of cell cycle arrest and we can infer this effect experimentally from population wide telomere length distributions. However, this effect can likely be neglected during adolescence and adulthood, but might have important implications in some tumors, at old age or in conditions associated with abnormal telomere maintenance.
We further need to consider properties of a hierarchical tissue organization, where few slowly dividing stem cells give rise to shorter lived progeny. Although some of the progeny, particularly primitive progenitor cells, can be long lived and are able to maintain homeostasis without stem cell turnover for intermediate time intervals, eventually all non hematopoietic stem cells will be depleted without continuous stem cell turn over (Busch et al., 2015; Sun et al., 2014). Age dependent differences in telomere shortening across different lineages of hematopoiesis can only persist in the hematopoietic system if they occur on the level of the maintained selfrenewing cell population. Cells leaving the stem cell pool have an approximately constant number of cell divisions before they reach maturation (Takano, 2004; Werner et al., 2011). This shifts the distribution to shorter values of telomere length and consequently, the distribution of telomere lengths of mature cells is a good proxy for the distribution of telomere lengths in stem cells (RodriguezBrenes et al., 2013). We measured telomere length distributions in lymphocytes, granulocytes and bone marrow sections separately. This allows us to investigate the myeloid and lymphoid lineage of hematopoiesis independently.
In our model, we assume a population of initially ${N}_{0}$ stem cells. In the simplest case, each stem cell would proliferate with the same rate $r$ and the cell cycle time would follow an exponential distribution. However, tissue homeostasis requires continuous stem cell turn over in intermediate time intervals, therefore the proliferation rate of the population of stem cells is adjusted, such that a required constant output of differentiated cells per unit of time is maintained. In the simplest case of a constant stem cell population, the effective proliferation rate becomes $r/{N}_{0}$. However, in more complex scenarios, the number of stem cells could differ with age and the effective proliferation rate of stem cells $r/N\left(t\right)$ also becomes age dependent (Rozhok and DeGregori, 2015; Bowie et al., 2006). This resembles a feedback mechanism and results in an approximately Lognormal distribution of cell cycles, see also Equation S26 in Materials and methods for details. In addition, each stem cell clone is characterised by a certain telomere length (Antal et al., 2007; Simon and Derrida, 2008). This telomere length shortens with each stem cell division by a constant length $\Delta c$ and consequently the remaining proliferation potential is reduced in both daughter cells (Rufer et al., 1999; Allsopp et al., 1992). If the telomeres of a cell reach a critically short length, this cell enters cell cycle arrest and stops proliferation, reflecting a cell’s Hayflick limit (Hayflick and Moorhead, 1961). This can be modelled by collecting cells with the same proliferation potential in states $i$. A cell enters the next downstream state $i\to i+1$ after a cell division, see also Figure 1, as well as Equations S1,S14 in Materials and methods. Since the next cell to proliferate is chosen at random from the reservoir, cells progressively distribute over all accessible states with time (Olofsson and Kimmel, 1999). This corresponds to the problem of how many cells are expected in a state $i$ at any given time, which we denote by ${N}^{\left(i\right)}\left(t\right)$ in the following.
Results
The model predicts characteristic telomere length distributions for different ratios of symmetric and asymmetric stem cell divisions
The shape of the distribution of cells across cell cycles depends on the patterns of stem cell proliferation, for example the ratio of symmetric versus asymmetric divisions. An asymmetric stem cell division produces one stem and one nonstem cell (for example a progenitor cell that leaves the stem cell compartment). If we restrict the stem cells dynamics to only asymmetric divisions, the process results in a stem cell population of constant size and the number of cells in each state $i$ follows a Poisson distribution
A typical example of this distribution is shown in Figure 1—figure supplement 1 and details on the derivation can be found in Materials and methods, see Equation S1. Cells with maximum proliferation capacity (cells in state 0 in our model) are progressively lost and cells accumulate in the final state of cell cycle arrest by passing through all intermediate states.
Inferring the dynamics of distribution (1) from in vivo measurements requires sequential sampling and complicated cell sorting, which seems challenging in realistic clinical settings. On the other hand, the measured (observed) telomere length distribution corresponds to a single sample of the underlying Poisson process. The expected shape of this observed distribution is depicted in Figure 1g. It becomes a traveling wave that starts narrowly distributed around an initial telomere length and shifts towards shorter average telomere length with time. We have measured this distribution, which arises from our theoretical model, experimentally in many samples of granulocytes, lymphocytes and bone marrow sections of healthy adult humans, which we discuss in detail below.
In addition to asymmetric divisions, stem cells can undergo symmetric selfrenewal, which is a prerequisite for development, as it allows for a growing stem cell population. In our model, stem cells divide symmetrically with probability $p$ and asymmetrically with probability $1p$ respectively. In this situation, the number of stem cells is not constant, but increases with each symmetric stem cell selfrenewal. As a consequence, the expected distribution also changes and is now described by a generalised Poisson distribution (see Equation S14 in Materials and methods) given by
This distribution also leads to a traveling wave, but the maximum of the distribution decreases considerably slower compared to the case of purely asymmetric stem cell divisions. In the following, we refer to the model that is restricted to only asymmetric stem cell divisions as model 1 and denote the more general case of symmetric and asymmetric cell divisions as model 2.
Ideally, we would like to follow these traveling waves in individual healthy humans over time and compare this sequential data to the dynamics from our model predictions. Unfortunately, the time required to confirm our model across all ages would exceed the life expectancy of the authors. We therefore explored those properties of our analytical model that are directly testable in population wide data of telomere length. One such property is the change of the average telomere length with age, which we measure in a group of 356 healthy individuals.
The average telomere length decreases nonlinearly in the presence of symmetric stem cell selfrenewal
The average telomere length decreases in most human tissues with age (Harley et al., 1990). This is well known and has been confirmed numerous times. Surprisingly, less is known about the detailed dynamics of this decrease. We can derive the dynamics of the average telomere length from the telomere length distributions directly. The average telomere length corresponds to the expected value of the telomere length distribution (in the following denoted by $E\left[c\left(t\right)\right]$), see Equation S5 in Materials and methods for details. As the telomere length distribution changes with time, the average telomere length becomes time dependent naturally. In the absence of symmetric stem cell selfrenewal (model 1) the average telomere length $E\left[c\left(t\right)\right]$ is expected to decrease linearly
with age (denoted by $t$ in the equation above). More specifically, the average telomere length of cells of a particular type, e.g. the population of granulocytes or lymphocytes, shorten by a constant fraction each year. The dynamics changes once a significant fraction of cells enter cell cycle arrest, see Equation S9. The average telomere length transitions from a linear into a power law decline (when the average telomere length becomes very short) and the stem cell pool reaches the state of complete cell cycle exhaustion asymptotically. This transition would enable the identification of an age where a considerable fraction of stem cells enter cell cycle arrest, potentially a mechanism important in aging, carcinogenesis or bone marrow failure syndromes.
Furthermore, we calculated the variance of the underlying stochastic process. This gives us a measure for the expected fluctuation of the average telomere length in a population of healthy humans. We expect the variance to increase linearly in time in the absence of symmetric stem cell selfrenewal. Consequently, the standard deviation is proportional to the square root of age. Yet again, similar to the average telomere length, the dynamics of the variance changes once a significant fraction of cells enters cell cycle arrest. The variance starts to decrease and would reach zero, if all cells stopped proliferation.
The distribution of telomere length changes under the presence of symmetric stem cell selfrenewal (model 2). Accordingly, we expect a different decrease of the average telomere length. We find that the telomere length follows a logarithmic decay with age (see also Equation S19), given by
The average telomere length of a cell population shortens less with increasing age under the presence of symmetric selfrenewal, although the decrease of telomeric repeats per cell division (denoted by $\Delta c$ in Equation 4) is constant. This effect emerges naturally in our model due to the increasing number of stem cells with age. In a population with only few cells, each cell proliferation has a considerable impact on the average telomere length, while this impact diminishes in larger populations. If the stem cell population increases progressively, telomere shortening reduces on the tissue level with age.
In vivo measurements of telomere length suggest an increasing number of hematopietic stem cells during human adolescence
In order to test the predictions of our model experimentally, we have measured telomere length in lymphocytes and granulocytes in a cohort of 356 healthy humans with ages between 0 and 85 years. Our data includes 47 cord blood samples of healthy children and bone marrow biopsies of 28 patients with diagnosed Hodgkin lymphoma without bone marrow involvement. We assessed the average telomere length in all 356 samples with established FlowFISH protocols (Aubert et al., 2012; Baerlocher et al., 2006; Weidner et al., 2014; Beier et al., 2012). This reveals the population wide dynamics of telomere length and contains a significant number of cord blood samples that allow us to investigate differences in cell proliferation during adolescence and homeostasis in adulthood.
In addition, we have analyzed 28 blood samples of lymphocytes, 10 blood samples of granulocytes and 28 bone marrow biopsies with quantitativefluorescence in situ hybridisation (QFISH) (Beier et al., 2015; Varela et al., 2011; Zijlmans et al., 1997) (see Figure 2 and experimental methods for details). The averages of these samples correspond to the open symbols in Figure 3. From the full distribution, we obtain the telomere length distributions of single individuals and estimate personalised cell proliferation properties, e.g. the ratio of symmetric to asymmetric cell divisions as well as the rate of telomere shortening for each sample separately. We compare these personalised estimates to population wide telomere length to test the consistency of our results on two independent data sets.
In order to compare our model with the experimental data, we implemented standard maximum likelihood estimates for a regression analysis. Our experimental finding in adults (we only consider persons of 20 years or older) show that telomere length in granulocytes and lymphocytes decreases approximately linearly with age on the population level. In both cell populations the telomere length of adults decreases with $50\pm 5\phantom{\rule{3.26212pt}{0ex}}\mathrm{bp}/\mathrm{year}$ (we state the maximum likelihood estimate and the 95% confidence interval). If for example a cell looses on average 50 bp telomeric repeats per cell division (Rufer et al., 1999), this implies approximately 1 replication per year for the hematopoietic stem cells. This agrees with the observation of rare stem cell turnover under homeostasis (Busch et al., 2015; Sun et al., 2014; Dingli et al., 2006).
However, the assumption of strictly asymmetric cell divisions (model 1) fails to explain the pronounced loss of telomere repeats in infants (prediction of model 1 for the initial telomere length in lymphocytes: $9.8\pm 0.15\phantom{\rule{3.26212pt}{0ex}}\mathrm{kbp}$, measured average initial telomere length: $10.67\pm 0.4\phantom{\rule{3.26212pt}{0ex}}\mathrm{kbp}$, similar results for granulocytes, see also Figure 4 for a comparison of model 1 and model 2). This discrepancy can be resolved by introducing an interplay of symmetric and asymmetric stem cell divisions (model 2) that allows for an increasing number of stem cells. In this situation, the proliferation rate of stem cells becomes age dependent and our model predicts that at the youngest ages, when the number of stem cells is lowest, telomere loss is most pronounced. Maximum likelihood estimates of our general mathematical solution (Equation 4) to the telomere length data on the population level (see Figure 3) reveals for the parameter controlling average loss of telomere length in lymphocytes a value of $75\pm 7\phantom{\rule{3.26212pt}{0ex}}\mathrm{bp}/\mathrm{year}$, an initial telomere length of $10.4\pm 0.2\phantom{\rule{3.26212pt}{0ex}}\mathrm{kbp}$ and a probability for symmetric stem cell selfrenewal of $0.35\pm 0.07$. In granulocytes we find a value of telomere loss of $68\pm 5\phantom{\rule{3.26212pt}{0ex}}\mathrm{bp}/\mathrm{year}$, an initial telomere length of $10.2\pm 0.3\phantom{\rule{3.26212pt}{0ex}}\mathrm{kbp}$ and a probability for symmetric stem cell selfrenewal of $0.44\pm 0.2$. This probability accounts for the increased loss of telomere repeats in infants and substantially improves the prediction of the initial average telomere length. In addition to our group of 356 healthy humans, we have tested our hypothesis in an independent data set of 835 healthy humans, previously published by an unrelated group in (Aubert et al., 2012), see Figure 3—figure supplement 1. This set confirms our parameter estimations, in particular the accelerated decrease of average telomere length during adolescence is also observed.
Our model suggests that the increased loss of telomere repeats in the first years of human life is a consequence of an expanding stem cell population. This expansion is combined with a reduction in proliferation rates of single stem cells. The loss of telomere repeats during cell replication has a more pronounced impact on the average telomere length within a small cell population and diminishes in large stem cell populations. This explains the increased loss of telomeric repeats during adolescence (see Figure 4) naturally as a consequence of growth by an expanding stem cell population. Similarly, a sudden accelerated loss of telomeric repeats in aged individuals could point towards an insufficient stem cell selfrenewal. This might provide a promising direction for further investigations with an extended data set of sufficiently high resolution in aged individuals.
Proliferation properties of stem cells differ during adolescence and adulthood
Our analytical model is consistent with population wide telomere length data. It shows that symmetric stem cell selfrenewals are more frequent in adolescence and their effect on the dynamics of average telomere length reduces with age. However, how robust are our conclusions under variation of model parameters or a change of cell proliferation properties with age? One possibility to address these problems is the implementation of Bayesian inference methods (Dempster, 1968). In a nutshell, such methods draw a random set of model parameters either from an uninformed (objective) or informed (subjective) prior distribution and produce independent realizations of the model. These realizations are compared to some (appropriate) data of interest and fits with a predefined statistic significance are retained while unsatisfactory realizations are rejected. Originally developed for phylogenetic tree reconstruction, such methods are increasingly used in other applications (Marjoram and Tavaré, 2006). Bayesian inference methods allow to quantify the uncertainty in an analysis by providing posterior distributions of model parameters.
In the following we implement an Approximate Bayesian Computation (ABC) rejection sampling framework (Csilléry et al., 2010) on the data presented in Figure 3. We derive posterior distributions for our three free model parameters, the initial telomere length $c$, the relative decrease of telomere length per time $\Delta cr/{N}_{0}$ and the probability of symmetric stem cell divisions $p$. We draw these variables independently from uniform (uninformed) distributions and test $1{0}^{9}$ independent realizations of our mathematical model 1 and model 2. We seek parameter regimes that maximize the coefficient of determination ${R}^{2}$ between Equation 3 (model 1) or Equation 4 (model 2) and the average telomere length presented in Figure 3. We discard any parameter combination below a threshold. We perform the same analysis independently on the data set of granulocytes and lymphocytes.
In both cases, we find localized posterior parameter distributions. For lymphocytes, parameters peak at $\Delta cr/{N}_{0}=0.071\pm 0.005\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}/\mathrm{year}$, $c=10.41\pm 0.3\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$ and $p=0.32\pm 0.2$, see Figure 5c–e. Only a small parameter range explains the exact patterns of telomere shortening. We find approximately 70% of stem cell divisions are asymmetric and 30% are symmetric selfrenewals. This stochastic approach confirms the results of the nonlinear model fits using a standard maximum likelihood approach that were discussed in the previous section, but provides further information on the distribution of our parameters.
The previous analysis assumes a fixed set of parameters for the dynamics of telomere shortening for all ages. In principle, these parameters could also change with age. To see if we can identify ages with different stem cell proliferation parameters, we investigated a third model that allows for successive phases of stem cell dynamics with independent parameter sets for each phase. We consider an additional parameter ${t}_{T}$, which corresponds to a transition time. We perform the above Bayesian approach independently for each random partition of the data set. This approach suggests at most two separate phases, with a transition between the 6th and 7th year of life for lymphocytes, see Figure 5f–i, and a transition between the 10th and 15th years of life for granulocytes, see Figure 5j–m. In infants and the first years of life, the probability of stem cell selfrenewal shows a significant variance (Figure 5). However, the data resolution is insufficient for this short time window to provide reliable parameter estimates. The probability of symmetric stem cell selfrenewal in adults however is in the the range of $p\in \left(0,0.2\right)$. This is lower as was predicted by the regression analysis across all ages. This suggests a reduction in the selfrenewal probability of stem cells after adolescence and points towards an either slower growing or constant stem cell population in adults. This may reflect selection for an optimal stem cell population size to minimize the risk of cancer initiation as suggested in theoretical studies before (Michor et al., 2003).
Next, we aimed to test which of the three models explains the data best, considering the complexity of the models. We therefore utilise the likelihood estimates of the former subsection and perform a model selection based on the Akaike information criterion (AIC) (Burnham, 2004). Model 1 scores with an AIC of 2550, model 2 has an AIC of 2328 and a multiphase model with a minimum of 7 parameters yields an AIC of 2361. The AIC is minimized by model 2. Based on this approach, model 1 as well as a multiphase model can be rejected as more likely explanations for the telomere length shortening presented in Figure 3 (given the above numbers and according to standard procedures, the relative likelihood of model 1 to better explain the data compared to model 2 is assumed to be $p\approx 1{0}^{48}$, the relative likelihood of the multiphase model to better explain the data compared to model 2 is assumed to be $p\approx 1{0}^{8}$). This selection is robust under the choice of different statistical methods. For example, a BIC approach selects the models in the same order.
A single sample of the telomere length distribution can inform about stem cell dynamics
The actual stem cell population sizes and their dynamics do not only vary with age, but also between individuals. This has immediate consequences on the susceptibility of individuals towards certain diseases (Calado and Young, 2009; Brümmendorf and Balabanov, 2006) and could potentially be used in individualised treatment strategies. Our model describes the telomere length distributions in individuals and quantifies three parameters, i.e. initial telomere length, increase of stem cell pool size and stem cell replication rates of an individual from a single tissue sample. We therefore extended our experimental protocols to further test our theoretical results. First, we measured single telomere signals of peripheral blood sorted for lymphocytes in 28 individuals and sorted for granulocytes in 10 individuals by quantitative confocal FISH in addition to the average telomere length that is provided by flow FISH. Second, we investigated the telomere length distribution in paraffinembedded bone marrow sections of an additional cohort of 28 healthy individuals using quantitative confocal FISH (Beier, 2005), see Figure 2. We compare our general telomere length distribution that allows for any ratio of symmetric and asymmetric stem cell divisions (model 2) to the data set of all 66 individuals. Cases of four representative individuals are shown in Figure 6. All cases can be found in Figure 6—figure supplements 1–3 and all individual cell proliferation properties as well as quality of fits are summarised in Supplementary file 1. The average telomere length of these 66 distributions are shown as open symbols in Figure 3.
The fits of our calculated distribution (see Equation S15 for the distribution and Equation S29 for details on the fitting procedure) reveal substantial differences in initial telomere length, increase of stem cell pool size and stem cell replication rates between the 66 individuals, but also between granulocytes, lymphocytes and bone marrow samples. We find a low probability of symmetric selfrenewal ($p$ between 0.005 to 0.03 per cell division) in all individual samples. This agrees with our results on the average telomere length shortening in adults at the population level and supports our observation of an approximately maintained active stem cell number in individuals after adolescence. Also the average telomere loss per year varies between individuals and ranges from 18 bp/year to 110 bp/year. However, the averages of all individual parameter sets agree with the estimated proliferation properties inferred from the population wide data of telomere length. We find differences between individual samples of lymphocytes and granulocytes. While the loss of telomeric repeats slows down with age in granulocytes, it slightly accelerates in lymphocytes, see Figure 7. These cells represent the myeloid and lymphoid lineage respectively. In our model, such a reduced rate of telomere loss can be explained with an increased reservoir of myeloid specific stem and progenitor cells and is in agreement with a skewed differentiation potential towards the myeloid lineage of aged hematopoietic stem cells (Geiger et al., 2013).
Discussion
Our knowledge about the dynamics of tissue specific stem cells comes mostly from lineage tracing experiments in transgenic mouse models. They provided insights into many aspects of tissue formation and maintenance, e.g. the intestinal crypt, but also the hematopoietic system (Busch et al., 2015; Sun et al., 2014; Itzkovitz et al., 2012). However, there is variation between different transgenic mouse models and their significance for human stem cell properties remains a challenging question. In some cases, clonal lineages can be traced by naturally occurring somatic mutations, e.g. particular mtDNA mutations in human intestinal crypts (Baker et al., 2014). However, the in vivo dynamic properties of human hematopoietic stem cells remain poorly characterized.
Here, we have utilized telomere length distributions of hematopoietic cells as a biomarker that contains information about the proliferation history of cells. We developed a mathematical model that allows us to infer dynamic properties of stem cell populations from data of telomere length distributions. These properties were analyzed in different cell types, e.g. lymphocytes, granulocytes and bone marrow sections of individuals of different ages. These calculated distributions describe the change of telomere length within the human population. The expected changes with age were confirmed in a representative group of 356 healthy individuals and the conclusions are consistent with our individualized parameter estimations.
The population wide data of average telomere length reveals different stem cell properties in adolescence and adulthood. Telomere length decrease is logarithmic and occurs at a faster rate during adolescence, suggesting a stem cell pool expansion in the first years of human life compatible with growth. This decrease becomes almost linear in adults and is in line with an approximately constant stem cell population. It is an interesting question why the number of stem cells would reach a certain targeted size. This could be simply because of spatial constrains in the bone marrow. Yet, from an evolutionary perspective, intermediate stem cell pool sizes were suggested to minimize the risk of cancer initiation (RodriguezBrenes et al., 2013; Michor et al., 2003). Such an optimization requires feedback signals that ensures the maintenance of an intermediate sized stem cell population, feedback signals that might be prone to (epi)genetic change and potentially are involved in cancer and ageing.
It is still a debated question if stem cells in mammals are maintained by predominantly asymmetric divisions, or alternatively by a population strategy of balanced symmetric selfrenewal and symmetric differentiation. While the former strategy can be implemented on the single cell level, the latter strategy would require further feedback signals. From a modelling perspective, a population strategy of symmetric selfrenewal and symmetric differentiation was suggested to minimize the clonal load within a stem cell population (Shahriyari et al., 2013). On the other hand, experimental findings seem to point towards predominantly asymmetric divisions, but this might also differ across tissues (Morrison and Kimble, 2006). In our model, the stem cell pool is maintained by asymmetric cell divisions. A balance of symmetric and asymmetric cell divisions would on average result in the same telomere length dynamics and thus would be indistinguishable from asymmetric divisions on the population level, only the interpretation of $p$, the probability of symmetric selfrenewal would change in this case. Yet, the variance of the distribution would be expected to increase under the presence of symmetric differentiation and symmetric selfrenewal. However, likely this effect is weak compared to the measurement related noise of telomere length.
Our method quantifies the parameters of telomere dynamics from a single blood sample or paraffinembedded tissue samples of an individual. It is independent of any particular tissue organization and thus can be applied, in principle, to any tissue. This general method will be of particular interest to distinguish stem cell dynamics in healthy and sick individuals. We expect characteristic changes in telomere length distributions in certain (hematopoietic) stem cell disorders such as chronic leukemias (Braig et al., 2014) and bone marrow failure syndromes (Calado and Young, 2009; Beier, 2005). Therefore, our model can serve as a tool to infer stem cell dynamics in vivo retrospectively and prospectively from a single tissue sample. Such an approach can not only increase our understanding of disease dynamics but may also contribute to personalized disease diagnosis and prognosis in the future.
Materials and methods
Patients
Peripheral blood of 309 healthy blood donors was obtained from the blood donor bank in Aachen. QFISH of peripheral blood cytospins was performed on 28 healthy blood samples. 47 cord blood and blood samples from healthy children and adolescents were obtained from the Department of Pediatrics and Neonatology of the University Hospital of Aachen. Bone marrow biopsies of 28 patients with diagnosed Hodgkin lymphoma without bone marrow involvement were used for bone marrow analysis. All samples were taken with informed consent and according to the guidelines of the ethics committees at University Hospital Aachen.
FlowFISH
Request a detailed protocolThe FlowFISH technique provides the mean telomere length per nucleus. FlowFISH was carried out according to previously published protocols (Aubert et al., 2012; Baerlocher et al., 2006; Weidner et al., 2014; Beier et al., 2012). Briefly, after osmotic lysis of erythrocytes with ammonium chloride, white blood cells were mixed with cow thymocytes. Cells were hybridized with FITC labeled, telomere specific (CCCTAA)3 peptide nucleic acid (PNA) probe (Panagene) and DNA was counterstained with LDS 751 (Sigma). FACS analysis was carried out on Navios or FC500 (both Beckman Coulter). Thymocytes, lymphocytes and granulocytes subsets were identified based on LDS571 staining and forward scatter. Mean telomere length was calculated by subtracting the unstained autofluorescence value of the respective lymphocyte, granulocyte or thymocyte subpopulation. Cow thymocytes with a determined telomere length were used as an internal control to convert telomere length in kilobase (kb). All measurements were carried out in triplicate.
QuantitativeFluorescence in situ hybdridsation (QFISH)
Request a detailed protocolQFISH offers the possibility to analyze the distribution pattern of individual telomeres. For cytospins of peripheral blood cells, erythrocytes were lysed using ammonium chloride (Stem cell Technologies, Vancouver, British Columbia, Canada) and 50,000 cells were centrifuged for cytospin. Cells were fixed with 70% ethanol solution for 30 seconds and air dried for 15 min. Bone marrow sections were deparaffinized with xylol and rehydrated with ethanol following standard protocols. Deparaffinized bone marrow tissue sections, metaphases and peripheral blood cells were processed following previously published protocols (Beier et al., 2015; Varela et al., 2011; Zijlmans et al., 1997). After initial washing with PBS, slides were fixed in formaldehyde (Sigma) (4%) in PBS for 2 min. Slides were further washed (three times for 5 min) with PBS followed by dehydration with ethanol and air drying for 30 min. Hybridization mixture containing 70% formamide (Sigma), 0.5% Magnesium chloride (Sigma), 0.25% (wt/vol) blocking reagent (Boeringer) 0.3 μg/ml Cy3conjugated (C3TA2)3 peptide nucleic acid probe (Pnagene), in 10 mM Tris (pH 7.2, Sigma) was added to the slide. After adding a coverslip; DNA was denatured for 3 min at 85°C. Hybridization was carried out for 2 h at room temperature. After washing the slides twice with 70% formamide/10 mM Tris (pH 7.2)/0.1% bovine serum albumin (BSA), slides were washed again (three times for 5 min) with 0.05 M Tris/0.15 M NaCl (pH 7.5) containing 0.05% Tween20. After dehydration with ethanol slides were air dried and stained with PBS containing 0.1 ng/ml of 4’6diamidino2phenylindole (DAPI) for 5 min. After mounting the cells (Vectashield, Vectorlabs), a coverslip was added.
Image analysis
Request a detailed protocolConfocal microscopy analysis was carried out at a Leica TCSsp5 confocal microscope (Leica). Images were acquired at 63x magnification and 1.52.0 digital zoom. Multitracking mode was used to acquire images. Stacks of DAPI and Cy3 staining were taken with a step size of 1 μm. Peripheral blood cells and bone marrows were captured including five steps (zrange 4 μm). Maximum projection of the images was carried out and Definiens XD 1.5 image analysis software (Definiens GmbH) was used for quantitative image analysis. Nucleus and telomere detection was carried out based on DAPI and Cy3 intensity patterns. A valid image analysis was assumed in case of a correct detection of 90% of all visible telomeres. All image analysis was carried out singleblinded. Individual telomere signals were calculated after subtraction of the mean background value per detected nucleus. For bone marrow section and peripheral blood cells, values of all detected telomeres were used for analysis. Paraffin embedded lymphocytes of three healthy donors and granulocytes of a patient with chronic myeloid leukemia with a determined telomere length were used as controls for bone marrow biopsies. Linear regression of the control cells was carried out to convert telomere length from arbitrary units to kb. Telomere length in kb of the QFISH analysis of peripheral blood cells was calculated based on the linear regression of the corresponding FlowFISH values.
Mathematical model of telomere length dynamics
We assume a finite number of $1+c$ accessible telomere states of stem cells, where each state $i$ contains cells of equal average telomere length. Initially, ${N}_{0}$ cells are in state $0$ and cells will progressively enter downstream states after cell divisions. An asymmetric division of a cell in state $i$ leads to one more differentiated cell (more committed within a hierarchically tissue organization) and one stem cell. The committed (progenitor) cell leaves the pool of stem cells and does not further contribute to dynamics in the stem cell population. The second cell keeps the stem cell properties and enters state $i+1$, reflecting the shortening of its telomeres by a length of $\Delta c$. Similarly, a symmetric cell division results in two stem cells, both entering the next subsequent state. In our model, stem cells divide symmetrically with probability $p$ and asymmetrically with probability $1p$, respectively. A cell in state $c$ enters cell cycle arrest and cannot reach subsequent states  the next proliferating cell is randomly chosen amongst all cells not yet in state $c$.
Stochastic simulations
Request a detailed protocolWe implement individual based stochastic simulations of our telomere model. We initialize our program with ${N}_{0}$ cells in state $0$. The next cell to proliferate is chosen randomly amongst all cells not yet in state $c$. If a cell is chosen, we draw a random number $\xi \in \left[0,1\right]$. If $\xi >p$, one cell enters the next subsequent compartment (corresponding to an asymmetric cell division). If $\xi \le p$, two cells enter the next subsequent compartment (corresponding to a symmetric stem cell division). In both cases, the mother cell is removed. Iterating over many cell divisions leads to a distribution of cells amongst the accessible $1+c$ cell cycle states. Recording the temporal change of the distribution allows us to infer further properties of interest such as the time dependence of the average and the variance of the distribution. All simulations are implemented in C++, and are analyzed and visualized in Mathematica 10.0 and R 3.2.1.
Asymmetric cell divisions
Request a detailed protocolWe first discuss the telomere length dynamics under asymmetric cell divisions (corresponding to $p=0$ and called model 1 in our further notation). We call ${N}^{\left(i\right)}\left(t\right)$ the number of cells in state $i$ at time $t$. We further choose the initial condition ${N}^{\left(0\right)}\left(0\right)={N}_{0}$. Asymmetric cell divisions strictly conserve the size of the cell pool ${\sum}_{i=0}^{c}{N}^{\left(i\right)}\left(t\right)={N}_{0}$. We apply a deterministic, time continuous approximation of the underlying stochastic process and capture the average dynamics of telomere shortening by a system of coupled differential equations,
Here, $r$ represents the proliferation rate of a cell. Cells move towards higher states progressively and accumulate in state $c$, where they enter cell cycle arrest.
The general solution of (Equation S1) can be derived recursively and is given by
The number of cells in states $i<c$ resembles a truncated Poisson distribution with rate parameter $\frac{r}{{N}_{0}}$ and shape parameter $i$. Figure 1g shows a comparison of solution (Equation S2) to exact individual based stochastic computer simulations. The number of cells in state $0$ decreases exponentially. Cells in states $i=1,\dots ,c1$ are initially absent, undergo a maximum and vanish in the long run again. Only cells in state $c$ accumulate over time.
Inferring distribution (Equation S2) from in vivo data requires several blood samples at sequential time intervals. A single measurement of the telomere length distribution at time ${t}^{\prime}$ corresponds to the interception points of a vertical line, drawn at time ${t}^{\prime}$, and the number of cells in every state in the model given by Equation S2. Thus, the observed distribution at time ${t}^{\prime}$ in Figure 1g is given by
This distribution becomes a traveling wave that shifts towards shorter average telomere length in time, see Figure 1—figure supplement 1. The maximum of this wave reaches state $i$ after time ${t}_{\mathrm{max}}^{\left(i\right)}=\frac{i{N}_{0}}{r}$. Plugging this into Equation S2, we find for the maximum of this traveling wave
where we applied Stirling’s formula. The most abundant telomere length declines proportional to $\frac{1}{\sqrt{{t}_{\mathrm{max}}}}$ in time if cells undergo asymmetric cell divisions only.
Next we calculate the time dependence of the average telomere length $E\left[c\left(t\right)\right]$. This corresponds to the first moment of the distribution (Equation S2), given by
where cells in state c do not contribute. To calculate this sum we first note that the upper incomplete gamma function is defined as $\Gamma \left[1+c,\frac{rt}{N}\right]={\int}_{\frac{rt}{N}}^{\infty}dx\phantom{\rule{2.61018pt}{0ex}}{x}^{c}{e}^{x}$, but can also be represented by incomplete exponential sums $\Gamma \left[1+c,\frac{rt}{N}\right]=c!{e}^{\frac{rt}{N}}{\sum}_{i=0}^{c}\frac{1}{i!}{\left(\frac{rt}{N}\right)}^{i}$. If we set $x=\frac{rt}{N}$, we can write
the second term is
and thus we have
In the last step we used the property of the upper incomplete gamma function $\frac{\partial}{\partial x}\Gamma \left[n+1,x\right]={x}^{n}{e}^{x}$. Collecting all terms in Equation S5 again gives
The expression for the average telomere length (Equation S9) simplifies significantly for certain parameter regimes. For example for the hematopoietic system in humans we expect ${N}_{0}$ at least to be in the order of a few hundred of cells and $c$ is strictly larger than zero. Thus the first term in Equation S9 is very small and negligible. The second term is dominated by the linearly decaying term, as the incomplete gamma function is $\Gamma \left[1+c,\frac{rt}{{N}_{0}}\right]\approx c!$ for $t\ll r/{N}_{0}$, i.e. sufficiently small $t$. Thus in this situation expression (S9) is well approximated by
until only few cells have reached state $c$. The linear approximation Equation S10 is excellent, until most cells reach states of very short telomeres. In the situation of critically short telomeres, the full solution (Equation S9) has to be used and the average telomere length reaches zero asymptotically.
Our approach allows us to calculate additional properties of the system. The knowledge of the exact distribution enables us to derive all moments of the distribution. For example, we can derive analytical expressions for the time dependence of the variance ${\sigma}^{2}\left(t\right)$. First note, that the moment generating function for the distribution (Equation S2), ${M}_{c}\left(z\right)=E\left[{e}^{cz}\right]\left(t\right)$, is
We recover the average (Equation S9) of the telomere length distribution via $E\left[c\left(t\right)\right]=\frac{\partial}{\partial x}\left({M}_{c}\left(0\right)\right)$. The variance can be calculated via
Again, the first term of Equation S12 is negligible for a biological meaningful parameter range. The quadratic term ${\left(crt/{N}_{0}\right)}^{2}$ is compensated by an identical term in ${E}^{2}\left[c\left(t\right)\right]$ (see Equation S9). Again, the gamma function is approximately equal to $c!$ for sufficiently small times. Thus, expression (Equation S12) is initially dominated by the linear term and consequently, the variance grows linear as ${\sigma}^{2}=\frac{rt}{{N}_{0}}$. The standard deviation increases in time as
The linear approximation of the variance is excellent. Only if cells start to accumulate in state $c$ (cell cycle arrest) the variance decreases.
Symmetric cell divisions
Request a detailed protocolIn the following, we modify the system of differential Equation S1 (model 1) to incorporate symmetric stem cell divisions (model 2). We assume a cell division to be symmetric with probability $p$ and asymmetric with probability $1p$ respectively. Note that the number of stem cells is not constant but increases due to symmetric cell divisions. Initially there are ${N}_{0}$ cells with telomeres of length $c$. We assume a number of stem cell divisions that is constant within a fixed time interval, reflecting the necessity to produce a fixed number of differentiated cells during a unit of time. However, time intervals between stem cell divisions remain stochastic in the individual based model. As a consequence, the stem cell pool increases linearly in time, ${N}_{p}\left(t\right)={N}_{0}+rpt$. Thus, the system of differential equations changes to
The solution to this system of differential equations is
where we used ${t}^{*}=\frac{rp}{{N}_{0}}t+1$ as an abbreviation. Using l’Hopital and ${e}^{x}=\underset{n\to \infty}{lim}(1+\frac{x}{n}{)}^{n}$ we recover the Equation S2 for $p\to 0$ and the solution turns into a Poisson distribution again,
Note that we assumed a constant number of cell divisions within a fixed time interval. Due to the increasing stem cell pool size, this effectively causes a reduction in the proliferation rate of individual stem cells with age.
Similar to the former subsection, the time dependence of the maximum of the distribution can be calculated for $i=1,\dots ,c1$. The time until the maximum of the telomere length distribution reaches length $i$ becomes
The time to reach the maximum increases exponentially in $i$ for symmetric cell divisions, in contrast to the linear increase for only asymmetric cell divisions. However, Equation S17 reduces to the result we obtained in the former subsection in the limit $p\to 0$. The cell count at the maximum becomes
The maximum decreases considerably slower with $i$ (given the same initial size of the stem cell pool) compared to the case of only asymmetric cell divisions Equation S4, where we have used Stirling’s formula for the approximation. Similar to the former subsection we can calculate the average of the telomere length distribution. This time the average becomes
with ${t}^{*}=\frac{rp}{{N}_{0}}t+1$ and $\rho =\frac{1+p}{p}$. Similar to Equation S9, this expression is dominated by the second term of the equation. The average decreases approximately logarithmically for sufficiently small $t$,
The temporal decrease of the average telomere length speeds up with decreasing $p$. In the limit $p\to 0$, we recover the result (Equation S10) of a linear decreasing average. Similar to the former section we can derive the variance of the distribution, using the moment generating function ${M}_{p}\left(x\right)={E}_{p}\left[{e}^{cx}\right]\left(t\right)$, via
However, the result becomes less accessible and informative. Thus we restrict ourselves to a numerical solution of Equation S21. The logarithmic decay of the average telomere length has consequences on the interpretation of experimental results of telomere length distributions. In infants an accelerated decrease of telomere length can be observed. This can be explained immediately by an expanding stem cell pool. The stem cell pool contains only a few ${N}_{0}$ stem cells initially (newborns). These stem cells divide symmetrically with probability $p$ and asymmetrically with probability $1p$ respectively. The symmetric cell divisions cause an increase of the stem cell pool size and an indirect decrease in cell proliferation rates. The logarithmic decay is pronounced initially, but flattens after some time (as the number of stem cells increases). Thus, in adults the logarithmic decay is difficult to distinguish from a linear decay, see for example Figure 4 in the main text.
Connections to the Normal and LogNormal distribution
Request a detailed protocolThe number of cells in each state $i$ follows a Poisson distribution
in the case of only asymmetric stem cell divisions, see Equation S2 for details. We introduce $x=\frac{rt}{{N}_{0}}$, and upon normalisation (S22) becomes
where $x$ is a Poisson distributed variable. For $x$ sufficiently large, this random variable is well described by a normal distribution and we have x ∝ Normal distribution.
If we allow for symmetric cell divisions, cells in state $i$ followed a generalised Poisson distribution
see Equation S15 for details. Choosing $y=\frac{rpt}{{N}_{0}}+1$ and neglecting normalisation factors we can write
If we change variables again and choose $y={e}^{x}$, Equation S25 becomes
As $x=\frac{rt}{{N}_{0}}$ is approximately normally distributed, and $y={e}^{x}$, $y=\frac{rpt}{{N}_{0}}+1$ follows a Lognormal distribution.
Parameter evaluation for the average telomere length on population level by Bayesian inference method
We implement Approximate Bayesian Computation (ABC) rejection samplings to derive posterior parameter distributions for the predicted average telomere length under asymmetric (model 1, Equation S10) and combined symmetric and asymmetric (model 2, Equation S20) cell proliferations respectively. Utilizing Equation S10, we have to infer two parameters: (i) the average decrease of telomere length per time $r/{N}_{0}$ and (ii) the initial telomere length $c$. In the case of Equation S20 a third variable has to be determined: (iii) the probability of symmetric cell divisions $p$. We draw these variables independently from uniform distributions (prior) with ranges $r/No\in [0,0.2]\frac{\mathrm{kbp}}{\mathrm{year}}$, $c\in \left[7,15\right]\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$ and $p\in \left[0,1\right]$ and produce $5\times 1{0}^{8}$ independent realizations of Equation S10,S20. We calculate the coefficient of determination ${R}^{2}$ between each of these realizations and the average telomere length from a data set of 356 healthy individuals (see for example Figure 1 in the main text) via
Here, $y\left({t}_{i}\right)$ denotes, the measured telomere length of an individual with age ${t}_{i},\overline{y}$ is the average measured telomere length of the population and $E\left[c\right]\left({t}_{i}\right)$ the value of a single realization of Equation S10 or Equation S20 at time ${t}_{i}$ given the random set of parameter values. We seek parameter regimes that maximize ${R}^{2}$ and discard any parameter combination below a certain threshold.
Bayesian parameter evaluation for asymmetric cell divisions
Request a detailed protocolFor a linear fit according to Equation S10 with 2 parameters we find ${R}_{\mathrm{max}}^{2}=0.5314$ as the maximum value for the coefficient of determination. To determine the possible rate of parameters we discard any parameter combination with ${R}^{2}<0.53$. This gives sharp posterior distributions for both parameter values that peak at $\Delta cr/{N}_{0}=0.056\frac{\mathrm{kbp}}{\mathrm{year}}$ and $c=10.15\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$, see Figure 5a,b. This concurs with best parameter estimations from linear fitting ${c}_{f}=9.85\pm 0.2\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$ and $\Delta c{r}_{f}/{N}_{f}=0.05\pm 0.005\frac{\mathrm{kbp}}{\mathrm{year}}$. This scenario underestimates the initial telomere length ($c=10.15$, whereas the average initial telomere length in the data is $\overline{c}=10.67\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$).
Bayesian parameter evaluation for an interplay of symmetric and asymmetric cell divisions
Request a detailed protocolFor a logarithmic fit according to Equation S20 with three parameters we get an improved coefficient of determination ${R}_{\mathrm{max}}^{2}=0.541$. We discard any parameter combination that results in ${R}^{2}<0.54$. Again we find localized posterior parameter distributions that peak at $\Delta cr/{n}_{0}=0.071\frac{\mathrm{kbp}}{\mathrm{year}}$, $c=10.41\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$ and $p=0.32$, see Figure 5c–e. This approach improves the prediction of the initial telomere length. The average loss of telomere length per year is higher compared to only asymmetric proliferation and the probability of symmetric cell divisions peaks in a range of $p\in \left[0.25,0.4\right]$. This concurs with a nonlinear fit, where we find ${p}_{f}=0.37\pm 0.2$, ${c}_{f}=10.4\pm 0.3\phantom{\rule{0.3em}{0ex}}\mathrm{kbp}$ and $\Delta c{r}_{f}/{N}_{f}=0.071\pm 0.005\frac{\mathrm{kbp}}{\mathrm{year}}$. However, we note this is an average over all individuals with an age distribution from $0$ to $85$.
Bayesian parameter evaluation for a phase transition extension of the model
Request a detailed protocolIn the following we partition the data into two subsets and analyze an extension of the model. We introduce an additional parameter ${t}_{T}$ that resembles a transition time. This transition time is drawn from a uniform distribution with ${t}_{T}\in \left[0,80\right]$. We perform above Bayesian approach according to Equation S20 independently for each random partition of the data set. This gives in total seven posterior distributions. This approach gives ${R}_{\mathrm{max}}^{2}=0.573$ as the maximum value for the coefficient of determination and we discard any parameter combination with ${R}^{2}<0.57$. The transition occurs in children at the age of 6 to 7, see Figure 5f–i, and a clear distinction of the posterior parameter distributions between phase 1 and phase 2 can be observed. The parameter estimations confirm with the interpretation of a growing stem cell pool. We find an increased rate of telomere shortening, compared to phase 2 as well as an increased probability of symmetric cell divisions.
Non linear fitting of calculated telomere length distributions to measured distributions in single individuals
Request a detailed protocolIn the previous subsection, the average telomere shortening at the population level was investigated. We found indications for an increasing stem cell pool with age in particular in children due to infrequent symmetric stem cell divisions. In the following, we shift from the population level towards the telomere length distribution in healthy individuals. Equation S15 allows us to compare theoretical predictions to measured telomere length distributions and to infer individual proliferation parameters of stem cell populations in vivo from a single blood sample under an interplay of symmetric cell divisions (with probability $p$) and asymmetric cell divisions (with probability $1p$). However, Equation S2 is contained as the special case ($p=0$), according to Equation S15. The expected number of cells that have not entered cell cycle arrest is given by
We set ${t}^{*}=\frac{rp}{{N}_{0}}t+1$, normalize (S28) and obtain for the expected telomere length distribution
We perform nonlinear fits of Equation S29 to measured telomere distributions in healthy individuals, leaving three free parameters ${t}^{*}$, $p$ and $c$ to be determined. Results of the nonlinear fits can be seen in Figure 6—figure supplements 1–3. The corresponding fitting parameters are denoted in Supplementary file 1.
References

Telomere length predicts replicative capacity of human fibroblastsProceedings of the National Academy of Sciences of the United States of America 89:10114–10118.https://doi.org/10.1073/pnas.89.21.10114

Aging and immortality in a cell proliferation modelJournal of Theoretical Biology 248:411–417.https://doi.org/10.1016/j.jtbi.2007.06.009

Telomeres and human disease: ageing, cancer and beyondNature Reviews Genetics 6:611–622.https://doi.org/10.1038/nrg1656

Hematopoietic stem cells proliferate until after birth and show a reversible phasespecific engraftment defectJournal of Clinical Investigation 116:2808–2816.https://doi.org/10.1172/JCI28310

A ‘telomereassociated secretory phenotype’cooperates with BCRABL to drive malignant proliferation of leukemic cellsLeukemia 95:1–12.

Multimodel inference: understanding AIC and BIC in model selectionSociological Methods & Research 33:261–304.https://doi.org/10.1177/0049124104268644

Telomere diseasesNew England Journal of Medicine 361:2353–2365.https://doi.org/10.1056/NEJMra0903373

Approximate bayesian computation (ABC) in practiceTrends in Ecology & Evolution 25:410–418.https://doi.org/10.1016/j.tree.2010.04.001

A generalization of bayesian inferenceJournal of the Royal Statistical Society Series B 30:205–247.

Opinion: telomere dysfunction and the initiation of genome instabilityNature Reviews Cancer 3:623–627.https://doi.org/10.1038/nrc1142

The ageing haematopoietic stem cell compartmentNature Reviews Immunology 13:376–389.https://doi.org/10.1038/nri3433

Mitochondrial DNA mutations are established in human colonic stem cells, and mutated clones expand by crypt fissionProceedings of the National Academy of Sciences of the United States of America 103:714–719.https://doi.org/10.1073/pnas.0505903103

Telomere length dynamics and chromosomal instability in cells derived from telomerase null miceThe Journal of Cell Biology 144:589–601.https://doi.org/10.1083/jcb.144.4.589

The serial cultivation of human diploid cell strainsExperimental Cell Research 25:585–621.https://doi.org/10.1016/00144827(61)901926

Modern computational approaches for analysing molecular genetic variation dataNature Reviews Genetics 7:759–770.https://doi.org/10.1038/nrg1961

Stochastic elimination of cancer cellsProceedings of the Royal Society B: Biological Sciences 270:2017–2024.https://doi.org/10.1098/rspb.2003.2483

Stochastic models of telomere shorteningMathematical Biosciences 158:75–92.https://doi.org/10.1016/S00255564(98)100925

A theory of marginotomyJournal of Theoretical Biology 41:181–190.https://doi.org/10.1016/00225193(73)901987

Deconstructing stem cell selfrenewal: genetic insights into cellcycle regulationNature Reviews Genetics 9:115–128.https://doi.org/10.1038/nrg2269

Minimizing the risk of cancer: tissue architecture and cellular replication limitsJournal of the Royal Society Interface 10:20130410.https://doi.org/10.1098/rsif.2013.0410

Toward an evolutionary model of cancer: considering the mechanisms that govern the fate of somatic mutationsProceedings of the National Academy of Sciences of the United States of America 112:8914–8921.https://doi.org/10.1073/pnas.1501713112

Quasistationary regime of a branching random walk in presence of an absorbing wallJournal of Statistical Physics 131:203–233.https://doi.org/10.1007/s1095500895044

Different telomerelength dynamics at the inner cell mass versus established embryonic stem (eS) cellsProceedings of the National Academy of Sciences of the United States of America 108:15207–15212.https://doi.org/10.1073/pnas.1105414108

Oxidative stress shortens telomeresTrends in Biochemical Sciences 27:339–344.https://doi.org/10.1016/S09680004(02)021102

Dynamics of mutant cells in hierarchical organized tissuesPLoS Computational Biology 7:e1002290.https://doi.org/10.1371/journal.pcbi.1002290

Telomeres in the mouse have large interchromosomal variations in the number of T2AG3 repeatsProceedings of the National Academy of Sciences of the United States of America 94:7423–7428.https://doi.org/10.1073/pnas.94.14.7423
Decision letter

Frank JülicherReviewing Editor; Max Planck Institute for the Physics of Complex Systems, Germany
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for submitting your work entitled "Reconstructing the in vivo dynamics of hematopoietic stem cells from telomere length distributions" for peer review at eLife. Your submission has been favorably evaluated by Naama Barkai (Senior Editor) and three reviewers, one of whom is am member of our Board of Reviewing Editors.
The reviewers have discussed the reviews and the Reviewing Editor has drafted this decision to help you prepare a revised submission. The referees find the work very interesting and of high quality. Using simple models for symmetric and asymmetric stem cell divisions, key parameter that characterizes the in vivo dynamics of telomere length distributions are reconstructed from measured telomere length distributions.
Before the manuscript could be accepted, several points raised by the referees should be addressed and clarified in a revised manuscript. These concern in particular the possibility to use a likelihood based inference approach and questions about the relationship between measured data (telomere length) and model states (numbers of stem cell divisions).
Reviewer #1:
The authors study the length distributions of telomeres in hematopoietic stem cells. Using simple stochastic models of telomere shortening during cell division, parameters characterizing the in vivo dynamics of hematopoietic stem cells are reconstructed.
This work is elegant and very interesting. Overall the paper is based on a simple and elegant idea. It provides information on stem cell dynamics in humans. However, when reading the manuscript it remains unclear to me how robust and sound the conclusions and results are. There are a number of points that need to be addressed by the authors before the strength of the work can be fully appreciated.
Major points:
A basic problem I have when reading the manuscript is that the models of telomere length reduction address the telomere length distribution in the stem cell population. However the telomere length data is obtained for the cell population taken from blood or bone marrow that contains differentiated cells. So in order to analyze the data it appears that one would need a model for the telomere length distribution of those cells found in the samples. Why should this distribution correspond to the one of the stem cell pool described by the model? This is quite unclear.
A related point is that the authors discuss very simple scenarios where only one step of differentiation from the stem cell pool is considered. In practice, differentiated blood cells could arise via several differentiation steps in a more complex scheme. This might change the telomere length distribution of the circulating cells. Such issues are not addressed in the paper.
A question arises from the statement in that "a cell loses typically 30 to 50kbp telomeric repeats per cell division" (subsection “In vivo measurements of telomere length suggest an increasing number of hematopietic stem cells during human adolescence”). This implies that the loss of repeats per division is itself a stochastic variable with mean and variance. In this case the distribution of telomere length differs from the distribution of cell states i. This is not discussed. Rather, the distinction is blurred by a confusing mix of language. For example in the subsection “Proliferation properties of stem cells differ during adolescence and adulthood”, the units of r/N_{0} are given as kbp/year even though by definition r/N_{0} is a rate with units 1 over time.
The presentation of the work is sometimes confusing because of imprecise language and confusing terminology. For example, in the subsection “The model predicts characteristic telomere length distributions for different ratios of symmetric and asymmetric stem cell divisions” you state: "Each stem cell proliferates randomly with a rate r". Again, in the subsection “Asymmetric cell divisions”: "r represents the proliferation rate of a cell". However looking carefully at Equation (S1) suggests that the parameter r is not what the authors say it is. The proliferation rate per cell rather appears to be given by the ratio r/N_{0} when using the terminology of the authors. Similarly, for the case with symmetric divisions, the proliferation rate per cell is r/(rpt+N_{0}). This is consistent with the statement that the proliferation rate of a stem cell decreases with time but is inconsistent with the definition of r in the text.
The statement "the assumption of strictly asymmetric divisions fails to explain the pronounced loss of telomere in infants" is unclear. What exactly is the evidence for a pronounced loss in infants? If one fits a linear law to the data in Figure 2 there would be a lot of variability at young age but not a "pronounced loss".
Related to this last point is the statement in the caption to Figure 2: "The decrease of the average telomere length slows down in children and becomes almost linear in adults." I do not understand how this statement follows from the data shown. The data does not appear to support this statement.
The statement of fact that the "increased loss" of telomere length at young age is "an immediate consequence of an expanding stem cell population" is a strong statement, without a clear reasoning. It rather seems to be a suggestion based on the model that is here stated as plain fact. Also, the evidence for an "increased loss" is not compelling, as it is not easily seen in the data in Figure 2. I do not understand where this claim stems from. The statement about "increased loss… during adolescence" is unclear too. Which part of the data is meant here?
The distinction of two phases discussed in Figure 4 is not very convincing and seems a bit arbitrary. Also, in Figure 4, it is unclear what data was analyzed with the Bayesian method. What problem is solved by introducing the distinction between two phases? The Bayesian approach also becomes conceptually puzzling when the probability p is now itself becoming distributed by a very broad probability distribution in phase 1 (Figure 4 h). What does this mean?
The linear fits to the data in Figure 6 are not really compelling as the data seems to be consistent with widely varying linear behaviors.
Reviewer #2:
In their manuscript entitled 'Reconstructing the in vivo dynamics of hematopoietic stem cells from telomere length distributions' Werner et al. use a simple mathematical model of symmetric/asymmetric stem cell divisions and link it telomere length distributions measured experimentally in hematopoietic cells. Studying the properties of the model, they identify qualitative differences in the resulting telomere distribution between a strictly asymmetric and a mixed symmetric/asymmetric model of cell division. Fitting the model to the data, they claim that during childhood stem cell divisions are predominately symmetric leading to an expanding stem cell pool, while during adulthood stem cell divisions are mostly asymmetric.
The manuscript is well written, the theoretical results/calculations are well explained, sometimes maybe even in too much detail in the Introduction but definitely sound, and the application of the model to experimental data reveals interesting insight into (individual) human stem cell dynamics.
I very much liked that the authors took a Bayesian approach to model estimation, because the indeterminacies in the model need to be quantified. My main criticism (details below) however revolves around some modeling aspects and the precise inference method used to fit the model to the experimental data.
Before publication, I encourage the authors include the following improvements/clarification into the manuscript:
Major points:
The applied inference approach (termed 'Bayesian inference method') seems to be very simplistic. The authors essentially apply Approximate Bayesian Computation (ABC) rejection sampling (this term should be mentioned in the manuscript). These likelihoodfree methods are usually applied when the likelihood function of the model is intractable. However, the authors show (e.g. S10, S20) that they can solve for the relevant quantities of their model. Hence a likelihood based inference approach is possible (e.g. maximum likelihood estimation + profile likelihoods, or full MCMC for posterior distributions) and should be used (in order to avoid the pitfalls associated with ABC). In fact, fitting S10 is just a linear regression problem.
You consider three different models of increasing complexity: 1) strictly asymmetric, 2) mixed symmetric/asymmetric, and 3) a twophase model. Obviously, more complex model can fit the data better. However, can you show that the more complicated models are indeed necessary using some model selection techniques? Especially model 3) might be hard to justify: Looking at Figure 2, one cannot really observe two different phases and model 2) seems to do fairly well already. Here I strongly suggest to do Bayesian model selection e.g. using Bayes factor – recently it has been shown that this can be efficiently calculated e.g. using thermodynamic integration also in the context of biological dynamical systems.
In your model, cells are distinguished according to the number of times they have divided (the states i). In the experiment you can only observe telomere length for a cell, not number of divisions.
How are those two quantities related in your model (what's your observation/error model for the data)? This seems nontrivial, e.g. because a certain state i in the model doesn't correspond to a fixed observed telomere length (you mention that each division yield a loss of ~3050bp) and this uncertainty will propagate with each division. Hence, based on a measured telomere length inferring the number of divisions is probably not unique (e.g. what about a measurement of telomere length = 55? It could have divided once, losing 55bp at once. Or it might have divided twice losing a bit less then 30bp each time). Please discuss these nonuniqueness using either Bayesian credible intervals or more so identifiability analysis using profile likelihood/posterior methods.
According to the stochastic model, you implicitly assume that cellcycle times (in your case the waiting times in state i) are distributed exponentially (which they are clearly not in the real stem cell system). Please discuss if/how this assumption impacts on the results. Alternatively people nowadays often use delayed models by having the cell go through multiple states before actually dividing – then the Poisson model would be replaced by a log normal in the limit I believe. Here comparison to actual cell cycle distributions would be helpful.
For the section "Proliferation properties of stem cells during adolescence and adulthood" it remains unclear what data is actually fitted with this procedure. Are the same data shown in Figure 2? If so, it seems strange that you already discuss the fitted model in a previous paragraph and describe this 'Bayesian inference method' only later.
In the next section (“A single sample of telomere length distribution…”) you analyze the entire telomere distribution instead of only looking at the average. First, it is worth mentioning that these data correspond to the open symbols in Figure 2 (which of course shows only the mean again). Second, did you refit the model to those distributions? If so, how was this done?
Reviewer #3:
The authors develop a mathematical model in order to interpret data of telomere length shortening in hematopoietic stem cells from a relatively large number of patients. The mathematical model is used to infer dynamic properties of stem cell populations from data on telomere length distributions. Such analysis could quantify parameters of telomere dynamics from a single blood sample of a person. The model can be used as a tool to retrospectively infer stem cell dynamics, which could become useful in the personalized diagnosis and prognosis of disease.
I think that this is a very nice paper and of very high impact for the field. It is a showcase for how biologically realistic mathematical models can be coupled with clinical data in order to arrive at clinically useful insights, with potential to influence disease understanding and diagnosis. Therefore, I think that the quality of the paper is definitely suitable for a highimpact journal, such as eLife, and I recommend that this paper is published in the journal after some revisions.
I think that revisions could improve the paper further, and my suggestions are given as follows:
Following the Introduction, you have a section on "Modeling telomere length dynamics". I think that this could be improved. In particular, it is hard to get a detailed enough picture of the exact modeling approach from this section. Some of the details that I was missing in this section are explained in the Results section. So perhaps, all model description could be placed in a modeling section, and the Results section could just describe the results. This might make it easier to read the paper. Of course, a detailed and clear model description is given in the supplementary materials, so my comment is just about the readability of this particular section in the main text
I have a question regarding the model structure, and it might be useful to discuss this somewhere in the paper. With a certain probability, asymmetric division occurs, where stem cell division leads to one stem daughter cell and one differentiating cell. With the opposite probability, symmetric division was assumed to occur, where division gives rise to 2 stem cells. This is a biologically reasonable model. However, it is also possible to formulate this differently. For example, you can assume that with a certain probability, a stem cell divides to give rise to 2 stem daughter cells, and with the opposite probability, it divides to give rise to 2 differentiated daughter cells. In this way, you then probably would need to include feedback mechanisms to obtain stable and realistic dynamics. Would such a difference influence your results or not? I am not suggesting analyzing further models, but a discussion on model robustness might be useful for the paper.
Along similar thoughts, it could be helpful for the Discussion to include some text that discusses whether the reported results could change if different mathematical formulations or assumptions are used.
For clarity, the Abstract could point out more strongly that data have been collected for this and experimental procedures were used.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Reconstructing the in vivo dynamics of hematopoietic stem cells from telomere length distributions" for further consideration at eLife. Your revised article has been favorably evaluated by Naama Barkai (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.. The manuscript has been significantly improved and the points of the referees have been taken into account. There is one technical comment of referee two that the authors should consider before acceptance, as outlined below. The new Figure 2—figure supplement 1 clarifying the difference between the two models is very useful and might be well suited as a main figure.
Reviewer #1:
The authors have carefully addressed the reviewer comments. The issues raised in my first report have been clarified and the text is now much clearer.
Reviewer #2:
In revised version of their manuscript "Reconstructing the in vivo dynamics of hematopoietic stem cells from telomere length distributions", the authors have incorporated the comments and suggestions of the three reviewers, increasing the quality of the manuscript considerably. Especially some confusion due to inconsistent nomenclature has been sorted out and makes the manuscript much easier to read.
In particular, the authors responded to all my concerns/issues and could sort out most of them. However, I would like to comment on some specific points and ask for additional clarification if possible.
Fitting procedure:
I have the feeling that it is a bit confusing/redundant now; to summarize this is what I found as fitting setup:
1) You fit your models in the section "In vivo measurements of telomere length…" giving some parameter estimates;
2) You perform ABCrejection sampling in the next section, giving some idea about the posterior parameter distribution and uncertainties;
3) You do maximum likelihood estimates of the parameters (+ confidence intervals), again giving some idea about the uncertainties.
So, you fit the same data, with three different methods (actually I'm not sure what the difference between 1 and 3 is). This seems unnecessary and confusing. Why doing basically the same thing three times? Is this for different models, or for comparison? What’s the difference between 1 and 3?
One observes that the estimates a (slightly) different, e.g. telomere_loss_ABC = 0.071 vs telormere_loss_MLE=0.075. Is this difference due to the different "error measure", i.e. R^{2} for ABC and quadratic distance/Gaussian error for MLE? Since you do ABC to get a handle on the uncertainties, you should report them (as you did for the MLE), i.e. the boundaries of the credibility intervals.
One major advantage of having the full posterior is that you can look at correlations of parameters. The onedimensional confidence/credibility intervals (or an approximation) you can also obtain from MLE. Consider showing e.g. pairwise scatterplots of the posterior samples to show whether there are some correlations/dependencies.
This whole subject discussed above should be straightened out in the final manuscript. Choose one suitable method and stick to it (I would suggest MCMC + Bayes factors, which gives you posterior uncertainties and model comparison or the ABC version). Otherwise the reader will get lost.
Model selection:
Thanks for incorporating some model selection. Any particular reason why you choose AIC and not BIC? Does the BIC select the same model? Since the multiphase model is inferior to model 2, you should be careful not to interpret this model too much.
Thanks for showing the fits in Figure 2—figure supplement 1. This helps a lot to see why model 1 doesn't explain the data so well. Is there any reason not to put this figure into the main manuscript (instead of the old Figure 2 which doesn't show the linear fit)?
About my question on exponential waiting times:
My question was about cell cycle times, i.e. the time from a cell's "birth" (the division of its mother cell) until it divides. From what I understood when reading the supplement section "Connections to the Normal and Lognormal distribution", you look at the "distribution" N^{(i)}(t), i.e. the number of cells across states, which is Poisson (model 1) or generalized Poisson (model 2). These can then be approximated by normals/lognormals.
However, to me this is fundamentally different from a normal/lognormally distributed cell cycle time. The cell cycle time tells us when a single cell is going to divide. Your N^{(i)}(t) tells us how a population of states spreads over the states i. I don't see an immediate connection between those to quantities. Please comment on that.
Reviewer #3:
The authors have addressed all of my comments and improved the manuscript accordingly. I think it should be accepted for publication in the current form.
https://doi.org/10.7554/eLife.08687.018Author response
Reviewer #1:
Major points:
A basic problem I have when reading the manuscript is that the models of telomere length reduction address the telomere length distribution in the stem cell population. However the telomere length data is obtained for the cell population taken from blood or bone marrow that contains differentiated cells. So in order to analyze the data it appears that one would need a model for the telomere length distribution of those cells found in the samples. Why should this distribution correspond to the one of the stem cell pool described by the model? This is quite unclear.
We agree with the reviewer. Ideally, we would like to measure the telomere length distribution within the stem cell population over time. Obviously this is infeasible in human populations. However, homeostasis in the hematopoietic system is maintained by rare stem cell differentiations, see for example the very recent papers (Busch et al., 2015; Sun et al., 2014). Non stem cells differentiate further and allow for the diversity of cells in the hematopoietic system. But these cells have a finite life time and are eventually washed out, see for example (Busch et al., 2015; Sun et al., 2014; Werner et al., 2011). Only stem cells are able to maintain homeostasis in the long run. Age dependent differences in telomere shortening across different lineages of haematopoiesis across individuals can thus only persist in the hematopoietic system, if they occur on the level of the maintained selfrenewing cell population. But we agree, our inference would only be possible if the differentiation process affects telomeres in a way that does not change with age. To account for this assumption better, we have now extended our explanation in the manuscript.
A related point is that the authors discuss very simple scenarios where only one step of differentiation from the stem cell pool is considered. In practice, differentiated blood cells could arise via several differentiation steps in a more complex scheme. This might change the telomere length distribution of the circulating cells. Such issues are not addressed in the paper.
Please see our explanations above. The time dependent component of telomere shortening in homeostasis should ultimately arise from the stem cell compartment (the cell compartment that is responsible in maintaining homeostasis). We implicitly assume that the basic organization of the hierarchy remains approximately the same throughout life (it is also well conserved across mammals) and thus cannot be responsible for the persistent temporal changes in telomere length dynamics (of course there are many differentiation steps until all mature blood cells are generated, however this just results in a shift of the distribution towards shorter telomeres. This shift is assumed to be constant and not age dependent). Consequently, it suffices to analyse the population dynamics of the persistent time dependent component of the system to indirectly infer properties of the stem cell pool. We have now made these assumptions more explicit.
A question arises from the statement in that "a cell loses typically 30 to 50kbp telomeric repeats per cell division" (subsection “In vivo measurements of telomere length suggest an increasing number of hematopietic stem cells during human adolescence”). This implies that the loss of repeats per division is itself a stochastic variable with mean and variance. In this case the distribution of telomere length differs from the distribution of cell states i. This is not discussed. Rather, the distinction is blurred by a confusing mix of language. For example in the subsection “Proliferation properties of stem cells differ during adolescence and adulthood”, the units of r/N_{0} are given as kbp/year even though by definition r/N_{0} is a rate with units 1 over time.
We apologize for this imprecision. We describe telomere length on the level of cells as an average cell property. The dynamics of telomere length is defined on a population level, where telomere length changes by a cell division, but is accessed within a pool of other cells. The effect of two cell divisions with a small loss of telomeric repeats or one cell division with a larger loss of telomeric repeats is indistinguishable on the population level. We thus actually estimate a parameter $\frac{r\ufffdc}{{N}_{0}}$. Here $\text{\Delta}c$ is the average loss of telomere repeats per cell division (for example measured in base pairs) and $\frac{r}{{N}_{0}}$ is the relative proliferation rate (measured in 1/time). It is this interconnection of telomere loss and cell turnover that hinders us to for example estimate the actual number of active hematopoietic stem cells with our approach. In the previous version, we implicitly absorbed the parameter $\text{\Delta}c$. But we agree this might be a source of confusion and now explicitly state this parameter where necessary.
The statement in referred to by the reviewer was meant to be an example. If we know the parameter $\text{\Delta}c$ exactly, we can estimate the turn over rate of stem cells. This does not imply that the telomere shortening itself is a stochastic variable in our model. Biologically, the loss of repeats is almost certainly stochastic on the level of single telomeres. However, we consider the shortening of telomeres on the cell level. This corresponds to an average of 184 independent events, so we assume that the variability is small.
To avoid confusion, we have changed the former statement to: “If for example a cell looses on average 50 bp telomeric repeats per cell division…”.
We also extended our Discussion and explain that we use the average telomere length of a cell throughout the manuscript and therefore only consider the average shortening of telomere length per cell division.
The presentation of the work is sometimes confusing because of imprecise language and confusing terminology. For example, in the subsection “The model predicts characteristic telomere length distributions for different ratios of symmetric and asymmetric stem cell divisions” you state: "Each stem cell proliferates randomly with a rate r". Again, in the subsection “Asymmetric cell divisions”: "r represents the proliferation rate of a cell". However looking carefully at Equation (S1) suggests that the parameter r is not what the authors say it is. The proliferation rate per cell rather appears to be given by the ratio r/N_{0} when using the terminology of the authors. Similarly, for the case with symmetric divisions, the proliferation rate per cell is r/(rpt+N_{0}). This is consistent with the statement that the proliferation rate of a stem cell decreases with time but is inconsistent with the definition of r in the text.
We apologize for this imprecision. If homeostasis would be driven by a single stem cell, this cell indeed has a proliferation rate r. In model 1 we assume that ${N}_{0}$ stem cells drive homeostasis and consequently the contribution of each stem cell changes to an effective proliferation rate $\frac{r}{{N}_{0}}$ on the population level. In model 2, the number of stem cells changes over time and consequently, the effective proliferation rates of stem cells on the population level also change with age. We now explain this in more detail in the manuscript and point out important differences between model 1 and model 2. We also discuss the difference between the proliferation rate r and the effective proliferation rate of stem cells within a pool of stem cells.
The statement "the assumption of strictly asymmetric divisions fails to explain the pronounced loss of telomere in infants" is unclear. What exactly is the evidence for a pronounced loss in infants? If one fits a linear law to the data in Figure 2 there would be a lot of variability at young age but not a "pronounced loss".
We have now added a new supplement to Figure 2 that shows the linear and logarithmic decrease. It also depicts the average initial telomere length in newborns. It illustrates that during adolescence, telomeres are lost at a faster rate compared to the loss of telomeres in adults.
We also added a model selection based on maximum likelihood methods, which are explained in the response to reviewer 2. These methods reject model 1 and favor model 2 as a potential explanation of the data.
Related to this last point is the statement in the caption to Figure 2: "The decrease of the average telomere length slows down in children and becomes almost linear in adults." I do not understand how this statement follows from the data shown. The data does not appear to support this statement.
We now added a new Figure (Figure 2—figure supplement 1) that shows both the logarithmic and linear best fit to the data in Figure 2. It also highlights the mean initial telomere length in newborns. It shows that the differences between both curves are marginal in adults, but the linear curve underestimates the telomere length reductions in newborns and during adolescence. In other words, the telomere length declines faster then predicted by the best linear fit during adolescence and is well approximated by a linear decrease during adulthood.
We also removed the reference to Figure 2 to avoid confusion here.
The statement of fact that the "increased loss" of telomere length at young age is "an immediate consequence of an expanding stem cell population" is a strong statement, without a clear reasoning. It rather seems to be a suggestion based on the model that is here stated as plain fact. Also, the evidence for an "increased loss" is not compelling, as it is not easily seen in the data in Figure 2. I do not understand where this claim stems from. The statement about "increased loss… during adolescence" is unclear too. Which part of the data is meant here?
We thank the reviewer for this important remark. We have added another supplement to Figure 2 that compares the linear and logarithmic model prediction. It emphasizes the increased loss of telomeres during adolescence, but the connection to an expanding stem cell population is of course indirect.
We have thus now rephrased our statement and now say:” Our model suggests that the increased loss of telomere length…”.
We also cite Figure 2—figure supplement 1 after the statement.
The distinction of two phases discussed in Figure 4 is not very convincing and seems a bit arbitrary. Also, in Figure 4, it is unclear what data was analyzed with the Bayesian method. What problem is solved by introducing the distinction between two phases? The Bayesian approach also becomes conceptually puzzling when the probability p is now itself becoming distributed by a very broad probability distribution in phase 1 (Figure 4 h). What does this mean?
We now discuss a model selection procedure based on a maximum likelihood (please see the detailed discussion in response to reviewer 2). Indeed, model 2 is favoured over a multiphase model with subsequent independent parameters.
The linear fits to the data in Figure 6 are not really compelling as the data seems to be consistent with widely varying linear behaviors.
We agree with the reviewer. The lines are only meant to represent a trend. We do not mean to claim that this increase or decrease is linear. We added an additional explanation to the caption of Figure 6 to emphasize this last point.
Reviewer #2:
Major points:
The applied inference approach (termed 'Bayesian inference method') seems to be very simplistic. The authors essentially apply Approximate Bayesian Computation (ABC) rejection sampling (this term should be mentioned in the manuscript). These likelihoodfree methods are usually applied when the likelihood function of the model is intractable. However, the authors show (e.g. S10, S20) that they can solve for the relevant quantities of their model. Hence a likelihood based inference approach is possible (e.g. maximum likelihood estimation + profile likelihoods, or full MCMC for posterior distributions) and should be used (in order to avoid the pitfalls associated with ABC). In fact, fitting S10 is just a linear regression problem.
We thank the reviewer for this important comment. We now mention the full term, “Approximate Bayesian Computation rejection sampling”, in the main manuscript as well as in the supplemental information.
As the reviewer correctly stated, we are able to solve the dynamical equations of the model and thus can utilise methods beyond ABC. In fact, our “best” parameter estimations rely on standard fitting procedures using regression analysis (or just linear fitting in case of equation S10).
We revisited our data analysis and in addition now implemented standard likelihood estimates for our regression analysis. In the case of a linear regression they correspond to the minimal mean squared distance of course. The likelihoodbased methods confirm our parameter estimates from standard regression analysis. This is now stated in the manuscript too.
You consider three different models of increasing complexity: 1) strictly asymmetric, 2) mixed symmetric/asymmetric, and 3) a twophase model. Obviously, more complex model can fit the data better. However, can you show that the more complicated models are indeed necessary using some model selection techniques? Especially model 3) might be hard to justify: Looking at Figure 2, one cannot really observe two different phases and model 2) seems to do fairly well already. Here I strongly suggest to do Bayesian model selection e.g. using Bayes factor – recently it has been shown that this can be efficiently calculated e.g. using thermodynamic integration also in the context of biological dynamical systems.
We thank the reviewer for this very helpful suggestion. We performed a model selection analysis based on the Akaike information criterion (AIC), as well as a Bayesian information criterion, using our likelihood estimates.
Model 1 scores with a AIC of 2550, model 2 has an AIC of 2328 and a multiphase model with a minimum of 7 parameters yields an AIC of 2361 and therefore model 2 minimizes the AIC. Based on these values and according to established standard procedures, model 1 is only with probability $p\approx {10}^{48}$ more likely to minimize the information loss compared to model 2. A multiphase model performs slightly better compared to the linear model 1, but still it is more likely to minimize information loss compared to model 2 with a probability of $p\approx {10}^{8}.$
Thus, as also suspected by the reviewer, model 2 is highly favoured and both model 1 as well as a multiphase model with different parameters for each subsequent phase can be rejected as more likely explanations for the data presented in Figure 2.
We added this point to the manuscript and now discuss and score the models according to the model selection criteria.
In your model, cells are distinguished according to the number of times they have divided (the states i). In the experiment you can only observe telomere length for a cell, not number of divisions.
How are those two quantities related in your model (what's your observation/error model for the data)? This seems nontrivial, e.g. because a certain state i in the model doesn't correspond to a fixed observed telomere length (you mention that each division yield a loss of ~3050bp) and this uncertainty will propagate with each division. Hence, based on a measured telomere length inferring the number of divisions is probably not unique (e.g. what about a measurement of telomere length = 55? It could have divided once, losing 55bp at once. Or it might have divided twice losing a bit less then 30bp each time). Please discuss these nonuniqueness using either Bayesian credible intervals or more so identifiability analysis using profile likelihood/posterior methods.
This is an important point and was also raised by reviewer 1. We kindly refer the reviewer to our reply to reviewer 1.
According to the stochastic model, you implicitly assume that cellcycle times (in your case the waiting times in state i) are distributed exponentially (which they are clearly not in the real stem cell system). Please discuss if/how this assumption impacts on the results. Alternatively people nowadays often use delayed models by having the cell go through multiple states before actually dividing – then the Poisson model would be replaced by a log normal in the limit I believe. Here comparison to actual cell cycle distributions would be helpful.
We thank the reviewer for interesting remark. Indeed, in Model 1, where we assume a strictly constant population size, cell cycle times are distributed exponentially. Although stem cell divisions might be stochastic, we agree that this assumption is unlikely met in highly regulated tissue environments under the presence of feedback. Interestingly, Model 1 is also unable to describe the full spectrum of telomere length dynamics, as for example suggested by the model selection criteria.
However, we want to point the attention of the reviewer to Model 2, where we allow for an increasing stem cell population. In Model 2, we pick cells randomly for proliferation to allow for stochastic stem cell behaviour, but we also condition the system on a certain constant output of differentiated cells per unit of time. This condition has several consequences. First, it ensures that the output of the stem cell compartment is not steadily increasing with an increasing stem cell pool size, which would lead to very unrealistic scenarios, particularly in adults, where we can assume that the output of the stem cell compartment is approximately at a constant level during homeostasis. In a sense, this condition describes a feedback of the hematopoietic system on the regulation of stem cell proliferation. This leads to a continuous decrease of the proliferation rate of individual stem cells. As the number of stem cells increase linearly in time, this decrease is proportional to 1/age of healthy individuals during adolescence. This basic property is found in humans and in mice, see for example Rozhok and DeGregori, 2015, and Bowie et al., 2006.
However, this condition has another interesting consequence. As the reviewer correctly observed, Model 1 leads to a Poisson distribution, which in our case is also well approximated by a Normal distribution. Thus the variable of effective proliferation $\frac{rt}{{N}_{0}}$ is approximately normally distributed. Except for normalization, a transformation of variables $x\to {e}^{\frac{rt}{{N}_{0}}}$, allows to transfer Model 2 into Model 1, where $x=\frac{rpt}{{N}_{0}}+1$ is the transformed effective proliferation rate of Model 2. Thus, Model 2 results approximately in a LogNormal distribution, and recapitulates what the reviewer suggested to be a consequence of a model with multiple states of cell cycles and is a natural component of our model without explicitly introducing such multiple states. It is tempting to speculate that the mechanism of multiple cell cycle states evolved to ensure such a regulated constant cell output in homeostasis.
We added a subsection to the supplemental information that discusses the connections of our Model 1 and Model 2 to the Normal and LogNormal distribution in more detail, see for example equation (S26).
We also discuss this connection in more detail in the main text.
For the section "Proliferation properties of stem cells during adolescence and adulthood" it remains unclear what data is actually fitted with this procedure. Are the same data shown in Figure 2? If so, it seems strange that you already discuss the fitted model in a previous paragraph and describe this 'Bayesian inference method' only later.
We apologize for this imprecision. We perform the Bayesian inference method on the same data set that leads to Figure 2 and indeed this data was also used to fit the model in the previous paragraph. While we discuss the “best” fit of Model 1 and Model 2 to the data in Figure 2, we aim to discuss the range of parameters that are consistent with the model within a certain statistical measure in the next subsection. We explain this now in more detail and state more precisely which data is fitted, as well as the connection to the former section.
In the next section (“A single sample of telomere length distribution…”) you analyze the entire telomere distribution instead of only looking at the average. First, it is worth mentioning that these data correspond to the open symbols in Figure 2 (which of course shows only the mean again). Second, did you refit the model to those distributions? If so, how was this done?
Thanks for this comment; we stated in the caption of Figure 2 that open symbols correspond to cases where we have the full telomere length distribution of individuals (in total 66 healthy persons). This is now also mentioned in the main text.
The fitting procedure is described in the supplemental information section, see for example Equation (S29), which is now cited in the main text. The results of the fits are presented in Figure 6–figure supplements 1–3, and all parameters are summarised in Supplementary file 1.
Reviewer #3:
Following the Introduction, you have a section on "Modeling telomere length dynamics". I think that this could be improved. In particular, it is hard to get a detailed enough picture of the exact modeling approach from this section. Some of the details that I was missing in this section are explained in the Results section. So perhaps, all model description could be placed in a modeling section, and the Results section could just describe the results. This might make it easier to read the paper. Of course, a detailed and clear model description is given in the supplementary materials, so my comment is just about the readability of this particular section in the main text
We thank the reviewer for this suggestion. We restructured the manuscript as suggested. We moved all model descriptions from the Results into the modelling section. In addition we cite some equations from the supplementary information to improve readability further.
I have a question regarding the model structure, and it might be useful to discuss this somewhere in the paper. With a certain probability, asymmetric division occurs, where stem cell division leads to one stem daughter cell and one differentiating cell. With the opposite probability, symmetric division was assumed to occur, where division gives rise to 2 stem cells. This is a biologically reasonable model. However, it is also possible to formulate this differently. For example, you can assume that with a certain probability, a stem cell divides to give rise to 2 stem daughter cells, and with the opposite probability, it divides to give rise to 2 differentiated daughter cells. In this way, you then probably would need to include feedback mechanisms to obtain stable and realistic dynamics. Would such a difference influence your results or not? I am not suggesting analyzing further models, but a discussion on model robustness might be useful for the paper.
We agree with the reviewer that there are different ways to formulate the model. The alternative proposal is certainly possible and reasonable, in fact some of the authors have worked on such models before (Dingli D, Traulsen A and Michor F. (A)Symmetric Stem Cell Replication and Cancer. PLoS Comput Biol. 2007;3(3):e53). The difference of both formulations will depend on the detailed assumptions of stem cell proliferation. If we assume a feedback driven successive proliferation of stem cells, both models result in the same time dependent properties of telomere length, as two asymmetric divisions have the same impact on the stem cell population as one symmetric renewal followed by a symmetric differentiation. Thus both situations would on average be indistinguishable on the level of telomeres. Only the interpretation of p, the probability of symmetric selfrenewal would change.
In contrast to the average that is expected to be the same for both scenarios, the variance of the distribution might change. We would expect a higher variance in the case of symmetric differentiation and symmetric selfrenewal. However, most likely this effect is small compared to the measurement related noise.
However, other properties such as the clonal load of hematopoietic cells, could change (Shahriyari and Komarova, 2013). Very recent studies of clonal dynamics in hematopoietic cells in mice seem to point to a model where stem cells contribute repeatedly to homeostasis with longer breaks of quiescence in between. This would suggest a system of predominantly asymmetric divisions rather a then a system of symmetric renewals and differentiations. However, this also might differ across different tissues (Morrison and Kimble, 2006).
We discuss this point now at the end of the manuscript.
Along similar thoughts, it could be helpful for the Discussion to include some text that discusses whether the reported results could change if different mathematical formulations or assumptions are used.
We kindly refer the reviewer to our explanation above. We added further discussion at the end of the manuscript.
For clarity, the Abstract could point out more strongly that data have been collected for this and experimental procedures were used.
Thanks for the remark. We now have an additional sentence in the Abstract to emphasize the collection and analysis of data in combination with the mathematical model.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Reviewer #2:
Fitting procedure: I have the feeling that it is a bit confusing/redundant now; to summarize this is what I found as fitting setup:
1) You fit your models in the section "In vivo measurements of telomere length…" giving some parameter estimates;
2) You perform ABCrejection sampling in the next section, giving some idea about the posterior parameter distribution and uncertainties; 3) You do maximum likelihood estimates of the parameters (+ confidence intervals), again giving some idea about the uncertainties.
So, you fit the same data, with three different methods (actually I'm not sure what the difference between 1 and 3 is). This seems unnecessary and confusing.
We thank the reviewer to point out this shortcoming. The reviewer’s summary of the fitting procedure is correct.
We agree with the reviewer, in particular method 1 and 3 are so similar that they can be considered redundant now. We initially started with method 1 and included method 3 later to be able to do a proper model selection procedure. However, this introduced unnecessary potential for confusion and might even lead to misunderstandings. We therefore rearranged the manuscript. We now dropped method 1 and only discuss the parameter estimations in terms of maximum likelihood methods in the section: “In vivo measurements of telomere length suggest an increasing number of hematopietic stem cells during human adolescence”. Please see below some more detailed comments.
Why doing basically the same thing three times? Is this for different models, or for comparison?
We now dropped the first method from the manuscript to avoid repetition and confusion.
What’s the difference between 1 and 3?
Method one was based on a standard R^{2 }distance. Method 3 is a standard implementation of maximum likelihood inference for a regression analysis. One therefore assumes a normal distribution of errors and minimizes the quadratic distance between the model and the data. For a linear regression, this yields exactly the same estimators as the standard R^{2 }distance. For a nonlinear regression, the estimates can differ slightly. However, we dropped method 1 and only use method 3 in our manuscript now, as this yields a straightforward way to implement a model selection procedure.
One observes that the estimates a (slightly) different, e.g. telomere_loss_ABC = 0.071 vs telormere_loss_MLE=0.075. Is this difference due to the different "error measure", i.e. R^{2} for ABC and quadratic distance/Gaussian error for MLE?
Our expertise on statistical inference is not exhaustive, but we also believe that the slight differences occur due to the different error measures (R^{2 }in ABC and Gaussian/quadratic distance for MLE). This is supported by the fact that a standard R^{2 }parameter estimate recapitulates the most likely ABC estimates and also the estimated error intervals are comparable.
Since you do ABC to get a handle on the uncertainties, you should report them (as you did for the MLE), i.e. the boundaries of the credibility intervals.
One major advantage of having the full posterior is that you can look at correlations of parameters. The onedimensional confidence/credibility intervals (or an approximation) you can also obtain from MLE. Consider showing e.g. pairwise scatterplots of the posterior samples to show whether there are some correlations/dependencies.
We thank the reviewer for this remark. We now also state the confidence intervals based on the ABC method.
This whole subject discussed above should be straightened out in the final manuscript. Choose one suitable method and stick to it (I would suggest MCMC + Bayes factors, which gives you posterior uncertainties and model comparison or the ABC version). Otherwise the reader will get lost.
We rearranged the presentation in the manuscript to avoid confusion and misunderstanding. We dropped method 1 and present the parameter estimation in the framework of maximum likelihood estimates. In addition we discuss the ABC more carefully and emphasize, that maximum likelihood and ABC operate on the same data set.
Model selection:
Thanks for incorporating some model selection. Any particular reason why you choose AIC and not BIC? Does the BIC select the same model?
There was no particular reason to choose AIC over BIC. The BIC criterion selects the models in the same order. We state this in the manuscript now.
Since the multiphase model is inferior to model 2, you should be careful not to interpret this model too much.
We thank the reviewer for this remark. We rephrased these statements and emphasize the limitations of the twophase model.
Thanks for showing the fits in Figure 2—figure supplement 1. This helps a lot to see why model 1 doesn't explain the data so well. Is there any reason not to put this figure into the main manuscript (instead of the old Figure 2 which doesn't show the linear fit)?
We thank the reviewer and gladly take this suggestion. We changed Figure 2 accordingly and present Figure 2—figure supplement 1 in the main manuscript as a new Figure 3 now.
About my question on exponential waiting times:
My question was about cell cycle times, i.e. the time from a cell's "birth" (the division of its mother cell) until it divides. From what I understood when reading the supplement section "Connections to the Normal and Lognormal distribution", you look at the "distribution" N^{(i)}(t), i.e. the number of cells across states, which is Poisson (model 1) or generalized Poisson (model 2). These can then be approximated by normals/lognormals.
However, to me this is fundamentally different from a normal/lognormally distributed cell cycle time. The cell cycle time tells us when a single cell is going to divide. Your N^{(i)}(t) tells us how a population of states spreads over the states i. I don't see an immediate connection between those to quantities. Please comment on that.
We agree with the reviewer, in general our model discusses the distribution of the population size N^{(i)}(t), which corresponds to the distribution of cells across different states i. However, in the supplemental chapter “Connections to the Normal and Lognormal distribution", we discuss the distribution of the dimensionless variable x=rt/N (x = proliferation rate per cell $\times $ time). We bravely interpret this variable as a random variable and show that model 2 leads to an approximately lognormal distribution for this variable. This random variable x corresponds to the rate of individual cell divisions within a certain time interval and thus can be interpreted (maybe it is too farfetched) as an average waiting time for cell divisions within the cell population.
We now extend our explanation in the supplemental information to distinguish these two aspects better. However as this is not essential to our model and we are not entirely sure if this connection is helpful or confusing, we would be open to drop this aspect if the reviewer advises us to do so.
https://doi.org/10.7554/eLife.08687.019Article and author information
Author details
Funding
No external funding was received for this work.
Acknowledgements
We would like to thank Lucia Vankann for technical assistance. Confocal microscopy was performed in the “Immunohistochemistry and confocal microscopy” core unit of the Interdisciplinary Center for Clinical Research (IZKF) Aachen within the Faculty of Medicine at RWTH Aachen University with support of Gerhard MüllerNewen.
Ethics
Human subjects: All samples and the approval for publication were taken with informed consent of all patients at the University Hospital Aachen according to the guidelines and the approval of the ethics committees at the University Hospital Aachen.
Reviewing Editor
 Frank Jülicher, Max Planck Institute for the Physics of Complex Systems, Germany
Publication history
 Received: May 13, 2015
 Accepted: October 14, 2015
 Accepted Manuscript published: October 15, 2015 (version 1)
 Version of Record published: January 27, 2016 (version 2)
Copyright
© 2015, Werner et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 2,688
 Page views

 673
 Downloads

 53
 Citations
Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Computational and Systems Biology
Early predator detection is a key component of the predatorprey arms race and has driven the evolution of multiple animal hearing systems. Katydids (Insecta) have sophisticated ears, each consisting of paired tympana on each foreleg that receive sound both externally, through the air, and internally via a narrowing ear canal running through the leg from an acoustic spiracle on the thorax. These ears are pressuretime difference receivers capable of sensitive and accurate directional hearing across a wide frequency range. Many katydid species have cuticular pinnae which form cavities around the outer tympanal surfaces, but their function is unknown. We investigated pinnal function in the katydid Copiphora gorgonensis by combining experimental biophysics and numerical modelling using 3D ear geometries. We found that the pinnae in C. gorgonensis do not assist in directional hearing for conspecific call frequencies, but instead act as ultrasound detectors. Pinnae induced large sound pressure gains (20–30 dB) that enhanced sound detection at high ultrasonic frequencies (>60 kHz), matching the echolocation range of cooccurring insectivorous gleaning bats. These findings were supported by behavioural and neural audiograms and pinnal cavity resonances from live specimens, and comparisons with the pinnal mechanics of sympatric katydid species, which together suggest that katydid pinnae primarily evolved for the enhanced detection of predatory bats.

 Computational and Systems Biology
 Genetics and Genomics
Genotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the colocation of large wholegenome sequencingbased reference panels, powerful computing environments, and potentially sensitive genetic study data. This results in computational resource and privacyrisk barriers to access to cuttingedge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. Artificial neural networkbased imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here we demonstrate an autoencoderbased approach for genotype imputation, using a large, commonly used reference panel, and spanning the entirety of human chromosome 22. Our autoencoderbased genotype imputation strategy achieved superior imputation accuracy across the allelefrequency spectrum and across genomes of diverse ancestry, while delivering at least 4fold faster inference run time relative to standard imputation tools.