Universality of clonal dynamics poses fundamental limits to identify stem cell self-renewal strategies


Abstract

How adult stem cells maintain self-renewing tissues is commonly assessed by analysing clonal data from in vivo cell lineage-tracing assays. Identifying strategies of stem cell self-renewal requires that different models of stem cell fate choice predict sufficiently different clonal statistics. Here, we show that models of cell fate choice in homeostatic tissues can be categorized into exactly two ‘universality classes’, whereby models of the same class predict, under asymptotic conditions, the same clonal statistics. These classes relate to generalizations of the canonical asymmetric vs. symmetric stem cell self-renewal strategies and are distinguished by a conservation law. This poses both challenges and opportunities for identifying stem cell self-renewal strategies: while, under asymptotic conditions, self-renewal models of the same universality class cannot be distinguished by clonal data alone, models of different classes can be distinguished by simple means.

Introduction

Adult stem cells are key players in maintaining and renewing biological tissue, due to their ability to persistently produce tissue cells through cell division and differentiation (National Institute of Health, 2009). To maintain tissues in a homeostatic state, stem cells must adopt suitable self-renewal strategies, that is, patterns of stem cell fate choices that balance proliferation and differentiation; otherwise, imbalanced proliferation may lead to hyperplasia and cancer. Understanding and identifying stem cell self-renewal strategies has therefore been a major goal of stem cell biology ever since the discovery of adult stem cells.

Classically, two stem cell self-renewal strategies have been proposed (Potten and Loeffler, 1990; Simons and Clevers, 2011a). Following the Invariant Asymmetric division (IA) strategy, stem cells undergo only asymmetric divisions, each producing one stem cell and one differentiating cell as daughters. The other proposed strategy, Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), additionally features symmetric divisions, which produce either two stem cells or two differentiating cells as daughters, in balanced proportions. Both patterns of cell fate choice leave the number of cells unchanged on average and can thus maintain homeostasis. Assessing stem cell self-renewal strategies in vivo is difficult, since direct observation of cell divisions is rarely possible. However, genetic cell lineage-tracing assays provide the statistics of clones (the progeny of individual cells), and mathematical modeling of these statistics makes it possible to assess cell fate dynamics. Using this approach, several studies suggested that population asymmetry prevails in many mouse tissues (e.g. Clayton et al., 2007; Lopez-Garcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).

However, the interpretation of those studies has been challenged by an alternative self-renewal strategy, called Dynamic Heterogeneity (DH), which features some degree of cell fate plasticity (Greulich and Simons, 2016). In this model, all stem cell divisions are asymmetric, yet the model agrees with the same experimental clonal data that had previously been shown to agree with the population asymmetry strategy. Thus, these two strategies cannot be distinguished on the basis of those clonal data.

This raises the question of the extent to which different stem cell self-renewal strategies can be distinguished via clonal data at all (Klein and Simons, 2011; Greulich, 2019). Here, we address this question by studying models of stem cell fate choice, which define the self-renewal strategies, in their most generic form. We show that many cell fate models predict, under asymptotic conditions, the same clonal statistics and thus cannot be distinguished via clonal data from cell lineage-tracing experiments. In particular, we find two classes of stem cell self-renewal strategies: one class of models that all generate an Exponential distribution of clone sizes (the number of cells in a clone) at sufficiently large times, and one that generates a Normal distribution under sufficiently fast stem cell proliferation. Crucially, these two classes are not distinguished by the classical definitions of symmetric and asymmetric stem cell divisions, but by whether or not a subset of cells is conserved. These classes thus resemble the 'universality classes' known from statistical physics, as suggested in Klein and Simons, 2011. This leads us to a more generic, and in this context more useful, definition of the terms ‘symmetric’ and ‘asymmetric’ divisions. Notably, however, we find that the conditions for the emergence of universality are not always fulfilled in real tissues, which provides opportunities, but also further challenges, for the identification of stem cell fate choices in homeostatic tissues.

Strategies for stem cell self-renewal

The two classical stem cell self-renewal strategies, Invariant Asymmetry (IA) and Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), are commonly described in terms of two cell types: stem cells (S) which can self-renew (i.e. divide without reducing their potential to divide in the future); and differentiating cells (D). Both strategies can be expressed in terms of a single parametrized stochastic model, a multi-type branching process (Haccou et al., 2005), defined by the outcomes of cell divisions (the cell fate choices),

S \overset{λ}{\longrightarrow} \begin{cases} S + S & \text{with probability } r \\ S + D & \text{with probability } 1 - 2r \\ D + D & \text{with probability } r \end{cases}, \tag{1}

where cells of type S divide with rate λ. Here, a daughter cell configuration $S + S$ corresponds to a symmetric self-renewal division and $D + D$ to symmetric differentiation, while daughter cells of different types, $S + D$, mark an asymmetric division. In the basic model version, a cell of type D is eventually lost with rate γ, $D \overset{γ}{⟶} \emptyset$ (corresponding to death, shedding, or emigration of D-cells), while other versions may include the possibility of limited proliferation as committed progenitor cells. The two self-renewal strategies, IA and PA, are distinguished by the value of the symmetric division fraction r: the PA model corresponds to any $0 < r \leq \frac{1}{2}$, while the IA model is defined by $r = 0$, that is, only asymmetric divisions occur.

To maintain homeostasis, the number of cells must stay, on average, constant. Thus cells following the PA strategy must regulate the probabilities of symmetric self-renewal and differentiation to be exactly equal, whereas for the IA model this is trivially assured. However, only for the IA model is the number of stem cells strictly conserved, that is, no gain or loss of stem cells is possible.

One way to assess self-renewal strategies experimentally is genetic cell-lineage tracing (Kretzschmar and Watt, 2012; Blanpain and Simons, 2013): by marking single cells with an inheritable genetic marker (through a Cre-Lox system [Soriano, 1999; Sauer, 1998]), each cell’s progeny, called a clone, retains that marker and can be traced. The number of cells per clone, that is, the clone size, is measured, and the statistical frequency distribution of clone sizes (clone size distribution) is determined. To test cell fate choice models against these data, one evaluates the models with a single cell as the initial condition and samples the outcome in terms of the final cell numbers, that is, the size of a virtual clone. In the basic version of the model (i.e. when $D \overset{γ}{⟶} \emptyset$), the IA and PA models predict, respectively, a Poisson and an Exponential clone size distribution at large times (Klein and Simons, 2011; Antal and Krapivsky, 2010) (see also the Appendix, 'Invariant Asymmetry and Population Asymmetry models'). Thus, they are fundamentally different and can easily be distinguished when compared with clonal data. A series of lineage-tracing experiments confirmed that Exponential clone size distributions prevail in most mouse tissues, which excludes the IA model and supports the PA strategy (Clayton et al., 2007; Lopez-Garcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).
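To make the contrast between the IA and PA predictions concrete, the following is a minimal Gillespie-type sketch of the basic model, Equation 1 with $D \overset{γ}{⟶} \emptyset$, started from a single S cell. It is an illustration under stated assumptions, not the simulation code of the cited studies; the function names and the parameter values ($λ = γ = 1$, final time 10, 2000 clones) are hypothetical choices. For IA ($r = 0$) the S cell is never lost and the D cells follow an immigration-death process, so the clone size should be close to 1 plus a Poisson number with mean $λ/γ$; for PA ($r = 1/4$) the spread of surviving clone sizes becomes comparable to their mean, as expected for an Exponential-like distribution.

```python
import random


def simulate_basic_clone(r, lam, gamma, t_end, rng):
    """Gillespie simulation of Equation 1 plus D -> 0, from one S cell.

    r = 0 gives the IA model; 0 < r <= 1/2 gives the PA model.
    Returns (n_S, n_D) at time t_end (or at extinction).
    """
    n_s, n_d, t = 1, 0, 0.0
    while True:
        rate_div = lam * n_s        # total rate of S divisions
        rate_loss = gamma * n_d     # total rate of D loss
        total = rate_div + rate_loss
        if total == 0.0:            # clone extinct
            return n_s, n_d
        t += rng.expovariate(total)
        if t > t_end:
            return n_s, n_d
        if rng.random() * total < rate_div:
            u = rng.random()
            if u < r:               # S -> S + S
                n_s += 1
            elif u < 2 * r:         # S -> D + D
                n_s -= 1
                n_d += 2
            else:                   # S -> S + D
                n_d += 1
        else:                       # D -> 0
            n_d -= 1


def surviving_sizes(r, lam=1.0, gamma=1.0, t_end=10.0, n_clones=2000, seed=1):
    """Total sizes n = n_S + n_D of the clones still alive at t_end."""
    rng = random.Random(seed)
    sizes = (sum(simulate_basic_clone(r, lam, gamma, t_end, rng)) for _ in range(n_clones))
    return [n for n in sizes if n > 0]


def mean_var(xs):
    """Sample mean and (population) variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)
```

For example, `mean_var(surviving_sizes(r=0.0))` should give a mean near 2 and a variance near 1, whereas for `r=0.25` the mean is larger and the variance lies well above the mean.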

While this seemed to settle the case in favour of the PA strategy, at least for most adult mouse tissues, this was challenged by a third type of strategy, the DH model (Greulich and Simons, 2016). Motivated by the emerging view of prevailing cell plasticity (Blanpain and Fuchs, 2014; Tetteh et al., 2015; Tetteh et al., 2016; Donati and Watt, 2015), the DH model considers the possibility of reversible switching between two cell types:

S \overset{λ}{\to} S + D, \quad S \underset{ω_{S}}{\overset{ω_{D}}{\rightleftharpoons}} D, \quad D \overset{γ}{\to} \emptyset, \tag{2}

where the symbols at the arrows denote the process rates (frequencies of events). This strategy can also maintain a homeostatic population if $γ / λ = ω_{S} / ω_{D}$. Notably, like the IA model, the DH model features only asymmetric divisions (in that daughter cells are of different types), yet it predicts clonal statistics that are indistinguishable from those of the PA model (Greulich and Simons, 2016). This means that, in view of the existing clonal data for mouse tissues, the DH model may equally well describe the real cell fate dynamics. More fundamentally, this implies that the PA and DH models cannot be distinguished via plain clonal data, which poses fundamental limitations to the common approach of using lineage tracing to determine cell fate choices.
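As a short worked check of this homeostasis condition (our own derivation, not taken from Greulich and Simons, 2016), consider the mean cell numbers $\bar{n}_{S}$ and $\bar{n}_{D}$ in Equation 2. Reading $ω_{D}$ as the rate of $S \to D$ switching and $ω_{S}$ as the rate of $D \to S$ switching (the reading consistent with the stated condition), the means obey a linear system whose matrix is singular exactly when $γ / λ = ω_{S} / ω_{D}$; the other eigenvalue equals the negative trace, so the stationary state is stable.

```latex
\frac{d\bar{n}_{S}}{dt} = -ω_{D}\,\bar{n}_{S} + ω_{S}\,\bar{n}_{D}, \qquad
\frac{d\bar{n}_{D}}{dt} = (λ + ω_{D})\,\bar{n}_{S} - (ω_{S} + γ)\,\bar{n}_{D},

\det\begin{pmatrix} -ω_{D} & ω_{S} \\ λ + ω_{D} & -(ω_{S} + γ) \end{pmatrix}
  = ω_{D}(ω_{S} + γ) - ω_{S}(λ + ω_{D}) = ω_{D}\,γ - ω_{S}\,λ ,

\det = 0 \;\Longleftrightarrow\; \frac{γ}{λ} = \frac{ω_{S}}{ω_{D}}, \qquad
μ_{1} = 0, \quad μ_{2} = -(ω_{D} + ω_{S} + γ) < 0, \qquad
\frac{\bar{n}_{D}^{*}}{\bar{n}_{S}^{*}} = \frac{ω_{D}}{ω_{S}} = \frac{λ}{γ}.
```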

This demonstrates that the classical definition of asymmetric and symmetric divisions is not always suitable for distinguishing cell fate strategies on the basis of clonal data alone. In general, cell fate dynamics may be much more complex than the simplified models described above, as a tissue may contain a plethora of cell (sub-)types. To what extent, then, is it possible to distinguish details of potentially complex cell fate dynamics models through comparison with clonal data at all? This is only possible if the predicted clonal statistics are sufficiently different. In the following, we study cell fate models in their most generic form and analyze which clonal statistics they would be expected to produce.

Results

Model generalization

Let us consider the dynamics of a generic system of cells, characterized by a number m of possible cell states X_i, $i = 1, \dots, m$ . We define a cell state here as a group of cells showing common properties (e.g. any cell sub-type classification). Most generally, cells in a state X_i may be able to divide, producing daughter cells of any cell states X_j and X_k (where $i = j = k$ , that is, simple cell duplication, is possible). Furthermore, any cell state X_i may turn into another state X_j or may be lost (through emigration, shedding, or death). Hence, we can write a generic cell fate model as,

\text{cell division:} \quad X_{i} \overset{λ_{i} r_{i}^{jk}}{\longrightarrow} X_{j} + X_{k}, \tag{3}

\text{cell state change:} \quad X_{i} \overset{ω_{ij}}{\longrightarrow} X_{j}, \tag{4}

\text{cell loss:} \quad X_{i} \overset{γ_{i}}{\longrightarrow} \emptyset, \tag{5}

where $i, j, k = 1, \dots, m$. In this model, $λ_{i}$ is the rate of division of cells in state $X_i$, and the parameter $r_{i}^{jk}$ is the proportion of division outcomes producing daughter cells of states $X_j$ and $X_k$; $ω_{ij}$ is the transition rate from state $X_i$ to state $X_j$, and $γ_{i}$ is the loss rate from state $X_i$.

The dynamics of each cell in Equations 3–5 could depend on the cell environment through spatial, cell-extrinsic regulation of cell fate. However, except for one-dimensional arrangements of cells, the long-term clonal statistics of spatial models that include cell-extrinsic regulation of cell fate (models of the voter type [Clifford and Sudbury, 1973]) are the same as those of the corresponding branching process models (Haccou et al., 2005), such as Equations 3–5 (as shown in Klein and Simons, 2011; Bramson and Griffeath, 1980). Since we focus here on the long-term clonal statistics of self-renewal strategies, which are not affected by cell-extrinsic regulation in tissues with two- or three-dimensional arrangements of dividing cells (such as epithelial sheets and volumetric tissues), we keep the analysis simple and choose the dynamics (and thus the parameters $λ_{i}, ω_{ij}, r_{i}^{jk}, γ_{i}$) to be independent of the cell environment.

In the following, we study the dynamics of cell numbers in each state X_i, $n_{i}$ . To gain initial insight into those dynamics, let us first consider the time evolution of the mean cell numbers, ${\bar{n}}_{i} = ⟨ n_{i} ⟩$ , given by,

\frac{d}{dt} \bar{n}_{i} = \sum_{j} \left( 2 λ_{j} r_{j}^{i} + ω_{ji} \right) \bar{n}_{j} - \Big( λ_{i} + \sum_{j} ω_{ij} + γ_{i} \Big) \bar{n}_{i}, \tag{6}

in which $r_{i}^{j} = \sum_{k} (r_{i}^{jk} + r_{i}^{kj}) / 2$ is the probability that a given daughter cell produced upon division of a cell in state $X_i$ is in state $X_j$ (so that $2 r_{i}^{j}$ is the expected number of $X_j$ daughters per division). This linear system of differential equations can be written more compactly in terms of the mean cell number vector $\bar{𝒏} = (\bar{n}_{1}, \bar{n}_{2}, \dots, \bar{n}_{m})$,

\frac{d}{dt} \bar{𝒏} = A \bar{𝒏}, \tag{7}

with $A$ being the $m \times m$ matrix

A = \begin{pmatrix} κ_{11} - δ_{1} & κ_{21} & \cdots & κ_{m1} \\ κ_{12} & κ_{22} - δ_{2} & \cdots & κ_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ κ_{1m} & κ_{2m} & \cdots & κ_{mm} - δ_{m} \end{pmatrix}, \tag{8}

where we defined the total transition rate $κ_{i j} = λ_{i} 2 r_{i}^{j} + ω_{i j}$ , combining all transitions from X_i to X_j by cell divisions and direct transitions, and the local loss rate $δ_{i} = λ_{i} + \sum_{j} ω_{i j} + γ_{i}$ .
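The construction of $A$ from the model parameters can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function and argument names, assuming plain nested lists for the parameters and zero diagonal transition rates $ω_{ii}$; row $i$ of the returned matrix gives $d\bar{n}_i/dt$, so off-diagonal entries are $κ_{ji}$ as in Equation 8.

```python
def build_mean_field_matrix(lam, r, omega, gamma):
    """Matrix A of Equations 7-8 from the parameters of Equations 3-5.

    lam[i]      : division rate λ_i
    r[i][j][k]  : probability r_i^{jk} that a division of X_i yields X_j + X_k
    omega[i][j] : transition rate ω_ij (diagonal assumed zero)
    gamma[i]    : loss rate γ_i
    Row i gives d n̄_i/dt, so A[i][j] = κ_ji for i != j and A[i][i] = κ_ii - δ_i.
    """
    m = len(lam)
    # r_i^j: probability that a given daughter of an X_i division is in state X_j
    r_marg = [[sum(r[i][j][k] + r[i][k][j] for k in range(m)) / 2 for j in range(m)]
              for i in range(m)]
    # κ_ij = 2 λ_i r_i^j + ω_ij: total transition rate from X_i to X_j
    kappa = [[2 * lam[i] * r_marg[i][j] + omega[i][j] for j in range(m)]
             for i in range(m)]
    # δ_i = λ_i + Σ_j ω_ij + γ_i: local loss rate
    delta = [lam[i] + sum(omega[i]) + gamma[i] for i in range(m)]
    return [[kappa[j][i] - (delta[i] if i == j else 0.0) for j in range(m)]
            for i in range(m)]
```

For the DH model with $λ = γ = ω_S = ω_D = 1$ (state S = 0, state D = 1, and division outcome S + D encoded as `r[0][0][1] = 1`), it returns `[[-1.0, 1.0], [2.0, -2.0]]`, whose determinant vanishes, in line with the homeostasis condition $γ/λ = ω_S/ω_D$.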

Models of the form of Equations 3–5 are not in general homeostatic; homeostasis is here defined by the existence of a stationary state $\bar{𝒏}^{*}$, with $d\bar{𝒏}^{*}/dt = 0$, that is (Lyapunov) stable and non-trivial (for a discussion, see the Appendix, 'Conditions for homeostasis'). In principle, this can be assessed through the spectral properties of A (Åström and Murray, 2008), but applying spectral conditions explicitly is unwieldy and difficult to interpret biologically. For a more intuitive view, we interpret the system, Equation 7, as a network (graph): the matrix A can be interpreted as the adjacency matrix of the cell state network. This is a weighted directed graph in which cell states correspond to the graph’s nodes, and a link from state $X_i$ to $X_j$ exists where a transition is possible, that is, when $κ_{ij} > 0$. The value of $κ_{ij}$ also gives the link weight (diagonal elements of A can be considered as self-links). We now note that Equation 7 is linear and cooperative, that is, the off-diagonal elements of matrix A are non-negative, and for such systems simpler and more intuitive conditions for homeostasis exist (Greulich et al., 2019), based on a decomposition of the network into its Strongly Connected Components (SCCs). An SCC is a sub-graph that groups nodes which are strongly connected, that is, mutually connected by paths (more precisely, two nodes $X_i$ and $X_j$ are strongly connected if there exists a path from $X_i$ to $X_j$ and a path from $X_j$ to $X_i$ on the network). An example of such a decomposition, which yields an acyclic condensed network that contains SCCs as nodes and directed links between them, is shown in Figure 1.

Figure 1


Illustration of the decomposition of a homeostatic cell state network into SCCs and the compartment representation, Equation 9.

(Left): An example cell state network representing the matrix $A$ in Equation 8 (self-links not displayed). The dashed circles denote the network’s Strongly Connected Components (SCCs) (see definition in text). (Middle): The condensed network is the corresponding network of SCCs, $S_{k}$, wherein SCCs are the nodes and a link between two SCCs exists if any of their states are connected. For homeostatic networks, an SCC with dominant eigenvalue $μ = 0$ is at the apex, while other SCCs have $μ < 0$. (Right): We distinguish two compartments, the Renewing compartment $ℛ$, consisting of the apex SCC, with $μ = 0$, and the Committed compartment $𝒞$, consisting of the remainder, with $μ < 0$.
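The SCC decomposition and the condensed network can be computed with any standard graph algorithm. The sketch below uses Tarjan's algorithm (our choice; the cited work does not prescribe a specific algorithm) on a matrix `kappa` with `kappa[i][j]` $= κ_{ij}$, treating a link $X_i \to X_j$ as present whenever $κ_{ij} > 0$ and ignoring self-links. Function names are hypothetical, and the recursive implementation is intended for small networks.

```python
def strongly_connected_components(kappa):
    """Tarjan's algorithm on the cell state network (link i -> j iff kappa[i][j] > 0).

    Returns a list of SCCs (sorted lists of state indices); Tarjan emits them
    in reverse topological order, i.e. downstream SCCs first.
    """
    m = len(kappa)
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)   # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in range(m):
            if w == v or kappa[v][w] <= 0:
                continue
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC: pop it
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for v in range(m):
        if v not in index:
            visit(v)
    return sccs


def condensed_network(kappa, sccs):
    """Links between SCCs (by SCC index) and the apex SCCs (no incoming links)."""
    comp_of = {v: c for c, comp in enumerate(sccs) for v in comp}
    m = len(kappa)
    links = {(comp_of[i], comp_of[j])
             for i in range(m) for j in range(m)
             if i != j and kappa[i][j] > 0 and comp_of[i] != comp_of[j]}
    apex = [c for c in range(len(sccs)) if all(b != c for _, b in links)]
    return links, apex
```

For a hypothetical five-state network in which states 0 and 1 interconvert, 1 feeds 2, states 2 and 3 interconvert, and 3 feeds 4, this yields the SCCs {0, 1}, {2, 3} and {4}, with {0, 1} as the only apex SCC.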

The stability of systems like Equation 7 is then determined by the dominant eigenvalues $μ_{k}$ of each strongly connected component $k$, for $k = 1, \dots, m_{S}$, where $m_{S}$ is the number of SCCs, and by their topological arrangement (the Perron–Frobenius theorem ensures that, for adjacency matrices of SCCs of cooperative systems, a unique, real, maximal eigenvalue exists, which is the dominant eigenvalue [Arrow, 1989; Greulich et al., 2019]). In brief, according to Greulich et al., 2019, a homeostatic state exists if, at the apex of each lineage (in the condensed cell state network), there is an SCC with dominant eigenvalue $μ_{k} = 0$, while all SCCs downstream of it have $μ_{k} < 0$ (see detailed discussion in the Appendix, 'Conditions for homeostasis'). Given this structure of homeostatic models, we can define two compartments in the cell state transition network: (1) the (self-)Renewing compartment ($ℛ$), which is the SCC at the apex of the lineage tree; and (2) the Committed compartment ($𝒞$), which consists of all SCCs with $μ_{k} < 0$, that is, those downstream of the apex SCC. Importantly, cells in states forming $ℛ$ have the potential to return to any state within the same compartment, and this population maintains itself. In contrast, the cell population in $𝒞$ would vanish without external input, since the combined dominant eigenvalue of all those SCCs is negative (it is the maximum of all SCCs’ $μ_{k} < 0$); thus the progeny of each cell in the committed compartment will eventually be lost. We can thereby classify cells as being of a (self-)Renewing type (R) if their state is within $ℛ$, and of a Committed type (C) if their state is in $𝒞$. With this coarse-grained classification, a generic homeostatic model can be represented in terms of compartments $ℛ$ and $𝒞$ as,

R \overset{λ_{R}}{\longrightarrow} \begin{cases} R + R & \text{with probability } r_{RR} \\ R + C & \text{with probability } 1 - r_{RR} - r_{CC} \\ C + C & \text{with probability } r_{CC} \end{cases},

R \overset{ω_{RC}}{\longrightarrow} C, \quad C \overset{λ_{C}}{\longrightarrow} C + C, \quad C \overset{γ_{C}}{\longrightarrow} \emptyset, \tag{9}

where the symbols above the arrows are the effective rates of those events, that is, the average frequencies at which they occur (loss events $R \to \emptyset$ are not explicitly included, since they can be approximated by a short-lived state $X_{d}$ in $𝒞$, as $R \to X_{d} \to \emptyset$). To be compatible with homeostasis, it is further required that (i) the R-population remains constant on average ($μ_{k} = 0$), that is, $λ_{R} r_{RR} = λ_{R} r_{CC} + ω_{RC}$, and (ii) the loss rate of C exceeds its proliferation rate ($μ_{k} < 0$), that is, $γ_{C} > λ_{C}$. Figure 1 shows how a generic homeostatic cell state network can be condensed into an effective model of renewing and committed cell states, according to Equation 9. Note, however, that the events in Equation 9 are not Markovian, that is, the timings of events are not independent of each other but depend on their history. Thus, the ‘rates’ $λ_{R}$, $λ_{C}$, $ω_{RC}$, and $γ_{C}$ are not constant rates in the Markovian sense, yet we can define them by the mean frequency of events (see Appendix, 'Approximation of generic GIA models' and 'Asymptotic clone size distributions: mathematical analysis').
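Given the SCCs, the eigenvalue part of the homeostasis criterion can be checked numerically. The sketch below is a hypothetical helper, not the authors' code: it estimates each SCC's dominant eigenvalue $μ_k$ by shifting the (Metzler) sub-matrix of $A$ to a non-negative matrix with positive diagonal and applying power iteration, which converges for irreducible blocks by the Perron–Frobenius theorem. It then labels SCCs with $μ_k = 0$ (within a tolerance) as renewing and those with $μ_k < 0$ as committed. It does not check the topological part of the criterion (that the $μ = 0$ SCC sits at the apex), and in practice a linear-algebra library would normally be used instead.

```python
def dominant_eigenvalue(a, max_iter=10000, tol=1e-12):
    """Dominant (Perron) eigenvalue of an irreducible Metzler matrix a (list of rows).

    B = a + c*I is non-negative with a positive diagonal; power iteration on B
    converges to its Perron root rho(B), and the result for a is rho(B) - c.
    """
    m = len(a)
    c = max(0.0, max(-a[i][i] for i in range(m))) + 1.0
    b = [[a[i][j] + (c if i == j else 0.0) for j in range(m)] for i in range(m)]
    x = [1.0 / m] * m                 # positive start vector with sum(x) == 1
    rho = float("inf")
    for _ in range(max_iter):
        y = [sum(b[i][j] * x[j] for j in range(m)) for i in range(m)]
        s = sum(y)                    # estimate of rho(B), since sum(x) == 1
        x = [v / s for v in y]
        if abs(s - rho) < tol:
            break
        rho = s
    return s - c


def classify_compartments(a, sccs, tol=1e-9):
    """Return (mus, renewing_states, committed_states) for the given SCCs of a."""
    mus, renewing, committed = [], [], []
    for comp in sccs:
        sub = [[a[i][j] for j in comp] for i in comp]
        mu = dominant_eigenvalue(sub)
        mus.append(mu)
        if abs(mu) < tol:
            renewing.extend(comp)     # μ = 0: self-renewing
        elif mu < 0:
            committed.extend(comp)    # μ < 0: committed
        # μ > 0 would mean unbounded growth (not homeostatic): left unlabelled
    return mus, sorted(renewing), sorted(committed)
```

For a hypothetical three-state matrix whose first two states form the homeostatic block `[[-1, 1], [2, -2]]` and whose third state decays at net rate 0.5, this returns $μ = 0$ and $μ = -0.5$, classifying states 0 and 1 as $ℛ$ and state 2 as $𝒞$.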

The formulation in terms of renewing and committed states can help us gain insights into potential behaviors of generic homeostatic cell fate models. In particular, we define generalized asymmetric divisions as events of the type $R \to R + C$, and generalized symmetric divisions as events of the type $R \to R + R$ (symmetric renewal) and $R \to C + C$ (symmetric commitment). With these definitions, we can categorize homeostatic cell fate models into two classes: Generalized Invariant Asymmetry (GIA) models are those that exhibit only $R \to R + C$ divisions in the renewing compartment, while Generalized Population Asymmetry (GPA) models are those for which this restriction does not hold. We note that the two classes are equivalently characterized by a conservation law: for GIA models, the number of cells in $ℛ$ is strictly conserved, while for GPA models no such conservation law holds. Since $μ = 0$ is necessary for conservation, the only cell states that can be conserved in homeostasis are those in $ℛ$. Naturally, the previously discussed IA model is a GIA model and the PA model is a GPA model. Notably, the DH model (Equation 2) belongs to the GPA class, since in that model S and D cells form a single SCC at the apex of the lineage hierarchy, and thus both are part of $ℛ$. Therefore, a division $S \to S + D$ in the DH model, which is asymmetric in the conventional sense, corresponds to $R \to R + R$ in terms of compartments (Equation 9), and is thus a generalized symmetric division. According to this classification, the PA and DH models fall into the same class (GPA), and indeed both predict the same type of clone size distribution, namely an Exponential one (Greulich and Simons, 2016).
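Because GIA models are characterized by conservation of the number of $ℛ$-cells, the class of a model of the form of Equations 3–5 can be read off its parameters once $ℛ$ is known. The following is a minimal sketch of such a check, a hypothetical helper with parameters given as nested lists indexed like Equations 3–5: an event from a state in $ℛ$ violates conservation if it is a loss, a transition out of $ℛ$, or a division whose number of $ℛ$ daughters differs from one.

```python
def is_gia(renewing, lam, r, omega, gamma, eps=0.0):
    """True if the number of cells in the renewing compartment is strictly conserved.

    renewing : indices of the states forming the renewing compartment
    lam, r, omega, gamma : parameters of Equations 3-5 as nested lists
    """
    R = set(renewing)
    m = len(lam)
    for i in R:
        if gamma[i] > eps:
            return False                      # R -> 0 removes an R-cell
        if any(omega[i][j] > eps for j in range(m) if j not in R):
            return False                      # R -> C transition
        if lam[i] > eps:
            for j in range(m):
                for k in range(m):
                    n_r_daughters = (j in R) + (k in R)
                    if r[i][j][k] > eps and n_r_daughters != 1:
                        return False          # R -> R + R or R -> C + C
    return True
```

With S as state 0 and D as state 1, the IA model with $ℛ = \{S\}$ is classified as GIA, while both the PA model with $r = 1/4$ and the DH model, whose renewing compartment is $\{S, D\}$, are classified as GPA.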

Numerical simulation of random cell fate models

To check whether the correspondence between model class (GIA vs. GPA) and predicted clonal statistics holds in general, we analyze the clonal dynamics numerically by generating and testing a large number of random stochastic models, implemented via random generation of the parameters $λ_i$, $ω_{ij}$, $γ_i$ and $r_{i}^{jk}$. To simulate clones, we perform stochastic simulations based on the Gillespie algorithm (Gillespie, 1977), assuming a Markov process following the rules of Equations 3–5. For each model, we run a large number of simulations with initially one cell in the compartment $ℛ$, so that the cell population of each simulation run represents one clone. We then sample their outcomes, the total cell numbers per clone (the clone size) $n = \sum_{i} n_{i}$, to obtain predictions for clonal statistics, namely the frequency distribution of clone sizes (clone size distribution) and mean clone sizes (see Materials and methods).
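A Gillespie simulation of Equations 3–5 for a single clone can be sketched as follows. This is our own minimal implementation, not the code used for Figures 2 and 3; the function name and the parameter layout (nested lists `lam[i]`, `r[i][j][k]`, `omega[i][j]`, `gamma[i]`) are hypothetical. Each division outcome, state change and loss becomes a reaction channel whose propensity is its per-cell rate times the current number of cells in the source state.

```python
import random


def gillespie_clone(lam, r, omega, gamma, start, t_end, rng):
    """One realisation of the Markov process of Equations 3-5 from a single cell.

    start is the initial state index; returns the list of cell numbers n_i at t_end.
    """
    m = len(lam)
    # Reaction channels: (rate per cell, source state, daughter states)
    channels = []
    for i in range(m):
        for j in range(m):
            for k in range(m):
                if lam[i] * r[i][j][k] > 0:
                    channels.append((lam[i] * r[i][j][k], i, (j, k)))  # division
            if j != i and omega[i][j] > 0:
                channels.append((omega[i][j], i, (j,)))                # state change
        if gamma[i] > 0:
            channels.append((gamma[i], i, ()))                         # loss
    n = [0] * m
    n[start] = 1
    t = 0.0
    while True:
        props = [rate * n[src] for rate, src, _ in channels]
        total = sum(props)
        if total == 0.0:              # clone extinct (or no possible event)
            return n
        t += rng.expovariate(total)
        if t > t_end:
            return n
        # Choose a channel with probability proportional to its propensity
        u = rng.random() * total
        chosen = None
        for idx, p in enumerate(props):
            if p > 0:
                chosen = idx          # fallback guards against round-off
                if u < p:
                    break
                u -= p
        _, src, daughters = channels[chosen]
        n[src] -= 1
        for j in daughters:
            n[j] += 1
```

The clone size is then `sum(n)`; repeating the call many times with a shared `random.Random` instance gives the sample of clone sizes from which $\bar{n}_s$ and the clone size distribution can be estimated. Recomputing all propensities at every step keeps the sketch simple and is adequate for small numbers of states.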

We first study the mean size of surviving clones (with $n > 0$), $\bar{n}_{s} = \langle n \rangle|_{n > 0}$, shown in Figure 2 as a function of time for the GIA and GPA models, respectively (for the GIA models, the final time is $τ = 20 / α_{\min}$, where $α_{\min}$ is the minimal process rate, $α_{\min} = \min (λ_{1}, \dots, ω_{12}, \dots, d_{m})$; for the GPA models, see the caption of Figure 2). Indeed, a common behavior is seen within each class: while for every simulated GIA model $\bar{n}_{s}$ saturates at a plateau value, it steadily increases for every GPA model. This is expected, since clones in a GPA model can go extinct while those in a GIA model cannot. Assume that there is initially a large number $N_{c}$ of clones, such that the total number of cells is $n_{tot} = N_{c} \bar{n}_{s}$. Since the system is homeostatic, it will reach a constant steady state $n_{tot}^{*}$ after a sufficient amount of time, meaning that the mean clone size is $\bar{n}_{s} = n_{tot}^{*} / N_{c}$. If no clones go extinct, as in GIA models, $N_{c}$ is constant and thus $\bar{n}_{s}$ approaches a constant. However, in non-conserved multi-type branching processes, such as GPA models, the clone number $N_{c}$ decreases through progressive extinction of clones (Haccou et al., 2005), and therefore $\bar{n}_{s}$ increases, even though the cell population as a whole stays stationary.

Figure 2


Mean size of surviving clones, ${\bar{n}}_{s}$ , as a function of time for random GIA models (a), and GPA models (b).

In (a), $τ = 20 / α_{\min}$; in (b), $τ$ is the time at which 98% of clones have gone extinct. The grey shading represents the percentiles across all simulations (black lines delimit the 5–95th percentile range); the blue curves correspond to some illustrative selected simulations. Simulations for which the final mean is below two and where the final condition is not achieved (due to computational limitations) are not included: this results in 238 and 571 models for the GIA and GPA cases, respectively.

The resulting clone size distributions for the two model classes are shown in Figure 3. Here, clone sizes $n$ are rescaled by the mean value $\bar{n}_{s}$ and compared with an Exponential distribution of unit mean (green curve). As conjectured, all simulated GPA models shown in panel (b) predict asymptotically the same rescaled clone size distribution, namely a standard Exponential distribution. Deviations exist for small times and small clone sizes, but these deviations vanish in the large-time limit (details on the convergence are shown in the Appendix, 'Analysis of the generalized Population Asymmetry model'). This means that different models within the GPA class cannot be distinguished in the long-term limit, since they differ only in the mean clone size, which is a free fit parameter. In analogy to statistical physics, we can categorize them as a universality class (Klein and Simons, 2011), meaning that the details of the model do not affect the (rescaled) outcomes under asymptotic conditions, which is a form of weak convergence of random variables (Billingsley, 1968). However, the same cannot be said of the GIA models. In fact, their clone size distributions take all kinds of shapes, both peaked and non-peaked; some are even close to an Exponential form and thus cannot be distinguished from those of GPA models. The question is whether there are other parameters which, when large, make GIA models also exhibit universality, that is, yield the same rescaled clone size distribution. To answer this, we develop a deeper theoretical understanding of the model classes in the following sections.
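To compare a simulated or measured clone size distribution with the Exp(1) reference, one can rescale surviving clone sizes by their mean and compute a distance to the Exponential CDF. The sketch below uses the Kolmogorov–Smirnov distance, which is our choice of discrepancy measure for illustration and not necessarily the one used in the study; the function names are hypothetical. Tied values (common for small, integer clone sizes) are handled by evaluating the empirical CDF on both sides of each jump.

```python
import math


def rescale(sizes):
    """x = n / n̄_s for surviving clones (n > 0), n̄_s being their mean size."""
    surviving = [n for n in sizes if n > 0]
    mean = sum(surviving) / len(surviving)
    return [n / mean for n in surviving]


def ks_distance_to_exp1(xs):
    """Kolmogorov-Smirnov distance between the empirical CDF of xs and Exp(1)."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and xs[j] == xs[i]:
            j += 1                        # xs[i:j] are tied values
        f = 1.0 - math.exp(-xs[i])        # Exp(1) CDF at this value
        # ECDF jumps from i/n (just below xs[i]) to j/n (at xs[i])
        d = max(d, abs(j / n - f), abs(i / n - f))
        i = j
    return d
```

Because the mean is estimated from the same data, standard KS p-values do not apply directly (this is the Lilliefors situation); the distance is best used here as a descriptive comparison, for example between early and late time points of the same model.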

Figure 3


Rescaled clone size distributions (expected relative frequency $P$ of clone sizes) for random GIA models (a), and GPA models (b), in terms of the rescaled clone size $x = n / {\bar{n}}_{s}$ , at final time $t = τ$ (see Figure 2 for definition).

The grey shading represents the percentiles across all simulations (black lines delimit the 5–95th percentile range); the blue curves correspond to some selected simulations. A reference curve corresponding to an Exponential distribution of unit mean ('Exp(1)') is shown in green.

Mathematical analysis: Markovian approximation of compartment model

To obtain a deeper understanding of the numerical results, we study the cell fate models in terms of the compartment representation, Equation 9. In this representation, models are not Markovian, yet we can study their Markovian counterparts as an approximation. While this is not expected to yield accurate clone size distributions in general, the limiting distributions of non-Markovian processes are commonly well approximated by those of their Markovian counterparts.

For GIA models, which only feature $R \to R + C$ transitions between the renewing compartment, $ℛ$, and the committed compartment, $𝒞$, a corresponding Markovian model reads,

X_{1} \overset{λ_{1}}{\longrightarrow} X_{1} + X_{2}, \quad X_{2} \overset{λ_{2}}{\longrightarrow} X_{2} + X_{2}, \quad X_{2} \overset{γ}{\longrightarrow} \emptyset, \tag{10}

in which $X_{1}$ represents a single state in $ℛ$ and $X_{2}$ a single state in $𝒞$, and the symbols at the arrows are the process rates. The number of cells in $X_{1}$, $n_{1}$, is conserved, that is, given a single $X_{1}$-cell initially, it always remains at $n_{1} = 1$. Thus, we only need to consider the dynamics of the number of cells in $X_{2}$, $n_{2}$. This Markov process can be solved analytically, and for a sufficiently large steady-state mean number of $X_{2}$-cells, $\bar{n}_{2} = \langle n_{2} \rangle = λ_{1} / (γ - λ_{2})$ (see Appendix, 'GIA⁰ test case: steady state distribution and limiting behavior'), the rescaled distribution of cells in $X_{2}$ is,

P(x_{2}) = (1 - \hat{λ}_{2})^{\hat{λ}_{1}/\hat{λ}_{2}} \, \hat{λ}_{2}^{\hat{λ}_{1} x_{2}/(1 - \hat{λ}_{2})} \, \frac{Γ\left(\frac{\hat{λ}_{1}}{\hat{λ}_{2}} + \frac{\hat{λ}_{1}}{1 - \hat{λ}_{2}} x_{2}\right)}{x_{2} \, Γ\left(\frac{\hat{λ}_{1}}{\hat{λ}_{2}}\right) \, Γ\left(\frac{\hat{λ}_{1}}{1 - \hat{λ}_{2}} x_{2}\right)}, \tag{11}

in which $x_{2} = n_{2} / \bar{n}_{2}$, $\hat{λ}_{1} = λ_{1} / γ$ and $\hat{λ}_{2} = λ_{2} / γ$, and $Γ(\cdot)$ is the Gamma function (Abramowitz and Stegun, 1972). We note that this distribution exhibits a large variety of shapes: for large $\hat{λ}_{1}$ the distribution is peaked, while for small $\hat{λ}_{1}$ it loses its peak. Notably, for $\hat{λ}_{1} \to 1$ and $\hat{λ}_{2} \to 1$, the distribution becomes Exponential, and in this case it cannot be distinguished from the GPA case. On the other hand, for $\hat{λ}_{1} \to \infty$, that is, when the ratio of the asymmetric division rate to the loss rate is high, this distribution tends to a Normal distribution with unit mean and variance $1 / \hat{λ}_{1}$. These different behaviors are illustrated in the Appendix (see Appendix 1—figures 6, 7 and 8).
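The distribution in Equation 11 is the rescaled stationary distribution of $n_2$ in Equation 10, which, by the standard result for a linear birth-death process with constant immigration (here, production of $X_2$ cells at rate $λ_1$ by the single $X_1$ cell), is a negative binomial with shape $\hat{λ}_1/\hat{λ}_2$ and ratio $\hat{λ}_2$. The sketch below, with hypothetical function names, evaluates both forms via `math.lgamma` to avoid overflow and checks the statements above numerically: the mean is $\bar{n}_2 = λ_1/(γ - λ_2)$ and the variance of $x_2$ is $1/\hat{λ}_1$.

```python
import math


def nb_pmf(n, lam1_hat, lam2_hat):
    """Stationary P(n_2 = n) for Equation 10: negative binomial with
    shape s = lam1_hat / lam2_hat and ratio lam2_hat (requires 0 < lam2_hat < 1)."""
    s = lam1_hat / lam2_hat
    log_p = (math.lgamma(s + n) - math.lgamma(s) - math.lgamma(n + 1)
             + s * math.log(1.0 - lam2_hat) + n * math.log(lam2_hat))
    return math.exp(log_p)


def rescaled_density(x2, lam1_hat, lam2_hat):
    """Equation 11: density of x_2 = n_2 / n̄_2, with n̄_2 = lam1_hat / (1 - lam2_hat)."""
    s = lam1_hat / lam2_hat
    n = lam1_hat * x2 / (1.0 - lam2_hat)       # n_2 expressed through x_2
    log_p = (s * math.log(1.0 - lam2_hat) + n * math.log(lam2_hat)
             + math.lgamma(s + n) - math.log(x2) - math.lgamma(s) - math.lgamma(n))
    return math.exp(log_p)


def nb_moments(lam1_hat, lam2_hat, n_max=2000):
    """Total probability, mean and variance of n_2, summed up to n_max."""
    ps = [nb_pmf(n, lam1_hat, lam2_hat) for n in range(n_max)]
    total = sum(ps)
    mean = sum(n * p for n, p in enumerate(ps)) / total
    var = sum((n - mean) ** 2 * p for n, p in enumerate(ps)) / total
    return total, mean, var
```

For $\hat{λ}_1 = 5$ and $\hat{λ}_2 = 0.5$, `nb_moments` gives a total probability of 1, a mean of 10 and a rescaled variance `var / mean**2` of 0.2 = 1/5, up to round-off; `rescaled_density(x, ...)` equals $\bar{n}_2$ times `nb_pmf(n, ...)` at $x = n / \bar{n}_2$.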

For the GPA models, a Markovian approximation reads, accordingly,

X_{1} \overset{λ_{1}}{\longrightarrow} \begin{cases} X_{1} + X_{1} & \text{with probability } r_{1} \\ X_{1} + X_{2} & \text{with probability } 1 - r_{1} - r_{2} \\ X_{2} + X_{2} & \text{with probability } r_{2} \end{cases},

X_{1} \overset{ω}{\longrightarrow} X_{2}, \quad X_{2} \overset{λ_{2}}{\longrightarrow} X_{2} + X_{2}, \quad X_{2} \overset{γ}{\longrightarrow} \emptyset, \tag{12}

whereby, for homeostasis to prevail, $λ_{1} r_{1} = λ_{1} r_{2} + ω$ and $λ_{2} < γ$ must hold. We note that the dynamics of $X_{1}$ are independent of $X_{2}$, and thus, in homeostasis, the number of cells in $X_{1}$ changes according to

n_{1} \overset{λ_{1} r_{1} n_{1}}{\longrightarrow} n_{1} \pm 1, \tag{13}

which corresponds to a critical continuous-time binary branching process (each $X_1$ cell is either duplicated or removed from $X_1$, at equal rates), for which it is known that the distribution of cell numbers in surviving clones becomes Exponential at large times, that is, $P_{1}(n_{1}) = \bar{n}_{1,s}^{-1} e^{-n_{1}/\bar{n}_{1,s}}$, where $\bar{n}_{1,s} \simeq λ_{1} r_{1} t$ is the mean number of cells in the surviving clones (Haccou et al., 2005).
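The growth of $\bar{n}_{1,s}$ can be checked with a direct simulation of Equation 13. For this critical birth-death process started from one cell, the standard results are a survival probability $1/(1 + bt)$ and a mean surviving clone size $1 + bt$, with $b = λ_1 r_1$, so that $\bar{n}_{1,s} \simeq λ_1 r_1 t$ at large $t$. The sketch below uses hypothetical names and parameter values and should reproduce both results within sampling error.

```python
import random


def critical_clone(b, t_end, rng):
    """Equation 13 with b = λ_1 r_1: n_1 -> n_1 + 1 and n_1 -> n_1 - 1, each at rate b * n_1."""
    n, t = 1, 0.0
    while n > 0:
        t += rng.expovariate(2.0 * b * n)      # total event rate 2 b n
        if t > t_end:
            break
        n += 1 if rng.random() < 0.5 else -1   # up or down with equal probability
    return n


def surviving_stats(b, t_end, n_clones=20000, seed=0):
    """Fraction of surviving clones and their mean size n̄_{1,s} at t_end."""
    rng = random.Random(seed)
    sizes = [critical_clone(b, t_end, rng) for _ in range(n_clones)]
    alive = [n for n in sizes if n > 0]
    return len(alive) / n_clones, sum(alive) / len(alive)
```

With `b=0.5` and `t_end=10.0`, the survival fraction should be close to 1/6 and the mean surviving size close to 6, while the overall mean over all clones stays at 1.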

$X_{2}$ cells produced according to Equation 12 follow the same fate as in the two-state GIA model above. While it is not assured that the distribution of $X_{2}$ cells is identical to that of Equation 11 (due to simultaneous production events of type $X_{1} \to X_{2} + X_{2}$), we show in the Appendix, 'Asymptotic clone size distributions: mathematical analysis', that for large rates of production of C-cells, the distribution of C-cells (here, cells in state $X_{2}$) approaches a Normal distribution whose mean $\bar{n}_{2}$ equals its variance, $σ_{n_{2}}^{2} = \langle (n_{2} - \bar{n}_{2})^{2} \rangle = \bar{n}_{2}$. As each $X_{1}$ cell contributes independently to the production of $X_{2}$-cells, we have $\bar{n}_{2} \sim n_{1,s} \sim t$. Crucially, this means that, in terms of the rescaled variable $x_{2} = n_{2} / \bar{n}_{s}$, the standard deviation $σ_{x_{2}} = σ_{n_{2}} / \bar{n}_{s} \leq 1 / \sqrt{\bar{n}_{2}} \sim t^{-1/2}$ vanishes for large times, since $\bar{n}_{2} \sim n_{1,s} \sim t \to \infty$. Hence, given fixed $x_{1}$, $x_{2}$ can be approximated by a non-fluctuating value, $x_{2}|_{x_{1}} \sim \bar{x}_{1} = n_{1} / \bar{n}_{s}$. Therefore, the rescaled distribution of the total number of cells is $P(x) = P_{1}(x - x_{2}) = e^{-x}$, where $\bar{x} = \bar{x}_{1} + \bar{x}_{2} \sim \bar{x}_{1}$. Thus, the rescaled distribution of the total clone size, $x = n / \bar{n}_{s}$, is also Exponential.

Universality of generic cell fate models

For generic GIA or GPA models, the compartment representation, Equation 9, is not Markovian, and one would not expect exactly the distributions we found in the previous section. Fortunately, the limiting distributions of non-Markovian processes and their Markovian counterparts are often, under certain conditions on the parameters, the same. While we reserve the technical arguments for the Appendix ('Asymptotic clone size distributions: mathematical analysis'), we note that this independence of the limiting distribution from the Markov property is related to the central limit theorem, which does not rely on the Markov property.

To identify the correct limiting parameters for more complex cell fate models, we need to express the effective non-Markovian rates (i.e. the mean frequencies of events) of the compartment representation, Equation 9, in terms of the parameters of the original model, Equations 3–5. As discussed in the Appendix ('Approximation of generic GIA models' and 'Asymptotic clone size distributions: Mathematical analysis'), we identify those effective rates with the total, population-averaged rates of cell division, cell loss and R-to-C transition, $λ_{R} = \sum_{i \in ℛ} λ_{i} P_{i}^{R}$, $γ_{C} = \sum_{i \in 𝒞} γ_{i} P_{i}^{C}$ and $ω_{R C} = \sum_{i \in ℛ, j \in 𝒞} ω_{i j} P_{i}^{R}$, where $P_{i}^{R} = {\bar{n}}_{i} / \sum_{j \in ℛ} {\bar{n}}_{j}$ (respectively $P_{i}^{C} = {\bar{n}}_{i} / \sum_{j \in 𝒞} {\bar{n}}_{j}$) is the probability that a cell of compartment $ℛ$ (respectively $𝒞$) is in state X_i, and ${\bar{n}}_{i}$ are the solutions to Equation 6. In the Appendix, 'Asymptotic clone size distributions: mathematical analysis', we reason that all GPA models are expected to generate Exponential clone size distributions for large times t, which is indeed what is observed in Figure 3(b). Correspondingly, for GIA models we expect the clone size distribution to tend to a Normal distribution for large ${\hat{λ}}_{R} = λ_{R} / γ_{C}$. To test this prediction, we simulated the same GIA models as in Figure 3, but tuned the parameters in $ℛ$ such that the effective parameter ${\hat{λ}}_{R}$ becomes large (see details in the Appendix, 'GIA model for large ${\hat{λ}}_{R}$'). The result is shown in Figure 4: for the illustrative case in panel (a), increasing ${\hat{λ}}_{R}$ changes the distribution from an exponential form to a peaked form akin to a Normal distribution, and for all simulated random GIA models, shown in panel (b), a Normal distribution is approached when ${\hat{λ}}_{R}$ becomes large.
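The effective rates can be computed directly from the stationary mean cell numbers. The sketch below assumes a hypothetical encoding (not the authors' code): each state has a division rate, a single division outcome $(j, k)$, direct transition rates ω_ij and a loss rate. It builds $A = K^{T} - Δ$, takes the eigenvector of the $ℛ$ block for its dominant eigenvalue (0 in homeostasis) as the ${\bar{n}}_{i}$ of $ℛ$, solves for the $𝒞$ populations fed by $ℛ$, and then forms $λ_{R}$, $γ_{C}$ and $ω_{RC}$.

```python
import numpy as np

def effective_rates(lam, daughters, omega, loss, R, C):
    """Return (lambda_R, gamma_C, omega_RC) weighted by stationary mean cell numbers."""
    lam, omega, loss = (np.asarray(a, dtype=float) for a in (lam, omega, loss))
    K = omega.copy()                              # kappa_ij, starting with transitions
    for i, d in enumerate(daughters):
        if d is not None:                         # single division outcome X_i -> X_j + X_k
            for j in d:
                K[i, j] += lam[i]
    delta = lam + omega.sum(axis=1) + loss        # total exit rate of each state
    A = K.T - np.diag(delta)                      # d n_bar/dt = A n_bar
    # R populations: eigenvector of the R block for its dominant (here zero) eigenvalue
    w, v = np.linalg.eig(A[np.ix_(R, R)])
    n_R = np.abs(np.real(v[:, np.argmax(np.real(w))]))
    P_R = n_R / n_R.sum()
    # C populations fed by R in the stationary state: A_CR n_R + A_CC n_C = 0
    n_C = -np.linalg.solve(A[np.ix_(C, C)], A[np.ix_(C, R)] @ P_R)
    P_C = n_C / n_C.sum()
    lam_R = np.sum(lam[R] * P_R)
    gamma_C = np.sum(loss[C] * P_C)
    omega_RC = np.sum(omega[np.ix_(R, C)].sum(axis=1) * P_R)
    return lam_R, gamma_C, omega_RC

# IA model: S (state 0) -> S + D at rate 1, D (state 1) lost at rate 0.5
ia = effective_rates([1.0, 0.0], [(0, 1), None], np.zeros((2, 2)), [0.0, 0.5], [0], [1])
# Hypothetical GIA model with two R-states (0, 1) and one C-state (2)
omega = np.array([[0.0, 1.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
gia = effective_rates([2.0, 0.5, 0.0], [(0, 2), (1, 2), None], omega,
                      [0.0, 0.0, 1.0], [0, 1], [2])
print("IA:", ia, "lambda_hat_R =", ia[0] / ia[1])
print("GIA:", gia, "lambda_hat_R =", gia[0] / gia[1])
```

In the two-state example, the transition rates ω_01 = 1 and ω_10 = 3 give weights $P_{i}^{R} = (3/4, 1/4)$, so $λ_{R} = 2 \cdot 3/4 + 0.5 \cdot 1/4 = 1.625$.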

Figure 4


Rescaled clone size distributions (expected relative frequency P of clone sizes) for random GIA models as in Figure 3, at time $t = τ$ (see definition in Figure 2).

Sensitivity to the parameter ${\hat{λ}}_{R}$ is shown for one illustrative case in panel (a), and all GIA models are shown for ${\hat{λ}}_{R} = 30$ in panel (b). The distributions are shown in terms of the rescaled variables $x = n / {\bar{n}}_{s}$ in panel (a) and $\tilde{x} = (n - {\bar{n}}_{s}) / σ_{n}$, where $σ_{n}$ is the standard deviation of the distribution, in panel (b). In (b), the grey shading represents percentiles over all simulations (black lines delimit the 5th to 95th percentile range); the blue curves correspond to selected simulations. A reference curve corresponding to a Normal distribution with zero mean and unit variance is shown in green. Simulations for which $t = τ$ is not reached (due to computational limitations) are not included, resulting in 922 model instances.

We note that when taking the limit of large ${\hat{λ}}_{R}$, as in Figure 4, all other process rates ω_ij with i,j within $ℛ$ increased as well. What if, instead, some process rates in $ℛ$ do not become large along with ${\hat{λ}}_{R}$? To assess this situation, we studied a simple test case similar to model 10 but containing two states in $ℛ$, connected via direct state transitions (see Appendix, 'GIA^B test case: bimodal distribution'). As discussed there, if all rates within $ℛ$ are large compared to the rates in $𝒞$, then we indeed observe a Normal clone size distribution, as expected. However, if the direct transition rates between the states of $ℛ$ are smaller than or of the same magnitude as γ_C, and, in addition, one of the two division rates is higher than the other, then we observe a bimodal clone size distribution. The reason is that if transitions between the two states in $ℛ$ are rare compared to the lifetime of C-cells, $1 / γ_{C}$, the two states become effectively separated, and each of them generates a separate Normal distribution with a different mean (due to the different cell division rates in the two states); overlaid, these produce a bimodal clone size distribution (see detailed arguments in the Appendix, 'Asymptotic clone size distributions: mathematical analysis').
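The overlay argument can be illustrated with a toy calculation: an equal-weight mixture of two Normal distributions with common standard deviation σ is bimodal exactly when the two means differ by more than 2σ (a classical result for Gaussian mixtures). The sketch below uses Normal components as stand-ins for the per-state clone size distributions, rather than simulating the GIA^B model itself, and counts the modes of such a mixture on a grid.

```python
import numpy as np

def count_modes(mu1, mu2, sigma, weight=0.5, num=20001):
    """Number of local maxima of weight*N(mu1, sigma^2) + (1 - weight)*N(mu2, sigma^2)."""
    x = np.linspace(min(mu1, mu2) - 6 * sigma, max(mu1, mu2) + 6 * sigma, num)
    # Common factor 1/(sigma*sqrt(2*pi)) omitted: it does not move the maxima.
    pdf = (weight * np.exp(-0.5 * ((x - mu1) / sigma) ** 2)
           + (1.0 - weight) * np.exp(-0.5 * ((x - mu2) / sigma) ** 2))
    # ">=" on the left and ">" on the right counts a two-point plateau only once.
    is_peak = (pdf[1:-1] >= pdf[:-2]) & (pdf[1:-1] > pdf[2:])
    return int(is_peak.sum())

# Means 3 sigma apart (well-separated division rates) vs. 1.5 sigma apart
print(count_modes(0.0, 3.0, 1.0), count_modes(0.0, 1.5, 1.0))  # expected: 2 1
```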

In summary, these considerations imply:

GPA models attain an Exponential clone size distribution for time $t \to \infty$ .
GIA models attain a Normal clone size distribution if all process rates within $ℛ$ are much larger than the inverse lifetime of C-cells, γ_C.

Hence, the GIA and GPA model classes each represent a universality class, that is, a scaling limit exists in which all models of the same class yield the same rescaled clonal statistics.

Discussion

Our analysis shows that intrinsic limitations exist for identifying strategies of stem cell self-renewal through clonal data from cell lineage-tracing experiments. This is due to different models of cell fate choice generating the same type of clonal statistics (clone size distributions), so that model inference based on clonal statistics – currently still the most prevalent method to determine stem cell self-renewal strategies – fails to distinguish them. The feature that different models asymptotically generate the same statistics is a form of weak convergence of random variables (Billingsley, 1968) and corresponds to universality, as known from statistical physics.

Cell fate models can in principle be very complex, with a plethora of cell (sub-)types in a tissue. We introduced a new categorization of cell types, distinguishing between committed cell states (C-cells), whose progeny is eventually lost with certainty, and non-committed or (self-)renewing cell states (R-cells), which retain the potential to remain at or return to the apex of the lineage hierarchy. According to this categorization, we classified generic models of cell fate choice as Generalized Invariant Asymmetry (GIA) if R-cells undergo only generalized asymmetric divisions of the form $R \to R + C$, and as Generalized Population Asymmetry (GPA) if all kinds of divisions can occur, as long as gain and loss of R-cells are balanced. Models of the GIA category are also characterized by a conservation law, since the number of R-cells is strictly conserved, while GPA models do not exhibit such a conservation law.

We found that the classification into GIA and GPA models mirrors the clonal statistics they generate: all models of the GPA class generate clonal statistics that converge with time to an Exponential clone size distribution. Two GPA models can therefore not be distinguished through clonal data once some time has passed after clone induction. For GIA models, distributions can generally vary, but if the rates of divisions and transitions in the $ℛ$ compartment are much larger than the rate of cell loss, the clone size distribution of all those models becomes a Normal distribution. In that case, two GIA models cannot be distinguished by clonal data. While we do not explicitly consider cell-extrinsic regulation of cell fate here, this kind of regulation does not affect long-term clone size distributions, except when cells are arranged one-dimensionally (Klein and Simons, 2011; Bramson and Griffeath, 1980). Thus, our results cover cell dynamics in most renewing tissues, such as epithelial sheets or volumetric organs, but not (quasi-)one-dimensional arrangements of stem cells, as found in the seminiferous tubule or in intestinal crypts, where clonal statistics may differ. Hence, our analysis shows that models of cell fate choice cannot in general be distinguished with a resolution beyond the R vs. C categorization of cell types. The universality of the model dynamics also shows that effective, simplistic models are often equally accurate in describing experimental data, yet have higher statistical power owing to fewer free parameters.

While at first glance this analysis seems to discourage efforts to unravel details of cell fate dynamics, room for such efforts remains in regimes where the limiting conditions for the asymptotic distributions are not fulfilled. In particular, if fast-cycling committed progenitor cells are present while stem cells are slow-cycling, then the condition that the division rate of R-cells is much larger than the cell loss rate is not fulfilled. In that case, details of the model dynamics may affect the shape of the clone size distribution and thus allow a distinction between models. However, caution is warranted when an Exponential clone size distribution is observed, since this could indicate either a GIA model with high activity of committed progenitor cells or a GPA model. In that case, the mean clone size needs to be consulted to distinguish the models (see Figure 2). Differentiating between models within the GPA category is more difficult, since the statistics predicted by different models always become more similar over time. Short-term measurements would in principle allow such a distinction, but since in reality the underlying processes are not truly Markovian (as assumed for modeling purposes), Markovian models are not necessarily a good representation of the real cell dynamics at short times. At long times, however, Markovian approximations become increasingly accurate, precisely because of universality.

How could the resolution of cell fate modeling be improved? The state-of-the-art approach to determine cell fate trajectories is the analysis and modeling of single-cell RNA-sequencing (scRNA-seq) data. However, this method has many limitations, discussed in Weinreb et al., 2018; in particular, neither reversible trajectories nor the modes of cell division, such as asymmetric vs. symmetric divisions, can be inferred. Intravital live imaging, on the other hand, allows individual clones to be traced over time (Ritsma et al., 2012; Pittet and Weissleder, 2011; Hara et al., 2014; Rompolas et al., 2016), and can thus reveal details of cell fate trajectories, yet this technique is limited to the few tissue types that are accessible to invasive long-term imaging. Nonetheless, while each of these experimental assays alone has limitations in defining self-renewal strategies, advanced model inference schemes that integrate data from different experimental sources might be the way forward to finally reveal the details of stem cell self-renewal strategies.

Materials and methods

The numerical analysis of the random cell fate models was implemented in Matlab. The definition of the stochastic models, the random model generation and the simulation campaign are detailed in the Appendix, 'Stochastic process modelling'. Additionally, to validate the implemented simulator, which is based on the Gillespie algorithm (Gillespie, 1977), the IA and PA models were simulated and the results analyzed in the Appendix, 'Invariant Asymmetry and Population Asymmetry models'.

Analytical solutions were partially obtained using Mathematica.

Appendix 1

Conditions for homeostasis

Here, we ‘translate’ the generic conditions for the existence of a Lyapunov stable stationary state for Linear Cooperative Systems (LCS) (Greulich et al., 2019) into the biological context of clonal dynamics. A linear cooperative system is one of the form $\frac{d}{d t} 𝒙 (t) = A 𝒙 (t)$ where $𝒙 (t) = (x_{1} (t), x_{2} (t), \dots, x_{m} (t))$ are functions of time t and A is a constant m × m matrix for which all off-diagonal elements are non-negative (the latter condition defines the cooperativity of the system) (Hirsch and Smith, 2006; Greulich et al., 2019). We note that the dynamics of mean cell numbers, Equations 6 and 7 in the main text, indeed describe an LCS according to this definition. Now we use the following definitions:

$G (A)$ is the graph of A, that is, the graph for which A is the adjacency matrix, whose elements a_ij give the weight of the links from i to j ( $a_{i j} = 0$ means that no link exists). In the following, we use the terms graph and network synonymously.
If in $G (A)$ there exists a path from node i to node j and from j to i, then we call those nodes strongly connected, $i \equiv j$, which is an equivalence relation. A maximal set of nodes that are all strongly connected with each other is called a Strongly Connected Component (SCC) of the graph (an equivalence class of the equivalence relation ‘≡’).
The graph $G (A)$ can be decomposed into its $N_{S}$ SCCs, S_k, $k = 1, \dots, N_{S}$ (Cormen, 2009), which are sub-graphs associated with an adjacency matrix $A_{k}$, such that $G (A_{k}) = S_{k}$. Since the $A_{k}$ have non-negative off-diagonal elements, they are Metzler matrices, and since each S_k is strongly connected, they are irreducible; the Perron-Frobenius theorem then ensures that a unique, simple and real maximal eigenvalue $μ_{k}$ exists (Arrow, 1989). The eigenvalue $μ_{k}$ is called the dominant eigenvalue of S_k. Associated with this eigenvalue, there is, for all $k$, a positive eigenvector $𝒙^{(k)} = (x_{1}^{(k)}, x_{2}^{(k)}, \dots)$, that is, one with all entries $x_{i}^{(k)} > 0$.
The condensed graph of $G (A)$ is the graph whose nodes are the SCCs of $G (A)$, and in which a link from SCC S_k to SCC S_l ($k, l = 1, \dots, N_{S}$) exists if there is at least one link from a node (in $G (A)$) in S_k to a node in S_l.
If there is a path from SCC S_k to SCC S_l, then we call S_k upstream of S_l and, accordingly, S_l downstream of S_k. We note that, for $k \neq l$, there can never exist paths both from S_k to S_l and from S_l to S_k, since otherwise, by definition, their nodes would be strongly connected and both together would form a single SCC (Cormen, 2009). Thus, the condensed graph is acyclic, and the SCCs form a unique hierarchy.
A stationary state $𝒙^{*}$ of a dynamical system is Lyapunov stable if a small initial deviation from $𝒙^{*}$ leads to a small deviation $𝒙 (t) - 𝒙^{*}$ at all later times (i.e. $𝒙^{*}$ is not unstable). More precisely: there exists a constant $C > 0$ such that $| 𝒙 (t) - 𝒙^{*} | < C | 𝒙_{0} - 𝒙^{*} |$ for all times t, where $𝒙_{0} = 𝒙 (t = t_{0})$ is the initial condition, sufficiently close to $𝒙^{*}$. A stationary state of a linear system that is Lyapunov stable, yet neither asymptotically stable nor associated with a limit cycle, is called neutrally stable.
Homeostasis means that the cell numbers in each state, $𝒏 = (n_{1}, \dots, n_{m})$, stay constant on average, $\frac{d \bar{𝒏}}{d t} = 0$ (where $\bar{𝒏} = ⟨ 𝒏 ⟩$), and that this state is not unstable towards perturbations. This condition corresponds to a Lyapunov-stable stationary state. Note that a linear system, such as the one described by Equations 6 and 7, main text, cannot have an asymptotically stable state except for the trivial state ${\bar{𝒏}}^{*} = 0$, which corresponds to a vanishing cell population. When considering the tissue cell population as a whole, dynamics can be non-linear through interactions between cells, and a non-vanishing asymptotically stable state may then exist. However, since single clones do not significantly affect the total configuration of cells in a tissue, clones embedded in a homeostatic cell population compete neutrally, which corresponds to a Lyapunov-stable, but not asymptotically stable, state. We therefore use Lyapunov stability, a weaker form of stability, to define homeostasis, since an asymptotically stable vanishing state is not a biologically viable state.

For an LCS, the following holds (Greulich et al., 2019):

Theorem 1

An LCS, $\dot{𝐱} = A 𝐱$ , possesses a non-trivial Lyapunov stable stationary state ( $𝐱^{*} > 0$ ), if and only if,

G(A) does not contain any SCC, S_k, with $μ_{k} > 0$ .
There is at least one SCC, S_k, with $μ_{k} = 0$ .
There is no path between any two SCCs, S_k and S_l, which have $μ_{k} = 0$ and $μ_{l} = 0$ .

Furthermore, the following holds:

Theorem 2

All nodes i upstream of an SCC S_l with $μ_{l} = 0$ must be empty in the stationary state, that is, $x_{i}^{*} = 0$.

Since Equation 7, main text, is an LCS, we can apply Theorems 1 and 2 to find conditions for homeostasis, defined by a Lyapunov-stable configuration of mean cell numbers ${\bar{𝒏}}^{*} = ({\bar{n}}_{1}, {\bar{n}}_{2}, \dots)$. According to Theorem 1, at least one SCC with $μ_{k} = 0$ must then exist, and according to Theorem 2, all nodes upstream of it must be empty in the stationary state, that is, the corresponding cell types are absent in homeostasis. Since the condensed graph of the SCCs has no cyclic paths, an SCC S_k with $μ_{k} = 0$ must therefore always reside at the apex of all non-vanishing cell types. In principle, an acyclic graph may have more than one apex. However, since by definition a stem cell clone always starts with a single stem cell, and no other SCC with $μ = 0$ may be downstream of the latter (third condition of Theorem 1), we only consider one apex SCC with one initial cell when studying clonal dynamics.
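The conditions of Theorem 1 can be checked numerically for a given matrix A. The sketch below is an illustration that assumes small, dense matrices and is not the authors' implementation: it finds the SCCs of G(A) with SciPy's `scipy.sparse.csgraph.connected_components` (strong connectivity), computes the dominant eigenvalue μ_k of each sub-matrix A_k, and tests the three conditions, using an assumed tolerance `tol` to decide whether μ_k = 0.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def check_theorem1(A, tol=1e-9):
    """Return (mu per SCC, SCC label per node, whether Theorem 1 holds) for an LCS."""
    A = np.asarray(A, dtype=float)
    m = A.shape[0]
    links = (np.abs(A) > tol) & ~np.eye(m, dtype=bool)   # off-diagonal links of G(A)
    n_scc, labels = connected_components(links.astype(float), directed=True,
                                         connection="strong")
    # Dominant eigenvalue mu_k (largest real part) of each sub-matrix A_k
    mu = np.array([np.linalg.eigvals(A[np.ix_(labels == k, labels == k)]).real.max()
                   for k in range(n_scc)])
    # reach[i, j]: a path i -> j exists (transitive closure; fine for small m)
    reach = links | np.eye(m, dtype=bool)
    for _ in range(m):
        reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
    zero = [k for k in range(n_scc) if abs(mu[k]) <= tol]
    linked = any(reach[np.ix_(labels == k, labels == l)].any()
                 for k in zero for l in zero if k != l)
    ok = bool(np.all(mu <= tol)) and len(zero) >= 1 and not linked
    return mu, labels, ok

examples = {
    "IA (S, D)": [[0.0, 0.0], [1.0, -0.5]],
    "growing S": [[0.1, 0.0], [1.0, -0.5]],
    "two zero SCCs": [[0.0, 0.0], [1.0, 0.0]],
}
for name, A in examples.items():
    mu, labels, ok = check_theorem1(A)
    print(name, np.round(mu, 3), ok)
```

The IA mean-field matrix passes (one SCC with μ = 0, one with μ = −0.5), whereas a growing stem cell state violates the first condition and two connected SCCs with μ = 0 violate the third.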

Hence, in the context of homeostatic clonal dynamics, we can assume that there is a single SCC, S_k, with $μ_{k} = 0$ at the apex of the cell state graph, while all other SCCs, S_l, are downstream of it and have $μ_{l} < 0$. Since there are no paths from non-apex SCCs to the apex SCC (as the condensed graph is acyclic), we can distinguish two separate compartments: $ℛ$ (the renewing compartment), consisting of all nodes of the apex SCC, S_k, and $𝒞$ (the committed compartment), consisting of all other nodes. Because $μ_{l} < 0$ for all SCCs in $𝒞$, all progeny of cells in $𝒞$ vanishes in the long term.

Stochastic process modelling

Model description

Since clonal dynamics start, by definition, with a single cell, we use stochastic dynamics to model clones. We model cell fate dynamics as a continuous-time multi-type branching process (Haccou et al., 2005), a Markov process following the rules of Equations 3–5, main text. As shown later, without loss of generality, only two types of events are modeled here; considering an arbitrary number m of cell states, X_i, for $i = 1, \dots, m$, the model includes:

Cell divisions: a cell in state X_i divides with rate $λ_{i}$ into two cells, in states X_j and X_k, with probability $r_{i}^{j k}$.
$X_{i} \overset{λ_{i} r_{i}^{j k}}{⟶} X_{j} + X_{k}, \quad i, j, k = 1, \dots, m,$

where $λ_{i} = 0$ if state X_i does not allow division. In this formulation of cell division events, which we use for the generation and numerical simulation of random models, only one division outcome is possible for each cell state X_i. Nonetheless, multiple division outcomes per state can be implemented as single outcomes if additional metastates are introduced, which represent priming of a state X_i towards a certain division outcome. For example, if in the original model state X_i has several outcome options, $X_{j_{1}} + X_{k_{1}}, X_{j_{2}} + X_{k_{2}}, \dots$, we can replace them by transitions from X_i to (new) states $X_{m_{1}}, X_{m_{2}}, \dots$, followed by divisions $X_{m_{l}} \to X_{j_{l}} + X_{k_{l}}$. The use of metastates to model more complex processes is discussed in detail in 'Population Asymmetry model using metastates'.
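The metastate construction can be written as a small transformation. In this sketch (hypothetical encoding and names), a state with division rate λ_i and outcome probabilities p_l is replaced by priming transitions $X_{i} \to X_{m_{l}}$ at rates $p_{l} λ_{i}$, so that X_i leaves at total rate λ_i and picks outcome l with probability p_l; each metastate then divides at an assumed rate `lam_meta`. This adds a mean delay of 1/`lam_meta` to the cell cycle, so the original model is recovered only approximately, for large `lam_meta`; the treatment in 'Population Asymmetry model using metastates' may differ.

```python
def split_divisions(states, lam_meta):
    """Replace multi-outcome divisions by priming transitions to metastates.

    states: dict name -> (lam_i, [(p_l, (j_l, k_l)), ...]) with the p_l summing to 1
    Returns (transitions, divisions):
      transitions: list of (from_state, to_metastate, rate)
      divisions:   dict state -> (rate, (daughter_j, daughter_k)), one outcome each
    """
    transitions, divisions = [], {}
    for name, (lam_i, outcomes) in states.items():
        if len(outcomes) == 1:                      # already a single outcome: keep it
            divisions[name] = (lam_i, outcomes[0][1])
            continue
        for l, (p, daughters) in enumerate(outcomes, start=1):
            meta = f"{name}_m{l}"                   # hypothetical metastate label
            transitions.append((name, meta, p * lam_i))   # priming X_i -> X_{m_l}
            divisions[meta] = (lam_meta, daughters)       # X_{m_l} -> X_j + X_k
    return transitions, divisions

# PA model with r = 1/4 and lambda = 1: three division outcomes for S
pa = {"S": (1.0, [(0.25, ("S", "S")), (0.5, ("S", "D")), (0.25, ("D", "D"))])}
transitions, divisions = split_divisions(pa, lam_meta=100.0)
for tr in transitions:
    print(tr)
print(divisions)
```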

Direct state transitions: a cell in state X_i changes to state X_j at a given rate ω_ij.
$X_{i} \overset{ω_{i j}}{⟶} X_{j}, i, j = 1, \dots, m; i \neq j,$

where $ω_{i j} = 0$ means that no transition from X_i to X_j is possible. Additionally, we include cell loss in this scheme by treating it as a transition to an additional special state, hereafter called death and denoted by $\emptyset$ (cells in this state are not counted in the total number of cells). In this formulation, the loss rates of the original model are $d_{i} = ω_{i \emptyset}$.

These events define a Markov process, which can be represented as a stochastic network (Bang-Jensen and Gutin, 2007). In this view, each node corresponds to a cell state, while the links represent transitions between states via cell divisions and direct state transitions. Note that this stochastic network is different from the network defined in the main text and in 'Conditions for homeostasis' of this Appendix, which describes the dynamics of mean cell numbers instead. For the stochastic modelling, we define the adjacency matrix $K$ of this network through the elements $κ_{i j} = 2 λ_{i} r_{i}^{j} + ω_{i j}$, $i, j = 1, \dots, m$, where $κ_{i j}$ are the total transition rates as defined in the main text. $K$ is related to the matrix A used in the main text by $A = K^{T} - Δ$, where $Δ$ is the diagonal matrix with entries $δ_{i}, i = 1, \dots, m$, as defined in the main text, with the slight difference that here the loss state $\emptyset$ is treated as a separate state. Moreover, since in this model interpretation only one division option is possible for each state, $r_{i}^{j} \leq 1$ is not a continuous parameter but can only take the values $0, 1 / 2, 1$, depending on the specific outcome of the division of cells in state X_i. Notably, more than one stochastic network may result in the same matrix $K$; therefore, to uniquely define a process, we distinguish a matrix $D$, which describes cell division events (a single matrix suffices, as there is only one division option per state), and a matrix $T$, which describes direct transition events. The matrix $K$ is the sum of both, $K = D + T$.
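The construction of $K = D + T$ and $A = K^{T} - Δ$ can be sketched as follows, assuming (for illustration) that states are indexed from 0, that the loss state ∅ is included as the last index with no outgoing rates, and that $δ_{i}$ is the total exit rate of X_i (a division removes the mother cell). The function name is hypothetical.

```python
import numpy as np

def build_matrices(lam, daughters, omega):
    """Return D, T, K = D + T and A = K^T - Delta for single-outcome divisions.

    lam[i]        division rate of X_i (0 if X_i does not divide)
    daughters[i]  (j, k) for the division X_i -> X_j + X_k, or None
    omega[i][j]   direct transition rate X_i -> X_j (the loss state is a column too)
    """
    lam = np.asarray(lam, dtype=float)
    T = np.asarray(omega, dtype=float)
    D = np.zeros_like(T)
    for i, d in enumerate(daughters):
        if d is not None:
            for j in d:                    # gives d_ij = 2 lam_i r_i^j, r_i^j in {0, 1/2, 1}
                D[i, j] += lam[i]
    K = D + T
    delta = lam + T.sum(axis=1)            # exit rate: a division removes the mother cell
    A = K.T - np.diag(delta)
    return D, T, K, A

# IA model with states (S, D, loss): S -> S + D at rate 1, D -> loss at rate 0.5
D, T, K, A = build_matrices([1.0, 0.0, 0.0], [(0, 1), None, None],
                            [[0, 0, 0], [0, 0, 0.5], [0, 0, 0]])
print(A)
```

For this IA example the stem cell row of A vanishes, i.e. $d{\bar{n}}_{S}/dt = 0$, and the last row accumulates lost cells at rate $γ {\bar{n}}_{D}$.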

Generation of random models

To test the behavior of the clonal dynamics in generic homeostatic models, a large number of random stochastic networks was generated, each corresponding to a distinct set of parameters $λ_{1}, \dots, λ_{m}, ω_{12}, \dots, ω_{m \emptyset}$ of the stochastic stem cell fate choice model. The strategy detailed below is based on the following considerations, which summarize the key requirements for homeostasis derived in 'Conditions for homeostasis': (a) each network is composed of Strongly Connected Components (SCCs) that are randomly connected; (b) only one SCC, the one at the apex of the network, forms the renewing compartment, $ℛ$ (i.e. it is characterized by a dominant eigenvalue $μ = 0$ with respect to A), and all the others form the committed compartment, $𝒞$ (i.e. they are characterized by dominant eigenvalues $μ < 0$). Note further that the SCCs of the stochastic network $G (K)$ are the same as those of the graph $G (A)$, where $A = K^{T} - Δ$ defines the dynamics of mean cell numbers. This is because transposing an adjacency matrix only reverses the direction of all links, and altering its diagonal elements only changes self-loops; neither affects which nodes are strongly connected.

To generate the stochastic network, a two-step process is followed: (1) a large number of (random) SCCs are generated; (2) a condensed network is randomly constructed and filled with randomly picked SCCs from step 1.

Note that unit rates are assumed in step (1); they are subsequently modified at random in step (2) to achieve the desired properties of the dominant eigenvalue μ while ensuring randomness.

Focusing now on step (1), that is, the generation of single SCCs, the following procedure is used.

1. The total number of states composing the SCC, $m_{S}$, is defined. An additional state is added to represent everything outside the SCC. In the current analysis, we set $1 \leq m_{S} \leq 4$.
2. We separately build all possible transition and division matrices, denoted hereafter by $M_{T}$ and $M_{D}$, respectively. These matrices are ordered by increasing number of transitions, $N_{T}$, and divisions, $N_{D}$. When GIA networks are generated, the $M_{D}$ and $M_{T}$ combinations are filtered to retain only those in which each division produces one cell inside the SCC and one outside it, and in which transitions occur only between states within the SCC (i.e. where cell numbers are conserved). Computationally, this procedure is feasible up to $m_{S} = 4$.
3. The matrices stored in $M_{D}$ and $M_{T}$ are then combined to form models (each model is completely defined by one matrix from $M_{D}$ and one from $M_{T}$); $M_{D T}$ denotes the pool of possible models. This is done separately for each $m_{S}$, $N_{T}$ and $N_{D}$. In this step, due to technical limitations given by the high number of possible combinations, if the total number of combinations exceeds $5 \cdot 10^{4}$, only 10⁴ random matrices from $M_{D}$ and $M_{T}$ are combined.
4. Each model in $M_{D T}$ is then checked for whether the corresponding network forms an SCC on the first $m_{S}$ states; if not, the model is discarded. When GPA networks are generated, a further check discards models consistent with a GIA network (these cannot be excluded a priori, as is done in point 2 for the GIA networks). The resulting pools of models are denoted $M_{GIA}$ and $M_{GPA}$ for GIA and GPA models, respectively.
5. For each SCC in $M_{GIA}$ and $M_{GPA}$, the dominant eigenvalue μ is computed. By construction, the generated GIA networks all have $μ = 0$, while in general any value can be obtained within $M_{GPA}$.
6. The SCCs in $M_{GPA}$ are additionally checked for whether the network can be made compatible with homeostasis by tuning the rates. Networks satisfying this condition are additionally stored in a new pool of SCCs, called $M_{GPA}^{*}$. Networks that do not satisfy it are discarded if $μ > 0$ (i.e. if for any combination of rates the number of cells in these networks is expected to grow).

This process results in three pools of SCCs classified by $m_{S}$, $N_{T}$ and $N_{D}$ (i.e. number of states, transitions and divisions): (1) $M_{GIA}$ contains GIA models; (2) $M_{GPA}^{*}$ contains GPA models that can be tuned to have $μ = 0$; and (3) $M_{GPA}$ contains GPA models characterized by $μ < 0$ or that can be tuned to meet this condition.

In step (2), the generation of random networks starting from the individual SCCs is implemented as follows.

1. A number of committed SCCs, $N_{c}$, between 1 and 3 is randomly chosen.
2. $N_{c}$ SCCs are randomly picked from the pool $M_{GPA}$, with equal probability over $m_{S}$, $N_{T}$ and $N_{D}$. For each SCC, the unit rates α (where α stands for any rate λ_i or ω_ij) are modified by multiplying them by random numbers (exponentially distributed with mean $\bar{α} = 1$ and minimum $α_{m} = 0.3$). Additionally, an upper threshold on the dominant eigenvalue is set, $μ_{\max} = - 1$; if $μ \leq μ_{\max}$ is not satisfied, the rates are tuned to meet this requirement while keeping them above the minimum.
3. The committed compartment of the condensed network is generated by randomly connecting all outgoing links of the $k$-th SCC to states in the $l$-th SCC, for $l = k + 1, \dots, N_{c}$. In this way, the transposed adjacency matrix of the stochastic network has block lower-triangular form:
$K^{T} = \left[\begin{array}{ccccc} B_{1} & 0 & \cdots & 0 & 0 \\ C_{12} & B_{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ C_{1, N_{c}} & C_{2, N_{c}} & \cdots & B_{N_{c}} & 0 \\ C_{1 \emptyset} & C_{2 \emptyset} & \cdots & C_{N_{c}, \emptyset} & 0 \end{array}\right] .$
The last SCC is forced to be linked to a single death state.
4. Following a procedure similar to that of point 2, two SCCs are randomly picked from the pools $M_{GPA}^{*}$ and $M_{GIA}$, respectively; their unit rates are modified (exponentially distributed with mean $\bar{α} = 1$ and minimum $α_{m} = 0.3$) and, in the GPA case, tuned to meet the condition $μ = 0$. They represent the renewing part of the network.
5. Two networks (one for the GIA and one for the GPA model) are produced by attaching the selected renewing network upstream of the committed one, using a procedure analogous to that of point 3.
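The block lower-triangular structure of $K^{T}$ follows from allowing only links from the $k$-th SCC to later SCCs. The sketch below is a hypothetical version of the committed-compartment construction above (not the authors' code): it takes given within-SCC rate blocks, links each state of SCC $k$ to a random state of a random later SCC (or, for the last SCC, to the death state), draws link rates as $0.3 + \mathrm{Exp}(0.7)$ (one possible reading of 'exponentially distributed with mean 1 and minimum 0.3'), and omits the eigenvalue tuning.

```python
import numpy as np

def assemble_committed(blocks, rng):
    """Combine committed SCC rate blocks into one K (rows = source states).

    Each state of SCC k gets one forward link to a random state of a random
    later SCC; states of the last SCC link to a single death state (last index).
    Only forward links are created, so K.T is block lower-triangular."""
    sizes = [b.shape[0] for b in blocks]
    starts = np.concatenate(([0], np.cumsum(sizes)))
    m = int(starts[-1]) + 1                        # +1 for the death state
    K = np.zeros((m, m))
    for k, block in enumerate(blocks):
        K[starts[k]:starts[k + 1], starts[k]:starts[k + 1]] = block
        for i in range(starts[k], starts[k + 1]):
            if k + 1 < len(blocks):
                l = rng.integers(k + 1, len(blocks))            # a later SCC
                target = rng.integers(starts[l], starts[l + 1])
            else:
                target = m - 1                                  # death state
            K[i, target] += 0.3 + rng.exponential(0.7)          # assumed rate law
    return K, starts

blocks = [np.array([[0.0, 1.0], [1.0, 0.0]]), np.array([[0.0]]),
          np.array([[0.0, 2.0], [0.5, 0.0]])]
K, starts = assemble_committed(blocks, np.random.default_rng(1))
print(np.round(K.T, 2))
```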

At the end of this process, we have two networks that differ only in the renewing part, one being consistent with the GIA class and the other with the GPA class. In total, 2000 networks were built and analyzed.

Simulation campaign

An extensive simulation campaign was run to model the clone dynamics. The code implemented to numerically simulate the stochastic process defined by the two event types of 'Model description' (cell divisions and direct state transitions) is based on the Gillespie algorithm (Gillespie, 1977). Since a clone is by definition the progeny of a single cell, we choose as initial condition a single cell placed in a randomly chosen state within $ℛ$. Concerning the final condition, given the substantial difference in dynamics between the two model classes, the final time, denoted τ, is set to 20 times the inverse of the minimum process rate, $α_{\min} = \min (λ_{1}, \dots, λ_{m}, ω_{12}, \dots, ω_{m, \emptyset})$, for GIA models, and to the time at which the fraction of extinct clones reaches 98% for GPA models. Note that all non-degenerate critical branching processes, such as the clonal dynamics of GPA models, go extinct almost surely at some point in time (Haccou et al., 2005).
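A minimal version of such a simulator, restricted for illustration to the IA/PA models of 'Invariant Asymmetry and Population Asymmetry models' and written in Python rather than the Matlab used by the authors, is sketched below. It follows the direct Gillespie method: draw an exponential waiting time with the total propensity, then pick the event in proportion to its propensity. The parameter values are assumptions; for the PA model, the fraction of clones still containing S-cells at time t can be compared with the exact value $1 / (1 + λ r t)$ of the critical birth-death process followed by $n_{S}$.

```python
import numpy as np

def simulate_clone(lam, gamma, r, t_end, rng):
    """One Gillespie run of the PA model (r = 0 gives IA) from a single S-cell.

    Returns (n_S, n_D) at time t_end."""
    n_s, n_d, t = 1, 0, 0.0
    while True:
        a_div, a_loss = lam * n_s, gamma * n_d        # event propensities
        a_tot = a_div + a_loss
        if a_tot == 0.0:                               # extinct clone: nothing can happen
            return n_s, n_d
        t += rng.exponential(1.0 / a_tot)              # waiting time to the next event
        if t > t_end:
            return n_s, n_d
        if rng.random() * a_tot < a_div:               # the event is a division
            v = rng.random()
            if v < r:                                  # S -> S + S, probability r
                n_s += 1
            elif v < 1.0 - r:                          # S -> S + D, probability 1 - 2r
                n_d += 1
            else:                                      # S -> D + D, probability r
                n_s, n_d = n_s - 1, n_d + 2
        else:                                          # D -> loss
            n_d -= 1

rng = np.random.default_rng(42)
lam, gamma, r, t_end = 1.0, 1.0, 0.25, 5.0            # illustrative parameters
runs = [simulate_clone(lam, gamma, r, t_end, rng) for _ in range(2000)]
n_s = np.array([s for s, _ in runs])
s_survival = np.mean(n_s > 0)
print(f"mean n_S = {n_s.mean():.3f} (exact 1), "
      f"S-survival = {s_survival:.3f} (exact {1 / (1 + lam * r * t_end):.3f})")
```

With `r = 0` (IA) every run keeps exactly one S-cell, reflecting the conservation law of the GIA class.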

To determine the clone size distribution, 10³ and $5 \cdot 10^{4}$ simulations were run for each GIA and each GPA model, respectively (since only 2% of GPA clones survive until τ, both model classes then yield about 10³ surviving clones).

Numerical simulation test cases

Invariant Asymmetry and Population Asymmetry models

To validate the simulation approach, we tested the procedure on simple cell fate models for which analytical results are known, the Invariant Asymmetry (IA) and Population Asymmetry (PA) models. As described in the main text, in the simplest version, these are defined as,

S \overset{λ}{⟶} \begin{cases} S + S & \text{with probability } r \\ S + D & \text{with probability } 1 - 2 r \\ D + D & \text{with probability } r \end{cases}, \qquad D \overset{γ}{⟶} \emptyset .

In these processes, cells of type S represent the stem cells (hereafter also called progenitors), which divide with stochastic rate λ, and cells of type D are the differentiated cells, which are shed with rate γ. While in the PA model the three possible division outcomes of a progenitor are controlled by a probability parameter $0 < r \leq 1 / 2$, in the IA model r = 0, meaning that all divisions are strictly asymmetric and the number of S-cells is conserved. Note that in the definition of the stochastic networks given in 'Model description' only one division option per state is modelled; however, the code implemented for the numerical simulations of the stochastic process also allows for an arbitrary number of division options per state (see 'Population Asymmetry model using metastates').

Considering the dynamics at tissue level, the system of ODEs describing the average numbers ${\bar{n}}_{S}$ and ${\bar{n}}_{D}$ of S-cells and D-cells, respectively, is,

\begin{aligned} \frac{d {\bar{n}}_{S}}{d t} & = 0 , \\ \frac{d {\bar{n}}_{D}}{d t} & = λ {\bar{n}}_{S} - γ {\bar{n}}_{D} . \end{aligned}

It is clear that, on average, the number of S-cells remains constant. Additionally, in homeostasis, the average number of D-cells stabilizes at a constant value ${\bar{n}}_{D}^{*} = (λ / γ) {\bar{n}}_{S}$ that depends only on the number of stem cells, ${\bar{n}}_{S}$, which equals the initial number of stem cells, ${\bar{n}}_{S, 0} = {\bar{n}}_{S} (t = 0)$. Thus, the (Lyapunov-stable) stationary state of the total cell number $\bar{n} = {\bar{n}}_{S} + {\bar{n}}_{D}$ is given by,

{\bar{n}}^{*} = (1 + \frac{λ}{γ}) {\bar{n}}_{S, 0} .

Based on Equation 6, the process rates λ and γ determine the proportion of cells of type D with respect to cells of type S. Importantly, there is no difference at tissue level between the IA and PA models.

A distinction is instead evident when we look at the dynamics at the single-cell level and study the clone size distribution, that is, the distribution of the progeny of a single cell. For the IA model, the number of S-cells is strictly constant, and thus the joint probability distribution $P (n_{S}, n_{D})$ of the numbers of S-cells and D-cells, $n_{S}$ and $n_{D}$, is fully determined by the distribution of D-cells, $P (n_{D})$. The IA model’s master equation for $P (n_{D})$, considering a single initial cell of type S, is given by,

\frac{d P (n_{D})}{d t} = λ P (n_{D} - 1) + γ (n_{D} + 1) P (n_{D} + 1) - (λ + γ n_{D}) P (n_{D}) .

This corresponds to a simple birth-and-death process, with constant birth rate λ and per-cell death rate γ, whose stationary distribution is Poissonian with mean $λ / γ$ (Van Kampen, 1981).
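The Poisson limit of Equation 7 can be checked by integrating the master equation on a truncated state space. The following minimal sketch assumes a cutoff `n_max = 40` and a forward-Euler step, both arbitrary choices for illustration; births are blocked at the cutoff so that total probability is conserved.

```python
import math


def ia_master_equation(lam, gam, t_end, n_max=40, dt=2e-3):
    """Forward-Euler integration of Equation 7, truncated at n_D = n_max."""
    P = [0.0] * (n_max + 1)
    P[0] = 1.0  # one stem cell and no D-cells at t = 0
    for _ in range(int(round(t_end / dt))):
        dP = [0.0] * (n_max + 1)
        for n in range(n_max + 1):
            birth = lam if n < n_max else 0.0  # S -> S + D (blocked at the cutoff)
            death = gam * n                    # D -> 0
            dP[n] -= (birth + death) * P[n]
            if n < n_max:
                dP[n + 1] += birth * P[n]
            if n > 0:
                dP[n - 1] += death * P[n]
        P = [p + dt * d for p, d in zip(P, dP)]
    return P


lam, gam = 2.0, 1.0
P = ia_master_equation(lam, gam, t_end=15.0)
poisson = [math.exp(-lam / gam) * (lam / gam) ** n / math.factorial(n) for n in range(len(P))]
print(max(abs(a - b) for a, b in zip(P, poisson)))  # small: P is close to Poisson(lam/gam)
```

Because detailed balance, $λ P (n_{D}) = γ (n_{D} + 1) P (n_{D} + 1)$ , holds for this chain, the fixed point of the scheme is the (truncated) Poisson law, so the remaining gap only reflects the finite integration time.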

Considering now the PA model, the master equation is instead given by,

\begin{aligned} \frac{d P (n_{S}, n_{D})}{d t} = {} & λ \left[ r (n_{S} - 1) P (n_{S} - 1, n_{D}) + (1 - 2 r) n_{S} P (n_{S}, n_{D} - 1) + r (n_{S} + 1) P (n_{S} + 1, n_{D} - 2) \right] \\ & + γ (n_{D} + 1) P (n_{S}, n_{D} + 1) - (λ n_{S} + γ n_{D}) P (n_{S}, n_{D}) . \end{aligned}

In Antal and Krapivsky, 2010, an exact result for the distribution of total cell numbers $n = n_{S} + n_{D}$ is found when $λ = γ$ and $r = 1 / 4$ . For different values of the process parameters, the long-term distribution is shown to be Exponential.

Numerical simulations of the clonal dynamics were run for the above models, with three different sets of test parameters each, indicated as IA#i and PA#i for $i = 1, 2, 3$ and reported in Appendix 1—table 1. The time unit is arbitrary and therefore omitted. Simulations are based on 10⁴ and $5 \cdot 10^{4}$ runs for the IA and PA test cases, respectively. The initial condition is a single stem cell and the final simulation time, indicated as τ, is equal to 10: for the IA test cases this value is well representative of a steady state condition, and for the PA test cases total extinction of the process has not yet been reached at this time. The clone size distribution at τ for the IA test cases is shown in Appendix 1—figure 1, where each profile is compared to the corresponding Poisson distribution shifted by one (i.e. plus the stem cell). The results for the PA test cases are shown in Appendix 1—figure 2; in this case, the profiles are compared to the numerical integration of the master Equation 8. Additionally, for the PA#1 test case, where $λ = γ$ and $r = 1 / 4$ , the reference analytic solution provided in Antal and Krapivsky, 2010 is also shown. In general, good agreement is obtained in all cases.
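A Gillespie-type stochastic simulation of a single clone can be sketched as follows. This is an illustrative reimplementation, not the code used for the figures; the name `simulate_clone`, the seed and the number of runs are assumptions. Setting r = 0 recovers the IA model. Averaged over all clones, extinct ones included, the mean total size follows from Equation 5: $1 + (λ / γ) (1 - e^{- γ t})$ .

```python
import math
import random


def simulate_clone(lam, gam, r, tau, rng):
    """Gillespie simulation of one clone of the PA model (r = 0 recovers IA).

    Starts from a single S-cell and returns (n_S, n_D) at time tau.
    """
    t, n_s, n_d = 0.0, 1, 0
    while True:
        total = lam * n_s + gam * n_d
        if total == 0.0:
            return n_s, n_d  # clone extinct (absorbing state)
        t += rng.expovariate(total)
        if t > tau:
            return n_s, n_d
        if rng.random() * total < lam * n_s:
            u = rng.random()  # division outcome of an S-cell
            if u < r:
                n_s += 1                      # S -> S + S, probability r
            elif u < 1.0 - r:
                n_d += 1                      # S -> S + D, probability 1 - 2r
            else:
                n_s, n_d = n_s - 1, n_d + 2   # S -> D + D, probability r
        else:
            n_d -= 1                          # D -> 0 (shedding)


rng = random.Random(2024)
lam, gam, r, tau = 1.0, 1.0, 0.25, 10.0      # parameters of test case PA#1
clones = [simulate_clone(lam, gam, r, tau, rng) for _ in range(20000)]
sizes = [n_s + n_d for n_s, n_d in clones]
mean_all = sum(sizes) / len(sizes)
p_extinct = sum(n == 0 for n in sizes) / len(sizes)
print(mean_all, 1.0 + lam / gam * (1.0 - math.exp(-gam * tau)))
print(p_extinct)
```

Histogramming `sizes` gives the clone size distribution at τ; restricting to `n > 0` gives the surviving clones.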

Appendix 1—figure 1

Clone size distribution $P (n)$ for the Invariant Asymmetry (IA) test cases, that is, the distribution of the total number of cells n forming the progeny of a single initial cell in $ℛ$ .

For each case, the distribution is shown at τ (defined in Figure 2, main text), which is well representative of the steady state condition. Tested parameters for cases IA#1-3 are provided in Appendix 1—table 1; the numerical simulation results are compared to the expected Poisson distribution. The detailed discussion is reported in 'Invariant Asymmetry and Population Asymmetry models'.

Appendix 1—figure 2

Clone size distribution $P (n)$ for the Population Asymmetry (PA) test cases, that is, the distribution of the total number of cells n forming the progeny of a single initial stem cell.

For each case, the distribution is shown at the final time τ, at which total extinction of the process has not yet been reached. Tested parameters for cases PA#1-3 are provided in Appendix 1—table 1; the numerical simulation results are compared to the numerical integration of the master Equation 8 and, for test case PA#1, also to the reference analytic solution from Antal and Krapivsky, 2010. The detailed discussion is reported in 'Invariant Asymmetry and Population Asymmetry models'.

Population Asymmetry model using metastates

As argued before, we assume in the random model generation that cell division in state $X_{i}$ has a unique outcome, $X_{i} \to X_{j} + X_{k}$ (Equation 1), since the stochastic process can then be uniquely defined by the two matrices D and $T$ . To accommodate the possibility of different division outcomes from the same state $X_{i}$ , as in Equation 4 and Equations 3-5 in the main text, we introduce metastates: short-lived states that indicate priming for one particular outcome, from which the cell division outcome is unique. This is a small modification of the original model, which, however, does not lead to significant deviations if the metastates are traversed sufficiently quickly (which can be assured by choosing high direct state transition rates in the metastates).

To illustrate this, let us consider the PA model described by Equation 4; instead of having three different outcomes upon division of an S-cell, we define the corresponding Metastate (MS) model with three primed states, $M_{1, 2, 3}$ , as,

\begin{array}{ll} S \overset{ω_{1}}{⟶} M_{1}, M_{1} \overset{λ_{1}}{⟶} S + S, \\ S \overset{ω_{2}}{⟶} M_{2}, M_{2} \overset{λ_{2}}{⟶} S + D, \\ S \overset{ω_{3}}{⟶} M_{3}, M_{3} \overset{λ_{3}}{⟶} D + D, \\ D \overset{γ}{⟶} \emptyset, \end{array}

in which S and D correspond to the same cell types as in the PA model (i.e. the stem and the differentiated cells, respectively), while $M_{i}$ , for $i = 1, 2, 3$ , represent the metastates. These temporary states model each of the three possible division options of the S-cells. The rates $λ_{i}$ and $ω_{i}$ , for $i = 1, 2, 3$ , are chosen such that the division time scales and outcome probabilities are the same as in the original PA model:

ω_{1} / ω_{2} = r / (1 - 2 r), ω_{2} / ω_{3} = (1 - 2 r) / r,

\frac{1}{(1 / ω_{1} + 1 / λ_{1})} = λ r, \frac{1}{(1 / ω_{2} + 1 / λ_{2})} = λ (1 - 2 r), \frac{1}{(1 / ω_{3} + 1 / λ_{3})} = λ r .

Equations 10 ensure that the outcome probabilities are the same as in the original model, while Equations 11 are needed to obtain the same total average time between two consecutive events. As there are six unknowns and only five relations, the following additional equation is added:

λ_{1} = ω_{1} Δ,

in which $Δ$ is an additional parameter that is used to control how fast cells in metastate $M_{1}$ divide. Low values of $Δ$ imply that as soon as an S-cell transits to the metastate $M_{1}$ , it divides into two S-cells. Altogether, this results in

\begin{aligned} ω_{1} & = ω_{3} = λ r (Δ + 1) / Δ, \\ ω_{2} & = λ (1 - 2 r) (Δ + 1) / Δ, \\ λ_{i} & = ω_{i} Δ \quad \text{for } i = 1, 2, 3. \end{aligned}
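The metastate rates of Equation 13 are straightforward to compute. The sketch below (hypothetical helper `metastate_rates`) evaluates them for the PA#3 parameters with Δ = 1/500 and checks that Equations 10 and 11 are satisfied; it is an illustration, not the simulation code used here.

```python
import math


def metastate_rates(lam, r, delta):
    """Rates (omega_i, lambda_i) of the MS model according to Equation 13.

    Index 0, 1, 2 corresponds to the outcomes S+S, S+D, D+D (i = 1, 2, 3).
    """
    probs = (r, 1.0 - 2.0 * r, r)
    omega = [lam * p * (delta + 1.0) / delta for p in probs]
    lam_ms = [w * delta for w in omega]  # Equation 12, applied to every i
    return omega, lam_ms


lam, r, delta = 2.0, 1.0 / 6.0, 1.0 / 500.0  # test case PA#3 with the Delta of MS#3
omega, lam_ms = metastate_rates(lam, r, delta)
# Equation 10: relative frequencies of the three outcomes
print(math.isclose(omega[0] / omega[1], r / (1.0 - 2.0 * r)))  # True
# Equation 11: effective rate of each channel, 1 / (1/omega_i + 1/lambda_i)
print([1.0 / (1.0 / w + 1.0 / l) for w, l in zip(omega, lam_ms)])  # [lam*r, lam*(1-2r), lam*r]
```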

Numerical simulations of the two models were run and compared, based on the parameters reported in Appendix 1—table 1, specifically the PA#1 and PA#3 test cases. The time unit, which is arbitrary, is omitted. The process rates for the corresponding MS models, indicated in the figures as MS#1 and MS#3, are computed from Equation 13 with $Δ = 1 / 500$ . As for the PA test cases, the initial condition is one cell of type S and the final time, τ, is equal to 10; simulations are based on $5 \cdot 10^{4}$ trajectories.

Appendix 1—table 1

IA and PA test cases simulation parameters (see 'Invariant Asymmetry and Population Asymmetry models').

Case	λ	γ	r
IA#1	1.0	1.0	-
IA#2	2.0	1.0	-
IA#3	5.0	1.0	-
PA#1	1.0	1.0	1/4
PA#2	2.0	1.0	1/4
PA#3	2.0	1.0	1/6

The mean number of cells in the surviving clones and the extinction probability as a function of time (scaled by τ) are shown in Appendix 1—figure 3, and the clone size distribution at τ is shown in Appendix 1—figure 4. Both MS simulations agree very well with the corresponding PA ones, which justifies the use of metastates in our simulation campaign.

Appendix 1—figure 3

Metastate (MS) test cases: simulation results in terms of the mean number of cells in the surviving clones ${\bar{n}}_{s}$ and the extinction probability $P (n = 0)$ as a function of time (scaled by the final simulation time τ).

As for the PA test cases, total extinction of the process has not yet been reached at τ. Profiles from the numerical simulation for cases MS#1,3 are compared to the corresponding PA#1,3 test cases, which are based on the parameters provided in Appendix 1—table 1. The detailed discussion is reported in 'Population Asymmetry model using metastates'.

Appendix 1—figure 4

Metastate (MS) test cases: simulation results in terms of the clone size distribution $P (n)$ , that is, the distribution of the total number of cells n forming the progeny of a single initial stem cell.

As for the PA test cases, the distribution is shown at the final time, τ, at which total extinction of the process has not yet been reached. Profiles from the numerical simulation for cases MS#1,3 are compared to the corresponding PA#1,3 test cases, which are based on the parameters provided in Appendix 1—table 1. The detailed discussion is reported in 'Population Asymmetry model using metastates'.

Analysis of the Generalized Invariant Asymmetry model

GIA⁰ test case: Steady state distribution and limiting behavior

A simple Generalized Invariant Asymmetric model, indicated hereafter as GIA⁰, was analyzed to identify the causes of the different clone size distribution behaviors observed in the randomly generated models (see main text). Thus, in this section, we study the Markov process defined by,

X_{1} \overset{λ_{1}}{⟶} X_{1} + X_{2}, X_{2} \overset{λ_{2}}{⟶} X_{2} + X_{2}, X_{2} \overset{γ}{⟶} \emptyset .

Here, the renewing compartment is composed of just a single state, $X_{1}$ , and cells in this state divide asymmetrically with rate $λ_{1}$ . The committed compartment consists of state $X_{2}$ ; cells in this state can either divide to duplicate, with rate $λ_{2}$ , or die, with rate γ. It is noted that for $λ_{2} = 0$ , this model reduces to the previously analyzed Invariant Asymmetric (IA) model (see 'Invariant Asymmetry and Population Asymmetry models').

As for the IA model, here the number of cells in state $X_{1}$ , indicated as $n_{1}$ , is conserved. It is therefore sufficient to determine the statistics of $n_{2}$ , defined by the master equation for $P (n_{2})$ , the probability of having $n_{2}$ cells in state $X_{2}$ , provided that there are $n_{1}$ cells in state $X_{1}$ . The master equation is given by,

\begin{aligned} \frac{d P (n_{2})}{d t} = {} & - (λ_{1} n_{1} + λ_{2} n_{2} + γ n_{2}) P (n_{2}) \\ & + (λ_{1} n_{1} + λ_{2} (n_{2} - 1)) P (n_{2} - 1) \\ & + γ (n_{2} + 1) P (n_{2} + 1), \end{aligned}

also written as,

\begin{aligned} \frac{d P (n_{2})}{d t} = {} & - (g (n_{2}) + r (n_{2})) P (n_{2}) \\ & + g (n_{2} - 1) P (n_{2} - 1) + r (n_{2} + 1) P (n_{2} + 1), \end{aligned}

in which $r (n_{2}) = γ n_{2}$ and $g (n_{2}) = λ_{1} n_{1} + λ_{2} n_{2}$ . Considering that we are interested in clonal dynamics, meaning that we start from a single stem cell, $n_{1}$ is equal to one.

In this simple case, the steady state distribution $P^{*} (n_{2})$ , corresponding to the solution of $d P (n_{2}) / d t = 0$ , can be analytically derived. Defining the net flux between states $n_{2}$ and $n_{2} - 1$ as

I_{n_{2}} = r (n_{2}) P^{*} (n_{2}) - g (n_{2} - 1) P^{*} (n_{2} - 1),

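and noting that the master equation above can be written as $d P (n_{2}) / d t = I_{n_{2} + 1} - I_{n_{2}}$ , so that at steady state $I_{n_{2} + 1} = I_{n_{2}}$ for every $n_{2}$ , it follows that $I_{n_{2}} = I_{0} = r (0) P^{*} (0) - g (- 1) P^{*} (- 1) = 0$ (since $r (0) = 0$ and $P^{*} (- 1) = 0$ ), which means that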

P^{*} (n_{2}) = \frac{g (n_{2} - 1)}{r (n_{2})} P^{*} (n_{2} - 1) = \prod_{l = 0}^{n_{2} - 1} \frac{g (l)}{r (l + 1)} P^{*} (0),

where $P^{*} (0)$ is the steady state probability of having 0 cells in state $X_{2}$ . Finally, by applying the conservation of the total probability, $\sum_{n_{2} = 0}^{\infty} P^{*} (n_{2}) = 1$ , and rearranging the terms we obtain,

P^{*} (n_{2}) = {(1 - \frac{λ_{2}}{γ})}^{λ_{1} / λ_{2}} {(\frac{λ_{2}}{γ})}^{n_{2}} \frac{Γ (\frac{λ_{1}}{λ_{2}} + n_{2})}{Γ (n_{2} + 1) Γ (\frac{λ_{1}}{λ_{2}})} .

In the main text, we defined the dimensionless parameters ${\hat{λ}}_{1} = λ_{1} / γ$ and ${\hat{λ}}_{2} = λ_{2} / γ$ , representing the rescaled division rates for cells in state $X_{1}$ and $X_{2}$ , respectively. For clarity and readability, in this section, we simplify the notation using $p = {\hat{λ}}_{1}$ and $q = {\hat{λ}}_{2}$ . Equation 19 is then rewritten as,

P^{*} (n_{2}) = {(1 - q)}^{p / q} q^{n_{2}} \frac{Γ (\frac{p}{q} + n_{2})}{Γ (n_{2} + 1) Γ (\frac{p}{q})} .

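It is noted that while p can take any positive value, q must lie between 0 and 1 for a steady state to exist: for $q \geq 1$ (i.e. $λ_{2} \geq γ$ ) the mean number of cells in state $X_{2}$ grows without bound.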

The mean number of cells in each state, indicated respectively as ${\bar{n}}_{1}$ and ${\bar{n}}_{2}$ , satisfies the system of ODEs

{\begin{cases} \frac{d {\bar{n}}_{1}}{d t} = 0 \\ \frac{d {\bar{n}}_{2}}{d t} = λ_{1} {\bar{n}}_{1} + (λ_{2} - γ) {\bar{n}}_{2} \end{cases} .

Based on this, the steady state average number of cells is

{\begin{matrix} {\bar{n}}_{1}^{*} = 1 \\ {\bar{n}}_{2}^{*} = \frac{λ_{1}}{γ - λ_{2}} = \frac{p}{1 - q} \end{matrix} .
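The steady state of the GIA⁰ model can be evaluated either with the product formula (Equation 18) or with the closed form (Equation 20). The sketch below, with rates scaled by γ and $n_{1} = 1$ , compares both and checks normalization and the mean of Equation 22; the function names and the example values p = 2, q = 0.5 are illustrative.

```python
import math


def gia0_pmf_product(n2, p, q):
    """P*(n2) from the product formula (Equation 18), rates scaled by gamma.

    g(l) = p + q*l (with n1 = 1), death term l + 1, and P*(0) = (1 - q)**(p / q).
    """
    prob = (1.0 - q) ** (p / q)
    for l in range(n2):
        prob *= (p + q * l) / (l + 1)
    return prob


def gia0_pmf(n2, p, q):
    """Closed form of Equation 20, evaluated in log space for numerical stability."""
    log_p = ((p / q) * math.log1p(-q) + n2 * math.log(q)
             + math.lgamma(p / q + n2) - math.lgamma(n2 + 1) - math.lgamma(p / q))
    return math.exp(log_p)


p, q = 2.0, 0.5
pmf = [gia0_pmf(n, p, q) for n in range(400)]
print(sum(pmf))                                             # ~1 (normalization)
print(sum(n * w for n, w in enumerate(pmf)), p / (1.0 - q))  # mean vs Equation 22
```

Equation 20 is a negative binomial distribution, which is why its variance, $p / (1 - q)^{2}$ , exceeds the mean whenever q > 0.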

When the mean number of cells in state $X_{2}$ is sufficiently large, that is, for large p or for q close to one, the discrete distribution given by Equation 20 can be approximated by a continuous probability density function $P^{*} (x_{2})$ , given by,

P^{*} (x_{2}) = {(1 - q)}^{p / q} q^{p x_{2} / (1 - q)} \frac{Γ (\frac{p}{q} + \frac{p}{1 - q} x_{2})}{x_{2} Γ (\frac{p}{q}) Γ (\frac{p}{1 - q} x_{2})},

in which $x_{2} = n_{2} / {\bar{n}}_{2}^{*}$ . We note that Equation 23 corresponds to Equation 11 in the main text.

To better understand the distribution for different values of the parameters p and q, its limiting behaviors are analyzed below.

1. $q \to 0$ (i.e. ${\hat{λ}}_{2} \to 0$ )

When $q \to 0$ , Equation 20 can be simplified considering that

\lim_{q \to 0} \frac{Γ (\frac{p}{q} + n_{2})}{Γ (\frac{p}{q})} {(\frac{q}{p})}^{n_{2}} = 1,

\lim_{q \to 0} {(1 - q)}^{p / q} = e^{- p}

and

Γ (n_{2} + 1) = n_{2}! .

Thus, the distribution results in

\lim_{q \to 0} P^{*} (n_{2}) = \frac{p^{n_{2}} e^{- p}}{n_{2}!} = \mathrm{Poisson} (p),

that is, a Poisson distribution with mean equal to p. This is as expected, since for $q = 0$ the model reduces to the IA model, for which the distribution of $n_{2}$ is known to be Poissonian.
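The $q \to 0$ limit can also be checked numerically by evaluating Equation 20 for decreasing q and comparing it with the Poisson distribution of Equation 27. The function names and the choice p = 5 are illustrative.

```python
import math


def gia0_pmf(n2, p, q):
    """Equation 20 (steady-state distribution of the GIA0 model), in log space."""
    return math.exp((p / q) * math.log1p(-q) + n2 * math.log(q)
                    + math.lgamma(p / q + n2) - math.lgamma(n2 + 1) - math.lgamma(p / q))


def poisson_pmf(n, mean):
    """Equation 27."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def max_deviation(p, q, n_max=100):
    """Largest pointwise difference between Equation 20 and Poisson(p)."""
    return max(abs(gia0_pmf(n, p, q) - poisson_pmf(n, p)) for n in range(n_max + 1))


for q in (0.3, 0.1, 0.01, 1e-4):
    print(q, max_deviation(5.0, q))  # shrinks as q -> 0
```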

Additionally, for a large mean number of cells, obtained for large p (when $q = 0$ , ${\bar{n}}_{2}^{*} = p$ ), the Poisson distribution tends to a Normal distribution with mean and variance equal to p. Therefore,

\lim_{(q, p) \to (0, \infty)} P^{*} (n_{2}) = \frac{1}{\sqrt{2 π p}} e^{- \frac{(n_{2} - p)^{2}}{2 p}} = \mathrm{Normal} (p, p) .

Rescaling the distribution, and considering $x_{2} = n_{2} / {\bar{n}}_{2}^{*}$ , results in

\lim_{(q, p) \to (0, \infty)} P^{*} (x_{2}) = \mathrm{Normal} (1, 1 / p),

that is, a Normal distribution with unitary mean and variance equal to $1 / p$ .

2. $q \to 1$ (i.e. ${\hat{λ}}_{2} \to 1$ )

For $q \to 1$ , the steady state mean number of cells ${\bar{n}}_{2}^{*} \to \infty$ , so the continuous approximation, Equation 23, holds. This equation can be rewritten as,

P^{*} (x_{2}) = q^{p / (1 - q) x_{2} + 1} \frac{(1 - q)^{p / q}}{q (x_{2} - 1) + 1} \frac{Γ (p \frac{q (x_{2} - 1) + 1}{q (1 - q)} + 1)}{Γ (\frac{p}{q}) Γ (\frac{p}{1 - q} x_{2} + 1)} .

Applying Stirling's approximation,

Γ (z + 1) \simeq \sqrt{2 π z} {(\frac{z}{e})}^{z},

we obtain,

P^{*} (x_{2}) = \frac{p^{p / q} e^{- p / q} q^{(q - 2 p) / (2 q)} (q (x_{2} - 1) + 1)^{p / (1 - q) (x_{2} - 1 + 1 / q) - 1 / 2}}{Γ (\frac{p}{q}) x_{2}^{x_{2} p / (1 - q) + 1 / 2}} .

Considering now that

\lim_{q \to 1} \frac{{(q (x_{2} - 1) + 1)}^{p / (1 - q) (x_{2} - 1 + 1 / q) - 1 / 2}}{x_{2}^{x_{2} p / (1 - q) + 1 / 2}} = e^{p (1 - x_{2})} x_{2}^{p - 1},

it follows that

\lim_{q \to 1} P^{*} (x_{2}) = \frac{p^{p}}{Γ (p)} x_{2}^{p - 1} e^{- p x_{2}} = \mathrm{Gamma} (p, 1 / p),

that is a Gamma distribution with unitary mean and shape parameter given by p. Importantly, the Gamma distribution for $p \to \infty$ tends to a Normal distribution with unitary mean and variance 1/p. For $p = 1$ , it corresponds instead to an Exponential distribution with unitary mean.
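Similarly, the $q \to 1$ limit can be checked by evaluating the continuous form, Equation 23, in log space and comparing it with the Gamma density of Equation 34. The value p = 2 and the grid of $x_{2}$ values are arbitrary illustrative choices.

```python
import math


def gia0_density(x2, p, q):
    """Equation 23: continuous approximation of P*(x2), with x2 = n2 / nbar2*."""
    n2 = p * x2 / (1.0 - q)
    return math.exp((p / q) * math.log1p(-q) + n2 * math.log(q)
                    + math.lgamma(p / q + n2) - math.log(x2)
                    - math.lgamma(p / q) - math.lgamma(n2))


def gamma_density(x, p):
    """Equation 34: Gamma distribution with shape p and scale 1/p (unit mean)."""
    return math.exp(p * math.log(p) - math.lgamma(p) + (p - 1.0) * math.log(x) - p * x)


p = 2.0
xs = [0.1 * k for k in range(1, 31)]  # x2 in [0.1, 3.0]
for q in (0.9, 0.99, 0.999):
    err = max(abs(gia0_density(x, p, q) - gamma_density(x, p)) for x in xs)
    print(q, err)  # decreases as q -> 1
```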

3. $p \to \infty$ (i.e. ${\hat{λ}}_{1} \to \infty$ )

When p is large, the mean number of cells is large for any value of q, so Equation 32 applies. Applying Stirling's approximation also to the term $Γ (p / q)$ , we obtain,

P^{*} (x_{2}) = \sqrt{\frac{p}{2 π}} x_{2}^{- p / (1 - q) x_{2} - 1 / 2} {(q (x_{2} - 1) + 1)}^{p / (1 - q) (x_{2} - 1 + 1 / q) - 1 / 2} .

This expression can be also rewritten as,

P^{*} (x_{2}) = \sqrt{\frac{p}{2 π}} e^{p / (1 - q) ((x_{2} - 1 + 1 / q) \log (q (x_{2} - 1) + 1) - x_{2} \log (x_{2})) - 1 / 2 (\log (x_{2}) + \log (q (x_{2} - 1) + 1))} .

Since p is large, $- \frac{1}{2} (\log (x_{2}) + \log (q (x_{2} - 1) + 1)) \ll \frac{p}{1 - q} ((x_{2} - 1 + 1 / q) \log (q (x_{2} - 1) + 1) - x_{2} \log (x_{2}))$ , so the last term in the exponent of Equation 36 can be neglected. Additionally, for $x_{2} \to 1$ the following expansions can be applied:

\log (q (x_{2} - 1) + 1) = \sum_{k = 1}^{\infty} ({(- 1)}^{k + 1} \frac{{(q (x_{2} - 1))}^{k}}{k}),

and

\log (x_{2}) = \sum_{k = 1}^{\infty} ({(- 1)}^{k + 1} \frac{{(x_{2} - 1)}^{k}}{k}) .

Finally, if we consider that

\lim_{x_{2} \to 1} \frac{(x_{2} - 1 + \frac{1}{q}) \sum_{k = 1}^{\infty} ({(- 1)}^{k + 1} \frac{{(q (x_{2} - 1))}^{k}}{k}) - x_{2} \sum_{k = 1}^{\infty} ({(- 1)}^{k + 1} \frac{{(x_{2} - 1)}^{k}}{k})}{{(x_{2} - 1)}^{2}} = - \frac{1 - q}{2},

then Equation 36 results in

\lim_{p \to \infty} P^{*} (x_{2}) \simeq \sqrt{\frac{p}{2 π}} e^{- \frac{p}{2} (x_{2} - 1)^{2}} = \mathrm{Normal} (1, 1 / p),

that is, a Normal distribution with unitary mean and variance equal to $1 / p$ .
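The large-p limit can be checked in the same way by comparing Equation 23 with the Normal density of Equation 40 for several values of q. The sketch reports the largest gap relative to the Normal peak on an arbitrary grid $x_{2} \in [0.5, 1.5]$ ; this error measure is an illustrative choice.

```python
import math


def gia0_density(x2, p, q):
    """Equation 23 (continuous approximation of the GIA0 steady state)."""
    n2 = p * x2 / (1.0 - q)
    return math.exp((p / q) * math.log1p(-q) + n2 * math.log(q)
                    + math.lgamma(p / q + n2) - math.log(x2)
                    - math.lgamma(p / q) - math.lgamma(n2))


def normal_density(x, p):
    """Equation 40: Normal distribution with unit mean and variance 1/p."""
    return math.sqrt(p / (2.0 * math.pi)) * math.exp(-0.5 * p * (x - 1.0) ** 2)


def relative_gap(p, q):
    """Largest |Eq. 23 - Eq. 40| on the grid, divided by the Normal peak."""
    xs = [0.5 + 0.01 * k for k in range(101)]  # x2 in [0.5, 1.5]
    gap = max(abs(gia0_density(x, p, q) - normal_density(x, p)) for x in xs)
    return gap / normal_density(1.0, p)


for q in (0.1, 0.5, 0.9):
    print(q, [round(relative_gap(p, q), 4) for p in (50.0, 200.0, 800.0)])  # shrinks with p
```

The residual gap is dominated by the skewness of Equation 20, $(1 + q) / \sqrt{p}$ , which vanishes as p grows for every q.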

Importantly, the limiting behaviors of $P^{*} (x_{2})$ for $q \to 0$ and $q \to 1$ in the case of large p are both consistent with the result obtained for $p \to \infty$ and any q. In other words, recalling that $p = {\hat{λ}}_{1}$ and $q = {\hat{λ}}_{2}$ , the steady state distribution for ${\hat{λ}}_{1} \to \infty$ and any value of ${\hat{λ}}_{2}$ is a Normal distribution with unitary mean and variance equal to $1 / {\hat{λ}}_{1}$ .

To verify these results over the whole parameter plane, numerical simulations of the stochastic process associated with model Equation 14 were run for different values of ${\hat{λ}}_{1}$ and ${\hat{λ}}_{2}$ . The following curves were compared:

Stochastic simulation: distribution at the final simulation time, τ, of the number of cells in state $X_{2}$ . The final time was chosen here as $τ = 20 / α_{\min}$ , where $α_{\min} = \min (λ_{1}, λ_{2}, γ)$ ; this value is well representative of a steady state condition. Furthermore, the process rates considered are based on a unitary γ (i.e. $λ_{1} = {\hat{λ}}_{1}$ , $λ_{2} = {\hat{λ}}_{2}$ and $γ = 1$ ). It is noted that the time unit is arbitrary and therefore omitted.
Analytic distribution: based on Equations 20, for low mean values, and 23, for large mean values.
Approximate distributions: Poisson, Gamma and Normal distributions respectively given by Equations 27, 34 and 40.
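A minimal Gillespie sketch of the simulation protocol described above, assuming one cell in $X_{1}$ , γ = 1 and τ = 20/α_min, is given below. The seed, the number of runs and the parameter pair (p, q) = (2, 0.5) are illustrative; the sample mean and the fraction of clones with $n_{2} = 0$ can be compared with Equation 22 and with Equation 20 at $n_{2} = 0$ .

```python
import random


def simulate_gia0(lam1, lam2, gam, tau, rng):
    """Gillespie simulation of model 14 with one X1 cell; returns n2 at time tau."""
    t, n2 = 0.0, 0
    while True:
        grow = lam1 + lam2 * n2        # X1 -> X1 + X2 and X2 -> X2 + X2
        total = grow + gam * n2        # plus X2 -> 0
        t += rng.expovariate(total)    # total >= lam1 > 0, so the loop never stalls
        if t > tau:
            return n2
        n2 += 1 if rng.random() * total < grow else -1


p_hat, q_hat = 2.0, 0.5                   # hat-lambda_1, hat-lambda_2
lam1, lam2, gam = p_hat, q_hat, 1.0       # unitary gamma
tau = 20.0 / min(lam1, lam2, gam)         # tau = 20 / alpha_min
rng = random.Random(11)
samples = [simulate_gia0(lam1, lam2, gam, tau, rng) for _ in range(2000)]
mean_sim = sum(samples) / len(samples)
p0_sim = samples.count(0) / len(samples)
print(mean_sim, p_hat / (1.0 - q_hat))            # vs Equation 22
print(p0_sim, (1.0 - q_hat) ** (p_hat / q_hat))   # vs Equation 20 at n2 = 0
```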

The tested parameters ${\hat{λ}}_{1}$ and ${\hat{λ}}_{2}$ are shown in Appendix 1—figure 5 on a contour map of the expected steady state mean number of cells ${\bar{n}}_{2}^{*}$ over the $({\hat{λ}}_{1}, {\hat{λ}}_{2})$ -parameter plane. The curves from the numerical simulations and the corresponding exact and approximate solutions are shown in Appendix 1—figure 6, Appendix 1—figure 7 and Appendix 1—figure 8; the tested conditions are divided into three groups (one figure each) representing the limiting behaviors discussed above. Generally, analytical and numerical results agree very well. This also demonstrates that GIA models can show both peaked and non-peaked distributions, depending on the model parameters.

Appendix 1—figure 5

GIA⁰ test case parameters ${\hat{λ}}_{1}$ and ${\hat{λ}}_{2}$ over the contour map of the expected steady state mean number of cells in state $X_{2}$ , ${\bar{n}}_{2}^{*}$ .

The tested conditions are divided into three groups representing the limiting behaviors discussed in 'GIA⁰ test case: steady state distribution and limiting behavior', for which the steady state distribution is shown in Appendix 1—figure 6, Appendix 1—figure 7 and Appendix 1—figure 8, respectively.

Appendix 1—figure 6

GIA⁰ test case (see 'GIA⁰ test case: steady state distribution and limiting behavior') results in terms of the steady state distribution $P^{*} (n_{2})$ of the number of cells in state $X_{2}$ , $n_{2}$ .

The tested parameters correspond to the condition ${\hat{λ}}_{2} = 0.01$ , as representative of the limiting case ${\hat{λ}}_{2} \to 0$ , and to different values of ${\hat{λ}}_{1}$ . The results from the numerical simulations are compared to the analytic solution (Equation 20), and its approximation, that is, the Poisson distribution (Equation 27).

Appendix 1—figure 7

GIA⁰ test case (see 'GIA⁰ test case: steady state distribution and limiting behavior') results in terms of the steady state rescaled distribution $P^{*} (x_{2})$ of the number of cells in state $X_{2}$ , where $x_{2} = n_{2} / {\bar{n}}_{2}^{*}$ .

The tested parameters correspond to the condition ${\hat{λ}}_{2} = 0.99$ , as representative of the limiting case ${\hat{λ}}_{2} \to 1$ , and to different values of ${\hat{λ}}_{1}$ . The results from the numerical simulations are compared to the analytic solution (Equation 23) and its approximation, that is, the Gamma distribution (Equation 34).

Appendix 1—figure 8

Approximation of generic GIA models

As shown in the main text, a generic GIA model can be expressed in terms of the compartments $ℛ$ and $𝒞$ (Equation 9 in the main text). The GIA⁰ model discussed in the previous section corresponds to the general compartment dynamics of GIA models (Equation 9, main text) if the dynamics of the compartments are assumed to be Markovian. Thus, we can treat the GIA⁰ model as a Markovian approximation of generic GIA models. In this section, we test this approximation numerically.

To this end, we first relate the effective (non-Markovian) rates $λ_{R, C}$ and $γ_{C}$ of a generic GIA model to the rates of the Markovian approximation, the GIA⁰ model. We refer to this model (the GIA⁰ model matched to the effective rates of a particular, more complex GIA model) as the equivalent model of the latter. The equivalent rates $λ_{R}$ , $λ_{C}$ and $γ_{C}$ are computed by requiring the same steady state mean number of cells. To this aim, we rewrite the dynamics of mean cell numbers, Equation 7 in the main text, in block form as,

{\begin{cases} \frac{d {\bar{𝐧}}_{R}}{d t} = A_{R R} {\bar{𝐧}}_{R} \\ \frac{d {\bar{𝐧}}_{C}}{d t} = A_{C R} {\bar{𝐧}}_{R} + A_{C C} {\bar{𝐧}}_{C} \\ \frac{d {\bar{n}}_{\emptyset}}{d t} = A_{\emptyset C} {\bar{𝐧}}_{C} \end{cases},

in which ${\bar{𝐧}}_{R}$ and ${\bar{𝐧}}_{C}$ denote the vectors of mean cell numbers of the states in compartments $ℛ$ and $𝒞$ , respectively, and ${\bar{n}}_{\emptyset}$ the mean number of lost cells (not considered for total cell numbers and the homeostasis condition). Note that $A_{R C} = 0$ , since there cannot be links from $𝒞$ to $ℛ$ . Also $A_{\emptyset R} = 0$ , as we do not consider loss from $ℛ$ (see main text for the arguments).

Thus, summing up all the components in each compartment, ${\bar{n}}_{R} = \sum_{i} {({\bar{𝐧}}_{R})}_{i} = 1$ and ${\bar{n}}_{C} = \sum_{i} {({\bar{𝐧}}_{C})}_{i}$ , results in

{\begin{cases} \frac{d {\bar{n}}_{R}}{d t} = 0 \\ \frac{d {\bar{n}}_{C}}{d t} = \sum_{i} (A_{C R} {\bar{𝐧}}_{R})_{i} + \sum_{i} (A_{C C} {\bar{𝐧}}_{C})_{i} \\ \frac{d {\bar{n}}_{\emptyset}}{d t} = \sum_{i} (A_{\emptyset C} {\bar{𝐧}}_{C})_{i} \end{cases} .

The equivalent parameters are then estimated from the steady state condition ${\bar{𝐧}}_{X}^{*}$ and ${\bar{n}}_{X}^{*}$ , for $X = R, C, \emptyset$ , as,

λ_{R} = \sum_{i} {(A_{C R} {\bar{𝐧}}_{R}^{*})}_{i}, γ_{C} = \frac{\sum_{i} {(A_{\emptyset C} {\bar{𝐧}}_{C}^{*})}_{i}}{{\bar{n}}_{C}^{*}} and λ_{C} = γ_{C} - \frac{λ_{R}}{{\bar{n}}_{C}^{*}} .
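The equivalent rates can be computed directly from the blocks of the mean-dynamics matrix. The sketch below assumes the convention implied by the block form above ( $d \bar{𝐧} / d t = A \bar{𝐧}$ , rows indexing target states and columns source states) and obtains ${\bar{𝐧}}_{R}^{*}$ as the null vector of $A_{R R}$ normalized to one cell. The function name and the example (two switching renewing states feeding one committed state, with arbitrary rates) are illustrative assumptions.

```python
import numpy as np


def equivalent_rates(A_RR, A_CR, A_CC, A_0C):
    """Equivalent GIA0 rates (lambda_R, lambda_C, gamma_C) from the mean-dynamics blocks."""
    k = A_RR.shape[0]
    # Steady state in R: A_RR nR = 0, with the components summing to one cell
    M = np.vstack([A_RR, np.ones((1, k))])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    nR = np.linalg.lstsq(M, rhs, rcond=None)[0]
    # Steady state in C: A_CR nR + A_CC nC = 0
    nC = np.linalg.solve(A_CC, -A_CR @ nR)
    lam_R = float(np.sum(A_CR @ nR))          # total influx into C
    nC_tot = float(np.sum(nC))
    gam_C = float(np.sum(A_0C @ nC)) / nC_tot  # per-cell loss rate from C
    lam_C = gam_C - lam_R / nC_tot            # from 0 = lam_R + (lam_C - gam_C) nC_tot
    return lam_R, lam_C, gam_C


# Hypothetical example: X1 <-> X2 switching in R, both dividing asymmetrically into X3
lam1, lam2, w12, w21, lam3, gam = 40.0, 20.0, 1.0, 1.0, 0.5, 1.0
A_RR = np.array([[-w12, w21], [w12, -w21]])
A_CR = np.array([[lam1, lam2]])
A_CC = np.array([[lam3 - gam]])
A_0C = np.array([[gam]])
print(equivalent_rates(A_RR, A_CR, A_CC, A_0C))  # approximately (30.0, 0.5, 1.0)
```

For a model that already has the GIA⁰ structure, the procedure simply returns its own rates, which is a useful sanity check.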

The applicability of this approximation was evaluated by comparing the clone size distribution obtained from the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text) with that of the corresponding equivalent GIA⁰ model with parameters ${\hat{λ}}_{1} = {\hat{λ}}_{R} = λ_{R} / γ_{C}$ and ${\hat{λ}}_{2} = {\hat{λ}}_{C} = λ_{C} / γ_{C}$ . The values of ${\hat{λ}}_{1}$ and ${\hat{λ}}_{2}$ for all the random GIA models are shown in Appendix 1—figure 9 on the contour map of the expected mean number of cells in $𝒞$ (compartment $ℛ$ always contains a single cell). In general, ${\hat{λ}}_{1}$ remains below five and ${\hat{λ}}_{2}$ is spread between zero and one. As a measure of the error of the equivalent model, $ϵ$ , we choose the maximum difference between the distribution of a particular random GIA model and that of the corresponding equivalent model, relative to the peak of the distribution of the random model. For low mean cell numbers, the distribution is compared to Equation 20; for large mean numbers, the rescaled distribution is compared to Equation 23 instead. A threshold of 10 on the mean cell number was chosen to distinguish between the two cases. The relative error $ϵ$ as a function of ${\hat{λ}}_{2}$ is presented in Appendix 1—figure 10, which shows that large errors occur only for large values of this parameter. Some illustrative cases, representative of different values of ${\hat{λ}}_{2}$ , were selected and their distributions are shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13. The following considerations are made:
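The error measure $ϵ$ and the failing ratio used below (fraction of models with $ϵ > 0.5$ ) can be written compactly as follows. The sketch assumes both distributions are already evaluated on a common grid (P(n) for mean cell numbers up to 10, the rescaled P(x) above that); the function names are hypothetical and the inputs in the usage lines are arbitrary numbers, not data from this study.

```python
def relative_error(p_random, p_equiv):
    """epsilon: largest pointwise gap between two distributions on a common grid,
    divided by the peak of the random model's distribution."""
    peak = max(p_random)
    return max(abs(a - b) for a, b in zip(p_random, p_equiv)) / peak


def failing_ratio(errors, threshold=0.5):
    """Fraction of models whose relative error exceeds the threshold."""
    return sum(e > threshold for e in errors) / len(errors)


print(relative_error([0.2, 0.5, 0.3], [0.25, 0.45, 0.3]))  # approximately 0.1
print(failing_ratio([0.1, 0.6, 0.2, 0.7]))                 # 0.5
```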

Two cases for ${\hat{λ}}_{2} < 0.2$ are presented in Appendix 1—figure 11. In these cases, the distribution obtained from the random models agrees with the analytic solution from the equivalent model, which in turn is well approximated by a Poisson distribution. As expected, larger deviations between the equivalent model’s analytic solution and the approximation are noted for increasing values of ${\hat{λ}}_{2}$ . In general, all the models in this range are well approximated by the equivalent model.
The two cases presented in Appendix 1—figure 12 have ${\hat{λ}}_{2} > 0.8$ , for which the Gamma distribution approximates the equivalent model’s analytic solution. The distribution in some cases (see, for instance, the top figure) presents some deviations with respect to the equivalent model. However, globally, good agreement is obtained in most of the cases (the failing ratio, based on a maximum error of 0.5, is 21.7%).
Two cases in an intermediate range $0.2 < {\hat{λ}}_{2} < 0.8$ are shown in Appendix 1—figure 13. Again, the equivalent model’s analytic solution is well representative of the distribution (the failing ratio, based on a maximum error of 0.5, is 3.2%). It is noted that for such values of ${\hat{λ}}_{2}$ no approximation of the equivalent model’s analytic solution is available.

Appendix 1—figure 9

GIA random models (generated as described in 'Generation of random models' and analyzed in the main text): equivalent parameters ${\hat{λ}}_{1} = {\hat{λ}}_{R}$ and ${\hat{λ}}_{2} = {\hat{λ}}_{C}$ (see 'Approximation of generic GIA models') over the contour map of the expected steady state mean number of cells in the committed compartment, ${\bar{n}}_{C}^{*}$ .

Some illustrative cases, for which the steady state distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13, are highlighted.

Appendix 1—figure 10

Relative error of the equivalent model approximation, $ϵ$ (see definition in 'Approximation of generic GIA models'), as a function of ${\hat{λ}}_{2} = {\hat{λ}}_{C}$ for the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text).

The selected cases correspond to some illustrative cases for which the steady state distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13.

Appendix 1—figure 11

GIA random models selected cases (see Appendix 1—figure 9 and Appendix 1—figure 10) where ${\hat{λ}}_{2} < 0.2$ : the steady state distribution $P^{*} (n_{C})$ of the number of cells in the committed compartment, $n_{C}$ , is compared to that of the equivalent model (Equation model in the legend) analytic solution and its approximation for low ${\hat{λ}}_{2}$ (i.e. the Poisson distribution, Poisson $({\hat{λ}}_{1})$ ).

Results discussion is reported in 'Approximation of generic GIA models'.

Appendix 1—figure 12

GIA random models selected cases (see Appendix 1—figure 9 and Appendix 1—figure 10) where ${\hat{λ}}_{2} > 0.8$ : the steady state rescaled distribution $P^{*} (x_{C})$ of the number of cells in the committed compartment, where $x_{C} = n_{C} / {\bar{n}}_{C}^{*}$ , is compared to that of the equivalent model (Equation model in the legend) analytic solution and its approximation for high ${\hat{λ}}_{2}$ (i.e. the Gamma distribution, Gamma $({\hat{λ}}_{1}, 1 / {\hat{λ}}_{1})$ ).

Results discussion is reported in 'Approximation of generic GIA models'.

Appendix 1—figure 13

GIA random models selected cases (see Appendix 1—figure 9 and Appendix 1—figure 10) where $0.2 < {\hat{λ}}_{2} < 0.8$ : the steady state distribution $P^{*} (n_{C})$ (or the rescaled distribution $P^{*} (x_{C})$ ) of the number of cells in the committed compartment, $n_{C}$ (or in the rescaled case $x_{C} = n_{C} / {\bar{n}}_{C}^{*}$ ), is compared to that of the equivalent model (Equation model in the legend) analytic solution.

Results discussion is reported in 'Approximation of generic GIA models'.

Thus, in most of the tested cases the equivalent model captures the behavior of a generic random GIA model and represents a good approximation (global failing ratio, based on a maximum error of 0.5, is 6%). In the cases where the equivalent model does not yield a good approximation, the internal structure of the $ℛ$ and $𝒞$ compartments becomes relevant, and subsequent events that affect $n_{R}$ and $n_{C}$ become dependent on each other and are thus non-Markovian.

GIA model for large ${\hat{λ}}_{R}$

To test the behavior of a generic GIA model in the case of large ${\hat{λ}}_{R}$ , the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text) were modified by changing the process rates associated with the renewing compartment to achieve ${\hat{λ}}_{R} = 30$ . Since infinitely many solutions are possible, we applied a global search method, more specifically a Genetic Algorithm (Goldberg, 1989). We therefore set up an optimization problem, where the process parameters are the optimization variables and the cost function is the error of the current ${\hat{λ}}_{R}$ with respect to the target.

The envelope of curves obtained for all the random models and some illustrative profiles are shown in Appendix 1—figure 14. A reference Normal distribution, characterized by unitary mean and variance equal to $1 / {\hat{λ}}_{R} = 1 / 30$ , is also reported: this curve corresponds to the distribution expected for the equivalent model, for which ${\hat{λ}}_{1} = {\hat{λ}}_{R}$ . Deviations become relevant when the internal structure of compartments in a random model leads to subsequent events that are not independent of each other. These effects alter the variance of the Normal distribution. Indeed, Figure 4 in the main text is based on the same simulation results, but there the rescaling is done considering both the mean number of cells and its variance (a Normal distribution is a two-parameter distribution).

Appendix 1—figure 14

Rescaled clone size distribution for the random GIA models when ${\hat{λ}}_{R} = 30$ at the final simulation time, which corresponds to $20 / α_{\min}$ ( $α_{\min}$ is the minimum process rate).

The grey shading represents the percentiles of all the simulations (black lines delimit the 5-95%ile range); the blue curves correspond to some illustrative selected simulations. A reference curve corresponding to a Normal distribution with unitary mean and variance equal to $1 / {\hat{λ}}_{R} = 1 / 30$ is shown in green. Distributions of the total number of cells $n$ are scaled by the mean number of cells $\bar{n}$ , with $x = n / \bar{n}$ . Simulations for which the final condition (20 times the inverse of the minimum process rate) is not reached (due to computational limitations) are not included, resulting in 922 models. The results are discussed in 'GIA model for large ${\hat{λ}}_{R}$'.

GIA^B test case: bimodal distribution

In the previous subsection, we increased $λ_{R}$ in a way that ensures that the other parameters within $ℛ$ remain of the same order of magnitude. Here, we address what happens if some parameters within $ℛ$ are much smaller than parameters of $𝒞$, such as γ_C. For that purpose, we study another simple GIA model, which we call GIA^B, a modification of the GIA⁰ test model defined by Equation 14. In the GIA^B model, the renewing compartment is composed of two states, $X_{1}$ and $X_{2}$, instead of only one. Cells in these states either divide asymmetrically (i.e. one daughter cell remains within the renewing compartment while the other enters the committed compartment) or switch between $X_{1}$ and $X_{2}$ (cell state switching) while remaining within the renewing compartment. The committed compartment consists of a single state, $X_{3}$, whose cells either duplicate or die (like state $X_{2}$ in Equation 14). This corresponds to the model

X_{1} \overset{λ_{1}}{⟶} X_{1} + X_{3}, X_{2} \overset{λ_{2}}{⟶} X_{2} + X_{3}, X_{1} \overset{ω_{12}}{⟶} X_{2}, X_{2} \overset{ω_{21}}{⟶} X_{1}, X_{3} \overset{λ_{3}}{⟶} X_{3} + X_{3}, X_{3} \overset{γ}{⟶} \emptyset .
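A trajectory of this reaction scheme can be simulated with Gillespie's direct method (Gillespie, 1977). The sketch below is a minimal, hypothetical implementation and not the code of the simCellState repository. Its function name and initial condition (one cell in $X_{1}$, no $X_{3}$ cells) are assumptions.

```python
import random


def simulate_giab(lam1, lam2, w12, w21, lam3, gamma, t_end, rng):
    """Gillespie direct-method trajectory of the GIA^B scheme.

    Assumed initial condition: one renewing cell in X1 and no X3 cells.
    Returns the number of committed (X3) cells at time t_end.
    """
    state = 1  # the single renewing cell is in X1 (1) or X2 (2)
    n3 = 0     # number of committed cells in state X3
    t = 0.0
    while True:
        div = lam1 if state == 1 else lam2       # X_i -> X_i + X3
        switch = w12 if state == 1 else w21      # X1 -> X2 or X2 -> X1
        rates = (div, switch, lam3 * n3, gamma * n3)  # then X3 -> 2 X3 and X3 -> 0
        total = sum(rates)
        t += rng.expovariate(total)              # waiting time to the next event
        if t > t_end:
            return n3
        u = rng.random() * total                 # choose the event proportionally to its rate
        if u < rates[0]:
            n3 += 1
        elif u < rates[0] + rates[1]:
            state = 2 if state == 1 else 1
        elif u < rates[0] + rates[1] + rates[2]:
            n3 += 1
        else:
            n3 -= 1
```

For instance, 200 calls with the GIA^B#1 rates ($λ_{1} = λ_{2} = ω_{12} = ω_{21} = 30$, $λ_{3} = 0$, $γ = 1$, $t_{\text{end}} = 20$) should give a sample mean of $n_{C}$ close to 30. Because the division rates of the two states are equal here, switching has no effect and $n_{C}$ is Poisson distributed. Slow-switching cases such as GIA^B#7 require much longer trajectories, since $τ = 20 / α_{\min}$ grows as $\hat{ω}$ decreases.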

In this model, the effective parameters, as defined in 'Approximation of generic GIA models', are $λ_{R} = λ_{1} P_{1}^{*} + λ_{2} P_{2}^{*}$, where $P_{i}^{*} = \frac{ω_{j i}}{ω_{i j} + ω_{j i}}$, $i, j = 1, 2$, $i \neq j$, and $γ_{C} = γ$. As before, we define the non-dimensionalized parameter ${\hat{λ}}_{R} = λ_{R} / γ_{C}$; here we also define $\hat{ω} = ω_{12} / γ_{C}$ and the parameter ratios $a = λ_{1} / λ_{2}$ and $b = ω_{12} / ω_{21}$. In the following, we test this model for the values of $a$ and $\hat{ω}$ reported in Appendix 1—table 2, while fixing our main scaling parameter, ${\hat{λ}}_{R} = 30$, as well as ${\hat{λ}}_{C} = 0$ and $b = 1$.

Appendix 1—table 2

GIA^B test case simulation parameters (see 'GIA^B test case: bimodal distribution').

Case	$\hat{ω}$	$a = λ_{1} / λ_{2}$
GIA^B#1	3 × 10¹	1
GIA^B#2	3 × 10⁻²	1
GIA^B#3	3 × 10²	10
GIA^B#4	3 × 10¹	10
GIA^B#5	3 × 10⁰	10
GIA^B#6	3 × 10⁻¹	10
GIA^B#7	3 × 10⁻²	10
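The rates for each case can be recovered from the non-dimensional parameters. With $b = 1$, $P_{1}^{*} = P_{2}^{*} = 1/2$, and the constraint $λ_{R} = λ_{1} P_{1}^{*} + λ_{2} P_{2}^{*}$ with $λ_{1} = a λ_{2}$ fixes $λ_{2} = {\hat{λ}}_{R} γ_{C} / (a P_{1}^{*} + P_{2}^{*})$. The hypothetical helper below sets $γ_{C} = 1$. It computes $α_{\min}$ over the non-zero rates only, which is an assumption since $λ_{3} = 0$ when ${\hat{λ}}_{C} = 0$.

```python
def giab_rates(lam_R_hat, a, omega_hat, b=1.0, gamma_C=1.0):
    """Rates (lam1, lam2, w12, w21) of GIA^B from the non-dimensional parameters."""
    w12 = omega_hat * gamma_C
    w21 = w12 / b                    # b = w12 / w21
    p1 = w21 / (w12 + w21)           # P_1^*
    p2 = w12 / (w12 + w21)           # P_2^*
    lam2 = lam_R_hat * gamma_C / (a * p1 + p2)  # from lam_R = lam1*p1 + lam2*p2, lam1 = a*lam2
    lam1 = a * lam2
    return lam1, lam2, w12, w21


# case: (omega_hat, a), values as listed in Appendix 1 table 2
TABLE_2 = {
    "GIA^B#1": (3e1, 1), "GIA^B#2": (3e-2, 1),
    "GIA^B#3": (3e2, 10), "GIA^B#4": (3e1, 10), "GIA^B#5": (3e0, 10),
    "GIA^B#6": (3e-1, 10), "GIA^B#7": (3e-2, 10),
}

for case, (omega_hat, a) in TABLE_2.items():
    lam1, lam2, w12, w21 = giab_rates(30.0, a, omega_hat)
    alpha_min = min(r for r in (lam1, lam2, w12, w21, 1.0) if r > 0)  # gamma_C = 1
    print(f"{case}: lam1={lam1:.3g} lam2={lam2:.3g} w12={w12:.3g} tau={20 / alpha_min:.3g}")
```

For $a = 10$, this gives $λ_{2} = 60/11 \approx 5.45$ and $λ_{1} \approx 54.5$. For GIA^B#7, $α_{\min} = 0.03$ and hence $τ \approx 667$ in units of $1 / γ_{C}$.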

The rescaled distribution of the number of cells in the committed compartment $𝒞$ (i.e. in state $X_{3}$), $n_{C}$, obtained at the final simulation time τ, is shown in Appendix 1—figure 15. A value of τ equal to $20 / α_{\min}$ (where $α_{\min}$ is the minimum of all rate parameters) was chosen to ensure that the steady state is reached. Consider first the test cases GIA^B#1 and GIA^B#2 of Appendix 1—table 2. They are characterized by $a = 1$ (i.e. there is no difference in the division timescales of the two renewing states), and both lead to a Normal distribution, independently of the value of $\hat{ω}$. Test cases GIA^B#3 to GIA^B#7, instead, are all characterized by $a = 10$, with $\hat{ω}$ spanning several orders of magnitude. In these cases, the distribution remains Normal as long as $\hat{ω} \geq {\hat{λ}}_{R} / 10$ (cases GIA^B#3 to GIA^B#5); when $\hat{ω}$ is much lower than ${\hat{λ}}_{R}$, bimodality emerges (cases GIA^B#6 and GIA^B#7). In the extreme case, GIA^B#7, each renewing state considered in isolation would produce a Poisson distribution in the committed compartment, with different mean values (low for the slow-dividing state and high for the fast-dividing one). Thus, overall, the distribution agrees with a bimodal distribution computed as

P (n) = β \mathrm{Poisson} ({\hat{λ}}_{R}^{(1)}) + (1 - β) \mathrm{Poisson} ({\hat{λ}}_{R}^{(2)}),

in which $β$ is the mixing parameter, computed as

β = \frac{\bar{n} - {\bar{n}}_{2}}{{\bar{n}}_{1} - {\bar{n}}_{2}},

and the parameters ${\hat{λ}}_{R}^{(i)}$ and ${\bar{n}}_{i}$, for $i = 1, 2$, are the parameter ${\hat{λ}}_{R}$ and the mean number of cells of a system whose renewing compartment consists of state $X_{i}$ only. The overall mean number of cells is denoted by $\bar{n}$. The bimodal distribution given by Equation 45 is shown as a black dash-dotted line in Appendix 1—figure 15.

Appendix 1—figure 15

Rescaled distribution of the number of cells in the committed compartment in the GIA^B test cases at time τ, which is $20 / α_{\min}$ ($α_{\min}$ is the minimum process rate).

The distribution $P ({\tilde{x}}_{C})$ of the number of cells in the committed compartment, $n_{C}$, is rescaled as ${\tilde{x}}_{C} = (n_{C} - {\bar{n}}_{C}) / σ_{n_{C}}$, where $σ_{n_{C}}$ is the standard deviation of $n_{C}$. In addition to the stochastic simulation results for the different settings (see Appendix 1—table 2), the reference Normal and bimodal distributions are also shown. Results are discussed in 'GIA^B test case: bimodal distribution'.
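The mixture in Equation 45 can be evaluated directly. For illustration, this sketch assumes that in the slow-switching limit a system whose renewing compartment contains only $X_{i}$ has a Poisson-distributed $n_{C}$ with mean $λ_{i} / γ_{C}$. It uses GIA^B#7-like values ($a = 10$, $b = 1$, ${\hat{λ}}_{R} = 30$, $γ_{C} = 1$), for which $β = 1/2$. The helper names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability, evaluated in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def mixing_parameter(mean_total, mean1, mean2):
    """beta chosen so that the mixture reproduces the overall mean."""
    return (mean_total - mean2) / (mean1 - mean2)


def bimodal_pmf(k, mean_total, mean1, mean2):
    """Two-component Poisson mixture for the committed-compartment cell number."""
    beta = mixing_parameter(mean_total, mean1, mean2)
    return beta * poisson_pmf(k, mean1) + (1.0 - beta) * poisson_pmf(k, mean2)


# Hypothetical GIA^B#7-like values (gamma_C = 1, b = 1, lam_R_hat = 30, a = 10)
lam2 = 30.0 / 5.5   # slow-dividing state
lam1 = 10.0 * lam2  # fast-dividing state
pmf = [bimodal_pmf(k, 30.0, lam1, lam2) for k in range(200)]
```

The resulting probabilities peak near $n_{C} \approx 5$ and $n_{C} \approx 54$ and are very small around the overall mean of 30, which is the bimodal shape described above.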

Analysis of the Generalized Population Asymmetry model

In the main text, it is shown that GPA models asymptotically predict, for large times t, the same rescaled clone size distribution, namely an Exponential distribution with unit mean.

In Appendix 1—figure 16, the median (50th percentile) distribution over all analyzed GPA models is shown at different levels of extinction (which correspond to different time points), showing a gradual convergence to the expected Exponential distribution.

Appendix 1—figure 16


Clone size distribution (50th percentile curve) for the random GPA models at different extinction fractions (i.e. different times).

The curves are compared to the expected Exponential distribution (see 'Analysis of the generalized Population Asymmetry model').

Thus, the Markov approximation to all GPA models, Equation 12 in the main text (the equivalent model of GPA models), becomes accurate for sufficiently large t, and no significant deviations are observed. This also means that, for large t, the distribution is independent of the choice of parameters: only the mean size of surviving clones, ${\bar{n}}_{s}$, depends on the parameters, and it does not affect the rescaled distribution in terms of $x = \frac{n}{{\bar{n}}_{s}}$. We can therefore refrain from an extended study of different parameter regimes. This contrasts with the GIA model class, where distributions depend sensitively on the choice of parameters outside the scaling regime of large ${\hat{λ}}_{R}$, where the non-Markovian nature of GIA models can become relevant, as shown in the previous section.
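The parameter independence of the rescaled distribution can be illustrated with the simplest population-asymmetry dynamics. As an assumption for this sketch, only R-cells are tracked. They duplicate ($R \to R + R$) and differentiate symmetrically ($R \to C + C$) at the same rate λ, which makes the dynamics a critical linear birth-death process. For this process, the standard result is that the size of a surviving clone started from one cell is geometrically distributed with mean ${\bar{n}}_{s} = 1 + λ t$. Consequently, $P(n \le x {\bar{n}}_{s} \mid n > 0) = 1 - q^{\lfloor x {\bar{n}}_{s} \rfloor}$ with $q = λ t / (1 + λ t)$. The parameters enter only through $λ t$, and the rescaled cumulative distribution approaches $1 - e^{-x}$.

```python
import math


def surviving_cdf(n, lam_t):
    """P(N(t) <= n | N(t) > 0) for a critical linear birth-death process from one cell."""
    q = lam_t / (1.0 + lam_t)
    return 1.0 - q ** n


def max_deviation_from_exponential(lam_t, xs=None):
    """Largest gap between the rescaled surviving-clone CDF and 1 - exp(-x)."""
    mean_s = 1.0 + lam_t  # mean size of surviving clones
    if xs is None:
        xs = [0.1 * i for i in range(1, 51)]  # x in (0, 5]
    dev = 0.0
    for x in xs:
        n = math.floor(x * mean_s)
        dev = max(dev, abs(surviving_cdf(n, lam_t) - (1.0 - math.exp(-x))))
    return dev
```

The deviation shrinks as $λ t$ grows, roughly on the order of $1 / (1 + λ t)$, so any choice of λ gives the same limiting curve once time is large.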

Asymptotic clone size distributions: Mathematical analysis

In the previous two sections, we studied numerically how a Markovian representation can approximate general cell fate models (GIA and GPA models). Here, we study analytically how generic GIA and GPA models converge to their respective limiting distributions, for large time t (GPA models) and large ${\hat{λ}}_{R}$ (GIA models).

As in 'Approximation of generic GIA models', we define $𝒏_{R}$ and $𝒏_{C}$ as the cell number vectors (here: actual cell numbers of the stochastic model, not mean cell numbers) restricted to the states of compartments $ℛ$ and $𝒞$, respectively. We further define the accumulated cell numbers $n_{R} = \sum_{i} {(𝒏_{R})}_{i}$ and $n_{C} = \sum_{i} {(𝒏_{C})}_{i}$ in $ℛ$ and $𝒞$, respectively. Considering $n_{R}$ and $n_{C}$ as observables of our compartment model, this corresponds to a Hidden Markov Model: the dynamics of the observables are not Markovian, yet they are entirely determined by a set of states that follow a Markov process.

General dynamics of C-cells for GIA and GPA models

Comments on the effective rate parameter $λ_{R}$

For general GIA and GPA models in the compartment representation of Equation 9 (main text), the effective rate parameter $λ_{R}$ (i.e. the frequency of cell divisions in $ℛ$ per cell) is defined as in 'Approximation of generic GIA Models'. Here, however, we take into account that $λ_{R}$ can depend on time via the (not necessarily stationary) distribution of cells within $ℛ$, since the process is non-Markovian. Hence, in these more general terms, we define $λ_{R} (t) = \sum_{i \in ℛ} λ_{i} P_{i}^{R} (t)$, where $P_{i}^{R} (t) = \frac{{\bar{n}}_{i} (t)}{{\bar{n}}_{R} (t)}$ is the probability that a single cell in $ℛ$ is in state i at time t. $P_{i}^{R} (t)$ may change after each event $E$, since the conditional probability $P^{R}|_{E}$, given that an event E has just occurred, differs from the stationary distribution.

In homeostasis, where the number of R-cells must stay constant on average, $λ_{R}$ is also the rate, per R-cell, at which C-cells are created from R-cells, via events $R \to R + C$, $R \to C + C$, or direct transitions $R \to C$. Thus, the total rate at which C-cells are created from R-cells by such events (let us call them $R C$-events) is $λ_{R} n_{R}$. The non-Markovian nature of the process does not guarantee that the waiting times between such events are exponentially distributed. Nevertheless, by definition, the number of such creation events in a time period $Δ t$, $N_{R C}$, has mean value $⟨ N_{R C} (Δ t) ⟩ = \int_{0}^{Δ t} λ_{R} (t) n_{R} (t) 𝑑 t$.
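The statement that C-cells are created at rate $λ_{R}$ per R-cell can be checked by counting the C-cells produced by each event type. The check uses the GPA compartment notation of the 'GPA models' subsection: $λ_{R} (1 - r_{R R} - r_{C C}) + 2 λ_{R} r_{C C} + ω_{R C} = λ_{R}$ once the homeostasis condition $ω_{R C} = λ_{R} (r_{R R} - r_{C C})$ is imposed. The sketch below verifies this with exact rational arithmetic; the function names are hypothetical. In this notation, GIA dynamics correspond to $r_{R R} = r_{C C} = ω_{R C} = 0$.

```python
from fractions import Fraction


def c_cell_production_rate(lam_R, r_RR, r_CC, w_RC):
    """Mean number of C-cells created per R-cell per unit time."""
    asymmetric = lam_R * (1 - r_RR - r_CC) * 1  # R -> R + C creates one C-cell
    symmetric = lam_R * r_CC * 2                # R -> C + C creates two C-cells
    direct = w_RC * 1                           # R -> C creates one C-cell
    return asymmetric + symmetric + direct


def homeostatic_w_RC(lam_R, r_RR, r_CC):
    """Direct-transition rate that keeps the mean R-cell number constant."""
    return lam_R * (r_RR - r_CC)
```

For example, with $λ_{R} = 3$, $r_{R R} = 1/4$ and $r_{C C} = 1/8$, homeostasis requires $ω_{R C} = 3/8$, and the production rate evaluates to exactly 3.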

Although $λ_{R} (t)$ may in principle depend on time, the internal rates of $ℛ$ can be fast compared to $1 / Δ t$ (an internal rate of $ℛ$ is a rate ω_ij where states i,j are both in $ℛ$). In that case, $λ_{R} (t)$ fluctuates quickly and we can make an adiabatic approximation, replacing $λ_{R} (t)$ by its average ${\bar{λ}}_{R} = \sum_{i \in ℛ} λ_{i} P_{i}^{R, *}$, where $P_{i}^{R, *} = \frac{{\bar{n}}_{i}^{*}}{{\bar{n}}_{R}^{*}}$ is the steady-state value of $P_{i}^{R} (t)$ (for GIA models, this corresponds to the definition of $λ_{R}$ in 'Approximation of generic GIA models'). This condition is fulfilled in our simulations of large ${\hat{λ}}_{R}$, since internal rates, such as $\hat{ω}$ defined in 'GIA^B test case: bimodal distribution', scale with ${\hat{λ}}_{R}$ when ${\hat{λ}}_{R} \to \infty$ (see 'GIA model for large'). Hence, the time scales of internal transitions are substantially shorter than the relevant time scale $Δ t = 1 / {\bar{γ}}_{C}$, the lifetime of the generated C-cells. Therefore, when comparing with simulation results, it is generally appropriate to assume that $λ_{R} (t) \approx {\bar{λ}}_{R}$ is constant. This case is discussed in the following subsection; the case in which internal rates are slower than γ_C is discussed in the subsequent subsection.
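The adiabatic argument can be made concrete for a renewing compartment with two states, as in the GIA^B model. If the single R-cell is known to be in $X_{1}$ at $t = 0$, its occupation probability relaxes as $P_{1}^{R}(t) = P_{1}^{R,*} + (1 - P_{1}^{R,*}) e^{-(ω_{12} + ω_{21}) t}$. Hence $λ_{R}(t)$ relaxes to ${\bar{λ}}_{R}$ on the time scale $1/(ω_{12} + ω_{21})$. The sketch below (hypothetical helper names) compares that time scale with the C-cell lifetime $1/γ_{C}$.

```python
import math


def occupation_state1(t, p1_0, w12, w21):
    """P_1^R(t) for a single renewing cell switching X1 <-> X2 (two-state Markov chain)."""
    p1_star = w21 / (w12 + w21)
    return p1_star + (p1_0 - p1_star) * math.exp(-(w12 + w21) * t)


def lambda_R_t(t, lam1, lam2, w12, w21, p1_0=1.0):
    """Time-dependent effective division rate lam1*P1(t) + lam2*P2(t)."""
    p1 = occupation_state1(t, p1_0, w12, w21)
    return lam1 * p1 + lam2 * (1.0 - p1)


def adiabatic_ratio(w12, w21, gamma_C=1.0):
    """Relaxation time 1/(w12 + w21) of lambda_R(t) relative to the C-cell lifetime 1/gamma_C."""
    return gamma_C / (w12 + w21)
```

For the GIA^B#1 switching rates ($ω_{12} = ω_{21} = 30$, $γ_{C} = 1$), the ratio is 1/60, so the approximation applies. For GIA^B#7 ($ω_{12} = ω_{21} = 0.03$), it is about 17, so $λ_{R}(t)$ stays correlated over many C-cell lifetimes.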

Asymptotic distributions of C-cells

Each C-cell created by an $R C$-event initiates a sub-clone within $𝒞$, defined through its progeny, which then follows the dynamics of $𝒞$. Such sub-clones evolve independently of each other (a defining characteristic of branching processes [Haccou et al., 2005]). Let $ξ_{i} (t)$ denote the number of cells at time t of the sub-clone created by the i-th $R C$-event, which occurs at time $t_{i}$. Two $R C$-events that happen simultaneously, via a symmetric division $R \to C + C$, are labelled by distinct indices i and $i + 1$, with $t_{i} = t_{i + 1}$. Therefore, the total number of cells in $𝒞$ is the sum of the independent random numbers $ξ_{i}$,

n_{C} (t) = \sum_{i = 1}^{N_{R C}} ξ_{i} (t)

Note that the random numbers $ξ_{i} (t)$ are not identically distributed, since their statistics depend on the time point of the i-th $R C$-event. In particular, the mean value, ${\bar{ξ}}_{i} (t - t_{i}) = ⟨ ξ_{i} (t) ⟩$, and the variance, $σ_{ξ}^{2} (t - t_{i}) = ⟨ {(ξ_{i} (t) - {\bar{ξ}}_{i})}^{2} ⟩$, depend on the time elapsed since the respective $R C$-event at time $t_{i}$. Thus, we cannot apply the central limit theorem in its classical form to the sum of random numbers in Equation 47. However, a variant of the central limit theorem (the Lindeberg-Feller theorem) states that sums of independent, non-identically distributed random variables, $\sum_{i} ξ_{i}$, converge to normally distributed random variables if the means and variances of the $ξ_{i}$ are finite and they fulfill Lindeberg's condition (Billingsley, 1995).

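In the strict form used here, Lindeberg's condition is said to be fulfilled for a sequence of random numbers $ξ_{i}$, $i = 1, \dots, N$, if

\max_{i} \frac{σ_{i}^{2}}{σ_{N}^{2}} \to 0, \quad \text{for } N \to \infty,

where $σ_{i}^{2} = ⟨ {(ξ_{i} - {\bar{ξ}}_{i})}^{2} ⟩$ and $σ_{N}^{2} = \sum_{i = 1}^{N} σ_{i}^{2}$. If this is fulfilled, then $n_{C} = \sum_{i = 1}^{N} ξ_{i}$ converges for $N \to \infty$ to a normally distributed random variable. (Strictly speaking, this uniform negligibility requirement is Feller's condition, which Lindeberg's condition implies; for uniformly bounded summands with $σ_{N} \to \infty$, Lindeberg's condition itself holds automatically.)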

To show that the $ξ_{i}$ fulfill Lindeberg's condition, we note that the $ξ_{i} (t - t_{i})$ follow a sub-critical multi-type branching process, for which ${\bar{ξ}}_{i} (t) \to 0$ for $t \to \infty$. This is assured since all eigenvalues of the adjacency matrix of $𝒞$ have negative real parts, because the dominant eigenvalues of all SCCs in $𝒞$ are negative (Åström and Murray, 2008). For multi-type branching processes, the variance σ² is proportional to the mean value, hence $σ_{i}^{2} (t - t_{i}) \sim \bar{ξ} (t - t_{i})$. Therefore, $σ_{i}^{2} \to 0$ for $t \to \infty$; hence it is bounded, i.e. there exists $C > 0$ such that $σ_{i}^{2} (t) < C$ for all t. Furthermore, since initially, at $t = t_{i}$, ${\bar{ξ}}_{i} (t_{i}) = 1$, there exist $t_{1} > 0$ and $δ > 0$ such that ${\bar{ξ}}_{i} (t) > δ$ for $t - t_{i} < t_{1}$. Now recall that, since we assume here the validity of the adiabatic approximation discussed in the previous subsection, the number of $R C$-events within a time period $Δ t$ is $N_{R C} (Δ t) \sim λ_{R} \int_{0}^{Δ t} n_{R} (t^{'}) 𝑑 t^{'}$. For generic $λ_{R}$, $N_{R C}$ is finite, and so is $σ_{N}$, since all $σ_{i} (t) \to 0$ for large t. However, for $λ_{R} \to \infty$ or $n_{R} \to \infty$, we get $N_{R C} (t_{1}) \sim {\bar{λ}}_{R} n_{R} t_{1} \to \infty$ and thus $σ_{N}^{2} = \sum_{i = 1}^{N_{R C}} σ_{i}^{2} (t) > N_{R C} δ \to \infty$. On the other hand, all $σ_{i}^{2} < C$, which means that all $\frac{σ_{i}^{2}}{σ_{N}^{2}} < \frac{C}{σ_{N}^{2}} \to 0$ for $λ_{R} \to \infty$ or $n_{R} \to \infty$. Hence, Lindeberg's condition is fulfilled if $λ_{R} \to \infty$ or $n_{R} \to \infty$, and thus $n_{C}$ becomes normally distributed,

n_{C} (t) = \sum_{i = 1}^{N_{R C}} ξ_{i} (t) \to \mathrm{Normal} (\text{mean} = {\bar{n}}_{C}, \ \text{variance} \sim {\bar{n}}_{C})

The variance scales with ${\bar{n}}_{C}$, since variances of independent random numbers add linearly and each $σ_{i}^{2} \sim {\bar{ξ}}_{i}$. The exact value of ${\bar{n}}_{C}$ and the prefactor of the variance of $n_{C}$ in this limit depend on the (non-Markovian) model details.
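The negligibility condition can be illustrated numerically in the simplest setting. This sketch makes two assumptions. First, $𝒞$ consists of a single state whose cells only die at rate $γ_{C}$ (${\hat{λ}}_{C} = 0$), so each sub-clone size $ξ_{i}(t)$ is a Bernoulli variable with success probability $p_{i} = e^{-γ_{C}(t - t_{i})}$ and variance $p_{i}(1 - p_{i})$. Second, the RC-events are placed at the regular spacing $1/λ_{R}$ rather than at random times. Under these assumptions, the ratio $\max_{i} σ_{i}^{2} / σ_{N}^{2}$ decreases roughly as $γ_{C} / (2 λ_{R})$. The function names are hypothetical.

```python
import math


def subclone_variances(lam_R, gamma_C=1.0, t_obs=20.0):
    """Variances sigma_i^2 of sub-clone sizes at t_obs for RC-events at t_i = i / lam_R."""
    n_events = int(lam_R * t_obs)
    out = []
    for i in range(n_events):
        p = math.exp(-gamma_C * (t_obs - i / lam_R))  # survival probability of the C-cell
        out.append(p * (1.0 - p))                     # Bernoulli variance
    return out


def max_variance_ratio(lam_R, **kwargs):
    """max_i sigma_i^2 / sigma_N^2, which must vanish for the normal limit."""
    var = subclone_variances(lam_R, **kwargs)
    return max(var) / sum(var)


for lam_R in (10.0, 100.0, 1000.0):
    print(lam_R, max_variance_ratio(lam_R))
```

Each $σ_{i}^{2}$ is bounded by 1/4, while $σ_{N}^{2}$ grows roughly as $λ_{R} / (2 γ_{C})$, mirroring the bound $C / σ_{N}^{2} \to 0$ used in the argument above.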

Deviations from a normal distribution in the asymptotic case

The arguments leading to Equation 49 hold for large ${\hat{λ}}_{R}$ if the internal rates of $ℛ$ are comparable to ${\bar{λ}}_{R} = \sum_{i} λ_{i} \frac{{\bar{n}}_{i}^{*}}{{\bar{n}}_{R}^{*}}$, which is satisfied for all cases we sampled randomly for numerical simulations (see 'GIA model for large'). However, if the internal rates of $ℛ$ are much smaller than $λ_{R}$, then the adiabatic approximation $P_{i}^{R} (t) \approx \frac{{\bar{n}}_{i}^{*}}{{\bar{n}}_{R}^{*}}$ does not apply, and $λ_{R} (t)$ may vary more slowly than on the time scale $1 / {\bar{γ}}_{C}$. For example, consider a GIA model in which $ℛ$ can be decomposed into two sub-compartments, $ℛ_{1}$ and $ℛ_{2}$, such that all rates $ω_{i j}, ω_{j i}$ with $i \in ℛ_{1}, j \in ℛ_{2}$ satisfy $ω_{i j}, ω_{j i} ≪ {\bar{λ}}_{R}$, as in the example discussed in 'GIA^B test case: bimodal distribution'. Then, the single cell in $ℛ$ (recall that $n_{R} = 1$ at all times in GIA models) may spend long time periods in $ℛ_{1}$ and in $ℛ_{2}$, respectively. Now, suppose ${\bar{λ}}_{R_{1}} = \sum_{i \in ℛ_{1}} λ_{i} \frac{{\bar{n}}_{i}}{{\bar{n}}_{R_{1}}} \neq \sum_{i \in ℛ_{2}} λ_{i} \frac{{\bar{n}}_{i}}{{\bar{n}}_{R_{2}}} = {\bar{λ}}_{R_{2}}$. Then, over time periods exceeding $1 / {\bar{γ}}_{C}$, the effective asymmetric division rates are ${\bar{λ}}_{R_{1}}$ and ${\bar{λ}}_{R_{2}}$, respectively, and during these periods the distribution of $n_{C}$ has mean ${\bar{n}}_{C}^{(1)} \sim {\bar{λ}}_{R_{1}}$ or ${\bar{n}}_{C}^{(2)} \sim {\bar{λ}}_{R_{2}}$, respectively. Hence, the total clone size distribution is a mixture of two Normal distributions with means ${\bar{n}}_{C}^{(1)}$ and ${\bar{n}}_{C}^{(2)}$, that is, a bimodal distribution. This scenario is discussed in 'GIA^B test case: bimodal distribution' for the specific case of two states in $ℛ$.

GIA models

In GIA models, the number of R-cells is conserved; in particular, for clones, $n_{R} = 1$ at all times. Hence, the rate of $R C$-events is simply $λ_{R}$. If the internal rates are fast and $λ_{R} \to \infty$, then $n_{C}$ becomes normally distributed, as argued above. Hence, $n = n_{R} + n_{C} = 1 + n_{C}$ also follows a Normal distribution, with mean ${\bar{n}}_{C} + 1$.

However, if the internal rates are smaller than γ_C, bimodal distributions may be observed, as discussed in 'GIA^B test case: bimodal distribution'.

GPA models

The dynamics of GPA models read, in compartment formulation,

R \overset{λ_{R}}{\to} \begin{cases} R + R & \text{Pr. } r_{R R} \\ R + C & \text{Pr. } 1 - r_{R R} - r_{C C} \\ C + C & \text{Pr. } r_{C C} \end{cases},

R \overset{ω_{R C}}{⟶} C, C \overset{λ_{C}}{⟶} C + C, C \overset{γ_{C}}{⟶} \emptyset

Since the dynamics of R-cells do not depend on C-cells, we can first consider the R-cell dynamics separately. In homeostasis, where $λ_{R} r_{R R} = λ_{R} r_{C C} + ω_{R C}$, we thus have, for R-cells,

n_{R} \overset{λ_{R} r_{R R} n_{R}}{\to} n_{R} \pm 1

This is a simple continuous-time binary branching process, in which each event replaces an R-cell by either two or zero R-cells. However, it is non-Markovian: subsequent events may be correlated, since each event imbalances the internal distribution $P_{i}^{R}$ of cells in compartment $ℛ$. Yet, as for C-cells, we can write the number of R-cells as a sum of independent (but not identically distributed) random variables. For each R-cell born at time $t_{i}$, consider the random variable $ξ_{i}^{R}$ describing its 'survival' state: $ξ_{i}^{R} = 1$ if that cell is still in $ℛ$, and $ξ_{i}^{R} = 0$ if it has left $ℛ$ via symmetric differentiation, $R \to C + C$, or direct transition, $R \to C$. Essentially, the random numbers $ξ_{i}^{R}$ are the 'branches' of the branching process. Since these events do not depend on other cells, the $ξ_{i}^{R}$ are independent of each other, and thus,

n_{R} (t) = \sum_{i = 1}^{N_{b} (t)} ξ_{i}^{R} (t),

is a sum of independent, not identically distributed random variables. Here, $N_{b} (t)$ is the total number of birth events, $R \to R + R$, occurring at rate $λ_{R} r_{R R} n_{R}$ up to time t. Since $ξ_{i}^{R} (t) \leq 1$ and $ξ_{i}^{R} (t = t_{i}) = 1$, we can argue analogously to the derivation of Equation 49 that the sequence of $ξ_{i}^{R}$ fulfills Lindeberg's condition. Thus, $n_{R}$ converges to a Normal distribution with mean value ${\bar{n}}_{R} = 1$ (since, due to homeostasis, the mean number is constant, and the initial condition is $n_{R} (t = 0) = 1$). Hence, the probability to have $n_{R}$ cells in $ℛ$ is

P (n_{R}) \propto e^{- \frac{{(n_{R} - 1)}^{2}}{2 σ_{R}^{2}}} \sim e^{- \frac{n_{R}^{2}}{2 σ_{R}^{2}}} for n_{R} ≫ 1 .

However, here the variance $σ_{R}^{2}$ is itself a random variable: since the $ξ_{i}^{R}$ are independent, $σ_{R}^{2} = \sum_{i = 1}^{N_{b} (t)} σ_{i}^{2}$, where $σ_{i}^{2} = ⟨ {(ξ_{i}^{R} - {\bar{ξ}}^{R})}^{2} ⟩$ and $N_{b} (t)$ is a random variable. The random numbers $ξ_{i}^{R}$ can only take the values $ξ_{i}^{R} = 1$ or $ξ_{i}^{R} = 0$, and they follow a simple death process. So for $ξ_{i}^{R} = 0$ it must be that $σ_{i}^{2} = 0$, while for $ξ_{i}^{R} = 1$ the variance must be finite, say $σ_{i}^{2} = β (t) > 0$, where $β$ can in principle depend on time but is not known (it depends on the non-Markovian details of the model). Hence,

σ_{R}^{2} = \sum_{i = 1}^{N_{b} (t)} β (t) ξ_{i}^{R} = β (t) n_{R}

since the number of summands with $ξ_{i}^{R} = 1$ is the number of surviving R-cells, that is, $n_{R}$ . Substituting $σ_{R}^{2} = β (t) n_{R}$ into Equation 54 gives,

P (n_{R}) \sim e^{- \frac{n_{R}^{2}}{2 β (t) n_{R}}} = e^{- \frac{n_{R}}{2 β (t)}}

This is an Exponential distribution with mean value ${\bar{n}}_{R} = ⟨ n_{R} ⟩ = 2 β (t)$ . Finally, when we enforce normalisation of the probability distribution, we get,

P (n_{R}) = \frac{1}{{\bar{n}}_{R} (t)} e^{- \frac{n_{R}}{{\bar{n}}_{R} (t)}} for n_{R} ≫ 1 .

Eventually, we also have to 'add' the C-cells. Since for $t ≫ 1$ also $n_{R} ≫ 1$, individual events $n_{R} \to n_{R} \pm 1$ do not significantly affect the distribution of R-cells, $P_{i}^{R} = \frac{{\bar{n}}_{i}}{{\bar{n}}_{R}}$ (in contrast to the case $n_{R} = 1$ of GIA models). Hence, we can assume the adiabatic approximation discussed above, where $P_{i}^{R} \approx P_{i}^{R, *}$ and thus $λ_{R} \approx \text{const}$. Therefore, for given $n_{R}$, C-cells are distributed according to a Normal distribution with mean ${\bar{n}}_{C}$ and variance $σ_{C}^{2} \sim {\bar{n}}_{C} \sim λ_{R} n_{R}$. As argued in the main text, the mean value of $n_{R}$, conditioned on survival of a clone ($n_{R} > 0$), must grow over time, without bound as $t \to \infty$. Therefore, we can generally assume that $n_{R} ≫ 1$, and hence both ${\bar{n}}_{C} \sim n_{R} \to \infty$ and $σ_{C}^{2} \sim n_{R} \to \infty$. However, suppose we express the clone size in terms of the rescaled variable $x = \frac{n}{{\bar{n}}_{s}}$ (${\bar{n}}_{s}$ is the mean size of surviving clones). We can then write $x = x_{R} + x_{C}$ with $x_{R} = \frac{n_{R}}{{\bar{n}}_{s}}$ and $x_{C} = \frac{n_{C}}{{\bar{n}}_{s}}$. The rescaled width (standard deviation) of the distribution of $x_{C}$, $σ_{x_{C}} = \frac{σ_{C}}{{\bar{n}}_{s}} \sim \frac{\sqrt{{\bar{n}}_{C}}}{{\bar{n}}_{R} + {\bar{n}}_{C}} \sim \frac{\sqrt{n_{R}}}{n_{R}}$, vanishes for $t \to \infty$. Therefore, in that limit $x_{C}$ is effectively deterministic given $n_{R}$, $x_{C} \approx {\bar{x}}_{C} \propto x_{R}$. Hence, also $x = x_{R} + x_{C} \propto x_{R}$, and thus the rescaled clone size, $x = \frac{n}{{\bar{n}}_{s}}$, is distributed according to an Exponential distribution (here: a probability density function) with unit mean; after renormalisation, we get

P (x) = e^{- x} for t \to \infty .

This distribution is indeed observed in all our simulations of GPA models for large t.

Data availability

All numerical data used for the figures were produced by program code, which can be found on GitHub at https://github.com/cp4u17/simCellState (copy archived at https://github.com/elifesciences-publications/simCellState).

The following data sets were generated

Parigini C (2020) simCellState. GitHub, ID cp4u17/simCellState. https://github.com/cp4u17/simCellState

References

Abramowitz M, Stegun IA (1972) Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, 32. United States Department of Commerce, National Bureau of Standards.

Antal T, Krapivsky PL (2010) Exact solution of a two-type branching process: clone size distribution in cell division kinetics. Journal of Statistical Mechanics: Theory and Experiment 2010:P07028. https://doi.org/10.1088/1742-5468/2010/07/P07028

Arrow KJ (1989) A “dynamic” proof of the Frobenius-Perron theorem for Metzler matrices. Probability, Statistics, and Mathematics 1:17–26. https://doi.org/10.1016/B978-0-12-058470-3.50009-4

Åström KJ, Murray RM (2008) Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press.

Bang-Jensen J, Gutin GZ (2007) Digraphs: Theory, Algorithms and Applications. Springer. https://doi.org/10.1007/978-1-84800-998-1

Billingsley P (1968) Convergence of Probability Measures. John Wiley & Sons.

Billingsley P (1995) Probability and Measure. John Wiley and Sons.

Blanpain C, Fuchs E (2014) Stem cell plasticity. Plasticity of epithelial stem cells in tissue regeneration. Science 344:1242281. https://doi.org/10.1126/science.1242281

Blanpain C, Simons BD (2013) Unravelling stem cell dynamics by lineage tracing. Nature Reviews Molecular Cell Biology 14:489–502. https://doi.org/10.1038/nrm3625

Bramson M, Griffeath D (1980) Asymptotics for interacting particle systems on Z^d. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 53:183–196. https://doi.org/10.1007/BF01013315

Clayton E, Doupé DP, Klein AM, Winton DJ, Simons BD, Jones PH (2007) A single type of progenitor cell maintains normal epidermis. Nature 446:185–189. https://doi.org/10.1038/nature05574

Clifford P, Sudbury A (1973) A model for spatial conflict. Biometrika 60:581–588. https://doi.org/10.1093/biomet/60.3.581

Cormen TH (2009) Introduction to Algorithms. MIT Press.

Donati G, Watt FM (2015) Stem cell heterogeneity and plasticity in epithelia. Cell Stem Cell 16:465–476. https://doi.org/10.1016/j.stem.2015.04.014

Doupé DP, Alcolea MP, Roshan A, Zhang G, Klein AM, Simons BD, Jones PH (2012) A single progenitor population switches behavior to maintain and repair esophageal epithelium. Science 337:1091–1093. https://doi.org/10.1126/science.1218835

Gillespie DT (1977) Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81:2340–2361. https://doi.org/10.1021/j100540a008

Goldberg DE (1989) Genetic Algorithms in Search, Optimization and Machine Learning (1st edn). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.

Greulich P (2019) Mathematical Modelling of Clonal Stem Cell Dynamics. In: Cahan P (Ed). Computational Stem Cell Biology. New York: Humana. pp. 107–129. https://doi.org/10.1007/978-1-4939-9224-9_5

(2019) Stability and steady state of complex cooperative systems: a diakoptic approach. Royal Society Open Science 6:191090. https://doi.org/10.1098/rsos.191090

Greulich P, Simons BD (2016) Dynamic heterogeneity as a strategy of stem cell self-renewal. PNAS 113:7509–7514. https://doi.org/10.1073/pnas.1602779113

(2005) Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511629136

Hara K, Nakagawa T, Enomoto H, Suzuki M, Yamamoto M, Simons BD, Yoshida S (2014) Mouse spermatogenic stem cells continually interconvert between equipotent singly isolated and syncytial states. Cell Stem Cell 14:658–672. https://doi.org/10.1016/j.stem.2014.01.019

Hirsch MW, Smith H (2006) Monotone dynamical systems. In: Handbook of Differential Equations: Ordinary Differential Equations. Elsevier. pp. 239–257. https://doi.org/10.1016/S1874-5725(05)80006-9

(2010) Mouse germ line stem cells undergo rapid and stochastic turnover. Cell Stem Cell 7:214–224. https://doi.org/10.1016/j.stem.2010.05.017

Klein AM, Simons BD (2011) Universal patterns of stem cell fate in cycling adult tissues. Development 138:3103–3111. https://doi.org/10.1242/dev.060103

Kretzschmar K, Watt FM (2012) Lineage tracing. Cell 148:33–45. https://doi.org/10.1016/j.cell.2012.01.002

(2010) Intestinal stem cell replacement follows a pattern of neutral drift. Science 330:822–825. https://doi.org/10.1126/science.1196236

National Institute of Health (2009) Stem Cell Basics. NIH.

Pittet MJ, Weissleder R (2011) Intravital imaging. Cell 147:983–991. https://doi.org/10.1016/j.cell.2011.11.004

Potten CS, Loeffler M (1990) Stem cells: attributes, cycles, spirals, pitfalls and uncertainties. Lessons for and from the crypt. Development 110:1001–1020.

Ritsma L, Steller EJ, Beerling E, Loomans CJ, Zomer A, Gerlach C, Vrisekoop N, Seinstra D, van Gurp L, Schäfer R, Raats DA, de Graaff A, Schumacher TN, de Koning EJ, Rinkes IH, Kranenburg O, van Rheenen J (2012) Intravital microscopy through an abdominal imaging window reveals a pre-micrometastasis stage during liver metastasis. Science Translational Medicine 4:158ra145. https://doi.org/10.1126/scitranslmed.3004394

Rompolas P, Mesa KR, Kawaguchi K, Park S, Gonzalez D, Brown S, Boucher J, Klein AM, Greco V (2016) Spatiotemporal coordination of stem cell commitment during epidermal homeostasis. Science 352:1471–1474. https://doi.org/10.1126/science.aaf7012

Sauer B (1998) Inducible gene targeting in mice using the Cre/lox system. Methods 14:381–392. https://doi.org/10.1006/meth.1998.0593

Simons BD, Clevers H (2011a) Strategies for homeostatic stem cell self-renewal in adult tissues. Cell 145:851–862. https://doi.org/10.1016/j.cell.2011.05.033

Simons BD, Clevers H (2011b) Stem cell self-renewal in intestinal crypt. Experimental Cell Research 317:2719–2724. https://doi.org/10.1016/j.yexcr.2011.07.010

Soriano P (1999) Generalized lacZ expression with the ROSA26 cre reporter strain. Nature Genetics 21:70–71. https://doi.org/10.1038/5007

(2015) Plasticity within stem cell hierarchies in mammalian epithelia. Trends in Cell Biology 25:100–108. https://doi.org/10.1016/j.tcb.2014.09.003

(2016) Replacement of lost Lgr5-Positive stem cells through plasticity of their Enterocyte-Lineage daughters. Cell Stem Cell 18:203–213. https://doi.org/10.1016/j.stem.2016.01.001

Van Kampen N (1981) Stochastic Processes in Physics and Chemistry. Elsevier.

Watt FM, Hogan BL (2000) Out of eden: stem cells and their niches. Science 287:1427–1430. https://doi.org/10.1126/science.287.5457.1427

Weinreb C, Wolock S, Tusi BK, Socolovsky M, Klein AM (2018) Fundamental limits on dynamic inference from single-cell snapshots. PNAS 115:E2467. https://doi.org/10.1073/pnas.1714723115

Article and author information

Author details

Cristina Parigini
1. School of Mathematical Science, University of Southampton, Southampton, United Kingdom
2. Institute for Life Sciences, University of Southampton, Southampton, United Kingdom
Contribution
Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing, Mathematical analysis (part), Numerical analysis

Competing interests
No competing interests declared
Philip Greulich
1. School of Mathematical Science, University of Southampton, Southampton, United Kingdom
2. Institute for Life Sciences, University of Southampton, Southampton, United Kingdom
Contribution
Conceptualization, Supervision, Funding acquisition, Project administration, Writing - review and editing, Mathematical analysis (part)

For correspondence
P.S.Greulich@soton.ac.uk

Competing interests
No competing interests declared

ORCID iD: 0000-0001-5247-6738

Funding

Medical Research Council (MR/R026610/1)

Philip Greulich

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Benjamin D MacArthur for valuable discussions that contributed to the development of this research. CP is supported by a Studentship of the Institute for Life Sciences (Southampton) and PG by Medical Research Council New Investigator Research Grant MR/R026610/1.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

979 views, 191 downloads, 12 citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.



Cite this article

Cristina Parigini, Philip Greulich (2020) Universality of clonal dynamics poses fundamental limits to identify stem cell self-renewal strategies. eLife 9:e56532. https://doi.org/10.7554/eLife.56532


Illustration of the decomposition of a homeostatic cell state network into SCCs and the compartment representation, Equation 9.

Mean size of surviving clones, $\bar{n}_s$, as a function of time for random GIA models (a) and GPA models (b).

Rescaled clone size distributions (expected relative frequency $P$ of clone sizes) for random GIA models (a) and GPA models (b), in terms of the rescaled clone size $x = n/\bar{n}_s$, at final time $t = \tau$ (see Figure 2 for definition).

Rescaled clone size distributions (expected relative frequency $P$ of clone sizes) for random GIA models as in Figure 3, at time $t = \tau$ (see definition in Figure 2).

Invariant Asymmetry (IA) test cases: clone size distribution $P(n)$, that is, the distribution of the total number of cells $n$ forming the progeny of a single initial cell in $\mathcal{R}$.

Population Asymmetry (PA) test cases: clone size distribution $P(n)$, that is, the distribution of the total number of cells $n$ forming the progeny of a single initial stem cell.

IA and PA test cases simulation parameters (see 'Invariant Asymmetry and Population Asymmetry models').

Metastate (MS) test cases: simulation results in terms of the mean number of cells in the surviving clones, $\bar{n}_s$, and the extinction probability $P(n = 0)$ as a function of time (scaled by the final simulation time $\tau$).

Metastate (MS) test cases: simulation results in terms of the clone size distribution $P(n)$, that is, the distribution of the total number of cells $n$ forming the progeny of a single initial stem cell.

$\mathrm{GIA}^0$ test case parameters $\hat{\lambda}_1$ and $\hat{\lambda}_2$ over the contour map of the expected steady-state mean number of cells in state $X_2$, $\bar{n}_2^*$.

$\mathrm{GIA}^0$ test case (see '$\mathrm{GIA}^0$ test case: steady state distribution and limiting behavior'): results in terms of the steady-state distribution $P^*(n_2)$ of the number of cells in state $X_2$, $n_2$.

$\mathrm{GIA}^0$ test case (see '$\mathrm{GIA}^0$ test case: steady state distribution and limiting behavior'): results in terms of the steady-state rescaled distribution $P^*(x_2)$ of the number of cells in state $X_2$, where $x_2 = n_2/\bar{n}_2^*$.

$\mathrm{GIA}^0$ test case (see '$\mathrm{GIA}^0$ test case: steady state distribution and limiting behavior'): results in terms of the steady-state rescaled distribution $P^*(x_2)$ of the number of cells in state $X_2$, where $x_2 = n_2/\bar{n}_2^*$.

Relative error of the equivalent model approximation, $\epsilon$ (see definition in 'Approximation of generic GIA models'), as a function of $\hat{\lambda}_2 = \hat{\lambda}_C$ for the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text).

Rescaled clone size distribution for the random GIA models when $\hat{\lambda}_R = 30$ at the final simulation time, which corresponds to $20/\alpha_{\min}$ ($\alpha_{\min}$ is the minimum process rate).

$\mathrm{GIA}^B$ test case simulation parameters (see '$\mathrm{GIA}^B$ test case: bimodal distribution').

Rescaled distribution of the number of cells in the committed compartment in the $\mathrm{GIA}^B$ test cases at time $\tau$, which is $20/\alpha_{\min}$ ($\alpha_{\min}$ is the minimum process rate).

Clone size distribution (corresponding to the 50th-percentile curve) in the random GPA models at different extinction fractions (i.e. different times).

