Universality of clonal dynamics poses fundamental limits to identify stem cell selfrenewal strategies
Abstract
How adult stem cells maintain selfrenewing tissues is commonly assessed by analysing clonal data from in vivo cell lineagetracing assays. To identify strategies of stem cell selfrenewal requires that different models of stem cell fate choice predict sufficiently different clonal statistics. Here, we show that models of cell fate choice can, in homeostatic tissues, be categorized by exactly two ‘universality classes’, whereby models of the same class predict, under asymptotic conditions, the same clonal statistics. Those classes relate to generalizations of the canonical asymmetric vs. symmetric stem cell selfrenewal strategies and are distinguished by a conservation law. This poses both challenges and opportunities to identify stem cell selfrenewal strategies: while under asymptotic conditions, selfrenewal models of the same universality class cannot be distinguished by clonal data only, models of different classes can be distinguished by simple means.
Introduction
Adult stem cells are the key players for maintaining and renewing biological tissue, due to their ability to persistently produce tissue cells through cell division and differentiation (National Institute of Health, 2009). For maintaining tissues in a homeostatic state, it is crucial that stem cells adopt suitable selfrenewal strategies, a pattern of stem cell fate choices that balances proliferation and differentiation; otherwise, imbalanced proliferation may lead to hyperplasia and cancer. Therefore, the understanding and identification of stem cell selfrenewal strategies has been a major goal of stem cell biology ever since the discovery of adult stem cells.
Classically, two stem cell selfrenewal strategies have been proposed (Potten and Loeffler, 1990; Simons and Clevers, 2011a): following the Invariant Asymmetric division (IA) strategy, stem cells undertake only asymmetric divisions, whose outcome is one differentiating cell and one stem cell as daughter cells. The other proposed strategy, Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), features additionally symmetric divisions, which produce either two stem cells or two differentiating cells as daughters, yet in balanced proportions. Both patterns of cell fate choice leave the number of cells on average unchanged and thus can maintain homeostasis. Assessing stem cell selfrenewal strategies experimentally is difficult in vivo, since direct observation of cell divisions is rarely possible. Yet, through genetic cell lineagetracing assays, the statistics of clones – the progeny of individual cells – can be obtained, and via mathematical modeling assessing cell fate dynamics became possible. With such an approach several studies suggested that population asymmetry prevails in many mouse tissues (e.g. Clayton et al., 2007; LopezGarcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).
However, the interpretation of those studies has been challenged by a suggested alternative selfrenewal strategy, called Dynamic Heterogeneity (DH), featuring some degree of cell fate plasticity (Greulich and Simons, 2016). In this model, all stem cell divisions are asymmetric, yet it is in agreement with the experimental clonal data that had previously been shown to agree also with the population asymmetry strategy. Thus, those two strategies are not distinguishable in view of the clonal data.
This raises the question to what extent different stem cell selfrenewal strategies can be distinguished at all via clonal data (Klein and Simons, 2011; Greulich, 2019). Here, we address this question by studying models for stem cell fate choice, which define the selfrenewal strategies, in their most generic form. We show that many cell fate models predict, under asymptotic conditions, the same clonal statistics and thus cannot be distinguished via clonal data from cell lineagetracing experiments. In particular, we find that there exist two particular classes of stem cell selfrenewal strategies: one class of models which all generate an Exponential distribution of clone sizes (the number of cells in a clone) after sufficiently large time, and one which generates a Normal distribution under sufficiently fast stem cell proliferation. Crucially, these two classes are not differentiated via the classical definitions of symmetric and asymmetric stem cell divisions, but by whether or not a subset of cells is conserved. These classes thus bear resemblance to 'universality classes' known from statistical physics, as suggested in Klein and Simons, 2011. This leads us to a more generic, and in this context more useful, definition of the terms ‘symmetric’ and ‘asymmetric’ divisions. Notably, however, we find that the conditions for the emergence of universality are not always fulfilled in real tissues, which provides chances, but also further challenges, for the identification of stem cell fate choices in homeostatic tissues.
Strategies for stem cell selfrenewal
The two classical stem cell selfrenewal strategies, Invariant Asymmetry (IA) and Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), are commonly described in terms of two cell types: stem cells (S) which can selfrenew (i.e. divide without reducing their potential to divide in the future); and differentiating cells (D). Both strategies can be expressed in terms of a single parametrized stochastic model, a multitype branching process (Haccou et al., 2005), defined by the outcomes of cell divisions (the cell fate choices),
where cells of type S divide with rate λ. Here, a daughter cell configuration $S+S$ corresponds to symmetric selfrenewal division and $D+D$ to symmetric differentiation, while daughter cells of different type, $S+D$, marks an asymmetric division. In the basic model version, a cell of type D is eventually lost with rate γ, $D\stackrel{\mathit{\gamma}}{\u27f6}\mathrm{\varnothing}$ (corresponding to death, shedding, or emigration of Dcells), while other versions may include the possibility of limited proliferation as committed progenitor cells. The two selfrenewal strategies, IA and PA, are distinguished by the value of the symmetric division fraction r: the PA model corresponds to any $0<r\le \frac{1}{2}$; the IA model is defined by $r=0$, that is, only asymmetric divisions occur.
To maintain homeostasis, the number of cells must stay, on average, constant. Thus cells following the PA strategy must regulate the probabilities of symmetric selfrenewal and differentiation to be exactly equal, whereas for the IA model this is trivially assured. However, only for the IA model is the number of stem cells strictly conserved, that is, no gain or loss of stem cells is possible.
A way to assess selfrenewal strategies experimentally is via genetic celllineage tracing (Kretzschmar and Watt, 2012; Blanpain and Simons, 2013): By marking single cells with an inheritable genetic marker (through a CreLox system [Soriano, 1999; Sauer, 1998]) each cell’s progeny, called a clone, which retain that marker, can be traced. The number of cells per clone, that is the clone size, is measured and the statistical frequency distribution of clone sizes (clone size distribution) determined. To test the cell fate choice models on that data, one evaluates the models with a single cell as initial condition and samples the outcome in terms of the final cell numbers – the size of a virtual clone. In the basic version of the model (i.e. when $D\stackrel{\mathit{\gamma}}{\u27f6}\mathrm{\varnothing}$), the IA and PA models predict, respectively, a Poisson and an Exponential clone size distribution for large times (Klein and Simons, 2011; Antal and Krapivsky, 2010) (see also the Appendix, 'Invariant Asymmetry and Population Asymmetry models'). Thus, they are fundamentally different and can easily be distinguished when compared with clonal data. By a series of lineagetracing experiments it was confirmed that Exponential clone size distributions prevail for most mouse tissues, which thus exclude the IA model and support the PA strategy (Clayton et al., 2007; LopezGarcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).
While this seemed to settle the case in favour of the PA strategy, at least for most adult mouse tissues, this was challenged by a third type of strategy, the DH model (Greulich and Simons, 2016). Motivated by the emerging view of prevailing cell plasticity (Blanpain and Fuchs, 2014; Tetteh et al., 2015; Tetteh et al., 2016; Donati and Watt, 2015), the DH model considers the possibility of reversible switching between two cell types:
where symbols at arrows denote the process rates (frequency of events). This strategy is also capable of maintaining a homeostatic population if $\gamma /\lambda ={\omega}_{S}/{\omega}_{D}$. Notably, the DH model only features asymmetric divisions (in that daughter cells are of different type), like the IA model, yet the DH model predicts clonal statistics that are indistinguishable from the PA model (Greulich and Simons, 2016). This means that in view of the existing clonal data for mouse tissues, the DH model, may as well describe the real cell fate dynamics. More fundamentally, this implies that the PA and DH model cannot be distinguished via plain clonal data, which poses fundamental limitations to the common approach to use lineage tracing for determining cell fate choices.
This demonstrates that the classical definition of asymmetric and symmetric divisions is not always suitable to distinguish cell fate strategies in view of clonal data alone. In general, cell fate dynamics may be much more complex than the simplified models described above, as there may exists a plethora of cell (sub)types in a tissue. However, to what extent would it be possible to distinguish details of potentially rather complex cell fate dynamics models through comparison with clonal data at all? This is only the case if the clonal statistics are sufficiently different. In the following, we study cell fate models in their most generic form, and analyze what clonal statistics would be expected.
Results
Model generalization
Let us consider the dynamics of a generic system of cells, characterized by a number m of possible cell states X_{i}, $i=1,\mathrm{\dots},m$. We define a cell state here as a group of cells showing common properties (e.g. any cell subtype classification). Most generally, cells in a state X_{i} may be able to divide, producing daughter cells of any cell states X_{j} and X_{k} (where $i=j=k$, that is, simple cell duplication, is possible). Furthermore, any cell state X_{i} may turn into another state X_{j} or may be lost (through emigration, shedding, or death). Hence, we can write a generic cell fate model as,
where $i,j,k=1,\mathrm{\dots},m$. In this model, ${\lambda}_{i}$ is the rate of division of cells in state X_{i} and the parameter ${r}_{i}^{jk}$ corresponds to the proportion of division outcomes producing daughter cells of state X_{j} and state X_{k}; ω_{ij} is the transition rate from state X_{i} to state X_{j} and ${\gamma}_{i}$ the loss rate from state X_{i}.
The dynamics of each cell in Equations 35 could depend on the cell environment through spatial, cellextrinsic regulation of cell fate. However, the clonal statistics of spatial models that include cellextrinsic regulation of cell fate (models of the voter type [Clifford and Sudbury, 1973]) are, in the long term, the same as for the corresponding branching process models (Haccou et al., 2005), as Equations 35 are, except for onedimensional arrangements of cells (as shown in Klein and Simons, 2011; Bramson and Griffeath, 1980). Here, we are focussing on the longterm clonal statistics of selfrenewal strategies, and since this is not affected by cellextrinsic regulation, for tissues with twodimensional or threedimensional arrangements of dividing cells (like epithelial sheets, and volumnar tissue), we wish to keep the analysis simple and therefore choose dynamics (and thus the parameters ${\lambda}_{i},{\omega}_{ij},{r}_{i}^{jk},{\gamma}_{i}$) to be independent of the cell environment.
In the following, we study the dynamics of cell numbers in each state X_{i}, ${n}_{i}$. To gain initial insight into those dynamics, let us first consider the time evolution of the mean cell numbers, ${\overline{n}}_{i}=\u27e8{n}_{i}\u27e9$, given by,
in which ${r}_{i}^{j}={\sum}_{k}({r}_{i}^{jk}+{r}_{i}^{kj})/2$ is the probability of having a daughter cell in state X_{j} produced upon division of a cell in state X_{i}. This linear system of differential equations can be written more compactly in terms of the mean cell number vector $\overline{\mathit{\bm{n}}}=({\overline{n}}_{1},{\overline{n}}_{2},\mathrm{\dots},{\overline{n}}_{m})$,
with $A$ being the $m\times m$ matrix
where we defined the total transition rate ${\kappa}_{ij}={\lambda}_{i}2{r}_{i}^{j}+{\omega}_{ij}$, combining all transitions from X_{i} to X_{j} by cell divisions and direct transitions, and the local loss rate ${\delta}_{i}={\lambda}_{i}+{\sum}_{j}{\omega}_{ij}+{\gamma}_{i}$.
Models of the form Equations 3–5 are not generally in homeostasis, which in this context is defined by the existence of a stationary state ${\overline{\mathit{\bm{n}}}}^{*}$, with $d{\overline{\mathit{\bm{n}}}}^{*}/dt=0$, that is (Lyapunov) stable and nontrivial (for a discussion, see the Appendix 'Conditions for homeostasis'). This can in principle be assessed through the spectral properties of A (Åström and Murray, 2008), but applying spectral conditions explicitly is unwieldy and difficult to interpret biologically. For a more intuitive view, we interpret the system, Equation 7, as a network (graph): the matrix A can be interpreted as the adjacency matrix of the cell state network. This is a weighted directed graph in which cell states correspond to the graph’s nodes and a link from state X_{i} to X_{j} exists where a transition is possible, that is, when ${\kappa}_{ij}>0$. The value of ${\kappa}_{ij}$ also denotes the link weights (diagonal elements of A can be considered as selflinks). Now, we note that Equation 7 is linear and cooperative, that is, the offdiagonal elements of matrix A are nonnegative, and for such systems more simple and intuitive conditions for homeostasis exist (Greulich et al., 2019), based on a decomposition into the network’s Strongly Connected Component (SCC). An SCC is a subgraph that groups nodes which are strongly connected, that is, which are mutually connected by paths (more accurately: two nodes, X_{i} and X_{j} are strongly connected if there exists a path from X_{i} to X_{j} and from X_{j} to X_{i} on the network). An example of such a decomposition, which yields an acyclic condensed network that contains SCCs as nodes and directed links between them, is shown in Figure 1.
The stability of systems like Equation 7 is then determined by the dominant eigenvalues ${\mu}_{k}$ of each strongly connected component $k$, for $k=1,\mathrm{\dots},{m}_{S}$ where ${m}_{S}$ is the number of SCCs, and their topological arrangement (the PerronFrobenius theorem assures that for adjacency matrices of SCCs of cooperative systems, a unique, real, maximal eigenvalue exists, which is the dominant eigenvalue [Arrow, 1989; Greulich et al., 2019]). In brief, according to Greulich et al., 2019, the conditions for existence of a homeostatic state are that, at the apex of each lineage (the condensed cell state network), there must be an SCC with dominant eigenvalue ${\mu}_{k}=0$, while all SCCs downstream of the former must have ${\mu}_{k}<0$ (see detailed discussion in the Appendix, 'Conditions for homeostasis'). Given this structure of homeostatic models, we can define two compartments in the cell state transition network: (1) the (self) Renewing compartment ($\mathcal{\mathcal{R}}$), which is the SCC at the apex of the lineage tree; and (2) the Committed compartment ($\mathcal{\mathcal{C}}$), which consists of all SCCs with ${\mu}_{k}<0$, that is, those downstream of the apex SCC. Importantly, cells in states forming $\mathcal{\mathcal{R}}$ have the potential to return to any state within the same compartment and this population maintains itself. Instead, the cell population in $\mathcal{\mathcal{C}}$ would vanish without external input, since the combined dominant eigenvalue of all those SCCs is negative (it is the maximum of all SCCs’ ${\mu}_{k}<0$), thus the progeny of each cell in the committed compartment will eventually be lost. We can thereby classify cells as being of a (self)Renewing type (R) if their state is within $\mathcal{\mathcal{R}}$, and of a Committed type (C) if their state is in $\mathcal{\mathcal{C}}$. With this coarsegrained classification, a generic homeostatic model can be represented in terms of compartments $\mathcal{\mathcal{R}}$ and $\mathcal{\mathcal{C}}$ as,
where the symbols above arrows are the effective rates of those events, denoting the average frequency at which they occur (loss events $R\to \mathrm{\varnothing}$ are not explicitly included, since they can be approximated by a short lived state ${X}_{d}$ in $\mathcal{\mathcal{C}}$, as $R\to {X}_{d}\to \mathrm{\varnothing}$). To be compatible with a homeostatic condition, it is further required that (i) the Rpopulation remains on average constant (${\mu}_{k}=0$), that is, ${\lambda}_{R}{r}_{RR}={\lambda}_{R}{r}_{CC}+\omega $, and (ii) the loss rate of C must exceed its proliferation rate (${\mu}_{k}<0$), that is, ${\gamma}_{C}>{\lambda}_{C}$. Figure 1 shows how a generic homeostatic cell state network can be condensed into an effective model of renewing and committed cell states, according to Equation 9. It has to be noted, however, that the events depicted in Equation 9 are not Markovian, that is, the timing of events is not independent from each other and depends on their history. Thus, the ‘rates’ ${\lambda}_{R}$, ${\lambda}_{C}$, ${\omega}_{RC}$, and γ_{C} are not constant rates in the Markovian sense, yet we can define them by the mean frequency of events occurring (see Appendix 'Approximation of generic GIA models' and 'Asymptotic clone size distributions: mathematical analysis').
The formulation in terms of renewing and committed states can help us to gain insights into potential behaviors of generic homeostatic cell fate models. In particular, we define generalized asymmetric divisions as events of the type $R\to R+C$, and generalized symmetric divisions as events of the type $R\to R+R$ (symmetric renewal) and $R\to C+C$ (symmetric commitment). With these definitions, we can categorize homeostatic cell fate models into two classes: Generalized Invariant Asymmetry (GIA) models are those which only exhibit $R\to R+C$ divisions in the renewing compartment, while Generalized Population Asymmetry (GPA) are models for which such restriction does not hold. We note that the two classes are equivalently characterized by a conservation law: For GIA models, the number of cells in $\mathcal{\mathcal{R}}$ is strictly conserved, while for GPA models, no such conservation law holds. Since $\mu =0$ is necessary for conservation, the only possible conserved cell states in homeostasis are those in $\mathcal{\mathcal{R}}$. Naturally, the previously discussed IA model is a GIA model and the PA model is a GPA model. Notably, the DH model (Equation 2) is of the GPA category, since in that model S and D cells form a single SCC at the apex of the lineage hierarchy, and thus they are both part of $\mathcal{\mathcal{R}}$. Therefore, a division $S\to S+D$ in the DH model, which is asymmetric in the conventional sense, corresponds to $R\to R+R$ in terms of compartments (Equation 9) and thus it is a generalised symmetric division. According to this classification, PA and DH models are both in the same category (GPA), and indeed, both predict the same type of clone size distribution, an Exponential one (Greulich and Simons, 2016).
Numerical simulation of random cell fate models
To check whether the correspondence between model class, GIA vs. GPA, and predicted clonal statistics holds in general, we analyze the clonal dynamics numerically, by generating and testing a large number of random stochastic models, implemented via random generation of the parameters λ_{i}, ω_{ij}, γ_{i} and ${r}_{i}^{jk}$. To simulate clones, we perform stochastic simulations based on the Gillespie algorithm (Gillespie, 1977), assuming a Markov process following the rules of Equation 35. We run, for each model, a large number of simulations with initially one cell in the compartment $\mathcal{\mathcal{R}}$, thus the cell population of each simulation run represents one clone. Then we sample their outcomes, the total cell numbers per clone (the clone size) $n={\sum}_{i}{n}_{i}$, to obtain predictions for clonal statistics, namely the frequency distribution of clone sizes (clone size distribution) and mean clone sizes (see Materials and methods).
We first study the mean clone size of surviving clones (with $n>0$), ${\overline{n}}_{s}={\u27e8n\u27e9}_{n>0}$, shown in Figure 2, respectively, for the GIA and GPA models, as a function of time (the final time $\tau =20/{\alpha}_{\mathrm{min}}$ where ${\alpha}_{\mathrm{min}}$ is the minimal process rate, ${\alpha}_{\mathrm{min}}=\mathrm{min}({\lambda}_{1},\mathrm{\dots},{\omega}_{12},\mathrm{\dots},{d}_{m})$). We note that indeed a common behavior is seen in each case. While for every simulated GIA model, ${\overline{n}}_{s}$ saturates at a plateau value, it steadily increases for every GPA model. This is expected, and can be understood given that clones in a GPA model can go extinct while those in a GIA model not. Assume that there are initially a large number ${N}_{c}$ of clones, such that the total number of cells is ${n}_{\mathrm{tot}}={N}_{c}{\overline{n}}_{s}$. Since the system is homeostatic, it will reach a constant steady state ${n}_{\mathrm{tot}}^{*}$ after a sufficient amount of time, meaning that the mean clone size is ${\overline{n}}_{s}={n}_{\mathrm{tot}}^{*}/{N}_{c}$. If no clones go extinct, as in GIA models, ${N}_{c}$ is constant and thus ${\overline{n}}_{s}$ approaches a constant. However, in nonconserved multitype branching processes, as GPA models are, the clone number ${N}_{c}$ decreases through progressive extinction of clones (Haccou et al., 2005), and therefore ${\overline{n}}_{s}$ increases, despite the cell population as a whole staying stationary.
The resulting clone size distributions for the two model classes are shown in Figure 3. Here, clones sizes $n$ are rescaled by the mean value ${\overline{n}}_{s}$ and compared to an Exponential distribution of unitary mean (green curve). As conjectured, all simulated GPA models shown in panel (b) predict asymptotically the same rescaled clone size distribution, namely a standard Exponential distribution. Deviations exist for small times and small clone sizes, but these deviations vanish in the large time limit (details on the convergence are shown in the Appendix, 'Analysis of the generalized Population Asymmetry model'). This means that different models within the GPA class cannot be distinguished in the longterm limit, since they differ only by the mean clone size, which is a free fit parameter. In analogy to statistical physics, we can categorize them as a universality class (Klein and Simons, 2011), meaning that the details of the model do not affect the (scaled) outcomes for assymptotic conditions, which is a form of weak convergence of random variables (Billingsley, 1968). However, the same cannot be said about the GIA models. In fact, we see all kind of shapes in the clone size distributions, both peaked distributions and nonpeaked ones, and in fact, some distributions are even close to an Exponential form, and can thus not be distinguished from GPA models. The question is whether we can yet find other parameters for which, when large, also GIA models exhibit universality, that is, yield the same rescaled clone size distribution. For this purpose, we will in the following sections develop a deeper theoretical understanding of the model classes.
Mathematical analysis: Markovian approximation of compartment model
To obtain a deeper understanding of the numerical results, we study the cell fate models in terms of the compartment representation, Equation 9. In this representation models are not Markovian, yet we can study their Markovian counterpart, as an approximation. While this is not expected to yield accurate clone size distributions in general, the limiting distributions of nonMarkovian processes are commonly well estimated by their Markovian counterparts.
For GIA models, which only feature $R\to R+C$ transitions between the renewing compartment, $\mathcal{\mathcal{C}}$, and the committed compartment, $\mathcal{\mathcal{C}}$, a corresponding Markovian model reads,
in which ${X}_{1}$ represents a single state in $\mathcal{\mathcal{R}}$ and ${X}_{2}$ in $\mathcal{\mathcal{C}}$, and symbols at arrows are the process rates. The number of cells in ${X}_{1}$, ${n}_{1}$, is conserved, that is, given an single ${X}_{1}$cell initially, it always remains at ${n}_{1}=1$. Thus, we only need to consider the dynamics of cells in ${X}_{2}$, ${n}_{2}$. This Markov process can be solved analytically, and for sufficiently large steady state mean number of ${X}_{2}$cells, ${\overline{n}}_{2}=\u27e8{n}_{2}\u27e9={\lambda}_{1}/(\gamma {\lambda}_{2})$ (see Appendix, 'GIA^{0} test case: steady state distribution and limiting behavior'), the rescaled distribution of cells in ${X}_{2}$ is,
in which ${x}_{2}={n}_{2}/{\overline{n}}_{2}$, ${\widehat{\lambda}}_{1}={\lambda}_{1}/\gamma $ and ${\widehat{\lambda}}_{2}={\lambda}_{2}/\gamma $ and $\mathrm{\Gamma}(\mathrm{\dots})$ is the Gamma function (Abramowitz and Stegun, 1972). We note that this distribution exhibits a large variety of shapes: for large ${\widehat{\lambda}}_{1}$ the distribution is peaked, while for small ${\widehat{\lambda}}_{1}$ is loses its peak. Notably, for ${\widehat{\lambda}}_{1}\to 1$ and ${\widehat{\lambda}}_{2}\to 1$, the distribution becomes Exponential and in this case it cannot be distinguished from the GPA case. On the other hand, for ${\widehat{\lambda}}_{1}\to \mathrm{\infty}$, that is, when the ratio of asymmetric divisions over the loss rate is high, this distribution tends to a Normal distribution with unitary mean and variance equal to $1/{\widehat{\lambda}}_{1}$. These different behaviors are graphically shown in the Appendix (see Appendix 1—figure 6, 7 and 8).
For the GPA models, a Markovian approximation reads, accordingly,
whereby for homeostasis to prevail, ${\lambda}_{1}{r}_{1}={\lambda}_{1}{r}_{2}+\omega $ and ${\lambda}_{2}<\gamma $ must hold. We note that the dynamics of ${X}_{1}$ are independent of ${X}_{2}$ and thus for the number of cells in ${X}_{1}$ in homeostasis holds
which corresponds to a simple continuoustime branching process with two offspring, for which it is known that the resulting distribution of cell numbers is Exponential, that is, ${P}_{1}({n}_{1})={\overline{n}}_{1,s}^{1}{e}^{{n}_{1}/{\overline{n}}_{1,s}}$, where ${\overline{n}}_{1,s}\simeq {\lambda}_{1}{r}_{1}t$ is the mean number of cells in the surviving clones (Haccou et al., 2005).
${X}_{2}$ cells produced according to 12 follow the same fate as in the twostate GIA model above. While it is not assured that the distribution of ${X}_{2}$ cells is identical to that of Equation 11 (due to simultaneous production events of type ${X}_{1}\to {X}_{2}+{X}_{2}$), we show in the Appendix, 'Asymptotic clone size distributions: mathematical analysis', that for large rates of production of Ccells, the distribution of Ccells – here: cells in state ${X}_{2}$ – attains a Normal distribution with mean ${\overline{n}}_{2}$ equal to its variance ${\sigma}_{{n}_{2}}^{2}=\u27e8{({n}_{2}{\overline{n}}_{2})}^{2}\u27e9={\overline{n}}_{2}$. As each ${X}_{1}$ cell contributes independently to the production of ${X}_{2}$cells, we have that ${\overline{n}}_{2}\sim {n}_{1,s}\sim t$. Crucially, this means that in terms of the rescaled variable ${x}_{2}={n}_{2}/{\overline{n}}_{s}$ the standard deviation ${\sigma}_{{x}_{2}}=\frac{{\sigma}_{{n}_{2}}}{{\overline{n}}_{s}}\le \frac{1}{\sqrt{{\overline{n}}_{2}}}\sim {t}^{1/2}$ vanishes for large times, since ${\overline{n}}_{2}\sim {n}_{1,s}\sim t\to \mathrm{\infty}$. Hence, given fixed ${x}_{1}$, ${x}_{2}$ can be approximated by a constant random number ${{x}_{2}}_{{x}_{1}}\sim {\overline{x}}_{1}={n}_{1}/{\overline{n}}_{s}$. Therefore, the rescaled distribution of the total number of cells is $P(x)={P}_{1}(x{x}_{2})={e}^{x}$, where $\overline{x}={\overline{x}}_{1}+{\overline{x}}_{2}\sim {\overline{x}}_{1}$. Thus, the rescaled distribution of the total clone size, $x=n/{\overline{n}}_{s}$, is as well an Exponential.
Universality of generic cell fate models
For generic GIA or GPA models, the compartment representation, Equation 9, is not Markovian and one would not expect exactly the distributions we found in the previous section. Fortunately, the limiting distributions of nonMarkovian processes and their Markovian counterparts are often, under certain conditions on the parameters, the same. While we reserve the technical arguments for the Appendix ('Asymptotic clone size distributions: mathematical analysis'), we note that this independence of the limiting distribution on the Markov property related to the central limit theorem, which does not rely on the Markov property.
To identify the correct limiting parameters for more complex cell fate models, we need to express the effective nonMarkovian rates (i.e. the mean frequency of events) of representation nine in terms of the original model, 3–5. As discussed in the Appendix ('Approximation of generic GIA models' and 'Asymptotic clone size distributions: Mathematical analysis'), we identify those effective rates by the total rates of cell divisions, ${\lambda}_{R}={\sum}_{i\in \mathcal{R}}{\lambda}_{i}{P}_{i}^{R}$, ${\gamma}_{C}={\sum}_{i\in \mathcal{C}}{\gamma}_{i}{P}_{i}^{C}$, and ${\omega}_{RC}={\sum}_{i\in \mathcal{R},j\in \mathcal{C}}{\omega}_{ij}{P}_{i}^{R}$ where, for each compartment, ${P}_{i}^{R,C}={\overline{n}}_{i}/{\sum}_{j\in \mathcal{R},\mathcal{C}}{\overline{n}}_{j}$ is the probability of a single cell being in state X_{i} of $\mathcal{\mathcal{R}}$, respectively (${\overline{n}}_{i}$ are the solutions to Equation 6). In the Appendix, 'Asymptotic clone size distributions: mathematical analysis', we reason that all GPA models are expected to generate Exponential clone size distributions for large times t. This is indeed what is observed in Figure 3(b). Correspondingly, for GIA models we expect that for large ${\widehat{\lambda}}_{R}={\lambda}_{R}/{\gamma}_{C}$ the clone size distribution of GIA models would tend to a Normal distribution. To test this prediction, we simulated the same GIA models as for Figure 3 before, but we tuned parameters in $\mathcal{\mathcal{R}}$ such that the effective parameter ${\widehat{\lambda}}_{R}$ becomes large (see details in the Appendix, 'GIA model for large $\hat{\lambda}}_{R$'). The result is shown in Figure 4: for an illustrative case shown in panel (a), increasing ${\widehat{\lambda}}_{R}$ changes the distribution from an exponential form to a peaked form akin to a Normal distribution, and for all simulated random GIA models, shown in panel (b), a Normal distribution is approached when ${\widehat{\lambda}}_{R}$ becomes large.
We note that when taking the limit of large ${\widehat{\lambda}}_{R}$, as shown in Figure 4, also all other process rates ω_{ij} with i,j within $\mathcal{\mathcal{R}}$ increased as well. What if instead some process rates in $\mathcal{\mathcal{R}}$ do not scale to become large with ${\widehat{\lambda}}_{R}$? To assess this situation, we studied a simple test case similar to model 10 but containing two states in $\mathcal{\mathcal{R}}$, connected via direct state transition (see Appendix, 'GIA^{B} test case: bimodal distribution'). As discussed there, if all rates within $\mathcal{\mathcal{R}}$ are large compared to the rates in $\mathcal{\mathcal{C}}$ then indeed we observe a Normal clone size distribution, as expected. However, if the direct transition rates between the states of $\mathcal{\mathcal{R}}$ are smaller or of equal magnitude as γ_{C}, and in addition, one of the two division rates is higher then the other, then we observe a bimodal clone size distribution. The reason is that if the transitions between the two states in $\mathcal{\mathcal{R}}$ are rare compared to the life time of cells, $1/{\gamma}_{C}$, they become essentially separated and each of those states generate separate Normal distributions with different mean (due to different cell division rates in those two states) which, when overlaid, generate a bimodal clone size distribution (see detailed arguments in the Appendix, 'Asymptotic clone size distributions: mathematical analysis').
Finally, from those considerations follows:
GPA models attain an Exponential clone size distribution for time $t\to \mathrm{\infty}$.
GIA models attain a Normal clone size distribution if all process rates within $\mathcal{\mathcal{R}}$ are much larger than the inverse lifetime of Ccells, γ_{C}.
Hence, the GIA and GPA model classes, each represent a universality class, that is, a scaling limit exists in which all models of the same class yield the same rescaled clonal statistics.
Discussion
Our analysis shows that intrinsic limitations exist for identifying strategies of stem cell selfrenewal through clonal data from cell lineagetracing experiments. This is due to different models of cell fate choice generating the same type of clonal statistics (clone size distributions), so that model inference based on clonal statistics – currently still the most prevalent method to determine stem cell selfrenewal strategies – fails to distinguish them. The feature that different models asymptotically generate the same statistics is a form of weak convergence of random variables (Billingsley, 1968) and corresponds to universality, as known from statistical physics.
Cell fate models can in principle be very complex, with a plethora of cell (sub)types in a tissue. We introduced a new categorization of cell types, distinguishing between cell states that are committed (Ccells), whose progeny is inevitably lost eventually, and noncommitted or (self)renewing cell states (Rcells), which retain the potential to remain or return to the apex of the lineage hierarchy. According to this categorization we classified generic models of cell fate choice as Generalized Invariant Asymmetry (GIA), if only generalized asymmetric divisions of the form $R\to R+C$ occur for Rcells, and Generalized Population Asymmetry (GPA), when all kind of divisions can occur, as long as gain and loss of Rcells are balanced. Models of the GIA category are also characterized by a conservation law, since the number of Rcells is strictly conserved, while GPA models do not exhibit such a conservation law.
We found that the classification in GIA and GPA models mirrors the clonal statistics generated by them: models of the GPA class all generate clonal statistics which with time converge to an Exponential clone size distribution. Thus, two GPA models can therefore not be distinguished through clonal data, once some time has passed after induction of clones. For GIA models, distributions can generally vary, but if the rates of divisions and transitions in the $\mathcal{\mathcal{R}}$ compartment are much larger that the rate of cell loss, the clone size distribution of all those models becomes a Normal distribution. In that case, two GIA models can not be distinguished by the clonal data. While here we do not explicitly consider cellextrinsic regulation of cell fate, this kind of regulation does not affect longterm clone size distributions, except when cells are arranged onedimensionally (Klein and Simons, 2011; Bramson and Griffeath, 1980). Thus, our results cover cell dynamics in most renewing tissues, such as epithelial sheets or volumnar organs, but not (quasi)onedimensional arrangements of stem cells, as found in the seminiferous tubule, or in intestinal crypts, where clonal statistics may differ. Hence, our analysis shows that models of cell fate choice cannot in general be distinguished with further resolution beyond the R vs. C categorization of cell types. The universality of the model dynamics also shows that effective, simplistic models are often equally accurate to model experimental data, yet with a higher statistical power due to less free parameters.
While at first glance, this analysis seems to discourage efforts to unravel details of cell fate dynamics, room remains in regimes where the limiting conditions for asymptotic distributions are not fulfilled. In particular, if fast cycling committed progenitor cells are present, while stem cells are slow cycling, then the condition that the division rate of Rcells is much larger than the cell loss rate is not fulfilled. In that case, details of the model dynamics may affect the shape of the clone size distribution and thus allow distinction between models. However, caution should be given when an Exponential clone size distribution is observed, since this could indicate either a GIA model with high activity of committed progenitor cells, or a GPA model. In that case, the mean clone size needs to be consulted to distinguish models (see Figure 2). Differentiating between models within the GPA category is more difficult, since the predicted statistics from different models always become more similar over time. Shortterm measurements would in principle allow such a distinction, but since in reality the underlying processes are not truly Markovian (as assumed for the modeling purpose) they are not necessarily a good representation of the real cell dynamics at short times. At long times, however, Markovian approximations are increasingly accurate, precisely because of the feature of universality.
How could the resolution of cell fate modeling be improved? The stateoftheart approach to determine cell fate trajectories is via analysis and modeling of singlecell RNAsequencing (scRNAseq) data. However, many limitations to this method exist, discussed in Weinreb et al., 2018, and neither reversible trajectories nor the modes of cell division, such as asymmetric vs symmetric divisions, can be inferred. Intravital live imaging, on the other hand, allows to trace individual clones over time (Ritsma et al., 2012; Pittet and Weissleder, 2011; Hara et al., 2014; Rompolas et al., 2016), and thus can obtain details of cell fate trajectories, yet this technique is limited to few tissue types which are accessible for invasive longterm imaging. Nonetheless, while each of those experimental assays alone is prone to limitations in defining selfrenewal strategies, advanced model inference schemes, that integrate data from different experimental sources, might be the way forward in the future to finally reveal the details of stem cell selfrenewal strategies.
Materials and methods
The numerical analysis of the random cell fate model was implemented in Matlab. The description of the stochastic models definition, the random model generation and the simulation campaign is detailed in the Appendix, 'Stochastic process modelling'. Additionally, as a validation of the implemented simulator, based on the Gillespie algorithm (Gillespie, 1977), the IA and PA models were simulated and the results analyzed in the Appendix, 'Invariant Asymmetry and Population Asymmetry models'.
Analytical solutions were partially obtained using Mathematica.
Appendix 1
Conditions for homeostasis
Here, we ‘translate’ the generic conditions for the existence of a Lyapunov stable stationary state for Linear Cooperative Systems (LCS) (Greulich et al., 2019) into the biological context of clonal dynamics. A linear cooperative system is one of the form $\frac{d}{dt}\mathit{\bm{x}}(t)=A\mathit{\bm{x}}(t)$ where $\mathit{\bm{x}}(t)=({x}_{1}(t),{x}_{2}(t),\mathrm{\dots},{x}_{m}(t))$ are functions of time t and A is a constant m × m matrix for which all offdiagonal elements are nonnegative (the latter condition defines the cooperativity of the system) (Hirsch and Smith, 2006; Greulich et al., 2019). We note that the dynamics of mean cell numbers, Equations 6 and 7 in the main text, indeed describe an LCS according to this definition. Now we use the following definitions:
$G(A)$ is the graph of A, that is, the graph for which A is the adjacency matrix, whose elements a_{ij} give the weight of the links from i to j (${a}_{ij}=0$ means that no link exists). In the following, we use the terms graph and network synonymously.
If in $G(A)$ there exists a path from node i to node j and from j to i, then we call those nodes strongly connected, $i\equiv j$, which is an equivalence relation. A maximal set of nodes which are are strongly connected with each other are called a Strongly Connected Component (SCC) of the graph (the equivalence class of the equivalence relation ‘≡').
The graph $G(A)$ can be decomposed into its ${N}_{S}$ SCCs, S_{k}, $k=1,\mathrm{\dots},{N}_{S}$ (Cormen, 2009), which are subgraphs associated with an adjacency matrix ${A}_{k}$, such that $G({A}_{k})={S}_{k}$. Since the ${A}_{k}$ have nonnegative offdiagonal elements, they are Metzler matrices for which the PerronFrobenius theorem ensures that a unique, simple and real maximal eigenvalue ${\mu}_{k}$ exists (Arrow, 1989). The eigenvalue ${\mu}_{k}$ is called the dominant eigenvalue of S_{k}. Associated with this eigenvalue, there is, for all $k$, a positive eigenvector ${\mathit{\bm{x}}}^{(k)}=({x}_{1}^{(k)},{x}_{2}^{(k)},\mathrm{\dots})$, that is, one with all entries ${x}_{i}^{(k)}>0$.
The condensed graph of $G(A)$ is the graph where nodes are the SCCs of $G(A)$ and a link from SCC S_{k} to SCC S_{l} ($k,l=1,\mathrm{\dots},{N}_{S}$) exists if there is is at least one link from a node (in $G(A)$) in S_{k} to a node in S_{l}.
If there is a path from SCC S_{k} to SCC S_{l}, then we call S_{k} upstream of S_{l} and accordingly S_{l} downstream of S_{k}. We note that there can never exist paths from S_{k} to S_{l} and from S_{l} to S_{k}, since otherwise, by definition, their nodes would be strongly connected and both together would form a single SCC (Cormen, 2009). Thus, there is a unique hierarchy of SCCs.
A stationary state ${\mathit{\bm{x}}}^{*}$ of a dynamical system is Lyapunov stable if a small initial deviation from ${\mathit{\bm{x}}}^{*}$ leads to a small final deviation $x(t)$ (i.e. ${\mathit{\bm{x}}}^{*}$ is not unstable). More accurately: there exists a constant $C>0$ such that $\mathit{\bm{x}}(t){\mathit{\bm{x}}}^{*}<C{\mathit{\bm{x}}}_{0}{\mathit{\bm{x}}}^{*}$ for all times t, where ${\mathit{\bm{x}}}_{0}=\mathit{\bm{x}}(t={t}_{0})$ is the initial condition, sufficiently close to ${\mathit{\bm{x}}}^{*}$. A stationary state of a linear system that is Lyapunov stable, yet neither asymptotically stable nor has a limit cycle, is neutrally stable.
Homeostasis means that the cell numbers in each state, $\mathit{\bm{n}}=({n}_{1},\mathrm{\dots},{n}_{m})$, stay on average constant, $\frac{d\overline{\mathit{\bm{n}}}}{dt}=0$ (where $\overline{\mathit{\bm{n}}}=\u27e8\mathit{\bm{n}}\u27e9$), and that this state is not unstable towards perturbations. This condition corresponds to a Lyapunovstable stationary state. Note that a linear system, as the one described by Equations 6 and 7, main text, cannot have an asymptotically stable state except for the trivial state ${\overline{\mathit{\bm{n}}}}^{*}=0$, which corresponds to a vanishing cell population. We note that when considering the tissue cell population as a whole, dynamics can be nonlinear through interactions between cells and a nonvanishing asymptotically stable state may then exist. However, since single clones do not significantly affect the total configuration of cells in a tissue, the clones compete neutrally, when embedded in a homeostatic cell population, which corresponds to a Lyapunov stable, but not asymptotically stable state. We therefore use Lyapunov stability, a weaker form of stability, to define homeostasis, since an asymptotically stable vanishing state is not a biologically viable state.
Now, for an LCS holds, according to Greulich et al., 2019,
Theorem 1
An LCS, $\dot{\mathbf{\mathbf{x}}}\mathrm{=}A\mathit{}\mathbf{\mathbf{x}}$, possesses a nontrivial Lyapunov stable stationary state (${\mathbf{\mathbf{x}}}^{\mathrm{*}}\mathrm{>}\mathrm{0}$), if and only if,
G(A) does not contain any SCC, S_{k}, with ${\mu}_{k}>0$.
There is at least one SCC, S_{k}, with ${\mu}_{k}=0$.
There is no path between any two SCCs, S_{k} and S_{l}, which have ${\mu}_{k}=0$ and ${\mu}_{l}=0$.
Furthermore holds,
Theorem 2
All nodes i upstream of an SCC S_{l} with ${\mu}_{l}\mathrm{=}\mathrm{0}$ must be empty in the the stationary state, that is, ${x}_{i}^{\mathrm{*}}\mathrm{=}\mathrm{0}$, if i is upstream of the SCC S_{l}.
Since Equation 7, main text, is an LCS, we can apply theorems 1 and 2 to find conditions for homeostasis, defined by a Lyapunovstable configuration of mean cell numbers ${\overline{\mathit{\bm{n}}}}^{*}=({\overline{n}}_{1},{\overline{n}}_{2},\mathrm{\dots})$. According to theorem 1 at least one SCC with ${\mu}_{k}=0$ must then exist, and according to theorem 2 the stationary state of nodes upstream of it must be empty, that is, they do not exist in homeostasis. Since the condensed graph of the SCCs does not have cyclic paths, an SCC S_{k} with ${\mu}_{k}=0$ must therefore always reside at the apex of all nonvanishing cell types. In principle, an acyclic graph may have more than one apex, however, since, by definition, a stem cell clone always starts with a single stem cell, and no other SCC with $\mu =0$ may be downstream of the latter, we only consider one apex SCC with one initial cell when studying clonal dynamics.
Hence, in the context of homeostatic clonal dynamics, we can assume that there is a single SCC, S_{k} with ${\mu}_{k}=0$ at the apex of the cell state graph, while all other SCCs, S_{l} are downstream of it and have ${\mu}_{l}<0$. Since there are no paths from the nonapex SCC to the apex SCC (as the condensed graph is acyclic) we can distinguish the two separate compartments $\mathcal{\mathcal{R}}$ (the renewing compartment) consisting of all nodes of the apex SCC, S_{k}, and $\mathcal{\mathcal{C}}$ (the committed compartment), consisting of all other nodes, whereby due to ${\mu}_{l}<0$ for all SCCs in $\mathcal{\mathcal{C}}$, all progeny of cells in $\mathcal{\mathcal{C}}$ will vanish in the long term.
Stochastic process modelling
Model description
Since clonal dynamics start, by definition, with a single cell, we use stochastic dynamics to model clones. Thus, we model cell fate dynamics as a continuoustime multitype branching process (Haccou et al., 2005), a Markov process following the rules of Equations 35, main text. As shown later, without losing generality, here only two types of events are modeled; considering an arbitrary number m of cell states, X_{i}, for $i=1,\mathrm{\dots}m$, the model includes
Cell divisions: a cell in state X_{i} divides in two cells with rate $\lambda}_{i$, respectively in state X_{j} and X_{k} at a ratio $r}_{i}^{jk$.
(1) ${X}_{i}\stackrel{{\lambda}_{i}{r}_{i}^{jk}}{\u27f6}{X}_{j}+{X}_{k}{\textstyle \text{,}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}i,j,k=1,...,m,$
where ${\lambda}_{i}=0$ if state X_{i} does not allow division. In this formulation of cell division events, which we use for the generation and numerical simulations of random models, only one division outcome is possible upon division of a particular cell state X_{i}. Nonetheless, multiple division outcomes per state can be implemented as single outcomes if additional metastates are introduced, which represent priming of a state X_{i} towards a certain division outcome option. For example, if in the original model, state X_{i} has different outcome options, ${X}_{{j}_{1}}+{X}_{{k}_{1}},{X}_{{j}_{2}}+{X}_{{k}_{2}},\mathrm{\dots}$, we can substitute this by, first, transitions from X_{i} to (new) states ${X}_{{m}_{1}},{X}_{{m}_{2}},\mathrm{\dots}$ and subsequent divisions ${X}_{{m}_{l}}\to {X}_{{j}_{l}}+{X}_{{k}_{l}}$. The use of metastates to model more complex processes is discussed in detail in 'Population Asymmetry model using metastates'.
Direct state transitions: a cell in state X_{i} changes to state X_{j} at a given rate ω_{ij}.
(2) ${X}_{i}\stackrel{{\omega}_{ij}}{\u27f6}{X}_{j}\text{,}\mathit{}i,j=1,\mathrm{\dots},m\text{;}i\ne j,$
where ${\omega}_{ij}=0$ means that no transition from X_{i} to X_{j} is possible. Additionally, we include cell loss in this scheme, by treating it as a transition to an additional special state, called hereafter death and denoted by $\mathrm{\varnothing}$ (cells in this state do not enter in the counting of the total number of cells). In that formulation, the loss rates of the original model are ${d}_{i}={\omega}_{i\mathrm{\varnothing}}$.
These events define a Markov process, which can be represented as a stochastic network (BangJensen and Gutin, 2007). In this view, each node can be related to a cell state, while the links represent transitions between states via cell divisions and the direct state transitions. It is noted that this stochastic network is different from the network defined in the main text and in 'Conditions for homeostasis' of this SI, which describes the dynamics of mean cell number instead. Here, for the stochastic modelling, let us define the adjacency matrix $K$ of this network, through the elements ${\kappa}_{ij}={\lambda}_{i}2{r}_{i}^{j}+{\omega}_{ij}i,j=1,\mathrm{\dots},m$, in which ${\kappa}_{ij}$ are the total transition rates as defined in the main text. We note that $K$ is related to the matrix A used in the main text by $A={K}^{T}\mathrm{\Delta}$, where $\mathrm{\Delta}$ is the diagonal matrix with entries ${\delta}_{i},i=1,\mathrm{\dots},m$, as defined in the main text, with the slight difference that here the loss state $\mathrm{\varnothing}$ is treated as a separate state. Additionally, it is remarked that in this model interpretation, where only one division option for each state is possible, the term ${r}_{i}^{j}\le 1$ is not a continuum value, but instead it can only take the values $0,1/2,1$ depending on the specific outcome of the division of the cells in state X_{i}. Notably, more than one stochastic network may result in the same matrix $K$, therefore, to uniquely define a process, we distinguish a matrix D which describes cell division events (note that this is possible with just a single matrix as there is only one division option per state) and a matrix $T$ which describes direct transition events. The matrix $K$ is the sum of both, $K=N+T$.
Generation of random models
To test the behavior of the clonal dynamics in a generic homeostatic model, a large number of random stochastic networks was generated, whereby each stochastic network corresponds to a distinct set of parameters ${\lambda}_{1},\mathrm{\dots},{\lambda}_{m},{\omega}_{12},\mathrm{\dots},{\omega}_{m\mathrm{\varnothing}}$ for the stochastic stem cell fate choice model. The strategy detailed below is based on the following considerations which summarize the key requirements to achieve homeostasis detailed in 'Conditions for homeostasis': (a) each network is composed of Strongly Connected Components (SCCs) that are randomly connected; (b) only one SCC, the one at the apex of the network, forms the renewing compartment, $\mathcal{\mathcal{R}}$, (i.e. it is characterized by a dominant eigenvalue $\mu =0$ with respect to A) and all the others form the committed compartment, $\mathcal{\mathcal{C}}$, (i.e. they are characterized by a dominant eigenvalues $\mu <0$). It is further noted that the SCCs of the stochastic network $G(K)$ are the same as those of the matrix $G(A)$, where $A={K}^{T}\mathrm{\Delta}$ defines the dynamics of mean cell numbers. This is, since transposition of an adjacency matrix and altering of diagonal elements does not affect the network topology.
To generate the stochastic network, a twostep process is followed: (1) a large number of (random) SCCs are generated; (2) a condensed network is randomly constructed and filled with randomly picked SCC from step 1.
It is noted that unitary rates are assumed in step (1) and they are successively randomly modified in step (2) to achieve the desired properties of the dominant eigenvalue μ while ensuring randomness.
Focusing now on step (1), that is, the generation of single SCCs, the following procedure is used.
The total number of states composing the SCC is defined, indicated as ${m}_{S}$. An additional state is added to represent whatever is outside the SCC. In the current analysis, we set $1\le {m}_{S}\le 4$.
We build separately all the possible combinations of transition and division matrices, indicated hereafter with ${M}_{T}$ and ${M}_{D}$, respectively. These matrices are ordered for increasing number of transitions ${N}_{T}$ and divisions ${N}_{D}$. In case GIA networks are generated, the ${M}_{D}$ and ${M}_{T}$ combinations are filtered, to remain just with those where the division outcome is one cell inside the SCC and one outside the SCC, and where there are only transitions between states within the SCC (i.e. where cell numbers are conserved). From a computational point of view, this process is feasible up to ${m}_{S}=4$.
The matrices stored in ${M}_{D}$ and ${M}_{T}$ are then combined together to form a model (which is completely defined by one matrix in ${M}_{D}$ and one in ${M}_{T}$); ${M}_{DT}$ indicates the pool of possible models. This process is done considering separately each ${m}_{S}$, ${N}_{T}$ and ${N}_{D}$. In this step, due to technical limitations given by the high number of possible combinations, if the total number of combinations exceed $5\cdot {10}^{4}$ then only 10^{4} random matrices from ${M}_{D}$ and ${M}_{T}$ are combined.
Each model in ${M}_{DT}$ is then processed to check if the corresponding network is a SCC in the first ${m}_{S}$ states. If not, then this model is discarded. In case GPA networks are generated, a further check is performed to discard also those models consistent with a GIA network (they cannot be a priori excluded as done in point 2 for the GIA ones). These pools of models are indicated as ${M}_{\mathrm{GIA}}$ and ${M}_{\mathrm{GPA}}$ for the GIA and GPA models, respectively.
For each SCC in ${M}_{\mathrm{GIA}}$ and ${M}_{\mathrm{GPA}}$, the dominant eigenvalue μ is estimated. For construction, the generated GIA networks are all characterized by $\mu =0$, while in general any value can be obtained within ${M}_{\mathrm{GPA}}$.
The SCCs in ${M}_{\mathrm{GPA}}$ are additionally processed to check whether the network is compatible with homeostasis by tuning the rates. Networks satisfying this condition are additionally stored under a new pool of SCCs, called ${M}_{\mathrm{GPA}}^{*}$. If not, then they are discarded when $\mu >0$ (i.e. for any combination of rates the number of cells in these networks is expected to grow).
This process results in three pools of SCCs classified for ${m}_{S}$, ${N}_{T}$ and ${N}_{D}$ (i.e. number of states, transitions and divisions): (1) ${M}_{\mathrm{GIA}}$ contains GIA models; (2) ${M}_{\mathrm{GPA}}^{*}$ contains GPA models that can be tuned to have $\mu =0$ and (3) ${M}_{\mathrm{GPA}}$ contains GPA models characterized by $\mu <0$ or that can be tuned to meet this condition.
In step (2), the generation of random networks starting from the individual SCCs is implemented as follows.
A number of committed SCCs, ${N}_{c}$, between 1 and 3 is randomly chosen.
${N}_{c}$ SCCs are randomly picked from the pool of models ${M}_{\mathrm{GPA}}$. The selection is done considering equal probability in ${m}_{S}$, ${N}_{T}$ and ${N}_{D}$. For each SCC, the unitary rates α (where α stands for any rate λ_{i} or ω_{ij}) are modified by multiplying them for random numbers (exponentially distributed with mean $\overline{\alpha}=1$ and minimum ${\alpha}_{m}=0.3$). Additionally, a threshold on the dominant eigenvalue is set, ${\mu}_{\mathrm{max}}=1$; if this condition is not satisfied, then the rates are tuned to meet this requirement while maintaining the rates above the minimum.
The committed compartment of the condensed network is generated by randomly connecting all the outgoing components of the $k$SCC with states in the $l$SCC for $l=k+1,..,{N}_{c}$. In this way, the transposed adjacency matrix of the stochastic network has triangular block form:
(3) ${K}^{T}=\left[\begin{array}{ccccc}{B}_{1}& & & & \\ {C}_{12}& {B}_{2}& & 0\\ & & ...& & \\ {C}_{1,{N}_{c}}& {C}_{2,{N}_{c}}& & {B}_{{N}_{c}}& \\ {C}_{1\mathrm{\varnothing}}& {C}_{2\mathrm{\varnothing}}& & {C}_{{N}_{c},\mathrm{\varnothing}}& 0\end{array}\right].$The last SCC is forced to be linked to a single death state.
With a similar procedure described in point 2, two SCCs are randomly picked respectively from the pool of SCCs in ${M}_{\mathrm{GPA}}^{*}$ and ${M}_{\mathrm{GIA}}$; the unitary rates are modified (exponentially distributed with mean $\overline{\alpha}=1$ and minimum ${\alpha}_{m}=0.3$) and, in the GPA case, tuned to meet the condition $\mu =0$. They represent the renewing part of the network.
Two networks (one for the GIA and one for the GPA models) are produced by attaching the selected renewing network upstream the committed one; this is done based on an analogous procedure as described in step 3.
At the end of this process, we have two networks which are different in just the renewing part, being one consistent with the GIA model and the other with the GPA one. In total 2000 networks were built and analyzed.
Simulation campaign
An extensive simulation campaign was run to model the clone dynamics. The code implemented to numerically simulate the stochastic process defined by events of type 1 and 2 is based on the Gillespie algorithm (Gillespie, 1977). Since a clone is by definition the progeny of a single cell, we choose as initial condition a single cell put randomly in a state within $\mathcal{\mathcal{R}}$. Concerning the final condition, given the substantial difference in the dynamics in the two models, the final time, indicated by τ, is set equal to 20 times the inverse of the minimum process rate, ${\alpha}_{\mathrm{min}}=\mathrm{min}({\lambda}_{1},\mathrm{\dots},{\lambda}_{m},{\omega}_{12},\mathrm{\dots},{\omega}_{m,\mathrm{\varnothing}})$, in the GIA models, and to the time at which the fraction of extinct clones reaches 98% in the GPA models. Note that all critical branching processes, as homeostatic clonal dynamics are, will go extinct almost surely at some point in time (Haccou et al., 2005).
To determine the clone size distribution, 10^{3} and $5\cdot {10}^{4}$ simulations were run respectively in for each GIA and GPA model (in this way, both models result in the same final number of clones when 98% extinction is taken into account).
Numerical simulation test cases
Invariant Asymmetry and Population Asymmetry models
To validate the simulation approach, we tested the procedure on simple cell fate models for which analytical results are known, the Invariant Asymmetry (IA) and Population Asymmetry (PA) models. As described in the main text, in the simplest version, these are defined as,
In these processes, cells of type S represent the stem cells (called hereafter also progenitor), which divide with stochastic rate λ, and cells of type D are the differentiated cells, which are shed with rate γ. While in the PA model the three possible outcomes of the division of a progenitor are controlled by a probability parameter $0<r\le 1/2$, in the IA model r = 0, meaning that there are strictly asymmetric division and the number of Scells is conserved. It is remarked that in the definition of the stochastic networks given in 'Model description' only one division option for each state is modelled; however, the code implemented for the numerical simulations of the stochastic process allows for an arbitrary number of division options for each state as well (see 'Population Asymmetry model using metastates').
Considering the dynamics at tissue level, the system of ODEs describing the average number of cell ${\overline{n}}_{S}$ and ${\overline{n}}_{D}$ respectively of type S and D is,
It is clear that, on average, the number of Scells remains constant. Additionally, in homeostasis, the average total number of Dcells stabilizes around a constant value ${\overline{n}}_{D}^{*}=(\lambda /\gamma ){\overline{n}}_{S}$ that uniquely depends on the number of stem cells, ${\overline{n}}_{S}$ which equals the initial number of stem cells ${\overline{n}}_{S,0}={\overline{n}}_{S}(t=0)$, Thus, the (Lyapunov stable) stationary state of total cell numbers $\overline{n}={\overline{n}}_{S}+{\overline{n}}_{D}$ is given by,
Based on Equation 6, the process rates λ and γ determine the proportion of cells of type D with respect to cells of type S. Importantly, there is no difference at tissue level between the IA and PA models.
A distinction is instead evident when we look at the dynamics at the singlecell level, and study the clone size distribution, that is, the distribution of the progeny of a single cell. For the IA model, the number of Scells is strictly constant, and thus the joint probability distribution $P({n}_{S},{n}_{D})$ of both Scells and Dcells, respectively indicated as ${n}_{S}$ and ${n}_{D}$, is fully determined by the distribution of Dcells, $P({n}_{D})$. The IA model’s master equation for $P({n}_{D})$, considering a single initial cell of type S, is given by,
This corresponds to a simple birthanddeath process for which the distribution is Poissonian with mean $\lambda /\gamma $, (Van Kampen, 1981).
Considering now the PA model, the master equation is instead given by,
In Antal and Krapivsky, 2010, an exact result for the distribution of total cell numbers $n={n}_{S}+{n}_{D}$ is found when $\lambda =\gamma $ and $r=1/4$. For different values of the process parameters, the longterm distribution is shown to be Exponential.
Numerical simulations for the clonal dynamics were run, considering the above models and three different sets of test parameters each, indicated as IA# and PA#i for $i=1,2,3$, which are reported in Appendix 1—table 1. It is noted that the time unit is arbitrary and therefore omitted. Simulations are based on 10^{4} and $5\cdot {10}^{4}$ runs respectively for the IA and PA test cases. The initial condition is a single stem cell and the final simulation time, indicated as τ, is equal to 10: this value is well representative of a steady state condition (for the IA test cases) and at which the total extinction of the process is not yet achieved (for PA test cases only). The clone size distribution at τ in the IA test cases is shown in Appendix 1—figure 1: in this figure, each profile is compared to the corresponding Poisson distribution shifted by one (i.e. plus the stem cell). Concerning the results for the PA test cases, they are shown in Appendix 1—figure 2. In this case, the profiles are compared to the numerical integration of the master Equation 8. Additionally, for the PA# test case, where $\lambda =\gamma $ and $r=1/4$, the reference analytic solution provided in Antal and Krapivsky, 2010 is also shown. In general, a good agreement is obtained in all of the cases.
Population Asymmetry model using metastates
As argued before, we assume in the random model generation that cell division in state X_{i} has a unique outcome, ${X}_{i}\to {X}_{j}+{X}_{k}$ (Equation 1), since thereby the stochastic process can be uniquely defined by the two matrices D and $T$. To accommodate for the possibility of different division outcomes from the same state X_{i}, as in Equation 4 and Equations 35 in the main text, we introduce metastates, which represent shortlived states that indicate priming for either outcome, from which the cell division outcomes are unique. This is a small modification of the original model, which, however, does not lead to significant deviations if the metastates are traversed sufficiently quickly (which can be assured by a choice of high direct state transition rates in the metastates).
To illustrate this, let us consider the PA model described by 4; instead of having three different outcomes upon division of an Scell we define the corresponding Metastate (MS) model with three primed states, ${M}_{1,2,3}$, as,
in which S and D correspond to the same cell type of the PA model (i.e. the stem and the differentiated cells, respectively), while ${M}_{i}$, for $i=1,2,3$, represent the metastates. These states are temporary states that are used to model each one of the three different possible division options of the Scells. The rates λ_{i} and ${\omega}_{i}$, for $i=1,2,3$, are chosen such that the time scales of division and outcome probabilities are the same as in the original PA model:
Equations 10 assure that outcome probabilities are the same as in the original model, while Equations 11 are needed to have the same total average time between two consecutive events. As there are six unknowns and only five relations, the following additional equation is added
in which $\mathrm{\Delta}$ is an additional parameter that is used to control how fast cells in metastate ${M}_{1}$ divide. Low values of $\mathrm{\Delta}$ imply that as soon as an Scell transits to the metastate ${M}_{1}$, it divides in two Scells. Globally, this results in
Numerical simulations for the two models were run and compared, based on the parameters reported in Appendix 1—table 1, and specifically the PA#1 and PA#3 test cases. The time unit, which is arbitrary, is omitted. The process rates for the corresponding MS model, which are indicated in the figures as MS#1 and MS#3, are computed based on Equation 13 and $\mathrm{\Delta}=1/500$. As well as for the PA test cases, the initial condition is one cell of type S and the final time, τ, is equal to 10; simulations are based on $5\cdot {10}^{4}$ trajectories.
The mean number of cells in the surviving clones and the extinction probability as function of time (scaled by τ) are shown in Appendix 1—figure 3. The clone size distribution at τ is shown in Appendix 1—figure 4. Both MS simulations agree very well with the corresponding PA ones, which justifies the use of metastates for our simulation campaign.
Analysis of the Generalized Invariant Asymmetry model
GIA^{0} test case: Steady state distribution and limiting behavior
A simple Generalized Invariant Asymmetric model, indicated hereafter as GIA^{0}, was analyzed to identify the causes of the different clone size distribution behaviors observed in the randomly generated models (see main text). Thus, in this section, we study the Markov process defined by,
Here, the renewing compartment is composed of just a single state ${X}_{1}$ and cells in this state asymmetrically divide with rate ${\lambda}_{1}$. The committed compartment is formed of state ${X}_{2}$; cells in this state can either divide to duplicate, with rate ${\lambda}_{2}$, or die, with rate γ. It is noted that for ${\lambda}_{2}=0$, this model is reduced to the previously analyzed Invariant Asymmetric (IA) model (see 'Invariant Asymmetry and Population Asymmetry models').
As for the IA model, here the number of cells in state ${X}_{1}$, indicated as ${n}_{1}$, is conserved. It is therefore sufficient to determine the statistics of ${n}_{2}$, defined by the master equation for $P({n}_{2})$, the probability of having ${n}_{2}$ cells in state ${X}_{2}$, provided that there are ${n}_{1}$ cells in state ${X}_{1}$. The master equation is given by,
also written as,
in which $r({n}_{2})=\gamma {n}_{2}$ and $g({n}_{2})={\lambda}_{1}{n}_{1}+{\lambda}_{2}{n}_{2}$. Considering that we are interested in clonal dynamics, meaning that we start from a single stem cell, ${n}_{1}$ is equal to one.
In this simple case, the steady state distribution ${P}^{*}({n}_{2})$, corresponding to the solution of $dP({n}_{2})/dt=0$, can be analytically derived. Defining the net flux between states ${n}_{2}$ and ${n}_{2}1$ as
and considering that ${I}_{{n}_{2}+1}={I}_{{n}_{2}}$ for every ${n}_{2}$, it follows that ${I}_{{n}_{2}}={I}_{0}=r(0){P}^{*}(0)g(1){P}^{*}(1)=0$, which means that
where ${P}^{*}(0)$ is the steady state probability of having 0 cells in state ${X}_{2}$. Finally, by applying the conservation of the total probability, ${\sum}_{{n}_{2}=0}^{\mathrm{\infty}}{P}^{*}({n}_{2})=1$, and rearranging the terms we obtain,
In the main text, we defined the dimensionless parameters ${\widehat{\lambda}}_{1}={\lambda}_{1}/\gamma $ and ${\widehat{\lambda}}_{2}={\lambda}_{2}/\gamma $, representing the rescaled division rates for cells in state ${X}_{1}$ and ${X}_{2}$, respectively. For clarity and readability, in this section, we simplify the notation using $p={\widehat{\lambda}}_{1}$ and $q={\widehat{\lambda}}_{2}$. Equation 19 is then rewritten as,
It is noted that while p varies between 0 and $\mathrm{\infty}$, q is defined between 0 and 1.
The mean number of cells in each state, indicated respectively as ${\overline{n}}_{1}$ and ${\overline{n}}_{2}$, satisfies the system of ODEs
Based on this, the steady state average number of cells is
When the mean number of cells in state ${X}_{2}$ is sufficiently large, that is, for large p or in case q is close to one, the discrete distribution given by Equation 20, can be approximated by a continuous probability density function ${P}^{*}({x}_{2})$, given by,
in which ${x}_{2}={n}_{2}/{\overline{n}}_{2}^{*}$. We note that Equation 23 corresponds to Equation 11 in the main text.
To better understand the distribution for different values of the parameters p and q, the limit behavior are analyzed below.
1. $\mathbf{\mathbf{q}}\to \mathrm{}$ (i.e. ${\widehat{\lambda}}_{2}\to 0)$
When $q\to 0$, Equation 20 can be simplified considering that
and
Thus, the distribution results in
that is a Poisson distribution with mean equal to p. This agrees with what we were expecting considering that when $q=0$ the model is reduced to the IA model for which the distribution in ${n}_{2}$ is known to be poissonian.
Additionally, for large mean number of cells, which are obtained for large p (when $q=0$, then ${\overline{n}}_{2}^{*}=p$), the Poisson distribution tends to a Normal distribution with mean and variance equal to p. Therefore,
Rescaling the distribution, and considering ${x}_{2}={n}_{2}/{\overline{n}}_{2}^{*}$, results in
that is a Normal distribution with unitary mean and variance equal to $1/p$.
2. $\mathbf{\mathbf{q}}\to \mathrm{\U0001d7cf}$ (i.e. ${\widehat{\lambda}}_{2}\to 1)$
For $q\to 1$ the steady state mean number of cells ${\overline{n}}_{2}^{*}\to \mathrm{\infty}$ and Equation 23 holds. This equation can be rewritten as,
If the Stirling’s approximation is applied
we obtain,
Considering now that
it follows that
that is a Gamma distribution with unitary mean and shape parameter given by p. Importantly, the Gamma distribution for $p\to \mathrm{\infty}$ tends to a Normal distribution with unitary mean and variance 1/p. For $p=1$, it corresponds instead to an Exponential distribution with unitary mean.
3. $\mathbf{\mathbf{p}}\to \mathrm{\infty}$ (i.e. ${\widehat{\lambda}}_{1}\to \mathrm{\infty})$
When p is large, the mean number of cells is large for any value of q. Thus, Equation 32 is valid. By applying the Stirling’s approximation also to the term $\mathrm{\Gamma}(p/q)$, we obtain,
This expression can be also rewritten as,
Considering now that p is large, then $1/2(\mathrm{log}({x}_{2})+\mathrm{log}(q({x}_{2}1)+1))\ll$$p/(1q)(({x}_{2}1+1/q)$$\mathrm{log}(q({x}_{2}1)+1){x}_{2}\mathrm{log}({x}_{2}))$, so the term on the right can be neglected. Additionally, for ${x}_{2}\to 1$ the following expansions can be applied:
and
Finally, if we consider that
then Equation 36 results in
that is a Normal distribution with unitary mean and variance equal to $1/p$.
Importantly, it is noted that the limiting behavior of ${P}^{*}({x}_{2})$ for $q\to 0$ and $q\to 1$ in case of large p, are both consistent with the results obtained for $p\to \mathrm{\infty}$ and any q. In other words, remembering that $p={\widehat{\lambda}}_{1}$ and $q={\widehat{\lambda}}_{2}$, the steady state distribution for ${\widehat{\lambda}}_{1}\to \mathrm{\infty}$ and any value of ${\widehat{\lambda}}_{2}$ is a Normal distribution of unitary mean and variance equal to $1/{\widehat{\lambda}}_{1}$.
To globally verify these results, numerical simulations of the stochastic process associated with model 14 for different values of ${\widehat{\lambda}}_{1}$ and ${\widehat{\lambda}}_{2}$ were run. The following curves were compared:
Stochastic simulation: distribution at the final simulation time, τ, of the number of cells in state ${X}_{2}$. The final time was chosen here as $\tau =20/{\alpha}_{\mathrm{min}}$, where ${\alpha}_{\mathrm{min}}=\mathrm{min}({\lambda}_{1},{\lambda}_{2},\gamma )$; this value is well representative of a steady state condition. Furthermore, the process rates considered are based on a unitary γ (i.e. ${\lambda}_{1}={\widehat{\lambda}}_{1}$, ${\lambda}_{2}={\widehat{\lambda}}_{2}$ and $\gamma =1$). It is noted that the time unit is arbitrary and therefore omitted.
Analytic distribution: based on Equations 20, for low mean values, and 23, for large mean values.
Approximate distributions: Poisson, Gamma and Normal distributions respectively given by Equations 27, 34 and 40.
The tested parameters ${\widehat{\lambda}}_{1}$ and ${\widehat{\lambda}}_{2}$ are graphically shown in Appendix 1—figure 5 a contour map showing the expected steady state mean number of cells ${\overline{n}}_{2}^{*}$ over the $({\widehat{\lambda}}_{1},{\widehat{\lambda}}_{2})$parameter plane. The curves from the numerical simulations and the corresponding exact and approximated solutions are shown in Appendix 1—figure 6, Appendix 1—figure 7 and Appendix 1—figure 8: the tested conditions are divided into three groups (one figure each) representing the limiting behaviors discussed above. Generally, analytical and numerical results agree very well. This also demonstrates that GIA models can show both peaked and nonpeaked distributions, depending on the model parameters.
Approximation of generic GIA models
As shown in the main text, a generic GIA model can be expressed in terms of the compartments $\mathcal{\mathcal{R}}$ and $\mathcal{\mathcal{C}}$ (Equation 9 in the main text). We note that the the GIA^{0} model discussed in the previous section corresponds to the general compartment dynamics of GIA models, Equation 9, main text, if the dynamics of compartments are assumed to be Markovian. Thus, we can treat the GIA^{0} model as a Markovian approximation of generic GIA models. In this section, we test this approximation numerically.
To this end, we first wish to relate the effective (nonMarkovian) rates ${\lambda}_{R,C}$ and γ_{C} of a generic GIA model to the rates of the Markovian approximation, the GIA^{0} model. We refer to this model – the GIA^{0} model matched to the effective rates of a particular more complex GIA model – as the equivalent model to the latter. The equivalent rates ${\lambda}_{R}$, ${\lambda}_{C}$ and γ_{C} are computed considering the same steady state condition in terms of mean number of cells. To this aim, we rewrite the dynamics of mean cell numbers, Equation 7 in the main text, in block form as,
in which ${\overline{\mathit{\bm{n}}}}_{R,C}$ denote the vectors of mean cell numbers of states restricted to compartments $\mathcal{R},\mathcal{C}$, respectively, and ${n}_{\mathrm{\varnothing}}$ the number of lost cells (not considered for total cell numbers and homeostasis condition). It is noted that ${A}_{RC}=\mathrm{}$, since there cannot be links from $\mathcal{\mathcal{C}}$ to $\mathcal{\mathcal{R}}$. Also ${A}_{\mathrm{\varnothing}R}=\mathrm{}$ as we do not consider loss from $\mathcal{\mathcal{R}}$ (see main text for the arguments).
Thus, summing up all the components in each compartment, ${\overline{n}}_{R}={\sum}_{i}{({\overline{\mathbf{\mathbf{n}}}}_{R})}_{i}=1$ and ${\overline{n}}_{C}={\sum}_{i}{({\overline{\mathbf{\mathbf{n}}}}_{C})}_{i}$, results in
The equivalent parameters are then estimated from the steady state condition ${\overline{\mathbf{\mathbf{n}}}}_{X}^{*}$ and ${\overline{n}}_{X}^{*}$, for $X=R,C,\mathrm{\varnothing}$, as,
The applicability of this approximation was evaluated by comparing the clone size distribution obtained from the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text) with that from the corresponding equivalent GIA^{0} model with parameters ${\widehat{\lambda}}_{1}={\widehat{\lambda}}_{R}={\lambda}_{R}/{\gamma}_{C}$ and ${\widehat{\lambda}}_{2}={\widehat{\lambda}}_{C}={\lambda}_{C}/{\gamma}_{C}$. The values of ${\widehat{\lambda}}_{1}$ and ${\widehat{\lambda}}_{2}$ for all the GIA random models are shown in Appendix 1—figure 9 in the contour map of the expected mean number of cells in $\mathcal{\mathcal{C}}$ (in compartment $\mathcal{\mathcal{R}}$ there is always one single cell). In general, ${\widehat{\lambda}}_{1}$ remains below five and ${\widehat{\lambda}}_{2}$ is spread between zero and one. As measure of the error of the equivalent model, $\u03f5$, we choose the maximum difference between the distributions of a particular random GIA model and that of the corresponding equivalent model, relative to the peak of the distribution of the random model. For low mean cell numbers, the distribution is compared to Equation 20; for large mean number instead, the rescaled distribution is compared to Equation 23. A threshold on the mean cell number equal to 10 was chosen to distinguish between the two cases. This relative error $\u03f5$ as function of ${\widehat{\lambda}}_{2}$ is presented in Appendix 1—figure 10, where it is evident that large errors are obtained only for large values of this parameters. Some illustrative cases, representative of different value of ${\widehat{\lambda}}_{2}$, were selected and their distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13. The following considerations are made:
Two cases for ${\widehat{\lambda}}_{2}<0.2$ are presented in Appendix 1—figure 11. In these cases, the distribution obtained from the random models agrees with the analytic solution from the equivalent model, which in turn is well approximated by a Poisson distribution. As expected, larger deviations between the equivalent model’s analytic solution and the approximation are noted for increasing values of ${\widehat{\lambda}}_{2}$. In general, all the models in this range are well approximated by the equivalent model.
The two cases presented in Appendix 1—figure 12 have ${\widehat{\lambda}}_{2}>0.8$, for which the Gamma distribution is an approximation of the equivalent model’s analytic solution. The distribution in some cases (see for instance the top figure), presents some deviations with respect to the equivalent model. However, globally a good agreement is obtained in most of the cases (failing ratio, based on a 0.5 maximum error is 21.7%).
Two cases in an intermediate range $0.2<{\widehat{\lambda}}_{2}<0.8$ are shown in Appendix 1—figure 13. Again, the equivalent model’s analytic solution is well representative of the distribution (failing ratio, based on a 0.5 maximum error is 3.2%). It is noted that for such values of ${\widehat{\lambda}}_{2}$ an approximation of the equivalent model analytic solution is not available.
Thus, in most of the tested cases the equivalent model is able to catch the behavior of a generic random GIA model, and thus represents a good approximation (global failing ratio, based on a 0.5 maximum error is 6%). In the cases where the equivalent model does not yield a good approximation, the internal structure of the $\mathcal{\mathcal{R}}$ and $\mathcal{\mathcal{C}}$ compartments become relevant and subsequent events that affect ${n}_{R}$ and ${n}_{C}$ become dependent on each other, and thus are nonMarkovian.
GIA model for large $\hat{\lambda}}_{R$
To test the behavior of a generic GIA model in case of large ${\widehat{\lambda}}_{R}$, the GIA random models (generated as described in 'Generation of random models' and analyzed in the main text) were modified by changing the process rates associated to the renewing compartment to achieve ${\widehat{\lambda}}_{R}=30$. To this aim, considering that infinite solutions are possible, we applied a global search method, and more specifically a Genetic Algorithm (Goldberg, 1989). We therefore setup an optimization problem, where the process parameters are the optimization variables and the cost function is the error of the current ${\widehat{\lambda}}_{R}$ with respect to the target.
The envelope of curves obtained in all the random models and some illustrative profiles are shown in Appendix 1—figure 14. A reference Normal distribution, characterized by unitary mean and variance equal to $1/{\widehat{\lambda}}_{R}=1/30$ is also reported: this curve corresponds to the distribution expected in the equivalent model for which ${\widehat{\lambda}}_{1}={\widehat{\lambda}}_{R}$. Deviations become relevant, when the internal structure of compartments in a random model leads to subsequent events that are not independent from each other. These effects alter the variance of the Normal distribution. In fact, Figure 4 in the main text is based on the same simulation results, but in this case the rescaling is done considering both the mean number of cells and its variance (a Normal distribution is a twoparameter distribution).
GIA^{B} test case: bimodal distribution
In the previous subsection we increased ${\lambda}_{R}$ in a way which assures that other parameters within $\mathcal{\mathcal{R}}$ stay of the same order of magnitude. Here, we address the question what happens if some parameters within $\mathcal{\mathcal{R}}$ are much smaller than parameters of $\mathcal{\mathcal{C}}$, such as γ_{C}. For that purpose, we study another simple GIA model, let us call it GIA^{B}, as a modification of the GIA^{0} test model defined by 14. In the GIA^{B} model the renewing compartment is composed by two states ${X}_{1}$ and ${X}_{2}$, instead of only one. Cells in these states divide asymmetrically (i.e. one daughter cell remains within the renewing compartment while the other enters the committed compartment) or change state between ${X}_{1}$ and ${X}_{2}$ (cell state switching) while still remaining within the renewing compartment. The committed compartment of the system is composed just by a single state, ${X}_{3}$, and cells in this state either duplicate or die (as the previous state ${X}_{2}$ in Equation 14). This corresponds to the model
In this model, the effective parameters as defined in 'Approximation of generic GIA models', ${\lambda}_{R}={\lambda}_{1}{P}_{1}^{*}+{\lambda}_{2}{P}_{2}^{*}$, where ${P}_{i}^{*}=\frac{{\omega}_{ji}}{{\omega}_{ij}+{\omega}_{ji}}$, $i,j=1,2,i\ne j$, and ${\gamma}_{C}=\gamma $. As before, we define the nondimensionalized parameters ${\widehat{\lambda}}_{R}={\lambda}_{R}/{\gamma}_{C}$ and here we also define $\widehat{\omega}={\omega}_{12}/{\gamma}_{C}$, and further the parameter ratios $a={\lambda}_{1}/{\lambda}_{2}$ and $b={\omega}_{12}/{\omega}_{21}$. In the following, we test this model for different values of $a$ and $\widehat{\omega}$ as reported in Appendix 1—table 2, while fixing ${\widehat{\lambda}}_{R}=30$, which is our main scaling parameter, as well as ${\widehat{\lambda}}_{C}=0$ and $b=1$.
The rescaled distribution of the number of cells in the committed compartment $\mathcal{\mathcal{C}}$ (i.e. in state ${X}_{3}$), ${n}_{C}$, obtained at the final simulation time τ, is shown in Appendix 1—figure 15. A value of τ equal to $20/{\alpha}_{\mathrm{min}}$ (where ${\alpha}_{\mathrm{min}}$ is the minimum of all rate parameters) was chosen to assure that the steady state is reached. Considering first the test cases GIA_{B}#1 and GIA_{B}#2 according to Appendix 1—table 2, which are characterized by $a=1$ (i.e. there is no difference in the division timescales for the two renewing states), they both lead to a Normal distribution, independently on the value assumed by $\widehat{\omega}$. Test cases GIA_{B}#3 to GIA_{B}#7 instead are all characterized by $a=10$, and different orders of magnitude for $\widehat{\omega}$ are tested. The distribution in these cases is Normal until $\widehat{\omega}\ge {\widehat{\lambda}}_{R}/10$ (see cases GIA_{B}#3 to GIA_{B}#5); when $\widehat{\omega}$ is significantly lower than ${\widehat{\lambda}}_{R}$, then bimodality emerges (see cases GIA_{B}#6 and GIA_{B}#7). Looking at the extreme case, GIA_{B}#7, cells in each renewing state, if analyzed independently, would result in a Poisson distribution in the committed compartment with different mean values (low for the slowdividing state and large for the fastdividing one). Thus, globally the distribution is in line with a bimodal distribution computed as
in which $\beta $ is the mixing parameter, computed as,
and the parameters ${\widehat{\lambda}}_{R}^{(i)}$ and ${\overline{n}}_{i}$ for $i=1,2$ correspond to the parameter ${\widehat{\lambda}}_{R}$ and to the mean number of cells of a system in which the renewing compartment would be composed just by state X_{i}. The total mean number of cells is instead indicated by $\overline{n}$. The bimodal distribution given by Equation 45 is indicated as a black dasheddotted line in Appendix 1—figure 15.
Analysis of the Generalized Population Asymmetry model
In the main text, it is shown that GPA models predict asymptotically, for large times t, the same rescaled clone size distribution, that is, an Exponential distribution of unitary mean.
In Appendix 1—figure 16, the 50%tile distribution of all the GPA models analyzed is shown at different levels of extinction (which are related to the different time points), showing a gradual convergence to the expected Exponential distribution.
Thus, the Markov approximation to all GPA models, Equation 12 in the main text (the equivalent model of GPA models), becomes accurate for sufficiently large t and no significant deviations are observed. This also means that for large t, the distribution is independent of the choice of parameters, since only the mean value of surviving clones, ${\overline{n}}_{s}$, depends on parameters, which however, does not affect the rescaled distribution in terms of $x=\frac{n}{{\overline{n}}_{s}}$. We can therefore abstain from an extended study of different parameter regimes. This is in contrast to the GIA model class where distributions depend sensitively in the choice of parameters if we are not in the scaling regime of large ${\widehat{\lambda}}_{R}$, and the nonMarkovian nature of GIA models can become relevant, as we showed in the previous section.
Asymptotic clone size distributions: Mathematical analysis
In the previous two sections, we studied numerically how a Markovian representation can approximate general cell fate models (GIA and GPA) models. Here, we study from an analytical view point how generic GIA and GPA models converge to the respective limiting distributions, for large time t (GPA models) and large ${\widehat{\lambda}}_{R}$ (GIA models).
Similar to 'Approximation of generic GIA models', we define ${\mathit{\bm{n}}}_{R}$ and ${\mathit{\bm{n}}}_{C}$ as the cell number vectors (here: actual cell numbers of the stochastic model, not mean cell numbers) restricted to the states of compartments $\mathcal{\mathcal{R}}$ and $\mathcal{\mathcal{C}}$, respectively. We further define the accumulated cell numbers ${n}_{R}={\sum}_{i}{({\mathit{\bm{n}}}_{R})}_{i}$ and ${n}_{C}={\sum}_{i}{({\mathit{\bm{n}}}_{C})}_{i}$ in $\mathcal{\mathcal{R}}$ and $\mathcal{\mathcal{C}}$, respectively. Considering ${n}_{R}$ and ${n}_{C}$ as observables of our compartment model, this corresponds to a Hidden Markov Model in that the dynamics of the observables are not Markovian, yet they are entirely determined by a set of states which follow a Markov process.
General dynamics of Ccells for GIA and GPA models
Comments on the effective rate parameter ${\lambda}_{R}$
For general GIA and GPA models in the compartment representation of Equation 9, main text, the effective rate parameter ${\lambda}_{R}$ (i.e. the frequency of cell divisions in $\mathcal{\mathcal{R}}$ per cell), is defined similar as in 'Approximation of generic GIA Models', yet, here we take into account that ${\lambda}_{R}$ can depend on time via the – not necessarily stationary – distribution of cells within $\mathcal{\mathcal{R}}$ (since the process is nonMarkovian). Hence, in these more general terms, we define ${\lambda}_{R}(t)={\sum}_{i\in \mathcal{R}}{\lambda}_{i}{P}_{i}^{R}(t)$ where ${P}_{i}^{R}(t)=\frac{{\overline{n}}_{i}(t)}{{\overline{n}}_{R}(t)}$ is the probability of a single cell to be in state i at time t. ${P}_{i}^{R}(t)$ may variate after each event $E$, as the conditional probability ${{P}^{R}}_{E}$, provided that an event E has just occurred, differs from the stationary state distribution.
In homeostasis, where the number of Rcells must on average stay constant, ${\lambda}_{R}$ is also the rate, per Rcell, at which Ccells are created from Rcells, via events $R\to R+C,R\to C+C$, or direct transition, $R\to C$. Thus, the total rate of Ccells being created from the Rcells by such events – let us call them $RC$events – is ${\lambda}_{R}{n}_{R}$. While the nonMarkovian nature of the process does not assure that such events are distributed exponentially, we can state that, by definition, the number of such creation events in a time period $\mathrm{\Delta}t$, ${N}_{RC}$, has mean value $\u27e8{N}_{RC}(\mathrm{\Delta}t)\u27e9={\int}_{0}^{\mathrm{\Delta}t}{\lambda}_{R}(t){n}_{R}(t)\mathit{d}t$.
While, ${\lambda}_{R}(t)$ may in principle depend on time, we note that when internal rates of $\mathcal{\mathcal{R}}$ are fast compared to the time period $\mathrm{\Delta}t$ above (an internal rate of $\mathcal{\mathcal{R}}$ is a rate ω_{ij} where states i,j are both in $\mathcal{\mathcal{R}}$), then ${\lambda}_{R}(t)$ fluctuates quickly and we can make an adiabatic approximation, replacing ${\lambda}_{R}(t)$ by its average ${\overline{\lambda}}_{R}={\sum}_{i\in \mathcal{R}}{\lambda}_{i}{P}_{i}^{R}$, where ${P}_{i}^{R*}=\frac{{\overline{n}}_{i}^{*}}{{\overline{n}}_{R}^{*}}$ is the steady state value of ${P}_{i}^{R}(t)$ (this corresponds for GIA models to the definition of ${\lambda}_{R}$ in 'Approximation of generic GIA models'). This is fulfilled in our simulations of large ${\widehat{\lambda}}_{R}$, since internal rates, such as $\widehat{\omega}$ defined in 'GIA^{B} test case: bimodal distribution', scale with ${\widehat{\lambda}}_{R}$ when ${\lambda}_{R}\to \mathrm{\infty}$ (see 'GIA model for large'). Hence, the time scales of internal rates are substantially smaller than the relevant time scale $\mathrm{\Delta}t=1/{\overline{\gamma}}_{C}$, the lifetime of generated Ccells. Therefore, when comparing with simulation results, it is generally appropriate to assume that ${\lambda}_{R}(t)\approx {\overline{\lambda}}_{R}$ is constant. In the following subsection, we will discuss this case. The case when internal rates are slower than the time scale γ_{C} is discussed in the subsequent subsection.
Asymptotic distributions of Ccells
Each Ccell created by an $RC$event initiates a subclone within $\mathcal{\mathcal{C}}$, defined through its progeny, which then follows the dynamics of $\mathcal{\mathcal{C}}$. Such subclones evolve independently of each other (a defining characteristic of branching processes [Haccou et al., 2005]). Let us call the number of cells of a subclone created by an $RC$event at time ${t}_{i}$, which evolves over time t, as ${\xi}_{i}(t)$. We denote two $RC$events which happen at the same time via a symmetric division of type $R\to C+C$ by different indices i and $i+1$, yet with ${t}_{i}={t}_{i+1}$. Therefore, the total number of cells in $\mathcal{\mathcal{C}}$ is the sum of independent random numbers ${\xi}_{i}$,
Note that the random numbers ${\xi}_{i}(t)$ are not identically distributed, since their statistics depend on the time point of the ith $RC$event. In particular, the mean value, ${\overline{\xi}}_{i}(t{t}_{i})=\u27e8{\xi}_{i}(t)\u27e9$ and variance ${\sigma}_{\xi}^{2}(t{t}_{i})=\u27e8{({\xi}_{i}(t){\overline{\xi}}_{i})}^{2}\u27e9$ depend on the time passed since the respective $RC$event at time ${t}_{i}$. Thus, we cannot apply the central limit theorem in its original form to the sum of random numbers, Equation 47. However, a variation of the central limit theorem states that sums of nonidentically distributed random variables, ${\sum}_{i}{\xi}_{i}$, converge to normally distributed random variables, if mean and variance of ${\xi}_{i}$ are finite, and they fulfill Lindeberg’s condition (Billingsley, 1995).
The (strict) Lindeberg’s condition is said to be fulfilled for a sequence of random numbers ${\xi}_{i}$,$i=1,\mathrm{\dots},N$, if
where ${\sigma}_{i}^{2}=\u27e8{({\xi}_{i}{\overline{\xi}}_{i})}^{2}\u27e9$ and ${\sigma}_{N}^{2}={\sum}_{i=1}^{N}{\sigma}_{i}^{2}$ . If this is fulfilled, then ${n}_{C}={\sum}_{i=1}^{N}{\xi}_{i}$ converges for $N\to \mathrm{\infty}$ to a random variable that is normal distributed.
To show that the ${\xi}_{i}$ fulfill Lindeberg’s condition, we note that ${\xi}_{i}(t{t}_{i})$ follow a subcritical multitype branching process, for which ${\overline{\xi}}_{i}(t)\to 0$ for $t\to \mathrm{\infty}$, which is assured since the eigenvalues of the adjacency matrix of $\mathcal{\mathcal{C}}$ are all negative (since dominant eigenvalues of all SCCs in $\mathcal{\mathcal{C}}$ are negative [astrom_murray_feedback_book]). For multitype branching processes, the variance σ^{2} is proportional to the mean value, hence ${\sigma}_{i}^{2}(t{t}_{i})\sim \overline{\xi}(t{t}_{i})$. Therefore, ${\sigma}_{i}^{2}\to 0$ for $t\to \mathrm{\infty}$, hence it is bounded, i.e there exists $C\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}>\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}0$ such that ${\sigma}_{i}^{2}(t)\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}C$ for all t. Furthermore, since initially, at $t={t}_{i}$, ${\overline{\xi}}_{i}({t}_{i})=1$, we know that there exist ${t}_{1}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}>\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}0$ and $\delta \phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}>\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}0$ such that ${\overline{\xi}}_{i}(t)\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}>\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\delta$ for $t{t}_{i}<{t}_{1}$. Now we recall that, since here we assume the validity of the adiabatic approximation discussed in the previous subsection, the number of $RC$events within a time period $\mathrm{\Delta}t$ is ${N}_{RC}(\mathrm{\Delta}t)\sim {\lambda}_{R}{\int}_{0}^{\mathrm{\Delta}t}{n}_{R}({t}^{\prime})\mathit{d}{t}^{\prime}$. For generic ${\lambda}_{R}$, ${N}_{RC}$ is finite and thus is ${\sigma}_{N}$, since all ${\sigma}_{i}(t)\to 0$ for large t. However, for ${\lambda}_{R}\to \mathrm{\infty}$ or ${n}_{R}\to \mathrm{\infty}$, we get that ${N}_{RC}({t}_{1})\sim {\overline{\lambda}}_{R}{n}_{R}\to \mathrm{\infty}$ and thus $\sigma}_{N}^{2}=\sum _{i=1}^{{N}_{RC}}{\sigma}_{i}^{2}(t)\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}>\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}{N}_{RC}\delta \to \mathrm{\infty$. On the other hand, all ${\sigma}_{i}^{2}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}C$, which means that all $\frac{{\sigma}_{i}^{2}}{{\sigma}_{N}^{2}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\frac{C}{{\sigma}_{N}^{2}}\to 0$ for ${\lambda}_{R}\to \mathrm{\infty}$ or ${n}_{R}\to \mathrm{\infty}$. Hence, Lindeberg’s condition is fulfilled if ${\lambda}_{R}\to \mathrm{\infty}$ or ${n}_{R}\to \mathrm{\infty}$ and thus, ${n}_{C}$ becomes normally distributed,
The variance scales with ${n}_{C}$ since variances of independent random numbers add linearly and each ${\sigma}_{i}^{2}\sim {\overline{\xi}}_{i}$. The exact value of ${\overline{n}}_{C}$ and the prefactor of the variance of ${n}_{C}$ in this limit depend on the (nonMarkovian) model details.
Deviations from a normal distribution in the asymptotic case
The arguments leading to Equation 49 hold for large ${\widehat{\lambda}}_{R}$ if the internal rates of $\mathcal{\mathcal{R}}$ are comparable to ${\overline{\lambda}}_{R}={\sum}_{i}{\lambda}_{i}\frac{{\overline{n}}_{i}^{*}}{{\overline{n}}_{R}^{*}}$, which is satisfied for all cases we sampled randomly for numerical simulations, see 'GIA model for large'. However, if internal rates of $\mathcal{\mathcal{R}}$ are much smaller than ${\lambda}_{R}$, then the adiabatic approximation ${P}_{i}^{R}(t)\approx \frac{{\overline{n}}_{i}^{*}}{{\overline{n}}_{R}^{*}}$ does not apply and ${\lambda}_{R}(t)$ may vary slower than the time scale $1/{\overline{\gamma}}_{C}$. For example, consider a GIA model in which $\mathcal{\mathcal{R}}$ can be decomposed into two subcompartments, say ${\mathcal{R}}_{1}$ and ${\mathcal{R}}_{2}$, whereby any rates ${\omega}_{ij},{\omega}_{ji}$ with $i\in {\mathcal{R}}_{1},j\in {\mathcal{R}}_{2}$ have ${\omega}_{ij},{\omega}_{ji}\ll {\overline{\lambda}}_{R}$, as the example discussed in 'GIA^{B} test case: bimodal distribution'. Then, the single cell in $\mathcal{\mathcal{R}}$ (note that always ${n}_{R}=1$ in GIA models) may spend long time periods in ${\mathcal{R}}_{1}$ and ${\mathcal{R}}_{2}$ respectively. Now, if ${\overline{\lambda}}_{{R}_{1}}={\sum}_{i\in {\mathcal{R}}_{1}}{\lambda}_{i}\frac{{\overline{n}}_{i}}{{\overline{n}}_{{R}_{1}}}\ne {\sum}_{i\in {\mathcal{R}}_{2}}{\lambda}_{i}\frac{{\overline{n}}_{i}}{{\overline{n}}_{{R}_{2}}}={\overline{\lambda}}_{{R}_{2}}$, then, for time periods exceeding $1/{\overline{\gamma}}_{C}$, the effective asymmetric division rates are ${\overline{\lambda}}_{{R}_{1}}$ and ${\overline{\lambda}}_{{R}_{2}}$ respectively, and during these time periods the distribution of ${n}_{C}$ cells has mean ${\overline{n}}_{C}^{(1)}\sim {\overline{\lambda}}_{{R}_{1}}$ and ${\overline{n}}_{C}^{(2)}\sim {\overline{\lambda}}_{{R}_{2}}$ respectively. Hence, the total clone size distribution will be the mix of two Normal distributions with mean ${\overline{n}}_{C}^{(1)}$ and ${\overline{n}}_{C}^{(2)}$, respectively, that is, a bimodal distribution. This scenario is discussed in 'GIA^{B} test case: bimodal distribution', for the specific case of two states in $\mathcal{\mathcal{R}}$.
GIA models
In GIA models, the number of Rcells is conserved, and in particular, for clones, we have ${n}_{R}=1$ for all times. Hence, the rate of $RC$events is simply ${\lambda}_{R}$. Now, if internal rates are fast and ${\lambda}_{R}\to \mathrm{\infty}$, then ${n}_{C}$ becomes normally distributed, as argued above. Hence, also $n={n}_{R}+{n}_{C}=1+{n}_{C}$ follows a Normal distribution, with mean ${n}_{C}+1$ instead.
Nonetheless, if internal rates are less than γ_{C} then bimodal distributions may be observed, as discussed in 'GIA^{B} test case: bimodal distribution'.
GPA models
The dynamics of GPA models read, in compartment formulation,
Since the dynamics of Rcells do not depend on Ccells, we can first consider the formers’ dynamics separately. In homeostasis, where ${\lambda}_{R}{r}_{RR}={\lambda}_{R}{r}_{CC}+{\omega}_{RC}$, we have thus for Rcells,
This is a simple continuous time branching process with two offspring; yet it is nonMarkovian: subsequent events may be correlated, since each event imbalances the internal distribution ${P}_{i}^{R}$ of cells in the compartment $\mathcal{\mathcal{R}}$. Yet, as for Ccells, we can write the number of Rcells as a sum of independent (but not identically distributed) random variables. Let us consider for each Rcells, born at time ${t}_{i}$, the random variable ${\xi}_{i}^{R}$ describing its 'survival' state, that is, ${\xi}_{i}^{R}=1$ if that cell is still in $\mathcal{\mathcal{R}}$, and ${\xi}_{i}^{R}=0$ if that cell has left $\mathcal{\mathcal{R}}$ via symmetric differentiation, $R\to C+C$ or direct transition, $R\to C$. Essentially, the random numbers ${\xi}_{i}^{R}$ are the ‘branches’ of the branching process. Since these events do not depend on other cells, the random numbers ${\xi}_{i}^{R}$ are independent of each other, and thus,
is a sum of independent, not identically distributed random variables. Here,${N}_{b}(t)$ is the total number of birth events occurring at rate ${\lambda}_{R}{r}_{RR}{n}_{R}$, $R\to R+R$, up to time t. Since ${\xi}_{i}^{R}(t)\le 1$ and ${\xi}_{i}^{R}(t={t}_{i})=1$, we can argue analogue to above for Equation 49 that the sequence of ${\xi}_{i}^{R}$ fulfills Lindeberg’s condition and thus ${n}_{R}$ converges to a Normal distribution, whereby the mean value ${\overline{n}}_{R}=1$ (since due to homeostasis the mean number is constant and the initial condition is ${n}_{R}(t=0)=1$). Hence, the probability to have ${n}_{R}$ cells in $\mathcal{\mathcal{R}}$ is
However, here, the variance ${\sigma}_{R}^{2}$ is a random variable itself: Since the ${\xi}_{i}^{R}$ are independent, ${\sigma}_{R}^{2}={\sum}_{i=1}^{{N}_{b}(t)}{\sigma}_{i}^{2}$, where ${\sigma}_{i}^{2}=\u27e8{({\xi}_{i}^{R}{\overline{\xi}}^{R})}^{2}\u27e9$, and where ${N}_{b}(t)$ is a random variable. The random numbers ${\xi}_{i}^{R}$ can only have the values ${\xi}_{i}=1$ or ${\xi}_{i}^{R}=0$ and they follow a simple death process, so for ${\xi}^{R}=0$, it must be ${\sigma}_{i}^{2}=0$, while for ${\xi}_{i}^{R}=1$, the variance must be finite, let’s say, ${\sigma}_{i}^{2}=\beta (t)>0$ where $\beta $ can in principle depend on time, yet is not known (it depends on the nonMarkovian details of the model). Hence,
since the number of summands with ${\xi}_{i}^{R}=1$ is the number of surviving Rcells, that is, ${n}_{R}$. Substituting ${\sigma}_{R}^{2}=\beta (t){n}_{R}$ into Equation 54 gives,
This is an Exponential distribution with mean value ${\overline{n}}_{R}=\u27e8{n}_{R}\u27e9=2\beta (t)$. Finally, when we enforce normalisation of the probability distribution, we get,
Eventually, we also have to 'add' the Ccells. Since for $t\gg 1$, also ${n}_{R}\gg 1$, individual events ${n}_{R}\to {n}_{R}\pm 1$ do not significantly affect the distribution of Rcells, ${P}_{i}^{R}=\frac{{\overline{n}}_{i}}{{\overline{n}}_{R}}$ (in contrast to the case of ${n}_{R}=1$ for GIA models), and hence we can assume the adiabatic approximation discussed above, where ${P}_{i}^{R}\approx {P}_{i}^{R*}$ and thus ${\lambda}_{R}\approx const$. Therefore, Ccells are distributed according to a Normal distribution with mean ${\overline{n}}_{C}$ and variance ${\sigma}_{{n}_{2}}^{2}\sim {\overline{n}}_{C}\sim {\lambda}_{R}{n}_{R}$. As argued in the main text, the mean value of ${n}_{R}$, conditionend on survival of a clone, ${n}_{R}>0$, must grow over time, without bound if $t\to \mathrm{\infty}$. Therefore, we can generally assume that ${n}_{R}\gg 1$, and hence both ${\overline{n}}_{C}\sim {n}_{R}\to \mathrm{\infty}$ and ${\sigma}_{C}^{2}\sim {n}_{R}\to \mathrm{\infty}$. However, if we express the clone size in form of a rescaled variable $x=\frac{n}{{\overline{n}}_{s}}$ (${\overline{n}}_{s}$ is the mean of surviving clones) we can write $x={x}_{R}+{x}_{C}$ with ${x}_{R}=\frac{{n}_{R}}{{\overline{n}}_{s}}$ and ${x}_{C}=\frac{{n}_{C}}{{\overline{n}}_{s}}$, and note that the rescaled standard width of the distribution of ${x}_{C}$, ${\sigma}_{{x}_{C}}=\frac{{\sigma}_{C}}{\overline{n}}\sim \frac{\sqrt{{\overline{n}}_{C}}}{{\overline{n}}_{R}+{\overline{n}}_{C}}\sim \frac{\sqrt{{n}_{R}}}{{n}_{R}}$ vanishes for $t\to \mathrm{\infty}$. Therefore, ${x}_{C}$ is effectively a constant in that limit, ${x}_{C}\approx {\overline{x}}_{C}\propto {x}_{R}$. Hence, also $x={x}_{R}+{x}_{C}\propto {x}_{R}$ and thus, the rescaled clone size, $x=\frac{n}{{\overline{n}}_{s}}$, is distributed according to an Exponential distribution (here: a probability density function) with unit mean, and after renormalisation, we get that
This distribution is indeed observed in all our simulations of GPA models for large t.
Data availability
All numerical data used for figures is produced by programme code, which can be found on Github, under https://github.com/cp4u17/simCellState (copy archived at https://github.com/elifesciencespublications/simCellState).
References

BookHandbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, 32United States Department of Commerce, National Bureau of Standards.

Exact solution of a twotype branching process: clone size distribution in cell division kineticsJournal of Statistical Mechanics: Theory and Experiment 2010:P07028.https://doi.org/10.1088/17425468/2010/07/P07028

A ”dynamic” proof of the FrobenuisPerron theorem for Metzler matricesProbability Statistics, and Mathematics 1:17–26.https://doi.org/10.1016/B9780120584703.500094

BookFeedback Systems: An Introduction for Scientists and EngineersPrinceton Unversity Press.

Unravelling stem cell dynamics by lineage tracingNature Reviews Molecular Cell Biology 14:489–502.https://doi.org/10.1038/nrm3625

Asymptotics for interacting particle systems onZ ^{d}Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 53:183–196.https://doi.org/10.1007/BF01013315

Stem cell heterogeneity and plasticity in epitheliaCell Stem Cell 16:465–476.https://doi.org/10.1016/j.stem.2015.04.014

Exact stochastic simulation of coupled chemical reactionsThe Journal of Physical Chemistry 81:2340–2361.https://doi.org/10.1021/j100540a008

BookGenetic Algorithms in Search, Optimization and Machine Learning (1st edn)Boston, MA, USA: AddisonWesley Longman Publishing Co., Inc.

BookMathematical Modelling of Clonal Stem Cell DynamicsIn: Cahan P, editors. Computational Stem Cell Biology. New York: Humana. pp. 107–129.https://doi.org/10.1007/9781493992249_5

Stability and steady state of complex cooperative systems: a diakoptic approachRoyal Society Open Science 6:191090.https://doi.org/10.1098/rsos.191090

BookBranching Processes: Variation, Growth, and Extinction of PopulationsCambridge: Cambridge University Press.https://doi.org/10.1017/CBO9780511629136

Handbook of Differential Equations: Ordinary Differential Equations239–257, Monotone dynamical systems, Handbook of Differential Equations: Ordinary Differential Equations, Elsevier, 10.1016/S18745725(05)800069.

Universal patterns of stem cell fate in cycling adult tissuesDevelopment 138:3103–3111.https://doi.org/10.1242/dev.060103

Stem cells: attributes, cycles, spirals, pitfalls and uncertainties. Lessons for and from the cryptDevelopment 110:1001–1020.

Intravital microscopy through an abdominal imaging window reveals a premicrometastasis stage during liver metastasisScience Translational Medicine 4:158ra145.https://doi.org/10.1126/scitranslmed.3004394

Stem cell selfrenewal in intestinal cryptExperimental Cell Research 317:2719–2724.https://doi.org/10.1016/j.yexcr.2011.07.010

Generalized lacZ expression with the ROSA26 cre reporter strainNature Genetics 21:70–71.https://doi.org/10.1038/5007

Plasticity within stem cell hierarchies in mammalian epitheliaTrends in Cell Biology 25:100–108.https://doi.org/10.1016/j.tcb.2014.09.003
Decision letter

Yukiko M YamashitaReviewing Editor; University of Michigan, United States

Naama BarkaiSenior Editor; Weizmann Institute of Science, Israel
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This work builds on several recent highprofile applications of statistical physics to infer the mode of stem cell fate choice in mammalian tissues based on clone size distributions obtained via cell lineage tracing. The work makes a solid contribution especially in clarifying how a broad class of models conceptually and quantitatively fits with the population asymmetry or the invariant asymmetry model. Such universality argument may have been already taken for granted by physicists/mathematicians but may have not been obvious to biologists who have specific systems in mind that have much richer cell state structure than the original simplified models.
Through a detailed mathematical analysis, the authors demonstrate that this large set of models can, in homeostatic tissues, be decomposed into two 'universality classes' in terms of their predicted asymptotic clone size statistics. This limits our ability to qualitatively distinguish between different models of stem cell fate choice based on clone size data only.
Decision letter after peer review:
Thank you for submitting your article "Universality of clonal dynamics poses fundamental limits to identify stem cell selfrenewal strategies" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.
Summary:
This manuscript provides an important contribution to the field of stem cell biology regarding their 'division modes' (symmetric vs. asymmetric divisions), as detailed in individual comments from reviewers.
Essential revisions:
The reviewers felt that the concerns can be mainly addressed textual changes as detailed in individual comments.
Reviewer #1:
This manuscript describes a general framework of understanding fate decision models, which have been previously proposed in explaining the dynamics of tissue stem cells in homeostasis. The work makes a solid contribution especially in clarifying how a broad class of models conceptually and quantitatively fits with the population asymmetry or the invariant asymmetry model. Such universality argument may have been already taken for granted by physicists/mathematicians but may have not been obvious to biologists who have specific systems in mind that have much richer cell state structure than the original simplified models.
The main criticism I have about the work is that the manuscript may misguide the readers to think that the GIA and GPA models are the only models that have been discussed in explaining the clonal statistics. As one of the authors showed in their previous paper (Greulich and Simons, 2016), the basic statistics such as the clone size distribution will change when adding spatial regulation to the voter model class, which is distinct from what is obtained by GIA or GPA. In fact, for all the tissues that have been studied so far, including the gut (10.1016/j.cell.2010.09.016, LopezGarcia et al., 2010), seminiferous tubule (Klein et al., 2010, Hara et al., 214, and 10.1016/j.stem.2018.11.013), and the skin (10.1016/j.stem.2018.09.005), experiments suggest that the spatial correlation between the fates are nonnegligible and therefore the asymptotic statistics should follow the voter model. I would assume that there needs to be at least a justification in only studying the cases without spatial interactions here, especially if the authors do not know any specific example that nicely corresponds to their model.
Reviewer #2:
This work builds on several recent highprofile applications of statistical physics to infer the mode of stem cell fate choice in mammalian tissues based on clone size distributions obtained via cell lineage tracing.
The authors note that two conceptually distinct 'models' of stem cell fate choice, the Population Asymmetry (PA) and Dynamic Heterogeneity (DH) models, exhibit the same asymptotic clone size distributions. They then consider a more general set of possible models for the proliferation, differentiation and loss of a multitype population of cells in a tissue.
Through a detailed mathematical analysis, the authors demonstrate that this large set of models can, in homeostatic tissues, be decomposed into two 'universality classes' in terms of their predicted asymptotic clone size statistics. This limits our ability to qualitatively distinguish between different models of stem cell fate choice based on clone size data only.
Overall I found this work, though technically complex, to be well motivated and well explained. The authors provide a significant advance to our understanding of how different descriptions of stem cell fate choice may be qualitatively distinguished experimentally (though, as the authors note, we may nevertheless apply parameter inference techniques and model selection to quantitatively distinguish competing models based on given datasets).
https://doi.org/10.7554/eLife.56532.sa1Author response
Reviewer #1:
[…] The main criticism I have about the work is that the manuscript may misguide the readers to think that the GIA and GPA models are the only models that have been discussed in explaining the clonal statistics. As one of the authors showed in their previous paper (Greulich and Simons, 2016), the basic statistics such as the clone size distribution will change when adding spatial regulation to the voter model class, which is distinct from what is obtained by GIA or GPA. In fact, for all the tissues that have been studied so far, including the gut (10.1016/j.cell.2010.09.016, LopezGarcia et al., 2010), seminiferous tubule (Klein et al., 2010, Hara et al., 2014, and 10.1016/j.stem.2018.11.013), and the skin (10.1016/j.stem.2018.09.005), experiments suggest that the spatial correlation between the fates are nonnegligible and therefore the asymptotic statistics should follow the voter model. I would assume that there needs to be at least a justification in only studying the cases without spatial interactions here, especially if the authors do not know any specific example that nicely corresponds to their model.
We thank the reviewer for this remark. The reviewer notes correctly that in most tissues cellextrinsic regelation via cellcell signalling determines fate choices. This can indeed be represented by a (multitype) voter model – a lattice model where lost cells are compensated by dividing neighbours. However, it has been shown that a voter model in any dimension larger than one has the same asymptotic rescaled clone size distribution as the corresponding branching process without interaction, which is the type of model we use (shown rigorously by Bramson and Griffeath, 1980, and discussed in the context of cell fate dynamics by Klein and Simons, 2011). For mean clone sizes as function of time t, there is no difference for dimensions larger than two (in the two dimensional scenario merely a logarithmic prefactor distinguishes the mean clone size in both models; since this factor is nearly constant for large t, it often cannot be distinguished by experimental data (Klein and Simons, 2011)).
Therefore, while a voter model may be a more realistic depiction of a tissue with spatial cellcell interaction, the asymptotic rescaled clone size distribution of such a model is the same as for a corresponding model without spatial regulation, if considering tissues with cell arrangements of dimensions larger than one. Hence, our results would not be different for a corresponding voter model where spatial regulation is included, for two or threedimensional tissues such as epithelial sheets and volumnar tissues (e.g. liver or stroma). Yet, we agree that for quasi onedimensional arrangements of stem cells, as found in seminiferous tubules and intestinal crypts, the clone size distribution differs, and these are indeed not covered by our model.
In the original version of the manuscript we expressed those limitations after the model definition: "This model is a general multitype branching process, which is suitable to describe cell population dynamics in any dimension larger than one, even under cellextrinsic regulation (Klein and Simons, 2011; Bramson and Griffeath, 1980)", but we acknowledge that this point warrants a clearer explanation. We therefore now include at this point a paragraph that clarifies the validity range of results to be expected from our model. Furthermore, we now discuss this in the Discussion section.
https://doi.org/10.7554/eLife.56532.sa2Article and author information
Author details
Funding
Medical Research Council (MR/R026610/1)
 Philip Greulich
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Benjamin D MacArthur for valuable discussions that contributed in the development of this research. CP is supported by a Studentship of the Institute for Life Sciences (Southampton) and PG by Medical Research Council New Investigator Research Grant MR/R026610/1.
Senior Editor
 Naama Barkai, Weizmann Institute of Science, Israel
Reviewing Editor
 Yukiko M Yamashita, University of Michigan, United States
Publication history
 Received: March 2, 2020
 Accepted: July 3, 2020
 Accepted Manuscript published: July 20, 2020 (version 1)
 Version of Record published: August 24, 2020 (version 2)
Copyright
© 2020, Parigini and Greulich
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 797
 Page views

 160
 Downloads

 1
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Computational and Systems Biology
 Genetics and Genomics
Cardiometabolic diseases encompass a range of interrelated conditions that arise from underlying metabolic perturbations precipitated by genetic, environmental, and lifestyle factors. While obesity, dyslipidaemia, smoking, and insulin resistance are major risk factors for cardiometabolic diseases, individuals still present in the absence of such traditional risk factors, making it difficult to determine those at greatest risk of disease. Thus, it is crucial to elucidate the genetic, environmental, and molecular underpinnings to better understand, diagnose, and treat cardiometabolic diseases. Much of this information can be garnered using systems genetics, which takes populationbased approaches to investigate how genetic variance contributes to complex traits. Despite the important advances made by human genomewide association studies (GWAS) in this space, corroboration of these findings has been hampered by limitations including the inability to control environmental influence, limited access to pertinent metabolic tissues, and often, poor classification of diseases or phenotypes. A complementary approach to human GWAS is the utilisation of model systems such as genetically diverse mouse panels to study natural genetic and phenotypic variation in a controlled environment. Here, we review mouse genetic reference panels and the opportunities they provide for the study of cardiometabolic diseases and related traits. We discuss how the postGWAS era has prompted a shift in focus from discovery of novel genetic variants to understanding gene function. Finally, we highlight key advantages and challenges of integrating complementary genetic and multiomics data from human and mouse populations to advance biological discovery.

 Computational and Systems Biology
Predicting the function of a protein from its amino acid sequence is a longstanding challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to directly predict a variety of protein functions – Enzyme Commission (EC) numbers and Gene Ontology (GO) terms – directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignmentbased methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an inbrowser graphical interface for protein function prediction in which all computation is performed on the user’s personal computer with no data uploaded to remote servers. Moreover, these models place fulllength amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://googleresearch.github.io/proteinfer/.