Universality of clonal dynamics poses fundamental limits to identify stem cell self-renewal strategies

  1. Cristina Parigini
  2. Philip Greulich  Is a corresponding author
  1. School of Mathematical Science, University of Southampton, United Kingdom
  2. Institute for Life Sciences, University of Southampton, United Kingdom

Abstract

How adult stem cells maintain self-renewing tissues is commonly assessed by analysing clonal data from in vivo cell lineage-tracing assays. To identify strategies of stem cell self-renewal requires that different models of stem cell fate choice predict sufficiently different clonal statistics. Here, we show that models of cell fate choice can, in homeostatic tissues, be categorized by exactly two ‘universality classes’, whereby models of the same class predict, under asymptotic conditions, the same clonal statistics. Those classes relate to generalizations of the canonical asymmetric vs. symmetric stem cell self-renewal strategies and are distinguished by a conservation law. This poses both challenges and opportunities to identify stem cell self-renewal strategies: while under asymptotic conditions, self-renewal models of the same universality class cannot be distinguished by clonal data only, models of different classes can be distinguished by simple means.

Introduction

Adult stem cells are the key players for maintaining and renewing biological tissue, due to their ability to persistently produce tissue cells through cell division and differentiation (National Institute of Health, 2009). For maintaining tissues in a homeostatic state, it is crucial that stem cells adopt suitable self-renewal strategies, a pattern of stem cell fate choices that balances proliferation and differentiation; otherwise, imbalanced proliferation may lead to hyperplasia and cancer. Therefore, the understanding and identification of stem cell self-renewal strategies has been a major goal of stem cell biology ever since the discovery of adult stem cells.

Classically, two stem cell self-renewal strategies have been proposed (Potten and Loeffler, 1990; Simons and Clevers, 2011a): following the Invariant Asymmetric division (IA) strategy, stem cells undertake only asymmetric divisions, whose outcome is one differentiating cell and one stem cell as daughter cells. The other proposed strategy, Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), features additionally symmetric divisions, which produce either two stem cells or two differentiating cells as daughters, yet in balanced proportions. Both patterns of cell fate choice leave the number of cells on average unchanged and thus can maintain homeostasis. Assessing stem cell self-renewal strategies experimentally is difficult in vivo, since direct observation of cell divisions is rarely possible. Yet, through genetic cell lineage-tracing assays, the statistics of clones – the progeny of individual cells – can be obtained, and via mathematical modeling assessing cell fate dynamics became possible. With such an approach several studies suggested that population asymmetry prevails in many mouse tissues (e.g. Clayton et al., 2007; Lopez-Garcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).

However, the interpretation of those studies has been challenged by a suggested alternative self-renewal strategy, called Dynamic Heterogeneity (DH), featuring some degree of cell fate plasticity (Greulich and Simons, 2016). In this model, all stem cell divisions are asymmetric, yet it is in agreement with the experimental clonal data that had previously been shown to agree also with the population asymmetry strategy. Thus, those two strategies are not distinguishable in view of the clonal data.

This raises the question to what extent different stem cell self-renewal strategies can be distinguished at all via clonal data (Klein and Simons, 2011; Greulich, 2019). Here, we address this question by studying models for stem cell fate choice, which define the self-renewal strategies, in their most generic form. We show that many cell fate models predict, under asymptotic conditions, the same clonal statistics and thus cannot be distinguished via clonal data from cell lineage-tracing experiments. In particular, we find that there exist two particular classes of stem cell self-renewal strategies: one class of models which all generate an Exponential distribution of clone sizes (the number of cells in a clone) after sufficiently large time, and one which generates a Normal distribution under sufficiently fast stem cell proliferation. Crucially, these two classes are not differentiated via the classical definitions of symmetric and asymmetric stem cell divisions, but by whether or not a subset of cells is conserved. These classes thus bear resemblance to 'universality classes' known from statistical physics, as suggested in Klein and Simons, 2011. This leads us to a more generic, and in this context more useful, definition of the terms ‘symmetric’ and ‘asymmetric’ divisions. Notably, however, we find that the conditions for the emergence of universality are not always fulfilled in real tissues, which provides chances, but also further challenges, for the identification of stem cell fate choices in homeostatic tissues.

Strategies for stem cell self-renewal

The two classical stem cell self-renewal strategies, Invariant Asymmetry (IA) and Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), are commonly described in terms of two cell types: stem cells (S) which can self-renew (i.e. divide without reducing their potential to divide in the future); and differentiating cells (D). Both strategies can be expressed in terms of a single parametrized stochastic model, a multi-type branching process (Haccou et al., 2005), defined by the outcomes of cell divisions (the cell fate choices),

(1) Sλ{S+Swith probability rS+Dwith probability 12rD+Dwith probability r,

where cells of type S divide with rate λ. Here, a daughter cell configuration S+S corresponds to symmetric self-renewal division and D+D to symmetric differentiation, while daughter cells of different type, S+D, marks an asymmetric division. In the basic model version, a cell of type D is eventually lost with rate γ, D𝛾 (corresponding to death, shedding, or emigration of D-cells), while other versions may include the possibility of limited proliferation as committed progenitor cells. The two self-renewal strategies, IA and PA, are distinguished by the value of the symmetric division fraction r: the PA model corresponds to any 0<r12; the IA model is defined by r=0, that is, only asymmetric divisions occur.

To maintain homeostasis, the number of cells must stay, on average, constant. Thus cells following the PA strategy must regulate the probabilities of symmetric self-renewal and differentiation to be exactly equal, whereas for the IA model this is trivially assured. However, only for the IA model is the number of stem cells strictly conserved, that is, no gain or loss of stem cells is possible.

A way to assess self-renewal strategies experimentally is via genetic cell-lineage tracing (Kretzschmar and Watt, 2012; Blanpain and Simons, 2013): By marking single cells with an inheritable genetic marker (through a Cre-Lox system [Soriano, 1999; Sauer, 1998]) each cell’s progeny, called a clone, which retain that marker, can be traced. The number of cells per clone, that is the clone size, is measured and the statistical frequency distribution of clone sizes (clone size distribution) determined. To test the cell fate choice models on that data, one evaluates the models with a single cell as initial condition and samples the outcome in terms of the final cell numbers – the size of a virtual clone. In the basic version of the model (i.e. when D𝛾), the IA and PA models predict, respectively, a Poisson and an Exponential clone size distribution for large times (Klein and Simons, 2011; Antal and Krapivsky, 2010) (see also the Appendix, 'Invariant Asymmetry and Population Asymmetry models'). Thus, they are fundamentally different and can easily be distinguished when compared with clonal data. By a series of lineage-tracing experiments it was confirmed that Exponential clone size distributions prevail for most mouse tissues, which thus exclude the IA model and support the PA strategy (Clayton et al., 2007; Lopez-Garcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).

While this seemed to settle the case in favour of the PA strategy, at least for most adult mouse tissues, this was challenged by a third type of strategy, the DH model (Greulich and Simons, 2016). Motivated by the emerging view of prevailing cell plasticity (Blanpain and Fuchs, 2014; Tetteh et al., 2015; Tetteh et al., 2016; Donati and Watt, 2015), the DH model considers the possibility of reversible switching between two cell types:

(2) SλS+D,SωSωDD,Dγ.

where symbols at arrows denote the process rates (frequency of events). This strategy is also capable of maintaining a homeostatic population if γ/λ=ωS/ωD. Notably, the DH model only features asymmetric divisions (in that daughter cells are of different type), like the IA model, yet the DH model predicts clonal statistics that are indistinguishable from the PA model (Greulich and Simons, 2016). This means that in view of the existing clonal data for mouse tissues, the DH model, may as well describe the real cell fate dynamics. More fundamentally, this implies that the PA and DH model cannot be distinguished via plain clonal data, which poses fundamental limitations to the common approach to use lineage tracing for determining cell fate choices.

This demonstrates that the classical definition of asymmetric and symmetric divisions is not always suitable to distinguish cell fate strategies in view of clonal data alone. In general, cell fate dynamics may be much more complex than the simplified models described above, as there may exists a plethora of cell (sub-)types in a tissue. However, to what extent would it be possible to distinguish details of potentially rather complex cell fate dynamics models through comparison with clonal data at all? This is only the case if the clonal statistics are sufficiently different. In the following, we study cell fate models in their most generic form, and analyze what clonal statistics would be expected.

Results

Model generalization

Let us consider the dynamics of a generic system of cells, characterized by a number m of possible cell states Xi, i=1,,m. We define a cell state here as a group of cells showing common properties (e.g. any cell sub-type classification). Most generally, cells in a state Xi may be able to divide, producing daughter cells of any cell states Xj and Xk (where i=j=k, that is, simple cell duplication, is possible). Furthermore, any cell state Xi may turn into another state Xj or may be lost (through emigration, shedding, or death). Hence, we can write a generic cell fate model as,

(3)  cell division: XiλirijkXj+Xk
(4)  cell state change: XiωijXj
(5)  cell loss: Xiγi,

where i,j,k=1,,m. In this model, λi is the rate of division of cells in state Xi and the parameter rijk corresponds to the proportion of division outcomes producing daughter cells of state Xj and state Xk; ωij is the transition rate from state Xi to state Xj and γi the loss rate from state Xi.

The dynamics of each cell in Equations 3-5 could depend on the cell environment through spatial, cell-extrinsic regulation of cell fate. However, the clonal statistics of spatial models that include cell-extrinsic regulation of cell fate (models of the voter type [Clifford and Sudbury, 1973]) are, in the long term, the same as for the corresponding branching process models (Haccou et al., 2005), as Equations 3-5 are, except for one-dimensional arrangements of cells (as shown in Klein and Simons, 2011; Bramson and Griffeath, 1980). Here, we are focussing on the long-term clonal statistics of self-renewal strategies, and since this is not affected by cell-extrinsic regulation, for tissues with two-dimensional or three-dimensional arrangements of dividing cells (like epithelial sheets, and volumnar tissue), we wish to keep the analysis simple and therefore choose dynamics (and thus the parameters λi,ωij,rijk,γi) to be independent of the cell environment.

In the following, we study the dynamics of cell numbers in each state Xi, ni. To gain initial insight into those dynamics, let us first consider the time evolution of the mean cell numbers, n¯i=ni, given by,

(6) ddtn¯i=j(λj2rji+ωji)n¯j-(λi+jωij+γi)n¯i.

in which rij=k(rijk+rikj)/2 is the probability of having a daughter cell in state Xj produced upon division of a cell in state Xi. This linear system of differential equations can be written more compactly in terms of the mean cell number vector 𝒏¯=(n¯1,n¯2,,n¯m),

(7) ddt𝒏¯=A𝒏¯,

with A being the m×m matrix

(8) A=(κ11-δ1κ21κ31κ12κ22-δ2κ32κ1mκ2mκmm-δm),

where we defined the total transition rate κij=λi2rij+ωij, combining all transitions from Xi to Xj by cell divisions and direct transitions, and the local loss rate δi=λi+jωij+γi.

Models of the form Equations 3–5 are not generally in homeostasis, which in this context is defined by the existence of a stationary state 𝒏¯*, with d𝒏¯*/dt=0, that is (Lyapunov) stable and non-trivial (for a discussion, see the Appendix 'Conditions for homeostasis'). This can in principle be assessed through the spectral properties of A (Åström and Murray, 2008), but applying spectral conditions explicitly is unwieldy and difficult to interpret biologically. For a more intuitive view, we interpret the system, Equation 7, as a network (graph): the matrix A can be interpreted as the adjacency matrix of the cell state network. This is a weighted directed graph in which cell states correspond to the graph’s nodes and a link from state Xi to Xj exists where a transition is possible, that is, when κij>0. The value of κij also denotes the link weights (diagonal elements of A can be considered as self-links). Now, we note that Equation 7 is linear and cooperative, that is, the off-diagonal elements of matrix A are non-negative, and for such systems more simple and intuitive conditions for homeostasis exist (Greulich et al., 2019), based on a decomposition into the network’s Strongly Connected Component (SCC). An SCC is a sub-graph that groups nodes which are strongly connected, that is, which are mutually connected by paths (more accurately: two nodes, Xi and Xj are strongly connected if there exists a path from Xi to Xj and from Xj to Xi on the network). An example of such a decomposition, which yields an acyclic condensed network that contains SCCs as nodes and directed links between them, is shown in Figure 1.

Illustration of the decomposition of a homeostatic cell state network into SCCs and the compartment representation, Equation 9.

(Left): An example cell state network representing the matrix A in Equation 8 (self-links not displayed). The dashed circles denote the network’s Strongly Connected Components (SCCs ) (see definition in text). (Middle): The Condensed network is the corresponding network of SCCs, Sk, wherein SCCs are the nodes and a link between two SCCs exists if any of their states are connected. For homeostatic networks, an SCC with dominant eigenvalue μ=0 is at the apex, while other SCCs have μ<0. (Right): We distinguish two compartments, the Renewing compartment , consisting of the apex SCC, with μ=0, and the Committed compartment 𝒞 consisting of the remainder, with μ<0.

The stability of systems like Equation 7 is then determined by the dominant eigenvalues μk of each strongly connected component k, for k=1,,mS where mS is the number of SCCs, and their topological arrangement (the Perron-Frobenius theorem assures that for adjacency matrices of SCCs of cooperative systems, a unique, real, maximal eigenvalue exists, which is the dominant eigenvalue [Arrow, 1989; Greulich et al., 2019]). In brief, according to Greulich et al., 2019, the conditions for existence of a homeostatic state are that, at the apex of each lineage (the condensed cell state network), there must be an SCC with dominant eigenvalue μk=0, while all SCCs downstream of the former must have μk<0 (see detailed discussion in the Appendix, 'Conditions for homeostasis'). Given this structure of homeostatic models, we can define two compartments in the cell state transition network: (1) the (self-) Renewing compartment (), which is the SCC at the apex of the lineage tree; and (2) the Committed compartment (𝒞), which consists of all SCCs with μk<0, that is, those downstream of the apex SCC. Importantly, cells in states forming have the potential to return to any state within the same compartment and this population maintains itself. Instead, the cell population in 𝒞 would vanish without external input, since the combined dominant eigenvalue of all those SCCs is negative (it is the maximum of all SCCs’ μk<0), thus the progeny of each cell in the committed compartment will eventually be lost. We can thereby classify cells as being of a (self-)Renewing type (R) if their state is within , and of a Committed type (C) if their state is in 𝒞. With this coarse-grained classification, a generic homeostatic model can be represented in terms of compartments and 𝒞 as,

(9) RλR{R+RwithprobabilityrRRR+Cwithprobability1rRRrCCC+CwithprobabilityrCC,
RωRCC,CλCC+C,CγC,

where the symbols above arrows are the effective rates of those events, denoting the average frequency at which they occur (loss events R are not explicitly included, since they can be approximated by a short lived state Xd in 𝒞, as RXd). To be compatible with a homeostatic condition, it is further required that (i) the R-population remains on average constant (μk=0), that is, λRrRR=λRrCC+ω, and (ii) the loss rate of C must exceed its proliferation rate (μk<0), that is, γC>λC. Figure 1 shows how a generic homeostatic cell state network can be condensed into an effective model of renewing and committed cell states, according to Equation 9. It has to be noted, however, that the events depicted in Equation 9 are not Markovian, that is, the timing of events is not independent from each other and depends on their history. Thus, the ‘rates’ λR, λC, ωRC, and γC are not constant rates in the Markovian sense, yet we can define them by the mean frequency of events occurring (see Appendix 'Approximation of generic GIA models' and 'Asymptotic clone size distributions: mathematical analysis').

The formulation in terms of renewing and committed states can help us to gain insights into potential behaviors of generic homeostatic cell fate models. In particular, we define generalized asymmetric divisions as events of the type RR+C, and generalized symmetric divisions as events of the type RR+R (symmetric renewal) and RC+C (symmetric commitment). With these definitions, we can categorize homeostatic cell fate models into two classes: Generalized Invariant Asymmetry (GIA) models are those which only exhibit RR+C divisions in the renewing compartment, while Generalized Population Asymmetry (GPA) are models for which such restriction does not hold. We note that the two classes are equivalently characterized by a conservation law: For GIA models, the number of cells in is strictly conserved, while for GPA models, no such conservation law holds. Since μ=0 is necessary for conservation, the only possible conserved cell states in homeostasis are those in . Naturally, the previously discussed IA model is a GIA model and the PA model is a GPA model. Notably, the DH model (Equation 2) is of the GPA category, since in that model S and D cells form a single SCC at the apex of the lineage hierarchy, and thus they are both part of . Therefore, a division SS+D in the DH model, which is asymmetric in the conventional sense, corresponds to RR+R in terms of compartments (Equation 9) and thus it is a generalised symmetric division. According to this classification, PA and DH models are both in the same category (GPA), and indeed, both predict the same type of clone size distribution, an Exponential one (Greulich and Simons, 2016).

Numerical simulation of random cell fate models

To check whether the correspondence between model class, GIA vs. GPA, and predicted clonal statistics holds in general, we analyze the clonal dynamics numerically, by generating and testing a large number of random stochastic models, implemented via random generation of the parameters λi, ωij, γi and rijk. To simulate clones, we perform stochastic simulations based on the Gillespie algorithm (Gillespie, 1977), assuming a Markov process following the rules of Equation 3-5. We run, for each model, a large number of simulations with initially one cell in the compartment , thus the cell population of each simulation run represents one clone. Then we sample their outcomes, the total cell numbers per clone (the clone size) n=ini, to obtain predictions for clonal statistics, namely the frequency distribution of clone sizes (clone size distribution) and mean clone sizes (see Materials and methods).

We first study the mean clone size of surviving clones (with n>0), n¯s=n|n>0, shown in Figure 2, respectively, for the GIA and GPA models, as a function of time (the final time τ=20/αmin where αmin is the minimal process rate, αmin=min(λ1,,ω12,,dm)). We note that indeed a common behavior is seen in each case. While for every simulated GIA model, n¯s saturates at a plateau value, it steadily increases for every GPA model. This is expected, and can be understood given that clones in a GPA model can go extinct while those in a GIA model not. Assume that there are initially a large number Nc of clones, such that the total number of cells is ntot=Ncn¯s. Since the system is homeostatic, it will reach a constant steady state ntot* after a sufficient amount of time, meaning that the mean clone size is n¯s=ntot*/Nc. If no clones go extinct, as in GIA models, Nc is constant and thus n¯s approaches a constant. However, in non-conserved multi-type branching processes, as GPA models are, the clone number Nc decreases through progressive extinction of clones (Haccou et al., 2005), and therefore n¯s increases, despite the cell population as a whole staying stationary.

Mean size of surviving clones, n¯s, as a function of time for random GIA models (a), and GPA models (b).

In (a), τ=20/αmin, in (b), τ is the time at 98% clone extinction. The grey shade represents the percentile of all the simulations (black lines limit the 5-95%ile range); the blue curves correspond to some illustrative selected simulations. Simulations for which the final mean is below two and where the final condition is not achieved (due to computational limitations) are not included: this results in 238 and 571 models, respectively for the GIA and GPA cases.

The resulting clone size distributions for the two model classes are shown in Figure 3. Here, clones sizes n are rescaled by the mean value n¯s and compared to an Exponential distribution of unitary mean (green curve). As conjectured, all simulated GPA models shown in panel (b) predict asymptotically the same rescaled clone size distribution, namely a standard Exponential distribution. Deviations exist for small times and small clone sizes, but these deviations vanish in the large time limit (details on the convergence are shown in the Appendix, 'Analysis of the generalized Population Asymmetry model'). This means that different models within the GPA class cannot be distinguished in the long-term limit, since they differ only by the mean clone size, which is a free fit parameter. In analogy to statistical physics, we can categorize them as a universality class (Klein and Simons, 2011), meaning that the details of the model do not affect the (scaled) outcomes for assymptotic conditions, which is a form of weak convergence of random variables (Billingsley, 1968). However, the same cannot be said about the GIA models. In fact, we see all kind of shapes in the clone size distributions, both peaked distributions and non-peaked ones, and in fact, some distributions are even close to an Exponential form, and can thus not be distinguished from GPA models. The question is whether we can yet find other parameters for which, when large, also GIA models exhibit universality, that is, yield the same rescaled clone size distribution. For this purpose, we will in the following sections develop a deeper theoretical understanding of the model classes.

Rescaled clone size distributions (expected relative frequency P of clone sizes) for random GIA models (a), and GPA models (b), in terms of the rescaled clone size x=n/n¯s, at final time t=τ (see Figure 2 for definition).

The grey shade represents the percentile of all the simulations (black lines limit the 5-95%ile range); the blue curves correspond to some selected simulations. A reference curve corresponding to an Exponential distribution of unitary mean (’Exp(1)’) is shown in green.

Mathematical analysis: Markovian approximation of compartment model

To obtain a deeper understanding of the numerical results, we study the cell fate models in terms of the compartment representation, Equation 9. In this representation models are not Markovian, yet we can study their Markovian counterpart, as an approximation. While this is not expected to yield accurate clone size distributions in general, the limiting distributions of non-Markovian processes are commonly well estimated by their Markovian counterparts.

For GIA models, which only feature RR+C transitions between the renewing compartment, 𝒞, and the committed compartment, 𝒞, a corresponding Markovian model reads,

(10) X1λ1X1+X2,X2λ2X2+X2,X2γ,

in which X1 represents a single state in and X2 in 𝒞, and symbols at arrows are the process rates. The number of cells in X1, n1, is conserved, that is, given an single X1-cell initially, it always remains at n1=1. Thus, we only need to consider the dynamics of cells in X2, n2. This Markov process can be solved analytically, and for sufficiently large steady state mean number of X2-cells, n¯2=n2=λ1/(γ-λ2) (see Appendix, 'GIA0 test case: steady state distribution and limiting behavior'), the rescaled distribution of cells in X2 is,

(11) P(x2)=(1-λ^2)λ^1λ^2λ^2λ^1x2(1-λ^2)Γ(λ^1λ^2+λ^11-λ^2x2)x2Γ(λ^1λ^2)Γ(λ^11-λ^2x2),

in which x2=n2/n¯2, λ^1=λ1/γ and λ^2=λ2/γ and Γ() is the Gamma function (Abramowitz and Stegun, 1972). We note that this distribution exhibits a large variety of shapes: for large λ^1 the distribution is peaked, while for small λ^1 is loses its peak. Notably, for λ^11 and λ^21, the distribution becomes Exponential and in this case it cannot be distinguished from the GPA case. On the other hand, for λ^1, that is, when the ratio of asymmetric divisions over the loss rate is high, this distribution tends to a Normal distribution with unitary mean and variance equal to 1/λ^1. These different behaviors are graphically shown in the Appendix (see Appendix 1—figure 6, 7 and 8).

For the GPA models, a Markovian approximation reads, accordingly,

(12) X1λ1{X1+X1withprobabilityr1X1+X2withprobability1r1r2X2+X2withprobabilityr2,
X1ωX2,X2λ2X2+X2,X2γ.

whereby for homeostasis to prevail, λ1r1=λ1r2+ω and λ2<γ must hold. We note that the dynamics of X1 are independent of X2 and thus for the number of cells in X1 in homeostasis holds

(13) n1λ1r1n1n1±1,

which corresponds to a simple continuous-time branching process with two offspring, for which it is known that the resulting distribution of cell numbers is Exponential, that is, P1(n1)=n¯1,s-1e-n1/n¯1,s, where n¯1,sλ1r1t is the mean number of cells in the surviving clones (Haccou et al., 2005).

X2 cells produced according to 12 follow the same fate as in the two-state GIA model above. While it is not assured that the distribution of X2 cells is identical to that of Equation 11 (due to simultaneous production events of type X1X2+X2), we show in the Appendix, 'Asymptotic clone size distributions: mathematical analysis', that for large rates of production of C-cells, the distribution of C-cells – here: cells in state X2 – attains a Normal distribution with mean n¯2 equal to its variance σn22=(n2-n¯2)2=n¯2. As each X1 cell contributes independently to the production of X2-cells, we have that n¯2n1,st. Crucially, this means that in terms of the rescaled variable x2=n2/n¯s the standard deviation σx2=σn2n¯s1n¯2t-1/2 vanishes for large times, since n¯2n1,st. Hence, given fixed x1, x2 can be approximated by a constant random number x2|x1x¯1=n1/n¯s. Therefore, the rescaled distribution of the total number of cells is P(x)=P1(x-x2)=e-x, where x¯=x¯1+x¯2x¯1. Thus, the rescaled distribution of the total clone size, x=n/n¯s, is as well an Exponential.

Universality of generic cell fate models

For generic GIA or GPA models, the compartment representation, Equation 9, is not Markovian and one would not expect exactly the distributions we found in the previous section. Fortunately, the limiting distributions of non-Markovian processes and their Markovian counterparts are often, under certain conditions on the parameters, the same. While we reserve the technical arguments for the Appendix ('Asymptotic clone size distributions: mathematical analysis'), we note that this independence of the limiting distribution on the Markov property related to the central limit theorem, which does not rely on the Markov property.

To identify the correct limiting parameters for more complex cell fate models, we need to express the effective non-Markovian rates (i.e. the mean frequency of events) of representation nine in terms of the original model, 3–5. As discussed in the Appendix ('Approximation of generic GIA models' and 'Asymptotic clone size distributions: Mathematical analysis'), we identify those effective rates by the total rates of cell divisions, λR=iλiPiR, γC=i𝒞γiPiC, and ωRC=i,j𝒞ωijPiR where, for each compartment, PiR,C=n¯i/j,𝒞n¯j is the probability of a single cell being in state Xi of , respectively (n¯i are the solutions to Equation 6). In the Appendix, 'Asymptotic clone size distributions: mathematical analysis', we reason that all GPA models are expected to generate Exponential clone size distributions for large times t. This is indeed what is observed in Figure 3(b). Correspondingly, for GIA models we expect that for large λ^R=λR/γC the clone size distribution of GIA models would tend to a Normal distribution. To test this prediction, we simulated the same GIA models as for Figure 3 before, but we tuned parameters in such that the effective parameter λ^R becomes large (see details in the Appendix, 'GIA model for large λ^R'). The result is shown in Figure 4: for an illustrative case shown in panel (a), increasing λ^R changes the distribution from an exponential form to a peaked form akin to a Normal distribution, and for all simulated random GIA models, shown in panel (b), a Normal distribution is approached when λ^R becomes large.

Rescaled clone size distributions (expected relative frequency P of clone sizes) for random GIA models as in Figure 3, at time t=τ (see definition in Figure 2).

Sensitivity to parameter λ^R is shown for one illustrative case in panel (a), and all GIA models for λ^R=30 in panel (b). The distributions are shown in terms of the rescaled variables x=n/n¯s for panel (a) and x~=(n-n¯s)/σn, where σn is the distributions variance, in panel (b). In (b), the grey shade represents the percentile of all simulations (black lines limit the 5-95%ile range); the blue curves correspond to some selected simulations. A reference curve corresponding to a Normal distribution of zero mean and unitary variance is shown in green. Simulations for which t=τ is not reached (due to computational limitations) are not included, resulting in 922 model instances.

We note that when taking the limit of large λ^R, as shown in Figure 4, also all other process rates ωij with i,j within increased as well. What if instead some process rates in do not scale to become large with λ^R? To assess this situation, we studied a simple test case similar to model 10 but containing two states in , connected via direct state transition (see Appendix, 'GIAB test case: bimodal distribution'). As discussed there, if all rates within are large compared to the rates in 𝒞 then indeed we observe a Normal clone size distribution, as expected. However, if the direct transition rates between the states of are smaller or of equal magnitude as γC, and in addition, one of the two division rates is higher then the other, then we observe a bimodal clone size distribution. The reason is that if the transitions between the two states in are rare compared to the life time of cells, 1/γC, they become essentially separated and each of those states generate separate Normal distributions with different mean (due to different cell division rates in those two states) which, when overlaid, generate a bimodal clone size distribution (see detailed arguments in the Appendix, 'Asymptotic clone size distributions: mathematical analysis').

Finally, from those considerations follows:

  1. GPA models attain an Exponential clone size distribution for time t.

  2. GIA models attain a Normal clone size distribution if all process rates within are much larger than the inverse lifetime of C-cells, γC.

Hence, the GIA and GPA model classes, each represent a universality class, that is, a scaling limit exists in which all models of the same class yield the same rescaled clonal statistics.

Discussion

Our analysis shows that intrinsic limitations exist for identifying strategies of stem cell self-renewal through clonal data from cell lineage-tracing experiments. This is due to different models of cell fate choice generating the same type of clonal statistics (clone size distributions), so that model inference based on clonal statistics – currently still the most prevalent method to determine stem cell self-renewal strategies – fails to distinguish them. The feature that different models asymptotically generate the same statistics is a form of weak convergence of random variables (Billingsley, 1968) and corresponds to universality, as known from statistical physics.

Cell fate models can in principle be very complex, with a plethora of cell (sub-)types in a tissue. We introduced a new categorization of cell types, distinguishing between cell states that are committed (C-cells), whose progeny is inevitably lost eventually, and non-committed or (self-)renewing cell states (R-cells), which retain the potential to remain or return to the apex of the lineage hierarchy. According to this categorization we classified generic models of cell fate choice as Generalized Invariant Asymmetry (GIA), if only generalized asymmetric divisions of the form RR+C occur for R-cells, and Generalized Population Asymmetry (GPA), when all kind of divisions can occur, as long as gain and loss of R-cells are balanced. Models of the GIA category are also characterized by a conservation law, since the number of R-cells is strictly conserved, while GPA models do not exhibit such a conservation law.

We found that the classification in GIA and GPA models mirrors the clonal statistics generated by them: models of the GPA class all generate clonal statistics which with time converge to an Exponential clone size distribution. Thus, two GPA models can therefore not be distinguished through clonal data, once some time has passed after induction of clones. For GIA models, distributions can generally vary, but if the rates of divisions and transitions in the compartment are much larger that the rate of cell loss, the clone size distribution of all those models becomes a Normal distribution. In that case, two GIA models can not be distinguished by the clonal data. While here we do not explicitly consider cell-extrinsic regulation of cell fate, this kind of regulation does not affect long-term clone size distributions, except when cells are arranged one-dimensionally (Klein and Simons, 2011; Bramson and Griffeath, 1980). Thus, our results cover cell dynamics in most renewing tissues, such as epithelial sheets or volumnar organs, but not (quasi-)one-dimensional arrangements of stem cells, as found in the seminiferous tubule, or in intestinal crypts, where clonal statistics may differ. Hence, our analysis shows that models of cell fate choice cannot in general be distinguished with further resolution beyond the R vs. C categorization of cell types. The universality of the model dynamics also shows that effective, simplistic models are often equally accurate to model experimental data, yet with a higher statistical power due to less free parameters.

While at first glance, this analysis seems to discourage efforts to unravel details of cell fate dynamics, room remains in regimes where the limiting conditions for asymptotic distributions are not fulfilled. In particular, if fast cycling committed progenitor cells are present, while stem cells are slow cycling, then the condition that the division rate of R-cells is much larger than the cell loss rate is not fulfilled. In that case, details of the model dynamics may affect the shape of the clone size distribution and thus allow distinction between models. However, caution should be given when an Exponential clone size distribution is observed, since this could indicate either a GIA model with high activity of committed progenitor cells, or a GPA model. In that case, the mean clone size needs to be consulted to distinguish models (see Figure 2). Differentiating between models within the GPA category is more difficult, since the predicted statistics from different models always become more similar over time. Short-term measurements would in principle allow such a distinction, but since in reality the underlying processes are not truly Markovian (as assumed for the modeling purpose) they are not necessarily a good representation of the real cell dynamics at short times. At long times, however, Markovian approximations are increasingly accurate, precisely because of the feature of universality.

How could the resolution of cell fate modeling be improved? The state-of-the-art approach to determine cell fate trajectories is via analysis and modeling of single-cell RNA-sequencing (scRNA-seq) data. However, many limitations to this method exist, discussed in Weinreb et al., 2018, and neither reversible trajectories nor the modes of cell division, such as asymmetric vs symmetric divisions, can be inferred. Intravital live imaging, on the other hand, allows to trace individual clones over time (Ritsma et al., 2012; Pittet and Weissleder, 2011; Hara et al., 2014; Rompolas et al., 2016), and thus can obtain details of cell fate trajectories, yet this technique is limited to few tissue types which are accessible for invasive long-term imaging. Nonetheless, while each of those experimental assays alone is prone to limitations in defining self-renewal strategies, advanced model inference schemes, that integrate data from different experimental sources, might be the way forward in the future to finally reveal the details of stem cell self-renewal strategies.

Materials and methods

The numerical analysis of the random cell fate model was implemented in Matlab. The description of the stochastic models definition, the random model generation and the simulation campaign is detailed in the Appendix, 'Stochastic process modelling'. Additionally, as a validation of the implemented simulator, based on the Gillespie algorithm (Gillespie, 1977), the IA and PA models were simulated and the results analyzed in the Appendix, 'Invariant Asymmetry and Population Asymmetry models'.

Analytical solutions were partially obtained using Mathematica.

Appendix 1

Conditions for homeostasis

Here, we ‘translate’ the generic conditions for the existence of a Lyapunov stable stationary state for Linear Cooperative Systems (LCS) (Greulich et al., 2019) into the biological context of clonal dynamics. A linear cooperative system is one of the form ddt𝒙(t)=A𝒙(t) where 𝒙(t)=(x1(t),x2(t),,xm(t)) are functions of time t and A is a constant m × m matrix for which all off-diagonal elements are non-negative (the latter condition defines the cooperativity of the system) (Hirsch and Smith, 2006; Greulich et al., 2019). We note that the dynamics of mean cell numbers, Equations 6 and 7 in the main text, indeed describe an LCS according to this definition. Now we use the following definitions:

  • G(A) is the graph of A, that is, the graph for which A is the adjacency matrix, whose elements aij give the weight of the links from i to j (aij=0 means that no link exists). In the following, we use the terms graph and network synonymously.

  • If in G(A) there exists a path from node i to node j and from j to i, then we call those nodes strongly connected, ij, which is an equivalence relation. A maximal set of nodes which are are strongly connected with each other are called a Strongly Connected Component (SCC) of the graph (the equivalence class of the equivalence relation ‘≡').

  • The graph G(A) can be decomposed into its NS SCCs, Sk, k=1,,NS (Cormen, 2009), which are sub-graphs associated with an adjacency matrix Ak, such that G(Ak)=Sk. Since the Ak have non-negative off-diagonal elements, they are Metzler matrices for which the Perron-Frobenius theorem ensures that a unique, simple and real maximal eigenvalue μk exists (Arrow, 1989). The eigenvalue μk is called the dominant eigenvalue of Sk. Associated with this eigenvalue, there is, for all k, a positive eigenvector 𝒙(k)=(x1(k),x2(k),), that is, one with all entries xi(k)>0.

  • The condensed graph of G(A) is the graph where nodes are the SCCs of G(A) and a link from SCC Sk to SCC Sl (k,l=1,,NS) exists if there is is at least one link from a node (in G(A)) in Sk to a node in Sl.

  • If there is a path from SCC Sk to SCC Sl, then we call Sk upstream of Sl and accordingly Sl downstream of Sk. We note that there can never exist paths from Sk to Sl and from Sl to Sk, since otherwise, by definition, their nodes would be strongly connected and both together would form a single SCC (Cormen, 2009). Thus, there is a unique hierarchy of SCCs.

  • A stationary state 𝒙* of a dynamical system is Lyapunov stable if a small initial deviation from 𝒙* leads to a small final deviation x(t) (i.e. 𝒙* is not unstable). More accurately: there exists a constant C>0 such that |𝒙(t)-𝒙*|<C|𝒙0-𝒙*| for all times t, where 𝒙0=𝒙(t=t0) is the initial condition, sufficiently close to 𝒙*. A stationary state of a linear system that is Lyapunov stable, yet neither asymptotically stable nor has a limit cycle, is neutrally stable.

  • Homeostasis means that the cell numbers in each state, 𝒏=(n1,,nm), stay on average constant, d𝒏¯dt=0 (where 𝒏¯=𝒏), and that this state is not unstable towards perturbations. This condition corresponds to a Lyapunov-stable stationary state. Note that a linear system, as the one described by Equations 6 and 7, main text, cannot have an asymptotically stable state except for the trivial state 𝒏¯*=0, which corresponds to a vanishing cell population. We note that when considering the tissue cell population as a whole, dynamics can be non-linear through interactions between cells and a non-vanishing asymptotically stable state may then exist. However, since single clones do not significantly affect the total configuration of cells in a tissue, the clones compete neutrally, when embedded in a homeostatic cell population, which corresponds to a Lyapunov stable, but not asymptotically stable state. We therefore use Lyapunov stability, a weaker form of stability, to define homeostasis, since an asymptotically stable vanishing state is not a biologically viable state.

Now, for an LCS holds, according to Greulich et al., 2019,

Theorem 1

An LCS, 𝐱˙=A𝐱, possesses a non-trivial Lyapunov stable stationary state (𝐱*>0), if and only if,

  1. G(A) does not contain any SCC, Sk, with μk>0.

  2. There is at least one SCC, Sk, with μk=0.

  3. There is no path between any two SCCs, Sk and Sl, which have μk=0 and μl=0.

Furthermore holds,

Theorem 2

All nodes i upstream of an SCC Sl with μl=0 must be empty in the the stationary state, that is, xi*=0, if i is upstream of the SCC Sl.

Since Equation 7, main text, is an LCS, we can apply theorems 1 and 2 to find conditions for homeostasis, defined by a Lyapunov-stable configuration of mean cell numbers 𝒏¯*=(n¯1,n¯2,). According to theorem 1 at least one SCC with μk=0 must then exist, and according to theorem 2 the stationary state of nodes upstream of it must be empty, that is, they do not exist in homeostasis. Since the condensed graph of the SCCs does not have cyclic paths, an SCC Sk with μk=0 must therefore always reside at the apex of all non-vanishing cell types. In principle, an acyclic graph may have more than one apex, however, since, by definition, a stem cell clone always starts with a single stem cell, and no other SCC with μ=0 may be downstream of the latter, we only consider one apex SCC with one initial cell when studying clonal dynamics.

Hence, in the context of homeostatic clonal dynamics, we can assume that there is a single SCC, Sk with μk=0 at the apex of the cell state graph, while all other SCCs, Sl are downstream of it and have μl<0. Since there are no paths from the non-apex SCC to the apex SCC (as the condensed graph is acyclic) we can distinguish the two separate compartments (the renewing compartment) consisting of all nodes of the apex SCC, Sk, and 𝒞 (the committed compartment), consisting of all other nodes, whereby due to μl<0 for all SCCs in 𝒞, all progeny of cells in 𝒞 will vanish in the long term.

Stochastic process modelling

Model description

Since clonal dynamics start, by definition, with a single cell, we use stochastic dynamics to model clones. Thus, we model cell fate dynamics as a continuous-time multi-type branching process (Haccou et al., 2005), a Markov process following the rules of Equations 3-5, main text. As shown later, without losing generality, here only two types of events are modeled; considering an arbitrary number m of cell states, Xi, for i=1,m, the model includes

  • Cell divisions: a cell in state Xi divides in two cells with rate λi, respectively in state Xj and Xk at a ratio rijk.

    (1) XiλirijkXj+Xk,i,j,k=1,...,m,
  • where λi=0 if state Xi does not allow division. In this formulation of cell division events, which we use for the generation and numerical simulations of random models, only one division outcome is possible upon division of a particular cell state Xi. Nonetheless, multiple division outcomes per state can be implemented as single outcomes if additional metastates are introduced, which represent priming of a state Xi towards a certain division outcome option. For example, if in the original model, state Xi has different outcome options, Xj1+Xk1,Xj2+Xk2,, we can substitute this by, first, transitions from Xi to (new) states Xm1,Xm2, and subsequent divisions XmlXjl+Xkl. The use of metastates to model more complex processes is discussed in detail in 'Population Asymmetry model using metastates'.

  • Direct state transitions: a cell in state Xi changes to state Xj at a given rate ωij.

    (2) XiωijXj, i,j=1,,m;ij,
  • where ωij=0 means that no transition from Xi to Xj is possible. Additionally, we include cell loss in this scheme, by treating it as a transition to an additional special state, called hereafter death and denoted by (cells in this state do not enter in the counting of the total number of cells). In that formulation, the loss rates of the original model are di=ωi.

These events define a Markov process, which can be represented as a stochastic network (Bang-Jensen and Gutin, 2007). In this view, each node can be related to a cell state, while the links represent transitions between states via cell divisions and the direct state transitions. It is noted that this stochastic network is different from the network defined in the main text and in 'Conditions for homeostasis' of this SI, which describes the dynamics of mean cell number instead. Here, for the stochastic modelling, let us define the adjacency matrix K of this network, through the elements κij=λi2rij+ωiji,j=1,,m, in which κij are the total transition rates as defined in the main text. We note that K is related to the matrix A used in the main text by A=KT-Δ, where Δ is the diagonal matrix with entries δi,i=1,,m, as defined in the main text, with the slight difference that here the loss state is treated as a separate state. Additionally, it is remarked that in this model interpretation, where only one division option for each state is possible, the term rij1 is not a continuum value, but instead it can only take the values 0,1/2,1 depending on the specific outcome of the division of the cells in state Xi. Notably, more than one stochastic network may result in the same matrix K, therefore, to uniquely define a process, we distinguish a matrix D which describes cell division events (note that this is possible with just a single matrix as there is only one division option per state) and a matrix T which describes direct transition events. The matrix K is the sum of both, K=N+T.

Generation of random models

To test the behavior of the clonal dynamics in a generic homeostatic model, a large number of random stochastic networks was generated, whereby each stochastic network corresponds to a distinct set of parameters λ1,,λm,ω12,,ωm for the stochastic stem cell fate choice model. The strategy detailed below is based on the following considerations which summarize the key requirements to achieve homeostasis detailed in 'Conditions for homeostasis': (a) each network is composed of Strongly Connected Components (SCCs) that are randomly connected; (b) only one SCC, the one at the apex of the network, forms the renewing compartment, , (i.e. it is characterized by a dominant eigenvalue μ=0 with respect to A) and all the others form the committed compartment, 𝒞, (i.e. they are characterized by a dominant eigenvalues μ<0). It is further noted that the SCCs of the stochastic network G(K) are the same as those of the matrix G(A), where A=KT-Δ defines the dynamics of mean cell numbers. This is, since transposition of an adjacency matrix and altering of diagonal elements does not affect the network topology.

To generate the stochastic network, a two-step process is followed: (1) a large number of (random) SCCs are generated; (2) a condensed network is randomly constructed and filled with randomly picked SCC from step 1.

It is noted that unitary rates are assumed in step (1) and they are successively randomly modified in step (2) to achieve the desired properties of the dominant eigenvalue μ while ensuring randomness.

Focusing now on step (1), that is, the generation of single SCCs, the following procedure is used.

  1. The total number of states composing the SCC is defined, indicated as mS. An additional state is added to represent whatever is outside the SCC. In the current analysis, we set 1mS4.

  2. We build separately all the possible combinations of transition and division matrices, indicated hereafter with MT and MD, respectively. These matrices are ordered for increasing number of transitions NT and divisions ND. In case GIA networks are generated, the MD and MT combinations are filtered, to remain just with those where the division outcome is one cell inside the SCC and one outside the SCC, and where there are only transitions between states within the SCC (i.e. where cell numbers are conserved). From a computational point of view, this process is feasible up to mS=4.

  3. The matrices stored in MD and MT are then combined together to form a model (which is completely defined by one matrix in MD and one in MT); MDT indicates the pool of possible models. This process is done considering separately each mS, NT and ND. In this step, due to technical limitations given by the high number of possible combinations, if the total number of combinations exceed 5104 then only 104 random matrices from MD and MT are combined.

  4. Each model in MDT is then processed to check if the corresponding network is a SCC in the first mS states. If not, then this model is discarded. In case GPA networks are generated, a further check is performed to discard also those models consistent with a GIA network (they cannot be a priori excluded as done in point 2 for the GIA ones). These pools of models are indicated as MGIA and MGPA for the GIA and GPA models, respectively.

  5. For each SCC in MGIA and MGPA, the dominant eigenvalue μ is estimated. For construction, the generated GIA networks are all characterized by μ=0, while in general any value can be obtained within MGPA.

  6. The SCCs in MGPA are additionally processed to check whether the network is compatible with homeostasis by tuning the rates. Networks satisfying this condition are additionally stored under a new pool of SCCs, called MGPA*. If not, then they are discarded when μ>0 (i.e. for any combination of rates the number of cells in these networks is expected to grow).

This process results in three pools of SCCs classified for mS, NT and ND (i.e. number of states, transitions and divisions): (1) MGIA contains GIA models; (2) MGPA* contains GPA models that can be tuned to have μ=0 and (3) MGPA contains GPA models characterized by μ<0 or that can be tuned to meet this condition.

In step (2), the generation of random networks starting from the individual SCCs is implemented as follows.

  1. A number of committed SCCs, Nc, between 1 and 3 is randomly chosen.

  2. Nc SCCs are randomly picked from the pool of models MGPA. The selection is done considering equal probability in mS, NT and ND. For each SCC, the unitary rates α (where α stands for any rate λi or ωij) are modified by multiplying them for random numbers (exponentially distributed with mean α¯=1 and minimum αm=0.3). Additionally, a threshold on the dominant eigenvalue is set, μmax=-1; if this condition is not satisfied, then the rates are tuned to meet this requirement while maintaining the rates above the minimum.

  3. The committed compartment of the condensed network is generated by randomly connecting all the outgoing components of the k-SCC with states in the l-SCC for l=k+1,..,Nc. In this way, the transposed adjacency matrix of the stochastic network has triangular block form:

    (3) KT=[B1C12B20...C1,NcC2,NcBNcC1C2CNc,0].

    The last SCC is forced to be linked to a single death state.

  4. With a similar procedure described in point 2, two SCCs are randomly picked respectively from the pool of SCCs in MGPA* and MGIA; the unitary rates are modified (exponentially distributed with mean α¯=1 and minimum αm=0.3) and, in the GPA case, tuned to meet the condition μ=0. They represent the renewing part of the network.

  5. Two networks (one for the GIA and one for the GPA models) are produced by attaching the selected renewing network upstream the committed one; this is done based on an analogous procedure as described in step 3.

At the end of this process, we have two networks which are different in just the renewing part, being one consistent with the GIA model and the other with the GPA one. In total 2000 networks were built and analyzed.

Simulation campaign

An extensive simulation campaign was run to model the clone dynamics. The code implemented to numerically simulate the stochastic process defined by events of type 1 and 2 is based on the Gillespie algorithm (Gillespie, 1977). Since a clone is by definition the progeny of a single cell, we choose as initial condition a single cell put randomly in a state within . Concerning the final condition, given the substantial difference in the dynamics in the two models, the final time, indicated by τ, is set equal to 20 times the inverse of the minimum process rate, αmin=min(λ1,,λm,ω12,,ωm,), in the GIA models, and to the time at which the fraction of extinct clones reaches 98% in the GPA models. Note that all critical branching processes, as homeostatic clonal dynamics are, will go extinct almost surely at some point in time (Haccou et al., 2005).

To determine the clone size distribution, 103 and 5104 simulations were run respectively in for each GIA and GPA model (in this way, both models result in the same final number of clones when 98% extinction is taken into account).

Numerical simulation test cases

Invariant Asymmetry and Population Asymmetry models

To validate the simulation approach, we tested the procedure on simple cell fate models for which analytical results are known, the Invariant Asymmetry (IA) and Population Asymmetry (PA) models. As described in the main text, in the simplest version, these are defined as,

(4) Sλ{S+SPr. rS+DPr. 12rD+DPr. r,Dγ.

In these processes, cells of type S represent the stem cells (called hereafter also progenitor), which divide with stochastic rate λ, and cells of type D are the differentiated cells, which are shed with rate γ. While in the PA model the three possible outcomes of the division of a progenitor are controlled by a probability parameter 0<r1/2, in the IA model r = 0, meaning that there are strictly asymmetric division and the number of S-cells is conserved. It is remarked that in the definition of the stochastic networks given in 'Model description' only one division option for each state is modelled; however, the code implemented for the numerical simulations of the stochastic process allows for an arbitrary number of division options for each state as well (see 'Population Asymmetry model using metastates').

Considering the dynamics at tissue level, the system of ODEs describing the average number of cell n¯S and n¯D respectively of type S and D is,

(5) {dn¯Sdt=0dn¯Ddt=λ;n¯S-γn¯D.

It is clear that, on average, the number of S-cells remains constant. Additionally, in homeostasis, the average total number of D-cells stabilizes around a constant value n¯D*=(λ/γ)n¯S that uniquely depends on the number of stem cells, n¯S which equals the initial number of stem cells n¯S,0=n¯S(t=0), Thus, the (Lyapunov stable) stationary state of total cell numbers n¯=n¯S+n¯D is given by,

(6) n¯*=(1+λγ)n¯S,0.

Based on Equation 6, the process rates λ and γ determine the proportion of cells of type D with respect to cells of type S. Importantly, there is no difference at tissue level between the IA and PA models.

A distinction is instead evident when we look at the dynamics at the single-cell level, and study the clone size distribution, that is, the distribution of the progeny of a single cell. For the IA model, the number of S-cells is strictly constant, and thus the joint probability distribution P(nS,nD) of both S-cells and D-cells, respectively indicated as nS and nD, is fully determined by the distribution of D-cells, P(nD). The IA model’s master equation for P(nD), considering a single initial cell of type S, is given by,

(7) dP(nD)dt=λP(nD-1)+γ(nD+1)P(nD+1)-(λ+γnD)P(nD).

This corresponds to a simple birth-and-death process for which the distribution is Poissonian with mean λ/γ, (Van Kampen, 1981).

Considering now the PA model, the master equation is instead given by,

(8) dP(nS,nD)dt=λ(r(nS1)P(nS1,nD)+(12r)nSP(nS,nD1)+r(nS+1)P(nS+1,nD2))+γ(nD+1)P(nS,nD+1)(λnS+γnD)P(nS,nD).

In Antal and Krapivsky, 2010, an exact result for the distribution of total cell numbers n=nS+nD is found when λ=γ and r=1/4. For different values of the process parameters, the long-term distribution is shown to be Exponential.

Numerical simulations for the clonal dynamics were run, considering the above models and three different sets of test parameters each, indicated as IA# and PA#i for i=1,2,3, which are reported in Appendix 1—table 1. It is noted that the time unit is arbitrary and therefore omitted. Simulations are based on 104 and 5104 runs respectively for the IA and PA test cases. The initial condition is a single stem cell and the final simulation time, indicated as τ, is equal to 10: this value is well representative of a steady state condition (for the IA test cases) and at which the total extinction of the process is not yet achieved (for PA test cases only). The clone size distribution at τ in the IA test cases is shown in Appendix 1—figure 1: in this figure, each profile is compared to the corresponding Poisson distribution shifted by one (i.e. plus the stem cell). Concerning the results for the PA test cases, they are shown in Appendix 1—figure 2. In this case, the profiles are compared to the numerical integration of the master Equation 8. Additionally, for the PA# test case, where λ=γ and r=1/4, the reference analytic solution provided in Antal and Krapivsky, 2010 is also shown. In general, a good agreement is obtained in all of the cases.

Appendix 1—figure 1
Invariant Asymmetry (IA) test cases clone size distribution P(n), that is the distribution of the total number of cells n forming the progeny of a single initial cell in .

For each case, the distribution is shown at τ (defined in Figure 2, main text), which is well representative of the steady state condition. Tested parameters for cases IA#1-3 are provided in Appendix 1—table 1; the numerical simulation results are compared to the expected Poisson distribution. The detailed discussion is reported in 'Invariant Asymmetry and Population Asymmetry models'.

Appendix 1—figure 2
Population Asymmetry (PA) test cases clone size distribution P(n), that is the distribution of the total number of cells n forming the progeny of a single initial stem cell.

For each case, the distribution is shown at the final time τ, at which the total extinction of the process is not yet achieved. Tested parameters for cases PA#-3 are provided in Appendix 1—table 1; the numerical simulation results are compared to the solution of the numerical integration of the master Equation 8 and, for test case PA#1, also to the reference analytic solution from Antal and Krapivsky, 2010. The detailed discussion is reported in 'Invariant Asymmetry and Population Asymmetry models'.

Population Asymmetry model using metastates

As argued before, we assume in the random model generation that cell division in state Xi has a unique outcome, XiXj+Xk (Equation 1), since thereby the stochastic process can be uniquely defined by the two matrices D and T. To accommodate for the possibility of different division outcomes from the same state Xi, as in Equation 4 and Equations 3-5 in the main text, we introduce metastates, which represent short-lived states that indicate priming for either outcome, from which the cell division outcomes are unique. This is a small modification of the original model, which, however, does not lead to significant deviations if the metastates are traversed sufficiently quickly (which can be assured by a choice of high direct state transition rates in the metastates).

To illustrate this, let us consider the PA model described by 4; instead of having three different outcomes upon division of an S-cell we define the corresponding Metastate (MS) model with three primed states, M1,2,3, as,

(9) Sω1M1M1λ1S+S,Sω2M2M2λ2S+D,Sω3M3M3λ3D+D,Dγ,

 in which S and D correspond to the same cell type of the PA model (i.e. the stem and the differentiated cells, respectively), while Mi, for i=1,2,3, represent the metastates. These states are temporary states that are used to model each one of the three different possible division options of the S-cells. The rates λi and ωi, for i=1,2,3, are chosen such that the time scales of division and outcome probabilities are the same as in the original PA model:

(10) ω1/ω2=r/(1-2r), ω2/ω3=(1-2r)/r,
(11) 1(1/ω1+1/λ1)=λr, 1(1/ω2+1/λ2)=λ(1-2r), 1(1/ω3+1/λ3)=λr.

Equations 10 assure that outcome probabilities are the same as in the original model, while Equations 11 are needed to have the same total average time between two consecutive events. As there are six unknowns and only five relations, the following additional equation is added

(12) λ1=ω1Δ,

in which Δ is an additional parameter that is used to control how fast cells in metastate M1 divide. Low values of Δ imply that as soon as an S-cell transits to the metastate M1, it divides in two S-cells. Globally, this results in

(13) ω1=ω3=λr(Δ+1)/Δω2=λ(12r)(Δ+1)/Δλi=ωiΔ for i=1,2,3.

Numerical simulations for the two models were run and compared, based on the parameters reported in Appendix 1—table 1, and specifically the PA#1 and PA#3 test cases. The time unit, which is arbitrary, is omitted. The process rates for the corresponding MS model, which are indicated in the figures as MS#1 and MS#3, are computed based on Equation 13 and Δ=1/500. As well as for the PA test cases, the initial condition is one cell of type S and the final time, τ, is equal to 10; simulations are based on 5104 trajectories.

Appendix 1—table 1
IA and PA test cases simulation parameters (see 'Invariant Asymmetry and Population Asymmetry models').
Caseλγr
IA#11.01.0-
IA#22.01.0-
IA#35.01.0-
PA#11.01.01/4
PA#22.01.01/4
PA#32.01.01/6

The mean number of cells in the surviving clones and the extinction probability as function of time (scaled by τ) are shown in Appendix 1—figure 3. The clone size distribution at τ is shown in Appendix 1—figure 4. Both MS simulations agree very well with the corresponding PA ones, which justifies the use of metastates for our simulation campaign.

Appendix 1—figure 3
Metastate (MS) test cases simulation results in terms of mean number of cells in the surviving clones n¯s and extinction probability P(n=0) as function of time (scaled by the final simulation time τ).

As well as for the PA test cases, at τ the total extinction of the process is not yet achieved. Profiles from the numerical simulation for cases MS#,3 are compared to the corresponding PA#1,3 test cases which are based on parameters provided in Appendix 1—table 1. The detailed discussion is reported in 'Population Asymmetry model using metastates'.

Appendix 1—figure 4
Metastate (MS) test cases simulation results in terms clone size distribution P(n), that is the distribution of the total number of cells n forming the progeny of a single initial stem cell.

As well as for the PA test cases, the distribution is shown at the final time, τ, at which the total extinction of the process is not yet achieved. Profiles from the numerical simulation for cases MS#,3 are compared to the corresponding PA#1,3 test cases which are based on parameters provided in Appendix 1—table 1. The detailed discussion is reported in 'Population Asymmetry model using metastates'.

Analysis of the Generalized Invariant Asymmetry model

GIA0 test case: Steady state distribution and limiting behavior

A simple Generalized Invariant Asymmetric model, indicated hereafter as GIA0, was analyzed to identify the causes of the different clone size distribution behaviors observed in the randomly generated models (see main text). Thus, in this section, we study the Markov process defined by,

(14) X1λ1X1+X2,X2λ2X2+X2,X2γ.

Here, the renewing compartment is composed of just a single state X1 and cells in this state asymmetrically divide with rate λ1. The committed compartment is formed of state X2; cells in this state can either divide to duplicate, with rate λ2, or die, with rate γ. It is noted that for λ2=0, this model is reduced to the previously analyzed Invariant Asymmetric (IA) model (see 'Invariant Asymmetry and Population Asymmetry models').

As for the IA model, here the number of cells in state X1, indicated as n1, is conserved. It is therefore sufficient to determine the statistics of n2, defined by the master equation for P(n2), the probability of having n2 cells in state X2, provided that there are n1 cells in state X1. The master equation is given by,

(15) dP(n2)dt=(λ1n1+λ2n2+γn2)P(n2)+(λ1n1+λ2(n21))P(n21)+γ(n2+1)P(n2+1),

also written as,

(16) dP(n2)dt=(g(n2)+r(n2))P(n2)+g(n21)P(n21)+r(n2+1)P(n2+1),

in which r(n2)=γn2 and g(n2)=λ1n1+λ2n2. Considering that we are interested in clonal dynamics, meaning that we start from a single stem cell, n1 is equal to one.

In this simple case, the steady state distribution P*(n2), corresponding to the solution of dP(n2)/dt=0, can be analytically derived. Defining the net flux between states n2 and n2-1 as

(17) In2=r(n2)P*(n2)-g(n2-1)P*(n2-1),

and considering that In2+1=In2 for every n2, it follows that In2=I0=r(0)P*(0)-g(-1)P*(-1)=0, which means that

(18) P*(n2)=g(n2-1)r(n2)P*(n2-1)=l=0n2-1g(l)r(l+1)P*(0),

where P*(0) is the steady state probability of having 0 cells in state X2. Finally, by applying the conservation of the total probability, n2=0P*(n2)=1, and rearranging the terms we obtain,

(19) P*(n2)=(1-λ2γ)λ1/λ2(λ2γ)n2Γ(λ1λ2+n2)Γ(n2+1)Γ(λ1λ2).

In the main text, we defined the dimensionless parameters λ^1=λ1/γ and λ^2=λ2/γ, representing the rescaled division rates for cells in state X1 and X2, respectively. For clarity and readability, in this section, we simplify the notation using p=λ^1 and q=λ^2. Equation 19 is then rewritten as,

(20) P*(n2)=(1-q)p/qqn2Γ(pq+n2)Γ(n2+1)Γ(pq).

It is noted that while p varies between 0 and , q is defined between 0 and 1.

The mean number of cells in each state, indicated respectively as n¯1 and n¯2, satisfies the system of ODEs

(21) {dn¯1dt=0dn¯2dt=λ1n¯1+(λ2γ)n¯2.

Based on this, the steady state average number of cells is

(22) {n¯1*=1n¯2*=λ1γ-λ2=p1-q.

When the mean number of cells in state X2 is sufficiently large, that is, for large p or in case q is close to one, the discrete distribution given by Equation 20, can be approximated by a continuous probability density function P*(x2), given by,

(23) P*(x2)=(1-q)p/qqpx2/(1-q)Γ(pq+p1-qx2)x2Γ(pq)Γ(p1-qx2),

in which x2=n2/n¯2*. We note that Equation 23 corresponds to Equation 11 in the main text.

To better understand the distribution for different values of the parameters p and q, the limit behavior are analyzed below.

1. 𝐪 (i.e. λ^20)

When q0, Equation 20 can be simplified considering that

(24) limq0Γ(pq+n2)Γ(pq)(qp)n2=1,
(25) limq0(1-q)p/q=e-p

and

(26) Γ(n2+1)=n2!.

Thus, the distribution results in

(27) limq0P(n2)=pn2epn2!= Poisson(p),

that is a Poisson distribution with mean equal to p. This agrees with what we were expecting considering that when q=0 the model is reduced to the IA model for which the distribution in n2 is known to be poissonian.

Additionally, for large mean number of cells, which are obtained for large p (when q=0, then n¯2*=p), the Poisson distribution tends to a Normal distribution with mean and variance equal to p. Therefore,

(28) lim(q,p)(0,)P(n2)=12πpe(n2p)22p=Normal(p,p).

Rescaling the distribution, and considering x2=n2/n¯2*, results in

(29) lim(q,p)(0,)P(x2)=Normal(1,1/p),

that is a Normal distribution with unitary mean and variance equal to 1/p.

2. 𝐪𝟏 (i.e. λ^21)

For q1 the steady state mean number of cells n¯2* and Equation 23 holds. This equation can be rewritten as,

(30) P(x2)=qp/(1q)x2+1(1q)p/qq(x21)+1Γ(pq(x21)+1q(1q)+1)Γ(pq)Γ(p1qx2+1).

If the Stirling’s approximation is applied

(31) Γ(z+1)=2πz(ze)z,

we obtain,

(32) P(x2)=pp/qep/qq(q2p)/(2q)(q(x21)+1)p/(1q)(x21+1/q)1/2Γ(pq)x2x2p/(1q)+1/2.

Considering now that

(33) limq1(q(x2-1)+1)p/(1-q)(x2-1+1/q)-1/2x2x2p/(1-q)+1/2=ep(1-x2)x2p-1,

it follows that

(34) limq1P(x2)=ppΓ(p)x2p1epx2=Gamma(p,1/p),

that is a Gamma distribution with unitary mean and shape parameter given by p. Importantly, the Gamma distribution for p tends to a Normal distribution with unitary mean and variance 1/p. For p=1, it corresponds instead to an Exponential distribution with unitary mean.

3. 𝐩 (i.e. λ^1)

When p is large, the mean number of cells is large for any value of q. Thus, Equation 32 is valid. By applying the Stirling’s approximation also to the term Γ(p/q), we obtain,

(35) P*(x2)=p2πx2-p/(1-q)x2-1/2(q(x2-1)+1)p/(1-q)(x2-1+1/q)-1/2.

This expression can be also rewritten as,

(36) P*(x2)=p2πep/(1-q)((x2-1+1/q)log(q(x2-1)+1)-x2log(x2))-1/2(log(x2)+log(q(x2-1)+1)).

Considering now that p is large, then 1/2(log(x2)+log(q(x21)+1))p/(1q)((x21+1/q)log(q(x21)+1)x2log(x2)), so the term on the right can be neglected. Additionally, for x21 the following expansions can be applied:

(37) log(q(x2-1)+1)=k=1((-1)k+1(q(x2-1))kk),

and

(38) log(x2)=k=1((-1)k+1(x2-1)kk).

Finally, if we consider that

(39) (x2-1+1q)k=1((-1)k+1(q(x2-1))kk)-x2k=1((-1)k+1(x2-1)kk)(x2-1)2=-12(1-q),

then Equation 36 results in

(40) limpP(x2)p2πe1/2p(x21)2=Normal(1,1/p),

that is a Normal distribution with unitary mean and variance equal to 1/p.

Importantly, it is noted that the limiting behavior of P*(x2) for q0 and q1 in case of large p, are both consistent with the results obtained for p and any q. In other words, remembering that p=λ^1 and q=λ^2, the steady state distribution for λ^1 and any value of λ^2 is a Normal distribution of unitary mean and variance equal to 1/λ^1.

To globally verify these results, numerical simulations of the stochastic process associated with model 14 for different values of λ^1 and λ^2 were run. The following curves were compared:

  • Stochastic simulation: distribution at the final simulation time, τ, of the number of cells in state X2. The final time was chosen here as τ=20/αmin, where αmin=min(λ1,λ2,γ); this value is well representative of a steady state condition. Furthermore, the process rates considered are based on a unitary γ (i.e. λ1=λ^1, λ2=λ^2 and γ=1). It is noted that the time unit is arbitrary and therefore omitted.

  • Analytic distribution: based on Equations 20, for low mean values, and 23, for large mean values.

  • Approximate distributions: Poisson, Gamma and Normal distributions respectively given by Equations 27, 34 and 40.

The tested parameters λ^1 and λ^2 are graphically shown in Appendix 1—figure 5 a contour map showing the expected steady state mean number of cells n¯2* over the (λ^1,λ^2)-parameter plane. The curves from the numerical simulations and the corresponding exact and approximated solutions are shown in Appendix 1—figure 6, Appendix 1—figure 7 and Appendix 1—figure 8: the tested conditions are divided into three groups (one figure each) representing the limiting behaviors discussed above. Generally, analytical and numerical results agree very well. This also demonstrates that GIA models can show both peaked and non-peaked distributions, depending on the model parameters.

Appendix 1—figure 5
GIA0 test case parameters λ^1 and λ^2 over the contour map of the expected steady state mean number of cells in state X2, n¯2*.

The tested conditions are divided in three groups representing the limiting behaviors discussed in in 'GIA0 test case: steady state distribution and limiting behavior', and for which the steady state distribution is shown respectively in Appendix 1—figure 6, Appendix 1—figure 7 and Appendix 1—figure 8.

Appendix 1—figure 6
GIA0 test case (see GIA0 test case: steady state distribution and limiting behavior') results in terms of steady state distribution P*(n2) of the the number of cells in state X2, n2.

The tested parameters correspond to the condition λ^2=0.01, as representative of the limiting case λ^20, and to different values of λ^1. The results from the numerical simulations are compared to the analytic solution (Equation 20), and its approximation, that is, the Poisson distribution (Equation 27).

Appendix 1—figure 7
GIA0 test case (see 'GIA0 test case: steady state distribution and limiting behavior') results in terms of steady state rescaled distribution P*(x2) of the the number of cells in state X2, where x2=n2/n¯2*.

The tested parameters correspond to the condition λ^2=0.99, as representative of the limiting case λ^21, and to different values of λ^1. The results from the numerical simulations are compared to the analytic solution (Equation 23), and its approximation that is the Gamma distribution (Equation 34).

Appendix 1—figure 8
GIA0 test case (see 'GIA0 test case: steady state distribution and limiting behavior') results in terms of steady state rescaled distribution P*(x2) of the the number of cells in state X2, where x2=n2/n¯2*.

The tested parameters correspond to the condition λ^1=60, as representative of the limiting case λ^1, and to different values of λ^2. The results from the numerical simulations are compared to the analytic solution (Equation 23), and its approximation that is the Normal distribution (Equation 40).

Approximation of generic GIA models

As shown in the main text, a generic GIA model can be expressed in terms of the compartments and 𝒞 (Equation 9 in the main text). We note that the the GIA0 model discussed in the previous section corresponds to the general compartment dynamics of GIA models, Equation 9, main text, if the dynamics of compartments are assumed to be Markovian. Thus, we can treat the GIA0 model as a Markovian approximation of generic GIA models. In this section, we test this approximation numerically.

To this end, we first wish to relate the effective (non-Markovian) rates λR,C and γC of a generic GIA model to the rates of the Markovian approximation, the GIA0 model. We refer to this model – the GIA0 model matched to the effective rates of a particular more complex GIA model – as the equivalent model to the latter. The equivalent rates λR, λC and γC are computed considering the same steady state condition in terms of mean number of cells. To this aim, we rewrite the dynamics of mean cell numbers, Equation 7 in the main text, in block form as,

(41) {dn¯Rdt=ARRn¯Rdn¯Cdt=ACRn¯R+ACCn¯Cdn¯dt=ACn¯C,

in which 𝒏¯R,C denote the vectors of mean cell numbers of states restricted to compartments ,𝒞, respectively, and n the number of lost cells (not considered for total cell numbers and homeostasis condition). It is noted that ARC=, since there cannot be links from 𝒞 to . Also AR= as we do not consider loss from (see main text for the arguments).

Thus, summing up all the components in each compartment, n¯R=i(𝐧¯R)i=1 and n¯C=i(𝐧¯C)i, results in

(42) {dn¯Rdt=0dn¯Cdt=i(ACRn¯R)i+i(ACCn¯C)idn¯dt=ACn¯C.

The equivalent parameters are then estimated from the steady state condition 𝐧¯X* and n¯X*, for X=R,C,, as,

(43) λR=i(ACR𝐧¯R*)i, γC=i(AC𝐧¯C*)in¯C* and λC=γC-λRn¯C*.

The applicability of this approximation was evaluated by comparing the clone size distribution obtained from the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text) with that from the corresponding equivalent GIA0 model with parameters λ^1=λ^R=λR/γC and λ^2=λ^C=λC/γC. The values of λ^1 and λ^2 for all the GIA random models are shown in Appendix 1—figure 9 in the contour map of the expected mean number of cells in 𝒞 (in compartment there is always one single cell). In general, λ^1 remains below five and λ^2 is spread between zero and one. As measure of the error of the equivalent model, ϵ, we choose the maximum difference between the distributions of a particular random GIA model and that of the corresponding equivalent model, relative to the peak of the distribution of the random model. For low mean cell numbers, the distribution is compared to Equation 20; for large mean number instead, the rescaled distribution is compared to Equation 23. A threshold on the mean cell number equal to 10 was chosen to distinguish between the two cases. This relative error ϵ as function of λ^2 is presented in Appendix 1—figure 10, where it is evident that large errors are obtained only for large values of this parameters. Some illustrative cases, representative of different value of λ^2, were selected and their distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13. The following considerations are made:

  • Two cases for λ^2<0.2 are presented in Appendix 1—figure 11. In these cases, the distribution obtained from the random models agrees with the analytic solution from the equivalent model, which in turn is well approximated by a Poisson distribution. As expected, larger deviations between the equivalent model’s analytic solution and the approximation are noted for increasing values of λ^2. In general, all the models in this range are well approximated by the equivalent model.

  • The two cases presented in Appendix 1—figure 12 have λ^2>0.8, for which the Gamma distribution is an approximation of the equivalent model’s analytic solution. The distribution in some cases (see for instance the top figure), presents some deviations with respect to the equivalent model. However, globally a good agreement is obtained in most of the cases (failing ratio, based on a 0.5 maximum error is 21.7%).

  • Two cases in an intermediate range 0.2<λ^2<0.8 are shown in Appendix 1—figure 13. Again, the equivalent model’s analytic solution is well representative of the distribution (failing ratio, based on a 0.5 maximum error is 3.2%). It is noted that for such values of λ^2 an approximation of the equivalent model analytic solution is not available.

Appendix 1—figure 9
GIA random models (generated as described in 'Generation of random models' and analyzed in the main text) equivalent parameters λ^1=λ^R and λ^2=λ^C (see section Approximation of generic GIA models) over the contour map of the expected steady state mean number of cells in the committed compartment, n¯C*.

Some illustrative cases, for which the steady state distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13, are highlighted.

Appendix 1—figure 10
Relative error of the the equivalent model approximation, ϵ, (see definition in 'Approximation of generic GIA models') as function of λ^2=λ^C for the GIA random models (generated as described in 'Generation of random models' and analyzed in the main text).

The selected cases correspond to some illustrative cases for which the steady state distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13.

Appendix 1—figure 11
GIA random models selected cases (see Appendix 1—figure 9 and Appendix 1—figure 10) where λ^2<0.2: the steady state distribution P*(nC) of the number of cells in the committed compartment, nC, is compared to that of the equivalent model (Equation model in the legend) analytic solution and its approximation for low λ^2 (i.e. the Poisson distribution, Poisson (λ^1)).

Results discussion is reported in 'Approximation of generic GIA models'.

Appendix 1—figure 12
GIA random models selected cases (see Appendix 1—figure 9 and Appendix 1—figure 10) where λ^2>0.8: the steady state rescaled distribution P*(xC) of the number of cells in the committed compartment, where xC=nC/n¯C*, is compared to that of the equivalent model (Equation model in the legend) analytic solution and its approximation for high λ^2 (i.e. the Gamma distribution, Gamma (λ^1,1/λ^1)).

Results discussion is reported in 'Approximation of generic GIA models'.

Appendix 1—figure 13
GIA random models selected cases (see Appendix 1—figure 9 and Appendix 1—figure 10) where 0.2<λ^2<0.8: the steady state distribution P*(nC) (or the rescaled distribution P*(xC)) of the number of cells in the committed compartment, nC (or in the rescaled case xC=nC/n¯C*), is compared to that of the equivalent model (Equation model in the legend) analytic solution.

Results discussion is reported in 'Approximation of generic GIA models'.

Thus, in most of the tested cases the equivalent model is able to catch the behavior of a generic random GIA model, and thus represents a good approximation (global failing ratio, based on a 0.5 maximum error is 6%). In the cases where the equivalent model does not yield a good approximation, the internal structure of the and 𝒞 compartments become relevant and subsequent events that affect nR and nC become dependent on each other, and thus are non-Markovian.

GIA model for large λ^R

To test the behavior of a generic GIA model in case of large λ^R, the GIA random models (generated as described in 'Generation of random models' and analyzed in the main text) were modified by changing the process rates associated to the renewing compartment to achieve λ^R=30. To this aim, considering that infinite solutions are possible, we applied a global search method, and more specifically a Genetic Algorithm (Goldberg, 1989). We therefore setup an optimization problem, where the process parameters are the optimization variables and the cost function is the error of the current λ^R with respect to the target.

The envelope of curves obtained in all the random models and some illustrative profiles are shown in Appendix 1—figure 14. A reference Normal distribution, characterized by unitary mean and variance equal to 1/λ^R=1/30 is also reported: this curve corresponds to the distribution expected in the equivalent model for which λ^1=λ^R. Deviations become relevant, when the internal structure of compartments in a random model leads to subsequent events that are not independent from each other. These effects alter the variance of the Normal distribution. In fact, Figure 4 in the main text is based on the same simulation results, but in this case the rescaling is done considering both the mean number of cells and its variance (a Normal distribution is a two-parameter distribution).

Appendix 1—figure 14
Rescaled clone size distribution for the random GIA models when λ^R=30 at the final simulation time, which corresponds to 20/αmin (αmin is the minimum process rate).

The grey shade represents the percentile of all the simulations (black lines limit the 5-95%ile range); the blue curves correspond to some illustrative selected simulations. A reference curve corresponding to a Normal distribution of unitary mean and variance equal to 1/λ^R=1/30 is shown in green. Distributions of the total number of cells n are scaled by the mean number of cells n¯, being x=n/n¯. Simulations for which the final condition (20 times the inverse of the minimum process rate) is not achieved (due to computational limitations) are not included, resulting in 922 models. Results discussion in provided in 'GIA model for large'.

GIAB test case: bimodal distribution

In the previous subsection we increased λR in a way which assures that other parameters within stay of the same order of magnitude. Here, we address the question what happens if some parameters within are much smaller than parameters of 𝒞, such as γC. For that purpose, we study another simple GIA model, let us call it GIAB, as a modification of the GIA0 test model defined by 14. In the GIAB model the renewing compartment is composed by two states X1 and X2, instead of only one. Cells in these states divide asymmetrically (i.e. one daughter cell remains within the renewing compartment while the other enters the committed compartment) or change state between X1 and X2 (cell state switching) while still remaining within the renewing compartment. The committed compartment of the system is composed just by a single state, X3, and cells in this state either duplicate or die (as the previous state X2 in Equation 14). This corresponds to the model

(44) X1λ1X1+X3,X2λ2X2+X3,X1ω12X2,X2ω21X1,X3λ3X3+X3,X3γ.

In this model, the effective parameters as defined in 'Approximation of generic GIA models', λR=λ1P1*+λ2P2*, where Pi*=ωjiωij+ωji, i,j=1,2,ij, and γC=γ. As before, we define the non-dimensionalized parameters λ^R=λR/γC and here we also define ω^=ω12/γC, and further the parameter ratios a=λ1/λ2 and b=ω12/ω21. In the following, we test this model for different values of a and ω^ as reported in Appendix 1—table 2, while fixing λ^R=30, which is our main scaling parameter, as well as λ^C=0 and b=1.

Appendix 1—table 2
GIAB test case simulation parameters (see 'GIAB test case: bimodal distribution').
Caseω^λ1/λ2
GIAB#13 1011
GIAB#23 10-21
GIAB#33 10210
GIAB#43 10110
GIAB#53 10010
GIAB#63 10-110
GIAB#73 10-210

The rescaled distribution of the number of cells in the committed compartment 𝒞 (i.e. in state X3), nC, obtained at the final simulation time τ, is shown in Appendix 1—figure 15. A value of τ equal to 20/αmin (where αmin is the minimum of all rate parameters) was chosen to assure that the steady state is reached. Considering first the test cases GIAB#1 and GIAB#2 according to Appendix 1—table 2, which are characterized by a=1 (i.e. there is no difference in the division timescales for the two renewing states), they both lead to a Normal distribution, independently on the value assumed by ω^. Test cases GIAB#3 to GIAB#7 instead are all characterized by a=10, and different orders of magnitude for ω^ are tested. The distribution in these cases is Normal until ω^λ^R/10 (see cases GIAB#3 to GIAB#5); when ω^ is significantly lower than λ^R, then bimodality emerges (see cases GIAB#6 and GIAB#7). Looking at the extreme case, GIAB#7, cells in each renewing state, if analyzed independently, would result in a Poisson distribution in the committed compartment with different mean values (low for the slow-dividing state and large for the fast-dividing one). Thus, globally the distribution is in line with a bimodal distribution computed as

Appendix 1—figure 15
Rescaled distribution of the cells number in the committed compartment in the GIAB test cases at time τ, which is 20/αmin (αmin is the minimum process rate).

The distributions P(x~C) of the number of cells in the committed compartment nC is rescaled considering that x~c=(nC-n¯C)/σnc, where σnc is the variance of nc. In addition to the stochastic simulation results for different settings (see Appendix 1—table 2), the reference Normal and bimodal distributions are also shown. Results discussion is provided in 'GIAB test case: bimodal distribution'.

(45) P(n)=βPoisson(λ^R(1))+(1-β)Poisson(λ^R(2)),

in which β is the mixing parameter, computed as,

(46) β=n¯-n¯2n¯1-n¯2,

and the parameters λ^R(i) and n¯i for i=1,2 correspond to the parameter λ^R and to the mean number of cells of a system in which the renewing compartment would be composed just by state Xi. The total mean number of cells is instead indicated by n¯. The bimodal distribution given by Equation 45 is indicated as a black dashed-dotted line in Appendix 1—figure 15.

Analysis of the Generalized Population Asymmetry model

In the main text, it is shown that GPA models predict asymptotically, for large times t, the same rescaled clone size distribution, that is, an Exponential distribution of unitary mean.

In Appendix 1—figure 16, the 50%tile distribution of all the GPA models analyzed is shown at different levels of extinction (which are related to the different time points), showing a gradual convergence to the expected Exponential distribution.

Appendix 1—figure 16
Clonal size distribution (corresponding to the 50%ile curve) in the GPA random models at different extinction fraction (i.e. different time).

The curves are compared to the expected Exponential distribution (see 'Analysis of the generalized Population Asymmetry model').

Thus, the Markov approximation to all GPA models, Equation 12 in the main text (the equivalent model of GPA models), becomes accurate for sufficiently large t and no significant deviations are observed. This also means that for large t, the distribution is independent of the choice of parameters, since only the mean value of surviving clones, n¯s, depends on parameters, which however, does not affect the rescaled distribution in terms of x=nn¯s. We can therefore abstain from an extended study of different parameter regimes. This is in contrast to the GIA model class where distributions depend sensitively in the choice of parameters if we are not in the scaling regime of large λ^R, and the non-Markovian nature of GIA models can become relevant, as we showed in the previous section.

Asymptotic clone size distributions: Mathematical analysis

In the previous two sections, we studied numerically how a Markovian representation can approximate general cell fate models (GIA and GPA) models. Here, we study from an analytical view point how generic GIA and GPA models converge to the respective limiting distributions, for large time t (GPA models) and large λ^R (GIA models).

Similar to 'Approximation of generic GIA models', we define 𝒏R and 𝒏C as the cell number vectors (here: actual cell numbers of the stochastic model, not mean cell numbers) restricted to the states of compartments and 𝒞, respectively. We further define the accumulated cell numbers nR=i(𝒏R)i and nC=i(𝒏C)i in and 𝒞, respectively. Considering nR and nC as observables of our compartment model, this corresponds to a Hidden Markov Model in that the dynamics of the observables are not Markovian, yet they are entirely determined by a set of states which follow a Markov process.

General dynamics of C-cells for GIA and GPA models

Comments on the effective rate parameter λR

For general GIA and GPA models in the compartment representation of Equation 9, main text, the effective rate parameter λR (i.e. the frequency of cell divisions in per cell), is defined similar as in 'Approximation of generic GIA Models', yet, here we take into account that λR can depend on time via the – not necessarily stationary – distribution of cells within (since the process is non-Markovian). Hence, in these more general terms, we define λR(t)=iλiPiR(t) where PiR(t)=n¯i(t)n¯R(t) is the probability of a single cell to be in state i at time t. PiR(t) may variate after each event E, as the conditional probability PR|E, provided that an event E has just occurred, differs from the stationary state distribution.

In homeostasis, where the number of R-cells must on average stay constant, λR is also the rate, per R-cell, at which C-cells are created from R-cells, via events RR+C,RC+C, or direct transition, RC. Thus, the total rate of C-cells being created from the R-cells by such events – let us call them RC-events – is λRnR. While the non-Markovian nature of the process does not assure that such events are distributed exponentially, we can state that, by definition, the number of such creation events in a time period Δt, NRC, has mean value NRC(Δt)=0ΔtλR(t)nR(t)𝑑t.

While, λR(t) may in principle depend on time, we note that when internal rates of are fast compared to the time period Δt above (an internal rate of is a rate ωij where states i,j are both in ), then λR(t) fluctuates quickly and we can make an adiabatic approximation, replacing λR(t) by its average λ¯R=iλiPiR, where PiR*=n¯i*n¯R* is the steady state value of PiR(t) (this corresponds for GIA models to the definition of λR in 'Approximation of generic GIA models'). This is fulfilled in our simulations of large λ^R, since internal rates, such as ω^ defined in 'GIAB test case: bimodal distribution', scale with λ^R when λR (see 'GIA model for large'). Hence, the time scales of internal rates are substantially smaller than the relevant time scale Δt=1/γ¯C, the lifetime of generated C-cells. Therefore, when comparing with simulation results, it is generally appropriate to assume that λR(t)λ¯R is constant. In the following subsection, we will discuss this case. The case when internal rates are slower than the time scale γC is discussed in the subsequent subsection.

Asymptotic distributions of C-cells

Each C-cell created by an RC-event initiates a sub-clone within 𝒞, defined through its progeny, which then follows the dynamics of 𝒞. Such sub-clones evolve independently of each other (a defining characteristic of branching processes [Haccou et al., 2005]). Let us call the number of cells of a sub-clone created by an RC-event at time ti, which evolves over time t, as ξi(t). We denote two RC-events which happen at the same time via a symmetric division of type RC+C by different indices i and i+1, yet with ti=ti+1. Therefore, the total number of cells in 𝒞 is the sum of independent random numbers ξi,

(47) nC(t)=i=1NRCξi(t)

Note that the random numbers ξi(t) are not identically distributed, since their statistics depend on the time point of the i-th RC-event. In particular, the mean value, ξ¯i(t-ti)=ξi(t) and variance σξ2(t-ti)=(ξi(t)-ξ¯i)2 depend on the time passed since the respective RC-event at time ti. Thus, we cannot apply the central limit theorem in its original form to the sum of random numbers, Equation 47. However, a variation of the central limit theorem states that sums of non-identically distributed random variables, iξi, converge to normally distributed random variables, if mean and variance of ξi are finite, and they fulfill Lindeberg’s condition (Billingsley, 1995).

The (strict) Lindeberg’s condition is said to be fulfilled for a sequence of random numbers ξi,i=1,,N, if

(48) maxiσi2σN20, for N

where σi2=(ξi-ξ¯i)2 and σN2=i=1Nσi2 . If this is fulfilled, then nC=i=1Nξi converges for N to a random variable that is normal distributed.

To show that the ξi fulfill Lindeberg’s condition, we note that ξi(t-ti) follow a sub-critical multi-type branching process, for which ξ¯i(t)0 for t, which is assured since the eigenvalues of the adjacency matrix of 𝒞 are all negative (since dominant eigenvalues of all SCCs in 𝒞 are negative [astrom_murray_feedback_book]). For multi-type branching processes, the variance σ2 is proportional to the mean value, hence σi2(t-ti)ξ¯(t-ti). Therefore, σi20 for t, hence it is bounded, i.e there exists C>0 such that σi2(t)<C for all t. Furthermore, since initially, at t=ti, ξ¯i(ti)=1, we know that there exist t1>0 and δ>0 such that ξ¯i(t)>δ for t-ti<t1. Now we recall that, since here we assume the validity of the adiabatic approximation discussed in the previous subsection, the number of RC-events within a time period Δt is NRC(Δt)λR0ΔtnR(t)𝑑t. For generic λR, NRC is finite and thus is σN, since all σi(t)0 for large t. However, for λR or nR, we get that NRC(t1)λ¯RnR and thus σN2=i=1NRCσi2(t)>NRCδ. On the other hand, all σi2<C, which means that all σi2σN2<CσN20 for λR or nR. Hence, Lindeberg’s condition is fulfilled if λR or nR and thus, nC becomes normally distributed,

(49) nC(t)=iNRCξi(t)Normal(mean=n¯C,variancen¯C)

The variance scales with nC since variances of independent random numbers add linearly and each σi2ξ¯i. The exact value of n¯C and the pre-factor of the variance of nC in this limit depend on the (non-Markovian) model details.

Deviations from a normal distribution in the asymptotic case

The arguments leading to Equation 49 hold for large λ^R if the internal rates of are comparable to λ¯R=iλin¯i*n¯R*, which is satisfied for all cases we sampled randomly for numerical simulations, see 'GIA model for large'. However, if internal rates of are much smaller than λR, then the adiabatic approximation PiR(t)n¯i*n¯R* does not apply and λR(t) may vary slower than the time scale 1/γ¯C. For example, consider a GIA model in which can be decomposed into two sub-compartments, say 1 and 2, whereby any rates ωij,ωji with i1,j2 have ωij,ωjiλ¯R, as the example discussed in 'GIAB test case: bimodal distribution'. Then, the single cell in (note that always nR=1 in GIA models) may spend long time periods in 1 and 2 respectively. Now, if λ¯R1=i1λin¯in¯R1i2λin¯in¯R2=λ¯R2, then, for time periods exceeding 1/γ¯C, the effective asymmetric division rates are λ¯R1 and λ¯R2 respectively, and during these time periods the distribution of nC cells has mean n¯C(1)λ¯R1 and n¯C(2)λ¯R2 respectively. Hence, the total clone size distribution will be the mix of two Normal distributions with mean n¯C(1) and n¯C(2), respectively, that is, a bimodal distribution. This scenario is discussed in 'GIAB test case: bimodal distribution', for the specific case of two states in .

GIA models

In GIA models, the number of R-cells is conserved, and in particular, for clones, we have nR=1 for all times. Hence, the rate of RC-events is simply λR. Now, if internal rates are fast and λR, then nC becomes normally distributed, as argued above. Hence, also n=nR+nC=1+nC follows a Normal distribution, with mean nC+1 instead.

Nonetheless, if internal rates are less than γC then bimodal distributions may be observed, as discussed in 'GIAB test case: bimodal distribution'.

GPA models

The dynamics of GPA models read, in compartment formulation,

(50) RλR{R+RPr.rRRR+CPr.1rRRrCCC+CPr.rCC,
(51) RωRCC,CλCC+C,CγC

Since the dynamics of R-cells do not depend on C-cells, we can first consider the formers’ dynamics separately. In homeostasis, where λRrRR=λRrCC+ωRC, we have thus for R-cells,

(52) nRλRrRRnRnR±1

This is a simple continuous time branching process with two offspring; yet it is non-Markovian: subsequent events may be correlated, since each event imbalances the internal distribution PiR of cells in the compartment . Yet, as for C-cells, we can write the number of R-cells as a sum of independent (but not identically distributed) random variables. Let us consider for each R-cells, born at time ti, the random variable ξiR describing its 'survival' state, that is, ξiR=1 if that cell is still in , and ξiR=0 if that cell has left via symmetric differentiation, RC+C or direct transition, RC. Essentially, the random numbers ξiR are the ‘branches’ of the branching process. Since these events do not depend on other cells, the random numbers ξiR are independent of each other, and thus,

(53) nR(t)=i=1Nb(t)ξiR(t),

is a sum of independent, not identically distributed random variables. Here,Nb(t) is the total number of birth events occurring at rate λRrRRnR, RR+R, up to time t. Since ξiR(t)1 and ξiR(t=ti)=1, we can argue analogue to above for Equation 49 that the sequence of ξiR fulfills Lindeberg’s condition and thus nR converges to a Normal distribution, whereby the mean value n¯R=1 (since due to homeostasis the mean number is constant and the initial condition is nR(t=0)=1). Hence, the probability to have nR cells in is

(54) P(nR)e-(nR-1)22σR2e-nR22σR2 for nR1.

However, here, the variance σR2 is a random variable itself: Since the ξiR are independent, σR2=i=1Nb(t)σi2, where σi2=(ξiR-ξ¯R)2, and where Nb(t) is a random variable. The random numbers ξiR can only have the values ξi=1 or ξiR=0 and they follow a simple death process, so for ξR=0, it must be σi2=0, while for ξiR=1, the variance must be finite, let’s say, σi2=β(t)>0 where β can in principle depend on time, yet is not known (it depends on the non-Markovian details of the model). Hence,

(55) σR2=i=1Nb(t)β(t)ξiR=β(t)nR

since the number of summands with ξiR=1 is the number of surviving R-cells, that is, nR. Substituting σR2=β(t)nR into Equation 54 gives,

(56) P(nR)enR22β(t)nR=enR2β(t)

This is an Exponential distribution with mean value n¯R=nR=2β(t). Finally, when we enforce normalisation of the probability distribution, we get,

(57) P(nR)=1n¯R(t)enRn¯R(t) for nR1.

Eventually, we also have to 'add' the C-cells. Since for t1, also nR1, individual events nRnR±1 do not significantly affect the distribution of R-cells, PiR=n¯in¯R (in contrast to the case of nR=1 for GIA models), and hence we can assume the adiabatic approximation discussed above, where PiRPiR* and thus λRconst. Therefore, C-cells are distributed according to a Normal distribution with mean n¯C and variance σn22n¯CλRnR. As argued in the main text, the mean value of nR, conditionend on survival of a clone, nR>0, must grow over time, without bound if t. Therefore, we can generally assume that nR1, and hence both n¯CnR and σC2nR. However, if we express the clone size in form of a rescaled variable x=nn¯s (n¯s is the mean of surviving clones) we can write x=xR+xC with xR=nRn¯s and xC=nCn¯s, and note that the rescaled standard width of the distribution of xC, σxC=σCn¯n¯Cn¯R+n¯CnRnR vanishes for t. Therefore, xC is effectively a constant in that limit, xCx¯CxR. Hence, also x=xR+xCxR and thus, the rescaled clone size, x=nn¯s, is distributed according to an Exponential distribution (here: a probability density function) with unit mean, and after renormalisation, we get that

(58) P(x)=e-x for t.

This distribution is indeed observed in all our simulations of GPA models for large t.

Data availability

All numerical data used for figures is produced by programme code, which can be found on Github, under https://github.com/cp4u17/simCellState (copy archived at https://github.com/elifesciences-publications/simCellState).

The following data sets were generated
    1. Parigini C
    (2020) Github
    ID cp4u17/simCellState. simCellState.

References

  1. Book
    1. Abramowitz M
    2. Stegun IA
    (1972)
    Handbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, 32
    United States Department of Commerce, National Bureau of Standards.
  2. Book
    1. Åström KJ
    2. Murray RM
    (2008)
    Feedback Systems: An Introduction for Scientists and Engineers
    Princeton Unversity Press.
  3. Book
    1. Billingsley P
    (1968)
    Convergence of Probability Measures
    Jon Wiley & Sons.
  4. Book
    1. Billingsley P
    (1995)
    Probability and Measure
    John Wiley and Sons.
    1. Bramson M
    2. Griffeath D
    (1980) Asymptotics for interacting particle systems onZ d
    Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 53:183–196.
    https://doi.org/10.1007/BF01013315
  5. Book
    1. Cormen TH
    (2009)
    Introduction to Algorithms
    MIT Press.
  6. Book
    1. Goldberg DE
    (1989)
    Genetic Algorithms in Search, Optimization and Machine Learning (1st edn)
    Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
  7. Book
    1. Greulich P
    (2019) Mathematical Modelling of Clonal Stem Cell Dynamics
    In: Cahan P, editors. Computational Stem Cell Biology. New York: Humana. pp. 107–129.
    https://doi.org/10.1007/978-1-4939-9224-9_5
    1. Hirsch MW
    2. Smith H
    (2006)
    Handbook of Differential Equations: Ordinary Differential Equations
    239–257, Monotone dynamical systems, Handbook of Differential Equations: Ordinary Differential Equations, Elsevier, 10.1016/S1874-5725(05)80006-9.
  8. Report
    1. National Institute of Health
    (2009)
    Stem Cell Basics
    NIH.
    1. Potten CS
    2. Loeffler M
    (1990)
    Stem cells: attributes, cycles, spirals, pitfalls and uncertainties. Lessons for and from the crypt
    Development 110:1001–1020.
  9. Book
    1. Van Kampen N
    (1981)
    Stochastic Processes in Physics and Chemistry
    Elsevier.

Article and author information

Author details

  1. Cristina Parigini

    1. School of Mathematical Science, University of Southampton, Southampton, United Kingdom
    2. Institute for Life Sciences, University of Southampton, Southampton, United Kingdom
    Contribution
    Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing, Mathematical analysis (part), Numerical analysis
    Competing interests
    No competing interests declared
  2. Philip Greulich

    1. School of Mathematical Science, University of Southampton, Southampton, United Kingdom
    2. Institute for Life Sciences, University of Southampton, Southampton, United Kingdom
    Contribution
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing - review and editing, Mathematical analysis (part)
    For correspondence
    P.S.Greulich@soton.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5247-6738

Funding

Medical Research Council (MR/R026610/1)

  • Philip Greulich

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Benjamin D MacArthur for valuable discussions that contributed in the development of this research. CP is supported by a Studentship of the Institute for Life Sciences (Southampton) and PG by Medical Research Council New Investigator Research Grant MR/R026610/1.

Version history

  1. Received: March 2, 2020
  2. Accepted: July 3, 2020
  3. Accepted Manuscript published: July 20, 2020 (version 1)
  4. Version of Record published: August 24, 2020 (version 2)

Copyright

© 2020, Parigini and Greulich

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 895
    views
  • 177
    downloads
  • 7
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Cristina Parigini
  2. Philip Greulich
(2020)
Universality of clonal dynamics poses fundamental limits to identify stem cell self-renewal strategies
eLife 9:e56532.
https://doi.org/10.7554/eLife.56532

Share this article

https://doi.org/10.7554/eLife.56532

Further reading

    1. Computational and Systems Biology
    2. Neuroscience
    Andrea I Luppi, Pedro AM Mediano ... Emmanuel A Stamatakis
    Research Article

    How is the information-processing architecture of the human brain organised, and how does its organisation support consciousness? Here, we combine network science and a rigorous information-theoretic notion of synergy to delineate a ‘synergistic global workspace’, comprising gateway regions that gather synergistic information from specialised modules across the human brain. This information is then integrated within the workspace and widely distributed via broadcaster regions. Through functional MRI analysis, we show that gateway regions of the synergistic workspace correspond to the human brain’s default mode network, whereas broadcasters coincide with the executive control network. We find that loss of consciousness due to general anaesthesia or disorders of consciousness corresponds to diminished ability of the synergistic workspace to integrate information, which is restored upon recovery. Thus, loss of consciousness coincides with a breakdown of information integration within the synergistic workspace of the human brain. This work contributes to conceptual and empirical reconciliation between two prominent scientific theories of consciousness, the Global Neuronal Workspace and Integrated Information Theory, while also advancing our understanding of how the human brain supports consciousness through the synergistic integration of information.

    1. Computational and Systems Biology
    2. Genetics and Genomics
    Ardalan Naseri, Degui Zhi, Shaojie Zhang
    Research Article Updated

    Runs-of-homozygosity (ROH) segments, contiguous homozygous regions in a genome were traditionally linked to families and inbred populations. However, a growing literature suggests that ROHs are ubiquitous in outbred populations. Still, most existing genetic studies of ROH in populations are limited to aggregated ROH content across the genome, which does not offer the resolution for mapping causal loci. This limitation is mainly due to a lack of methods for the efficient identification of shared ROH diplotypes. Here, we present a new method, ROH-DICE (runs-of-homozygous diplotype cluster enumerator), to find large ROH diplotype clusters, sufficiently long ROHs shared by a sufficient number of individuals, in large cohorts. ROH-DICE identified over 1 million ROH diplotypes that span over 100 single nucleotide polymorphisms (SNPs) and are shared by more than 100 UK Biobank participants. Moreover, we found significant associations of clustered ROH diplotypes across the genome with various self-reported diseases, with the strongest associations found between the extended human leukocyte antigen (HLA) region and autoimmune disorders. We found an association between a diplotype covering the homeostatic iron regulator (HFE) gene and hemochromatosis, even though the well-known causal SNP was not directly genotyped or imputed. Using a genome-wide scan, we identified a putative association between carriers of an ROH diplotype in chromosome 4 and an increase in mortality among COVID-19 patients (p-value = 1.82 × 10−11). In summary, our ROH-DICE method, by calling out large ROH diplotypes in a large outbred population, enables further population genetics into the demographic history of large populations. More importantly, our method enables a new genome-wide mapping approach for finding disease-causing loci with multi-marker recessive effects at a population scale.