How adult stem cells maintain self-renewing tissues is commonly assessed by analysing clonal data from in vivo cell lineage-tracing assays. To identify strategies of stem cell self-renewal requires that different models of stem cell fate choice predict sufficiently different clonal statistics. Here, we show that models of cell fate choice can, in homeostatic tissues, be categorized by exactly two ‘universality classes’, whereby models of the same class predict, under asymptotic conditions, the same clonal statistics. Those classes relate to generalizations of the canonical asymmetric vs. symmetric stem cell self-renewal strategies and are distinguished by a conservation law. This poses both challenges and opportunities to identify stem cell self-renewal strategies: while under asymptotic conditions, self-renewal models of the same universality class cannot be distinguished by clonal data only, models of different classes can be distinguished by simple means.
Adult stem cells are the key players for maintaining and renewing biological tissue, due to their ability to persistently produce tissue cells through cell division and differentiation (National Institute of Health, 2009). For maintaining tissues in a homeostatic state, it is crucial that stem cells adopt suitable self-renewal strategies, a pattern of stem cell fate choices that balances proliferation and differentiation; otherwise, imbalanced proliferation may lead to hyperplasia and cancer. Therefore, the understanding and identification of stem cell self-renewal strategies has been a major goal of stem cell biology ever since the discovery of adult stem cells.
Classically, two stem cell self-renewal strategies have been proposed (Potten and Loeffler, 1990; Simons and Clevers, 2011a): following the Invariant Asymmetric division (IA) strategy, stem cells undertake only asymmetric divisions, whose outcome is one differentiating cell and one stem cell as daughter cells. The other proposed strategy, Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), features additionally symmetric divisions, which produce either two stem cells or two differentiating cells as daughters, yet in balanced proportions. Both patterns of cell fate choice leave the number of cells on average unchanged and thus can maintain homeostasis. Assessing stem cell self-renewal strategies experimentally is difficult in vivo, since direct observation of cell divisions is rarely possible. Yet, through genetic cell lineage-tracing assays, the statistics of clones – the progeny of individual cells – can be obtained, and via mathematical modeling assessing cell fate dynamics became possible. With such an approach several studies suggested that population asymmetry prevails in many mouse tissues (e.g. Clayton et al., 2007; Lopez-Garcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).
However, the interpretation of those studies has been challenged by a suggested alternative self-renewal strategy, called Dynamic Heterogeneity (DH), featuring some degree of cell fate plasticity (Greulich and Simons, 2016). In this model, all stem cell divisions are asymmetric, yet it is in agreement with the experimental clonal data that had previously been shown to agree also with the population asymmetry strategy. Thus, those two strategies are not distinguishable in view of the clonal data.
This raises the question to what extent different stem cell self-renewal strategies can be distinguished at all via clonal data (Klein and Simons, 2011; Greulich, 2019). Here, we address this question by studying models for stem cell fate choice, which define the self-renewal strategies, in their most generic form. We show that many cell fate models predict, under asymptotic conditions, the same clonal statistics and thus cannot be distinguished via clonal data from cell lineage-tracing experiments. In particular, we find that there exist two particular classes of stem cell self-renewal strategies: one class of models which all generate an Exponential distribution of clone sizes (the number of cells in a clone) after sufficiently large time, and one which generates a Normal distribution under sufficiently fast stem cell proliferation. Crucially, these two classes are not differentiated via the classical definitions of symmetric and asymmetric stem cell divisions, but by whether or not a subset of cells is conserved. These classes thus bear resemblance to 'universality classes' known from statistical physics, as suggested in Klein and Simons, 2011. This leads us to a more generic, and in this context more useful, definition of the terms ‘symmetric’ and ‘asymmetric’ divisions. Notably, however, we find that the conditions for the emergence of universality are not always fulfilled in real tissues, which provides chances, but also further challenges, for the identification of stem cell fate choices in homeostatic tissues.
The two classical stem cell self-renewal strategies, Invariant Asymmetry (IA) and Population Asymmetry (PA) (Potten and Loeffler, 1990; Simons and Clevers, 2011a; Watt and Hogan, 2000; Klein and Simons, 2011), are commonly described in terms of two cell types: stem cells (S) which can self-renew (i.e. divide without reducing their potential to divide in the future); and differentiating cells (D). Both strategies can be expressed in terms of a single parametrized stochastic model, a multi-type branching process (Haccou et al., 2005), defined by the outcomes of cell divisions (the cell fate choices),
where cells of type S divide with rate λ. Here, a daughter cell configuration corresponds to symmetric self-renewal division and to symmetric differentiation, while daughter cells of different type, , marks an asymmetric division. In the basic model version, a cell of type D is eventually lost with rate γ, (corresponding to death, shedding, or emigration of D-cells), while other versions may include the possibility of limited proliferation as committed progenitor cells. The two self-renewal strategies, IA and PA, are distinguished by the value of the symmetric division fraction r: the PA model corresponds to any ; the IA model is defined by , that is, only asymmetric divisions occur.
To maintain homeostasis, the number of cells must stay, on average, constant. Thus cells following the PA strategy must regulate the probabilities of symmetric self-renewal and differentiation to be exactly equal, whereas for the IA model this is trivially assured. However, only for the IA model is the number of stem cells strictly conserved, that is, no gain or loss of stem cells is possible.
A way to assess self-renewal strategies experimentally is via genetic cell-lineage tracing (Kretzschmar and Watt, 2012; Blanpain and Simons, 2013): By marking single cells with an inheritable genetic marker (through a Cre-Lox system [Soriano, 1999; Sauer, 1998]) each cell’s progeny, called a clone, which retain that marker, can be traced. The number of cells per clone, that is the clone size, is measured and the statistical frequency distribution of clone sizes (clone size distribution) determined. To test the cell fate choice models on that data, one evaluates the models with a single cell as initial condition and samples the outcome in terms of the final cell numbers – the size of a virtual clone. In the basic version of the model (i.e. when ), the IA and PA models predict, respectively, a Poisson and an Exponential clone size distribution for large times (Klein and Simons, 2011; Antal and Krapivsky, 2010) (see also the Appendix, 'Invariant Asymmetry and Population Asymmetry models'). Thus, they are fundamentally different and can easily be distinguished when compared with clonal data. By a series of lineage-tracing experiments it was confirmed that Exponential clone size distributions prevail for most mouse tissues, which thus exclude the IA model and support the PA strategy (Clayton et al., 2007; Lopez-Garcia et al., 2010; Simons and Clevers, 2011b; Doupé et al., 2012; Klein et al., 2010).
While this seemed to settle the case in favour of the PA strategy, at least for most adult mouse tissues, this was challenged by a third type of strategy, the DH model (Greulich and Simons, 2016). Motivated by the emerging view of prevailing cell plasticity (Blanpain and Fuchs, 2014; Tetteh et al., 2015; Tetteh et al., 2016; Donati and Watt, 2015), the DH model considers the possibility of reversible switching between two cell types:
where symbols at arrows denote the process rates (frequency of events). This strategy is also capable of maintaining a homeostatic population if . Notably, the DH model only features asymmetric divisions (in that daughter cells are of different type), like the IA model, yet the DH model predicts clonal statistics that are indistinguishable from the PA model (Greulich and Simons, 2016). This means that in view of the existing clonal data for mouse tissues, the DH model, may as well describe the real cell fate dynamics. More fundamentally, this implies that the PA and DH model cannot be distinguished via plain clonal data, which poses fundamental limitations to the common approach to use lineage tracing for determining cell fate choices.
This demonstrates that the classical definition of asymmetric and symmetric divisions is not always suitable to distinguish cell fate strategies in view of clonal data alone. In general, cell fate dynamics may be much more complex than the simplified models described above, as there may exists a plethora of cell (sub-)types in a tissue. However, to what extent would it be possible to distinguish details of potentially rather complex cell fate dynamics models through comparison with clonal data at all? This is only the case if the clonal statistics are sufficiently different. In the following, we study cell fate models in their most generic form, and analyze what clonal statistics would be expected.
Let us consider the dynamics of a generic system of cells, characterized by a number m of possible cell states Xi, . We define a cell state here as a group of cells showing common properties (e.g. any cell sub-type classification). Most generally, cells in a state Xi may be able to divide, producing daughter cells of any cell states Xj and Xk (where , that is, simple cell duplication, is possible). Furthermore, any cell state Xi may turn into another state Xj or may be lost (through emigration, shedding, or death). Hence, we can write a generic cell fate model as,
where . In this model, is the rate of division of cells in state Xi and the parameter corresponds to the proportion of division outcomes producing daughter cells of state Xj and state Xk; ωij is the transition rate from state Xi to state Xj and the loss rate from state Xi.
The dynamics of each cell in Equations 3-5 could depend on the cell environment through spatial, cell-extrinsic regulation of cell fate. However, the clonal statistics of spatial models that include cell-extrinsic regulation of cell fate (models of the voter type [Clifford and Sudbury, 1973]) are, in the long term, the same as for the corresponding branching process models (Haccou et al., 2005), as Equations 3-5 are, except for one-dimensional arrangements of cells (as shown in Klein and Simons, 2011; Bramson and Griffeath, 1980). Here, we are focussing on the long-term clonal statistics of self-renewal strategies, and since this is not affected by cell-extrinsic regulation, for tissues with two-dimensional or three-dimensional arrangements of dividing cells (like epithelial sheets, and volumnar tissue), we wish to keep the analysis simple and therefore choose dynamics (and thus the parameters ) to be independent of the cell environment.
In the following, we study the dynamics of cell numbers in each state Xi, . To gain initial insight into those dynamics, let us first consider the time evolution of the mean cell numbers, , given by,
in which is the probability of having a daughter cell in state Xj produced upon division of a cell in state Xi. This linear system of differential equations can be written more compactly in terms of the mean cell number vector ,
with being the matrix
where we defined the total transition rate , combining all transitions from Xi to Xj by cell divisions and direct transitions, and the local loss rate .
Models of the form Equations 3–5 are not generally in homeostasis, which in this context is defined by the existence of a stationary state , with , that is (Lyapunov) stable and non-trivial (for a discussion, see the Appendix 'Conditions for homeostasis'). This can in principle be assessed through the spectral properties of A (Åström and Murray, 2008), but applying spectral conditions explicitly is unwieldy and difficult to interpret biologically. For a more intuitive view, we interpret the system, Equation 7, as a network (graph): the matrix A can be interpreted as the adjacency matrix of the cell state network. This is a weighted directed graph in which cell states correspond to the graph’s nodes and a link from state Xi to Xj exists where a transition is possible, that is, when . The value of also denotes the link weights (diagonal elements of A can be considered as self-links). Now, we note that Equation 7 is linear and cooperative, that is, the off-diagonal elements of matrix A are non-negative, and for such systems more simple and intuitive conditions for homeostasis exist (Greulich et al., 2019), based on a decomposition into the network’s Strongly Connected Component (SCC). An SCC is a sub-graph that groups nodes which are strongly connected, that is, which are mutually connected by paths (more accurately: two nodes, Xi and Xj are strongly connected if there exists a path from Xi to Xj and from Xj to Xi on the network). An example of such a decomposition, which yields an acyclic condensed network that contains SCCs as nodes and directed links between them, is shown in Figure 1.
The stability of systems like Equation 7 is then determined by the dominant eigenvalues of each strongly connected component , for where is the number of SCCs, and their topological arrangement (the Perron-Frobenius theorem assures that for adjacency matrices of SCCs of cooperative systems, a unique, real, maximal eigenvalue exists, which is the dominant eigenvalue [Arrow, 1989; Greulich et al., 2019]). In brief, according to Greulich et al., 2019, the conditions for existence of a homeostatic state are that, at the apex of each lineage (the condensed cell state network), there must be an SCC with dominant eigenvalue , while all SCCs downstream of the former must have (see detailed discussion in the Appendix, 'Conditions for homeostasis'). Given this structure of homeostatic models, we can define two compartments in the cell state transition network: (1) the (self-) Renewing compartment (), which is the SCC at the apex of the lineage tree; and (2) the Committed compartment (), which consists of all SCCs with , that is, those downstream of the apex SCC. Importantly, cells in states forming have the potential to return to any state within the same compartment and this population maintains itself. Instead, the cell population in would vanish without external input, since the combined dominant eigenvalue of all those SCCs is negative (it is the maximum of all SCCs’ ), thus the progeny of each cell in the committed compartment will eventually be lost. We can thereby classify cells as being of a (self-)Renewing type (R) if their state is within , and of a Committed type (C) if their state is in . With this coarse-grained classification, a generic homeostatic model can be represented in terms of compartments and as,
where the symbols above arrows are the effective rates of those events, denoting the average frequency at which they occur (loss events are not explicitly included, since they can be approximated by a short lived state in , as ). To be compatible with a homeostatic condition, it is further required that (i) the R-population remains on average constant (), that is, , and (ii) the loss rate of C must exceed its proliferation rate (), that is, . Figure 1 shows how a generic homeostatic cell state network can be condensed into an effective model of renewing and committed cell states, according to Equation 9. It has to be noted, however, that the events depicted in Equation 9 are not Markovian, that is, the timing of events is not independent from each other and depends on their history. Thus, the ‘rates’ , , , and γC are not constant rates in the Markovian sense, yet we can define them by the mean frequency of events occurring (see Appendix 'Approximation of generic GIA models' and 'Asymptotic clone size distributions: mathematical analysis').
The formulation in terms of renewing and committed states can help us to gain insights into potential behaviors of generic homeostatic cell fate models. In particular, we define generalized asymmetric divisions as events of the type , and generalized symmetric divisions as events of the type (symmetric renewal) and (symmetric commitment). With these definitions, we can categorize homeostatic cell fate models into two classes: Generalized Invariant Asymmetry (GIA) models are those which only exhibit divisions in the renewing compartment, while Generalized Population Asymmetry (GPA) are models for which such restriction does not hold. We note that the two classes are equivalently characterized by a conservation law: For GIA models, the number of cells in is strictly conserved, while for GPA models, no such conservation law holds. Since is necessary for conservation, the only possible conserved cell states in homeostasis are those in . Naturally, the previously discussed IA model is a GIA model and the PA model is a GPA model. Notably, the DH model (Equation 2) is of the GPA category, since in that model S and D cells form a single SCC at the apex of the lineage hierarchy, and thus they are both part of . Therefore, a division in the DH model, which is asymmetric in the conventional sense, corresponds to in terms of compartments (Equation 9) and thus it is a generalised symmetric division. According to this classification, PA and DH models are both in the same category (GPA), and indeed, both predict the same type of clone size distribution, an Exponential one (Greulich and Simons, 2016).
To check whether the correspondence between model class, GIA vs. GPA, and predicted clonal statistics holds in general, we analyze the clonal dynamics numerically, by generating and testing a large number of random stochastic models, implemented via random generation of the parameters λi, ωij, γi and . To simulate clones, we perform stochastic simulations based on the Gillespie algorithm (Gillespie, 1977), assuming a Markov process following the rules of Equation 3-5. We run, for each model, a large number of simulations with initially one cell in the compartment , thus the cell population of each simulation run represents one clone. Then we sample their outcomes, the total cell numbers per clone (the clone size) , to obtain predictions for clonal statistics, namely the frequency distribution of clone sizes (clone size distribution) and mean clone sizes (see Materials and methods).
We first study the mean clone size of surviving clones (with ), , shown in Figure 2, respectively, for the GIA and GPA models, as a function of time (the final time where is the minimal process rate, ). We note that indeed a common behavior is seen in each case. While for every simulated GIA model, saturates at a plateau value, it steadily increases for every GPA model. This is expected, and can be understood given that clones in a GPA model can go extinct while those in a GIA model not. Assume that there are initially a large number of clones, such that the total number of cells is . Since the system is homeostatic, it will reach a constant steady state after a sufficient amount of time, meaning that the mean clone size is . If no clones go extinct, as in GIA models, is constant and thus approaches a constant. However, in non-conserved multi-type branching processes, as GPA models are, the clone number decreases through progressive extinction of clones (Haccou et al., 2005), and therefore increases, despite the cell population as a whole staying stationary.
The resulting clone size distributions for the two model classes are shown in Figure 3. Here, clones sizes are rescaled by the mean value and compared to an Exponential distribution of unitary mean (green curve). As conjectured, all simulated GPA models shown in panel (b) predict asymptotically the same rescaled clone size distribution, namely a standard Exponential distribution. Deviations exist for small times and small clone sizes, but these deviations vanish in the large time limit (details on the convergence are shown in the Appendix, 'Analysis of the generalized Population Asymmetry model'). This means that different models within the GPA class cannot be distinguished in the long-term limit, since they differ only by the mean clone size, which is a free fit parameter. In analogy to statistical physics, we can categorize them as a universality class (Klein and Simons, 2011), meaning that the details of the model do not affect the (scaled) outcomes for assymptotic conditions, which is a form of weak convergence of random variables (Billingsley, 1968). However, the same cannot be said about the GIA models. In fact, we see all kind of shapes in the clone size distributions, both peaked distributions and non-peaked ones, and in fact, some distributions are even close to an Exponential form, and can thus not be distinguished from GPA models. The question is whether we can yet find other parameters for which, when large, also GIA models exhibit universality, that is, yield the same rescaled clone size distribution. For this purpose, we will in the following sections develop a deeper theoretical understanding of the model classes.
To obtain a deeper understanding of the numerical results, we study the cell fate models in terms of the compartment representation, Equation 9. In this representation models are not Markovian, yet we can study their Markovian counterpart, as an approximation. While this is not expected to yield accurate clone size distributions in general, the limiting distributions of non-Markovian processes are commonly well estimated by their Markovian counterparts.
For GIA models, which only feature transitions between the renewing compartment, , and the committed compartment, , a corresponding Markovian model reads,
in which represents a single state in and in , and symbols at arrows are the process rates. The number of cells in , , is conserved, that is, given an single -cell initially, it always remains at . Thus, we only need to consider the dynamics of cells in , . This Markov process can be solved analytically, and for sufficiently large steady state mean number of -cells, (see Appendix, 'GIA0 test case: steady state distribution and limiting behavior'), the rescaled distribution of cells in is,
in which , and and is the Gamma function (Abramowitz and Stegun, 1972). We note that this distribution exhibits a large variety of shapes: for large the distribution is peaked, while for small is loses its peak. Notably, for and , the distribution becomes Exponential and in this case it cannot be distinguished from the GPA case. On the other hand, for , that is, when the ratio of asymmetric divisions over the loss rate is high, this distribution tends to a Normal distribution with unitary mean and variance equal to . These different behaviors are graphically shown in the Appendix (see Appendix 1—figure 6, 7 and 8).
For the GPA models, a Markovian approximation reads, accordingly,
whereby for homeostasis to prevail, and must hold. We note that the dynamics of are independent of and thus for the number of cells in in homeostasis holds
which corresponds to a simple continuous-time branching process with two offspring, for which it is known that the resulting distribution of cell numbers is Exponential, that is, , where is the mean number of cells in the surviving clones (Haccou et al., 2005).
cells produced according to 12 follow the same fate as in the two-state GIA model above. While it is not assured that the distribution of cells is identical to that of Equation 11 (due to simultaneous production events of type ), we show in the Appendix, 'Asymptotic clone size distributions: mathematical analysis', that for large rates of production of C-cells, the distribution of C-cells – here: cells in state – attains a Normal distribution with mean equal to its variance . As each cell contributes independently to the production of -cells, we have that . Crucially, this means that in terms of the rescaled variable the standard deviation vanishes for large times, since . Hence, given fixed , can be approximated by a constant random number . Therefore, the rescaled distribution of the total number of cells is , where . Thus, the rescaled distribution of the total clone size, , is as well an Exponential.
For generic GIA or GPA models, the compartment representation, Equation 9, is not Markovian and one would not expect exactly the distributions we found in the previous section. Fortunately, the limiting distributions of non-Markovian processes and their Markovian counterparts are often, under certain conditions on the parameters, the same. While we reserve the technical arguments for the Appendix ('Asymptotic clone size distributions: mathematical analysis'), we note that this independence of the limiting distribution on the Markov property related to the central limit theorem, which does not rely on the Markov property.
To identify the correct limiting parameters for more complex cell fate models, we need to express the effective non-Markovian rates (i.e. the mean frequency of events) of representation nine in terms of the original model, 3–5. As discussed in the Appendix ('Approximation of generic GIA models' and 'Asymptotic clone size distributions: Mathematical analysis'), we identify those effective rates by the total rates of cell divisions, , , and where, for each compartment, is the probability of a single cell being in state Xi of , respectively ( are the solutions to Equation 6). In the Appendix, 'Asymptotic clone size distributions: mathematical analysis', we reason that all GPA models are expected to generate Exponential clone size distributions for large times t. This is indeed what is observed in Figure 3(b). Correspondingly, for GIA models we expect that for large the clone size distribution of GIA models would tend to a Normal distribution. To test this prediction, we simulated the same GIA models as for Figure 3 before, but we tuned parameters in such that the effective parameter becomes large (see details in the Appendix, 'GIA model for large '). The result is shown in Figure 4: for an illustrative case shown in panel (a), increasing changes the distribution from an exponential form to a peaked form akin to a Normal distribution, and for all simulated random GIA models, shown in panel (b), a Normal distribution is approached when becomes large.
We note that when taking the limit of large , as shown in Figure 4, also all other process rates ωij with i,j within increased as well. What if instead some process rates in do not scale to become large with ? To assess this situation, we studied a simple test case similar to model 10 but containing two states in , connected via direct state transition (see Appendix, 'GIAB test case: bimodal distribution'). As discussed there, if all rates within are large compared to the rates in then indeed we observe a Normal clone size distribution, as expected. However, if the direct transition rates between the states of are smaller or of equal magnitude as γC, and in addition, one of the two division rates is higher then the other, then we observe a bimodal clone size distribution. The reason is that if the transitions between the two states in are rare compared to the life time of cells, , they become essentially separated and each of those states generate separate Normal distributions with different mean (due to different cell division rates in those two states) which, when overlaid, generate a bimodal clone size distribution (see detailed arguments in the Appendix, 'Asymptotic clone size distributions: mathematical analysis').
Finally, from those considerations follows:
GPA models attain an Exponential clone size distribution for time .
GIA models attain a Normal clone size distribution if all process rates within are much larger than the inverse lifetime of C-cells, γC.
Hence, the GIA and GPA model classes, each represent a universality class, that is, a scaling limit exists in which all models of the same class yield the same rescaled clonal statistics.
Our analysis shows that intrinsic limitations exist for identifying strategies of stem cell self-renewal through clonal data from cell lineage-tracing experiments. This is due to different models of cell fate choice generating the same type of clonal statistics (clone size distributions), so that model inference based on clonal statistics – currently still the most prevalent method to determine stem cell self-renewal strategies – fails to distinguish them. The feature that different models asymptotically generate the same statistics is a form of weak convergence of random variables (Billingsley, 1968) and corresponds to universality, as known from statistical physics.
Cell fate models can in principle be very complex, with a plethora of cell (sub-)types in a tissue. We introduced a new categorization of cell types, distinguishing between cell states that are committed (C-cells), whose progeny is inevitably lost eventually, and non-committed or (self-)renewing cell states (R-cells), which retain the potential to remain or return to the apex of the lineage hierarchy. According to this categorization we classified generic models of cell fate choice as Generalized Invariant Asymmetry (GIA), if only generalized asymmetric divisions of the form occur for R-cells, and Generalized Population Asymmetry (GPA), when all kind of divisions can occur, as long as gain and loss of R-cells are balanced. Models of the GIA category are also characterized by a conservation law, since the number of R-cells is strictly conserved, while GPA models do not exhibit such a conservation law.
We found that the classification in GIA and GPA models mirrors the clonal statistics generated by them: models of the GPA class all generate clonal statistics which with time converge to an Exponential clone size distribution. Thus, two GPA models can therefore not be distinguished through clonal data, once some time has passed after induction of clones. For GIA models, distributions can generally vary, but if the rates of divisions and transitions in the compartment are much larger that the rate of cell loss, the clone size distribution of all those models becomes a Normal distribution. In that case, two GIA models can not be distinguished by the clonal data. While here we do not explicitly consider cell-extrinsic regulation of cell fate, this kind of regulation does not affect long-term clone size distributions, except when cells are arranged one-dimensionally (Klein and Simons, 2011; Bramson and Griffeath, 1980). Thus, our results cover cell dynamics in most renewing tissues, such as epithelial sheets or volumnar organs, but not (quasi-)one-dimensional arrangements of stem cells, as found in the seminiferous tubule, or in intestinal crypts, where clonal statistics may differ. Hence, our analysis shows that models of cell fate choice cannot in general be distinguished with further resolution beyond the R vs. C categorization of cell types. The universality of the model dynamics also shows that effective, simplistic models are often equally accurate to model experimental data, yet with a higher statistical power due to less free parameters.
While at first glance, this analysis seems to discourage efforts to unravel details of cell fate dynamics, room remains in regimes where the limiting conditions for asymptotic distributions are not fulfilled. In particular, if fast cycling committed progenitor cells are present, while stem cells are slow cycling, then the condition that the division rate of R-cells is much larger than the cell loss rate is not fulfilled. In that case, details of the model dynamics may affect the shape of the clone size distribution and thus allow distinction between models. However, caution should be given when an Exponential clone size distribution is observed, since this could indicate either a GIA model with high activity of committed progenitor cells, or a GPA model. In that case, the mean clone size needs to be consulted to distinguish models (see Figure 2). Differentiating between models within the GPA category is more difficult, since the predicted statistics from different models always become more similar over time. Short-term measurements would in principle allow such a distinction, but since in reality the underlying processes are not truly Markovian (as assumed for the modeling purpose) they are not necessarily a good representation of the real cell dynamics at short times. At long times, however, Markovian approximations are increasingly accurate, precisely because of the feature of universality.
How could the resolution of cell fate modeling be improved? The state-of-the-art approach to determine cell fate trajectories is via analysis and modeling of single-cell RNA-sequencing (scRNA-seq) data. However, many limitations to this method exist, discussed in Weinreb et al., 2018, and neither reversible trajectories nor the modes of cell division, such as asymmetric vs symmetric divisions, can be inferred. Intravital live imaging, on the other hand, allows to trace individual clones over time (Ritsma et al., 2012; Pittet and Weissleder, 2011; Hara et al., 2014; Rompolas et al., 2016), and thus can obtain details of cell fate trajectories, yet this technique is limited to few tissue types which are accessible for invasive long-term imaging. Nonetheless, while each of those experimental assays alone is prone to limitations in defining self-renewal strategies, advanced model inference schemes, that integrate data from different experimental sources, might be the way forward in the future to finally reveal the details of stem cell self-renewal strategies.
The numerical analysis of the random cell fate model was implemented in Matlab. The description of the stochastic models definition, the random model generation and the simulation campaign is detailed in the Appendix, 'Stochastic process modelling'. Additionally, as a validation of the implemented simulator, based on the Gillespie algorithm (Gillespie, 1977), the IA and PA models were simulated and the results analyzed in the Appendix, 'Invariant Asymmetry and Population Asymmetry models'.
Analytical solutions were partially obtained using Mathematica.
Here, we ‘translate’ the generic conditions for the existence of a Lyapunov stable stationary state for Linear Cooperative Systems (LCS) (Greulich et al., 2019) into the biological context of clonal dynamics. A linear cooperative system is one of the form where are functions of time t and A is a constant m × m matrix for which all off-diagonal elements are non-negative (the latter condition defines the cooperativity of the system) (Hirsch and Smith, 2006; Greulich et al., 2019). We note that the dynamics of mean cell numbers, Equations 6 and 7 in the main text, indeed describe an LCS according to this definition. Now we use the following definitions:
is the graph of A, that is, the graph for which A is the adjacency matrix, whose elements aij give the weight of the links from i to j ( means that no link exists). In the following, we use the terms graph and network synonymously.
If in there exists a path from node i to node j and from j to i, then we call those nodes strongly connected, , which is an equivalence relation. A maximal set of nodes which are are strongly connected with each other are called a Strongly Connected Component (SCC) of the graph (the equivalence class of the equivalence relation ‘≡').
The graph can be decomposed into its SCCs, Sk, (Cormen, 2009), which are sub-graphs associated with an adjacency matrix , such that . Since the have non-negative off-diagonal elements, they are Metzler matrices for which the Perron-Frobenius theorem ensures that a unique, simple and real maximal eigenvalue exists (Arrow, 1989). The eigenvalue is called the dominant eigenvalue of Sk. Associated with this eigenvalue, there is, for all , a positive eigenvector , that is, one with all entries .
The condensed graph of is the graph where nodes are the SCCs of and a link from SCC Sk to SCC Sl () exists if there is is at least one link from a node (in ) in Sk to a node in Sl.
If there is a path from SCC Sk to SCC Sl, then we call Sk upstream of Sl and accordingly Sl downstream of Sk. We note that there can never exist paths from Sk to Sl and from Sl to Sk, since otherwise, by definition, their nodes would be strongly connected and both together would form a single SCC (Cormen, 2009). Thus, there is a unique hierarchy of SCCs.
A stationary state of a dynamical system is Lyapunov stable if a small initial deviation from leads to a small final deviation (i.e. is not unstable). More accurately: there exists a constant such that for all times t, where is the initial condition, sufficiently close to . A stationary state of a linear system that is Lyapunov stable, yet neither asymptotically stable nor has a limit cycle, is neutrally stable.
Homeostasis means that the cell numbers in each state, , stay on average constant, (where ), and that this state is not unstable towards perturbations. This condition corresponds to a Lyapunov-stable stationary state. Note that a linear system, as the one described by Equations 6 and 7, main text, cannot have an asymptotically stable state except for the trivial state , which corresponds to a vanishing cell population. We note that when considering the tissue cell population as a whole, dynamics can be non-linear through interactions between cells and a non-vanishing asymptotically stable state may then exist. However, since single clones do not significantly affect the total configuration of cells in a tissue, the clones compete neutrally, when embedded in a homeostatic cell population, which corresponds to a Lyapunov stable, but not asymptotically stable state. We therefore use Lyapunov stability, a weaker form of stability, to define homeostasis, since an asymptotically stable vanishing state is not a biologically viable state.
Now, for an LCS holds, according to Greulich et al., 2019,
An LCS, , possesses a non-trivial Lyapunov stable stationary state (), if and only if,
G(A) does not contain any SCC, Sk, with .
There is at least one SCC, Sk, with .
There is no path between any two SCCs, Sk and Sl, which have and .
All nodes i upstream of an SCC Sl with must be empty in the the stationary state, that is, , if i is upstream of the SCC Sl.
Since Equation 7, main text, is an LCS, we can apply theorems 1 and 2 to find conditions for homeostasis, defined by a Lyapunov-stable configuration of mean cell numbers . According to theorem 1 at least one SCC with must then exist, and according to theorem 2 the stationary state of nodes upstream of it must be empty, that is, they do not exist in homeostasis. Since the condensed graph of the SCCs does not have cyclic paths, an SCC Sk with must therefore always reside at the apex of all non-vanishing cell types. In principle, an acyclic graph may have more than one apex, however, since, by definition, a stem cell clone always starts with a single stem cell, and no other SCC with may be downstream of the latter, we only consider one apex SCC with one initial cell when studying clonal dynamics.
Hence, in the context of homeostatic clonal dynamics, we can assume that there is a single SCC, Sk with at the apex of the cell state graph, while all other SCCs, Sl are downstream of it and have . Since there are no paths from the non-apex SCC to the apex SCC (as the condensed graph is acyclic) we can distinguish the two separate compartments (the renewing compartment) consisting of all nodes of the apex SCC, Sk, and (the committed compartment), consisting of all other nodes, whereby due to for all SCCs in , all progeny of cells in will vanish in the long term.
Since clonal dynamics start, by definition, with a single cell, we use stochastic dynamics to model clones. Thus, we model cell fate dynamics as a continuous-time multi-type branching process (Haccou et al., 2005), a Markov process following the rules of Equations 3-5, main text. As shown later, without losing generality, here only two types of events are modeled; considering an arbitrary number m of cell states, Xi, for , the model includes
Cell divisions: a cell in state Xi divides in two cells with rate , respectively in state Xj and Xk at a ratio .(1)
where if state Xi does not allow division. In this formulation of cell division events, which we use for the generation and numerical simulations of random models, only one division outcome is possible upon division of a particular cell state Xi. Nonetheless, multiple division outcomes per state can be implemented as single outcomes if additional metastates are introduced, which represent priming of a state Xi towards a certain division outcome option. For example, if in the original model, state Xi has different outcome options, , we can substitute this by, first, transitions from Xi to (new) states and subsequent divisions . The use of metastates to model more complex processes is discussed in detail in 'Population Asymmetry model using metastates'.
Direct state transitions: a cell in state Xi changes to state Xj at a given rate ωij.(2)
where means that no transition from Xi to Xj is possible. Additionally, we include cell loss in this scheme, by treating it as a transition to an additional special state, called hereafter death and denoted by (cells in this state do not enter in the counting of the total number of cells). In that formulation, the loss rates of the original model are .
These events define a Markov process, which can be represented as a stochastic network (Bang-Jensen and Gutin, 2007). In this view, each node can be related to a cell state, while the links represent transitions between states via cell divisions and the direct state transitions. It is noted that this stochastic network is different from the network defined in the main text and in 'Conditions for homeostasis' of this SI, which describes the dynamics of mean cell number instead. Here, for the stochastic modelling, let us define the adjacency matrix of this network, through the elements , in which are the total transition rates as defined in the main text. We note that is related to the matrix A used in the main text by , where is the diagonal matrix with entries , as defined in the main text, with the slight difference that here the loss state is treated as a separate state. Additionally, it is remarked that in this model interpretation, where only one division option for each state is possible, the term is not a continuum value, but instead it can only take the values depending on the specific outcome of the division of the cells in state Xi. Notably, more than one stochastic network may result in the same matrix , therefore, to uniquely define a process, we distinguish a matrix D which describes cell division events (note that this is possible with just a single matrix as there is only one division option per state) and a matrix which describes direct transition events. The matrix is the sum of both, .
To test the behavior of the clonal dynamics in a generic homeostatic model, a large number of random stochastic networks was generated, whereby each stochastic network corresponds to a distinct set of parameters for the stochastic stem cell fate choice model. The strategy detailed below is based on the following considerations which summarize the key requirements to achieve homeostasis detailed in 'Conditions for homeostasis': (a) each network is composed of Strongly Connected Components (SCCs) that are randomly connected; (b) only one SCC, the one at the apex of the network, forms the renewing compartment, , (i.e. it is characterized by a dominant eigenvalue with respect to A) and all the others form the committed compartment, , (i.e. they are characterized by a dominant eigenvalues ). It is further noted that the SCCs of the stochastic network are the same as those of the matrix , where defines the dynamics of mean cell numbers. This is, since transposition of an adjacency matrix and altering of diagonal elements does not affect the network topology.
To generate the stochastic network, a two-step process is followed: (1) a large number of (random) SCCs are generated; (2) a condensed network is randomly constructed and filled with randomly picked SCC from step 1.
It is noted that unitary rates are assumed in step (1) and they are successively randomly modified in step (2) to achieve the desired properties of the dominant eigenvalue μ while ensuring randomness.
Focusing now on step (1), that is, the generation of single SCCs, the following procedure is used.
The total number of states composing the SCC is defined, indicated as . An additional state is added to represent whatever is outside the SCC. In the current analysis, we set .
We build separately all the possible combinations of transition and division matrices, indicated hereafter with and , respectively. These matrices are ordered for increasing number of transitions and divisions . In case GIA networks are generated, the and combinations are filtered, to remain just with those where the division outcome is one cell inside the SCC and one outside the SCC, and where there are only transitions between states within the SCC (i.e. where cell numbers are conserved). From a computational point of view, this process is feasible up to .
The matrices stored in and are then combined together to form a model (which is completely defined by one matrix in and one in ); indicates the pool of possible models. This process is done considering separately each , and . In this step, due to technical limitations given by the high number of possible combinations, if the total number of combinations exceed then only 104 random matrices from and are combined.
Each model in is then processed to check if the corresponding network is a SCC in the first states. If not, then this model is discarded. In case GPA networks are generated, a further check is performed to discard also those models consistent with a GIA network (they cannot be a priori excluded as done in point 2 for the GIA ones). These pools of models are indicated as and for the GIA and GPA models, respectively.
For each SCC in and , the dominant eigenvalue μ is estimated. For construction, the generated GIA networks are all characterized by , while in general any value can be obtained within .
The SCCs in are additionally processed to check whether the network is compatible with homeostasis by tuning the rates. Networks satisfying this condition are additionally stored under a new pool of SCCs, called . If not, then they are discarded when (i.e. for any combination of rates the number of cells in these networks is expected to grow).
This process results in three pools of SCCs classified for , and (i.e. number of states, transitions and divisions): (1) contains GIA models; (2) contains GPA models that can be tuned to have and (3) contains GPA models characterized by or that can be tuned to meet this condition.
In step (2), the generation of random networks starting from the individual SCCs is implemented as follows.
A number of committed SCCs, , between 1 and 3 is randomly chosen.
SCCs are randomly picked from the pool of models . The selection is done considering equal probability in , and . For each SCC, the unitary rates α (where α stands for any rate λi or ωij) are modified by multiplying them for random numbers (exponentially distributed with mean and minimum ). Additionally, a threshold on the dominant eigenvalue is set, ; if this condition is not satisfied, then the rates are tuned to meet this requirement while maintaining the rates above the minimum.
The committed compartment of the condensed network is generated by randomly connecting all the outgoing components of the -SCC with states in the -SCC for . In this way, the transposed adjacency matrix of the stochastic network has triangular block form:(3)
The last SCC is forced to be linked to a single death state.
With a similar procedure described in point 2, two SCCs are randomly picked respectively from the pool of SCCs in and ; the unitary rates are modified (exponentially distributed with mean and minimum ) and, in the GPA case, tuned to meet the condition . They represent the renewing part of the network.
Two networks (one for the GIA and one for the GPA models) are produced by attaching the selected renewing network upstream the committed one; this is done based on an analogous procedure as described in step 3.
At the end of this process, we have two networks which are different in just the renewing part, being one consistent with the GIA model and the other with the GPA one. In total 2000 networks were built and analyzed.
An extensive simulation campaign was run to model the clone dynamics. The code implemented to numerically simulate the stochastic process defined by events of type 1 and 2 is based on the Gillespie algorithm (Gillespie, 1977). Since a clone is by definition the progeny of a single cell, we choose as initial condition a single cell put randomly in a state within . Concerning the final condition, given the substantial difference in the dynamics in the two models, the final time, indicated by τ, is set equal to 20 times the inverse of the minimum process rate, , in the GIA models, and to the time at which the fraction of extinct clones reaches 98% in the GPA models. Note that all critical branching processes, as homeostatic clonal dynamics are, will go extinct almost surely at some point in time (Haccou et al., 2005).
To determine the clone size distribution, 103 and simulations were run respectively in for each GIA and GPA model (in this way, both models result in the same final number of clones when 98% extinction is taken into account).
To validate the simulation approach, we tested the procedure on simple cell fate models for which analytical results are known, the Invariant Asymmetry (IA) and Population Asymmetry (PA) models. As described in the main text, in the simplest version, these are defined as,
In these processes, cells of type S represent the stem cells (called hereafter also progenitor), which divide with stochastic rate λ, and cells of type D are the differentiated cells, which are shed with rate γ. While in the PA model the three possible outcomes of the division of a progenitor are controlled by a probability parameter , in the IA model r = 0, meaning that there are strictly asymmetric division and the number of S-cells is conserved. It is remarked that in the definition of the stochastic networks given in 'Model description' only one division option for each state is modelled; however, the code implemented for the numerical simulations of the stochastic process allows for an arbitrary number of division options for each state as well (see 'Population Asymmetry model using metastates').
Considering the dynamics at tissue level, the system of ODEs describing the average number of cell and respectively of type S and D is,
It is clear that, on average, the number of S-cells remains constant. Additionally, in homeostasis, the average total number of D-cells stabilizes around a constant value that uniquely depends on the number of stem cells, which equals the initial number of stem cells , Thus, the (Lyapunov stable) stationary state of total cell numbers is given by,
Based on Equation 6, the process rates λ and γ determine the proportion of cells of type D with respect to cells of type S. Importantly, there is no difference at tissue level between the IA and PA models.
A distinction is instead evident when we look at the dynamics at the single-cell level, and study the clone size distribution, that is, the distribution of the progeny of a single cell. For the IA model, the number of S-cells is strictly constant, and thus the joint probability distribution of both S-cells and D-cells, respectively indicated as and , is fully determined by the distribution of D-cells, . The IA model’s master equation for , considering a single initial cell of type S, is given by,
This corresponds to a simple birth-and-death process for which the distribution is Poissonian with mean , (Van Kampen, 1981).
Considering now the PA model, the master equation is instead given by,
In Antal and Krapivsky, 2010, an exact result for the distribution of total cell numbers is found when and . For different values of the process parameters, the long-term distribution is shown to be Exponential.
Numerical simulations for the clonal dynamics were run, considering the above models and three different sets of test parameters each, indicated as IA# and PA#i for , which are reported in Appendix 1—table 1. It is noted that the time unit is arbitrary and therefore omitted. Simulations are based on 104 and runs respectively for the IA and PA test cases. The initial condition is a single stem cell and the final simulation time, indicated as τ, is equal to 10: this value is well representative of a steady state condition (for the IA test cases) and at which the total extinction of the process is not yet achieved (for PA test cases only). The clone size distribution at τ in the IA test cases is shown in Appendix 1—figure 1: in this figure, each profile is compared to the corresponding Poisson distribution shifted by one (i.e. plus the stem cell). Concerning the results for the PA test cases, they are shown in Appendix 1—figure 2. In this case, the profiles are compared to the numerical integration of the master Equation 8. Additionally, for the PA# test case, where and , the reference analytic solution provided in Antal and Krapivsky, 2010 is also shown. In general, a good agreement is obtained in all of the cases.
As argued before, we assume in the random model generation that cell division in state Xi has a unique outcome, (Equation 1), since thereby the stochastic process can be uniquely defined by the two matrices D and . To accommodate for the possibility of different division outcomes from the same state Xi, as in Equation 4 and Equations 3-5 in the main text, we introduce metastates, which represent short-lived states that indicate priming for either outcome, from which the cell division outcomes are unique. This is a small modification of the original model, which, however, does not lead to significant deviations if the metastates are traversed sufficiently quickly (which can be assured by a choice of high direct state transition rates in the metastates).
To illustrate this, let us consider the PA model described by 4; instead of having three different outcomes upon division of an S-cell we define the corresponding Metastate (MS) model with three primed states, , as,
in which S and D correspond to the same cell type of the PA model (i.e. the stem and the differentiated cells, respectively), while , for , represent the metastates. These states are temporary states that are used to model each one of the three different possible division options of the S-cells. The rates λi and , for , are chosen such that the time scales of division and outcome probabilities are the same as in the original PA model:
Equations 10 assure that outcome probabilities are the same as in the original model, while Equations 11 are needed to have the same total average time between two consecutive events. As there are six unknowns and only five relations, the following additional equation is added
in which is an additional parameter that is used to control how fast cells in metastate divide. Low values of imply that as soon as an S-cell transits to the metastate , it divides in two S-cells. Globally, this results in
Numerical simulations for the two models were run and compared, based on the parameters reported in Appendix 1—table 1, and specifically the PA#1 and PA#3 test cases. The time unit, which is arbitrary, is omitted. The process rates for the corresponding MS model, which are indicated in the figures as MS#1 and MS#3, are computed based on Equation 13 and . As well as for the PA test cases, the initial condition is one cell of type S and the final time, τ, is equal to 10; simulations are based on trajectories.
The mean number of cells in the surviving clones and the extinction probability as function of time (scaled by τ) are shown in Appendix 1—figure 3. The clone size distribution at τ is shown in Appendix 1—figure 4. Both MS simulations agree very well with the corresponding PA ones, which justifies the use of metastates for our simulation campaign.
A simple Generalized Invariant Asymmetric model, indicated hereafter as GIA0, was analyzed to identify the causes of the different clone size distribution behaviors observed in the randomly generated models (see main text). Thus, in this section, we study the Markov process defined by,
Here, the renewing compartment is composed of just a single state and cells in this state asymmetrically divide with rate . The committed compartment is formed of state ; cells in this state can either divide to duplicate, with rate , or die, with rate γ. It is noted that for , this model is reduced to the previously analyzed Invariant Asymmetric (IA) model (see 'Invariant Asymmetry and Population Asymmetry models').
As for the IA model, here the number of cells in state , indicated as , is conserved. It is therefore sufficient to determine the statistics of , defined by the master equation for , the probability of having cells in state , provided that there are cells in state . The master equation is given by,
also written as,
in which and . Considering that we are interested in clonal dynamics, meaning that we start from a single stem cell, is equal to one.
In this simple case, the steady state distribution , corresponding to the solution of , can be analytically derived. Defining the net flux between states and as
and considering that for every , it follows that , which means that
where is the steady state probability of having 0 cells in state . Finally, by applying the conservation of the total probability, , and rearranging the terms we obtain,
In the main text, we defined the dimensionless parameters and , representing the rescaled division rates for cells in state and , respectively. For clarity and readability, in this section, we simplify the notation using and . Equation 19 is then rewritten as,
It is noted that while p varies between 0 and , q is defined between 0 and 1.
The mean number of cells in each state, indicated respectively as and , satisfies the system of ODEs
Based on this, the steady state average number of cells is
When the mean number of cells in state is sufficiently large, that is, for large p or in case q is close to one, the discrete distribution given by Equation 20, can be approximated by a continuous probability density function , given by,
To better understand the distribution for different values of the parameters p and q, the limit behavior are analyzed below.
When , Equation 20 can be simplified considering that
Thus, the distribution results in
that is a Poisson distribution with mean equal to p. This agrees with what we were expecting considering that when the model is reduced to the IA model for which the distribution in is known to be poissonian.
Additionally, for large mean number of cells, which are obtained for large p (when , then ), the Poisson distribution tends to a Normal distribution with mean and variance equal to p. Therefore,
Rescaling the distribution, and considering , results in
that is a Normal distribution with unitary mean and variance equal to .
For the steady state mean number of cells and Equation 23 holds. This equation can be rewritten as,
If the Stirling’s approximation is applied
Considering now that
it follows that
that is a Gamma distribution with unitary mean and shape parameter given by p. Importantly, the Gamma distribution for tends to a Normal distribution with unitary mean and variance 1/p. For , it corresponds instead to an Exponential distribution with unitary mean.
When p is large, the mean number of cells is large for any value of q. Thus, Equation 32 is valid. By applying the Stirling’s approximation also to the term , we obtain,
This expression can be also rewritten as,
Considering now that p is large, then , so the term on the right can be neglected. Additionally, for the following expansions can be applied:
Finally, if we consider that
then Equation 36 results in
that is a Normal distribution with unitary mean and variance equal to .
Importantly, it is noted that the limiting behavior of for and in case of large p, are both consistent with the results obtained for and any q. In other words, remembering that and , the steady state distribution for and any value of is a Normal distribution of unitary mean and variance equal to .
To globally verify these results, numerical simulations of the stochastic process associated with model 14 for different values of and were run. The following curves were compared:
Stochastic simulation: distribution at the final simulation time, τ, of the number of cells in state . The final time was chosen here as , where ; this value is well representative of a steady state condition. Furthermore, the process rates considered are based on a unitary γ (i.e. , and ). It is noted that the time unit is arbitrary and therefore omitted.
Analytic distribution: based on Equations 20, for low mean values, and 23, for large mean values.
Approximate distributions: Poisson, Gamma and Normal distributions respectively given by Equations 27, 34 and 40.
The tested parameters and are graphically shown in Appendix 1—figure 5 a contour map showing the expected steady state mean number of cells over the -parameter plane. The curves from the numerical simulations and the corresponding exact and approximated solutions are shown in Appendix 1—figure 6, Appendix 1—figure 7 and Appendix 1—figure 8: the tested conditions are divided into three groups (one figure each) representing the limiting behaviors discussed above. Generally, analytical and numerical results agree very well. This also demonstrates that GIA models can show both peaked and non-peaked distributions, depending on the model parameters.
As shown in the main text, a generic GIA model can be expressed in terms of the compartments and (Equation 9 in the main text). We note that the the GIA0 model discussed in the previous section corresponds to the general compartment dynamics of GIA models, Equation 9, main text, if the dynamics of compartments are assumed to be Markovian. Thus, we can treat the GIA0 model as a Markovian approximation of generic GIA models. In this section, we test this approximation numerically.
To this end, we first wish to relate the effective (non-Markovian) rates and γC of a generic GIA model to the rates of the Markovian approximation, the GIA0 model. We refer to this model – the GIA0 model matched to the effective rates of a particular more complex GIA model – as the equivalent model to the latter. The equivalent rates , and γC are computed considering the same steady state condition in terms of mean number of cells. To this aim, we rewrite the dynamics of mean cell numbers, Equation 7 in the main text, in block form as,
in which denote the vectors of mean cell numbers of states restricted to compartments , respectively, and the number of lost cells (not considered for total cell numbers and homeostasis condition). It is noted that , since there cannot be links from to . Also as we do not consider loss from (see main text for the arguments).
Thus, summing up all the components in each compartment, and , results in
The equivalent parameters are then estimated from the steady state condition and , for , as,
The applicability of this approximation was evaluated by comparing the clone size distribution obtained from the random GIA models (generated as described in 'Generation of random models' and analyzed in the main text) with that from the corresponding equivalent GIA0 model with parameters and . The values of and for all the GIA random models are shown in Appendix 1—figure 9 in the contour map of the expected mean number of cells in (in compartment there is always one single cell). In general, remains below five and is spread between zero and one. As measure of the error of the equivalent model, , we choose the maximum difference between the distributions of a particular random GIA model and that of the corresponding equivalent model, relative to the peak of the distribution of the random model. For low mean cell numbers, the distribution is compared to Equation 20; for large mean number instead, the rescaled distribution is compared to Equation 23. A threshold on the mean cell number equal to 10 was chosen to distinguish between the two cases. This relative error as function of is presented in Appendix 1—figure 10, where it is evident that large errors are obtained only for large values of this parameters. Some illustrative cases, representative of different value of , were selected and their distribution is shown in Appendix 1—figure 11, Appendix 1—figure 12 and Appendix 1—figure 13. The following considerations are made:
Two cases for are presented in Appendix 1—figure 11. In these cases, the distribution obtained from the random models agrees with the analytic solution from the equivalent model, which in turn is well approximated by a Poisson distribution. As expected, larger deviations between the equivalent model’s analytic solution and the approximation are noted for increasing values of . In general, all the models in this range are well approximated by the equivalent model.
The two cases presented in Appendix 1—figure 12 have , for which the Gamma distribution is an approximation of the equivalent model’s analytic solution. The distribution in some cases (see for instance the top figure), presents some deviations with respect to the equivalent model. However, globally a good agreement is obtained in most of the cases (failing ratio, based on a 0.5 maximum error is 21.7%).
Two cases in an intermediate range are shown in Appendix 1—figure 13. Again, the equivalent model’s analytic solution is well representative of the distribution (failing ratio, based on a 0.5 maximum error is 3.2%). It is noted that for such values of an approximation of the equivalent model analytic solution is not available.
Thus, in most of the tested cases the equivalent model is able to catch the behavior of a generic random GIA model, and thus represents a good approximation (global failing ratio, based on a 0.5 maximum error is 6%). In the cases where the equivalent model does not yield a good approximation, the internal structure of the and compartments become relevant and subsequent events that affect and become dependent on each other, and thus are non-Markovian.
To test the behavior of a generic GIA model in case of large , the GIA random models (generated as described in 'Generation of random models' and analyzed in the main text) were modified by changing the process rates associated to the renewing compartment to achieve . To this aim, considering that infinite solutions are possible, we applied a global search method, and more specifically a Genetic Algorithm (Goldberg, 1989). We therefore setup an optimization problem, where the process parameters are the optimization variables and the cost function is the error of the current with respect to the target.
The envelope of curves obtained in all the random models and some illustrative profiles are shown in Appendix 1—figure 14. A reference Normal distribution, characterized by unitary mean and variance equal to is also reported: this curve corresponds to the distribution expected in the equivalent model for which . Deviations become relevant, when the internal structure of compartments in a random model leads to subsequent events that are not independent from each other. These effects alter the variance of the Normal distribution. In fact, Figure 4 in the main text is based on the same simulation results, but in this case the rescaling is done considering both the mean number of cells and its variance (a Normal distribution is a two-parameter distribution).
In the previous subsection we increased in a way which assures that other parameters within stay of the same order of magnitude. Here, we address the question what happens if some parameters within are much smaller than parameters of , such as γC. For that purpose, we study another simple GIA model, let us call it GIAB, as a modification of the GIA0 test model defined by 14. In the GIAB model the renewing compartment is composed by two states and , instead of only one. Cells in these states divide asymmetrically (i.e. one daughter cell remains within the renewing compartment while the other enters the committed compartment) or change state between and (cell state switching) while still remaining within the renewing compartment. The committed compartment of the system is composed just by a single state, , and cells in this state either duplicate or die (as the previous state in Equation 14). This corresponds to the model
In this model, the effective parameters as defined in 'Approximation of generic GIA models', , where , , and . As before, we define the non-dimensionalized parameters and here we also define , and further the parameter ratios and . In the following, we test this model for different values of and as reported in Appendix 1—table 2, while fixing , which is our main scaling parameter, as well as and .
The rescaled distribution of the number of cells in the committed compartment (i.e. in state ), , obtained at the final simulation time τ, is shown in Appendix 1—figure 15. A value of τ equal to (where is the minimum of all rate parameters) was chosen to assure that the steady state is reached. Considering first the test cases GIAB#1 and GIAB#2 according to Appendix 1—table 2, which are characterized by (i.e. there is no difference in the division timescales for the two renewing states), they both lead to a Normal distribution, independently on the value assumed by . Test cases GIAB#3 to GIAB#7 instead are all characterized by , and different orders of magnitude for are tested. The distribution in these cases is Normal until (see cases GIAB#3 to GIAB#5); when is significantly lower than , then bimodality emerges (see cases GIAB#6 and GIAB#7). Looking at the extreme case, GIAB#7, cells in each renewing state, if analyzed independently, would result in a Poisson distribution in the committed compartment with different mean values (low for the slow-dividing state and large for the fast-dividing one). Thus, globally the distribution is in line with a bimodal distribution computed as
in which is the mixing parameter, computed as,
and the parameters and for correspond to the parameter and to the mean number of cells of a system in which the renewing compartment would be composed just by state Xi. The total mean number of cells is instead indicated by . The bimodal distribution given by Equation 45 is indicated as a black dashed-dotted line in Appendix 1—figure 15.
In the main text, it is shown that GPA models predict asymptotically, for large times t, the same rescaled clone size distribution, that is, an Exponential distribution of unitary mean.
In Appendix 1—figure 16, the 50%tile distribution of all the GPA models analyzed is shown at different levels of extinction (which are related to the different time points), showing a gradual convergence to the expected Exponential distribution.
Thus, the Markov approximation to all GPA models, Equation 12 in the main text (the equivalent model of GPA models), becomes accurate for sufficiently large t and no significant deviations are observed. This also means that for large t, the distribution is independent of the choice of parameters, since only the mean value of surviving clones, , depends on parameters, which however, does not affect the rescaled distribution in terms of . We can therefore abstain from an extended study of different parameter regimes. This is in contrast to the GIA model class where distributions depend sensitively in the choice of parameters if we are not in the scaling regime of large , and the non-Markovian nature of GIA models can become relevant, as we showed in the previous section.
In the previous two sections, we studied numerically how a Markovian representation can approximate general cell fate models (GIA and GPA) models. Here, we study from an analytical view point how generic GIA and GPA models converge to the respective limiting distributions, for large time t (GPA models) and large (GIA models).
Similar to 'Approximation of generic GIA models', we define and as the cell number vectors (here: actual cell numbers of the stochastic model, not mean cell numbers) restricted to the states of compartments and , respectively. We further define the accumulated cell numbers and in and , respectively. Considering and as observables of our compartment model, this corresponds to a Hidden Markov Model in that the dynamics of the observables are not Markovian, yet they are entirely determined by a set of states which follow a Markov process.
For general GIA and GPA models in the compartment representation of Equation 9, main text, the effective rate parameter (i.e. the frequency of cell divisions in per cell), is defined similar as in 'Approximation of generic GIA Models', yet, here we take into account that can depend on time via the – not necessarily stationary – distribution of cells within (since the process is non-Markovian). Hence, in these more general terms, we define where is the probability of a single cell to be in state i at time t. may variate after each event , as the conditional probability , provided that an event E has just occurred, differs from the stationary state distribution.
In homeostasis, where the number of R-cells must on average stay constant, is also the rate, per R-cell, at which C-cells are created from R-cells, via events , or direct transition, . Thus, the total rate of C-cells being created from the R-cells by such events – let us call them -events – is . While the non-Markovian nature of the process does not assure that such events are distributed exponentially, we can state that, by definition, the number of such creation events in a time period , , has mean value .
While, may in principle depend on time, we note that when internal rates of are fast compared to the time period above (an internal rate of is a rate ωij where states i,j are both in ), then fluctuates quickly and we can make an adiabatic approximation, replacing by its average , where is the steady state value of (this corresponds for GIA models to the definition of in 'Approximation of generic GIA models'). This is fulfilled in our simulations of large , since internal rates, such as defined in 'GIAB test case: bimodal distribution', scale with when (see 'GIA model for large'). Hence, the time scales of internal rates are substantially smaller than the relevant time scale , the lifetime of generated C-cells. Therefore, when comparing with simulation results, it is generally appropriate to assume that is constant. In the following subsection, we will discuss this case. The case when internal rates are slower than the time scale γC is discussed in the subsequent subsection.
Each C-cell created by an -event initiates a sub-clone within , defined through its progeny, which then follows the dynamics of . Such sub-clones evolve independently of each other (a defining characteristic of branching processes [Haccou et al., 2005]). Let us call the number of cells of a sub-clone created by an -event at time , which evolves over time t, as . We denote two -events which happen at the same time via a symmetric division of type by different indices i and , yet with . Therefore, the total number of cells in is the sum of independent random numbers ,
Note that the random numbers are not identically distributed, since their statistics depend on the time point of the i-th -event. In particular, the mean value, and variance depend on the time passed since the respective -event at time . Thus, we cannot apply the central limit theorem in its original form to the sum of random numbers, Equation 47. However, a variation of the central limit theorem states that sums of non-identically distributed random variables, , converge to normally distributed random variables, if mean and variance of are finite, and they fulfill Lindeberg’s condition (Billingsley, 1995).
The (strict) Lindeberg’s condition is said to be fulfilled for a sequence of random numbers ,, if
where and . If this is fulfilled, then converges for to a random variable that is normal distributed.
To show that the fulfill Lindeberg’s condition, we note that follow a sub-critical multi-type branching process, for which for , which is assured since the eigenvalues of the adjacency matrix of are all negative (since dominant eigenvalues of all SCCs in are negative [astrom_murray_feedback_book]). For multi-type branching processes, the variance σ2 is proportional to the mean value, hence . Therefore, for , hence it is bounded, i.e there exists such that for all t. Furthermore, since initially, at , , we know that there exist and such that for . Now we recall that, since here we assume the validity of the adiabatic approximation discussed in the previous subsection, the number of -events within a time period is . For generic , is finite and thus is , since all for large t. However, for or , we get that and thus . On the other hand, all , which means that all for or . Hence, Lindeberg’s condition is fulfilled if or and thus, becomes normally distributed,
The variance scales with since variances of independent random numbers add linearly and each . The exact value of and the pre-factor of the variance of in this limit depend on the (non-Markovian) model details.
The arguments leading to Equation 49 hold for large if the internal rates of are comparable to , which is satisfied for all cases we sampled randomly for numerical simulations, see 'GIA model for large'. However, if internal rates of are much smaller than , then the adiabatic approximation does not apply and may vary slower than the time scale . For example, consider a GIA model in which can be decomposed into two sub-compartments, say and , whereby any rates with have , as the example discussed in 'GIAB test case: bimodal distribution'. Then, the single cell in (note that always in GIA models) may spend long time periods in and respectively. Now, if , then, for time periods exceeding , the effective asymmetric division rates are and respectively, and during these time periods the distribution of cells has mean and respectively. Hence, the total clone size distribution will be the mix of two Normal distributions with mean and , respectively, that is, a bimodal distribution. This scenario is discussed in 'GIAB test case: bimodal distribution', for the specific case of two states in .
In GIA models, the number of R-cells is conserved, and in particular, for clones, we have for all times. Hence, the rate of -events is simply . Now, if internal rates are fast and , then becomes normally distributed, as argued above. Hence, also follows a Normal distribution, with mean instead.
Nonetheless, if internal rates are less than γC then bimodal distributions may be observed, as discussed in 'GIAB test case: bimodal distribution'.
The dynamics of GPA models read, in compartment formulation,
Since the dynamics of R-cells do not depend on C-cells, we can first consider the formers’ dynamics separately. In homeostasis, where , we have thus for R-cells,
This is a simple continuous time branching process with two offspring; yet it is non-Markovian: subsequent events may be correlated, since each event imbalances the internal distribution of cells in the compartment . Yet, as for C-cells, we can write the number of R-cells as a sum of independent (but not identically distributed) random variables. Let us consider for each R-cells, born at time , the random variable describing its 'survival' state, that is, if that cell is still in , and if that cell has left via symmetric differentiation, or direct transition, . Essentially, the random numbers are the ‘branches’ of the branching process. Since these events do not depend on other cells, the random numbers are independent of each other, and thus,
is a sum of independent, not identically distributed random variables. Here, is the total number of birth events occurring at rate , , up to time t. Since and , we can argue analogue to above for Equation 49 that the sequence of fulfills Lindeberg’s condition and thus converges to a Normal distribution, whereby the mean value (since due to homeostasis the mean number is constant and the initial condition is ). Hence, the probability to have cells in is
However, here, the variance is a random variable itself: Since the are independent, , where , and where is a random variable. The random numbers can only have the values or and they follow a simple death process, so for , it must be , while for , the variance must be finite, let’s say, where can in principle depend on time, yet is not known (it depends on the non-Markovian details of the model). Hence,
since the number of summands with is the number of surviving R-cells, that is, . Substituting into Equation 54 gives,
This is an Exponential distribution with mean value . Finally, when we enforce normalisation of the probability distribution, we get,
Eventually, we also have to 'add' the C-cells. Since for , also , individual events do not significantly affect the distribution of R-cells, (in contrast to the case of for GIA models), and hence we can assume the adiabatic approximation discussed above, where and thus . Therefore, C-cells are distributed according to a Normal distribution with mean and variance . As argued in the main text, the mean value of , conditionend on survival of a clone, , must grow over time, without bound if . Therefore, we can generally assume that , and hence both and . However, if we express the clone size in form of a rescaled variable ( is the mean of surviving clones) we can write with and , and note that the rescaled standard width of the distribution of , vanishes for . Therefore, is effectively a constant in that limit, . Hence, also and thus, the rescaled clone size, , is distributed according to an Exponential distribution (here: a probability density function) with unit mean, and after renormalisation, we get that
This distribution is indeed observed in all our simulations of GPA models for large t.
All numerical data used for figures is produced by programme code, which can be found on Github, under https://github.com/cp4u17/simCellState (copy archived at https://github.com/elifesciences-publications/simCellState).
GithubID cp4u17/simCellState. simCellState.
BookHandbook of Mathematical Functions with Formulas, Graphs and Mathematical Tables, 32United States Department of Commerce, National Bureau of Standards.
Exact solution of a two-type branching process: clone size distribution in cell division kineticsJournal of Statistical Mechanics: Theory and Experiment 2010:P07028.https://doi.org/10.1088/1742-5468/2010/07/P07028
A ”dynamic” proof of the Frobenuis-Perron theorem for Metzler matricesProbability Statistics, and Mathematics 1:17–26.https://doi.org/10.1016/B978-0-12-058470-3.50009-4
BookFeedback Systems: An Introduction for Scientists and EngineersPrinceton Unversity Press.
BookConvergence of Probability MeasuresJon Wiley & Sons.
BookProbability and MeasureJohn Wiley and Sons.
Asymptotics for interacting particle systems onZ dZeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 53:183–196.https://doi.org/10.1007/BF01013315
BookIntroduction to AlgorithmsMIT Press.
Stem cell heterogeneity and plasticity in epitheliaCell Stem Cell 16:465–476.https://doi.org/10.1016/j.stem.2015.04.014
Exact stochastic simulation of coupled chemical reactionsThe Journal of Physical Chemistry 81:2340–2361.https://doi.org/10.1021/j100540a008
BookGenetic Algorithms in Search, Optimization and Machine Learning (1st edn)Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
BookMathematical Modelling of Clonal Stem Cell DynamicsIn: Cahan P, editors. Computational Stem Cell Biology. New York: Humana. pp. 107–129.https://doi.org/10.1007/978-1-4939-9224-9_5
Stability and steady state of complex cooperative systems: a diakoptic approachRoyal Society Open Science 6:191090.https://doi.org/10.1098/rsos.191090
BookBranching Processes: Variation, Growth, and Extinction of PopulationsCambridge: Cambridge University Press.https://doi.org/10.1017/CBO9780511629136
Handbook of Differential Equations: Ordinary Differential Equations239–257, Monotone dynamical systems, Handbook of Differential Equations: Ordinary Differential Equations, Elsevier, 10.1016/S1874-5725(05)80006-9.
Universal patterns of stem cell fate in cycling adult tissuesDevelopment 138:3103–3111.https://doi.org/10.1242/dev.060103
ReportStem Cell BasicsNIH.
Intravital microscopy through an abdominal imaging window reveals a pre-micrometastasis stage during liver metastasisScience Translational Medicine 4:158ra145.https://doi.org/10.1126/scitranslmed.3004394
Generalized lacZ expression with the ROSA26 cre reporter strainNature Genetics 21:70–71.https://doi.org/10.1038/5007
Plasticity within stem cell hierarchies in mammalian epitheliaTrends in Cell Biology 25:100–108.https://doi.org/10.1016/j.tcb.2014.09.003
BookStochastic Processes in Physics and ChemistryElsevier.
Yukiko M YamashitaReviewing Editor; University of Michigan, United States
Naama BarkaiSenior Editor; Weizmann Institute of Science, Israel
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
This work builds on several recent high-profile applications of statistical physics to infer the mode of stem cell fate choice in mammalian tissues based on clone size distributions obtained via cell lineage tracing. The work makes a solid contribution especially in clarifying how a broad class of models conceptually and quantitatively fits with the population asymmetry or the invariant asymmetry model. Such universality argument may have been already taken for granted by physicists/mathematicians but may have not been obvious to biologists who have specific systems in mind that have much richer cell state structure than the original simplified models.
Through a detailed mathematical analysis, the authors demonstrate that this large set of models can, in homeostatic tissues, be decomposed into two 'universality classes' in terms of their predicted asymptotic clone size statistics. This limits our ability to qualitatively distinguish between different models of stem cell fate choice based on clone size data only.
Decision letter after peer review:
Thank you for submitting your article "Universality of clonal dynamics poses fundamental limits to identify stem cell self-renewal strategies" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Naama Barkai as the Senior Editor The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.
This manuscript provides an important contribution to the field of stem cell biology regarding their 'division modes' (symmetric vs. asymmetric divisions), as detailed in individual comments from reviewers.
The reviewers felt that the concerns can be mainly addressed textual changes as detailed in individual comments.
This manuscript describes a general framework of understanding fate decision models, which have been previously proposed in explaining the dynamics of tissue stem cells in homeostasis. The work makes a solid contribution especially in clarifying how a broad class of models conceptually and quantitatively fits with the population asymmetry or the invariant asymmetry model. Such universality argument may have been already taken for granted by physicists/mathematicians but may have not been obvious to biologists who have specific systems in mind that have much richer cell state structure than the original simplified models.
The main criticism I have about the work is that the manuscript may misguide the readers to think that the GIA and GPA models are the only models that have been discussed in explaining the clonal statistics. As one of the authors showed in their previous paper (Greulich and Simons, 2016), the basic statistics such as the clone size distribution will change when adding spatial regulation to the voter model class, which is distinct from what is obtained by GIA or GPA. In fact, for all the tissues that have been studied so far, including the gut (10.1016/j.cell.2010.09.016, Lopez-Garcia et al., 2010), seminiferous tubule (Klein et al., 2010, Hara et al., 214, and 10.1016/j.stem.2018.11.013), and the skin (10.1016/j.stem.2018.09.005), experiments suggest that the spatial correlation between the fates are non-negligible and therefore the asymptotic statistics should follow the voter model. I would assume that there needs to be at least a justification in only studying the cases without spatial interactions here, especially if the authors do not know any specific example that nicely corresponds to their model.
This work builds on several recent high-profile applications of statistical physics to infer the mode of stem cell fate choice in mammalian tissues based on clone size distributions obtained via cell lineage tracing.
The authors note that two conceptually distinct 'models' of stem cell fate choice, the Population Asymmetry (PA) and Dynamic Heterogeneity (DH) models, exhibit the same asymptotic clone size distributions. They then consider a more general set of possible models for the proliferation, differentiation and loss of a multi-type population of cells in a tissue.
Through a detailed mathematical analysis, the authors demonstrate that this large set of models can, in homeostatic tissues, be decomposed into two 'universality classes' in terms of their predicted asymptotic clone size statistics. This limits our ability to qualitatively distinguish between different models of stem cell fate choice based on clone size data only.
Overall I found this work, though technically complex, to be well motivated and well explained. The authors provide a significant advance to our understanding of how different descriptions of stem cell fate choice may be qualitatively distinguished experimentally (though, as the authors note, we may nevertheless apply parameter inference techniques and model selection to quantitatively distinguish competing models based on given datasets).https://doi.org/10.7554/eLife.56532.sa1
[…] The main criticism I have about the work is that the manuscript may misguide the readers to think that the GIA and GPA models are the only models that have been discussed in explaining the clonal statistics. As one of the authors showed in their previous paper (Greulich and Simons, 2016), the basic statistics such as the clone size distribution will change when adding spatial regulation to the voter model class, which is distinct from what is obtained by GIA or GPA. In fact, for all the tissues that have been studied so far, including the gut (10.1016/j.cell.2010.09.016, Lopez-Garcia et al., 2010), seminiferous tubule (Klein et al., 2010, Hara et al., 2014, and 10.1016/j.stem.2018.11.013), and the skin (10.1016/j.stem.2018.09.005), experiments suggest that the spatial correlation between the fates are non-negligible and therefore the asymptotic statistics should follow the voter model. I would assume that there needs to be at least a justification in only studying the cases without spatial interactions here, especially if the authors do not know any specific example that nicely corresponds to their model.
We thank the reviewer for this remark. The reviewer notes correctly that in most tissues cell-extrinsic regelation via cell-cell signalling determines fate choices. This can indeed be represented by a (multi-type) voter model – a lattice model where lost cells are compensated by dividing neighbours. However, it has been shown that a voter model in any dimension larger than one has the same asymptotic rescaled clone size distribution as the corresponding branching process without interaction, which is the type of model we use (shown rigorously by Bramson and Griffeath, 1980, and discussed in the context of cell fate dynamics by Klein and Simons, 2011). For mean clone sizes as function of time t, there is no difference for dimensions larger than two (in the two dimensional scenario merely a logarithmic prefactor distinguishes the mean clone size in both models; since this factor is nearly constant for large t, it often cannot be distinguished by experimental data (Klein and Simons, 2011)).
Therefore, while a voter model may be a more realistic depiction of a tissue with spatial cell-cell interaction, the asymptotic rescaled clone size distribution of such a model is the same as for a corresponding model without spatial regulation, if considering tissues with cell arrangements of dimensions larger than one. Hence, our results would not be different for a corresponding voter model where spatial regulation is included, for two- or three-dimensional tissues such as epithelial sheets and volumnar tissues (e.g. liver or stroma). Yet, we agree that for quasi one-dimensional arrangements of stem cells, as found in seminiferous tubules and intestinal crypts, the clone size distribution differs, and these are indeed not covered by our model.
In the original version of the manuscript we expressed those limitations after the model definition: "This model is a general multi-type branching process, which is suitable to describe cell population dynamics in any dimension larger than one, even under cell-extrinsic regulation (Klein and Simons, 2011; Bramson and Griffeath, 1980)", but we acknowledge that this point warrants a clearer explanation. We therefore now include at this point a paragraph that clarifies the validity range of results to be expected from our model. Furthermore, we now discuss this in the Discussion section.https://doi.org/10.7554/eLife.56532.sa2
- Philip Greulich
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank Benjamin D MacArthur for valuable discussions that contributed in the development of this research. CP is supported by a Studentship of the Institute for Life Sciences (Southampton) and PG by Medical Research Council New Investigator Research Grant MR/R026610/1.
- Naama Barkai, Weizmann Institute of Science, Israel
- Yukiko M Yamashita, University of Michigan, United States
© 2020, Parigini and Greulich
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to scarceness of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design – instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work paves insights for practical application of fitness landscape heterogeneity in protein engineering.
Using a neural network to predict how green fluorescent proteins respond to genetic mutations illuminates properties that could help design new proteins.