Hidden long-range memories of growth and cycle speed correlate cell cycles in lineage trees

  1. Erika E Kuchen
  2. Nils B Becker
  3. Nina Claudino
  4. Thomas Höfer  Is a corresponding author
  1. German Cancer Research Center (DKFZ), Germany
  2. University of Heidelberg, Germany

Abstract

Cell heterogeneity may be caused by stochastic or deterministic effects. The inheritance of regulators through cell division is a key deterministic force, but identifying inheritance effects in a systematic manner has been challenging. Here, we measure and analyze cell cycles in deep lineage trees of human cancer cells and mouse embryonic stem cells and develop a statistical framework to infer underlying rules of inheritance. The observed long-range intra-generational correlations in cell-cycle duration, up to second cousins, seem paradoxical because ancestral correlations decay rapidly. However, this correlation pattern is naturally explained by the inheritance of both cell size and cell-cycle speed over several generations, provided that cell growth and division are coupled through a minimum-size checkpoint. This model correctly predicts the effects of inhibiting cell growth or cycle progression. In sum, we show how fluctuations of cell cycles across lineage trees help in understanding the coordination of cell growth and division.

Introduction

Cells of the same type growing in homogeneous conditions often have highly heterogeneous cycle lengths (Smith and Martin, 1973). The minimal duration of the cell cycle will be determined by the maximal cellular growth rate in a given condition (Kafri et al., 2016). However, many cells, in particular, in multicellular organisms, do not grow at maximum rate, and their cycle length appears to be set by the progression of regulatory machinery through a series of checkpoints (Novak et al., 2007). While much is known about the molecular mechanisms of cell-cycle regulation, we have little quantitative understanding of the mechanisms that control duration and variability of the cell cycle.

Recently, extensive live-cell imaging data of cell lineages have become available, characterizing, for example, lymphocyte activation (Mitchell et al., 2018; Duffy et al., 2012; Hawkins et al., 2009), stem cell dynamics (Filipczyk et al., 2015), cancer cell proliferation (Spencer et al., 2013; Barr et al., 2017; Ryl et al., 2017), or nematode development (Du et al., 2015). Such studies across many cell types have found that cycle lengths are similar in sister cells, which may be due to the inheritance of molecular regulators across mitosis (Spencer et al., 2013; Mitchell et al., 2018; Yang et al., 2017; Barr et al., 2017; Arora et al., 2017). By contrast, ancestral correlations in cycle length fade rapidly, often disappearing between grandmother and granddaughter cells, or already between mother and daughter cells.

Remarkably, however, the cycle lengths of cousin cells are found to be correlated, indicating that the grandmothers exert concealed effects through at least two generations. High intra-generational correlations in the face of weak ancestral correlations have been observed in cells as diverse as bacteria (Powell, 1958), cyanobacteria (Yang et al., 2010), lymphocytes (Markham et al., 2010) and mammalian cancer cells (Staudte et al., 1984; Sandler et al., 2015; Chakrabarti et al., 2018). The ubiquity of this puzzling phenomenon suggests that it may help reveal basic principles that control cell-cycle duration.

Theoretical work has shown that more than one heritable factor is required to generate the observed cell-cycle correlations in T cell lineage trees, while the nature of these heritable factors has remained unclear (Markham et al., 2010). Stimulated by the idea of circadian gating of the cell cycle in cyanobacteria (Mori et al., 1996; Yang et al., 2010), recent comprehensive analyses of cell lineage trees across different species have proposed circadian clock control as a source of cell-cycle variability that can produce the observed high intra-generational correlations (Sandler et al., 2015; Mosheiff et al., 2018; Martins et al., 2018; Py et al., 2019); such a model also reproduced observed cycle correlations in colon cancer cells during chemotherapy (Chakrabarti et al., 2018). However, in proliferating mammalian cells in culture, the circadian clock has been found to be entrained by the cell cycle (Bieler et al., 2014; Feillet et al., 2014). Moreover, the circadian clock is strongly damped or even abrogated by oncogenes such as MYC (Altman et al., 2015; Shostak et al., 2016) yet MYC-driven cancer cells retain high intra-generational correlations (Ryl et al., 2017).

Ultimately, the cell cycle must coordinate growth and division in order to maintain a well-defined cell size over many generations. Yeast species have long served as model systems. Here, it is assumed that growth drives cell-cycle progression, although molecular mechanisms of size sensing remain controversial (Facchetti et al., 2017; Schmoller and Skotheim, 2015). By contrast, animal cells can grow very large without dividing (Conlon and Raff, 2003), and recent precise measurements suggest that growth control involves both modulation of growth rate and cell-cycle length (Sung et al., 2013; Tzur et al., 2009; Cadart et al., 2018; Ginzberg et al., 2018; Liu et al., 2018). A minimal requirement for maintaining cell size is that cells reach a critical size before dividing, which can be achieved by delaying S phase (Shields et al., 1978).

Here, we present a systematic approach to learning mechanisms from measured correlation patterns of cell cycles in deep lineage trees. First, we develop an unbiased statistical framework to identify the minimal model capable of accounting for our experimental data. We then propose a biological realization of this abstract model based on growth, inheritance and a size checkpoint, and experimentally test specific predictions of the biological model.

Results

Lineage trees exhibit extended intra-generational correlations

To study how far intra-generational cell-cycle correlations extend within cell pedigrees, we generated extensive lineage trees by imaging and tracking TET21N neuroblastoma cells for up to ten generations during exponential growth (Figure 1A, Figure 1—video 1, Figure 1—source data 1 and Figure 1—figure supplement 1A). Autonomous cycling of these cells is controlled by ectopic expression of the MYC-family oncogene MYCN, overcoming the restriction point and thus mimicking the presence of mitogenic stimuli (Ryl et al., 2017). High MYCN also downregulated circadian clock genes (Figure 1—figure supplement 2). The distribution of cycle lengths (Figure 1B and Figure 1—figure supplement 1B) was constant throughout the experiment (Figure 1C and Figure 1—figure supplement 1C) and similar across lineages (Figure 1—figure supplement 1D), showing absence of experimental drift and of strong founder cell effects, respectively. To determine cycle-length correlations without censoring bias caused by finite observation time (Figure 1—figure supplement 3A; Sandler et al., 2015), we truncated all trees after the last generation completed by the vast majority (>95%) of lineages. The resulting trees were 5–7 generations deep, enabling us to reliably calculate Spearman rank correlations between relatives up to second cousins (Figure 1D,E and Figure 1—figure supplement 3B).

Figure 1 with 4 supplements see all
Cell-cycle lengths and their correlations captured by live-cell imaging.

(A) Live-cell microscopy of neuroblastoma TET21N cell lineages. Sample trees shown with cells marked that were lost from observation (dot) or died (cross). (B) Distribution of cycle lengths, showing median length (and interquartile range). (C) Cycle length over cell birth time shows no trend over the duration of the experiment. (D) Lineage tree showing the relation of cells with a reference cell (red); ancestral lineage (light blue), first side-branch (dark blue) and second side branch (green). (E) Spearman rank correlations of cycle lengths between relatives (with bootstrap 95%-confidence bounds) of three independent microscopy experiments. Color code as in D. B and C show replicate rep3.

Figure 1—source data 1

Overview of all time-lapse experiments displayed in the manuscript.

‘Corrected’ refers to the number of fully observed generations; only these were used, in order to correct for censoring bias. ‘Figures’ refers to main text figures and the respective supplements.

https://cdn.elifesciences.org/articles/51002/elife-51002-fig1-data1-v2.pdf
Figure 1—source data 2

Raw cell cycle data for lineage trees in TET21N replicates rep1-3.

https://cdn.elifesciences.org/articles/51002/elife-51002-fig1-data2-v2.xlsx

Cycle-length correlations of cells with their ancestors decreased rapidly with each generation (Figure 1E). However, the correlations increased again when moving down from ancestors along side-branches—from the grandmother toward the first cousins and also from the great-grandmother toward the second cousins (Figure 1E). The correlations among second cousins varied somewhat between replicates (we will show below that we can control these correlations experimentally by applying molecular perturbations). If cell-cycle length alone were inherited (e.g. by passing on regulators of the cell cycle to daughter cells), causing a correlation coefficient of ρmd between mother and daughter cycle lengths, and sisters are correlated by ρss, then first and second cousins would be expected to have cycle length correlation ρssρmd2 and ρssρmd4, respectively (Staudte et al., 1984). The actually observed cousin correlations are much larger, confirming previous observations on first cousins as summarized in Sandler et al. (2015) and extending them to second cousins. This discrepancy between simple theoretical expectation and experimental data was not due to spatial inhomogeneity or temporal drift in the data (Figure 1; Figure 1—figure supplement 3C-E). Thus, the lineage trees show long-ranging intra-generational correlations that cannot be explained by the inheritance of cell-cycle length.

Correlation patterns are explained by long-range memories of two antagonistic latent variables

We used these data to search for the minimal model of cell-cycle control that accounts for the observed correlation pattern of lineage trees (Materials and methods and Appendix 2). To be unbiased, we assumed that cycle length τ is controlled jointly by a yet unknown number d of cellular quantities that are inherited from mother to daughter, x=(x1,,xd)T, such that τ=τ(l=1dαlxl), with positive weights α. We take x to be a Gaussian latent variable and, generalizing previous work (Cowan and Staudte, 1986), describe its inheritance by a generic model accounting for inter-generational inheritance as follows: In any given cell i, xi is composed of an inherited component, determined by x in the mother, and a cell-intrinsic component that is uncorrelated with the mother. The inherited component is specified by an inheritance matrix 𝐀, such that the mean of xi conditioned on the mother’s x is xi|x=Ax (Figure 2A). The cell-intrinsic component causes variations around this mean with covariance (xiAx)(xiAx)T|x=I, where, with appropriate normalization of the latent variables, 𝐈 is the unit matrix. Additional positive correlations in sister cells may arise due to inherited factors accumulated during, but not affecting, the mother’s cycle (Arora et al., 2017; Barr et al., 2017; Yang et al., 2017); additional negative correlations may result from partitioning noise (Sung et al., 2013). These are captured by the cross-covariance between the intrinsic components in sister 1 and 2, (x1Ax)(x2Ax)T|x=γI. In total, d(d+1) parameters can be adjusted to fit the correlation pattern of the lineage trees: the components alm of the inheritance matrix 𝐀, the weights αl and the sister correlation γ. Together, these inheritance rules specify bifurcating first-order autoregressive (BAR) models for multiple latent variables governing cell-cycle duration.

Figure 2 with 1 supplement see all
Bifurcating autoregressive inheritance models.

(A) Coupled inheritance of d Gaussian latent variables xl and cell-intrinsic fluctuations generate cycle lengths. (B) Relative model evidences calculated for d=1,2, for the indicated inheritance matrices 𝐀=[alm] and sister coupling γ. Although Model VII is the most parsimonious for replicates rep2 and rep3 (blue and gray bars), only Model V with unidirectionally coupled inheritance explains all data well, including rep1 (bordeaux bars). Error bars from Monte-Carlo integration. (C) Model fits for rep1. Single-variable inheritance (Model II) and pure cross-inheritance (VII) fails to generate strong intra-generational correlations; uncoupled inheritance (III) fails to generate low ancestral correlations; Model V fits the data best. Rank correlations of the data shown with bootstrap 95%-confidence bounds (black bars). Model prediction bands (colored bars) were generated from the range of the parameter sets with likelihood higher than 15% of the best fit, corresponding to a Gaussian 95% credible region. (D) Model V, best-fit ancestral autocorrelation functions, for cycle lengths τ and latent variables. Long-range memory in the latent variables is anticorrelated and masked in observed cycle times.

To determine the most parsimonious BAR model supported by the experimental data, we employed a standard Bayesian model selection scheme. Selection is based on the Bayesian evidence, which rewards fit quality while naturally penalizing models of higher complexity (defined as being able to fit more diverse data sets; for details see Appendix 2, Evidence calculation). Specifically, we evaluated the likelihood of the measured lineage trees for a given BAR model, used it to compute the Bayesian evidence, and ranked BAR models accordingly (Figure 2B).

The simplest model that generated high intra-generational correlations was based on the independent inheritance of two latent variables (Model III; Figure 2C, cyan dots), whereas one-variable models failed to meet this criterion (Model II, Figure 2C, blue dots and Model I). However, Model III consistently overestimated ancestral correlations and hence its relative evidence was low (<10% for all data sets). To allow additional degrees of freedom, we accounted for interactions of latent variables. The most general two-variable model with bidirectional interactions (Model VI), overfitted the experimental data and consequently had low evidence. The models best supported by the data had unidirectional coupling, such that x2 in the mother negatively influenced x1 inherited by the daughters, that is with a12<0 and a21=0 (Figure 2B, Models IV, V and VII). Among these, Model VII, with a single inheritance parameter a12, is simplest, but was not compatible with experimental replicate rep1 as it could not generate second-cousin correlations (Figure 2B,C). Both Models IV and V were compatible with all replicates; however, Model V with only one self-inheritance parameter for both variables (a11=a22>0) was preferred (Model V, Figure 2B,C, orange dots). Model V produced a remarkable inheritance pattern (Figure 2D): Individually, both latent variables had long-ranging memories, with ∼50% decay over 2–3 generations. However, the negative unidirectional coupling cross-correlated the variables negatively along an ancestral line, resulting in cycle-length correlations that essentially vanished after one generation. Nevertheless, strong intra-generational correlations were reproduced by the model due to long-range memories of latent variables together with positive sister-cell correlations (γ>0). We conclude that the coexistence of rapidly decaying ancestral correlations and extended intra-generational correlations can be explained by the inheritance of two latent variables, one of which inhibits the other.

Cell size and speed of cell-cycle progression are antagonistic heritable variables

During symmetric cell division, both cell size and regulators of cell-cycle progression are passed on from the mother to the daughter cells (Spencer et al., 2013; Yang et al., 2017; Arora et al., 2017; Barr et al., 2017). We now show that simple and generic inheritance rules for these two variables provide a physical realization for BAR Model V.

To divide, cells need to both grow to a minimum size (Shields et al., 1978) and receive license to progress through the cell cycle from the regulatory machinery (Novak et al., 2007). Indeed, growth and cell-cycle progression can be separately manipulated experimentally in mammalian cells (Fingar et al., 2002). In particular, cells continue to grow in size when regulatory license is withheld, for example in the absence of mitogens, and growth is not otherwise constrained, for example by mechanical force or growth inhibitors (Fingar et al., 2002; Conlon and Raff, 2003).

While growth and cell-cycle progression are separable and heritable processes, they also interact. At the very least, the length of the cell cycle needs to ensure that cells grow to a sufficient size for division. This interaction alone implies an effect of one inherited variable, cycle progression, on the other, cell growth, that anti-correlates subsequent cell cycles (as required by BAR model V): If a delayed regulatory license prolongs the mother’s cell cycle, it will grow large. By size inheritance, its daughters will be large at birth, reach a size sufficient for division quickly and hence may have shorter cell cycles. Thus, despite inheritance of growth and cell-cycle regulators mothers and daughters may have very different cycle lengths due to this interaction.

Based on these ideas, we formulated a simple quantitative model of growth and cell-cycle progression on cell lineage trees. We introduced the variables ‘cell size’ s, measuring metabolic, enzymatic and structural resources accumulated during growth, and p, characterizing the progression of the cell-cycle regulatory machinery. Unlike the latent variables of the BAR model x1 and x2, their mechanistic counterparts s and p, respectively, are governed by rules reflecting basic biological mechanisms (Figure 3A, Appendix 3). Size s grows exponentially and is divided equally between the daughters upon division. We found that under some experimental conditions generating long cell cycles (downregulation of MYCN, see below), stable cell size distributions required feedback regulation of growth rate, as seen experimentally (Sung et al., 2013; Tzur et al., 2009); we implemented this as a logistic limitation of growth rate at large sizes for these conditions. The progression variable p determines the time taken for the regulatory machinery to complete the cell cycle, which is controlled by the balance of activators and inhibitors of cyclin-dependent kinases. These regulators are inherited across mitosis (Spencer et al., 2013; Yang et al., 2017; Arora et al., 2017; Barr et al., 2017) and hence the value of p is passed on to both daughter cells with some noise. Cells divide when they have exceeded a critical size, requiring time τg, and the regulatory machinery has progressed through the cycle, which takes an approximately log-normally distributed time (Ryl et al., 2017; Mitchell et al., 2018) modeled as τp=exp(p). Hence the cycle length is τ=max(τg,τp). Apart from requiring a minimum cell size for division, the growth-progression model does not implement a drive of the cell cycle by growth and thus allows cells to grow large during long cell cycles. By this mechanism, the cell size variable s is influenced by cycle progression, analogous to the BAR variable x1. By contrast, the progression variable p is not influenced by cell size, analogous to the variable x2 in the BAR model.

Figure 3 with 1 supplement see all
The growth-progression model.

(A) Scheme of the growth-progression model with heritable variables relating to cell size s and cycle progression timing p. (B) Measured and simulated cell-cycle length distributions (upper). Model distribution resolved by the division-limiting process (lower). (C) Measured and modeled correlation pattern with Spearman rank correlation coefficient and bootstrap 95%-confidence bounds. (D) Proportion of simulated cells limited by growth or progression. (E) Correlation of simulated mother-daughter cycle lengths colored by their division limitation: both by τg (black), both by τp (green), mother τp – daughter τg (magenta), mother τg – daughter τp (cyan). Percentage of cells in each subgroup and their correlation coefficients are shown. (F) Correlation of simulated cousin-cousin cycle length colored by the limitation of the common grandmother: by τg (orange) or τp (blue). (G) Autocorrelations along ancestral line of cycle length τ, growth time τg and the progression time τp, and the cross-correlation τpτg.

Figure 3—source data 1

Best-fit parameter values of the growth-progression model for all experiments shown, obtained from ABC-simulations.

Cell size was assumed to increase exponentially except for MYCN-inhibited cells, which were modelled by a logistic growth process with a fixed normalised maximum cell size smax equal to 20 times the threshold division size. The bottom two rows show the parameters used for simulated perturbations of growth cycle progression, respectively.

https://cdn.elifesciences.org/articles/51002/elife-51002-fig3-data1-v2.pdf

We fitted this model to the measured lineage trees by Approximate Bayesian Computation (Figure 3—figure supplement 1A and Figure 3—source data 1). The parameterized model yielded a stationary cell size distribution (Figure 3—figure supplement 1B) and reproduced the cycle-length distribution (Figure 3B and Figure 3—figure supplement 1C) as well as the ancestral and intra-generational correlations (Figure 3C and Figure 3—figure supplement 1D). Thus, the dynamics of cell growth and cell-cycle progression, coupled only through a minimum-size requirement, account for the intricate cycle-length patterns in lineage trees.

To gain intuition on the inheritance patterns of cycle length, we first considered ancestral correlations, focusing on mother-daughter pairs. Individual cell cycles in the model are either growth-limited, that is division happens upon reaching the minimum size, or progression-limited, that is the cell grows beyond the minimum size until the cycle is completed (Figure 3D and Figure 3—figure supplement 1E). If both mother and daughter are progression-limited (i.e., the threshold size is exceeded by both), their cycles are positively correlated (Figure 3E, green dots). As in this case size inheritance is inconsequential, this positive correlation is explained by the inheritance of the cell-cycle progression variable p alone. By contrast, all mother-daughter pairs that involve at least one growth limitation show near-zero (Figure 3E, cyan dots) or negative correlations (Figure 3E, magenta and black dots). This pattern is explained by the anti-correlating effect that daughters of longer-lived and hence larger mother cells require on average shorter times to reach the size threshold. Next, we considered intra-generational correlations, focusing on first cousins (Figure 3F). While cousins are positively correlated overall, this correlation is carried specifically by cousins that descend from a grandmother with a progression-limited cell cycle (Figure 3F, blue dots), whereas cousins stemming from a growth-limited grandmother are hardly correlated (Figure 3F, orange dots). Since progression-limited cells can grow large, this observation indicates that cousin correlations are mediated by inheritance of excess size, as is confirmed by conditioning cousin correlations on grandmother size (Figure 3—figure supplement 1F). Size inheritance over several generations is also evident in the autocorrelation of the time required to grow to minimum size, τg (Figure 3G and Figure 3—figure supplement 1G, black dots). The autocorrelation of the progression time τp is also positive (but less long-ranging; Figure 3G and Figure 3—figure supplement 1G, green squares), while the negative interaction with growth is reflected in the negative cross-correlation (Figure 3G and Figure 3—figure supplement 1G, red triangles). In sum, the long-range memories of cell-cycle progression and cell growth are masked by negative coupling of these processes, causing rapid decay of cell-cycle length correlations along ancestral lines (Figure 3G and Figure 3—figure supplement 1G, orange triangles). These inheritance characteristics of the growth-progression model mirror those of BAR model V (see Figure 2D).

Effects of molecular perturbations on cell-cycle correlations are correctly predicted by the model

If the growth-progression model captures the key determinants of the cell-cycle patterns in lineage trees, it should be experimentally testable by separately perturbing growth versus cell-cycle progression. We first derived model predictions for these experiments. Intuitively, if growth limitation were abolished by slowing cell-cycle progression, only progression would be inherited and therefore mother-daughter correlations of cycle time could no longer be masked. As a result, we expect intra-generational correlations to be reduced relative to ancestral correlations when inhibiting cycle progression; conversely, inhibiting growth (and thereby increasing growth limitation) should raise them. Indeed, using the model to simulate perturbation experiments (Figure 3—source data 1), we found that growth inhibition increased cousin correlations relative to mother-daughter correlations, whereas slowing cell-cycle progression decreased these correlations (Figure 4A).

Figure 4 with 2 supplements see all
Targeted perturbation of growth and cell-cycle progression.

(A) Predictions for changes in the ratio ρc/ρmd of cousin to mother-daughter correlations, when slowing growth or cycle progress compared to the best-fit parameters (control). ρc1= first cousins, ρc2= second cousins. (B–F) Experimental perturbations of cycle progress and growth by MYCN inhibition and rapamycin treatment, respectively. (B) Cell size distribution. Areal forward scatter measured experimentally by flow cytometry for control high-MYCN, MYCN-inhibited and rapamycin-treated (40 nM) TET21N neuroblastoma cells; shown are two biological replicates, indicated by solid and dashed lines, that were measured with the same FACS settings. (C) Measured and best-fit model cycle length distributions. Median and interquartile range are indicated. (D) Measured (black) and best-fit correlation pattern of MYCN-inhibited and rapamycin-treated cells with Spearman rank correlation coefficient and 95%-confidence bounds. (E) Measured cousin/mother daughter correlation ratios. (F) Proportion of simulated cells limited by growth or progression, using best-fit parameters for MYCN inhibition or rapamycin-treatment.

Figure 4—source data 1

Raw cell cycle data for lineage trees in perturbed TET21N replicates -myc1-2 and rap1-2.

https://cdn.elifesciences.org/articles/51002/elife-51002-fig4-data1-v2.xlsx

To test this prediction, we slowed cell-cycle progression experimentally by reducing MYCN, exploiting the doxycycline-tunable MYCN gene integrated in the TET21N cells. Cells grew to larger average size (Figure 4B, blue lines) over longer and more variable cell cycles (Figure 4C). These data show that lowering MYCN slowed cell-cycle progression while allowing considerable cell growth. Further consistent with this phenotype, expression of mTOR, a central regulator of metabolism and growth (Fingar et al., 2002), was not lowered (Figure 4—figure supplement 1A). In a separate experiment, we inhibited cell growth by applying the mTOR inhibitor rapamycin, which reduced cell size by a small but reproducible amount (Figure 4B, red lines). This treatment also lengthened the cell cycle slightly (Figure 4C) but without changing MYCN protein levels (Figure 4—figure supplement 1B). Thus, lowering MYCN and inhibiting mTOR are orthogonal perturbations that act on cell-cycle progression and cell growth, respectively. As predicted by the growth-progression model, these perturbations resulted in markedly different cycle-length correlation patterns within lineage trees (Figure 4D,E and Figure 4—figure supplement 1C,D): Lowering MYCN decreased intra-generational correlation and, in particular, removed second-cousin correlations. By contrast, rapamycin treatment strongly increased intra-generational correlations and caused ancestral correlations to decline only weakly. Collectively, these findings support the growth-progression model of cell-cycle regulation.

Cousin correlations reflect active cell-size checkpoint

We then asked whether the cycle-length correlation patterns experimentally observed in lineage trees contain information about the underlying regulation. To this end, we fitted the BAR and growth-progression models to MYCN and rapamycin perturbation data. We again obtained good agreement with the data (Figure 4C,D and Figure 4—figure supplement 1C–E,G–J). MYCN knockdown cells grew larger; to obtain a stable cell-size distribution for the corresponding model fits, we implemented logistic regulation of growth rate at large cell sizes. When we applied, for comparison, this regulation to the control and rapamycin treatment data, the model fits (Figure 4—figure supplement 2) were not noticeably affected compared to purely exponential growth. This result indicates that growth rate regulation affecting large cells, as found experimentally (Ginzberg et al., 2018), is compatible with long-range intra-generational correlations in cell-cycle length.

In terms of fit parameters of the model, MYCN inhibition caused considerable slowing of cycle progression and also a moderate decrease in growth rate (Figure 3—figure supplement 1A, parameters μ and k, respectively). As a result, the vast majority of cell cycles were progression-limited (Figure 4F and Figure 4—figure supplement 1F). For the rapamycin-treated cells, we estimated growth rates that were lower than for the control experiments on average, as expected (Figure 3—figure supplement 1A, parameter k). Also, correlations in cell-cycle progression increased in mother-daughter and sister pairs (parameters a and γ, respectively). This is consistent with rapamycin inhibiting mTOR and hence growth, but not affecting drivers of the cell cycle, ERK and PI3K (Adlung et al., 2017), since then lengthening of the cell cycle due to slower growth may allow prolonged degradation of cell-cycle inhibitors, which would increase inheritance of cell-cycle length (Smith and Martin, 1973). Taken together, rapamycin treatment increased the fraction of growth-limited cell cycles (Figure 4F and Figure 4—figure supplement 1F) and inheritance of cell-cycle progression speed, thus causing increased ancestral and intra-generational correlations.

We hypothesized that growth may be limiting primarily for rapidly proliferating cell types, even without specific growth inhibition. We analyzed time-lapse microscopy data of non-transformed mouse embryonic stem cells (Filipczyk et al., 2015) that proliferate much faster than the neuroblastoma cells (Figure 5A and Figure 5—figure supplement 1A). Side-branch correlations of cycle length were again large (Figure 5B Figure 5—figure supplement 1B), as seen in the previous data except for the MYCN-inhibited cells. Interestingly, the strength of the intra-generational correlations was most similar to the much more slowly dividing rapamycin-treated cells (cf. Figure 4D). As before, the BAR model required two negatively coupled variables to account for these data (Figure 5C, Figure 5—figure supplement 1D,E). Fitting the growth-progression model to the data (Figure 5A,B), we found that the majority (∼60%) of cell cycles were limited by growth (Figure 5D, Figure 5—figure supplement 1C), indicating that cycle length of fast proliferating mammalian cells is, to a large extent, controlled by growth.

Figure 5 with 1 supplement see all
Rapid cell cycle of embryonic stem cells are frequently growth-limited.

(A) Cycle length distribution of data (black) and growth-progression model (purple). (B) Measured (black) and modeled (purple) correlation pattern using the growth-progression model. (C) Model evidences of the BAR model, version numbering as in Figure 2B. (D) Proportion of simulated cells limited by growth or progression. Data from Filipczyk et al. (2015) reanalyzed for cell-cycle duration.

Discussion

Here, we showed that the seemingly paradoxical pattern of cell-cycle lengths in lineage trees, with rapidly decaying ancestral and long-range intra-generational correlations, can be accounted for by the inheritance of two types of quantities: resources accumulated during the cell cycle (cell ‘size’) and regulators governing the speed of cell-cycle progression. The fact that these are fundamental processes in dividing cells may help explain the ubiquity of the paradoxical cell-cycle pattern. Targeted experimental perturbations of cell growth and cell-cycle progression support our model.

As an alternative mechanism underlying the observed cell-cycle variability and cousin correlations, modulation of the cell cycle by the circadian clock has been suggested, with strongest experimental evidence to date for cyanobacteria (Sandler et al., 2015; Mosheiff et al., 2018; Py et al., 2019). Unlike cyanobacteria, proliferating mammalian cells show entrainment of the circadian clock to the cell cycle with periods well below 24 hr (Bieler et al., 2014; Feillet et al., 2014). Whether in this setting the circadian clock could still influence cell-cycle correlations remains to be studied.

Our proposed mechanism for cycle length correlations was motivated by Bayesian model selection. All data sets which displayed long-range correlation patterns (control replicate rep1 and rapamycin treatment of neuroblastoma cells as well as ESCs) selected a BAR model featuring long-range inheritance of two memory variables, which is masked in mother-daughter pairs by an anticorrelating interaction between them. The growth-progression model shares these essential features. Size inheritance is predicted to be particularly long-range by the fits of the growth-progression model to the experimental data. This memory emerges because growth does not force cell-cycle progression, allowing cells to grow large in progression-limited cycles and then pass on size over several generations. Such weak coupling between concurrent processes echoes recent work in E. coli on timing of cell division (Micali et al., 2018) and on the existence of two parallel adder processes for replication and division (Witz et al., 2019). Of note, the latter paper also utilizes a statistical framework for systematic model selection against experimentally observed correlation patterns akin to our approach.

Our findings raise the question of the mechanistic modes by which cells coordinate cell cycle and growth, which has been a long-standing problem; see Shields et al. (1978) for historical references and Ho et al. (2018) for a recent review. Recent work on this problem has shown the existence of negative feedback of cell size on growth rate (Ginzberg et al., 2018). We have found that our results remain robust when implementing a simple form of such a feedback (logistic dependence of growth rate on size) in the growth-progression model. Another recent study has shown that growth of many mammalian cells during the cell cycle adds a volume that only weakly increases with cell volume at birth (termed near-adder behavior, Cadart et al., 2018). This behavior appears to be caused by a combination of growth-rate regulation and cell-size effects on cell-cycle progression. We expect that factoring in the cell-cycle length correlations studied here will help uncover the mechanistic details of cell size regulation. Refining our model in this direction may also help capture yet more detail of the correlation structure, such as the apparent increasing trend from aunt to first cousins and greataunt to second cousins. Moreover, we envisage that our inference approach could be extended to include finely resolved data on cell-cycle phases (Chao et al., 2019) as well as multiple cell fates via asymmetric division and differentiation (Duffy et al., 2012).

Materials and methods

Experimental methods

Request a detailed protocol

MYCN-tunable mammalian neuroblastoma SH-EP TET21N (RRID:CVCL_9812) cells were cultured as in Lutz et al. (1996). These TET21N cells were obtained from Dr. Frank Westermann (German Cancer Research Center, DKFZ) whose lab generated this line. This cell line is regularly authenticated by an in-house DKFZ service using STR profiling. Mycoplasma contamination testing was negative. MYCN and mTOR were inhibited using 1 μg/ml doxycycline or 20–40 nM rapamycin (Calbiochem, 553210–100 UG), respectively. Cells were grown on ibidi μ-slides and phase contrast images (Nikon Ti-E) acquired every 6–15 min for up to 7 days under controlled growth conditions. The presented data consists of independent biological and technical replicates with n = 3 for untreated TET21N cells, n = 2 for MYCN-inhibited and rapamycin-treated cells. Cells were tracked in Fiji (version 1.48d) using the tracking plugin MTrackJ (Meijering et al., 2012). For flow cytometry, cells were stained with MYCN primary antibody (Santa Cruz Biotechnology, Cat# sc-53993; RRID:AB_831602), secondary fluorescence-conjugated antibody goat anti-mouse Alexa Fluor 488 IgG (Life Technologies Cat#A-11001; RRID:AB_2534069) and measured on a Miltenyi VYB MACSQuant Analyser. See Appendix 1 for details.

Data analysis

Request a detailed protocol

MATLAB (R2016b) was used for all data analyses. Correlations represent Spearman rank correlations or, for the BAR model, Pearson correlation coefficients between the Gaussian-transformed cycle times. The difference between these two methods was far smaller than the experimental error. Error bounds were estimated by bootstrap re-sampling on the level of lineage trees. Censoring bias was avoided by truncating lineage trees after the last generation completed by all lineages within the experiment (see e.g. Sandler et al., 2015 and Appendix 1), truncating the trees to 7, 6 and 5 generations for the three MYCN amplified experiments. MYCN-inhibited and rapamycin-treated trees were five generations deep.

Bifurcating autoregressive (BAR) model

Request a detailed protocol

We constructed BAR models of cell-cycle inheritance, as described in detail in Appendix 2. Briefly, the cell state is determined by a vector of Gaussian (latent) variables which are inherited from the mother to the daughter cells by a linear map plus a cell-intrinsic noise term, which is correlated between daughters. The model is thus a Gaussian latent-variable model, where inheritance takes the form of an autoregressive vector-AR(1) process defined on a lineage tree. The cycle time is then calculated by an data-derived (approximately exponential) function of a weighted sum of the cellular state. We calculated whole-lineage tree log-likelihood functions analytically and used them to evaluate Bayesian Evidences (Bayes factors) that quantify the relative support from the data for various model variants.

The growth-progression model

Request a detailed protocol

Cell-cycle progression is modeled by a fluctuating, centered Gaussian heritable variable q, analogous to version II of the BAR model. Variables were scaled and shifted, p=σpq+μ, yielding log-normal progression durations τp=exp(p). Size accumulation was modeled by exponential growth or for MYCN inhibition logistic growth. The normalized critical cell size sth fluctuates slightly and independently in each cell as sth=1+ζ with ζ𝒩(0,σg2). The growth-progression model was implemented in Matlab (R2016b), R (3.4.3) and OCaml (4.06) and 30 trees of 7 generations simulated, corresponding to the experimental dataset sizes. The simulation was repeated 100 times to generate confidence bounds. Parameters were fitted using Approximate Bayesian Computation independently for each dataset. See Appendix 3 for details.

Appendix 1

Experimental methods

Cell culturing and treatment

MYCN-tunable mammalian neuroblastoma SH-EP TET21N (TET21N, RRID: CVCL_9812) (Lutz et al., 1996) cells were cultured in RPMI 1640 medium supplemented with 10% fetal calf serum and 1% penicillin/streptomycin at 37°C 5% CO2 and 88% humidity. Versene was used for harvesting. TET21N cells were originally isolated from a female patient. Cell lines are authenticated by the German Cancer Research Center in house facility every half-year. MYCN-inhibited populations were established by incubating cells with 1 μg/ml doxycycline for 48–72 hr prior to further analysis. Growth-inhibited populations were generated by treating cells with the mTOR inhibitor rapamycin (Calbiochem, 553210–100 UG) at 20 nM or 40 nM rapamycin dissolved in DMSO. Cells were treated with rapamycin or the same concentration of DMSO for 72 hr prior to harvesting for flow cytometry or live-cell microscopy.

Live-cell microscopy

103 cells were grown on 8-well ibidi μ-slides coated with collagen IV (Cat# 80822) in RPMI 1640 medium and imaged every 6–15 min for up to 7 days under controlled growth conditions at 37°C, 5% CO2 and 80% humidity (Pecon incubator P). Growth media was changed every 2–3 days. Phase contrast images were acquired with an inverted widefield microscope (Nikon Ti-E) using an EMCCD camera (Andor iXON3 885) and a 10x (CFI Planfluor DL-10x, NA 0.3) or 20x lense (CFI Plan Apochromat DM 20x, NA 0.75). Cells were tracked in Fiji (version 1.48d) using the manual tracking plugin MTrackJ (Meijering et al., 2012). The presented imaging data consists of independent biological and technical replicates with n = 3 for untreated TET21N cells, n = 2 for MYCN-inhibited cells and n = 2 for rapamycin-treated cells.

Flow cytometry and antibody staining

106 cells were fixed with 4% paraformaldehyde for 15 min at room temperature. Permeabilization was performed in 90% ice-cold methanol for at least 24 hr at −20°C. Cells were washed in staining buffer (1% BSA, 0.1% TritonX in PBS), and incubated with 0.5–1 μg per sample of MYCN primary antibody (Santa Cruz Biotechnology, Cat# sc-53993; RRID:AB_831602) for 1 hr at room temperature. Cells were washed 3x with staining buffer and incubated with a secondary flouresence-conjugated antibody, goat anti-mouse Alexa Fluor 488 IgG (Life Technologies Cat#A-11001; RRID:AB_2534069), again for 1 hr at room temperature. Cells were washed 3x with staining buffer and DNA content staining was performed with FxCycle Violet Stain (Thermo Fischer Scientific). A Miltenyi VYB MACSQuant Analyser was used for measurements and data was analysed using FlowJo software.

Transcriptomics

mTOR mRNA expression data in TET21N cells under control and MYCN-inhibited conditions was obtained from RNA-Seq measurements by Ryl et al. (2017) deposited at GEO (GSE98274).

Data analysis

MATLAB (R2016b) was used for all data analysis steps.

Correlation coefficients

Correlation patterns of lineage trees of data and the growth-progression model represent Spearman rank correlations. For the bifurcating autoregressive (BAR) model, Pearson correlation coefficients are calculated between the Gaussian-transformed cycle times. These are the same as Gaussian rank correlations between cycle times. In practice, the differences between Gaussian rank correlations and Spearman rank correlations were much smaller than experimental error bounds on either, so we did not differentiate between the measures. Confidence bounds on correlation coefficients were estimated by bootstrap resampling on the level of lineage trees; resampling on the level of individual pairs of related cells would neglect the correlations between cells of the same tree and thus underestimate variability. Trees that did not contain information on the cell pair under analysis were removed prior to bootstrapping. At each bootstrap repeat, family trees were randomly drawn with replacement up to the number of trees in the original dataset. From the resulting bootstrap sample all cell pairs were used to calculate the sample correlation coefficient. This process was repeated 10,000 times. From the resulting distribution of correlation coefficients the 95% quantiles were used as confidence bounds.

Exponential growth

For each experiment, an exponential growth model of the form Nt=N0exp(kt) was fitted to cell counts over time by performing a maximum likelihood estimation using the trust-region algorithm in MATLAB.

Cell-cycle length distribution over time

At each time point, the moving-window median was calculated from all cells born within a window of 10 hr before or after this point.

Randomization

All cells within a dataset were randomly paired with each other and correlation of the resulting sample calculated. This procedure was repeated 10,000 times. From the resulting distribution the mean and the 95% quantiles are given.

Censoring

Censoring bias resulting from a finite observation time can lead to an overrepresentation of faster cells. To demonstrate this effect, we generated trees using a toy model with independent normally distributed cycle lengths. The trees were truncated at various total observation times and Spearman rank correlation coefficients between all related cells within the observation time window were sampled. As Figure 1—figure supplement 3A shows, short observation times strongly distort the sampled mother-daughter and sibling correlations away from their true value 0. The same basic effect persists for more distant relationships and can be further enhanced if cycle lengths are inherited. This censoring bias can be avoided by truncating lineage trees not after a given observation time, but after the last generation completed by all lineages within the experiment, uniformly over all trees (see e.g. Sandler et al., 2015). In this way slower and faster-cycling lineages are represented equally. Because some cells were inevitably lost from the field of view by migration, and a small percentage of cells showed extremely slow cycles, such a strict cut-off was unfeasible in our experiments. We assumed that cell loss by migration is not correlated to cycle length, so migrating cells were not counted as missing from otherwise complete trees. Within the remaining tree, we then determined the last generation to be included in our analysis by the following procedure: We first counted the number of cells nalive(G) within each generation G that were still alive at the end of the observation period. The last generation Glast to be included was then determined as the maximum generation such that

(1) G=1Glastnalive(G)G=1nalive(G)5%.

All further generations were removed from the dataset. This procedure truncated the trees to 7, 6 and 5 generations, respectively for the three replicate experiments using our MYCN amplified cell line. MYCN-inhibited and rapamycin-treated trees were 5 generations deep.

Spatial trend

To assess potential spatial biases related to locally variable conditions, cells were divided into a 4 × 4 grid according to their position at division. The cycle length distribution of cells within each grid region (containing ≥ 5 cells) was compared to the distribution (1) within every other grid region and (2) of the whole dataset at a 5% significance level using a two-sided Kolmogorov-Smirnov test and correcting for multiple testing using the Benjamini-Hochberg procedure (using functions ks.test and p.adjust in R). Note also that because cells are motile they experience a range of local environments during their lifetime.

Appendix 2

Bifurcating autoregressive model for cycle inheritance

Setup

In order to explore systematically which simple local inheritance schemes can generate the experimentally observed cycle length correlations, we study a class of Gaussian latent variable models of adjustable complexity. In the models, the cycle length τ of a cell is obtained from a standard normal variable by a nonlinear transformation τ=g(). In our data, cell cycle lengths are roughly log-normally distributed, so g is approximately a shifted exponential function. To simplify and make the model more robust to outliers, we determine g empirically, such that its inverse g-1 transforms the cell cycle length into a standard normal variate. That is, we choose g()=cex-1(cgauss()) where cgauss is the cumulative distribution function (CDF) of a standard normal distribution, and cex is the empirical CDF of the experimental data. This transformation discards all information about the shape and mean value of the cycle length distribution; the set of variables then purely reflect the strength of correlations between cell cycles. The (Pearson) moment correlation coefficients between the variables are identical to the so-called Gaussian rank correlation coefficients (Boudt et al., 2010) between the corresponding cycle lengths τ, which are similarly robust to outliers as the more common Spearman rank correlation.

The Gaussian variable is used to model correlation by inheritance. is a weighted sum of d latent, centered Gaussian variables x=(x1,,xm)T with positive weights α=(α1,αm)T, denoted as vectors x and α. That is, =αTx=lαlxl. Inheritance in the model occurs by passing on latent variables from mother to daughter cells. The basic model equation relation reads

(2) xi=Ax+bξi+b¯ξı¯.

Here, a superscript i=1,2 denotes a daughter cell, and absence of a superscript refers to the mother cell. The matrix 𝐀 implements inheritance: The average of a daughter’s latent variables, given the mother’s is xi|x=Ax. This linear coupling of latent variables through inheritance may take any form compatible with the basic stability requirement that its operator norm must satisfy 𝐀<1. Since both daughters inherit the same contribution from the mother, inheritance correlates the daughters’ latent variables positively. Daughter cells are also subject to random fluctuations which we model by standard normal random vectors ξi. These fluctuations are correlated due to the term b¯ξı¯ in Equation 2. Here ı¯ designates the sister cell of i, for example 2¯=1. We parametrize these correlations via

(3) b=cos(β/2),b¯=sin(β/2),γ2bb¯=sinβ, where -π/2<β<π/2.

The sister correlations conditioned on the mother latent variables then become

(4) xixı¯T|x=γI;xixiT|x=I,

where 𝐈 is the d-dimensional unit matrix. Positive sister correlations (γ>0) may arise due to fluctuations that occur within the mother cell after its cycle duration has been fixed and are shared by the daughters. Negative correlations (γ<0) may arise due to partitioning noise upon inheritance. Note that latent variable fluctuations are correlated between sisters but uncorrelated between different latent variables. Effectively, our choice of parametrization partitions all fluctuating cell cycle-relevant processes within the daughter cells into d Gaussian components that are maximally decorrelated, similar to a principal component decomposition.

Overall, Equation 2 defines an unbiased model with linear, local inheritance of latent variables, and an output that is a linear combination of latent variables. Its Gaussian form may be justified as the maximum-entropy distribution (Jaynes, 1957) for this problem, since only covariance information is used as an experimental input at this stage. Our model is a first-order vector autoregressive process, defined on the lineage tree of cells; it is a latent-variable generalization of the bifurcating autoregressive model already considered by Staudte and coworkers (Staudte et al., 1984). We remark that in Staudte et al. (1984) the dimensionality is d=1, and therefore α is unnecessary.

Stationary exponential growth

Combining Equations 2 and 4 by the law of total variance, the latent covariance satisfies

(5) xixiT=AxxTAT+I.

In stationary exponential growth, averaging over a single lineage forward in time, a stationary distribution with mean 0 and covariance 𝐂 is established. Then xixiT=xxT=C, and Equation 5 implies

(6) 𝐂=𝐀𝐂𝐀𝖳+𝐈=k=0𝐀k𝐀k𝖳.

We take this stationary distribution of latent variables as initial condition for root cells of lineage trees, assuming they come from an equilibrated growth phase. This assignment is not strictly correct because the stationary distribution along forward lineages is different from the distribution of all cells in an exponentially growing population at a given time (Lin and Amir, 2017); however, the difference was small for our parameters when tested numerically (not shown).

Computation of the likelihood

We aim to compute the probability of generating a lineage tree with given cycle lengths within the model. We fix a minimum generation number and consider only trees in which essentially all branches reach this number, thus discarding overhanging cells on some branches, (see also Sandler et al., 2015). This is crucial since in experiments with finite duration, selection bias would otherwise be introduced (Cowan and Staudte, 1986).

We begin by indexing cells in a tree by their pedigrees, which are the sequences of sister indices counting from the root cell, for example I=i1i2ik for a cell in generation k and Iik+1 for one of its daughters. Sorting these indices, we can then arrange all Gaussian-transformed cycle lengths in a tree into a single vector _. Since _ is Gaussian with mean 0_, its log-probability takes on the simple quadratic form

(7) (_)=(_|A,γ,α)=12[logdet(2πC_)+_TC_1_].

To evaluate Equation 7, we need to determine the joint covariance matrix 𝐂¯ of Gaussian cycle lengths over the given tree structure as a function of the parameters A,γ,α. We start by first deriving the joint covariance matrix 𝐂¯ of the latent variables x_. This is a block matrix with d×d blocks 𝐂IJ that correspond to pairs of cells in well-defined relationships, such as mother-daughter, cousin-cousin, etc. Since the lineage tree is sampled from stationary growth, 𝐂IJ depends only on the relationship of I and J, that is on their respective ancestral lines up to the latest common ancestor, and not on the history before. In particular, if I=Jik+1 then cell I is a descendant of cell J and we write this as I>J; otherwise we write IJ. Note that II.

Splitting one cell pedigree as I=Ki, from Equation 2 we derive the relations

(8) xKixJT=(AxK+bξi+b¯ξı¯)xJT={AxKxJTJK(i)AxKxKTAT+γIJ=Kı¯(ii)

Equation 8 lets us compute 𝐂IJ by a recursive procedure, as follows:

  • Consider the case I>J. If I=Ki, then JK. Now use Equation 8 (i) repeatedly (k times), moving up the ancestral line, until arriving at the form CIJ=AkxJxJT=AkC.

  • In the case I=Ki1ik,J=Kj1jk with j1=ı¯1, no cell is a descendant of the other, and their last common ancestor is K. Use Equation 8 (i) (or its transpose) on both branches repeatedly until the form Ak1xKi1xKı¯1TAk1T is obtained. Then use Equation 8 (ii) to get 𝐂IJ=𝐀k𝐂𝐀k𝖳+γ𝐀k-1𝐀k-1𝖳

These two cases cover all possible cell-cell relations, so that the procedure fully determines the joint latent covariance 𝐂¯ for a given tree structure, inheritance matrix 𝐀 and sister correlation γ.

Finally, to obtain the covariance 𝐂¯ of the Gaussian cycle lengths , we project onto α. The elements of 𝐂¯ result as

(9) CIJ=IJ=αTxIxJTα=αTCIJα.

This completes the evaluation of the log-probability 𝒫 (Equation 7), which is also equal to the log-likelihood of the model, (_)=(A,γ,α). Accounting for the constraint 2=1 which we impose to fix the arbitrary normalization of , the full model has d2+1+d-1=d(d+1) adjustable parameters. This number can be reduced by restricting the inheritance matrix to a specific form, or by setting γ=0, as was done for the model variants discussed in the main text.

As a corollary, the Gaussian rank correlation between cycle lengths of any pair of cells results as

(10) ρτIτJgauss=ρIJ=CIJCII=αTCIJααTCα.

In the one-dimensional special case d=1, the projection on α becomes irrelevant and Equation 10 reduces to

(11) ρτIτJgauss={akI>JorJ>Iak+k+γak+k2/c=ak+k2[a2+γ(1a2)]IJandJI,

where k,k1 the distances to the latest common ancestor as in the algorithm above, and a𝐀 is the 1×1 inheritance matrix. Some special cases of Equation 11 given already in Cowan and Staudte (1986) are ρssgauss=[a2+γ(1-a2)] for sisters and ρc1gauss=a2ρssgauss=ρmdgauss2ρssgauss for first cousins. In other words, to compute the correlation between related cells, one multiplies mother-daughter correlations along the path connecting them, taking a shortcut via the daughters of the last common ancestor where one instead multiplies with the sister-sister correlation. Specializing further to γ=0, Equation 11 reduces to the well-known relation ρτIτJgauss=ρmdgaussk+k where k+k is the number of cell divisions linking I and J. As detailed in the main text, these one-dimensional special cases are insufficient to explain our data.

Evidence calculation

To compare different model versions’ ability to explain but not overfit the data, we employed a standard Bayesian model selection scheme (see e.g. Wasserman, 2000; MacKay, 2003). Within this framework, model selection is treated on the same grounds as parameter inference; the task is to assign to each one out of a set of models its likelihood to have generated the data. One or several plausible models can then be selected on these grounds. Concretely, the scheme proceeds as follows. The probability of model M to generate data _ is obtained by integrating over all parameter values πM, which are distributed over a parameter space ΠM with prior distribution p(πM|M):

(12) p(_|M)=ΠMp(_|πM,M)p(πm|M)dπM.

Here, p(_|πM,M)=exp[(_)]=exp[(πM)] is equal to the likelihood function for model M, Equation 7. Applying Bayes’ rule, the probability that among , M was the model that generated the data, is then

(13) p(M|_)=p(M)p(_|M)p(M)p(_|M).

When models are equivalent a priori as we assume here, then both the prior belief in a model, p(M), and the entire denominator in Equation 13 are unimportant constants. Then the so-called evidence or Bayes factor, obtained by calculating

(14) E(M)=ΠMp(_|πM,M)dπM,

is proportional to each model’s probability of having generated the data, Equation 13. We calculated E(M) numerically by Monte-Carlo integration of Equation 14; in the main text we show the evidences relative to Model V. Conventionally, an advantage in E of a factor of 10 or more is considered strong support in favor of a model.

We briefly discuss some important features of model selection by evidence. In the asymptotic case of large samples (not applicable for the present data), the evidence E is approximated by the well-known Bayes information criterion (BIC), which is an alternative to the popular Akaike information criterion (AIC). While AIC is constructed to select a model whose predictions are maximally similar to future repetitions of the same experiment, evidence and BIC select the model that is most likely to have generated the existing data. BIC and evidence, but not AIC, have a desirable consistency property: If the models are recruited from a hierarchy of nested models which also contains the true model, then the simplest model in comprising the true model is always favored for large enough samples (Wasserman, 2000). This consistency is a manifestation of a general preference of the evidence for parsimonious models. To illustrate this point, following MacKay (2003), we expand the log evidence around the maximum a-posteriori estimate πM*, using Laplace’s method:

(15) logE(M)(πM*)+logdet(2πHM*-1)-logvol(ΠM)(πM*)+i=1dlogsiSi.

Here, HM*=(2πMiπMj)|πM* is the Hessian of the log-likelihood, and for simplicity we have assumed a flat parameter prior p(πM)=1/vol(ΠM). The ith eigenvalue 2π/si of HM* determines the width si of the peak of the posterior distribution around πM*, along the ith principal axis. In the last equality, we have written the parameter space volume vol(ΠM)=iSi as a product of parameter ranges Si along the principal axes. Equation 15 can be interpreted as follows: As more parameters are added to a model, the fit accuracy, measured by (πM*), generally improves. However, each new parameter i incurs a penalty log(si/Si)<0. The more the new parameter needs to be constrained by the data, the more the evidence is reduced. Thus the basic mechanism of parsimony in Bayesian model selection is this: Complex models are characterized by a large number of parameters with wide a priori allowed ranges and sensitive dependence on the data; in other words, they require the data to pick parameters from a large set of possibilities. Complex models are penalized and ranked as less likely. Indeed, such an overly flexible model can be fitted to diverse data, which we should expect to diminish the support that a particular set of data can give to it.

Numerical efficiency and implementation

The evaluation of (Equation 7) requires a final numerical inversion of the recursively assembled covariance 𝐂¯ to obtain the stiffness matrix, which costs O(N3) operations, where N is the number of cells in the tree. To reduce this cost, it is possible to devise an equivalent recursive scheme in which the projected stiffness matrix, not the covariance matrix, is computed recursively, using efficient block-inverse formulæin each recursive step. The computational complexity can then be improved to O(N2). A version of this alternative, equivalent scheme was implemented in the programming language OCaml and used to obtain the results presented in the main text.

Appendix 3

The growth-progression model

Setup

The growth-progression model is based on the idea that the two cell-cycle controlling processes are cell-cycle progression and cell growth. The state variables of both processes, namely, the timing of the regulatory license to divide encoded by p, and cell size s, respectively, are inherited from mother to daughter. The two processes are coupled via inheritance as detailed in the following.

The cell-cycle progression process

Inheritance of the velocity of cell-cycle progression is modeled by a fluctuating, centered Gaussian variable q, passed on from mother to daughter, entirely analogous to version II of the BAR model (see there for additional explanation), according to

(16) qi=aq+bξi+b¯ξı¯,

where subscript i=1,2 denotes a daughter cell, ı¯ its sister, and no subscript, the mother. a with |a|<1, implements inheritance. The intrinsic fluctuation strength and coupling is given by b and b¯; effectively, daughter cells are correlated for given mother by, qiqı¯|q=γ;qi2|q=1, where γ=2bb¯. From the centered q variables, shifted and scaled Gaussian variables p=σpq+μ were generated, finally yielding log-normal regulation cycle durations τp=exp(p). The duration τp is the time elapsed since the last division until regulatory license is given to divide again. Overall, the progression process has four adjustable parameters, μ, σp, a and γ.

The growth process

The growth duration τg is defined as the time to grow from an initial size sb to the threshold size sth. Size accumulation was modeled by exponential growth with the exponential growth rate

(17) dsdt=ks.

However, under MYCN inhibition, exponential growth was prone to generate unreasonably large cells. Here we instead modeled growth by the logistic growth process

(18) dsdt=k(1ssmax)s,

where k is the growth rate constant and smax the maximum cell size. We fixed smax=20, to match the approximate cell size at which the growth rate starts to decrease with observations (Sung et al., 2013; Tzur et al., 2009). The two growth laws Equations 17, 18 yield

(19) τg=k1log(sthsb)

and

(20) τg=k1log[sth(sbsmax)sb(sthsmax)],

respectively.

The normalized threshold cell size sth fluctuates slightly and independently in each cell as sth=1+ζ with ζ𝒩(0,σg2). At division, the final mother cell size sdiv is halved, with each daughter receiving a new size at birth sb=sdiv/2. Effectively, for the subset of cell cycles that are limited by growth, this process corresponds to a sizer mechanism Facchetti et al. (2017). The growth process has two adjustable parameters, k and σg.

Coupling progression and growth

The two processes are coupled via a checkpoint which requires both to be completed before a cell has license to divide. The cell-cycle length is then determined as τ=max(τp,τg). If division is stalled by insufficient cycle progression, s continues to accumulate until cell division, so that the final cell size sdiv>sth. Thus, inheritance of the growth process is influenced by the cycle progression process. In contrast, the progression process is inherited in an autonomous fashion. This unidirectional inheritance structure recapitulates the unidirectional coupling between hidden processes found in the preferred BAR models IV and V.

Model simulation

The growth-progression model was implemented in Matlab (R2016b), R (3.4.3) and OCaml (4.06) (with identical results but increasing execution speed) and lineage trees were simulated. For each tree, an initial cell was generated with birth size sb=sth/2 and p=μ and subsequently, an unbranched single lineage was simulated for 100 generations for equilibration. Its final cell was used as founder cell for the tree. For the data shown, 30 trees of 7 generations each were simulated, roughly corresponding to the dataset sizes obtained experimentally. The simulation was repeated 100 times to generate confidence bounds.

Parameter optimization

Parameters were fitted using Approximate Bayesian Computation independently for each dataset. In an adaptive procedure, (sometimes non-uniform) prior distributions of model parameters were first generated. For each parameter set, 500 trees were simulated at a depth of 7 generations. To compare data D and simulations D^, we used a set S of summary statistics composed of the nine correlation coefficients as shown in Figure 3C, and the mean and all quartiles of the distribution of cell-cycle durations. A squared-distance between data and simulation summary statistics was calculated as

(21) ρ2[S(D^),S(D)]=i=1N[S(Di)S(Di^)σi]2,

where σi were calculated based on the data bootstrap confidence (95%) bounds. Samples of the approximate posterior probability distribution of parameter values (Figure 3–figure supplement 3A ) were then generated by accepting parameter sets with ρ2(S(D^),S(D))<ϵ, weighted by the inverse of the local prior density in parameter space. We used a tolerance ϵ=2 but results were not sensitive to the choice of ϵ within the range 14. Prior parameter ranges were extended as far as necessary for accepted sets to converge. The final accepted ranges of each parameter was used as credible regions, and the medians of each parameter distribution from these samples were selected as best-fits for further analysis.

Data availability

Data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1 and 4.

The following previously published data sets were used
    1. Ryl T
    (2017) NCBI Gene Expression Omnibus
    ID GSE98274. RNA-Seq of SHEP TET21N cells upon Doxorubicin treatment.

References

  1. Book
    1. Boudt K
    2. Cornelissen J
    3. Croux C
    (2010)
    The Gaussian Rank Correlation Estimator: Robustness Properties
    Rochester, NY: Social Science Research Network.
    1. Lutz W
    2. Stöhr M
    3. Schürmann J
    4. Wenzel A
    5. Löhr A
    6. Schwab M
    (1996)
    Conditional expression of N-myc in human neuroblastoma cells increases expression of -prothymosin and ornithine decarboxylase and accelerates progression into S-phase early after mitogenic stimulation of quiescent cells
    Oncogene 13:803–812.
  2. Book
    1. MacKay DJC
    (2003)
    Information Theory, Inference and Learning Algorithms
    Cambridge University Press.

Article and author information

Author details

  1. Erika E Kuchen

    1. Theoretical Systems Biology, German Cancer Research Center (DKFZ), Heidelberg, Germany
    2. Bioquant Center, University of Heidelberg, Heidelberg, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Nils B Becker
    Competing interests
    No competing interests declared
  2. Nils B Becker

    1. Theoretical Systems Biology, German Cancer Research Center (DKFZ), Heidelberg, Germany
    2. Bioquant Center, University of Heidelberg, Heidelberg, Germany
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Erika E Kuchen
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7490-6425
  3. Nina Claudino

    1. Theoretical Systems Biology, German Cancer Research Center (DKFZ), Heidelberg, Germany
    2. Bioquant Center, University of Heidelberg, Heidelberg, Germany
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Thomas Höfer

    1. Theoretical Systems Biology, German Cancer Research Center (DKFZ), Heidelberg, Germany
    2. Bioquant Center, University of Heidelberg, Heidelberg, Germany
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing - original draft, Writing - review and editing
    For correspondence
    t.hoefer@dkfz.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3560-8780

Funding

Bundesministerium für Bildung und Forschung (0316076A)

  • Thomas Höfer

Bundesministerium für Bildung und Forschung (01ZX1307)

  • Thomas Höfer

Bundesministerium für Bildung und Forschung (031L0087A)

  • Thomas Höfer

Seventh Framework Programme (FP7-HEALTH-2010 ASSET 259348)

  • Thomas Höfer

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Frank Westermann and Tatjana Ryl for TET21N cells and laboratory setup; Timm Schroeder, Fabian Theis and Carsten Marr for embryonic stem cell data and discussions; Alessandro Greco for differential gene expression analysis and all members of the Höfer group for discussions. Grant support to TH from BMBF (MYC-NET 0316076A and Sysmed-NB 01Z × 1307), EU (FP7-HEALTH-2010 ASSET 259348), BMBF and EU (EraCoSysMed OPTIMIZE-NB 031L0087A), as well as DKFZ core funding are gratefully acknowledged; TH is a member of CellNetworks.

Copyright

© 2020, Kuchen et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,092
    views
  • 319
    downloads
  • 23
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Erika E Kuchen
  2. Nils B Becker
  3. Nina Claudino
  4. Thomas Höfer
(2020)
Hidden long-range memories of growth and cycle speed correlate cell cycles in lineage trees
eLife 9:e51002.
https://doi.org/10.7554/eLife.51002

Share this article

https://doi.org/10.7554/eLife.51002

Further reading

    1. Computational and Systems Biology
    2. Microbiology and Infectious Disease
    Gaetan De Waele, Gerben Menschaert, Willem Waegeman
    Research Article

    Timely and effective use of antimicrobial drugs can improve patient outcomes, as well as help safeguard against resistance development. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is currently routinely used in clinical diagnostics for rapid species identification. Mining additional data from said spectra in the form of antimicrobial resistance (AMR) profiles is, therefore, highly promising. Such AMR profiles could serve as a drop-in solution for drastically improving treatment efficiency, effectiveness, and costs. This study endeavors to develop the first machine learning models capable of predicting AMR profiles for the whole repertoire of species and drugs encountered in clinical microbiology. The resulting models can be interpreted as drug recommender systems for infectious diseases. We find that our dual-branch method delivers considerably higher performance compared to previous approaches. In addition, experiments show that the models can be efficiently fine-tuned to data from other clinical laboratories. MALDI-TOF-based AMR recommender systems can, hence, greatly extend the value of MALDI-TOF MS for clinical diagnostics. All code supporting this study is distributed on PyPI and is packaged at https://github.com/gdewael/maldi-nn.

    1. Computational and Systems Biology
    2. Genetics and Genomics
    Sanjarbek Hudaiberdiev, Ivan Ovcharenko
    Research Article

    Enhancers and promoters are classically considered to be bound by a small set of transcription factors (TFs) in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with often no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by the sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on the affinities to bind to promoters and enhancers we detected five distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation thus challenging the classical model of enhancer activity and propose a model of HOT locus formation based on the existence of large transcriptional condensates.