Hidden longrange memories of growth and cycle speed correlate cell cycles in lineage trees
Abstract
Cell heterogeneity may be caused by stochastic or deterministic effects. The inheritance of regulators through cell division is a key deterministic force, but identifying inheritance effects in a systematic manner has been challenging. Here, we measure and analyze cell cycles in deep lineage trees of human cancer cells and mouse embryonic stem cells and develop a statistical framework to infer underlying rules of inheritance. The observed longrange intragenerational correlations in cellcycle duration, up to second cousins, seem paradoxical because ancestral correlations decay rapidly. However, this correlation pattern is naturally explained by the inheritance of both cell size and cellcycle speed over several generations, provided that cell growth and division are coupled through a minimumsize checkpoint. This model correctly predicts the effects of inhibiting cell growth or cycle progression. In sum, we show how fluctuations of cell cycles across lineage trees help in understanding the coordination of cell growth and division.
Introduction
Cells of the same type growing in homogeneous conditions often have highly heterogeneous cycle lengths (Smith and Martin, 1973). The minimal duration of the cell cycle will be determined by the maximal cellular growth rate in a given condition (Kafri et al., 2016). However, many cells, in particular, in multicellular organisms, do not grow at maximum rate, and their cycle length appears to be set by the progression of regulatory machinery through a series of checkpoints (Novak et al., 2007). While much is known about the molecular mechanisms of cellcycle regulation, we have little quantitative understanding of the mechanisms that control duration and variability of the cell cycle.
Recently, extensive livecell imaging data of cell lineages have become available, characterizing, for example, lymphocyte activation (Mitchell et al., 2018; Duffy et al., 2012; Hawkins et al., 2009), stem cell dynamics (Filipczyk et al., 2015), cancer cell proliferation (Spencer et al., 2013; Barr et al., 2017; Ryl et al., 2017), or nematode development (Du et al., 2015). Such studies across many cell types have found that cycle lengths are similar in sister cells, which may be due to the inheritance of molecular regulators across mitosis (Spencer et al., 2013; Mitchell et al., 2018; Yang et al., 2017; Barr et al., 2017; Arora et al., 2017). By contrast, ancestral correlations in cycle length fade rapidly, often disappearing between grandmother and granddaughter cells, or already between mother and daughter cells.
Remarkably, however, the cycle lengths of cousin cells are found to be correlated, indicating that the grandmothers exert concealed effects through at least two generations. High intragenerational correlations in the face of weak ancestral correlations have been observed in cells as diverse as bacteria (Powell, 1958), cyanobacteria (Yang et al., 2010), lymphocytes (Markham et al., 2010) and mammalian cancer cells (Staudte et al., 1984; Sandler et al., 2015; Chakrabarti et al., 2018). The ubiquity of this puzzling phenomenon suggests that it may help reveal basic principles that control cellcycle duration.
Theoretical work has shown that more than one heritable factor is required to generate the observed cellcycle correlations in T cell lineage trees, while the nature of these heritable factors has remained unclear (Markham et al., 2010). Stimulated by the idea of circadian gating of the cell cycle in cyanobacteria (Mori et al., 1996; Yang et al., 2010), recent comprehensive analyses of cell lineage trees across different species have proposed circadian clock control as a source of cellcycle variability that can produce the observed high intragenerational correlations (Sandler et al., 2015; Mosheiff et al., 2018; Martins et al., 2018; Py et al., 2019); such a model also reproduced observed cycle correlations in colon cancer cells during chemotherapy (Chakrabarti et al., 2018). However, in proliferating mammalian cells in culture, the circadian clock has been found to be entrained by the cell cycle (Bieler et al., 2014; Feillet et al., 2014). Moreover, the circadian clock is strongly damped or even abrogated by oncogenes such as MYC (Altman et al., 2015; Shostak et al., 2016) yet MYCdriven cancer cells retain high intragenerational correlations (Ryl et al., 2017).
Ultimately, the cell cycle must coordinate growth and division in order to maintain a welldefined cell size over many generations. Yeast species have long served as model systems. Here, it is assumed that growth drives cellcycle progression, although molecular mechanisms of size sensing remain controversial (Facchetti et al., 2017; Schmoller and Skotheim, 2015). By contrast, animal cells can grow very large without dividing (Conlon and Raff, 2003), and recent precise measurements suggest that growth control involves both modulation of growth rate and cellcycle length (Sung et al., 2013; Tzur et al., 2009; Cadart et al., 2018; Ginzberg et al., 2018; Liu et al., 2018). A minimal requirement for maintaining cell size is that cells reach a critical size before dividing, which can be achieved by delaying S phase (Shields et al., 1978).
Here, we present a systematic approach to learning mechanisms from measured correlation patterns of cell cycles in deep lineage trees. First, we develop an unbiased statistical framework to identify the minimal model capable of accounting for our experimental data. We then propose a biological realization of this abstract model based on growth, inheritance and a size checkpoint, and experimentally test specific predictions of the biological model.
Results
Lineage trees exhibit extended intragenerational correlations
To study how far intragenerational cellcycle correlations extend within cell pedigrees, we generated extensive lineage trees by imaging and tracking TET21N neuroblastoma cells for up to ten generations during exponential growth (Figure 1A, Figure 1—video 1, Figure 1—source data 1 and Figure 1—figure supplement 1A). Autonomous cycling of these cells is controlled by ectopic expression of the MYCfamily oncogene MYCN, overcoming the restriction point and thus mimicking the presence of mitogenic stimuli (Ryl et al., 2017). High MYCN also downregulated circadian clock genes (Figure 1—figure supplement 2). The distribution of cycle lengths (Figure 1B and Figure 1—figure supplement 1B) was constant throughout the experiment (Figure 1C and Figure 1—figure supplement 1C) and similar across lineages (Figure 1—figure supplement 1D), showing absence of experimental drift and of strong founder cell effects, respectively. To determine cyclelength correlations without censoring bias caused by finite observation time (Figure 1—figure supplement 3A; Sandler et al., 2015), we truncated all trees after the last generation completed by the vast majority (>95%) of lineages. The resulting trees were 5–7 generations deep, enabling us to reliably calculate Spearman rank correlations between relatives up to second cousins (Figure 1D,E and Figure 1—figure supplement 3B).
Cyclelength correlations of cells with their ancestors decreased rapidly with each generation (Figure 1E). However, the correlations increased again when moving down from ancestors along sidebranches—from the grandmother toward the first cousins and also from the greatgrandmother toward the second cousins (Figure 1E). The correlations among second cousins varied somewhat between replicates (we will show below that we can control these correlations experimentally by applying molecular perturbations). If cellcycle length alone were inherited (e.g. by passing on regulators of the cell cycle to daughter cells), causing a correlation coefficient of $\rho}_{\mathrm{m}\mathrm{d}$ between mother and daughter cycle lengths, and sisters are correlated by $\rho}_{\mathrm{s}\mathrm{s}$, then first and second cousins would be expected to have cycle length correlation $\rho}_{\mathrm{s}\mathrm{s}}{\rho}_{\mathrm{m}\mathrm{d}}^{2$ and $\rho}_{\mathrm{s}\mathrm{s}}{\rho}_{\mathrm{m}\mathrm{d}}^{4$, respectively (Staudte et al., 1984). The actually observed cousin correlations are much larger, confirming previous observations on first cousins as summarized in Sandler et al. (2015) and extending them to second cousins. This discrepancy between simple theoretical expectation and experimental data was not due to spatial inhomogeneity or temporal drift in the data (Figure 1; Figure 1—figure supplement 3CE). Thus, the lineage trees show longranging intragenerational correlations that cannot be explained by the inheritance of cellcycle length.
Correlation patterns are explained by longrange memories of two antagonistic latent variables
We used these data to search for the minimal model of cellcycle control that accounts for the observed correlation pattern of lineage trees (Materials and methods and Appendix 2). To be unbiased, we assumed that cycle length $\tau $ is controlled jointly by a yet unknown number $d$ of cellular quantities that are inherited from mother to daughter, $\mathit{x}=({x}_{1},\dots ,{x}_{d}{)}^{\mathsf{T}}$, such that $\tau =\tau ({\sum}_{l=1}^{d}{\alpha}_{l}{x}_{l})$, with positive weights $\mathit{\alpha}$. We take $\mathit{x}$ to be a Gaussian latent variable and, generalizing previous work (Cowan and Staudte, 1986), describe its inheritance by a generic model accounting for intergenerational inheritance as follows: In any given cell $i$, $\mathit{x}}^{i$ is composed of an inherited component, determined by $\mathit{x}$ in the mother, and a cellintrinsic component that is uncorrelated with the mother. The inherited component is specified by an inheritance matrix $\mathbf{\mathbf{A}}$, such that the mean of $\mathit{x}}^{i$ conditioned on the mother’s $\mathit{x}$ is $\u27e8{\mathit{x}}^{i}\mathit{x}\u27e9=\mathbf{A}\mathit{x}$ (Figure 2A). The cellintrinsic component causes variations around this mean with covariance $\u27e8({\mathit{x}}^{i}\mathbf{A}\mathit{x})({\mathit{x}}^{i}\mathbf{A}\mathit{x}{)}^{\mathsf{T}}\mathit{x}\u27e9=\mathbf{I}$, where, with appropriate normalization of the latent variables, $\mathbf{\mathbf{I}}$ is the unit matrix. Additional positive correlations in sister cells may arise due to inherited factors accumulated during, but not affecting, the mother’s cycle (Arora et al., 2017; Barr et al., 2017; Yang et al., 2017); additional negative correlations may result from partitioning noise (Sung et al., 2013). These are captured by the crosscovariance between the intrinsic components in sister 1 and 2, $\u27e8({\mathit{x}}^{1}\mathbf{A}\mathit{x})({\mathit{x}}^{2}\mathbf{A}\mathit{x}{)}^{\mathsf{T}}\mathit{x}\u27e9=\gamma \mathbf{I}$. In total, $d(d+1)$ parameters can be adjusted to fit the correlation pattern of the lineage trees: the components ${a}_{lm}$ of the inheritance matrix $\mathbf{\mathbf{A}}$, the weights ${\alpha}_{l}$ and the sister correlation $\gamma $. Together, these inheritance rules specify bifurcating firstorder autoregressive (BAR) models for multiple latent variables governing cellcycle duration.
To determine the most parsimonious BAR model supported by the experimental data, we employed a standard Bayesian model selection scheme. Selection is based on the Bayesian evidence, which rewards fit quality while naturally penalizing models of higher complexity (defined as being able to fit more diverse data sets; for details see Appendix 2, Evidence calculation). Specifically, we evaluated the likelihood of the measured lineage trees for a given BAR model, used it to compute the Bayesian evidence, and ranked BAR models accordingly (Figure 2B).
The simplest model that generated high intragenerational correlations was based on the independent inheritance of two latent variables (Model III; Figure 2C, cyan dots), whereas onevariable models failed to meet this criterion (Model II, Figure 2C, blue dots and Model I). However, Model III consistently overestimated ancestral correlations and hence its relative evidence was low (<10% for all data sets). To allow additional degrees of freedom, we accounted for interactions of latent variables. The most general twovariable model with bidirectional interactions (Model VI), overfitted the experimental data and consequently had low evidence. The models best supported by the data had unidirectional coupling, such that ${x}_{2}$ in the mother negatively influenced ${x}_{1}$ inherited by the daughters, that is with ${a}_{12}<0$ and ${a}_{21}=0$ (Figure 2B, Models IV, V and VII). Among these, Model VII, with a single inheritance parameter ${a}_{12}$, is simplest, but was not compatible with experimental replicate rep1 as it could not generate secondcousin correlations (Figure 2B,C). Both Models IV and V were compatible with all replicates; however, Model V with only one selfinheritance parameter for both variables (${a}_{11}={a}_{22}>0)$ was preferred (Model V, Figure 2B,C, orange dots). Model V produced a remarkable inheritance pattern (Figure 2D): Individually, both latent variables had longranging memories, with ∼50% decay over 2–3 generations. However, the negative unidirectional coupling crosscorrelated the variables negatively along an ancestral line, resulting in cyclelength correlations that essentially vanished after one generation. Nevertheless, strong intragenerational correlations were reproduced by the model due to longrange memories of latent variables together with positive sistercell correlations ($\gamma >0$). We conclude that the coexistence of rapidly decaying ancestral correlations and extended intragenerational correlations can be explained by the inheritance of two latent variables, one of which inhibits the other.
Cell size and speed of cellcycle progression are antagonistic heritable variables
During symmetric cell division, both cell size and regulators of cellcycle progression are passed on from the mother to the daughter cells (Spencer et al., 2013; Yang et al., 2017; Arora et al., 2017; Barr et al., 2017). We now show that simple and generic inheritance rules for these two variables provide a physical realization for BAR Model V.
To divide, cells need to both grow to a minimum size (Shields et al., 1978) and receive license to progress through the cell cycle from the regulatory machinery (Novak et al., 2007). Indeed, growth and cellcycle progression can be separately manipulated experimentally in mammalian cells (Fingar et al., 2002). In particular, cells continue to grow in size when regulatory license is withheld, for example in the absence of mitogens, and growth is not otherwise constrained, for example by mechanical force or growth inhibitors (Fingar et al., 2002; Conlon and Raff, 2003).
While growth and cellcycle progression are separable and heritable processes, they also interact. At the very least, the length of the cell cycle needs to ensure that cells grow to a sufficient size for division. This interaction alone implies an effect of one inherited variable, cycle progression, on the other, cell growth, that anticorrelates subsequent cell cycles (as required by BAR model V): If a delayed regulatory license prolongs the mother’s cell cycle, it will grow large. By size inheritance, its daughters will be large at birth, reach a size sufficient for division quickly and hence may have shorter cell cycles. Thus, despite inheritance of growth and cellcycle regulators mothers and daughters may have very different cycle lengths due to this interaction.
Based on these ideas, we formulated a simple quantitative model of growth and cellcycle progression on cell lineage trees. We introduced the variables ‘cell size’ $s$, measuring metabolic, enzymatic and structural resources accumulated during growth, and $p$, characterizing the progression of the cellcycle regulatory machinery. Unlike the latent variables of the BAR model ${x}_{1}$ and ${x}_{2}$, their mechanistic counterparts $s$ and $p$, respectively, are governed by rules reflecting basic biological mechanisms (Figure 3A, Appendix 3). Size $s$ grows exponentially and is divided equally between the daughters upon division. We found that under some experimental conditions generating long cell cycles (downregulation of MYCN, see below), stable cell size distributions required feedback regulation of growth rate, as seen experimentally (Sung et al., 2013; Tzur et al., 2009); we implemented this as a logistic limitation of growth rate at large sizes for these conditions. The progression variable $p$ determines the time taken for the regulatory machinery to complete the cell cycle, which is controlled by the balance of activators and inhibitors of cyclindependent kinases. These regulators are inherited across mitosis (Spencer et al., 2013; Yang et al., 2017; Arora et al., 2017; Barr et al., 2017) and hence the value of $p$ is passed on to both daughter cells with some noise. Cells divide when they have exceeded a critical size, requiring time ${\tau}_{g}$, and the regulatory machinery has progressed through the cycle, which takes an approximately lognormally distributed time (Ryl et al., 2017; Mitchell et al., 2018) modeled as ${\tau}_{p}=\mathrm{exp}(p)$. Hence the cycle length is $\tau =\mathrm{max}({\tau}_{g},{\tau}_{p})$. Apart from requiring a minimum cell size for division, the growthprogression model does not implement a drive of the cell cycle by growth and thus allows cells to grow large during long cell cycles. By this mechanism, the cell size variable $s$ is influenced by cycle progression, analogous to the BAR variable ${x}_{1}$. By contrast, the progression variable $p$ is not influenced by cell size, analogous to the variable ${x}_{2}$ in the BAR model.
We fitted this model to the measured lineage trees by Approximate Bayesian Computation (Figure 3—figure supplement 1A and Figure 3—source data 1). The parameterized model yielded a stationary cell size distribution (Figure 3—figure supplement 1B) and reproduced the cyclelength distribution (Figure 3B and Figure 3—figure supplement 1C) as well as the ancestral and intragenerational correlations (Figure 3C and Figure 3—figure supplement 1D). Thus, the dynamics of cell growth and cellcycle progression, coupled only through a minimumsize requirement, account for the intricate cyclelength patterns in lineage trees.
To gain intuition on the inheritance patterns of cycle length, we first considered ancestral correlations, focusing on motherdaughter pairs. Individual cell cycles in the model are either growthlimited, that is division happens upon reaching the minimum size, or progressionlimited, that is the cell grows beyond the minimum size until the cycle is completed (Figure 3D and Figure 3—figure supplement 1E). If both mother and daughter are progressionlimited (i.e., the threshold size is exceeded by both), their cycles are positively correlated (Figure 3E, green dots). As in this case size inheritance is inconsequential, this positive correlation is explained by the inheritance of the cellcycle progression variable $p$ alone. By contrast, all motherdaughter pairs that involve at least one growth limitation show nearzero (Figure 3E, cyan dots) or negative correlations (Figure 3E, magenta and black dots). This pattern is explained by the anticorrelating effect that daughters of longerlived and hence larger mother cells require on average shorter times to reach the size threshold. Next, we considered intragenerational correlations, focusing on first cousins (Figure 3F). While cousins are positively correlated overall, this correlation is carried specifically by cousins that descend from a grandmother with a progressionlimited cell cycle (Figure 3F, blue dots), whereas cousins stemming from a growthlimited grandmother are hardly correlated (Figure 3F, orange dots). Since progressionlimited cells can grow large, this observation indicates that cousin correlations are mediated by inheritance of excess size, as is confirmed by conditioning cousin correlations on grandmother size (Figure 3—figure supplement 1F). Size inheritance over several generations is also evident in the autocorrelation of the time required to grow to minimum size, ${\tau}_{g}$ (Figure 3G and Figure 3—figure supplement 1G, black dots). The autocorrelation of the progression time ${\tau}_{p}$ is also positive (but less longranging; Figure 3G and Figure 3—figure supplement 1G, green squares), while the negative interaction with growth is reflected in the negative crosscorrelation (Figure 3G and Figure 3—figure supplement 1G, red triangles). In sum, the longrange memories of cellcycle progression and cell growth are masked by negative coupling of these processes, causing rapid decay of cellcycle length correlations along ancestral lines (Figure 3G and Figure 3—figure supplement 1G, orange triangles). These inheritance characteristics of the growthprogression model mirror those of BAR model V (see Figure 2D).
Effects of molecular perturbations on cellcycle correlations are correctly predicted by the model
If the growthprogression model captures the key determinants of the cellcycle patterns in lineage trees, it should be experimentally testable by separately perturbing growth versus cellcycle progression. We first derived model predictions for these experiments. Intuitively, if growth limitation were abolished by slowing cellcycle progression, only progression would be inherited and therefore motherdaughter correlations of cycle time could no longer be masked. As a result, we expect intragenerational correlations to be reduced relative to ancestral correlations when inhibiting cycle progression; conversely, inhibiting growth (and thereby increasing growth limitation) should raise them. Indeed, using the model to simulate perturbation experiments (Figure 3—source data 1), we found that growth inhibition increased cousin correlations relative to motherdaughter correlations, whereas slowing cellcycle progression decreased these correlations (Figure 4A).
To test this prediction, we slowed cellcycle progression experimentally by reducing MYCN, exploiting the doxycyclinetunable MYCN gene integrated in the TET21N cells. Cells grew to larger average size (Figure 4B, blue lines) over longer and more variable cell cycles (Figure 4C). These data show that lowering MYCN slowed cellcycle progression while allowing considerable cell growth. Further consistent with this phenotype, expression of mTOR, a central regulator of metabolism and growth (Fingar et al., 2002), was not lowered (Figure 4—figure supplement 1A). In a separate experiment, we inhibited cell growth by applying the mTOR inhibitor rapamycin, which reduced cell size by a small but reproducible amount (Figure 4B, red lines). This treatment also lengthened the cell cycle slightly (Figure 4C) but without changing MYCN protein levels (Figure 4—figure supplement 1B). Thus, lowering MYCN and inhibiting mTOR are orthogonal perturbations that act on cellcycle progression and cell growth, respectively. As predicted by the growthprogression model, these perturbations resulted in markedly different cyclelength correlation patterns within lineage trees (Figure 4D,E and Figure 4—figure supplement 1C,D): Lowering MYCN decreased intragenerational correlation and, in particular, removed secondcousin correlations. By contrast, rapamycin treatment strongly increased intragenerational correlations and caused ancestral correlations to decline only weakly. Collectively, these findings support the growthprogression model of cellcycle regulation.
Cousin correlations reflect active cellsize checkpoint
We then asked whether the cyclelength correlation patterns experimentally observed in lineage trees contain information about the underlying regulation. To this end, we fitted the BAR and growthprogression models to MYCN and rapamycin perturbation data. We again obtained good agreement with the data (Figure 4C,D and Figure 4—figure supplement 1C–E,G–J). MYCN knockdown cells grew larger; to obtain a stable cellsize distribution for the corresponding model fits, we implemented logistic regulation of growth rate at large cell sizes. When we applied, for comparison, this regulation to the control and rapamycin treatment data, the model fits (Figure 4—figure supplement 2) were not noticeably affected compared to purely exponential growth. This result indicates that growth rate regulation affecting large cells, as found experimentally (Ginzberg et al., 2018), is compatible with longrange intragenerational correlations in cellcycle length.
In terms of fit parameters of the model, MYCN inhibition caused considerable slowing of cycle progression and also a moderate decrease in growth rate (Figure 3—figure supplement 1A, parameters $\mu$ and $k$, respectively). As a result, the vast majority of cell cycles were progressionlimited (Figure 4F and Figure 4—figure supplement 1F). For the rapamycintreated cells, we estimated growth rates that were lower than for the control experiments on average, as expected (Figure 3—figure supplement 1A, parameter $k$). Also, correlations in cellcycle progression increased in motherdaughter and sister pairs (parameters $a$ and $\gamma $, respectively). This is consistent with rapamycin inhibiting mTOR and hence growth, but not affecting drivers of the cell cycle, ERK and PI3K (Adlung et al., 2017), since then lengthening of the cell cycle due to slower growth may allow prolonged degradation of cellcycle inhibitors, which would increase inheritance of cellcycle length (Smith and Martin, 1973). Taken together, rapamycin treatment increased the fraction of growthlimited cell cycles (Figure 4F and Figure 4—figure supplement 1F) and inheritance of cellcycle progression speed, thus causing increased ancestral and intragenerational correlations.
We hypothesized that growth may be limiting primarily for rapidly proliferating cell types, even without specific growth inhibition. We analyzed timelapse microscopy data of nontransformed mouse embryonic stem cells (Filipczyk et al., 2015) that proliferate much faster than the neuroblastoma cells (Figure 5A and Figure 5—figure supplement 1A). Sidebranch correlations of cycle length were again large (Figure 5B Figure 5—figure supplement 1B), as seen in the previous data except for the MYCNinhibited cells. Interestingly, the strength of the intragenerational correlations was most similar to the much more slowly dividing rapamycintreated cells (cf. Figure 4D). As before, the BAR model required two negatively coupled variables to account for these data (Figure 5C, Figure 5—figure supplement 1D,E). Fitting the growthprogression model to the data (Figure 5A,B), we found that the majority (∼60%) of cell cycles were limited by growth (Figure 5D, Figure 5—figure supplement 1C), indicating that cycle length of fast proliferating mammalian cells is, to a large extent, controlled by growth.
Discussion
Here, we showed that the seemingly paradoxical pattern of cellcycle lengths in lineage trees, with rapidly decaying ancestral and longrange intragenerational correlations, can be accounted for by the inheritance of two types of quantities: resources accumulated during the cell cycle (cell ‘size’) and regulators governing the speed of cellcycle progression. The fact that these are fundamental processes in dividing cells may help explain the ubiquity of the paradoxical cellcycle pattern. Targeted experimental perturbations of cell growth and cellcycle progression support our model.
As an alternative mechanism underlying the observed cellcycle variability and cousin correlations, modulation of the cell cycle by the circadian clock has been suggested, with strongest experimental evidence to date for cyanobacteria (Sandler et al., 2015; Mosheiff et al., 2018; Py et al., 2019). Unlike cyanobacteria, proliferating mammalian cells show entrainment of the circadian clock to the cell cycle with periods well below 24 hr (Bieler et al., 2014; Feillet et al., 2014). Whether in this setting the circadian clock could still influence cellcycle correlations remains to be studied.
Our proposed mechanism for cycle length correlations was motivated by Bayesian model selection. All data sets which displayed longrange correlation patterns (control replicate rep1 and rapamycin treatment of neuroblastoma cells as well as ESCs) selected a BAR model featuring longrange inheritance of two memory variables, which is masked in motherdaughter pairs by an anticorrelating interaction between them. The growthprogression model shares these essential features. Size inheritance is predicted to be particularly longrange by the fits of the growthprogression model to the experimental data. This memory emerges because growth does not force cellcycle progression, allowing cells to grow large in progressionlimited cycles and then pass on size over several generations. Such weak coupling between concurrent processes echoes recent work in E. coli on timing of cell division (Micali et al., 2018) and on the existence of two parallel adder processes for replication and division (Witz et al., 2019). Of note, the latter paper also utilizes a statistical framework for systematic model selection against experimentally observed correlation patterns akin to our approach.
Our findings raise the question of the mechanistic modes by which cells coordinate cell cycle and growth, which has been a longstanding problem; see Shields et al. (1978) for historical references and Ho et al. (2018) for a recent review. Recent work on this problem has shown the existence of negative feedback of cell size on growth rate (Ginzberg et al., 2018). We have found that our results remain robust when implementing a simple form of such a feedback (logistic dependence of growth rate on size) in the growthprogression model. Another recent study has shown that growth of many mammalian cells during the cell cycle adds a volume that only weakly increases with cell volume at birth (termed nearadder behavior, Cadart et al., 2018). This behavior appears to be caused by a combination of growthrate regulation and cellsize effects on cellcycle progression. We expect that factoring in the cellcycle length correlations studied here will help uncover the mechanistic details of cell size regulation. Refining our model in this direction may also help capture yet more detail of the correlation structure, such as the apparent increasing trend from aunt to first cousins and greataunt to second cousins. Moreover, we envisage that our inference approach could be extended to include finely resolved data on cellcycle phases (Chao et al., 2019) as well as multiple cell fates via asymmetric division and differentiation (Duffy et al., 2012).
Materials and methods
Experimental methods
Request a detailed protocolMYCNtunable mammalian neuroblastoma SHEP TET21N (RRID:CVCL_9812) cells were cultured as in Lutz et al. (1996). These TET21N cells were obtained from Dr. Frank Westermann (German Cancer Research Center, DKFZ) whose lab generated this line. This cell line is regularly authenticated by an inhouse DKFZ service using STR profiling. Mycoplasma contamination testing was negative. MYCN and mTOR were inhibited using 1 μg/ml doxycycline or 20–40 nM rapamycin (Calbiochem, 553210–100 UG), respectively. Cells were grown on ibidi μslides and phase contrast images (Nikon TiE) acquired every 6–15 min for up to 7 days under controlled growth conditions. The presented data consists of independent biological and technical replicates with n = 3 for untreated TET21N cells, n = 2 for MYCNinhibited and rapamycintreated cells. Cells were tracked in Fiji (version 1.48d) using the tracking plugin MTrackJ (Meijering et al., 2012). For flow cytometry, cells were stained with MYCN primary antibody (Santa Cruz Biotechnology, Cat# sc53993; RRID:AB_831602), secondary fluorescenceconjugated antibody goat antimouse Alexa Fluor 488 IgG (Life Technologies Cat#A11001; RRID:AB_2534069) and measured on a Miltenyi VYB MACSQuant Analyser. See Appendix 1 for details.
Data analysis
Request a detailed protocolMATLAB (R2016b) was used for all data analyses. Correlations represent Spearman rank correlations or, for the BAR model, Pearson correlation coefficients between the Gaussiantransformed cycle times. The difference between these two methods was far smaller than the experimental error. Error bounds were estimated by bootstrap resampling on the level of lineage trees. Censoring bias was avoided by truncating lineage trees after the last generation completed by all lineages within the experiment (see e.g. Sandler et al., 2015 and Appendix 1), truncating the trees to 7, 6 and 5 generations for the three MYCN amplified experiments. MYCNinhibited and rapamycintreated trees were five generations deep.
Bifurcating autoregressive (BAR) model
Request a detailed protocolWe constructed BAR models of cellcycle inheritance, as described in detail in Appendix 2. Briefly, the cell state is determined by a vector of Gaussian (latent) variables which are inherited from the mother to the daughter cells by a linear map plus a cellintrinsic noise term, which is correlated between daughters. The model is thus a Gaussian latentvariable model, where inheritance takes the form of an autoregressive vectorAR(1) process defined on a lineage tree. The cycle time is then calculated by an dataderived (approximately exponential) function of a weighted sum of the cellular state. We calculated wholelineage tree loglikelihood functions analytically and used them to evaluate Bayesian Evidences (Bayes factors) that quantify the relative support from the data for various model variants.
The growthprogression model
Request a detailed protocolCellcycle progression is modeled by a fluctuating, centered Gaussian heritable variable $q$, analogous to version II of the BAR model. Variables were scaled and shifted, $p={\sigma}_{p}q+\mu $, yielding lognormal progression durations ${\tau}_{p}=\mathrm{exp}(p)$. Size accumulation was modeled by exponential growth or for MYCN inhibition logistic growth. The normalized critical cell size ${s}_{\text{th}}$ fluctuates slightly and independently in each cell as ${s}_{\text{th}}=1+\zeta $ with $\zeta \sim \mathcal{N}(0,{\sigma}_{g}^{2})$. The growthprogression model was implemented in Matlab (R2016b), R (3.4.3) and OCaml (4.06) and 30 trees of 7 generations simulated, corresponding to the experimental dataset sizes. The simulation was repeated 100 times to generate confidence bounds. Parameters were fitted using Approximate Bayesian Computation independently for each dataset. See Appendix 3 for details.
Appendix 1
Experimental methods
Cell culturing and treatment
MYCNtunable mammalian neuroblastoma SHEP TET21N (TET21N, RRID: CVCL_9812) (Lutz et al., 1996) cells were cultured in RPMI 1640 medium supplemented with 10% fetal calf serum and 1% penicillin/streptomycin at 37°C 5% CO_{2} and 88% humidity. Versene was used for harvesting. TET21N cells were originally isolated from a female patient. Cell lines are authenticated by the German Cancer Research Center in house facility every halfyear. MYCNinhibited populations were established by incubating cells with 1 μg/ml doxycycline for 48–72 hr prior to further analysis. Growthinhibited populations were generated by treating cells with the mTOR inhibitor rapamycin (Calbiochem, 553210–100 UG) at 20 nM or 40 nM rapamycin dissolved in DMSO. Cells were treated with rapamycin or the same concentration of DMSO for 72 hr prior to harvesting for flow cytometry or livecell microscopy.
Livecell microscopy
10^{3} cells were grown on 8well ibidi μslides coated with collagen IV (Cat# 80822) in RPMI 1640 medium and imaged every 6–15 min for up to 7 days under controlled growth conditions at 37°C, 5% CO_{2} and 80% humidity (Pecon incubator P). Growth media was changed every 2–3 days. Phase contrast images were acquired with an inverted widefield microscope (Nikon TiE) using an EMCCD camera (Andor iXON3 885) and a 10x (CFI Planfluor DL10x, NA 0.3) or 20x lense (CFI Plan Apochromat DM 20x, NA 0.75). Cells were tracked in Fiji (version 1.48d) using the manual tracking plugin MTrackJ (Meijering et al., 2012). The presented imaging data consists of independent biological and technical replicates with n = 3 for untreated TET21N cells, n = 2 for MYCNinhibited cells and n = 2 for rapamycintreated cells.
Flow cytometry and antibody staining
10^{6} cells were fixed with 4% paraformaldehyde for 15 min at room temperature. Permeabilization was performed in 90% icecold methanol for at least 24 hr at −20°C. Cells were washed in staining buffer (1% BSA, 0.1% TritonX in PBS), and incubated with 0.5–1 μg per sample of MYCN primary antibody (Santa Cruz Biotechnology, Cat# sc53993; RRID:AB_831602) for 1 hr at room temperature. Cells were washed 3x with staining buffer and incubated with a secondary flouresenceconjugated antibody, goat antimouse Alexa Fluor 488 IgG (Life Technologies Cat#A11001; RRID:AB_2534069), again for 1 hr at room temperature. Cells were washed 3x with staining buffer and DNA content staining was performed with FxCycle Violet Stain (Thermo Fischer Scientific). A Miltenyi VYB MACSQuant Analyser was used for measurements and data was analysed using FlowJo software.
Transcriptomics
mTOR mRNA expression data in TET21N cells under control and MYCNinhibited conditions was obtained from RNASeq measurements by Ryl et al. (2017) deposited at GEO (GSE98274).
Data analysis
MATLAB (R2016b) was used for all data analysis steps.
Correlation coefficients
Correlation patterns of lineage trees of data and the growthprogression model represent Spearman rank correlations. For the bifurcating autoregressive (BAR) model, Pearson correlation coefficients are calculated between the Gaussiantransformed cycle times. These are the same as Gaussian rank correlations between cycle times. In practice, the differences between Gaussian rank correlations and Spearman rank correlations were much smaller than experimental error bounds on either, so we did not differentiate between the measures. Confidence bounds on correlation coefficients were estimated by bootstrap resampling on the level of lineage trees; resampling on the level of individual pairs of related cells would neglect the correlations between cells of the same tree and thus underestimate variability. Trees that did not contain information on the cell pair under analysis were removed prior to bootstrapping. At each bootstrap repeat, family trees were randomly drawn with replacement up to the number of trees in the original dataset. From the resulting bootstrap sample all cell pairs were used to calculate the sample correlation coefficient. This process was repeated 10,000 times. From the resulting distribution of correlation coefficients the 95% quantiles were used as confidence bounds.
Exponential growth
For each experiment, an exponential growth model of the form ${N}_{t}={N}_{0}\mathrm{exp}(kt)$ was fitted to cell counts over time by performing a maximum likelihood estimation using the trustregion algorithm in MATLAB.
Cellcycle length distribution over time
At each time point, the movingwindow median was calculated from all cells born within a window of 10 hr before or after this point.
Randomization
All cells within a dataset were randomly paired with each other and correlation of the resulting sample calculated. This procedure was repeated 10,000 times. From the resulting distribution the mean and the 95% quantiles are given.
Censoring
Censoring bias resulting from a finite observation time can lead to an overrepresentation of faster cells. To demonstrate this effect, we generated trees using a toy model with independent normally distributed cycle lengths. The trees were truncated at various total observation times and Spearman rank correlation coefficients between all related cells within the observation time window were sampled. As Figure 1—figure supplement 3A shows, short observation times strongly distort the sampled motherdaughter and sibling correlations away from their true value 0. The same basic effect persists for more distant relationships and can be further enhanced if cycle lengths are inherited. This censoring bias can be avoided by truncating lineage trees not after a given observation time, but after the last generation completed by all lineages within the experiment, uniformly over all trees (see e.g. Sandler et al., 2015). In this way slower and fastercycling lineages are represented equally. Because some cells were inevitably lost from the field of view by migration, and a small percentage of cells showed extremely slow cycles, such a strict cutoff was unfeasible in our experiments. We assumed that cell loss by migration is not correlated to cycle length, so migrating cells were not counted as missing from otherwise complete trees. Within the remaining tree, we then determined the last generation to be included in our analysis by the following procedure: We first counted the number of cells ${n}_{\text{alive}}(G)$ within each generation $G$ that were still alive at the end of the observation period. The last generation ${G}_{\text{last}}$ to be included was then determined as the maximum generation such that
All further generations were removed from the dataset. This procedure truncated the trees to 7, 6 and 5 generations, respectively for the three replicate experiments using our MYCN amplified cell line. MYCNinhibited and rapamycintreated trees were 5 generations deep.
Spatial trend
To assess potential spatial biases related to locally variable conditions, cells were divided into a 4 × 4 grid according to their position at division. The cycle length distribution of cells within each grid region (containing ≥ 5 cells) was compared to the distribution (1) within every other grid region and (2) of the whole dataset at a 5% significance level using a twosided KolmogorovSmirnov test and correcting for multiple testing using the BenjaminiHochberg procedure (using functions ks.test and p.adjust in R). Note also that because cells are motile they experience a range of local environments during their lifetime.
Appendix 2
Bifurcating autoregressive model for cycle inheritance
Setup
In order to explore systematically which simple local inheritance schemes can generate the experimentally observed cycle length correlations, we study a class of Gaussian latent variable models of adjustable complexity. In the models, the cycle length $\tau $ of a cell is obtained from a standard normal variable $\mathrm{\ell}$ by a nonlinear transformation $\tau =g(\mathrm{\ell})$. In our data, cell cycle lengths are roughly lognormally distributed, so $g$ is approximately a shifted exponential function. To simplify and make the model more robust to outliers, we determine $g$ empirically, such that its inverse ${g}^{1}$ transforms the cell cycle length into a standard normal variate. That is, we choose $g(\mathrm{\ell})={c}_{\text{ex}}^{1}({c}_{\text{gauss}}(\mathrm{\ell}))$ where ${c}_{\text{gauss}}$ is the cumulative distribution function (CDF) of a standard normal distribution, and ${c}_{\text{ex}}$ is the empirical CDF of the experimental data. This transformation discards all information about the shape and mean value of the cycle length distribution; the set of $\mathrm{\ell}$ variables then purely reflect the strength of correlations between cell cycles. The (Pearson) moment correlation coefficients between the variables $\mathrm{\ell}$ are identical to the socalled Gaussian rank correlation coefficients (Boudt et al., 2010) between the corresponding cycle lengths $\tau $, which are similarly robust to outliers as the more common Spearman rank correlation.
The Gaussian variable $\mathrm{\ell}$ is used to model correlation by inheritance. $\mathrm{\ell}$ is a weighted sum of $d$ latent, centered Gaussian variables $\mathit{x}=({x}_{1},\dots ,{x}_{m}{)}^{\mathsf{T}}$ with positive weights $\mathit{\alpha}=({\alpha}_{1},\dots {\alpha}_{m}{)}^{\mathsf{T}}$, denoted as vectors $\mathit{x}$ and $\mathit{\alpha}$. That is, $\ell ={\mathit{\alpha}}^{\mathsf{T}}\mathit{x}=\sum _{l}{\alpha}_{l}{x}_{l}$. Inheritance in the model occurs by passing on latent variables from mother to daughter cells. The basic model equation relation reads
Here, a superscript $i=1,2$ denotes a daughter cell, and absence of a superscript refers to the mother cell. The matrix $\mathbf{\mathbf{A}}$ implements inheritance: The average of a daughter’s latent variables, given the mother’s is $\u27e8{\mathit{x}}^{i}\mathit{x}\u27e9=\mathbf{A}\mathit{x}$. This linear coupling of latent variables through inheritance may take any form compatible with the basic stability requirement that its operator norm must satisfy $\parallel \mathbf{\mathbf{A}}\parallel <1$. Since both daughters inherit the same contribution from the mother, inheritance correlates the daughters’ latent variables positively. Daughter cells are also subject to random fluctuations which we model by standard normal random vectors $\mathit{\xi}}^{i$. These fluctuations are correlated due to the term $\overline{b}{\mathit{\xi}}^{\overline{\u0131}}$ in Equation 2. Here $\overline{\u0131}$ designates the sister cell of $i$, for example $\overline{2}=1$. We parametrize these correlations via
The sister correlations conditioned on the mother latent variables then become
where $\mathbf{\mathbf{I}}$ is the $d$dimensional unit matrix. Positive sister correlations $(\gamma >0)$ may arise due to fluctuations that occur within the mother cell after its cycle duration has been fixed and are shared by the daughters. Negative correlations $(\gamma <0)$ may arise due to partitioning noise upon inheritance. Note that latent variable fluctuations are correlated between sisters but uncorrelated between different latent variables. Effectively, our choice of parametrization partitions all fluctuating cell cyclerelevant processes within the daughter cells into $d$ Gaussian components that are maximally decorrelated, similar to a principal component decomposition.
Overall, Equation 2 defines an unbiased model with linear, local inheritance of latent variables, and an output that is a linear combination of latent variables. Its Gaussian form may be justified as the maximumentropy distribution (Jaynes, 1957) for this problem, since only covariance information is used as an experimental input at this stage. Our model is a firstorder vector autoregressive process, defined on the lineage tree of cells; it is a latentvariable generalization of the bifurcating autoregressive model already considered by Staudte and coworkers (Staudte et al., 1984). We remark that in Staudte et al. (1984) the dimensionality is $d=1$, and therefore $\mathit{\alpha}$ is unnecessary.
Stationary exponential growth
Combining Equations 2 and 4 by the law of total variance, the latent covariance satisfies
In stationary exponential growth, averaging over a single lineage forward in time, a stationary distribution with mean $\mathbf{0}$ and covariance ${\mathbf{\mathbf{C}}}_{\mathrm{\infty}}$ is established. Then $\u27e8{\mathit{x}}^{i}{{\mathit{x}}^{i}}^{\mathsf{T}}\u27e9=\u27e8\mathit{x}{\mathit{x}}^{\mathsf{T}}\u27e9={\mathbf{C}}_{\mathrm{\infty}}$, and Equation 5 implies
We take this stationary distribution of latent variables as initial condition for root cells of lineage trees, assuming they come from an equilibrated growth phase. This assignment is not strictly correct because the stationary distribution along forward lineages is different from the distribution of all cells in an exponentially growing population at a given time (Lin and Amir, 2017); however, the difference was small for our parameters when tested numerically (not shown).
Computation of the likelihood
We aim to compute the probability of generating a lineage tree with given cycle lengths within the model. We fix a minimum generation number and consider only trees in which essentially all branches reach this number, thus discarding overhanging cells on some branches, (see also Sandler et al., 2015). This is crucial since in experiments with finite duration, selection bias would otherwise be introduced (Cowan and Staudte, 1986).
We begin by indexing cells in a tree by their pedigrees, which are the sequences of sister indices counting from the root cell, for example $I={i}_{1}{i}_{2}\mathrm{\dots}{i}_{k}$ for a cell in generation $k$ and $I{i}_{k+1}$ for one of its daughters. Sorting these indices, we can then arrange all Gaussiantransformed cycle lengths in a tree into a single vector $\underset{\mathbf{\_}}{\mathit{\ell}}$. Since $\underset{\mathbf{\_}}{\mathit{\ell}}$ is Gaussian with mean $\underset{\mathbf{\_}}{\mathbf{0}}$, its logprobability takes on the simple quadratic form
To evaluate Equation 7, we need to determine the joint covariance matrix ${\underset{\xaf}{\mathbf{\mathbf{C}}}}_{\mathrm{\ell}}$ of Gaussian cycle lengths over the given tree structure as a function of the parameters $\mathbf{A},\gamma ,\mathit{\alpha}$. We start by first deriving the joint covariance matrix $\underset{\xaf}{\mathbf{\mathbf{C}}}$ of the latent variables $\underset{\mathbf{\_}}{\mathit{x}}$. This is a block matrix with $d\times d$ blocks ${\mathbf{\mathbf{C}}}^{IJ}$ that correspond to pairs of cells in welldefined relationships, such as motherdaughter, cousincousin, etc. Since the lineage tree is sampled from stationary growth, ${\mathbf{\mathbf{C}}}^{IJ}$ depends only on the relationship of $I$ and $J$, that is on their respective ancestral lines up to the latest common ancestor, and not on the history before. In particular, if $I=J{i}_{k+1}\mathrm{\dots}$ then cell $I$ is a descendant of cell $J$ and we write this as $I>J$; otherwise we write $I\ngtr J$. Note that $I\ngtr I$.
Splitting one cell pedigree as $I=Ki$, from Equation 2 we derive the relations
Equation 8 lets us compute ${\mathbf{\mathbf{C}}}^{IJ}$ by a recursive procedure, as follows:
Consider the case $I>J$. If $I=Ki$, then $J\ngtr K$. Now use Equation 8 (i) repeatedly ($k$ times), moving up the ancestral line, until arriving at the form $\mathbf{C}}^{IJ}={\mathbf{A}}^{k}\u27e8{\mathit{x}}^{J}{{\mathit{x}}^{J}}^{\mathsf{T}}\u27e9={\mathbf{A}}^{k}{\mathbf{C}}_{\mathrm{\infty}$.
In the case $I=K{i}_{1}\mathrm{\dots}{i}_{k},J=K{j}_{1}\mathrm{\dots}{j}_{{k}^{\prime}}$ with ${j}_{1}={\overline{\u0131}}_{1}$, no cell is a descendant of the other, and their last common ancestor is $K$. Use Equation 8 (i) (or its transpose) on both branches repeatedly until the form $\mathbf{A}}^{k1}\u27e8{\mathit{x}}^{K{i}_{1}}{{\mathit{x}}^{K{\overline{\u0131}}_{1}}}^{\mathsf{T}}\u27e9{{\mathbf{A}}^{{k}^{\mathrm{\prime}}1}}^{\mathsf{T}$ is obtained. Then use Equation 8 (ii) to get ${\mathbf{\mathbf{C}}}^{IJ}={\mathbf{\mathbf{A}}}^{k}{\mathbf{\mathbf{C}}}_{\mathrm{\infty}}\mathbf{\mathbf{A}}^{{k}^{\prime}}{}^{\U0001d5b3}+\gamma {\mathbf{\mathbf{A}}}^{k1}\mathbf{\mathbf{A}}^{{k}^{\prime}1}{}^{\U0001d5b3}$
These two cases cover all possible cellcell relations, so that the procedure fully determines the joint latent covariance $\underset{\xaf}{\mathbf{\mathbf{C}}}$ for a given tree structure, inheritance matrix $\mathbf{\mathbf{A}}$ and sister correlation $\gamma $.
Finally, to obtain the covariance ${\underset{\xaf}{\mathbf{\mathbf{C}}}}_{\mathrm{\ell}}$ of the Gaussian cycle lengths $\mathrm{\ell}$, we project onto $\mathit{\alpha}$. The elements of ${\underset{\xaf}{\mathbf{\mathbf{C}}}}_{\mathrm{\ell}}$ result as
This completes the evaluation of the logprobability $\mathcal{P}$ (Equation 7), which is also equal to the loglikelihood of the model, $\mathcal{\wp}(\underset{\mathbf{\_}}{\mathit{\ell}})=\mathcal{\mathcal{L}}(\mathbf{A},\gamma ,\mathit{\alpha})$. Accounting for the constraint $\u27e8{\mathrm{\ell}}^{2}\u27e9=1$ which we impose to fix the arbitrary normalization of $\mathrm{\ell}$, the full model has ${d}^{2}+1+d1=d(d+1)$ adjustable parameters. This number can be reduced by restricting the inheritance matrix to a specific form, or by setting $\gamma =0$, as was done for the model variants discussed in the main text.
As a corollary, the Gaussian rank correlation between cycle lengths of any pair of cells results as
In the onedimensional special case $d=1$, the projection on $\alpha $ becomes irrelevant and Equation 10 reduces to
where $k,{k}^{\prime}\ge 1$ the distances to the latest common ancestor as in the algorithm above, and $a\equiv \mathbf{\mathbf{A}}$ is the $1\times 1$ inheritance matrix. Some special cases of Equation 11 given already in Cowan and Staudte (1986) are ${\rho}_{\text{ss}}^{\text{gauss}}=[{a}^{2}+\gamma (1{a}^{2})]$ for sisters and $\rho}_{\mathrm{c}1}^{\mathrm{g}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}}={a}^{2}{\rho}_{\mathrm{s}\mathrm{s}}^{\mathrm{g}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}}={{\rho}_{\mathrm{m}\mathrm{d}}^{\mathrm{g}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}}}^{2}{\rho}_{\mathrm{s}\mathrm{s}}^{\mathrm{g}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}$ for first cousins. In other words, to compute the correlation between related cells, one multiplies motherdaughter correlations along the path connecting them, taking a shortcut via the daughters of the last common ancestor where one instead multiplies with the sistersister correlation. Specializing further to $\gamma =0$, Equation 11 reduces to the wellknown relation $\rho}_{{\tau}^{I}{\tau}^{J}}^{\mathrm{g}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}}={{\rho}_{\mathrm{m}\mathrm{d}}^{\mathrm{g}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}}}^{k+{k}^{\mathrm{\prime}}$ where $k+{k}^{\prime}$ is the number of cell divisions linking $I$ and $J$. As detailed in the main text, these onedimensional special cases are insufficient to explain our data.
Evidence calculation
To compare different model versions’ ability to explain but not overfit the data, we employed a standard Bayesian model selection scheme (see e.g. Wasserman, 2000; MacKay, 2003). Within this framework, model selection is treated on the same grounds as parameter inference; the task is to assign to each one out of a set $\mathcal{M}$ of models its likelihood to have generated the data. One or several plausible models can then be selected on these grounds. Concretely, the scheme proceeds as follows. The probability of model $M$ to generate data $\underset{\mathbf{\_}}{\mathit{\ell}}$ is obtained by integrating over all parameter values ${\pi}_{M}$, which are distributed over a parameter space ${\mathrm{\Pi}}_{M}$ with prior distribution $p({\pi}_{M}M)$:
Here, $p(\underset{\mathbf{\_}}{\mathit{\ell}}{\pi}_{M},M)=\mathrm{exp}[\mathcal{\wp}(\underset{\mathbf{\_}}{\mathit{\ell}})]=\mathrm{exp}[\mathcal{\mathcal{L}}({\pi}_{M})]$ is equal to the likelihood function for model $M$, Equation 7. Applying Bayes’ rule, the probability that among $\mathcal{M}$, $M$ was the model that generated the data, is then
When models are equivalent a priori as we assume here, then both the prior belief in a model, $p(M)$, and the entire denominator in Equation 13 are unimportant constants. Then the socalled evidence or Bayes factor, obtained by calculating
is proportional to each model’s probability of having generated the data, Equation 13. We calculated $E(M)$ numerically by MonteCarlo integration of Equation 14; in the main text we show the evidences relative to Model V. Conventionally, an advantage in $E$ of a factor of 10 or more is considered strong support in favor of a model.
We briefly discuss some important features of model selection by evidence. In the asymptotic case of large samples (not applicable for the present data), the evidence $E$ is approximated by the wellknown Bayes information criterion (BIC), which is an alternative to the popular Akaike information criterion (AIC). While AIC is constructed to select a model whose predictions are maximally similar to future repetitions of the same experiment, evidence and BIC select the model that is most likely to have generated the existing data. BIC and evidence, but not AIC, have a desirable consistency property: If the models $\mathcal{M}$ are recruited from a hierarchy of nested models which also contains the true model, then the simplest model in $\mathcal{M}$ comprising the true model is always favored for large enough samples (Wasserman, 2000). This consistency is a manifestation of a general preference of the evidence for parsimonious models. To illustrate this point, following MacKay (2003), we expand the log evidence around the maximum aposteriori estimate ${\pi}_{M}^{*}$, using Laplace’s method:
Here, ${H}_{M*}={\left(\frac{{\partial}^{2}\mathcal{L}}{{\partial}_{{\pi}_{M}^{i}}{\partial}_{{\pi}_{M}^{j}}}\right)}_{{\pi}_{M}^{*}}$ is the Hessian of the loglikelihood, and for simplicity we have assumed a flat parameter prior $p({\pi}_{M})=1/\mathrm{vol}({\mathrm{\Pi}}_{M})$. The ith eigenvalue $2\pi /{s}_{i}$ of ${H}_{M*}$ determines the width $s}_{i$ of the peak of the posterior distribution around ${\pi}_{M}^{*}$, along the ith principal axis. In the last equality, we have written the parameter space volume $\mathrm{vol}({\mathrm{\Pi}}_{M})={\prod}_{i}{S}_{i}$ as a product of parameter ranges $S}_{i$ along the principal axes. Equation 15 can be interpreted as follows: As more parameters are added to a model, the fit accuracy, measured by $\mathcal{L}({\pi}_{M}^{*})$, generally improves. However, each new parameter ${i}^{\prime}$ incurs a penalty $\mathrm{log}({s}_{{i}^{\mathrm{\prime}}}/{S}_{{i}^{\mathrm{\prime}}})<0$. The more the new parameter needs to be constrained by the data, the more the evidence is reduced. Thus the basic mechanism of parsimony in Bayesian model selection is this: Complex models are characterized by a large number of parameters with wide a priori allowed ranges and sensitive dependence on the data; in other words, they require the data to pick parameters from a large set of possibilities. Complex models are penalized and ranked as less likely. Indeed, such an overly flexible model can be fitted to diverse data, which we should expect to diminish the support that a particular set of data can give to it.
Numerical efficiency and implementation
The evaluation of $\mathcal{L}$ (Equation 7) requires a final numerical inversion of the recursively assembled covariance ${\underset{\xaf}{\mathbf{\mathbf{C}}}}_{\mathrm{\ell}}$ to obtain the stiffness matrix, which costs $O({N}^{3})$ operations, where $N$ is the number of cells in the tree. To reduce this cost, it is possible to devise an equivalent recursive scheme in which the projected stiffness matrix, not the covariance matrix, is computed recursively, using efficient blockinverse formulæin each recursive step. The computational complexity can then be improved to $O({N}^{2})$. A version of this alternative, equivalent scheme was implemented in the programming language OCaml and used to obtain the results presented in the main text.
Appendix 3
The growthprogression model
Setup
The growthprogression model is based on the idea that the two cellcycle controlling processes are cellcycle progression and cell growth. The state variables of both processes, namely, the timing of the regulatory license to divide encoded by $p$, and cell size $s$, respectively, are inherited from mother to daughter. The two processes are coupled via inheritance as detailed in the following.
The cellcycle progression process
Inheritance of the velocity of cellcycle progression is modeled by a fluctuating, centered Gaussian variable $q$, passed on from mother to daughter, entirely analogous to version II of the BAR model (see there for additional explanation), according to
where subscript $i=1,2$ denotes a daughter cell, $\overline{\u0131}$ its sister, and no subscript, the mother. $a$ with $a<1$, implements inheritance. The intrinsic fluctuation strength and coupling is given by $b$ and $\overline{b}$; effectively, daughter cells are correlated for given mother by, $\u27e8{q}_{i}{q}_{\overline{\u0131}}q\u27e9=\gamma ;\u27e8{q}_{i}^{2}q\u27e9=1,$ where $\gamma =2b\overline{b}$. From the centered $q$ variables, shifted and scaled Gaussian variables $p={\sigma}_{p}q+\mu $ were generated, finally yielding lognormal regulation cycle durations ${\tau}_{p}=\mathrm{exp}(p)$. The duration ${\tau}_{p}$ is the time elapsed since the last division until regulatory license is given to divide again. Overall, the progression process has four adjustable parameters, $\mu$, ${\sigma}_{p}$, $a$ and $\gamma $.
The growth process
The growth duration ${\tau}_{g}$ is defined as the time to grow from an initial size ${s}_{b}$ to the threshold size ${s}_{\text{th}}$. Size accumulation was modeled by exponential growth with the exponential growth rate
However, under MYCN inhibition, exponential growth was prone to generate unreasonably large cells. Here we instead modeled growth by the logistic growth process
where $k$ is the growth rate constant and ${s}_{\text{max}}$ the maximum cell size. We fixed ${s}_{\text{max}}=20$, to match the approximate cell size at which the growth rate starts to decrease with observations (Sung et al., 2013; Tzur et al., 2009). The two growth laws Equations 17, 18 yield
and
respectively.
The normalized threshold cell size ${s}_{\text{th}}$ fluctuates slightly and independently in each cell as ${s}_{\text{th}}=1+\zeta $ with $\zeta \sim \mathcal{N}(0,{\sigma}_{g}^{2})$. At division, the final mother cell size ${s}_{\text{div}}$ is halved, with each daughter receiving a new size at birth ${s}_{b}={s}_{\text{div}}/2$. Effectively, for the subset of cell cycles that are limited by growth, this process corresponds to a sizer mechanism Facchetti et al. (2017). The growth process has two adjustable parameters, $k$ and ${\sigma}_{g}$.
Coupling progression and growth
The two processes are coupled via a checkpoint which requires both to be completed before a cell has license to divide. The cellcycle length is then determined as $\tau =\mathrm{max}({\tau}_{p},{\tau}_{g})$. If division is stalled by insufficient cycle progression, $s$ continues to accumulate until cell division, so that the final cell size ${s}_{\text{div}}>{s}_{\text{th}}$. Thus, inheritance of the growth process is influenced by the cycle progression process. In contrast, the progression process is inherited in an autonomous fashion. This unidirectional inheritance structure recapitulates the unidirectional coupling between hidden processes found in the preferred BAR models IV and V.
Model simulation
The growthprogression model was implemented in Matlab (R2016b), R (3.4.3) and OCaml (4.06) (with identical results but increasing execution speed) and lineage trees were simulated. For each tree, an initial cell was generated with birth size ${s}_{b}={s}_{\text{th}}/2$ and $p=\mu $ and subsequently, an unbranched single lineage was simulated for 100 generations for equilibration. Its final cell was used as founder cell for the tree. For the data shown, 30 trees of 7 generations each were simulated, roughly corresponding to the dataset sizes obtained experimentally. The simulation was repeated 100 times to generate confidence bounds.
Parameter optimization
Parameters were fitted using Approximate Bayesian Computation independently for each dataset. In an adaptive procedure, (sometimes nonuniform) prior distributions of model parameters were first generated. For each parameter set, 500 trees were simulated at a depth of 7 generations. To compare data $D$ and simulations $\widehat{D}$, we used a set $S$ of summary statistics composed of the nine correlation coefficients as shown in Figure 3C, and the mean and all quartiles of the distribution of cellcycle durations. A squareddistance between data and simulation summary statistics was calculated as
where ${\sigma}_{i}$ were calculated based on the data bootstrap confidence (95%) bounds. Samples of the approximate posterior probability distribution of parameter values (Figure 3–figure supplement 3A ) were then generated by accepting parameter sets with ${\rho}^{2}(S(\widehat{D}),S(D))<\u03f5$, weighted by the inverse of the local prior density in parameter space. We used a tolerance $\u03f5=2$ but results were not sensitive to the choice of $\u03f5$ within the range $1\dots 4$. Prior parameter ranges were extended as far as necessary for accepted sets to converge. The final accepted ranges of each parameter was used as credible regions, and the medians of each parameter distribution from these samples were selected as bestfits for further analysis.
Data availability
Data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1 and 4.

NCBI Gene Expression OmnibusID GSE98274. RNASeq of SHEP TET21N cells upon Doxorubicin treatment.
References

MYC disrupts the circadian clock and metabolism in Cancer cellsCell Metabolism 22:1009–1019.https://doi.org/10.1016/j.cmet.2015.09.003

BookThe Gaussian Rank Correlation Estimator: Robustness PropertiesRochester, NY: Social Science Research Network.

Evidence that the human cell cycle is a series of uncoupled, memoryless phasesMolecular Systems Biology 15:e8604.https://doi.org/10.15252/msb.20188604

The regulatory landscape of lineage differentiation in a metazoan embryoDevelopmental Cell 34:592–607.https://doi.org/10.1016/j.devcel.2015.07.014

Controlling cell size through Sizer mechanismsCurrent Opinion in Systems Biology 5:86–92.https://doi.org/10.1016/j.coisb.2017.08.010

Network plasticity of pluripotency transcription factors in embryonic stem cellsNature Cell Biology 17:1235–1246.https://doi.org/10.1038/ncb3237

Mammalian cell size is controlled by mTOR and its downstream targets S6K1 and 4ebp1/eIF4EGenes & Development 16:1472–1487.https://doi.org/10.1101/gad.995802

Information theory and statistical mechanicsPhysical Review 106:620–630.https://doi.org/10.1103/PhysRev.106.620

Conditional expression of Nmyc in human neuroblastoma cells increases expression of prothymosin and ornithine decarboxylase and accelerates progression into Sphase early after mitogenic stimulation of quiescent cellsOncogene 13:803–812.

BookInformation Theory, Inference and Learning AlgorithmsCambridge University Press.

A minimum of two distinct heritable factors are required to explain correlation structures in proliferating lymphocytesJournal of the Royal Society Interface 7:1049–1059.https://doi.org/10.1098/rsif.2009.0488

Methods for cell and particle trackingMethods in Enzymology 504:183–200.https://doi.org/10.1016/B9780123918574.000094

Concurrent processes set E. coli cell divisionScience Advances 4:eaau3324.https://doi.org/10.1126/sciadv.aau3324

Irreversible cellcycle transitions are due to systemslevel feedbackNature Cell Biology 9:724–728.https://doi.org/10.1038/ncb0707724

An outline of the pattern of bacterial generation timesJournal of General Microbiology 18:382–417.https://doi.org/10.1099/00221287182382

The biosynthetic basis of cell size controlTrends in Cell Biology 25:793–802.https://doi.org/10.1016/j.tcb.2015.10.006

Additive models for dependent cell populationsJournal of Theoretical Biology 109:127–146.https://doi.org/10.1016/S00225193(84)801150

Bayesian model selection and model averagingJournal of Mathematical Psychology 44:92–107.https://doi.org/10.1006/jmps.1999.1278
Decision letter

Naama BarkaiSenior and Reviewing Editor; Weizmann Institute of Science, Israel
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
Kuchen et al. have developed a general model that can explain the high degree of intragenerational correlation in cell cycle times that have been observed in multiple systems. Based on insight provided by this general model, they show that these correlations can arise from a unidirectional coupling of cell size and cellcycle speed by a minimal cellsize checkpoint, which they verify using perturbation experiments.
Decision letter after peer review:
Thank you for submitting your article "Hidden longrange memories of growth and cycle speed correlate cell cycles in lineage trees" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Naama Barkai as the Senior and Reviewing Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
Kuchen et al. have developed an interesting and general model that can explain the high degree of intragenerational correlation in cell cycle times that have been observed in multiple systems. First, using Bayesian inference on their own timelapse data, they find that a model of two unidirectionally coupled heritable components that affect cellcycle time best explains their data. They go on to propose that these components can be cell size and cellcycle speed, provided that growth and division are coupled via a minimum size checkpoint. They then verify this model through perturbation experiments.
Essential Revisions:
Please address all comments below.
In particular please pay a special attention to the following two points:
1) The concerns over reproducibility between datasets,
2) Provide much clearer explanation of the unbiased modelselection.
Reviewer #1:
The paper studies division time correlations within neuroblastoma cells through timelapse microscopy. Their approach uses unbiased statistical models to infer a minimal model that they claim recapitulates the experimentally observed correlations within populations of cells based on a set of heritable factors influencing interdivision times. The authors then study a model of division times based on growth to a threshold size and the inheritance of cell cycle progression factors that determine the rate of passage through the cell cycle. The authors claim that this second model recapitulates WT data in addition to matching the correlations observed in cells either grown with rapamycin (to limit cell growth) or cells with a reduced rate of passage through the cell cycle obtained by restricting the expression of the oncogene MYCN.
I found their unbiased statistical approach of particular interest since they considered a range of hierarchical models and used a metric for evaluating them which accounts for potential effects of overfitting with larger numbers of fitting parameters. However, to interpret this section with a critical eye requires more intuition for the metric of discrimination between these hierarchical models. In particular, Equation 12 which appears to address this should be referenced and discussed in the main text. With regards to the rest of the paper, the predictions of the authors when varying the limitations of either growth or cell cycle progression were unclear. This would be greatly aided by giving the reader intuition about what one would expect for either "pure" model (either a pure sizer for size control or cells which are completely limited by cell cycle progression).
Additionally, the following main comments should be addressed:
1) How are the fitting parameters reevaluated when inhibiting growth or cell cycle progression relative to the control case?
2) Why do the authors switch to logistic growth during the MYCN growth? This is currently not even mentioned in the main text. It is concerning that in order to fit the MYCN growth case this fundamental assumption needed to be altered.
3) Is the agreement between the model predictions and the observed division time correlations robust to variations in the cell size control strategies assumed? There is increasing evidence that many mammalian cells follow an "adder" model (see for example Cadart et al., 2018 which is incorrectly referred to in the paper as supporting a critical size), so testing the model for this size control strategy would be an important test of the proposed model.
4) The choice of cell cycle timing as a feature of interest was not adequately justified. The results would be stronger if the authors can use data on correlations in cell size between relatives to test Model V and its predictions.
5) The explanation of the results shown in Figure 4 does not provide clear intuition for why the correlations should be altered in the observed fashion. It would be helpful if the authors took more care when explaining this. For example: The authors show in Figure 3E that growthlimited cells have negative motherdaughter correlations in cell cycle timing. This being the case, why is it that the motherdaughter correlations in the presence of rapamycin (Figure 4D) are in fact larger than those correlations measured in the MYCN treated cells?
Reviewer #2:
In the present manuscript, Kuchen and colleagues investigate how regulation of cell cycle progression and cell growth are coupled using analysis of extended cell lineages and mathematical modeling. Initially, they performed timeresolved livecell microscopy of a MYCN – overexpressing neuroblastoma line to generate cell lineage trees spanning 57 generations. As previously reported, they observed higher intragenerational than ancestral correlations, which has so far been attributed to coupling between cell cycle and circadian clock (Sandler et al., 2015, Chakrabarti et al., 2018). The authors now took an unbiased approached based on a Gaussian latent variables and propose a model where two unidirectionally coupled variables are inherited to daughter cells. They mechanistically implement these two variables as cell growth and cellcycle progression and show that it is sufficient to couple both through a minimalsize checkpoint to account for the observed correlation patterns in lineage trees. This model is supported by experimental perturbations of cell growth (via mTOR inhibition) and cell cycle progression (decreasing MYCN levels) as well as by analysis of existing lineage data from rapidly cycling human ES cells.
The paper is well written and easy to follow. The authors present a very plausible alternative explanations for the nonintuitive correlation patterns in lineage trees and provide further evidence for a size checkpoint in mammalian cell cycle regulation. As the focus of the manuscript is on the use of abstracted mathematical models to explain a complex cellular phenotype (inheritance of cell cycle length), there is little mechanistic/molecular inside regarding the details of the observed phenomena (e.g. minimal cell size checkpoint). However, the provided experimental validation provides strong support for the presented hypothesis. As an experimentalist, I would have liked to see additional orthogonal perturbations to further strengthen the data, but believe that this would be beyond the scope of this study.
Main points:
1) While the modelling results match the data within the estimated error bounds, the correlation patterns seem to be less pronounced. For example, the increased correlation to first cousins compared to aunts in Figure 3C (and Figure 3—figure supplement 1D) is hardly reproduced in the model. Similarly, the correlations to the removed sidebranch seem to be about equal for all generations in the model, while the data indicates increased intragenerational correlations. It would be helpful if the authors comment on and explain such deviations or limitations of the model.
2) As a main assumption of the authors is that MYCN overexpression disrupts the circadian clock which has been previously linked to the inheritance patterns of cell cycle length, they should validate this in the cell line used. This could be done for example by quantifying BMAL1 expression in the presence/absence of doxycycline (see Figure 4—figure supplement 1A.
Reviewer #3:
Kuchen et al. have developed an interesting and general model that can explain the high degree of intragenerational correlation in cell cycle times that have been observed in multiple systems. First, using Bayesian inference on their own timelapse data, they find that a model of two unidirectionally coupled heritable components that affect cellcycle time best explains their data. They go on to propose that these components can be cell size and cellcycle speed, provided that growth and division are coupled via a minimum size checkpoint. They then verify this model through perturbation experiments. Although the paper appears to be of general interest to eLife readers, there are several points that require clarification before publication.
1) Representation of data. The authors present 3 repeats of the timelapse microscopy for neuroblastoma TET21N cell lineages. (Figure 1E). For the Bayesian analysis (Figure 2) they only show the different models plotted against Repeat 1 in the main and supplemental figures (which appears to be the bottom dataset of Figure 1E – it would be helpful if the different repeats were labelled in this figure). However, there appears to be significant differences between the 3 repeats shown in Figure 1E. Although it is difficult to see (in part because the horizontal line across each axis is at a different value of the Cross correlation coefficient – how was this value selected?) it appears that in repeats 2 and 3 the level of correlation in distant side branches is much lower (around 0). This appears particularly worrying given the results of the Bayesian analysis for Model VII (Figure 2—figure supplement 1). Although this model results in no correlation past first cousins, it gives the best fit to data for repeat 2 and 3 (better in fact than model 5, Figure 2—figure supplement 1), and only fails to reproduce replicate 1. This suggests that for 2 out of 3 datasets a model that assumes no correlations past first cousins is being selected as the best model from the Bayesian analysis. The authors should address this issue, and show plots of the Bayesian analysis against all 3 repeats.
In the rest of the figures, which dataset chosen is not consistent. Although they do display a fit of their growth/progression model to repeat 1 and 2 in the Figure 3—figure supplement 1, they now chose repeat 3 to display in Figure 3 – which appears to be the best fit to the model. In Figure 4, panel C shows cellcycle lengths for repeat 3, but it is not clear which repeat(s) is/are used in the calculation of the correlation ratios. It would be helpful if the authors were consistent with their data selection in the main figures and supplements throughout the paper.
2) Experimental evidence for cell size/cell cycle checkpoint model. The authors relate latent variables x_{1} and x_{2} to cell size and the cycling of cell cycle factors respectively. However, since these quantities are not dynamically tracked throughout cell growth, I have some questions about the authors' interpretation and model. First, the model assumes the existence of a size threshold. If size does not affect cell cycle progression (another assumption), that suggests smallborn cells are under a sizertype control, whereas largeborn cells (progression limited) are under a timertype control. Such a model seems difficult to square with ongoing work in other organisms, which typically find addertype (many singlecell microbes but also a number of mammalian cell lines, cf. Cadart et al., 2018) or sizertype (e.g. fission yeast) control. Could the authors comment on how their model is biologically justified and how does it relate to alternative models of cell growth? Timer control is usually not considered because, on its own, it does not generate stable cell size. Indeed, in the appendix the authors note that, in the progressionlimited regime, their model generates cells that are too large, so they implemented logistic, rather than exponential growth. In my understanding, this effectively implements a sizer again. What evidence do the authors have for these assumptions?
On the other hand, in subsection “Effects ofmolecular perturbations on cellcycle correlations correctly predicted by the model” the authors claim that the growth of MYCNinhibited cells is not affected. How did the authors verify this? If growth is logistic, as in the model, then growth rate is clearly affected beyond a certain size. Naively, I would also expect slower growth rates if growth is exponential and cell size is homeostatic. Additionally, the effects on cell size due to the addition of rapamycin appears weak (Figure 4B) – Is it possible to quantify this difference, with an estimation of errors?
https://doi.org/10.7554/eLife.51002.sa1Author response
Reviewer #1:
[…]
However, to interpret this section with a critical eye requires more intuition for the metric of discrimination between these hierarchical models. In particular, Equation 12 which appears to address this should be referenced and discussed in the main text.
The reviewer asks us to elaborate on model selection. We agree that this is a critical issue. In response, we have added in the main text a short intuitive explanation on Bayesian evidence used for model selection: “Selection is based on the Bayesian evidence, which rewards fit quality while naturally penalizing models of higher complexity (defined as being able to fit more diverse data sets; for details see Appendix 2, “Evidence calculation”).” To underpin this, we have greatly expanded on the fundamentals of Bayesian evidence and how we calculated it in Appendix 2, section Evidence calculation. Particular emphasis is placed on an intuitive explanation of how the penalty for overfitting is builtin to Bayesian model selection. While rather intuitive, this detailed explanation requires some technical preliminaries and recapitulates prior work of others. We therefore present it in an Appendix. In addition to the book by MacKay previously cited, we now also cite a review by Wasserman, 2000, that gives a lucid methodological overview of this topic.
With regards to the rest of the paper, the predictions of the authors when varying the limitations of either growth or cell cycle progression were unclear. This would be greatly aided by giving the reader intuition about what one would expect for either "pure" model (either a pure sizer for size control or cells which are completely limited by cell cycle progression).
The reviewer points out that developing intuition on how the growthprogression model functions would help understand the outcome of the perturbation experiments. We thank them for making us think further in this direction. We have now made several additions throughout the paper that build upon each other in giving an intuitive view on how cell growth and cellcycle progression interact in our model to yield the observed correlation patterns of cellcycle length. Before outlining this new material, we first remark that a pure “sizer” model of the type proposed here is a degenerate case that may obscure, rather than enlighten, what happens in our model. A pure sizer would simply cause exactly equal (no threshold noise) or slightly anticorrelated (with noise) cellcycle lengths in mother and daughters and have no longrange memory whatsoever. Of note, even the very fast proliferating ESCs in our manuscript are not described by a pure sizer. More generally, a pure sizer is also an unrealistic scenario in the light of recent data (e.g., Cadart et al., 2018, showing nearadder, rather than sizer, behavior for many mammalian cell lines), and hence we chose not to discuss it, mainly to avoid confusion. In the new sections building intuition for the model behavior, we emphasize from the start how growth and cycleprogression interact in generating longrange cellcycle correlations.
This intuitive explanation: “While growth and cellcycle progression are separable and heritable processes, they also interact. […] Thus, despite inheritance of growth and cellcycle regulators mothers and daughters may have very different cycle lengths due to this interaction.” This interaction is then the basis for both rapidly decaying ancestral (e.g., motherdaughter) and longrange intragenerational correlations in cellcycle length. This is now explained in a new paragraph: “To gain intuition …” The new paragraph makes use of Figure 3E (already present previously), showing that mother and daughter cell cycles become anticorrelated due to the abovedescribed influence of cellcycle progression on growth.
To clarify the origin of longrange intragenerational correlations further, we have now added a new Figure 3F (complemented by new Figure 3—figure supplement 1F) and a new explanation: “While cousins are positively correlated overall, this correlation is carried specifically by cousins that descend from a grandmother with a progressionlimited cell cycle (Figure 3F, blue dots), whereas cousins stemming from a growthlimited grandmother are hardly correlated (Figure 3F, orange dots). Since progressionlimited cells can grow large, this observation indicates that cousin correlations are mediated by inheritance of excess size, as is confirmed by conditioning cousin correlations on grandmother size…”
These findings show that both low motherdaughter correlations and high intragenerational (e.g., cousin) correlations are emergent phenomena of the interplay between growth and cellcycle progression. This intuition is then further developed in an amended paragraph on the perturbation experiments. In that paragraph, we found it quite helpful to first discuss the limiting case of a pure “timer”, i.e. pure progressionlimited cycles, as this is actually approximated experimentally by MYCN inhibition. Hence we have taken up, at least in part, the above concrete recommendation by the reviewer.
Additionally, the following main comments should be addressed:
1) How are the fitting parameters reevaluated when inhibiting growth or cell cycle progression relative to the control case?
This question relates to two points: (i) prediction of the effects of perturbations and (ii) application of the model to the experimental data from the perturbation experiments. First, for the prediction of perturbation effects (Figure 4A), we used the parameters for TET21N cells control conditions (replicate rep1). To simulate inhibition of cell growth, we lowered the growth rate k by 10%. To simulate inhibition of cellcycle progression, we increased the average cycle length via increasing the mean log of the progression time μ by 7.5%. These details were inadvertently omitted from the Materials and methods for which we apologize; we have now added these. Moreover, for ease of reference, we have now added a table with the numerical values of the parameters used in all the simulations as the new Figure 3—figure supplement 2, which complements Figure 3—figure supplement 1A, graphically depicting these values and their confidence bounds.
Second, after performing the perturbation experiments for growth and cellcycle progression (using rapamycin and doxocyclininducible downregulation of MYCN, respectively) we refitted the model to the experimental data, subjecting all model parameters to optimization. For inhibition of cellcycle progression, we found that the only parameter that showed a consistent difference between these perturbation data and control replicates 13 was μ, which was larger, corresponding to a longer cellcycle duration (see Figure 3—figure supplement 1A, myc1 and myc2). For inhibition of cell growth, several parameters changed in an informative manner. Consistent with the known rapamycin effect, the estimated growth rates were either at the lower end (rap2) or strongly decreased (rap1) by comparison with the values for the control experiments. At the same time, motherdaughter and sister cellcycle length correlations (a and g, respectively) increased. Mechanistically, such a tendency would be caused, for instance, by larger depletion of cellcycle inhibitors. This finding is consistent with the action of rapamycin, which inhibits mTOR and hence growth, but leaves ERK and PI3K activities unaffected, which drive cellcycle progression (e.g., Adlung et al., 2017). Hence, slower growth would delay attainment of the size threshold and thereby allow longer phases of degradation of inhibitors such as p21, which in turn should increase motherdaughter and sister correlations (Spencer et al., 2013). We added these results and a brief discussion of their biological significance in subsection “Cousin correlations reflect active cellsize checkpoint”, in connection with Figure 4.
2) Why do the authors switch to logistic growth during the MYCN growth? This is currently not even mentioned in the main text. It is concerning that in order to fit the MYCN growth case this fundamental assumption needed to be altered.
The reviewer correctly observes that when modeling cells with low MYCN, which proliferate slowly, we implement a limitation of growth rate via logistic growth, whereas in all other cases, we do not. This was done in this manner because we aimed at the simplest possible version of the growthprogression model for each particular situation. For all cases but the slowly growing and large lowMYCN cells, we obtained wellbounded distributions of the “size” (or resource) variable with a constant accumulation rate, i.e. exponential growth. To achieve this for lowMYCN cells, we needed to implement negative regulation of accumulation of the size variable, by introducing a logistic dependence of accumulation rate on size. This regulation kicks in at high values of the size variable that are very rarely reached when cells proliferate more rapidly, i.e. in highMYC cells. Hence some form of growth limitation will surely exist, but it is largely inconsequential for the more rapid cell cycles. We have also verified that including growth limitation has only minor effects on the model fit results in the control and growthinhibited conditions. In detail, we have made the following changes in response to this point of the reviewer:
1) We explain more clearly in the main text that logistic growth was used for lowMYCN cells (first when introducing the model and then again in connection with fits to perturbation experiments, Figure 4 C, D).
2) We now show in addition that the incorporation of a logistic growth limitation does not significantly alter the bestfit parameters and the corresponding fitted cellcycle correlations in lineage trees when cell cycles are faster, as in TET21N control conditions and under rapamycin treatment. These results are shown in the new Figure 4—figure supplement 2. Hence leaving out growth limitation for faster growing cells is a valid approximation here.
3) We now give a more concrete perspective on the implications of our findings on inheritance of cellcycle speed and cell size for cellsize regulation, (Discussion paragraph two), see also our response to point 3) below.
3) Is the agreement between the model predictions and the observed division time correlations robust to variations in the cell size control strategies assumed? There is increasing evidence that many mammalian cells follow an "adder" model (see for example Cadart et al., 2018 which is incorrectly referred to in the paper as supporting a critical size), so testing the model for this size control strategy would be an important test of the proposed model.
The reviewer raises the important question of size control strategies used by cells and refers to one of the basic phenomenological models, the adder model (a similar question was posed by reviewer #3, Point 2).
We thank the reviewer for pointing out the ambiguous referencing of Cadart et al., 2018, which we have now corrected and expanded as follows: “…shown that growth of many mammalian cells during the cell cycle adds a volume that only weakly increases with cell volume at birth (termed nearadder behavior”. Cadart et al. show that different mammalian cells fall within a continuum, ranging from sizer over adder to timer behavior, when quantifying cell size gain within one cell cycle. Our manuscript focuses on inheritance mechanisms that explain correlation patterns between cycles of related cells. Interestingly, our findings touch upon size control strategies, but from a different angle than the phenomenological timer, adder and sizer models. Our growthprogression model in effect combines “timer” cycles, when cycle duration is controlled by cellcycle progression, and “sizer” cycles, when cycle duration is controlled by growth. Overall, the mix leads to a pattern of mitotic size versus birth size that lies between a pure timer and a pure adder, a behavior that has been observed experimentally by Cadart et al. for mammalian cell lines (Author response image 1). This finding indicates that our model is principally consistent with findings by Cadart et al., although it does not explicitly implement an adder mechanism. Indeed, adder and sizer are phenotypic descriptions of cell growth behavior that could be realized by different types of cellular mechanisms. Given that the mechanistic basis of the fundamental findings by Cadart et al. is not fully understood, we regard it as beyond the scope of the current manuscript to test various hypotheses with our model. We indicate now in closing the Discussion that this may be a fruitful direction for future work.
4) The choice of cell cycle timing as a feature of interest was not adequately justified. The results would be stronger if the authors can use data on correlations in cell size between relatives to test Model V and its predictions.
As explained in our response to Point 3, our main focus is on the mechanisms that determine cellcycle timing.
5) The explanation of the results shown in Figure 4 does not provide clear intuition for why the correlations should be altered in the observed fashion. It would be helpful if the authors took more care when explaining this. For example: The authors show in Figure 3E that growthlimited cells have negative motherdaughter correlations in cell cycle timing. This being the case, why is it that the motherdaughter correlations in the presence of rapamycin (Figure 4D) are in fact larger than those correlations measured in the MYCN treated cells?
We agree that an intuitive explanation of the observed changes in correlation patterns was lacking. To give such an intuition, we need to interrogate the predictions of our growthprogression model. We have tried to do so in several places. First, we have added a more extensive discussion of how the interplay of growth and progression produce motherdaughter and cousincousin correlation (in subsection “Cell size and speed of cellcycle progression are antagonistic heritable variables”, connected to Figure 3E and F). We then explain in more detail how the experimental perturbations changed the growthprogression parameters (beginning with “In terms of fit parameters…”). In particular, rapamycin slows the growth rate but also increases the strength of inheritance of the progression process, consistent with a picture where prolonged cycles also make e.g. dilution of cellcycle inhibitors towards the end of the cycle more pronounced, as detailed in our response to the reviewer’s point 1) above. Thus, while growthlimited cycles are more frequent in presence of rapamycin, growth inhibition appears to also increase the inheritance of cell cycle factors, and together these effects produce the strong correlations shown in Figure 4D, including a motherdaughter correlation that is in fact higher than in MYCN inhibited conditions. We have added a statement to this effect in subsection “Cousin correlations reflect active cellsize checkpoint”.
Reviewer #2:
[…]
Main points:
1) While the modelling results match the data within the estimated error bounds, the correlation patterns seem to be less pronounced. For example, the increased correlation to first cousins compared to aunts in Figure 3C (and Figure 3—figure supplement 1D) is hardly reproduced in the model. Similarly, the correlations to the removed sidebranch seem to be about equal for all generations in the model, while the data indicates increased intragenerational correlations. It would be helpful if the authors comment on and explain such deviations or limitations of the model.
The reviewer points out potential differences in detail between correlation structures in the experimental data and in the model. We agree with this point. Both BAR model and growth progression models do not show an increase of correlations from aunts to cousins, although they do produce a moderate increase of correlation descending down the second side branch. As the reviewer also points out, the rather substantial measurement error bounds currently preclude us from going further in this direction. Even higherprecision data, or combination with other datasets (e.g., on detailed mechanisms of size regulation) may be necessary for further refining our models (for instance, resolving cellcycle phases) in the future. We now indicate this in the Discussion.
2) As a main assumption of the authors is that MYCN overexpression disrupts the circadian clock which has been previously linked to the inheritance patterns of cell cycle length, they should validate this in the cell line used. This could be done for example by quantifying BMAL1 expression in the presence/absence of doxycycline (see Figure 4—figure supplement 1A).
The reviewer asks for additional evidence that the circadian clock is downregulated by high MYCN levels. In response, we compared the RNA levels of circadian clock genes in TET21N cells with high MYCN expression to their counterparts with downregulated MYCN (based on RNAseq data obtained previously by us, Ryl et al., 2017). This analysis shows significant downregulation of four out of nine clock genes studied (PER1, PER3, CRY2, NR1D1 downregulated); the other five genes (ARNTL, PER2, CRY1, CLOCK, NR1D2) are not significantly changed. At least two of those latter genes appeared moderately downregulated (PER2, CLOCK, but this was not significant) while none appeared upregulated.
As MYC/MYCN is known to induce moderate changes in transcript levels to a large number of genes, we asked whether the entire gene module of the circadian clock was coherently regulated. Applying a standard genemodule approach (Robinson and Smyth, 2008), we compared the transcript level of the circadian clock module formed by the nine clock genes to the expression of randomly drawn modules with similar dispersion of transcript levels as the clock module. In this comparison, the clock module was strongly (and significantly) downregulated. We have added these new data as Figure 1—figure supplement 4 and refer to this figure in the main text.
In a line of work separate from this manuscript, we engineered a stable dualreporter TET21N cell line for imaging cell cycle (using the Cdt1 degron from the FUCCI markers) and circadian clock (fluorescent protein expression driven by NR1D1). In the first set of experiments, we found 1:1 phase locking with periods around 18 hours. As an example, Author response image 2 shows fluorescence traces of three cells. While analysis of more cell traces will be needed, this behavior is very similar to the 1:1 phase locking of cell cycle and circadian clock reported previously for mammalian cell lines (Bieler et al., 2014; Feillet et al., 2014), except that the clock in TET21N cells appears to be much noisier. The observed behavior is different from circadian gating or modulation of the cell cycle in cyanobacteria, which involves the circadian clock as an external oscillatory forcing with 24 hour period.
Reviewer #3:
Kuchen et al. have developed an interesting and general model that can explain the high degree of intragenerational correlation in cell cycle times that have been observed in multiple systems. First, using Bayesian inference on their own timelapse data, they find that a model of two unidirectionally coupled heritable components that affect cellcycle time best explains their data. They go on to propose that these components can be cell size and cellcycle speed, provided that growth and division are coupled via a minimum size checkpoint. They then verify this model through perturbation experiments. Although the paper appears to be of general interest to eLife readers, there are several points that require clarification before publication.
1) Representation of data. The authors present 3 repeats of the timelapse microscopy for neuroblastoma TET21N cell lineages. (Figure 1E). For the Bayesian analysis (Figure 2) they only show the different models plotted against Repeat 1 in the main and supplemental figures (which appears to be the bottom dataset of Figure 1E – it would be helpful if the different repeats were labelled in this figure). However, there appears to be significant differences between the 3 repeats shown in Figure 1E. Although it is difficult to see (in part because the horizontal line across each axis is at a different value of the Cross correlation coefficient – how was this value selected?) it appears that in repeats 2 and 3 the level of correlation in distant side branches is much lower (around 0). This appears particularly worrying given the results of the Bayesian analysis for Model VII (Figure 2—figure supplement 1). Although this model results in no correlation past first cousins, it gives the best fit to data for repeat 2 and 3 (better in fact than model 5, Figure 2—figure supplement 1), and only fails to reproduce replicate 1. This suggests that for 2 out of 3 datasets a model that assumes no correlations past first cousins is being selected as the best model from the Bayesian analysis. The authors should address this issue, and show plots of the Bayesian analysis against all 3 repeats.
The reviewer begins by pointing out that experimental data are not optimally presented in the manuscript. In particular, there was an error in Figure 1E, where two of the xaxes were inadvertently drawn at the wrong y position. We apologize for this drawing error and have corrected it. We have also added replicate labels in this figure panel.
The reviewer then continues to point out that the three control repeats differ with respect to the extent of longranging intragenerational correlations and, indeed, the selection of BAR models yields different results for repeat 1 versus repeats 2 and 3. We agree that these are important points which were not fully discussed in the original manuscript.
We understood that this discussion arises largely due to the fact that the results on BAR model selection are split between main figure (Figure 2) and supplement (Figure 2—figure supplement 1). To rectify this problem, we have revised these figures as follows:
1) In the revised Figure 2, the complete selection of BAR models is shown. In the accompanying revision of the main text, we now discuss in more detail that the extent of longranging correlation differs between the control replicates and that this fact impacts model selection. Specifically, longranging correlations favor Model V over Model VII. To illustrate this, we show BAR model fits (including Model VII) versus repeat 1 in the main figure.
2) To further emphasize this point we have now added the fits of the different BAR models also for experimental repeats 2 and 3, as also requested by the reviewer. To not overload the main figure, these are now shown in Figure 2—figure supplement 1.
The key conclusion of our model selection is that the existence of longrange correlations selects BAR models that with two latent variables that show both selfinheritance and crossinheritance, as exemplified by Model V. We exclude Model VII because it cannot account for the full data.
This conclusion is then corroborated subsequently in the manuscript by the rapamycin experiments with TET21N cells (Figure 4—figure supplement 1H) and by the ESC data (Figure 5—figure supplement 1D). where, again, Model VII is rejected due to longrange correlations. Note that in these special cases, Model IV (which is a more complex “version” of Model V) would be acceptable. Finally, the MYCN inhibited experiments are compatible and even prefer Model VII, due to the absence of longrange correlations (Figure 4—figure supplement 1G). Overall, only Model V can account acceptably for the full set of perturbed data, making it the only candidate for a model that describes a general mechanism. This selection results already when taking only the set of MYCNhigh experiments rep13 into account.
We have corrected the mislabeled caption of Figure 4—figure supplement 1. In the Discussion section we now added a sentence stating more precisely why the BAR Model V is favored overall, summarizing the above arguments.
In the rest of the figures, which dataset chosen is not consistent. Although they do display a fit of their growth/progression model to repeat 1 and 2 in the Figure 3—figure supplement 1, they now chose repeat 3 to display in Figure 3 – which appears to be the best fit to the model. In Figure 4, panel C shows cellcycle lengths for repeat 3, but it is not clear which repeat(s) is/are used in the calculation of the correlation ratios. It would be helpful if the authors were consistent with their data selection in the main figures and supplements throughout the paper.
The reviewer points out that, in Figures 3 and 4, fits of the growthprogression model are shown for different experimental replicates while the fits to the remaining repeats are shown in the supplements to these figures. We agree that this may appear confusing and have now chosen in a consistent manner to show fits to replicate rep1 in the main figures – Figure 2 for the BAR model, Figure 3 for the growthprogression model, and again as control in Figure 4C (perturbation experiments) – while fits to repeats 2 and 3 are shown in the respective figure supplements. This reordering of figure panels does not influence the conclusions drawn (see above). Also, for the guidance of the reader, we added an overview of all timelapse experiments used in the paper, as the new Figure 1—figure supplement 2.
2) Experimental evidence for cell size/cell cycle checkpoint model. The authors relate latent variables x_{1} and χ2 to cell size and the cycling of cell cycle factors respectively. However, since these quantities are not dynamically tracked throughout cell growth, I have some questions about the authors' interpretation and model. First, the model assumes the existence of a size threshold. If size does not affect cell cycle progression (another assumption), that suggests smallborn cells are under a sizertype control, whereas largeborn cells (progression limited) are under a timertype control. Such a model seems difficult to square with ongoing work in other organisms, which typically find addertype (many singlecell microbes but also a number of mammalian cell lines, cf. Cadart et al., 2018) or sizertype (e.g. fission yeast) control. Could the authors comment on how their model is biologically justified and how does it relate to alternative models of cell growth? Timer control is usually not considered because, on its own, it does not generate stable cell size.
The reviewer raises the important question of how our growthprogression model relates to recent experimental work on size control mechanisms in mammalian cells and fission yeast (a similar point was also raised by reviewer #1, Point 3).
The reviewer correctly points out that our growthprogression model combines “timer” cycles, when cycle duration is controlled by cellcycle progression, and “sizer” cycles, when cycle duration is controlled by growth. Overall, the mix leads to a pattern of mitotic size versus birth size that lies between a pure timer and a pure adder (see Author response image 1). Indeed, Cadart et al. show that different mammalian cells fall within a continuum from timer to adder to sizer when quantifying cell size gain during the cycle. Hence, our specific model lies within this continuum, between timer and adder. It is important here to point out that timer, adder and sizer are phenomenological models that do not rely on actual molecular mechanisms of size control. It is conceivable that a specific molecular mechanism may generate something different from any such pure model. The fact that our growthprogression model (a phenomenological model that per se does not focus on size control) creates a pattern of cell size gain intermediate between pure timer and pure sizer is interesting in this respect. However, as the mechanistic basis of the fundamental findings by Cadart et al. is not fully understood, we feel it is beyond the scope of the current manuscript to address these questions. We indicate now in closing the Discussion that this may be a fruitful direction for future work.
Indeed, in the appendix the authors note that, in the progressionlimited regime, their model generates cells that are too large, so they implemented logistic, rather than exponential growth. In my understanding, this effectively implements a sizer again. What evidence do the authors have for these assumptions?
On the other hand, in subsection “Effects ofmolecular perturbations on cellcycle correlations correctly predicted by the model” the authors claim that the growth of MYCNinhibited cells is not affected. How did the authors verify this? If growth is logistic, as in the model, then growth rate is clearly affected beyond a certain size. Naively, I would also expect slower growth rates if growth is exponential and cell size is homeostatic.
The assumption of a minimumsize threshold in the growth progression model is clearly a parsimonious assumption intended to keep the number of model parameters to a minimum. There is now conclusive work showing that cells do sense their size and can adjust their growth rate as a result (e.g. Ginzburg et al., 2018). Indeed, we used such a mechanism – implemented as a logistic dependence of growth rate on size – to avoid size divergence when the fraction of growthlimited cycles becomes very small and progressionlimited cycles dominate (as in the case of MYCN knockdown). Thus the critical question arises whether this type of sizer regulation would work in general in the growthprogression model. We have now tested this and found that incorporation of a logistic growth limitation does not significantly alter our results on cellcycle correlations in lineage trees when cell cycles are faster, as in TET21N control conditions. This shows that our conclusion remain robust when feedback control of growth rate by size is incorporated.
Importantly, growth rate regulation kicks in at high values of the size variable that are very rarely reached when cells proliferate more rapidly. Hence some form of growth limitation will surely exist, but it is largely inconsequential for the more rapid cell cycles. Hence leaving out growth limitation for faster growing cells is a valid approximation.
These new results are shown in the new Figure 4—figure supplement 2 and commented on in the Results as well as in the last paragraph of the Discussion.
Additionally, the effects on cell size due to the addition of rapamycin appears weak (Figure 4B) – Is it possible to quantify this difference, with an estimation of errors?
Precise cell size (mass or volume) measurement is technically hard (see e.g., Tzur et al. 2009; Ginzberg et al., 2018; Cadart et al., 2018) and is not in the focus of the current paper. We used, as a robust proxy, the forward scatter in the flow cytometer. This does not allow quantification of size changes. We fully agree with the reviewer that the size decrease upon rapamycin treatment is small (we used doses of rapamycin that had a growthinhibitory effect but did not stall cell cycling altogether). In response to the point by the reviewer, we now show two biological replicates sidebyside in amended Figure 4B for each condition (control, rapamycin, MYCN inhibition). These data show that the small size decrease upon rapamycin treatment is completely reproducible.
https://doi.org/10.7554/eLife.51002.sa2Article and author information
Author details
Funding
Bundesministerium für Bildung und Forschung (0316076A)
 Thomas Höfer
Bundesministerium für Bildung und Forschung (01ZX1307)
 Thomas Höfer
Bundesministerium für Bildung und Forschung (031L0087A)
 Thomas Höfer
Seventh Framework Programme (FP7HEALTH2010 ASSET 259348)
 Thomas Höfer
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Frank Westermann and Tatjana Ryl for TET21N cells and laboratory setup; Timm Schroeder, Fabian Theis and Carsten Marr for embryonic stem cell data and discussions; Alessandro Greco for differential gene expression analysis and all members of the Höfer group for discussions. Grant support to TH from BMBF (MYCNET 0316076A and SysmedNB 01Z × 1307), EU (FP7HEALTH2010 ASSET 259348), BMBF and EU (EraCoSysMed OPTIMIZENB 031L0087A), as well as DKFZ core funding are gratefully acknowledged; TH is a member of CellNetworks.
Senior and Reviewing Editor
 Naama Barkai, Weizmann Institute of Science, Israel
Publication history
 Received: August 10, 2019
 Accepted: January 22, 2020
 Accepted Manuscript published: January 23, 2020 (version 1)
 Version of Record published: February 13, 2020 (version 2)
Copyright
© 2020, Kuchen et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,775
 Page views

 282
 Downloads

 10
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Computational and Systems Biology
 Neuroscience
Epifluorescence miniature microscopes ('miniscopes') are widely used for in vivo calcium imaging of neural population activity. Imaging data is typically collected during a behavioral task and stored for later offline analysis, but emerging techniques for online imaging can support novel closedloop experiments in which neural population activity is decoded in real time to trigger neurostimulation or sensory feedback. To achieve short feedback latencies, online imaging systems must be optimally designed to maximize computational speed and efficiency while minimizing errors in population decoding. Here we introduce DeCalciOn, an opensource device for realtime imaging and population decoding of in vivo calcium signals that is hardware compatible with all miniscopes that use the UCLA Data Acquisition (DAQ) interface. DeCalciOn performs online motion stabilization, neural enhancement, calcium trace extraction, and decoding of up to 1024 traces per frame at latencies of <50 ms after fluorescence photons arrive at the miniscope image sensor. We show that DeCalciOn can accurately decode the position of rats (n=12) running on a linear track from calcium fluorescence in the hippocampal CA1 layer, and can categorically classify behaviors performed by rats (n=2) during an instrumental task from calcium fluorescence in orbitofrontal cortex (OFC). DeCalciOn achieves high decoding accuracy at short latencies using innovations such as fieldprogrammable gate array (FPGA) hardware for real time image processing and contourfree methods to efficiently extract calcium traces from sensor images. In summary, our system offers an affordable plugandplay solution for realtime calcium imaging experiments in behaving animals.

 Computational and Systems Biology
 Immunology and Inflammation
Highthroughput sequencing of adaptive immune receptor repertoires is a valuable tool for receiving insights in adaptive immunity studies. Several powerful TCR/BCR repertoire reconstruction and analysis methods have been developed in the past decade. However, detecting and correcting the discrepancy between real and experimentally observed lymphocyte clone frequencies is still challenging. Here we discovered a hallmark anomaly in the ratio between read count and clone countbased frequencies of nonfunctional clonotypes in multiplex PCRbased immune repertoires. Calculating this anomaly, we formulated a quantitative measure of V and Jgenes frequency bias driven by multiplex PCR during library preparation called Over Amplification Rate (OAR). Based on the OAR concept, we developed an original software for multiplex PCRspecific bias evaluation and correction named iROAR: Immune Repertoire Over Amplification Removal (https://github.com/smiranast/iROAR). The iROAR algorithm was successfully tested on previously published TCR repertoires obtained using both 5' RACE (Rapid Amplification of cDNA Ends)based and multiplex PCRbased approaches and compared with a biological spikeinbased method for PCR bias evaluation. The developed approach can increase the accuracy and consistency of repertoires reconstructed by different methods making them more applicable for comparative analysis.