Patterns of interdivision time correlations reveal hidden cell cycle factors
Abstract
The time taken for cells to complete a round of cell division is a stochastic process controlled, in part, by intracellular factors. These factors can be inherited across cellular generations which gives rise to, often nonintuitive, correlation patterns in cell cycle timing between cells of different family relationships on lineage trees. Here, we formulate a framework of hidden inherited factors affecting the cell cycle that unifies known cell cycle control models and reveals three distinct interdivision time correlation patterns: aperiodic, alternator, and oscillator. We use Bayesian inference with singlecell datasets of cell division in bacteria, mammalian and cancer cells, to identify the inheritance motifs that underlie these datasets. From our inference, we find that interdivision time correlation patterns do not identify a single cell cycle model but generally admit a broad posterior distribution of possible mechanisms. Despite this unidentifiability, we observe that the inferred patterns reveal interpretable inheritance dynamics and hidden rhythmicity of cell cycle factors. This reveals that cell cycle factors are commonly driven by circadian rhythms, but their period may differ in cancer. Our quantitative analysis thus reveals that correlation patterns are an emergent phenomenon that impact cell proliferation and these patterns may be altered in disease.
Editor's evaluation
This work makes an important contribution to the study of the cell cycle and inferring mechanisms by studying correlations in division timing between single cells. By treating the problem in a general way and computing over lineage trees, the authors can infer timescales in the underlying mechanism. The method is validated on data sets from bacterial and mammalian cells and can suggest when additional measurements are needed to distinguish competing models.
https://doi.org/10.7554/eLife.80927.sa0Introduction
Cell proliferation, the process of repeated rounds of DNA replication and cell division, is driven by multiple cell extrinsic and intrinsic factors (Matson and Cook, 2017; Darzynkiewicz et al., 1982). Stochasticity in any or all of these factors therefore influences the time taken for a cell to divide, generating heterogeneity in cell cycle length, even in genetically identical populations. For example, stochastic gene expression (Elowitz et al., 2002) can lead to heterogeneity in cell cycle length (Kiviet et al., 2014; Ghusinga et al., 2016; Thomas et al., 2018) as these fluctuations can be propagated by concerted cellular cues (Co et al., 2017). These cues can exhibit reproducible stochastic patterns that are important in development, homeostasis and ultimately, for cell survival (Raser and O’Shea, 2005).
Singlecell technologies illuminate a world of cellular variation by replacing bulkaverage information with singlecell distributions. A key challenge is to exploit celltocell variability to identify the mechanisms of cellular regulation and responses (Raser and O’Shea, 2005; Martins and Locke, 2015). Timelapse microscopy allows us to resolve cell dynamics such as division timing, growth and protein expression (Locke and Elowitz, 2009, Figure 1a, left). This has led to many discoveries in cell cycle dynamics in bacteria (TaheriAraghi et al., 2015; Martins et al., 2016; Martins et al., 2018; Kohram et al., 2021) and mammalian cells (Barr et al., 2017; Arora et al., 2017; Ryl et al., 2017; Chakrabarti and Michor, 2020). Early advances included measuring the distribution of division times across single cells (Powell, 1956) and the correlations between cellular variables leading to cell size homeostasis (TaheriAraghi et al., 2015), while more recent applications of timelapse microscopy have captured multiple generations of proliferating cells, making lineage tracing possible (Errington et al., 2013; Cooper and Bakal, 2017).
While singlecell distributions measure variation between cellular variables, they ignore both temporal signals and variations propagating across generations to entire lineage trees (Ulicna et al., 2021; Kuchen et al., 2020; Sandler et al., 2015; Cowan and Staudte, 1986). These lineage tree correlation patterns can be robust and steady, similar to what is known in spatiotemporal pattern formation (Turing, 1990; Chaplain et al., 1999). Common examples of lineage tree correlation patterns concern the motherdaughter and the sister correlations that have been used to study cell size homeostasis in E. coli (TaheriAraghi et al., 2015; Ho et al., 2018) and other mechanisms generating correlated interdivision times such as population growth rate (Powell, 1956) and initiation of DNA synthesis (Cooper, 1982).
A counterintuitive correlation pattern presented by many cell types is the ‘cousinmother inequality’ (Sandler et al., 2015), where the interdivision times of cousin cells are more correlated than those of motherdaughter pairs. This inequality can be observed both in bacteria and mammalian cells (Figure 1b). More generally, lineage tree data gives rise to correlation patterns by comparing a single cell to any other cell on the tree (Figure 1a, right). Family relations – such as daughter, grandmother, cousin cells etc. – encode inheritance patterns, and correlations between these related cells have been used to understand the dynamics of cell populations (Mohammadi et al., 2021; Nakashima et al., 2020, Figure 1c). Several stochastic models have been proposed to explain interdivision time correlation patterns. Most of them make prior assumptions on the underlying mechanism controlling cell division such as those focusing on cell size control (Ho et al., 2018), DNA replication (Cooper and Helmstetter, 1968; Cooper, 1982) or underlying oscillators (Kohram et al., 2021). For example, inheritance of DNA content can explain the correlation in interdivision time between sister cells in bacteria (Cooper, 1982). Similarly, it has been shown that a simple model with interdivision time correlations (Cowan and Staudte, 1986) cannot satisfy the ‘cousinmother inequality’ (Sandler et al., 2015), but a more complex kicked cell cycle model does (Mosheiff et al., 2018). It is presently unclear what information correlation patterns carry about the underlying mechanisms that generate them. This is because a unified and systematic framework to generate any desired interdivision time correlation pattern is lacking.
Here, we propose a stochastic model to investigate how cell cycle factors – which we define in this work as hidden properties that affect interdivision time – shape the lineage tree correlation patterns of cells. These could include physiological factors, such as cell size, growth rate and cell cycle checkpoints, or specific cell cycle drivers such as CDKs, mitogens and division proteins. We will only focus on data describing patterns of interdivision time in bacterial and mammalian cell types, which circumvents intricate measurements of cell volume, mass, and DNA replication. This also avoids dealing with fluorescent reporter strains that may be difficult to engineer depending on cell type. We propose a generative model of correlation patterns that involves a number of hidden cell cycle factors and reduces to common mechanistic cell cycle models for specific parameter choices. Our theory predicts three distinct lineage correlation patterns; aperiodic, alternator and oscillator. We demonstrate how the model can be used to identify these patterns using Bayesian inference in bacteria and mammalian cells. Our analysis reveals several dynamical signatures of cell cycle factors hidden in lineage tree interdivision time data.
Results
A general inheritance matrix model provides a unified framework for lineage tree correlation patterns
Previous studies (Cowan and Staudte, 1986; Sandler et al., 2015) found that simple inheritance rules, where interdivision times are correlated from one generation to another through a single parameter, cannot explain the lineage correlation patterns seen in experimental singlecell data. To address this issue, we propose a unified framework where the interdivision time is determined by a number of cell cycle factors that represent hidden variables such as cell cycle phase lengths, protein levels, cell growth rate or other unknowns (Figure 1c), that each have their own inheritance pattern.
The states of the cell cycle factors is assumed to be a vector ${\mathit{y}}_{p}={({y}_{p,1},{y}_{p,1},\mathrm{\dots},{y}_{p,N})}^{\top}$ that determine the interdivision time of a cell with index $p$ via
Inheritance from mother to daughter of the $N$ cell cycle factors is described by a nonlinear stochastic Markov model on a lineage tree:
where $m\text{in}\mathbb{N}$ denotes the mother cell index and $2m$ and $2m+1$ the daughter cell indices. $f:{\mathbb{R}}_{+}^{N}\to {\mathbb{R}}_{+}$ and $\mathit{g}:{\mathbb{R}}_{+}^{N}\to {\mathbb{R}}_{+}^{N}$ are possibly nonlinear functions that model the dependence of the interdivision time on cell cycle factors and the inheritance process. ${\mathit{e}}_{p}={({e}_{p,1},{e}_{p,2},\mathrm{\dots},{e}_{p,N})}^{\top}$ is a noise vector for which the pair ${\mathit{e}}_{2m},{\mathit{e}}_{2m+1}$ are identically distributed random vectors with covariance matrix independent of $m$. A nonzero covariance between these noise vectors can account for correlated noise of sister cells. We implicitly assume symmetric cell division such that the deterministic part of the inheritance dynamics $\mathit{g}$ is identical between the daughter cells. Note that we choose Equation 1a to be deterministic since division noise can be modelled by adding one more cell cycle factor that does not affect inheritance dynamics $\mathit{g}$.
The general model (Equation 1a and b) includes many known cell cycle models as a special case. For example, the interactions between cell cycle factors could model cell size control mechanisms (Appendix 1  Section A6.1), the coordination of cell cycle phases (Appendix 1  Section A6.3), or deterministic cues, such as periodic forcing of the cell cycle (Appendix 1  Section A7.1), or coupling of the circadian clock to cell size control (Appendix 1  Section A7.2).
The full model can only be solved for specific choices of $f$ and $\mathit{g}$, and these functions are generally unknown in inference problems. To overcome this limitation, we assume small fluctuations resulting in an approximate linear stochastic system (see Appendix 1  Section A1 for a derivation) involving the interdivision time
The vector of cell cycle factor fluctuations ${\mathit{x}}_{p}={({x}_{p,1},{x}_{p,2},\mathrm{\cdots},{x}_{p,N})}^{\top}$ obeys
Here, $\overline{\tau}$ is the stationary mean interdivision time, $\mathit{\theta}$ is the $N\times N$inheritance matrix and ${\mathit{z}}_{2m}$ and ${\mathit{z}}_{2m+1}$ are two noise vectors of length $N$ that capture the stochasticity of inheritance dynamics and differentiate the sister cells (Figure 2a). We denote the $N\times N$ covariance matrices ${\mathit{S}}_{1}=\mathrm{Var}({\mathit{z}}_{2m})=\mathrm{Var}({\mathit{z}}_{2m+1})$ and ${\mathit{S}}_{2}=\mathrm{Cov}({\mathit{z}}_{2m},{\mathit{z}}_{2m+1}),\text{for all}m\text{in}\mathbb{N}$ of the noise terms $\mathit{z}$ (and $\mathit{e}$) in individual cells and between sister cells, respectively. The noise terms are independent for all other family relations. The cell cycle factor fluctuations are scaled such that $\mathit{\alpha}={({\alpha}_{1},{\alpha}_{2},\mathrm{\dots},{\alpha}_{N})}^{\top}$ is a binary vector of length $N$ made up of 1 s and 0 s depending on whether the function $f$ determining the interdivision time has dependence on a given cell cycle factor (see Appendix 1  Section A1 for details). Under this scaling each cell cycle factor has a positive effect on the interdivision time, and hence we do not distinguish between factors with positive or negative effects on interdivision time.
When the special case of a single cell cycle factor ($N=1$) is considered, the inheritance matrix model system reduces to a wellknown model with correlated division times (Cowan and Staudte, 1986; Staudte et al., 1984; Staudte, 1992; Staudte et al., 1997), and we will refer to this case as simple inheritance rules (see also Appendix 1  Section A5). In the following, we will explore the correlation patterns generated by multiple cell cycle factors.
The inheritance matrix model reveals three distinct interdivision time correlation patterns
Here, we define a correlation pattern to be the correlation coefficients of pairs of cells on a lineage tree. Here we introduce a function $\rho (k,l)$ which we call the generalised tree correlation function:
where ${\tau}_{k}$ and ${\tau}_{l}$ are the interdivision times of cells in the pair $(k,l)$, and ${s}_{\tau}$ is the interdivision time variance. The coordinate $(k,l)$ describes the distance in generations from each cell in the pair to their shared nearest common ancestor (Figure 2b and c). We have derived a closedform formula for $\rho (k,l)$ (Equation M3) in Materials and Methods  ‘Analytical solution of the inheritance matrix model’; (see Appendix 1  Section A3 for a full derivation) as a weighted sum of powers of the inheritance matrix eigenvalues $\lambda $:
with
We observe that the eigenvalues determine the dependence of the tree correlation function on $k$ and $l$, while the noise matrices ${\mathit{S}}_{1}$ and ${\mathit{S}}_{2}$ determine their relative weights ${w}_{ij}$ (see Equation 5).
Our theoretical analysis reveals three distinct correlation patterns that can be generated by the inheritance matrix model (further details in Materials and methods  ‘Analysis of tree correlation patterns’). These can be classified by the eigenvalues of the inheritance matrix $\mathit{\theta}$: (i) if the inheritance matrix exhibits real positive eigenvalues, we observe an aperiodic pattern (Figure 2d); (ii) if the inheritance matrix has real eigenvalues with at least one negative eigenvalue, we observe an alternator pattern (Figure 2e); and (iii) if there is a pair of complex eigenvalues we observe an oscillator pattern (Figure 2f). An intuitive interpretation of the eigenvalue decomposition is that it transforms the cell cycle factors into effective factors inherited independently. Hence, the inheritance matrix is diagonal in this basis. However, the analogy is limited to the case where the inheritance matrix is symmetric and the eigenvalues are real. For simplicity, we will focus on models with two cell cycle factors and note that in higher dimensions ($N\ge 3$), the correlation patterns involve a mixture of the three patterns discussed in detail in this section (Appendix 1—figure 6c, d and g,h).
To demonstrate the aperiodic correlation pattern, we utilise an inheritance matrix with positive real eigenvalues (Figure 2d). Characteristically, the modelled interdivision time correlations decay to zero as the distance to the most recent ancestor increases (Figure 2g) since the eigenvalues in Equation 4 are bounded between 0 and 1. To look more closely at the patterns on the tree, we utilise two reductions of the generalised tree correlation function. These are the lineage correlation function ($\rho (k,l)$ for $k$ or $l=0$) and the crossbranch correlation function ($\rho (k,l)$ for $k=l$). We look at these functions for continuous $k,l$ to visualise better the patterns that occur down the lineage and across the branches of the tree. The lineage correlation function gives the correlation dynamics as you go down the lineage tree, whereas the crossbranch correlation function gives the correlation dynamics as you move across neighbouring branches of the lineage tree. We observe that the interdivision time correlations decrease as we move both across generations and branches (Figure 2j).
In contrast, the alternator pattern generates oscillations with a fixed period of two generations in the lineage correlation function. The behaviour is typically observed for cell cycle factors with negative motherdaughter correlations (Appendix 1  Section A6.1). In this case, we have at least one negative eigenvalue and thus Equation 4 will alternate between positive and negative values for successive generations, producing the period two oscillation. We demonstrate this correlation pattern for the generalised tree correlation function (Figure 2h) using a diagonal $\mathit{\theta}$ matrix (Figure 2e). We observe alternating correlations across generations in the lineage correlation function, and the continuous interpolation of the crossbranch correlation function (Figure 2k). Although the period is fixed to two generations, the amplitude of the correlation oscillation varies with the absolute magnitude of the eigenvalues (Materials and methods  ‘Analysis of tree correlation patterns’).
To investigate the oscillator correlation pattern, we propose a hypothetical inheritance matrix $\mathit{\theta}$ with eigenvalues $\mathit{\lambda}=(D{e}^{+i\frac{2\pi}{P}},D{e}^{i\frac{2\pi}{P}})$ which are complex for $D,P\ne 0$ and $P\ne \frac{2}{k},k\text{in}\mathbb{Z}$ (Figure 2f). The parameters $P$ and $D$ control the period and the respective damping of an underlying oscillator, i.e., the limit $D\to 1$ leads to an undamped oscillation and $D\to 0$ corresponds to an overdamped oscillation (see Materials and methods  ‘Determining the period of correlation oscillations from the eigenvalues’ for details). Correspondingly, the graph of the generalised tree correlation function (Figure 2i) shows clear oscillations across generations. These correlation oscillations are also evident in the lineage correlation function but are absent in the crossbranch correlation function (Figure 2l). However, oscillations are possible in the cross branch correlation function for other choices of $\mathit{\theta}$ with complex eigenvalues (see model fits in Figure 3 and Methods  ‘Analysis of tree correlation patterns‘). In summary, the qualitative behaviour of the interdivision time correlation patterns can be studied using the eigenvalue decomposition of the inheritance matrix $\mathit{\theta}$.
The cousinmother inequality is not required to generate complex correlation patterns
Our analysis shows that of the three specified patterns, only the oscillator pattern cannot arise from simple inheritance rules. This is because it requires at least two inherited cell cycle factors ($N\ge 2$) for the inheritance matrix to possess complex eigenvalues. We therefore asked whether the oscillator pattern is necessary for the cousinmother inequality to be satisfied. We find that this is not the case, but instead, all three correlation patterns can be compatible with the cousinmother inequality if $N\ge 2$. To demonstrate this, we choose three specific twodimensional inheritance matrices $\mathit{\theta}$ that produce the required eigenvalue structure (Figure 2d–f). We then use these matrices with our analytical solution for the generalised tree correlation function (Materials and methods  ‘Analytical solution of the inheritance matrix model’) to map the regions where the cousinmother inequality can be satisfied (Figure 2m–o). Interestingly, we find that oscillations can arise even in parameter regions that violate the cousinmother inequality (Figure 2o). We conclude that both the cousinmother inequality and the oscillator pattern are sufficient but not necessary conditions to rule out simple inheritance rules.
To understand which datasets can be explained by simple inheritance rules, we fit the onedimensional model ($N=1$) to six publicly available lineage tree datasets (Appendix 1—table 1) using Bayesian methods (Materials and methods  ‘Data analysis and Bayesian inference of the inheritance matrix model’). These datasets were chosen as they each had a sufficient number of cells for correlation analysis and covered a broad range of cell types. We found that the model fit is poor for the datasets that display the cousinmother inequality, which is the case for cyanobacteria, clockdeleted cyanobacteria, neuroblastoma and human colorectal cancer cells (Appendix 1—figure 1a–f). Despite not obeying the cousinmother inequality, the fit is also poor for mouse embryonic fibroblasts (Appendix 1—figure 1f) as the median inferred correlation lies outside the 95% confidence intervals for both the grandmother and cousin correlations which are included in the model fit, and the confidence intervals for the data vs the credible intervals from the inference show minimal overlap (Appendix 1—figure 2f). Another inequality may be violated in this dataset that cannot be explained using the onedimensional model, suggesting that the absence of the cousinmother inequality cannot rule out more complex division rules. The only cell type that has a good fit for the onedimensional model is mycobacteria (Appendix 1—figure 1c). We thus conclude that the majority of the datasets must be described by higher dimensional inheritance dynamics of multiple cell cycle factors.
The twodimensional inheritance matrix model fits interdivision time correlation patterns from a range of cell types
We asked whether the correlation patterns are better described by a twodimensional inheritance matrix model. Bayesian inference (Materials and methods  ‘Data analysis and Bayesian inference of the inheritance matrix model’) produced a good model fit for all six datasets (Figure 3a–f) for the twofactor inheritance matrix model, within relatively narrow error bars of mother, grandmother, sister and cousin correlations (Appendix 1—table 1). The credible intervals from the Bayesian inference matched the confidence intervals of correlations used for fitting (Appendix 1—figure 2). We quantified the quality of our fits using the Akaike information criterion (AIC) (Materials and methods  ‘Data analysis and Bayesian inference of the inheritance matrix model’, (Equation M11)) for each dataset and compared these to the onedimensional model (Appendix 1—table 1). The AIC estimates the goodness of fit with a penalty for model complexity allowing us to select the simplest model that explains the data. The AIC values indicate that the inheritance matrix model with two cell cycle factors provides the simplest fit for all cell types used here, except for the mycobacteria data where simple inheritance rules provided an equally good fit with a significant reduction in the number of model parameters. We expected the AIC to select the two dimensional model where the cousinmother inequality was satisfied such as in cyanobacteria, clockdeleted cyanobacteria, neuroblastoma, and human colorectal cancer cells. The match with the twofactor inheritance matrix model in fibroblasts was less obvious.
Crucially, we find that the model has a good predictive capacity for correlations further down the lineage tree. For each pattern, we show several samples from the conditional posterior distribution (solid and shaded lines) to illustrate fits of the lineage correlation and crossbranch correlation function (Figure 3a–f). For all datasets except neuroblastoma, the curves also intercept the greatgrandmother and greatgreatgrandmother correlations that were not used for fitting (Figure 3a–d and f), and bootstrapped confidence intervals from the data overlapped with the credible intervals obtained from Bayesian inference (Appendix 1—figure 2). We then asked which correlation patterns underlie the data. To assess this, we calculated the eigenvalues of each posterior sample of the inheritance matrix to categorise the aperiodic, alternator and oscillator patterns (Figure 3a–f, bar charts). We found that in every dataset, the dominant correlation pattern was identifiable with probabilities well above 50%, except for mycobacteria (Figure 3c) that was better described by simple inheritance rules (Appendix 1—figure 1c).
Cyanobacteria, (Figure 3a), human colorectal cancer (Figure 3d), and mouse embryonic fibroblasts (Figure 3f) display a dominant oscillator pattern, but we see that their lineage correlation functions exhibit widely different periodicities. For example, the posterior lineage correlation for cyanobacteria displays a higher frequency oscillation than those in human colorectal cancer cells and fibroblasts. Clockdeleted cyanobacteria (Figure 3b) and mycobacteria (Figure 3c) display a dominant alternator pattern which could be induced by strong sister correlations. We see that clockdeleted cyanobacteria (Figure 3b) has a 100% alternator pattern in contrast to the 100% oscillator pattern seen for wild type cyanobacteria, suggesting that the deletion of the clock gene has completely transformed the correlation pattern and has abolished the underlying oscillation. Neuroblastoma (Figure 3e) displays a dominant aperiodic pattern. The predictive capacity for this cell type is weaker than for the other datasets, which we assume is due to the tight confidence interval in the correlations. Despite this discrepancy, we find that the inheritance matrix model produces excellent fits and has good predictive capacity for all other cell types studied in this work.
Bayesian inference reveals that individual inheritance parameters are not identifiable
We next ask which mechanisms are responsible for generating the observed correlation patterns. The Bayesian inference used for model fitting (Materials and methods  ‘Data analysis and Bayesian inference of the inheritance matrix model’) samples parameters using a MCMC Gibbs sampler. The Gibbs sampler can be thought of as a random walk in parameter space that settles around parameter regions with high likelihood. We found that the explorations of the Gibbs sampler did not settle in a particular parameter subspace but meandered off to explore vast areas of the parameter space without improving the likelihood values (Appendix 1—figure 3a and b). Such behaviour is expected when model parameters are not identifiable and the posterior distribution of parameters cannot be efficiently sampled (Rannala, 2002; Raue et al., 2013).
To provide further evidence of unidentifiablity, we obtained four histograms of a single parameter of the inheritance matrix for different initialisations. The four distributions are very different (Figure 4a), showing that the random walk does not settle to a stationary distribution. We further observe that the mean squared displacement increases without bound (Figure 4b) showing that the sampling does not settle in a particular subset of the parameter space. In contrast to the individual parameters, the sampled posterior distribution of the eigenvalues is consistent across the averages (Figure 4c) and their mean squared displacement converges rapidly (Figure 4d). We note that unidentifiability arises for the inheritance matrix model with multiple cell cycle factors and does not feature for simple inheritance rules (Appendix 1  Section A5). This ultimately demonstrates that the interdivision time correlation patterns do not identify a single set of inheritance parameters, but rather need to be described by a distribution of inheritance mechanisms.
The inheritance matrix model predicts the hidden dynamical correlations of cell cycle factors
Clockdeleted cyanobacteria and neuroblastoma both satisfy the cousinmother inequality (Figure 1b), which indicates that at least two cell cycle factors are responsible for the corresponding correlation patterns. The eigenvalues of the inheritance matrix concentrate in different regions of the admissible parameter space (Figure 4e), suggesting the correlation patterns that generate the cousinmother inequality are distinct. For the clockdeleted cyanobacteria dataset, we found that all posterior samples were consistent with an alternator correlation pattern, while most posterior samples presented aperiodic correlation patterns in neuroblastoma (Figure 3b and e bar charts).
We hypothesised that different inheritance models generate these patterns. To verify this hypothesis and since we cannot identify the cell cycle factors directly, we computed the motherdaughter correlations between the two hidden cell cycle factors. Since the order of factors is interchangeable, we only distinguish between motherdaughter correlations between the same ($\text{corr}({x}_{m,i},{x}_{2m,i})$ and $\text{corr}({x}_{m,i},{x}_{2m+1,i})$ for $i=1,2$) and alternate factors ($\text{corr}({x}_{m,i},{x}_{2m+1,j})$ and $\text{corr}({x}_{m,i},{x}_{2m,j})$ for $i\ne j=1,2$). The resulting posterior distributions revealed distinct correlation patterns of cell cycle factor correlations for clockdeleted cyanobacteria and neuroblastoma (Figure 4f). For clockdeleted cyanobacteria, we predict that at least one factor has a negative motherdaughter correlation while its crosscorrelation with the other factor must be positive; while the correlations are of opposite sign for neuroblastoma (Figure 4f). We sketch influence diagrams that summarise these relationships between factors (Figure 4g and h). Thus, the different interdivision time correlation patterns observed for clockdeleted cyanobacteria and neuroblastoma stem from distinct hidden correlation patterns of cell cycle factor fluctuations.
The inheritance matrix model reveals biological rhythms underlying the cell cycle
We observe that the lineage correlation functions of cyanobacteria, human colorectal cancer cells, and fibroblasts exhibit vastly different correlation oscillation periods (Figure 3). Next, we are interested to see whether the oscillations seen in these datasets are compatible with biological oscillators known to affect cell cycle control.
Correlation oscillations and underlying rhythms can exhibit vastly different periods
The period of the correlation oscillation is related to the location of the eigenvalues of the inheritance matrix on the complex plane. We consider an eigenvalue $\lambda $ of the inheritance matrix. In terms of the mean interdivision time $\overline{\tau}$, the correlation period $T}_{0$ is:
and the inequality means that the period $T}_{0$ is always greater than twice the mean interdivision time $\overline{\tau}$. More generally, there is an oscillation period associated with each eigenmode of the inheritance matrix, but the period is infinite for real eigenvalues, and thus only complex eigenvalues generate correlation oscillations. This inequality follows from Equation 6 using $\text{Arg}(\lambda )\le \pi $. However, known biological oscillators that influence cell cycle control often have periods less than twice the mean interdivision time, such as stress response regulators (Harper et al., 2018; StewartOrnstein et al., 2017) and gene expression oscillations (William et al., 2007; Gao et al., 2014; Whitfield et al., 2002). How can relatively slow observed correlation oscillations be compatible with much faster biological oscillators underlying the cell cycle?
The resolution to this issue is that the period of the correlation oscillation does not always match the frequency of the underlying oscillator. Instead there are a number of possible oscillator periods ${T}_{n}$ compatible with the correlation oscillation period $T}_{0$ (Appendix 1  Section A4) given by:
for $n\text{in}\mathbb{Z}$. This phenomenon, that the same correlation oscillation can be explained by multiple underlying oscillators, can be understood using the intuition in Figure 5a.
Circadian oscillations in cyanobacteria and fibroblasts support coupling of the circadian clock and the cell cycle
Cyanobacteria and fibroblasts both exhibit correlation patterns consistent with an oscillator underlying cell divisions (Figure 3e, bar chart). We observe that the posterior distribution of the eigenvalues is confined to a region with negative real parts for cyanobacteria and positive real parts for fibroblasts (Figure 5c). Using these distributions, we estimate the median period of the correlation oscillations (using Equation 6) to be 41.7 hr for cyanobacteria and 144.3 hr for fibroblasts (Figure 5d). We wondered whether the stark difference in the periods of the correlation oscillations indicates a different underlying rhythm. Conversely, we found this was not the case, but both correlation patterns were consistent with an approximate circadian rhythm. The posterior of the oscillator period ${T}_{1}$, which is closest to the period of correlation oscillation $T}_{0$, suggests a median period of 24.6 hr for cyanobacteria and a median period of 23.8 hr for fibroblasts (Figure 5e). We also validated the inference result using simulated data (Appendix 1—figure 9). This finding supports a strong coupling of circadian rhythms to the cell cycle, as reported previously for both cyanobacteria (Yang et al., 2010; Martins et al., 2018) and fibroblasts (Nagoshi et al., 2004; Menger et al., 2007; Nagoshi et al., 2005). Notably, we see that clockdeleted cyanobacteria displays 100% alternator pattern (Figure 3b) and therefore has a lineage tree correlation pattern that cannot be described by an approximate 24 hr oscillator, in contrast to wild type cyanobacteria.
Bimodal posterior distribution of underlying oscillations in human colon cancer
Finally, we turn to the analysis of cancer cell data. The dominant correlation pattern was oscillatory (78% posterior probability, Figure 3d, bar chart). The posterior distribution of complex eigenvalues for the oscillator pattern has support in a large region of the parameter space. It has two distribution modes depending on whether the eigenvalues have positive or negative real parts (Figure 5f). Similarly, the posterior of the correlation oscillation period is bimodal, too (Figure 5g), which means that two competing oscillator patterns are compatible with the data.
To disentangle these alternative hypotheses, we cluster the posterior samples by the real part of the eigenvalues. We label cluster A for negative real parts and cluster B for positive ones. The correlation periods of the individual clusters do not provide us with immediate clues about the underlying oscillators. Cluster A has a median correlation oscillation period of 51.2 hr while cluster B has a median period of 100.6 hr (Figure 5g). We therefore inspected the oscillator periods ${T}_{1}$ for each cluster, which are closest to the observed correlation period (Figure 5h). The median of the predicted oscillator period of cluster A has an oscillator period ${T}_{1}$ of 24.1 hr, which hints at a circadian oscillator underlying the cell cycle in agreement with a previous model (Chakrabarti et al., 2018). However, only about 33% of posterior samples with complex eigenvalues were assigned to this cluster. The majority of posterior samples, cluster B, had a different predicted period with a median of 19.6 hr (Figure 5h). A possible explanation is that the circadian period is shortened in cancer cells.
A strength of the Bayesian framework is that it allows us to express our confidence in this prediction. We find that our analysis is not conclusive about the correlation pattern as 78% of posterior samples showed an oscillator pattern. As a result, about 52% of all the posterior samples favour a 19.6 hr oscillator and 26% for the 24.1 hr oscillator, matching approximately circadian rhythm. 16% of the samples demonstrate alternator correlation patterns, and the remaining 6% samples are aperiodic (compare bar charts in Figures 3d and 5g). We therefore ask whether these competing models make predictions that translate into testable hypotheses. We found that the oscillator correlation pattern predicts a negative grandmother correlation while the alternator pattern predicts a positive grandmother correlation (Figure 5i and j). Thus, measuring the grandmother correlation with higher precision, for example, via increasing sample size, would tighten the confidence intervals of measured correlations (Figure 5i), and improve our ability to narrow down the true pattern. On the contrary, predicting the greatgrandmother correlation allows us to distinguish between the 19.6 hr and 24.1 hr rhythms (Figure 5h). Posterior samples in cluster A predicted a positive interdivision time correlation between a cell and its greatgrandmother, while cluster B predicted a negative correlation (Figure 5k and l). While the greatgrandmother correlation could not be estimated using the present data, deeper lineage trees could be used to discriminate the period of the biological oscillator and help reveal whether the circadian period is altered in cancer cells, or not. In summary, our theory helps to predict the hidden periodicities of biological oscillators from lineage tree interdivision time data.
Discussion
We propose a Bayesian approach to predict hidden cell cycle factor dynamics from interdivision time correlation patterns. Our underlying model fits the lineage tree data for a range of bacterial and mammalian cell types and allows us to classify different correlation patterns. Our inference demonstrates that these patterns are identifiable, but the individual inheritance parameters are not. This finding suggests that interdivision time correlations alone are insufficient to gain mechanistic insights into cell cycle control mechanisms. The identified correlation patterns, however, reveal the dynamics of the underlying cell cycle factors.
We focused on a datadriven approach without any prior assumptions of the division mechanism, allowing the interdivision time data to speak for itself. Other studies used a model similar to the inheritance matrix model proposed here, and linked latent factors to the interplay between cell cycle progression and growth (Kuchen et al., 2020). Autoregressive models have also been used in bacteria to discriminate between different mechanisms of cell size control (Kohram et al., 2021). Additionally, they have been used to combine growth and cell cycle reporters to explain interdivision time dynamics in fibroblasts (Mura et al., 2019). In principle, the inheritance matrix model can be used to model the inheritance dynamics of any factor affecting the interdivision time of a cell. In fact, it comprises many mechanistic models as special cases, such as those based on DNA replication, cell size control or cell cycle phases (Appendix 1  Section A6 and Appendix 1—figures 5 and 6). In future work, it will be useful to improve the identifiability of the model parameters. This could be accomplished either through including knowledge of inheritance mechanisms through prior distributions, or by including additional data on measured cell cycle factor dynamics – such as cell cycle phases, cell size, protein expression etc. – in the inference.
Another limitation of our inference is that we computed the interdivision time variance ${s}_{\tau}$ in Equation M2 of the model assuming that trees have equal number of generations in each branch. The advantage of this estimator is that it does not assume any particular noise distribution but this may lead to a statistical bias compared to the sample variance of treestructured data with branches of varying length (Powell, 1956; Hashimoto et al., 2016; Thomas, 2017; Jafarpour et al., 2018; Kuchen et al., 2020; Sandler et al., 2015). However, the approximation does not change the identified correlation patterns and the conclusion of this work, since any variance bias can be compensated by multiplying the noise matrices (${\mathit{S}}_{1,2}$ in Equation 5) with a constant, and, for the data analysed, the interdivision time variance estimators cannot be distinguished within the 95% confidence intervals (Appendix 1—table 3). Developing a theory correcting for such biases in lineage tree data will be the subject of future work.
An important result of the present analysis is that lineage tree correlation patterns of very different cell types – cyanobacteria, mouse embryonic fibroblasts and human colorectal cancer – can be explained through an underlying circadian oscillator coupled to cell division. While the coupling between the cell cycle and circadian clock is well established both in cyanobacteria and mouse embryonic fibroblasts, it is less well studied in cancer (Shostak, 2017; Kiessling et al., 2017). Our method robustly reconstructs the circadian rhythms from the interdivision time correlation patterns despite the lack of the cousinmother inequality for fibroblasts, demonstrating the cousinmother inequality is not required for complex correlation patterns (‘The cousinmother inequality is not required togenerate complex correlation patterns’). It is interesting to observe the differences in the oscillatory correlation patterns in these organisms. They are characterised by complex eigenvalues with negative real parts in cyanobacteria, but positive real parts in fibroblasts (Figure 5c), resulting in opposite motherdaughter correlations for these datasets (Figure 3a and f).
It would be interesting to explore what mechanisms underlie these different patterns. While the circadian clock in fibroblasts relies on transcriptional mechanisms (Hughes et al., 2009; Menger et al., 2007; Takahashi, 2017), the origin of the clock is nontranscriptional in cyanobacteria (Cohen and Golden, 2015; Tomita et al., 2005; Nakajima et al., 2005). The negative motherdaughter correlation in cyanobacteria likely stems from size control mechanisms that are modulated by the circadian clock (Martins et al., 2018). However, the mechanisms that generate positive motherdaughter correlations in fibroblasts are still to be explored. Interestingly, in human colorectal cancer, two oscillatory correlation patterns divide the posterior distributions into two distinct clusters with positive and negative motherdaughter correlations. If the circadian clock was to generate a positive motherdaughter correlation, as it does in fibroblasts which have a structurally related clock, the period corresponds to a 20 hr rhythm. This finding thus suggests that the circadian period is altered in cancerous cells. Indeed, several studies report similar periods of 18 hr and 20 hr for gene expressions in the human colorectal cancer coreclock (Fuhr et al., 2019; Parascandolo et al., 2020).
Our theory predicts that an oscillator’s period does not always match the period of the observed correlation oscillations. We describe a lower bound on the correlation period that is reminiscent of the NyquistShannon sampling theorem. This theorem describes temporal aliasing in digital audio processing, where a highfrequency signal produces lowfrequency oscillations when sampled at a frequency less than twice the sampling frequency. Similarly, spatial aliasing is observed in digital image processing as a moire pattern. In our analogy, the highfrequency signal is a biological oscillator that couples to cell division and is sampled at the cell division frequency (Figure 5a). Our result thus extends the NyquistShannon sampling theorem to lineage trees. Our finding has fundamental implications for the reconstruction of oscillator periods from interdivision time data, revealing that there exists a number of oscillators that can all explain the same correlation pattern.
Here, we concentrated on the oscillator periods ${T}_{1}$ that are closest to the correlation oscillation periods $T}_{0$. In principle, we cannot exclude that oscillators with shorter physiological periods are contributing to the observed lineage tree correlation patterns. For example, HES1 expression oscillates with a period of around 5 h in human colon cancer cells (William et al., 2007; Gao et al., 2014). The stress response regulators NFκB and p53, which are critical for tumour development, oscillate with periods of approximately 100 min and 5 hr, respectively (Harper et al., 2018; StewartOrnstein et al., 2017). The posterior distributions for periods in this region are not well separated (Appendix 1—figure 7c), which makes it challenging to identify factors that oscillate significantly faster than the cell cycle using interdivision time data. It is, however, unknown whether such hypothetical factors couple to cell division specifically in a manner to induce oscillatory interdivision time correlation patterns.
Going forward, there is a need to go beyond the NyquistShannon limit and develop methods that have increased sensitivity to discriminate a broader range of oscillator periods. One way to circumvent the limitation would be to employ fluorescent reporters of the circadian clock that could be correlated directly with cell division timing. Another way, would be to provide parallel readouts of the underlying rhythm through events that subsample the cell cycle, such as DNA replication, or the timing of individual cell cycle phases. Not only would we be able to look at the correlation in interdivision time between cells on a lineage tree, but we would also be able to analyse the correlations between individual phases and family members, to reveal specific phase control mechanisms. Our main findings result from the the inheritance matrix model with two cell cycle factors, as this was sufficient to explain the correlation patterns of the chosen data. In principle, increasing the number of interacting cell cycle factors can lead to more complex composite patterns that involve combinations of the three patterns discussed in this paper, such as the alternatoroscillator (Appendix 1—figure 6c and d), aperiodicoscillator (Appendix 1—figure 6g and h), or birhythmic correlation patterns. Such composite patterns could also arise as the result of nonlinear fluctuations that, within our framework, can be described by adding complexes of cell cycle factors to the inheritance matrix model (Appendix 1  Section A2). The presence of such complexes induces higherorder harmonics in the correlation oscillations, similar to those observed in the cyanobacterial and mammalian circadian clock (Martins et al., 2016; Thomas et al., 2013), and detecting such complexes could provide an alternative route to increase the sensitivity of our inference method.
In summary, our findings highlight the predictive power of Bayesian inference on singlecell data and how it can be leveraged to draw testable hypotheses for the design of future experiments. This was exemplified for human colorectal cancer cells, where various patterns were compatible with the data, something that nonprobabilistic approaches cannot accomplish as they fit only a single correlation pattern. In the future, it will be crucial to understand why different cell types have evolved specific lineage correlation patterns and how these patterns affect cell proliferation and disease. It would be interesting to understand whether specific correlation patterns give or reveal some fitness advantage and whether we can use them to predict cell survival. We anticipate that identifying hidden cell cycle factors and their rhythmicity using noninvasive methods such as interdivision time measurements will be instrumental in answering these questions and may benefit other fields where cell proliferation plays a pivotal role.
Code availability
Code available at https://github.com/pthomaslab/Lineagetreecorrelationpatterninference (copy archived at swh:1:rev:dc69bbce5ce909813d7d4356c9fd2da045e02c79; Hughes, 2022).
Materials and methods
Analytical solution of the inheritance matrix model
Request a detailed protocolFrom Equation 2b and $\mathrm{IE}[{\mathit{z}}_{p}]=0,\text{for all}p\text{in}\mathbb{N}$, we see that the vector of cell cycle factors has zero mean $\mathrm{IE}[{\mathit{x}}_{p}]=0$. Its $N\times N$ covariance matrix $\mathbf{\sum}=\mathrm{Cov}({\mathit{x}}_{p},{\mathit{x}}_{p})$ satisfies a discretetime Lyapunov equation:
From the solution of Equation M1, we compute the variance of the interdivision time
and the generalised tree correlation function $\rho (k,l)$ (see Appendix 1  Section A3 for a detailed derivation) given by:
where $\mathit{\omega}(k,l)={\mathit{\theta}}^{k}\mathbf{\sum}{\left({\mathit{\theta}}^{l}\right)}^{\top}+{\delta}_{k\ge 1}{\delta}_{l\ge 1}{\mathit{\theta}}^{k1}{\mathit{S}}_{2}{\left({\mathit{\theta}}^{l1}\right)}^{\top}$ with
To ensure that the lineage tree correlation pattern is stationary, we require $\text{SR}(\mathit{\theta})<1$ where $\text{SR}(\mathit{\theta})=\text{max}({\lambda}_{1},{\lambda}_{2},\mathrm{\dots},{\lambda}_{N})$ is the spectral radius of $\mathit{\theta}$. This also ensures that the solutions to Equation M1; $\mathbf{\sum}$, ${\mathit{S}}_{1}$ and the function Equation M3 are unique and independent of the initial conditions.
Analysis of tree correlation patterns
Request a detailed protocolThe patterns of the generalised tree correlation function can be characterised through its eigendecomposition. The general decomposition proceeds through finding the matrix of eigenvectors $U$ of $\mathit{\theta}$ such that
is the diagonal matrix of eigenvalues. Defining ${\widehat{\mathit{S}}}_{1,2}=U{\mathit{S}}_{1,2}{U}^{\top}$ and $\widehat{\mathit{\alpha}}={({U}^{1})}^{\top}\mathit{\alpha}$, the solution to Equation M1 is given by
This result can then be used to find an explicit expression for the generalised tree correlation function:
where
Equation M7 can be rewritten as a superposition of patterns Equation 4 with weights given by Equation 5.
The pattern of the tree correlation function is thus governed by the eigenvalues of the inheritance matrix $\mathit{\theta}$: (i) if one eigenvalue, say ${\lambda}_{1}$, is positive then the factor ${\widehat{\omega}}_{11}(k,0)={\widehat{\omega}}_{11}(0,k)\propto {\lambda}_{1}^{k}$ contributing to lineage correlation decays monotonically. The factor ${\widehat{\omega}}_{11}(k,k)\propto {\lambda}_{1}^{2k}$ contributing to the crossbranch correlation decays twice as fast; (ii) if there is a negative eigenvalue, the factor ${\widehat{\omega}}_{11}(k,0)={\widehat{\omega}}_{11}(0,k)\propto {(1)}^{k}{{\lambda}_{1}}^{k}$ alternates between negative and positive values with an envelope of ${{\lambda}_{1}}^{k}$, while the corresponding contribution to the crossbranch correlation decays monotonically with rate as ${{\lambda}_{1}}^{2k}$. Finally, if we have a pair of complex eigenvalues ${\lambda}_{1}={\lambda}_{2}^{\ast}=D{e}^{i\mathrm{\Omega}}$ then the factors ${\widehat{\omega}}_{i,j}(k,0)={\widehat{\omega}}_{i,j}(0,k)$ contributing to the lineage correlation function display damped oscillations with frequency $\mathrm{\Omega}$ and envelope ${D}^{k}$, while the factor ${\widehat{\omega}}_{12}(k,k)={\widehat{\omega}}_{12}^{\ast}(k,k)\propto {D}^{2k}$ and the factor ${\widehat{\omega}}_{11}(k,k)={\widehat{\omega}}_{22}^{\ast}(k,k)\propto {D}^{2k}{e}^{i2\mathrm{\Omega}k}$ oscillate with frequency $2\mathrm{\Omega}$.
Determining the period of correlation oscillations from the eigenvalues
Request a detailed protocolWe consider the case where the inheritance matrix $\mathit{\theta}$ has a pair of complex conjugate eigenvalues ${\lambda}^{\pm}=D{e}^{\pm i2\pi /P}$. The lineage correlation function then oscillates whenever $D\ne \{0,1\}$ and $P\ne \frac{2}{k},k\text{in}\mathbb{Z}$. The period of correlation oscillations per generation is given by
where $\text{Arg}(\lambda )\phantom{\rule{thickmathspace}{0ex}}\text{in}\phantom{\rule{thickmathspace}{0ex}}(\pi ,\pi ]$ is the argument of the eigenvalue and $\mathrm{ln}(\cdot )$ is the complex logarithm. The former is the angle made between the line joining the origin and the eigenvalue $\lambda $ on the complex plane with the real axis. This means that ${T}_{0}/\overline{\tau}=P$ if and only if $P>2$. Otherwise, $T}_{0$ is calculated in terms of $P$ by Equation M9, (Appendix 1—figure 8).
Data analysis and Bayesian inference of the inheritance matrix model
Request a detailed protocolWe determined all pairs of cells in a lineage tree, sorted them by family relations $(k,l)$ and calculated the sample correlation coefficient of interdivision times (Equation 3). To maximise the number of samples used to calculate these correlations, an individual cell can appear in more than one pair. For example, if a cell had two cousins, it would be counted in two separate cousin pairs in the cousincousin correlation coefficient calculation. For training, we focus on the sample statistics $\widehat{\mathit{X}}=({\widehat{s}}_{\tau},{\{{\widehat{\rho}}_{(k,l)}\}}_{(k,l)\text{in}C})$ with $C=\{(1,0),(2,0),(1,1),(2,2)\}$ comprised of the interdivision time sample variance and four interdivision time sample correlation coefficients given by the motherdaughter, grandmothergranddaughter, sistersister and cousincousin relations (Figure 2a). Note that ${\widehat{s}}_{\tau}$ is computed across all interdivision times used to calculate the correlation coefficients in each dataset. Errors are estimated using bootstrapping by resampling cell pairs with replacement 10,000 times. The resulting variances and correlation coefficients are given in Appendix 1—table 1.
The vector of inferred model parameters for the twodimensional model is $\mathbf{\Theta}=(\mathit{\theta},{\mathit{S}}_{1})$, where we fix $\mathit{\alpha}={(1,1)}^{\top}$ and ${\mathit{S}}_{2}=0$ for simplicity. A different choice of $\mathit{\alpha}$ did not affect our results (Appendix 1—figure 4). Since ${\mathit{S}}_{1}$ is symmetric, it consists of the $N$ variances and $N(N1)/2$ correlation coefficients between the components of $\mathit{z}$. Thus for $N=2$ the inheritance matrix model has seven free parameters to be estimated. We assumed that the loglikelihood for these statistics is the sum of square errors:
which is equivalent to assuming that the sample variance and correlation coefficients are normally distributed for large sample sizes. We calculate the interdivision time variance ${s}_{\tau}$ and the generalised tree correlation function $\rho (k,l)$ from Equation M2 and Equation M3. Note that Equation M2 is the interdivision time variance from a tree where all lineages have the same number of generations, which approximates the variance across all cells in the observed trees (Appendix 1—table 3). For simplicity, we neglected possible correlations between the sample statistics in $\widehat{\mathit{X}}$ and used bootstrapped estimates for the standard deviation of the sample statistics ${\widehat{\sigma}}_{{\widehat{s}}_{\tau}}$ and ${\widehat{\sigma}}_{{\widehat{\rho}}_{(k,l)}}$ (Appendix 1—table 1). Note that the likelihood is independent of the mean since it is irrelevant for the correlation pattern. We assumed a flat prior with support restricted to $\text{SR}(\mathit{\theta})<1$ and ${\mathit{S}}_{1}$ positive semidefinite to guarantee the existence of a stationary correlation pattern.
The numerical implementation uses the adaptive Gibbs‐sampler implemented in the Julia library Mamba.jl (Smith, 2018). For each dataset, we sample 11 million parameter sets which include a burnin transient of 1 million samples. These samples are removed before analysis of the output.
For model comparison we use the AIC (Akaike, 1974) given by
where $k$ is the number of model parameters and $\mathrm{ln}\left(\widehat{\mathcal{L}}\right)$ is the maximum value of the loglikelihood function given by Equation M10. For ${\mathit{S}}_{2}\ne 0$, the inheritance matrix model has $k=d(1+2d)$ parameters where $d$ is the number of cell cycle factors in the model. For ${\mathit{S}}_{2}=0$ the number of parameters reduces to $k=\frac{1}{2}d(1+3d)$.
Appendix 1
A1. Small noise approximation
Here, we will derive the inheritance matrix model given by Equation 2a, and b in the main text. We assume that the fluctuations in the hidden cell factor dynamics are small, which leads to a computationally efficient approximation.
Firstly, in the limit of zero fluctuations, all cells must be identical. Hence, all cell cycle factors are equal to their means $\mathit{\mu}=({\mu}_{1},{\mu}_{2},\cdots ,{\mu}_{N}{)}^{\mathrm{\top}}=\mathbf{E}({y}_{p})$ and similarly for the noise vectors $\beta =({\beta}_{1},{\beta}_{2},\cdots ,{\beta}_{N}{)}^{\mathrm{\top}}=\mathbf{E}({e}_{2m})=\mathbf{E}({e}_{2m+1})$ in Equation 1b. From Equation 1a, and b we then find that
which can be efficiently solved for $\overline{\tau}$ and $\mu$ using standard numerical methods.
Secondly, we can decompose the interdivision time and the cell cycle factor vector into their respective mean and fluctuating components by
Denoting the index of the present cell by p and the one of its mother by m, we can expand f and g around the limit of zero fluctuations and we obtain to leading order
where
Using this expansion and Equation A2 in Equation 1a and b of the main text we arrive at
where we have set ${\mathit{e}}_{p}=\mathit{\beta}+{\stackrel{~}{\mathit{z}}}_{p}$ and ${\stackrel{~}{\mathit{z}}}_{p}={({\stackrel{~}{z}}_{p,1},{\stackrel{~}{z}}_{p,2},\mathrm{\cdots},{\stackrel{~}{z}}_{p,N})}^{\top}$ giving the fluctuations around the mean for the noise vectors. Comparing Equation A7 with Equation A1 and collecting terms to leading order, we obtain the linearised system:
Next, we define the diagonal scaling matrix $\mathbf{\Gamma}$ with nonzero elements as
for $i=1,2,\mathrm{\dots},N$. Using the rescaled noise sources $\mathit{z}=\mathbf{\Gamma}\stackrel{~}{\mathit{z}}$, we find the rescaled inheritance matrix $\mathit{\theta}$ and $\mathit{\alpha}$coefficients
The rescaled cell cycle factor fluctuations ${\mathit{x}}_{m}=\mathbf{\Gamma}{\stackrel{~}{\mathit{x}}}_{m}$ follow Equation 2 of the main text and we reach rescaled variancecovariance matrices ${\mathit{S}}_{1}$ and ${\mathit{S}}_{2}$ as follows
A2. Beyond the small noise approximation: cell cycle factor complexes account for nonlinear fluctuations
Here we analyse the effect of nonlinearity on the interdivision time correlation patterns. For simplicity we consider a single cell cycle factor and follow the same lines as in Appendix 1  Section A1, Equation A9, while including terms of order ${x}^{2}$. This leads to the expansion interdivision time and factor fluctuations
where $\stackrel{~}{\theta}$ is the Jacobian of the cell cycle factor dynamics, as before, and $\stackrel{~}{H}={g}^{\prime \prime}({\mu}_{x})$ is the Hessian. From the second equation we obtain
Defining ${\stackrel{~}{\mathit{X}}}_{p}=({\stackrel{~}{x}}_{p},{\stackrel{~}{x}}_{p}^{2})$ and ${\stackrel{~}{\mathit{Z}}}_{p}=({\stackrel{~}{z}}_{p},{\stackrel{~}{z}}_{p}^{2})$, combining Equation A14 and Equation A16 and rescaling variables as in Equation A12 and Equation A13, we find the extended inheritance matrix model
where
Here $H=\stackrel{~}{H}\stackrel{~}{\alpha}/\stackrel{~}{\beta}$ and $\beta =1$ if $\stackrel{~}{\beta}\ne 0$, and analogously, $H=\stackrel{~}{H}\stackrel{~}{\alpha}$ and $\beta =0$ if $\stackrel{~}{\beta}=0$. Hence, the interdivision time correlation patterns with small to moderate fluctuations can be described through an extended linear system (Equation A7) that includes nonlinear terms ${x}_{p}^{2}$. These additional terms can be interpreted as cell cycle factors forming binary complexes. The presence of these complexes increases the number of cell cycle factors and extends the eigenvalue spectrum of the effective inheritance matrix $\mathrm{\Theta}$ by ${\theta}^{2}$. Hence, the presence of complexes leads to mixed correlation patterns. For example, for a single cell cycle factor, the eigenvalues of $\mathbf{\Theta}$ are $(\theta ,{\theta}^{2})$, which corresponds to an alternator pattern for $\theta <0$. More generally, we may expect that nonlinear patterns can be described through mixtures of aperiodic, alternator, and oscillatory patterns. For example, the complex eigenvalue spectrum of an oscillator pattern (${e}^{\pm i2\pi /P}$) will include powers of complex eigenvalues (${e}^{\pm i4\pi /P}$) resulting in harmonics of the fundamental correlation oscillation frequency similar to higher order harmonics observed in singlecell timeseries of the circadian clock (Martins et al., 2016; Thomas et al., 2013).
A3. Derivation of the generalised tree correlation function
In this section we derive an analytical expression for the generalised tree correlation function. This gives the Pearson correlation coefficient in interdivision time for any pair of related cells. We start with the equation for the Pearson correlation coefficient, and from there derive a formula for the interdivision time covariance using the known properties of the cell cycle factors $\mathit{x}$. From this, we can derive the general formula for the correlation coefficient between any related cell pair.
We associate a cell pair with an index $(k,l)$ which measures the distance to the nearest common ancestor as given in ‘The inheritance matrix model reveals threedistinct interdivision time correlation patterns’ (Figure 2a). From this, we denote their interdivision time fluctuations as ${\tau}_{k}^{\prime}$ and ${\tau}_{l}^{\prime}$ respectively. The Pearson correlation coefficient between these fluctuations is given by
where ${s}_{{\tau}^{\prime}}$ is the variance of the interdivision time fluctuations.
The interdivision time fluctuations ${\tau}_{k}^{\prime}$ and ${\tau}_{l}^{\prime}$ are calculated from the vector of rescaled cell cycle factor fluctuations ${\mathit{x}}_{k}$ as given in ‘A general inheritance matrix model provides a unified framework for lineage tree correlation patterns’, giving the equations
Substituting Equation A20 into Equation A19, we obtain a formula for in terms of the cell cycle factor fluctuations and the coefficients alone
Since ${\mathit{x}}_{k}$ and ${\mathit{x}}_{l}$ are identically distributed in steady state, we have that $\mathrm{Var}({\mathit{x}}_{k})=\mathrm{Var}({\mathit{x}}_{l})=\mathrm{Cov}(\mathit{x},\mathit{x})=\mathbf{\sum}$ as specified in Materials and methods  ‘Analytical solution of the inheritance matrix model’‘. We can write $\rho (k,l)$ now as
where ${\mathit{\alpha}}^{\top}\mathbf{\sum}\mathit{\alpha}$ gives the variance of the interdivision time fluctuations ${\tau}^{\prime}$.
Using the model Equation 2 we can write the formula for the $\mathit{x}$ vectors for the two cells in the cell pair $(k,l)$ as
where cells $k$ and $l$ have mother cells $k1$ and $l1$ respectively. The two cells are sisters if and only if their subscripts are both equal to 1, meaning they share a mother cell. Using recurrence of the model, we can write these equations as
where ${\mathit{x}}_{0}$ is the vector of cell cycle factors for the most recent common ancestor for a cell pair given by $(k,l)$.
All that remains is to derive a function for $\mathrm{Cov}({\mathit{x}}_{k},{\mathit{x}}_{l})$ which we will denote $\mathit{\omega}(k,l)$. We calculate $\mathit{\omega}(k,l)$ as follows using expectations:
where ${\mathit{\mu}}_{{\mathit{x}}_{k}}$ and ${\mathit{\mu}}_{{\mathit{x}}_{l}}$ are the mean vectors of ${\mathit{x}}_{k}$ and ${\mathit{x}}_{l}$ respectively which are both equal to 0, giving
To find $\mathrm{IE}\left[{\mathit{x}}_{k}{\mathit{x}}_{l}^{\top}\right]$ in terms of the model parameters, we substitute in Equation A26 for ${\mathit{x}}_{k}$ and ${\mathit{x}}_{l}$ and get
The noise term fluctuations $\mathit{z}$ are only correlated if the cells are sisters, which only occurs when the distance we have $(k,l)=(1,1)$. So for the summations above, we exclude all terms except where $i=j=1$. Doing this and expanding we get
where ${\delta}_{k\ge 1}$ and ${\delta}_{l\ge 1}$ are given in Equation M4. We also have that
The matrix $\mathrm{Cov}({\mathit{x}}_{0},{\mathit{x}}_{0})$ is equivalent to the covariance matrix for any $\mathit{x}$, giving $\mathrm{Cov}({\mathit{x}}_{0},{\mathit{x}}_{0})=\mathbf{\sum}$. This gives
Similarly we have,
As $\mathrm{IE}(\mathit{z})=0$, and $\mathrm{Cov}({\mathit{z}}_{2m},{\mathit{z}}_{2m+1})={\mathit{S}}_{2}$ as stated in Materials and methods  ‘Analytical solution of the inheritance matrix model’, we obtain,
Equation A31 therefore becomes:
Substituting Equation A37 back into Equation A29 we get
giving us the final equation for $\mathit{\omega}(k,l)$. Using the above equation in Equation A23, we obtain Equation M3 of the Methods.
A4. Derivation of the formula for the oscillator periods, ${T}_{n}$
The period of correlation oscillation as observed in the lineage correlation functions is given by Thomas et al., 2018. We can reveal the underlying oscillator periods by shifting the inferred period $T}_{0$ to obtain a smaller period ${T}_{n}$. This means that shorter periods would produce the same inferred period in the lineage correlation function when sampled at the original frequency of once per cell cycle (Figure 5a).
The oscillator periods are obtained by adding or subtracting multiples of $2\pi $ to the argument of the eigenvalue which results in the new argument being in the same position in the complex plane. The oscillator period ${T}_{n}$ with shift $n\text{in}\mathbb{Z}$ is therefore given by
Taking Equation A40 and substituting in Equation 6, we obtain ${T}_{n}$ in terms of $T}_{0$ as Equation 7.
A5. Solution of the tree correlation function and parameter identifiability for simple inheritance rules
We consider the limiting case of a single cell cycle factor ($N=1$) resulting in simple inheritance rules. This situation could model a growth factor that can either increase or decrease interdivision times of cells depending on the monotonicity of $f$ in Equation 1. The analytical solution (Equation M7) of the inheritance matrix model then reduces to
where $w=1+{\delta}_{k\ge 1}{\delta}_{l\ge 1}\frac{{S}_{2}}{{S}_{1}}\frac{(1{\theta}^{2})}{{\theta}^{2}}$. First, we observe that, given single cell measurements of the motherdaughter correlation coefficient $\rho (0,1)$, the daughterdaughter correlation coefficient $\rho (1,1)$ and the variance ${s}_{\tau}$, the parameters $\theta $, S_{1} and S_{2} are uniquely identifiable:
Thus measurements of the variance, lineage and crossbranch correlations fully determine the parameters. The tree correlation function is, however, independent of $f$, which means that the interdivision time correlation pattern carries no information whether the growth factor increases or decreases growth. The reason for this indifference is that cell cycle factors are identified only by their fluctuation pattern, i.e., for each cell cycle factor whose fluctuations increase interdivision time $x$, we could define another cell cycle factor fluctuation that decrease interdivision time $x$. We accounted for this unidentifiability issue trough a similarity transformation using the scaling matrix $\mathbf{\Gamma}$ in Equation A12 and Equation A13 that transforms all cell cycle factor fluctuations to increase interdivision time. Of course, this unidentifiabiliy could be removed through explicitly measuring the involved cell cycle factors.
A6. Mapping mechanistic cell cycle and cell size control models to the inheritance matrix model
To further investigate the output of the inheritance matrix model, we propose multiple models of known cell cycle control mechanisms, and map them to our inheritance matrix model framework. All cell size models assume symmetric division.
A6.1 Cell size control model with correlated growth
Considering the influence of cell size control on interdivision time (TaheriAraghi et al., 2015; Kohram et al., 2021; Ho et al., 2018), here we propose a cell size control model where we have some mother to daughter inheritance of both the added size $\mathrm{\Delta}$ and the growth rate $\kappa $ (Appendix 1—figure 5g). The model equations are given by:
The noise terms $\xi $ and $\varphi $ are independent between sisters such that $\mathrm{Cov}({\xi}_{2m},{\xi}_{2m+1})=\mathrm{Cov}({\varphi}_{2m},{\varphi}_{2m+1})=0$. Assuming exponential growth the formula for the interdivision time is given by
where $p$ represents the index of a given cell. Taking the vector of cell cycle factors for the mother cell to be ${\mathit{y}}_{m}={({y}_{m,1},{y}_{m,2},{y}_{m,3})}^{\top}={({\mathrm{\Delta}}_{m},{s}_{b,m},{\kappa}_{m})}^{\top}$ and comparing Equation 1a and b with Equation A43 and Equation A44, we obtain
Then we can calculate the means from Equation A1,
Then using Equation A45 in Equation A5 and Equation A12, we find
Assuming ${\stackrel{~}{\mathit{S}}}_{2}=0$ and using Equation A13, we find
The $\mathit{\theta}$ matrix has eigenvalues $\mathit{\lambda}=(\frac{a}{2},b,c)$ which give an aperiodic pattern for $a,b,c>0$ and an alternator pattern otherwise (Appendix 1—figure 5i and j). These same patterns arise for all real eigenvalues in the 3D model in the same was as in the twodimensional system. Only a single negative eigenvalue is needed for the lineage correlation function to display an alternator pattern. We are restricted to $a\text{in}(2,2)$ and $b,c\text{in}(1,1)$ to ensure $\text{SR}(\mathit{\theta})<1$. The cousinmother inequality for this system is too complex to be looked at analytically, so we use numerical methods to visualise the parameter region in which the cousinmother inequality can be satisfied (Appendix 1—figure 5h).
For the case of the aperiodic pattern, we observe positive same factor motherdaughter correlation and negative alternate factor motherdaughter correlation (Appendix 1—figure 5k). In contrast, for an alternator pattern, the mother daughter same factor correlation is negative, but the alternate factor correlations vary between positive and negative values (Appendix 1—figure 5I).
A6.2 Simple cell size control model
For the special case of $b=\mathrm{Var}[\varphi ]=0$ and $c=1$, the model reduces to a simple cell size control model with fluctuating added size (Appendix 1—figure 5a). The inheritance matrix $\mathit{\theta}$ then has eigenvalues $\mathit{\lambda}=(0,\frac{a}{2})$. Thus depending on the choice of $a$, this model can produce both an alternator and aperiodic pattern (Appendix 1—figure 5c and d). In this case, using Equation M3 the cousinmother inequality becomes
which cannot be satisfied for $a<2$, which implies $\text{SR}(\mathit{\theta})=\frac{a}{2}<1$. Hence the cousinmother inequality cannot be satisfied for any reasonable choice of $a$ in this simple model (Appendix 1—figure 5b).
For an aperiodic pattern, this simplified model exhibits positive same factor motherdaughter correlation and negative alternate factor motherdaughter correlation (Appendix 1—figure 5e). In the alternator case, this model exhibits negative same factor motherdaughter correlation and also negative alternate factor motherdaughter correlation (Appendix 1—figure 5f).
A6.3 Abstract cell cycle phase model
We propose a model of two abstract cell cycle phases that have no integrated dependence on cell size (Appendix 1—figure 5m). The model equations are given by
The noise terms $\xi $ and $\varphi $ are independent between sister cells such that $\mathrm{Cov}({\xi}_{2m},{\xi}_{2m+1})=\mathrm{Cov}({\varphi}_{2m},{\varphi}_{2m+1})=0.$ In this case we have that the two factors make up the length of the cell cycle, so we simply have ${\tau}_{p}={y}_{p,1}+{y}_{p,2}$.
Therefore using Equation 1a and b, we obtain
We calculate the means from Equation A1,
Then using Equation A51 in Equation A5 and Equation A12, we find
As the noise terms are independent between sisters we have ${\stackrel{~}{\mathit{S}}}_{2}=0$ and using Equation A13 we obtain
The inheritance matrix $\mathit{\theta}$ has eigenvalues $\mathit{\lambda}=(a,c)$ which gives an aperiodic pattern for $a$ and $c>0$ and an alternator pattern otherwise (Appendix 1—figure 5o, p).
The analytical form of the cousinmother inequality is complex so we use numerical methods to visualise the parameter region in which the cousinmother inequality can be satisfied (Appendix 1—figure 5n).
We calculate individual factor motherdaughter correlations and find that for an aperiodic pattern, the model exhibits a range of correlation patterns (Appendix 1—figure 5q). However, for an alternator pattern, we obtain positive same factor motherdaughter correlation and negative alternate factor motherdaughter correlation (Appendix 1—figure 5r)
A7. Models of circadianclockdriven correlation patterns
A7.1 Kicked cell cycle model
Here we analyse the kicked cell cycle model (Mosheiff et al., 2018) with our framework (Appendix 1—figure 6a). We will propose an inheritance matrix and then show that it reduces to the kicked cell cycle model for certain parameter choices. Consider the $3\times 3$ inheritance matrix $\mathit{\theta}$ and noise vector ${\mathit{z}}_{n}$ given by
for $n\in \{2m,2m+1\}$. We have that ${\mathit{S}}_{1}$ is given by $\mathrm{Cov}({\mathit{z}}_{2m},{\mathit{z}}_{2m})$, however we assume that the noise terms ${\xi}_{n}$ are independent between sisters such that ${\mathit{S}}_{2}=\mathrm{Cov}({\mathit{z}}_{2m},{\mathit{z}}_{2m+1})=0$. Assuming $\mathit{\alpha}={(1,0,0)}^{\top}$, the interdivision times are governed by
The oscillator is represented by the cell cycle factors $\widehat{\mathit{x}}$ that evolve according to
with oscillator inheritance matrix
We can solve Equation A56 along an ancestral lineage of $n$ generations
where ${\widehat{\mathit{x}}}_{0}$ is the state of the ancestral cell. Substituting Equation A58 into Equation A55 and assuming ${\widehat{\mathit{z}}}_{i}=0$, i.e., the cell cycle oscillator $\widehat{\mathit{x}}$ is deterministic, the interdivision time of the mother determines the interdivision time of the daughter cell via
where ${\widehat{x}}_{0}^{+}=({\widehat{x}}_{0,1}+{\widehat{x}}_{0,2})$ and ${\widehat{x}}_{0}^{}=({\widehat{x}}_{0,1}{\widehat{x}}_{0,2})$ which represent initial conditions. Assuming ${t}_{n}={\sum}_{i=1}^{n}{\tau}_{n}\approx n\overline{\tau}$ approximates the time at birth for $n\gg 1$, this leads to
Comparing Equation A60 to Equations 1 and 2 in Mosheiff et al., 2018, we see that our IMM agrees with the kicked cell cycle model when $D=1$, ${\widehat{x}}_{0}^{+}=0$, ${\xi}_{n,1}={\xi}_{n,2}=0$, and large $n$.
A7.2 Circadianclockdriven cell size control model
Here we analyse the model of cell size control driven by the circadian clock proposed in Martins et al., 2018 within the inheritance matrix model framework (Appendix 1—figure 6e). The division rate, $\mathrm{\Gamma}(s,{s}_{b},\frac{\partial s}{\partial t},t)$ in Equation 1 of Martins et al., 2018 is given by
where $s$ is the cell size with s_{b} being the size at birth. $G(t)$ is a function of time $t$ that couples the size control to the circadian clock, and $S(s,{s}_{b})$ is the division rate per unit volume of the cell. Assuming cells grow exponentially with growth rate $\alpha $, we have
and the division size follows
where $s}_{b$ is the size at birth and $t}_{b$ is the time at birth.
To map these to our inheritance matrix model, we observe that samples from Equation A63 follow
where $\stackrel{~}{g}({t}_{b,m},{s}_{b,m})={E}_{P}[{s}_{d}{t}_{b},{s}_{b}]$ is a drift term and ${\stackrel{~}{\eta}}_{m}$ is a zeromean noise term that depends both on time of day and birth size. Note that both $\stackrel{~}{g}({t}_{b,m})$ and ${\stackrel{~}{\eta}}_{m}({t}_{b,m},{s}_{b,m})$ are periodic functions of time at birth ${t}_{b,m}$. Since the latter is not explicitly modelled in our framework, here, we replace it with the state ${x}_{0,m}$ of the circadian clock, such that the update equations in Equation A64 now appear as
To gain intuition into the shape of the unknown functions $g$ and $h$, we linearise the equations around some basal level $x=\delta $ of a clockless mutant, which gives
For simplicity assume ${\eta}_{m}^{\mathrm{\prime}}(\delta ,{s}_{b,m})=0$ and that the clockless mutant follows a linear cell size control model with gammadistributed size increments ${\varphi}_{A,m}\sim \text{Gamma}$ with mean $\mathrm{\Delta}$ as in Martins et al., 2018. These assumptions lead to the relations,
Using ${s}_{b,2m}={s}_{b,2m+1}={s}_{d,m}/2$, we can obtain the linearised inheritance matrix model equations for the circadian cell size control model (Appendix 1—figure 6e):
where now ${\xi}_{m}$ is the added size and ${x}_{0,m}$ is the output ${x}_{0,m}={x}_{1,m}+{x}_{2,m}$ of a circadian oscillator governed by
for cell generation $n$, where $\mathit{\varphi}={({\varphi}_{1},{\varphi}_{2})}^{\top}$ are noise terms added to the elements of ${\mathit{x}}_{0}$ and $\widehat{\mathit{\theta}}$ is some complex eigenvalued $2\times 2$ inheritance matrix given by
Following this, we see that the circadian clock is incorporated into this cell size control system in the same way as the kicked cell cycle model outlined in the previous section (Appendix 1  Section A7.1). Using Equation A62 we can write the interdivision time of a cell with index $p$ as
Then taking the vector of cell cycle factors for the mother cell to be ${\mathit{y}}_{m}={({y}_{m,1},{y}_{m,2},{y}_{m,3},{y}_{m,4})}^{\top}={({s}_{b,m},{x}_{1,m},{x}_{2,m},{\xi}_{m})}^{\top}$ and comparing Equation 1a and b with Equation A68 and Equation A71 we obtain
Computing the means using Equation A1 we get
Then using Equation A72 in Equation A5 and Equation A12, we can solve for the means
Then taking ${\stackrel{~}{\mathit{S}}}_{2}=0$ and using Equation A13, we obtain the following for ${\mathit{S}}_{1}$:
where ${\text{cor}}_{ij}$ indicates the correlation between a pair of noise terms ${\varphi}_{i}$ and ${\varphi}_{j}$, and ${\eta}_{i}^{2}=\frac{\mathrm{Var}({\varphi}_{i})}{{\mu}_{\varphi}^{2}}$ for $i,j\in \{1,2,A\}$.
A7.3 Model comparison
We notice that the kicked cell cycle model has three cell cycle factors, while the circadianclockdriven cell size control model has four cell cycle factors. The eigenvalues of the inheritance matrix $\mathit{\theta}$ determining the correlation patterns are
for the kicked cell cycle model and
and for the cell size control model. In both models, either the complex pair of eigenvalues $D{e}^{\pm \frac{2\pi i}{P}}$ produces oscillatory behaviour. The overall correlation patterns are of mixed type, depending on the parameters $\beta $ and $a$.
To compare the models quantitatively, we match their motherdaughter interdivision time correlation coefficient in the absence of clock coupling. For the kicked cell cycle model, we notice that ${\rho}_{(1,0)}=\beta $ in the absence of clock coupling. The cell size control model reduces to the model in Appendix 1  Section A6.1 in the absence of clock coupling, which satisfies ${\rho}_{(1,0)}=\frac{a2}{4}$. Since realistic cell size control mechanisms (TaheriAraghi et al., 2015; Amir, 2014; Tanouchi et al., 2015; Sauls et al., 2016) ($a\in [0,2)$) ranging from sizers ($a=0$) to adders ($a=1$) to timers ($a=2$) imply $\beta \le 0$, we find that the kicked cell cycle obeys a mixed correlation pattern of the alternator/oscillator type while the cell size control model obeys a aperiodic/oscillator pattern.
Focusing on the common adder size control ($a=1$), we find that the regions where the cousinmother inequality is satisfied is remarkably similar in both models when $\beta $ is matched accordingly (Appendix 1—figure 6b and f). The lineage correlation function (red line) oscillates but the crossbranch correlation functions (blue line) alternates for the kicked cell cycle (Appendix 1—figure 6c–d) but not for the cell size control model (Appendix 1—figure 6g–h).
A8. Inference validation using simulated data
To validate the inference results discussed in the main text we simulate interdivision time data using the maximum posterior parameters from the inference on two of the original live imaging datasets, and compare the output and model fit to our original inference.
We take the maximum posterior parameter sets from the original inference on two datasets (Appendix 1—table 2), cyanobacteria and mouse embryonic fibroblasts, and produce simulated interdivision time lineage data in MATLAB using custom scripts and Random Trees (Kaj and Gaigalas, 2022). We chose to look at these two datasets in order to analyse the posterior distribution of the inferred underlying period ${T}_{1}$ to compare to the approximately 24 hr results seen in the main text.
From this simulated data, the correlation coefficients are calculated using the methods outlined in Materials and methods  ‘Data analysis and Bayesian inference of the inheritance matrix model’, and then we look at the model inference on these new, simulated correlations, to compare to the original. These simulations produce correlation patterns that reproduce the experimentally measured correlations (comparing Appendix 1—figure 9a–b with Figure 3a and f).
The posterior distribution of the simulated patterns are the same for the cyanobacteria, exhibiting an 100% oscillator pattern (Appendix 1—figure 9a), matching the fitting to the original dataset (Figure 3a). Mouse embryonic fibroblasts (Appendix 1—figure 9b) loses some of it’s original 100% oscillator pattern (Figure 3f) in favour of an alternator pattern. However, an oscillator pattern is still dominant.
We see that for cyanobacteria (Appendix 1—figure 9c) and mouse embryonic fibroblasts (Appendix 1—figure 9d), the posterior distribution for the inference on the simulated data for the correlation function oscillatory period, ${T}_{1}$ (Appendix 1—figure 9c and d), exhibits a large overlap with the original posterior distribution discussed in Circadian oscillations in cyanobacteria and fibroblastssupport coupling of the circadian clock and the cell cycle (Figure 5e). The difference in the median for these posterior distributions is 0.42 hr for mouse embryonic fibroblasts (Appendix 1—figure 9d) and just 0.11 h for cyanobacteria (Appendix 1—figure 9c). This result validates our analysis of these posterior distributions showing that the period that we reconstruct from the simulated correlation patterns is consistent with the original data.
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. Modelling code is uploaded to GitHub https://github.com/fernhughes/Lineagetreecorrelationpatterninference (copy archived at swh:1:rev:dc69bbce5ce909813d7d4356c9fd2da045e02c79).

Apollo  University of Cambridge RepositoryResearch data supporting Cell size control driven by the circadian clock and environment in cyanobacteria.https://doi.org/10.17863/CAM.31834
References

A new look at the statistical model identificationIEEE Transactions on Automatic Control 19:716–723.https://doi.org/10.1109/TAC.1974.1100705

Cell size regulation in bacteriaPhysical Review Letters 112:208102.https://doi.org/10.1103/PhysRevLett.112.208102

Circadian clock effects on cellular proliferation: insights from theory and experimentsCurrent Opinion in Cell Biology 67:17–26.https://doi.org/10.1016/j.ceb.2020.07.003

Stochastic timing in gene expression for simple regulatory strategiesNucleic Acids Research 45:1069–1078.https://doi.org/10.1093/nar/gkw1235

Circadian rhythms in cyanobacteriaMicrobiology and Molecular Biology Reviews 79:373–385.https://doi.org/10.1128/MMBR.0003615

Chromosome replication and the division cycle of Escherichia coli B/rJournal of Molecular Biology 31:519–540.https://doi.org/10.1016/00222836(68)904257

The continuum model: statistical implicationsJournal of Theoretical Biology 94:783–800.https://doi.org/10.1016/00225193(82)900789

Accelerating live singlecell signalling studiesTrends in Biotechnology 35:422–433.https://doi.org/10.1016/j.tibtech.2017.01.002

Cell heterogeneity during the cell cycleJournal of Cellular Physiology 113:465–474.https://doi.org/10.1002/jcp.1041130316

Stochastic gene expression in a single cellScience 297:1183–1186.https://doi.org/10.1126/science.1070919

TimeLapse microscopy approaches to track cell cycle and lineage progression at the singlecell levelCurrent Protocols in Cytometry Chapter 12:Unit12.https://doi.org/10.1002/0471142956.cy1204s64

Harmonics of circadian gene transcription in mammalsPLOS Genetics 5:e1000442.https://doi.org/10.1371/journal.pgen.1000442

SoftwareLineagetreecorrelationpatterninference, version swh:1:rev:dc69bbce5ce909813d7d4356c9fd2da045e02c79Software Heritage.

Bridging the timescales of singlecell and population dynamicsPhysical Review X 8:021007.https://doi.org/10.1103/PhysRevX.8.021007

Using movies to analyse gene circuit dynamics in single cellsNature Reviews. Microbiology 7:383–392.https://doi.org/10.1038/nrmicro2056

Microbial individuality: how singlecell heterogeneity enables population level strategiesCurrent Opinion in Microbiology 24:104–112.https://doi.org/10.1016/j.mib.2015.01.003

Frequency doubling in the cyanobacterial circadian clockMolecular Systems Biology 12:896.https://doi.org/10.15252/msb.20167087

Cell cycle proliferation decisions: the impact of single cell analysesThe FEBS Journal 284:362–375.https://doi.org/10.1111/febs.13898

Circadian gene expression in cultured cellsMethods in Enzymology 393:543–557.https://doi.org/10.1016/S00766879(05)930280

Growth rate and generation time of bacteria, with special reference to continuous cultureJournal of General Microbiology 15:492–511.https://doi.org/10.1099/00221287153492

Mycobacteria modify their cell size control under suboptimal carbon sourcesFrontiers in Cell and Developmental Biology 5:64.https://doi.org/10.3389/fcell.2017.00064

Identifiability of parameters in MCMC Bayesian inference of phylogenySystematic Biology 51:754–760.https://doi.org/10.1080/10635150290102429

Joining forces of Bayesian and frequentist methodology: a study for inference in the presence of nonidentifiabilityPhilosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 371:20110544.https://doi.org/10.1098/rsta.2011.0544

Adder and a coarsegrained approach to cell size homeostasis in bacteriaCurrent Opinion in Cell Biology 38:38–44.https://doi.org/10.1016/j.ceb.2016.02.004

Circadian clock, cell division, and cancer: from molecules to organismInternational Journal of Molecular Sciences 18:E873.https://doi.org/10.3390/ijms18040873

WebsiteMamba.jl: markov chain monte carlo (MCMC) for bayesian analysis in juliaAccessed October 20, 2018.

Additive models for dependent cell populationsJournal of Theoretical Biology 109:127–146.https://doi.org/10.1016/s00225193(84)801150

A bifurcating autoregression model for cell lineages with variable generation meansJournal of Theoretical Biology 156:183–195.https://doi.org/10.1016/s00225193(05)806721

CellSize control and homeostasis in bacteriaCurrent Biology 25:385–391.https://doi.org/10.1016/j.cub.2014.12.009

Transcriptional architecture of the mammalian circadian clockNature Reviews. Genetics 18:164–179.https://doi.org/10.1038/nrg.2016.150

Signatures of nonlinearity in single cell noiseinduced oscillationsJournal of Theoretical Biology 335:222–234.https://doi.org/10.1016/j.jtbi.2013.06.021

Making sense of snapshot data: ergodic principle for clonal cell populationsJournal of the Royal Society, Interface 14:20170467.https://doi.org/10.1098/rsif.2017.0467

The chemical basis of morphogenesisBulletin of Mathematical Biology 52:153–197.https://doi.org/10.1016/S00928240(05)800084

Automated deep lineage tree analysis using a bayesian single cell tracking approachFrontiers in Computer Science 3:734559.https://doi.org/10.3389/fcomp.2021.734559

Identification of genes periodically expressed in the human cell cycle and their expression in tumorsMolecular Biology of the Cell 13:1977–2000.https://doi.org/10.1091/mbc.02020030
Decision letter

Arvind MuruganReviewing Editor; University of Chicago, United States

Aleksandra M WalczakSenior Editor; CNRS LPENS, France

Michael J RustReviewer; University of Chicago, United States

Farshid JafarpourReviewer; Utrecht University, Netherlands
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Patterns of interdivision time correlations reveal hidden cell cycle factors" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Aleksandra Walczak as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: Michael J Rust (Reviewer #1); Farshid Jafarpour (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
1) As suggested by reviewer 1, please include mechanistic details for circadian rhythm models (e.g., showing the inheritance matrix). Also, explain why whether leading order approximations are valid for oscillatory models as suggested by the same reviewer.
2) As suggested by reviewer 2, more intuition is needed for some of the results. E.g., intuition why only eigenvalues of the inheritance matrix matter + intuition for the alternator pattern.
Reviewer #1 (Recommendations for the authors):
Concretely, while section 4 of the supplement shows how some simple mechanistic models map onto this formalism, none of these include an explicit circadian rhythm. Since the authors did develop a model like this in Martins et al., it would be valuable to show what the inheritance matrix, etc. are for this kind of model, and indeed whether going to leading order in fluctuations works satisfactorily.
Reviewer #2 (Recommendations for the authors):
The paper is wellwritten, scientifically sound, and easy to follow. I only have a few minor suggestions/comments:
1) It is not clear what the variance in Equation 4 is. Is it the variance of generation times across a single lineage or across the whole tree or the population variance? If I understand it correctly from the derivation in SI, it should be a lineage variance, which may not be immediately obvious given that rho is a tree variable. So it would be nice to specify which one it is.
2) I find it very surprising that only the eigenvalues of the inheritance matrix determine the correlation patterns, while all the other parameters, including how noises are correlated (anticorrelated) and whether growth factors have a positive or negative effect on the interdivision times, are irrelevant. It would be nice if the authors could provide an intuition for why that is the case.
3) The authors have used the AIC method for the goodness of fit. Initially, without being familiar with the method, I found it surprising that in the case of mycobacteria, a more general model could have a worse fit than its special case. I think a onesentence explanation of this method could prevent such potential confusion for readers not familiar with this method.
4) What is the intuition behind the alternator pattern, i.e. why is there a period 2 oscillation in a model with real eigenvalues?
5) In the model, the interdivision time is a deterministic function of the cellcycle factors (Equations 1 and 3a). The noise is only allowed in the inheritance of these factors, not in the division itself. This puts a constraint on what variables of a given model can be used as cellcycle factors. I think this would be worth mentioning in the paper.
https://doi.org/10.7554/eLife.80927.sa1Author response
Essential revisions:
1) As suggested by reviewer 1, please include mechanistic details for circadian rhythm models (e.g., showing the inheritance matrix). Also, explain why whether leading order approximations are valid for oscillatory models as suggested by the same reviewer.
2) As suggested by reviewer 2, more intuition is needed for some of the results. E.g., intuition why only eigenvalues of the inheritance matrix matter + intuition for the alternator pattern.
We have addressed all reviewers' comments, which improved the manuscript significantly. We made the following changes:
a) We included a new section in the Appendix entitled “A7 Models of circadianclockdriven correlation patterns”, where we derive the inheritance matrix of the kicked cell cycle model and the model of Martins et al. 2018 sidebyside. We also produced the new Appendix 1 – Figure A6 to compare the resulting correlation patterns.
b) We include a new section in the Appendix entitled “A2 Beyond the small noise approximation: cell cycle factor complexes account for nonlinear fluctuations”, where we derive the effect of nonlinearity on the resulting correlation patterns analytically. Such features can be described through extra cell cycle factors called complexes. However, additional complexes were not necessary to fit the present data.
c) We present a new formula, Equation 4, for the lineage tree correlation function highlighting the intuition of how the eigenvalues determine the correlation patterns while other factors determine their relative weights. We also included a new section in the Appendix entitled “A5 Solution of the tree correlation function and parameter identifiability for simple inheritance rules”, where we explain intuitively why the interdivision time correlation pattern carries no information on whether the growth factor increases or decreases growth.
d) We now clarify the meaning of our variance estimators and show that different estimators produce similar results. This is summarised in Appendix – Table A3.
Reviewer #1 (Recommendations for the authors):
Concretely, while section 4 of the supplement shows how some simple mechanistic models map onto this formalism, none of these include an explicit circadian rhythm. Since the authors did develop a model like this in Martins et al., it would be valuable to show what the inheritance matrix, etc. are for this kind of model, and indeed whether going to leading order in fluctuations works satisfactorily.
Thank you for this inspiring comment. We have included new detailed analyses of modelling circadian clockdriven correlation patterns in Appendix 1 – Section A7. We include and compare both the kicked cell cycle model and the model from the Martins et al. 2018 PNAS paper; the latter models the circadian modulation of division rate and cell size control. We have derived the model's inheritance matrix and characterised its eigenvalues regarding cell size control parameters and the coupling to the circadian clock.
Within the IMM framework, the kicked cell cycle model and the Martins et al. 2018 model reduce to inheritance matrice with three and four cell cycle factors, respectively. The two models produce different correlation patterns corresponding to the oscillator and oscillatoralternator mixed patterns, respectively. Nevertheless, we find that only the two circadian factors are necessary to fit the experimental data with oscillatory patterns.
We understand that the divisioncoupling function described in Martins et al. 2018 is sharply peaked, while oscillations of cell cycle factors produced in the IMM are sinusoidal. Going beyond leading order fluctuations could account for such nonlinear oscillations. We now show that such nonlinearity can be modelled through extra cell cycle factors akin to cell cycle factor complexes. In our inference, however, we found that two cell cycle factors were sufficient to fit all circadian datasets. We here applied Occam's razor through Akaike’s Information Criterion, preferring a simple model over a more complex one when both fit the data (see also reply to comment 3 of Reviewer 2). Thus the circadian component of these oscillations is enough to characterise the interdivision time data.
Of course, the model by Martins et al. 2018 is more detailed as it includes cell size information. Including extra cell cycle factors, such as cell size, could significantly improve identifying the mechanisms generating these correlation patterns, but this is beyond the scope of our manuscript.
We realise there are some limitations to the linear model, and including nonlinear interactions via complexes could improve model identification. We now include the following paragraph in the discussion (lines 759773):
“In principle, increasing the number of interacting cell cycle factors can lead to more complex composite patterns that involve combinations of the three patterns discussed in this paper, such as the alternatoroscillator (Appendix 1 – Figure A6c, d), aperiodicoscillator (Appendix 1 – FigureA6g,h), or birhythmic correlation patterns. Such composite patterns could also arise as the result of nonlinear fluctuations that, within our framework, can be described by adding complexes of cell cycle factors to the inheritance matrix model (Appendix 1 – Section A2). The presence of such complexes induces higherorder harmonics in the correlation oscillations, similar to those observed in the cyanobacterial and mammalian circadian clock [12, 63], and detecting such complexes could provide an alternative route to increase the sensitivity of our inference method.”
Reviewer #2 (Recommendations for the authors):
The paper is wellwritten, scientifically sound, and easy to follow. I only have a few minor suggestions/comments:
1) It is not clear what the variance in Equation 4 is. Is it the variance of generation times across a single lineage or across the whole tree or the population variance? If I understand it correctly from the derivation in SI, it should be a lineage variance, which may not be immediately obvious given that rho is a tree variable. So it would be nice to specify which one it is.
Thank you for pointing this out. The variance derived in the Appendix, now Equation M2 of the main text, is indeed the lineage variance that is also equal to the variance from a tree where all lineages have the same number of generations. However, in the data analysis, we compute the variance over all cells used to compute the correlation coefficients, which equals the variance across the whole tree. We are aware that this approximation can introduce bias, and we made several changes to address this:
a) We now clarify in Section Methods D how the variance is computed from the data (line 872):
“Note that ŝ_{τ} is computed across all interdivision times used to calculate the correlation coefficients in each dataset.”
b) We now state clearly in Section Methods D the meaning of the variance in Equation M2 (line894):
“Note that (M2) is the interdivision time variance from a tree where all lineages have the same number of generations, which approximates the variance across all cells in the observed trees.”
c) We found that the various bias introduced was negligible for the data analysed. To quantify the variance bias introduced, we have added Appendix 1 – Table A3 that compares the tree variance statistics used in our inference to i) the variance estimator obtained from trees with equal generations per lineage, thus ignoring parts of the data similar to references [24] and [27] ; and ii) the variance computed from forwardlineageweighting of the tree data, similar to what is done in references [22] and [66] of the main text. This suggests that the three estimators cannot be distinguished within the 95% confidence intervals for all six datasets.
d) We understand that this approximation has limitations. We now clarify how potential biases would affect our parameter inference and the identified correlation patterns. We found that a potential bias in the variance could lead to a biased estimate of the noise matrices S1,2 since multiplying the latter by a constant increases the variance by the same amount. The eigenvalues and the identified correlation patterns remain unaffected, as well as the conclusions of our work. Of course, our theory cannot predict this constant and an extended theory would be required to quantify such biases. This is beyond the scope of this manuscript, but we hope to address this in future work.
In summary, the identified correlation patterns remain unaffected, as well as the conclusions of our work. We have added Appendix 1 – Table A3:
2) I find it very surprising that only the eigenvalues of the inheritance matrix determine the correlation patterns, while all the other parameters, including how noises are correlated (anticorrelated) and whether growth factors have a positive or negative effect on the interdivision times, are irrelevant. It would be nice if the authors could provide an intuition for why that is the case.
Thank you for this comment. To provide an intuition as to how the correlation patterns depend on parameters, we have added a new formula that shows the lineage correlation function to be a weighted sum of powers of eigenvalues. This shows that the behaviour of the tree correlation as we change k, l corresponding to a pattern depends only on the eigenvalues, but the noise matrices ^{S}1,2 modulate the weights, i.e., they can amplify or attenuate a correlation pattern.
This is now clarified in Results B.
It is an interesting question whether the correlation patterns of growth factors that have positive effects on the interdivision times differ from those that include factors with negative effects. we have included an entire section in the Appendix addressing this issue (Appendix Section A5). There we provide an analytical solution of the effect of a growth factor on the interdivision time. The correlation patterns indeed coincide for both growthenhancing and attenuating cell cycle factors. The intuition behind this effect is that the pattern only measures the noise. Thus we cannot tell apart the effect of an attenuating factor, from a growthenhancing factor.
Mathematically, we rescale all cell cycle factors such that each has a positive contribution to the noise pattern. Since we only deal with the rescaled cell cycle factors, we avoided this kind of unidentifiability. Of course, the issue only arises if the cell cycle factor was hidden and simultaneous measurement of cell cycle factor correlations could resolve whether such factors are growthenhancing and attenuating.
3) The authors have used the AIC method for the goodness of fit. Initially, without being familiar with the method, I found it surprising that in the case of mycobacteria, a more general model could have a worse fit than its special case. I think a onesentence explanation of this method could prevent such potential confusion for readers not familiar with this method.
To clarify the use of AIC we have added the following text to Results section D where we compare models for the 6 datasets (lines 349360):
“We quantified the quality of our fits using the Akaike information criterion (AIC) (Methods D, (D2)) for each dataset and compared these to the onedimensional model (Appendix 1 – Table A1). The AIC estimates the goodness of fit with a penalty for model complexity allowing us to select the simplest model that explains the data. The AIC values indicate that the inheritance matrix model with two cell cycle factors provides the simplest fit for all cell types used here, except for the mycobacteria data where simple inheritance rules provided an equally good fit with a significant reduction in the number of model parameters.”
Additionally we have added the following to the end of Methods D which gives the formula for the AIC.
During revisions we noted that the AIC values were erroneously stated for k=2 while indeed we used k=3 due to S_{2} ≠ 0 taking into account the sistersister correlations. As a result, the 1D AIC reported in Table S1 increased slightly (by a value of 2). However this does not change the model selection or any other conclusion.
4) What is the intuition behind the alternator pattern, i.e. why is there a period 2 oscillation in a model with real eigenvalues?
We added some explanations to provide more intuition for the alternator pattern. The corresponding text reads now (lines 250257):
“.. the alternator pattern generates oscillations with a fixed period of two generations in the lineage correlation function. The behaviour is typically observed for cell cycle factors with negative motherdaughter correlations (Appendix 1 – Section S6 a). In this case, we have at least one negative eigenvalue and thus (4) will alternate between positive and negative values for successive generations, producing the period two oscillation.”
We hope that our reply to point (2) helps to clarify this further.
(5) In the model, the interdivision time is a deterministic function of the cellcycle factors (Equation 1 and 3a). The noise is only allowed in the inheritance of these factors, not in the division itself. This puts a constraint on what variables of a given model can be used as cellcycle factors. I think this would be worth mentioning in the paper.
Noise added on ‘after’ division would be equivalent to adding an additional cell cycle factor y_{i} with corresponding elements of the θ matrix θ_{ij} = θ_{ji} = 0 where j is the index of the other factors. Therefore this additional noise is already encapsulated in the IMM framework.
In the main text, Results A, we have added the following sentence to clarify this (lines 148151):
“Note that we choose (1a) to be deterministic since division noise can be modelled by adding one more cell cycle factor that does not affect inheritance dynamics g.”
https://doi.org/10.7554/eLife.80927.sa2Article and author information
Author details
Funding
EPSRC Centre for Mathematics of Precision Healthcare (EP/N014529/1)
 Fern A Hughes
MRC London Institute of Medical Sciences (MCA6585TY60)
 Alexis R Barr
UKRI Future Leaders Fellowship (MR/T018429/1)
 Philipp Thomas
Cancer Research UK (Career Development Fellowship C63833/A25729)
 Alexis R Barr
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Bruno Martins, Dimitris Volteras and Paul Piho for their comments on the manuscript. This work has been supported by a scholarship to FAH provided by the EPSRC Centre for Mathematics of Precision Healthcare (EP/N014529/1) and MRC core funding to the London Institute of Medical Sciences (MCA6585TY60). ARB is funded by a CRUK Career Development Fellowship (C63833/A25729). PT is funded by a UKRI Future Leaders Fellowship (MR/T018429/1).
Senior Editor
 Aleksandra M Walczak, CNRS LPENS, France
Reviewing Editor
 Arvind Murugan, University of Chicago, United States
Reviewers
 Michael J Rust, University of Chicago, United States
 Farshid Jafarpour, Utrecht University, Netherlands
Version history
 Received: June 9, 2022
 Preprint posted: June 28, 2022 (view preprint)
 Accepted: November 14, 2022
 Accepted Manuscript published: November 15, 2022 (version 1)
 Version of Record published: January 6, 2023 (version 2)
Copyright
© 2022, Hughes et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,056
 Page views

 181
 Downloads

 0
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Biochemistry and Chemical Biology
 Cell Biology
The ATPase p97 (also known as VCP, Cdc48) has crucial functions in a variety of important cellular processes such as protein quality control, organellar homeostasis, and DNA damage repair, and its deregulation is linked to neuromuscular diseases and cancer. p97 is tightly controlled by numerous regulatory cofactors, but the full range and function of the p97–cofactor network is unknown. Here, we identify the hitherto uncharacterized FAM104 proteins as a conserved family of p97 interactors. The two human family members VCP nuclear cofactor family member 1 and 2 (VCF1/2) bind p97 directly via a novel, alphahelical motif and associate with p97UFD1NPL4 and p97UBXN2B complexes in cells. VCF1/2 localize to the nucleus and promote the nuclear import of p97. Loss of VCF1/2 results in reduced nuclear p97 levels, slow growth, and hypersensitivity to chemical inhibition of p97 in the absence and presence of DNA damage, suggesting that FAM104 proteins are critical regulators of nuclear p97 functions.

 Cell Biology
 Neuroscience
The amyloid beta (Aβ) plaques found in Alzheimer’s disease (AD) patients’ brains contain collagens and are embedded extracellularly. Several collagens have been proposed to influence Aβ aggregate formation, yet their role in clearance is unknown. To investigate the potential role of collagens in forming and clearance of extracellular aggregates in vivo, we created a transgenic Caenorhabditis elegans strain that expresses and secretes human Aβ_{142}. This secreted Aβ forms aggregates in two distinct places within the extracellular matrix. In a screen for extracellular human Aβ aggregation regulators, we identified different collagens to ameliorate or potentiate Aβ aggregation. We show that a disintegrin and metalloprotease a disintegrin and metalloprotease 2 (ADM2), an ortholog of ADAM9, reduces the load of extracellular Aβ aggregates. ADM2 is required and sufficient to remove the extracellular Aβ aggregates. Thus, we provide in vivo evidence of collagens essential for aggregate formation and metalloprotease participating in extracellular Aβ aggregate removal.