Patterns of interdivision time correlations reveal hidden cell cycle factors
Abstract
The time taken for cells to complete a round of cell division is a stochastic process controlled, in part, by intracellular factors. These factors can be inherited across cellular generations which gives rise to, often non-intuitive, correlation patterns in cell cycle timing between cells of different family relationships on lineage trees. Here, we formulate a framework of hidden inherited factors affecting the cell cycle that unifies known cell cycle control models and reveals three distinct interdivision time correlation patterns: aperiodic, alternator, and oscillator. We use Bayesian inference with single-cell datasets of cell division in bacteria, mammalian and cancer cells, to identify the inheritance motifs that underlie these datasets. From our inference, we find that interdivision time correlation patterns do not identify a single cell cycle model but generally admit a broad posterior distribution of possible mechanisms. Despite this unidentifiability, we observe that the inferred patterns reveal interpretable inheritance dynamics and hidden rhythmicity of cell cycle factors. This reveals that cell cycle factors are commonly driven by circadian rhythms, but their period may differ in cancer. Our quantitative analysis thus reveals that correlation patterns are an emergent phenomenon that impact cell proliferation and these patterns may be altered in disease.
Editor's evaluation
This work makes an important contribution to the study of the cell cycle and inferring mechanisms by studying correlations in division timing between single cells. By treating the problem in a general way and computing over lineage trees, the authors can infer timescales in the underlying mechanism. The method is validated on data sets from bacterial and mammalian cells and can suggest when additional measurements are needed to distinguish competing models.
https://doi.org/10.7554/eLife.80927.sa0Introduction
Cell proliferation, the process of repeated rounds of DNA replication and cell division, is driven by multiple cell extrinsic and intrinsic factors (Matson and Cook, 2017; Darzynkiewicz et al., 1982). Stochasticity in any or all of these factors therefore influences the time taken for a cell to divide, generating heterogeneity in cell cycle length, even in genetically identical populations. For example, stochastic gene expression (Elowitz et al., 2002) can lead to heterogeneity in cell cycle length (Kiviet et al., 2014; Ghusinga et al., 2016; Thomas et al., 2018) as these fluctuations can be propagated by concerted cellular cues (Co et al., 2017). These cues can exhibit reproducible stochastic patterns that are important in development, homeostasis and ultimately, for cell survival (Raser and O’Shea, 2005).
Single-cell technologies illuminate a world of cellular variation by replacing bulk-average information with single-cell distributions. A key challenge is to exploit cell-to-cell variability to identify the mechanisms of cellular regulation and responses (Raser and O’Shea, 2005; Martins and Locke, 2015). Time-lapse microscopy allows us to resolve cell dynamics such as division timing, growth and protein expression (Locke and Elowitz, 2009, Figure 1a, left). This has led to many discoveries in cell cycle dynamics in bacteria (Taheri-Araghi et al., 2015; Martins et al., 2016; Martins et al., 2018; Kohram et al., 2021) and mammalian cells (Barr et al., 2017; Arora et al., 2017; Ryl et al., 2017; Chakrabarti and Michor, 2020). Early advances included measuring the distribution of division times across single cells (Powell, 1956) and the correlations between cellular variables leading to cell size homeostasis (Taheri-Araghi et al., 2015), while more recent applications of time-lapse microscopy have captured multiple generations of proliferating cells, making lineage tracing possible (Errington et al., 2013; Cooper and Bakal, 2017).
While single-cell distributions measure variation between cellular variables, they ignore both temporal signals and variations propagating across generations to entire lineage trees (Ulicna et al., 2021; Kuchen et al., 2020; Sandler et al., 2015; Cowan and Staudte, 1986). These lineage tree correlation patterns can be robust and steady, similar to what is known in spatio-temporal pattern formation (Turing, 1990; Chaplain et al., 1999). Common examples of lineage tree correlation patterns concern the mother-daughter and the sister correlations that have been used to study cell size homeostasis in E. coli (Taheri-Araghi et al., 2015; Ho et al., 2018) and other mechanisms generating correlated interdivision times such as population growth rate (Powell, 1956) and initiation of DNA synthesis (Cooper, 1982).
A counter-intuitive correlation pattern presented by many cell types is the ‘cousin-mother inequality’ (Sandler et al., 2015), where the interdivision times of cousin cells are more correlated than those of mother-daughter pairs. This inequality can be observed both in bacteria and mammalian cells (Figure 1b). More generally, lineage tree data gives rise to correlation patterns by comparing a single cell to any other cell on the tree (Figure 1a, right). Family relations – such as daughter, grandmother, cousin cells etc. – encode inheritance patterns, and correlations between these related cells have been used to understand the dynamics of cell populations (Mohammadi et al., 2021; Nakashima et al., 2020, Figure 1c). Several stochastic models have been proposed to explain interdivision time correlation patterns. Most of them make prior assumptions on the underlying mechanism controlling cell division such as those focusing on cell size control (Ho et al., 2018), DNA replication (Cooper and Helmstetter, 1968; Cooper, 1982) or underlying oscillators (Kohram et al., 2021). For example, inheritance of DNA content can explain the correlation in interdivision time between sister cells in bacteria (Cooper, 1982). Similarly, it has been shown that a simple model with interdivision time correlations (Cowan and Staudte, 1986) cannot satisfy the ‘cousin-mother inequality’ (Sandler et al., 2015), but a more complex kicked cell cycle model does (Mosheiff et al., 2018). It is presently unclear what information correlation patterns carry about the underlying mechanisms that generate them. This is because a unified and systematic framework to generate any desired interdivision time correlation pattern is lacking.
Here, we propose a stochastic model to investigate how cell cycle factors – which we define in this work as hidden properties that affect interdivision time – shape the lineage tree correlation patterns of cells. These could include physiological factors, such as cell size, growth rate and cell cycle checkpoints, or specific cell cycle drivers such as CDKs, mitogens and division proteins. We will only focus on data describing patterns of interdivision time in bacterial and mammalian cell types, which circumvents intricate measurements of cell volume, mass, and DNA replication. This also avoids dealing with fluorescent reporter strains that may be difficult to engineer depending on cell type. We propose a generative model of correlation patterns that involves a number of hidden cell cycle factors and reduces to common mechanistic cell cycle models for specific parameter choices. Our theory predicts three distinct lineage correlation patterns; aperiodic, alternator and oscillator. We demonstrate how the model can be used to identify these patterns using Bayesian inference in bacteria and mammalian cells. Our analysis reveals several dynamical signatures of cell cycle factors hidden in lineage tree interdivision time data.
Results
A general inheritance matrix model provides a unified framework for lineage tree correlation patterns
Previous studies (Cowan and Staudte, 1986; Sandler et al., 2015) found that simple inheritance rules, where interdivision times are correlated from one generation to another through a single parameter, cannot explain the lineage correlation patterns seen in experimental single-cell data. To address this issue, we propose a unified framework where the interdivision time is determined by a number of cell cycle factors that represent hidden variables such as cell cycle phase lengths, protein levels, cell growth rate or other unknowns (Figure 1c), that each have their own inheritance pattern.
The states of the cell cycle factors is assumed to be a vector that determine the interdivision time of a cell with index via
Inheritance from mother to daughter of the cell cycle factors is described by a nonlinear stochastic Markov model on a lineage tree:
where denotes the mother cell index and and the daughter cell indices. and are possibly nonlinear functions that model the dependence of the interdivision time on cell cycle factors and the inheritance process. is a noise vector for which the pair are identically distributed random vectors with covariance matrix independent of . A non-zero covariance between these noise vectors can account for correlated noise of sister cells. We implicitly assume symmetric cell division such that the deterministic part of the inheritance dynamics is identical between the daughter cells. Note that we choose Equation 1a to be deterministic since division noise can be modelled by adding one more cell cycle factor that does not affect inheritance dynamics .
The general model (Equation 1a and b) includes many known cell cycle models as a special case. For example, the interactions between cell cycle factors could model cell size control mechanisms (Appendix 1 - Section A6.1), the coordination of cell cycle phases (Appendix 1 - Section A6.3), or deterministic cues, such as periodic forcing of the cell cycle (Appendix 1 - Section A7.1), or coupling of the circadian clock to cell size control (Appendix 1 - Section A7.2).
The full model can only be solved for specific choices of and , and these functions are generally unknown in inference problems. To overcome this limitation, we assume small fluctuations resulting in an approximate linear stochastic system (see Appendix 1 - Section A1 for a derivation) involving the interdivision time
The vector of cell cycle factor fluctuations obeys
Here, is the stationary mean interdivision time, is the inheritance matrix and and are two noise vectors of length that capture the stochasticity of inheritance dynamics and differentiate the sister cells (Figure 2a). We denote the covariance matrices and of the noise terms (and ) in individual cells and between sister cells, respectively. The noise terms are independent for all other family relations. The cell cycle factor fluctuations are scaled such that is a binary vector of length made up of 1 s and 0 s depending on whether the function determining the interdivision time has dependence on a given cell cycle factor (see Appendix 1 - Section A1 for details). Under this scaling each cell cycle factor has a positive effect on the interdivision time, and hence we do not distinguish between factors with positive or negative effects on interdivision time.
When the special case of a single cell cycle factor () is considered, the inheritance matrix model system reduces to a well-known model with correlated division times (Cowan and Staudte, 1986; Staudte et al., 1984; Staudte, 1992; Staudte et al., 1997), and we will refer to this case as simple inheritance rules (see also Appendix 1 - Section A5). In the following, we will explore the correlation patterns generated by multiple cell cycle factors.
The inheritance matrix model reveals three distinct interdivision time correlation patterns
Here, we define a correlation pattern to be the correlation coefficients of pairs of cells on a lineage tree. Here we introduce a function which we call the generalised tree correlation function:
where and are the interdivision times of cells in the pair , and is the interdivision time variance. The coordinate describes the distance in generations from each cell in the pair to their shared nearest common ancestor (Figure 2b and c). We have derived a closed-form formula for (Equation M3) in Materials and Methods - ‘Analytical solution of the inheritance matrix model’; (see Appendix 1 - Section A3 for a full derivation) as a weighted sum of powers of the inheritance matrix eigenvalues :
with
We observe that the eigenvalues determine the dependence of the tree correlation function on and , while the noise matrices and determine their relative weights (see Equation 5).
Our theoretical analysis reveals three distinct correlation patterns that can be generated by the inheritance matrix model (further details in Materials and methods - ‘Analysis of tree correlation patterns’). These can be classified by the eigenvalues of the inheritance matrix : (i) if the inheritance matrix exhibits real positive eigenvalues, we observe an aperiodic pattern (Figure 2d); (ii) if the inheritance matrix has real eigenvalues with at least one negative eigenvalue, we observe an alternator pattern (Figure 2e); and (iii) if there is a pair of complex eigenvalues we observe an oscillator pattern (Figure 2f). An intuitive interpretation of the eigenvalue decomposition is that it transforms the cell cycle factors into effective factors inherited independently. Hence, the inheritance matrix is diagonal in this basis. However, the analogy is limited to the case where the inheritance matrix is symmetric and the eigenvalues are real. For simplicity, we will focus on models with two cell cycle factors and note that in higher dimensions (), the correlation patterns involve a mixture of the three patterns discussed in detail in this section (Appendix 1—figure 6c, d and g,h).
To demonstrate the aperiodic correlation pattern, we utilise an inheritance matrix with positive real eigenvalues (Figure 2d). Characteristically, the modelled interdivision time correlations decay to zero as the distance to the most recent ancestor increases (Figure 2g) since the eigenvalues in Equation 4 are bounded between 0 and 1. To look more closely at the patterns on the tree, we utilise two reductions of the generalised tree correlation function. These are the lineage correlation function ( for or ) and the cross-branch correlation function ( for ). We look at these functions for continuous to visualise better the patterns that occur down the lineage and across the branches of the tree. The lineage correlation function gives the correlation dynamics as you go down the lineage tree, whereas the cross-branch correlation function gives the correlation dynamics as you move across neighbouring branches of the lineage tree. We observe that the interdivision time correlations decrease as we move both across generations and branches (Figure 2j).
In contrast, the alternator pattern generates oscillations with a fixed period of two generations in the lineage correlation function. The behaviour is typically observed for cell cycle factors with negative mother-daughter correlations (Appendix 1 - Section A6.1). In this case, we have at least one negative eigenvalue and thus Equation 4 will alternate between positive and negative values for successive generations, producing the period two oscillation. We demonstrate this correlation pattern for the generalised tree correlation function (Figure 2h) using a diagonal matrix (Figure 2e). We observe alternating correlations across generations in the lineage correlation function, and the continuous interpolation of the cross-branch correlation function (Figure 2k). Although the period is fixed to two generations, the amplitude of the correlation oscillation varies with the absolute magnitude of the eigenvalues (Materials and methods - ‘Analysis of tree correlation patterns’).
To investigate the oscillator correlation pattern, we propose a hypothetical inheritance matrix with eigenvalues which are complex for and (Figure 2f). The parameters and control the period and the respective damping of an underlying oscillator, i.e., the limit leads to an undamped oscillation and corresponds to an overdamped oscillation (see Materials and methods - ‘Determining the period of correlation oscillations from the eigenvalues’ for details). Correspondingly, the graph of the generalised tree correlation function (Figure 2i) shows clear oscillations across generations. These correlation oscillations are also evident in the lineage correlation function but are absent in the cross-branch correlation function (Figure 2l). However, oscillations are possible in the cross branch correlation function for other choices of with complex eigenvalues (see model fits in Figure 3 and Methods - ‘Analysis of tree correlation patterns‘). In summary, the qualitative behaviour of the interdivision time correlation patterns can be studied using the eigenvalue decomposition of the inheritance matrix .
The cousin-mother inequality is not required to generate complex correlation patterns
Our analysis shows that of the three specified patterns, only the oscillator pattern cannot arise from simple inheritance rules. This is because it requires at least two inherited cell cycle factors () for the inheritance matrix to possess complex eigenvalues. We therefore asked whether the oscillator pattern is necessary for the cousin-mother inequality to be satisfied. We find that this is not the case, but instead, all three correlation patterns can be compatible with the cousin-mother inequality if . To demonstrate this, we choose three specific two-dimensional inheritance matrices that produce the required eigenvalue structure (Figure 2d–f). We then use these matrices with our analytical solution for the generalised tree correlation function (Materials and methods - ‘Analytical solution of the inheritance matrix model’) to map the regions where the cousin-mother inequality can be satisfied (Figure 2m–o). Interestingly, we find that oscillations can arise even in parameter regions that violate the cousin-mother inequality (Figure 2o). We conclude that both the cousin-mother inequality and the oscillator pattern are sufficient but not necessary conditions to rule out simple inheritance rules.
To understand which datasets can be explained by simple inheritance rules, we fit the one-dimensional model () to six publicly available lineage tree datasets (Appendix 1—table 1) using Bayesian methods (Materials and methods - ‘Data analysis and Bayesian inference of the inheritance matrix model’). These datasets were chosen as they each had a sufficient number of cells for correlation analysis and covered a broad range of cell types. We found that the model fit is poor for the datasets that display the cousin-mother inequality, which is the case for cyanobacteria, clock-deleted cyanobacteria, neuroblastoma and human colorectal cancer cells (Appendix 1—figure 1a–f). Despite not obeying the cousin-mother inequality, the fit is also poor for mouse embryonic fibroblasts (Appendix 1—figure 1f) as the median inferred correlation lies outside the 95% confidence intervals for both the grandmother and cousin correlations which are included in the model fit, and the confidence intervals for the data vs the credible intervals from the inference show minimal overlap (Appendix 1—figure 2f). Another inequality may be violated in this dataset that cannot be explained using the one-dimensional model, suggesting that the absence of the cousin-mother inequality cannot rule out more complex division rules. The only cell type that has a good fit for the one-dimensional model is mycobacteria (Appendix 1—figure 1c). We thus conclude that the majority of the datasets must be described by higher dimensional inheritance dynamics of multiple cell cycle factors.
The two-dimensional inheritance matrix model fits interdivision time correlation patterns from a range of cell types
We asked whether the correlation patterns are better described by a two-dimensional inheritance matrix model. Bayesian inference (Materials and methods - ‘Data analysis and Bayesian inference of the inheritance matrix model’) produced a good model fit for all six datasets (Figure 3a–f) for the two-factor inheritance matrix model, within relatively narrow error bars of mother, grandmother, sister and cousin correlations (Appendix 1—table 1). The credible intervals from the Bayesian inference matched the confidence intervals of correlations used for fitting (Appendix 1—figure 2). We quantified the quality of our fits using the Akaike information criterion (AIC) (Materials and methods - ‘Data analysis and Bayesian inference of the inheritance matrix model’, (Equation M11)) for each dataset and compared these to the one-dimensional model (Appendix 1—table 1). The AIC estimates the goodness of fit with a penalty for model complexity allowing us to select the simplest model that explains the data. The AIC values indicate that the inheritance matrix model with two cell cycle factors provides the simplest fit for all cell types used here, except for the mycobacteria data where simple inheritance rules provided an equally good fit with a significant reduction in the number of model parameters. We expected the AIC to select the two dimensional model where the cousin-mother inequality was satisfied such as in cyanobacteria, clock-deleted cyanobacteria, neuroblastoma, and human colorectal cancer cells. The match with the two-factor inheritance matrix model in fibroblasts was less obvious.
Crucially, we find that the model has a good predictive capacity for correlations further down the lineage tree. For each pattern, we show several samples from the conditional posterior distribution (solid and shaded lines) to illustrate fits of the lineage correlation and cross-branch correlation function (Figure 3a–f). For all datasets except neuroblastoma, the curves also intercept the great-grandmother and great-great-grandmother correlations that were not used for fitting (Figure 3a–d and f), and bootstrapped confidence intervals from the data overlapped with the credible intervals obtained from Bayesian inference (Appendix 1—figure 2). We then asked which correlation patterns underlie the data. To assess this, we calculated the eigenvalues of each posterior sample of the inheritance matrix to categorise the aperiodic, alternator and oscillator patterns (Figure 3a–f, bar charts). We found that in every dataset, the dominant correlation pattern was identifiable with probabilities well above 50%, except for mycobacteria (Figure 3c) that was better described by simple inheritance rules (Appendix 1—figure 1c).
Cyanobacteria, (Figure 3a), human colorectal cancer (Figure 3d), and mouse embryonic fibroblasts (Figure 3f) display a dominant oscillator pattern, but we see that their lineage correlation functions exhibit widely different periodicities. For example, the posterior lineage correlation for cyanobacteria displays a higher frequency oscillation than those in human colorectal cancer cells and fibroblasts. Clock-deleted cyanobacteria (Figure 3b) and mycobacteria (Figure 3c) display a dominant alternator pattern which could be induced by strong sister correlations. We see that clock-deleted cyanobacteria (Figure 3b) has a 100% alternator pattern in contrast to the 100% oscillator pattern seen for wild type cyanobacteria, suggesting that the deletion of the clock gene has completely transformed the correlation pattern and has abolished the underlying oscillation. Neuroblastoma (Figure 3e) displays a dominant aperiodic pattern. The predictive capacity for this cell type is weaker than for the other datasets, which we assume is due to the tight confidence interval in the correlations. Despite this discrepancy, we find that the inheritance matrix model produces excellent fits and has good predictive capacity for all other cell types studied in this work.
Bayesian inference reveals that individual inheritance parameters are not identifiable
We next ask which mechanisms are responsible for generating the observed correlation patterns. The Bayesian inference used for model fitting (Materials and methods - ‘Data analysis and Bayesian inference of the inheritance matrix model’) samples parameters using a MCMC Gibbs sampler. The Gibbs sampler can be thought of as a random walk in parameter space that settles around parameter regions with high likelihood. We found that the explorations of the Gibbs sampler did not settle in a particular parameter subspace but meandered off to explore vast areas of the parameter space without improving the likelihood values (Appendix 1—figure 3a and b). Such behaviour is expected when model parameters are not identifiable and the posterior distribution of parameters cannot be efficiently sampled (Rannala, 2002; Raue et al., 2013).
To provide further evidence of unidentifiablity, we obtained four histograms of a single parameter of the inheritance matrix for different initialisations. The four distributions are very different (Figure 4a), showing that the random walk does not settle to a stationary distribution. We further observe that the mean squared displacement increases without bound (Figure 4b) showing that the sampling does not settle in a particular subset of the parameter space. In contrast to the individual parameters, the sampled posterior distribution of the eigenvalues is consistent across the averages (Figure 4c) and their mean squared displacement converges rapidly (Figure 4d). We note that unidentifiability arises for the inheritance matrix model with multiple cell cycle factors and does not feature for simple inheritance rules (Appendix 1 - Section A5). This ultimately demonstrates that the interdivision time correlation patterns do not identify a single set of inheritance parameters, but rather need to be described by a distribution of inheritance mechanisms.
The inheritance matrix model predicts the hidden dynamical correlations of cell cycle factors
Clock-deleted cyanobacteria and neuroblastoma both satisfy the cousin-mother inequality (Figure 1b), which indicates that at least two cell cycle factors are responsible for the corresponding correlation patterns. The eigenvalues of the inheritance matrix concentrate in different regions of the admissible parameter space (Figure 4e), suggesting the correlation patterns that generate the cousin-mother inequality are distinct. For the clock-deleted cyanobacteria dataset, we found that all posterior samples were consistent with an alternator correlation pattern, while most posterior samples presented aperiodic correlation patterns in neuroblastoma (Figure 3b and e bar charts).
We hypothesised that different inheritance models generate these patterns. To verify this hypothesis and since we cannot identify the cell cycle factors directly, we computed the mother-daughter correlations between the two hidden cell cycle factors. Since the order of factors is interchangeable, we only distinguish between mother-daughter correlations between the same ( and for ) and alternate factors ( and for ). The resulting posterior distributions revealed distinct correlation patterns of cell cycle factor correlations for clock-deleted cyanobacteria and neuroblastoma (Figure 4f). For clock-deleted cyanobacteria, we predict that at least one factor has a negative mother-daughter correlation while its cross-correlation with the other factor must be positive; while the correlations are of opposite sign for neuroblastoma (Figure 4f). We sketch influence diagrams that summarise these relationships between factors (Figure 4g and h). Thus, the different interdivision time correlation patterns observed for clock-deleted cyanobacteria and neuroblastoma stem from distinct hidden correlation patterns of cell cycle factor fluctuations.
The inheritance matrix model reveals biological rhythms underlying the cell cycle
We observe that the lineage correlation functions of cyanobacteria, human colorectal cancer cells, and fibroblasts exhibit vastly different correlation oscillation periods (Figure 3). Next, we are interested to see whether the oscillations seen in these datasets are compatible with biological oscillators known to affect cell cycle control.
Correlation oscillations and underlying rhythms can exhibit vastly different periods
The period of the correlation oscillation is related to the location of the eigenvalues of the inheritance matrix on the complex plane. We consider an eigenvalue of the inheritance matrix. In terms of the mean interdivision time , the correlation period is:
and the inequality means that the period is always greater than twice the mean interdivision time . More generally, there is an oscillation period associated with each eigenmode of the inheritance matrix, but the period is infinite for real eigenvalues, and thus only complex eigenvalues generate correlation oscillations. This inequality follows from Equation 6 using . However, known biological oscillators that influence cell cycle control often have periods less than twice the mean interdivision time, such as stress response regulators (Harper et al., 2018; Stewart-Ornstein et al., 2017) and gene expression oscillations (William et al., 2007; Gao et al., 2014; Whitfield et al., 2002). How can relatively slow observed correlation oscillations be compatible with much faster biological oscillators underlying the cell cycle?
The resolution to this issue is that the period of the correlation oscillation does not always match the frequency of the underlying oscillator. Instead there are a number of possible oscillator periods compatible with the correlation oscillation period (Appendix 1 - Section A4) given by:
for . This phenomenon, that the same correlation oscillation can be explained by multiple underlying oscillators, can be understood using the intuition in Figure 5a.
Circadian oscillations in cyanobacteria and fibroblasts support coupling of the circadian clock and the cell cycle
Cyanobacteria and fibroblasts both exhibit correlation patterns consistent with an oscillator underlying cell divisions (Figure 3e, bar chart). We observe that the posterior distribution of the eigenvalues is confined to a region with negative real parts for cyanobacteria and positive real parts for fibroblasts (Figure 5c). Using these distributions, we estimate the median period of the correlation oscillations (using Equation 6) to be 41.7 hr for cyanobacteria and 144.3 hr for fibroblasts (Figure 5d). We wondered whether the stark difference in the periods of the correlation oscillations indicates a different underlying rhythm. Conversely, we found this was not the case, but both correlation patterns were consistent with an approximate circadian rhythm. The posterior of the oscillator period , which is closest to the period of correlation oscillation , suggests a median period of 24.6 hr for cyanobacteria and a median period of 23.8 hr for fibroblasts (Figure 5e). We also validated the inference result using simulated data (Appendix 1—figure 9). This finding supports a strong coupling of circadian rhythms to the cell cycle, as reported previously for both cyanobacteria (Yang et al., 2010; Martins et al., 2018) and fibroblasts (Nagoshi et al., 2004; Menger et al., 2007; Nagoshi et al., 2005). Notably, we see that clock-deleted cyanobacteria displays 100% alternator pattern (Figure 3b) and therefore has a lineage tree correlation pattern that cannot be described by an approximate 24 hr oscillator, in contrast to wild type cyanobacteria.
Bimodal posterior distribution of underlying oscillations in human colon cancer
Finally, we turn to the analysis of cancer cell data. The dominant correlation pattern was oscillatory (78% posterior probability, Figure 3d, bar chart). The posterior distribution of complex eigenvalues for the oscillator pattern has support in a large region of the parameter space. It has two distribution modes depending on whether the eigenvalues have positive or negative real parts (Figure 5f). Similarly, the posterior of the correlation oscillation period is bimodal, too (Figure 5g), which means that two competing oscillator patterns are compatible with the data.
To disentangle these alternative hypotheses, we cluster the posterior samples by the real part of the eigenvalues. We label cluster A for negative real parts and cluster B for positive ones. The correlation periods of the individual clusters do not provide us with immediate clues about the underlying oscillators. Cluster A has a median correlation oscillation period of 51.2 hr while cluster B has a median period of 100.6 hr (Figure 5g). We therefore inspected the oscillator periods for each cluster, which are closest to the observed correlation period (Figure 5h). The median of the predicted oscillator period of cluster A has an oscillator period of 24.1 hr, which hints at a circadian oscillator underlying the cell cycle in agreement with a previous model (Chakrabarti et al., 2018). However, only about 33% of posterior samples with complex eigenvalues were assigned to this cluster. The majority of posterior samples, cluster B, had a different predicted period with a median of 19.6 hr (Figure 5h). A possible explanation is that the circadian period is shortened in cancer cells.
A strength of the Bayesian framework is that it allows us to express our confidence in this prediction. We find that our analysis is not conclusive about the correlation pattern as 78% of posterior samples showed an oscillator pattern. As a result, about 52% of all the posterior samples favour a 19.6 hr oscillator and 26% for the 24.1 hr oscillator, matching approximately circadian rhythm. 16% of the samples demonstrate alternator correlation patterns, and the remaining 6% samples are aperiodic (compare bar charts in Figures 3d and 5g). We therefore ask whether these competing models make predictions that translate into testable hypotheses. We found that the oscillator correlation pattern predicts a negative grandmother correlation while the alternator pattern predicts a positive grandmother correlation (Figure 5i and j). Thus, measuring the grandmother correlation with higher precision, for example, via increasing sample size, would tighten the confidence intervals of measured correlations (Figure 5i), and improve our ability to narrow down the true pattern. On the contrary, predicting the great-grandmother correlation allows us to distinguish between the 19.6 hr and 24.1 hr rhythms (Figure 5h). Posterior samples in cluster A predicted a positive interdivision time correlation between a cell and its great-grandmother, while cluster B predicted a negative correlation (Figure 5k and l). While the great-grandmother correlation could not be estimated using the present data, deeper lineage trees could be used to discriminate the period of the biological oscillator and help reveal whether the circadian period is altered in cancer cells, or not. In summary, our theory helps to predict the hidden periodicities of biological oscillators from lineage tree interdivision time data.
Discussion
We propose a Bayesian approach to predict hidden cell cycle factor dynamics from interdivision time correlation patterns. Our underlying model fits the lineage tree data for a range of bacterial and mammalian cell types and allows us to classify different correlation patterns. Our inference demonstrates that these patterns are identifiable, but the individual inheritance parameters are not. This finding suggests that interdivision time correlations alone are insufficient to gain mechanistic insights into cell cycle control mechanisms. The identified correlation patterns, however, reveal the dynamics of the underlying cell cycle factors.
We focused on a data-driven approach without any prior assumptions of the division mechanism, allowing the interdivision time data to speak for itself. Other studies used a model similar to the inheritance matrix model proposed here, and linked latent factors to the interplay between cell cycle progression and growth (Kuchen et al., 2020). Auto-regressive models have also been used in bacteria to discriminate between different mechanisms of cell size control (Kohram et al., 2021). Additionally, they have been used to combine growth and cell cycle reporters to explain interdivision time dynamics in fibroblasts (Mura et al., 2019). In principle, the inheritance matrix model can be used to model the inheritance dynamics of any factor affecting the interdivision time of a cell. In fact, it comprises many mechanistic models as special cases, such as those based on DNA replication, cell size control or cell cycle phases (Appendix 1 - Section A6 and Appendix 1—figures 5 and 6). In future work, it will be useful to improve the identifiability of the model parameters. This could be accomplished either through including knowledge of inheritance mechanisms through prior distributions, or by including additional data on measured cell cycle factor dynamics – such as cell cycle phases, cell size, protein expression etc. – in the inference.
Another limitation of our inference is that we computed the interdivision time variance in Equation M2 of the model assuming that trees have equal number of generations in each branch. The advantage of this estimator is that it does not assume any particular noise distribution but this may lead to a statistical bias compared to the sample variance of tree-structured data with branches of varying length (Powell, 1956; Hashimoto et al., 2016; Thomas, 2017; Jafarpour et al., 2018; Kuchen et al., 2020; Sandler et al., 2015). However, the approximation does not change the identified correlation patterns and the conclusion of this work, since any variance bias can be compensated by multiplying the noise matrices ( in Equation 5) with a constant, and, for the data analysed, the interdivision time variance estimators cannot be distinguished within the 95% confidence intervals (Appendix 1—table 3). Developing a theory correcting for such biases in lineage tree data will be the subject of future work.
An important result of the present analysis is that lineage tree correlation patterns of very different cell types – cyanobacteria, mouse embryonic fibroblasts and human colorectal cancer – can be explained through an underlying circadian oscillator coupled to cell division. While the coupling between the cell cycle and circadian clock is well established both in cyanobacteria and mouse embryonic fibroblasts, it is less well studied in cancer (Shostak, 2017; Kiessling et al., 2017). Our method robustly reconstructs the circadian rhythms from the interdivision time correlation patterns despite the lack of the cousin-mother inequality for fibroblasts, demonstrating the cousin-mother inequality is not required for complex correlation patterns (‘The cousin-mother inequality is not required togenerate complex correlation patterns’). It is interesting to observe the differences in the oscillatory correlation patterns in these organisms. They are characterised by complex eigenvalues with negative real parts in cyanobacteria, but positive real parts in fibroblasts (Figure 5c), resulting in opposite mother-daughter correlations for these datasets (Figure 3a and f).
It would be interesting to explore what mechanisms underlie these different patterns. While the circadian clock in fibroblasts relies on transcriptional mechanisms (Hughes et al., 2009; Menger et al., 2007; Takahashi, 2017), the origin of the clock is non-transcriptional in cyanobacteria (Cohen and Golden, 2015; Tomita et al., 2005; Nakajima et al., 2005). The negative mother-daughter correlation in cyanobacteria likely stems from size control mechanisms that are modulated by the circadian clock (Martins et al., 2018). However, the mechanisms that generate positive mother-daughter correlations in fibroblasts are still to be explored. Interestingly, in human colorectal cancer, two oscillatory correlation patterns divide the posterior distributions into two distinct clusters with positive and negative mother-daughter correlations. If the circadian clock was to generate a positive mother-daughter correlation, as it does in fibroblasts which have a structurally related clock, the period corresponds to a 20 hr rhythm. This finding thus suggests that the circadian period is altered in cancerous cells. Indeed, several studies report similar periods of 18 hr and 20 hr for gene expressions in the human colorectal cancer core-clock (Fuhr et al., 2019; Parascandolo et al., 2020).
Our theory predicts that an oscillator’s period does not always match the period of the observed correlation oscillations. We describe a lower bound on the correlation period that is reminiscent of the Nyquist-Shannon sampling theorem. This theorem describes temporal aliasing in digital audio processing, where a high-frequency signal produces low-frequency oscillations when sampled at a frequency less than twice the sampling frequency. Similarly, spatial aliasing is observed in digital image processing as a moire pattern. In our analogy, the high-frequency signal is a biological oscillator that couples to cell division and is sampled at the cell division frequency (Figure 5a). Our result thus extends the Nyquist-Shannon sampling theorem to lineage trees. Our finding has fundamental implications for the reconstruction of oscillator periods from interdivision time data, revealing that there exists a number of oscillators that can all explain the same correlation pattern.
Here, we concentrated on the oscillator periods that are closest to the correlation oscillation periods . In principle, we cannot exclude that oscillators with shorter physiological periods are contributing to the observed lineage tree correlation patterns. For example, HES1 expression oscillates with a period of around 5 h in human colon cancer cells (William et al., 2007; Gao et al., 2014). The stress response regulators NF-κB and p53, which are critical for tumour development, oscillate with periods of approximately 100 min and 5 hr, respectively (Harper et al., 2018; Stewart-Ornstein et al., 2017). The posterior distributions for periods in this region are not well separated (Appendix 1—figure 7c), which makes it challenging to identify factors that oscillate significantly faster than the cell cycle using interdivision time data. It is, however, unknown whether such hypothetical factors couple to cell division specifically in a manner to induce oscillatory interdivision time correlation patterns.
Going forward, there is a need to go beyond the Nyquist-Shannon limit and develop methods that have increased sensitivity to discriminate a broader range of oscillator periods. One way to circumvent the limitation would be to employ fluorescent reporters of the circadian clock that could be correlated directly with cell division timing. Another way, would be to provide parallel readouts of the underlying rhythm through events that sub-sample the cell cycle, such as DNA replication, or the timing of individual cell cycle phases. Not only would we be able to look at the correlation in interdivision time between cells on a lineage tree, but we would also be able to analyse the correlations between individual phases and family members, to reveal specific phase control mechanisms. Our main findings result from the the inheritance matrix model with two cell cycle factors, as this was sufficient to explain the correlation patterns of the chosen data. In principle, increasing the number of interacting cell cycle factors can lead to more complex composite patterns that involve combinations of the three patterns discussed in this paper, such as the alternator-oscillator (Appendix 1—figure 6c and d), aperiodic-oscillator (Appendix 1—figure 6g and h), or birhythmic correlation patterns. Such composite patterns could also arise as the result of nonlinear fluctuations that, within our framework, can be described by adding complexes of cell cycle factors to the inheritance matrix model (Appendix 1 - Section A2). The presence of such complexes induces higher-order harmonics in the correlation oscillations, similar to those observed in the cyanobacterial and mammalian circadian clock (Martins et al., 2016; Thomas et al., 2013), and detecting such complexes could provide an alternative route to increase the sensitivity of our inference method.
In summary, our findings highlight the predictive power of Bayesian inference on single-cell data and how it can be leveraged to draw testable hypotheses for the design of future experiments. This was exemplified for human colorectal cancer cells, where various patterns were compatible with the data, something that non-probabilistic approaches cannot accomplish as they fit only a single correlation pattern. In the future, it will be crucial to understand why different cell types have evolved specific lineage correlation patterns and how these patterns affect cell proliferation and disease. It would be interesting to understand whether specific correlation patterns give or reveal some fitness advantage and whether we can use them to predict cell survival. We anticipate that identifying hidden cell cycle factors and their rhythmicity using non-invasive methods such as interdivision time measurements will be instrumental in answering these questions and may benefit other fields where cell proliferation plays a pivotal role.
Code availability
Code available at https://github.com/pthomaslab/Lineage-tree-correlation-pattern-inference (copy archived at swh:1:rev:dc69bbce5ce909813d7d4356c9fd2da045e02c79; Hughes, 2022).
Materials and methods
Analytical solution of the inheritance matrix model
Request a detailed protocolFrom Equation 2b and , we see that the vector of cell cycle factors has zero mean . Its covariance matrix satisfies a discrete-time Lyapunov equation:
From the solution of Equation M1, we compute the variance of the interdivision time
and the generalised tree correlation function (see Appendix 1 - Section A3 for a detailed derivation) given by:
where with
To ensure that the lineage tree correlation pattern is stationary, we require where is the spectral radius of . This also ensures that the solutions to Equation M1; , and the function Equation M3 are unique and independent of the initial conditions.
Analysis of tree correlation patterns
Request a detailed protocolThe patterns of the generalised tree correlation function can be characterised through its eigendecomposition. The general decomposition proceeds through finding the matrix of eigenvectors of such that
is the diagonal matrix of eigenvalues. Defining and , the solution to Equation M1 is given by
This result can then be used to find an explicit expression for the generalised tree correlation function:
where
Equation M7 can be rewritten as a superposition of patterns Equation 4 with weights given by Equation 5.
The pattern of the tree correlation function is thus governed by the eigenvalues of the inheritance matrix : (i) if one eigenvalue, say , is positive then the factor contributing to lineage correlation decays monotonically. The factor contributing to the cross-branch correlation decays twice as fast; (ii) if there is a negative eigenvalue, the factor alternates between negative and positive values with an envelope of , while the corresponding contribution to the cross-branch correlation decays monotonically with rate as . Finally, if we have a pair of complex eigenvalues then the factors contributing to the lineage correlation function display damped oscillations with frequency and envelope , while the factor and the factor oscillate with frequency .
Determining the period of correlation oscillations from the eigenvalues
Request a detailed protocolWe consider the case where the inheritance matrix has a pair of complex conjugate eigenvalues . The lineage correlation function then oscillates whenever and . The period of correlation oscillations per generation is given by
where is the argument of the eigenvalue and is the complex logarithm. The former is the angle made between the line joining the origin and the eigenvalue on the complex plane with the real axis. This means that if and only if . Otherwise, is calculated in terms of by Equation M9, (Appendix 1—figure 8).
Data analysis and Bayesian inference of the inheritance matrix model
Request a detailed protocolWe determined all pairs of cells in a lineage tree, sorted them by family relations and calculated the sample correlation coefficient of interdivision times (Equation 3). To maximise the number of samples used to calculate these correlations, an individual cell can appear in more than one pair. For example, if a cell had two cousins, it would be counted in two separate cousin pairs in the cousin-cousin correlation coefficient calculation. For training, we focus on the sample statistics with comprised of the interdivision time sample variance and four interdivision time sample correlation coefficients given by the mother-daughter, grandmother-granddaughter, sister-sister and cousin-cousin relations (Figure 2a). Note that is computed across all interdivision times used to calculate the correlation coefficients in each dataset. Errors are estimated using bootstrapping by re-sampling cell pairs with replacement 10,000 times. The resulting variances and correlation coefficients are given in Appendix 1—table 1.
The vector of inferred model parameters for the two-dimensional model is , where we fix and for simplicity. A different choice of did not affect our results (Appendix 1—figure 4). Since is symmetric, it consists of the variances and correlation coefficients between the components of . Thus for the inheritance matrix model has seven free parameters to be estimated. We assumed that the log-likelihood for these statistics is the sum of square errors:
which is equivalent to assuming that the sample variance and correlation coefficients are normally distributed for large sample sizes. We calculate the interdivision time variance and the generalised tree correlation function from Equation M2 and Equation M3. Note that Equation M2 is the interdivision time variance from a tree where all lineages have the same number of generations, which approximates the variance across all cells in the observed trees (Appendix 1—table 3). For simplicity, we neglected possible correlations between the sample statistics in and used bootstrapped estimates for the standard deviation of the sample statistics and (Appendix 1—table 1). Note that the likelihood is independent of the mean since it is irrelevant for the correlation pattern. We assumed a flat prior with support restricted to and positive semi-definite to guarantee the existence of a stationary correlation pattern.
The numerical implementation uses the adaptive Gibbs‐sampler implemented in the Julia library Mamba.jl (Smith, 2018). For each dataset, we sample 11 million parameter sets which include a burn-in transient of 1 million samples. These samples are removed before analysis of the output.
For model comparison we use the AIC (Akaike, 1974) given by
where is the number of model parameters and is the maximum value of the log-likelihood function given by Equation M10. For , the inheritance matrix model has parameters where is the number of cell cycle factors in the model. For the number of parameters reduces to .
Appendix 1
A1. Small noise approximation
Here, we will derive the inheritance matrix model given by Equation 2a, and b in the main text. We assume that the fluctuations in the hidden cell factor dynamics are small, which leads to a computationally efficient approximation.
Firstly, in the limit of zero fluctuations, all cells must be identical. Hence, all cell cycle factors are equal to their means and similarly for the noise vectors in Equation 1b. From Equation 1a, and b we then find that
which can be efficiently solved for and using standard numerical methods.
Secondly, we can decompose the interdivision time and the cell cycle factor vector into their respective mean and fluctuating components by
Denoting the index of the present cell by p and the one of its mother by m, we can expand f and g around the limit of zero fluctuations and we obtain to leading order
where
Using this expansion and Equation A2 in Equation 1a and b of the main text we arrive at
where we have set and giving the fluctuations around the mean for the noise vectors. Comparing Equation A7 with Equation A1 and collecting terms to leading order, we obtain the linearised system:
Next, we define the diagonal scaling matrix with non-zero elements as
for . Using the rescaled noise sources , we find the rescaled inheritance matrix and -coefficients
The rescaled cell cycle factor fluctuations follow Equation 2 of the main text and we reach rescaled variance-covariance matrices and as follows
A2. Beyond the small noise approximation: cell cycle factor complexes account for nonlinear fluctuations
Here we analyse the effect of nonlinearity on the interdivision time correlation patterns. For simplicity we consider a single cell cycle factor and follow the same lines as in Appendix 1 - Section A1, Equation A9, while including terms of order . This leads to the expansion interdivision time and factor fluctuations
where is the Jacobian of the cell cycle factor dynamics, as before, and is the Hessian. From the second equation we obtain
Defining and , combining Equation A14 and Equation A16 and rescaling variables as in Equation A12 and Equation A13, we find the extended inheritance matrix model
where
Here and if , and analogously, and if . Hence, the interdivision time correlation patterns with small to moderate fluctuations can be described through an extended linear system (Equation A7) that includes nonlinear terms . These additional terms can be interpreted as cell cycle factors forming binary complexes. The presence of these complexes increases the number of cell cycle factors and extends the eigenvalue spectrum of the effective inheritance matrix by . Hence, the presence of complexes leads to mixed correlation patterns. For example, for a single cell cycle factor, the eigenvalues of are , which corresponds to an alternator pattern for . More generally, we may expect that nonlinear patterns can be described through mixtures of aperiodic, alternator, and oscillatory patterns. For example, the complex eigenvalue spectrum of an oscillator pattern () will include powers of complex eigenvalues () resulting in harmonics of the fundamental correlation oscillation frequency similar to higher order harmonics observed in single-cell time-series of the circadian clock (Martins et al., 2016; Thomas et al., 2013).
A3. Derivation of the generalised tree correlation function
In this section we derive an analytical expression for the generalised tree correlation function. This gives the Pearson correlation coefficient in interdivision time for any pair of related cells. We start with the equation for the Pearson correlation coefficient, and from there derive a formula for the interdivision time covariance using the known properties of the cell cycle factors . From this, we can derive the general formula for the correlation coefficient between any related cell pair.
We associate a cell pair with an index which measures the distance to the nearest common ancestor as given in ‘The inheritance matrix model reveals threedistinct interdivision time correlation patterns’ (Figure 2a). From this, we denote their interdivision time fluctuations as and respectively. The Pearson correlation coefficient between these fluctuations is given by
where is the variance of the interdivision time fluctuations.
The interdivision time fluctuations and are calculated from the vector of rescaled cell cycle factor fluctuations as given in ‘A general inheritance matrix model provides a unified framework for lineage tree correlation patterns’, giving the equations
Substituting Equation A20 into Equation A19, we obtain a formula for in terms of the cell cycle factor fluctuations and the coefficients alone
Since and are identically distributed in steady state, we have that as specified in Materials and methods - ‘Analytical solution of the inheritance matrix model’‘. We can write now as
where gives the variance of the interdivision time fluctuations .
Using the model Equation 2 we can write the formula for the vectors for the two cells in the cell pair as
where cells and have mother cells and respectively. The two cells are sisters if and only if their subscripts are both equal to 1, meaning they share a mother cell. Using recurrence of the model, we can write these equations as
where is the vector of cell cycle factors for the most recent common ancestor for a cell pair given by .
All that remains is to derive a function for which we will denote . We calculate as follows using expectations:
where and are the mean vectors of and respectively which are both equal to 0, giving
To find in terms of the model parameters, we substitute in Equation A26 for and and get
The noise term fluctuations are only correlated if the cells are sisters, which only occurs when the distance we have . So for the summations above, we exclude all terms except where . Doing this and expanding we get
where and are given in Equation M4. We also have that
The matrix is equivalent to the covariance matrix for any , giving . This gives
Similarly we have,
As , and as stated in Materials and methods - ‘Analytical solution of the inheritance matrix model’, we obtain,
Equation A31 therefore becomes:
Substituting Equation A37 back into Equation A29 we get
giving us the final equation for . Using the above equation in Equation A23, we obtain Equation M3 of the Methods.
A4. Derivation of the formula for the oscillator periods,
The period of correlation oscillation as observed in the lineage correlation functions is given by Thomas et al., 2018. We can reveal the underlying oscillator periods by shifting the inferred period to obtain a smaller period . This means that shorter periods would produce the same inferred period in the lineage correlation function when sampled at the original frequency of once per cell cycle (Figure 5a).
The oscillator periods are obtained by adding or subtracting multiples of to the argument of the eigenvalue which results in the new argument being in the same position in the complex plane. The oscillator period with shift is therefore given by
Taking Equation A40 and substituting in Equation 6, we obtain in terms of as Equation 7.
A5. Solution of the tree correlation function and parameter identifiability for simple inheritance rules
We consider the limiting case of a single cell cycle factor () resulting in simple inheritance rules. This situation could model a growth factor that can either increase or decrease interdivision times of cells depending on the monotonicity of in Equation 1. The analytical solution (Equation M7) of the inheritance matrix model then reduces to
where . First, we observe that, given single cell measurements of the mother-daughter correlation coefficient , the daughter-daughter correlation coefficient and the variance , the parameters , S1 and S2 are uniquely identifiable:
Thus measurements of the variance, lineage- and cross-branch correlations fully determine the parameters. The tree correlation function is, however, independent of , which means that the interdivision time correlation pattern carries no information whether the growth factor increases or decreases growth. The reason for this indifference is that cell cycle factors are identified only by their fluctuation pattern, i.e., for each cell cycle factor whose fluctuations increase interdivision time , we could define another cell cycle factor fluctuation that decrease interdivision time . We accounted for this unidentifiability issue trough a similarity transformation using the scaling matrix in Equation A12 and Equation A13 that transforms all cell cycle factor fluctuations to increase interdivision time. Of course, this unidentifiabiliy could be removed through explicitly measuring the involved cell cycle factors.
A6. Mapping mechanistic cell cycle and cell size control models to the inheritance matrix model
To further investigate the output of the inheritance matrix model, we propose multiple models of known cell cycle control mechanisms, and map them to our inheritance matrix model framework. All cell size models assume symmetric division.
A6.1 Cell size control model with correlated growth
Considering the influence of cell size control on interdivision time (Taheri-Araghi et al., 2015; Kohram et al., 2021; Ho et al., 2018), here we propose a cell size control model where we have some mother to daughter inheritance of both the added size and the growth rate (Appendix 1—figure 5g). The model equations are given by:
The noise terms and are independent between sisters such that . Assuming exponential growth the formula for the interdivision time is given by
where represents the index of a given cell. Taking the vector of cell cycle factors for the mother cell to be and comparing Equation 1a and b with Equation A43 and Equation A44, we obtain
Then we can calculate the means from Equation A1,
Then using Equation A45 in Equation A5 and Equation A12, we find
Assuming and using Equation A13, we find
The matrix has eigenvalues which give an aperiodic pattern for and an alternator pattern otherwise (Appendix 1—figure 5i and j). These same patterns arise for all real eigenvalues in the 3D model in the same was as in the two-dimensional system. Only a single negative eigenvalue is needed for the lineage correlation function to display an alternator pattern. We are restricted to and to ensure . The cousin-mother inequality for this system is too complex to be looked at analytically, so we use numerical methods to visualise the parameter region in which the cousin-mother inequality can be satisfied (Appendix 1—figure 5h).
For the case of the aperiodic pattern, we observe positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1—figure 5k). In contrast, for an alternator pattern, the mother daughter same factor correlation is negative, but the alternate factor correlations vary between positive and negative values (Appendix 1—figure 5I).
A6.2 Simple cell size control model
For the special case of and , the model reduces to a simple cell size control model with fluctuating added size (Appendix 1—figure 5a). The inheritance matrix then has eigenvalues . Thus depending on the choice of , this model can produce both an alternator and aperiodic pattern (Appendix 1—figure 5c and d). In this case, using Equation M3 the cousin-mother inequality becomes
which cannot be satisfied for , which implies . Hence the cousin-mother inequality cannot be satisfied for any reasonable choice of in this simple model (Appendix 1—figure 5b).
For an aperiodic pattern, this simplified model exhibits positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1—figure 5e). In the alternator case, this model exhibits negative same factor mother-daughter correlation and also negative alternate factor mother-daughter correlation (Appendix 1—figure 5f).
A6.3 Abstract cell cycle phase model
We propose a model of two abstract cell cycle phases that have no integrated dependence on cell size (Appendix 1—figure 5m). The model equations are given by
The noise terms and are independent between sister cells such that In this case we have that the two factors make up the length of the cell cycle, so we simply have .
Therefore using Equation 1a and b, we obtain
We calculate the means from Equation A1,
Then using Equation A51 in Equation A5 and Equation A12, we find
As the noise terms are independent between sisters we have and using Equation A13 we obtain
The inheritance matrix has eigenvalues which gives an aperiodic pattern for and and an alternator pattern otherwise (Appendix 1—figure 5o, p).
The analytical form of the cousin-mother inequality is complex so we use numerical methods to visualise the parameter region in which the cousin-mother inequality can be satisfied (Appendix 1—figure 5n).
We calculate individual factor mother-daughter correlations and find that for an aperiodic pattern, the model exhibits a range of correlation patterns (Appendix 1—figure 5q). However, for an alternator pattern, we obtain positive same factor mother-daughter correlation and negative alternate factor mother-daughter correlation (Appendix 1—figure 5r)
A7. Models of circadian-clock-driven correlation patterns
A7.1 Kicked cell cycle model
Here we analyse the kicked cell cycle model (Mosheiff et al., 2018) with our framework (Appendix 1—figure 6a). We will propose an inheritance matrix and then show that it reduces to the kicked cell cycle model for certain parameter choices. Consider the inheritance matrix and noise vector given by
for . We have that is given by , however we assume that the noise terms are independent between sisters such that . Assuming , the interdivision times are governed by
The oscillator is represented by the cell cycle factors that evolve according to
with oscillator inheritance matrix
We can solve Equation A56 along an ancestral lineage of generations
where is the state of the ancestral cell. Substituting Equation A58 into Equation A55 and assuming , i.e., the cell cycle oscillator is deterministic, the interdivision time of the mother determines the interdivision time of the daughter cell via
where and which represent initial conditions. Assuming approximates the time at birth for , this leads to
Comparing Equation A60 to Equations 1 and 2 in Mosheiff et al., 2018, we see that our IMM agrees with the kicked cell cycle model when , , , and large .
A7.2 Circadian-clock-driven cell size control model
Here we analyse the model of cell size control driven by the circadian clock proposed in Martins et al., 2018 within the inheritance matrix model framework (Appendix 1—figure 6e). The division rate, in Equation 1 of Martins et al., 2018 is given by
where is the cell size with sb being the size at birth. is a function of time that couples the size control to the circadian clock, and is the division rate per unit volume of the cell. Assuming cells grow exponentially with growth rate , we have
and the division size follows
where is the size at birth and is the time at birth.
To map these to our inheritance matrix model, we observe that samples from Equation A63 follow
where is a drift term and is a zero-mean noise term that depends both on time of day and birth size. Note that both and are periodic functions of time at birth . Since the latter is not explicitly modelled in our framework, here, we replace it with the state of the circadian clock, such that the update equations in Equation A64 now appear as
To gain intuition into the shape of the unknown functions and , we linearise the equations around some basal level of a clock-less mutant, which gives
For simplicity assume and that the clock-less mutant follows a linear cell size control model with gamma-distributed size increments with mean as in Martins et al., 2018. These assumptions lead to the relations,
Using , we can obtain the linearised inheritance matrix model equations for the circadian cell size control model (Appendix 1—figure 6e):
where now is the added size and is the output of a circadian oscillator governed by
for cell generation , where are noise terms added to the elements of and is some complex eigenvalued inheritance matrix given by
Following this, we see that the circadian clock is incorporated into this cell size control system in the same way as the kicked cell cycle model outlined in the previous section (Appendix 1 - Section A7.1). Using Equation A62 we can write the interdivision time of a cell with index as
Then taking the vector of cell cycle factors for the mother cell to be and comparing Equation 1a and b with Equation A68 and Equation A71 we obtain
Computing the means using Equation A1 we get
Then using Equation A72 in Equation A5 and Equation A12, we can solve for the means
Then taking and using Equation A13, we obtain the following for :
where indicates the correlation between a pair of noise terms and , and for .
A7.3 Model comparison
We notice that the kicked cell cycle model has three cell cycle factors, while the circadian-clock-driven cell size control model has four cell cycle factors. The eigenvalues of the inheritance matrix determining the correlation patterns are
for the kicked cell cycle model and
and for the cell size control model. In both models, either the complex pair of eigenvalues produces oscillatory behaviour. The overall correlation patterns are of mixed type, depending on the parameters and .
To compare the models quantitatively, we match their mother-daughter interdivision time correlation coefficient in the absence of clock coupling. For the kicked cell cycle model, we notice that in the absence of clock coupling. The cell size control model reduces to the model in Appendix 1 - Section A6.1 in the absence of clock coupling, which satisfies . Since realistic cell size control mechanisms (Taheri-Araghi et al., 2015; Amir, 2014; Tanouchi et al., 2015; Sauls et al., 2016) () ranging from sizers () to adders () to timers () imply , we find that the kicked cell cycle obeys a mixed correlation pattern of the alternator/oscillator type while the cell size control model obeys a aperiodic/oscillator pattern.
Focusing on the common adder size control (), we find that the regions where the cousin-mother inequality is satisfied is remarkably similar in both models when is matched accordingly (Appendix 1—figure 6b and f). The lineage correlation function (red line) oscillates but the cross-branch correlation functions (blue line) alternates for the kicked cell cycle (Appendix 1—figure 6c–d) but not for the cell size control model (Appendix 1—figure 6g–h).
A8. Inference validation using simulated data
To validate the inference results discussed in the main text we simulate interdivision time data using the maximum posterior parameters from the inference on two of the original live imaging datasets, and compare the output and model fit to our original inference.
We take the maximum posterior parameter sets from the original inference on two datasets (Appendix 1—table 2), cyanobacteria and mouse embryonic fibroblasts, and produce simulated interdivision time lineage data in MATLAB using custom scripts and Random Trees (Kaj and Gaigalas, 2022). We chose to look at these two datasets in order to analyse the posterior distribution of the inferred underlying period to compare to the approximately 24 hr results seen in the main text.
From this simulated data, the correlation coefficients are calculated using the methods outlined in Materials and methods - ‘Data analysis and Bayesian inference of the inheritance matrix model’, and then we look at the model inference on these new, simulated correlations, to compare to the original. These simulations produce correlation patterns that reproduce the experimentally measured correlations (comparing Appendix 1—figure 9a–b with Figure 3a and f).
The posterior distribution of the simulated patterns are the same for the cyanobacteria, exhibiting an 100% oscillator pattern (Appendix 1—figure 9a), matching the fitting to the original dataset (Figure 3a). Mouse embryonic fibroblasts (Appendix 1—figure 9b) loses some of it’s original 100% oscillator pattern (Figure 3f) in favour of an alternator pattern. However, an oscillator pattern is still dominant.
We see that for cyanobacteria (Appendix 1—figure 9c) and mouse embryonic fibroblasts (Appendix 1—figure 9d), the posterior distribution for the inference on the simulated data for the correlation function oscillatory period, (Appendix 1—figure 9c and d), exhibits a large overlap with the original posterior distribution discussed in Circadian oscillations in cyanobacteria and fibroblastssupport coupling of the circadian clock and the cell cycle (Figure 5e). The difference in the median for these posterior distributions is 0.42 hr for mouse embryonic fibroblasts (Appendix 1—figure 9d) and just 0.11 h for cyanobacteria (Appendix 1—figure 9c). This result validates our analysis of these posterior distributions showing that the period that we reconstruct from the simulated correlation patterns is consistent with the original data.
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. Modelling code is uploaded to GitHub https://github.com/fernhughes/Lineage-tree-correlation-pattern-inference (copy archived at swh:1:rev:dc69bbce5ce909813d7d4356c9fd2da045e02c79).
-
Apollo - University of Cambridge RepositoryResearch data supporting Cell size control driven by the circadian clock and environment in cyanobacteria.https://doi.org/10.17863/CAM.31834
References
-
A new look at the statistical model identificationIEEE Transactions on Automatic Control 19:716–723.https://doi.org/10.1109/TAC.1974.1100705
-
Cell size regulation in bacteriaPhysical Review Letters 112:208102.https://doi.org/10.1103/PhysRevLett.112.208102
-
Circadian clock effects on cellular proliferation: insights from theory and experimentsCurrent Opinion in Cell Biology 67:17–26.https://doi.org/10.1016/j.ceb.2020.07.003
-
Stochastic timing in gene expression for simple regulatory strategiesNucleic Acids Research 45:1069–1078.https://doi.org/10.1093/nar/gkw1235
-
Circadian rhythms in cyanobacteriaMicrobiology and Molecular Biology Reviews 79:373–385.https://doi.org/10.1128/MMBR.00036-15
-
Chromosome replication and the division cycle of Escherichia coli B/rJournal of Molecular Biology 31:519–540.https://doi.org/10.1016/0022-2836(68)90425-7
-
The continuum model: statistical implicationsJournal of Theoretical Biology 94:783–800.https://doi.org/10.1016/0022-5193(82)90078-9
-
Accelerating live single-cell signalling studiesTrends in Biotechnology 35:422–433.https://doi.org/10.1016/j.tibtech.2017.01.002
-
Cell heterogeneity during the cell cycleJournal of Cellular Physiology 113:465–474.https://doi.org/10.1002/jcp.1041130316
-
Stochastic gene expression in a single cellScience 297:1183–1186.https://doi.org/10.1126/science.1070919
-
Time-Lapse microscopy approaches to track cell cycle and lineage progression at the single-cell levelCurrent Protocols in Cytometry Chapter 12:Unit12.https://doi.org/10.1002/0471142956.cy1204s64
-
Harmonics of circadian gene transcription in mammalsPLOS Genetics 5:e1000442.https://doi.org/10.1371/journal.pgen.1000442
-
SoftwareLineage-tree-correlation-pattern-inference, version swh:1:rev:dc69bbce5ce909813d7d4356c9fd2da045e02c79Software Heritage.
-
Bridging the timescales of single-cell and population dynamicsPhysical Review X 8:021007.https://doi.org/10.1103/PhysRevX.8.021007
-
Using movies to analyse gene circuit dynamics in single cellsNature Reviews. Microbiology 7:383–392.https://doi.org/10.1038/nrmicro2056
-
Microbial individuality: how single-cell heterogeneity enables population level strategiesCurrent Opinion in Microbiology 24:104–112.https://doi.org/10.1016/j.mib.2015.01.003
-
Frequency doubling in the cyanobacterial circadian clockMolecular Systems Biology 12:896.https://doi.org/10.15252/msb.20167087
-
Cell cycle proliferation decisions: the impact of single cell analysesThe FEBS Journal 284:362–375.https://doi.org/10.1111/febs.13898
-
Circadian gene expression in cultured cellsMethods in Enzymology 393:543–557.https://doi.org/10.1016/S0076-6879(05)93028-0
-
Growth rate and generation time of bacteria, with special reference to continuous cultureJournal of General Microbiology 15:492–511.https://doi.org/10.1099/00221287-15-3-492
-
Mycobacteria modify their cell size control under sub-optimal carbon sourcesFrontiers in Cell and Developmental Biology 5:64.https://doi.org/10.3389/fcell.2017.00064
-
Identifiability of parameters in MCMC Bayesian inference of phylogenySystematic Biology 51:754–760.https://doi.org/10.1080/10635150290102429
-
Joining forces of Bayesian and frequentist methodology: a study for inference in the presence of non-identifiabilityPhilosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 371:20110544.https://doi.org/10.1098/rsta.2011.0544
-
Adder and a coarse-grained approach to cell size homeostasis in bacteriaCurrent Opinion in Cell Biology 38:38–44.https://doi.org/10.1016/j.ceb.2016.02.004
-
Circadian clock, cell division, and cancer: from molecules to organismInternational Journal of Molecular Sciences 18:E873.https://doi.org/10.3390/ijms18040873
-
WebsiteMamba.jl: markov chain monte carlo (MCMC) for bayesian analysis in juliaAccessed October 20, 2018.
-
Additive models for dependent cell populationsJournal of Theoretical Biology 109:127–146.https://doi.org/10.1016/s0022-5193(84)80115-0
-
A bifurcating autoregression model for cell lineages with variable generation meansJournal of Theoretical Biology 156:183–195.https://doi.org/10.1016/s0022-5193(05)80672-1
-
Cell-Size control and homeostasis in bacteriaCurrent Biology 25:385–391.https://doi.org/10.1016/j.cub.2014.12.009
-
Transcriptional architecture of the mammalian circadian clockNature Reviews. Genetics 18:164–179.https://doi.org/10.1038/nrg.2016.150
-
Signatures of nonlinearity in single cell noise-induced oscillationsJournal of Theoretical Biology 335:222–234.https://doi.org/10.1016/j.jtbi.2013.06.021
-
Making sense of snapshot data: ergodic principle for clonal cell populationsJournal of the Royal Society, Interface 14:20170467.https://doi.org/10.1098/rsif.2017.0467
-
The chemical basis of morphogenesisBulletin of Mathematical Biology 52:153–197.https://doi.org/10.1016/S0092-8240(05)80008-4
-
Automated deep lineage tree analysis using a bayesian single cell tracking approachFrontiers in Computer Science 3:734559.https://doi.org/10.3389/fcomp.2021.734559
-
Identification of genes periodically expressed in the human cell cycle and their expression in tumorsMolecular Biology of the Cell 13:1977–2000.https://doi.org/10.1091/mbc.02-02-0030
Article and author information
Author details
Funding
EPSRC Centre for Mathematics of Precision Healthcare (EP/N014529/1)
- Fern A Hughes
MRC London Institute of Medical Sciences (MC-A658-5TY60)
- Alexis R Barr
UKRI Future Leaders Fellowship (MR/T018429/1)
- Philipp Thomas
Cancer Research UK (Career Development Fellowship C63833/A25729)
- Alexis R Barr
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Bruno Martins, Dimitris Volteras and Paul Piho for their comments on the manuscript. This work has been supported by a scholarship to FAH provided by the EPSRC Centre for Mathematics of Precision Healthcare (EP/N014529/1) and MRC core funding to the London Institute of Medical Sciences (MC-A658-5TY60). ARB is funded by a CRUK Career Development Fellowship (C63833/A25729). PT is funded by a UKRI Future Leaders Fellowship (MR/T018429/1).
Copyright
© 2022, Hughes et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 1,347
- views
-
- 225
- downloads
-
- 7
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Cell Biology
- Developmental Biology
In most murine species, spermatozoa exhibit a falciform apical hook at the head end. The function of the sperm hook is not yet clearly understood. In this study, we investigate the role of the sperm hook in the migration of spermatozoa through the female reproductive tract in Mus musculus (C57BL/6), using a deep tissue imaging custom-built two-photon microscope. Through live reproductive tract imaging, we found evidence indicating that the sperm hook aids in the attachment of spermatozoa to the epithelium and facilitates interactions between spermatozoa and the epithelium during migration in the uterus and oviduct. We also observed synchronized sperm beating, which resulted from the spontaneous unidirectional rearrangement of spermatozoa in the uterus. Based on live imaging of spermatozoa-epithelium interaction dynamics, we propose that the sperm hook plays a crucial role in successful migration through the female reproductive tract by providing anchor-like mechanical support and facilitating interactions between spermatozoa and the female reproductive tract in the house mouse.
-
- Cell Biology
Multiple gut antimicrobial mechanisms are coordinated in space and time to efficiently fight foodborne pathogens. In Drosophila melanogaster, production of reactive oxygen species (ROS) and antimicrobial peptides (AMPs) together with intestinal cell renewal play a key role in eliminating gut microbes. A complementary mechanism would be to isolate and treat pathogenic bacteria while allowing colonization by commensals. Using real-time imaging to follow the fate of ingested bacteria, we demonstrate that while commensal Lactiplantibacillus plantarum freely circulate within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis, are blocked in the anterior midgut where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. Supplementing larval food with hCGRP, the human homolog of Dh31, is sufficient to block the bacteria, suggesting the existence of a conserved mechanism. While the immune deficiency (IMD) pathway is essential for eliminating the trapped bacteria, it is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of bacteria anterior sequestration in larval defense. Our study reveals a temporal orchestration during which pathogenic bacteria, but not innocuous, are confined in the anterior part of the midgut in which they are eliminated in an IMD-pathway-dependent manner.