Distinguishing different modes of growth using single-cell data

  1. Prathitha Kar
  2. Sriram Tiruvadi-Krishnan
  3. Jaana Männik
  4. Jaan Männik  Is a corresponding author
  5. Ariel Amir  Is a corresponding author
  1. Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, United States
  2. Department of Chemistry and Chemical Biology, Harvard University, United States
  3. Department of Physics and Astronomy, University of Tennessee, United States

Abstract

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and hypothesize underlying biological mechanisms based on it. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where what appears to be ordinary use of these statistical methods leads to incorrect conclusions such as growth being non-exponential as opposed to exponential and vice versa. We propose that the data analysis and its interpretation should be done in the context of a generative model, if possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated via the use of the model, leading to a consistent method for inferring biological mechanisms from data. On applying the validated methods of data analysis to infer cellular growth on our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.

Editor's evaluation

In this manuscript, the authors describe a generative model-based framework to better analyze stochastic growth data, including bacterial cell growth. They show how this framework can be applied to gain insight into the processes underlying these phenomena. This work is well-supported by simulations and data analysis and will likely be of interest to those trying to understand the processes governing bacterial growth, as well as those studying stochastic growth processes in biology more broadly.

https://doi.org/10.7554/eLife.72565.sa0

eLife digest

All cells – from bacteria to humans – tightly control their size as they grow and divide. Cells can also change the speed at which they grow, and the pattern of how fast a cell grows with time is called ‘mode of growth’. Mode of growth can be ‘linear’, when cells increase their size at a constant rate, or ‘exponential’, when cells increase their size at a rate proportional to their current size. A cell’s mode of growth influences its inner workings, so identifying how a cell grows can reveal information about how a cell will behave.

Scientists can measure the size of cells as they age and identify their mode of growth using single cell imaging techniques. Unfortunately, the statistical methods available to analyze the large amounts of data generated in these experiments can lead to incorrect conclusions. Specifically, Kar et al. found that scientists had been using specific types of plots to analyze growth data that were prone to these errors, and may lead to misinterpreting exponential growth as linear and vice versa.

This discrepancy can be resolved by ensuring that the plots used to determine the mode of growth are adequate for this analysis. But how can the adequacy of a plot be tested? One way to do this is to generate synthetic data from a known model, which can have a specific and known mode of growth, and using this data to test the different plots. Kar et al. developed such a ‘generative model’ to produce synthetic data similar to the experimental data, and used these data to determine which plots are best suited to determine growth mode. Once they had validated the best statistical methods for studying mode of growth, Kar et al. applied these methods to growth data from the bacterium Escherichia coli. This showed that these cells have a form of growth called ‘super-exponential growth’.

These findings identify a strategy to validate statistical methods used to analyze cell growth data. Furthermore, this strategy – the use of generative models to produce synthetic data to test the accuracy of statistical methods – could be used in other areas of biology to validate statistical approaches.

Introduction

The last decade has seen a tremendous increase in the availability of high-quality large datasets in biology, in particular in the context of single-cell level measurements. Such data are complementary to ‘bulk’ measurements made over a population of cells. They have led to new biological paradigms and motivated the development of quantitative models (Osella et al., 2017; Facchetti et al., 2017; Ho et al., 2018; Soifer et al., 2016; Jun et al., 2018; Amir and Balaban, 2018; Kohram et al., 2021). Nevertheless, they have also led to new challenges in data analysis, and here we will point out some of the pitfalls that exist in handling such data. In particular, we will show that the commonly used procedure of binning data and linear regression may hint at specific functional relations between the two variables plotted that are inconsistent with the true functional relations. As we shall show, this may come about due to the ‘hidden’ noise sources that affect the binning procedure and the phenomenon of ‘inspection bias’ where certain bins have biased contributions. One of our main take home messages is the significance of having an underlying model (or models) to guide/test/validate data analysis methods. The underlying model is referred to as a generative model in the sense that it leads to similar data to that observed in the experiments. The importance of a so-called generative model has been beautifully advocated in the context of astrophysical data analysis (Hogg et al., 2010), yet biology brings in a plethora of exciting differences: while in physics noise from measurement instruments often dominates, in the biological examples we will dwell on here it is the intrinsic biological noise that can obscure the mathematical relation between variables when not handled properly. In the following, we will illustrate this rather philosophical introduction on a concrete and fundamental example, albeit e pluribus unum. We will focus on the analysis of the Escherichia coli growth curves obtained via high throughput optical microscopy. Nevertheless we anticipate the conceptual points made here – and demonstrated on a particular example of interest – will translate to other types of measurements, which make use of microscopy but also beyond.

Binning corresponds to grouping data based on the value of the x-axis variable, and finding the mean of the fluctuating y-axis variable for this group. By removing the fluctuations of the y-variable, the binning process often aims to expose the ‘true’ functional relation between the two variables which can be used to infer the underlying biological mechanism. While binning may provide a smooth non-linear relation between variables, linear regression is used to find a linear relationship between the variables. In addition to binning, we use the ordinary least squares regression where the slope and the intercept of the best linear fit line are obtained by minimizing the squared sum of the difference between the dependent variable raw data and the predicted value. Here, the best fit/the best linear fit is obtained using the raw data and not the binned data. Similar to binning, the assumption underlying linear regression is that our knowledge of x-axis variable is precise while the noise is in the y-axis variable.

It is important to discuss the sources of fluctuations in the y-axis variable before we proceed. In biology, fluctuations in the variables arise inevitably from the intrinsic variability within a cell population. Cells growing in the same medium and environment have different characteristics (e.g. growth rate) due to the stochastic nature of biochemical reactions in the cell (Kiviet et al., 2014). For example, the division event is controlled by stochastic reactions, whose variability leads to cell dividing at a size smaller or larger than the mean. In this paper, when modeling the data, we will consider the intrinsic noise as the only source of variability and assume that the measurement error is much smaller than the intrinsic variation in the population.

One example of the use of binning and linear regression is shown in Figure 1A where size at division (Ld) vs size at birth (Lb) is plotted using experimental data obtained by Tanouchi et al. for E. coli growing at 25 °C (Tanouchi et al., 2017). In Figure 1A, the functional relation between length at division and length at birth for E. coli is observed to be linear and close to Ld=Lb+ΔL (see the Experimental data section for details). The relation obtained allows us to hypothesize a coarse-grained biological model known as the adder model as shown in Figure 1B in which the length at division is set by addition of length ΔL from birth (Soifer et al., 2016; Harris and Theriot, 2016; Si et al., 2019; Amir, 2014; Campos et al., 2014; Taheri-Araghi et al., 2015; Eun et al., 2018). This previously discussed example demonstrates and reiterates the use of statistical analysis on single-cell data to understand the underlying cell regulation mechanisms. Using statistical methods such as binning and linear regression, other phenomenological models apart from adder have also been proposed in E. coli where the division length (Ld) is not directly ‘set’ by that at birth (Ho and Amir, 2015; Micali et al., 2018; Witz et al., 2019). The phenomenological models, in turn, can be related to mechanistic (molecular-level) models of cell size and cell cycle regulation (Barber et al., 2017). Recent work has shed light on the subtleties involved in interpreting the linear regression results for the Ld vs Lb plot where seemingly adder behavior in length can be obtained from a sizer model (division occurring on reaching a critical size) due to the interplay of multiple sources of variability (Facchetti et al., 2019). This issue is similar in spirit to those we highlight here.

Utility of binning and linear regression.

(A) Length at division (Ld) vs length at birth (Lb) is plotted using data obtained by Tanouchi et al., 2017. Raw data is shown as blue dots. We find the trend in binned data (red) to be linear with the underlying best linear fit (yellow) following the equation, Ld=1.09Lb+2.24μm. This is close to the adder behavior with an underlying equation given by Ld=Lb+ΔL, where ΔL is the mean size added between birth and division (shown as black dashed line). B. A schematic of the adder mechanism is shown where the cell grows over its generation time (Td) and divides after addition of length ΔL from birth. This ensures cell size homeostasis in single cells.

The volume growth of single bacterial cells has been typically assumed to be exponential (Godin et al., 2010; Wang et al., 2010; Campos et al., 2014; Cermak et al., 2016; Soifer et al., 2016; Iyer-Biswas et al., 2014). Assuming ribosomes to be the limiting component in translation, growth is predicted to be exponential and growth rate depends on the active ribosome content in the cell (Scott et al., 2010; Lin and Amir, 2018; Metzl-Raz et al., 2017). Under the assumption of exponential growth, the size at birth (Lb), the size at division (Ld), and the generation time (Td) are related to each other by,

(1) ln(LdLb)=λTd,

where λ is the growth rate. Understanding the mode of growth is important for example, due to its potential effects on cell size homeostasis. Exponentially growing cells cannot employ a mechanism where they control division by timing a constant duration from birth but such a mechanism is possible in case of linear growth (Amir, 2014; Kafri et al., 2016; Ho et al., 2018). Linear regression performed on ln(LdLb) vs λTd plot, where λ is the mean growth rate, was used to infer the mode of growth in the archaeon H. salinarum (Eun et al., 2018), and in the bacteria M. smegmatis (Logsdon et al., 2017) and C. glutamicum (Messelink et al., 2020), for example. If the best linear fit follows the y = x trend, the resulting functional relation might point to growth being exponential. A corollary to this is the rejection of exponential growth when the slope and intercept of the best linear fit deviate from one and zero, respectively (Messelink et al., 2020). Thus, binning and linear regression applied on single-cell data appear to provide information about the underlying biology, in this case, the mode of cellular growth. We will test the validity of such inference by analyzing synthetic data generated using generative models. We find that linear regression performed on the plot ln(LdLb) vs λTd, surprisingly, does not provide information about the mode of growth. Nonetheless, we show that other methods of statistical analysis such as binning growth rate vs age plots are adequate in addressing the problem. Using these validated methods on experimental data, we find that E. coli grows non-exponentially. In later stages of the cell cycle, the growth rate is higher than that in early stages.

Results

Statistical methods like binning and linear regression should be interpreted based on a model

To illustrate the pitfalls associated with binning, we use data from recent experiments on E. coli where the length at birth, the length at division and the generation time were obtained for multiple cells (see Experimental methods and [Tiruvadi-Krishnan et al., 2021]). Phase-contrast microscopy was used to obtain cell length at equal intervals of time. Note that we consider length to reflect cell size in this paper rather than other cell geometry characteristics such as surface area and volume. The length growth rate that we elucidate in the paper can be different from the cell volume growth rate as shown in Appendix 1 assuming a simple cell morphology and exponential growth. Using the same cell morphology, we also find the length growth rate to be identical to cell surface growth rate. To investigate if the cell growth was exponential, we plotted ln(LdLb) vs λTd for cells growing in M9 alanine minimal medium at 28 °C (Td = 214 min). The linear regression of these data yields a slope of 0.3 and an intercept of 0.4 as shown in Figure 2A. The binned data and the best linear fit deviate significantly from the y = x line (see Supplementary file 1). Additionally, the binned data follows a non-linear trend and flattens out at longer generation times. We also found similar deviations in the binned data and best linear fit in glycerol medium (Td = 164 min) shown in Figure 2—figure supplement 1A, and glucose-cas medium (Td = 65 min) shown in Figure 2—figure supplement 1B. Qualitatively similar results have been recently obtained for another bacterium, C. glutamicum, in Messelink et al., 2020. These results might point to growth being non-exponential.

Figure 2 with 2 supplements see all
Plots that could potentially lead to misinterpreting exponential growth.

(A, B) Data is obtained from experiments in M9 alanine medium (Td = 214 min, N = 816 cells). (A) ln(LdLb) vs λTd plot is shown. The blue dots are the raw data, the red correspond to the binned data trend, the yellow line is the best linear fit obtained by performing linear regression on the raw data and the black dashed line is the y = x line. A priori, non-linear trend in binned data might point to growth being non-exponential. (B) λTd vs ln(LdLb) plot is shown for the same experiments. (C, D) Simulations of exponentially growing cells following the adder model are carried out for N = 2500 cells. The parameters used are provided in the Simulations section. (C) ln(LdLb) vs λTd plot is shown. The trend in binned data shown in red is non-linear and the best linear fit of raw data (yellow) deviates from the y = x line (black dashed line). The black dotted line is the expected trend obtained from theory (Equation 2). For parameters used in the simulations here, the black dotted line follows ln(LdLb)=1.26λTd-0.38(λTd)2. (D) λTd vs ln(LdLb) plot is shown with binned data in red and the best linear fit on raw data in yellow closely following the expected trend of y = x line (black dashed line). The theoretical binned data trend (black dotted line) is expected to follow the y = x trend. In all of these plots, the binned data is shown only for those bins with more than 15 data points in them.

Next, we will approach the same problem but with a generative model. We will first show that the ln(LdLb) vs λTd binned plot could not distinguish exponential growth from non-exponential growth. For that purpose, we use a previously studied model (Eun et al., 2018) which considers growth to be exponential with the growth rate distributed normally and independently between cell cycles with mean growth rate λ and standard deviation CVλλ. CVλ is thus the coefficient of variation (CV) of the growth rate and is assumed to be small. To maintain a narrow distribution of cell size, cells must employ regulatory mechanisms. In our model, we assume that, barring the noise due to stochastic biochemical reactions, cells attempt to divide at a particular size Ld given size at birth Lb. Keeping the model as generic as possible, we can write Ld as a function of Lb, f(Lb) which can be thought of as a coarse-grained model for the regulatory mechanism. Amir, 2014 provides a framework to capture the regulatory mechanisms by choosing f(Lb) = 2 Lb1-αL0α. L0 is the typical size at birth and α, which can take values between 0 and 2, reflects the strength of regulation strategy. α = 0 corresponds to the timer model where division occurs on average after a constant time from birth, and α = 1 is the sizer model where a cell divides upon reaching a critical size. α = 1/2 can be shown to be equivalent to the adder model where division is controlled by addition of constant size from birth (Amir, 2014). In addition to the deterministic function (f) specifying division, the size at division is affected by noise (ζλ) in division timing. We assume it has a Gaussian distribution with mean zero and standard deviation σnλ and that it is independent of the growth rate. Thus, the generation time (Td) can be mathematically written as Td=1λln(f(Lb)Lb)+ζλ and is influenced by growth rate noise and division timing noise. Note that replacing the time additive division timing noise with a size additive division timing noise will not affect the results qualitatively (see ‘Model’ and ‘Exponential growth’ sections for details and Supplementary file 1 for variable definitions).

For perfectly symmetrically dividing cells whose sizes are narrowly distributed, we find the trend in the binned data for ln(LdLb) vs λTd plot to be (see section ‘Predicting the results of statistical constructs applied on ln(LdLb) vs λTd and λTd vs ln(LdLb)’),

(2) y=x(1+1xln(2)1+22ασn2CVλ2ln2(2)).

Fixing CVλ = σn = 0.15, we show using simulations in Figure 2C the non-linear trend in the binned data even though we assumed exponential growth. Similarly, on performing linear regression on the raw data of ln(LdLb) vs λTd plot, we find that the slope of the best linear fit is not equal to one and the intercept is non-zero (see Equation 27 and 28 and Figure 2C). Equation 2 shows that the trend in the binned data depends on the ratio of growth rate noise and division timing noise. The slope is equal to one and intercept is zero only if the noise in growth rate is negligible as compared to the division timing noise. In experiments that is rarely the case, hence, the binned data trend and the best linear fit deviate from the y = x line even though growth might be exponential. Thus, we cannot rule out exponential growth in the E. coli experiments despite the binned data trend being non-linear and the best-fit line deviating from the y = x line.

Why does a non-linear relationship in the binned data for the plot ln(LdLb) vs λTd arise even for exponential growth? According to the model, Ld is determined by a deterministic strategy, f(Lb) and a time/size additive division timing noise. The noise component which affects Ld and subsequently the quantity ln(LdLb) is thus the noise in division timing and not the growth rate. The generation time (Td) plotted on the x-axis is influenced by the noise in division timing as well as the noise in growth rate. Binning assumes that for a fixed value of the x-axis variable, the noise from other sources affects only the y-axis variable (the binned variable). Similarly for linear regression, the underlying assumption is that the independent variable on x-axis is precisely known while the dependent variable on the y-axis is influenced by the independent variable and from external factors other than the independent variable. In this case, only λTd plotted on x-axis is influenced by growth rate noise while both λTd and ln(LdLb) are influenced by noise in division time. This does not fit the assumption for binning and linear regression and hence, the best linear fit for ln(LdLb) vs λTd plot might deviate from the y = x line even in the case of exponential growth.

Another way of explaining the deviation from the linear y = x trend is by inspection bias, which arises when certain data is over-represented (Stein and Dattero, 2018). Cells which have a longer generation time than the mean will most likely have a slower growth rate. Thus, in Figure 2A and C, at larger values of λTd or Td, the bin averages are biased by slower growing cells, thus making ln(LdLb) or λTd to be lower than expected. This provides an explanation for the flattening of the trend.

It follows from the previous discussion that if one bins data by ln(LdLb) then the assumption for binning is met. Both of the variables λTd and ln(LdLb) are influenced by the noise in division time but λTd plotted on the y-axis is also influenced by the growth rate noise. Thus, the y-axis variable, λTd is determined by the x-axis variable, ln(LdLb), and an external source of noise, in this case, the growth rate noise. Thus, based on our model, we expect the trend in binned data and linear regression performed on the interchanged axes to follow the y = x trend for exponentially growing cells (see section ‘Predicting the results of statistical constructs applied on ln(LdLb) vs λTd and λTd vs ln(LdLb)’). Indeed, on interchanging the axis and plotting λTd vs ln(LdLb) for synthetic data, we find that the trend in the binned data and the best linear fit closely follows the y = x line (Figure 2D). We also find that the best linear fit follows the y = x line in the case of alanine (Figure 2B), glycerol (Figure 2—figure supplement 1A) and glucose-cas (Figure 2—figure supplement 1B). A change from non-linear behavior to that of linear on interchanging the axes is also observed in a related problem where growth rate (λ) and inverse generation time (1Td) are considered (Figure 2—figure supplement 2 and Section ‘Interchanging axes in growth rate vs inverse generation time plot might lead to different interpretations’).

Thus far, we showed for a range of models where birth controls division that the binned data trend for ln(LdLb) as function of λTd is non-linear and dependent on the noise ratio σnCVλ in the case of exponential growth. On interchanging the axes the binned data trend agrees with the y = x line independent of the growth rate and division time noise. However, we will show next that this agreement with the y = x trend cannot be used as a ‘smoking gun’ for inferring exponential growth from the data.

To investigate this further, let us consider linear growth, which has also been suggested to be followed by E. coli cells (Mitchison, 2005; Abner et al., 2014). The underlying equation for linear growth is,

(3) Ld-Lb=λTd,

where λ is the the elongation speed that is, dLdt. For cells growing linearly, the best linear fit for the plot λTd vs ln(LdLb) is expected to deviate from the y = x line. As before, we fix λ to be the mean of 1Tdln(LdLb), agnostic of the linear mode of growth. Surprisingly, we found that for the class of models where birth controls division by a strategy f(Lb) and cells grow linearly, the best linear fit for λTd vs ln(LdLb) agrees closely with the y = x trend. On carrying out analytical calculations based on this model, we obtain the slope and the intercept of the λTd vs ln(LdLb) plot to be 32ln(2) 1.04 and –0.03 respectively, which is very close to that for exponential growth (see section ‘Differentiating linear from exponential growth’). This is shown for simulations of linear growth with cells following an adder model in Figure 3A. Given no information about the underlying model, Figure 3A could be interpreted as cells undergoing exponential growth contrary to the assumption of linear growth in simulations. Thus, when handling experimental data, cells undergoing either exponential or linear growth might seem to agree closely with the y = x trend. Deforet et al., 2015 used the linear binned data trend in case of λTd vs ln(LdLb) plot to infer exponential growth but as we showed in this section, the linear trend does not rule out linear growth. This again reiterates our message of having a generative model to guide the data analysis methods such as binning and linear regression. For completeness, we also test the utility of ln(LdLb) vs Tdλ and its interchanged axes plots to elucidate the mode of growth (Appendix 2). We find that binning and linear regression applied on these plots can not differentiate between exponential and linear growth.

Figure 3 with 3 supplements see all
Differentiating linear growth from exponential growth.

(A) λTd vs ln(LdLb) plot is shown for simulations of linearly growing cells following the adder model for N = 2500 cell cycles. The binned data (red) and the best linear fit on raw data (yellow) closely follows the y = x trend (black dashed line) which could be incorrectly interpreted as cells undergoing exponential growth. (B) The binned data trend for growth rate vs age plot is shown as purple circles for simulations of N = 2500 cell cycles of exponentially growing cells following the adder model. We observe the trend to be nearly constant as expected for exponential growth (purple dotted line). Since the growth rate is fixed at the beginning of each cell cycle in the above simulations, we do not show error bars for each bin within the cell cycle. Also shown as green squares is the growth rate vs age plot for simulations of N = 2500 cell cycles of linearly growing cells following the adder model. As expected for linear growth, the binned growth rate decreases with age as λ11+age (green dotted line). The binned growth rate trend (shown as magenta diamonds) is also found to be nearly constant as expected (shown as magenta dotted line) for the simulations of exponentially growing cells following the adder per origin model. We also show that the binned growth rate trend (red triangles) increases for simulations of the adder model with the cells undergoing faster than exponential growth. The trend is in agreement with the underlying growth rate function (shown as red dotted line) used in the simulations of super-exponential growth. Thus, the plot growth rate vs age provides a consistent method to identify the mode of growth. Parameters used in the above simulations of exponential, linear and super-exponential growth are derived from the experimental data in alanine medium. Details are provided in the Simulations section.

To conclude the discussion of linear growth, we note that the natural plot for this growth regime is λlinTd vs ld-lb and the plot obtained on interchanging the axes (see the Linear growth section and Figure 3—figure supplement 1A and B). Here lb, ld and λlin are defined to be quantities Lb, Ld and λ, respectively, normalized by the mean length at birth. For cells growing exponentially, the best linear fit for the λlinTd vs ld-lb plot is expected to deviate from the y = x line. This is indeed what is observed in Figure 3—figure supplement 1 where simulations of exponentially growing cells following the adder model are presented (see ‘Differentiating linear from exponential growth’ for extended discussion).

In all the cases above, the problem at hand deals with distilling the biologically relevant functional relation between two variables. However, the data is assumed to be subjected to fluctuations of various sources, and it is important to ensure that the statistical construct we are using (e.g. binning) is robust to these. How can we know a priori whether the statistical method is appropriate and a ‘smoking gun’ for the functional relation we are conjecturing? The examples shown above suggest that performing statistical tests on synthetic data obtained using a generative model is a convenient and powerful approach. Note that in cases such as the ones studied here where analytical calculations may be performed, one may not even need to perform any numerical simulations to test the validity of the methods.

Growth rate vs age plots are consistent with the underlying growth mode

In the last section, we showed that the plots ln(LdLb) vs λTd and λTd vs ln(LdLb) are not decisive in identifying the mode of growth. Recent works on B. subtilis (Nordholt et al., 2020) and fission yeast (Knapp et al., 2019) have used differential methods of quantifying growth namely growth rate ( = 1LdLdt) vs age plots and elongation speed (=dLdt) vs age plots to probe the mode of growth within a cell cycle. Here, L denotes the size of the cell after time t from birth in the cell cycle and age denotes the ratio of time t to Td within a cell cycle (hence it ranges from 0 to 1 by construction within a cell cycle). In this section, using various models of cell growth and cell cycle, we test the growth rate vs age method. Note that the growth rate vs age and the elongation speed vs age plots are not dimensionless unlike the previous plots. Using the growth rate vs age and elongation speed vs age plots, we aim to quantify the growth rate changes within a cell cycle. For cells assumed to be growing exponentially, growth rate is constant throughout the cell cycle. On averaging over multiple cell cycles, the trend of binned data is expected to be a horizontal line with value equal to mean growth rate which is indeed what we find in the numerical simulations of the adder and the adder per origin model (Ho and Amir, 2015), as shown in Figure 3B. The binned data trend in each of the models matches the theoretical predictions of growth rate (shown as dotted lines). In contrast, for linearly growing cells, the elongation speed is expected to remain constant. We show this constancy using numerical simulations of linearly growing cells following the adder model (Figure 3—figure supplement 3A). In accordance with this result, the growth rate is expected to decrease with cell age as λ11+age. This is verified in Figure 3B by again using the numerical simulations of linear growth with cells following the adder model. The binned data trend for linear growth (green squares) matches the theoretical predictions of λ11+age (green dotted line).

Thus, the two growth modes (exponential and linear) could be differentiated using the growth rate vs age plot (for details see ‘Growth rate vs age and elongation speed vs age plots’ section). However, the growth rate vs age plots can be used to infer the mode of growth beyond the two discussed above. We show this by using simulations of cells following the adder model and undergoing faster than exponential or super-exponential growth (see the Simulations section for details). In such a case, the growth rate is expected to increase. This increase in growth rate is shown in Figure 3B using simulations. The binned data trend (red triangles) again matches the growth rate mode used in the simulations (red dotted line). Thus, the growth rate vs age plots are a consistent method to distinguish linear from exponential and super-exponential growths.

Using the validated growth rate vs age plots, we obtained the growth rate trend for experimental data on E. coli for the three growth conditions studied in this paper (Figure 4A-C). We found an increase in growth rate in all growth conditions during the course of the cell cycle. One may wonder whether such an increase may be explained by the E. coli morphology alone, due to the presence of hemispherical poles. For exponentially growing cell volume and considering a geometry of E. coli with spherical caps at the poles, the percentage increase in the growth rate of length over a cell cycle is around 3 % which is significantly smaller than that observed in our experimental data. Considering cell size trajectories (cell size, L at time, t data) where cell lengths were tracked beyond the cell division event (by considering cell size in both daughter cells), we also found that the growth rate decreases close to division (age ≈ 1) and returns to a value nearly equal to that observed at the beginning of cell cycle (age ≈ 0) as shown in Figure 4—figure supplement 1 (see ‘Growth rate vs age and elongation speed vs age plots’ section for extended discussion).

Figure 4 with 2 supplements see all
Growth rate vs age obtained from experiments: Growth rate vs age plots are shown for E. coli experimental data.

The red dots correspond to the binned data trends showing the variation in growth rate. The medium in which the experiments were conducted are (A) Alanine (Td = 214 min) (B) Glycerol (Td = 164 min) (C) Glucose-cas (Td = 65 min). The error bars show the standard deviation of the growth rate in each bin scaled by 1N, where N is the number of cells in that bin. The dashed vertical lines mark the age at initiation of DNA replication (left line) and the start of septum formation (right line). In case of glucose-cas, the initiation age is not marked as it occurs in the mother cell.

The above question of mode of growth within a cell cycle can also be analyzed in relation to a specific event. Several studies have pointed to a change in growth rate at the onset of constriction (Reshes et al., 2008; Banerjee et al., 2017). This change in growth rate can be probed using growth rate vs time plots where time is taken relative to the onset of constriction as shown in Figure 4—figure supplement 2. These plots show a decrease in growth rates at the two extremes of the plot. These decreases are due to inspection bias, where the growth rate trend is affected by the biased contribution of cells with a higher than average generation time or equivalently slower growth rate (see ‘Growth rate vs time from specific event plots are affected by inspection bias’ section for extended discussion). Inspection bias is also observed when timing is considered relative to other cell events such as cell birth (see ‘Growth rate vs time from specific event plots are affected by inspection bias’ section and Figure 3—figure supplement 2).

It might not always be possible to obtain growth rate trajectories as a function of time/cell age. Godin et al. instead obtained the instantaneous biomass growth speed (dMdt) as a function of its buoyant mass (M) (Godin et al., 2010). On applying linear regression for instantaneous mass growth speed vs mass, we expect the slope of the best linear fit obtained to provide the average growth rate (1MdMdt) under the assumption of exponential growth while for linear growth the intercept provides the average growth speed. Using this method, biomass was suggested to be growing exponentially. This method can be applied to study the length growth rate within the cell cycle by plotting elongation speed as a function of length (Cadart et al., 2019). We find that the binned data trend and the best linear fit of this plot follow the expected trend for linear and exponential growth as shown in Figure 3—figure supplement 3B and Figure 3—figure supplement 3D, respectively, for a cell cycle model where division is controlled via an adder mechanism from birth. However, the trend obtained appears to be model-dependent as shown in Figure 3—figure supplement 3F where the underlying cell cycle model used in the simulations is the adder per origin model. For this model, the binned data trend is found to be non-linear with the growth rate speeding up at large sizes, despite the synthetic data being generated for perfectly exponential growth. This non-linear trend can lead to growth rate being misinterpreted as non-exponential within the cell cycle (see ‘Results of elongation speed vs size plots are model-dependent’ section for details). Thus, an analysis using the elongation speed vs size plot must be accompanied with an underlying cell cycle model.

In summary, we found that the growth rate vs age plot was a consistent method to determine the changes in growth rate within a cell cycle. Unlike the growth rate vs age plots, the inference from the growth rate vs size plots was found to be model-dependent. Using the growth rate vs age plots, we show that the length growth of E. coli can be faster than exponential.

Discussion

Statistical methods such as binning and linear regression are useful for interpreting data and generating hypotheses for biological models. However, we show in this paper that predicting the relationships between experimentally measured quantities based on these methods might lead to misinterpretations. Constructing a generic model and verifying the statistical analysis on the synthetic data generated by this model provides a more rigorous way to mitigate these risks.

In the paper, we provide examples in which ln(LdLb) vs λTd and λTd vs ln(LdLb) plots fail as a method to infer the mode of growth. The binned data trend and the best linear fit for the ln(LdLb) vs λTd plot was found to be dependent upon the noise parameters in the class of models where birth controlled division (Equation 2). We also show that λTd vs ln(LdLb) plot could not differentiate between exponential and linear modes of growth (Figures 2D and 3A). Thus, we conclude that the best linear fit for the above plots might not be a suitable method to infer the mode of growth but they are just one of the many correlations which the correct cell cycle model should be able to predict.

We found growth rate vs age and elongation speed vs age plots to be consistent methods to probe growth within a cell cycle. The method was validated using simulations of various cell cycle models (such as the adder, and adder per origin model, where in the latter, control over division is coupled to DNA replication) and the binned growth rate trend agreed closely with the underlying mode of growth for the wide range of models considered (Figure 3B). In the case of growth rate vs time plots, it was important to take into consideration the effects of inspection bias. We used cell cycle models to show the time regimes where inspection bias could be observed (Figure 3—figure supplement 2). In the regime with negligible inspection bias, we could reconcile the growth rate trend obtained using growth rate vs age (Figure 4A-C) and growth rate vs time plots (Figure 4—figure supplement 2). The authors in Messelink et al., 2020 circumvent inspection bias in the elongation speed vs time from birth plots by focusing their analysis on the time period from cell birth to the generation time of the fastest dividing cell. The authors of Panlilio et al., 2021, while investigating the division behavior in the cells undergoing nutrient shift within their cell cycle, use both models and experimental data from steady-state conditions to identify inspection bias. These serve as good examples of using models to aid data analysis.

Statistics obtained from linear regression such as in Figure 1A help narrow down the landscape of cell cycle models, but many have potential pitfalls lurking which might lead to misinterpretations (Figures 2C and 3A). There are additional issues beyond those concerning linear regression and binning discussed here. For example, Willis and Huang, 2017 discusses Simpson’s paradox (Simpson, 1951) where distinct cellular sub-populations might lead to erroneous interpretation of cell cycle mechanisms. Examples of such distinct sub-populations are found in asymmetrically dividing bacteria such as M. smegmatis (Aldridge et al., 2012; Logsdon et al., 2017). Another source of misinterpretation could arise from presence of measurement errors. Throughout this work, we deal with intrinsic noise and neglect measurement error. However, when measurement noise affects both x-axis and y-axis variables, the slope of the best linear fit is biased towards zero. This can lead to potentially related variables being misinterpreted as uncorrelated. Measurement errors can, however, be handled based on a model. Using a model which includes measurement error as a source of noise, we can guide the binning analysis. Using this methodology, we verified that typical measurement errors (0.02Lb) Messelink et al., 2020; Kaiser et al., 2018 have negligible effects on the growth rate trends obtained from the experimental data used in our work.

Single cell size in E. coli has been reported to grow exponentially (Campos et al., 2014; Wang et al., 2010; Cermak et al., 2016; Soifer et al., 2016; Iyer-Biswas et al., 2014; Godin et al., 2010), linearly (Mitchison, 2005), bilinearly (Kubitschek, 1981) or trilinearly (Reshes et al., 2008). These are inconsistent with our observations in Figure 4A-C where we find that growth can be super-exponential. The non-monotonic behavior in the fastest-growth condition is reminiscent of the results reported in Nordholt et al., 2020 for B. subtilis. The authors of Nordholt et al., 2020 attribute the increase in growth rate to a multitude of cell cycle processes such as initiation of DNA replication, divisome assembly, septum formation. In the two slower growth conditions (Figure 4A-B), we find that the growth rate increase starts before the time when the septal cell wall synthesis starts i.e., the constriction event. However, in the fastest growth condition (Figure 4C), the timing of growth rate increase seems to coincide with the onset of constriction which is in agreement with previous findings (Reshes et al., 2008; Banerjee et al., 2017).

It is important to distinguish between length growth and biomass growth. Oldewurtel et al., 2021 measures biomass and cell volume and finds the mass-density variations within the cell-cycle to be small. In this paper, since we observe the length growth to be non-exponential (Figure 4), it remains to be seen whether biomass growth also follows a similar non-exponential behavior or if it is exponential as previously suggested (Godin et al., 2010; Oldewurtel et al., 2021).

In conclusion, the paper draws the attention of the readers to the careful use of statistical methods such as linear regression and binning. Although shown in relation to cell growth, this approach to data analysis seems ubiquitous. The general framework of carrying out data analysis is presented in Figure 5. It proposes the construction of a generative model based on the experimental data collected. Of course, we do not always know whether the model used is an adequate description of the system. What is the fate of the methodology described here in such cases? First, we should be reminded of Box’s famous quote ‘all models are wrong, some are useful’. The goal of a model is not to provide as accurate a description of a system as possible, but rather to capture the essence of the phenomena we are interested in and stimulate further ideas and understanding. In our context, the goal of the model is to provide a rigorous framework in which data analysis tools can be critically tested. If verified within the model, it is by no means proof of the success of the model and the method itself, and further comparisons with the data may falsify it leading to the usual (and productive) cycle of model rejection and improvement via comparison with experiments. However, if the best model we have at hand shows that the data analysis method is non-informative, as we have shown here on several methods used to identify the mode of growth, then clearly we should revise the analysis as it provides us with a non-consistent framework, where our modeling is at odds with our data analysis. Furthermore, testing the methods on a simplified model is still advantageous compared with the option of using the methods without any validation. To mitigate the risk of using irrelevant models, in some cases it may be desirable to test the analysis methods on as broad a class of models as possible as we have done in the paper, for example by our use of a general value of α to describe the size-control strategy within our models. Thus, guided by the model, the data analysis methods can be ultimately applied to experimental data and underlying functional relationships can be inferred. Reiterating the message of the authors in Hogg et al., 2010, the data analysis using this framework aims to justify the methods being used, thus, reducing arbitrariness and promoting consensus among the scientists working in the field.

A flowchart of the general framework proposed in the paper to carry out data analysis.

Materials and methods

Experimental methods

Request a detailed protocol

Strain engineering: STK13 strain (ΔftsN::frt-Ypet-FtsN, ΔdnaN::frt-mCherry-dnaN) is derivative of E. coli K12 BW27783 (CGSC#: 12119) constructed by λ-Red engineering (Datsenko and Wanner, 2000) and by P1 transduction (Thom, 2007). For chromosomal replacement of ftsN with fluorescence derivative, we used primers carrying 40nt tails with identical sequence to the ftsN chromosomal locus and a plasmid carrying a copy of ypet preceded by a kanamycin resistance cassette flanked by frt sites (frt-kanR-frt-Ypet-linker) as PCR template (a kind gift from R. Reyes-Lamothe McGill University, Canada; Reyes-Lamothe et al., 2010). The resulting PCR product was transformed by electroporation into a strain carrying the λ-Red-expressing plasmid pKD46. Colonies were selected by kanamycin resistance, verified by fluorescence microscopy and by PCR using primers annealing to regions flanking ftsN gene. After removal of kanamycin resistance by expressing the Flp recombinase from plasmid pCP20 (Cherepanov and Wackernagel, 1995), we transferred the mCherry-dnaN gene fusion (BN1682 strain; a kind gift from Nynke Dekker from TUDelft, The Netherlands, Moolman et al., 2014) into the strain by P1 transduction. To minimize the effect of the insertion on the expression levels of the gene we removed the kanamycin cassette using Flp recombinase expressing plasmid pCP20.

Cells growth, preparation, and culturing E. coli in mother machine microfluidic devices: All cells were grown and imaged in M9 minimal medium (Teknova) supplemented with 2 mM magnesium sulfate (Sigma) and corresponding carbon sources at 28 °C. Three different carbon sources were used: 0.5 % glucose supplemented by 0.2 % casamino acids (Cas) (Sigma), 0.3 % glycerol (Fisher), and 0.3 % alanine (Fisher) supplemented with 1 x trace elements (Teknova).

For microscopy, we used mother machine microfluidic devices made of PDMS (polydimethylsiloxane). These were fabricated following to previously described procedure (Yang et al., 2018). To grow and image cells in microfluidic device, we pipetted 2–3 µl of resuspended concentrated overnight culture of OD600∼ 0.1 into main flow channel of the device and let cells to populate the dead-end channels. Once these channels were sufficiently populated (about 1 hr), tubing was connected to the device, and the flow of fresh M9 medium with BSA (0.75 µg/ml) was started. The flow was maintained at 5 µl/min during the entire experiment by an NE-1000 Syringe Pump (New Era Pump Systems, NY). To ensure steady-state growth, the cells were left to grow in channels for at least 14 hr before imaging started.

Microscopy: A Nikon Ti-E inverted epifluorescence microscope (Nikon Instruments, Japan) with a 100 X (NA = 1.45) oil immersion phase contrast objective (Nikon Instruments, Japan), was used for imaging the bacteria. Images were captured on an iXon DU897 EMCCD camera (Andor Technology, Ireland) and recorded using NIS-Elements software (Nikon Instruments, Japan). Fluorophores were excited by a 200 W Hg lamp through an ND8 neutral density filter. A Chroma 41,004 filtercube was used for capturing mCherry images, and a Chroma 41,001 (Chroma Technology Corp., VT) for Ypet images. A motorized stage and a perfect focus system were utilized throughout time-lapse imaging. Images in all growth conditions were obtained at 4 min frame rate.

Image analysis: Image analysis was carried out using Matlab (MathWorks, MA) scripts based on Matlab Image Analysis Toolbox, Optimization Toolbox, and DipImage Toolbox (https://www.diplib.org/). Cell lengths were determined based on segmented phase contrast images. Dissociation of Ypet-FtsN label from cell middle was used to determine the exact timing of cell divisions.

Further experimental details can also be found in Tiruvadi-Krishnan et al., 2021.

Model

Request a detailed protocol

Consider a model of cell cycle characterized by two events: cell birth and division. In our model, we assume that, barring the noise, cells tend to divide at a particular size vd given size at birth vb, via some regulatory mechanism. Hence, we can write vd as a function of vb, f(vb). Amir, 2014 provides a framework to capture the regulatory mechanisms by choosing f(vb) = 2 vb1-αv0α. v0 is the typical size at birth and α captures the strength of regulation strategy. α = 0 corresponds to the timer model where division occurs after a constant time from birth, and α = 1 is the sizer where a cell divides on reaching a critical size. α = 1/2 can be shown to be equivalent to an adder where division is controlled by addition of constant size from birth (Amir, 2014). From here on, we would be using the length of the cell (Lb, Ld, etc.) as a proxy for size (vb, vd, etc.). To reiterate, the length growth is not the same as cell volume growth as shown in Appendix 1. All of the variable definitions are summarized in Supplementary file 1. We also define lb=LbLb and ld=LdLb. Using this, we can write the division strategy f(lb) to be ld = f(lb) = 2 lb1-α. The total division size obtained will be a combination of f(lb) and noise in the division timing, the source of which could be the stochasticity in biochemical reactions controlling division.

We will assume that division is perfectly symmetric i.e., size at birth in the (n+1)th generation (lbn+1) is half of size at division in the nth generation (ldn). Using the size additive division timing noise (ζs(0,σbd)) and f(lb) specified above, we obtain,

(4) xn+1=(1-α)xn+ln(1+ζs(0,σbd)2(1+xn)1-α),

where xn = ln(lbn). Size at birth (Lb) is narrowly distributed, hence lb1 and we can write x = ln(lb) = ln(1+δ) where δ is a small number. We obtain x1 and,

(5) xδ=lb-1.

The size additive noise, ζs(0,σbd) is assumed to be small and has a normal distribution with mean 0 and standard deviation σbd. Note that σbd is a dimensionless quantity. Since ζs(0,σbd) is assumed to be small and xn1, we can Taylor expand the last term of Equation 4 to first order,

(6) xn+1(1-α)xn+ζs(0,σbd)2.

Equation 6 shows a recursive relation for cell size and it is agnostic of the mode of growth. We will show later for exponential growth that replacing the size additive noise with time additive noise does not change the structure of Equation 6.

Exponential growth

Next, we will try to obtain the generation time (Td) in the case of exponentially growing cells. For exponential growth, the time at division Td is given by,

(7) Td=1λln(LdLb).

For simplicity, we will assume a constant growth rate (λ) within the cell-cycle. Growth rate is fixed at the start of the cell-cycle and is given by λ = λ + λξ(0,CVλ), where λ is the mean growth rate and ξ(0,CVλ) is assumed to be small with a normal distribution that has mean 0 and standard deviation CVλ . CVλ denotes the coefficient of variation (CV) of the growth rate. This captures the variability in growth rate within cells arising from the stochastic nature of biochemical reactions occurring within the cell.

Size additive noise

Request a detailed protocol

Here we will calculate the generation time using the division strategy f(lb) and a size additive division timing noise (ζs(0,σbd)) as described previously. On substituting Ld=(f(lb)+ζs)Lb into Equation 7 we obtain,

(8) Td=1λ+λξ(0,CVλ)ln(2lb1-α+ζs(0,σbd)lb),

where the size additive noise (ζs(0,σbd)) is Gaussian with mean 0 and standard deviation σbd.

The noise ζs(0,σbd) is assumed to be small, and we obtain to first order,

(9) Td1λ(ln(2)-αxn+ζs(0,σbd)2(1+xn)1-α).

Since xn 0, on Taylor expanding 1(1+xn)1-α to first order,

(10) Td1λ(ln(2)-αxn+ζs(0,σbd)2(1+(1-α)xn)).

Assuming noise in growth rate to be small and expanding to first order, we obtain,

(11) Td1λ(ln(2)-αxn-ln(2)ξ(0,CVλ)+ζs(0,σbd)2).

Equation 11 gives the generation time for the class of models where birth controls division under the assumption that growth is exponential.

Time additive noise

Request a detailed protocol

Next, we ensure that the recursive relation for size at birth and the expression for the generation time given by Equations 6 and 11, respectively, are robust to the nature of noise assumed. In this section, the generation time is obtained using the division strategy f(lb) as described previously along with a time additive division timing noise (ζλ). In such a case, Td is obtained to be,

(12) Td=1λ(ln(2)-αxn)+ζ(0,σn)λ.

The time additive noise, ζ(0,σn)λ, is assumed to be small and has a normal distribution with mean 0 and standard deviation σnλ. Note that σn is a dimensionless quantity.

Assuming noise in growth rate to be small, we find Td to first order to be,

(13) Td1λ(ln(2)-αxn-ln(2)ξ(0,CVλ)+ζ(0,σn)).

Equation 13 is same as Equation 11, if the time additive noise term, ζ(0,σn), in Equation 12 is replaced by ζs(0,σbd)/2. Using Equation 13, the variance in Td (σt2) is,

(14) σt2=1λ2(ln2(2)CVλ2+2σn22-α).

For exponential growth, we also find,

(15) ln(LdLb)=xn+1-xn+ln(2)=λTd.

On substituting Equation 12 into Equation 15 we obtain to first order,

(16) xn+1(1-α)xn+ζ(0,σn).

On replacing the time additive noise term, ζ(0,σn), in Equation 16 with ζs(0,σbd)/2, we recover the recursive relation for size at birth obtained in the case of size additive noise shown in Equation 6. Hence, the model is insensitive to noise being size additive or time additive with a simple mapping for going from one noise type to another in the small noise limit.

At steady state, x has a normal distribution with mean 0 and variance σx2 whose value is given by,

(17) σx2=σn2α(2-α).

We note that some of the derivations above have also been presented in Eun et al., 2018, but are provided here for completeness.

Predicting the results of statistical constructs applied on ln(LdLb) vs λTd and λTd vs ln(LdLb)

Request a detailed protocol
Obtaining the best linear fit
Request a detailed protocol

Next, we calculate the equation for the best linear fit for the choice of ln(LdLb) as y-axis and λTd as x-axis and vice versa. For simplicity, in this section, we will consider time additive division timing noise. However, the results obtained here will hold for size additive noise as well because the model is robust to the type of noise added as shown in the previous section.

First, we calculate the correlation coefficient (ρexp) for ln(LdLb) and time of division Td,

(18) ρexp=(ln(LdLb)ln(LdLb))(TdTd)σlσt,

where σl is the standard deviation in ln(LdLb). Using Equations 15 and 16 we obtain,

(19) ln(LdLb)ln(2)-αxn+ζ(0,σn).

Substituting Equations 13 and 19 into the numerator of Equation 18,

(20) (ln(LdLb)-ln(LdLb))(Td-Td)=(-αxn+ζ(0,σn))(-αxn-ln(2)ξ(0,CVλ)+ζ(0,σn))λ.

As the terms ζ(0,σn), ξ(0,CVλ) and xn are independent of each other, ξ(0,CVλ)ζ(0,σn) = 0, ξ(0,CVλ)xn = 0 and xnζ(0,σn) = 0. Equation 20 simplifies to,

(21) (ln(LdLb)-ln(LdLb))(Td-Td)=(α2σx2+σn2)1λ.

The variance of ln(LdLb) obtained using Equation 19 is,

(22) σl2=α2σx2+σn2=2σn22-α.

Inserting Equations 14, 21 and 22 into Equation 18, we get,

(23) ρexp=11+(1-α2)ln2(2)CVλ2σn2.

The slope of a linear regression line is given by,

(24) m=ρσyσx,

where σx, σy, and ρ are the standard deviation of the x-variable, the standard deviation of the y-variable and the correlation coefficient of the (x,y) pair, respectively. The intercept is,

(25) c=y-mx.

On the x-axis, we plot λTd and the y-axis is chosen as ln(LdLb). The slope for this choice (mtl) can be calculated by,

(26) mtl=ρexpσlσtλ.

On substituting the values we get,

(27) mtl=11+(1α2)ln2(2)CVλ2σn2.

Only for CVλσn we would expect a slope close to 1.

The intercept (ctl) for the ln(LdLb) vs λTd plot is given by,

(28) ctl=ln(LdLb)mtlλTd=ln(2)(111+(1α2)ln2(2)CVλ2σn2).

However, if we choose the x-axis as ln(LdLb) and the y-axis is chosen as λTd, we obtain the slope mlt,

(29) mlt=ρexpσtλσl.

On substituting the values we obtain mlt = 1 independent of the noise parameters and find that the intercept is zero.

Non-linearity in binned data
Request a detailed protocol

In the Main text, for the plot ln(LdLb) vs λTd, we find the binned data to be non-linear (see Figure 2C of the Main text). In this section, we explain the non-linearity observed using the model developed in the previous sections.

Binning data based on the x-axis means taking an average of the y-variable conditioned on the value of the x-variable. Mathematically, this amounts to calculating E[yx] i.e., the conditional expectation of the y-variable given that x is fixed. In our case, we need to calculate E[ln(LdLb)λTd] . ln(LdLb)= λTd by definition of exponential growth, hence,

(30) E[ln(LdLb)λTd]=E[λTdλTd].

Since Td is fixed, this is equivalent to calculating E[λTd]. Using Equation 13,

(31) E[λTd]=λp(x,ξ,ζ) δ(Td(ln(2)λαxλln(2)ξλ+ζλ) dx dξ dζ)p(x,ξ,ζ) δ(Td(ln(2)λαxλln(2)ξλ+ζλ) dx dξ dζ).

p(x,ξ,ζ) is the joint probability distribution of x and noise parameters ξ and ζ. Since, they are independent of each other, the joint distribution is product of the individual distributions f1(x), f2(ξ) and f3(ζ), the distributions being Gaussian with mean 0 and standard deviation σx, CVλ and σn, respectively. σx, σn are related by Equation 17. Since x, ξ, and ζ are narrowly distributed around zero, the contribution from large positive or negative values is extremely small. This ensures that Td is also close to its mean and non-negative despite the limits of the integral being - to . Using λ = λ+λξ(0,CVλ) in Equation 31,

(32) E[λTd]=λ(1+ξf1(x)f2(ξ)f3(ζ) δ(Td(ln(2)λαxλln(2)ξλ+ζλ)) dx dξ dζf1(x)f2(ξ)f3(ζ) δ(Td(ln(2)λαxλln(2)ξλ+ζλ)) dx dξ dζ).

On evaluating the integrals, we obtain,

(33) E[λTd]=λ(1+11+22ασn2CVλ2ln2(2)λTdln(2)1+22ασn2CVλ2ln2(2)).

Thus, the trend of binned data is found to be,

(34) E[ln(LdLb)λTd]=λTd(1+11+22ασn2CVλ2ln2(2)λTdln(2)1+22ασn2CVλ2ln2(2)).

In the regime CVλσn , the last two terms on the RHS of Equation 34 vanish and the binned data follows the trend y = x.

For the λTd vs ln(LdLb) plot, we need to calculate E[λTdln(LdLb)]. Using Equations 13 and 19, we obtain,

(35) λTd=ln(LdLb)-ln(2)ξ(0,CVλ).

ln(LdLb) is independent of ξ(0,CVλ). Using this, we can write E[λTdln(LdLb)] as,

(36) E[λTdln(LdLb)]=(λTd) f2(ξ) f4(ln(LdLb)) δ(λTd (ln(LdLb)ln(2)ξ)) d(λTd) dξf4(ln(LdLb)).

Note that the integral over λTd goes from - to although λTd cannot be negative. As before, this is not an issue because we assume λTd to be tightly regulated around ln(2) and the contribution to the integral from - to 0 is negligible. f4(ln(LdLb)) denotes the probability distribution for ln(LdLb), the distribution being Gaussian with mean ln(2), and standard deviation σl which is calculated in Equation 22. Putting the Gaussian form of f2(ξ) into the integral and simplifying we get,

(37) E[λTdln(LdLb)]=ln(LdLb).

The trend of binned data to first order in noise and x is E[λTdln(LdLb)] = ln(LdLb). This is shown in Figure 2D of the Main text where the binned data follows the y = x line.

Linear growth

Request a detailed protocol

In this section, we will focus on finding the equation of the best linear fit for relevant plots in the case of linear growth. The time at division for linear growth is given by,

(38) Td=Ld-Lbλ.

Note that λ has units of [length/time] and is defined as the elongation speed. This is different from the exponential growth rate which has units [1/time]. Here, we will work with the normalized length at birth (lb) and division (ld),

(39) Td=ld-lbλlin.

Consider the normalized elongation speed to be λlin = λlin+λlinξlin(0,CVλ,lin), where λlin is the mean normalized elongation speed for a lineage of cells and ξlin(0,CVλ,lin) is normally distributed with mean 0 and standard deviation CVλ,lin. Thus, the CV of elongation speed is CVλ,lin. The regulation strategy which the cell undertakes is equivalent to that in previous sections and is given by g(lb) = 2+2(1-α)(lb-1). Note that we can obtain g(lb) by Taylor expanding f(lb) around lb = 1. Using the regulation strategy g(lb) and adding a size additive noise ζs(0,σbd) which is independent of lb, we find,

(40) Td=2+2(1α)(lbn1)+ζs(0,σbd)lbnλlin(1+ξlin(0,CVλ,lin)).

Note that we chose size additive division timing noise (ζs(0,σbd)) for convenience in this section. However, it can be shown as done previously that the model is robust to the noise in division timing being size additive or time additive. Assuming that the noise terms ξlin(0,CVλ,lin) and ζs(0,σbd) are small, we obtain to first order,

(41) Td(1-2α)(lb-1)+1+ζs(0,σbd)-ξlin(0,CVλ,lin)λlin.

The terms lb, ζs(0,σbd) and ξlin(0,CVλ,lin) are independent of each other. The standard deviation of Td (σt) can be calculated to be,

(42) σt2=(1-2α)2σb2+σbd2+CVλ,lin2λlin2.

Assuming perfectly symmetric division and using ldn=g(lbn)+ζs(0,σbd), we find the recursive relation for lbn to be,

(43) ldn-lbn=2lbn+1-lbn=(1-2α)lbn+2α+ζs(0,σbd).

Note that Equation 43 is the same as Equation 6 under the approximation xn=lbn-1. At steady state, the standard deviation of lb is denoted by σb and using Equation 43 its value is obtained to be,

(44) σb2=σbd24α(2-α).

Similarly, the standard deviation of ld-lb, or equivalently λlinTd, denoted by σl,lin, is calculated to be,

(45) σl,lin2=4α+14α(2-α)σbd2.

For linear growth, a natural plot is ld-lb vs λlinTd (reminiscent of the ln(LdLb) vs λTd plot for exponential growth). To calculate the slope of the best linear fit, we have to calculate the correlation coefficient ρlin given by,

(46) ρlin=(ld-lb-ld-lb)(λlinTd-λlinTd)λlinσl,linσt.

Again using the independence of terms lb, ζs(0,σbd) and ξlin(0,CVλ,lin) from each other, we get,

(47) ρlin=(1-2α)2σb2+σbd2λlinσl,linσt=σl,linλlinσt.

The slope of best linear fit for the plot ld-lb vs λlinTd is given by,

(48) mtl,lin=ρlinσl,linλlinσt=11+CVλ,lin24α(2α)σbd2(4α+1).

The intercept ctl,lin is found to be,

(49) ctl,lin=ldlbmtl,linλlinTd=111+CVλ,lin24α(2α)σbd2(4α+1).

On flipping the axis, the slope (mlt,lin) for the plot λlinTd vs ld-lb is obtained to be,

(50) mlt,lin=ρlinλlinσtσl,lin=1.

The intercept clt,lin is found to be,

(51) clt,lin=λlinTd-mlt,linld-lb=0.

The best linear fit for the λlinTd vs ld-lb plot follows the trend y = x.

Simulations of the adder model for linearly growing cells were carried out. The deviation of the best linear fit for the ld-lb vs λlinTd plot from the y = x line is shown in Figure 3—figure supplement 1A, while in Figure 3—figure supplement 1B, the best linear fit for the plot λlinTd vs ld-lb is shown to agree with the y = x line.

Differentiating linear from exponential growth

Request a detailed protocol

In this section, we explore the equation for the best linear fit of λlinTd vs ld-lb plot in the case of exponential growth and λTd vs ln(LdLb) plot for linear growth. Intuitively, we expect the best linear fit in both cases to deviate from the y = x line. In this section, we will calculate the best linear fit explicitly. Surprisingly, we will find that, in the case of linear growth, the best linear fit for the λTd vs ln(LdLb) plot follows the y = x line closely.

Let us begin with exponential growth with growth rate, λ = λ + λξ(0,CVλ) as defined previously. Again, ξ(0,CVλ) has a normal distribution with mean 0 and standard deviation CVλ, it being the CV of the growth rate. The time at division is given by Equation 7. The average growth rate λ = ln(2)Tdln(2)Td. For exponential growth, we will plot λlinTd vs ld-lb. As previously defined, λlin is the mean normalized elongation speed and λlin=1Td1Td. λlin is related to λ by,

(52) λlin=λln(2).

ld-lbcan be calculated by using the regulation strategy f(lb) introduced in the Model section and a normally distributed size additive noise ζs(0,σbd). Note that we have chosen the noise in division timing to be size additive. However, the model is robust to the choice of type of noise as we showed in the Exponential growth section. Using Equations 5 and 6 we obtain,

(53) ldn-lbn1+(1-2α)xn+ζs(0,σbd).

Using Equation 11, λlinTd is obtained to be,

(54) λlinTd=1-αxln(2)-ξ(0,CVλ)+ζs(0,σbd)2ln(2).

To calculate the expression for mlt,lin, the slope of the best linear fit for λlinTd vs ld-lb plot, we first calculate ρlin given by Equation 46. The expression for σl,lin (standard deviation of ld-lb) and σt (standard deviation of Td) are found to be,

(55) σl,lin2=(1-2α)2σx2+σbd2,
(56) σt2=1λlin2( (ασxln(2))2+CVλ2+(σbd2ln(2))2 ).

σxis related to σn via Equation 17. In Exponential growth section, we also showed that σn = σbd2. Using these, we can write,

(57) σx2=σbd24α(2-α).

Now using the expressions for σt, σl,lin and the fact that x, ξ(0,CVλ) and ζs(0,σbd) are independent of each other, we get,

(58) ρlin=(2α1)ασx2ln(2)+σbd22ln(2)λlinσl,linσt.

For the plot λlinTd vs ld-lb, the slope mlt,lin is given by,

(59) mlt,lin=ρlinσtλlinσl,lin=(2α1)ασx2ln(2)+σbd22ln(2)σl,lin2.

Inserting Equation 55 into Equation 59 and substituting σx2 given by Equation 57, we obtain,

(60) mlt,lin=1ln(2)3α4α+1.

The intercept clt,lin is found to be,

(61) clt,lin=λlinTd-mlt,linld-lb=1-1ln(2)3α4α+1.

For the adder model (α = 12), we get the value of slope mlin,lt = 12ln(2)0.7213 and intercept clin,lt = 1-12ln(2)0.279. This is different from the best linear fit obtained for same regulatory mechanism controlling division in linearly growing cells where we found that the best linear fit follows the y = x line. Intuitively, we expect the best linear fit of λlinTd vs ld-lb plot to deviate from y = x line in the case of exponential growth. We showed analytically that for a class of models where birth controls division, it is indeed the case. This is also shown using simulations of the adder model in Figure 3—figure supplement 1C.

In the ‘ Obtaining the best linear fit’ section, we found the best linear fit for λTd vs ln(LdLb) plot to follow the y = x line for exponentially growing cells where division is regulated by birth event via regulation strategy f(lb). Next, we calculate the equation for the best linear fit of λTd vs ln(LdLb) plot given that growth is linear. The model for division control will be same as that in the Linear growth section that is, the regulation strategy for division is given by g(lb) = 2+2(1-α)(lb-1) which is also equivalent to f(lb). The linearly growing cells grow with elongation speed λlin = λlin(1+ξlin(0,CVλ,lin)). As discussed before, ξlin(0,CVλ,lin) has a normal distribution with mean 0 and standard deviation CVλ,lin, it being the CV of the elongation speed. Using Equations 5 and 6, we get,

(62) ln(LdLb)=ln(2)-αxn+ζs(0,σbd)2.

Using Equations 5 and 52, we obtain from Equation 41,

(63) λTd=ln(2)+(1-2α)ln(2)x+ln(2)ζs(0,σbd)-ln(2)ξlin(0,CVλ,lin).

Since x, ξlin(0,CVλ,lin) and ζs(0,σbd) are uncorrelated, the standard deviation of ln(LdLb) and Td denoted by σl and σt respectively are calculated to be,

(64) σl2=α2σx2+σbd24,
(65) σt2=ln2(2)λ2((1-2α)2σx2+σbd2+CVλ,lin2).

We calculate the correlation coefficient for the pair (ln(LdLb), λTd). Since the correlation coefficient is unaffected by multiplying one of the variables with a positive constant, we can calculate the correlation coefficient for the pair (ln(LdLb), Td) or ρexp as given by Equation 18. Using the independence of terms x, ξlin(0,CVλ,lin) and ζs(0,σbd),

(66) ρexp=ln(2)(σx2(2α1)α+σbd22)λσlσt.

For the plot λTd vs ln(LdLb), the slope mlt of the best linear fit is given by,

(67) mlt=ρexpσtλσl=ln(2)(σx2(2α1)α+σbd22)σl2.

Inserting Equation 64 into Equation 67 and using Equation 57, we get,

(68) mlt=32ln(2)1.0397.

Similarly the intercept (clt) for the plot λTd vs ln(LdLb) is found to be,

(69) clt=λTd-mltln(LdLb)=ln(2)(1-32ln(2))-0.0275.

This is very close to y = x trend obtained for the same regulatory mechanism controlling division in exponentially growing cells (Figure 3A).

Growth rate vs age and elongation speed vs age plots

Request a detailed protocol

In the previous sections, we found that binning and linear regression on the plot ln(LdLb) vs λTd, and the plot obtained by interchanging the axes, were inadequate to identify the mode of growth. In this section, we try to validate the growth rate vs age plot as a method to elucidate the mode of growth.

In addition to cell size at birth and division and the generation time, cell size trajectories (cell size, L vs time from birth, t) were obtained for multiple cell cycles. In our case, the cell size trajectories were collected either via simulations (in Figure 3B) or from experiments (for Figure 4A-C) at intervals of 4 min. Note that if the measurements were to be carried out at equal length intervals instead of time, the results discussed in the paper would still remain unchanged. For each trajectory, growth rate at time t or age tTd is calculated as 1L(t)L(t+Δt)-L(t)Δt where Δt is the time between consecutive measurements. To obtain elongation speed vs age plots, the formula before needs to be replaced with L(t+Δt)-L(t)Δt. The growth rate is interpolated to contain 200 points at equal intervals of time for each cell trajectory. The growth rate trends appear to be robust with regards to a different number of interpolated points (from 100 to 500 points). To obtain the growth rate trend as a function of cell age, we use the method previously applied in Nordholt et al., 2020. In this method, growth rate is binned based on age for each individual trajectory (50 bins) and the average growth rate is obtained in each of the bins. The binned data trend for growth rate vs age is then found by taking the average of the growth rate in each bin over all trajectories. Binning the growth rate for each trajectory ensures that each trajectory has an equal contribution to the final growth rate trend so as to avoid inspection bias. This step is especially important when data collected at equal intervals of time is analyzed. In such a case, cells with larger generation times have a greater number of measurements than cells with smaller generation times. Obtaining the growth rate trend without binning growth rate for each trajectory would have biased the binned data trend for the growth rate vs age plot to a smaller value because of over-representation by slower-growing cells (or equivalently cells with longer generation time). This bias toward lower growth rate values in the growth rate vs age plots is an instance of inspection bias.

In Figure 4A-C, we find the growth rate obtained from E. coli experiments to change within the cell cycle. In the two slower growth media (Figure 4A and B), the growth rate is found to increase with cell age while for the fastest growth media (Figure 4C) the growth rate follows a non-monotonic behaviour similar to that observed in Nordholt et al., 2020 for B. subtilis. Abrupt changes in growth rate are reported at constriction in Reshes et al., 2008; Banerjee et al., 2017. We find that the growth rate changes start before constriction in the two slower growth conditions considered. One possibility is that this increase is due to preseptal cell wall synthesis (Pazos et al., 2018). Preseptal cell wall synthesis does not require activity of PBP3 (FtsI) but instead relies on bifunctional glycosyltransferases PBP1A and PBP1B that link to FtsZ via ZipA. One hypothesis that can be tested in future works is that at the onset of constriction, activity from PBP1A and PBP1B starts to gradually shift to the PBP3/FtsW complex and therefore no abrupt change in growth rate is observed. In the fastest growth condition (glucose-cas medium), we find that the increase in growth rate approximately coincides with onset of constriction, in agreement with the previous findings (Reshes et al., 2008; Banerjee et al., 2017).

In Figure 4A-C, the growth rate trends are not obtained for age close to one. This is because growth rate at age = 1 is given by 1L(Td)L(Td+Δt)-L(Td)Δt and this requires knowing the cell lengths beyond the division event (L(Td+Δt)). To estimate growth rates at age close to one, we approximate L(Td+Δt) to be the sum of cell sizes of the two daughter cells. In order to minimize inspection bias, we considered only those cell size trajectories which had L(t) data for 12 min after division (corresponding to an age of approximately 1.1). However, the growth rate trends in all three growth media were robust with regards to a different time for which L(t) was considered (4 min to 20 min after division). We use the binning procedure discussed before in this section. To validate this method, we applied it on synthetic data obtained from the simulations of exponentially growing cells following the adder and the adder per origin model. Cells were assumed to divide in a perfectly symmetric manner and both of the daughter cells were assumed to grow with the same growth rate, independent of the growth rate in the mother cell. The growth rate trends for the two models considered (adder and adder per origin) are expected to be constant even for cell age >1. We found that the growth rate trends were indeed approximately constant as shown in Figure 4—figure supplement 1D. We also considered linear growth with division controlled via an adder model. The daughter cells were assumed to grow with the same elongation speed, independent of the elongation speed in the mother cell. In this case, we expect the elongation speed trend to be constant for cell age >1. This is indeed what we observed as shown in the inset of Figure 4—figure supplement 1D. We used this method on E. coli experimental data and found that the growth rate trends obtained for the three growth conditions (Figure 4—figure supplement 1A–1C) were consistent with that shown in Figure 4A-C in the relevant age ranges. For cell age close to one, we found that the growth rate decreased to a value close to the growth rate near cell birth (age ≈ 0) for all three growth conditions considered.

In summary, we find that the growth rate vs age plots are a consistent method to probe the mode of cell growth within a cell cycle.

Growth rate vs time from specific event plots are affected by inspection bias

Request a detailed protocol

To probe the growth rate trend in relation to a specific cell cycle event, for example cell birth, growth rate vs time from birth plots are obtained for simulations of exponentially growing cells following the adder model. In the growth rate vs time from birth plot, the rate is found to stay constant and then decrease at longer times (Figure 3—figure supplement 2C) even though cells are exponentially growing. Because of inspection bias (or survivor bias), at later times, only the cells with larger generation times (or slower growth rates) ‘survive’. The average generation time of the cells averaged upon in each bin of Figure 3—figure supplement 2C is shown in Figure 3—figure supplement 2D. The decrease in growth rate in Figure 3—figure supplement 2C occurs around the same time when an increase in generation time is observed in Figure 3—figure supplement 2D. Thus, the trend in growth rate is biased toward lower values at longer times. The problem might be circumvented by restricting the time on the x-axis to the smallest generation time of all the cell cycles considered (Messelink et al., 2020).

To check for growth rate changes at constriction, we used plots of growth rate vs time from constriction (t-Tn). Growth rate trends obtained from E. coli experimental data show a decrease at the edges of the plots (Figure 4—figure supplement 2A and C, and 2E). These deviate from the trends obtained using the growth rate vs age plots (Figure 4A-C). To investigate this discrepancy, we use a model which takes into account the constriction and the division event. Currently, it is unknown how constriction is related to division. For the purpose of methods validation, we use a model where cells grow exponentially, constriction occurs after a constant size addition from birth, and division occurs after a constant size addition from constriction. Note that other models where constriction occurs after a constant size addition from birth while division occurs after a constant time from constriction, as well as a mixed timer-adder model proposed in Banerjee et al., 2017, lead to similar results. We expect the growth rate trend to be constant for exponentially growing cells. However, we find using numerical simulations that it decreases at the plot edges both before and after the constriction event (Figure 3—figure supplement 2A). This decrease can be attributed to inspection bias. The average growth rate in time bins at the extremes are biased by cells with smaller growth rates. This is shown in Figure 3—figure supplement 2B where the average generation time for the cells contributing in each of the bins of Figure 3—figure supplement 2A is plotted. The time at which the growth rate decreases on both sides of the constriction event is close to the time at which the average generation time increases. For example, in alanine medium, the generation time for each of the bins is plotted in Figure 4—figure supplement 2B. The average generation time for the cells contributing to each of the bins is almost constant for the timings between –80 min and 20 min. Thus, for this time range the changes in growth rate are not because of inspection bias but are a real biological effect. The behavior of growth rate within this time range in Figure 4—figure supplement 2A is in agreement with the trend in growth rate vs age plot of Figure 4A. On accounting for inspection bias, the growth rate vs age plots agree with the growth rate vs time from constriction plots in other growth media as well (Figure 4—figure supplement 2C, Figure 4—figure supplement 2E). Thus, growth rate vs time plots are also a consistent method to probe growth rate modulation in the time range when avoiding the regimes prone to inspection bias.

Results of elongation speed vs size plots are model-dependent

Request a detailed protocol

Cells assumed to undergo exponential growth have elongation speed proportional to their size. In the case of exponential growth, the binned data trend of the plot elongation speed vs size is expected to be linear with the slope of the best linear fit providing the value of growth rate and intercept being zero. In this section, we use the simulations to test if binning and linear regression on the elongation speed vs size plots are suitable methods to differentiate exponential growth from linear growth (Cadart et al., 2019).

To test the method, we generate cell size trajectories using simulations of the adder model with a size additive division timing noise and assuming exponential growth. Elongation speed at size L(t) is calculated for each trajectory as L(t+Δt)-L(t)Δt where Δt is the time between consecutive measurements ( = 4 min in our case). Each trajectory is binned into 10 equally sized bins based on their cell sizes and the average elongation speed is obtained for each bin. The final trend of elongation speed as a function of size is then obtained by binning (based on size) the pooled average elongation speed data of all the cell cycles.

We find that the binned data trend is linear with the slope of the best linear fit close to the average growth rate considered in the simulations (Figure 3—figure supplement 3D). This is in agreement with our expectations for exponential growth. In order to check if this method could differentiate between exponential growth and linear growth, we used simulations of the adder model undergoing linear growth to generate cell size trajectories for multiple cell cycles. For linear growth, elongation speed is expected to be constant, independent of its cell size. The binned data trend for the elongation speed vs size plot is also obtained to be constant for the simulations of linearly growing cells (Figure 3—figure supplement 3B). The intercept of the best linear fit obtained is close to the average elongation speed considered in the simulations. The binned data trend for linear and exponential growth are clearly different as shown in Figure 3—figure supplement 3B and Figure 3—figure supplement 3D, respectively, and this result holds for a broad class of models where the division event is controlled by birth and the growth rate (for exponential growth)/elongation speed (for linear growth) is distributed normally and independently between cell-cycles.

Next, we consider the adder per origin cell cycle model for exponentially growing cells (Ho and Amir, 2015). In this model space, the cell initiates DNA replication by adding a constant size per origin from the previous initiation size. The division occurs on average after a constant time from initiation. For exponentially growing cells, the binned data trend is still expected to be linear as before. Instead, we find using simulations that the trend is non-linear and it might be misinterpreted as non-exponential growth (Figure 3—figure supplement 3F).

Thus, the results of binning and linear regression for the plot elongation speed vs size is model-dependent.

Interchanging axes in growth rate vs inverse generation time plot might lead to different interpretations

Request a detailed protocol

So far, our discussion was focused on the question of mode of single-cell growth. A related problem regards the relation between growth rate (λ) and the inverse generation time (1Td). On a population level, the two are clearly proportional to each other. However, single-cell studies based on binning showed an intriguing non-linear dependence between the two, with the two variables becoming uncorrelated in the faster-growth media (Kennard et al., 2016; Iyer-Biswas et al., 2014). Within the same medium, the binned data curve for the plot λ vs 1Td flattened out for faster dividing cells. The trend in the binned data was different from the trend of y = ln(2) x line as observed for the population means. A priori one might speculate that the flattening in faster dividing cells could be because the faster dividing cells might have less time to adapt their division rate to transient fluctuations in the environment. Kennard et al., 2016 insightfully also plotted 1Td vs λ and found a collapse of the binned data for all growth conditions onto the y = 1ln(2) x line. These results are reminiscent of what we previously showed for the relation of ln(LdLb) and λTd.

In the following, we will elucidate why this occurs in this case using an underlying model and predicting the trend based on it. We use simulations of the adder model undergoing exponential growth. The parameters for size added in a cell cycle and mean growth rates are extracted from the experimental data. CV of growth rate is assumed lower in faster growth media as observed by Kennard et al. Using this model, we could obtain the same pattern of flattening at faster growth conditions that is observed in the experiments (Figure 2—figure supplement 2A). The population mean for λ and 1Td follows the expected y = ln(2) x equation (shown as black dashed line) as was the case in experiments. Intuitively, such a departure from the expected y = ln(2) x line for the single-cell data can again be explained by determining the effect of noise on variables plotted on both axes. As previously stated Td is affected by both growth rate noise and noise in division timing while growth rate fluctuates independently of other sources of noise. This does not agree with the assumption for binning as noise in division timing affects the x-axis variable rather than the y-axis variable. In such a case, the trend in the binned data might not follow the expected y = ln(2) x line. However, on interchanging the axes, we would expect the assumptions of binning to be met and the trend to follow the y = 1ln(2) x line (Figure 2—figure supplement 2B).

Data and simulations

Experimental data

Request a detailed protocol

Experimental data obtained by Tanouchi et al., 2017 was used to plot Ld vs Lb shown in Figure 1A. E. coli cells were grown at 25 °C in a mother machine device and the length at birth and division were collected for multiple cell cycles. Ld vs Lb plot was obtained using these cells and linear regression performed on it provided a best linear fit.

Data from recent mother machine experiments on E. coli was used to make all other plots. Details are provided in the Experimental methods and Tiruvadi-Krishnan et al., 2021. The experiments were conducted at 28 °C in three different growth conditions - alanine, glycerol, and glucose-cas (also see Experimental methods). Cell size trajectories were collected for multiple cell cycles and all of the data collected were considered while making the plots in the paper.

Simulations

Request a detailed protocol

MATLAB R2021a was used for simulations. Simulations of the adder model for exponentially growing cells were carried out over a single lineage of 2500 generations (Figure 2C and D, Figure 3—figure supplement 1C). The mean length added between birth and division was set to 1.73 μm in line with the experimental results for alanine medium. Growth rate was variable and sampled from a normal distribution at the start of each cell cycle. The mean growth rate was set to ln(2)Td, where Td = 212 min and coefficient of variation (CV) = CVλ = 0.15. The noise in division timing was assumed to be time additive with mean 0 and standard deviation σnλ, where σn = 0.15. The binning data trends and the best linear fits obtained using these simulations could be compared with the analytical results obtained in sections ‘Non-linearity in binned data’ and ‘Differentiating linear from exponential growth’.

For simulations of linear growth (Figure 3A-B, Figure 3—figure supplement 1A and B, Figure 3—figure supplement 3A, B , Figure 4—figure supplement 1D), the mean growth rate was set to Ld-LbTd, with the values of Ld-Lb and Td used as mentioned previously. The noise in division timing was size additive with standard deviation = 0.15 Lb. Noise was also considered to be size additive with the same standard deviation for the simulations of exponentially growing cells shown in Figure 3B, Figure 3—figure supplement Figure 3—figure supplements 2C and 3C. D, and Figure 4—figure supplement 1D.

In the simulations of super-exponential growth carried over a single lineage of 2500 generations (Figure 3B), the cells initially grew exponentially but in the later stages of the cell cycle, the growth rate increased as,

(70) dλdt=2k(t-tc),

where k was fixed to be 2Td3 and tc was the time from birth at which the growth rate changed from exponential to super-exponential growth. tc was fixed to be half of the generation time of the cell or equivalently an age of 0.5. The division size was set by the adder model with a time additive noise with similar parameters as before for exponential growth simulations. The exponential growth rate at the start of each cell cycle was drawn from a normal distribution with mean set to ln(2)242min-1 and CV = 0.15.

For Figure 3B, Figure 3—figure supplement 3E and F, Figure 4—figure supplement 1D, simulations were carried out over a lineage of 2500 generations for exponentially growing cells following the adder per origin model. In the simulations, the time increment is 0.01 min. The initial condition for the simulations is that cells are born and initiate DNA replication at time t = 0 but the results are independent of initial conditions. The number of origins is also tracked throughout the simulations beginning with an initial value of 2. Cells divide into two daughter cells in a perfectly symmetrical manner (no noise in division ratio), and one of the daughter cells is discarded for the next cell cycle. In simulations, the growth rate was fixed within a cell cycle but varied between different cell cycles. On division, the growth rate for that cell cycle was drawn from a normal distribution with mean λ and coefficient of variation (CVλ) whose values were fixed using the experimental data from alanine medium. The total length at which the next initiation happens is determined by,

(71) Litot,next=Li+OΔii,

where Δii is the length added per origin and O is the number of origins. To determine Litot,next, Δii was drawn on reaching initiation length from a normal distribution. The mean and CV of Δii was obtained from experiments done in alanine medium. In the adder per origin model, division happens after a C + D time from initiation. The division length (Ld) is obtained to be,

(72) Ld=Lieλ(C+D).

In the simulations, once the initiation length was reached, the corresponding division occurred a time C + D after initiation. C + D timings for each initiation event were again drawn from a normal distribution with the same mean and CV as that of the experiments in alanine medium.

For Figure 3—figure supplement 2A, cells were assumed to grow exponentially in the simulations. The constriction length (Ln) was set to be,

(73) Ln=Lb+Δbn.

The length added (Δbn) was assumed to have a normal distribution with the mean length added between birth and constriction set to 1.18 μm and the CV = 0.23, in line with the experimental results for alanine medium. The length at division was set as,

(74) Ld=Ln+Δnd.

The length added (Δnd) was also assumed to have a normal distribution with the mean length added set to 0.53 µm and the CV = 0.26, again in line with the experimental results for alanine medium.

For Figure 3B, Figure 3—figure supplement 2A–2D, 3 A-3F, Figure 4—figure supplement 1D, the cell sizes are recorded within the cell cycle at equal intervals of 4 min, similar to that in the E. coli experiments of Tiruvadi-Krishnan et al., 2021.

For simulations shown in Figure 4—figure supplement 1D, the cell size trajectories are obtained at intervals of 4 min beyond the current cell-cycle. The size after the division event is said to be the sum of the sizes of the daughter cells. It is also further assumed that the daughter cells are equal in size (perfectly symmetric division) and they both grow with the same growth rate (for exponential growth) or elongation speed (for linear growth). The growth rates/elongation speeds for the daughter cells are sampled from a normal distribution with a mean and CV as discussed before. The cell size trajectories are recorded for 80 min after the division event in the current cell cycle.

In Figure 2—figure supplement 2, simulations of the adder model for exponentially growing cells were carried out until a population of 5000 cells was reached. The parameters for size added in a cell cycle and mean growth rates were extracted from the experimental data (Kennard et al., 2016). The value of σn used in all growth conditions was 0.17 while CVλ decreased in faster growth conditions (0.2 in the three slowest growth conditions, 0.12 and 0.07 in the second fastest and fastest growth conditions, respectively).

Appendix 1

Comparing length, surface area and volume growth rate

In the paper, we use cell length to represent cell size. However, other cell size characteristics such as cell surface area and cell volume could also be used to denote cell size. How does the growth rate vary with our choice of cell length, cell surface area, or cell volume to be the cell size?

To study this, we assume a cell morphology as shown in Appendix 1—figure 1. We assume that E. coli cells are cylindrical with hemispherical poles. The total length of the cell is L with a radius R. The cell volume (V) is then,

(1-A1) V=πR2L23πR3.
Appendix 1—figure 1
Length growth rate vs volume and surface area growth rate.

(A) Cell morphology of E. coli used in the model is shown. The E. coli cells are assumed to be cylindrical with hemispherical end caps. Before constriction, the cell elongates with constant width (2 R). However, after onset of constriction, the septum starts forming at the mid-cell. (B) Length growth rate as a function of age assuming that the total cell surface area growth is exponential, and the radius is constant (R = 0.35 μm). (C) Length growth rate as a function of age assuming that the volume growth is exponential, radius is constant (R = 0.35 μm) and septum surface grows at a constant rate.

The morphology of the cell after constriction is also shown in Appendix 1—figure 1A. The volume in this case is,

(2-A1) V=πR2L-43πR3+2πR2h-2πh2R+23πh3.

If we make the assumption that cell biomass grows exponentially and the total cell surface area is coupled to the biomass (Oldewurtel et al., 2021), then cell surface area grows exponentially with time. Using the morphology in Appendix 1—figure 1A, the total surface area (S) before and after constriction is,

(3-A1) S=2πRL.

Surprisingly, this is independent of h. Since the surface area is proportional to the cell length (Equation (3-A1)), the length growth is also exponential with an identical growth rate as surface area growth, assuming the width of the cell is constant. The exponential growth of cell length is shown in Appendix 1—figure 1B using simulations where the cell surface is assumed to grow exponentially. So, for this model of cell growth and morphology, the length and the surface growth rates are found to be identical.

Next, we compare length growth rate to volume growth rate considering the same cell morphology as that in Appendix 1—figure 1A. In this model, the volume growth is assumed to be exponential. The volume before and after the onset of constriction are given by Equations (1-A1) and (2-A1), respectively.

Before constriction, the volume grows only by an increase in length of the cylindrical part of the cell while the width stays constant. However, after the constriction at mid-cell starts, the volume grows by an increase in length as well as by adding a septum surface at the mid-cell. We assume that the septum wall surface grows at a constant rate (c1) (Reshes et al., 2008). We can obtain c1 in terms of cell morphology variables to be,

(4-A1) c1=-4πRdhdt.

We can solve for h(t) using the following boundary conditions,

(5-A1) h(t=Tn)=R,h(t=Td)=0,

where Tn is the time from birth at which constriction starts. Using Equation (4-A1) and Equation (5-A1), we can obtain c1 in terms of cell cycle variables R, Tn and Td,

(6-A1) c1=4πR2TdTn

Under these assumptions, for exponential volume growth, we obtain the length growth via simulations. The length growth rate is shown in Appendix 1—figure 1C. The growth rate, the length at birth, the time at constriction from birth and the generation time parameters used in the simulations are obtained from experimental data in alanine growth medium. The width of the cells is assumed to be 0.35 μm. We find that before constriction, the length growth rate increases to a small extent (6%). However, after constriction there is a rapid increase in length growth rate. Since the length growth rate increases with age, the length growth is super-exponential. Thus, the mode of growth in length and volume are not identical.

Appendix 2

Linear regression on ln(LdLb) vs Tdλ plot and its interchanged axes plot

In section ‘Statistical methods like binning and linear regression should be interpreted based on a model’, we found that binning and linear regression on the plots ln(LdLb) vs λTd and its interchanged axes were not a suitable method to identify the underlying mode of growth. In this section, we explore binning and linear regression on similar plots ln(LdLb) vs Tdλ plot and its interchanged axes. We test the usability of these plots to elucidate the mode of growth using the methodology proposed in the paper.

Assuming exponential growth, λ for a cell cycle can be calculated as 1Tdln(LdLb). On plotting ln(LdLb) vs Tdλ (Appendix 2—figure 1 ) and Tdλ vs ln(LdLb) (Appendix 2—figure 1) for the experimental data, we obtain the slope of the best linear fit to be close to zero (values shown in Table 1-Appendix 2). Next, using the methodology of the paper, we interpret these results using an underlying model. We consider a model in which cells grow exponentially with the division determined by birth. In the model, growth rate is fixed at the beginning of each cell cycle and is independent of size at birth. The model predicts that ln(LdLb) will be independent of the growth rate (Equation 19 in main text). Thus, we would expect the slope to be zero for both of the plots ln(LdLb) vs Tdλ and Tdλ vs ln(LdLb). This is also shown using simulations of the adder model in Appendix 2—figure 1 where the slope of the plots is close to zero. In order to differentiate between exponential growth and linear growth, the best linear fit in case of linear growth for these plots must deviate from y = constant line. However, we find for the simulations of the adder model where cells grow linearly that the slope of the best linear fit for both of the above plots is still zero (Appendix 2—figure 1). Note that λ in the case of linear growth is still calculated as 1Tdln(LdLb). A slope of zero in case of linear growth can be explained using Equation 62 of the main text. Using the equation, we find that ln(LdLb) is independent of the underlying growth rate for linear growth. Thus, the best linear fit for both plots have a slope of zero in the case of linear growth. This indicates that binning and linear regression on the ln(LdLb) vs Tdλ and its interchanged axes plots are unsuitable for elucidating the mode of growth.

Appendix 2—figure 1
ln(LdLb)vs Tdλ and its flipped axes plots.

(A-E) ln(LdLb) vs Tdλ are shown for A. Experimental data in alanine medium. B. Experimental data in glycerol medium. C. Experimental data in glucose-cas medium. D. Simulations of the adder model where cells grow exponentially, carried out for N = 2500 cells. (E) Simulations of the adder model where cells grow linearly, carried out for N = 2500 cells. F-J. For the same order of the above experimental conditions and simulations, Tdλ vs ln(LdLb) plots are shown. In all of the plots, blue represents the raw data, red represents the binned data, and the yellow line represents the best linear fit obtained by applying linear regression on the raw data. In all of the plots, the slope of the best linear fit is close to zero. Thus, we find that these plots are not a suitable method to differentiate between linear and exponential growth as they provide a similar best linear fit.

Appendix 2—table 1
The slope and the intercept of the best linear fit along with their 95 % confidence intervals (CI) obtained on performing linear regression on experimental data.

The data is collected for cells growing in M9 alanine, glycerol and glucose-cas media (Tiruvadi-Krishnan et al., 2021).

MediaNo. ofTdln(LdLb) vs Tdλ plotTdλ vs ln(LdLb) plot
Slope (with 95% CI)Intercept (with 95% CI)Slope (with 95% CI)Intercept (with 95% CI)
Alanine8162140.04 (-0.01, 0.09)0.65 (0.62, 0.69)0.05 (-0.01, 0.12)0.67 (0.63, 0.72)
Glycerol648164–0.12 (-0.16,–0.07)0.75 (0.71, 0.79)–0.19 (-0.27,–0.11)0.83 (0.78, 0.89)
Glucose-cas737650.11 (0.06, 0.16)0.55 (0.52, 0.58)0.16 (0.09, 0.23)0.56 (0.51, 0.61)

Data availability

All data generated during this study are deposited in Dataverse (https://doi.org/10.7910/DVN/BNQUDW).

The following data sets were generated
    1. Kar P
    2. Tiruvadi-Krishnan S
    3. Männik J
    4. Männik J
    5. Amir A
    (2021) Harvard Dataverse
    Distinguishing different modes of growth using single-cell data.
    https://doi.org/10.7910/DVN/BNQUDW
The following previously published data sets were used
    1. Tanouchi Y
    2. Pai A
    3. Park H
    4. Huang S
    5. Buchler NE
    6. You L
    (2017) figshare
    Data from long-term growth data of Escherichia coli at a single-cell level.
    https://doi.org/10.6084/m9.figshare.c.3493548.v1

References

    1. Thom LC
    (2007)
    E. coli genome manipulation by p1 transduction
    Current Protocols in Molecular Biology 79:1–17.

Decision letter

  1. Gordon J Berman
    Reviewing Editor; Emory University, United States
  2. Aleksandra M Walczak
    Senior Editor; École Normale Supérieure, France
  3. Benjamin P Bratton
    Reviewer; Princeton University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "To bin or not to bin: analyzing single-cell growth data" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Aleksandra Walczak as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Benjamin P Bratton (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The final result (Figure 4) is somewhat disconnected from the majority of the paper that precedes it. Specifically, the authors' procedure that resolves exponential vs. non-exponential growth results in E. coli in alanine being deemed exponential (Figure 2B) only to later be revealed as non-exponential (Figure 4A), albeit weakly. Furthermore, the procedure advertised as distinguishing exponential from linear growth (Figure 3B), when applied to the data, reveals neither (Figure 4). This makes the main point of the paper (the demonstration and resolution of pitfalls) feel disconnected from its application to a particular case, which is more nuanced and likely leaves many questions unanswered.

2) The title ("To bin or not to bin…") implies that binning is the main culprit behind potentially misleading analyses, but one of the reviewers argued that in the end, it is linear regression. Each of the two main pitfalls and their resolution would be unchanged if the data were never binned. Binning affects the apparent curvature of the y vs x relationship, but this reads as a more minor point. Therefore, the title may be a bit misleading in service of its poeticism.

3) The authors look at different choices of binning dimensions but do not sufficiently explore the power of their generative model to perform (un)weighted regression or parameter estimation from the not-binned data. They do explore the unbinned data from an analytical statistics approach in sections 5.4.1 and 5.5 but this is not yet extensively explored in the figures and/or discussion.

4) This manuscript could be improved by including a discussion on binning single-cell length trajectories, those taken on a single cell within one division cycle, either based on the length or on time. Experimentally, my understanding is that most growth measurements are done on a fixed \Δ t basis, and therefore measurements are averaged using length as a function of time. Another option, one that I have not seen used, is to use a \Δ L basis and measure the time to grow either a specific amount or a specific fractional amount. Of course, these are different ways to slice the same cake, and such an extension may be useful to expand the authors' argument that one needs to utilize a generative model to be able to assess their choice in statistical analysis.

5) As the formulation of this approach borrows from and is built heavily on previous data and previous statistical frameworks, it would be helpful to have more discussion on what is new to this particular manuscript. This is always a difficult balance as one needs the context of the previous work to understand the symbology used, but just reiterating what is already in the literature can lead to a dilution of the new/novel/important discoveries of the manuscript.

6) The authors should resolve the use of the term "cell size" to mean both "cell length" and "cell volume" (line 107, line 431/432). The authors briefly touch on a related concept of length growth vs biomass growth in 343-357. As a first, naively simple model of cell growth, it is possible to come up with a simple model for linear vs volumetric growth by imagining that a cell grows exponentially in length from birth to death. In this simple model, at the appropriate time, it switches from growing a cylinder to growing hemispherically endcaps, as it needs to perfectly duplicate its shape. In this simple model, one can easily see that the instantaneous, relative volumetric and length growth rates are not identical; vary throughout the cell cycle; and even their average over the whole cell cycle is not 1. While the authors mention that the average diameter is quite robust between growth conditions, only in the limit that the cell is very long and not a spherocylinder does the approximation that shape does not matter come in to play. Granted, this naïve model does not incorporate the wealth of information known about division timing or geometric changes during growth, and underscores the authors' main point that a specific model of the dynamical process should be included.Reviewer #1:

In this manuscript, the authors describe a generative model-based framework to better analyze stochastic growth data, including bacterial cell growth. They show how this framework can be applied to gain insight into the processes underlying these phenomena. More specifically, they start by showing how binning along different axes in the

This work is well-supported by simulations and data analysis and will likely be of interest to those trying to understand the processes governing bacterial growth, as well as those studying stochastic growth processes in biology more broadly.

Strengths:

– The choice and execution of the simulations were sensible and well-done, respectively, and they provided clarity as to the overall message of the manuscript.

– The conclusions are well-supported by the data.

– I found the writing to be clear throughout.

Weaknesses:

– It would be good to have a more extensive discussion about what is specifically new here. This is not my particular field, so having a bit more of an introduction about methods beyond binning (if any) that have emerged to understand these data.Reviewer #2:

The manuscript by Kar et al., uses single-cell experiments, simulations, and theory to investigate a common method of determining whether cell size grows exponentially. Specifically, they show that a relationship that should adhere to y = x for exponential growth on average (where x is the product of the division time and mean growth rate, and y is the log of the ratio of the birth and division size), in fact deviates from y = x with noise, because x contains noise that y does not (growth rate noise). This makes exponential growth seem non-exponential. The resolution, they show, is to plot x vs. y instead. Additionally, they show that when plotting x vs. y for linear growth, the relationship is coincidentally very close to x = y. This makes linear growth seem exponential. The resolution, they show, is to plot the instantaneous growth rate vs. the normalized cell age, which will decrease for linear growth and will be constant for exponential growth. Applying this protocol to E. coli size data, they find that the growth rate actually increases weakly with age, indicating somewhat superexponential growth.

Strengths

The insights in this manuscript are highly important for the field to know. The fact that exponential growth can masquerade as non-exponential growth and vice versa means that much confusion likely exists in a field that is already surprisingly complex given how simple its questions are to state. The fact that the authors offer resolutions to these pitfalls means that this work should add clarity to the field and move it forward in a meaningful way.

The conclusions are well supported by the data. For example, in the case of seemingly non-exponential growth, the problem is presented by the experimental data, reproduced by the simulations, and explained by the theory, while the resolution is inspired by the theory, proven by the simulations, and demonstrated by the experimental data.

The manuscript is well written. While it is rooted in careful analysis, it remains understandable to a largely non-quantitative audience.

Weaknesses

The final result (Figure 4) is somewhat disconnected from the majority of the paper that precedes it. Specifically, the authors' procedure that resolves exponential vs. non-exponential growth results in E. coli in alanine being deemed exponential (Figure 2B) only to later be revealed as non-exponential (Figure 4A), albeit weakly. Furthermore, the procedure advertised as distinguishing exponential from linear growth (Figure 3B), when applied to the data, reveals neither (Figure 4). This makes the main point of the paper (the demonstration and resolution of pitfalls) feel disconnected from its application to a particular case, which is more nuanced and likely leaves many questions unanswered.

The title ("To bin or not to bin…") implies that binning is the main culprit behind potentially misleading analysis, but I would argue that in the end, it is linear regression. Each of the two main pitfalls and their resolution would be unchanged if the data were never binned, I believe. Binning affects the apparent curvature of the y vs x relationship, but this reads as a more minor point. Therefore, the title may be a bit misleading in service of its poeticism.Reviewer #3:

Kar et al., examine an interesting and important question of how to make sense of large sets of observational data, specifically cell length data, which may or may not be consistent with various underlying biological mechanisms. As datasets improve in their technical quality (increasing spatiotemporal resolution, increasing numbers of observations), there is hope that the community will be able to resolve differences between underlying cell biological mechanisms of cell size homeostasis. As the authors point out, these interpretations and analyses require statistical analysis that can accurately perform the model selection or parameter estimation task of interest.

1) The authors succeed in bringing attention to the issue of appropriate binning when analyzing large datasets. The authors focus their figures and discussion on an important, and practical issue, as many researchers perform linear regression on binned data. The title and framing of the manuscript imply that it will provide a comparison with statistical methods that do not involve binning. The authors look at different choices of binning dimensions, but do not sufficiently explore the power of their generative model to perform (un)weighted regression or parameter estimation from the not-binned data. They do explore the unbinned data from an analytical statistics approach in section 5.4.1 and 5.5 but this not yet extensively explored in the figures and/or discussion.

2) The authors succeed in walking the reader through the power of examining a specific mechanism/model and the statistical properties of that model. In this case, the authors do this with both a model of cell length homeostasis that comes from exponential growth or linear growth with homeostatic feedback. This approach rests heavily on previous work from the same group including a reframing of the statistical correlations between different observables that was explored in the included references [13] and [16]. This reframing is complemented by additional experiments and reanalysis of various published experimental datasets.

https://doi.org/10.7554/eLife.72565.sa1

Author response

Essential revisions:

1) The final result (Figure 4) is somewhat disconnected from the majority of the paper that precedes it. Specifically, the authors' procedure that resolves exponential vs. non-exponential growth results in E. coli in alanine being deemed exponential (Figure 2B) only to later be revealed as non-exponential (Figure 4A), albeit weakly. Furthermore, the procedure advertised as distinguishing exponential from linear growth (Figure 3B), when applied to the data, reveals neither (Figure 4). This makes the main point of the paper (the demonstration and resolution of pitfalls) feel disconnected from its application to a particular case, which is more nuanced and likely leaves many questions unanswered.

We try to bridge the gap between Figure 4 and the text preceding it by using simulations of super-exponential growth as mentioned in response to Comment 1 of Reviewer 2.

2) The title ("To bin or not to bin…") implies that binning is the main culprit behind potentially misleading analyses, but one of the reviewers argued that in the end, it is linear regression. Each of the two main pitfalls and their resolution would be unchanged if the data were never binned. Binning affects the apparent curvature of the y vs x relationship, but this reads as a more minor point. Therefore, the title may be a bit misleading in service of its poeticism.

We have changed the title of the article to a more apt one – ”Distinguishing different modes of growth using single-cell data”.

3) The authors look at different choices of binning dimensions but do not sufficiently explore the power of their generative model to perform (un)weighted regression or parameter estimation from the not-binned data. They do explore the unbinned data from an analytical statistics approach in sections 5.4.1 and 5.5 but this is not yet extensively explored in the figures and/or discussion.

We apologise for the confusion and we try to clarify it in the manuscript as stated previously in response to Comment 1 of Reviewer 3.

4) This manuscript could be improved by including a discussion on binning single-cell length trajectories, those taken on a single cell within one division cycle, either based on the length or on time. Experimentally, my understanding is that most growth measurements are done on a fixed \Δ t basis, and therefore measurements are averaged using length as a function of time. Another option, one that I have not seen used, is to use a \Δ L basis and measure the time to grow either a specific amount or a specific fractional amount. Of course, these are different ways to slice the same cake, and such an extension may be useful to expand the authors' argument that one needs to utilize a generative model to be able to assess their choice in statistical analysis.

In Section 3 of the manuscript, we discuss binning in relation to the growth rate of single-cell length trajectories as a function of age, and time, and the elongation speed as a function of size. In the simulations conducted in Section 3, we consider the cell lengths measurements to be at equal intervals of time within the cell cycle so as to mimic our mother machine experiments. However, using our methodology of analysing based on a model, it is possible to speculate on the outcome of the data analysis methods in the case of mother machine experiments where measurements are done at equal intervals of length (∆L).

We perform simulations in which lengths within the cell cycle are recorded at ∆L = 0.05µm intervals and the time from birth corresponding to each length increase is also noted. The results from the simulation are shown in Author response image 1. We find that the binned data trend and the best linear fit for the cells growing exponentially following the adder model deviate from the y=x line for ln (LdLb)vs λTd plot (Author response image 1A). This is expected from Equations. 2, 27 and 28 of the main text. On interchanging the axes, the best linear fit and the binned data trend are expected to follow the y=x line for exponential growth. This is shown in Author response image 1B for simulations of cells following the adder model and undergoing exponential growth. For cells undergoing linear growth and birth controlling the division, the slope and intercept of the best linear fit of ⟨λTd vs ln(LdLb) are analytically found to be approximately 1.04 and -0.03 respectively. This is shown in Author response image 1C for simulations of linear growth and cells following the adder model. The binned growth rate trend in growth rate vs age plots are found to be almost constant for cells undergoing exponential growth irrespective of them following the adder model or the adder per origin model (Author response image 1D). In the case of linear growth, the growth rate is expected to decrease with cell age. This is observed for simulations of linearly growing cells following the adder model (Author response image 1D). We also observe an increase in growth rate in agreement with the simulations of cells following the adder model and undergoing faster than exponential growth (Author response image 1D). All of the results presented here are very similar to that of simulations where the cell lengths were measured at equal intervals in time. We also comment about it in p.37 line 685 of the revised manuscript:

“Note that if the measurements were to be carried out at equal length intervals instead of time, the results discussed in the paper would still remain unchanged.”

Author response image 1
Measurements at ∆L intervals: Results are shown for simulations carried out for N=2500 cells, where the cell length measurements are done at ∆L = 0. 05µm intervals of length instead of equal intervals in time.

For exponentially growing cells following the adder model, (A) ln(LdLb)vs<λ>Td plot is shown. The binned data trend (red) and the best linear fit (yellow) deviate from the y=x line (black dashed line). (B) <λ>Tdvsln(LdLb) plot is shown. The binned data trend (red) and the best linear fit (yellow) are close to the y=x dependence (black dashed line). C. For linearly growing cells following the adder model, the binned data trend (red) and the best linear fit (yellow) of the <λ> Tdvsln(LdLb) plot closely follow the y=x dependence (black dashed line). D. The growth rate vs age plots for exponentially growing cells following the adder (purple circles) and adder per origin model (magenta diamonds) are constant while for linear growth (green squares) and super-exponential growth (red triangles) following adder model, the growth rate decreases and increases respectively in agreement with the underlying mode of growth. In summary, the results obtained for ∆L measurements are similar to that obtained for ∆t measurements.

5) As the formulation of this approach borrows from and is built heavily on previous data and previous statistical frameworks, it would be helpful to have more discussion on what is new to this particular manuscript. This is always a difficult balance as one needs the context of the previous work to understand the symbology used, but just reiterating what is already in the literature can lead to a dilution of the new/novel/important discoveries of the manuscript.

We try to be clearer in the manuscript, as detailed in the response to Comment 1 of Reviewer 1.

6) The authors should resolve the use of the term "cell size" to mean both "cell length" and "cell volume" (line 107, line 431/432). The authors briefly touch on a related concept of length growth vs biomass growth in 343-357. As a first, naively simple model of cell growth, it is possible to come up with a simple model for linear vs volumetric growth by imagining that a cell grows exponentially in length from birth to death. In this simple model, at the appropriate time, it switches from growing a cylinder to growing hemispherically endcaps, as it needs to perfectly duplicate its shape. In this simple model, one can easily see that the instantaneous, relative volumetric and length growth rates are not identical; vary throughout the cell cycle; and even their average over the whole cell cycle is not 1. While the authors mention that the average diameter is quite robust between growth conditions, only in the limit that the cell is very long and not a spherocylinder does the approximation that shape does not matter come in to play. Granted, this naïve model does not incorporate the wealth of information known about division timing or geometric changes during growth, and underscores the authors' main point that a specific model of the dynamical process should be included.

Thank you for raising this point. In the revised version, we point out that cell length is used as a measure of cell size. Thus, the growth rate trend shown in the manuscript is the length growth rate.

However, other cell geometry parameters such as surface area and volume can be used to represent cell size and their respective growth rates might differ from the length growth rate. Theoretically, a comparison between the different rates requires choosing a specific model of cell growth as the reviewers point out. In the revised version of the manuscript, we try to compare the different rates assuming a simple cell morphology and using multiple models of cell growth. We add a discussion of these models as Appendix 1 in the revised manuscript:

“In the paper, we use cell length to represent cell size. However, other cell size characteristics such as cell surface area and cell volume could also be used to denote cell size. […] The width of the cells is assumed to be 0.35 µm. We find that before constriction, the length growth rate increases to a small extent (≈ 6%). However, after constriction there is a rapid increase in length growth rate. The mode of growth in length and volume are not identical.”

In conclusion, length growth rate and volume growth rate do vary and we try to state it more clearly in p.6 line 116 of the revised text:

“Note that we consider length to reflect cell size in this paper rather than other cell geometry characteristics such as surface area and volume. The length growth rate that we elucidate in the paper can be different from the cell volume growth rate as shown in Appendix 1 assuming a simple cell morphology and exponential growth. Using the same cell morphology, we also find the length growth rate to be identical to cell surface growth rate.”

We also reiterate this in Section 5, p.20 line 472 of the revised text:

“To reiterate, the length growth is not the same as cell volume growth as shown in Appendix 1.”

Reviewer #1:

In this manuscript, the authors describe a generative model-based framework to better analyze stochastic growth data, including bacterial cell growth. They show how this framework can be applied to gain insight into the processes underlying these phenomena. More specifically, they start by showing how binning along different axes in the

This work is well-supported by simulations and data analysis and will likely be of interest to those trying to understand the processes governing bacterial growth, as well as those studying stochastic growth processes in biology more broadly.

We thank Reviewer 1 for appreciating the methods used in the work and its scope.

Strengths:

– The choice and execution of the simulations were sensible and well-done, respectively, and they provided clarity as to the overall message of the manuscript.

– The conclusions are well-supported by the data.

– I found the writing to be clear throughout.

Weaknesses:

– It would be good to have a more extensive discussion about what is specifically new here. This is not my particular field, so having a bit more of an introduction about methods beyond binning (if any) that have emerged to understand these data.

Binning and linear regression are the most commonly used methods for probing the correlations between variables obtained using single cell data analysis [Ho et al. (2018), Jun et al. (2018)]. In this paper, we try to address the pitfalls associated with applying these simple procedures. In the revised version, we put more emphasis on linear regression throughout the manuscript for example in the ”Introduction” (p.4 line 54 of the revised text):

“While binning may provide a smooth non-linear relation between variables, linear regression is used to find a linear relationship between the variables. In addition to binning, we use the ordinary least squares regression where the slope and the intercept of the best linear fit line are obtained by minimizing the squared sum of the difference between the dependent variable raw data and the predicted value. Here, the best fit/the best linear fit is obtained using the raw data and not the binned data. Similar to binning, the assumption underlying linear regression is that our knowledge of x-axis variable is precise while the noise is in the y-axis variable.”

Following this comment, we added the best linear fits to the Figures 2C, 2D and 3A.

We also try to clearly point out the sections which are not novel to the paper. For example, the adder model in Figure 1A has been discussed previously. We emphasize that in p.5 line 77 of the revised text:

“This previously discussed example demonstrates and reiterates the use of statistical analysis on single-cell data to understand the underlying cell regulation mechanisms.”

Similarly, the novel results of the paper such as obtaining the best linear fits for the plots ln(LdLb) vs ⟨λTd and its flipped axes are based on a class of models studied by Eun et al. (2018). We try to make it clearer in p.7 line 133 of the revised manuscript:

“For that purpose, we use a previously studied model [Eun et al. (2018)] which considers growth to be exponential with the growth rate distributed normally and independently between cell cycles with mean growth rate ⟨λ⟩ and standard deviation CVλλ⟩.”

Reviewer #2:

[…]Weaknesses

The final result (Figure 4) is somewhat disconnected from the majority of the paper that precedes it. Specifically, the authors' procedure that resolves exponential vs. non-exponential growth results in E. coli in alanine being deemed exponential (Figure 2B) only to later be revealed as non-exponential (Figure 4A), albeit weakly. Furthermore, the procedure advertised as distinguishing exponential from linear growth (Figure 3B), when applied to the data, reveals neither (Figure 4). This makes the main point of the paper (the demonstration and resolution of pitfalls) feel disconnected from its application to a particular case, which is more nuanced and likely leaves many questions unanswered.

We attempt to resolve the gap between Figure 4 and the text preceding it by showing that growth rate vs age plot can be used to infer the modes of growth, including, but not limited to, exponential and linear growth. To verify this, we simulate the adder model for cells undergoing super-exponential growth. The binned growth rate trend as a function of age for super-exponential growth is shown in Figure 3B and the following text is added to p.13 line 270 of the revised manuscript:

“Thus, the two growth modes (exponential and linear) could be differentiated using the growth rate vs age plot (for details see Section 5.7). However, the growth rate vs age plots can be used to infer the mode of growth beyond the two discussed above. We show this by using simulations of cells following the adder model and undergoing faster than exponential or super-exponential growth (see the Simulations section for details). In such a case, the growth rate is expected to increase. This increase in growth rate is shown in Figure 3B using simulations. The binned data trend (red triangles) again matches the growth rate mode used in the simulations (red dotted line). Thus, the growth rate vs age plots are a consistent method to distinguish linear from exponential and super-exponential growths.”

The details about the simulation of super-exponential growth has been added to the Simulations section.

We have also added theoretical predictions (dotted lines of same color) to the growth rate vs age curves of different models shown in Figure 3B of the revised manuscript, that agree well with our simulations.

The title ("To bin or not to bin…") implies that binning is the main culprit behind potentially misleading analysis, but I would argue that in the end, it is linear regression. Each of the two main pitfalls and their resolution would be unchanged if the data were never binned, I believe. Binning affects the apparent curvature of the y vs x relationship, but this reads as a more minor point. Therefore, the title may be a bit misleading in service of its poeticism.

We thank the reviewer for the comment and we have changed the title to a more apt one for the manuscript- ”Distinguishing different modes of growth using single-cell data”.

Reviewer #3:

Kar et al., examine an interesting and important question of how to make sense of large sets of observational data, specifically cell length data, which may or may not be consistent with various underlying biological mechanisms. As datasets improve in their technical quality (increasing spatiotemporal resolution, increasing numbers of observations), there is hope that the community will be able to resolve differences between underlying cell biological mechanisms of cell size homeostasis. As the authors point out, these interpretations and analyses require statistical analysis that can accurately perform the model selection or parameter estimation task of interest.

We thank Reviewer 3 for the comments. Indeed, our message in the paper is to use underlying biological models to aid the inference of biological mechanisms using various statistical analyses methods.

1) The authors succeed in bringing attention to the issue of appropriate binning when analyzing large datasets. The authors focus their figures and discussion on an important, and practical issue, as many researchers perform linear regression on binned data. The title and framing of the manuscript imply that it will provide a comparison with statistical methods that do not involve binning. The authors look at different choices of binning dimensions, but do not sufficiently explore the power of their generative model to perform (un)weighted regression or parameter estimation from the not-binned data. They do explore the unbinned data from an analytical statistics approach in section 5.4.1 and 5.5 but this not yet extensively explored in the figures and/or discussion.

We apologize for the confusion. Throughout the paper, we perform linear regression on the raw data and not the binned data. We have now added a clarification statement in p.4 line 58 of the revised text:

“Here, the best fit/the best linear fit is obtained using the raw data and not the binned data.”

Further, we have changed the title of the manuscript to ”Distinguishing different modes of growth using single-cell data”. The paper aims to bring forth the issues in both binning and linear regression which arise from similar sources i.e., the intrinsic noise affecting the x-axis variable and the inspection bias. We discuss these issues in relation to mode of growth in single cells as the new title now states.

In addition to section 5.4.1 and 5.5 where we discuss calculating the best fit line for exponential and linear growth, we also try to explicitly mention linear regression on non-binned data in the figures, and the Results and Discussion section. This is shown by the addition of the best linear fit/best fit in Figures 2C, 2D and 3A.

https://doi.org/10.7554/eLife.72565.sa2

Article and author information

Author details

  1. Prathitha Kar

    1. Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
    2. Department of Chemistry and Chemical Biology, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Formal analysis, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4091-6860
  2. Sriram Tiruvadi-Krishnan

    Department of Physics and Astronomy, University of Tennessee, Knoxville, United States
    Contribution
    Conceptualization, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Jaana Männik

    Department of Physics and Astronomy, University of Tennessee, Knoxville, United States
    Contribution
    Conceptualization, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0777-7846
  4. Jaan Männik

    Department of Physics and Astronomy, University of Tennessee, Knoxville, United States
    Contribution
    Conceptualization, Formal analysis, Methodology, Writing - review and editing
    For correspondence
    jmannik@utk.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-6759-3053
  5. Ariel Amir

    Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Formal analysis, Methodology, Writing - review and editing
    For correspondence
    arielamir@seas.harvard.edu
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2611-0139

Funding

US-Israel BSF Research Grant (2017004)

  • Jaan Männik

National Institutes of Health (R01GM127413)

  • Jaan Männik

National Science Foundation (NSF CAREER 1752024)

  • Ariel Amir

National Science Foundation (NSF award 1806818)

  • Prathitha Kar

National Institutes of Health (R01 AI143611)

  • Ariel Amir

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Ethan Levien, and Jie Lin for useful discussions, Jane Kondev, Xili Liu, and Marco Cosentino Lagomarsino for their useful feedback on the manuscript, Da Yang and Scott Retterer for help in microfluidic chip making, and Rodrigo Reyes-Lamothe for a kind gift of strain. Authors acknowledge technical assistance and material support from the Center for Environmental Biotechnology at the University of Tennessee. A part of this research was conducted at the Center for Nanophase Materials Sciences, which is sponsored at Oak Ridge National Laboratory by the Scientific User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy. This work has been supported by the US-Israel BSF research grant 2017004 (JM), the National Institutes of Health award under R01GM127413 (JM), NSF CAREER 1752024 (AA), NIH grant R01 AI143611 (AA) and NSF award 1806818 (PK).

Senior Editor

  1. Aleksandra M Walczak, École Normale Supérieure, France

Reviewing Editor

  1. Gordon J Berman, Emory University, United States

Reviewer

  1. Benjamin P Bratton, Princeton University, United States

Publication history

  1. Received: July 28, 2021
  2. Accepted: November 21, 2021
  3. Accepted Manuscript published: December 2, 2021 (version 1)
  4. Accepted Manuscript updated: December 8, 2021 (version 2)
  5. Version of Record published: January 4, 2022 (version 3)
  6. Version of Record updated: January 7, 2022 (version 4)

Copyright

© 2021, Kar et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,820
    Page views
  • 253
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Prathitha Kar
  2. Sriram Tiruvadi-Krishnan
  3. Jaana Männik
  4. Jaan Männik
  5. Ariel Amir
(2021)
Distinguishing different modes of growth using single-cell data
eLife 10:e72565.
https://doi.org/10.7554/eLife.72565

Further reading

    1. Epidemiology and Global Health
    2. Microbiology and Infectious Disease
    Paul Tupper, Shraddha Pai ... Caroline Colijn
    Research Article Updated

    The role of schools in the spread of SARS-CoV-2 is controversial, with some claiming they are an important driver of the pandemic and others arguing that transmission in schools is negligible. School cluster reports that have been collected in various jurisdictions are a source of data about transmission in schools. These reports consist of the name of a school, a date, and the number of students known to be infected. We provide a simple model for the frequency and size of clusters in this data, based on random arrivals of index cases at schools who then infect their classmates with a highly variable rate, fitting the overdispersion evident in the data. We fit our model to reports from four Canadian provinces, providing estimates of mean and dispersion for cluster size, as well as the distribution of the instantaneous transmission parameter β, whilst factoring in imperfect ascertainment. According to our model with parameters estimated from the data, in all four provinces (i) more than 65% of non-index cases occur in the 20% largest clusters, and (ii) reducing instantaneous transmission rate and the number of contacts a student has at any given time are effective in reducing the total number of cases, whereas strict bubbling (keeping contacts consistent over time) does not contribute much to reduce cluster sizes. We predict strict bubbling to be more valuable in scenarios with substantially higher transmission rates.

    1. Genetics and Genomics
    2. Microbiology and Infectious Disease
    Benjamin J Chadwick, Tuyetnhu Pham ... Xiaorong Lin
    Research Article Updated

    The environmental pathogen Cryptococcus neoformans claims over 180,000 lives each year. Survival of this basidiomycete at host CO2 concentrations has only recently been considered an important virulence trait. Through screening gene knockout libraries constructed in a CO2-tolerant clinical strain, we found mutations leading to CO2 sensitivity are enriched in pathways activated by heat stress, including calcineurin, Ras1-Cdc24, cell wall integrity, and Regulator of Ace2 and Morphogenesis (RAM). Overexpression of Cbk1, the conserved terminal kinase of the RAM pathway, partially restored defects of these mutants at host CO2 or temperature levels. In ascomycetes such as Saccharomyces cerevisiae and Candida albicans, transcription factor Ace2 is an important target of Cbk1, activating genes responsible for cell separation. However, no Ace2 homolog or any downstream component of the RAM pathway has been identified in basidiomycetes. Through in vitro evolution and comparative genomics, we characterized mutations in suppressors of cbk1Δ in C. neoformans that partially rescued defects in CO2 tolerance, thermotolerance, and morphology. One suppressor is the RNA translation repressor Ssd1, which is highly conserved in ascomycetes and basidiomycetes. The other is a novel ribonuclease domain-containing protein, here named PSC1, which is present in basidiomycetes and humans but surprisingly absent in most ascomycetes. Loss of Ssd1 in cbk1Δ partially restored cryptococcal ability to survive and amplify in the inhalation and intravenous murine models of cryptococcosis. Our discoveries highlight the overlapping regulation of CO2 tolerance and thermotolerance, the essential role of the RAM pathway in cryptococcal adaptation to the host condition, and the potential importance of post-transcriptional control of virulence traits in this global pathogen.