1 Introduction

Plate counting is fundamental to microbiology for estimating the number of living microorganisms in a sample [1, 2]. For example, it is essential in food safety testing [3] and water quality assessments [4, 5], among many other applications [6–9] that require flexible quantification of microbial loads.

Briefly, plate counting involves serially diluting an original sample to ensure an appropriate colony density, allowing for the accurate enumeration of colonies for each sample (as summarized in Fig. 1a). Subsequently, the diluted samples are spread over nutrient-rich agar plates and incubated. After incubation, each viable organism (typically a bacterium) spread over the plate forms a colony, or in some cases a plaque [10], which is then enumerated as a colony forming unit (CFU) or plaque forming unit (PFU), respectively. The degree of dilution (represented by the dilution factor) is typically set so as to collect a statistically precise number of colonies while simultaneously avoiding colony overlap – usually aiming for 30 to 300 CFUs in a plate with a diameter of 10 cm. However, when studying a poorly characterized or highly variable bacterial population, collecting plates that are overcrowded is inevitable. Challenges related to counting crowded plates were only recently mitigated by AI-inspired tools [11–14].

Visual description of the dilution and plate counting process and the need for its analysis.

a) A sample of volume V is mixed with sterile medium in a larger vial (e.g., 19V, for a total dilution of 20). A portion of volume V of this diluted sample is then spread on a plate and incubated. Further tenfold dilutions are performed by transferring V into a vial with 9V of medium before plating. The process continues until a reasonable number of counts is obtained. b) The histogram on the left shows simulated counts across 1000 plates, obtained by diluting samples from the ground truth distribution by a factor of 200. The usual method of multiplying the counts by the total dilution factor produces the orange histogram (on the right), which appears significantly broader than the ground truth (black line, right) due to additional stochasticity introduced by dilution and plating. The ground truth distribution is a Gaussian with a mean of 8000 and a standard deviation of 500. REPOP’s reconstruction (blue) corrects the broadening. c) A similar simulation to (b) where the ground truth population distribution is trimodal (parameters in Fig. 2). Here the stochasticity introduced by dilution and plating obscures the three peaks, making them difficult to distinguish in the observed counts. Nevertheless, REPOP successfully reconstructs the trimodal distribution.

Beyond simple quantification, plate counting is often used to assess population variation within a single microbial species, the degree of phenotypic variability [15–18] and environmental resource availability [19, 20]. This stems from interpreting multimodality – several samples exhibiting high and several others low counts – or general heterogeneity in plate counts as reflecting population differences in colony formation, growth rates, or stochastic expression of traits [21, 22]. This heterogeneity can then be used to understand ecological dynamics of bacterial populations [19, 23–26] such as the competition of populations inside the gut of C. elegans [27].

On account of this, learning the variability from sample to sample is critical in accurately capturing a population’s heterogeneity. Thus, in order to draw biological conclusions, one would have to disentangle the effects of the stochasticity involved in the measurement (e.g., the randomness involved in the process of dilution and plating) from the heterogeneity in the original sample’s population arising from the underlying biological or ecological variation.

Naively, in dilution and plating, the total number of viable bacteria in the original sample is usually estimated by multiplying the number of colonies by the total dilution factor for each plate, resulting in a histogram of the population in the original samples. This process significantly overestimates the true variability in the bacterial count within the samples and can even obscure the multimodality in reconstructed population count histograms, as demonstrated in Fig. 1b and c.

While all methods – including those based on flow cytometry or DNA quantification [28–35] – have their own limitations, plate counting is considered the gold standard in several fields of microbiology [4, 36]. It has, nevertheless, the critical limitation of propagating error introduced via dilution.

To overcome the limitations of traditional plate count estimation, we developed REPOP (REconstruct POpulations from Plates), a software package that employs a Bayesian approach to rigorously account for the stochasticity introduced by dilution and plating. By modeling these statistical processes, REPOP infers the full probability distribution of the original samples, preserving their inherent heterogeneity, as demonstrated in Fig. 1b. Notably, REPOP can detect the multimodal structure of the data even when such multimodality is not readily apparent from dilution count histograms, as shown in Fig. 1c. REPOP is available on GitHub [37].

When presenting our method, we address a mathematically subtle point that may qualitatively affect the inferred counts: the cutoff for “uncountable” on a plate. That is to say, when multiple dilutions are performed in experiments to obtain plates with a large quantity of CFUs, plates that exceed a preset colony cutoff – a number beyond which colonies are considered too numerous to count – are excluded. While this practice is common, failing to properly account for the exclusion of such plates can introduce bias in the inferred population distribution.

Within the present paper, we will explain how REPOP is able to integrate the previously ignored over-populated plates exceeding the cutoff in a more sophisticated albeit requisite iteration of our simpler method. Why requisite? No matter how low we set our dilution factor, a plate may probabilistically always exceed the countable cutoff, though increasingly rarely. Thus, if we can deal with over-populated plates, then we may collect better statistics on colony counts by avoiding too large a dilution.

To demonstrate REPOP’s applicability to real-world datasets, we analyzed two distinct experimental systems. First, we applied REPOP to plate counting data obtained from controlled mixtures of E. coli populations at different optical densities. Without prior knowledge of vial identity, REPOP successfully reconstructed the underlying distribution. Second, we employed REPOP to study bacterial colonization dynamics in individual C. elegans hosts. By analyzing plate counts of gut bacterial populations over several days post-feeding, REPOP revealed the progressive increase in bacterial load and captured the multimodal structure associated with host heterogeneity. These results highlight REPOP’s ability to disentangle true biological variability from stochastic sampling noise in complex, real experimental contexts.

2 Methods

2.1 Statistical description of plate counting

We first describe the process of serial dilutions in mathematical terms. From Fig. 1a we observe that, on each plate, a fraction of the volume in the initial sample, equal to the inverse of the respective dilution factor (ϕ), is spread over the plate. As such, supposing the population in the original sample to be well-stirred, each individual bacterium has probability 1/ϕ of being transferred onto that plate.

We denote the total number of bacteria in the original sample as n. Assuming each bacterium is independently and equally likely to be transferred, the number of bacteria transferred and forming colonies, denoted as k, follows a binomial distribution with parameters n (the number of trials) and 1/ϕ (the probability of success in each trial), expressed as

p(k|n, ϕ) = \binom{n}{k}\left(\frac{1}{ϕ}\right)^{k}\left(1-\frac{1}{ϕ}\right)^{n-k}. \qquad (1)
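To make the sampling model concrete, the following is a minimal Python sketch of Eq. (1): each plate count is a binomial draw with n trials and success probability 1/ϕ. The values of n and ϕ and the use of numpy/scipy are our illustrative choices, not REPOP’s internal implementation.

```python
# Minimal sketch of the dilution-and-plating model of Eq. (1): the plate count k
# is a binomial draw with n trials and success probability 1/phi.
# The values of n and phi below are illustrative, not from the paper's datasets.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

n = 8000      # hypothetical number of bacteria in the original sample
phi = 200     # total dilution factor

# Simulate plate counts for 1000 replicate platings of the same sample
counts = rng.binomial(n=n, p=1.0 / phi, size=1000)
print("mean count:", counts.mean(), "expected n/phi:", n / phi)

# Probability of observing exactly k = 40 colonies on one plate, Eq. (1)
print("p(k = 40 | n, phi) =", binom.pmf(40, n, 1.0 / phi))
```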

With this description of the dilution and plating process, we now describe how we use Bayesian inference to reconstruct the population distribution.

2.2 Bayesian inference

For now we assume that each sample (drawn from the sample at the previous dilution) gives rise to one plate. The dataset obtained consists of the sequence of plate counts, ki, and their respective dilution factors, ϕi, for each replicate i. We denote the complete dataset as D = {(k1, ϕ1), …, (kI, ϕI)}, with each pair (ki, ϕi) generated independently as they arise from different samples.

Our overall goal is to learn the probability distribution for the number of bacteria in the original samples, n, from the dataset D. To achieve this, we assume a priori that the distribution over n follows a model of population distribution with a set of parameters θ, written as p(n|θ).

The following subsection (2.3) is aimed at considering both unimodal and multimodal population distributions, p(n|θ). First, assuming that the population distribution is unimodal, we represent p(n|θ) as a simple Gaussian distribution, where θ reflects the mean and standard deviation. Second, in order to model a multimodal distribution, we invoke non-parametric models which provide a means to describe multimodal probability distributions for which the number of modes is unknown a priori [38–43]. More specifically, we assume a non-parametric Gaussian mixture, where θ represents a (potentially infinite) set of means, standard deviations, and mixture weights. Importantly, in large datasets, non-parametric models can flexibly adapt to approximate probabilities that do not strictly follow a predefined functional form [43–45].

The goal of reconstructing the distribution of bacteria in the original samples then becomes one of learning θ from D. In other words, we want to learn the probability of θ conditioned on D, p(θ|D). This probability is called a posterior. As we will see later, calculating the posterior will require the use of p(n|θ), which is uni- or multimodal, and the use of p(ϕi|n) when cutoffs on the countability of plates are used in order to eliminate plates containing too many colonies.

REPOP is applied here to three cases of increasing complexity: the first (Model 1) is the simplest and assumes a unimodal p(n|θ) without cutoffs. In Model 2 we still apply no cutoff to the counts, but generalize p(n|θ) to a multimodal distribution (which reduces to the unimodal Model 1 when only one component is retained). In Model 3 we keep the most general multimodal distribution for p(n|θ), but expand the likelihood to account for the cutoff, under which some plates are not counted and are reported only as exceeding the cutoff. We summarize the differences among the three models in Table 1.

For ease of reference, we summarize the differences among the three models taken into account in the present article.

Model 1: unimodal p(n|θ) (single integer Gaussian); no cutoff.
Model 2: multimodal p(n|θ) (non-parametric Gaussian mixture); no cutoff.
Model 3: multimodal p(n|θ) (non-parametric Gaussian mixture); cutoff kCO accounted for.

For all models highlighted above, in order to compute p(θ|D), we need the likelihood, p(D|θ), which is related to p(θ|D) through Bayes’ theorem

p(θ|D) = \frac{p(D|θ)\, p(θ)}{p(D)} \propto p(D|θ)\, p(θ). \qquad (2)

Here p(θ) is the prior. Further details on the prior are given in Supplemental Information A. Once we calculate the posterior p(θ|D), we may maximize this probability to obtain the maximum a posteriori (MAP) value, θMAP. More details are provided later in Sec. 2.5.

2.3 Models for the population distribution

2.3.1 Unimodal distribution (Model 1)

For Model 1, we can mathematically represent the probability of the true population count n given the distribution parameters θ as

p(n|θ) = \frac{\exp\left[-(n-µ)^2/(2σ^2)\right]}{\sum_{n'=0}^{\infty} \exp\left[-(n'-µ)^2/(2σ^2)\right]}, \qquad n = 0, 1, 2, \ldots \qquad (3)

Here, θ includes the mean, µ, and standard deviation, σ, of the integer Gaussian p(n|θ), θ = {µ, σ}. The denominator above is necessary because we are dealing with (positive) integer Gaussians; it approaches the usual Gaussian normalization, σ√(2π), when µ − 5σ ≫ 1. For computational reasons, we assume that the probability of having a number of bacteria, n, larger than twice the largest number found by multiplying CFU counts per dilution is approximately zero. Thus, the infinite upper bound over the sum in the denominator of (3) is substituted by this value, which we denote nmax. Note that, by using the integer Gaussian in (3), we truncate the negative support of a Gaussian distribution, meaning we assume that even if we have a pair of µ and σ that assigns non-negligible probability to counts n ≤ 0, those probabilities are set to zero, and the distribution is renormalized accordingly.
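As an illustration of Model 1’s population distribution, the sketch below evaluates the truncated integer Gaussian of Eq. (3) on the grid 0, …, nmax; the function name and parameter values are ours, chosen only for demonstration.

```python
# Sketch of the positive-integer Gaussian of Eq. (3), with the infinite sum in the
# denominator truncated at nmax as described above. Names and values are ours.
import numpy as np

def integer_gaussian_pmf(mu, sigma, n_max):
    """p(n | mu, sigma) over n = 0, ..., n_max, normalized on that support."""
    n = np.arange(n_max + 1)
    weights = np.exp(-0.5 * ((n - mu) / sigma) ** 2)
    return weights / weights.sum()

pmf = integer_gaussian_pmf(mu=8000.0, sigma=500.0, n_max=20000)
print(pmf.sum())              # 1.0 by construction
print(pmf[7500:8501].sum())   # roughly 68% of the mass lies within one sigma
```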

2.3.2 Multimodal distribution: non-parametric inference (Models 2 and 3)

Naturally, the choice of modeling n through a single Gaussian component, Model 1, already commits us to assuming a unimodal distribution. However, if bacteria are present across replicates in a multimodal distribution — such as observed in some studies of C. elegans gut microbiota [8, 27, 46], where some replicates have few bacteria and others have more owing to rare colonization events [27, 46] — how many modes should we consider?

This consideration immediately forces upon us an important motivation for considering a much more general choice for p(n|θ), namely a non-parametric Gaussian mixture. Loosely speaking, non-parametric Gaussian mixture models do not assume a fixed number of parameters (or components) beforehand. Instead, they adapt and determine the appropriate number of components as warranted by the data. Knowledge of non-parametric mixture models, while subtle and detailed in Ref. [43] and references therein, is not required for running REPOP. In such a model, θ encompasses the means {µα}, standard deviations {σα}, and mixture weights {ρα} (the relative weights of each Gaussian component, with the weights summing to unity), such that θ = {µα, σα, ρα}. In this non-parametric model, the probability of having a sample of n bacteria is given as

p(n|θ) = \sum_{α} ρ_α\, p(n|µ_α, σ_α), \qquad (4)

where each component p(n|µα, σα) is an integer Gaussian of the form (3).
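The mixture of Eq. (4) can likewise be evaluated as a weighted sum of integer Gaussians, as in the short sketch below; the component values follow the simulated example of Fig. 2a, and the helper names are ours.

```python
# Sketch of the truncated mixture of Eq. (4): a weighted sum of integer Gaussians.
# Component values follow the simulated example of Fig. 2a; helper names are ours.
import numpy as np

def integer_gaussian_pmf(mu, sigma, n_max):
    n = np.arange(n_max + 1)
    weights = np.exp(-0.5 * ((n - mu) / sigma) ** 2)
    return weights / weights.sum()

def mixture_pmf(rhos, mus, sigmas, n_max):
    """p(n | theta) = sum_alpha rho_alpha * p(n | mu_alpha, sigma_alpha)."""
    rhos = np.asarray(rhos, dtype=float)
    rhos = rhos / rhos.sum()                     # enforce weights summing to unity
    return sum(r * integer_gaussian_pmf(m, s, n_max)
               for r, m, s in zip(rhos, mus, sigmas))

pmf = mixture_pmf(rhos=[0.25, 0.4, 0.35],
                  mus=[4000, 8000, 14000],
                  sigmas=[200, 1500, 1000],
                  n_max=30000)
print(pmf.sum())   # ~1.0
```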

When performing non-parametric inference, the choice of prior p(θ) is key. A well-chosen prior can add computational efficiency and avoid runaway pathologies associated with assigning every datapoint a different mixture component. A common choice of prior for the infinite mixture model satisfying both of these criteria is the Dirichlet process prior [43]; see supplemental information A for more details. While a finite truncation in the number of components can be avoided [47, 48], for computational reasons, here we use a finite truncation (defaulted to the smallest value among 25 and the square root of the total number of samples, though modifiable, in REPOP) for the sum over the components, α, in (4).

2.4 Calculating the likelihood

Here we explain how to calculate the likelihood, p(D|θ), appearing in (2). We start with Models 1 and 2 and then move to Model 3, where the cutoff is taken into consideration.

2.4.1 Likelihood for Models 1 and 2

In the Models where cutoffs are not taken into consideration (i.e., where it is assumed that the colonies on a plate can always be counted and plates are never rejected for having too many colonies), the calculation of the likelihood for each counted plate, given the original number of bacteria in one sample, ni, follows the scheme described in Sec. 2.1. However, ni is itself a realization of the population distribution p(n|θ) described in the previous section (2.3). Thus the probability that a plate with ki counts was generated from a dilution ϕi is given by marginalizing over ni as

p(k_i|ϕ_i, θ) = \sum_{n_i} p(k_i|ϕ_i, n_i)\, p(n_i|θ), \qquad (5)

with each factor p(ki|ϕi, ni) within the summation given by (1) and p(ni|θ) given by the respective population model as presented in Sec. 2.3.

This allows us to write Bayes’ theorem (2) as

p(θ|D) \propto p(θ) \prod_{i=1}^{I} \sum_{n_i} p(k_i|ϕ_i, n_i)\, p(n_i|θ), \qquad (6)

where I is defined as the total number of replicates. Note that while each summation over ni is truncated to nmax, these sums can still remain computationally intensive. Thus, while REPOP can run on CPUs, we automatically leverage GPU acceleration to enable faster evaluation of these summations if a GPU is available.
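For concreteness, the sketch below evaluates the log-likelihood appearing in Eq. (6) for Model 1 by marginalizing ni over a grid up to nmax (Eq. (5)) with the binomial pmf of Eq. (1), written in PyTorch so the same tensorized sums could run on a GPU. Function names, tensor shapes, and the toy data are ours and need not match REPOP’s internal code.

```python
# Sketch of the log-likelihood appearing in Eq. (6) for Model 1: marginalize ni
# over a grid of size nmax (Eq. (5)) using the binomial pmf of Eq. (1) and the
# integer Gaussian of Eq. (3). Function names and toy data are ours; the
# tensorized form below is only one way to exploit GPU acceleration.
import torch

def log_binom_pmf(k, n, p):
    """Log of the binomial pmf, Eq. (1), via log-gamma functions."""
    return (torch.lgamma(n + 1) - torch.lgamma(k + 1) - torch.lgamma(n - k + 1)
            + k * torch.log(p) + (n - k) * torch.log1p(-p))

def log_likelihood_model1(counts, phis, mu, sigma, n_max):
    n = torch.arange(0, n_max + 1, dtype=torch.float64)          # grid over n
    log_pn = -0.5 * ((n - mu) / sigma) ** 2
    log_pn = log_pn - torch.logsumexp(log_pn, dim=0)             # Eq. (3)
    k = counts[:, None]
    p = (1.0 / phis)[:, None]
    log_pk_given_n = log_binom_pmf(k, n[None, :], p)             # shape (I, n_max + 1)
    # Eq. (5): marginalize over n; Eq. (6): sum log-likelihoods over the I plates
    return torch.logsumexp(log_pk_given_n + log_pn[None, :], dim=1).sum()

counts = torch.tensor([38., 42., 35.], dtype=torch.float64)      # toy plate counts
phis = torch.tensor([200., 200., 200.], dtype=torch.float64)
print(log_likelihood_model1(counts, phis,
                            mu=torch.tensor(8000.0), sigma=torch.tensor(500.0),
                            n_max=20000))
```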

2.4.2 Likelihood for Model 3

For Model 3, we assume that the number of colonies may exceed a certain cutoff, kCO, and that such plates are not counted. Typically, for each sample, only one of the plates generated in the process described in Fig. 1a is reported. In other words, for each collected sample only the counts (and dilution) of the first plate in the dilution series yielding a countable number of colonies (below a pre-defined cutoff) are recorded as a datapoint. This is illustrated in Fig. 1a.

This process is different from the previous models. If a dilution other than the first dilution in the dilution schedule is reported, this implies that all lower dilutions resulted in plates whose colony count (a stochastic variable) already exceeded the cutoff kCO. To obtain the posterior over θ, it is necessary to know the dilution schedule from the plating experiment. We refer to the full set of these as Φ = {ϕ1, ϕ2, …, ϕM}, ordered from lowest to highest dilution factor. In the example given in Fig. 1a we have ϕ1 = 20, ϕ2 = 200, and ϕ3 = 2000. While, mathematically, we could extend this indefinitely by sequentially plating dilutions that are factors of 10 larger, in practice only a finite number of plates can be produced, thus justifying the upper limit M. It is expected that the dilution schedule leads to dilution values sufficiently large that the expected number of colonies recovered approaches zero. Expanding upon the previous models, it is now important to report not only the counts but also the associated plate index. The datapoint is now reported as (ki, mi), which means that for the i-th sample, the plate diluted by a total factor of ϕ_{m_i} had ki counts and all plates with smaller dilution had counts larger than the cutoff kCO.

With the set of all possible dilutions Φ explicitly considered, we incorporate this dependence into the calculation of all relevant probabilities. Thus, the total posterior for the full dataset is given by

p(θ|D, Φ) \propto p(θ) \prod_{i=1}^{I} \sum_{n_i} p(k_i, m_i|n_i, Φ)\, p(n_i|θ). \qquad (7)

Here, we used the fact that the population distribution and its parameters, θ, are independent of the dilutions selected to plate, Φ. Therefore, p(θ|Φ) = p(θ) is the original prior and p(ni|θ, Φ) = p(ni|θ) is given by the respective population model of Sec. 2.3. The remainder of this subsection focuses on how to calculate the likelihood of each datapoint, p(ki, mi|ni, Φ), within the summation above. To obtain this likelihood, we revisit the dilution and plating process. As described in Fig. 1a, a plate is generated for each dilution within the set of dilutions Φ. For each sample, labeled i, the number of colonies on the plate with dilution ϕm is denoted as k_i^{(m)}.

Since all plates originate from the same original sample (with a finite number of bacteria, ni), the values of k_i^{(m)} for the same i are, in principle, not independent. Instead, the set of all k_i^{(m)} is jointly sampled from a single multinomial distribution. However, as each bacterium has an independent probability of 1/ϕm of appearing on the m-th plate, the correlations between the multiple k_i^{(m)} arise solely due to the finite number of total bacteria in the sample. Consequently, as long as the fraction of plated bacteria remains significantly smaller than the total sample size (i.e., \sum_m 1/ϕ_m \ll 1), the numbers of plated bacteria can be treated as independent binomial random variables.
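This independence approximation can be checked numerically; the sketch below compares jointly multinomial plate counts against independent binomial draws when the total plated fraction is small. The values used are illustrative only.

```python
# Numerical check of the approximation above: when the total plated fraction
# sum_m 1/phi_m is small, jointly multinomial plate counts behave like
# independent binomial draws. Values are illustrative.
import numpy as np

rng = np.random.default_rng(3)

n_i = 8000
phis = np.array([20.0, 200.0, 2000.0])
p = 1.0 / phis
print("total plated fraction:", p.sum())     # ~0.056, well below 1

# Joint (multinomial) sampling: the last category is the un-plated remainder
joint = rng.multinomial(n_i, np.append(p, 1.0 - p.sum()), size=100_000)[:, :3]
# Independent binomial sampling, one draw per plate
indep = np.stack([rng.binomial(n_i, pi, size=100_000) for pi in p], axis=1)

print("joint means:", joint.mean(axis=0), " independent means:", indep.mean(axis=0))
print("joint stds: ", joint.std(axis=0), " independent stds: ", indep.std(axis=0))
```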

Under this assumption, each count k_i^{(m)} can be modeled as a binomial realization of the original number of bacteria ni with dilution factor ϕm, as described in Eq. (1). Thus, the likelihood of observing k_i^{(m)} is written as

p\!\left(k_i^{(m)} \middle| n_i, ϕ_m\right) = \binom{n_i}{k_i^{(m)}}\left(\frac{1}{ϕ_m}\right)^{k_i^{(m)}}\left(1-\frac{1}{ϕ_m}\right)^{n_i-k_i^{(m)}}. \qquad (8)

As previously described, the datapoint (ki, mi) corresponds to the counts and the index of the first value within the sequence {k_i^{(1)}, k_i^{(2)}, …, k_i^{(M)}} that satisfies the condition k_i^{(m)} ≤ kCO, where kCO is the cutoff. To formalize this, we define mi as the (superscript) index of the first plate where the counts fall within the cutoff:

m_i = \min\left\{m : k_i^{(m)} \le k_{CO}\right\}, \qquad (9)

from which the observed datapoint (ki, mi) is obtained with

k_i = k_i^{(m_i)}. \qquad (10)

With this in mind, it becomes possible to rewrite the likelihood, p(ki, mi|ni, Φ), as

p(k_i, m_i|n_i, Φ) = p(k_i|m_i, n_i, Φ)\, p(m_i|n_i, Φ). \qquad (11)

Once more, we proceed by showing how to calculate each factor within the product above. Starting from the last factor, p(mi|ni, Φ), we note that if the plate with index mi is reported, we must have both k_i^{(m_i)} ≤ kCO and k_i^{(m)} > kCO for all m < mi,

p(m_i|n_i, Φ) = P\!\left(k_i^{(m_i)} \le k_{CO} \,\middle|\, n_i, ϕ_{m_i}\right) \prod_{m=1}^{m_i-1} P\!\left(k_i^{(m)} > k_{CO} \,\middle|\, n_i, ϕ_m\right). \qquad (12)

Here we use the capitalized P to represent that we are calculating the probability that a variable is within a set, rather than the probability of a single possible value. In this specific model, what we are asking is the probability that k_i^{(m)} is smaller than or equal to kCO. This is given by summing the probability of every value of k_i^{(m)} up to kCO, meaning

P\!\left(k_i^{(m)} \le k_{CO} \,\middle|\, n_i, ϕ_m\right) = \sum_{k=0}^{k_{CO}} p(k|n_i, ϕ_m), \qquad (13)

so (12) becomes

p(m_i|n_i, Φ) = \left[\sum_{k=0}^{k_{CO}} p(k|n_i, ϕ_{m_i})\right] \prod_{m=1}^{m_i-1} \left[1 - \sum_{k=0}^{k_{CO}} p(k|n_i, ϕ_m)\right], \qquad (14)

with each p(k|ni, ϕm) given by (8).

Then, for the remaining factor in (11), p(ki|mi, ni, Φ), since counts above the cutoff are not reported as data, the likelihood of finding ki colonies in a plate diluted by a total factor ϕ_{m_i} is re-scaled to account for the fact that it was truncated at the cutoff,

p(k_i|m_i, n_i, Φ) = \frac{p(k_i|n_i, ϕ_{m_i})}{\sum_{k=0}^{k_{CO}} p(k|n_i, ϕ_{m_i})}, \qquad (15)

for ki ≤ kCO and zero otherwise.

Thus, in order to obtain the posterior (7), we need to substitute the likelihood per datapoint, p(ki, mi|ni, Φ) in (11), which is computed using (15) and (14).
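Putting Eqs. (11), (14), and (15) together, the per-datapoint likelihood under the cutoff can be computed as in the following sketch; the function name, the 0-based indexing of the schedule, and the example numbers are ours.

```python
# Sketch of the per-datapoint likelihood p(ki, mi | ni, Phi) of Eq. (11), assembled
# from Eqs. (14) and (15) for a single ni. Names and example values are ours;
# the schedule index mi is 0-based here (mi = 1 means the second dilution).
import numpy as np
from scipy.stats import binom

def likelihood_with_cutoff(k_i, m_i, n_i, schedule, k_co):
    """p(k_i, m_i | n_i, Phi): plate m_i is the first at or below the cutoff k_co."""
    p_below = [binom.cdf(k_co, n_i, 1.0 / phi) for phi in schedule]   # Eq. (13)
    # Eq. (14): earlier plates exceeded the cutoff, plate m_i did not
    p_m = p_below[m_i] * np.prod([1.0 - p_below[m] for m in range(m_i)])
    # Eq. (15): cutoff-truncated (re-normalized) binomial for the reported count
    if k_i > k_co:
        return 0.0
    p_k = binom.pmf(k_i, n_i, 1.0 / schedule[m_i]) / p_below[m_i]
    return p_k * p_m

Phi = [20, 200, 2000]       # dilution schedule of Fig. 1a
print(likelihood_with_cutoff(k_i=40, m_i=1, n_i=8000, schedule=Phi, k_co=300))
```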

2.5 Bayesian MAP

In Bayesian inference, we compute the posterior distribution p(θ|D) as it captures the updated probability over the model parameters after observing the data. However, posteriors are often complex and high-dimensional (technically infinite-dimensional in non-parametric models), making direct computation and visualization challenging. To address this, we often resort to sampling methods to approximate the posterior. However, these samplers can also require long computation times.

Since our objective here is to create a tool that is quick and easy to run, we resort to maximum a posteriori (MAP) estimation. This reduces the problem of knowing the distribution over θ to identifying the optimal θ. In other words, we maximize p(θ|D) and denote the maximizer as θMAP,

θ_{MAP} = \arg\max_{θ}\, p(θ|D), \qquad (16)

with p(θ|D) given by (6) for Models 1 and 2, or (7) for Model 3. For our purposes here, we say that p(n|θMAP) is the distribution of n learned from the data. In REPOP, θMAP is found using optimization libraries within PyTorch [49].
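A minimal, self-contained sketch of this MAP step for Model 1 is given below, using a flat prior and gradient ascent with Adam in PyTorch; the parameterization (optimizing log µ and log σ), learning rate, and toy data are our choices and need not reflect REPOP’s defaults.

```python
# Self-contained sketch of the MAP step of Eq. (16) for Model 1 with a flat prior:
# gradient ascent on the log-posterior with Adam, parameterizing log(mu) and
# log(sigma) for positivity. Toy data, learning rate, and names are ours.
import math
import torch

def log_posterior(counts, phis, log_mu, log_sigma, n_max):
    mu, sigma = log_mu.exp(), log_sigma.exp()
    n = torch.arange(0, n_max + 1, dtype=torch.float64)
    log_pn = -0.5 * ((n - mu) / sigma) ** 2
    log_pn = log_pn - torch.logsumexp(log_pn, dim=0)                 # Eq. (3)
    k, p = counts[:, None], (1.0 / phis)[:, None]
    log_pk = (torch.lgamma(n + 1) - torch.lgamma(k + 1) - torch.lgamma(n - k + 1)
              + k * torch.log(p) + (n - k) * torch.log1p(-p))        # Eq. (1)
    # Flat prior: log-posterior equals log-likelihood up to a constant, Eqs. (5)-(6)
    return torch.logsumexp(log_pk + log_pn, dim=1).sum()

counts = torch.tensor([30., 45., 38., 52.], dtype=torch.float64)     # toy plate counts
phis = torch.full_like(counts, 200.0)
log_mu = torch.tensor(math.log(8000.0), requires_grad=True)
log_sigma = torch.tensor(math.log(800.0), requires_grad=True)
optimizer = torch.optim.Adam([log_mu, log_sigma], lr=0.05)

for _ in range(300):
    optimizer.zero_grad()
    loss = -log_posterior(counts, phis, log_mu, log_sigma, n_max=20000)
    loss.backward()
    optimizer.step()

print("mu_MAP ~", log_mu.exp().item(), " sigma_MAP ~", log_sigma.exp().item())
```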

3 Results

In the previous section we constructed the posterior p(θ|D) for both unimodal and multimodal population distributions, using a likelihood that either assumes all plates are countable (Models 1 and 2) or accounts for plates exceeding the cutoff (Model 3).

3.1 When all plates are considered countable

For the first model, we generate simulated data drawn from a single Gaussian with equal dilution factors across all samples. That is, we assume that the dilution factor used (200) yields a countable number of colonies in all samples. We also assume a unimodal Gaussian, as in Model 1. The reconstruction obtained by REPOP was shown earlier in Fig. 1b, where we see that REPOP corrects for the overestimation of variance introduced by the stochasticity in dilution and plating.

In the more complex multimodal Gaussian case (Model 2), we further observe that REPOP uncovers the true underlying distribution’s peaks with greater resolution and sensitivity to multimodality than naive multiplication. As an example, here we generate simulated data for a mixture of Gaussians with three modes. In Fig. 2a, we show a relatively simple dataset, where the three peaks are already discernible in the histogram obtained via naive multiplication. Nonetheless, REPOP improves the resolution of these peaks, particularly as the number of samples increases. In Fig. 2b, we present a more challenging case where the modes are less distinct under the naive multiplication approach; the additional stochasticity introduced by dilution causes the resulting histogram to appear as a single broad peak. Nonetheless, REPOP is able to accurately recover the underlying multimodality. In both cases, the performance of REPOP improves with larger datasets, as evidenced by the progressively sharper resolution of each peak.

REPOP can reconstruct multimodal populations even when they are not directly observable by naively multiplying counts by dilution.

Similar to Fig. 1b and c, this figure shows the observed counts, the naïve estimate obtained by multiplying counts by dilution, and the reconstruction produced by REPOP, compared to the ground truth. In both cases, the population n is drawn from a ground truth mixture of three Gaussians, shown as the black curves. In a), the mixture has means (4000, 8000, 14,000), standard deviations (200, 1500, 1000), and weights (0.25, 0.4, 0.35), while in b), it has means (8000, 16,000, 24,000), standard deviations (1000, 1000, 1000), and weights (0.3, 0.2, 0.5). Observed counts are sampled from (1), diluted by a factor of 200, and shown as histograms on the left. In a), a trimodal structure is visible in the observed counts histogram. However, in (b), the trimodality is heavily obscured by the stochasticity introduced by dilution and plating, making it indiscernible even in large datasets. Despite this, when applying REPOP to datasets of increasing size, we see that while a small dataset of 25 plates may result in missed modes, larger datasets allow REPOP to accurately recover the underlying multimodal structure in both cases. This demonstrates REPOP’s ability to infer the true population despite the stochastic noise introduced by dilution and plating.

3.2 When the cutoff is taken into account

For our final demonstration with simulated data, we show results when the data follow a cutoff in reporting, as described in Sec. 2.4.2. We generated simulated data according to the following scheme: starting from the lowest dilution (20, as shown in Fig. 1a), if the counts are larger than the cutoff kCO, we sample again at a dilution factor 10 times higher, until we find a plate that has fewer colonies than kCO. We then record that number of counts and that dilution factor as the i-th datapoint (ki, ϕi), as would be done by rigorously following the cutoff. If we know the dilution schedule, this is equivalent to having the data as the counts and the index of the reported dilution (ki, mi), as discussed in Sec. 2.4.2, with each reported dilution matched to the corresponding index mi, meaning the dilution reported is the mi-th in the schedule Φ. In Sec. 2.4.2 we also discuss how taking these cutoffs into account changes the likelihood calculation.
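The data-generation scheme just described can be summarized in a few lines of Python; the sketch below reports, for each simulated sample, the first countable plate and its (0-based) index in the schedule. The values used are illustrative only.

```python
# Sketch of the data-generation scheme described above: plate at successive
# dilutions in the schedule and report the first count at or below the cutoff,
# together with its (0-based) index. Values are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def simulate_with_cutoff(n_i, schedule=(20, 200, 2000), k_co=300):
    """Return (k_i, m_i) for one sample, or (None, None) if every plate exceeds k_co."""
    for m, phi in enumerate(schedule):
        k = rng.binomial(n_i, 1.0 / phi)       # plate count at this dilution, Eq. (1)
        if k <= k_co:
            return k, m
    return None, None

# Samples drawn around n = 8000 are mostly reported at the phi = 200 plate
sample_sizes = rng.normal(8000, 500, size=5).astype(int)
print([simulate_with_cutoff(n) for n in sample_sizes])
```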

We visualize the data generated this way and show the reconstruction obtained for Model 3 in Fig. 3. In the rightmost column, we demonstrate how failing to account for the cutoff leads to an artificial peak (e.g., an extra mode near the leftmost peak in the ground truth, Fig. 3b). This occurs because counts near the cutoff kCO are ambiguously assigned to incorrect dilutions.

Not accounting for the cutoff can lead to incorrectly attributed multimodality.

This figure compares population reconstructions with and without accounting for the cutoff effect (using Model 3 and Model 2, respectively). a) Simulated data generated with a cutoff kCO = 50; the ground truth population consists of a mixture of three Gaussian components. b) A more realistic cutoff kCO = 300, with a different set of ground truth parameters. The middle column shows reconstructions using Model 3 (which accounts for the cutoff), while the rightmost column shows results from Model 2 (which ignores it). The rightmost column also highlights key cutoff crossings using red dashed lines. These lines indicate where bacterial counts approach the cutoff at different dilutions. Notably, failing to account for the cutoff leads to incorrect mode placement, including the emergence of an extra peak observed in b). As discussed in Sec. 3.2, when the true distribution has substantial probability mass near these points, meaning there is a non-negligible probability of finding a sample with these counts, failing to account for the cutoff effects results in inaccurate reconstructions.

These errors become particularly pronounced when colony counts approach kCO, as the cutoff procedure forces certain samples to be reported at different dilutions. For example, consider a cutoff kCO = 300 and two possible dilution factors, ϕ1 = 20 and ϕ2 = 200. A sample with an original bacterial count ni near kCO ϕ1 = 6000 may yield counts at dilution ϕ1 that either exceed or fall below the cutoff due to variability in the plating process. When the cutoff effect is ignored, the algorithm mistakenly interprets these counts as coming from separate populations, leading to incorrect peak formation, as seen in the rightmost column of Fig. 3b. Beyond this qualitative observation, we also assess the magnitude of these errors by computing the relative deviation between the estimated and ground truth means for each component (see Supplemental Information B).

3.3 Results for experimental data: Discerning different true concentrations of bacteria

While we have demonstrated in the previous subsections how to resolve overlapping distributions in simulated datasets, here we move on to obtaining a similar case built with real-world plate counting. In particular, we construct a case that emulates multiple peaks at different dilutions, as shown in Fig. 3, by generating three different sample sources of bacteria.

We prepared vials with four different concentrations of E. coli measured by optical density, and consider samples of a volume equivalent to 0.2 nL from these vials. These samples are further diluted according to the dilution schedule Φ = {20, 200, 2000}. However, in practice, we do not directly dilute 0.2 nL. Instead, we take 1 µL and dilute it by factors of 10^5, 10^6, and 10^7 – further details in Supplemental Information C.1. This approach ensures that, while the original vials can be assumed to have negligible variability, the small 0.2 nL subsamples may not, resulting in a unimodal population with a larger relative variance for each vial.

The colony counts obtained are processed without identifying their original vial, aiming to reconstruct the underlying tetramodal structure. However, the dataset remains relatively small, consisting of only 97 total samples (see details in Supplemental Information C.1). This dataset and the reconstructed distribution are presented in Fig. 4a.

Reconstructing, from real experimental plate counting, the population from vials with different optical densities.

a) Distribution of observed colony counts obtained as described in Sec. 3.3. The data is passed to the system without vial identification, aiming to reconstruct the underlying tetramodal structure. b) Results from a simulated dataset with 500 datapoints (plates). The increased sample size improves the estimation of the four main components, yielding a more accurate reconstruction.

Although the limited sample size results in an imperfect match, the method successfully captures the four main components – corresponding to the estimated values of µα and σα associated with the four highest values of ρα. To further validate the approach, we supplement the dataset by generating 800 synthetic datapoints from the mixture distribution made of the four main components, leading to a dataset considerably larger than what is typical of plate counting. The results, shown in Fig. 4b, demonstrate a significant improvement in capturing the underlying structure of the data.

3.4 Application: Capturing host differences through examining gut microbiota population

Here, we use REPOP to study heterogeneity in more complex biological systems, such as microbial ecology within a host organism. Thus, we apply our framework to analyze gut bacterial counts from individual C. elegans.

Heterogeneity in bacterial counts among individual C. elegans has been reported and is typically attributed to rare colonization events and size variability [27, 46, 50]. However, to derive meaningful insights into gut ecology, it is essential to distinguish true population heterogeneity from the stochasticity introduced by dilution and plating.

To minimize confounding factors, we restrict our analysis to the OP50 strain of E. coli, which is well-characterized in laboratory settings and, under idealized conditions, is expected to produce a unimodal population distribution. Any deviations from this expected pattern may reflect individual C. elegans variability, either due to differences in the gut microenvironment or the fact that gut colonization is a rare event [27].

To this end, we conducted an experiment in which C. elegans were synchronized, cleared of gut bacteria at three days post-hatching, and subsequently fed live E. coli for controlled durations (1, 3, 5, 7, or 9 days). At each studied time point, 25 individual worms were picked, cleaned of all non-gut-adhered bacteria, and crushed. The worm homogenate was then serially diluted and plated with a dilution schedule Φ = {22, 222, 2222}. The plates were counted after a 24-hour incubation. More detailed experimental methods may be found in Supplemental Information C.2.

The results presented in Fig. 5 demonstrate a clear trend in bacterial population dynamics within the C. elegans gut. The population distribution shifts consistently to higher values with each successive day, aligning with our expectation that bacterial numbers increase over time due to colonization and replication inside the gut.

Results for the population of E.coli-fed C. elegans’ gut at days 3, 5, 7, and 9 of adulthood.

Here we observe that the population distribution obtained with plate counting shifts to higher values for longer-living C. elegans. To better visualize the distribution and model fits, we use a split-log x-axis: bacterial counts from 0 to 100 are shown on a linear scale, and counts above 100 on a logarithmic scale. The population distribution shifts toward higher values each day, as expected from colonization and replication within the gut. If biological significance, such as different C. elegans types as proposed in, e.g., [51], is to be inferred from this multimodality, a rigorous separation of population heterogeneity and plating stochasticity, as implemented by REPOP, is essential.

The multimodality exhibited in Fig. 5 can be explained by the heterogeneity of the host. Recently, this has been corroborated by in vivo experiments [51]. This phenomenon could be related to the well-established stratification in heat-shock protein (HSP) expression, which is also correlated with aging and lifespan in nematodes [52–54]. Thus the variation in the population could be due to the fact that ingestion and egestion exhibit an age-related change [55, 56] or due to state switching [51] in feeding and digestion, akin to genetic state switching [57, 58]. The results in Fig. 5 support the conclusion that biologically meaningful differences in population counts are being captured, rather than the data being dominated by instrumentation error. REPOP provides a robust method for distinguishing multiple peaks in the population distribution, indicating that intrinsic variability among individual C. elegans can be deconvolved from measurement noise.

4 Conclusion

In accurately estimating bacterial counts, simply multiplying counts by the dilution factor provides a starting point. However, this method alone often fails to capture the original individual counts across samples, as straightforward multiplication of colony counts by the total dilution factor typically leads to an overestimation of the spread (Fig. 1b) that may obscure multimodal behavior (Fig. 1c). Consequently, inferring population growth dynamics [8, 15, 19, 46, 51] and learning the features driving population heterogeneity across samples [19, 25, 26] demand a deeper analysis of the statistics involved in dilution and counting. Indeed, such a treatment avoids attributing ecological (or other) significance to the randomness introduced by plating.

The Bayesian reconstruction presented here and implemented through REPOP provides a rigorous method to obtain the distribution of populations from plate counts, including correctly recovering the multimodality of the population, as seen in Figs. 1–3.

The study of the multimodal case accounting for cutoff (Model 3) also demonstrates that retaining all data, regardless of their individual importance in reconstructing the final population size distribution, remains crucial. As shown in Fig. 3, ignoring the cutoff can lead to misidentifying multimodality, especially for plates with colony counts near the cutoff. Consequently, REPOP’s results emphasize that explicitly reporting both the dilution schedule and the cutoff values used should be a fundamental part of data reporting, as they impact the form in which likelihoods are calculated.

As a clear extension of the current work, it is possible to combine the two modular steps involved in counting bacteria on plates with the present framework for establishing the original population size based on plated samples. For instance, a notion of a possible error rate, such as a false positive or false negative rate in counting that may appear even at high dilution (where there is always a chance for bacterial colony overlap), could help further reduce the breadth of the inferred distribution over the original bacterial counts. Intuitively, we may imagine the probability of miscounting two or more colonies as a single one to grow as colony counts approach the cutoff. Simple simulations, for instance considering typical colony and plate sizes, may be used to gain initial insight on potential false negative rates and thus their effect on the inferred original sample size.

Finally, REPOP provides a means by which to test hypotheses [8, 51] relating observed multimodality in the gut microbial content of C. elegans, as shown in Fig. 5, to distinct categories of hosts. What is more, REPOP can also be used to study other organisms exhibiting similar multimodality in their microbiota [59–61]. More than just a large dataset, identifying these multiple modes requires a theoretical framework maximizing the information extracted from each plate while remaining flexible enough to accommodate different numbers of modes. REPOP fulfills both of these requirements.

Supplemental Information

A Prior selection across the uni-modal and non-parametric model

In Sec. 2.3.2 we allude to a selection of a prior, p(θ), capable of handling multi-modal data. This will be explained in detail here.

As mentioned in the main text, this prior encompasses all of the parameters needed to construct the probability distribution for n, given by Eq. (3) in the main text. Therefore, θ must include the means {µα}, standard deviations {σα}, and mixture weights {ρα} of the Gaussian mixture, θ = {µα, σα, ρα}. As mentioned in the main text, for practical purposes we assume a weak limit, which we refer to as A, equivalent to the truncation on the number of components in Eq. (4) in the main text (defaulted to the smallest value among 25 and the square root of the total number of samples), although it is changeable in our software.

Let us separate each piece of the prior p(θ). First, we consider the prior for the weights. In the uni-modal case (Model 1), the result is trivial: we have ρ1 = 1 with probability 1. However, in the multi-modal cases (Models 2 and 3), where the number of components is unknown, we use a Dirichlet distribution, a common choice for mixture models in Bayesian non-parametrics [43], given by

p(ρ_1, \ldots, ρ_A) = \frac{Γ\!\left(\sum_{α=1}^{A} η_α\right)}{\prod_{α=1}^{A} Γ(η_α)} \prod_{α=1}^{A} ρ_α^{\,η_α - 1}, \qquad (17)

where η = {η1, …, ηA} is a sequence of the same shape as {ρ1, …, ρA}, with Γ representing the gamma function. In order to prevent overfitting, it is a usual choice to have the ηα decrease exponentially. Let us write η = λη̃, with η̃ being a sequence whose entries decay exponentially by a factor q, such that η̃α = q^α, and λ being a scalar. The expected value of ρα sampled from (17) will be ηα / Σα′ ηα′.
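As a numerical illustration of this prior (and of the stated expectation of ρα), the sketch below draws weights from a Dirichlet distribution with exponentially decaying concentration parameters; the specific values of A, q, and λ are hypothetical placeholders, not REPOP’s defaults.

```python
# Numerical illustration of the Dirichlet prior of Eq. (17) with exponentially
# decaying concentration parameters, checking E[rho_alpha] = eta_alpha / sum(eta).
# The truncation A, decay factor q, and scale lambda below are hypothetical.
import numpy as np

rng = np.random.default_rng(2)

A, q, lam = 10, 0.5, 1.0
eta = lam * q ** np.arange(1, A + 1)          # concentration decays by a factor q

draws = rng.dirichlet(eta, size=100_000)      # samples of the weight vector rho
print("empirical mean of rho:", draws.mean(axis=0)[:4])
print("eta / sum(eta):       ", (eta / eta.sum())[:4])
```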

With the prior for the weights {ρα} established, we proceed to define the prior for the means {µα} and standard deviations {σα}. Here, each component α is independent of the others. For the means, we select a distribution that avoids biasing the choice by being nearly uniform, while preventing too small numbers and adhering to a sensible cutoff in n. The mean of each Gaussian must be smaller than twice the largest number found by naively multiplying counts per dilution; we denote this largest value as nmax. Thus, the prior for each mean, µα, is given by (18).

For the prior on each σα, we want a prior that scales with its respective mean but is still broad; we choose a lognormal prior, given by (19).

Thus, the prior used to generate the figures in the main text is defined as

p(θ) = p(ρ_1, \ldots, ρ_A) \prod_{α=1}^{A} p(µ_α)\, p(σ_α|µ_α), \qquad (20)

with p(ρ1, …, ρA), p(µα), and p(σα|µα) given by (17), (18), and (19), respectively.

B Relative error between the means

When analyzing the error incurred by ignoring the cutoff in Sec. 3.2, we used as a metric the relative error between the means. When comparing two distributions, p(n|θ) and p(n|θ′), in terms of how well their means match, we use the metric

given as percentages, where α indexes the components of θ, while α′ indexes the components of θ′. The term within the absolute value is equivalent to the relative error between two means, µα and µα′. We take the minimum to account for cases where the labels for different components do not match across the distributions. Throughout the text, we have presented the metric M in percentages. In the main text, Sec. 3.2, θ represents the reconstruction obtained by the method while θ′ represents the ground truth.
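For concreteness, one plausible implementation of this metric is sketched below, assuming the per-component minimum relative errors are averaged over the components of the reconstruction; the exact aggregation used by REPOP may differ, and the values shown are illustrative.

```python
# One plausible implementation of the metric M described above, assuming the
# per-component minimum relative errors are averaged over the components of the
# reconstruction; REPOP's exact aggregation may differ. Values are illustrative.
import numpy as np

def mean_relative_error(mus_reconstructed, mus_ground_truth):
    """Average, over reconstructed means, of the minimum relative error to any true mean (%)."""
    mus = np.asarray(mus_reconstructed, dtype=float)
    mus_true = np.asarray(mus_ground_truth, dtype=float)
    rel = np.abs(mus[:, None] - mus_true[None, :]) / mus_true[None, :]
    return 100.0 * rel.min(axis=1).mean()

print(mean_relative_error([4050, 7900, 14100], [4000, 8000, 14000]))   # ~1%
```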

For the dataset shown in Fig. 3a, the relative error is 0.82% when the cutoff is accounted for (Model 3). In contrast, when the cutoff is ignored (Model 2), the error increases significantly to 7.50%. Similarly, for the dataset shown in Fig. 3b, the relative error is 2.10% when the cutoff is considered, compared to 11.1% when it is not. Full details of these calculations are available on our GitHub repository [37].

C Experimental Methodology

C.1 E. coli culture for simulating different expected population peaks

An overnight culture of the OP50 strain of Escherichia coli was grown in lysogeny broth (LB) media, and its optical density at 600 nm (OD600) was measured against an LB blank. The culture was concentrated to an expected 5X of OD600 = 1.0 by centrifugation and resuspension in phosphate-buffered saline (PBS). The 5X suspension was then diluted to expected OD600 values of 0.5, 0.1, and two at 0.01, with empirical OD600 values recorded against a PBS blank of 0.804, 0.188, 0.025, and 0.02. From the first three vials, we obtained 25 samples (each with 3 dilutions), while the last vial yielded 22 samples.

For each OD600 condition, serial dilutions were prepared to 1:100,000, 1:1,000,000, and 1:10,000,000 in PBS. A total of 90 µL from each dilution was plated onto LB agar, resulting in 75 plates per OD condition. Plates were incubated at 37°C for 18–24 hours before counting colony-forming units (CFUs) to estimate bacterial concentration.

C.2 C. elegans culture

The temperature-sensitive, reproductively sterile strain AU37 of Caenorhabditis elegans (C. elegans) was cultivated and synchronized according to standard protocols [50]. Following the protocol described by Gore and Vega [27], synchronized eggs were cultivated at the non-permissive temperature of 25°C to produce sterile adult worms. These worms were then washed and transferred to a heat-killed E. coli OP50 suspension in S media supplemented with gentamicin for 24 hours to clear gut bacteria.

Once cleared of gut bacteria, worms were washed once with 10 mL M9 buffer containing 0.1% Triton X and twice with 10 mL M9 buffer. They were then transferred to a live bacterial suspension (E. coli OP50 at OD600 = 1.0 in 1 mL S media) for a duration of 3, 5, 7, or 9 days. Fresh bacterial suspension was provided at 24, 72, 120, and 168 hours (approximately every 48 hours after the initial 24-hour feeding).

On the designated sampling days for gut bacterial enumeration, worms were washed to remove non-adhered surface bacteria and manually disrupted following the protocol outlined in the section titled Disruption of Worms and Plating of Intestinal Populations of Ref. [27]. The resulting homogenate was serially diluted using a 10-fold dilution series, with final expected dilution factors of 1:22, 1:222, and 1:2222 before plating. Colony-forming units were enumerated after incubating the plates for 24 hours.

Acknowledgements

SP acknowledges support from the NIH (R35GM148237), ARO (W911NF-23-1-0304), and NSF (Grant No. 2310610). We also thank Ayush Saurabh, Max Schweiger, Lance (W. Q.) Xu, and Ioannis Sgouralis for insightful discussions in the development of this work.