Figures and data

Visual description of the dilution and plate counting process and the need for its analysis.
a) A sample of volume V is mixed with sterile medium in a larger vial (e.g., 19V, for a total dilution factor of 20). A portion of volume V of this diluted sample is then spread on a plate and incubated. Further tenfold dilutions are performed by transferring V into a vial with 9V of medium before plating. The process continues until a reasonable number of counts is obtained. b) The histogram on the left shows simulated counts across 1000 plates, obtained by diluting samples from the ground truth distribution by a factor of 200. The ground truth distribution is a Gaussian with a mean of 8000 and a standard deviation of 500. The usual method of multiplying the counts by the total dilution factor produces the orange histogram (on the right), which appears significantly broader than the ground truth (black line, right) due to the additional stochasticity introduced by dilution and plating. REPOP’s reconstruction (blue) corrects the broadening. c) A simulation similar to (b) in which the ground truth population distribution is trimodal (parameters in Fig. 2). Here the stochasticity introduced by dilution and plating obscures the three peaks, making them difficult to distinguish in the observed counts. Nevertheless, REPOP successfully reconstructs the trimodal distribution.
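The broadening described in panel (b) can be reproduced with a minimal sketch. It assumes that each bacterium ends up on the plate independently with probability 1/200 (binomial thinning). This is an illustrative assumption, not necessarily the likelihood used by REPOP, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_plates = 1000   # number of simulated plates, as in panel (b)
dilution = 200    # total dilution factor, as in panel (b)

# Ground truth populations: Gaussian with mean 8000 and standard deviation 500,
# rounded to non-negative integers.
n = np.clip(np.rint(rng.normal(8000, 500, size=n_plates)), 0, None).astype(int)

# ASSUMPTION: binomial thinning, each bacterium is plated with probability 1/dilution.
counts = rng.binomial(n, 1.0 / dilution)

# Naive estimate: multiply the observed counts by the total dilution factor.
naive = counts * dilution

print("ground truth std:", n.std())
print("naive estimate std:", naive.std())
```

Under this assumption the naive estimate keeps roughly the right mean, but its spread is dominated by the sampling noise of plating rather than by the population heterogeneity itself.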

For ease of reference, we summarize the differences among the three models taken into account in the present article.

REPOP can reconstruct multimodal populations even when they are not directly observable by naively multiplying counts by dilution.
As in Fig. 1b and c, this figure shows the observed counts, the naïve estimate obtained by multiplying counts by dilution, and the reconstruction produced by REPOP, compared to the ground truth. In both cases, the population n is drawn from a ground truth mixture of three Gaussians, shown as the black curves. In (a), the mixture has means (4000, 8000, 14,000), standard deviations (200, 1500, 1000), and weights (0.25, 0.4, 0.35), while in (b), it has means (8000, 16,000, 24,000), standard deviations (1000, 1000, 1000), and weights (0.3, 0.2, 0.5). Observed counts are sampled from (1), diluted by a factor of 200, and shown as histograms on the left. In (a), a trimodal structure is visible in the observed counts histogram. In (b), however, the trimodality is heavily obscured by the stochasticity introduced by dilution and plating, making it indiscernible even in large datasets. When applying REPOP to datasets of increasing size, we see that a small dataset of 25 plates may result in missed modes, whereas larger datasets allow REPOP to accurately recover the underlying multimodal structure in both cases. This demonstrates REPOP’s ability to infer the true population despite the stochastic noise introduced by dilution and plating.
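As a rough illustration of how such a dataset could be generated, the sketch below draws populations from the three-Gaussian mixture of panel (b) and dilutes them by a factor of 200. The helper `sample_mixture` is hypothetical, and binomial thinning is assumed in place of the sampling model referred to as (1), which is not reproduced here.

```python
import numpy as np

def sample_mixture(rng, means, sds, weights, size):
    """Draw non-negative integer population sizes from a Gaussian mixture."""
    # Pick a mixture component for every sample, then draw from that component.
    comp = rng.choice(len(weights), size=size, p=weights)
    n = rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])
    return np.clip(np.rint(n), 0, None).astype(int)

# Mixture parameters of panel (b).
means = (8000, 16000, 24000)
sds = (1000, 1000, 1000)
weights = (0.3, 0.2, 0.5)
dilution = 200

rng = np.random.default_rng(1)
for n_plates in (25, 1000):
    n = sample_mixture(rng, means, sds, weights, n_plates)
    # ASSUMPTION: binomial thinning stands in for the paper's sampling model.
    counts = rng.binomial(n, 1.0 / dilution)
    print(n_plates, "plates, naive mean estimate:", counts.mean() * dilution)
```

Plotting histograms of `counts` for the two dataset sizes gives a feel for how much of the mixture structure survives in the raw data at each size.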

Not accounting for the cutoff can lead to incorrectly attributed multimodality.
This figure compares population reconstructions with and without accounting for the cutoff effect (using Model 3 and Model 2, respectively). a) Simulated data generated with a cutoff kCO = 50. The ground truth population consists of three Gaussian components with means

Reconstructing, from real experimental plate counting, the population from vials with different optical densities.
a) Distribution of observed colony counts obtained as described in Sec. 3.3. The data is passed to the system without vial identification, aiming to reconstruct the underlying tetramodal structure. b) Results from a simulated dataset with 500 datapoints (plates). The increased sample size improves the estimation of the four main components, yielding a more accurate reconstruction.

Results for the population in the gut of E. coli-fed C. elegans at days 3, 5, 7, and 9 of adulthood.
Here we observe that the population distribution obtained with plate counting shifts to higher populations for longer-living C. elegans. To better visualize the distribution and model fits, we use a split-log x-axis: bacterial counts from 0 to 100 are shown on a linear scale, and counts above 100 on a logarithmic scale. The population distribution shifts toward higher values each day, as expected from colonization and replication within the gut. If biological significance, such as different C. elegans types as proposed in, e.g., [51], is to be inferred from this multimodality, a rigorous separation of population heterogeneity and plating stochasticity, as implemented by REPOP, is essential.
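A split-log axis of this kind can be sketched as a coordinate transform that is linear up to a threshold and logarithmic above it. The function `split_log` below is a hypothetical illustration, not the plotting code used for the figure. It assumes one decade above the threshold is drawn as wide as the linear segment.

```python
import numpy as np

def split_log(x, threshold=100.0):
    """Map counts to axis positions: linear on [0, threshold], log10 above it."""
    x = np.asarray(x, dtype=float)
    # Clamp before taking the log so values below the threshold never hit log(0).
    safe = np.maximum(x, threshold)
    log_part = threshold * (1.0 + np.log10(safe / threshold))
    return np.where(x <= threshold, x, log_part)

# Counts at or below 100 keep their value; each decade above 100 adds 100 units.
print(split_log([0, 50, 100, 1000, 10000]))
```

The two branches agree at the threshold, so the axis is continuous there. Matplotlib's built-in `symlog` scale (with its `linthresh` parameter) offers comparable behaviour without a custom transform.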