Inferring characteristics of bacterial swimming in biofilm matrix from time-lapse confocal laser scanning microscopy

  1. Guillaume Ravel
  2. Michel Bergmann
  3. Alain Trubuil
  4. Julien Deschamps
  5. Romain Briandet
  6. Simon Labarthe  Is a corresponding author
  1. University of Bordeaux, INRAE, BIOGECO, France
  2. Inria, INRAE, France
  3. Memphis Team, INRIA, France
  4. University of Bordeaux, IMB, UMR 5251, France
  5. CNRS, IMB, UMR 5251, France
  6. Université Paris-Saclay, INRAE, MaIAGE, France
  7. Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, France
20 figures, 6 tables and 1 additional file


Microscopy data and model outlines.

(a) Temporal stacks of 2D images are acquired, with different fluorescence colors for host bacteria (Staphylococcus aureus, green) and swimmers (Bacillus pumilus, Bacillus sphaericus or Bacillus cereus, red). Bacterial swimmers navigate in a host biofilm and are tracked in the different snapshots. Swimmer trajectories are represented with white lines. High density and low density zones of host cells are visible in the biofilm (green scale). (b) Additionally to speed and acceleration distributions, three trajectory descriptors are considered. Distance is the total length of the trajectory path. Displacement is the distance between the initial and final points of the trajectory. Visited area is the total area of the pores left by the swimmer during its path. Hence, when a swimmer retraces its steps, the displacement is incremented but not the visited area. (c) Three different mechanisms are considered in the mechanistic model. Biofilm-dependant speed. A target speed is defined accordingly to the local density of biofilm and asymptotically reached after a relaxation time. Biofilm-dependent direction. Swimming direction is defined accordingly to the local biofilm density gradient. Random walk. A Brownian motion is added. (d) The image acquisition workflow is composed of a first step at the wet lab where host biofilm and swimmer are plated and imaged in different color channels. Then a post-processing phase recomposes the swimmer trajectories with tracking algorithms. Finally, temporal positions, speeds and accelerations are computed. On the biofilm channel, density and density gradient maps are processed at each time step.

TEM images of the three Bacillus.

TEM images of the three Bacillus are acquired, scaled in the same dimension and aligned (left panel). Images at lower scale are made with a zoom in on the flagella insertion (right panel). Note that the zoom in is optical so that the zoomed in image do not correspond to a zone of the larger scale images.

Swimmer trajectories The whole set of trajectories of each species is displayed in the control Newtonian buffer (upper panel) and in the host biofilm (lower panel).

Note that the 3 batches of the different species are pooled on these images. Number of trajectories are n = 517 and 123 (B. pumilus), n = 237 and 94 (B. sphaericus) and n = 279 and 144 (B. cereus) for, respectively, the biofilm and the control buffer. The physical size of the domain is 147x147μm.

Assessing run-and-tumble with speed and direction distributions.

For each time point, the swimmer mean speed V¯is(t), defined as the mean between the incoming and outgoing velocity vectors V¯is(t)=(Vis(t)+Vis(tΔt))/2, for t(T0,is+Δt,Tend,is), is plotted versus the direction change, defined as the angle θis(t) between the incoming and outgoing velocity vectors θis(t)=arccos((Vis(t)Vis(tΔt))/(||Vis(t)||Vis(tΔt)||)). The left and bottom panels indicate the marginal distributions, with the mean (dashed line) and quantiles 0.05, 0.5, and 0.95 (plain lines). Number of times points are n = 6 848 and 6 509 (B.pumilus), n = 2 818 and 3 740 (B. sphaericus) and n = 3 526 and 4 435 (B. cereus) for, respectively, the biofilm and the control buffer.

Analysis of swimming characteristics using trajectory descriptors.

Upper panel: normalized acceleration, speed, distance, displacement, and area distributions structured by species are displayed, together with quantile 0.05, 0.5, and 0.95 (vertical plain lines) and mean (vertical dashed line). The descriptor distribution in the control Newtonian buffer is indicated with the dotted line. All values are normalized by the corresponding reference value as indicated in Materials and methods. T-test pairwise comparison p-values are displayed in Appendix 1—table 2. Number of trajectories are n = 517 and 123 (B.pumilus), n = 237 and 94 (B. sphaericus) and n = 279 and 144 (B. cereus) for, respectively, the biofilm and the control buffer. Lower panel: we display the distribution of the instantaneous acceleration norm respectively to the local biofilm density gradient (i.e. ||Ai(t)|| function of b(Xi(t))) and of the instantaneous velocity norm respectively to the local biofilm density (i.e. ||Vi(t)|| function of b(Xi(t))), structured by population. The point cloud of each species is approximated by a gaussian kernel and gaussian kernel isolines enclosing 5, 50% and 95% of the points centered in the densest zones are displayed to facilitate comparisons between species (see Materials and methods Plots and statistics).

Inference assessment on synthetic data.

(a) Predicted vs true trajectories. Trajectories are recovered by sampling the parameter posterior distribution starting from the same initial condition than in the data. We represent a ground truth trajectory extracted randomly from the original dataset in red, the corresponding sampled trajectories with thin gray lines, and the trajectory obtained with the posterior means in orange. Note that in this simulation, the stochastic part is the same for all simulations, so that the only source of uncertainties comes from the inference procedure. (b) Trajectory descriptors. Trajectories are re-computed replacing the original parameters (ground truth) by the inferred parameters. The trajectory descriptors introduced in Characterizing bacterial swimming in a biofilm matrix through image descriptors are computed on the synthetic data (blue curves) and on the data obtained with the inferred parameters (orange curves). Number of trajectories are n = 72 for the ground truth and n = 100 after inference. (c) Ground truth vs fitted trajectories. The ground truth, that is the original trajectories (blue) and fitted (red) trajectories are displayed and show common characteristics. (d-e) Qqplot of fitted model output vs ground truth. After inference, the fitted model is used to re-compute the synthetic dataset. We plot the x (d) and y (e) components of the accelerations in a qqplot: the fitted model output quantiles are plotted against the quantiles of the original dataset (ground truth) with blue dots, together with the y=x line (red).

Inference result on the experimental images.

(a) To validate the inference process, a synthetic dataset is assembled by computing Equation 1 with the inferred parameters and the trajectory descriptors introduced in section Characterizing bacterial swimming in a biofilm matrix through image descriptors are computed and can be compared to the data descriptors in Figure 5. Acceleration, speed, distance and displacement distributions are displayed in the upper panel, with quantiles 0.05, 0.5 and 0.95 (plain lines) and mean (dashed line). The mean values observed in the image data are also displayed for comparison (black dashed line). The number of trajectories are identical than in Figure 5: n = 517 (B. pumilus), n = 237 (B. sphaericus) and n = 279 (B. cereus). Interactions between the host biofilm and, respectively, acceleration and speed distributions are displayed in the lower panel with isolines enclosing 5, 50% and 95% of the points, centered in the densest zones. (b) Inferred parameter posterior distributions after analysis of the confocal swimmer images, and posterior mean (dashed line). We used 4000 points for the computation of the Gaussian KDE.

Appendix 1—figure 1
Illustration of image data along the post-processing process.

Raw data (2 chanel images) are first displayed. Then, trajectory tracking are obtained. Finally, the biofilm density map is rescalled, and mapped to grayscale. Images dimensions are 147x147μm.

Appendix 1—figure 2
Assessment that the biofilm structure does not strongly vary in the z direction.

(A) Subsampling procedure. We illustrate the subsampling procedure in one of the 4 replicates. The 2D images constituting the 3D stack are sampled with a 4 × 4 cartesian grid. We can also visually observe that the biofilm variation between horizontal images are weak. Images dimensions are 147x147μm. (B) Pairwise correlation matrix. The correlation dissimilarity between sample pairs is displayed (black = 0 value, indicating identical samples, to light orange >1, indicating dissimilar samples) after hierarchical clustering. We indicate the stack,z and x-y label in the 3 first columns with a color code. We can observe that the samples are not gathered by,z but rather by stacks and x-y groups, indicating that images with identical x-y labels are clustered together, showing that they are more similar to samples with the same x-y coordinates in other z slices, than other samples in the same z slice with other x-y coordinates.

Appendix 1—figure 3
Illustration of pore formation.

Extractions of two successive images of B. sphaericus swimming in a S. aureus biofilm are displayed. The dashed ellipse indicates a zone where a swimmer moves between the two successive images, which creates a pore along its swimming path. Images dimensions are 76 x 78 μm.

Appendix 1—figure 4
Inference convergence validation.

The markov chain (upper panel) and the posterior distribution (lower panel) of each parameter is displayed, showing good convergence of the stochastic sampling of the posteriors.

Appendix 1—figure 5
Pairplot of parameter markov chains.

No strong covariance effect can be observed, showing that the model can not be reduced by analytical dependence between parameters. Slight correlation is observed between the parameters v0, v1 and γ: this feature is not surprising since γ, v0 and v1 are in the same term of equation (2). The correlation is however too low to expect a model reduction.

Appendix 1—figure 6
Impact of noise level on parameter inference.

We plot the relative error of the estimate of the different equation parameters for increasing noise applied on the biofilm density and the biofilm density gradients input data.

Appendix 2—figure 1
To illustrate the influence of each term of Equation 2, they are alternatively turned off (c to e), and swimmer trajectories are computed on mock biofilms.

(a) displaying marked density gradients (upper pannel) or marked pores (lower pannel). Trajectories can be compared to a basal simulation (b) when all the terms have the same intensity (1).

Appendix 2—figure 2
Sensitivity analysis of state observable respectively to state-equation parameters.

The sensitivity of different state observables to parameter shifts is systematically studied through sensitivity analysis methods. (a) Sobol indices are displayed for each output through barplots indicating the part of variance explained by a given parameter. The bars do not reach the value 1, indicating a residual variance reflecting interactions between parameters. (b) Pairwise correlation coefficient (PCC) of the observable respectively to the input parameters are displayed. A negative PCC indicates a negative impact on the output, and conversely. We note that the red dot indicating the PCC of β for V is confounded with the purple one indicating the PCC of ϵ.

Appendix 2—figure 3
Illustration of the interplay between friction and stochastic terms in Langevin equation.

Trajectories produced by different repetitions of Equation 11 are displayed with γ=1 (upper panel) and γ=0 (lower pannel). We note that the same random seed has been taken for the simulations of the same column with or without the friction term, meaning that the stochastic term is strictly identical in both simulations.

Appendix 2—figure 4
Influence of the stochastic process on swimmer trajectories.

We plot two different trajectories computed with the model (2), including the same parameters α, β, v0, v1, γ and ϵ, the same initial condition and identical host biofilm. The only uncertainty source comes from the different random samplings of the stochastic term. In this simulation, the ground truth (with default random seed) is plotted in blue.

Appendix 2—figure 5
Low influence of the stochastic term on the trajectory descriptors.

To assess the influence of the random term on the population-wide trajectory descriptors and overall prediction accuracy, we repeated the experiment displayed in Figure 6a. A synthetic database was first assembled (ground truth) and prediction were performed with a fitted model (After inference). Then, a second repetition of the prediction of the fitted model was computed with another seed for the random number generator, resulting in modifying the sampling of the stochastic terms and strong modifications of individual trajectories, like in Figure 4. The population-wide trajectory descriptors are however slightly impacted by this random effect, indicating that the main characteristics of the trajectory populations marginally depend on the stochastic term.

Appendix 2—figure 6
Respective influence of stochastic effects, speed or direction adaptation to the host biofilm.

We plot in a ternary plot the respective influence of the speed selection (V), the direction selection (D) and the random term (ϵ) of Equation 1 in the acceleration distribution of each species. Each squared instantaneous acceleration is mapped in the ternary plot coordinates through the relative contribution of V2, D2 and ϵ2, and this point cloud is approximated in the ternary plot coordinates with a gaussian kernel to display the point distributions. The 0.05, 0.5 and 0.95 quantile isovalues of these distributions are plotted. (a) The entire ternary plot is displayed. The dashed line represents the zoom box represented in Fig. (b), where the same isolines are displayed, but with a zoom in in the y direction to highlight differences between species. The number of trajectories are identical than in Figure 5: n = 517 (B. pumilus), n = 237 (B. sphaericus) and n = 279 (B. cereus).

Appendix 4—figure 1
Illustration of the Gaussian KDE isovalues computation.

Starting from a random 2D point distribution (left panel), a gaussian KDE is computed using the scipy.stats function (middle panel). Then, the gaussian KDE is evaluated at the original point positions, and quantiles of the resulting values are computed (quantiles 0.05, 0.5 and 0.95). Gaussian KDE isolines corresponding to this quantiles are finally computed (right panel). This isolines enclose respectively 5, 50% and 95% of the points of the original distribution, centered in the densest area of the initial point cloud. This procedure gives a good representation of the shape of the data, but allows to display several distributions in the same graph, enabling comparison while avoiding superimposition issues.


Table 1
Dataset characteristics.

We detailed, for each batch, the number of trajectories, the average number of time points by trajectory (and standard deviation), the total number of time points in the dataset, the total movie duration in seconds and the time interval between two snapshots in seconds.

SpeciesBatch# traject.traj. lengthtime pointsDuration [s]Δt[s]
B. pumilus112240 (7.4)4,590300.134
215225 (5.7)3,543300.134
324338 (6.9)8,825300.134
B. sphaericus19840 (7.6)3,762300.134
29143 (7.7)3,771300.134
34855 (7.9)2,543230.134
B. cereus110547 (7.9)4,766300.069
25336 (7.7)1,808300.069
312143 (7.1)5,006300.069
Table 2
Reference acceleration and speed, and acceleration variance decomposition between stochastic and deterministic terms.

The number N of acceleration time points is indicated for each specie. Then, reference values for acceleration Aref and speed Vref used for adimensionalization are computed by averaging the corresponding values by specie. Descriptive statistics of acceleration variance decomposition are then computed in order to illustrate the contribution of the deterministic terms in the observed acceleration distribution, and the part of the residual mechanisms that are not included in the model. We indicate for each species the acceleration variance σ(A), the part of the variance explained by the deterministic terms Rdet2 (see Materials and methods sec.Inference validation on experimental data) and the variance of the stochastic term ϵ2. We note that in order to compare species at vizualisation step, they are re-normalized with the average of the species reference values: Aref= 78.31 and Vref= 6.55.

Table 3
Inference outputs for the three species.

The posterior mean, standard deviation and inferred confidence interval are indicated for each parameter and each specie. Convergence diagnosis index neff and Rhat are provided.

speciesparammeanstdconfidence interval [2.5%–97.5%]neffRhat
B. pumilusγ0.773.95×10–3[0.77–0.77]4,5071
B. sphaericusγ0.614.53×10–3[0.60–0.62]4,9651
B. cereusγ0.831.11×10–2[0.80–0.86]2,7001
Appendix 1—table 1
P-values of pairwise comparison between distributions in biofilms.

Pairwise comparison were performed between 1D distributions displayed in Figure 5 using T-test and p-values are displayed. Non-significant values are indicated in bold.

B. cereusB. sphaericusB. cereusB. sphaericusB. cereusB. sphaericusB. cereusB. sphaericusB. cereusB. sphaericus
Appendix 1—table 2
P-values of pairwise comparison between distributions in the control Newtonian buffer.

Pairwise comparison were performed between 1D distributions displayed in Figure 5 using T-test and p-values are displayed. Non-significant values are indicated in bold.

B. cereusB. sphaericusB. cereusB. sphaericusB. cereusB. sphaericusB. cereusB. sphaericusB. cereusB. sphaericus
Appendix 1—table 3
Inference results on synthetic data.

The normalized ground-truth parameter values (i.e. ground truth parameter rescaled with Aref and Vref) are compared with the inference outputs on synthetic data: posterior distribution mean and standard deviation are indicated, together with the inferred confidence intervals for the true parameters. Convergence diagnosis indices are also given, with neff the effective sample size per iteration and Rhat the potential scale reduction factors, indicating that convergence occurred for all parameters.

parameterground truthmeanstdconfidence interval [2.5%–97.5%]neffRhat
β0.1460.166, 2×10-3[0.15-0.17]5,0501.0

Additional files

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Guillaume Ravel
  2. Michel Bergmann
  3. Alain Trubuil
  4. Julien Deschamps
  5. Romain Briandet
  6. Simon Labarthe
Inferring characteristics of bacterial swimming in biofilm matrix from time-lapse confocal laser scanning microscopy
eLife 11:e76513.