Machine learning-assisted discovery of growth decision elements by relating bacterial population dynamics to environmental diversity

  1. Honoka Aida
  2. Takamasa Hashizume
  3. Kazuha Ashino
  4. Bei-Wen Ying (corresponding author)
  1. School of Life and Environmental Sciences, University of Tsukuba, Japan
8 figures and 1 additional file

Figures

Figure 1 with 3 supplements
Relating bacterial growth to environmental diversity.

(A) Flowchart of experimental conditions and data attainment. Colour gradation indicates the concentration gradient of the pure chemical compound used in the medium combinations. (B) Concentration …

Figure 1—figure supplement 1
Variations in concentration gradients of the components.

The number of varied concentrations used in the medium combinations is indicated for individual components.

Figure 1—figure supplement 2
Experimental tests of the changes in growth rate (r) responding to the concentration gradients of the minor compounds.

The product names of the compounds and the logarithmic concentrations are shown. The three biological replicates are shown as blue dots. The red lines indicate the condition of M63 minimal medium.

Figure 1—figure supplement 3
Experimental tests of the relationship between saturated population density (K) and the minor compounds.

The product names of the compounds and the logarithmic concentrations are shown. The three biological replicates are shown as blue dots. The red lines indicate the condition of M63 minimal medium.

Figure 2 with 3 supplements
Bacterial growth profiling.

(A) Three growth parameters calculated from growth curves. The lag time (τ), the growth rate (r), and the saturated population size (K) are indicated. (B) Distributions of the three parameters. The …

Figure 2—figure supplement 1
Clustering of the medium combinations.

(A) Clusters of medium combinations. (B) Distributions of lag time (τ), growth rate (r), and saturated population size (K) coloured by the four clusters.

Figure 2—figure supplement 2
Illustration of the fitness landscape.

Two fitness distributions in media A and B are shown as an example. Colour variation of the circles represents the differentiation in cell populations.

Figure 2—figure supplement 3
Subculture of the E. coli cell population used in the growth assay.

The growth curves, the calculated growth rates, and maximal population size are shown from left to right panels. Blue, green, and yellow indicate the daily transfer of subculture for three times …

Figure 3 with 4 supplements
Evaluation of machine learning (ML) models.

(A) Workflow of ML. (B) Accuracy of the ML models. Boxplots of the evaluation metrics obtained in the ML prediction of growth rate are shown. The root mean squared errors (RMSEs) of five independent …

Figure 3—figure supplement 1
Accuracy of the machine learning (ML) and multiple regression models.

Boxplots of five different evaluation metrics, i.e., coefficient of determination (R²) (A), mean absolute error (MAE) (B), mean squared error (MSE) (C), explained variance score (D), and root mean …
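The five metrics named in this caption follow standard textbook definitions. The sketch below computes them for a toy set of observed and predicted values; the function name `regression_metrics` and the numbers are illustrative assumptions, not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, explained variance, and RMSE (standard definitions)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_res = sum(residuals) / n
    ss_res = sum(e * e for e in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    var_res = sum((e - mean_res) ** 2 for e in residuals) / n
    var_true = ss_tot / n
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in residuals) / n,
        "mse": mse,
        "explained_variance": 1.0 - var_res / var_true,
        "rmse": math.sqrt(mse),
    }


# Hypothetical growth rates, for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
metrics = regression_metrics(observed, predicted)
```

R² and the explained variance score coincide when the residuals average to zero, as they do for these toy values; they differ when the predictions are biased.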

Figure 3—figure supplement 2
Time required for the machine learning (ML) model training.

The time used to train the ML models by the supercomputer is shown in the boxplots of five independent replicates.

Figure 3—figure supplement 3
Effect of the abundance of training data on the accuracy of gradient-boosted decision tree (GBDT).

10–90% of the big data (the growth rate dataset) were randomly selected for model training. The accuracy of the growth rates predicted by the trained GBDT model was evaluated by coefficient of …

Figure 3—figure supplement 4
Accuracy of the machine learning (ML) models varied with the experimental errors of the data for training.

Five ML models and the number of data points used in common are indicated. The experimental errors caused by the biological replications are shown as the coefficient of variation (CV).
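As a reminder of what the CV measures, the sketch below computes it for one set of replicate measurements. It assumes the sample standard deviation (n − 1 denominator) divided by the mean; the caption does not say which variant the authors used, and the replicate values are made up.

```python
import statistics


def coefficient_of_variation(replicates):
    """CV = sample standard deviation / mean (assumed definition)."""
    return statistics.stdev(replicates) / statistics.mean(replicates)


# Three hypothetical replicate growth rates for one medium combination.
cv = coefficient_of_variation([0.9, 1.0, 1.1])
```

A smaller CV means the replicates agree more closely, so the corresponding data points carry less experimental error into training.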

Figure 4 with 4 supplements
Contribution of the components to bacterial growth.

(A) Relative contributions of the components to the three parameters predicted by gradient-boosted decision tree (GBDT). 10 components with large contributions to the three parameters of lag time (τ)…

Figure 4—source data 1
Summary of the feature importance of the components for τ, r, and K.

https://cdn.elifesciences.org/articles/76846/elife-76846-fig4-data1-v1.xlsx

Figure 4—figure supplement 1
Violin plots of the growth parameters at varied ranges of chemical concentrations.

The chemical components with the largest contributions to the three parameters lag time (τ), growth rate (r), and saturated population size (K) are shown individually. The concentration gradients of …

Figure 4—figure supplement 2
Separation of the multimodal distribution of growth rate (r).

The continuous probability distribution of the multimodal distribution of r (A) determined by Gaussian kernel density estimation is indicated by the red lines. The two separated distributions …

Figure 4—figure supplement 3
Separation of the multimodal distribution of saturated population size (K).

The continuous probability distribution of the multimodal distribution of K (A) determined by Gaussian kernel density estimation is indicated by the red lines. The two separated distributions …

Figure 4—figure supplement 4
Separation of the multimodal distribution of lag time (τ).

The continuous probability distribution of the multimodal distribution of τ (A) determined by Gaussian kernel density estimation is indicated by the red lines. The two separated distributions …
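The three supplements above rely on Gaussian kernel density estimation to trace a smooth curve over a multimodal distribution. The sketch below shows that estimator only. The bandwidth rule (Scott's rule), the function name `gaussian_kde`, and the sample values are assumptions for illustration; the captions shown here do not state the bandwidth or how the two distributions were then separated.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a density function built from Gaussian kernels centred on each sample."""
    n = len(samples)
    if bandwidth is None:
        # Scott's rule for one-dimensional data (an assumption, not taken from the source).
        bandwidth = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical bimodal values, e.g. growth rates falling into two groups.
values = [0.10, 0.12, 0.15, 0.18, 0.20, 0.70, 0.72, 0.75, 0.78, 0.80]
density = gaussian_kde(values)
grid = [i / 100 for i in range(101)]
curve = [density(x) for x in grid]
```

With these toy values the estimated curve has a peak near each group and a dip between them, which is the kind of shape that motivates treating the data as two distributions.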

Figure 5 with 5 supplements
Sensitivity of the components.

(A) Definition of sensitivity. As an example, the upper and bottom panels indicate the regression curve across the concentration gradient of glucose and the normalized regression curve, in which …

Figure 5—figure supplement 1
Schematic drawing of the analytical procedure of S.

The sensitivity of growth rate (r) for a given chemical component (Chemical 1) is shown as an example. r1,max and Sr,1 (i=1–6) represent the maximal growth rate and the normalized space area in …

Figure 5—figure supplement 2
Normalized regression curves of the growth rates (r).

Colour variation indicates the alternative combinations of the other 40 components.

Figure 5—figure supplement 3
Normalized regression curves of the saturated population size (K).

Colour variation indicates the alternative combinations of the other 40 components.

Figure 5—figure supplement 4
Normalized regression curves of the lag time (τ).

Colour variation indicates the alternative combinations of the other 40 components.

Figure 5—figure supplement 5
Sensitivity of the components.

The values of Sτ, Sr, SK, Sg, and Vs are shown.

Figure 6
Growth strategy of risk diversification.

(A) Schematic drawing of the decision-making components for bacterial growth. (B) Flux balance analysis (FBA) simulation. The predicted growth rates are plotted against the input rate of sulfate …

Figure 7
Correlation of the growth parameters.

(A) Density plots of the three parameters. Pairs of the three parameters lag time (τ), growth rates (r), and saturated population size (K) are plotted as dots. The colour bars indicate the numbers …

Figure 8 with 1 supplement
Relative abundance of branched-chain amino acids (BCAAs) in growing E. coli.

(A) Chromosomal distribution of 20 amino acids (AAs). The numbers of 20 AAs coded into the proteins are indicated with the upward vertical bars at the chromosomal positions of the corresponding …

Figure 8—figure supplement 1
Relative abundance of 20 amino acids (AAs) in growing Escherichia coli.

The boxplots represent the relative ratios estimated according to the genome and transcriptome information, and the red lines indicate the theoretical ratios of the 20 AAs.
