Training of deep learning algorithms to identify single isolated bacterial cells and SEPT7-associated S. flexneri
(A) Table summarizing the annotated dataset used for training. The labels of the two classification classes were “Single bacteria” and “Clump”. The annotated images were randomly split into two groups: a training dataset (80% of the images) and a validation dataset (the remaining 20%). A separate annotated dataset was used for testing. (B) Representative Airyscan images of bacteria from each class. Scale bar, 1 µm. (C) Architecture of the deep learning algorithm used for training. Each row describes the characteristics of one of the sequential transformation steps applied. Conv2D, 2D convolution. (D, E) Accuracy and loss, used as metrics to follow the training process over successive epochs (one epoch being a complete pass of the training data through the algorithm). Accuracy increases and loss decreases for both the training and validation datasets, indicating a good fit of the model to the data. The vertical dashed grey line indicates early stopping, i.e., the epoch with the minimum validation loss. (F) Confusion matrix computed on an annotated test dataset that was not used for training or validation, showing the percentage of correct and incorrect predictions for each class. (G) Precision, recall and F1 score, metrics that summarize the performance of the classifier. (H) Table summarizing the annotated dataset used for training. The labels of the two classification classes were “Septin” and “Negative”. The annotated images were randomly split into three groups: a training dataset (75% of the images), a validation dataset (15%) and a test dataset (10%). Because the data were highly imbalanced (SEPT7-associated bacteria occur at a natural frequency of 15%), the images in the Negative class were undersampled as indicated in the table. (I) Representative Airyscan images of bacteria from each class. Scale bar, 1 µm.
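The random splits described in (A) and (H), together with undersampling of the majority Negative class, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `split_dataset` and `undersample`, the file names, the image counts and the undersampling target (here, matching the size of the Septin class) are all hypothetical, and the legend does not state whether undersampling was done before or after splitting.

```python
import random


def split_dataset(items, fractions, seed=0):
    """Shuffle items and cut them into consecutive groups of the given fractions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    groups, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last group takes the remainder so no item is lost to rounding.
        if i == len(fractions) - 1:
            end = len(shuffled)
        else:
            end = start + round(fraction * len(shuffled))
        groups.append(shuffled[start:end])
        start = end
    return groups


def undersample(items_by_class, target_counts, seed=0):
    """Randomly keep at most target_counts[label] items per class; unlisted classes are kept whole."""
    rng = random.Random(seed)
    kept = {}
    for label, items in items_by_class.items():
        n = min(target_counts.get(label, len(items)), len(items))
        kept[label] = rng.sample(items, n)
    return kept


# Hypothetical image lists and counts, not the dataset summarized in the table.
septin = [f"septin_{i:03d}.tif" for i in range(100)]
negative = [f"negative_{i:03d}.tif" for i in range(600)]

# Assumed target: undersample the majority class to the size of the minority class.
balanced = undersample({"Septin": septin, "Negative": negative},
                       {"Negative": len(septin)})
labelled = [(path, label) for label, paths in balanced.items() for path in paths]

# Three-way 75/15/10 split as in (H).
train, val, test = split_dataset(labelled, (0.75, 0.15, 0.10))
print(len(train), len(val), len(test))  # 150 30 20
```

For the two-way split in (A), the same hypothetical helper would be called with `(0.8, 0.2)` and no undersampling step. Fixing the random seed keeps the split reproducible between training runs.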
(J) Architecture of the deep learning algorithm used for training. Each row describes the characteristics of one of the sequential transformation steps applied. Conv2D, 2D convolution; Batch_norm, batch normalisation. (K, L) Accuracy and loss, used as metrics to follow the training process over successive epochs. Accuracy increases and loss decreases for both the training and validation datasets, indicating a good fit of the model to the data. The vertical dashed grey line indicates early stopping, i.e., the epoch with the minimum validation loss. (M) Confusion matrix computed on the test dataset, which was not used for training or validation, showing the percentage of correct and incorrect predictions for each class. (N) Precision, recall and F1 score, metrics that summarize the performance of the classifier.
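The early-stopping epoch marked in (D, E) and (K, L), and the per-class metrics in (F, G) and (M, N), follow from standard definitions: precision = TP/(TP + FP), recall = TP/(TP + FN) and F1 = 2 · precision · recall/(precision + recall). The sketch below computes them from a list of validation losses and a confusion matrix. The orientation of the matrix (rows are true classes, columns are predicted classes), the row-wise percentage normalisation, the function names and all example numbers are assumptions for illustration only, not values from the figure.

```python
def early_stopping_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first one on ties)."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


def row_percentages(matrix):
    """Express each row (true class) of a confusion matrix as percentages of that row."""
    return [[100.0 * count / sum(row) for count in row] for row in matrix]


def per_class_metrics(matrix, labels):
    """Precision, recall and F1 for each class; rows = true class, columns = predicted class."""
    n = len(labels)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(matrix[k]) - tp                        # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Hypothetical numbers, not taken from the figure.
val_losses = [0.90, 0.60, 0.45, 0.50, 0.55]
print(early_stopping_epoch(val_losses))  # 3

confusion = [[45, 5],    # true Septin:   45 predicted Septin, 5 predicted Negative
             [10, 40]]   # true Negative: 10 predicted Septin, 40 predicted Negative
labels = ["Septin", "Negative"]
print(row_percentages(confusion))
for label, m in per_class_metrics(confusion, labels).items():
    print(label, {name: round(value, 3) for name, value in m.items()})
```

Established libraries cover the same ground, for example the Keras callback `tf.keras.callbacks.EarlyStopping(monitor="val_loss", restore_best_weights=True)` and scikit-learn's `sklearn.metrics.classification_report`; the legend does not state which tools were actually used.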