A crowd of BashTheBug volunteers reproducibly and accurately measure the minimum inhibitory concentrations of 13 antitubercular drugs from photographs of 96-well broth microdilution plates

  1. Philip W Fowler (corresponding author)
  2. Carla Wright
  3. Helen Spiers
  4. Tingting Zhu
  5. Elisabeth ML Baeten
  6. Sarah W Hoosdally
  7. Ana L Gibertoni Cruz
  8. Aysha Roohi
  9. Samaneh Kouchaki
  10. Timothy M Walker
  11. Timothy EA Peto
  12. Grant Miller
  13. Chris Lintott
  14. David Clifton
  15. Derrick W Crook
  16. A Sarah Walker
  17. The Zooniverse Volunteer Community
  18. The CRyPTIC Consortium
  1. Nuffield Department of Medicine, University of Oxford, United Kingdom
  2. Zooniverse, Department of Physics, University of Oxford, United Kingdom
  3. Electron Microscopy Science Technology Platform, The Francis Crick Institute, United Kingdom
  4. Institute of Biomedical Engineering, University of Oxford, United Kingdom
  5. Citizen Scientist, c/o Zooniverse, Department of Physics, University of Oxford, United Kingdom
6 figures and 2 additional files

Figures

Figure 1 with 4 supplements
This dataset of 778,202 classifications was collected in two batches between April 2017 and September 2020 by 9372 volunteers.

(A) The classifications were done by the volunteers in two distinct batches; one during 2017 and a later one in 2020. Note that the higher participation during 2020 was due to the national …

Figure 1—figure supplement 1
Thank you to all the volunteers who contributed one or more classifications to this manuscript.

This montage shows the usernames of the 5810 volunteers who registered and signed in; volunteers who did not register or sign in are not included.

Figure 1—figure supplement 2
The time spent by volunteers on each classification varied with a mode of 3.5 s.

Since one would expect different amounts of bacterial growth on the microdilution plates after (A) 7, (B) 10, (C) 14 and (D) 21 days, the distributions of these were examined separately. All were, …

Figure 1—figure supplement 3
The time spent by volunteers on each classification varied depending on the drug being considered.

The mode of each distribution is labelled. The drug the volunteers spent the longest on (bedaquiline, mode 4.8 s) was also one of those with the largest number (8) of wells. As measured by its mode …

Figure 1—figure supplement 4
Every new user is shown this tutorial when they first join the BashTheBug Zooniverse project.

It uses example images to explain the task and then each of the options that they can choose to classify a drug image.

Figure 2 with 1 supplement
Heatmap showing how all the individual BashTheBug classifications (n=214,164) compare to the dilution measured by the laboratory scientist using the Thermo Fisher Vizion instrument after 14 days incubation (n=12,488).

(A) The probability that a single volunteer exactly agrees with the Expert+AMyGDA dataset varies with the dilution. (B) The distribution of all dilutions in the Expert+AMyGDA dataset after 14 days …

Figure 2—figure supplement 1
Heatmap showing how all the individual BashTheBug classifications (n=214,164) compare to the set of dilutions where the measurements made by the laboratory scientist using the Thermo Fisher Vizion instrument and a mirrored box after 14 days incubation concur (n=9402).

(A) The probability that a single volunteer exactly agrees with the Expert dataset varies with the dilution. The distribution of all MIC dilutions after 14 days incubation read by (B) laboratory …

Figure 3 with 1 supplement
Taking the mean of 17 classifications is ≥95% reproducible whilst applying either the median or mode is ≥90% accurate.

(A) Only calculating the mean of 17 classifications achieves an essential agreement ≥95% for reproducibility (International Standards Organization, 2007), followed by the median and the mode. (B) …
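
As a rough illustration of the three consensus methods compared in this figure, the sketch below reduces the classifications for one drug image to a single dilution using the mean, median or mode, and then checks exact agreement (identical dilution) and essential agreement (within one doubling dilution) against a reference reading. The function names, the rounding of the mean and median to the nearest whole dilution, the tie-breaking rule for the mode, and the assumption that non-numeric options have already been filtered out are all illustrative choices, not taken from the BashTheBug analysis code.

```python
import math
import statistics


def consensus_dilution(classifications, method="median"):
    """Reduce a list of integer dilutions for one drug image to a single value."""
    if not classifications:
        raise ValueError("at least one classification is required")
    if method == "mean":
        # Assumption: round half up to the nearest whole dilution.
        return math.floor(statistics.mean(classifications) + 0.5)
    if method == "median":
        # The median of an even-sized list can fall between two dilutions.
        return math.floor(statistics.median(classifications) + 0.5)
    if method == "mode":
        # Assumption: if several dilutions tie, take the lowest one.
        return min(statistics.multimode(classifications))
    raise ValueError(f"unknown method: {method}")


def exact_agreement(dilution, reference):
    """True when the two readings are the same dilution."""
    return dilution == reference


def essential_agreement(dilution, reference):
    """True when the two readings differ by at most one doubling dilution."""
    return abs(dilution - reference) <= 1


# Hypothetical example: 17 classifications of one drug image.
readings = [4, 4, 5, 4, 3, 4, 5, 4, 4, 6, 4, 4, 5, 4, 3, 4, 4]
for method in ("mean", "median", "mode"):
    value = consensus_dilution(readings, method)
    print(method, value, essential_agreement(value, reference=5))
```

Repeating the comparison over many drug images and taking the fraction that agree would give the percentage agreements quoted in the caption.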

Figure 3—figure supplement 1
Taking the mean of 17 classifications is ≥95% reproducible whilst none of the methods reaches an essential agreement for accuracy of ≥90% when using the Expert dataset.

(A) Only calculating the mean of 17 classifications achieves an essential agreement ≥95% for reproducibility (International Standards Organization, 2007), followed by the median and then the mode. …

Figure 4 with 5 supplements
Reducing the number of classifications, n, used to build the consensus dilution decreases the reproducibility and accuracy of the consensus measurement.

(A) The consensus dilution becomes less reproducible as the number of classifications is reduced, as measured by both the exact and essential agreements. (B) Likewise, the consensus dilution becomes …

Figure 4—figure supplement 1
Reducing the number of classifications, n, used to build the consensus dilution decreases the reproducibility and accuracy of the consensus measurement.

(A) The consensus dilution becomes less reproducible as the number of classifications is reduced, as measured by both the exact and essential agreements. (B) Likewise, the consensus dilution becomes …

Figure 4—figure supplement 2
Altering the number of days incubation does not markedly affect the observed trends in reproducibility.

Shown are results for the Expert+AMyGDA dataset after (A) 7, (B) 10, (C) 14 and (D) 21 days of incubation. A previous study (Rancoita et al., 2018) showed that optimal results were achieved after …

Figure 4—figure supplement 3
Altering the number of days incubation does not markedly affect the observed trends in accuracy.

Shown are results for the Expert+AMyGDA dataset after (A) 7, (B) 10, (C) 14 and (D) 21 days of incubation. A previous study (Rancoita et al., 2018) showed that optimal results were achieved after …

Figure 4—figure supplement 4
Segmenting the drug images by the mean amount of growth in the positive control wells (Figure 6—figure supplement 3) does not markedly affect the reproducibility of the three consensus methods.

The plates are split into those with (A) low (≤ 10 %) growth, (B) medium (10 < growth ≤ 50 %) growth and (C) high (> 50 %) growth. The drug images from the Expert+AMyGDA dataset were used and the …

Figure 4—figure supplement 5
Segmenting the drug images by the mean amount of growth in the positive control wells (Figure 6—figure supplement 3) does not markedly affect the accuracy of the three consensus methods.

The plates are split into those with (A) low (≤ 10 %) growth, (B) medium (10 < growth ≤ 50 %) growth and (C) high (> 50 %) growth. The drug images from the Expert+AMyGDA dataset were used and the …
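
The three growth bands used in these two supplements can be sketched as a simple threshold function. This is a minimal illustration that assumes the mean positive control growth is available as a percentage per plate; the function names and the dictionary layout are hypothetical, and the boundaries follow the caption above (low ≤ 10 %, medium 10 < growth ≤ 50 %, high > 50 %).

```python
def growth_category(mean_growth_percent):
    """Assign a plate to a growth band from its mean positive control growth (%)."""
    if not 0 <= mean_growth_percent <= 100:
        raise ValueError("growth must be a percentage between 0 and 100")
    if mean_growth_percent <= 10:
        return "low"
    if mean_growth_percent <= 50:
        return "medium"
    return "high"


def split_by_growth(plates):
    """Group plate identifiers by growth band.

    `plates` maps a (hypothetical) plate identifier to its mean growth in percent.
    """
    groups = {"low": [], "medium": [], "high": []}
    for plate_id, growth in plates.items():
        groups[growth_category(growth)].append(plate_id)
    return groups


# Hypothetical plates and growth values, for illustration only.
example = {"plate-1": 4.0, "plate-2": 10.0, "plate-3": 37.5, "plate-4": 82.0}
print(split_by_growth(example))
```

The reproducibility and accuracy of each consensus method would then be computed separately within each group.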

Figure 5 with 1 supplement
The reproducibility and accuracy of the consensus MICs varies by drug.

Consensus MICs were arrived at by taking the median of 17 classifications after 14 days incubation. The essential and exact agreements are drawn as red and green bars, respectively. For the former …

Figure 5—figure supplement 1
The reproducibility and accuracy after 14 days incubation of the 13 antibiotics on the UKMYC5 plate.

A total of 17 classifications were used for each measurement and either the mean or mode was used to obtain a consensus reading of the (A) reproducibility and (B) accuracy. The essential agreement …

Figure 6 with 5 supplements
Each UKMYC5 plate was read by an Expert, by some software (AMyGDA) and by at least 17 citizen scientist volunteers via the BashTheBug project.

(A) 447 UKMYC5 plates were prepared and read after 7, 10, 14 and 21 days incubation. (B) The minimum inhibitory concentrations (MIC) for the 14 drugs on each plate were read by an Expert, using a …

Figure 6—figure supplement 1
The UKMYC5 plate contains 14 different anti-TB drugs.

A previous study (Rancoita et al., 2018) showed that para-aminosalicylic acid (PAS) performed poorly and it has been removed from the subsequent UKMYC6 plate design. We have therefore excluded this …

Figure 6—figure supplement 2
Although the retirement limit within the Zooniverse platform was set to 17, over 1800 images received more classifications than this and a small number were only classified 15 or 16 times.

Figure 6—figure supplement 3
The Expert+AMyGDA consensus dataset has the same distribution of bacterial growth in the positive control wells as the Expert dataset after 14 days incubation.

(A) The distribution of the mean positive control well growth, as measured by AMyGDA, for the Expert+AMyGDA dataset. The dataset is arbitrarily split into three categories: low (<10%), medium (10 ≤ …

Figure 6—figure supplement 4
The Expert+AMyGDA dataset has a greater proportion of drug images with low dilutions compared to the Expert dataset.

The growth of the bacteria also becomes evident as the number of days of incubation increases.

Figure 6—figure supplement 5
The average bias per volunteer decreases with experience.

The average bias per volunteer, as defined by the difference between a volunteer’s reading and that from the Expert+AMyGDA dataset, is plotted against the total number of classifications done by …
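
The bias defined in this caption can be sketched as a signed mean difference per volunteer. The example below is a minimal illustration under stated assumptions: classifications arrive as (volunteer, image identifier, dilution) tuples, the reference readings are held in a dictionary keyed by image identifier, and images without a reference reading are skipped. These data layouts and the function name are hypothetical.

```python
from collections import defaultdict


def mean_bias_per_volunteer(classifications, reference):
    """Return {volunteer: (mean bias, number of classifications compared)}.

    Bias is the volunteer's dilution minus the reference dilution, so a
    positive value means the volunteer tends to read a higher dilution.
    """
    totals = defaultdict(lambda: [0, 0])  # volunteer -> [sum of differences, count]
    for volunteer, image_id, dilution in classifications:
        if image_id not in reference:
            continue  # assumption: ignore images with no reference reading
        totals[volunteer][0] += dilution - reference[image_id]
        totals[volunteer][1] += 1
    return {v: (total / count, count) for v, (total, count) in totals.items()}


# Hypothetical data, for illustration only.
classifications = [
    ("volunteer-a", "image-1", 5),
    ("volunteer-a", "image-2", 4),
    ("volunteer-b", "image-1", 3),
    ("volunteer-b", "image-3", 2),
]
reference = {"image-1": 4, "image-2": 4}
print(mean_bias_per_volunteer(classifications, reference))
```

Plotting the first element of each tuple against the second would give a figure of the kind described here.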

Additional files

MDAR checklist
https://cdn.elifesciences.org/articles/75046/elife-75046-mdarchecklist1-v3.pdf
Supplementary file 1

A supplementary file containing tables (a-i) is available online.

The majority of the tables in the supplemental file can also be reproduced using the accompanying Jupyter notebook at https://github.com/fowler-lab/bashthebug-consensus-dataset (Fowler Lab, 2022).

https://cdn.elifesciences.org/articles/75046/elife-75046-supp1-v3.tex